Last year, when we moved into the Azure ecosystem, we adopted Application Insights to replace our database logging system. The logs were always meant to be ephemeral, so writing them to a database was unnecessary I/O, but the task of migrating them had always sat in technical debt.
Adding App Insights was a quick solution to this: all we needed was to add a client to our existing `ILogger`.
Querying your Application Insights logs
The Amazon MWS report API and its availability have always been a sore spot. The API status page is rarely correct during downtimes, and reports occasionally get stuck in processing.
We needed to react to these events. Using Application Insights, we would track:
- The number of reports being requested every N minutes
- Occurrences of reports being stuck in a `Processing` state
Azure presents App Insights data in a tabular format:
timestamp | message | severityLevel | operation_Id |
---|---|---|---|
2019-07-23T10:00:05.000Z | Message1 | Error | FunctionA |
2019-07-23T10:00:10.000Z | Message2 | Warning | FunctionB |
In the Azure Portal, you can query this data much as you would a SQL table, using the Kusto query language. For example, you could select just the messages over a certain timeframe:
traces
| where timestamp > ago(1h)
| project message
Or distinct messages:
traces
| where timestamp > ago(1h)
| distinct message
Or, in our case, we could track the number of reports that have been in a processing state for over 5 minutes:
customMetrics
| where timestamp > ago(5m) and name == "ReportType_X_InProgress"
| summarize ReportCount = sum(value)
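The other item on our list, the number of reports requested every N minutes, can be approached the same way by grouping results into time buckets. Here is a minimal sketch, assuming a hypothetical custom metric named `ReportType_X_Requested` (that name is not from our codebase; substitute whatever your application emits) and a 5-minute bucket standing in for N:

```kusto
// Sketch only: "ReportType_X_Requested" is a hypothetical metric name.
customMetrics
| where timestamp > ago(1h) and name == "ReportType_X_Requested"
// bin() rounds each timestamp down to the start of its 5-minute bucket
| summarize RequestCount = sum(value) by bin(timestamp, 5m)
| order by timestamp asc
```

Each row of the result is one 5-minute window with the total requested in that window, which makes a sudden drop or spike easy to spot on a chart.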
Alerting on actionable events
Using this Kusto query pattern, we set up alerts that became actionable tasks. After building a query, you can set up an alert by specifying:
- Alert name
- Alert condition (a Kusto query that returns results)
- Alert threshold (the alert will trigger if the query result passes this threshold)
- Action group (defines who will be notified if the alert triggers)
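As an illustration of how the condition and the threshold fit together, one option is to make the query return a row only when something is wrong, and then alert whenever the number of results is greater than zero. This is a sketch under that assumption; the cut-off of `0` stuck reports is illustrative and not necessarily the value we used:

```kusto
// Sketch only: returns a row when any stuck reports were recorded in the last 5 minutes.
customMetrics
| where timestamp > ago(5m) and name == "ReportType_X_InProgress"
| summarize ReportCount = sum(value)
// Illustrative threshold; raise it if a few in-progress reports are normal.
| where ReportCount > 0
```

Alternatively, the final `where` can be dropped and the comparison left to the alert threshold setting itself.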
We could then assign ourselves as the action group so that it would notify us via e-mail or SMS. Upon receiving these alerts, we opened a ticket with support so that the stuck reports could be removed.