Turning Application Insights telemetry into actionable events

July 23, 2019

azure amazon mws api

Last year, when we moved into the Azure ecosystem, we adopted Application Insights to replace our database-backed logging system. The logs were always meant to be ephemeral, so writing them to a database was unnecessary I/O, but the task of migrating them had sat in our technical debt backlog.

Adding App Insights was a quick solution to this - all we needed was to add a client to our existing ILogger.

Querying your Application Insights logs

The Amazon MWS report API and its availability have always been a sore spot. The API status page is rarely correct during downtime, and reports occasionally get stuck in processing.

We needed to react to these events. Using Application Insights, we would track:

The number of reports being requested every N minutes
Occurrences of reports being stuck in a Processing state

Azure presents App Insights data in a tabular format:

timestamp	message	severityLevel	operation_Id
2019-07-23T10:00:05.000Z	Message1	Error	FunctionA
2019-07-23T10:00:10.000Z	Message2	Warning	FunctionB

In the Azure Portal, you can query this data much as you would a SQL table, using the Kusto query language. For example, you could select just the messages from the last hour:

traces
| where timestamp > ago(1h)
| project message

Or distinct messages:

traces
| where timestamp > ago(1h)
| distinct message

Or, in our case, we could sum the in-progress metric for a report type over the last 5 minutes to see how many reports are sitting in a processing state:

customMetrics
| where timestamp > ago(5m) and name == "ReportType_X_InProgress"
| summarize ReportCount = sum(value)
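
The other metric we tracked, the number of reports requested every N minutes, can be queried in a similar way. The following is a minimal sketch rather than our exact query: the metric name "ReportType_X_Requested" is a hypothetical placeholder, and the 5-minute bin stands in for N.

```kusto
// Sketch only: "ReportType_X_Requested" is an assumed metric name.
// Groups the last hour of request metrics into 5-minute buckets.
customMetrics
| where timestamp > ago(1h) and name == "ReportType_X_Requested"
| summarize RequestCount = sum(value) by bin(timestamp, 5m)
| order by timestamp asc
```

Each row is one 5-minute window, so a sudden drop or spike in RequestCount is easy to spot.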

Alerting on actionable events

Using this Kusto query pattern, we set up alerts that became actionable tasks. After building a query, you can set up an alert:

Alert name
Alert condition (a Kusto query that returns results)
Alert threshold (the alert will trigger if the query result passes this threshold)
Action group (defines who will be notified if the alert triggers)
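
As an illustration of an alert condition, the sketch below assumes the alert is configured to trigger on the number of rows returned, with a threshold of "greater than 0". It reuses the "ReportType_X_InProgress" metric and adds a final filter so that the query returns a row only when something is stuck; the cut-off of 0 is an assumption and should be tuned to your own tolerance.

```kusto
// Sketch of an alert condition: returns a row only when stuck reports exist.
customMetrics
| where timestamp > ago(5m) and name == "ReportType_X_InProgress"
| summarize ReportCount = sum(value)
// Assumed cut-off: any in-progress report in the window counts as actionable.
| where ReportCount > 0
```

The trailing filter matters because a summarize without a by clause returns a single row even when no records match, which would otherwise trip a row-count threshold on every evaluation.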

We could then assign ourselves to the action group so that it would notify us via e-mail or SMS. Upon receiving one of these alerts, we opened a ticket with support so that the stuck reports could be removed.