Turning Application Insights telemetry into actionable events


Last year, when we entered the Azure ecosystem, we adopted Application Insights, and it replaced our database logging system. The logs were always meant to be ephemeral, so storing them in a database was unnecessary I/O, but migrating them away had sat in our technical debt backlog ever since.

Adding Application Insights turned out to be a quick fix: all we needed was to add an Application Insights client to our existing ILogger.

Querying your Application Insights logs

The availability of the Amazon MWS report API has always been a sore spot. Its status page is rarely accurate during outages, and reports occasionally get stuck in processing.

We needed to react to these events as they happened. Using Application Insights, we decided to track:

  • The number of reports being requested every N minutes
  • Occurrences of reports being stuck in a Processing state
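The first measurement can be sketched as a Kusto query that buckets requests into fixed time windows. This is a minimal sketch, assuming the application emits a hypothetical custom metric named "ReportRequested" with a value of 1 for every report request; N is set to 5 minutes here.

```kusto
// Hypothetical metric: "ReportRequested", assumed to be emitted once (value 1)
// per report request. The bin size below is N = 5 minutes; adjust as needed.
customMetrics
| where timestamp > ago(1h) and name == "ReportRequested"
| summarize RequestCount = sum(value) by bin(timestamp, 5m)
| order by timestamp asc
| render timechart
```

A sudden drop in RequestCount across consecutive windows is the kind of pattern that would point to an MWS outage the status page has not yet acknowledged.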

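Application Insights stores each kind of telemetry in its own table, such as traces or customMetrics, with one column per field. A simplified view of the traces table looks like this: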

timestamp                | message  | severityLevel | operation_Id
2019-07-23T10:00:05.000Z | Message1 | Error         | FunctionA
2019-07-23T10:00:10.000Z | Message2 | Warning       | FunctionB

In the Azure Portal you can query these tables with the Kusto Query Language (KQL), much as you would query a SQL table. For example, to select only the messages from the last hour:

traces
| where timestamp > ago(1h)
| project message

Or distinct messages:

traces
| where timestamp > ago(1h)
| distinct message
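The other columns in the table can be filtered the same way. As a sketch, the query below keeps only warnings and errors from the last hour. Note that the stored severityLevel column is an integer rather than the label shown in the simplified table above.

```kusto
// severityLevel is stored as an integer:
// 0 = Verbose, 1 = Information, 2 = Warning, 3 = Error, 4 = Critical
traces
| where timestamp > ago(1h) and severityLevel >= 2
| project timestamp, message, severityLevel, operation_Id
| order by timestamp desc
```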

In our case, each report type had a custom metric for reports still in progress, so we could total the reports recorded as processing over the last 5 minutes:

customMetrics
| where timestamp > ago(5m) and name == "ReportType_X_InProgress"
| summarize ReportCount = sum(value)
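If every report type follows the same ReportType_<type>_InProgress naming as the example, one query can cover all of them at once. This is a sketch under that naming assumption; any report type names beyond ReportType_X are hypothetical.

```kusto
// Assumes all in-progress metrics are named ReportType_<type>_InProgress.
customMetrics
| where timestamp > ago(5m)
    and name startswith "ReportType_"
    and name endswith "_InProgress"
| summarize ReportCount = sum(value) by name
| where ReportCount > 0
```

Grouping by name keeps one row per report type, so the result shows which report types are stuck rather than only a combined total.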

Alerting on actionable events

Using this query pattern, we set up alerts that turned these events into actionable tasks. Once you have built a query, you can create an alert from it by specifying:

  • Alert name
  • Alert condition (a Kusto query that returns results)
  • Alert threshold (the alert triggers when the query result crosses this threshold)
  • Action group (defines who will be notified if the alert triggers)
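One way to wire the earlier stuck-report query into an alert is to make the query itself return rows only when there is a problem, then set the threshold to "number of results greater than 0". This is a sketch of that approach, reusing the ReportType_X_InProgress metric from above. The exact threshold labels vary between versions of the Azure Portal.

```kusto
// Returns a row only when at least one ReportType_X report is recorded as
// in progress in the last 5 minutes, so a "results > 0" threshold fires then.
customMetrics
| where timestamp > ago(5m) and name == "ReportType_X_InProgress"
| summarize ReportCount = sum(value)
| where ReportCount > 0
```

Filtering inside the query keeps the alert rule itself simple: the threshold only has to check whether any rows came back.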

We added ourselves to the action group so that we were notified by email or SMS. When one of these alerts fired, we opened a support ticket so that the stuck reports could be removed.