Azure Service Health continuously notifies you of issues that may affect the availability of your environment, such as service incidents, planned maintenance periods, or regional outages.

We’ve recently enhanced our Azure integration to include additional support for monitoring Service Health issues, enabling you to keep tabs on the health of your Azure environment and take proactive measures to mitigate downtime. Within minutes of setting up the integration, you’ll see rich, contextual Service Health events appear within your event stream, where you can monitor and correlate them with data from more than 450 infrastructure technologies (including other Azure services), all in one place.

Get a cohesive view of Azure Service Health issues

When a new issue is identified, Azure reports an Azure Service Health event indicating the nature of the problem and affected resources and regions. Azure then continuously updates the status of the issue via a series of events until it is finally resolved.

With Datadog, you can clearly monitor every stage of Service Health issues within our event stream under the “Azure Service Health” namespace. Datadog collects these events automatically for all subscriptions being monitored with our Azure integration. You’ll see each issue cohesively grouped by its Tracking ID, enabling you to get full visibility into its current status and progression, from start to finish. This makes it easier for you to keep track of high-priority issues and follow up on their progress.
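To make the grouping idea concrete, here is a minimal sketch of how a series of Service Health updates could be collected by Tracking ID and ordered from first report to latest status. The event shape (`tracking_id`, `status`, `timestamp`) and the sample values are assumptions made for illustration; they are not the actual payload Datadog or Azure returns.

```python
from collections import defaultdict

def group_by_tracking_id(events):
    """Group events by tracking ID, each group sorted oldest to newest."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event["tracking_id"]].append(event)
    for updates in grouped.values():
        updates.sort(key=lambda e: e["timestamp"])
    return dict(grouped)

def current_status(updates):
    """Return the status carried by the most recent update of an issue."""
    return updates[-1]["status"]

# Hypothetical sample data: field names and values are illustrative only.
sample_events = [
    {"tracking_id": "ABCD-123", "status": "active", "timestamp": 100},
    {"tracking_id": "WXYZ-789", "status": "active", "timestamp": 110},
    {"tracking_id": "ABCD-123", "status": "resolved", "timestamp": 200},
]

issues = group_by_tracking_id(sample_events)
for tracking_id, updates in issues.items():
    print(tracking_id, len(updates), "update(s), now", current_status(updates))
```

Sorting each group by timestamp is what lets the last element stand in for the issue's current state, mirroring the start-to-finish view described above.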

Slice and dice Azure Service Health events using tags

The Azure Events API provides valuable metadata about each Service Health event. Datadog automatically converts this metadata into key:value tags that you can use to filter and search through all your events. A few examples:

  • service: The impacted Azure service(s) (e.g., Azure Virtual Machines)
  • status: The status of the event (i.e., active or resolved)
  • region: The impacted Azure region(s) (e.g., US East, Global)
  • incident_type: The type of Service Health event (ServiceIssue, PlannedMaintenance, SecurityAdvisory, HealthAdvisory)
  • level: The severity level of the event (i.e., informational, warning, or critical)
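As a rough illustration of filtering on the tags listed above, the sketch below selects events whose tags include every requested key:value pair. It runs on local sample data only; the `tags` field layout and the lowercase tag values are assumptions for the example, not the exact strings Datadog emits.

```python
def has_tags(event, **wanted):
    """True if the event carries every requested key:value tag."""
    tags = set(event["tags"])
    return all(f"{key}:{value}" in tags for key, value in wanted.items())

def filter_events(events, **wanted):
    """Return only the events that match all requested tags."""
    return [event for event in events if has_tags(event, **wanted)]

# Hypothetical events: titles and tag values are illustrative only.
events = [
    {"title": "VM connectivity issue",
     "tags": ["status:active", "incident_type:serviceissue", "level:critical"]},
    {"title": "Scheduled host update",
     "tags": ["status:active", "incident_type:plannedmaintenance", "level:informational"]},
    {"title": "Storage latency issue",
     "tags": ["status:resolved", "incident_type:serviceissue", "level:warning"]},
]

active_issues = filter_events(events, status="active", incident_type="serviceissue")
for event in active_issues:
    print(event["title"])
```

Matching on whole `key:value` strings, rather than parsing tags into a dictionary, keeps the check correct when an event carries several values for the same key, such as more than one impacted service or region.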

In addition to these tags, each Azure Service Health event includes a description that captures the essence of the issue from the perspective of the Azure engineers investigating the problem. Some events may also contain mitigation steps for addressing the issue and reducing its impact.


Monitor Azure Service Health events with Datadog