Monitor SQL Server Always On Availability groups using extended events

In this 33rd article of SQL Server Always On Availability Group series, we will use extended events to
monitor the availability group.

Introduction

Database professionals’ primary role is to do proactive monitoring for ensuring system availability. DBA must
investigate the problem root cause analysis in case any critical event or downtime occurs. You can use various
things for investigation purposes such as SQL Server logs, Windows & Cluster logs, and dynamic management views
output, profiler and extended events.

SQL Server Always On Availability Group provide robust high availability and disaster recovery solution. It is
equally essential and beneficial to monitor the AG group ownership, their synchronization states, failover, critical
parameters. Let’s go ahead and see how extended events are used in monitoring the availability groups.

Environment details

You can use earlier articles in this series (see TOC at the bottom) and configure two-node SQL Server Always On
Availability group replicas as shown below. The availability group databases are in the synchronized commit mode.

  • Primary Replica: SQLNode1INST1
  • Secondary Replica: SQLNode2INST1
  • Database: [MyNewDB]

SQL Server Always On Availability group environment

SQL Server Always On Monitoring using the extended events

SQL Server Extended events are a lightweight monitoring tool that captures useful SQL instance and database
parameters. We can collect the required database and use it for monitoring and troubleshooting purpose. For example,
in an article, Monitoring SQL Server deadlocks using the system_health extended event, we use it to get deadlock XML and graph using
the default system_health session. We can use SSMS GUI or t-SQL to create, configure and retrieve data from extended
event files.

Once we configure the availability group, the create Availability Group Wizard automatically creates the
AlwaysOn_health extended event session on all replicas participating in the AG configuration. You
can view it in SSMS navigating to Management -> Extended Events -> AlwaysOn_health.

If you configure the availability group using t-SQL or new availability group, it does not start the alwayson_health
event session. Therefore, you should verify that the extended event session is running and collecting the required
data.

AlwaysOn_health extended event

To understand the AlwaysOn_health event session, it is a good idea to script it out. Right-click on the event
session-> Script Session as -> Create To-> New Query editor window.

Script Session

As shown above, it has several extended events along with the error numbers to capture. We use the sys.dm_xe_objects to retrieve a brief
description of these extended events.

extended events

alwayson_ddl_executed:

This event occurs when we execute a data definition language (DDL) statement such as CREATE, ALTER, DROP on an
availability group database. It does not capture detailed information about DDL. For example, you cannot retrieve an
execution plan. Its main purpose is to indicate you with the DDL followed by planned or unexpected failover.

availability_group_lease_expired:

If the Windows failover cluster and availability group have a connectivity issue, it might cause the lease expired
messages in the event logs. You might face automatic failover for the synchronous replica’s in case it occurs on the
primary replica.

availability_replica_automatic_failover_validation:

For an automatic failover, SQL Server validates that the primary target replica is synchronized or not. It provides
an AG failover point for failovers. DBA’s can use the information to investigate the cause of an automatic failover
using this extended event.

availability_replica_manager_state_change:

This event occurs if the availability replica manager states changes. It is also a useful event session to
investigate the reason for a different state.

availability_replica_state_change:

We can use this event to monitor the state of the availability group replica. It is also helpful to understand the
internal of an AG failover.

error_reported:

It tracks various error numbers to find out the connectivity errors for availability group endpoints. You can query
the sys.messages table to check the description for these different error id’s.

In the query output, you get the error id and its corresponding description.

error description

To summaries these error ids you can refer to the following table for errors:

Category

Error id

description

Consistency issues Corruption

823

824

829

Logical consistency error, disk corruption, database consistency issues reported by DBCC CHECKDB.

Changing roles

1480

It gives you a message once the AG replica changes its role, such as secondary to primary.

Endpoint

9691

9692

9693

If SQL Server could not listen for an IP address or port, it logs the message in the extended event. For
example, in the case of the issue of SQL listener, we get the corresponding event.

Connection handshake

28034

28036

28047

28048

28080

You get these errors in case of connection handshake failure. For example, if your AG owner or service
account does not have to connect permissions for the endpoint, it raises a connection handshake message.

Connection timeouts

35201

35202

35204

35206

35207

In case your availability group replicas get any connection timeouts, it logs the information in the
extended event session.

Windows failover informational messages

41048

41049

41050

41051

41052

41053

41054

41055

These are the informational messages in different states of replicas during failover. It is useful to
understand the failover process in SQL Server Always On Availability Groups.

AG failover issues

41142

41144

Suppose your SQL Server Always On Availability group replica tries for automatic failover, but the new
primary replica is not ready for it, it logs the corresponding event in the extended event.

hadr_db_partner_set_sync_state

This event is to track the HADR partner sync state changes.

lock_redo_blocked

This event is useful to track the redo thread blocks that might cause latency in the data synchronizations.

Data Storage for Alwayson_health extended event

Open the Alwayson_health extended event properties and navigate to Data Storage. By default, it stores the collected
data in an event file. This event file gets created in the instance log directory.

  • Maximum file size: 5 MB
  • Maximum number of files: 4
  • Enable file rollover: It rollover the files once it extended event collected data fills the 4 files with 5 MB
    each. It again starts overwriting the data from the first event file

Enable file rollover

Browse to your Log directory for SQL instance, and here you see the event files.

event files

Monitor SQL Server Always On Availability Groups using the Extended event

We can extract data from the AlwaysOn_health extended event using GUI and T-SQL script.

You can expand the AlwaysOn_health in SSMS, right-click on the package0.event_data file and select View targeted
data. If you want to view the live data, click on the Watch live data on the root Alwayson_health folder in SSMS.

It opens the event viewer to show the collected information. You can filter, aggregate, and choose columns in this
window for your required data.

Monitor AG using the Extended event

Sometimes, it is not convenient to get information using the graphical method. You can use t-SQL as well to get data
from the extended event files and choose the required data.

To demonstrate the data collection for SQL Server Always On Availability Group, I suspend the data movement from the
secondary replica. You can refer to series articles for understanding the suspend and resume data movement.

Expand Availability Group Database -> Suspend Data Movement.

Suspend Data Movement

Once the data movement suspends, wait for some time and again resume the data movement using the Resume Data
movement wizard.

Resume Data Movement

Use the following query to filter the extended event collected data for error 35264(suspend),35265 (resume).

In the below query, we use sys.fn_xe_file_target_read_file to read the event file and filter the data.

In the output, you can see the suspend and resume data movement for the availability group database. In the
suspended message, it specifies that the database [MyNewDB] is suspended due to user activity.

Filter the extended events

Similarly, you can filter the extended event session for error for troubleshooting SQL Server Always On Availability Group.

Apart from the default availability group, you can configure additional event sessions as per your requirement. To
check the extended events for HADR, you can filter events from the sys.dm_xe_objects as shown below.

It gives you a list of 178 events. Here, you can see a snippet of all corresponding extended events.

additional event sessions

Conclusion

In this article, we learned the extended events for monitor SQL Server Always On Availability group related events
and errors. You can add additional monitoring events to collect the relevant data as per your requirement.

Table of contents

Rajendra Gupta
Latest posts by Rajendra Gupta (see all)

Author: admin

Leave a Reply

Your email address will not be published.