Database-level health detection in SQL Server Always On Availability Groups

Introduction

SQL Server Always On Availability Groups create a group of databases that fails over together in case the primary replica is unavailable. These availability group databases failover manually or automatically depending upon your AG configuration. By default, SQL Server fails over all databases from the primary replica to the secondary replica if the primary replica is down or SQL Services stopped in the primary instance.

You can create multiple SQL Server Always On Availability Group in SQL Server configuration. Usually, we group the application dependent databases in an availability group. These databases must be failed over together for smooth working on the application.

Suppose you have two databases in an AG group. Your primary instance is online, but one of the AG databases becomes unhealthy on the primary replica. Your application reports issue because it could not access a database in the AG group. SQL Server does not perform a failover of the AG group to the secondary replica. It requires manual intervention to fix the issues at the primary replica or perform a manual failover to the secondary replica. Sometimes we want that SQL Server should check the database health as well in an AG group and perform automatic failover if any of the databases is not healthy.

Do we have the option to enable the database-level health detection in SQL Server Always ON? Let’s explore it in this article.

Prerequisites

You should configure SQL Server Always On Availability Groups in synchronized mode. You can refer to this article, Add a new node into existing SQL Server Always On Availability Groups for further insights.

In the earlier articles, we have created an availability group with the following configuration.

Current primary replica: SQLNode1
Secondary replica in Synchronous mode: SQLNode2
Secondary replica in Asynchronous mode: SQLNode3
Two availability group databases [MyNewDB] and [SQLShackDemo]

SQL Server Always on Availability Group setup

An Overview of the database-level health detection for SQL Server Always On Availability Groups

SQL Server 2016 introduced **Enhanced database failover **to monitor the database health and trigger the failover for that availability group. This feature is known as Database Level Health Detection in an availability group.

We can enable this feature at the availability group; therefore, it monitors all databases corresponding to that availability group. It cannot be enabled for an individual database in an AG group. Your database should be in synchronized data commit with automatic failover mode so that SQL Server can failover it in case of any issues.

SQL Server performs the following checks for the database-level health detection. If the database status is online and SQL Server could not write in the transaction log of an AG database, it initiates AG failover. It also checks for the following error ids as part of the database health check.

Suppose you have the following AG architecture with one secondary replica. Your AG database stores data files in D drive and log file in the L drive.

disk configuration

Now, on your primary replica, log drive L fails. SQL Server is not able to write in the AG database transaction log. It fails a successful database health detection. SQL Server availability group fails over from primary to the secondary replica.

disk failure and failover

In another scenario, your data drive fails on the primary replica, but your log drive is intact. In this case, no AG failover occurs because SQL Server can write in the transaction log.

data disk failure and failover

Database-level health detection does not monitor the idle database or disk uptime. It does not directly monitor the database files. In the above image, we saw that disk failure causes the availability group automatic failover. Suppose your AG database is idle and does not have any active transactions. In the idle state, SQL Server does not perform any physical activity to the disk. Now, suppose, you have other databases as well in the disk and disk gets full due to those database transactions. Due to disk space issues, SQL Server might not be able to perform any transactions for the databases lying in that disk. Later. You perform a transaction and SQL Server not able to write database checkpoints and write the data into the disk. At that time, SQL Server finds unhealthy database status and perform a database failover as a result of the database health change.

Similarly, if your database can satisfy the data page from the memory, SQL Server does not require to perform a physical read from the disk. It gets the data from the buffer pool memory. It might also not cause your database to failover for SQL Server always-on availability group.

#alwayson availability groups #installation #setup and configuration #database

Introduction

Prerequisites

An Overview of the database-level health detection for SQL Server Always On Availability Groups

sqlshack.com

Database-level health detection in SQL Server Always On Availability Groups