Useful Considerations for SQL Server Disaster Recovery

An overview of RPO, RTO and SLA

In the digital era, data availability and security is a critical task for every organization. It is a database administrator’s duty to create a fail-safe mechanism to ensure your database is available as per the defined RPO, RTO and SLA.

Recovery Point Objective (RPO): It refers to the maximum amount of data you can afford to lose. It is aligned with your database backups and recovery strategies. For example, if you take database backup every hour, you can lose a maximum of one hour of data. If your RPO is 10 minutes, you are not in a good position from a data recovery point of view

In the below image, we see the RTO and RPO corresponding to SQL Server Disaster Recovery.

SQL Server Disaster Recovery

Recovery Time Objective (RTO)

Once a disaster occurs, DBA’s primary responsibility is to recover your database asap for business continuity. Therefore, the RTO is a measure of how long your organization can afford the downtime of databases before things come back to normal. In the above image, we see the recovery time objective after a disaster occurs. Usually, we should define our system and DR process in such a way to recover asap in less than the RTO.

Service Level Agreements (SLA)

The third useful parameter is the Service Level Agreement between the customer and the vendors. It covers the service quality, availability and responsibilities. It is defined based on the RPO and RTO of organization requirements for SQL Server Disaster Recovery.

Business Continuity for SQL Server Disaster Recovery

There are several parameters you should consider meeting the recovery objectives (RPO, RTO) and SLA’s regarding the databases for SQL Server Disaster Recovery.

Define Database backups policy

You must define the backup policy for both critical and non-critical database systems. In SQL Server, we combine the full, differential, transaction log backups for restoring databases in case of any issues.

Combine different backup mechanism to ensure you minimize the downtime and effort in the restoration. For example, you use a combination of full, differential and transaction log backups for large databases while for smaller databases, you can take full backups and regular log backups. You can also leverage filegroup backups, piecemeal restores, the smart transaction log and differential backup as well to take backup depending upon data changes
Store the backups on a different media such as SAN, Tape, Cloud URL. You should not store the backups on the drives where your data and log file exists. In that particular drives goes down, you lose both data and backups
You must configure the backups even if you have high availability infrastructure such as VM snapshots, Storage level replication, Windows failover clusters or SQL Server Always On Availability Groups

Dialog showing the backup and restore technique for SQL Server Disaster Recovery

Regular test the database backups

Sometimes, database professional feels good that they have 100% compliance on database backup. You also report 100% backup compliance to your regular reports to management. But, someday system crashes, and your backup would not work.

To avoid this humiliating scenario, database professionals must conduct regular database restoration drives. In these drives, you can choose a random restore point ( RPO) and restore your backups on a test environment and validate that you meet the RPO for SQL Server Disaster Recovery. It ensures your backup policy is aligned with your organization requirement as well as it boosts your confidence in database backups for any unforeseen situation.

#sql #database #programming #developer