We’re going to keep the holiday theme going and introduce another great new feature on the ObjectRocket service: High Availability (HA) for PostgreSQL. Every datastore we offer on ObjectRocket is built for production workloads, which generally require HA, so we’ve been working hard over the past few months to deliver PostgreSQL HA just in time for the holidays.

Why High Availability is important
If the terms ‘High Availability’ or ‘HA’ are unfamiliar to you, let’s do a quick review of why HA matters. The three main benefits of HA are:

Zero or greatly reduced downtime
Protection against data loss
Increased database performance
There are a number of methods for implementing High Availability across datastores; even on a single datastore like PostgreSQL there are numerous technologies available. However, a key component of almost any HA solution is a replica of your data. What this means is that you only see one dataset/database, but behind the scenes, there are one or more exact copies (replicas) of that data. In the event that the main database (called the “master” in most replication schemes) encounters an issue like hardware failure, software failure, or corruption, a replica can then be used to replace the master.

That last point touches on the second main component of most HA systems, and that is an automated failover mechanism (or promotion, or election in other schemes). Replication, as described above, ensures that you always have multiple healthy copies of the data, but you need something else to:

Detect that a problem on the master has occurred
Select an appropriate replica to promote to master
Repair the failed master and/or create a new replica (to replace the one that has been promoted)
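Those three steps can be sketched in a few lines of greatly simplified Python. The health-check, promotion, and rebuild hooks below are hypothetical stand-ins for what a real failover manager does; this is an illustration of the control flow, not our actual implementation:

```python
def failover(nodes, is_healthy, promote, rebuild):
    """Run one failover pass.

    nodes:      {"master": name, "replicas": [names...]}
    is_healthy: callback reporting whether a node is reachable and healthy
    promote:    callback that promotes a replica to master
    rebuild:    callback that repairs a failed node and returns it as a replica
    """
    master = nodes["master"]
    # 1. Detect that a problem on the master has occurred.
    if is_healthy(master):
        return nodes  # nothing to do
    # 2. Select an appropriate replica to promote (here: the first healthy one;
    #    real systems pick the replica with the most up-to-date WAL position).
    candidates = [r for r in nodes["replicas"] if is_healthy(r)]
    if not candidates:
        raise RuntimeError("no healthy replica available to promote")
    new_master = candidates[0]
    promote(new_master)
    # 3. Repair the failed master and re-add it as a replica.
    replicas = [r for r in nodes["replicas"] if r != new_master]
    replicas.append(rebuild(master))
    return {"master": new_master, "replicas": replicas}
```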
The final component (which is sometimes combined with the second) is a device to handle the routing of requests to the right node. If your application is pointed at the master for writing data (since writing to a replica is a ‘no-no’), but that master fails, how does your application know to point to the newly promoted master? Once again, there are various ways to solve this, but the most popular are proxies or load balancers; rather than point your application directly at the database server, you point it at the proxy/load balancer, which determines the right place to send your traffic.
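As a sketch of the proxy approach, here’s a minimal HAProxy configuration. It assumes a separate health-check endpoint (such as the REST API a failover manager like Patroni exposes on port 8008) that returns HTTP 200 only on the current master; the node names and addresses are hypothetical:

```
listen postgres
    bind *:5432
    # Ask each node's health-check API "are you the master?"
    option httpchk OPTIONS /master
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server db1 10.0.0.1:5432 check port 8008
    server db2 10.0.0.2:5432 check port 8008
```

Because only the node whose check passes receives traffic, a promotion automatically redirects clients to the new master without any application change.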

To tie it all together, the automated failover system and proxy/load balancer work together when a failover occurs. When a new master is promoted, the proxy/load balancer is informed to direct traffic to the new master. Nothing changes in your application, and besides a potential blip in responses during promotion, the application doesn’t even need to know a promotion has occurred.

This is a greatly simplified overview of the process, but it covers the fundamental components. Now let’s dive into the technologies that we used for each of those components in our solution.

The Technologies We Used
Now that we’ve reviewed the key components, let’s look at how we provide each of them.

Replication
This one was easy: Postgres supports a number of replication schemes natively, so no new tools were required. We support a configurable number of replicas; either one or two are supported today, but we’ll be expanding the options in the future.
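To give a sense of what native streaming replication involves, here’s a sketch of the relevant settings. The values and hostnames are illustrative only, not our actual configuration:

```
# postgresql.conf on the master (illustrative values):
wal_level = replica          # write enough WAL detail for replicas
max_wal_senders = 5          # allow replication connections
wal_keep_segments = 64       # retain WAL for replicas that fall behind

# recovery.conf on each replica (PostgreSQL 11 and earlier; in 12+ these
# move into postgresql.conf alongside a standby.signal file):
standby_mode = 'on'
primary_conninfo = 'host=master.example.com port=5432 user=replicator'
```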

One other finer point of replication is the distinction between synchronous and asynchronous replication. It can get pretty in-depth, but the key point is that with synchronous replication the master waits for each replica to confirm that a write has completed on the replica before the master considers the write complete. With asynchronous replication, the master sends writes to the replicas but doesn’t wait for them to complete before confirming the write to the application/client.

The solution that we use (and we’ll get to in the following paragraph) enables us to support both asynchronous and synchronous replication. By default we enable synchronous replication; there are even more settings down this rabbit hole, but we configure replication in our environment to confirm that a write has been written to the Write-Ahead Log (WAL) on the master and on at least one replica. You can also alter the settings per transaction or per session.
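For example, assuming standbys named replica1 and replica2 (hypothetical names), the wait-for-at-least-one-replica behavior and the per-session/per-transaction overrides described above look roughly like this in standard PostgreSQL:

```sql
-- In the master's postgresql.conf: acknowledge a commit only after the WAL
-- write reaches at least one of the named standbys (PostgreSQL 10+ syntax):
-- synchronous_standby_names = 'ANY 1 (replica1, replica2)'

-- Relax durability for one session: wait only for the local WAL flush.
SET synchronous_commit = local;

-- Or override it for a single transaction:
BEGIN;
SET LOCAL synchronous_commit = local;
-- ... lower-value writes ...
COMMIT;
```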

