System Resiliency Planning : Part 3 — Data Replication

Peter Aten
Jan 21, 2021

The risk of data loss, or the burden of data recovery, is perhaps the most complex risk associated with disaster recovery. This post explores concepts related to this topic.

Other posts in this series:
Part 1 — Setting Objectives
Part 2 — Options to Support High Availability
Part 4 — Example Scenarios and Summary

Data Replication

Protecting your data will include, at a minimum, replicating it. There are a few options for data replication, each with pros and cons.

Backup

This could also be characterized as “periodic data replication.” At regular intervals, a snapshot copy of the database is taken and stored in a different location.
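As a rough illustration of what "periodic" means for recovery, the sketch below computes how much recent data would have to be recovered (or would be lost) if you restored from the latest snapshot. The function names and the fixed-interval schedule are assumptions made for this example, not features of any particular backup product.

```python
from datetime import datetime, timedelta


def last_snapshot_before(failure: datetime, first_snapshot: datetime,
                         interval: timedelta) -> datetime:
    """Return the time of the most recent snapshot taken at or before `failure`.

    Assumes snapshots run on a fixed schedule starting at `first_snapshot`.
    """
    if failure < first_snapshot:
        raise ValueError("no snapshot exists before the failure")
    # timedelta // timedelta yields the number of whole intervals elapsed.
    completed_intervals = (failure - first_snapshot) // interval
    return first_snapshot + completed_intervals * interval


def data_at_risk_window(failure: datetime, first_snapshot: datetime,
                        interval: timedelta) -> timedelta:
    """Span of time whose writes are not covered by the latest snapshot."""
    return failure - last_snapshot_before(failure, first_snapshot, interval)
```

With snapshots every six hours starting at midnight, a failure at 17:30 leaves five and a half hours of writes outside the latest snapshot. In the worst case the window approaches the full backup interval.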

Replication

This could also be characterized as “continuous data replication,” supporting a zero or near-zero RPO. As each transaction is committed to the primary database, it is also committed to one or more secondary databases. These commits to secondary databases can be synchronous, i.e., the commit to the primary database is not considered complete until all secondary databases have also recorded the commit, providing “guaranteed consistency” and a zero RPO.

More commonly, particularly across regions, commits are asynchronous: the secondary databases are “eventually consistent” with the primary database, and the gap in consistency can range from milliseconds to minutes. The benefit is lower latency for the database commit, and the reason is physics. Waiting for data to reach multiple locations before acknowledging a commit simply takes longer than writing to one location now and the others later.
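The difference between the two modes can be sketched with a toy in-memory model. This is a simplified illustration only: the `Primary` and `Replica` classes are hypothetical, and real databases ship log records over a network rather than calling methods directly.

```python
from collections import deque


class Replica:
    """A secondary database, modeled as an ordered log of transactions."""

    def __init__(self):
        self.log = []

    def apply(self, txn):
        self.log.append(txn)


class Primary:
    """A primary database that replicates synchronously or asynchronously."""

    def __init__(self, replicas, synchronous):
        self.log = []
        self.replicas = replicas
        self.synchronous = synchronous
        self.pending = deque()  # committed locally but not yet replicated

    def commit(self, txn):
        self.log.append(txn)
        if self.synchronous:
            # The commit does not return until every replica has recorded it.
            for replica in self.replicas:
                replica.apply(txn)
        else:
            # The commit returns immediately; replication happens later.
            self.pending.append(txn)

    def ship_pending(self):
        """Deliver queued transactions to the replicas (asynchronous mode)."""
        while self.pending:
            txn = self.pending.popleft()
            for replica in self.replicas:
                replica.apply(txn)

    def replication_lag(self):
        """Number of committed transactions the replicas have not seen yet."""
        return len(self.pending)
```

In synchronous mode `replication_lag()` is always zero, at the cost of a slower `commit`. In asynchronous mode anything still in `pending` when the primary fails is exactly the data the secondary is missing.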

Data Replication Type Pros and Cons

Some database products have synchronous multi-AZ replication within a region. Some have asynchronous replication between AZs or regions. In rare cases, some products have synchronous multi-region replication. It’s always a good idea to confirm what the product you are using or considering supports. Notable examples at the time of writing (January 2021) include:

When replicating data, keep in mind any legal restrictions on the geographic location of your data.

Data Backup vs. Replication

Backups and replication are not mutually exclusive, so both methods can be employed if appropriate. However, are backups still relevant with data replication? The downside of replication is that changes to the primary database are also reflected in any secondary databases, even when they include transactions that damage the data. Therefore, if someone accidentally deletes some data in your production database, that data will be immediately deleted in all the copies of your data. The same is true of data corruption.

You should consider taking backups in addition to replication if you can imagine a scenario where restoring from a backup, with the associated need to recover (or permanently lose) the data written since that backup, is preferable to keeping the replicated data.
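A small simulation makes the point concrete. The dictionaries below are stand-ins for a primary database, a replica, and a snapshot backup; the keys and the `simulate_accidental_delete` function are invented for this example.

```python
import copy


def simulate_accidental_delete():
    """Show that a replica mirrors a bad delete while a snapshot does not."""
    primary = {"order-1": "paid", "order-2": "shipped"}
    replica = dict(primary)           # kept in step with the primary
    backup = copy.deepcopy(primary)   # snapshot taken before the mistake

    # An accidental delete on the primary is replicated like any other change.
    primary.pop("order-2")
    replica.pop("order-2")

    # A write that arrives after the snapshot exists only in the live copies.
    primary["order-3"] = "new"
    replica["order-3"] = "new"

    return primary, replica, backup
```

After the delete, only the backup still holds `order-2`, but it has never seen `order-3`. Restoring from it recovers the deleted record at the price of recovering or losing everything written since the snapshot.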

Data Loss vs. Data Unavailability

Imagine a system where data is replicated asynchronously to another cloud region. The primary region suffers a network failure, which both brings down your application and immediately stops data replication to the secondary region before it is completely in sync.

Have you lost data?

It depends. Technically, the un-replicated data is not actually lost but more accurately described as unavailable. Depending on your RTO, you may have some options to avoid having to recover lost data.

  • If the primary region network recovers before you fail over to the secondary region, problem solved! Your previously unavailable data is now available again, and a failover is not required.
  • If you fail over to the secondary region, can you do so as a read-only system for some period of time, extending the opportunity for the primary region to recover and again avoid data recovery?

As soon as the system fails over to the secondary region and begins processing transactions, you’ve transitioned from data unavailability to potential data loss.
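The distinction above can be expressed as a small decision function. This is only a sketch of the reasoning in this section; the `DataState` enum and the `classify` function are hypothetical names, and a real failover runbook would weigh many more inputs.

```python
from enum import Enum


class DataState(Enum):
    NOTHING_STRANDED = "nothing stranded"
    UNAVAILABLE = "unavailable"
    POTENTIAL_LOSS = "potential loss"


def classify(unreplicated_txns: int, secondary_accepting_writes: bool) -> DataState:
    """Classify the state of data stranded in a failed primary region.

    unreplicated_txns: transactions committed on the primary that never
        reached the secondary before replication stopped.
    secondary_accepting_writes: True once the secondary region has been
        promoted and is processing new transactions.
    """
    if unreplicated_txns == 0:
        # The secondary was fully in sync, so there is nothing to recover.
        return DataState.NOTHING_STRANDED
    if secondary_accepting_writes:
        # The two regions have diverged; the stranded data now has to be
        # reconciled with the new writes or given up.
        return DataState.POTENTIAL_LOSS
    # Either no failover yet or a read-only failover: the stranded data
    # becomes available again if the primary region recovers.
    return DataState.UNAVAILABLE
```

A read-only failover corresponds to calling `classify` with `secondary_accepting_writes=False`, which keeps the result at `UNAVAILABLE` for as long as you can afford to wait.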

Summary

Protecting your data can be a complex undertaking. I hope this post has given you a high-level understanding of the concepts and options that will help you determine the appropriate strategy for your system. In the next and final post in this series, I’ll explore a few hypothetical examples to illustrate how different strategies are appropriate for different situations.

Next post: Part 4 — Example Scenarios and Summary
