The risk of data loss, or the burden of data recovery, is perhaps the most complex risk associated with disaster recovery. This post explores concepts related to this topic.

Other posts in this series:
Part 1 — Setting Objectives
Part 2 — Options to Support High Availability
Part 4 — Example Scenarios and Summary

Data Replication

Protecting your data will include, at a minimum, replicating it. There are a few options for data replication, each with pros and cons.

This could also be characterized as “periodic data replication.” …


Building on prior posts in this series, let’s explore how to apply what we’ve discussed using some hypothetical examples.

Part 1 — Setting Objectives
Part 2 — Options to Support High Availability
Part 3 — Data Replication

Resiliency/DR Scenarios

This system has the following characteristics:

  • Slow rate of data change — there are few users and data is updated ad hoc.
  • Data is useful for a long time — these products have a long lifespan.
  • Product updates are usually made via spreadsheet upload, so there is a record of changes that can be reprocessed.
  • Downstream systems cache the product catalog in an…

Imagine that a system you’re responsible for has an outage. That’s bad enough, but now imagine that when it comes back online, a bunch of data is missing, lost forever. Feeling nauseous?

System failures are sudden, infrequent, and inevitable, and without some advance preparation they can be extremely painful. Some forethought can go a long way toward avoiding, or recovering gracefully from, a potential systems disaster.

It’s common for product owners and development teams to focus first on delivering functional value, and last (if ever, in many cases) on resiliency and disaster recovery planning. Those priorities are understandable, but…


High Availability (HA) is a desirable but sometimes poorly understood concept in system resiliency design. This post explores options to increase HA for your system, particularly via cloud computing infrastructure.

Other posts in this series:
Part 1 — Setting Objectives
Part 3 — Data Replication
Part 4 — Example Scenarios and Summary

High Availability (HA) vs. Disaster Recovery (DR)

Are these the same thing? If I have one, do I need the other? What separates them?

HA is a widely used, non-specific term to describe systems and/or system components that mitigate one or more failure modes which would otherwise cause a DR situation, generally by adding redundancy to…


Norms reduce ‘friction’ in business and social settings by allowing people to know what to expect from each other, without constantly needing to re-negotiate those expectations. They are the ‘rules of the road’, both written and unwritten. I propose that norms can be powerful tools to help a team achieve high performance, but you’re far more likely to realize this benefit if you approach the creation of norms as a deliberate act.

Norm: a standard or pattern, especially of social behavior, that is typical or expected of a group

Not all norms are well considered or even consciously adopted. For…

Peter Aten

Interested in making great software, and particularly in how to make teams more effective
