Backups: Securing your Cloud Resources against disaster

This post details the logical structure of a backup strategy and leaves the choice of specific tooling up to you; the examples focus on implementing that strategy within AWS.

Unfortunately, we can't just lock things up in vaults anymore - we have to be more thorough and diligent than ever.

In today’s world of high-availability clouds, there is still some paranoia around the complete loss of a cloud provider and the backups stored with it. In particular, many businesses worry that by putting all their eggs in one basket they could be forced out of business if, say, Amazon were to go out of business. While this is highly unlikely, it is technically possible. For organisations with a very low risk tolerance, there is a simple setup that lets you maintain a cloud-agnostic(ish) environment.

What are we preparing for?

First, we must define the problems we are going to solve. At a high level, we want our business to be as well protected against disaster as possible. This breaks down into multiple scenarios that need to be mitigated:

  • The deletion of data or resources from our tenancy.
  • The permanent loss of an Availability Zone.
  • The permanent loss of a Region.
  • The permanent loss of our provider-level backups (e.g. tenancy backups being deleted).
  • Our cloud provider going out of business or ceasing operations in our country (e.g. trade sanction).

These scenarios are ordered from most likely to least likely, and each is progressively harder to mitigate, although it is feasible to prevent loss in every case.

How will we prevent loss?

Preventing loss is pretty straightforward: in each scenario, we simply take backups of our resources. The real problem to solve is getting those backups into a place that is sufficiently separated from the risk scenario.

Deletion of data and loss of an AZ

Deletion of data or resources within AWS is very simple to guard against: implement AWS Backup to take Snapshots of your resources. Snapshots are backed by S3, so they are replicated across AZs and tolerant of AZ failures. AWS Backup provides a simple and cheap method for backing up your resources within a Region.
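As a minimal sketch, here is what a tag-based AWS Backup plan might look like with boto3. The plan name, schedule, retention, vault, and IAM role below are all assumptions to adapt to your environment.

```python
import boto3

backup = boto3.client("backup")

# Create a daily backup plan; schedule and retention are example values.
plan = backup.create_backup_plan(
    BackupPlan={
        "BackupPlanName": "daily-snapshots",        # hypothetical name
        "Rules": [
            {
                "RuleName": "daily",
                "TargetBackupVaultName": "Default",  # assumes the default vault exists
                "ScheduleExpression": "cron(0 2 * * ? *)",  # 02:00 UTC daily
                "Lifecycle": {"DeleteAfterDays": 35},
            }
        ],
    }
)

# Select resources by tag: anything tagged Backup=true is included in the plan.
backup.create_backup_selection(
    BackupPlanId=plan["BackupPlanId"],
    BackupSelection={
        "SelectionName": "tagged-resources",
        "IamRoleArn": "arn:aws:iam::123456789012:role/BackupRole",  # hypothetical role
        "ListOfTags": [
            {
                "ConditionType": "STRINGEQUALS",
                "ConditionKey": "Backup",
                "ConditionValue": "true",
            }
        ],
    },
)
```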

Loss of a Region

The loss of a Region should not impair the operability of your application. AWS provides many tools for replicating data across Regions; it is simply a case of implementing cross-Region backup on a service-by-service basis. We will cover two of the most important cases here.

EC2 and RDS

EC2 and RDS support copying Snapshots across Regions. To ensure that your data is available in another Region, simply make sure the Snapshots taken by AWS Backup are copied to your backup Region regularly. Copied Snapshots incur duplicate storage costs; however, Snapshots are de-duplicated within a Region, so each additional backup Region will cost at most as much again as your current Snapshot storage.
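As a rough sketch with boto3, a cross-Region copy is initiated from the destination Region. The Regions, snapshot IDs, and account details below are placeholders.

```python
import boto3

# Clients in the backup (destination) Region; source Region is assumed here.
ec2 = boto3.client("ec2", region_name="ap-southeast-2")
rds = boto3.client("rds", region_name="ap-southeast-2")

# EBS: copy a snapshot from the source Region into this Region.
ec2.copy_snapshot(
    SourceRegion="us-east-1",
    SourceSnapshotId="snap-0123456789abcdef0",   # placeholder snapshot ID
    Description="Cross-Region backup copy",
)

# RDS: likewise, referencing the source snapshot by ARN.
rds.copy_db_snapshot(
    SourceDBSnapshotIdentifier="arn:aws:rds:us-east-1:123456789012:snapshot:mydb-snap",
    TargetDBSnapshotIdentifier="mydb-snap-copy",
    SourceRegion="us-east-1",  # lets boto3 presign the cross-Region request
)
```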

S3

S3 has a super handy feature known as Cross-Region Replication (CRR), which keeps data between two Regions in sync. The great part about CRR is that you can take manual backups of any data that doesn’t natively support cross-Region backup; once those backups are stored in S3, CRR takes care of duplicating them into your backup Region. By also enabling versioning on these buckets, you ensure that when objects are “deleted” and that action is replicated, the previous versions remain available (for extra protection, implement MFA Delete). It’s also worth transitioning data to Glacier and taking advantage of Vault Lock, which lets you create Write-Once-Read-Many (WORM) archives that remain available according to the policies you set. Taken together, this means anything stored in S3 should not be deletable.
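For illustration, a minimal boto3 sketch of enabling versioning and CRR might look like the following, assuming both buckets and the replication IAM role already exist; all names and ARNs are placeholders.

```python
import boto3

s3 = boto3.client("s3")

# Versioning must be enabled on both source and destination buckets;
# shown here for the source only.
s3.put_bucket_versioning(
    Bucket="my-backup-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)

# Replicate every object in the bucket into the backup Region's bucket.
s3.put_bucket_replication(
    Bucket="my-backup-bucket",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/ReplicationRole",  # hypothetical role
        "Rules": [
            {
                "ID": "replicate-everything",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {},  # empty filter = replicate all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": "arn:aws:s3:::my-backup-bucket-dr"},
            }
        ],
    },
)
```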

Loss of Provider-based Backups

The loss of provider-based backups almost always comes down to malicious action. It is very unlikely that Amazon would delete an account without notice; it is far more likely that an unauthorised individual would gain access to an account and delete backup resources to impair or destroy your business. There is a simple way to ensure your resources are protected: set up a second account with global deny Service Control Policies that prevent deletion of backup objects. This account should have a secure root user whose password and MFA device are stored offline in an inaccessible location. Normal users can only add to the account and cannot destroy resources. All S3 buckets should have versioning enabled and MFA Delete implemented. These restrictions make the account an effective DR/”offsite” backup solution by logically decoupling your backup resources from your operations account.
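As a sketch, assuming you run AWS Organizations, a deny-deletion SCP could be created from the management account like this. The action list and names are assumptions to extend for your resource types (note that SCPs do not apply to the management account itself).

```python
import json
import boto3

# Example deny policy -- the action list here is an assumption; extend it to
# cover every deletion action relevant to your backup resources.
deny_deletes = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": [
                "backup:DeleteBackupVault",
                "backup:DeleteRecoveryPoint",
                "ec2:DeleteSnapshot",
                "rds:DeleteDBSnapshot",
                "s3:DeleteBucket",
            ],
            "Resource": "*",
        }
    ],
}

orgs = boto3.client("organizations")
policy = orgs.create_policy(
    Content=json.dumps(deny_deletes),
    Description="Prevent deletion of backup resources",
    Name="DenyBackupDeletion",            # hypothetical policy name
    Type="SERVICE_CONTROL_POLICY",
)
orgs.attach_policy(
    PolicyId=policy["Policy"]["PolicySummary"]["Id"],
    TargetId="210987654321",              # the backup account's ID (placeholder)
)
```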

Once this account has been set up, you simply need to do the following (a cross-account sharing sketch follows the list):

  • Copy EBS snapshots to the second account.
  • Copy RDS snapshots to the second account.
  • Set up CRR for any critical S3 backups to sync into the second account.
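As a sketch of the sharing step, the source account can grant the backup account access to its snapshots with boto3; the backup account should then copy the shared snapshots into storage it owns. The account and snapshot IDs are placeholders (encrypted EBS snapshots also require sharing the KMS key).

```python
import boto3

BACKUP_ACCOUNT = "210987654321"  # placeholder backup-account ID

# EBS: allow the backup account to create volumes from (and copy) the snapshot.
ec2 = boto3.client("ec2")
ec2.modify_snapshot_attribute(
    SnapshotId="snap-0123456789abcdef0",
    Attribute="createVolumePermission",
    OperationType="add",
    UserIds=[BACKUP_ACCOUNT],
)

# RDS: allow the backup account to restore (and copy) the DB snapshot.
rds = boto3.client("rds")
rds.modify_db_snapshot_attribute(
    DBSnapshotIdentifier="mydb-snap",
    AttributeName="restore",
    ValuesToAdd=[BACKUP_ACCOUNT],
)
```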

To ensure a very high level of redundancy, you can also implement the Region-syncing features from “Loss of a Region” in this account.

Cloud Provider Unavailability

While it is very unlikely for a Cloud Provider such as AWS or Azure to go out of business, it is not impossible. Therefore, we should secure ourselves against this possibility by setting up backups that are provider-agnostic. An additional benefit of these agnostic backups is that they give us a pattern for migrating to a new Cloud Provider, should we choose to in the future.

Unfortunately, using provider-agnostic backup methods does limit the solutions available to us. In my own experience, at the very least the following should be done (a minimal dump sketch follows the list):

  • All databases should be backed up using vendor-provided DBMS tools (e.g. mysqldump).
  • Configuration of servers should be fully documented.
  • IaC should be stored in code repositories.
  • Any other data considered important to your business (e.g. Git repositories) should also have backups taken.
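As a minimal sketch of the first item, a scheduled job could dump a MySQL database using the vendor’s mysqldump tool. The host, user, and database names are placeholders, and credentials are best supplied via an option file rather than on the command line.

```python
import subprocess
from datetime import datetime, timezone

# Timestamped output file so successive dumps don't overwrite each other.
stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
outfile = f"mydb-{stamp}.sql"

with open(outfile, "w") as f:
    subprocess.run(
        [
            "mysqldump",
            "--host=db.example.internal",   # placeholder host
            "--user=backup",                # placeholder user
            "--single-transaction",         # consistent dump without locking (InnoDB)
            "mydb",                         # placeholder database name
        ],
        stdout=f,
        check=True,  # raise if the dump fails rather than keeping a partial file
    )
```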

Once you’ve taken backups and documented all necessary components of your architecture, all you need to do is store this data somewhere that isn’t your current provider. This could be as simple as downloading all the data onto a very large HDD sitting next to your desk, or syncing the data into Azure Blob Storage (or a similar service from another Cloud Provider). Other solutions include tools that take backups of services and store them agnostically in a third-party location.
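For example, a small Python script using the azure-storage-blob package could push a dump file off-provider into Azure Blob Storage. The connection string, container, and file names below are placeholders.

```python
from azure.storage.blob import BlobServiceClient  # pip install azure-storage-blob

# Placeholder connection string -- store this in a secrets manager, not in code.
service = BlobServiceClient.from_connection_string("<your-connection-string>")
blob = service.get_blob_client(container="offsite-backups", blob="mydb-20240101.sql")

# Upload the local dump file, replacing any existing blob of the same name.
with open("mydb-20240101.sql", "rb") as data:
    blob.upload_blob(data, overwrite=True)
```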

Once you have completed these final steps, your environment should be fully protected against almost all disasters (except maybe a dinosaur-like extinction event).

Sleeping easy

Now that everything is backed up, you can sleep easy. We’ve taken a very simple approach to backups and ensured that our business is protected in almost any scenario.

Thanks for reading! If you’re interested in more articles on securing your business against threats, please see Establishing Trust: Why TLS should be important to you. If you’d like to offer some suggestions for future content, please contact me on Twitter @stophammotime.