Five Key Considerations About Backup Testing

The saying goes that you are only as good as your last backup, which makes regular backups a keystone of any business continuity and disaster recovery strategy.

Firms need to keep copies of their data to protect against hardware failures and system outages, as well as power or network disruption, flooding or fire.

Backups also protect the business against accidental deletion and against data corruption caused by application errors. And increasingly, off-site backups are a vital defence against malware, and especially ransomware.

System downtime is expensive. An oft-quoted figure from industry analyst firm Gartner puts the cost per minute of downtime at $5,600, or more than $300,000 an hour.
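The per-hour figure follows directly from the per-minute one. As a rough sketch of the arithmetic, using only the quoted $5,600 figure (the function name is illustrative, and real costs vary widely by business):

```python
# Cost per minute of downtime, as quoted above. Treat it as an
# illustrative industry average, not a measurement for any one firm.
COST_PER_MINUTE_USD = 5_600


def downtime_cost(minutes, cost_per_minute=COST_PER_MINUTE_USD):
    """Estimate the cost of an outage lasting `minutes` minutes."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * cost_per_minute


print(f"One hour of downtime: ${downtime_cost(60):,.0f}")
# prints "One hour of downtime: $336,000"
```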

With businesses relying increasingly on online business processes, the cost could be higher. And if businesses fail to back up their data effectively, there’s a risk that they might not be able to recover at all.

Testing, testing…

Backups, however, only protect the business if they work. This makes it vital for organizations to test backups, and do so regularly.

The growth in ransomware makes this even more important, as a clean, air-gapped backup might be the only way to restart the business after an attack. Yet analysts estimate that around one in three businesses fail to test their backups. Even those that do might not do so effectively.

This article sets out five priorities for IT teams, to ensure their backups really do protect the business.

The key aim of backup testing is to ensure the business can retrieve its data and continue operations. Backup policies should be seen in tandem with wider business continuity or disaster recovery plans, as well as the data protection strategy.

These policies should set out the recovery point objective (RPO) and the recovery time objective (RTO). The RPO sets out how old the most recent backup can be, or put another way, the amount of data loss the organization can tolerate and still operate. The RTO specifies how quickly systems must be recovered. Unless the business tests recovery, CIOs will not know if they can meet the RTO and RPO, or if recovery works at all.
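To make the two objectives concrete, the sketch below checks a single recovery test against them. It is a minimal illustration, not a feature of any backup product: the class and function names are hypothetical, and it assumes the data-loss window is simply the time between the last good backup and the incident.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryObjectives:
    rpo: timedelta  # maximum tolerable age of the most recent usable backup
    rto: timedelta  # maximum tolerable time to restore systems


def check_objectives(objectives, last_backup_at, incident_at, restore_duration):
    """Report whether one recovery test met the RPO and the RTO."""
    # Assumption: everything written after the last backup is lost.
    data_loss_window = incident_at - last_backup_at
    return {
        "rpo_met": data_loss_window <= objectives.rpo,
        "rto_met": restore_duration <= objectives.rto,
    }


# Hypothetical example: a four-hour RPO and a two-hour RTO.
objectives = RecoveryObjectives(rpo=timedelta(hours=4), rto=timedelta(hours=2))
result = check_objectives(
    objectives,
    last_backup_at=datetime(2024, 3, 15, 2, 0),
    incident_at=datetime(2024, 3, 15, 7, 30),
    restore_duration=timedelta(minutes=90),
)
print(result)
```

In this example the backup is five and a half hours old at the time of the incident, so the RPO is missed even though the 90-minute restore is within the RTO.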

1. What to test

Businesses should test that they can restore files, folders and volumes from backups, at the level of the storage volume, the individual user and the application.

For business applications, the recovery of databases is key. For unstructured and user data, the key need is to restore server or network-attached storage (NAS) volumes.

But simply restoring from a backup volume, tape or the cloud is not enough. Businesses need to plan for outages that damage the whole IT environment, not just some data.

In the case of electrical or hardware failure, they need to test that they can install and spin up new hardware, or fail over to a backup site and restore data to it.

The business should also test full and partial restores. Restoring a whole environment takes time and is disruptive. Smaller-scale tests for vital or vulnerable data, as well as tests to recover deleted or corrupted files, go hand-in-hand with large-scale disaster recovery (DR) drills.

Lastly, IT teams should test that they can recover data held off-site, in software-as-a-service (SaaS) applications and in the cloud.

2. How often to test

Backup testing should be regular and routine. In an ideal world, businesses would test every backup, but that is rarely practical.

Instead, IT teams should ensure a regular testing schedule based on the business’s risk appetite and the importance of its data.

Although some businesses test annually, a large annual DR exercise is not enough. Instead, data protection experts recommend tests on a monthly or weekly basis, and potentially more often for critical systems, applications and data.
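A risk-based schedule of this kind can be written down as a simple table of intervals. The sketch below is one possible way to flag overdue tests. The tier names and intervals are assumptions chosen for illustration and should come from the organization's own risk assessment.

```python
from datetime import date, timedelta

# Hypothetical tiers and intervals, not recommendations from the article.
TEST_INTERVALS = {
    "critical": timedelta(days=1),
    "important": timedelta(days=7),
    "standard": timedelta(days=30),
}


def overdue_tests(systems, today):
    """Return the names of systems whose restore test is overdue.

    `systems` maps a system name to a (tier, last_tested) pair, where
    last_tested is a date, or None if the backup has never been tested.
    """
    overdue = []
    for name, (tier, last_tested) in systems.items():
        if last_tested is None or today - last_tested > TEST_INTERVALS[tier]:
            overdue.append(name)
    return sorted(overdue)


# Hypothetical inventory of systems and their last restore tests.
systems = {
    "payments-db": ("critical", date(2024, 3, 13)),
    "file-share": ("important", date(2024, 3, 10)),
    "archive": ("standard", None),
}
print(overdue_tests(systems, today=date(2024, 3, 15)))
```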

Businesses should also test backups before and after a system change or upgrade; a number of high-profile outages in the banking sector, for example, have been caused by simple hardware upgrades. Organizations should also re-test their systems after an outage and test new systems before they move into production.

3. Can you restore the data?

The first question is whether your backups physically work. This might be obvious with tape and other removable media, but businesses should also test that their recovery software can successfully restore data from disks, offsite datacentres and the cloud. Moving large volumes of physical media, or restoring over a WAN or LAN, will always be a challenge under DR conditions.

Testing will show up any weak spots. Tests also confirm whether the business can meet its RTO and RPO, as well as any regulatory requirements.

“Organizations will have to set a tolerance for disruption for each ‘important business service’ and ensure they recover these. This includes the recovery of the systems that support the services themselves,” says Elliot Rose, head of cyber security at advisory firm PA Consulting.

4. Is recovery accurate and effective?

As well as testing that recovery is (physically) possible, IT teams need to check that the right data is recovered to the right systems.

Although backup systems can check logical recovery using tools such as checksum validation and validation via virtual machines, further checks will be needed to guard against data corruption.
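Checksum validation can be illustrated with a short sketch. It assumes a manifest of SHA-256 digests was recorded at backup time, which is an assumption for the example rather than a description of any particular backup tool, and the function names are hypothetical. A matching checksum shows only that the file was restored byte for byte, not that the data was valid when it was backed up.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large restores do not exhaust memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(manifest, restored_root):
    """Compare restored files against checksums recorded at backup time.

    `manifest` maps a relative file path to its expected SHA-256 hex
    digest. Returns a dict of problems; an empty dict means every file
    listed in the manifest was restored intact.
    """
    problems = {}
    root = Path(restored_root)
    for relative_path, expected in manifest.items():
        candidate = root / relative_path
        if not candidate.is_file():
            problems[relative_path] = "missing"
        elif sha256_of_file(candidate) != expected:
            problems[relative_path] = "checksum mismatch"
    return problems
```

An empty result is a necessary check rather than a sufficient one, which is why the business-level checks described next still matter.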

These checks will need business input, because subject matter specialists are best placed to spot incorrect or damaged datasets. The business should also feed back on whether systems were recovered in the best order, and whether RTOs, RPOs and regulatory objectives were met.

And, although it’s not directly part of a recovery test, the CIO should check that business continuity measures operate correctly during an outage, and that ransomware protections are maintained during data recovery from offsite media.

5. Are recoveries consistent?

CIOs should check backup and recovery processes work consistently and are tested consistently across the organization.

This means ensuring backups are consistent across departments and across applications, in keeping with the DR policy. They should be consistent over time, so the business knows each backup is robust. And, as far as possible, they should be consistent across in-house, external and cloud data stores because failure in any one could disrupt the business.

Lastly, the IT team should record the results of tests and act on, and share, the lessons learned. A recovery test or DR drill is disruptive and can be expensive, so make the investment work for the business.
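One lightweight way to record results is a structured log entry per test, which makes it easier to compare results over time and share them. The sketch below is illustrative only: the field names are assumptions, and most teams would adapt them to their own reporting or ticketing tools.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class RestoreTestRecord:
    system: str
    tested_on: str          # ISO 8601 date of the test
    restore_minutes: float  # how long the restore actually took
    rto_minutes: float      # the target the restore was measured against
    data_verified: bool     # did checks confirm the right data came back?
    lessons: List[str] = field(default_factory=list)

    @property
    def passed(self):
        """A test passes only if the data is verified and the RTO is met."""
        return self.data_verified and self.restore_minutes <= self.rto_minutes


def to_json_line(record):
    """Serialise one record as a single JSON line for an append-only log."""
    payload = asdict(record)
    payload["passed"] = record.passed
    return json.dumps(payload, sort_keys=True)


# Hypothetical record for a single restore test.
record = RestoreTestRecord(
    system="payroll-db",
    tested_on="2024-03-15",
    restore_minutes=95,
    rto_minutes=120,
    data_verified=True,
    lessons=["Restore the directory service before the database"],
)
print(to_json_line(record))
```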

Originally published on ComputerWeekly.com, by Stephen Pritchard, November 18, 2020.