Originally published on the Quest blog, August 27, 2018, by Bennet Klein

Many people think developing a disaster recovery plan is all about calamitous events such as fires and earthquakes, where physical property is damaged and people’s lives are at risk. From an IT and business perspective, however, a disaster is any unexpected event that disrupts business continuity through system downtime and lost productivity. By that definition, a disaster can be much more mundane, such as a power failure or human error.

Being able to predict potential problems before they happen is key to a strong disaster recovery plan. Implementing a predictive business continuity strategy enables businesses to not only avoid disasters but also prevent revenue loss and business downtime caused by everyday problems.

Here are a few of the most common business-threatening events:

1. Human error. Human errors often cause systems to become logically corrupt or unusable. An accident as simple as an employee tripping on a cord can bring down an entire storage system. Predicting the types of human errors that are most likely to occur and having protocols in place to resolve them quickly is key to avoiding lost productivity.

2. Malicious attack. While accidental human actions happen fairly often, intentional actions are also increasingly common. For example, disgruntled or former employees may attack and bring down IT systems. So can viruses. Cyberterrorism is an even more pressing concern. Nowadays most organizations are aware that disaster can stem from malicious acts and are implementing predictive business continuity strategies to guard against them. Up-to-date data protection, backup and recovery solutions can keep systems running as usual and limit the damage these attacks cause.

3. Data corruption. A data corruption outage occurs when a faulty hardware or software component causes corrupt data to be read from or written to the database. Data corruption takes many forms. It can be widespread or it can be localized, and the impact of the outage varies accordingly.

Corruption in a single database block might affect only a few users, but corruption in a large portion of a database would make it essentially unusable. Most IT professionals have seen some form of data corruption in their careers, although organizations understandably tend not to publicize these problems. Such data corruption can be caused by hardware failures or human error. Putting a solid data backup and recovery plan in place ahead of time lets an organization restore clean data quickly when corruption does occur.
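A backup only helps with corruption if the backup copy itself is intact, so one common safeguard is to compare checksums of the original and the copy. The snippet below is a minimal sketch of that idea in Python, not a description of any particular product. The function names `sha256_of` and `backup_matches` are hypothetical, and it assumes the data and its backup are ordinary files that can be read from disk.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(source: Path, backup: Path) -> bool:
    """Hypothetical helper: True when the backup is byte-for-byte identical to the source."""
    return sha256_of(source) == sha256_of(backup)
```

Running a check like this on a schedule is one way to notice a damaged copy before it is needed for a restore. Note that it only shows the two files are identical; if the source was already corrupt when the backup was taken, both will match, which is why keeping several backup generations still matters.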

4. Storage failure. A storage failure outage occurs when the storage holding some or all of the database contents becomes unavailable because it has shut down or is no longer accessible. Many companies have had complete storage failures — often caused unintentionally by pesky humans. For example, at one organization, someone stacked a set of disk drives against a wall, inadvertently turning off a switch and causing system failures — an issue that was difficult to track down.

Another company that relied heavily on its storage area network (SAN) made the seemingly simple choice to lay carpet in its data center to reduce noise. When an authorized employee walked in to check the SAN and touched its racking bay, the static electricity discharge shorted the controller unit and the entire SAN went down. Without knowing that the cause of the problem was the electrostatic charge built up by walking on the carpet, the company put in a new controller. After it was up and running, someone else touched the rack again, and the new controller was also fried.

Predictive business continuity solutions focus on infrastructure optimization and data protection. When a major component in the IT system fails, critical applications and data are not impacted, and users can stay productive while the IT team works to understand why the system failed and correct the issue.

5. Power or network failure. Power failures may seem mundane, yet they can have a crippling effect on business. In fact, they are just about the most common cause of system downtime. Network failures are also a very common cause of unplanned downtime. The loss of a single network switch could quickly turn into a major, time-consuming outage for an organization. Any business-centric disaster recovery plan should include a redundant local area network (LAN) infrastructure as well as steps for restoring the LAN.

6. Natural disaster. While hurricanes and earthquakes aren’t the most common threats to business continuity, they do happen. IT organizations still need to be proactive and build a sound recovery strategy to minimize their impact and maintain business continuity.