The Top 5 Disaster Recovery Planning Mistakes
Admittedly, planning for a fire, flood, power surge, cyberattack or other crisis isn’t an exciting or enjoyable process. But disasters occur regularly, and the resulting business interruptions can threaten an organization’s survival.
Organizations ranging from the US Small Business Administration (SBA) to the US Department of Homeland Security’s Ready.gov website and the Federal Emergency Management Agency (FEMA) maintain disaster planning resources and encourage businesses and individuals to properly plan for crises. The SBA even publishes its own 92-page disaster plan with additional guidance on its website.
It’s commonly reported that companies suffering disasters experience serious disruptions. These reports frequently include disturbing statistics regarding a business’s ability to re-open or survive. Ensure your organization is prepared to respond effectively to unforeseen events. Drafting and maintaining a proper disaster recovery plan is among the first fundamental steps. But avoid these top five disaster planning mistakes when developing and managing your organization’s business continuity and disaster recovery (BCDR) plans.
1. Having no disaster plan
A surprising number of businesses fail to create any semblance of a disaster plan. When a crisis occurs, whether due to internal or external factors, companies without a proper business continuity plan find themselves ill-prepared. Mission-critical applications, communications and production can all come to a halt. Worse, there may be no way to recover operations, even using an alternative site.
As Entrepreneur magazine notes, every small and medium business (SMB) needs a disaster recovery plan. The publication and other observers commonly list many of the same reasons small and medium-sized firms require such planning: to protect against natural disasters, to lessen the impact of cyberattacks, to safeguard data, because people make mistakes and because hardware, software and systems fail.
Waiting until a disaster occurs is too late to begin implementing workarounds. Although many business owners believe their ability to think on the fly and work rapidly within a fluid environment will carry them through a crisis, they’re often mistaken. Needed data could prove both inaccessible and unrecoverable. The hardware powering applications may have been destroyed. Confusion typically reigns: which individual is to fulfill what role, where current contact information for various stakeholders can be found (the office’s telephones and messaging systems may themselves be destroyed or inoperable) and what’s broken and what’s not.
Without advance planning, it’s unlikely that a secondary recovery site is online, that automatic failover systems are in place or that team members understand the specific roles and functions they are to fulfill. It’s also possible that many of these employees will prove unavailable because they’re overwhelmed by their own personal emergencies. Thus, contingency plans are also required.
The best defense against disasters is advance planning. Creating a comprehensive disaster plan and implementing the supporting systems required to safely back up data and systems and permit recovery using an alternative location must be completed before a crisis occurs. Just be sure, when creating a disaster plan, to include all required stakeholders and to carefully review and test the implementation. You don’t want to discover only when a disaster occurs that your plan is inadequate, as that’s the second biggest disaster planning mistake SMBs make.
2. Relying on an inadequate plan
Whereas some organizations maintain no disaster plan whatsoever, still more companies rely upon inadequate disaster planning to carry them through a crisis, should one occur. Overlooking mission-critical programs, failing to understand the need for an alternative recovery site and neglecting to monitor and safeguard backup operations all threaten a business’s ability to survive a disaster.
Organizations sometimes consider offsite backups and maybe a cloud application adequate protection against a catastrophe. Businesses shouldn’t wait for a crisis to discover those steps are insufficient, however. Whether the guidance comes from government agencies, the business press or disaster planning experts, many of the same recommendations appear on every DR planning checklist, yet the steps beyond basic backups are often skipped.
Effective disaster plans don’t omit important details. Instead, proper disaster planning requires exploring a number of subjects. Companies can review guidance provided by the government or private parties to help structure business continuity initiatives and ensure essential steps aren’t skipped.
Among the common elements appearing within a variety of disaster planning recommendations are the following:
- Perform a comprehensive business impact analysis
- Conduct interviews with frontline employees
- Identify mission-critical applications and systems
- Document the business’ recovery requirements
- Conduct gap analyses
- Draft recovery strategies
- Organize a recovery team
- Implement failover and recovery site initiatives
- Test and validate the DR plan, adjusting for errors
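One way to picture the "document recovery requirements" and "conduct gap analyses" steps from the list above is as a comparison between what the business says it needs and what the last test actually delivered. The following is only a minimal sketch under stated assumptions: the class names, field names and sample figures are hypothetical, and the terms RTO (recovery time objective) and RPO (recovery point objective) are standard industry shorthand that this article doesn’t otherwise use.

```python
from dataclasses import dataclass


@dataclass
class RecoveryTarget:
    """Documented requirement for one system (hypothetical structure)."""
    system: str
    rto_hours: float  # longest tolerable downtime
    rpo_hours: float  # longest tolerable window of lost data


@dataclass
class MeasuredCapability:
    """What the most recent test or backup schedule actually delivers."""
    system: str
    restore_hours: float          # how long the last test restore took
    backup_interval_hours: float  # time between successive backups


def find_gaps(targets, capabilities):
    """Return a list of plain-language gaps between needs and reality."""
    measured = {c.system: c for c in capabilities}
    gaps = []
    for t in targets:
        c = measured.get(t.system)
        if c is None:
            # A required system that nobody has ever test-restored.
            gaps.append(f"{t.system}: no measured recovery capability")
            continue
        if c.restore_hours > t.rto_hours:
            gaps.append(
                f"{t.system}: restore takes {c.restore_hours}h "
                f"but the RTO is {t.rto_hours}h"
            )
        if c.backup_interval_hours > t.rpo_hours:
            gaps.append(
                f"{t.system}: backups run every {c.backup_interval_hours}h "
                f"but the RPO is {t.rpo_hours}h"
            )
    return gaps


if __name__ == "__main__":
    # Illustrative figures only, not taken from any real environment.
    targets = [RecoveryTarget("billing", 4, 1), RecoveryTarget("crm", 8, 4)]
    capabilities = [MeasuredCapability("billing", 6, 24)]
    for gap in find_gaps(targets, capabilities):
        print(gap)
```

Each item the function returns is a candidate for a recovery strategy in the next step: a faster restore path, a more frequent backup, or a first test for a system that has never been recovered.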
Some organizations invest the time and effort needed to fulfill all those tasks but forget to maintain the corresponding plan. That’s the third biggest disaster planning mistake to avoid.
3. Neglecting to update the disaster plan
Even well-prepared disaster plans go bad with time as employees leave an organization, new applications are introduced, old platforms are replaced and sites change. In addition to developing and implementing disaster plans, companies must also continually revisit and update those plans.
BCDR emergency contact lists become outdated as employees come and go. Whenever a stakeholder leaves, institutional knowledge may exit with them. New employees taking their place may not realize the individuals they’re replacing carried specific disaster response responsibilities, and organizations often forget to reassign such duties. Consequently, when a crisis arises, the organization may find itself stumbling through efforts to recover operations because of the skill and function gaps that have opened.
Forgetting to include a new essential application within a business continuity management process is a common mistake many companies make, too. The oversight is particularly dangerous, as essential data and processes are often migrated to new systems. Failing to include the new program and corresponding data within the organization’s backup and recovery processes can result in lost data and significant delays in returning operations to normal.
The same is true for programs that are replaced. Sometimes old, outdated applications and their unchanging data continue to be included within automated offsite backups as part of a larger BCDR plan, while the new system that replaces the old program, along with its data, is never added to the company’s disaster plan and recovery workflows. As a result, critical data can be lost when a crisis occurs.
New sites are typically harder to overlook. Yet, on occasion, new locations may be abruptly brought online. Project managers don’t always immediately have an opportunity to include these new sites within the organization’s broader BCDR workflows, so vulnerabilities can arise.
Regularly reviewing and updating disaster plans is among the best methods of catching such oversights. Unless a disaster plan is periodically reviewed, BCDR plans and recovery workflows can become dangerously outdated.
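Part of a periodic review can be reduced to comparing lists: who the plan names versus who still works at the company, and which applications the backups cover versus which applications are actually in production. The sketch below assumes a hypothetical plan structure (a dictionary with "roles" and "backed_up_apps" keys) invented for illustration; a real plan would more likely live in a document or a BCM platform.

```python
def find_plan_drift(plan, current_staff, production_apps):
    """Compare a DR plan with today's staff roster and application inventory.

    `plan` is a hypothetical structure:
        {"roles": {role name: assigned person},
         "backed_up_apps": [application names]}
    """
    staff = set(current_staff)
    covered = set(plan["backed_up_apps"])
    live = set(production_apps)
    return {
        # Roles still assigned to people who have left the organization.
        "unfilled_roles": sorted(
            role for role, person in plan["roles"].items()
            if person not in staff
        ),
        # Production applications the backup routines never capture.
        "unprotected_apps": sorted(live - covered),
        # Retired applications that are still being backed up.
        "retired_apps_still_backed_up": sorted(covered - live),
    }


if __name__ == "__main__":
    # Illustrative names only.
    plan = {
        "roles": {"incident lead": "Dana", "vendor contact": "Lee"},
        "backed_up_apps": ["legacy-erp", "file-server"],
    }
    drift = find_plan_drift(
        plan,
        current_staff=["Dana", "Morgan"],
        production_apps=["new-erp", "file-server"],
    )
    for category, items in drift.items():
        print(category, items)
```

Any non-empty list in the result points at one of the failure modes described in this section: a departed employee whose duties were never reassigned, a new application left out of backups, or an old one still consuming backup capacity.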
4. Failing to test the disaster plan
One way to discover if a disaster plan doesn’t properly cover an essential workflow or stakeholders don’t understand their crisis responsibilities is to test the plan. Testing options can include simply reviewing the plan to spot obvious errors or a much more comprehensive approach, such as spinning up a secondary site to confirm operations can be restored at an alternative location, should the need arise. Other testing options include conducting a tabletop run-through and testing a variety of scenarios, such as how well automatic failover systems perform when a primary production site goes offline.
When conducting a purposeful run-through with all stakeholders, if participants are thorough and deliberate, the review should prove revealing and surface omissions or oversights. Going a step further, actually spinning up a secondary site to power live operations is an excellent method of testing a DR plan. Live testing may be conducted during off hours to lessen the impact of any issues, and the subsequent lessons can prove valuable.
Should an application required to recover operations prove unavailable, the plan can be tweaked. If the same discovery isn’t made until a crisis occurs, however, delays could prolong returning to normal operations and important information could be lost.
New software solutions that haven’t yet been integrated within the company’s DR plan should become apparent, as should any malfunctioning business continuity management (BCM) platform, an automated offsite backup that’s experiencing trouble and recovery solutions that can’t adequately handle load once pressed into production. Should a remote office or branch location have been forgotten in DR planning, testing the corresponding recovery plan should help catch such failures, too.
Regular BCDR testing helps ensure configuration errors, recovery role omissions, application oversights and site issues are surfaced. With notes and information gleaned from reviewing and testing a disaster plan, organizations and their stakeholders can make corrections and tweaks necessary to help an organization recover effectively, should a disaster arise.
But one thing stakeholders shouldn’t do is simply trust that backups are completing properly. The ways in which backups can fail are legendary, so be on guard against this common disaster planning mistake.
5. Overlooking failed backup routines
Organizations that depend upon backup routines to power disaster recovery, a dependency many share, sometimes discover a backup operation hasn’t completed properly. Whether backing up data locally, creating system images, capturing virtual machine snapshots or backing up data offsite, backup operations are far from foolproof.
In a particularly frustrating twist, backups can fail to meet a business’s recovery needs even when the backup application reports the operation completed properly. A backup may not create an image or capture servers, databases, applications and other data in a way that actually enables recovering operations. Problems can arise due to portions of the backup media becoming corrupt, a failure to create hardware-independent backups, a backup operation having trouble capturing open databases at the moment a backup runs and even improper formatting on destination media.
In some cases, organizations are aware backup operations are failing or completing with errors. Resolving backup issues and errors should always be a priority. Should those issues persist, required systems and data could prove unrecoverable in the event of a disaster. In other cases, backups might complete properly but fail to capture important applications or program data installed as part of a new initiative or as a new solution.
For assurance, backups should be monitored vigilantly and periodically tested to ensure they are completing properly, capture all required data and can be employed in a manner that meets the organization’s recovery requirements. Simply reviewing log files and confirming backup operations report completing properly is, unfortunately, insufficient. The proof is in whether the backup is recoverable, required systems and data can be restored to an alternate machine or site and the backup can actually recover operations within the required time frame.
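The difference between "the log says success" and "the backup is recoverable" can be made concrete with a small restore test. The following sketch is not how any particular backup product works; it assumes the backup is a plain zip archive of a directory, restores it to a scratch location, compares checksums file by file and times the restore against a limit. The function names and the `max_seconds` parameter are hypothetical. Comparing against the live source is also a simplification that only holds if the source hasn’t changed since the backup ran; comparing against checksums recorded at backup time would be more robust.

```python
import hashlib
import shutil
import tempfile
import time
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum a file in chunks so large files don't exhaust memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir, backup_archive, max_seconds):
    """Restore an archive to a scratch directory and report any problems.

    Returns an empty list when every source file was restored intact
    and the restore finished within `max_seconds`.
    """
    source_dir = Path(source_dir)
    problems = []
    with tempfile.TemporaryDirectory() as scratch:
        started = time.monotonic()
        shutil.unpack_archive(str(backup_archive), scratch)
        elapsed = time.monotonic() - started

        files = sorted(p for p in source_dir.rglob("*") if p.is_file())
        for original in files:
            relative = original.relative_to(source_dir)
            restored = Path(scratch) / relative
            if not restored.is_file():
                # Data the backup never captured, e.g. a newly added app.
                problems.append(f"missing: {relative}")
            elif sha256_of(restored) != sha256_of(original):
                problems.append(f"differs: {relative}")

    if elapsed > max_seconds:
        problems.append(
            f"restore took {elapsed:.1f}s, limit is {max_seconds}s"
        )
    return problems
```

A scheduled job could call `verify_restore` against the most recent archive and alert someone whenever the returned list is non-empty, rather than relying on the backup application’s own success message.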
Does disaster planning keep you up at night?
News articles routinely confirm disaster planning and recovery to be among the primary topics keeping technology professionals up at night. Disaster planning generates stress for good reason; there’s so much riding on the results when a catastrophe strikes.
Fortunately, you need not go it alone. Louisville Geek can assist businesses with developing, testing and updating business continuity plans and corresponding solutions. For more information, call Louisville Geek at 502-897-7577 or email [email protected].