Building a Disaster Recovery Program
By: Stieven Weidner, Senior Manager

Part two of a two-part series.

If you read my first article in this series, Building a Business Continuity Program, you know that a Disaster (IT) Recovery Program is part of a larger Business Continuity Program. Basically, a Disaster (IT) Recovery Program should be developed from the needs of the business, not developed in isolation using assumptions. However, as a Business Continuity and Disaster Recovery expert, I've come to realize that many companies do not have the time, the money, or the personnel to develop their Disaster (IT) Recovery Program properly.

A Business Continuity Program and a Disaster (IT) Recovery Program have a similar purpose: to protect and recover a business from events detrimental to its immediate and long-term sustainability. Developing the Disaster (IT) Recovery Program based on the outcomes of the Business Continuity Program will ensure you are protecting and recovering the proper IT resources in the appropriate timeframes. Developing the Disaster (IT) Recovery Program in a vacuum, however, risks overlooking business components that may be crucial during times of crisis.

A Disaster (IT) Recovery Program, like a Business Continuity Program, is a program and not a project, and has many parts, each important to the overall success of the program. While a company may initiate the program through a short-term project, the effort to maintain this program goes on indefinitely. Below is a diagram providing a somewhat simplistic view of Disaster (IT) Recovery, but it does outline the most critical activities and the sequence the program should follow.
1) Risk and BIA:

a. Risk (Evaluation and Control): Similar to understanding business risk for the company's Business Continuity Program, a Disaster (IT) Recovery Program begins by understanding and evaluating the IT risk. This is accomplished by reviewing the company's IT risk history as well as projecting what types of risk the company may face in the future. These risks are quantified by rating their potential occurrence (how likely they are to occur) and their potential impact on the IT infrastructure (how severe an impact they may have). Once these are known, you must prioritize the risks and develop suitable mitigation strategies (controls) for each of them. Of course, it may be unwise to spend money on risks that have a low probability of occurring or that would have a minimal impact if they did, so a "do nothing" strategy may be a reasonable decision. Your leadership will help justify which mitigation strategies to implement and which risks to accept. Keep in mind that insurance should play a part in your mitigation strategy as well.

b. Business Impact Analysis (BIA): As stated earlier, a comprehensive Disaster (IT) Recovery Program can only be developed if the business needs are understood. The company must identify what resources it must have and what functions it must perform to survive. The company should then pinpoint what resources and functions it can survive without, at least for a period of time. The BIA identifies and reviews the business workflows and functions, prioritizes them based on business need (including critical periods), and determines how quickly they must be recovered to minimize negative business impacts during an event. One of the most important outcomes of the BIA as it relates to the Disaster (IT) Recovery Program is the identification of the Recovery Time Objectives (RTO) and the Recovery Point Objectives (RPO) for business applications.
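As a rough illustration of the risk rating described in step 1a, the sketch below scores each risk as likelihood times impact and splits the list into risks to mitigate and risks that might be accepted. The 1-to-5 scales, the multiplication, the `accept_below` threshold, and the sample risks are all assumptions made for this example; the article does not prescribe a particular scoring method.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (negligible) to 5 (severe)

    @property
    def score(self) -> int:
        # Simple likelihood x impact rating (an assumed, common convention).
        return self.likelihood * self.impact


def prioritize(risks, accept_below=5):
    """Return (to_mitigate, to_accept), each sorted by descending score.

    Risks scoring below `accept_below` are candidates for a "do nothing"
    strategy; the threshold value is hypothetical.
    """
    ranked = sorted(risks, key=lambda r: r.score, reverse=True)
    to_mitigate = [r for r in ranked if r.score >= accept_below]
    to_accept = [r for r in ranked if r.score < accept_below]
    return to_mitigate, to_accept


# Hypothetical sample risks, for illustration only.
risks = [
    Risk("Data center power loss", likelihood=3, impact=5),
    Risk("Ransomware on file servers", likelihood=4, impact=4),
    Risk("Minor printer outage", likelihood=2, impact=1),
]

mitigate, accept = prioritize(risks)
for r in mitigate:
    print(f"mitigate: {r.name} (score {r.score})")
for r in accept:
    print(f"accept:   {r.name} (score {r.score})")
```

A ranking like this is only a starting point for the conversation with leadership, who still decide which mitigation strategies to fund and which risks to accept.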
Note: a BIA is generally completed every three to five years depending on the company's business climate.

2) Identify New Business Workflows/Functions: While the BIA identifies the company's current workflows and functions, as well as their criticality and IT requirements, the company must also identify newly implemented business workflows and functions as well as changes in criticality which may impact the current IT environment between BIA intervals. Any new workflow, function,
process, equipment, or application, or even a change in a current workflow's criticality (RTO/RPO), may change Business Continuity requirements and, hence, Disaster Recovery requirements. While the IT department may not always be aware of new business initiatives, in order for it to remain proactive and protect company interests, it is strongly recommended that the department participate in the early stages of business development activities. Business priorities will change over time, and participating in those business planning activities will ensure that long-term and short-term IT strategies continue to meet business needs and goals. The IT department needs to proactively seek out those changes and work with business leaders to understand their criticality and the possible IT impact. Only then can the IT department develop and modify its infrastructure and recovery strategies appropriately to provide the business with the proper disaster recovery support.

3) Develop IT Recovery Strategies: Understanding the current and new business workflows, functions, and criticality allows the Disaster Recovery planner to identify the associated hardware, software, interfaces, and networks needed to support them. The Disaster Recovery planner has the task of ensuring that the identified IT recovery infrastructure will meet business needs and the identified RTO and RPO, as well as seamlessly adapt to the current or modified production environment and data backup strategies.

4) Procure Critical IT Recovery Infrastructures and Recovery Location: Implementing the IT recovery strategy may have multiple components. As an example, a business may require a hot site for its critical applications, which has pre-built equipment available to turn on at a moment's notice. For less critical applications, a drop ship arrangement, which may take a week or more before equipment is delivered, might be a perfect solution.
Then again, the business may require a fully functional redundant site for when downtime of any kind is simply not an option. You can buy these services from a vendor, develop them in-house, or combine resources. It's a matter of finding the appropriate solution that's affordable and still matches the company's tolerance for downtime.

5) Identify Disaster Recovery Plan Requirements: Before Disaster (IT) Recovery plan development can begin, you must assemble the required
information, which is based on the IT production environment, as well as the IT recovery environment, and includes all IT stakeholder data, both internal and external to your business. This information provides the plan development process the foundation it needs to be successful. The data I initially focus on capturing is identified in the chart below.

Application:
  Application Name
  Related Business Process/Department
  Vendor Name
  Contract & Support Number
  Software Key (if applicable)
  Internal App Owner
  RTO, RPO & Recovery Priority

Backup/Restore:
  Restore Type: Tape/SAN/etc.
  Restore Agent Type (if applicable)
  Off-Site Vendor & Contact Info

Hardware:
  Device Type
  Device Name (i.e., server name)
  Vendor Name
  Contract & Support Number
  OS Version
  Internal Device Owner
  IP Addresses
  Server Type: physical/virtual
  Remote Secure Access (RSA)/Integrated Lights-Out (ILO) IP Addresses
  Device's Domain Name (DNS)
  Special Required Tools/Equipment/Information

Recovery:
  Location - Normal Hosting
  Location - Recovery Facility
  Vendor Name & Contact
  Vendor Declaration Info
  Internal Declaration Authorities
  DR Plan Names (Plan IDs)
  DR Plan Owners
  DR Plan Team Members
  DR Plan Off-site Location(s)
  Escalation Procedures
  List of Internal/External Stakeholders

While all the above information is very important, the three most critical items are the RTO, RPO, and recovery priority. These three give the Disaster Recovery planner the guidance to develop the recovery plan strategy in the proper order, providing the IT teams a logical sequence to follow and ensuring the Disaster (IT) Recovery Program meets the business requirements.

a. Create New Plans: This is the act of developing the disaster recovery plans your company requires to support an IT event. Similar to the
development of Business Continuity plans, Disaster (IT) Recovery plans can be as varied as the companies that develop them. My advice for any plan is to keep it simple and to the point. Avoid documenting information that is readily available or isn't necessary during an event. For example, a vendor's documentation may provide all the required recovery steps for an application. Feel free to reference this documentation within your own plans and use it instead of recreating it. Of course, you need to develop a way to keep the information contained in these plans safe, secure, and stored in a location that is both readily accessible and out of harm's way. Many companies have adopted the use of encrypted key fobs to store their recovery plans and other critical business information, but there are many other potential solutions. Web-based software can also be a solution as long as access to the web is available during an event.

b. Update Plans: The act of modifying plans with updated information ensures the plans remain viable. Typical updates might include changes to names and contact information, hardware configurations, or escalation procedures. These updates are generally identified through Periodic Reviews or as the result of exercises.

6) Plan Maintenance: Whether you are maintaining Business Continuity plans, Crisis Management plans, Emergency Response plans, or Disaster (IT) Recovery plans, they are all living documents and should follow the same four basic processes outlined below.

a. Exercise and Training: These two activities go hand in hand. Exercising provides training, and training makes exercises more efficient and effective. Plan exercises should, at a minimum, occur once per calendar year. However, many companies exercise much more often.
While tabletop exercises provide a low-cost method to walk through the plans and validate plan structure and function, actual recovery exercises provide real-life situational awareness and will not only validate structure and function, but will also exercise staff skill, recovery timeframes, system/software compatibility, and data backups. The company generally develops its exercise schedule and type based on its risk tolerance and/or regulatory requirements.

b. Events: Events provide a real-time validation of DR plans, albeit with greater impact to the organization.
c. Lessons Learned: If you exercise or actually experience an event and you haven't identified stronger recovery methods, you've missed a great opportunity. Documenting lessons learned provides the one chance to enhance your company's plans by identifying improvement opportunities and finding their solutions.

d. Update Plans: The act of updating plans with the solutions identified through lessons learned will enhance a plan's recoverability and provide better recoveries in the future.

7) Periodic Reviews: Reviews can be broken down into long-term reviews and short-term reviews.

a. Long-term reviews provide the process to validate the most current Risk Evaluation and Control and BIA documentation, ensuring the original Business Continuity strategies still meet business goals. If not, new business plans and, hence, new Disaster (IT) Recovery plans would be developed based on any updated strategies. The long-term reviews should be completed every three to five years on average.

b. Short-term reviews should be completed at least annually, but it is highly recommended to review your plans more often to ensure the information contained within them remains viable. Contact information becomes stale very quickly if not routinely reviewed. If the company exercises often, formal plan reviews may be required less frequently.

8) Communication Plan: Communication is probably the single most important aspect of any plan, and the communication processes and strategies should be well integrated into the program. Each Disaster (IT) Recovery plan requires a clear process for declaring a disaster, assembling the required IT recovery teams, and determining how, when, and to whom to escalate if required. The communication plan should also provide the procedure for leveraging the support vendors and reaching out to the company's own internal business departments.
In addition, a comprehensive communication plan should include the process for establishing an Emergency Operations Center (EOC). The EOC is a pre-established and pre-configured location where IT leadership and communication support functions, such as the help desk, can assemble and support IT recovery functions. The EOC is the focal point for all IT communications including status updates and issue escalation.
Today, businesses rely heavily on information technology. Even a short period of downtime can have a considerable impact on business functions. Protecting IT infrastructure from harm has become increasingly important for many businesses and has actually become comparatively less expensive and less complex. The use of virtual servers has substantially reduced recovery complexity and, if paired with a remote clustering strategy, can mitigate most of a company's IT risk. Generally, a shorter recovery time, as in the case of remote clustering, means a higher cost to implement and maintain. Keep in mind there are many factors to take into account before deciding on any one strategy. As mentioned above, implementing multiple recovery strategies that combine high availability for the critical applications and a less expensive tape recovery for the less critical applications may be sufficiently effective, at a more realistic cost.

The key to developing a successful Disaster (IT) Recovery Program is understanding that business requirements must drive it. Aligning the IT recovery strategy with business requirements ensures the most critical business functions will be available as quickly as possible, and that the impact to the business is minimized. The risk is yours! Plan accordingly.

About Navigate
Navigate is a management consulting firm focused on helping clients solve their business problems. We provide solutions in the areas of strategy, business operations, technology and risk. Effective project management, organizational change and leadership flow through everything we do. Objectivity, practical advice and regionally located consultants offer our clients a better consulting experience. We deliver outcomes.