The Disaster Recovery Self-Assessment Guide and Validation Model
Jim Kates
Cognizant Technology Solutions
Jim.Kates@cognizant.com
How Would You Evaluate Your DRP?
(Is it a Disaster Recovery Plan or a Dilbert Recovery Plan?)
Recovery Planning Is Essential
Disaster Recovery Planning & Testing Provide Stability
This session will introduce a keep-it-simple approach and a self-assessment process that will address:
- How do you assure yourself that your plan has an adequate recovery solution?
- Is a test enough to validate the plan?
- Are you comfortable certifying that your plan will work?
Self-Assessment Guide and Validation Model Overview
- Facilitates review of disaster recovery (DR) plans and processes in large, distributed, geographically dispersed organizations
- Designed to allow non-technical individuals to get a clear picture of DR status, gaps and overall system or service recoverability
- Self-serve approach allows the Business Process Owner to select appropriate risk areas to focus on in their DR planning activities
- Comprises a Self-Assessment questionnaire and a testing process to verify/certify recovery plan quality
The Disaster Recovery Validation Model
A Three-Tiered Approach to Self-Assessment
Backup Level of the Validation Model
- Data Complete and Inclusive: a process that ensures critical data files are identified, complete, inclusive and copied to some form of storage media that can be placed in a safe location.
- Recovery Point Objective (RPO): the targeted limit on how old data can be when it is restored and made available to users.
- Recovery Time Objective (RTO): how quickly the system or service must be restored or made available to users.
- Off-site Storage: a separate facility where backup media is stored.
- Resources Complete: having the appropriate recovery environment, staff, software, hardware, supplies, etc. to ensure that a system or service can be recovered.
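The RPO and RTO definitions above can be read as two simple time comparisons. The following is a minimal sketch, not part of the Validation Model itself; the function names `meets_rpo` and `meets_rto` and all timestamps and targets are hypothetical values chosen for illustration.

```python
from datetime import datetime, timedelta

def meets_rpo(last_backup: datetime, disruption: datetime, rpo: timedelta) -> bool:
    """True if the newest backup is no older than the RPO when the disruption occurs."""
    return disruption - last_backup <= rpo

def meets_rto(disruption: datetime, restored: datetime, rto: timedelta) -> bool:
    """True if service was restored within the RTO after the disruption."""
    return restored - disruption <= rto

# Hypothetical scenario: backup taken 10 hours before the disruption,
# service restored 6 hours after it.
disruption = datetime(2024, 1, 10, 9, 0)
last_backup = datetime(2024, 1, 9, 23, 0)
restored = datetime(2024, 1, 10, 15, 0)

rpo_ok = meets_rpo(last_backup, disruption, rpo=timedelta(hours=24))
rto_ok = meets_rto(disruption, restored, rto=timedelta(hours=4))
print(rpo_ok, rto_ok)  # prints: True False
```

In this made-up scenario the backup is fresh enough for a 24-hour RPO, but a 6-hour restore misses a 4-hour RTO, which is the kind of gap the assessment is meant to surface.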
Backup Is the Most Essential Component of a Recovery Capability
- Making sure that data, programs and critical documentation are available in the event of a disruption is the most basic requirement
- Recovery cannot happen in most cases without careful attention to Backup
- The level of attention to Backup will be driven by the Business Process Owner clearly defining RTO and RPO
- If Business Process Owners must re-input data into the system after recovering from a disaster, then written procedures to accomplish the task are a must
Restoration Level of the Validation Model
- Media (tapes, disks, documents): storage media that holds the data necessary to meet the Business Process Owner's recovery objectives (generally outlined by RTO & RPO)
- Single Point of Failure (SPOF): a resource or item whose failure or lack of availability interrupts a business process or causes it to fail entirely
- Skills: the personnel with the appropriate knowledge of the production environment processes and procedures
- Process: a detailed written explanation of how to perform a specific task related to recovery or operation of a system or service
- Software (SW): computer programs, whether package applications or custom programs, used as part of a business function to process information stored electronically
- Hardware (HW): computers and their related peripheral equipment that provide the physical processing, storage and transmission controls as directed by software to meet the needs of the business function
Restore Includes All of the Resources Needed to Provide Functionality
- Restoration requires that the broad array of resources needed to support a system or service be considered
- Consideration must be given not only to the availability of data, software and hardware, but also to the procedures and the individuals with the requisite skills to accomplish a recovery
- The keys to addressing and effectively maintaining the restore recovery capability are:
  - Regular maintenance of documentation,
  - Testing of restore processes, and
  - To the degree appropriate, testing of the end-to-end restore process
- Some simple systems may not require a complex environment, but there are still many issues to consider.
Recovery Level of the Validation Model
- Alternate Equipment: technology equipment comparable to or compatible with the system's production equipment and software, which can be used to recover a system or service
- Testing: exercising a recovery plan and related procedures to verify that the plan and defined resources (equipment, network, data, etc.) are adequate to recover a system or service within targeted timeframes (RTO & RPO)
- Alternate Location: a site with appropriate power, network, security and space to support the recovery environment for the system or service being recovered
Recovery: Where the Rubber Meets the Road!
- Recovery can refer to either the testing process or the actual response to a disaster
- Recovery means that backup data has been restored in a manner that closely simulates recovery from an actual business disruption
- It is where the viability of a recovery plan is clearly demonstrated through the success or failure to meet the objectives set by the Business Process Owner
The Self-Assessment Process
Self-Assessment Questionnaire
- A series of questions that will assist the Business Process Owner in determining how complete or adequate his/her recovery capabilities are
- The questions, in general, require a Yes or No answer; some are intended to generate an understanding or documentation of what is in place, and some include the option of indicating that there is an Informal Process in Place
- Upon completion, all No and Informal Process answers are entered in a Potential Exposure Document so that action plans can be put in place to mitigate the risk
- A Self-Assessment sign-off form is available to document the process and current state of the plan
- A Certification sign-off form is available for use once the plan has been successfully tested
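The step that carries No and Informal Process answers into the Potential Exposure Document can be sketched as a simple filter. This is an illustrative sketch only: the `Answer` class, the `potential_exposures` function and the sample questions are hypothetical and are not taken from the actual questionnaire.

```python
from dataclasses import dataclass

# The three answer options described for the questionnaire.
YES, NO, INFORMAL = "Yes", "No", "Informal Process"

@dataclass
class Answer:
    question: str
    response: str

def potential_exposures(answers):
    """Return the answers that belong in the Potential Exposure Document."""
    return [a for a in answers if a.response in (NO, INFORMAL)]

# Hypothetical sample questions, for illustration only.
answers = [
    Answer("Are critical data files identified and backed up?", YES),
    Answer("Is backup media stored off-site?", NO),
    Answer("Are restore procedures documented?", INFORMAL),
]

exposures = potential_exposures(answers)
for item in exposures:
    print(f"{item.response}: {item.question}")
```

Each entry in `exposures` would then need an owner and an action plan before the Self-Assessment sign-off.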
Recovery Testing - Essential to Viable Recovery Planning
- Testing is essential to verify the plan is complete
- 90%+ of recovery plans tested for the first time demonstrate that the technology preparations and documentation are incomplete or not viable*
- 68% of businesses have a recovery plan; of these, only 45% test their plans**
- Regular testing of plans is required to ensure changes to environment, software, files and network are addressed and recovery status is maintained
Sources: * Sungard, ** Disaster Recovery Journal
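Taken together, the two Disaster Recovery Journal figures imply the share of all businesses that have a plan and also test it. The short calculation below is a sketch that assumes the 45% applies only to the businesses that have a plan, as the wording "of these" suggests; the variable names are illustrative.

```python
have_plan = 0.68        # share of businesses with a recovery plan
test_given_plan = 0.45  # share of those businesses that test the plan

# Share of all businesses with a plan that is also tested.
tested_share = have_plan * test_given_plan
print(f"{tested_share:.1%}")  # prints: 30.6%
```

Under that reading, fewer than one business in three has a recovery plan that has been tested.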
Certification and Test Types
- Pass/Fail: testing of the recovery plan is conducted with the simple goal of determining whether the recovery objectives can be met
  - Attributes: minimal resource requirements, requires detailed procedures, may require multiple tests to achieve objectives
- Pass/Fail with Mitigations: testing of the recovery plan is conducted and objectives are met, but some intervention is provided where the plan is inaccurate or incomplete, e.g., missing backup files provided from the production environment, procedures updated to correct errors, etc.
  - Attributes: more resource intensive, impact on production resources, more efficient than Pass/Fail at meeting recovery objectives and verifying that the recovery plan is complete
- Interactive: testing of the recovery plan is conducted and the recovery team continues to work through issues until the system meets recovery objectives
  - Attributes: resource intensive, impact on production resources, efficient at meeting recovery objectives and verifying that the recovery plan is complete
Self-Assessment and Certification Is NOT a One-Time Event
- Plan maintenance: incorporated into the production change control process
- Ongoing testing: frequency of testing determined by system criticality
- Iterative nature of testing: testing objectives and scope are often expanded as planning/testing experience is gained over time
- Alternate site selection & review: testing helps validate and ensure that the alternate site is appropriate and viable for the recovery of a system or service
[Figure: Recovery Planning & Testing Cycle. Stage labels: Risk Assessment, Recovery Strategy, Plan Development, Plan Testing, Plan Certification, Plan Modifications, Plan Review & Maintenance, Technology Refinements]
Certification Process Simplifies DR Planning
Closing the Gap, Validating Recovery Capabilities
[Figure: Certification Process. Diagram labels: Self Certification; Certification Testing; Self-Assessment Questionnaire Survey; Pass/Fail or Pass/Fail with Mitigation; Interactive; Corrections Till Pass; Certified Disaster Recovery Plans; Your Company's DR Policies and Guidelines]
Validation Model & Certification - Benefits to the Organization
- Provides a process to ensure critical systems or services can be recovered within targeted timeframes
- Allows your company to provide products & services that meet customer expectations: products & services are delivered as promised
- Complies with regulatory requirements, e.g., NIST, Sarbanes-Oxley
- Growing regulatory focus on protecting information and availability
- Protects stockholders' assets and corporate earnings
- Failure to recover systems supporting earnings drivers can be devastating
- Facilitates improved operational risk management processes to address the escalating threat environment highlighted by recent terrorist activities
How would you have evaluated your DRP in the past? A SWAG!
- Do you see how, by using the building blocks of the Validation Model and structuring a set of questions that are applicable to your organization, you would be better prepared to assure yourself that your DR plans have an adequate recovery solution?
- Do you agree that testing alone may not be enough? Backup, restore and recovery processes and procedures need to be assessed as well.
- By following this self-assessment process, we hope you can agree that certifying the DR plans simplifies the process by closing the gaps and validating the recovery capabilities.
Questions?