Business Continuity and Disaster Recovery
Steve Earley, CISA, CISSP, CRISC, CFSA, ITILv3, MCP
Senior Manager, Internal Audit and Risk Advisory Services
Schneider Downs & Co., Inc.
Columbus, Ohio
15-272
ii Adding and Conserving Value for Your Clients
Business Continuity and Disaster Recovery
Ohio State Bar Association
November 30, 2015

Steve Earley, M.S., CISA, CISSP, CRISC, CFSA, ITILv3, MCP
Senior Manager, IT Audit, Internal Audit and Risk Advisory Services
20+ years IT experience, 11 years audit/risk/security
Oversee all IT audit and advisory services: PCI, SOC 1 & 2, regulatory compliance (e.g., SOX, HIPAA), business continuity planning, and IT security assessments.
Technical leadership experience across multiple business sectors, including public accounting, defense, healthcare, financial services, government, and high technology.
Previously served as the head of IT operations for a secure web collaboration service with Adobe, serving over 1 million DoD users globally.
Steve was also previously the Chief Information Security Officer for a large state government agency, Information Security Officer for two major credit card-issuing banks, and founder of his own risk and security consulting business.
Retired U.S. Navy Commander; specialized in information assurance and cyberintelligence
BSBA Accounting, THE Ohio State University, Columbus, Ohio
MS IT Management, THE Naval Postgraduate School, Monterey, California
Wrote the first BCP for Stryker Medical (Kalamazoo, MI) in 2001
About Schneider Downs
As one of the largest certified public accounting and business advisory firms in the region, Schneider Downs serves clients throughout the country and around the world.
By integrating high-quality resources, systems and personnel, Schneider Downs has built a reputation of delivering individualized services built on insight, innovation, and experience to meet each client's specific needs.
For more information, visit us at www.schneiderdowns.com
We Are Committed to Your Success

Agenda
Business Continuity vs. Disaster Recovery
Why is BC/DR Important?
Definitions / Key Terms
Business Continuity Process
  o Business Impact Analysis (BIA)
  o Recovery Strategies
  o Plan Development
  o Testing/Exercises
Outsourcing Options for DR
Vendor Management / SOC Report Considerations
Record Retention/Destruction
Q&A
Business Continuity vs. Disaster Recovery
Business Continuity: The strategic and tactical capability of the organization to plan for and respond to incidents and business disruptions in order to continue business operations at an acceptable predefined level.
Disaster Recovery: The process, policies and procedures related to preparing for recovery or continuation of technology infrastructure, systems and applications which are vital to an organization after a disaster or outage.
Disaster Recovery focuses on the information or technology systems that support business functions, as opposed to Business Continuity, which involves planning for keeping all aspects of a business functioning in the midst of disruptive events. Disaster Recovery is a subset of Business Continuity.
Source: Disaster Recovery Journal

Why is BC/DR Important?
Disasters happen. They can be small and straightforward to deal with, or if you're unlucky, you may be faced with a full-blown catastrophe.
The key to recovery is to plan for the disaster and minimize your downtime and data loss.
Thinking through scenarios before a disaster makes it far easier to recover from a disaster!
BC/DR Fun Facts
Gartner estimates that only 35% of SMBs have a comprehensive disaster recovery plan in place.
There is less than a 10% survival rate for organizations without a plan. According to research by the University of Texas, only 6% of companies suffering from a catastrophic loss survive, while 43% never reopen and 51% close within two years.
International Data Corporation estimates that companies lose an average of $84,000 for every hour of downtime.
According to Strategic Research, the cost of downtime is estimated at close to $90,000 per hour.
Sources: https://www.corpmagazine.com/executives-entrepreneurs/expert-advice/we-dont-need-no-stinkingbusiness-continuity-plan/; https://iosafe.com/industry-stats

Disaster Declarations
FEMA Declarations (76) since January 1, 2015:
  o 40 Major Disasters (none in Ohio)
  o 2 additional Emergency Declarations
  o 34 Fire Management Assistance Declarations
Ohio has had 11 Major Disasters and/or Emergency Declarations in the past 10 years.
Source: FEMA
"It's the Big One, Elizabeth! I'm coming to join you, Honey!" - Fred Sanford
The common perception is that a disaster will be a major catastrophic event such as:
  o Natural Disaster (Hurricane, Tornado, Earthquake, Flooding, Blizzard)
  o Fire
  o Lightning Strikes
  o Terrorist Attack
  o Plane Crash
  o Train Wreck
  o Civil Unrest

More Realistic Scenarios
Most common disasters are much smaller events:
  o Internal plumbing leaks; accidental discharge of sprinkler system
  o Underground power and communications cables being damaged, resulting in a loss of service
  o Inability to quickly restore utilities following storm damage (e.g., holiday snow/ice storm 2004; Hurricane Ike 2008)
  o Small-scale cyberattacks (e.g., workstation viruses, Denial of Service attacks, network intrusion resulting in systems being shut down)
  o Human error (e.g., unplugging the server from the wall)
Such mundane events still have the potential to disrupt critical business operations and bring them to a grinding halt.
Failure to Plan is Planning to Fail
  o Loss of Business/Customers
  o Loss of Credibility/Goodwill
  o Cash Flow Problems
  o Degradation of Service to Customers
  o Inability to Make Payroll
  o Loss of Production Capabilities
  o Loss of Operational Data
  o Financial Loss (revenue, fines/penalties)
  o Loss of Financial Control
  o Loss of Customer Account Management

Key Terms - Major Phases
Disaster: Situation where widespread human, material, economic or environmental losses have occurred which exceed the ability of the affected organization, community or society to respond and recover using its own resources. (Source: ISO)
Outage: Interruption of automated processing systems, infrastructure, support services, or essential business operations, which may result in the inability to provide services for some period of time.
Relocation: Movement of people, processes, and technology to alternate site(s) following (or prior to!) a disaster.
Recovery: Implementing the prioritized actions required to return the processes and support functions to operational stability following an interruption or disaster.
Restoration/Resumption/Reconstitution: Processes/procedures for repair of hardware, relocation of the primary site and its contents, and returning to normal operations at the permanent operational location.
Key Terms - General
Business Impact Analysis (BIA): Process designed to assess the potential quantitative (financial) and qualitative (nonfinancial) impacts that might result if an organization were to experience a business disruption.
Business Continuity Plan (BCP): Documented procedures that guide organizations to respond, recover, resume and restore to a pre-defined level of operation following disruption.
Continuity Of Operations Plan (COOP): Another term for Business Continuity. Often used in the public sector (particularly U.S. Government).
Call Tree: Listing/chart showing calling responsibilities and the order used to contact management, employees, customers, vendors, and key contacts in the event of an emergency, disaster, or severe outage situation.

Key Terms - General (cont.)
Critical (Customer, Data, Infrastructure, Supplier): Entity/element that would have a key business function impacted by disruption/outage.
Service Level Agreement (SLA): Formal agreement between a service provider and client (whether internal or external), which covers the nature, quality, availability, scope and response of the service provider. The SLA should cover day-to-day situations and disaster situations, as the need for the service may vary in a disaster.
Vital Records: Records essential to the continued functioning or reconstitution of an organization during and after an emergency, and also those records essential to protecting the legal and financial rights of that organization and of the individuals directly affected by its activities.
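The call tree defined under Key Terms - General is, in effect, a small tree data structure: each person is responsible for calling the people listed beneath them. A minimal Python sketch follows, assuming a hypothetical three-level tree. The role names and the `call_order` helper are illustrative, not taken from any real plan.

```python
from collections import deque

# Hypothetical call tree: each key calls the people in its list.
CALL_TREE = {
    "DR Coordinator": ["Team Lead A", "Team Lead B"],
    "Team Lead A": ["Member 1", "Member 2"],
    "Team Lead B": ["Member 3"],
}

def call_order(tree, root):
    """Return (caller, callee) pairs level by level, starting from root."""
    calls = []
    queue = deque([root])
    seen = {root}
    while queue:
        caller = queue.popleft()
        for callee in tree.get(caller, []):
            if callee in seen:  # never call the same person twice
                continue
            seen.add(callee)
            calls.append((caller, callee))
            queue.append(callee)
    return calls

for caller, callee in call_order(CALL_TREE, "DR Coordinator"):
    print(f"{caller} calls {callee}")
```

Walking the tree level by level means the coordinator's calls are listed first and each team lead's calls follow, which mirrors how a call tree spreads the notification workload.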
Key Terms - Facilities and Power
Backup Generator: Independent source of power (usually diesel or natural gas).
Uninterruptible Power Supply (UPS): A backup electrical power supply that provides continuous power to critical equipment in the event that commercial power is lost. The UPS (usually a bank of batteries) offers short-term protection against power surges and outages. The UPS usually only allows enough time for vital systems to be correctly powered down.

Key Terms - Facilities and Power (cont.)
Alternate Site:
  o Site ready for use following declaration of a disaster or business interruption
  o Used to continue urgent and important activities of an organization
  o May include IT equipment, desks, phones, paper files/forms
Hot Site: Facility equipped with full technical needs (IT, telecom, infrastructure); rapid resumption of operations
Warm Site: Partially equipped; may need to bring servers, load some software, etc. to fully resume operations
Cold Site: Physical space only; typically includes power, phone line(s), internet connection(s)
Key Terms - Technology
Backup: Data (electronic or paper) and programs are copied in some form to be available if the original data is lost, destroyed or corrupted.
Continuous/High Availability: System or application that supports operations which continue with little to no noticeable impact to the user.
Data Mirroring: The act of copying data from one location to a storage device at another location in (or near) real time.
Database Replication: The partial or full duplication of data from a source database to one or more destination databases.

Key Terms - RPO and RTO
Recovery Point Objective (RPO): The point in time to which data is restored and/or systems are recovered after an outage.
  o How much data am I willing to lose?
Recovery Time Objective (RTO): The period of time within which systems, applications, or functions must be recovered after an outage. Typically measured in business days/hours or elapsed time on a 24-hour clock.
  o How long am I willing to have my systems down?
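The two RPO/RTO questions can be turned into simple checks. The sketch below assumes that the worst-case data loss is about one full backup interval and that recovery time comes from a measured restore test. The function names and the figures in the usage lines are hypothetical.

```python
from datetime import timedelta

def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Worst case, everything since the last backup is lost,
    so the backup interval must not exceed the RPO."""
    return backup_interval <= rpo

def meets_rto(measured_recovery: timedelta, rto: timedelta) -> bool:
    """A recovery time measured in a test must fit within the RTO."""
    return measured_recovery <= rto

# Hypothetical system: nightly backups, 4-hour RPO, 8-hour RTO.
nightly = timedelta(hours=24)
print(meets_rpo(nightly, rpo=timedelta(hours=4)))
print(meets_rto(timedelta(hours=6), rto=timedelta(hours=8)))
```

In this hypothetical case the nightly backup fails the 4-hour RPO even though the 6-hour restore fits the RTO, so the two objectives have to be checked separately.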
Business Continuity Process
Source: http://www.ready.gov/business/implementation/continuity

Business Impact Analysis (BIA)
Purpose: Meet with business leaders to understand what could go wrong during potential disaster scenarios. Related to (and often conducted concurrently with) an annual risk assessment process.
Important Terms:
Impact: The effect, acceptable or unacceptable, of an event on an organization. The types of business impact are usually described as financial and non-financial and are further divided into specific types of impact.
Risk: Potential for exposure to loss which can be determined by using either qualitative or quantitative measures.
Loss: Unrecoverable resources that are redirected or removed as a result of a business continuity event. Such losses may be loss of life, revenue, market share, competitive stature, public image, facilities, or operational capability.
Business Impact Analysis (cont.)
Recoverable Loss: Financial losses due to an event that may be reclaimed in the future, e.g., through insurance or litigation.
Risk Transfer: Common technique used by risk managers to address or mitigate potential exposures of the organization; covers the various means of addressing risk through insurance and similar products.
Mission-Critical Activities/Applications: The critical operational and/or business support activities (either provided internally or outsourced) required by the organization to achieve its objectives (i.e., services and/or products, revenue).
Annual Loss Exposure/Expectancy (ALE): A risk management method of calculating loss based on a value and level of frequency.

BIA - Key Items to Cover
What are the significant business processes?
What are the requirements necessary to continue these business processes?
What are the critical roles within each process?
What data is involved and how is it used?
Review the business processes from the view of your external stakeholders.
Review the processes from the view of your internal stakeholders, focusing on resources required to deliver processes expected by external stakeholders.
Identify single points of failure (SPOF).
Identify availability requirements for the processes and systems that support the processes (i.e., RPO, RTO).
Is everything protected that needs to be protected?
Identify process inter-dependencies.
Prioritize processes and recovery efforts.
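The ALE definition above ("a value and level of frequency") is commonly computed as single loss expectancy multiplied by annualized rate of occurrence. Those two term names come from general risk-management practice rather than from these slides, and the dollar figures below are hypothetical. A minimal sketch:

```python
def annual_loss_expectancy(single_loss_expectancy: float,
                           annual_rate_of_occurrence: float) -> float:
    """ALE = value lost per event x expected number of events per year."""
    return single_loss_expectancy * annual_rate_of_occurrence

# Hypothetical: a server-room flood costing $50,000, expected once every 5 years.
ale = annual_loss_expectancy(50_000, 1 / 5)
print(f"Annual loss expectancy: ${ale:,.0f}")
```

An ALE figure of this kind gives the BIA a rough yardstick for how much it is reasonable to spend each year on preventing or recovering from that particular event.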
Develop Recovery Strategies
Develop recovery goals for each system or system component defined in the BIA.
Balance the cost and benefits of possible approaches.
Select the type of solution(s) and determine the scale of cost associated with the recovery.
Identify the types of disasters you need to prepare for and classify them by their impact to your business.
Need to take into account the particular characteristics of the infrastructure, human and data aspects of recovery.

An Ounce of Prevention is Worth a Pound of Cure
Identify any Preventative Measures
In most cases, it is cheaper (way cheaper!) to prevent the problem than to recover from it.
  o Review single points of failure (SPOF) and eliminate wherever possible
  o Ensure regular maintenance is performed: servers, generators, fire suppression
  o Install electronic sensors that monitor environmental factors
  o Install performance monitors that will provide early notice of server failures
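One way to "balance the cost and benefits of possible approaches" is to compare each option's yearly cost plus the downtime loss it would be expected to leave you with. The sketch below is only an illustration: the `Strategy` type, the site costs, recovery times, outage frequency, and hourly loss are all hypothetical numbers, not figures from this presentation.

```python
from dataclasses import dataclass

@dataclass
class Strategy:
    name: str
    annual_cost: float      # yearly cost to keep the solution ready
    recovery_hours: float   # expected downtime per outage with this solution

def total_expected_cost(s: Strategy, outages_per_year: float,
                        loss_per_hour: float) -> float:
    """Yearly solution cost plus expected yearly downtime loss."""
    return s.annual_cost + outages_per_year * s.recovery_hours * loss_per_hour

def cheapest(strategies, outages_per_year, loss_per_hour):
    """Pick the strategy with the lowest total expected yearly cost."""
    return min(strategies,
               key=lambda s: total_expected_cost(s, outages_per_year, loss_per_hour))

# Hypothetical options and assumptions.
OPTIONS = [
    Strategy("hot site", annual_cost=120_000, recovery_hours=2),
    Strategy("warm site", annual_cost=40_000, recovery_hours=24),
    Strategy("cold site", annual_cost=10_000, recovery_hours=120),
]
best = cheapest(OPTIONS, outages_per_year=0.5, loss_per_hour=5_000)
print(best.name)
```

The result is very sensitive to the assumed hourly loss: a business that loses far more per hour of downtime would find the hot site cheapest under the same model, so the inputs should come from the BIA.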
Develop a Plan
Project Initiation
Executive Sponsor (critical!!!)
Create BC/DR Team
  o Disaster Recovery Coordinator ("Master of Disaster")
  o Team Leads and Members
Define Roles and Responsibilities
Create a BC/DR Policy(ies)
Scope: What does it cover and when does it start/stop?
Resources: Define maximum level of resources available (financial, space, equipment, staffing)
Define Success: Needs to be measurable

Develop a Plan (cont.)
Develop the Plan
Document the plan and implement the infrastructure required to enable the plan.
In addition to the specific recovery procedures, the document should include all of the background information, assumptions and constraints that went into making the plan.
What Should be in my Plan?
Introduction
Document the goals and scope of the plan along with any requirements that must be taken into account whenever the plan is updated.
Good place to track all of the changes that have been made to the plan (i.e., change log with dates). Even better to keep in a DMS with version control!
Information for this section should come from the project initiation phase.

What Should be in my Plan? (cont.)
Operational Overview
The purpose of this section is to provide a concise picture of the overall approach.
Identify the systems being protected and the recovery strategy employed.
Describe the recovery team and their roles.
The information for this section will come from the business impact analysis and project initiation phases.
What Should be in my Plan? (cont.)
Notification/Activation
This section includes activities to notify recovery personnel, assess system damage, and implement the plan.
It is easy to forget that declaring the emergency and deciding that it is time to initiate operations under the BC/DR plan can be difficult and requires advance planning as well.
Without clear guidelines, there is a natural tendency to delay action until it is certain a disaster is imminent. This is typically too late.

What Should be in my Plan? (cont.)
The Notification/Activation section must answer the following questions:
  o What basis will be used to activate the plan?
  o What information is required to make such a decision?
  o Are there additional guidelines that need to be considered?
  o Who is responsible for performing the damage assessment to provide the information needed above?
  o Are there any restrictions that could prevent us from making the decision?
  o Who is responsible for the go / no-go decision? What are the rules of succession?
  o How will teams be notified?
  o How will recovery teams communicate? Think in terms of backup communications, physical meeting sites, etc.
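The "rules of succession" question can be made concrete by writing the succession order down as an ordered list and taking the first person who can actually be reached. The sketch below assumes hypothetical role names and a hypothetical `go_no_go_authority` helper; a real plan would name individuals and contact methods.

```python
def go_no_go_authority(succession, reachable):
    """Return the first person in succession order who is reachable, else None."""
    for person in succession:
        if person in reachable:
            return person
    return None

# Hypothetical roles, listed in order of succession.
SUCCESSION = ["Executive Sponsor", "Disaster Recovery Coordinator", "Team Lead"]

# Hypothetical situation: the sponsor cannot be reached.
print(go_no_go_authority(SUCCESSION, {"Team Lead", "Disaster Recovery Coordinator"}))
```

Returning `None` when nobody is reachable makes the gap explicit, which is the case a plan should resolve in advance rather than during the event.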
What Should be in my Plan? (cont.)
Recovery Phase
This section documents in detail the solutions to be used to recover each system and the procedures required to carry out the recovery and restore operations.
This section should contain the detailed instructions and checklists necessary to recover your systems.
The documentation should be detailed enough that the least knowledgeable person on the team could execute the instructions.
I would suggest that you actually perform a system recovery and document each step in the process. You would be surprised how much you miss the first time.

What Should be in my Plan? (cont.)
Restoration Phase
Recovery activities are terminated and normal operations are resumed.
  o What are the conditions for returning to normal operations?
  o What if the "new normal" is different than the "old normal"? (think 9/11)
  o What are the guidelines that are used to terminate the recovery efforts and return to normal operations?
  o How is information communicated?
What Should be in my Plan? (cont.)
Appendices
  o Team Contact Information
  o Critical Vendor Contact Information, SLAs, reciprocal agreements, contracts, etc.
  o Emergency Contacts (Police, Ambulance, Fire, etc.) for primary and alternate sites
  o SOPs and checklists for system recovery
  o Lists of equipment, system requirements for hardware, software, firmware, etc.
  o Description and directions to alternate site(s)
  o System documentation
  o Copies of Software Licenses (needed to reinstall applications)

Testing and Exercises
Develop schedule for regular training, testing, and exercises
  o Hold frequent training with DR Team members; maybe focus on one phase at a time
  o Conduct orientation exercises (also called table top exercises)
  o Conduct testing of portions of the plan regularly (e.g., system restorations)
  o Hold a larger-scale BC/DR exercise every 1-2 years if possible.
Update the plan based on lessons learned!
Business Continuity Exercises
People-focused activities designed to execute business continuity plans and evaluate the individual and/or organization performance against approved standards or objectives.
Exercises can be announced or unannounced.
Purpose:
  o Training team members
  o Validating the business continuity plan
Exercise results identify plan gaps and limitations and are used to improve and revise the Business Continuity Plans.
Types of Exercises: Desktop Exercise, Table Top Exercise, Simulation Exercise, Operational Exercise, Mock Disaster, Full Rehearsal.

BC/DR Planning - Word of Caution
No matter how good your planning and testing are, depending on a single solution, especially a complex one, means that your recovery is an all-or-nothing proposition. This is extremely risky.
Consider building your plan with a series of backup solutions so that, if one fails, another is in place to recover.
This may increase your cost, but it will mitigate the risk of an all-or-nothing approach.
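The "series of backup solutions" idea from the Word of Caution slide amounts to a fallback chain: try the preferred recovery method first and move to the next one if it fails. A minimal sketch follows. The `recover` helper and the two restore functions are hypothetical stand-ins, with the first one simulating a failure.

```python
def recover(methods):
    """Try each (name, function) in order; return the name of the first
    method that succeeds, plus a list of the failures seen before it."""
    failures = []
    for name, method in methods:
        try:
            method()
            return name, failures
        except Exception as exc:  # record the failure and fall through
            failures.append((name, str(exc)))
    raise RuntimeError(f"all recovery methods failed: {failures}")

# Hypothetical recovery methods for illustration only.
def restore_from_replica():
    raise ConnectionError("replica site unreachable")

def restore_from_offsite_backup():
    return None  # pretend the restore succeeded

used, failed = recover([
    ("replica", restore_from_replica),
    ("offsite backup", restore_from_offsite_backup),
])
print(f"Recovered via {used}; earlier failures: {failed}")
```

Recording the failures rather than discarding them gives the team material for the lessons-learned update after an exercise or a real event.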
Critical Success Factors
  o Senior Management Buy-in
  o Test, Train and Educate Regularly
  o Plan Maintenance: BC/DR plan has to be a living, breathing document
  o Include as a budget line item; this is not free
  o Audit the results and outcomes of testing

Outsourcing - You Don't Have to Do It Alone
Consider outsourcing portions of the DR plan. According to a recent IDC study:
  o For enterprises that didn't outsource, average loss $4M per disaster incident
  o Enterprises that outsourced: average $1.1M
  o In-house models cost 32% more than outsourced models
  o Outsourcing reduced RTO by factor of 0.62
Outsourcing Options for Disaster Recovery
  o Alternate data centers
  o Alternate work sites
  o Managed backup/recovery services
  o Voice over IP (VoIP)
  o Virtualized environments (faster startup/install of servers)
But If You Outsource
  o Perform appropriate levels of due diligence
  o Have a solid third party risk management strategy/program
  o Only use reputable vendors
  o Ask the right questions about how the vendor is handling your sensitive data
Remember, the vendor's risk equals your risk, but your clients won't blame the vendor if they make a mistake.
You can outsource the process, but not the accountability.
Demand a SOC report!

AICPA Service Organization Control (SOC) Reports
SOC 1: Report on Controls at a Service Organization Relevant to User Entities' Internal Control over Financial Reporting (ICFR)
SOC 2: Report on Controls at a Service Organization that focuses on one or more of the following Trust Service Principles: Security, Availability, Processing Integrity, Confidentiality and/or Privacy. Restricted use report.
SOC 3: Trust Services Report similar in scope to the SOC 2, but the report does not contain details of the auditor's tests and results. General use report; can be freely distributed or posted on a website.
Trust Services Principles & Criteria
Five Principles of a system are defined as follows:
Criteria Common to All Principles [Security, Availability, Processing Integrity, and Confidentiality]: New in 2014; mandatory for 2015 reports. Common Criteria do not cover Privacy.
Security - The system is protected against unauthorized access (both physical and logical). Covered entirely by Common Criteria; therefore, all SOC 2 & 3 reports address the Security Principle at a minimum.
Availability - The system is available for operation and use as committed or agreed.
Processing Integrity - System processing is complete, accurate, timely, and authorized.
Confidentiality - Information designated as confidential is protected as committed or agreed.
Privacy - Personal information is collected, used, retained, disclosed, and destroyed in conformity with the commitments in the entity's privacy notice and with criteria set forth in Generally Accepted Privacy Principles (GAPP).

Additional Criteria for Availability
A1.1: Current processing capacity and usage are maintained, monitored and evaluated to manage capacity demand and to enable the implementation of additional capacity to help meet availability commitments and requirements.
A1.2: Environmental protections, software, data backup processes, and recovery infrastructure are designed, developed, implemented, operated, maintained, and monitored to meet availability commitments and requirements.
A1.3: Procedures supporting system recovery in accordance with recovery plans are periodically tested to help meet availability commitments and requirements.
Additional SOC Thoughts
SOC 1 typically does not include testing of BC/DR.
  o Sections 3 ("Description of System") and 4 (Testing) will show what was tested.
  o If BC/DR is in Section 5 ("Other Information"), this implies that it was not tested.
With the establishment of SOC 2, the AICPA recognized the importance of BC/DR by including the Availability principle.
For DR outsourcing, you'll want to see SOC 2 coverage of the Security and Availability principles; possibly also Confidentiality and/or Privacy.

Record Retention and Destruction
Make sure to only retain documents/data for as long as they are needed.
More data equals:
  o Longer time to back up
  o Longer time to restore
  o Additional overhead/costs
  o Greater security risk
  o More to provide during legal discovery
Destruction:
  o Paper: cross-cut shredding (dumpster diving DOES happen!)
  o Data/Media: wiping/physical destruction
  o Servers/Workstations: wiping/physical destruction (and full-disk encryption for laptops)
  o Vendors: make sure they have similar provisions
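The retention rule ("only retain documents/data for as long as they are needed") can be checked mechanically once each record type has a retention period. The sketch below assumes a retention period counted in whole years from the creation date. The `past_retention` helper and the 7-year figure are hypothetical, and the actual period for any record type must come from your own retention policy.

```python
from datetime import date

def past_retention(created: date, retention_years: int, today: date) -> bool:
    """True when a record is older than its retention period."""
    try:
        expires = created.replace(year=created.year + retention_years)
    except ValueError:
        # Created on Feb 29 and the expiry year is not a leap year.
        expires = created.replace(year=created.year + retention_years, day=28)
    return today > expires

# Hypothetical record created in 2008 under a 7-year retention period.
print(past_retention(date(2008, 11, 30), 7, today=date(2015, 12, 1)))
```

A check like this only flags candidates for destruction; the destruction itself should still follow the methods listed on the slide.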
BC/DR - Parting Thoughts
The reality: A sophisticated DR plan that is too complex or expensive to properly maintain and test is worse than a plan that only does the minimum, because it gives a false sense of security.
It is better to have a minimal/skeleton plan than nothing at all.
  o Have some sort of plan to avoid "hair on fire" situations
  o Know who to call, and when to declare an emergency
  o Maintain system documentation and adequate backup and restore procedures, and test periodically
Assess your vendors' risks regularly.

Questions / Comments?
Steve Earley, M.S., CISA, CISSP, CRISC, CFSA, ITILv3, MCP
Senior Manager, IT Audit
Internal Audit and Risk Advisory Services
searley@schneiderdowns.com
(614) 586-7115
Thank You!