Business Continuity Planning: Principles and Best Practices
Tom Hinkel and Zach Duke
Agenda
- Key components essential to an FFIEC-compliant Business Continuity Plan
- Recovery Time Objectives & Recovery Point Objectives
- Lessons learned from recent disasters
- Cost-effective recovery options to protect your critical business processes
- Testing: moving from Compliant to Recoverable
FFIEC BCP Update
FFIEC Four Phase Process
1. Business Impact Analysis
2. Risk Assessment
3. Risk Management
4. Risk Monitoring and Testing
Business Impact Analysis
- Foundation for all Business Continuity Plans
- FDIC: Are disaster recovery and business continuity plans based upon a business impact analysis (Y/N)?
- Identification and prioritization of all business functions and processes, including interdependencies
- Maximum Allowable Downtime (MAD)
- Recovery Time Objectives (RTO)
- Recovery Point Objectives (RPO)
BIA - Recovery Time Objectives
Must factor in the cost of NOT recovering:
- Reputational
- Operational
- Regulatory
- Strategic
as well as the financial cost to recover.
BIA Example - Teller - Relative Risk

Recovery Time (days)  Financial  Operational  Regulatory  Strategic  Reputation
< 0.5                 9          2            0           1          2
1                     7          4            1           3          5
2                     6          5            8           5          8
3                     5          5            10          7          10
4                     3          6            10          8          10
5                     2          7            10          9          10
6+                    2          7            10          10         10

0 = insignificant risk, 10 = unacceptably high risk
BIA Example - Teller -
Business Impact Analysis: RTO, MAD & RPO
- RTO: The amount of downtime that can be tolerated without exceeding risk tolerances.
- MAD: The point at which recovery becomes impossible or losses become unrecoverable.
- RPO: Acceptable data loss. Defined by frequency of data backups.
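As a rough illustration of how the RTO definition could be applied to the teller scores from the BIA example, the sketch below picks the longest recovery time at which none of the non-financial risks exceeds a tolerance. The tolerance of 5 and the function name are assumptions made for this example; an institution would set its own tolerances per risk category.

```python
# Relative risk scores from the teller BIA example
# (0 = insignificant risk, 10 = unacceptably high risk).
# Row layout: (recovery time in days, financial, operational,
#              regulatory, strategic, reputation)
TELLER_BIA = [
    ("< 0.5", 9, 2, 0, 1, 2),
    ("1", 7, 4, 1, 3, 5),
    ("2", 6, 5, 8, 5, 8),
    ("3", 5, 5, 10, 7, 10),
    ("4", 3, 6, 10, 8, 10),
    ("5", 2, 7, 10, 9, 10),
    ("6+", 2, 7, 10, 10, 10),
]


def longest_tolerable_recovery_time(rows, tolerance):
    """Return the last recovery time whose non-financial risks all stay
    at or below the tolerance, or None if even the first row exceeds it."""
    best = None
    for label, _financial, *downtime_risks in rows:
        if max(downtime_risks) > tolerance:
            break  # in this table the downtime risks never decrease over time
        best = label
    return best


# Hypothetical tolerance of 5 on the 0-10 scale.
print(longest_tolerable_recovery_time(TELLER_BIA, tolerance=5))  # prints: 1
```

The financial column is deliberately left out of the check: it is the cost to recover, which falls as the recovery time lengthens, so it would be weighed against the chosen RTO rather than used to cap it.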
Business Impact Analysis
One of the advantages of analyzing allowable downtime and recovery objectives is the potential support it may provide for the funding needs of a specific recovery solution based on the losses identified and the importance of certain business functions and processes.
Risk Assessment
- Impact x Probability = Potential Severity
- Analyze threats based on the potential impact to the institution
- The most difficult threats to address are those that have a high impact on the institution but a low probability of occurrence.
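The severity formula can be sketched in a few lines. The 1-5 scores below are invented for illustration and do not come from the presentation; the thresholds in `high_impact_low_probability` are likewise assumptions.

```python
# Illustrative scores on a 1-5 scale; the numbers are made up.
threats = {
    "Power Failure": {"impact": 3, "probability": 4},
    "Fire": {"impact": 5, "probability": 1},
    "Pandemic": {"impact": 4, "probability": 1},
    "Severe Weather": {"impact": 3, "probability": 3},
}


def potential_severity(impact, probability):
    """Impact x Probability = Potential Severity."""
    return impact * probability


def rank_threats(threats):
    """Return (name, severity) pairs, highest severity first."""
    scored = [
        (name, potential_severity(s["impact"], s["probability"]))
        for name, s in threats.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def high_impact_low_probability(threats, impact_floor=4, probability_ceiling=2):
    """Flag the threats that a pure severity ranking tends to bury."""
    return [
        name
        for name, s in threats.items()
        if s["impact"] >= impact_floor and s["probability"] <= probability_ceiling
    ]


print(rank_threats(threats))
print(high_impact_low_probability(threats))
```

With these sample scores, Fire and Pandemic land at the bottom of the severity ranking even though their impact is the highest, which is one way to see why high-impact, low-probability threats are the hardest to address.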
Risk Assessment: Relative Severity Index [chart: Disruption by Threat]
[Quadrant chart: threats plotted by Probability and Impact]
Quadrants: High Probability / Low Impact; High Probability / High Impact; Low Probability / Low Impact; Low Probability / High Impact
Threats plotted: Fraud, Theft, Blackmail; Sabotage; Vandalism & Looting; Terrorism; Fire; Flood, Water Damage; Air Contaminants; Severe Weather; Hazardous Spills; Comm. w/customers; Comm. w/employees; Payment Service Providers; Affiliates & Vendors; Power Failure; Equipment & Software Failure; Trans. System Disruptions; Water System Disruptions; Pandemic
Interdependencies
- Analyzing interdependencies represents a critical step in the business continuity process and is an integral part of a business impact analysis.
- A work flow analysis involves an assessment and prioritization of those business functions and processes that must be recovered.
- The analysis should assist management in determining the priority of business functions and processes and the overall effect on recovery timelines.
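One way to turn an interdependency analysis into a recovery sequence is a topological sort: everything a process relies on is restored before the process itself. The dependency map below is hypothetical, and the sketch uses Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each key depends on the items in its set.
dependencies = {
    "Teller transactions": {"Core processing", "Branch network"},
    "Core processing": {"Servers"},
    "Branch network": {"Power"},
    "Servers": {"Power"},
    "Power": set(),
}


def recovery_order(dependencies):
    """Return an order in which every dependency comes before
    the function or process that relies on it."""
    return list(TopologicalSorter(dependencies).static_order())


order = recovery_order(dependencies)
print(order)
```

If the map contains a circular dependency, `static_order()` raises `graphlib.CycleError`, which is itself a useful finding for a work flow analysis.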
Work Flow Example Interdependencies
Criticality Interdependencies
Complexity Interdependencies
Where to Start? Perform/Re-visit Business Impact Analysis Senior Management / Board of Directors Review Likelihood of Achieving RTO Technology, Facility, and Personnel Review Review High Risks from Risk Assessment Review RTO Estimates Highlight Deficient Areas to Management Gap Analysis
Technology's Role Design Addresses BIA / RA Policy and procedure adherence Process / Automation Reduces and RTO Risk reducing Documentation Oversight Trending Testing / Reporting (RTA)
Downtime
- Always on / connected society
- Acceptable length of downtime continues to shorten
- Critical applications shift: Internet, Email
Advantages
- Centralization of IT
- Reduced branch / user infrastructure
- Enhanced administration with one-time updates
- Scalability: new locations / institutions
Risks
- All eggs in one basket (DR)
- Bandwidth
- All applications may not work
What is Virtualization?
I keep hearing the term, but what does it mean to me?
Common Terms
- Hypervisor: VMware, XenServer, Hyper-V
- VDI: Virtual Desktop Infrastructure
- The new Dumb Terminal
- Citrix
- Application Virtualization (Centralization)
- Loans
- Shared storage (SAN)
The Old Server Room
Virtualization
- Stable configuration with N+1 design
- New servers can be provisioned quickly
- Additional resources easily added for scalability
- Better utilization of resources
- Centralized management
- Disaster recovery replication
Basic N+1
Shared Storage: Virtual N+1 with Shared Storage
[Diagram labels: ESX 1, ESX 2, ESX 3; SAN; Server A, Server B, Server C, Server 1, Server 2, Server 3, Server Alpha, Server Zulu]
Server Failure Scenario: ESX Physical Server Failure
[Diagram labels: ESX 1, ESX 2, ESX 3; SAN; Server A, Server B, Server C, Server 1, Server 2, Server 3, Server Alpha, Server Zulu]
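The N+1 idea behind this failure scenario can be checked with simple arithmetic: after any single host fails, the remaining hosts must still be able to carry every virtual server. The sketch below is a simplification that compares total memory only; the host and server sizes are hypothetical, and the function name is an assumption for this example.

```python
def survives_single_host_failure(host_capacity_gb, vm_memory_gb):
    """Return True if the remaining hosts can carry every VM when any
    one host fails.

    Simplification: compares total memory only and ignores CPU, storage,
    and per-host placement limits.
    """
    total_demand = sum(vm_memory_gb.values())
    for failed in host_capacity_gb:
        remaining = sum(
            cap for name, cap in host_capacity_gb.items() if name != failed
        )
        if remaining < total_demand:
            return False
    return True


# Hypothetical sizes in GB of memory.
hosts = {"ESX 1": 64, "ESX 2": 64, "ESX 3": 64}
vms = {
    name: 12
    for name in [
        "Server A", "Server B", "Server C",
        "Server 1", "Server 2", "Server 3",
        "Server Alpha", "Server Zulu",
    ]
}

print(survives_single_host_failure(hosts, vms))  # 128 GB left vs 96 GB needed

two_hosts = {"ESX 1": 64, "ESX 2": 64}
print(survives_single_host_failure(two_hosts, vms))  # 64 GB left vs 96 GB needed
```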
Disaster Recovery
[Diagram: Main Office / OPS and DR Location. Labels at both sites: ESX hosts, SAN, Silver Peak, Server A, Server B, Server C, Server 1, Server 2, Server 3, Server Alpha, Server Zulu]
Circuit Redundancy
- MPLS Communications
- Automatic Router Failover
- Internet Connectivity: continues to have increased importance for institutions
Key Technology Options
- Redundant Internet (Main Location & DR location)
- Redundant Hardware
Point to Point: Single Point of Failure [diagram: Location 1, Location 2, Location 3, Location 4]
Frame Relay: Permanent Virtual Circuits [diagram: Location 1, Location 2, Location 3, Location 4]
MPLS Example: Everyone is Connected [diagram: Location 1, Location 2, Location 3, Location 4]
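The difference between a design with a single point of failure and one where everyone is connected can be sketched as a small connectivity check. Two assumptions are made here: the point-to-point layout is modeled as hub-and-spoke through Location 1, and the MPLS any-to-any connectivity is modeled as a full mesh of links, which simplifies how a carrier network actually delivers it.

```python
from itertools import combinations


def connected_without(links, sites, failed):
    """Return True if the surviving sites can all still reach each other."""
    alive = [s for s in sites if s != failed]
    neighbors = {s: set() for s in alive}
    for a, b in links:
        if a != failed and b != failed:
            neighbors[a].add(b)
            neighbors[b].add(a)
    # Depth-first walk from the first surviving site.
    seen, stack = {alive[0]}, [alive[0]]
    while stack:
        for nxt in neighbors[stack.pop()] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return len(seen) == len(alive)


def single_points_of_failure(links, sites):
    """List the sites whose loss cuts the other sites off from each other."""
    return [s for s in sites if not connected_without(links, sites, s)]


sites = ["Location 1", "Location 2", "Location 3", "Location 4"]
hub_and_spoke = [("Location 1", s) for s in sites[1:]]  # assumed layout
full_mesh = list(combinations(sites, 2))                # simplified MPLS model

print(single_points_of_failure(hub_and_spoke, sites))
print(single_points_of_failure(full_mesh, sites))
```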
Data Vaulting
- Remote offsite backups via Internet connection
- More frequent backups than tapes (lower RPO)
- Removes need for tapes and rotations; file recovery is NOT location specific
- Removes people from the process
- Encrypted from Point to Point
Risks
- Internet connectivity
- Large restores (e.g. whole server)
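The link between backup frequency and RPO can be made concrete: the data lost in a failure is whatever was written since the last completed backup. The schedules, the failure time, and the 4-hour RPO below are hypothetical values chosen for illustration.

```python
from datetime import datetime, timedelta


def data_loss_at(failure_time, backup_times):
    """Return the time elapsed between the last backup taken at or
    before the failure and the failure itself."""
    earlier = [t for t in backup_times if t <= failure_time]
    if not earlier:
        raise ValueError("no backup exists before the failure")
    return failure_time - max(earlier)


rpo = timedelta(hours=4)                 # hypothetical RPO
failure = datetime(2024, 1, 2, 16, 30)   # hypothetical failure time

# Nightly tape at 22:00 the previous day vs. hourly vaulting from 08:00 to 16:00.
nightly_tape = [datetime(2024, 1, 1, 22, 0)]
hourly_vault = [datetime(2024, 1, 2, hour, 0) for hour in range(8, 17)]

tape_loss = data_loss_at(failure, nightly_tape)
vault_loss = data_loss_at(failure, hourly_vault)

print(tape_loss, tape_loss <= rpo)    # 18:30:00 False
print(vault_loss, vault_loss <= rpo)  # 0:30:00 True
```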
Server Recovery
- Allows for snapshot replication of servers
- Significantly reduced RTO
- Does not require independent hardware for testing or recovery
- Centrally managed
- Designed for smaller data footprints
The Cloud
Cloud Solutions
- Service Bureau Core Processing
- Application Hosting (SaaS): Mortgage, Imaging, New Accounts
- Hosted Exchange: Encryption, SPAM / AV Protection
- Hosted Servers: Web, Intranet, etc.
- Collocated Solutions
  - Move Servers from the Main Office / Ops Center
  - Reduces Disaster Recovery Risk
Additional Considerations
- Facilities
- Personnel
  - Cross training
  - Plan for DR without key personnel
- Unplanned Recovery Events
  - Documentation of DR events that occur during normal processes
Why Test?
- Validation of the Business Continuity Plan
- Auditors/regulations require BCP testing
- Practical application to support regulatory and audit issues
- Potentially identify operational exposures
- Maintain awareness and education
Testable DR/BCP - Recoverable
- Do test recovery objectives follow BIA?
- Do test scenarios follow RA?
- Process recovery, NOT system recovery
Tested
- Increasingly complex scenarios
- Gap analysis between RTO and actual recovery capability
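A gap analysis between RTO and actual recovery capability can be as simple as comparing each process's objective with the time measured in the most recent test. The process names and hours below are hypothetical, and the function name is an assumption for this sketch.

```python
def rto_gaps(rto_hours, tested_hours):
    """Return the processes that missed their RTO in testing.

    Values are the shortfall in hours, or None when the process has not
    been tested, so its recovery capability is unknown.
    """
    gaps = {}
    for process, rto in rto_hours.items():
        actual = tested_hours.get(process)
        if actual is None:
            gaps[process] = None
        elif actual > rto:
            gaps[process] = actual - rto
    return gaps


# Hypothetical objectives and test results, in hours.
rto = {"Teller transactions": 24, "Online banking": 8, "Loan origination": 72}
tested = {"Teller transactions": 20, "Online banking": 14}

print(rto_gaps(rto, tested))
```

Here the teller process recovered within its objective, online banking missed by 6 hours, and loan origination is reported as untested. Treating an untested process as a gap keeps the report focused on demonstrated recovery rather than assumed capability.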
Questions/Feedback
Tom Hinkel, CISA, CRISC, CCSA, Director of Compliance, tom@safesystems.com, www.complianceguru.com
Zach Duke, EVP, Business Development, zach@safesystems.com
#SafeConf