University of Michigan Disaster Recovery / Business Continuity Administrative Information Systems. 1
Michigan Administrative Information Services (MAIS) MAIS is responsible for the production support of information technology systems that support the university's mission-critical business processes. Disaster Recovery / Business Continuity Officer 2
History of Disaster Recovery 1950s: off-site storage of critical hardcopies 1960s: periodic file backups stored off site 1970s: regular backup and off-site storage 1980s: use of alternate sites 1990s: network recovery planning 3
Today: Enterprise-wide Contingency Planning Critical business processes Dependence on systems / internet Facility recovery requirements Reduce outage time Ensure Recovery Point Objective (RPO) 4
Project Background From 1996 to 2001, the university converted almost all of its major administrative systems to new software applications running in a new technical infrastructure A plan did not exist for restoring the new environment Executive office issued the mandate for a disaster recovery / business continuity project 5
BCP Objectives Protect staff: ensure safety with fire alarms & extinguishers, security, training, etc. Avoid systems disruptions: identify failure modes or weaknesses; raise awareness Safeguard systems assets: items that generate direct benefits or add value by supporting other assets Minimize confusion / miscommunications: locating and contacting personnel; predetermined meeting place 6
Priorities Reduce exposure Determine business needs Set long-term goals and priorities 7
Standard Business Continuity Planning Steps Risk assessment: understand threats and risks Business Impact Analysis (BIA): understand impact to business in lost income, image, etc. Mitigate risks: prevent disruptions Recover business: planned contingencies Resume business: fully restored business 8
Mitigate Risks: Prevent Disruptions Generator Machine room location Personnel travel 9
Recover Business: Planned Contingencies Workaround processes Evaluate risks and impacts to develop scenarios Set business priorities to determine recovery order 10
Risk Management Contradictions What is the level of risk stakeholders are willing to assume? What risk is actually reduced? What budget is available? Fort Knox-like protection means high cost 11
Project Was Divided Into Three Phases Phase I, stop gap measures Phase II, implement recovery solution Phase III, test solution 12
Phase I, Stop Gap Measures Identify and apply the best value solutions to big gaps first Prioritize Hundreds of opportunities for risk reduction arise during analysis Organization must decide which of these should be implemented and in what sequence 13
Phase I, Stop Gap Measures Use existing tools, materials and staff Disaster occurs in the midst of respective, critical business cycles Nature of the disaster is unknown Duration of system outage is unknown 14
Phase I Output: Recovery Time Objective Business unit contingency plans MAIS technical infrastructure recovery plans Evaluated contingencies and timelines to derive RTO RTO used to determine recovery solution 15
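The slide above says the RTO was derived from the evaluated contingencies and timelines, but does not say how. The following is a minimal sketch of one plausible reading, assuming each business unit's contingency plan states how many hours its workaround can sustain the process, and that the infrastructure RTO is set by the least tolerant unit. The function name, the unit names and the hour figures are hypothetical and are not taken from the MAIS project.

```python
from __future__ import annotations


def derive_rto(tolerable_outage_hours: dict[str, float]) -> tuple[str, float]:
    """Return the most time-critical unit and the RTO (in hours) it implies.

    Assumption: the RTO is the shortest outage any business unit's
    contingency plan can tolerate. This is an illustrative rule, not the
    method documented in the presentation.
    """
    if not tolerable_outage_hours:
        raise ValueError("at least one business unit timeline is required")
    # The unit whose workaround runs out first drives the recovery deadline.
    driving_unit = min(tolerable_outage_hours, key=tolerable_outage_hours.get)
    return driving_unit, tolerable_outage_hours[driving_unit]


# Hypothetical contingency timelines gathered from business unit plans.
timelines = {
    "payroll": 72,
    "student registration": 48,
    "procurement": 120,
}
unit, rto_hours = derive_rto(timelines)
print(f"RTO = {rto_hours} hours (driven by {unit})")
```

Taking the minimum is deliberately conservative: any recovery solution that meets this RTO also meets every other unit's stated tolerance.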
Phase II, Implement Recovery Solution Select Vendor Review Risk Assessment Update plans Implement readiness preparations and procedures Develop continuous planning process 16
Select Vendor Size Stability Technology Local presence Experience in actual recoveries Range of services 17
Vendors responding to RFP SunGard IBM Various small companies 18
Vendor Size 42 locations in North America 50 mobile data centers 19
Vendor Stability 70% of NASDAQ trades flow through SunGard systems Customers include 47 of the world's 50 largest financial institutions 15 trillion dollars in investment assets worldwide pass through SunGard systems daily 20
Vendor Technology 30 different technology platforms, but dependent on facility 21
Vendor Local Presence Southfield Office 22
Vendor Experience 25 years' experience 1,500 recoveries, all successful Over 100,000 tests 23
Vendor Range of services Testing Silhouette OS Partnership with Iron Mountain Mobile recovery Professional services 24
SunGard Purchased Services Performed Information Protection Analysis Purchased Silhouette OS Assistance with technical recovery planning Purchased PreCovery software for plan management 25
Phase II, Review Risk Assessment Business Impact Analysis (BIA): Understand Impact to Business Don't collect data by unplugging equipment and monitoring the accumulation of losses! List assets, estimates of impact, likelihood and resulting exposure Compare the reduction in risk per dollar spent for each measure, giving relative value to the business and a basis for comparison 26
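The comparison described on the slide above can be sketched in a few lines. This is an illustration only, assuming exposure is estimated as impact multiplied by annual likelihood (a common convention that the slide does not spell out) and that each measure lowers the likelihood of a loss. The class, the function names and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Measure:
    name: str
    cost: float               # dollars spent on the measure
    impact: float             # estimated loss in dollars if the event occurs
    likelihood_before: float  # annual probability without the measure
    likelihood_after: float   # annual probability with the measure in place


def exposure(impact: float, likelihood: float) -> float:
    """Exposure as expected annual loss (assumed: impact x likelihood)."""
    return impact * likelihood


def risk_reduction_per_dollar(m: Measure) -> float:
    """Reduction in exposure bought by each dollar spent on the measure."""
    before = exposure(m.impact, m.likelihood_before)
    after = exposure(m.impact, m.likelihood_after)
    return (before - after) / m.cost


def rank_measures(measures: list[Measure]) -> list[Measure]:
    """Order measures from best to worst value for the business."""
    return sorted(measures, key=risk_reduction_per_dollar, reverse=True)


# Hypothetical candidate measures with made-up estimates.
candidates = [
    Measure("backup generator", 80_000, 500_000, 0.10, 0.02),
    Measure("off-site backups", 40_000, 2_000_000, 0.05, 0.01),
]
for m in rank_measures(candidates):
    print(f"{m.name}: {risk_reduction_per_dollar(m):.2f} risk reduced per dollar")
```

Ranking by this ratio gives every measure a common basis for comparison, which is also what the Phase I slides ask for when they say to apply the best value solutions to the big gaps first.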
Phase II, Blackout of 2003 and BIA 50 million people / 9,300 square miles Students Research Visitors Personnel 27
Phase II, Blackout Lessons Learned Be prepared Involve senior management in planning Communicate: have clear decision-making authority Practice 28
Phase II, Update Plans Tech recovery Business continuity: develop MAIS business continuity 29
Phase II, Continuous Planning Plan maintenance New products and services - sometimes at reduced cost Many services are contracted - improved terms can frequently be negotiated Budgets are set according to stakeholder perception of the risks - continual awareness Align budget with expectations 30
Phase III, Testing Plan effectiveness Metrics Enterprise operations Contingency planning 31
Approach Planning With a Sense of Urgency Higher user demands / dependence on technology Enterprise-wide planning Understand business impact Critical business processes Reduce outage time Ensure Recovery Point Objective (RPO) 32
Remain Focused On solutions, not scenarios On business dependence on systems / internet On recovery requirements: RPO, RTO On filling big holes first 33
Next Steps University shared recovery solution 34