MEDITECH Disaster Recovery




Real-World Disasters and Key Lessons Learned
Prepared for MUSE Education

Where Did I Find Today's Real-World Examples?
- All real-world
- All from my first-hand experiences
- Names are withheld to protect the innocent

BridgeHead Software / Healthcare Data Management

Goals for this Discussion
- Familiarize you with some of the language of Disaster Recovery
- Share real-world DR examples and lessons learned
- Engage in a discussion of how to apply those to your hospital environments

DR Language: Planning Against RPO and RTO
- RPO = Recovery Point Objective
  - Formal definition: the point in time before the disaster event from which you have a recoverable set of data
  - Practical definition: how much data can you afford to lose?
- RTO = Recovery Time Objective
  - Formal definition: the time it takes, after a disaster event occurs, to recover your healthcare applications and data
  - Practical definition: how long can you afford to be down?
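The two definitions above reduce to simple time arithmetic, which the following minimal sketch illustrates. The function names, the timestamps, and the 24-hour and 4-hour objectives are all hypothetical values chosen for illustration; they are not taken from the presentation.

```python
from datetime import datetime, timedelta


def achieved_rpo(last_recoverable_copy: datetime, disaster: datetime) -> timedelta:
    """Data-loss window: changes made after the last recoverable copy are gone."""
    return disaster - last_recoverable_copy


def achieved_rto(disaster: datetime, service_restored: datetime) -> timedelta:
    """Downtime: how long applications and data were unavailable."""
    return service_restored - disaster


def meets_objective(achieved: timedelta, objective: timedelta) -> bool:
    """An objective is met when the achieved window does not exceed it."""
    return achieved <= objective


# Hypothetical incident, not taken from any scenario in this deck.
last_copy = datetime(2024, 3, 1, 23, 0)   # last good nightly backup
disaster = datetime(2024, 3, 2, 14, 30)   # disaster event
restored = datetime(2024, 3, 4, 2, 30)    # applications back online

rpo = achieved_rpo(last_copy, disaster)
rto = achieved_rto(disaster, restored)

print("data lost:", rpo, "| within 24 h RPO:", meets_objective(rpo, timedelta(hours=24)))
print("downtime:", rto, "| within 4 h RTO:", meets_objective(rto, timedelta(hours=4)))
```

In this made-up incident the nightly backup keeps the data loss inside a 24-hour RPO, while the restore takes far longer than a 4-hour RTO. The two objectives have to be planned and measured separately.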

Understanding RPO and RTO
[Figure: timeline centered on the point in time at which the disaster occurs. RPO (how much data can you lose?) extends back before the disaster and RTO (how much time can you wait?) extends forward after it, each marked at 1 min, 4 hrs, and 1 week. A second version of the slide highlights "the sweet spot for most hospitals".]

First, Some Surprising Facts About MEDITECH Recovery
- The most common root causes of failures are:
  - Data corruption
  - User error
- The vast majority of day-to-day restores come from monthly tapes
- Often, IT is not the limiting factor in restore time
- Patient data is always important, but in some scenarios it's not required as quickly as other data
- During recovery, almost all hospitals discover that there is some application or set of applications that they have not been protecting
  - They almost always thought they had been adequately protecting all of their applications

Is this your disaster recovery plan?
Where are YOUR tapes?

Real-World Scenario: Critical Access Hospital, Weather Event
- Hospital building was seriously affected by a weather event; sections of the hospital were uninhabitable
- The datacenter was in a different section of the hospital and not directly affected by the weather event, but was flooded with water due to aftereffects
- Hospital IT staff were unable to reach the hospital for some time due to police lines and confusion
- When staff were finally able to get to the datacenter, untrained personnel removed servers but left the SAN in the water
- Hospital's tapes were in a cardboard box next to the SAN

Immediate Aftermath of the Disaster
- Hospital closed immediately: patients were transferred to other hospitals
- No one even tried to recover data for 3 days
- Other life-threatening issues were addressed first
- Staff were unable to be onsite due to family issues and site restrictions
- Staff resignations due to unclear future, both administrative and clinical
- No hardware to restore to when they finally got around to it
- Data on SAN and tape is still underwater at this point
- Key restore requirements were very surprising to most people (remember, this is still a business)
  - Payroll: employees still need to get paid (correctly!)
  - Blackberry/Exchange Server: staff communication

Final Outcomes
- A rented space was used for recovery
- Hardware was shipped in by multiple vendors to accommodate the restore process
- Data was backed up from the SAN after a lengthy drying process
- In order to restore Payroll, the vast majority of MEDITECH servers needed to be restored because of dependencies between modules
- Hospital made the cutoff for the payroll company, but just barely; this was almost 2 weeks after the disaster event
- It was over 6 months before the hospital saw another patient, due to physical infrastructure issues
- Hospital eventually merged with a regional organization

Lessons Learned
- There is no substitute for an offsite copy of all of your data.
- A hospital is still a business, and sometimes this is forgotten in disaster planning. Patient data is key, but without employees, money, communications, and infrastructure, the hospital cannot operate.
- In a site or regional disaster, there is a good chance that staff may not be allowed near the datacenter for safety reasons.
- Staff may be preoccupied with personal or family issues and unable to assist with recovery.
- Disaster planning is more than software and hardware. Documentation and plans are key; remember, it may not be your staff attempting the recovery.
- You never want to have to put an employee's life at risk to attempt to recover data.

Real-World Scenario: Regional Hospital, Datacenter Modification
- Hospital was adding a new air handler to the datacenter
- Contractor drilled a large hole in the concrete floor to run a power line for the air handler and left the core and concrete dust below the floor
- Servers, storage units, and switches pull air from front to back, so when the air handler was turned on the dust was sucked into every running piece of equipment in the datacenter
- Customer had very limited backups of critical systems, including PACS images, MEDITECH systems, ancillary servers, and infrastructure
- Hospital had a disk-based backup solution, which was also affected by the dust cloud
- Due to insurance issues and liability determinations, remediation was delayed and the customer was unable to make changes to the environment
- Vendors had difficulty supporting hardware until remediation was completed

Final Outcomes
- A company was brought in to individually clean each piece of equipment
- Hospital was forced to take multiple extended downtimes on critical systems while remediation was in process
- IT staff were directly involved in each step of the process, including nights and weekends, limiting their availability to work on new profitable initiatives within the hospital over several months
- Hospital was at significant risk of disk failure on hundreds of pieces of equipment for a lengthy period of time; they were very fortunate
- Another minor disaster during the recovery process might have been completely unrecoverable
- The disk backups were as affected by this as the disks the primary data was on: if it's only on disk, it's at risk

Lessons Learned
- There is no substitute for an offsite copy of all of your data.
- I doubt anyone could have predicted this specific disaster situation. Your plan will never be perfect, but if some basic situations are well covered, many others will be covered as well.
- In this case, RTO was very lengthy: months. This customer never lost any data, but my guess is that they would have preferred to restore from a day-old tape to new hardware rather than go through what they did.
- Disk-based backup solutions are very good for fast recovery of data, but make sure that there is still an offsite copy as well.
- The application that took the most significant overall impact from this disaster and recovery process was the customer's PACS system, with its DICOM images stored on a SAN with no backup. These images need to be protected.

Real-World Scenario: Large Hospital Chain, Disk Corruption
- MEDITECH EMR disk slowly corrupted over several months; corruption was not significant enough to bring down the EMR
- Backups to tape were happening on a daily basis and were dutifully backing up the corruption
- As corruption progressed, the server became slower and slower; finally staff rebooted the server and it would not come up
- When the previous night's backup was restored, it contained the corruption and would also not boot
- Customer had replicated MEDITECH data using array replication technology, but the replicated copy was corrupted as well
- Customer finally found a good backup with no corruption from several months back
- In the meantime, consultants were trying to find a way to fix the corruption
- Customer was faced with a difficult question: do I restore from a known good backup from months ago and lose a lot of EMR data, or wait for a solution to resolve the corruption?

Final Outcomes
- Hospital elected to wait to see if consultants could fix the corruption
- Consultants were able to transplant the Master File Table (MFT) from the good, restored disk to the corrupted disk, and it worked; the hospital was able to recover back to the time of the reboot
- Hospital still had to re-enter and fix data since the reboot; the EMR was down for over 36 hours

Lessons Learned
- It's extremely important to keep many generations of backups. The vast majority of MEDITECH restores come from monthly (or yearly) tapes.
- Who in your organization would make the decision to keep a system down in order to get back to a better point in time?
- More importantly, who in your organization would make the decision to lose several months of data in order to get back up and running sooner?
- Array-based replication is a good tool, but it does not eliminate the need to keep many generations of backups (offsite!). If you're using array-based replication in MEDITECH, be sure that you are getting application-consistent points in time.
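The trade-off in this scenario (restore an old clean copy, or stay down and wait) depends on knowing which generation is the newest clean one and how much data a restore from it would cost. A minimal sketch of that lookup follows. The `Backup` record, the `verified_clean` flag (standing in for the result of some test restore or integrity check), and the dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Backup:
    taken_on: date
    verified_clean: bool  # assumed outcome of a test restore / integrity check


def newest_clean_backup(backups: List[Backup]) -> Optional[Backup]:
    """Return the most recent backup that passed verification, if any."""
    clean = [b for b in backups if b.verified_clean]
    return max(clean, key=lambda b: b.taken_on, default=None)


def data_loss_days(backups: List[Backup], failure_day: date) -> Optional[int]:
    """Days of data lost if we restore from the newest clean backup."""
    best = newest_clean_backup(backups)
    return None if best is None else (failure_day - best.taken_on).days


# Hypothetical history: corruption crept in after the January monthly backup.
history = [
    Backup(date(2024, 1, 31), verified_clean=True),
    Backup(date(2024, 2, 29), verified_clean=False),
    Backup(date(2024, 3, 31), verified_clean=False),
    Backup(date(2024, 4, 14), verified_clean=False),
]

print(newest_clean_backup(history))
print(data_loss_days(history, failure_day=date(2024, 4, 15)))
```

If only the most recent nightly copies had been kept, `newest_clean_backup` would return `None` here: there would be no clean generation to fall back on at all.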

Real-World Scenario: Overwrite of MEDITECH SCA Data
- MEDITECH Scanning and Archiving (SCA) servers contain images of Point of Care scans (driver's licenses, insurance cards, etc.) and reports archived from other modules within MEDITECH. They can also contain data from other applications and sources.
- MEDITECH SCA disks contain millions of files and grow very large. They are difficult to protect using standard backup strategies because of the large volume of data and files.
- MEDITECH overwrites data in SCA disks on a regular basis.
- A change was made in this hospital's MEDITECH environment which caused over a million reports in SCA to be overwritten
- Because the backup window for SCA was so lengthy, backups had been stopped months ago
- The customer was replicating the disk with the SCA data, which caused the data at the replicated site to be unusable as well

Final Outcomes
- MEDITECH and the customer spent months rebuilding the data from the MEDITECH modules.

Lessons Learned
- Deep generational protection is a key requirement for any backup strategy (Recovery Depth Objective).
- Replication is good for specific scenarios, but if mistakes are made, those will be replicated too.
- MEDITECH SCA is a unique application with specific protection requirements, and moving forward it will require careful disaster recovery planning as it becomes more important.
- Depending on your implementation, MEDITECH SCA may be a very important part of your MEDITECH environment.
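One common way to get deep generational protection without keeping every nightly backup forever is a tiered retention rule: keep the most recent dailies, plus the last backup of each month, plus the last backup of each year. The sketch below shows one such rule. The function name and the default depths (7 daily, 12 monthly, 7 yearly) are assumptions for illustration, not figures from the presentation or from any backup product.

```python
from datetime import date, timedelta
from typing import Iterable, Set


def backups_to_keep(backup_days: Iterable[date],
                    daily: int = 7, monthly: int = 12, yearly: int = 7) -> Set[date]:
    """Pick which backup dates to retain under a daily/monthly/yearly scheme."""
    days = sorted(set(backup_days), reverse=True)  # newest first
    keep = set(days[:daily])                       # most recent dailies

    newest_per_month = {}
    newest_per_year = {}
    for d in days:
        # First date seen for a month/year is its newest, since we go newest first.
        newest_per_month.setdefault((d.year, d.month), d)
        newest_per_year.setdefault(d.year, d)

    keep.update(sorted(newest_per_month.values(), reverse=True)[:monthly])
    keep.update(sorted(newest_per_year.values(), reverse=True)[:yearly])
    return keep


# Hypothetical schedule: one backup every night from 2022-01-01 to 2024-06-30.
start, end = date(2022, 1, 1), date(2024, 6, 30)
nightly = [start + timedelta(days=i) for i in range((end - start).days + 1)]

kept = backups_to_keep(nightly)
print(len(nightly), "backups taken,", len(kept), "retained")
print("oldest retained:", min(kept))
```

The retained set is small, yet it still reaches back years. That depth is what makes recovery possible when corruption or an overwrite goes unnoticed for months and has already been copied into every recent backup and replica.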

Disasters You May Not Have Considered
- Train car full of chemicals overturns next to the hospital; staff is not allowed in the hospital
- Sprinkler system in datacenter activated by high heat floods servers and storage (you don't have water pipes in your datacenter, do you?)
- SAN fails due to bad firmware upgrade
- Wind breaks the window of the datacenter and rain gets into the core switch (I've seen it happen)
- Your backup tapes are overwritten every day by staff trying to save money
- An upgrade breaks a system and can't be backed out
- Hospital becomes triage center for latest SARS epidemic
- With a Category 5 hurricane bearing down, your IT staff evacuates with their families

What Does Good Look Like?
- Establish well thought out, realistic RPOs and RTOs for all applications
- Geographically dispersed protection: multiple sites and multiple formats
  - Disk and tape provide both fast restore and offsite protection
  - No onsite tapes (the next building doesn't count!)
  - Cloud is fine, if you can get the data there and back!
- Deep generational protection (monthly and yearly backups)
- MEDITECH application-consistent backups (ISB, IDR, MBF)
- PACS/DICOM geographical and generational protection
- Other applications
  - Exchange, SQL, Domain Controllers, SharePoint, Oracle
  - Cardiology, Oncology, Pathology, etc. (non-radiology PACS systems)
  - VMware
- If multiple sites, be sure you can back up in your secondary datacenter!
  - Your hospital may be able to recover from one disaster, but what if another hits while you are recovering?
- Have a well thought out and documented plan for disaster recovery; it may not be you doing the restore
- A well-trained staff knowledgeable in your backup product
- Know who in your organization can make critical decisions involving data loss and recovery in a disaster
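Several items on this checklist can be checked mechanically against an inventory of applications, which helps surface the application nobody realized was unprotected. The sketch below is one hypothetical way to do that. The record fields, the gap messages, and the 12-generation threshold are assumptions made for illustration and do not come from the presentation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BackupCopy:
    media: str     # e.g. "disk", "tape", "cloud"
    offsite: bool  # True only if stored away from the primary site


@dataclass
class AppProtection:
    name: str
    rpo_rto_defined: bool
    app_consistent: bool
    monthly_generations: int
    copies: List[BackupCopy] = field(default_factory=list)


MIN_MONTHLY_GENERATIONS = 12  # assumed threshold, not from the slides


def protection_gaps(app: AppProtection) -> List[str]:
    """List the checklist items this application currently fails."""
    gaps = []
    if not app.rpo_rto_defined:
        gaps.append("no RPO/RTO defined")
    if not any(c.offsite for c in app.copies):
        gaps.append("no offsite copy")
    if len({c.media for c in app.copies}) < 2:
        gaps.append("fewer than two backup formats")
    if app.monthly_generations < MIN_MONTHLY_GENERATIONS:
        gaps.append("shallow generational depth")
    if not app.app_consistent:
        gaps.append("backups not application consistent")
    return gaps


# Hypothetical inventory entries.
meditech = AppProtection("MEDITECH", True, True, 12,
                         [BackupCopy("disk", offsite=False), BackupCopy("tape", offsite=True)])
pacs = AppProtection("PACS", False, False, 0)  # images on a SAN, no backup at all

for app in (meditech, pacs):
    print(app.name, "->", protection_gaps(app) or "no gaps found")
```

A check like this covers only what is listed in the inventory. It is not a substitute for a documented plan, trained staff, or a named decision-maker.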

Thank you!