17573: Availability and Time to Data How to Improve Your Resiliency and Meet Your SLAs



Similar documents
Achieve Continuous Computing for Mainframe Batch Processing. By Hitachi Data Systems and 21st Century Software

HITACHI DATA SYSTEMS USER GORUP CONFERENCE 2013 MAINFRAME / ZOS WALTER AMSLER, SENIOR DIRECTOR JANUARY 23, 2013

Simplify and Improve Database Administration by Leveraging Your Storage System. Ron Haupert Rocket Software

System Availability and Data Protection of Infortrend s ESVA Storage Solution

Synchronous Data Replication

How to Manage Critical Data Stored in Microsoft Exchange Server By Hitachi Data Systems

Storage Virtualization for Mainframe DASD, Virtual Tape & Open Systems Disk

Using Hitachi Protection Manager and Hitachi Storage Cluster software for Rapid Recovery and Disaster Recovery in Microsoft Environments

Restoration Technologies. Mike Fishman / EMC Corp.

Continuous Data Protection for any Point-in-Time Recovery: Product Options for Protecting Virtual Machines or Storage Array LUNs

The Benefits of Continuous Data Protection (CDP) for IBM i and AIX Environments

Storage System High-Availability & Disaster Recovery Overview [638]

Application Backup and Restore using Fast Replication Services. Ron Ratcliffe March 13, 2012 Session Number 10973

HP StorageWorks Data Protection Strategy brief

DEFINING THE RIGH DATA PROTECTION STRATEGY

Local and Remote Replication Solutions from IBM, EMC, and HDS Session SHARE Pittsburgh August 6, 2014

Gartner: The Broken State of Backup

IBM DB2 Recovery Expert June 11, 2015

Trends in Data Protection and Restoration Technologies. Mike Fishman, EMC 2 Corporation (Author and Presenter)

IMS Disaster Recovery Overview

Blackboard Managed Hosting SM Disaster Recovery Planning Document

Unitrends Backup & Recovery Solutions and Disaster Recovery Best Practices

Disaster Recovery for Oracle Database

courtesy of F5 NETWORKS New Technologies For Disaster Recovery/Business Continuity overview f5 networks P

Lunch and Learn: Modernize Your Data Protection Architecture with Multiple Tiers of Storage Session 17174, 12:30pm, Cedar

BUSINESS CONTINUITY AND DISASTER RECOVERY FOR ORACLE 11g

Business Continuity: Choosing the Right Technology Solution

WHITE PAPER Overview of Data Replication

Quickly Deploy Microsoft Private Cloud and SQL Server 2012 Data Warehouse on Hitachi Converged Solutions. September 25, 2013

DISASTER RECOVERY BUSINESS CONTINUITY DISASTER AVOIDANCE STRATEGIES

Business Continuity with the. Concerto 7000 All Flash Array. Layers of Protection for Here, Near and Anywhere Data Availability

SQL-BackTrack the Smart DBA s Power Tool for Backup and Recovery

BACKUP IS DEAD: Introducing the Data Protection Lifecycle, a new paradigm for data protection and recovery WHITE PAPER

Virtual Disaster Recovery

Building Continuous Cloud Infrastructures

High availability and disaster recovery with Microsoft, Citrix and HP

Introduction to Data Protection: Backup to Tape, Disk and Beyond. Michael Fishman, EMC Corporation

Pervasive PSQL Meets Critical Business Requirements

Backup and Recovery Solutions for Exadata. Cor Beumer Storage Sales Specialist Oracle Nederland

Enabling comprehensive data protection for VMware environments using FalconStor Software solutions

Veritas Cluster Server from Symantec

Hitachi Universal Replicator Software Architecture Overview

Backup and Recovery Solutions for Exadata. Ľubomír Vaňo Principal Sales Consultant

Virtualizing disaster recovery using cloud computing

Introduction to Data Protection: Backup to Tape, Disk and Beyond. Michael Fishman, EMC Corporation

Storage and Disaster Recovery

Technology Insight Series

Leveraging the Cloud for Data Protection and Disaster Recovery

NetApp SnapMirror. Protect Your Business at a 60% lower TCO. Title. Name

Real-time Protection for Hyper-V

TABLE OF CONTENTS THE SHAREPOINT MVP GUIDE TO ACHIEVING HIGH AVAILABILITY FOR SHAREPOINT DATA. Introduction. Examining Third-Party Replication Models

The Oracle StorageTek VSM6 Providing Unprecedented Storage Versatility

SQL Server Storage Best Practice Discussion Dell EqualLogic

Technical Considerations in a Windows Server Environment

EMC NETWORKER SNAPSHOT MANAGEMENT

Rajesh Gupta Best Practices for SAP BusinessObjects Backup & Recovery Including High Availability and Disaster Recovery Session #2747

Informix Dynamic Server May Availability Solutions with Informix Dynamic Server 11

Backup to Tape, Disk and Beyond. Jason Iehl, NetApp

Veritas Storage Foundation High Availability for Windows by Symantec

Stretching A Wolfpack Cluster Of Servers For Disaster Tolerance. Dick Wilkins Program Manager Hewlett-Packard Co. Redmond, WA dick_wilkins@hp.

Contents. SnapComms Data Protection Recommendations

SQL SERVER ADVANCED PROTECTION AND FAST RECOVERY WITH EQUALLOGIC AUTO-SNAPSHOT MANAGER

Disaster Recovery Issues and Solutions

Taking the Disaster out of Disaster Recovery

The Difference Between Disaster Recovery and Business Continuance

Microsoft SharePoint 2010 on VMware Availability and Recovery Options. Microsoft SharePoint 2010 on VMware Availability and Recovery Options

SQL SERVER ADVANCED PROTECTION AND FAST RECOVERY WITH DELL EQUALLOGIC AUTO SNAPSHOT MANAGER

EMC Backup and Recovery for Microsoft SQL Server 2008 Enabled by EMC Celerra Unified Storage

Dell Data Protection Point of View: Recover Everything. Every time. On time.

Introduction to Enterprise Data Recovery. Rick Weaver Product Manager Recovery & Storage Management BMC Software

A SURVEY OF POPULAR CLUSTERING TECHNOLOGIES

The case for cloud-based disaster recovery

Mayur Dewaikar Sr. Product Manager Information Management Group Symantec Corporation

ETERNUS CS High End Unified Data Protection

Disaster Recover Challenges Today

Westek Technology Snapshot and HA iscsi Replication Suite

IBM TotalStorage IBM TotalStorage Virtual Tape Server

How To Write A Server On A Flash Memory On A Perforce Server

EonStor DS remote replication feature guide

Transcription:

17573: Availability and Time to Data How to Improve Your Resiliency and Meet Your SLAs Ros Schulman Hitachi Data Systems Rebecca Levesque 21st Century Software August 13, 2015 10-11:00 am Southern Hemisphere 5 Disney Dolphin Hotel RS

About the Speakers Ros Schulman is Director, Data Protection at Hitachi Data Systems with over 35 years experience in the IT business both on the vendor and customer side including systems programming, operations, technical support and sales support. Ms. Schulman has worked at Hitachi Data Systems for over 22 years. She currently works in the Global Solutions Strategy and Development organization with responsibility for Data Protection Software and Mainframe Management software including disaster recovery and backup software. She spends much of her time with customers, discussing their unique Data Protection challenges. She has also co-authored many white papers in the area of Business Continuity. Rebecca M. Levesque is Chief Executive Officer of 21st Century Software and responsible for shaping the company s product strategy and vision. She has over 20 years experience helping hundreds of companies establish resiliency and recoverability strategies from any disruptive event. Her knowledge and depth of experience is offered as a speaker at SHARE, IBM Systems Magazine, ACP, and DRJ. She is a published author in trade journals, white papers, and industry forums. Additionally, Rebecca has helped executives, storage managers, and technical architecture professionals leverage their existing storage technology investments while ensuring efficiencies across the enterprise. About this Session Whatever type of Data Protection you have in place, this session demystifies long-standing (and often misguided) assumptions made around recoverability, availability and, more importantly, time to data. With an average of 12 to 15 copies of data out there, does your crash consistent point meet your SLA and have you considered all the variables, including batch. Learn how one company has faced these challenges, and combined storage-based replication and host-based software to improve their SLAs, to deliver a resilient and effective recovery solution. 2

Mainframe Missed Understandings In this 24x7 world, in which customers want to make online transactions at any time they choose, the window for batch processing has closed significantly. Yet the amount of batch work hasn t shrunk? (BMC White Paper) Most data center outages are caused by a systems or applications programmer, or another staff member creating a localized problem. If not caught promptly, the localized problem may spread, creating an entire systems outage? (Enterprise Systems) It s common to have an isolated product showing 100% availability when the user is unhappy. (Enterprise Systems) Our DR exercises have been 100% successful how is success measured? Mainframers are open to new processes are you? RS/RML 3

Gartner Key Challenges of Backup Solutions (Dave Russell) The convergence of backup and disaster recovery technologies has occurred largely because snapshot, replication and virtualization have made it possible to recover from a disaster without the need for a traditional data restoration. The methods involved in this convergence seek to minimize storage cost, while also allowing for instant recovery. (Posey, 2014) RML 4

Issues with Backup - Time - Recovery RS

What is your Recovery Point? DATA GROWTH 4 5 6 7 8 Successful backup 9 10 1 11 2 12 Backup Window 1 Needed 2 3 4 Data Recovery Objective 5 6 7 8 9 10 1 12 1 Available 2 Disaster Strikes 3 1 2 3 4 5 6 7 8 9 Data Loss 10 11 12 1 2 3 4 5 6 7 8 9 10 1 12 Disaster Strikes 1 2 3 4 5 6 7 8 3 9 10 RTO 11 12 1 2 3 4 5 6 Full backup Incr 1 Incr 2 Incr 3 Incr 4 7 Data Restored 8 9 Incr5 10 1 12 And, interdependencies between apps? RML 6

Your Business SLA The pressure is on for business and IT services to offer 100% availability and not some number of nines, even during site disasters. Despite businesses' desire for 100% availability and "availability as a utility," IT is not yet managing service availability like a utility. IT often lacks the necessary standards, the architectures are complex and dependent on many components, and. the people and processes involved in IT service delivery now often increase the risk of downtime more than technology risks do. September, 2014 Donna Scott - Gartner RML 7

RPO Affects RTO = SLA Rapid restore and recovery from any failure is essential Fault-tolerant rapid recovery application environments are needed With mainframes this includes both transactional and batch processing lications and databases have advanced transaction recovery features Batch jobs present a different problem What is continuous data protection (CDP) and near-cdp? RS/RML 8/17/2015 8

CDP Definition Continuous data protection (CDP) - recovery approach that continuously, or nearly continuously, captures and transmits changes to files or blocks of data while journaling these changes Some CDP solutions can be configured to capture data either continuously (true CDP) or at scheduled times (near-cdp/snapshots) Near-CDP is often done every few minutes or hours, minimizing potential data loss RS/RML 8/17/2015 9

Percentage Uptime Achieved Per Year Uptime Maximum Downtime/Year % Missed* Six nines 99.9999% 31.5 seconds Five nines 99.999% 5 minutes 35 sec Four nines 99.99% 52 minutes 33 sec Three nines 99.9% 8 hours 46 min Two nines 99.0% 87 hours 36 min One nine 90.0% 36 days 12 hours From http://www.continuitycentral.com/feature0267.htm 63% 34% 25% *Continuity Software Survey (10/2014) RS 10

Operational or DR or Both? Continuous availability enables 24/7 access to ITenabled business functions, processes and applications Involves two strategies: high availability (minimizing unplanned downtime) and continuous operations (minimizing planned downtime) Sometimes, continuous availability architectures embody disaster recovery strategies IT service processing continues despite disaster events Gartner refers to this as "multisite continuous availability (Gartner, Donna Scott - 2014) RML 11

Recovery Classes and Performance Operational Recovery (restore a file, folder, volume, system) RPO CDP Snapshots Backup 0 Secs Mins Hours Days RTO Disk Virtual Tape Tape Disaster Recovery (restore operations at/from another location) RPO Sync & Async Mirror Backup Replication Off-Site Tape 0 Secs Mins Hours Days RTO Disk Virtual Tape Tape RS 12

Filter. : ATM Backup S Dsname lid Method Rec Date/Time - ------------------------------------------- -------- -------- ---- ---------- R SUPT.DRVFI.ATM.FILEIN.PS ATM DSS 2010.041 18:12 _ SUPT.DRVFI.ATM.GDG001.PS.G0004V00 ATM DSS 2010.041 18:12 _ SUPT.DRVFI.ATM.GDG001.PS.G0006V00 ATM DSS 2010.042 13:55 _ SUPT.DRVFI.ATM.GDG001.PS.G0007V00 ATM DSS 2010.042 13:43 IT Resiliency Where, Why, How Long Operational Restore DR Production Restore Process Operational Resilience Disruptive Event Recovery/ Secondary Site Restore Process Reports SLAs do not care whether primary or secondary site is affected - any failures due to corruption, deletion, and other possible reasons requires restoration/recovery and affects Time to Data RML/RS

SLA Tiers Tie Solutions to SLA Determine SLA per application Tier 1 - continuous availability requirement, maximum four minutes downtime a year Tier 2 - high availability, maximum of one outage per year, maximum four hours outage per year Tier 3 - recovery essential within 24 hours, maximum three outages per year Tier 4 - recovery required within three days, maximum four outages per year Tier 5 - delayed recovery, all other services (Hiles, Continuity Central, 2015) RS/RML 14 14

Data Protection Technologies and Recovery Time Technology RPO Range RTO Range MAX Downtime per year Archive Static Depends Hours Traditional Backup to Tape/Disk including VTL or PBBA 24-168 hours 1-24 hours 1d 19h 49m 44.8s SLA Hours 99.5% Snapshot, near-cdp With or without lication Awareness Minutes 36 hours 15 mins 12 hours Synchronous Replication 0-2 mins 1-4 hours Synchronous Replication with Failover 0-2 mins 5-60 mins Asynchronous Replication 0-60 mins 1 8 hours Asynchronous Replication with Failover 0-60 mins 30 mins 4 hours 3 DC (Sync/HA and Async) 0-60 mins 5 mins 4 hours 4 Hours 22 minutes 4 Hours 22 minutes 52 mins 5 mins 4 hours 22 mins Minutes 99.95% 99.95% 99.99%- 99.999% 99.95% 52 mins 99.99% 4 hours 22 mins -5 mins 99.95%- 99.999% RS 15

Batch lication Crash Consistency Problem From http://www.smartercomputingblog.com/smarter-computing/business-resiliency-future/ RS/RML 16

Establish Problem and ropriate Solution Square expenditure and focus with expectation Too much focus on DR and not enough on resiliency Does not solve operational recovery issues Suitable solutions for smaller, local failures High availability does not solve everything Establish SLA s to meet business, not rely on RTO/RPOs Use the right technology lication uptime is the name of the game Most replication technologies cannot create crash consistent data May be ok for databases What about open or stranded datasets Non-database application recoverability requires different approach for both operational recovery and DR RS 17

Other Considerations Storage replication, software replication alone doesn t automate and guarantee failover Failover waits for go ahead or automate and roll the dice Corruption Tape/DASD speed differential SLA achieved through approach What about GDPS? Batch has to be restarted even with GDPS Potential for transactions to have been stranded in the failed site Attempts to apply the stranded changes to the data in the active site may result in an exception or conflict, as the before image of the update that is stranded will no longer match the updated value in the active site. (IBM Share - August, 2012) RML 18

One Customer s SLA Challenge Credit card processing Interlocking dependencies 24/7 operation Payment Card Industry (PCI) standards RTO = 15 minutes Current manual processes unable to meet RTO Production control system programmer to handle issues RS 8/17/2015 19

One Customer s roach Individual failure inevitable Fault tolerant storage Remote replication with multi-site DC operations Snapshot and recovery system for application state, data checkpoints, and recovery (pointin-time or PIT) Batch journal-like database Open datasets identified Restore via panel RML 8/17/2015 20

Mainframe Environment Example HUR A - B TimeLiner FC HUR separate consistency Opt. SI SI B- C TimeLiner TimeLiner, InstaRestore & InstaSnap are patent pending technologies RS 21

Mainframe Environment Example HUR A - B TimeLiner FC HUR separate consistency Opt. SI SI B - C TimeLiner TimeLiner, InstaRestore & InstaSnap are patent pending technologies RS 22

Mainframe Environment Example SI B- C On restart, TimeLiner and InstaSnap recognize that all of the FC data is not available for the third PiT. Opt. SI It will then roll back to the second PiT for job recovery. TimeLiner TimeLiner, InstaRestore & InstaSnap are patent pending technologies RS 23

Time to Data is the ultimate determinant of SLA effectiveness The amount of TIMErequired to restore application access to a business is what separates a temporary Operational Restore from a full-blown DISASTER RML 24

Questions to Ponder SLA s Are business SLAs achievable currently and do they match RTO/RPO s? Do you have the right technology to map to you SLA s What do you test from a recovery standpoint DR unplanned failover Data corruption Mid-batch cycle Deletions RML/RS 25

Hitachi Mainframe Business Continuity Portfolio In-System Replication Solutions IN-SYSTEM REPLICATION SOLUTIONS ShadowImage In-System Replication For full volume clones of business data with consistency (ATTIME split with HUR) IBM-Compatible FlashCopy Point-in-time volumes and datasets via IBM command set in z/os environment Remote Replication Solutions REMOTE REPLICATION SOLUTIONS Hitachi TrueCopy Synchronous, consistent clones at remote location up to 300km (~180 miles) Hitachi Universal Replicator (HUR) Any distance, unique use of Pull technology and Journal Volumes to accommodate link outages or interruptions Replication Management Solutions REPLICATION MANAGEMENT SOLUTIONS Hitachi Business Continuity Manager (BCM) Replication management in z/os environments Hitachi Replication Manager with mainframe BCM Enterprise-wide GUI replication management and monitoring for mainframe and open systems IBM-Compatible FlashCopy/SE Point In Time volumes with virtual volume pool as target vs. whole volumes IBM-Compatible XRC* Asynchronous remote replication via compatible with IBM XRC in z/os environment * Note: Only available as a migration tool to HUR IBM Basic HyperSwap, GDPS HyperSwap certification and/or integration High-availability storage for z/os environments Replication Assessment, Migration and Implementation Services Best practice designs that ensure results and lower risks in implementation Replication Score Card (Health Check) Services Periodic checkups that ensure continued solution value RS

21 st Century Software z/os Portfolio VFI Data Resiliency Storage Management & Usage Tape Migration In-System Replication Solutions Slashes recovery time objectives and assures SLAs Erases recoverability gaps, ensures resiliency, and reduces storage Enables recovery to job/dataset level through real-time architecture and patent pending software code (InstaSnap, InstaRestore, and Timeliner) Focuses on intricate, applicationlevel dependencies Protects InFlight data and maximizes snap copies sync ed to application backups Cross-application dependencies Total Storage Remote Replication Solutions True knowledge for storage and management class Insightful reporting for JCL coding to avoid policies Tier data based upon usage, reference, and criticality (from VFI) Data modeling for new purchase or improving allocation and utilization SpaceFinder Intelligent resource management tools Tape/Assist Migration in Place Replication Management Solutions Reduces cost of managing tape resources while providing continuous availability Director and high speed copy utilities Backend utility processes assure referential data Portability, disaster recovery, and backup copies for data resiliency Consolidation of data Expertise to assist in migration management Recovery Resiliency, Storage Management and Modeling, Tape Migration and Implementation Services RML

Questions? Thank you! 28

Thank You! For additional Hitachi information, please contact: Ros Schulman Data Protection Technologies Global Solutions Strategy and Development Hitachi Data Systems 973 207 4138 (cell) Ros.Schulman@hds.com For additional 21 st Century Software information, please contact: Rebecca Levesque 21 st Century Software, Inc. 940 West Valley Road Suite 1604 Wayne PA 19087 U.S.A. 610 971 9946 x 200 610 659 6521 (cell) RebeccaL@21csw.com RS 29