1 Business Continuity Solutions for Information Technology
The New Enterprise Data Center - IBM Information Infrastructure
Stefano Ricci, IT Architect, IBM Italia Systems & Technology Group
21 November 2008
2008 IBM Corporation
2 Agenda
- Principles and functional characteristics of a Business Continuity (BC) solution
- Architecture of a BC solution:
  - Business Continuity tiers
  - Components and multi-site considerations
- Technology infrastructure supporting BC solutions:
  - Storage-based replication solutions
  - Clustering
  - Automation and recovery scenarios
3 IT Business Continuity is more than Recovery: Business Continuity Disciplines
IT Business Continuity combines three disciplines:
- High Availability: fault-tolerant, failure-resistant infrastructure supporting continuous application processing
- Continuous Operations: non-disruptive backups and system maintenance coupled with continuous availability of applications
- Disaster Recovery: protection against unplanned outages such as disasters through reliable, predictable recovery
Together they provide:
- Protection of critical business data
- Recovery that is predictable and reliable
- Operations that continue after a disaster
- Costs that are predictable and manageable
4 IT Business Continuity is more than Recovery: Business Continuity Disciplines
- Contrary to popular belief, hardware failures account for a minority of system outages; several studies place the proportion between 18% and 45%.
- Human error, software error and planned maintenance cause the majority of service outages.
[Chart labels: 40%, 60%]
Business continuity rests on three elements:
- Technology: hardware and software capabilities
- Process: definition/design, compliance and continuous improvement
- People: roles and responsibilities, management, skills development and discipline
5 Key Elements of a Business Continuance Plan: Strategic Planning
Reference: guidelines for operational continuity in the Public Administration (Linee guida alla continuità operativa nella Pubblica Amministrazione).
Criteria for evaluating a BC solution:
- Data protection
- Data availability
- Data recoverability
- Manageability and operability
- Total Cost of Ownership (TCO)
6 Key Elements of a Business Continuance Plan: Metrics
Principal BC/DR metrics:
- Recovery Time Objective (RTO), measured in hours (or minutes): how long can the business tolerate the IT systems being unavailable?
- Recovery Point Objective (RPO), measured in seconds (or minutes): how much data can the business afford to recreate (or lose)?
- Network Recovery Objective (NRO), measured in minutes: how long will it take to re-establish and/or restart the network?
RTO and RPO considerations:
- Determine RTO and RPO as a trade-off between cost and line-of-business (LoB) requirements
- Business Impact Analysis (BIA)
- Which host platforms are in use: zSeries, Open Systems, mixed data?
- How many data centers: is a three-site topology required?
- Evaluate distance and bandwidth: telecom bandwidth is often the largest TCO component (often 50% - 70%) on a three-year basis
7 Timeline of an IT Recovery
[Figure: recovery timeline from the production site to the recovery site, across the layers Management Control, Telecom Network, Physical Facilities, Operating System, Data and Applications.]
- Production site: mirroring provides a point-in-time, data-consistent copy of the data at the remote site.
- Recovery Point Objective (RPO): how much data must be recreated?
- Production outage!
- Recovery site, first phase: assess, then execute hardware, operating system and data integrity recovery (operations staff, network staff). This phase ends at the Recovery Time Objective (RTO) of hardware data integrity.
- Recovery site, second phase: software transaction integrity recovery of the data delta back to the RPO (applications staff). This phase ends at the RTO of transaction integrity. Now we're done!
8 Categories of IT Components that must be recovered
A comprehensive architecture blends five IT components into the best solution:
- Servers
- Storage
- Software and Automation
- Infrastructure and Networking
- Skills and Services
[Figure: recovery layers (Management Control, Telecom Network, Physical Facilities, Operating System, Data, Applications) across the platforms zSeries, xSeries, pSeries, iSeries, Solaris, HP-UX and WinNT/2000, supported by operations, network and applications staff.]
9 Business Continuity Tiers: balancing Recovery Time Objective with cost / value
Tiers from highest to lowest cost / value (the chart's RTO axis runs from 15 min. through 1-4 hr. and 24 hr. to days; RTO figures are guidelines only):
- BC Tier 7: site mirroring with automated recovery
- BC Tier 6: storage mirroring (with or without automation)
- BC Tier 5: software mirroring
- BC Tier 4: point-in-time copy for backup/restore
- BC Tier 3: tape libraries, electronic vaulting
- BC Tier 2: hot site, restore from tape
- BC Tier 1: restore from tape
The upper tiers recover from a disk image, the lower tiers from a tape copy. The top tiers address application requirements of little or no tolerance for data loss and a need to restore data to applications rapidly.
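The tier list can be captured as a small lookup table. This is a sketch for illustration: the `Tier` type and `tiers_recovering_from` are hypothetical names, and the disk/tape split (tiers 4 to 7 from disk, tiers 1 to 3 from tape) is an assumption inferred from the tier descriptions, since the slide text does not state the boundary.

```python
from typing import List, NamedTuple


class Tier(NamedTuple):
    number: int
    technique: str
    recovery_source: str  # "disk" or "tape"; the split is an assumption


# Tier descriptions as listed on the slide, highest cost / value first.
BC_TIERS = [
    Tier(7, "Site mirroring with automated recovery", "disk"),
    Tier(6, "Storage mirroring (with or without automation)", "disk"),
    Tier(5, "Software mirroring", "disk"),
    Tier(4, "Point-in-time copy for backup/restore", "disk"),
    Tier(3, "Tape libraries, electronic vaulting", "tape"),
    Tier(2, "Hot site, restore from tape", "tape"),
    Tier(1, "Restore from tape", "tape"),
]


def tiers_recovering_from(source: str) -> List[int]:
    """Return the tier numbers whose recovery starts from the given medium."""
    return [t.number for t in BC_TIERS if t.recovery_source == source]
```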
10 Business Continuity Tiers - business process segments
[Figure: the same cost versus Recovery Time Objective chart (15 min. to days), divided from top to bottom into three segments: Continuous Availability (CA) at Tier 7, Rapid Data Recovery (RDP) from Tier 6 downward, and Backup/Restore (BCK/R) in the lower tiers. Three applications (Applic 1, Applic 2, Applic 3) each span the CA, RDP and BCK/R segments.]
Some examples:
- Data types: transactional data, design or project data
- Within each application, data is classed as mission-critical, historical or archival
11 Server - High Availability / Disaster Recovery: Clustering Solution
12 Server : zserver Continuous Availability GEO Plex
13 Storage - Where to place data replication?
- Host-based (server) replication: AIX Logical Volume Manager (LVM), AIX Geographic Logical Volume Manager (GLVM), General Parallel File System (GPFS), or application replication
- Hardware (storage array) based replication: IBM Metro Mirror (synchronous disk storage mirroring), IBM Global Mirror (asynchronous disk storage mirroring), IBM Virtual Tape remote replication
- Network (SAN) based replication: Brocade and Cisco switch routing appliances
14 Storage - Where to place data replication?
[Figure: a production site and one or more secondary sites, each with a geographic load balancer, site load balancer, web server clusters, application / DB server clusters and storage, plus LAN and PC tape backup. Replication is shown at three levels: application / DB replication, server replication and storage replication (replication, PiT image, tape), with an arrow labelled "More complexity".]
Host replication, application / database based:
- Requires less bandwidth; usually asynchronous mirroring only
- Span of consistency is the application or database only
- More complex implementation; must be implemented for each application
Host replication, server based:
- Relatively simple implementation, application independent
- Uses server cycles; span limited to that OS
Hardware replication, storage array based:
- Simpler implementation: mirrors logical disks, covers multiple heterogeneous systems, application independent
- Requires more bandwidth
- Consistent, repeatable RPO (RTO depends on server, data, workload and network)
- More expensive
15 Application Based Replication: DB2 HADR
- DB2 for z/OS implements DB2 data sharing on a Parallel Sysplex.
- DB2 for LUW (DB2 for Linux, UNIX and Windows) employs a "shared nothing" architecture.
- DB2 for LUW can restart quickly by using High Availability Disaster Recovery (HADR): the primary DB2 instance continuously sends log records to the hot standby DB2 instance, which keeps its copy of the database synchronized.
[Figure labels: DB2 HADR, log shipping, DB2 clustering]
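The log-shipping idea behind HADR can be illustrated with a toy model. This is not DB2's interface: the classes `Database`, `Primary` and `Standby` and their methods are hypothetical, and the sketch only shows a primary forwarding each log record to a standby that replays it in order.

```python
class Database:
    """Toy key-value database with an ordered log of applied records."""

    def __init__(self):
        self.data = {}
        self.log = []  # list of (sequence number, key, value)

    def apply(self, record):
        _, key, value = record
        self.data[key] = value
        self.log.append(record)


class Standby(Database):
    def receive(self, record):
        # Replay the shipped log record so the standby copy stays current.
        self.apply(record)


class Primary(Database):
    def __init__(self, standby):
        super().__init__()
        self.standby = standby
        self.next_seq = 1

    def write(self, key, value):
        record = (self.next_seq, key, value)
        self.next_seq += 1
        self.apply(record)            # apply locally
        self.standby.receive(record)  # ship the log record to the standby
```

After any sequence of `write` calls on the primary, the standby holds the same data and the same log, which is what allows it to take over quickly.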
16 Application Based Replication: Oracle Data Guard
- Oracle Data Guard is a software solution that keeps the primary and standby databases consistent by transmitting transactional redo data, synchronously or asynchronously, from the primary to the standby databases.
- A stretch cluster (also called extended cluster, distance cluster or geo-cluster) allows the cluster nodes to be located at different sites while still maintaining a unified cluster configuration over a single database.
17 Disk System Copy Functions
- Local copy: an instantaneous copy of the data (a copy of the pointers to the data blocks) within the same storage system.
- Remote copy between 2 sites:
  - Synchronous: data copy (without loss of transactions) between storage systems over metropolitan distance
  - Asynchronous: data copy (with possible loss of transactions) between storage systems over geographic distance
- Remote copy between 3 sites: synchronous copy to the nearby site (to guarantee operational continuity) and asynchronous copy to the more distant site (for disaster recovery purposes).
[Figure: storage systems at Site 1, Site 2 and Site 3, each with host adapters, controller, cache, RAID and advanced functions; synchronous links over the MAN, an asynchronous link over the WAN, and a local PTC.]
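The difference between synchronous and asynchronous remote copy can be sketched as a toy model. All names here (`Array`, `SyncMirror`, `AsyncMirror`) are hypothetical and do not correspond to any product interface; the point is only when the write is acknowledged relative to the remote update.

```python
from collections import deque


class Array:
    """Toy storage system: block address -> data."""

    def __init__(self):
        self.blocks = {}


class SyncMirror:
    """Acknowledge a write only after both sites hold it (no data loss)."""

    def __init__(self, primary, secondary):
        self.primary, self.secondary = primary, secondary

    def write(self, lba, data):
        self.primary.blocks[lba] = data
        self.secondary.blocks[lba] = data  # remote write completes first
        return "ack"


class AsyncMirror:
    """Acknowledge immediately; ship updates to the remote site later."""

    def __init__(self, primary, secondary):
        self.primary, self.secondary = primary, secondary
        self.pending = deque()  # updates not yet at the secondary

    def write(self, lba, data):
        self.primary.blocks[lba] = data
        self.pending.append((lba, data))
        return "ack"

    def drain(self, count):
        """Apply up to `count` pending updates at the secondary, in order."""
        for _ in range(min(count, len(self.pending))):
            lba, data = self.pending.popleft()
            self.secondary.blocks[lba] = data

    def exposure(self):
        """Number of acknowledged writes that a primary failure would lose."""
        return len(self.pending)
```

In the asynchronous case, `exposure()` is the toy equivalent of the RPO: writes the application believes are complete but that have not yet reached the remote site.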
18 Disk System Copy Functions
- Synchronous remote copy
- Asynchronous remote copy
- Local copy
- Point-in-time copy
19 General Disk Replication Guidance: What Customers Want to Do
- Two sites, synchronous: Site A (primary) to Site B (secondary) over synchronous paths, up to 300 fibre km
- Two sites, asynchronous: Site A to Site B over an asynchronous path, unlimited distance
- Three sites: Site A to Site B over synchronous paths, plus an asynchronous path to Site C over unlimited distance
20 Remote Data Replication Technologies
Two sites:
- Synchronous (A -> B): IBM Metro Mirror, EMC SRDF/S, HDS TrueCopy
- Asynchronous (A -> B): IBM Global Mirror, EMC SRDF/A, HDS Universal Replicator
- z/OS asynchronous (A -> B): IBM z/OS Global Mirror (XRC)
Three sites:
- A -> B -> C: synchronous from A to B, asynchronous from B to C
- A -> B, A -> C: synchronous from A to B, z/OS Global Mirror (asynchronous) from A to C
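The distance limit on synchronous replication follows from propagation delay: each synchronous write waits for at least one round trip to the remote site. The sketch below estimates that penalty, assuming roughly 5 microseconds per kilometre of one-way delay in optical fibre (an approximation) and a configurable number of round trips per write; the function name and the `round_trips` parameter are hypothetical.

```python
# Approximate one-way propagation delay of light in optical fibre.
FIBRE_DELAY_US_PER_KM = 5.0


def sync_write_penalty_ms(distance_km: float, round_trips: int = 1) -> float:
    """Estimate the latency added to each synchronous write, in milliseconds.

    Only propagation delay is counted; switch, protocol and storage
    service times are ignored. `round_trips` is an assumption about how
    many round trips the replication protocol needs per write.
    """
    one_way_us = distance_km * FIBRE_DELAY_US_PER_KM
    return 2 * one_way_us * round_trips / 1000.0
```

At the 300 fibre km mentioned above, a single round trip adds about 3 ms to every write under these assumptions, which is why longer distances are usually served by asynchronous paths.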
21 IBM z/OS Global Mirror (XRC) Concepts - Journaling
Components: primary systems, primary disk subsystems, System Data Mover (SDM), journal data sets, control data set, secondary disk subsystems.
1. Write data to primary
2. Application gets "write completed"
3. Updates offloaded to SDM
4. Consistency groups form
5. Consistency groups written to journal data sets
6. Write to secondary
7. Control data sets updated
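Steps 3 to 7 can be sketched as one data mover cycle. This is a simplified model, not the product's implementation: `sdm_cycle` and its arguments are hypothetical, updates are `(timestamp, volume, data)` tuples, and the consistency cutoff is assumed to be the smallest of the latest timestamps seen on each primary subsystem, so that no subsystem can still hold an earlier, unread update.

```python
def sdm_cycle(primary_updates, journal, secondary, control):
    """Run one toy data mover cycle.

    primary_updates: dict of subsystem name -> list of (timestamp, volume, data)
                     already written and acknowledged (steps 1 and 2)
    journal:         list receiving each consistency group (step 5)
    secondary:       dict of volume -> data (step 6)
    control:         dict recording the consistent timestamp (step 7)
    """
    # Step 3: offload updates from every primary subsystem, in time order.
    offloaded = sorted(u for updates in primary_updates.values() for u in updates)
    latest = [max(ts for ts, _, _ in updates)
              for updates in primary_updates.values() if updates]
    if not latest:
        return []

    # Step 4: form a consistency group up to the assumed cutoff.
    cutoff = min(latest)
    group = [u for u in offloaded if u[0] <= cutoff]

    # Step 5: harden the group in the journal before touching the secondary.
    journal.append(list(group))

    # Step 6: apply the group to the secondary volumes in timestamp order.
    for _, volume, data in group:
        secondary[volume] = data

    # Step 7: record how far the secondary is consistent.
    control["consistent_to"] = cutoff
    return group
```

Writing the group to the journal before the secondary is what lets a recovery replay or discard a partially applied group and still end with a time-consistent secondary.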
22 3-Site Disk Replication
The three-site replication requirements are:
- Fast failover / failback to any site
- Fast re-establishment of three-site recovery, without production outages
- Quick resynchronization of any site with incremental changes only
(Links and bandwidth are assumed between all sites.)
Cascading vs. star configuration
[Figure labels: sites A, B and C; Metro Mirror; Global Mirror; z/OS Global Mirror; SDM; network]
23 Tape Solutions
- Tape remains the storage device with the best cost/performance ratio.
- Tape operations are automated through automated tape libraries (ATL) equipped with robotics.
- In large-enterprise environments, tape solutions are virtualized with a disk cache.
- Data deduplication allows a drastic reduction in the amount of data stored.
[Figure labels: host, virtual engine, tape library (ATL), library frame, tape drive, cartridge]
24 The Virtual Tape Concept
The main benefits of a tape virtualization appliance:
- It presents a variable number of virtual tape drives to the host.
- It eliminates the delays caused by access to physical tape, because data is written to a cache (memory of the internal servers) and destaged asynchronously from the cache to tape.
- It is designed to make full use of the capacity of the cartridges and tape libraries: it stacks multiple logical volumes on the back-end cartridges, so that fewer real tape drives are needed.
[Figure labels: Virtual Drive 1 to n, Tape Volume Cache, Virtual Volume 1 to n, Logical Volume 1 to n]
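Stacking several logical volumes on one cartridge is essentially a packing problem. The sketch below uses a simple first-fit placement as a stand-in; the slide does not describe the appliance's actual placement policy, and `stack_volumes` with its parameters is a hypothetical name.

```python
def stack_volumes(logical_volumes, cartridge_capacity_gb):
    """Place logical volumes on cartridges using first-fit.

    logical_volumes: list of (name, size_gb) in the order they are destaged
    Returns a list of cartridges, each a list of logical volume names.
    """
    cartridges = []  # volume names per cartridge
    free = []        # remaining capacity per cartridge, in GB
    for name, size in logical_volumes:
        if size > cartridge_capacity_gb:
            raise ValueError(f"{name} does not fit on a single cartridge")
        for i, space in enumerate(free):
            if size <= space:
                cartridges[i].append(name)
                free[i] -= size
                break
        else:
            # No existing cartridge has room: mount a new one.
            cartridges.append([name])
            free.append(cartridge_capacity_gb - size)
    return cartridges
```

For example, four volumes of 300, 400, 200 and 500 GB fit on two 800 GB cartridges instead of occupying four partly filled ones.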
25 Enterprise Tape Solutions: Three-Site Grid Configuration
Virtual tape libraries:
- Utilize tape assets more efficiently
- Improve and speed up the tape backup process
- Improve RTO and RPO for tape-based data
[Figure: three TS7740 systems at local/regional and remote sites, connected through Ethernet switches over LAN/WAN.]
- Copies between local sites are usually immediate
- Copies to the remote site are usually deferred
26 Virtual Tape Libraries and Deduplication
[Figure: at the primary site, backup servers write to a ProtecTIER Gateway, which presents a represented capacity backed by a smaller physical capacity. ProtecTIER server-based replication links it to a ProtecTIER Gateway and backup server at the secondary site.]
- Significant bandwidth reduction
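The gap between represented and physical capacity can be illustrated with a minimal deduplicating store. The slide does not describe how ProtecTIER detects duplicate data, so this sketch swaps in a generic technique, fixed-size chunks identified by SHA-256 digests; the `DedupStore` class is hypothetical.

```python
import hashlib


class DedupStore:
    """Toy store that keeps each distinct chunk only once."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}            # digest -> chunk bytes
        self.represented_bytes = 0  # total size of everything written

    def write(self, data: bytes):
        """Store a backup stream and return its recipe (list of digests)."""
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # keep only new chunks
            recipe.append(digest)
        self.represented_bytes += len(data)
        return recipe

    def read(self, recipe) -> bytes:
        """Rebuild a backup stream from its recipe."""
        return b"".join(self.chunks[d] for d in recipe)

    @property
    def physical_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())
```

Writing the same backup twice doubles `represented_bytes` but leaves `physical_bytes` unchanged. The same effect explains the bandwidth reduction in replication: only chunks the remote site does not already hold need to cross the link.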
27 Fibre Channel Connectivity
- Fibre Channel Architecture: a set of rules (FC-0 to FC-4) that defines the standard for transferring data between computers and devices over optical fibre.
- FICON: the protocol that implements the Fibre Channel Architecture in the zSeries environment. It is an industry standard (FC-SB-2).
- FCP: the Fibre Channel Protocol for SCSI, which maps the commands of the SCSI protocol onto the Fibre Channel Architecture.
28 Tiers of Disaster Recovery: Positioning Automation
Automation benefits:
- Provides a High Availability solution for large multi-platform z/OS, Windows, UNIX and Linux environments
- Supports a wide variety of data replication methods: storage replication, server-based replication, database or software replication
- Reduces or avoids human errors during the recovery process
- Enables regular testing to validate that BC/DR procedures are repeatable, reliable, scalable and auditable
- Helps reduce infrastructure management costs
- Helps maintain recovery readiness
[Figure: value versus time to recover (from 15 min., in hours) for mission-critical, somewhat critical and not-so-critical applications, positioned against the options dedicated remote hot site, active secondary site and point-in-time backup. Tier 7, near-zero or zero data loss: highly automated takeover on a complex-wide or business-wide basis, using remote disk mirroring.]
Automation is a principal enabler for successful BC/DR plans.