User Experience: BCPii, FlashCopy and Business Continuity



Similar documents
IBM Replication Solutions for Business Continuity Part 1 of 2 TotalStorage Productivity Center for Replication (TPC-R) FlashCopy Manager/PPRC Manager

IBM Systems and Technology Group Technical Conference

Lisa Gundy IBM Corporation. Wednesday, March 12, 2014: 11:00 AM 12:00 PM Session 15077

Backup Solutions with Linux on System z and z/vm. Wilhelm Mild IT Architect IBM Germany mildw@de.ibm.com 2011 IBM Corporation

The Consolidation Process

IBM Tivoli Storage FlashCopy Manager Overview Wolfgang Hitzler Technical Sales IBM Tivoli Storage Management

SHARE in Pittsburgh Session 15591

Storage System High-Availability & Disaster Recovery Overview [638]

Forecasting Performance Metrics using the IBM Tivoli Performance Analyzer

IMS Disaster Recovery Overview

Oracle on System z Linux- High Availability Options Session ID 252

IMS Disaster Recovery

FDR/UPSTREAM INNOVATION Data Processing Providing a Long Line of Solutions

Tivoli Flashcopy Manager Update and Demonstration

FICON Extended Distance Solution (FEDS)

How To Use Anibom Smart Cloud For Business

Using Virtualization to Help Move a Data Center

Management with IBM Director

HITACHI DATA SYSTEMS USER GORUP CONFERENCE 2013 MAINFRAME / ZOS WALTER AMSLER, SENIOR DIRECTOR JANUARY 23, 2013

IBM TSM DISASTER RECOVERY BEST PRACTICES WITH EMC DATA DOMAIN DEDUPLICATION STORAGE

Improving disaster recovery with Virtual Tape Libraries in Mainframe Environments By Deni Connor Principal Analyst, Storage Strategies NOW

IBM SAP International Competence Center. The Home Depot moves towards continuous availability with IBM System z

How To Use Ibm Tivoli Composite Application Manager For Response Time Tracking

Aktuelles aus z/vm, z/vse, Linux on System z

Deploying a private database cloud on z Systems

Data Protection with IBM TotalStorage NAS and NSI Double- Take Data Replication Software

Backing Up and Restoring a z/vm Cluster and Linux on System z Guests

Simplify and Improve Database Administration by Leveraging Your Storage System. Ron Haupert Rocket Software

Non-disruptively Migrating z/vm and Linux Guests in Their Entirety

IBM Tivoli Storage FlashCopy Manager

Volvo IT s Mainframe journey

IBM DB2 Recovery Expert June 11, 2015

Experiences with Using IBM zec12 Flash Memory

IBM Infrastructure Suite for z/vm and Linux

How To Backup At Qmul

DASD Backup Automation

Veritas Cluster Server from Symantec

High Availability for Linux on IBM System z Servers

Data Protection for Open Systems with the FDR Products on your z/os Mainframe

Session 1494: IBM Tivoli Storage FlashCopy Manager

This presentation provides an overview of the architecture of the IBM Workload Deployer product.

HA / DR Jargon Buster High Availability / Disaster Recovery

Top 10 Tips for z/os Network Performance Monitoring with OMEGAMON Session 11899

BUSINESS PROCESS MANAGEMENT

Effective Storage Management for Cloud Computing

SHARE Lunch & Learn #15372

Business Resilience for the On Demand World Yvette Ray Practice Executive Business Continuity and Resiliency Services

IBM Tivoli Storage Productivity Center (TPC)

IBM Tivoli Monitoring for Databases

Disk Library for mainframe - DLm6000 Product Overview

Alternate Methods of TSM Disaster Recovery: Exploiting Export/Import Functionality

Improve your IT Analytics Capabilities through Mainframe Consolidation and Simplification

TSM for Advanced Copy Services: Today and Tomorrow

Backup and Restore Manager for z/vm

Continuous Data Protection for DB2

z/os Curriculum Job Control Language (JCL) Curriculum JES Curriculum WebSphere Curriculum TSO/ISPF for z/os Curriculum

DS8000 Data Migration using Copy Services in Microsoft Windows Cluster Environment

FDR/UPSTREAM and INNOVATION s other z/os Solutions for Managing BIG DATA. Protection for Linux on System z.

BMC Mainframe Solutions. Optimize the performance, availability and cost of complex z/os environments

Customer Experiences:

Managed Services - A Paradigm for Cloud- Based Business Continuity

DB2 for z/os System Level Backup Update

Disk-to-Disk-to-Offsite Backups for SMBs with Retrospect

IBM Storwize Rapid Application Storage

SuSE Linux High Availability Extensions Hands-on Workshop

Exploiting the Virtual Tape System for Enhanced Vaulting & Disaster Recovery A White Paper by Rocket Software. Version 2.

Symantec Cluster Server powered by Veritas

IP Storage On-The-Road Seminar Series

Veritas InfoScale Availability

TSM (Tivoli Storage Manager) Backup and Recovery. Richard Whybrow Hertz Australia System Network Administrator

BACKUP SOLUTIONS FOR SCHOOLS. Advice and Guidance. ICT Services 42 New Union Street Coventry CV1 2HN

Implementing Tivoli Storage Manager on Linux on System z

What s the best disk storage for my i5/os workload?

Storage Track - IBM Server Systems Technical Conference 2009

2. Highlights and Updates: ITSM for Databases

Universal Backup Device with

Exam : IBM : IBM NEDC Technical Leader. Version : R6.1

Replicating Mainframe Tape Data for DR Best Practices Session #10929

High Availability Architectures for Linux in a Virtual Environment

Cloud Infrastructure Management - IBM VMControl

UPSTREAM for Linux on System z

Building Continuous Cloud Infrastructures

Challenges of Capacity Management in Large Mixed Organizations

A Brief Introduction to IBM Tivoli Storage Manager Disaster Recovery Manager A Plain Language Guide to What You Need To Know To Get Started

Protecting Microsoft Hyper-V 3.0 Environments with CA ARCserve

BACKUP AND RECOVERY PLAN MS SQL SERVER

How To Understand And Understand The Basic Principles Of An Ansper System

Backup and Recovery Solutions for Exadata. Ľubomír Vaňo Principal Sales Consultant

CA ARCserve Replication and High Availability Deployment Options for Hyper-V

VERITAS NetBackup 6.0 Enterprise Server INNOVATIVE DATA PROTECTION DATASHEET. Product Highlights

z/vm and Linux Disaster Recovery A Customer Experience Lee Stewart Sirius Computer Solutions (DSP)

Backups in the Cloud Ron McCracken IBM Business Environment

Accelerate with ATS DS8000 Hardware Management Console (HMC) Best practices and Remote Support Configuration September 23rd, 2014.

Transcription:

User Experience: BCPii, FlashCopy and Business Continuity Mike Shorkend Isracard Group 4:30 PM on Monday, March 10, 2014 Session Number 14953 http://www.linkedin.com/pub/mike-shorkend/0/660/3a7 mshorkend@isracard.co.il mike@shorkend.com

Trademarks The following are trademarks of the International Business Machines Corporation in the United States and/or other countries. AIX* DB2* HiperSockets IBM* IBM logo* IMS CICS System z System z9 System z10 System z114 Tivoli WebSphere* z/os* z/vm* zseries* * Registered trademarks of IBM Corporation The following are trademarks or registered trademarks of other companies. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Microsoft, Windows, Windows NT and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. Red Hat, the Red Hat "Shadow Man" logo, and all Red Hat-based trademarks and logos are trademarks or registered trademarks of Red Hat, Inc., in the United States and other countries. Control-M and Control-O are trademark of BMC * All other products may be trademarks or registered trademarks of their respective companies. 2

Agenda Introduction Level 1: Synchronous Replication Level 2: Logical Copies Level 3: DRP testing Level 4: Offsite Backup Copy Questions 3

Monthly turnover of 9 billion NIS

About me Manager, Central Infrastructures at Isracard Responsible for z/os, z/vm, Linux(z and x), enterprise storage 2 teams Mainframe OS, Linux and Storage My background is z/os system programming, tuning and capacity planning 7 years at Isracard 5

The Challenges and Triggers Normal threats like floods, earthquake, fire Geo-political specific threats like terror and cyber attacks In November 2008 a large Israeli financial institute had a 60 hour outage due to a logical error that was replicated to the DR site. Compliance Financial Constraints 6

Isracard Infrastructure Primary Site 34/54 km Backup Site z114 z114 z196 Z10 BC + CBUs Z10 BC - ELS z/os z/vm+loz BC IBM DS8700 MM IBM DS8870 BC VMware, Windows, Linux IBM XIV Synchronous Replication IBM XIV 7

Isracard Mainframe Infrastructure Primary Site z/os 1.13 DB2 10 CICS/TS 4.1 IMS(DBCTL) 11.1 z114 z114 z196- ELS z/vm 6.3 RHEL 5.6 WAS WMB WODM IBM DS8700 IBM XIV IBM VT7720 8

Agenda Introduction Level 1: Synchronous Replication Level 2: Logical Copies Level 3: DRP testing Level 4: Offsite Backup Copy Questions 9

Synchronous replication All production DASD are replicated to the DR site using Metro Mirroring(aka PPRC). Managed by Tivoli Productivity Center for Replication Approximately 16TB (9TB allocated) on 1800 volumes If one pair fails, I/O is frozen and all pairs are suspended creating a write dependent consistent mirror at the DR site(deals with the rolling disaster scenario) I/O is released after a suspend(the other option is a sysplex wide outage). Availability preferred over mirror update. Monitored by hourly jobs 10

11

Agenda Introduction Level 1: Synchronous Replication Level 2: Logical Copies Level 3: DRP testing Level 4: Offsite Backup Copy Questions 12

Logical Error Challenges If you have a software, hardware or application error that corrupts your data it gets replicated synchronously to your mirror Backups can help, but how do you get a consistent production copy? FLASH COPY is good but costly How do you check that your copy images are valid? 13

Our solution We take a space efficient flash copy of our production data every business day Two copies are kept: todays and yesterdays A third copy can be taken at any time(more on that later) After the copy is created, it is IPLed and data integrity is verified Only after it is verified, the previous days copy can be removed All automatic, using BMC/Control-M and Control-O,DSCLI and BCPii 14

Building Blocks(1/3):Space Efficient Flash Copy Primary Site Backup Site 7.4TB 16TB 16TB z/os Production Bxxx Primary z/vm Production Primary MM MM z/os Production 9xxx Secondary z/vm Production Secondary z/os Logical Copy 1 7xxx SEFC target z/os Logical Copy 2 8xxx SEFC target z/os DR test 6xxx SEFC target z/vm DR test FC target TEST Share in Atlanta 2012. Session 2402 Jeff Suarez Share in Austin 2009. Session 3080. Linda Gundy 15

Building Blocks(2/3):BCPii 15048: What's New in BCPii in z/os 2.1? Full REXX Support and Faster Data Retrieval, Wednesday, March 12, 2014: 8:00 AM-9:00 AM Grand Ballroom Salon C, Steve Warren 16

Building Blocks(3/3) Control-M z/os and distributed scheduling Control-O z/os Automation DSCLI command line interface to DS8870 17

The Big Picture End of day processing CTM Create new Flash CTM IPL and Verify z/os CSMCLI BCPii+CTO When the job(on z/os) indicating end of day processing has finished, a condition is raised by Control-M. This condition causes a DSCLI script to be run that creates the flash copy. When the script ends, Control-M raises a condition that causes a job on z/os to run that activates the coupling facility and the z/os image at the DR site. Another job monitors the IPL message log 18

Creating the New Flash flashmankal1.script Copy on Write For Consistency Target is Space efficient 19

Automated IPL using BCPii Submit job that activates the coupling Submit job that listens on console traffic (of the IPLing image) Submit job that activates the z/os image(load on activation set) Respond to WTORs using the listener job using CONTROL/O ControlO/Cosmos takes over the IPL process when it can When the system is up, run a CICS transaction (using the MODIFY command) to verify data integrity 20

Automated IPL Job... 21

22 Listener Job interact with console

23 Automated IPL

24 Automated IPL

Logical Copy - Recap Primary Site Secondary Site CEC Coupling Facility Production LPAR Support Element Backup LPAR CTM BCPii CTO SSPC DS /HMC IBM DS8870 Logical SEFC 25

Logical Copy side benefits A DR test every day! A true production environment which can be used to test new versions of software Improves MTTR picks up errors at IPL time 26

SEFC the downside SEFC impacts PPRC latency SEFC performance is impacted (affects DR tests) If we ever need to use it(which is highly unlikely), we will not IPL directly from the copy. We will have to restore some or all of our data to the primary volumes 27

BCPii gotcha We had a problem responding to WTORs early in an IPL You need to set the HWI_CMD_OSCMD_PRIORITYTYPE field to HWI_CMD_PRIORITY 28

Agenda Introduction Level 1: Synchronous Replication Level 2: Logical Copies Level 3: DRP Testing Level 4: Offsite Backup Copy Questions 29

DRP testing the limitations We do not use the secondary PPRC volumes for DR testing We never stop the mirroring The User DR site and the IT DR center are 30km apart 30

DRP testing - How do we do it? We take snapshots of our production secondary copies and use them For z/os it is another SEFC set For zvm it is a FC set For the distributed environment we use XIV snapshots The VTL does not support snapshots(yet coming soon), but we can read the production tapes. Scratches are taken from a special pool. All communication between the primary site and the DR site is disconnected Synchronous replication for the DS8K and XIV continues A test runs for about 36 hours 31

Agenda Introduction Level 1: Synchronous Replication Level 2: Logical Copies Level 3: DRP Testing Level 4: Offsite Backup Copy Questions 32

Offsite backup copy The financial compliance regulation laws require that we have a third copy(that is, at neither of our sites) of our data at a secured location The assumption is that this copy will be used if both sites are permanently unavailable. Every Friday morning we bring up an LPAR at our primary site that reads that mornings logical copy and dumps it to a TS3500. Cartridges and reports are exported and sent off site. Another LPAR is needed because you can t bring the logical copies online (same VOLSERS as the production). 33

Offsite Backup Copy Primary Site Secondary Site Production LPAR The backup takes approx. 24 hours NJE CTM BCPii Backup LPAR IBM DS8870 Logical SEFC TS3350 34

Offsite backup copy - Output Cartridges that contain: Our production data Rexx and edit macros to customize the restore jobs at the new (unknown) site Hardcopy documentation Requirements Hardware, software Inventory reports(created dynamically for each copy) VOLSER to dataset mapping Catalog structure 35

Next Steps Main site transfer and implement a three site solution Change DR drill methodology Implement Hyperswap with TPC on z/os Replace DS8700 (at main site) with DS8870(3Q14) Re-evaluate SEFC due to limitations Re-evaluate offsite backup copy on cartridges and move to third copy on DASD Distributed environment implement logical copy 36

Summary Scenario Primary site DS8xxx failure Primary site complete failure Logical error that gets mirrored Both sites fail Protection Metro Mirror Copy MM copy + Backup CEC Logical Copy Offsite backup copy 37

Questions? 38