Disaster Recovery. Awareness

Similar documents

Best Practices in Disaster Recovery Planning and Testing

BUSINESS CONTINUITY PLAN OVERVIEW

Business Continuity. Client Briefing

Abhi Rathinavelu Foster School of Business

Oracle Maps Cloud Service Enterprise Hosting and Delivery Policies Effective Date: October 1, 2015 Version 1.0

ASX CLEAR (FUTURES) OPERATING RULES Guidance Note 10

Disaster Recovery Plan Checklist

ASX SETTLEMENT OPERATING RULES Guidance Note 10

The 5 Most Commonly Used Disaster Recovery Process

The University of Iowa. Enterprise Information Technology Disaster Plan. Version 3.1

Temple university. Auditing a business continuity management BCM. November, 2015

Continuity of Operations Planning. A step by step guide for business

Best Practices in Developing an IT Disaster Recovery Plan. Vijaykumar Kulkarni AGM Product Management

Disaster Recovery 101. Sudarshan Ranganath & Matthew Phillips Ellucian

Blackboard Managed Hosting SM Disaster Recovery Planning Document

How to Design and Implement a Successful Disaster Recovery Plan

AUSTRACLEAR REGULATIONS Guidance Note 10

IT Disaster Recovery Plan Template

<Client Name> IT Disaster Recovery Plan Template. By Paul Kirvan, CISA, CISSP, FBCI, CBCP

DISASTER RECOVERY WITH AWS

RLI PROFESSIONAL SERVICES GROUP PROFESSIONAL LEARNING EVENT PSGLE 125. When Disaster Strikes Are You Prepared?

New Clerk Academy. August 13, 2015

Course: Information Security Management in e-governance. Day 2. Session 5: Disaster Recovery Planning

IT Disaster Recovery Plan Template ABC PVT LTD

Offsite Disaster Recovery Plan

Business Continuity Planning (BCP) / Disaster Recovery (DR)

Zerto Virtual Manager Administration Guide

THORNBURG INVESTMENT MANAGEMENT THORNBURG INVESTMENT TRUST. Business Continuity Plan

Lunch and Learn: Modernize Your Data Protection Architecture with Multiple Tiers of Storage Session 17174, 12:30pm, Cedar

Business Continuity Planning in IT

DISASTER RECOVERY BUSINESS CONTINUITY DISASTER AVOIDANCE STRATEGIES

The State of Global Disaster Recovery Preparedness

Microsoft Hyper-V Powered by Rackspace & Microsoft Cloud Platform Powered by Rackspace Support Services Terms & Conditions

The Weill Cornell Medical College and Graduate School of Medical Sciences. Responsible Department: Information Technologies and Services (ITS)

Desktop Scenario Self Assessment Exercise Page 1

Disaster Recovery Plan

IT Disaster Recovery and Business Resumption Planning Standards

Why Should Companies Take a Closer Look at Business Continuity Planning?

INSIDE. Preventing Data Loss. > Disaster Recovery Types and Categories. > Disaster Recovery Site Types. > Disaster Recovery Procedure Lists

Disaster Recovery Plan

Consulting Solutions Disaster Recovery. Yucem Cagdar

SCADA Business Continuity and Disaster Recovery. Presented By: William Biehl, P.E (mobile)

MANAGEMENT AUDIT REPORT DISASTER RECOVERY PLAN DEPARTMENT OF FINANCE AND ADMINISTRATIVE SERVICES INFORMATION TECHNOLOGY SERVICES DIVISION

DISASTER RECOVERY Steps You Need to Take (Before It s Too Late)

Business Continuity & Recovery Plan Summary

Template Courtesy of: Cloudnition LLC 55 W. 22 nd St Suite 115 Lombard, IL (630)

Business Continuity & Recovery Plan Summary

Table of Contents... 1

Interactive-Network Disaster Recovery

Leveraging Virtualization for Disaster Recovery in Your Growing Business

Business Continuity and Disaster Survival Strategies for the Small and Mid Size Business.

Designtech Cloud-SaaS Hosting and Delivery Policy, Version 1.0, Designtech Cloud-SaaS Hosting and Delivery Policy

MARQUIS DISASTER RECOVERY PLAN (DRP)

Disaster Recovery Hosting Provider Selection Criteria

MiServer and MiDatabase. Service Level Expectations. Service Definition

Ongoing Help Desk Management Plan

Business Continuity Plan

DISASTER RECOVERY PLANNING GUIDE

CA ARCserve and CA XOsoft r12.5 Best Practices for protecting Microsoft SQL Server

Advent. Disaster Recovery: Options for Investment Managers. A White Paper from Advent Software and CyGem Ltd. Advent Software, Inc.

Business Continuity Planning Toolkit. (For Deployment of BCP to Campus Departments in Phase 2)

2014 NABRICO Conference

Disaster Preparedness & Response

Proactive IT Solutions More Reliable Networks Are Our Business

RL Solutions Hosting Service Level Agreement

Schedule 5: SaaS Premium Service Level Agreement

a Disaster Recovery Plan

Disaster Recovery Planning

Creating a Business Continuity Plan

EXECUTIVE SUMMARY 1.1 PROJECT OBJECTIVES

Speeding Recovery Through the Cloud Presentation to Mid TN ISC2 Matthew Stevens, Senior Solutions Engineer Windstream Hosted Solutions

HA / DR Jargon Buster High Availability / Disaster Recovery

TO AN EFFECTIVE BUSINESS CONTINUITY PLAN

How to write a DISASTER RECOVERY PLAN. To print to A4, print at 75%.

The Promise of Virtualization for Availability, High Availability, and Disaster Recovery - Myth or Reality?

Operational Continuity

Disaster Recovery Checklist Disaster Recovery Plan for <System One>

Clovis Municipal School District Information Technology (IT) Disaster Recovery Plan

MSP Service Matrix. Servers

Voyageur Internet Inc. 323 Edwin Street Winnipeg MB R3B 0Y7 Main: (204) Fax: (204)

Appendix E to DIR Contract Number DIR-TSO-2736 CLOUD SERVICES CONTENT (ENTERPRISE CLOUD & PRIVATE CLOUD)

Network & Information Security Policy

NETWORK SERVICES WITH SOME CREDIT UNIONS PROCESSING 800,000 TRANSACTIONS ANNUALLY AND MOVING OVER 500 MILLION, SYSTEM UPTIME IS CRITICAL.

Business Continuity Planning. Donna Curran, Director Audit and Risk Management February, 2014

Business Continuity Management

The Difference Between Disaster Recovery and Business Continuance

Benefits and Potential Drawbacks to Implementing SAP as a Hosted Solution; Run SAP Like a Factory Factory Controls

Disaster recovery strategic planning: How achievable will it be?

Business Continuity Glossary

Business Continuity Planning (BCP) / Disaster Recovery (DR)

Transcription:

Disaster Recovery Awareness 1

Threats to Business Environmental: Fire, flood, earthquake, etc. Loss of utilities & services: Power interruption, etc. Systems or equipment failure Information security breach Civil protest/unrest Disruptions in third parties and business partners Pandemic outbreaks Acts of terrorism 2

Impacts Due to Disaster Revenue Loss Angry Customers Non-Compliance with Laws and Regulations Embarrassment 3

Definitions, Acronyms and Abbreviations High Availability (failover) is the characteristic of a system to protect against or recover from minor IT outages in a short time frame with largely automated means. Disaster Recovery (DR) is the ability to continue with services in the case of major IT outages, often with reduced capabilities or performance. Disaster Recovery solutions typically involve manual activities. Business Continuity (BC) takes Disaster Recovery one step further, and includes facilities, work force continuity, vital business operations, etc. 4

Definitions, Acronyms and Abbreviations (Continued) Recovery Time Objective (RTO) Recovery Point Objective (RPO) Tier 0 The maximum elapsed time a business function can sustain a service interruption from the time a crisis is identified to the restoration of service. The amount of data loss that is acceptable. For example, if an RPO of 6 hours is acceptable, SPE will be able to restore systems back to the state they were in 6 hours before the disaster or less. Any data created or modified inside a recovery point objective will be either lost or must be recreated during a recovery. Basic Infrastructure (Such as DNS, LDAP, Active Directory, network services, etc.), required for business application to run. Tier 1 Application with Recovery Time Objective of less than 12 hours and Recovery Point Objective of less than 5 minutes. Tier 2 Tier 3 Application with Recovery Time Objective of more than 12 hours but less than 48 hours, and Recovery Point Objective of less than 24 hours. Application with Recovery Time Objective of more than 48 hours and less than 15 days, and Recovery Point Objective of less than 36 hours. Tier 4 Application with Recovery Time Objective of more than 15 days and Recovery Point Objective of less than 36 hours. 5

DR Plan Disaster Recovery Plan: Aims to provide an organized way to respond to a disruptive event. Addresses the IT procedures to be followed before, during and after a loss to identified applications at Primary Datacenter (Chandler, AZ) The primary objective is to provide the capability to bring up critical applications and data at an alternate site within a time frame that minimizes the loss to the organization. Per Business Impact Analysis 2008 Tier s determined & DR Environment implemented for the following SPE Business Critical Applications. As of January 2013 DR Systems located at El Segundo colo. (SAP in process of establishing DR environment) evmi ITSM/C2C PRISM TAAS GPMS 6

DR Site DR Site Name: Digital Realty Trust co-location. (El Segundo Datacenter) Address: 2260 E El Segundo Blvd, City, State ZIP: El Segundo, CA 90245 Directions: Contact Name / Phone: 15 minute drive from Corporate Pointe, Culver City, CA Ron Goede Director Facilities SPE - EIS 600 Corporate Pointe 310.665 6639 Office 310.678.6461 Cell E-mail: Ron_Goede@spe.sony.com Web: http://www.digitalrealtytrust.com/ 7

DR Process Life Cycle Project Planning- Completed in 2008 Completed in 2008 Risk Assessment & Business Impact Analysis Established in 2008 updates in progress DR Environment Implementation Maintenance & Updating evmi, PRISM, ITSM/C2C, TAAS, GPMS DR Procedure Documentation Awareness & Training DR Testing & Exercising DR testing & exercising is performed after implementation phase to ensure DR environment can be brought up within the required time (RTO) using the documented DR procedure. 8

DR Methodology Applications were classified as Tier1/Tier2/Tier3 based on the results of Risk Assessment Determined RTO and RPO to ensure minimum business impact DR environments were set up at the DR site (currently being updated to mirror Prod.) DR Procedure Documents developed (currently being updated) Tests conducted Tier Maintain RTOthe Document RPO 1 Less than 12 hrs Minutes to 6 hrs 2 12-48 hrs 6-24 hrs 3 2 wks Less than 36 hrs 4 4 wks Less than 36 hrs 9

Tiers based on Business Impact Analysis 2008 Application name Division Tier Financial Impact Per Day 5 Time And Attendance System (TAAS) Corporate Tier 1 $6 M (If it occurs on any Monday) 25 evmi HE Tier 1 $4,693,334 38 ITSM, C2C, ITSM-S, SARA TV-I Tier 2 $3.2 Million 30 SPIRIT MP Tier 1 $2.7 Million 8 Timecapture Imageworks Corporate Tier 1 > $2 M (If it occurs on any Monday or Friday) 24 PRISM & Pathfinder HE Tier 2 $1 Million 34 Domestic TV Sales Management TV-D Tier 2 $64,286 23 Sales Estimation Tool (SET) HE Tier 2 (Tier 1?) $50,000 31 SPIRITworld MP Tier 1 $18,600 28 GPMS Suite MP Tier 3 $18,000 27 InterPlan MP Tier 2 (Tier 1?) $16,000 7 Hyperion Enterprise Corporate Tier 1 (Tier 2?) $5,000 14 Sony Pictures Stock Footage DMG Tier 1 $5,000 22 Revenue Sharing System (REVSHARE) HE Tier 3 $5,000 11 cineshare+ / cineview DMG Tier 2 $3,900 4 HR Connection Corporate Tier 3 $3,000 1 IRIMS Incident Management Corporate Tier 3 $800 6 Medical Services System - Medgate Corporate Tier 3 $350 3 Remedy IT Corporate Tier 1 $0 9 E-911 LAN Alert (Siemens Phone System) Corporate Tier 1 (Tier 3?) 17 SPiDR (Stellent) ETS Tier 1 (Tier 0?) $0 12 Secure File Distribution Services DMG Tier 2 $0 20 myspe, myspej, myspti ETS Tier 2 $0 13 Digital Media Repository DMG Tier 3 $0 36 DealMaker TV-D Tier 3 $0 37 SPT B2B Marketing TV-D Tier 3 $0 10 ACORN/DMI DMG Tier 1 * Tier 1 Tier 2 Tier 3 29 Spent and Committed MP Tier 1 * 32 Automated System for Ad/Pub fulfillment (ASAP) MP Tier 2 * $0 39 Harris Landmark Traffic and Ad Sales TV-I Tier 2 * 40 Harris VISION Scheduling TV-I Tier 2 * 2 CorpTax Corporate Tier 3 * * Awaiting final number 10

Disaster Levels Asset Level Recovery Level I Site Level Recovery Level II City Level Recovery Level III Can we work from the same premises? Can we operate from any other location within LA (SPE Lot)? Recovery Team Leader to mobilize all Recovery Teams and guide them to relocate to another SPE location outside LA or mobilize teams from other SPE location outside LA (London, etc.) Operations at Recovery Location Recovery Team Leader to mobilize the required Recovery Teams and guide them to relocate to another area within the same premises Recovery Team Leader to mobilize the required Recovery Teams and guide them to relocate to another SPE location within LA Restoration Begin Recovery Procedures Begin Recovery Procedures Begin Recovery Procedures 11

DR Network Strategy Emergency Types Description Causes Network Recovery Strategy Level 1 Asset Level Failure System crashes, disk failures, theft, hacking, virus threats SPE LAN to be used primarily. However, if administrators are away from the primary site, they could use VPN and make the DR site as primary. Level 2 Site Level Failure Major power outages, communication link failures, network component failures, sabotage Administrators would be required to use Sony LAN from the alternate site or VPN and make the DR site as primary. Level 3 City Level Failure Total outage because of natural or manmade disasters Administrators in the US may not be able to use either LAN or VPN and would need to depend on administrators outside US (Europe/India etc). 12

DR Operations Flow Recovery Flowchart Disaster Notify DR Team & Crisis Management Team Crisis Management Team/Facilities Management Alert Damage Assessment Team Emergency Plan Activation (Evacuation) Verification & Assessment DR Plan Activation & Notification Procedures Assemble Teams DR Teams IT Recovery Team Salvage Team Perform Recovery at Alternate Site Begin Salvage Operations at Primary Site Normal Operations Team Return to Normal Operation End of Crisis 13

Teams & Responsibilities Team Crisis Management Team Responsibilities Directs, allocates resources and oversees the situation Damage Assessment Team Assesses the impact of disaster event and informs Crisis Management Team Coordinates the collection of information & evidence required for insurance claims Salvage Team Disaster Recovery Team Normal Operations Team Returns primary site to normal operating conditions Clears and repairs the primary processing facility Implements recovery procedure at DR Site Recovers the systems & data at DR Site Returns Production from DR to Primary Site Other teams required during Disaster Recovery are: Emergency Response Team Evacuation of employees, coordinating with non-spe emergency teams (Fire Dept., etc.) Public Relations Team Handling media, public, etc. Security Incident Response Team If there is any security incident (Denial of Service (DoS), Virus Attack, etc.) 14

Current Status DR architecture has not been updated since 2011 to match Production architecture. Operational Testing documents for the 5 identified applications that have DR in place (PRISM, evmi, ITSM/C2C, GPMS and TAAS ) date back to 2011 In the process of determining the requirements for additional infrastructure: oto account for changes in the architecture and Infrastructure for currently established DR applications oto provision Infrastructure for changes in the architecture and infrastructure for existing DR platforms as well as new DR platforms (such as Oracle on AIX, JBoss, etc.)

Current Status Business Impact Analysis was completed in 2007 since then no formal updates have been published. New applications may not have a DR environment/plan SAP -Draft of Disaster Recovery Plan being finalized. Infrastructure has been provisioned. Per Business Continuity Team (Scot Falkenstein) - we are aligned with policy effective May 12, 2012 Testing will need to be re-scheduled once the architectural and design changes, Version upgrades and WebLogic to JBoss conversions are completed. (TAAS, ITSM/C2C, GPMS, PRISM)

Process Flow for 2013 DR Parallel Testing SAP evmi 2. 1. Review current documents on File Perform Initial Analysis Architecture updates? Personnel changes? Call tree? 3. Meet with Application & Platform Leads 4. Coordinate Application review and updates 5. Confirm DR Servers including platform interfaces 6. Upgrade platforms 7. Update DR Env. & Testing Documents 8. Schedule Test 9. Conduct Test 10. Document results, learnings and resolve issues 11. Prepare for Full Test ITSM, PRISM, TAAS, WebMethods, WDG, Datastage, IDM, D-Series, Alfresco, Oracle, JBoss, Weblogic, Apache, BO

Disaster Recovery Flow Chart During Disaster (evmi Recovery at DR Site) SAMPLE Crisis Management Team Disaster Declared @ Chandler Network/ Storage Team Server Team Database Team ETL Team Application Team IT Recovery Coordinator Activate the Salvage team to recover/salvage from the Primary site Send notifications To DR Teams Ensure connectivity to DR Site Check & stop applications or services running at Chandler (if any) Stop replication Recover & import volumes Mount volumes & check file systems E No All volumes mounted & file systems consistent? Network team Storage team Yes Mount and start the Oracle and DB2 databases Start middleware (webmethods, Datastage & D-Series) Change Pegasus interface to point to DR site A 18

Disaster Recovery Flow Chart During Disaster (Recovery at DR Site) Continued... SAMPLE Crisis Management Team Network/ Storage Team Server Team Database Team ETL Team Application Team IT Recovery Coordinator A Start evmi Application & Services Test evmi services & Test evmi application E No Is evmi application running? Yes Inform users to Switch over to DR URL Declare DR site up Inform Crisis Management Team Monitor the situation Is Primary site up? B Yes No 19

Disaster Recovery Flow Chart After Disaster (Restoration to Normal Operations) SAMPLE Crisis Management Team B Network/ Storage Team Ensure connectivity to primary site Server Team Database Team ETL Team Application Team IT Recovery Coordinator Start reverse replication of data from DR site Stop replication, recover & import primary volumes Mount Primary volumes & check file systems No All volumes mounted & file systems consistent? Yes Mount and start the Oracle and DB2 databases Start middleware (webmethods, Datastage & D-Series) Change Pegasus interface to point to Primary site Start Siteminder & evmi Application & Services Test evmi services & Test evmi application Network team Storage team Is evmi application running? Yes 20 C No

Disaster Recovery Flow Chart After Disaster (Restoration to Normal Operations) Continued SAMPLE Crisis Management Team Network/ Storage Team Server Team Database Team ETL Team Application Team IT Recovery Coordinator C Check & stop applications or services running at DR (if any) Inform users to Switch-over to Chandler Prod. URL Declare Primary site up Inform Crisis Management Team Start replication to DR STOP Network team Storage team 21

Important Contact Details SAMPLE Notification Tree Sample - evmi. 22

Type of Tests Parallel The DR site is activated Only test transactions are carried out at the DR site Main business continues at the primary site without any interruptions Full The primary (Prod) site is shut down DR site is activated as the primary Business is conducted at the DR site 23

Testing Strategy and Prioritization DR testing for current DR Applications has been prioritized by determining when their changes, upgrades and conversions will be complete. oif changes are due to complete in the next 3 months (May 2013), we have scheduled DR testing to occur after changes are complete. (PRISM & ITSM/C2C) oif changes are greater than 3 months away we will test as soon as architecture has been updated, then re-test after changes are made. o evmi has fewer dependencies therefore we will conduct some testing in April 2013, a Parallel test in June 2013 & re-test in the 4 th Quarter when all changes are complete. othe GPMS team has specifically requested that we delay all DR testing till October 2013 osap failover to El Segundo has been tested successfully (March 2013). DR Parallel testing will be conducted once all changes/updates are complete

Tentative Schedule DR Parallel Testing TAAS MAY 2013 PRISM MAY/JUNE 2013 ITSM/C2C MAY/JUNE 2013 GPMS 4th QUARTER PER REQUEST evmi APRIL 2013 SAP MARCH 2013 (FAILOVER TESTED)

You cannot control disaster. But you can be prepared for it. 26