Document Details. 247Time Backup & Disaster Recovery Plan. Author: Document Tracking. Page 1 of 12





TABLE OF CONTENTS

1 INTRODUCTION
  1.1 OVERVIEW
  1.2 DEFINED REQUIREMENT
2 DISASTER OVERVIEW
  2.1 DISASTER DEFINITION
  2.2 EXAMPLE OF DISASTER SCENARIO
  2.3 SCOPE OF DISASTER RECOVERY PROCEDURES
  2.4 ALTERNATIVE LOCATION
  2.5 SERVICE DESCRIPTION
3 PROCESS OVERVIEW
  3.1 OWNERSHIP
  3.2 ROLES AND RESPONSIBILITIES
  3.3 INVOKING DISASTER RECOVERY
  3.4 MEETING ARRANGEMENTS
  3.5 CONVENING THE TEAM
4 RECOVERY PROCESS
  4.1 REPLICATION/SYNCHRONISATION
  4.2 CUTOVER TO THE ALTERNATIVE SYSTEM
5 PROCESS TESTING
6 ASSUMPTIONS

247Time Disaster Recovery Plan

1 Introduction

1.1 Overview

The purpose of this document is to define the disaster recovery service to be provided by 247Time and the outsourcing teams of the server business and the customer (The Team). It defines the process by which disaster recovery will be invoked and the criteria that must be met. This version of the document also describes how the system in recovery will be transferred to the alternative site and how The Team will establish their connection to it.

This document is written from a practical point of view and does not reflect contractual responsibilities. Those responsibilities are defined in the agreements between The Team and its hosting partner. This document provides a clear description of who will actually take what action in the event of a disaster and serves no other purpose.

1.2 Defined Requirement

The definition of the disaster recovery requirement in the contract between The Team and its hosting partner is:

The Supplier shall provide appropriate disaster recovery plans to include alternative hosting location (different city to main hosting site) and guarantee the Supplier System will be up and available within 1 hour of the disaster recovery process being invoked. The Team shall have the right to test the Supplier's disaster recovery plans on an annual basis.

The further definition of the requirement is as described below:

The hosting partner has an existing disaster recovery site hosting systems in place. This site meets the requirement of being in a different city to the main hosting location and shares no service connections with the main site. Only those Servers protected by Double-Take Availability Software will be available at the (remote) DR location.
2 Disaster Overview

2.1 Disaster Definition

Disaster recovery to the alternative location will be invoked immediately if an incident materially damages the main location such that repairs cannot be guaranteed to allow recovery of the system within 4 hours. Only The Team or its hosting partner may invoke disaster recovery to the alternative location.

In the event of service failure due to material damage to the main location, or hardware failure such that repairs cannot be achieved within 4 hours of the failure first being reported, a decision shall be taken jointly between The Team and its hosting partner based on the overall expected recovery time.

2.2 Example of Disaster Scenario

Disasters are by their very nature unpredictable. However, it is possible to suggest the types of event for which the recovery process described in this document would be appropriate.

The most serious event would involve complete loss of the main site, such that all equipment and data at that site are lost and no personnel from that site are available to recover the system. An event such as a large explosion adjacent to the building that caused a structural collapse and prevented access would fall into this category.

Other events envisaged would cause part of the main suite to become unusable, leaving personnel available to recover the system but the system itself beyond use. An extremely serious flood would be an example.

An event where the situation is less clear would be a serious hardware fault within the hosting partner's infrastructure such that repairs would take a significant amount of time. An example of this type of event would be multiple board failures within the core network that made a large portion of the network unavailable.

2.3 Scope of Disaster Recovery Procedures

In Scope: Alternative Location
- The process to invoke disaster recovery to the alternative location.
- Transfer to the alternative location where the restoration of services from the main location cannot be guaranteed within 1 hour of the disaster recovery process being invoked.
- Access to the service at the alternative location from The Team.
- Business continuity for a period of less than 4 hours.

Out of Scope:
- Recovery of any lost data or re-work to bring the recovered system up to date from the point of the latest available system backup. The (remote) DR location does not have provision to restore tape-based backups.
- Service failure due to software or data issues where the production environment may be repaired or reconstituted in situ.
- 1st line response to service failure using spare equipment and normal maintenance arrangements.
- Routine hardware maintenance and repair procedures.
The process by which a recovered system is either transferred back to the main location or to an alternative permanent location is acknowledged but not included in this document other than in its inception. This process is entirely dependent on circumstances following a disaster and so is not defined.

Specific responsibilities in regard to this process are:

Responsibility | Owner
Defined requirement to be met by this process | Facility Services Manager, The Team
Overall process design and maintenance to meet the defined requirement | Facility Services Manager, The Team
Technical design of alternative system | Director of Technical Services
Detailed process for bringing alternative hardware and operating environment online | Data Centre Support Manager
Process for connecting The Team Users to the alternative location | Facility Services Manager, The Team

2.4 Alternative Location

The building is constructed and configured to industry standards. The building measures approximately 40,000 sq ft over ground and two upper floors. The specification includes:
- 4 MW power supply from a 6.6 kV primary ring main
- 2000 kVA standby generator
- Fully air-conditioned
- Fire detection and suppression including VESDA
- State-of-the-art security systems
- Multiple ducting to site boundary

Items of plant monitored are as follows:
- CCTV
- 24*7*365 facility monitoring
- Facility power supply characteristics including voltage, frequency, current and harmonics
- Online individual rack power consumption
- Humidity and temperature monitoring/control
- Leak detection
- Critical infrastructure monitoring
- Network monitoring

CCTV monitoring is in operation and is visible at the facility. CCTV coverage starts at the car park and extends through the data centre and facility to individual rack level. CCTV records are kept for 30 days.

Movement Sensors

Motion detection systems are installed throughout the building, with intruder alarms on all lifts and a specifically and individually secured goods lift. Each data centre hall requires the use of a proximity card for entry.

Power Backup Systems

The local electrical utility company provides a high capacity HV ring from their primary substation. Being a ring, the site is protected against cable damage from road works and the like. The facilities have several independent N+1 UPS systems and multiple diesel generators within onsite energy centres. Each hall benefits from an independent power supply, backed by UPS and generator. If the power goes down totally, the UPS systems take over, providing uninterrupted power whilst the generators start up. Once activated, the generators have a week's worth of fuel on site.

HVAC

Temperature is stabilised between 18 °C and 22 °C, within the recommended guidelines for IT equipment established by the well-respected ASHRAE organisation.

Fire Detection & Suppression

The site uses Very Early Smoke Detection Apparatus (VESDA) equipment. VESDA systems constantly analyse air composition by passing particles in front of a laser, and trigger warning systems if particles of combustion are found. FM200 and Argonite gas are used to suppress any potential fires before they take hold. Both gases are harmless to systems and all fires are extinguished within 10 seconds, minimising toxins and the presence of soot.

Relocation Process

The process by which the system is recovered to an alternative location is designed to be as simple as possible and to rely only on replication-based technology that synchronises a copy of specified Production system Virtual Machines to the alternative location. Alternative hardware will be maintained to the same specification as the Production environment.

2.5 Service Description

The alternative system shall:
- Be based upon a real-time replicated / synchronised copy of the specified Production Environment.
- Be created from the latest available synchronised copy of the specified Production Environment.
- Make available the data from the latest available synchronised copy of the specified Production Environment.
- Not have any particular steps taken to accelerate recovery in the event of a system failure that delays a critical process.
- Use hardware pre-racked and commissioned at the alternative site, dedicated to the creation of the alternative system and tested for use in this manner.
- Allow connection from The Team using the same connection methods as the Production Environment (client-based VPN). The Team shall be responsible for establishing their connection to the alternative site. These responsibilities cover initial setup, any maintenance required and invocation in the event of a disaster.
- Whilst in operation, not be upgraded in any way except to allow for the application of emergency patches to the operating environment or virus protection.

- Be supported using the reasonable endeavours of both The Team and its hosting partner. Normal service level terms cannot apply as the circumstances of any event requiring the use of the alternative location cannot be predicted.
- Be created by a process documented to the degree that it may be established by the hosting partner's personnel who are unfamiliar with the particulars of the system, in the event of the experienced staff being unavailable.

3 Process Overview

3.1 Ownership

Element of Service | Owner
Invocation of Disaster Recovery | Joint Ownership: Data Centre Support Manager; Payroll Manager, The Team
Relocation to the alternative site | Data Centre Support
Detailed process for bringing alternative hardware and operating environment online | Data Centre Support Manager
The Team connection to the alternative site | Facility Services Manager, The Team
Business continuity for the first 24 hours | Payroll Manager, The Team

3.2 Roles and Responsibilities

Individual | Role / Activities
Payroll Manager, Server Provider | Joint responsibility for invocation of disaster recovery. Responsible for communications with Safe. The Team connection to the alternative site. Business continuity for the first 24 hours.
System Administrator, Safe Outsourcing | The Team connection to the alternative site.
Data Centre Support Manager | Joint responsibility for invocation of disaster recovery. Communications with Safe Outsourcing.
Implementation Manager | Relocation to the alternative site.

Escalation Point: The Team Facility Services Manager, Server Provider; Director of Technical Services; Data Centre Support Manager

3.3 Invoking Disaster Recovery

[Flowchart] System Failure, then "Obvious Disaster?". Yes: Invoke Disaster Recovery. No: continue repairs for 4 hours, after which The Team and its hosting partner decide whether to invoke disaster recovery (Yes: Invoke Disaster Recovery; No: repairs continue).

3.4 Meeting Arrangements

No physical meetings shall take place. The timescale of the recovery process is such that all meetings shall take place as conference calls. The only scheduled meeting shall be a three-way meeting between the nominated individuals at The Team and its hosting partner to decide whether or not a situation warrants invoking disaster recovery. All further communications are two-way between The Team and its hosting partner. These communications are scheduled in the process and will be telephone calls backed up by email.

3.5 Convening the Team

The conference call to discuss the invocation of disaster recovery shall take place between the Data Centre Support Manager and the Customer Services Director of Safe immediately following a system failure, if any of the parties feel that it is warranted or if there is a risk of the system not being recovered within a 4 hour period.
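The invocation decision in section 3.3 can be read as a small decision procedure. The sketch below is one reading of the flowchart, not part of the plan itself: the names `Action`, `next_action` and `REPAIR_WINDOW_HOURS` are illustrative assumptions, and the only value taken from the document is the 4 hour repair window.

```python
from enum import Enum


class Action(Enum):
    """Possible next steps after a system failure (illustrative names)."""
    INVOKE_DR = "invoke disaster recovery"
    CONTINUE_REPAIRS = "continue repairs"
    JOINT_DECISION = "joint decision call"


# Repair window stated in sections 2.1 and 3.3 of the plan.
REPAIR_WINDOW_HOURS = 4


def next_action(obvious_disaster: bool, hours_since_reported: float) -> Action:
    """Return the next step under one reading of the section 3.3 flowchart."""
    if obvious_disaster:
        # An obvious disaster goes straight to invocation.
        return Action.INVOKE_DR
    if hours_since_reported < REPAIR_WINDOW_HOURS:
        # Still inside the repair window: keep repairing.
        return Action.CONTINUE_REPAIRS
    # Window exhausted: The Team and its hosting partner decide jointly.
    return Action.JOINT_DECISION


if __name__ == "__main__":
    for disaster, hours in [(True, 0.0), (False, 1.5), (False, 4.0)]:
        print(disaster, hours, next_action(disaster, hours).value)
```

Note that section 3.5 also allows any party to call the conference earlier if they feel it is warranted, so in practice the joint decision is not strictly gated on the 4 hour mark.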

4 Recovery Process

4.1 Replication/Synchronisation

The Production system is subject to a real-time replication and synchronisation process utilising Double-Take Availability Virtual Host Edition Software. The Software runs on the Operating System hosting The Team Services, and replicates all changes and updates to the alternate location in real time. The Software is configured to use 1 GB for RAM caching, 10 GB of Disk caching and up to 8 Mb/s Network Bandwidth per Server for replication. This provides the means to restore the alternative system to the latest available replication point.

4.2 Cutover to the alternative system

ID | Task | Owner
1 | Disaster recovery invoked. Either Disaster Recovery will have been initiated automatically or invoked during a conference call between The Team and its hosting partner. This call may have been initiated by any party. The decision and the time it was made shall be confirmed during the call and be the subject of an email to all parties. The Team is to assume that the system will be restored no later than 24 hours after disaster recovery has been invoked and follow their business continuity plan. | The Team and its hosting partner
2 | The hosting partner is to follow the steps detailed in the document "The Team Disaster Recovery Procedure", which describes the exact method to cut over to the alternative system. The tasks below are taken directly from that document. |
2.1 | The Data Centre Service Desk will take initial receipt of the alerting mechanism and record the incident. | Data Centre Service Desk
2.2 | The Data Centre Service Desk will notify the Data Centre Support Manager should the incident fall outside the normal recovery procedure. | Data Centre Service Desk
2.3 | The Data Centre Support Manager will determine whether the prerequisite disaster recovery invocation conditions have been met and the service to the hosted Safe Computing cannot be materially recovered, before invoking the disaster recovery procedure specific to Safe Computing. | Data Centre Support Manager
2.4 | Ensure that all the incident details have been correctly recorded before carrying out the disaster recovery procedure actions. | Data Centre Support Manager
2.5 | The Data Centre Support Manager will contact the Implementation Manager, who will confirm/deny that any further action is possible before the Disaster Recovery procedure is invoked for The Team. The Team will be notified that the Disaster Recovery process has been invoked. | Data Centre Support Manager / Hosting Partner Implementation Manager

2.6 | Arrange for the necessary support personnel to be available. | Data Centre Support Manager
2.7 | All management and monitoring systems will be updated in accordance with the recovery operation. | Data Centre Service Desk
2.8 | The Data Centre Support Manager will notify the nominated The Team personnel of estimated timescales for the recovery operation. This task is dependent upon input from the Implementation Manager/team. | Data Centre Support Manager
2.9 | Implementation Manager to initiate the alternate location recovery procedure for The Team. | Implementation Manager
2.10 | Failover XXX-XXX-XXX-01 onto DR Platform: Connect to DR location Host Server; Launch Double-Take Console; Failover XXX-XXX-XXX-01; Launch Console; Start XXX-XXX-XXX-01; Login to XXX-XXX-XXX-01; Assign XXX-XXX-XXX-01 DR Network Address; Connect XXX-XXX-XXX-01 to DR Network Virtual Switch; Restart XXX-XXX-XXX-01; Verify Network connectivity | Implementation Team
2.11 | Failover XXX-XXX-XXX-02 onto DR Platform: Connect to DR location Host Server; Launch Double-Take Console; Failover XXX-XXX-XXX-02; Launch Console; Start XXX-XXX-XXX-02; Login to XXX-XXX-XXX-02; Assign XXX-XXX-XXX-02 DR Network Address; Connect XXX-XXX-XXX-02 to DR Network Virtual Switch; Restart XXX-XXX-XXX-02; Verify Network connectivity | Implementation Team
2.12 | Failover XXX-XXX-XXX-03 onto DR Platform: Connect to DR location Host Server; Launch Double-Take Console; Failover XXX-XXX-XXX-03; Launch Console; Start XXX-XXX-XXX-03; Login to XXX-XXX-XXX-03; Assign XXX-XXX-XXX-03 DR Network Address; Connect XXX-XXX-XXX-03 to DR Network Virtual Switch; Restart XXX-XXX-XXX-03; Verify Network connectivity | Implementation Team
2.13 | Failover XXX-XXX-XXX-04 onto DR Platform: Connect to DR location Host Server; Launch Double-Take Console; Failover XXX-XXX-XXX-04; Launch Console; Start XXX-XXX-XXX-04; Login to XXX-XXX-XXX-04; Assign XXX-XXX-XXX-04 DR Network Address; Connect XXX-XXX-XXX-04 to DR Network Virtual Switch; Restart XXX-XXX-XXX-04; Verify Network connectivity | Implementation Team
2.14 | The Implementation Manager will notify the Data Centre Support Manager that the Disaster Recovery process has been completed. The Implementation Manager will also update the account manager that the Disaster Recovery procedure has been completed so that commercial requirements can be met. | Implementation Manager
2.15 | The Data Centre Support Manager will ensure that all management and monitoring systems are updated. | Data Centre Support Manager
2.16 | The Data Centre Support Manager will notify The Team that the Disaster Recovery process has been completed. | Data Centre Support Manager
3 | Following task 2.16, the hosting partner shall inform Safe Outsourcing of the date and time of the replication point used to create the alternative system. | Hosting partner
4 | The Team shall then follow the business continuity plans to prepare re-work or recover additional data from dated input documents. | The Team
5 | The Team connection to the alternative site is preconfigured, so the switchover mechanism is simply to start using the DR shortcuts issued to Users. | The Team
6 | Confirm System available and accepted. | The Team
7 | Carry out re-work as necessary and restore/reproduce non-database data using dated input documents. | The Team
8 | Convene a meeting to plan restore of system to a main system at the original or an alternative permanent location. | Hosting partner and The Team

5 Process Testing

The process of disaster recovery to the alternative site shall be tested once every twelve months. Staff of all parties shall be aware that a test is planned, in order that the performance of the test may be monitored and the effect on other customers of The Team and its hosting partner is minimised. Initiation of a test shall be the responsibility of The Team and shall be planned with the Facility Services Manager and the Data Centre Support Manager of the hosting partner.
Following the test, a joint report shall be produced with the hosting partner, listing all activities completed and stating those which were successful and those that were not. The process shall then be amended to deal with the issues found and a confirmation report produced.

Additional costs, dependent on the level of resource and duration, will apply for invocation of a DR trial or DR invocation. Resource will be required for the invocation of DR and for reconfiguration of the System after restoration of the Production Environment.

6 Assumptions

During the use of the alternative site no additional Disaster Recovery shall be available.
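As a rough worked example of the replication figures in section 4.1, the sketch below converts the 8 Mb/s per-server limit into a sustained data volume and estimates how long 10 GB of queued changes would take to send. It assumes that "Mb/s" means megabits per second, that decimal units apply, and that the link runs at the full configured rate; how the replication software actually uses its disk cache is not described in this plan, so the second figure is only indicative. All function names are illustrative.

```python
MEGABIT = 1_000_000        # bits per megabit, decimal convention (assumption)
GIGABYTE = 1_000_000_000   # bytes per gigabyte, decimal convention (assumption)


def bytes_per_second(megabits_per_second: float) -> float:
    """Convert a link rate in megabits per second to bytes per second."""
    return megabits_per_second * MEGABIT / 8


def gigabytes_per_hour(megabits_per_second: float) -> float:
    """Maximum data volume one server can replicate in an hour at this rate."""
    return bytes_per_second(megabits_per_second) * 3600 / GIGABYTE


def hours_to_send(gigabytes: float, megabits_per_second: float) -> float:
    """Time needed to transmit a backlog of the given size at this rate."""
    seconds = gigabytes * GIGABYTE / bytes_per_second(megabits_per_second)
    return seconds / 3600


if __name__ == "__main__":
    rate = 8.0           # Mb/s per server, from section 4.1
    disk_cache_gb = 10   # GB of disk caching, from section 4.1
    print(f"Sustained change rate: {gigabytes_per_hour(rate):.1f} GB/hour")
    print(f"Time to send {disk_cache_gb} GB: {hours_to_send(disk_cache_gb, rate):.1f} hours")
```

Under these assumptions each server can replicate about 3.6 GB of changes per hour, and a full 10 GB backlog would take roughly 2.8 hours to clear, which is worth bearing in mind when judging how current the "latest available replication point" is likely to be after a burst of changes.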

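Tasks 2.10 to 2.13 in section 4.2 repeat the same ten steps for each protected server. A minimal sketch of generating that checklist from a single template is shown below, which may help keep the per-server steps consistent when servers are added or removed. The function name `failover_checklist` and the task numbering scheme are illustrative assumptions; the step wording and the placeholder server names (XXX-XXX-XXX-01 and so on) are taken from the table.

```python
# Step template copied from tasks 2.10 to 2.13; {server} is filled in per server.
FAILOVER_STEPS = [
    "Connect to DR location Host Server",
    "Launch Double-Take Console",
    "Failover {server}",
    "Launch Console",
    "Start {server}",
    "Login to {server}",
    "Assign {server} DR Network Address",
    "Connect {server} to DR Network Virtual Switch",
    "Restart {server}",
    "Verify Network connectivity",
]


def failover_checklist(servers, first_task=10):
    """Build (task id, title, steps) entries, numbered 2.10, 2.11, ... by default."""
    checklist = []
    for offset, server in enumerate(servers):
        task_id = f"2.{first_task + offset}"
        title = f"Failover {server} onto DR Platform"
        steps = [step.format(server=server) for step in FAILOVER_STEPS]
        checklist.append((task_id, title, steps))
    return checklist


if __name__ == "__main__":
    # Placeholder names as they appear in the plan.
    servers = [f"XXX-XXX-XXX-{n:02d}" for n in range(1, 5)]
    for task_id, title, steps in failover_checklist(servers):
        print(f"{task_id} {title}")
        for step in steps:
            print(f"    [ ] {step}")
```

The output is a printable tick-list only; it does not drive the Double-Take console or any virtualisation platform.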