TalentLink Disaster Recovery & Service Continuity




Technical Services Briefing Document TalentLink Disaster Recovery & Service Continuity Version 1.2 (January 2012)

Contents
- Overview
- Planning for Service Continuity
- Disaster Recovery Process
- Business Continuity Management

TalentLink Disaster Recovery & Service Continuity 04/01/2012 Page 2

Overview

Document Purpose

The purpose of this document is to describe the provisions made for disaster recovery of the TalentLink service hosted by Lumesse at the Frankfurt Data Centre. It is also intended to give the reader an insight into the steps taken by Lumesse to prevent service failure and to minimise its impact when it occurs.

Scope

Lumesse has implemented a Business Continuity Management System (BCMS) based on the BS25999 standards. As part of this BCMS, Lumesse maintains a disaster recovery plan to cater for total site loss of a production data centre, with the objective of performing service recovery at a secondary production data centre within the Lumesse secure service network that is prepared for the purpose. While a similar approach is taken for disaster recovery of other Lumesse services and data centres, the scope of this document is specific to the TalentLink infrastructure hosted within the Frankfurt Data Centre.

The in-scope scenario for disaster recovery is an extended total loss of access to, or complete loss of, the Lumesse Data Centre in Frankfurt. Examples of events that could trigger such a scenario include fire, considerable equipment theft, storm, flood, malware outbreak, lightning, or facility shutdown due to other life-threatening circumstances.

The disaster recovery service has not been devised to cater for momentary or short-lived service failures or component failure. These situations are handled by standard incident and problem management processes.

Service Summary

The objective of the disaster recovery service for TalentLink in Frankfurt, Germany is to restore full service functionality at the Lumesse Data Centre in Milton Keynes, UK within a Recovery Time Objective (RTO) of 48 hours and a Recovery Point Objective (RPO) of 24 hours.

As explained in this document, this is achieved through the use of virtualised infrastructure and storage replication techniques, combined with pre-existing capacity and standard operating procedures. The Technical Services team maintains standard operating procedures to support the execution of the disaster recovery process. Tests, exercises and reviews of these procedures are completed each year.
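As an illustration of how the two objectives in the Service Summary differ, the following minimal Python sketch checks a single recovery exercise against them. The RTO and RPO values are taken from this document; the function name `meets_objectives` and the example timestamps are hypothetical and are not part of any Lumesse procedure.

```python
from datetime import datetime, timedelta

# Objectives stated in the Service Summary.
RTO = timedelta(hours=48)  # maximum time allowed to restore service
RPO = timedelta(hours=24)  # maximum age of the newest recoverable data


def meets_objectives(disaster_at, last_recoverable_at, service_restored_at):
    """Return (rpo_met, rto_met) for one recovery event or exercise."""
    data_loss_window = disaster_at - last_recoverable_at
    outage_duration = service_restored_at - disaster_at
    return data_loss_window <= RPO, outage_duration <= RTO


# Hypothetical exercise timings, for illustration only.
disaster = datetime(2012, 1, 4, 9, 0)
last_copy = datetime(2012, 1, 3, 22, 0)  # newest replicated data: 11 hours old
restored = datetime(2012, 1, 5, 15, 0)   # service back after 30 hours

print(meets_objectives(disaster, last_copy, restored))  # (True, True)
```

The RPO is measured backwards from the disaster (how much data may be lost), while the RTO is measured forwards from it (how long the service may be unavailable).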

Planning for Service Continuity

When developing and deploying solutions for hosting its SaaS offerings, Lumesse designs infrastructure to deliver high availability as well as to provide continuity of service in the event that the disaster recovery procedure is invoked. The purpose of this section is to provide an overview of the steps taken to achieve this.

Lumesse Frankfurt Data Centre

Lumesse hosts the TalentLink service at the TeleCity Group Data Centre in Frankfurt, Germany. TeleCity is a leading provider of premium data centre services in Europe and was selected as a service provider by Lumesse in part due to its accreditation for the ISO 27001:2005 standard for information security management and the ISO 9001:2008 standard for quality management. Attributes of the data centre that contribute to service continuity include:

- Direct, redundant connection to multi-carrier Internet services for high availability
- Dedicated, secure caged area for Lumesse with card and PIN access to ensure only approved personnel have access
- 24x7x365 on-site engineering services, available on demand from a formal Network Operations Centre
- Building Management System (BMS) maintaining temperature and humidity levels on the data floor areas
- Very Early Smoke Detection Apparatus (VESDA) and Inergen gas fire extinguishing systems
- 24x7 security enforcement from an on-site team supported by closed circuit camera systems, automated electronic lock systems and access control list approval for visitors, including Lumesse personnel

Standard Hosting Components

Lumesse uses the following core hardware components to host the TalentLink service:

- HP DL 380 and 580 servers configured to a high level of component redundancy
- Compellent Storage Area Network (SAN) for Fibre Channel and SATA disk storage
- Cisco Fibre Channel storage switches configured for high availability
- Cisco Catalyst Chassis Local Area Network (LAN) switches configured for high availability
- All SAN and local storage configured in redundant arrays
- HP iLO Advanced solution to support unattended operations by the Lumesse Technical Services team

To support disaster recovery, Lumesse deploys these standard infrastructure components at all data centres hosting the TalentLink service.

Server Virtualisation

TalentLink is hosted using a server virtualisation approach based upon VMware vSphere. vSphere allows a flexible and rapid approach to service provision and also supports high availability of services, protecting against server hardware failure.

Service Monitoring and Support

In order to maintain good visibility of service levels, Lumesse monitors infrastructure and service condition using the Nimsoft monitoring solution, supplemented by proprietary monitoring components provided with infrastructure components, for example the Compellent SAN.

Services are monitored and managed by the Lumesse Technical Services team, which also provides an out-of-hours service to ensure availability of support staff at any time. To further protect service availability, TeleCity Group Frankfurt provides Lumesse with additional oversight of the service management console and will contact on-call engineers for support in the event of service alerts, escalating up through the Technical Services management team as required.

Backup and Recovery

Backup and recovery of the TalentLink service is achieved through a combination of SAN to SAN replication of the virtual infrastructure used to host the service and a tapeless vaulting solution, also replicated between data centres, to cover key data components.

All data (including candidate document data) and application assets are hosted on the Compellent SAN within the Frankfurt data centre. This service provides regular snapshots of storage during the day that can be restored immediately within the data centre by the Technical Services team should the need arise to recover server images, candidate documents or complete database copies.

The Oracle database is backed up using the i365 EVault backup and recovery service, which deploys a specific agent for Oracle database support. Backups are taken daily and versions of data are maintained on a grandfather, father, son basis. Backups are automatically encrypted using 256-bit AES encryption for additional protection. Backup success rates are monitored and restore tests are regularly performed. Database backups are retained for a maximum of 6 months.

All data backups for TalentLink are automatically replicated from the Frankfurt Data Centre to the Milton Keynes Data Centre. EVault backups are replicated every 24 hours, while Compellent SAN data is replicated on a continuous basis during the day. This approach ensures that all data assets required for service recovery are already located at the recovery data centre and that sufficient capacity to operate the service is continually available.
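The grandfather, father, son scheme mentioned above can be sketched in a few lines of Python. The document states only that backups are daily, follow a grandfather, father, son pattern and are kept for a maximum of 6 months; the tier rules below (first of the month as grandfather, Sunday as father) and the 7-day and 35-day retention periods are assumptions made for illustration, not the actual EVault configuration.

```python
from datetime import date

MAX_RETENTION_DAYS = 183  # roughly 6 months, the maximum stated in this document
DAILY_KEEP_DAYS = 7       # assumption: daily "son" backups kept for one week
WEEKLY_KEEP_DAYS = 35     # assumption: weekly "father" backups kept for five weeks


def gfs_tier(backup_day: date) -> str:
    """Classify a daily backup under the assumed tier rules."""
    if backup_day.day == 1:
        return "grandfather"  # assumption: first of the month
    if backup_day.weekday() == 6:
        return "father"       # assumption: Sunday
    return "son"


def is_retained(backup_day: date, today: date) -> bool:
    """True if the backup is still within the retention period of its tier."""
    limits = {
        "son": DAILY_KEEP_DAYS,
        "father": WEEKLY_KEEP_DAYS,
        "grandfather": MAX_RETENTION_DAYS,
    }
    age_days = (today - backup_day).days
    return 0 <= age_days <= limits[gfs_tier(backup_day)]


today = date(2012, 1, 4)
for day in (date(2012, 1, 3), date(2011, 12, 25), date(2011, 8, 1), date(2011, 6, 1)):
    print(day, gfs_tier(day), is_retained(day, today))
```

Whatever the real tier rules are, the point of the scheme is that recent history is available at daily granularity while older history is thinned out, with nothing kept beyond the 6-month limit.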

Disaster Recovery Process

Technical Solution

Disaster recovery is achieved through a combination of virtual machines and SAN to SAN replication to re-create the environment in another data centre as quickly as possible with minimal loss of data. The use of storage replication ensures that software revisions and patch levels at the recovery centre are kept consistent with the production data centre. Even though the candidate data represents a very large volume of very small files, these too are replicated and are immediately available to the recovery team.

The diagram below illustrates the key components at the primary and recovery data centres and the replication between them.

[Figure: TLK - High Level Disaster Recovery Process. TLK web and application servers run on VMware vSphere servers at both Frankfurt and Milton Keynes; the two sites are linked by SAN to SAN replication and VM registration, and customers are directed to Milton Keynes by DNS redirection.]

The following table details the recovery approach for each of the TalentLink service components that need to be recovered in the event of a disaster.

| Component | Recovery mechanism | Process |
|---|---|---|
| Candidate data | SAN to SAN storage replication | Continuous, automated replication to achieve a recovery point objective of no greater than 24 hours. |
| Database server | Cloning of production database server | Refresh of the database server image for the recovery process following each TalentLink version release, typically each month. |
| Web servers | SAN to SAN storage replication | Continuous, automated replication to achieve a recovery point objective of no greater than 24 hours. On invocation, register servers in the DR environment. |
| Application servers | SAN to SAN storage replication | Continuous, automated replication to achieve a recovery point objective of no greater than 24 hours. On invocation, register servers in the DR environment. |
| DNS | Create DNS entries | Change Time To Live (TTL) in recovery preparation. Change DNS records on invocation. |
| Firewall | Firewall rules pre-established in Milton Keynes data centre | Recovery rule set enabled by the Lumesse 24x7x365 security services provider at the time of invocation. |
| Load balancer | Rules created in advance on Milton Keynes load balancer installation | Recovery rules already in place. |
| Database transaction logs | SAN to SAN storage replication | Continuous, automated replication to achieve a recovery point objective of no greater than 24 hours. On invocation, apply transaction logs to the recovered database to minimise RPO. |
| Database data | EVault backup replication | Daily backup replication to ensure availability of the database at the recovery site. On invocation, restore the database into the recovery database server. |

Invocation

The following roles are authorised to initiate the disaster recovery process:

- Chief Technology Officer
- Head of Technical Services
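The DNS entry in the component table separates lowering the TTL (in recovery preparation) from changing the records (on invocation). The reason is that resolvers may keep serving a cached answer until its TTL expires. The sketch below estimates the latest time at which a resolver that honours TTLs could still return the old address. The function name and the TTL values are hypothetical; the document does not state the TTLs actually used.

```python
def worst_case_cutover(ttl_lowered_at, records_changed_at, old_ttl, new_ttl):
    """Latest time (in seconds) at which a TTL-honouring resolver may still
    serve the old record.

    Answers cached before the TTL was lowered can live until
    ttl_lowered_at + old_ttl; answers cached afterwards, but before the
    record change, can live until records_changed_at + new_ttl.
    """
    if records_changed_at < ttl_lowered_at:
        raise ValueError("records must be changed after the TTL is lowered")
    return max(records_changed_at + new_ttl, ttl_lowered_at + old_ttl)


# Hypothetical values: TTL lowered from 24 hours to 5 minutes at t = 0.
OLD_TTL, NEW_TTL = 86400, 300

# Records changed only 2 hours after lowering the TTL: old caches dominate.
print(worst_case_cutover(0, 2 * 3600, OLD_TTL, NEW_TTL))   # 86400

# Records changed 25 hours after lowering the TTL: cutover within 5 minutes.
print(worst_case_cutover(0, 25 * 3600, OLD_TTL, NEW_TTL))  # 90300
```

Under these assumptions, the short TTL only speeds up redirection if it was put in place at least one old-TTL period before the records are changed.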

Recovery Team

The recovery team comprises systems engineers, database administrators and application specialists from the Technical Services team, as well as key third-party service providers contracted to be available on a 24x7x365 basis.

Communications

During the disaster recovery invocation, communication to all stakeholders shall be coordinated by the following roles:

- Head of Worldwide Corporate Communications and PR
- Global Director of Support

Restoring to Normal Operating State

Once the primary data centre has been restored to full capability, replication shall be re-established between it and the recovery data centre. Once replication latency has reached appropriate levels, a change window shall be arranged to reverse the disaster recovery process and re-establish the normal operating state.
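The condition that replication latency must reach appropriate levels before failback can be expressed as a simple readiness check, sketched below in Python. The document does not define what an appropriate level is, so the 60-second threshold, the requirement of 12 consecutive samples and the function name are all assumptions for illustration.

```python
def ready_for_failback(lag_samples_seconds, max_lag_seconds=60, required_samples=12):
    """True if the most recent samples show replication lag at or below the
    threshold for the whole observation window.

    Both the threshold and the window length are illustrative assumptions.
    """
    recent = lag_samples_seconds[-required_samples:]
    return len(recent) == required_samples and all(
        lag <= max_lag_seconds for lag in recent
    )


# Hypothetical lag history: high while the initial resynchronisation runs,
# then stable once the primary site has caught up.
history = [900, 300] + [20] * 12
print(ready_for_failback(history))  # True
```

Requiring a sustained run of low readings, rather than a single one, avoids scheduling the change window while the initial resynchronisation is still settling.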

Business Continuity Management

Although the purpose of this document is to describe the disaster recovery capability and processes, this document forms part of the formal Business Continuity Management System (BCMS) of Lumesse. This chapter gives a brief description of the BCMS.

Business Continuity Management System (BCMS)

Lumesse has implemented a BCMS in line with the BS25999 market standard. Lumesse aims to be accredited against this standard in 2012 for its services. As part of the BCMS, the following governance concepts have been implemented:

- Management has set and communicated the business continuity objectives, with due regard to acceptable levels of risk, contractual duties and the interests of its key stakeholders
- Management has established and communicated a Business Continuity policy
- A formal BCM governance structure has been implemented
- Staff are trained to ensure their competency and knowledge of the BCM objectives, procedures and processes

Business Impact Analysis & Risk Assessments

Business Impact Analyses (BIA) are performed and regularly updated at department and location level to identify critical dependencies, activities and resources. The results of the BIA are signed off at an appropriate management level. Risk assessments are performed and regularly updated to ensure all risks stay within a formalised risk tolerance level.

Business Continuity Plans

Based on the Business Impact Analysis and the risk assessments, formal Business Continuity plans have been established to ensure Lumesse can handle an interruption in services. All business processes are regularly analysed for single points of failure and for critical resources in terms of people, applications, infrastructure and suppliers. Threats to and risks for each of these components are regularly assessed and mitigated.

To ensure people are aware of the business continuity plans and of possible threats to business processes, the business continuity plans are regularly exercised. Lessons learned and improvements to the business continuity plans are implemented.

Continuous Improvement

To ensure the BCMS and the Business Continuity plans are continuously improved, BCM exercises and DR tests are executed and formally reviewed for lessons learned. Lessons learned are implemented via a formal corrective actions procedure. A combination of internal and external audits is used to provide focussed assessments of the BCMS.