Flinders University IT Disaster Recovery Framework


Establishment: Flinders University, 1 August 2013
Last Amended: Manager, ITS Security Services, 4 October 2013
Nature of Amendment: Initial release
Date Last Reviewed: N/A
Responsible Officer: Director, Information Technology Services

Version Control:

Version   Last Amended Date   Amended By                       Reason for Amendment
Draft     1 August 2013       Manager, ITS Security Services   First Draft
v1.0      4 October 2013      Manager, ITS Security Services   Initial Release

Contents

1 Purpose
  1.1. Objectives
  1.2. Scope
  1.3. Triggers and Invocation Procedures
  1.4. Consider Assumptions
2 The IT Disaster Recovery Lifecycle
  2.1. Program Governance
  2.2. Analyse
    2.2.1 Objectives
    2.2.2 Current State Assessment
    2.2.3 Application Criticality Analysis
    2.2.4 Infrastructure Risk Assessment
  2.3. Develop
    2.3.1 Objectives
    2.3.2 Availability / Recovery Strategies
    2.3.3 IT Disaster Recovery Plan
    2.3.4 Application Recovery Plans
    2.3.5 Infrastructure Recovery Plans
  2.4. Implement
    2.4.1 Objectives
    2.4.2 Implementation
    2.4.3 Training
    2.4.4 Testing and Exercising
  2.5. Continuous Improvement
3 Roles and Responsibilities
  3.1. Role Descriptions
    3.1.1 IT Disaster Recovery Manager
    3.1.2 IT Disaster Recovery Coordinator
    3.1.3 IT Disaster Recovery Technical Team Member
    3.1.4 IT Disaster Recovery Steering Committee
    3.1.5 IT DR Owner
  3.2. Role Assignment
4 Disaster Recovery Artefact Management
5 Definition of Terminology

1 Purpose

Information Technology (IT) and systems are vital to supporting Flinders University's business processes. It is critical that the services provided by these systems are able to operate effectively without excessive interruption. Disaster Recovery Planning (DRP) supports this requirement by establishing thorough plans, procedures and technical measures that enable a system to be recovered appropriately following a disaster.

IT systems are vulnerable to a variety of disruptions, ranging from mild (e.g., short-term power outage, disk drive failure) to severe (e.g., equipment destruction, fire), and from a variety of sources, from natural disasters to acts of terrorism. While many vulnerabilities may be minimised or eliminated through technical, management, or operational solutions as part of Flinders University's risk management effort, it is virtually impossible to eliminate all risks. In many cases, critical resources may reside outside the University's control (such as electric power or telecommunications), and the University may be unable to ensure their availability. Effective disaster recovery planning, testing, and execution are therefore essential to mitigate the risk of system and service unavailability.

IT disaster recovery plans aim to provide a clear recovery path in the event of losing a critical technology component such as an application or piece of technology infrastructure. They should be detailed to a level which assists staff in the recovery of systems. These plans should be action and outcome oriented, as they establish:

- management and staff responsibilities;
- key action steps to be followed and the strategic options and information required;
- activation, management and escalation processes;
- internal and external communication protocols;
- dependencies for all in-scope components;
- critical resources and contact numbers for key staff and external service providers;
- recovery instructions for all in-scope components, which may include failover capabilities, build procedures, software loading and configuration procedures, connectivity and data restoration; and
- testing and maintenance requirements.

The IT Disaster Recovery Program is developed in accordance with the IT Disaster Recovery Policy (refer to Flinders University IT Disaster Recovery Policy).

1.1. Objectives

The University is committed to ensuring effective recovery processes operate across all parts of the University that provide a critical system, either internally or externally. This commitment involves the implementation and end-to-end adoption of an IT Disaster Recovery Plan (DRP).

In order to achieve alignment and consistency of the recovery plans, as well as to understand the interdependencies within the University, this document offers an IT disaster recovery framework which provides guidelines and minimum standards for IT disaster recovery planning. The framework also provides guidance on achieving the following:

- identifying and classifying applications for each faculty, school or division within the University;
- continuing the operation of critical applications in the event of a disruptive incident;
- understanding the criteria and triggers for invoking the IT DRP;
- ensuring that all staff understand their roles and responsibilities when a disruption occurs;
- ensuring that there is a clear understanding throughout the University of what accountabilities and responsibilities are in place during an interruption to "business as usual";
- understanding the documentation and procedures needed for IT disaster recovery planning and the University's expectations around these;
- educating and training staff on IT disaster recovery and exercising the IT disaster recovery plans.

1.2. Scope

IT disaster recovery planning is one part of the larger process associated with managing a disaster or incident. The scope of the IT Disaster Recovery Program described in this framework is limited to the applications supported by the University's Information Technology Services (ITS) division.

This framework does not cover emergency response procedures or business continuity management procedures. Please refer to individual faculties and divisions for details of business continuity and emergency response plans.

1.3. Triggers and Invocation Procedures

If a disruptive IT incident intensifies it can become a disaster, escalating the IT disaster recovery effort to a higher level within the University. ITS Incident Management processes will be used as a first response to an IT incident. If the incident cannot be resolved within a suitable timeframe, or the magnitude of the incident is much greater than expected, then IT Disaster Recovery should be activated.

Figure 1 - High Level IT DR Escalation (flowchart; elements shown: IT Incident; ITS Operations Team; DR Activation required?; Routine IT Incident; Initiate Disaster Recovery; IT Director; IT Disaster Recovery Manager; IT Disaster Recovery Plan; Outsourced DR Support?; IT DR Steering Committee; ITS (or Business) Technical Recovery Team; Application / Infrastructure Recovery Plans; Exercise Vendor Contracts; Technical Recovery Procedures)

1.4. Consider Assumptions

The University should consider the assumptions that are being made during the IT disaster recovery planning process. These assumptions should be clearly stated in the IT Disaster Recovery Plan so as to recognise the limitations of the plan. Assumptions should be communicated to and agreed by all stakeholders. For example, an assumption might be "this plan assumes that no more than 2 critical systems within the region are affected by the incident".

Individuals should also consider the following possibilities when assessing the impact of an incident:

- the maximum Recovery Time Objective (RTO) of a critical system may be exceeded;
- the incident has the potential to involve multiple departments across the University;
- remote sites may also be involved;
- there may be a prolonged impact on downstream dependencies;
- there may be increased public attention (including media) on the areas affected.

2 The IT Disaster Recovery Lifecycle

The IT Disaster Recovery Lifecycle is shown below, highlighting the key steps that should be taken in order to develop an effective IT Disaster Recovery Plan:

Figure 2 - IT Disaster Recovery Lifecycle Diagram (phases: Program Governance; Analyse; Develop; Implement; Change Management, Quality Assurance & Continuous Improvement. Elements shown: Current State Assessment; Availability / Recovery Strategies; Resource Acquisition; Application Criticality Analysis; Plans, Procedures and Activities; IT Disaster Recovery Plan; Application Recovery Plans; Infrastructure Recovery Plans; Training & Awareness; Infrastructure Risk Assessment; Validation; Training)

The remainder of this section explains each of the phases described in the lifecycle diagram above.

2.1. Program Governance

Program governance is the system by which the IT disaster recovery activities and strategies are driven, managed and directed within the University. It provides structure and consistency to the management of the disaster recovery plan and influences how the University will:

- set and achieve recovery objectives;
- assess and manage risks;
- control critical documents and align them with the broader governance framework;
- allocate roles and responsibilities and align the University's recovery initiatives;
- achieve optimal performance.

This framework will serve as the driver for governance, as it combines many elements of the IT disaster recovery policy. The IT disaster recovery governance should articulate and communicate the University's strategic approach toward such planning.

Good governance should also be embedded within system recovery thinking and approaches at each level of the University. Disaster recovery governance should be aligned with the University's wider corporate governance initiatives.

Formalising and communicating the roles and responsibilities of key IT disaster recovery stakeholders is a critical component of effective IT disaster recovery governance. Success of this framework therefore hinges on the wide recognition and acceptance of allocated IT disaster recovery roles and responsibilities. Section 3 defines the roles and responsibilities which should be allocated within the University and recommends the appropriate level of training, engagement and exercising that those positions should undertake.

2.2. Analyse

2.2.1 Objectives

The objectives of the analysis phase are to:

- analyse and understand the current state of disaster recovery plans in place across the University;
- identify and quantify the exposure levels of the University to key and prioritised risks;
- identify critical systems that are used to support the University's functions;
- identify manual workarounds or alternate working arrangements that are already in place in the event of a loss of a key system.

2.2.2 Current State Assessment

A current state assessment should be performed across the IT application environment to determine the level of IT DR maturity. The current state assessment should occur in conjunction with the Application Criticality Analysis, described in the next section. Where gaps are identified, these should be highlighted for remediation.

2.2.3 Application Criticality Analysis

In order to ensure IT disaster recovery requirements are met, the business needs to identify and classify the key systems that support the business operations by completing a Business Impact Analysis (BIA) annually or where significant change has occurred. This will form the basis for developing and maintaining a disaster recovery strategy that encompasses the critical applications and services across the University.

System Identification

A discovery exercise must be conducted with the different faculties and divisions in order to establish a prioritised list of systems and services that are being used. Meetings and workshops should be conducted with key representatives from each faculty or division who have a good working knowledge of the day-to-day operations of their business area. A toolkit has been provided to help capture the information required, which includes the following information:

- application / service name;
- business impact of an outage to the application over a period of time;
- threshold of acceptable data loss;
- known and approved manual workarounds;
- internal and external application support arrangements.

While only the application name is required for identification purposes, the remaining information helps with classification, which is covered in the next section.

If this exercise has already been performed, then the existing list of key systems should be used as a starting point in discussions with the faculties and divisions, with a view to validating the list of key systems and capturing any new systems that have been implemented. Refer to Appendix A: System Identification & Classification Toolkit.

System Classification (Application Criticality Analysis)

Once the list of key systems has been established, these systems must be classified according to their criticality. As part of the system identification exercise, the faculties and divisions should be stepped through a series of questions, including:

- Determining the impact-over-time caused by an outage. This captures the magnitude of the impact to the University and the escalating impact caused by a prolonged outage of the application. It provides the information necessary to determine the Recovery Time Objective (RTO) for each application.
- Determining the data loss acceptable as a result of an outage. This information should be used to determine the Recovery Point Objective (RPO) for each application.

The Recovery Time Objective (RTO) and the Recovery Point Objective (RPO) form the basis of the business recovery requirements for the application in terms of its recoverability and availability. Application outages that have the potential to cause a significant impact to the University need to be identified and appropriately planned for.
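As an illustration only, the step from an impact-over-time response to a criticality tier can be sketched in Python. The outage thresholds (4 hours, 1-2 days, 3-7 days) come from Table 1 of this framework. Everything else is an assumption made for the example: the impact scale other than "serious", the treatment of the thresholds as inclusive upper bounds, and the function names.

```python
from datetime import timedelta

# Assumed ordinal impact scale; the framework only names "serious".
IMPACT_LEVELS = ["negligible", "minor", "moderate", "serious", "severe"]

# Upper bounds from Table 1. Treating them as inclusive cut-offs is an assumption.
TIER_LIMITS = [
    (timedelta(hours=4), "Tier 1"),
    (timedelta(days=2), "Tier 2"),
    (timedelta(days=7), "Tier 3"),
]


def time_to_serious_impact(impact_over_time):
    """Return the shortest outage duration rated 'serious' or worse, or None."""
    threshold = IMPACT_LEVELS.index("serious")
    serious = [
        duration
        for duration, level in impact_over_time.items()
        if IMPACT_LEVELS.index(level) >= threshold
    ]
    return min(serious) if serious else None


def classify(impact_over_time):
    """Map an impact-over-time response to a tier (Tier 4 when no limit is met)."""
    onset = time_to_serious_impact(impact_over_time)
    if onset is None:
        return "Tier 4"
    for limit, tier in TIER_LIMITS:
        if onset <= limit:
            return tier
    return "Tier 4"


# Hypothetical response captured during a faculty workshop.
response = {
    timedelta(hours=1): "minor",
    timedelta(hours=4): "serious",
    timedelta(days=1): "severe",
}
print(classify(response))
```

The duration returned by `time_to_serious_impact` is also a natural starting point for the RTO discussion with the business, although the agreed RTO remains a business decision.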
Using the business requirements, the applications should be classified using the following model:

Classification / Definition * / Conditions Met

Tier 1 - Critical applications
  Definition: services must be restored as a high priority.
  Conditions met: causes at least a serious impact to the University after an outage that lasts for a period of 4 hours.

Tier 2 - Important applications
  Definition: services must be restored as soon as Tier 1 applications have been restored.
  Conditions met: causes at least a serious impact to the University after an outage that lasts for a period of 1-2 days.

Tier 3 - Convenient applications
  Definition: services can be stood down temporarily; services must be restored as soon as Tier 2 applications have been restored.
  Conditions met: causes at least a serious impact to the University after an outage that lasts for a period of 3-7 days.

Tier 4 - Non-essential applications
  Definition: services can be stood down for extended periods without significant impact; lowest priority in restoration order.
  Conditions met: all other applications, that either only cause a serious impact after an outage lasting greater than 3-7 days, or never cause a serious impact.

Table 1 - Application Criticality Classification

2.2.4 Infrastructure Risk Assessment

Once the key systems have been classified according to their criticality, it is important to identify the key supporting infrastructure services they depend on. The potential hazards and threats that can cause an application outage due to a disruption in these infrastructure services should also be identified. A number of threats can cause an outage, ranging from human error and sabotage to natural disasters.

In order to determine what risks can potentially cause an outage to the supporting IT infrastructure, the following steps must be performed:

1. Develop an application topology map;
2. Perform a Single-Point-of-Failure analysis.

Application Topology Maps

In order to develop effective system continuity and recovery strategies, each system should be assessed to determine what critical IT infrastructure is used to support its operation. An application topology map helps to provide an end-to-end view of the critical infrastructure that is required by a system to operate. The following information should be determined in order to build a suitable application topology map:

- Determine the hardware that is used to support the system, including:
  - physical and / or virtual servers, including application server(s), database server(s), web server(s), etc. that are used to run the application;
  - network links, including switches, cable / fibre capacity and pathways and any redundancies;
- Determine if there are any critical upstream dependencies (data feeds from other systems);
- Determine the location of the data centre(s) or server room(s) that house the hardware described above;
- Determine the support arrangements (including any third party support) available for the hardware. Where third parties are responsible for the support of any hardware, obtain any Service Level Agreements (SLAs) that exist.

Single-Point-of-Failure (SPoF) Analysis

Once the application topology has been constructed, it can be used to establish any high level weaknesses in the application design (not including any technical functionality weaknesses such as logical coding errors or business functional requirements).
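A single-point-of-failure check over an application topology map can be sketched as follows. This is a minimal illustration, assuming the topology is recorded as a mapping from each infrastructure role to the components that provide it. The role names follow the hardware categories listed above, while the component names and the function name are hypothetical.

```python
def single_points_of_failure(topology):
    """Return the roles that are served by fewer than two distinct components."""
    return sorted(
        role
        for role, components in topology.items()
        if len(set(components)) < 2
    )


# Hypothetical topology map for an illustrative application.
topology = {
    "application server": ["app-01", "app-02"],
    "database server": ["db-01"],
    "web server": ["web-01", "web-02"],
    "network link": ["fibre-path-a"],
    "data centre": ["dc-main", "dc-secondary"],
}

print(single_points_of_failure(topology))
# → ['database server', 'network link']
```

A count of components is only a first pass: two servers in the same rack or two links sharing one pathway would still pass this check, so the results should be reviewed against the physical locations and pathways captured in the topology map.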

Of particular importance in the context of availability is the existence of redundant hardware and redundant network pathways to provide continued system uptime in the event of an outage affecting one component of the hardware.

2.3. Develop

2.3.1 Objectives

The objectives of the Development phase of the IT Disaster Recovery Framework are to:

- select acceptable continuity and recovery strategies to address the key risks identified;
- minimise the impact and duration of disruptions to services that critical systems and their key dependencies deliver;
- document recovery plans for key systems;
- provide a suitable mechanism to re-establish normal business as usual (BAU) operations;
- ensure that all personnel are aware of their roles and responsibilities both during and after an incident.

2.3.2 Availability / Recovery Strategies

The availability and recovery strategies should be developed based on the business requirements established in the Analysis phase. Specifically, the RTO and the RPO should be assessed in the context of the existing or proposed strategy to determine whether they are achievable.

The RTO is the timeframe, agreed between the business and ITS, within which the application must be recovered following an outage. If any of the individual components of the application (servers, databases, and network components) cannot be recovered within this period, then new strategies should be developed.

The backup strategy employed for each application should be determined by the RPO established by the business and ITS. The amount of data loss that can be afforded should drive the type and frequency of data backups required.

When formulating these strategies, consideration should be given to their suitability, both financially and practically. It is therefore important for each strategy to be endorsed and signed off by an executive within the University. Alternatively, the executive can be presented with a number of strategies from which they select the preferred option. This method is recommended when strategies result in high exposure (financial, legal, reputation, etc.) to the University or to the industry as a whole.

A number of recovery strategies can be selected for each application, for example: recover from virtual infrastructure, recover from tape backups, or rebuild the infrastructure and reinstall the application. In some cases it may be most effective to do nothing and wait for the disaster event to pass. These decisions will need to be made by the IT DR Team at the time.
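The RTO and RPO checks described in this section can be sketched as a simple comparison. This is an illustrative sketch rather than a prescribed method. It assumes that each component's estimated recovery time is compared individually against the RTO, and that the worst-case data loss equals the interval between backups. The class, function and component names are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryRequirements:
    rto: timedelta  # Recovery Time Objective agreed between the business and ITS
    rpo: timedelta  # Recovery Point Objective (acceptable data loss)


def strategy_gaps(requirements, component_recovery_times, backup_interval):
    """Return the names of the items that fail the RTO or RPO check.

    Assumptions: components are checked one by one against the RTO, and the
    worst-case data loss is taken to be the interval between backups.
    """
    gaps = [
        name
        for name, recovery_time in component_recovery_times.items()
        if recovery_time > requirements.rto
    ]
    if backup_interval > requirements.rpo:
        gaps.append("backup schedule")
    return gaps


# Hypothetical figures for an illustrative application.
requirements = RecoveryRequirements(rto=timedelta(hours=4), rpo=timedelta(hours=1))
estimates = {
    "servers": timedelta(hours=2),
    "database": timedelta(hours=6),
    "network": timedelta(hours=1),
}
print(strategy_gaps(requirements, estimates, backup_interval=timedelta(hours=24)))
# → ['database', 'backup schedule']
```

Any item returned indicates that the existing or proposed strategy does not meet the business requirement, so either a new strategy should be developed or the requirement revisited with the business.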

2.3.3 IT Disaster Recovery Plan

The IT Disaster Recovery Plan (IT DRP) will provide a holistic view of the IT environment and how it supports the critical applications used by the University. It will provide strategies and guidance for the recovery of the underlying infrastructure, including the data centres, servers, data storage, network links and infrastructure applications (e.g. Active Directory, LDAP), based on the business recovery requirements.

The following information should be provided within the IT DRP:

- definition of a disaster and triggers for consideration of when the IT DRP should be invoked;
- activation and escalation procedures;
- roles and responsibilities during the recovery;
- communication and escalation processes when a plan is invoked;
- internal and external communication strategies (usually sourced from a communications plan);
- short term workarounds and alternate working procedures;
- key contact details for emergency services, all relevant staff involved in the recovery of IT and external parties involved in the recovery of IT;
- critical dependencies;
- recovery assumptions.

The IT DRP will serve as a master plan encompassing the key services and components of the IT environment at the University. However, should an outage only affect a single application, there will be a suite of application specific recovery plans accompanying the IT DRP for all applications deemed critical to the University. These are covered in the next section.

2.3.4 Application Recovery Plans

While the IT Disaster Recovery Plan coordinates the overall recovery process, a major component of recovering the University's critical operational ability lies with the successful recovery of individual applications. Each application that has been identified as critical must have an Application Recovery Plan (ARP) that includes the steps needed to successfully recover the application.
The ARP should also include:

- recovery of specific hardware / infrastructure (some may be referred on to a dedicated Infrastructure Recovery Plan);
- recovery of software files;
- recovery of application specific data (from databases or other storage means), using the available backup strategies;
- roles and responsibilities for recovery;
- short term workarounds and alternate working procedures;
- key contact details for all relevant staff involved in the recovery of IT and external parties involved in the recovery of IT.

2.3.5 Infrastructure Recovery Plans

The critical infrastructure services identified will have a dedicated Infrastructure Recovery Plan (IRP) to plan and coordinate the recovery of the service. The IRP should also include:

- recovery of specific hardware / infrastructure (some may be referred on to another Infrastructure Recovery Plan);
- recovery of software files;
- recovery of service specific data, using the available backup strategies;
- roles and responsibilities for recovery;
- short term workarounds and alternate working procedures;
- key contact details for all relevant staff involved in the recovery of IT and external parties involved in the recovery of IT.

2.4. Implement

2.4.1 Objectives

The objectives of the Implementation phase of the IT Disaster Recovery Framework are to implement:

- the IT Disaster Recovery plan, including potential acquisitions and site rollouts;
- IT Disaster Recovery training for all staff through a comprehensive training plan and schedule;
- regular exercising of the IT Disaster Recovery plans, based on an existing exercising plan and schedule.

2.4.2 Implementation

This section is focussed on rolling out the IT Disaster Recovery Plan and associated Application Recovery Plans that have been developed. The information and strategies contained within each plan must first be validated by its associated recovery team to ensure they are realistic, factually accurate and fit for purpose. A business representative must also validate the plan to ensure that it meets their recovery requirements.

Once plans have been validated, they must be ratified by the University and the IT Disaster Recovery Owner (Information Technology Director).
Following official sign-off, copies of the plans must be provided to the following people and groups:

- IT Disaster Recovery Steering Committee
- IT Disaster Recovery Manager
- IT Disaster Recovery Team(s) and Technical Recovery Teams
- IT DR Owner
- Other key stakeholders, as determined by the IT DR Owner and Steering Committee

The final version of the IT DRP must also be stored in TRIM (the University's document and records management / storage system) and on the University's file server.

2.4.3 Training

Appropriate IT disaster recovery training programs should be developed and implemented to ensure that each staff member with roles and responsibilities assigned during a disaster response has the required knowledge and capability.

The development of a continuity verification process is essential to ensure that employees are familiar with the measures implemented and that they are confident and competent in their use. A training and testing regime also ensures that the dependent resources and faculties / divisions that support the business recovery strategies are aligned with the plans in place.

Training can be categorised into the following areas:

- awareness - aimed at providing a cross-section of staff with a general understanding of the subject;
- team training - aimed at providing a greater level of understanding for individuals who have a team role;
- coaching - aimed at providing key team appointments with targeted training for their respective roles and responsibilities, including media handling.

A training schedule should be established to provide initial training to all staff who may be called upon during a disaster or be involved in continuity planning. Training may involve participation in components of IT DRP testing.

2.4.4 Testing and Exercising

The development of an effective testing process is essential to ensure that staff are familiar with the recovery measures implemented and that procedures are up to date and relevant. The IT DRP should be tested on an annual basis or after any major updates to the technical environment.
IT Disaster Recovery testing can consist of the following approaches:

- Table Top - This exercise involves the owner and a subset of users of the plan reading over the plan in detail to ensure that the information it contains remains factually accurate and should, in theory, continue to provide effective recovery.
- Walkthrough - This exercise chronologically steps the recovery team through the process for responding to and managing a crisis using the plans and tools specific to the University. It is aimed at increasing confidence in the use of the plans and the operation of the team during a crisis.
- Isolated Simulation - This exercise involves the live activation of the teams and plans using a realistic, hypothetical scenario limited to a specific application and / or associated infrastructure. Exercise participants respond to and manage the incident using the IT DRP and any relevant ARPs.
- Integrated Simulation - This exercise involves the live activation of teams and plans using a realistic, hypothetical scenario involving multiple applications and / or associated infrastructure, to test the ability to restore each application within its business requirements when there is an outage involving multiple applications. Exercise participants respond to and manage the incident using the IT DRP and any relevant ARPs.
- Full Simulation - This exercise is the most robust examination of the team and plans. It involves the live activation of teams across more than one level of the organisation using a realistic, hypothetical scenario covering all critical applications. Teams are activated to manage the incident using the plan and tools specific to the organisation. This exercise is usually incorporated with a Business Continuity or Crisis Management exercise.

Testing Schedule

A testing schedule should be developed and consist of a mixture of the types of testing outlined above. The following table describes the minimum standard for the frequency of tests.

Frequency of Testing Across Application Tiers

Test Type Tier 1 Tier 2 Tier 3 Tier 4 Table Top After initial plan development; when significant changes to the content occur; when changes to the University's response or organisational structure occur Walkthrough Twice a year Annual Annual Isolated Simulation Integrated Simulation Full Simulation Once every 2 years Annual Annual No minimum standard At the discretion of the Executive No minimum standard

Table 2 - Disaster Recovery Test Types & Frequency

Note that when an integrated or full simulation is performed over an application, an isolated simulation need not be performed in the same year, as that application's ARP will have been tested already.

It is important to be aware of the costs that an interruption to the University can have, and careful strategising and consideration must be undertaken when planning each test exercise.

Test Document Requirements

The following documents should be maintained as part of each disaster recovery test:

1. Test Notification - Communication to notify appropriate staff of DR testing, including the University faculties, divisions or staff involved, and the date, type, locations and applications involved.
2. Test Scope - Provides the background, objectives, application and / or infrastructure scope, risks, issues, assumptions and reporting guidelines.
3. Test Script - This is the actual test plan, containing test objectives, test steps, expected results, actual results, testing staff involved, business owner validation and sign-off.
4. Test Debrief Report - Immediately following each test, a written debrief report should be produced, outlining the overall outcomes of the test, the lessons learnt, the areas for improvement and the action items resulting from the test.

Templates for the above documents are provided with the IT DRP.

2.5. Continuous Improvement

The primary goal of the Continuous Improvement phase is to make the IT Disaster Recovery Program at the University self-sufficient and sustainable. A formalised continuous-renewal and review feedback cycle should be developed to help ingrain the processes within the University and ensure the program adapts and grows with the organisation.

Continuous Improvement can be achieved by implementing rigorous processes around the management of the program, such as:

- change management of the IT Disaster Recovery program and plans;
- effective testing and update schedules for plans;
- ensuring compliance with industry better practice by performing external reviews of the IT Disaster Recovery program.

As iterations over the lifecycle of the IT Disaster Recovery program are made, areas for improvement will become clear. It is essential that these improvements are captured and explored sufficiently. If changes are required to the program and / or the IT DRP, they should be validated to ensure the IT DRP remains effective and meets or exceeds the business recovery requirements.

3 Roles and Responsibilities

Formalising the roles and responsibilities of key stakeholders throughout each level of the University is a critical component of achieving effective IT disaster recovery. The entire IT disaster recovery strategy therefore relies on clear definition, allocation and acceptance of individual roles and responsibilities.

The following roles are examples of good practice for an organisation of the size and complexity of Flinders University. It is mandatory that these are adopted in a consistent manner across the University.

IT Disaster Recovery Manager; IT Disaster Recovery Steering Committee; IT DR Owner; IT Disaster Recovery Coordinator; IT Disaster Recovery Team

Table 3 - Role Relationships

3.1. Role Descriptions

This section describes the responsibilities associated with the key roles in place for IT Disaster Recovery at Flinders University, namely the IT Disaster Recovery Manager, IT Disaster Recovery Coordinator, IT Disaster Recovery Technical Team Member, IT Disaster Recovery Steering Committee and IT DR Owner.

3.1.1 IT Disaster Recovery Manager

The IT Disaster Recovery Manager is a key role with ownership of and accountability for the IT Disaster Recovery Program. The key responsibilities of the IT Disaster Recovery Manager include:

- obtaining Executive endorsement for the creation of, and significant amendments made to, the IT Disaster Recovery framework (this document);
- directing the IT Disaster Recovery training and exercising program and schedule;
- supervising an annual review of the IT Disaster Recovery process for consistency across all levels;
- supervising and/or being involved in IT Disaster Recovery exercises;
- collating results from IT disaster recovery exercises and providing a summary of those results to the Executive;
- creating and delivering the IT Disaster Recovery training and exercising programs and schedules;
- ensuring annual reviews of IT Disaster Recovery plans are undertaken;
- reporting to and advising the IT Disaster Recovery Steering Committee;
- reporting to the Audit Committee.

3.1.2 IT Disaster Recovery Coordinator

An IT Disaster Recovery Coordinator represents each school and department which participates in the IT disaster recovery plan and/or houses a critical system. The key responsibilities of an IT Disaster Recovery Coordinator include:

- participating in IT disaster recovery training as arranged by the IT Disaster Recovery Coordinator;
- assisting in the planning and execution of the testing and exercising of the plans;
- reporting the testing results to the IT Disaster Recovery Manager;
- participating in an annual review of the IT disaster recovery plan.

3.1.3 IT Disaster Recovery Technical Team Member

The IT Disaster Recovery Technical Team (DRTT) is composed of technical experts (usually ITS staff) who will execute the procedures required, as directed by the IT Disaster Recovery Plan and associated Application Recovery Plans. The IT DRTT will typically consist of staff with skills and knowledge of the applications in scope for recovery, their supporting infrastructure and the backup strategies in place. The responsibilities of the IT DRTT are to:

- be familiar with the IT DRP and ARPs related to the application(s) within scope for recovery;
- execute the procedures required to restore the application(s) in a timely manner and use their best efforts to meet the business recovery requirements;
- communicate regularly with the Disaster Recovery Coordinator and the Disaster Recovery Manager to keep them updated on the progress of recovery.

3.1.4 IT Disaster Recovery Steering Committee

The IT Disaster Recovery Steering Committee represents all of the schools and departments within the University. The purpose of the committee is to:

- make consensus decisions on the IT Disaster Recovery Framework;
- make recommendations to the University on the implementation of IT disaster recovery across the University;
- provide feedback and lessons learnt on completion of any testing or exercising.

3.1.5 IT DR Owner

The IT DR Owner is ultimately responsible and accountable for the effectiveness of the IT DR Program. The purpose of this role is to:

- provide regular oversight and direction to the IT DR Manager and the IT DR Steering Committee;
- make decisions on funding for projects relating to IT DR;
- review reports on DR tests and incidents to monitor the effectiveness of the IT DR Program.

3.2. Role Assignment

The IT Disaster Recovery roles mentioned in section 3.1 Role Descriptions will be integrated with existing roles at Flinders University.

IT Disaster Recovery (DR) Role: IT DR Owner
Assigned to:
- Director, Information Technology Services

IT Disaster Recovery (DR) Role: IT DR Manager
Assigned to:
- Associate Director, Infrastructure Services

IT Disaster Recovery (DR) Role: IT DR Coordinator(s)
Assigned to:
- Application dependent

IT Disaster Recovery (DR) Role: IT DR Team
Assigned to:
- Associate Director, Infrastructure Services

IT Disaster Recovery (DR) Role: IT DR Technical Recovery Team
Assigned to:
- Associate Director, Application Services
- Associate Director, Client Services
- Manager, ITS Security
- Technical Representatives:
  o Windows Server Team
  o Linux Server Team
  o Backup Team
- Associate Director, Infrastructure Services
- Backup Operator
- Various ITS staff, as required by application
- Application Technical Manager (if relevant)
- Application DBA (if relevant)

IT Disaster Recovery (DR) Role: IT DR Steering Committee
Assigned to:
- Director, Information Technology Services
- Associate Director, Infrastructure Services
- Manager, ICT Security
- Non ITS Representatives:
  o Director, Educational ICT
  o Director, Student Administration and Systems

Table 4 - IT DR Role Assignment

4 Disaster Recovery Artefact Management

As part of the University's Disaster Recovery Program, a number of documents, templates and toolkits ("artefacts") need to be readily available and easily accessible. The following table lists the key artefacts and their locations.

Artefact: IT Disaster Recovery Policy
Purpose: Provides guiding policy statements and direction for the IT Disaster Recovery Program.
Location(s): V:\ITS Community\IT Disaster Recovery Plans

Artefact: IT Disaster Recovery Framework
Purpose: Provides high level descriptions of the IT Disaster Recovery Program, the methodology and the minimum standards.
Location(s): V:\ITS Community\IT Disaster Recovery Plans

Artefact: IT Disaster Recovery Plan (DRP)
Purpose: This is the master IT Disaster Recovery Plan, documenting the recovery of key components of the IT environment.
Location(s): V:\ITS Community\IT Disaster Recovery Plans

Artefact: Application Recovery Plans (ARPs)
Purpose: These documents support the IT DRP by providing specific recovery information for critical applications.
Location(s): V:\ITS Community\IT Disaster Recovery Plans

Artefact: Application Identification & Classification Toolkit
Purpose: Template for conducting workshops with the faculties and divisions of the University to determine the critical applications required by the business.
Location(s): V:\ITS Community\IT Disaster Recovery Plans

Artefact: Disaster Recovery Testing Schedule
Purpose: Schedule for periodic testing of the IT Disaster Recovery Plan and Application Recovery Plans for critical applications.
Location(s): V:\ITS Community\IT Disaster Recovery Plans

Table 5 - IT Disaster Recovery Artefacts

5 Definition of Terminology

Terminology: Application Recovery Plan (ARP)
Definition: A plan which documents the recovery of elements specific to an application including, where relevant, any supporting infrastructure. ARPs are to be used in conjunction with the IT Disaster Recovery Plan.

Terminology: Critical Application
Definition: An application that has been classified in the highest priority recovery tier, based on an analysis of the business requirements and dependency on this application.

Terminology: Exercising and testing
Definition: Terms used interchangeably within this framework to refer to activities that are structured to practice the implementation of IT Disaster Recovery during a simulated outage.

Terminology: Governance
Definition: The system by which BCM is directed and managed within an organisation, including the establishment of clear links between corporate governance, risk management, compliance and assurance.

Terminology: Interdependencies
Definition: In the context of applications, this refers to information flows between applications.

Terminology: IT Disaster Recovery Plan (DRP)
Definition: Plan which documents the recovery of the general IT environment provided by Flinders University's Information Technology Services (ITS) division, including network and communications systems, and addresses each key application and the corresponding infrastructure via the Application Recovery Plans.

Terminology: Maximum Acceptable Outage (MAO)
Definition: The point in time after which the impacts of a business process outage become unacceptable. Impacts could become unacceptable because of financial losses, operational disruption, regulatory obligations, reputation damage or other reasons.

Terminology: Outage
Definition: The inability to perform a particular business function, process or service for whatever reason.

Terminology: Recovery Point Objective (RPO)
Definition: The time at which data loss pertaining to a critical business application becomes intolerable due to significant impact to the University.

Terminology: Recovery Time Objective (RTO)
Definition: The time by which a critical business process or function must be resumed in order to ensure the viability of on-going business operations; the RTO must be equal to or less than the MAO. An RTO may be thought of as acceptable downtime.

Terminology: Risk Assessment
Definition: Involves gaining an understanding of the risks faced by an organisation from a business continuity perspective. This includes all aspects of operations from electricity supply to human resources and payroll.

Terminology: The University
Definition: Refers to Flinders University, Bedford Campus.

Table 6 - Terminology
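The constraint stated in the RTO definition (the RTO must be equal to or less than the MAO) lends itself to an automated check wherever recovery requirements are recorded per application. The following is a minimal Python sketch under stated assumptions: the `RecoveryRequirement` class, the function names and the sample applications and durations are all hypothetical and do not come from the University's plans.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List


@dataclass(frozen=True)
class RecoveryRequirement:
    """Hypothetical record of one application's recovery requirements."""
    application: str
    rto: timedelta  # Recovery Time Objective
    mao: timedelta  # Maximum Acceptable Outage


def rto_within_mao(req: RecoveryRequirement) -> bool:
    """Check the framework's rule that the RTO must not exceed the MAO."""
    return req.rto <= req.mao


def find_violations(reqs: List[RecoveryRequirement]) -> List[str]:
    """Return the names of applications whose RTO exceeds their MAO."""
    return [r.application for r in reqs if not rto_within_mao(r)]


# Illustrative values only; real figures come from the business analysis.
requirements = [
    RecoveryRequirement("Example App A", rto=timedelta(hours=4), mao=timedelta(hours=8)),
    RecoveryRequirement("Example App B", rto=timedelta(hours=24), mao=timedelta(hours=12)),
]
print(find_violations(requirements))  # ['Example App B']
```

A check of this kind could be run whenever an ARP is created or revised, so that an RTO longer than the MAO is flagged before the plan is signed off.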
