DELIVERABLE D2.3
Integrated report on the link between Risk Assessment and Contingency Planning Methodologies

Deliverable: D 2.3 Integrated report on the link between RA and CP
Version:
Seventh Framework Programme
Theme ICT-SEC-2007-7.0-01
Project Acronym: EURACOM
Project Full Title: European Risk Assessment and Contingency Planning Methodologies for interconnected networks
Grant Agreement: 225579
Coordinator: EOS
Table of Contents
1 Introduction
  1.1 Context of EURACOM
  1.2 WP2 Deliverables
  1.3 WP 2.3 Objectives
  1.4 Links of WP2.3 with other EURACOM deliverables
  1.5 Structure of the document
  1.6 Acronyms
2 Analysis of links between available Risk Assessment and Contingency Planning methodologies
  2.1 Objectives of the section
  2.2 Relationship between Risk Assessment & Contingency Planning
    2.2.1 The preparation loop: from RA to CP
    2.2.2 The lessons learnt loop
    2.2.3 The relationship at a glance
3 Founding principles of the approaches
  3.1 The Scope of applicability of the approaches
  3.2 Glossary of Terms and Risk Management Concepts
    3.2.1 Definition of terms
    3.2.2 Impact for the combined structure of Risk Assessment and Contingency Planning approaches
    3.2.3 Towards a holistic, combined, all-hazards approach
4 Risk Assessment
  4.1 EURACOM WP 2.1 Desktop Study
  4.2 Methodology for Holistic Risk Assessment
    4.2.1 Overview of Structure
    4.2.2 Introduction to Holistic Risk Management
    4.2.3 Methodology description
      4.2.3.1 STEP 1: Constitute the Holistic Risk Assessment Team
      4.2.3.2 STEP 2: Define the scope of the Risk Assessment
      4.2.3.3 STEP 3: Define the scales for risk evaluation
      4.2.3.4 STEP 4: Understand the assets in the scope
      4.2.3.5 STEP 5: Understand the threats
      4.2.3.6 STEP 6: Review security and Identify vulnerabilities
      4.2.3.7 STEP 7: Evaluate the associated risks
    4.2.4 Maintenance of the risk assessment
  4.3 The implementation of EURAM within the Energy Sector
    4.3.1 Electricity Transmission
    4.3.2 Gas Transmission
    4.3.3 Oil Transmission
5 Contingency Planning
  5.1 Introduction
  5.2 EURACOM WP 2.2 Desktop Study
  5.3 The Contingency Planning Approach at a glance
  5.4 Preparation Phase
    5.4.1 The Objectives and scope
    5.4.2 Organisation for Contingency Planning
    5.4.3 Risk Mitigation Strategy Setting
    5.4.4 Implementation of Prevention and Protection measures
    5.4.5 Implementation of Response and Recovery measures
      5.4.5.1 Approach - Scenarios selection
      5.4.5.2 Continuity of Supply objectives
      5.4.5.3 Derive Supply Continuity Objectives in the infrastructure
      5.4.5.4 Define possible strategies to meet the Supply Continuity Objectives
      5.4.5.5 Selection of strategies
      5.4.5.6 Implementation of Response and Recovery Measures: the contingency plan
      5.4.5.7 Supporting data: the key elements of a Contingency Plan
        5.4.5.7.1 Incident Management
        5.4.5.7.2 Crisis Management
        5.4.5.7.3 Business Continuity Management
        5.4.5.7.4 Disaster Recovery Management
  5.5 The Test, Exercise & Training Phase
    5.5.1 Contingency Planning Training
    5.5.2 Test the Contingency Plan
    5.5.3 Contingency Exercises
  5.6 The Maintenance Phase
    5.6.1 Contingency Planning Maintenance
    5.6.2 Lessons Learnt
6 The EURACOM Combined Risk Assessment and Contingency Planning Approach
  6.1 The preparation loop
  6.2 The lessons learnt loop
7 Managing Dependencies of the energy sector in Risk Assessment and Contingency planning
  7.1 Introduction
  7.2 Managing dependencies in risk assessment (EURAM)
    7.2.1 Defining the scope of the analysis and the risk assessment team
    7.2.2 Identifying vulnerabilities stemming from interdependency situations within a wider scope
    7.2.3 Evaluating (inter)dependency risks
  7.3 Managing dependencies in contingency planning
    7.3.1 Preparation Phase
      7.3.1.1 The Objectives and scope
      7.3.1.2 Organisation for Contingency Planning
      7.3.1.3 Risk Mitigation Strategy Setting
      7.3.1.4 Implementation of Prevention and Protection measures
      7.3.1.5 Implementation of Response and Recovery measures
    7.3.2 Test Exercise and Training Phase
      7.3.2.1 Contingency Planning Training
      7.3.2.2 Test the Contingency Plan
      7.3.2.3 Contingency Exercises
    7.3.3 Maintenance Phase
      7.3.3.1 Contingency Planning Maintenance
      7.3.3.2 Lessons Learnt
      7.3.3.3 Monitoring and Information Sharing
  7.4 Current Framework for Operational Practices
8 Conclusion
Table of Figures
Figure 1: Structure of the EURACOM project
Figure 2: Interactions between Risk Assessment and Contingency Planning
Figure 3: Interactions between Risk Assessment and Contingency Planning
Figure 4: The energy networks analysis framework
Figure 5: Energy players from organisational to international
Figure 6: High level overview of the Supply Chain
Figure 7: Focus of RA and CP approach
Figure 8: The collection of the Risk Management processes
Figure 9: The EURAM 7 step Risk Assessment approach
Figure 10: Holistic Risk Assessment Team
Figure 11: Impact Scales for Electricity Transmission
Figure 12: Probability Scales for Electricity Transmission
Figure 13: Common Threats to Electricity Transmission operators
Figure 14: Holistic Risk Assessment Team
Figure 15: Impact Scales for Gas Transmission
Figure 16: Probability Scales for Gas Transmission
Figure 17: Common Threats to Gas Transmission operators
Figure 18: Holistic Risk Assessment Team
Figure 19: Impact Scales for Oil Transmission
Figure 20: Probability Scales for Oil Transmission
Figure 21: Common Threats to Oil Transmission operators
Figure 22: The Contingency Planning 3 Phase Approach
Figure 23: Contingency Planning Preparation Phase structure
Figure 24: Risk Assessment Scale
Figure 25: Role vs. Contribution Internal Matrix
Figure 26: Role vs. Contribution External Matrix
Figure 27: Continuity Objective Profile
Figure 28: Continuity Objective
Figure 29: The test, exercise and training phase
Figure 30: The maintenance phase
Figure 31: The Combined Approach Taken by EURACOM to Risk Assessment and Contingency Planning
Figure 32: The preparation loop
Figure 33: The lessons learnt loop
Figure 34: The High Level Analysis of Risk Assessment and Contingency Planning
Figure 35: A unique severity scale for multi-stakeholders scopes
Figure 36: A unique severity scale for multi-stakeholders scopes
1 Introduction

1.1 Context of EURACOM
The objective of EURACOM is to identify, together with European Critical Energy Infrastructure operators, a common and holistic approach (end-to-end energy supply chain) for risk assessment and contingency planning methods. This is to facilitate the establishment of appropriate levels of resilience within critical energy services across the whole (end-to-end) energy infrastructure chain.

EURACOM's activities to define common risk assessment and contingency planning methodologies will build upon the EURAM project results: the concepts of the EURAM methodology will be developed further specifically for the energy sector.

The objective is to create more resilient energy infrastructures by developing methodologies and tools that assure a dialogue, sharing of data and close co-operation between energy operators, large energy users, security solution suppliers, administrations, regulatory bodies, and other stakeholders. This approach requires common methodologies all along the value chain, from production to distribution. It also requires common methodologies at different hierarchical levels, from individual companies up to European level.

EURACOM has to cover all hazards applicable to the energy sector, including threats from natural causes, human intent, technical failure, human failure, dependencies on other Critical Infrastructures and other dependencies.

In the development of the EURACOM project, it became apparent that methodological solutions and supporting tools should be developed in close cooperation with European Critical Energy Infrastructure operators. The EURACOM project has been structured accordingly. In order to develop the methodology and supporting tools, the structure depicted in Figure 1 is applied to the EURACOM project.
Figure 1: Structure of the EURACOM project

1.2 WP2 Deliverables
The role of Work Package 2 (WP2) in EURACOM is the identification of a common and holistic approach for risk assessment and contingency planning. WP2 has three deliverables:
- Deliverable 2.1: Concerns the analysis of available Risk Assessment approaches to identify good practices from several domains including security industry, national guidance and energy standards.
- Deliverable 2.2: Concerns the analysis of Contingency Planning approaches to identify good practices from several domains including security industry, national guidance and energy standards.
- Deliverable 2.3 (This Report): Concerns the analysis of the commonly accepted links between Risk Assessment and Contingency Planning practices and the creation of Risk Assessment and Contingency Planning approaches which can be combined and are clearly targeted to the energy sector.
1.3 WP 2.3 Objectives
The objective of Work Package 2.3 (WP 2.3) is described in the EURACOM Description of Work document:

A 1.2 Project Summary: Abstract:
  EURACOM objective is to identify, together with EU Energy Infrastructure Operators, a holistic approach (end-to-end energy supply train: from fuel transport, power generation and transmission) for risk assessment and contingency planning solutions

The scope also includes the requirement to report on the link between Risk Assessment and Contingency Planning Methodologies.

In addition to the analysis of the link between Risk Assessment and Contingency Planning methodologies, this report delivers a combined and holistic approach to Risk Assessment and Contingency Planning in a format that can be used as a framework for implementation by the Energy sector operators. These main additional aspects are delivered through:
1. the creation of a risk assessment approach and of a contingency planning approach to be implemented at operator (i.e. organisation) level, and
2. recommendations, in the last section of the document, on how risk assessment and contingency planning processes can be implemented and supported at a higher level of analysis (i.e. on the scope of interconnected energy infrastructures involving many operators).

1.4 Links of WP2.3 with other EURACOM deliverables
This deliverable D2.3 has relationships with several other EURACOM deliverables:
- D1.1 "Generic system architecture with relevant functionalities for hazard identification": D1.1 models and describes the Energy environment in which the approaches described in the present deliverable will be applicable. As such, D1.1 is a major input in defining the scope of D2.3 as described in section 3.1.
- D2.1 "Common Areas of Risk Assessment Methodologies": D2.1, after presenting the results of the analysis performed in a desktop study of available Risk Assessment approaches, provides conclusions about the good practices of the discipline.
  These good practices will be used in order to develop the EURACOM Risk Assessment approach as described in section 4.1.
- D2.2 "Common Areas of Contingency Planning Methodologies": D2.2, after presenting the results of the analysis performed in a desktop study of available Contingency Planning approaches, provides conclusions about the good practices of the discipline. These good practices will be used in order to develop the EURACOM Contingency Planning approach as described in section 5.3. D2.1 and D2.2 are also combined in order to feed into the analysis of the links between Risk Assessment and Contingency Planning practices as described in section 2.
- D6.3 "Update and validation of used Risk Assessment and Contingency Planning methodologies": D6.3 will contain the final version of the EURACOM Risk Assessment and Contingency Planning methodologies. These methodologies will have evolved from the approaches of D2.3 thanks to the input of the case studies (WP4) and associated workshops (WP5). This will help in refining the approach and in particular in tailoring it to the needs of the Energy sector.

1.5 Structure of the document
The document is broken down into several sections:
- Section 2 provides an analysis of the links between available Risk Assessment and Contingency Planning methodologies. It looks in particular at the way the two processes rely on one another.
- Section 3 presents the founding principles of the approaches by first presenting the scope of applicability they are designed for, by positioning them against other concepts in a wider Risk Management perspective and by providing some of the key characteristics they will have to comply with.
- Section 4 presents the results of our work to propose a Risk Assessment approach to the energy sector.
- Section 5 presents the results of our work to propose a Contingency Planning approach to the energy sector.
- Section 6 summarises how the Risk Assessment approach and the Contingency Planning approach interact to deliver the EURACOM Combined Risk Assessment and Contingency Planning Approach.
- Section 7 presents recommendations to allow the implementation of the EURACOM approaches at a higher level of analysis (i.e. above the single operator level) for managing dependencies of the energy sector in Risk Assessment and Contingency Planning.
1.6 Acronyms
BCM     Business Continuity Management
BCP     Business Continuity Planning (or Plan)
BIA     Business Impact Analysis
CI      Critical Infrastructure
CIP     Critical Infrastructure Protection
CM      Crisis Management
CP      Contingency Planning (or Plan)
DR      Disaster Recovery
EU      European Union
ICT     Information & Communication Technologies
IM      Incident Management
IPOCM   Incident Preparedness and Operational Continuity Management
KPI     Key Performance Indicator
OR      Organisational Resilience
PDCA    Plan Do Check Act
PM      Project Management
RA      Risk Assessment
RAM     Risk Assessment Methodology
RM      Risk Management
TSO     Transmission System Operator
2 Analysis of links between available Risk Assessment and Contingency Planning methodologies

2.1 Objectives of the section
The objective of this section is to identify the links and the interactions between Risk Assessment and Contingency Planning. The scope of this section satisfies the EURACOM Description of Work requirement to report on the link between Risk Assessment and Contingency Planning Methodologies.

2.2 Relationship between Risk Assessment & Contingency Planning
Risk Assessment and Contingency Planning are both key elements within an organisation's Risk Management process. They are essential in the effort to ensure that risks are identified, prevented and treated and that, where risk mitigation is not feasible, processes are created and implemented to manage incidents should they occur.

The EURACOM deliverable D2.1 (Common Areas of Risk Assessment methodologies) highlighted, through the absence of the topic, that the majority of Risk Assessment standards and methodologies make little, if any, reference to Contingency Planning processes, even though Contingency Planning processes rely on a clear evaluation of the business impact of adverse events. The EURAM methodology is an exception in this respect, as it discusses contingency and includes scenario based contingency workshops within the methodology itself.

In contrast, the findings of deliverable D2.2 (Contingency Planning Methodologies and Business Continuity), in its section 2.4 "Relation to Risk Management", showed that the majority of the Business Continuity and Contingency Planning Standards and Guidelines include some element of Risk (and Vulnerability) Analysis (and also Business Impact Analysis) within the structure of their framework. D2.2 also identified that, although this Risk Assessment link/requirement is included within most of the Standards and Guidelines, its depth is very limited, with little or no granularity.
However, there are some links between the two sets of practices, even if those are not translated into standards. This section aims at clarifying what these links are and therefore provides objectives for the development of the Risk Assessment and Contingency Planning sections developed later in this document. Please see Figure 8: The collection of the Risk Management processes, for a high level overview of where the two processes reside within the Risk Management process.
2.2.1 The preparation loop: from RA to CP
As the essential underpinning element of the Risk Management process, Risk Assessment is the initial process used to assess the potential impact and the likelihood of threats exploiting vulnerabilities. The Risk Assessment process, along with the Business Impact Analysis, therefore provides an organisation with the information required to address risk (Treat, Avoid or Accept) in line with the organisation's Risk Management objectives.

The Contingency Planning process receives the majority of its input from the Risk Assessment process (including the Business Impact Analysis). An organisation uses Contingency Planning to prevent incidents by implementing formal protective controls, and to minimise the effect of an incident by creating appropriate response and recovery processes.

2.2.2 The lessons learnt loop
The lessons learnt from Contingency Planning exercises, tests or incidents (within the organisation or more widely within the energy sector) feed back into the organisation's Risk Management Life Cycle. Following this input, maintenance actions can take place at multiple levels:
- First, previous Risk Assessments may need to be re-evaluated in order to integrate new data about risk elements, through a better understanding of vulnerabilities, a finer evaluation of threats or a better appreciation of the actual chain of reaction that would cause the ultimate impact on the organisation;
- Then, the mitigation controls may be reviewed in order to cover the gaps identified by the lessons learnt.
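The preparation loop of section 2.2.1 can be pictured in a few lines of code. The sketch below is only an illustration and not part of the EURACOM methodology: the 1 to 5 scales, the multiplicative rating, the acceptance threshold and all names (`Risk`, `decide`, `contingency_inputs`, `ACCEPTABLE_RATING`) are assumptions made for the example. It shows a Risk Assessment output (impact and likelihood) being mapped to one of the three decisions named above (Treat, Avoid or Accept), and the risks to which the organisation remains exposed being handed over as input to Contingency Planning.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    TREAT = "treat"
    AVOID = "avoid"
    ACCEPT = "accept"


# Hypothetical threshold on a 1-25 rating; a real scale is set by the organisation.
ACCEPTABLE_RATING = 6


@dataclass(frozen=True)
class Risk:
    name: str
    impact: int       # assumed scale: 1 (minor) to 5 (severe)
    likelihood: int   # assumed scale: 1 (rare) to 5 (frequent)
    treatable: bool   # True if mitigation is practical and affordable

    @property
    def rating(self) -> int:
        # Assumed combination rule: rating = impact x likelihood.
        return self.impact * self.likelihood


def decide(risk: Risk) -> Decision:
    """Map a rated risk to Treat, Avoid or Accept (illustrative rule only)."""
    if risk.rating <= ACCEPTABLE_RATING:
        return Decision.ACCEPT
    if risk.treatable:
        return Decision.TREAT
    return Decision.AVOID


def contingency_inputs(risks: list[Risk]) -> list[Risk]:
    """Risks passed on to Contingency Planning, highest rating first.

    Assumption: avoided risks leave no exposure, so only treated and
    accepted risks are handed over.
    """
    exposed = [r for r in risks if decide(r) is not Decision.AVOID]
    return sorted(exposed, key=lambda r: r.rating, reverse=True)
```

In this sketch the lessons learnt loop of section 2.2.2 would correspond to revising the `impact`, `likelihood` or `treatable` values of a risk and running the same functions again.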
2.2.3 The relationship at a glance
From the first principles described in the preparation loop and in the lessons learnt loop, it is possible to build a high-level picture of the relationships between Risk Assessment and Contingency Planning processes.

Figure 2: Interactions between Risk Assessment and Contingency Planning

The preparation loop is a very linear process: the steps of Risk Assessment and Contingency Planning are implemented in succession. Taking lessons learnt into account, on the other hand, is an activity which requires coming back to previous stages of the analysis, updating information, cascading the results into the later stages and ensuring that all underlying elements (processes, documents, plans, etc.) are kept up to date in a controlled manner. It is therefore important that this loop is controlled through a sound process.

For this purpose, it is possible to introduce a maintenance process which coordinates all updates of Risk Assessment and Contingency Planning, whether they arise from lessons learnt or from other events such as periodical reviews of plans, changes in the environment (new threats, etc.) and changes in operations. In this way, the lessons learnt loop is better controlled, and the changes it induces are managed consistently with the changes resulting from other maintenance operations. The figure below adds maintenance to the link:

Figure 3: Interactions between Risk Assessment and Contingency Planning

This high level view provides the desired output for the EURACOM RA and CP approaches: operating in a combined manner and sharing a common maintenance process.
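A shared maintenance process of this kind can be pictured as a single entry point through which every update passes, whatever triggered it. The sketch below is an illustration under stated assumptions: the trigger names mirror the events listed in this section, while `MaintenanceLog`, `submit` and the rule that every trigger revisits Risk Assessment before Contingency Planning are hypothetical choices, not EURACOM prescriptions.

```python
from dataclasses import dataclass, field
from enum import Enum


class Trigger(Enum):
    LESSONS_LEARNT = "lessons learnt"
    PERIODIC_REVIEW = "periodical review of plans"
    ENVIRONMENT_CHANGE = "change in environment"
    OPERATIONS_CHANGE = "change in operations"


# Stages in the order in which updates are cascaded (assumed ordering).
STAGES = ("risk assessment", "contingency planning")


@dataclass
class MaintenanceLog:
    """Single, controlled record of every update to RA and CP."""
    entries: list = field(default_factory=list)

    def submit(self, trigger: Trigger, description: str) -> list:
        # Every trigger follows the same path: revisit the earlier stage
        # first, then cascade the result to the later stage.
        actions = [f"update {stage}: {description}" for stage in STAGES]
        self.entries.append((trigger, actions))
        return actions
```

Routing all triggers through one function is what keeps changes from lessons learnt consistent with changes from periodical reviews: both end up in the same log and follow the same cascade.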
3 Founding principles of the approaches

3.1 The Scope of applicability of the approaches
The EURACOM deliverable D 1.1 provides a view of the energy networks which are analysed, and a way to model the energy networks. These views are analysed on different layers as illustrated in the following diagram.

Figure 4: The energy networks analysis framework (layers: Strategy, Process, Information, Network; levels: Organisational, National, European; sectors: Electricity, Gas, Oil)

Within this framework, the issue of resilience of energy networks is applicable in and across all the dimensions depicted in the diagram:
- From Organisational to European levels, as the issues are not only intrinsic to the individual organisations: the consequences of adverse events have the potential to spread at European scale, and their management has to do so as well.

Figure 5: Energy players from organisational to international (Producers, Transmission Operators, Traders / Shippers, Distribution Operators, Suppliers, End Users; ownership of commodities)

- Within each of the Oil, Electricity and Gas services throughout their energy sector supply chain, and also across sectors as energy flows move from one sector to another (mainly from Gas to Electricity).

Figure 6: High level overview of the Supply Chain

- In all the layers of one organisation (Strategy, Process, Information and Network), as the risk factors and the associated responses do not reside in a single layer but rather form a holistic posture in which all the measures taken at strategic, process, information or network level are meant to work in conjunction.
This very short introduction to the scope of D 1.1 (not entering into any detail) already gives an insight into the order of magnitude of the complexity of the subject, and it is not the objective of EURACOM to provide answers to all of it. The objective, and therefore the scope, of D2.3 is to concentrate on the energy operators for which the Risk Assessment and Contingency Planning methodologies are meant (as depicted in the figure below). This does not mean that the full picture does not need to be taken into account. On the contrary, this picture is essential for understanding the context in which each single operator's Risk Management approach (i.e. Risk Assessment + Contingency Planning) will have to operate.

Figure 7: Focus of RA and CP approach (layers: Strategy, Process, Information, Network; levels: Organisational, National, European; sectors: Electricity, Gas, Oil)

The scope of applicability for WP 2.3 is governed by the EURACOM deliverable D 1.1: the method is built up starting at the operator level and expanded to higher levels (i.e. National or European level) in later sections of the document. The justification for this is that the implementation of consistent and efficient risk management by each operator within its organisation is the prerequisite and foundation for a collective and federated resilience across sectors and borders. This focus on single operators should not neglect the relationships they have with external stakeholders. On the contrary, it should treat them as critical, but from the sole perspective of the entity to which the analysis is applied.

To meet this objective, the EURACOM D2.3 deliverable first proposes a generic approach to Risk Assessment and Contingency Planning for energy operators and then provides specific information on how to implement this generic approach in the distinct sectors of Electricity, Oil and Gas.
These sectors are analysed through the focal point constituted by TSOs, which are at the heart of mutual dependencies at European level, and by taking into account their connections to the rest of the supply chain (source, distribution, other grids, etc.).
The document first views Risk Assessment and Contingency Planning at the operator level. The end of the document then gives some directions for analysis at higher levels, where interactions are no longer seen only from the perspective of one organisation but from the point of view of a network of organisations.

3.2 Glossary of Terms and Risk Management Concepts

3.2.1 Definition of terms
The desktop studies of task 2.1 and task 2.2 identified that the use of terms varies considerably, with notions like contingency planning and business continuity planning being mixed. The stance taken in EURACOM is one of integrated risk and contingency management [1]. For the purpose of this document, the descriptions of the different terms are as follows:

Risk Management
This is the collection of processes that form an organisation's formal threat and vulnerability management process. This includes all processes for risk assessment, risk treatment, risk avoidance, risk acceptance, response and recovery.

[1] Integrated Emergency Management concept - Guidance on Part 1 of the Civil Contingencies Act 2004, HM Government, United Kingdom
Figure 8: The collection of the Risk Management processes

Risk Assessment
Risk Assessment is the process used to assess the potential impact and the likelihood of a threat exploiting vulnerabilities, in order to provide a risk rating prior to the implementation of any risk treatment or mitigation.

Business Impact Analysis
Business Impact Analysis is the analysis of how a risk scenario can cause a loss to an organisation. This analysis is primarily oriented towards the impact on the organisation's business processes.

Risk Treatment
Risk Treatment is where the risk is reduced by the implementation of countermeasures designed for risk mitigation. Risk treatment measures are aimed at reducing the probability and/or the severity of risk factors.

Risk Avoidance
Risk Avoidance is used where the Risk Treatment is too costly or too impractical to implement and where Risk Acceptance is not a viable option for an organisation to consider. Risk avoidance is a decision to change the company infrastructure or, more broadly, the mode of operation to ensure there is no longer any exposure to a risk.

Risk Acceptance
Risk Acceptance is utilised when the risk level is acceptable to the business or when the risk cannot be avoided or mitigated to an acceptable residual risk level. The decision to accept the risk or residual risk is ultimately taken by the organisation's risk owner(s).

Contingency Planning
Contingency Planning is required to plan for incidents, by implementing formal controls that help to prevent incidents and by creating appropriate response and recovery processes that minimise the effects should an incident occur.

Contingency Plan
A contingency plan is one of the results of Contingency Planning. Contingency plans are the set of controls, materialised through organisation, measures and resources, which are put in place as a response and recovery capability for major incidents.

Contingency Planning vs. Contingency Plan
Contingency Planning is the process by which an organisation prepares itself for the management of incidents; it covers the identification and implementation of prevention, protection, response and recovery mechanisms. The contingency plan is a result of this process which focuses on formalising the mechanisms for response and recovery should an incident occur.

Incidents, Major Incidents, Disasters & Crises
Various definitions exist for these terms. EURACOM proposes to classify them in two categories depending on their magnitude and the reaction they trigger for an organisation:
1. Incident is a term used for the occurrence of issues which are a priori of limited magnitude. As a consequence, an organisation would deal with incidents as part of a routine Incident Management process.
2. Major incidents, Disasters and Crises are reserved for issues whose order of magnitude or complexity cannot be handled through a routine incident management process and require the special arrangements of Incident Management (or Crisis Management), Business Continuity and Disaster Recovery. High Impact, Low Frequency events, for example, fall into this category.

Incident Management
Incident Management is a process that an organisation puts in place to manage the occurrence of incidents of low to moderate magnitude. The Incident Management process can escalate into Crisis Management should the situation go beyond a low to moderate magnitude.

Crisis Management
Crisis Management is used to formally manage an active incident which has escalated beyond the routine Incident Management process. Crisis management comprises the organisational and infrastructure measures put in place to ensure that an organisation can be organised in times of crisis (alert raising, crisis cell mobilisation, decision taking, situation awareness and communication of directives).

Business Continuity
Business Continuity ensures that business recovery processes are implemented to ensure continuity of service with the minimum of disruption. Business continuity is mainly targeted at continuity of supply of goods or services.

Disaster Recovery Management
Disaster Recovery Management is often used in the sense of IT Disaster Recovery Management; it provides the processes for the recovery of key ICT systems following an incident.

3.2.2 Impact for the combined structure of Risk Assessment and Contingency Planning approaches
The clarification of these terms makes it possible to define the scope and boundaries of the Risk Assessment approach and the Contingency Planning approach, and the links between them. Given the combined nature of the two approaches developed in this document, the choice has been made to remove all Risk Assessment or Business Impact Analysis from the Contingency Planning approach. It is worth mentioning that, as the analysis within D2.2 has shown, most of the Business Continuity or Contingency Planning approaches integrate a Business Impact Analysis stage.
That step has been removed because the EURACOM Contingency Planning approach will receive those inputs from the EURACOM Risk Assessment approach.
EURACOM also clarifies the differences and relations between concepts whose boundaries are often fuzzy, like Contingency Planning, Business Continuity, Incident & Crisis Management and Disaster Recovery. All these notions are federated and clearly differentiated under the umbrella of Contingency Planning, which covers the entire spectrum of issues.

3.2.3 Towards a holistic, combined, all-hazards approach
The EURACOM approach should have three main characteristics. It should be:
- Holistic in terms of infrastructure coverage, which means that it should include all aspects that contribute to operations, i.e. the physical infrastructure, the ICT infrastructure, the organisation (including links to external stakeholders) and human factor aspects, and the human resources.
- Combined, in the sense that Risk Assessment and Contingency Planning processes need to be closely integrated, with clear linkages between one another.
- All-hazards, in the sense that it will cover the two main categories:
  1. Accidental (Human or Technical, Natural causes, or linked to external dependencies), and
  2. Deliberate (Human).
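The all-hazards classification above can be written down as a small data structure, which may help when hazards have to be recorded consistently across operators. This is a minimal sketch: the two categories and their origins come from the list above, whereas the names `Intent`, `Origin`, `ALLOWED` and `is_valid_hazard` are hypothetical and chosen only for the example.

```python
from enum import Enum


class Intent(Enum):
    ACCIDENTAL = "accidental"
    DELIBERATE = "deliberate"


class Origin(Enum):
    HUMAN = "human"
    TECHNICAL = "technical"
    NATURAL = "natural causes"
    EXTERNAL_DEPENDENCY = "external dependency"


# Combinations named in the text: accidental hazards may have any of the
# four origins, deliberate hazards are human only.
ALLOWED = {
    Intent.ACCIDENTAL: {
        Origin.HUMAN,
        Origin.TECHNICAL,
        Origin.NATURAL,
        Origin.EXTERNAL_DEPENDENCY,
    },
    Intent.DELIBERATE: {Origin.HUMAN},
}


def is_valid_hazard(intent: Intent, origin: Origin) -> bool:
    """Return True if the (intent, origin) pair appears in the classification."""
    return origin in ALLOWED[intent]
```

For example, `is_valid_hazard(Intent.DELIBERATE, Origin.NATURAL)` is rejected, since the classification only lists deliberate hazards of human origin.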
4 Risk Assessment 4.1 EURACOM WP 2.1 Desktop Study EURACOM WP2.1 performed an analysis on available risk assessment methodologies in order to learn from good practices and also to assess their suitability to the context of EURACOM. The analysis, the results and the recommendations are reported in D2.1 Common areas of Risk Assessment Methodologies. The major conclusions from WP2.1 are: [ ] When we look at the EURAM method and compare it to the RA methods we assessed, we can conclude: The EURAM method is still one of the few methods which is both holistic, all-hazard and generically applicable to all Critical Infrastructure (CI) sectors. The EURAM method is one of the few methods that can be applied to all operational and organisational levels of CI and even trans-sector. The EURAM method is unique in the sense that it provides a mechanism to spread responsibilities for risk management over all levels while assuring all risk factors are addressed. Additionally, this is facilitated by a non-prescriptive mechanism. The EURAM method is still rather conceptual and has few supporting tools (although it includes the start of some supporting checklists). The EURAM method complies with the common good practice approaches identified in most of the other Risk Assessment methods. [ ] The further recommendations from WP2.1 to develop the EURACOM Risk Assessment approach are: [ ] Develop supporting tools (checklists or otherwise) to support easy application of the method with respect to determining: o Threats o Vulnerabilities o Effects o Assets Supply clear, tangible, and easy-to-follow steps that require a minimum of expertise of the user. o Develop a supporting glossary of the terms used; o Support the execution of the method with simple, easily distributable tools. Develop a checklist whereby the user can determine beforehand what information is required to complete the RA and where it may be found. Page 29 of 118
Of course, when the RA method is to be applied to a single sector, it can be further honed to the needs of that specific sector (e.g. energy), by conforming to the terminologies of the sector, concentrating on specific assets, threats, vulnerabilities, and effects that are most relevant to the sector. This will further heighten the ease of use. [ ] These conclusions and recommendations are used to develop the EURACOM Risk Assessment approach. 4.2 Methodology for Holistic Risk Assessment 4.2.1 Overview of Structure The seven steps of this risk assessment process are described below from a high level perspective. These are directly extracted from the results of the EURAM approach. The changes introduced by EURACOM will become visible at a more granular level and will include the tailored approach for the energy transmission operations within the energy sector including: Electricity in section 4.3.1, Gas in section 4.3.2 and Oil in section 4.3.3. The decision to narrow the scope to transmission is justified by the fact that energy grids are the pivotal point of dependencies and cascading effects at European scale whether we talk about dependencies between grids themselves or their interaction with source and distribution. In this sense energy sources and distribution knock on effects are analysed through the point of view of the transmission networks and especially the impact they can induce on energy transmission networks and in turn how these networks can propagate the impact to the distribution. Page 30 of 118
Figure 9: The EURAM 7 step Risk Assessment approach

4.2.2 Introduction to Holistic Risk Management

Before presenting the detail of the methodology, it is important to explain what a Holistic Approach to Security means. A holistic approach, or holistic risk management, aims at managing risks using a joined-up approach. This joined-up approach requires each dimension of risk to be considered. These dimensions are:

Physical security.
Information and Communication Technology security.
Organisational security.
Human factor aspects regarding security.

These four dimensions will be used to analyse each of the components of the risk (please refer to the glossary section):

Assets.
Vulnerabilities.
Threats.
Effects.
4.2.3 Methodology description 4.2.3.1 STEP 1: Constitute the Holistic Risk Assessment Team The objective of this step is to select a team that will be in charge of conducting the holistic Risk Assessment. This team will be ideally composed of several persons including: A team leader that will be responsible for the completeness, consistency and homogeneity of the risk evaluation, and Several team members who will bring their expertise from the four dimensions of holistic security physical, Information and Communication Technology, Organisational and human aspects. For the reliability of the risk assessment process the team members should be independent, i.e. those not implementing the process. The implementers will be consulted during the risk assessment process and will contribute with their expert knowledge. It is important that the skills and experience are carefully selected as it is the basis of a successful risk assessment. It is also important to make sure that everybody in the team understands the holistic approach to security. With regards to the team leader, the role is extremely important as this person will be in charge of ensuring that all areas of risk are given equal consideration and that the process of information sharing and identification of risk within the team goes smoothly. As this role is so critical, it is suggested that organisations seek external assistance on the above to overcome any skill gaps or other potential internal difficulties. Concerning the lead associated to the exercise, it is recommended that the responsibility of the implementation of the approach be at a transversal level to avoid the pitfall of silos 2 often found in organisations. When the scope is at a company or operator level, this means that the ownership of the risk assessment has to be taken at senior management level above the various departments or functions. 
Output: An operational holistic risk assessment team.

2 Silos refers to the compartmentalisation often seen in organisations, where risk factors are not managed across the whole organisation but in silos (e.g. Physical Security, IT security, HR, etc.).
4.2.3.2 STEP 2: Define the scope of the Risk Assessment

This step can be implemented on smaller or larger scopes, with more or less detail, depending on the resources applied and the stakes involved, as the principles remain applicable at any scale. However, the scope of the holistic risk assessment needs to be clearly defined and understood by the whole team. The scope needs to be defined from a holistic point of view, which means that:

It should have a physical perimeter including physical assets.
It should be composed of defined systems and networks.
It should have boundaries from an organisational point of view, with identification of the various job functions involved.

Please note that dependencies of the organisation on elements outside of the scope can be analysed using the results of the EURAM project on the Methodology for (inter)dependency analysis.

Output: Definition of scope understood by the whole team.
4.2.3.3 STEP 3: Define the scales for risk evaluation

This methodology takes the path of practicality. The risk (R) is evaluated directly from the probability of occurrence (P) and the severity (S), with R = P x S. It is therefore important at the beginning of the project to define the scales against which probability and severity will be evaluated. For practical reasons, qualitative scales on a 1 to 5 range are advised, as this gives enough values to discriminate between risks. The Severity and Probability scales can be presented this way.

Probability (evaluation of feasibility of an attack or likelihood of an accident):
1: Very low probability
2: Low probability
3: Medium probability
4: High probability
5: Near certainty

Concerning the probability scale, and later on in the methodology, there are a few pitfalls and good practices to keep in mind when carrying out the exercise. Probability scales need to fit mainly two types of adverse events:

1. Untargeted attacks or accidents. For these types of events the most appropriate way to evaluate probability is based on historical evidence, using for example experience or statistics on records from past incidents. These records can be gathered at the operator level on past incidents in their particular infrastructure, but for low occurrence events wider scopes of information (e.g. sector records, national records if available) would provide a more extensive view of the number of incidents for a statistical analysis.

2. Targeted or intentional attacks. For these types of events, the statistical approach is not as appropriate, as the fact that an incident has not occurred yet does not mean that the attack is not feasible, and even less that it is not going to happen in the future.
The more appropriate approach to evaluate the probability of such events is to evaluate the feasibility of an attack taking into account various factors as attractiveness of the target (asset), motivation/ skills/ resources of the attacker and level of protection of the target (asset). In this area, it might be difficult for operators to assess the probability of certain areas of risks; this is where there can be significant value in sharing information with peers from other organisations or to receive intelligence information from national intelligence agencies. Page 34 of 118
Severity (evaluation of impact on product/service delivery, citizen security, image, citizen confidence, financial impact, or other aspects):
1: Low impact
2: Medium impact
3: Significant impact
4: Critical impact
5: Most severe impact

Even if the examples of impact and probability levels have to be adapted to the scope of the analysis, it is necessary to have a common definition of impact and probability levels, to enable analysis of interdependencies between critical infrastructures. Therefore generic scales across sectors should be used (please refer to section 7.2.3 for detail).

Output: Defined scales for evaluation of Probability and Severity.
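As an illustration of this step, the following minimal Python sketch computes R = P x S on the 1 to 5 qualitative scales. The function and variable names are hypothetical and are not part of the EURAM methodology; only the scale labels are taken from the text above.

```python
# Illustrative sketch only: names are assumptions, not part of EURAM.
PROBABILITY_LABELS = {
    1: "Very low probability",
    2: "Low probability",
    3: "Medium probability",
    4: "High probability",
    5: "Near certainty",
}
SEVERITY_LABELS = {
    1: "Low impact",
    2: "Medium impact",
    3: "Significant impact",
    4: "Critical impact",
    5: "Most severe impact",
}


def risk_score(probability: int, severity: int) -> int:
    """Return R = P x S for levels on the 1 to 5 qualitative scales."""
    for name, level in (("probability", probability), ("severity", severity)):
        if not isinstance(level, int) or not 1 <= level <= 5:
            raise ValueError(f"{name} must be an integer from 1 to 5, got {level!r}")
    return probability * severity


if __name__ == "__main__":
    p, s = 2, 5
    print(f"{PROBABILITY_LABELS[p]} x {SEVERITY_LABELS[s]}: R = {risk_score(p, s)}")
```

Because the product is symmetric, a scenario rated P = 1, S = 5 scores the same as one rated P = 5, S = 1, so it can be useful to keep P and S alongside R when reviewing the results.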
4.2.3.4 STEP 4: Understand the assets in the scope The objective of this section of the risk assessment is for the team members to get an understanding of how the critical infrastructure delivers its service/product. The objective is therefore to understand the organisation in place, the infrastructure (physical or IT) necessary to operate and also the skills required. Through this task, the whole team will understand broadly the operations of the critical infrastructure. Each expert should also reach a deeper understanding of the assets in his area of expertise: Physical assets, ICT assets, Organisational assets, Human resources. It is important to note that the methodology described in the EURAM project for the analysis of (inter)dependencies, when applied on the same scope as the risk assessment, can also assist in understanding how the various assets interact to support the operations. Output: General understanding of the assets involved and their criticality for the operations (this does not imply formalisation of an exhaustive asset register as it is felt that such a detailed register adds little additional value to the approach suggested) Page 36 of 118
4.2.3.5 STEP 5: Understand the threats The objective of this stage is to understand the threat context the infrastructure faces. This does not mean that an exhaustive inventory of threats has to be conducted as it is understood that a vast majority of threats are going to be common and already clearly understood by each expert in his domain. The objective here is more to understand specific areas of the threat profile. Threats need to be understood in the context of the infrastructure studied: level of terrorist threat in the country, past natural disasters in the region, past incidents in the sector or other intelligence on specific threat agents. To support the collaboration between critical infrastructures in performing risk analysis and interdependencies analysis, it is necessary that similar types of threats are considered to ensure consistency of results. Following this principle, the team should refer to a list of classes of threat to consider when doing the analysis to avoid any gaps. An example of a generic threat classification is given for each specific sector, i.e. Electricity, Gas and Oil, in sections 4.3.1, 4.3.2 and 4.3.3. In these classifications, dependence upon other infrastructures is one of the threat categories. Output: Threat profile report detailing information on the level of specific threats in the context of the target of the risk assessment. Page 37 of 118
4.2.3.6 STEP 6: Review security and Identify vulnerabilities The objective of this step is for the security experts of the team to review the actual security controls in place to protect the infrastructure, given the assets and threat context understood at the previous stages. This will lead to the identification of missing security controls and also the effectiveness of these security controls in managing the risk. This will then lead to the identification of the vulnerabilities across the various dimensions of the holistic risks: Physical vulnerabilities (e.g. lack of perimeter protection, lack of access control) ICT vulnerabilities (e.g. no segregation of networks, no antivirus) Organisational vulnerabilities (e.g. no segregation of duties, no allocation of security responsibilities) Human vulnerabilities (e.g. poor training, no screening of key personnel) Sources supporting security review and vulnerabilities identification An energy transmission operator will be required to identify a number of sources for vulnerability information and will also have to rely on their own subject matter experts to verify that vulnerabilities exist, or not as the case may be, within the organisation. To support this process, there are two main types of available information: Security standards providing good practices on security implementation. These sources can be used in order to perform a gap analysis. On this principle, any significant gap to good practices can be considered as a vulnerability. Vulnerability information sources providing data on actual vulnerabilities. These sources are much more focused to specific vulnerabilities that may be exploited by specific threats, which may reside in a particular technology, etc. 
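The gap-analysis principle described above (any significant gap to good practices can be considered a vulnerability) can be sketched as a set difference, grouped by the four dimensions of holistic security. This is only an illustration: the function name is hypothetical, and the control names are examples derived from the vulnerabilities cited in this step, not entries from any standard.

```python
# Illustrative sketch only: control names are hypothetical, not from a standard.
from typing import Dict, Set


def gap_analysis(recommended: Dict[str, Set[str]],
                 implemented: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return, per dimension, the recommended controls that are not implemented."""
    gaps = {}
    for dimension, controls in recommended.items():
        missing = controls - implemented.get(dimension, set())
        if missing:
            gaps[dimension] = missing
    return gaps


recommended = {
    "Physical": {"perimeter protection", "access control"},
    "ICT": {"network segregation", "antivirus"},
    "Organisational": {"segregation of duties",
                       "allocation of security responsibilities"},
    "Human": {"security training", "screening of key personnel"},
}
implemented = {
    "Physical": {"perimeter protection", "access control"},
    "ICT": {"antivirus"},
    "Human": {"security training"},
}

for dimension, missing in gap_analysis(recommended, implemented).items():
    print(f"{dimension}: candidate vulnerabilities: {sorted(missing)}")
```

Such a comparison only flags missing controls. The effectiveness of controls that are in place, and whether each gap is a real vulnerability in the organisation, still has to be verified by the subject matter experts.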
These two types of information can be found in several sources: Industry Associations: Industry associations will provide a good source for the notification of sector explicit and general vulnerabilities, as these organisations are setup to aid and assist the industry to maintain good practices and therefore they will highlight vulnerabilities that may harm their members. Industry associations may not always provide a service that is very current (dependent on the criticality level of vulnerability), especially if they issue periodic vulnerability bulletins, e.g. monthly. National Government (Security): Government departments will often provide Critical Infrastructure organisations updates on the security threats within the sector which will enable an operator to validate Page 38 of 118
if their organisation and infrastructure is potentially vulnerable. As for example: the UK Centre for the Protection of National Infrastructure (CPNI): The CPNI provides Critical Infrastructure within the UK with Security advice and security good practice guidelines, including an ICT vulnerability watch service. http://www.cpni.gov.uk Some private companies provide a security assessment and notification service about physical security threats to sectors such as the energy sector (including early detection and remediation advice where appropriate). Information about such threats and vulnerabilities are sent out to the subscribed service users. Some Government departments and private companies will provide an ICT vulnerability watch and notification service where ICT vulnerabilities (including remediation advice where appropriate) are collated and sent out to the subscribed service users. Manufacturers: These will often provide notification of vulnerabilities within their products (software/hardware) and suggestions for remediation, but sometimes they may not be able to provide timely notification or even effective remediation. Internal: It is also very important that an Energy Transmission System Operator utilises it own internal experts to monitor their environment and maintain a level of vulnerability watch within their area of expertise. Examples of other sources for vulnerability information and assistance Guide for ICT Vulnerability Identification: ICT Standard ISO/IEC 27002 for Corporate ICT systems. This Standard provides good practices for ICT security. It is therefore possible through a gap analysis to identify vulnerabilities. 
MPSCIE, E-SCSIE and national process control information exchanges (e.g., ISACs): The Meridian, European, and national SCADA (and process control) Security Information Exchanges aims for the process control users, governments and research to benefit from the ability to collaborate on a range of common security-related issues, and to focus effort and share resource where appropriate. The intended outcome is a raised level of protection adopted across international as well as Europe's SCADA and other Process Control Systems. The National Vulnerability Database (NVD): Is a publicly accessible reference system for publicly known ICT vulnerabilities and exposures. It is funded by the National Cyber Security Division of the United States Department of Homeland Security: http://nvd.nist.gov/ Bugtraq: This is a mailing list where ICT Security issues and vulnerabilities are sent to subscribers of the service. Options to subscribe to other Security related information such as security incidents is also available: http://www.securityfocus.com/archive Page 39 of 118
NIST: The US National Institute of Standards and Technology has issued a publication on Creating a Patch and Vulnerability Management system: http://csrc.nist.gov/publications/nistpubs/800-40-Ver2/SP800-40v2.pdf

MITRE:
1. The CVE database is a dictionary of publicly known information security vulnerabilities and exposures. http://cve.mitre.org/
2. Common Weakness Enumeration (CWE) provides a unified, measurable set of software weaknesses that is enabling more effective discussion, description, selection, and use of software security tools and services that can find these weaknesses in source code and operational systems, as well as better understanding and management of software weaknesses related to architecture and design. http://cwe.mitre.org/

CVSS-SIG Common Vulnerability Scoring System Support v2 (CVSS): CVSS provides a universal open and standardized method for rating IT vulnerabilities. http://www.first.org/cvss/

Output: Documented list of detailed vulnerabilities on the scope of the study in all areas of holistic security.

4.2.3.7 STEP 7: Evaluate the associated risks

From the vulnerabilities listed at the previous stage, associated risks are identified. For each vulnerability identified, the associated incident scenario(s) can be developed. An incident scenario associated with a vulnerability is a threat exploiting this vulnerability to harm assets and, more broadly, the infrastructure. For each scenario, probability and severity are evaluated using the scales previously defined, which then allows the risk associated with each vulnerability to be evaluated. It is important to note that this step of the risk assessment can also benefit from inputs from an (inter)dependency analysis (please refer to the (inter)dependency analysis approach developed by the EURAM project) carried out on the same scope. This dependency analysis will provide useful information for identification of scenarios and ranking of associated risks.
In this context, the different components of the risk will be identified in the following manner: Vulnerability: In the case of a dependency, the vulnerability is that one or several assets are dependent on a service provided with limited resilience in case of disruption of this service. Threat: In this context the threat is the disruption of the essential service associated to the dependency. Asset: the assets impacted are the assets being dependent. Page 40 of 118
Severity: the dependency analysis will provide useful information for severity analysis identifying in particular any possible knock-on effects and evolution of the impact over time. Probability: The evaluation of the probability will be supported by the description of the dependency context. The result of this last step is the list of relevant risks that the infrastructure faces from a holistic point of view. These risks are all evaluated and ranked which will support decision making in the risk mitigation process part of the contingency planning approach. Output: List of risks that have been qualified in terms of associated vulnerability (ies) and probability & severity levels. These risks will also be useful to support interdependencies analysis between critical infrastructures. By construction these risks can be compared or cross-analysed with risks identified in an other infrastructure provided that the same approach has been followed. This methodology used with supporting guidelines ensures: Consistent definition of scope, Consistent scales for impact, probability and risk evaluation, Comprehensive list of threat classes for threat context identification. 4.2.4 Maintenance of the risk assessment This information is the result of the holistic risk assessment and this is the document that will have to be maintained regularly to follow the evolution of the risk profile depending on: The evolution of the infrastructure (reorganisation, new assets, etc.), Changes in the threat context, Implementation of new security controls, Discovery of new vulnerabilities, attack techniques. The maintenance of the risk assessment can also receive some feedback from experience of real or simulated incidents through lessons learnt. This experience allows to refine the evaluation of risks and notably in terms of the effects and cascading consequences or to identify new risks which were not anticipated before. Page 41 of 118
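The output of Step 7 (section 4.2.3.7), a list of risks qualified by vulnerability, probability and severity, and its maintenance (section 4.2.4) can be sketched as a small risk register. This is a minimal illustration: the class, the field names, the example scenarios and their ratings, and the tie-breaking rule (higher severity first) are all assumptions, not part of the methodology.

```python
# Illustrative sketch only: class, fields, ratings and tie-break are assumptions.
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Risk:
    vulnerability: str
    threat: str
    asset: str
    probability: int  # level on the 1 to 5 scale
    severity: int     # level on the 1 to 5 scale

    @property
    def score(self) -> int:
        """R = P x S."""
        return self.probability * self.severity


def rank(risks: List[Risk]) -> List[Risk]:
    """Highest risk first; equal scores are ordered by severity (assumption)."""
    return sorted(risks, key=lambda r: (r.score, r.severity), reverse=True)


register = [
    Risk("No segregation of networks", "Targeted cyber attack", "SCADA/Telemetry", 3, 5),
    Risk("Lack of perimeter protection", "Theft (copper/metals)", "Substation", 4, 2),
    Risk("Limited resilience to loss of telecoms", "Loss of Telecoms", "Control room", 2, 4),
]

for r in rank(register):
    print(f"R={r.score:>2} P={r.probability} S={r.severity} {r.vulnerability} / {r.threat}")

# Maintenance: re-evaluate an entry when the threat context changes.
updated = [replace(r, probability=5) if r.asset == "Substation" else r
           for r in register]
```

Keeping each entry immutable and producing a new list on re-evaluation makes it straightforward to compare the risk profile before and after a change in the infrastructure or threat context.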
4.3 The implementation of EURAM within the Energy Sector

The following subsections detail the tailored approach for the energy transmission sector (Electricity, Gas and Oil) as mentioned in section 4.2.1. It must be mentioned that there are a number of similarities between the three energy transmission sectors (especially gas and oil transmission) and therefore some of the requirements will be the same (e.g. setting up the Holistic team, SCADA, etc.).

4.3.1 Electricity Transmission

Step 1: Constitute the Holistic Risk Assessment Team

The objective of Step 1 is to create the holistic risk assessment team. The team should comprise personnel as described in Figure 10, with the level of their contribution dependent on the risk assessment being undertaken. The Risk Manager is the owner/lead for this process and should be assisted in the process by 3-4 independent individuals. Expert and specialist input into the process will be provided to the Holistic Risk Assessment team by the following organisational functions as and when required:

Holistic Risk Assessment Team Contributors:
Maintenance Team (Transmission Infrastructure)
Engineering Team (Transmission Infrastructure)
ICT & Physical Security
Contingency Planning Manager
Facilities Team
HR Team
Control/Dispatch Room Manager
SCADA/Telemetry Manager(s)
Logistics Team
ICT (system & networks) Team

Figure 10: Holistic Risk Assessment Team
Step 2: Define the scope of the risk assessment

It is essential that the definition of the scope of the risk assessment is fully understood by the holistic risk assessment team. The elements that could be considered for inclusion within the scope of a risk assessment undertaken on an electricity transmission system operator's environment might be:

Primary Electricity Transmission Infrastructure:
Wire (overhead, underground and underwater)
Pylons & Poles
Substations
Interconnector: An Interconnector is the point where the transmission network connects, either at a national or cross-border/international level, with another TSO area.

Supporting Electricity Transmission Infrastructure:
SCADA/Telemetry: These contain the key elements for the management, monitoring and control of the electricity transmission infrastructure and include real time and historic status information.
ICT Networks and Systems: The ICT systems and networks are the infrastructure that supports the operations of the corporate and the SCADA/Telemetry environments.
Facilities: The facilities include the buildings and land where electricity transmission assets are located.
Engineering function: The engineering function is responsible for the deployment and management of the assets used for the transmission of electricity over the transmission infrastructure.
Maintenance function: The maintenance function has the role of maintaining the infrastructure and works in conjunction with the engineering function.
Contingency plans: The contingency plans provide the organisation with the tools required to react in an effective manner following an incident.

Electricity Transmission Infrastructure Dependencies:
Electricity supply: This can be from Nuclear, Fossil, Bio, renewable, etc.
Not owned telecommunications: communication with own process control elements, adjacent TSOs, DSOs, producers (planning 24h and longer term), relevant power exchanges (e.g. APX), maintenance crews, etc.
Weather (forecasting) services (short term and 24h planning of demand as well as wind/solar power supply)

Step 3: Define the scales for risk evaluation

The objective is to provide defined scales for the evaluation of probability and severity. In the electricity transmission sector, the scales for risk evaluation of incidents can be estimated in the following terms:

Impact Scales

The following table indicates the possible impact scales for an electricity TSO:

Criterion | 1: Low impact | 2: Medium impact | 3: Significant impact | 4: Critical impact | 5: Most severe impact
Extent of loss of supply (by percentage of customers, by percentage of nominal capacity) | < 5% | > 5% and < 25% | > 25% and < 50% | > 50% and < 75% | > 75%
Duration of power outage or fluctuation of supply quality (Brown Outs/Surges/Spikes) | < 5 Seconds | > 5 Seconds and < 5 Minutes | > 5 Minutes and < 1 Hour | > 1 Hour and < 12 Hours | > 12 Hours
Financial loss (loss of revenue, customer compensation, cost of reactive remedial action and regulator fines, as a percentage of revenue during the period of an incident) | < 0.5% | < 5% | < 25% | < 50% | > 50%

Figure 11: Impact Scales for Electricity Transmission
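The thresholds in Figure 11 can be turned into a simple lookup, sketched below. This is illustrative only: the function and constant names are hypothetical, boundary values (e.g. exactly 5%) are assigned to the higher level because the figure leaves them unspecified, and taking the maximum across the three criteria as the overall impact is an assumption rather than a rule stated in this report.

```python
# Illustrative sketch only: names, boundary handling and the use of max()
# are assumptions; the thresholds are those of Figure 11.
from bisect import bisect_right
from typing import Sequence

# Lower bounds of impact levels 2, 3, 4 and 5 for each criterion.
SUPPLY_LOSS_PCT = [5, 25, 50, 75]
OUTAGE_SECONDS = [5, 5 * 60, 60 * 60, 12 * 60 * 60]
FINANCIAL_LOSS_PCT = [0.5, 5, 25, 50]


def level(value: float, thresholds: Sequence[float]) -> int:
    """Map a measured value to an impact level from 1 to 5."""
    return bisect_right(thresholds, value) + 1


def impact_level(supply_loss_pct: float, outage_seconds: float,
                 financial_loss_pct: float) -> int:
    """Overall impact: the highest level reached on any criterion (assumption)."""
    return max(
        level(supply_loss_pct, SUPPLY_LOSS_PCT),
        level(outage_seconds, OUTAGE_SECONDS),
        level(financial_loss_pct, FINANCIAL_LOSS_PCT),
    )


if __name__ == "__main__":
    # Hypothetical incident: 30% of customers, 10 minutes, 2% of revenue.
    print(impact_level(30, 10 * 60, 2))
```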
Probability Scales

The following table indicates the possible probability scales for an electricity transmission operator:

1: Very low probability
Accidental or untargeted attacks: It is extremely unlikely that the incident will occur as, for example, there is practically no experience of it in the electricity transmission sector.
Deliberate attacks: Attack would require virtually unlimited resources (money, skills, etc.).

2: Low probability
Accidental or untargeted attacks: The incident is not likely to occur as, for example, experience of it is very limited in the electricity transmission sector.
Deliberate attacks: Attack very difficult to perform, needing a conjunction of expert skills and money.

3: Medium probability
Accidental or untargeted attacks: It is likely that the incident will occur as, for example, similar incidents have been reported in the electricity transmission sector.
Deliberate attacks: Attack not easy but could be possible with single expert skills and a reasonable investment in time and effort.

4: High probability
Accidental or untargeted attacks: It is very likely that the incident will occur in the organisation as, for example, most of the electricity transmission sector has already suffered such incidents.
Deliberate attacks: Attractiveness, lack of protection and resources of the attacker making the attack perfectly feasible.

5: Certainty
Accidental or untargeted attacks: The incident will happen in the organisation in the near future.
Deliberate attacks: Attractiveness, lack of protection and resources of the attacker making the attack ordinary.

Figure 12: Probability Scales for Electricity Transmission
Step 4: Understand the assets in the scope The objective of Step 4 is to gain an understanding of the assets within the scope of the risk assessment for operations. Examples of the assets that could be within the scope of the risk assessment and that require their criticality and their priority levels to the electricity transmission service to be understood are: The transmission grid (wires & cables, poles & pylons): The need is to fully understand and appreciate the extent of the network and the resilience levels provided in case of the loss of a line. Dependencies to energy sources, such as a Power Plant or other interconnections with other transmission networks are also to be analysed. This analysis should be done for the different mode of operation of the infrastructure such as seasonal usage and other usage patterns such as increases in peak demand. The substations: These assets are used to step down the voltage from the very high voltages (e.g. from 110KV to 765KV) to lower voltages (~50KV) (or vice-versa to step up lower voltages to high voltage), to switch the energy flow, to control the reactive power, and to self-protect the grid elements (e.g., tripping when lightning, wire failures, technical failure occur). The need at this stage is to understand the criticality of substations within the energy supply chain. These assets contain Transformers, Circuit Breakers, Protection devices, VAR compensators (SVR, Phasors), and Switches. Measurement devices in substations provide the TSO with insight in the current energy flows in the grid. Substations can also be the point where electricity Transmission System operations exchange power with the Distribution System operations start. The ICT systems & networks: This is primarily the organisations ICT assets and the need to protect sensitive information that may facilitate a compromise of the electricity transmission system infrastructure should its protection fail. 
It is also important to identify any possible route from the ICT corporate infrastructure into the Telemetry infrastructure. The SCADA/Telemetry infrastructure: The need is to understand the potential control of the energy transmission infrastructure that can be achieved using the SCADA/Telemetry components and the potential to cause widespread disruption to the electricity transmission infrastructure should the systems be misused or compromised. The Engineering and Maintenance teams: The need at this stage is to understand what are the critical activities undertaken, who are the key actors within these teams and what other dependencies they rely upon in order to deliver the required levels of service. The Control/Dispatch Room (including Control Room personnel): The need is to understand the level of resources required to function for different scenarios (normal load, peak time, incidents, etc). Also how much of the control rooms management, monitoring and control of the systems are manual, automatic or a combination of both. Page 46 of 118
Interconnection points: These assets are where the transmission network interconnects with other networks; the need is to understand the level of criticality given to each interconnection.

The Contingency Plan: The need is to understand the contingency plan and all the resources that can be activated during an incident affecting the scope of the risk assessment.

Step 5: Understand the threats

The objective is to understand the specific threats in the context of the target of the risk assessment being undertaken. The common threats against an electricity transmission organisation would include, but are not limited to, the following:

Threat categories: Intent, Failure/Accident, Nature, Cascade.

Threats: Acts of Terrorism; Acts of Vandalism; Theft (copper/metals); Theft (equipment); Industrial action; Targeted Cyber Attack; Virus/Trojans; EMP; Act of War; Negligence; Mistake; Impact (e.g. vehicle against pylon/pole); Ingress of Water; Explosion; Disclosure of information (Theft/Leakage); Equipment malfunction or failure; Extreme weather conditions; Pandemic (Flu/etc); Geological; Fire; Flood; Solar Activity; Loss of power supply/utilities/services; Loss of Telecoms; Loss of Energy Supply to the Electricity Transmission Network (Interconnector / Generated supply); Loss of black start capability; Loss of pumped storage capacity; Diplomatic Incident; Chemical (spillage); Loss, unavailability or turnover of personnel; Outdated and unmaintainable technology.

Figure 13: Common Threats to Electricity Transmission operators

An Electricity Transmission System operator is required to review the high level threats listed above and, if they are applicable to its context, should proceed to evaluate the level of threat exposure, taking into consideration that the level of threat exposure may not be consistent across the scope of the risk assessment.

Step 6: Review Security and identify vulnerabilities

The objective of Step 6 is to document the detailed vulnerabilities within the scope of the holistic risk assessment. This step is very dependent on the particular situation at hand. It is therefore difficult to provide a list of typical vulnerabilities in the TSO domain. Doing so would have the detrimental effect of focusing the user of this approach on a finite list of vulnerabilities, which would by no means be exhaustive.

Step 7: Evaluate the associated risk

The objective of Step 7 is to compile a list of risk factors that have been qualified in terms of associated vulnerability(ies), probability, and severity levels. As such, it is not possible to tailor this explicitly for the energy sector as it is a generic step. However, by virtue of the previous 6 steps being tailored for the Electricity Transmission Operator, the output of Step 7 is specific to the energy sector as a whole.
4.3.2 Gas Transmission

Step 1: Constitute the Holistic Risk Assessment Team

The objective of Step 1 is to create the holistic risk assessment team. The team should comprise personnel as described in the table below, with the level of their contribution dependent on the risk assessment being undertaken. The Risk Manager is the owner/lead for this process and should be assisted by 3-4 independent individuals. Expert and specialist input will be provided to the Holistic Risk Assessment team, as and when required, by the following organisational functions:

Holistic Risk Assessment Team Contributors
Maintenance Team (Transmission Infrastructure)
Engineering Team (Transmission Infrastructure)
ICT & Physical Security
Contingency Planning Manager
Facilities Team
HR Team
Control/Dispatch Room Manager
SCADA/Telemetry Manager(s)
Logistics Team
ICT (system & networks) Team
Safety Team

Figure 14: Holistic Risk Assessment Team
Step 2: Define the scope of the risk assessment

It is essential that the definition of the scope of the risk assessment is fully understood by the holistic risk assessment team. The elements that could be considered for inclusion within the scope of the risk assessment undertaken on a gas transmission operator's environment might be:

Primary Gas Transmission Infrastructure:
Pipe (over ground, underground & underwater).
Storage Tanks: These are located at strategic points along the transmission network and are used to store the gas and to release it on demand, as demand cannot be met by pipes alone.
Gas Transmission Network booster stations (compressors): These are used to maintain the pressure of the gas within a section of the transmission network through the use of compressors.

Supporting Gas Transmission Infrastructure:
Offshore gas feeder (gas receipt) station (removal of moisture and condensate).
Odourisation (adding a bad smell, e.g. with Odoran).
Blending station (N2 injection).
SCADA/Telemetry: These contain the key elements for the management, monitoring and control of the gas transmission and storage infrastructure and include real-time and historic status information.
ICT Networks and Systems: The ICT systems and networks are the infrastructure that supports the operations of the corporate and the SCADA/Telemetry environments.
Facilities: The facilities include the buildings and land where gas transmission and storage assets are located.
Engineering function: The engineering function is responsible for the deployment and management of the assets used for the transmission of gas over the transmission infrastructure.
Maintenance function: The maintenance function has the role of maintaining the infrastructure and works in conjunction with the engineering function.
Contingency plans: The contingency plans provide the organisation with the tools required to react in an effective manner following an incident.
Gas Transmission Infrastructure Dependencies:
Tankers (Ship): These contain up to 1506,000 cubic meters of Liquefied Natural Gas (LNG) and are used to transport gas to onshore gas terminals ready to be processed and added to the gas transmission network.
Terminals: This is where gas from the gas fields (and Tankers) is stored and processed and where gas is introduced to the national transmission network (also cross border & international via interconnectors). LNG is warmed and converted back to its gaseous form (re-gasification) at the terminal, before being injected into the transmission network.
Interconnector: An Interconnector is the point where transmission networks connect either at a national or cross border/international level.
PIG Launchers: These are Y-shaped points within a gas pipeline where a maintenance PIG (Pipeline Inspection Gauge) or Scraper is introduced into the pipeline.

Step 3: Define the scales for risk evaluation

The objective is to provide defined scales for the evaluation of probability and severity. In the gas transmission sector, the scales for risk evaluation of incidents can be estimated in the following terms:

Impact Scales

The following table indicates the possible impact scales for a gas transmission operator:

Extent of loss of supply (by percentage of customers, by percentage of nominal capacity):
1: Low impact: < 5%
2: Medium impact: > 5% and < 25%
3: Significant impact: > 25% and < 50%
4: Critical impact: > 50% and < 75%
5: Most severe impact: > 75%

Duration of loss of supply or sustained low pressure:
1: Low impact: < 6 hours
2: Medium impact: > 6 hours and < 12 hours
3: Significant impact: > 12 hours and < 24 hours
4: Critical impact: > 24 hours and < 48 hours
5: Most severe impact: > 48 hours

Financial loss (loss of revenue, customer compensation, cost of reactive remedial action & regulator fines as a percentage of revenue during the period of an incident):
1: Low impact: < 0.5%
2: Medium impact: < 5%
3: Significant impact: < 25%
4: Critical impact: < 50%
5: Most severe impact: > 50%

Figure 15: Impact Scales for Gas Transmission

Probability Scales

The following table indicates the possible probability scales for a gas transmission operator:

1: Very low probability
Accidental or untargeted attacks: It is extremely unlikely that the incident will occur as, for example, there is simply no experience of it in the gas transmission sector.
Deliberate attacks: Attack would require virtually unlimited resources (money, skills, etc.).

2: Low probability
Accidental or untargeted attacks: The accident is not likely to occur as, for example, experience of it is very limited in the gas transmission sector.
Deliberate attacks: Attack is very difficult to perform, needing a conjunction of expert skills and money.

3: Medium probability
Accidental or untargeted attacks: It is likely that the accident will occur as, for example, similar accidents have been reported in the gas transmission sector.
Deliberate attacks: Attack is not easy but could be possible with single expert skills and a reasonable investment in time and effort.

4: High probability
Accidental or untargeted attacks: It is very likely that the accident will occur in the organisation as, for example, most of the gas transmission sector has already suffered such incidents.
Deliberate attacks: Attractiveness, lack of protection and resources of the attacker make the attack perfectly feasible.

5: Certainty
Accidental or untargeted attacks: The accident will happen in the organisation in the near future.
Deliberate attacks: Attractiveness, lack of protection and resources of the attacker make the attack ordinary.

Figure 16: Probability Scales for Gas Transmission
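The impact scales in Figure 15 can be read as a simple threshold lookup. The following Python sketch is illustrative only and is not part of the EURACOM methodology. The function names are hypothetical, the treatment of values that fall exactly on a boundary (assigned to the higher level) is an assumption because the figure uses strict inequalities, and taking the worst of the three criteria as the overall impact is also an assumption.

```python
from bisect import bisect_right

# Lower bounds of levels 2 to 5, read from Figure 15 (gas transmission).
EXTENT_BOUNDS_PCT = [5, 25, 50, 75]       # extent of loss of supply, in %
DURATION_BOUNDS_H = [6, 12, 24, 48]       # duration of loss of supply, in hours
FINANCIAL_BOUNDS_PCT = [0.5, 5, 25, 50]   # financial loss, in % of revenue


def level(value, bounds):
    """Return the 1-5 level for a value, given four ascending lower bounds.

    Assumption: a value exactly on a bound falls into the higher level.
    """
    return bisect_right(bounds, value) + 1


def impact_level(extent_pct, duration_h, financial_pct):
    """Return the overall impact level.

    Assumption: the overall impact is the worst (highest) of the three criteria.
    """
    return max(
        level(extent_pct, EXTENT_BOUNDS_PCT),
        level(duration_h, DURATION_BOUNDS_H),
        level(financial_pct, FINANCIAL_BOUNDS_PCT),
    )
```

An operator using different thresholds would only need to change the three lists of bounds.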
Step 4: Understand the assets in the scope

The objective of Step 4 is to gain an understanding of the assets within the scope of the risk assessment for operations. Examples of assets that could be within the scope of the risk assessment, and whose criticality and priority levels for the gas transmission service need to be understood, are:

The transmission grid (pipelines, Interconnectors, compressor stations, odourisation): The need is to fully understand and appreciate the extent of the network and the resilience levels provided in case of the loss of a section of pipeline, an Interconnector or special processing facilities. Dependencies exist on energy sources, such as the power supply for compression and processing, on LNG shipping terminals and on interconnections with other transmission networks. This analysis should be done for different modes of operation, such as seasonal usage and other usage patterns such as increases in peak demand and price fluctuations.

The Terminals, storage facilities and infeeder processing plants: These assets are used to receive, process, store and pump gas to the gas transmission grid. Gas can be received directly from gas production fields using pipelines or by LNG tanker/harbour facilities.

The ICT systems & networks: These are primarily the organisation's ICT assets; the need is to protect sensitive information that may facilitate a compromise of the gas transmission infrastructure should its protection fail. It is also important to identify any possible route from the ICT corporate infrastructure into the Telemetry infrastructure.

The SCADA/Telemetry infrastructure: The need is to understand the potential control of the energy transmission infrastructure that can be achieved using the SCADA/Telemetry components and the potential to cause widespread disruption to the gas transmission infrastructure should the systems be misused or compromised.

The Engineering and Maintenance teams: The need at this stage is to understand which critical activities are undertaken, who the key actors within these teams are and which other dependencies they rely upon in order to deliver the required levels of service.

The Control/Dispatch Room (including Control Room personnel): The need is to understand the level of resources required to function in different scenarios (normal load, peak time, incidents, etc.), and how much of the control room's management, monitoring and control of the systems is manual, automatic or a combination of both.

Interconnection points: These assets are where the transmission network interconnects with other networks; the need is to understand the level of criticality given to each interconnection.

The Contingency Plan: The need is to understand the contingency plan and all the resources that can be activated during an incident affecting the scope of the risk assessment.
Step 5: Understand the threats

The objective is to understand the specific threats in the context of the target of the risk assessment being undertaken. The common threats against a gas transmission organisation would include, but are not limited to, the following:

Threat categories: Intent; Failure; Nature; Cascade.

Threats: Acts of Terrorism; Acts of Vandalism; Theft (copper/metals); Theft (equipment); Industrial action; Targeted Cyber Attack; Negligence; Mistake; Impact (e.g. vehicle against over ground pipe); Ingress of Water; Explosion; Extreme weather conditions; Pandemic (Flu/etc.); Geological; Fire; Flood; Loss of power supply/utilities/services; Loss of Telecoms; Loss of Gas Supply to the Transmission Network (Interconnector / Supply); Virus/Trojans; EMP; Disclosure of information (Theft/Leakage); Act of War; Equipment malfunction or failure; Sabotage; Diplomatic Incident; Chemical (spillage); Loss, unavailability or turnover of personnel; Outdated and unmaintainable technology.

Figure 17: Common Threats to Gas Transmission operators

A Gas Transmission operator is required to review the high-level threats listed above and, if they are applicable to its context, to evaluate the level of threat exposure, taking into consideration that the level of threat exposure may not be consistent across the scope of the risk assessment.
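The threat review described above (decide whether a threat applies, then record an exposure level that may differ between parts of the scope) could be captured in a small record per threat. The sketch below is a hypothetical illustration: the class and field names, the 1 to 5 exposure scale and the example values are all assumptions made for the example and do not come from the methodology.

```python
from dataclasses import dataclass, field


@dataclass
class ThreatReview:
    """One reviewed threat; all field names are illustrative."""
    threat: str
    applicable: bool
    # Exposure per part of the scope, on a hypothetical 1 (low) to 5 (high) scale.
    exposure_by_area: dict = field(default_factory=dict)

    def highest_exposure(self):
        """Return (area, level) for the most exposed area, or None."""
        if not self.applicable or not self.exposure_by_area:
            return None
        return max(self.exposure_by_area.items(), key=lambda item: item[1])


# Made-up example entries, for illustration only.
reviews = [
    ThreatReview("Flood", True, {"compressor station": 4, "control room": 1}),
    ThreatReview("Theft (copper/metals)", False),
]

# Only applicable threats proceed to the exposure evaluation.
applicable = [review for review in reviews if review.applicable]
```

Keeping the exposure per area, rather than a single value per threat, reflects the point that exposure may not be consistent across the scope.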
Step 6: Review security and identify vulnerabilities

The objective of Step 6 is to document the detailed vulnerabilities within the scope of the holistic risk assessment. This step is very dependent on the particular situation at hand. It is therefore difficult to provide a list of typical vulnerabilities for gas transmission. Providing such a list would also have the detrimental effect of focusing the user of this approach on a finite list of vulnerabilities, which would by no means be exhaustive.

Step 7: Evaluate the associated risk factors

The objective of Step 7 is to compile a list of risk factors that have been qualified in terms of associated vulnerabilities, probability and severity levels. As such, it is not possible to tailor this explicitly for the energy sector as it is a generic step. However, by virtue of the previous 6 steps being tailored for the Gas Transmission sector, the output of Step 7 is specific to the sector.
4.3.3 Oil Transmission

Step 1: Constitute the Holistic Risk Assessment Team

The objective of Step 1 is to create the holistic risk assessment team. The team should comprise personnel as described in the figure below, with the level of their contribution dependent on the risk assessment being undertaken. The Risk Manager is the owner/lead for this process and should be assisted by 3-4 independent individuals. Expert and specialist input will be provided to the Holistic Risk Assessment team, as and when required, by the following organisational functions:

Holistic Risk Assessment Team Contributors
Maintenance Team (Transmission Infrastructure)
Engineering Team (Transmission Infrastructure)
ICT & Physical Security
Contingency Planning Manager
Facilities Team
HR Team
Control/Dispatch Room Manager
SCADA/Telemetry Manager(s)
Logistics Team
ICT (system & networks) Team
Safety Team

Figure 18: Holistic Risk Assessment Team
Step 2: Define the scope of the risk assessment

It is essential that the definition of the scope of the risk assessment is fully understood by the holistic risk assessment team. The elements that could be considered for inclusion within the scope of the risk assessment undertaken on an oil transmission operator's environment might be:

Primary Oil Transmission Infrastructure:
Pipelines (over ground, underground & underwater).
Oil Transmission Network pump stations: These are used to maintain the flow of the oil within a section of the transmission network. Pump stations may also contain oil storage facilities.

Supporting Oil Transmission Infrastructure:
Intermediate storage facilities.
Processing stations (split gas/oil, remove water, split fractions).
SCADA/Telemetry: These contain the key elements for the management, monitoring and control of the oil transmission and storage infrastructure and include real-time and historic status information.
ICT Networks and Systems: The ICT systems and networks are the infrastructure that supports the operations of the corporate and the SCADA/Telemetry environments.
Facilities: The facilities include the buildings and land where oil transmission and storage assets are located.
Engineering function: The engineering function is responsible for the deployment and management of the assets used for the transmission of oil over the transmission infrastructure.
Maintenance function: The maintenance function has the role of maintaining the infrastructure and works in conjunction with the engineering function.
Contingency plans: The contingency plans provide the organisation with the tools required to react in an effective manner following an incident.

Oil Transmission Infrastructure Dependencies:
Tankers (Ship): These can contain in excess of 500,000,000 litres of oil and are used to transport oil to onshore oil terminals ready to be processed and injected into the oil transmission network.
Terminals/Depots/Farms: This is where oil from the oil fields (and Tankers) is stored and processed (not refined) and where oil is injected into the national transmission network (also cross border & international via interconnectors). Terminals/Depots/Farms can also be supplied with oil via road or rail (bridging).
PIG Launchers: These are Y-shaped points within an oil pipeline where a maintenance PIG (Pipeline Inspection Gauge) or Scraper is introduced into the pipeline.
Interconnector: An Interconnector is the point where transmission networks connect either at a national or cross border/international level.

Step 3: Define the scales for risk evaluation

The objective is to provide defined scales for the evaluation of probability and severity. In the oil transmission sector, the scales for risk evaluation of incidents can be estimated in the following terms:

Impact Scales

The following table indicates the possible impact scales for an oil transmission operator:

Extent of loss of supply (by percentage of nominal output):
1: Low impact: < 5%
2: Medium impact: > 5% and < 25%
3: Significant impact: > 25% and < 50%
4: Critical impact: > 50% and < 75%
5: Most severe impact: > 75%

Duration of loss of supply or sustained low flow rates:
1: Low impact: < 12 hours
2: Medium impact: > 12 hours and < 24 hours
3: Significant impact: > 24 hours and < 48 hours
4: Critical impact: > 48 hours and < 96 hours
5: Most severe impact: > 96 hours

Financial loss (loss of revenue, customer compensation, cost of reactive remedial action & regulator fines as a percentage of revenue during the period of an incident):
1: Low impact: < 0.5%
2: Medium impact: < 5%
3: Significant impact: < 25%
4: Critical impact: < 50%
5: Most severe impact: > 50%

Figure 19: Impact Scales for Oil Transmission

Probability Scales

The following table indicates the possible probability scales for an oil transmission operator:

1: Very low probability
Accidental or untargeted attacks: It is extremely unlikely that the incident will occur as, for example, there is simply no experience of it in the oil transmission sector.
Deliberate attacks: Attack would require virtually unlimited resources (money, skills, etc.).

2: Low probability
Accidental or untargeted attacks: The accident is not likely to occur as, for example, experience of it is very limited in the oil transmission sector.
Deliberate attacks: Attack is very difficult to perform, needing a conjunction of expert skills and money.

3: Medium probability
Accidental or untargeted attacks: It is likely that the accident will occur as, for example, similar accidents have been reported in the oil transmission sector.
Deliberate attacks: Attack is not easy but could be possible with single expert skills and a reasonable investment in time and effort.

4: High probability
Accidental or untargeted attacks: It is very likely that the accident will occur in the organisation as, for example, most of the oil transmission sector has already suffered such incidents.
Deliberate attacks: Attractiveness, lack of protection and resources of the attacker make the attack perfectly feasible.

5: Certainty
Accidental or untargeted attacks: The accident will happen in the organisation in the near future.
Deliberate attacks: Attractiveness, lack of protection and resources of the attacker make the attack ordinary.

Figure 20: Probability Scales for Oil Transmission

Step 4: Understand the assets in the scope

The objective of Step 4 is to gain an understanding of the assets within the scope of the risk assessment for operations. Examples of assets that could be within the scope of the risk assessment, and whose criticality and priority levels for the oil transmission service need to be understood, are:
The transmission grid (Pipes & Interconnectors): The need is to fully understand and appreciate the extent of the network and the resilience levels provided in case of the loss of a section of pipeline or an Interconnector. Dependencies on energy sources, such as oil terminals or interconnections with other transmission networks, also need to be understood. This analysis should be done for different modes of operation, such as seasonal usage and other usage patterns such as increases in peak demand and even price fluctuations.

The Terminals: These assets are used to receive, process (not refine), store and pump oil to the oil transmission grid. Oil can be received directly from oil production fields using pipelines or by tanker (via a terminal).

The ICT systems & networks: These are primarily the organisation's ICT assets; the need is to protect sensitive information that may facilitate a compromise of the oil transmission infrastructure should its protection fail. It is also important to identify any possible route from the ICT corporate infrastructure into the Telemetry infrastructure.

The SCADA/Telemetry infrastructure: The need is to understand the potential control of the energy transmission infrastructure that can be achieved using the SCADA/Telemetry components and the potential to cause widespread disruption to the oil transmission infrastructure should the systems be misused or compromised.

The Engineering and Maintenance teams: The need at this stage is to understand which critical activities are undertaken, who the key actors within these teams are and which other dependencies they rely upon in order to deliver the required levels of service.

The Control/Dispatch Room (including Control Room personnel): The need is to understand the level of resources required to function in different scenarios (normal load, peak time, incidents, etc.), and how much of the control room's management, monitoring and control of the systems is manual, automatic or a combination of both.

Interconnection points: These assets are where the transmission network interconnects with other networks; the need is to understand the level of criticality given to each interconnection.

The Contingency Plan: The need is to understand the contingency plan and all the resources that can be activated during an incident.

Step 5: Understand the threats

The objective is to understand the specific threats in the context of the target of the risk assessment being undertaken. The common threats against an oil transmission organisation would include, but are not limited to, the following:
Threat categories: Intent; Failure; Nature; Cascade.

Threats: Acts of Terrorism; Acts of Vandalism; Negligence; Mistake; Extreme weather conditions; Loss of power supply/utilities/services; Theft (copper/metals); Theft (equipment); Industrial action; Impact (e.g. vehicle against pylon/pole); Ingress of Water; Pandemic (Flu/etc.); Geological; Fire; Loss of Telecommunications; Loss of Oil Supply to the Transmission Network (Interconnector / Supply); Targeted Cyber Attack; Flood; Virus/Trojans; Explosion; EMP; Disclosure of information (Theft/Leakage); Act of War; Sabotage; Equipment malfunction or failure; Diplomatic Incident; Oil (spillage); Loss, unavailability or turnover of personnel; Outdated and unmaintainable technology.

Figure 21: Common Threats to Oil Transmission operators

An Oil Transmission operator is required to review the high-level threats listed above and, if they are applicable to its context, to evaluate the level of threat exposure, taking into consideration that the level of threat exposure may not be consistent across the scope of the risk assessment.

Step 6: Review security and identify vulnerabilities

The objective of Step 6 is to document the detailed vulnerabilities within the scope of the holistic risk assessment. This step is very dependent on the particular situation at hand. It is therefore difficult to provide a list of typical vulnerabilities in the energy sector. Providing such a list would also have the detrimental effect of focusing the user of this approach on a finite list of vulnerabilities, which would by no means be exhaustive.
Step 7: Evaluate the associated risk factors

The objective of Step 7 is to compile a list of risk factors that have been qualified in terms of associated vulnerability, probability and severity levels. As such, it is not possible to tailor this explicitly for the energy sector as it is a generic step. However, by virtue of the previous 6 steps being tailored for the Oil Transmission sector, the output of Step 7 is specific to the sector.
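The output of Step 7, a list of risk factors each qualified by vulnerability, probability and severity, can be pictured as a simple register. The sketch below is only an illustration under stated assumptions: the class name, field names and the use of probability multiplied by severity to order the list are hypothetical choices, as the text above does not prescribe how the two levels are combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskFactor:
    """A qualified risk factor; names are illustrative, not from the methodology."""
    description: str
    vulnerabilities: tuple   # vulnerabilities identified in Step 6
    probability: int         # 1-5, from the probability scales of Step 3
    severity: int            # 1-5, from the impact scales of Step 3

    def __post_init__(self):
        # Reject levels outside the five-level scales.
        for name in ("probability", "severity"):
            if not 1 <= getattr(self, name) <= 5:
                raise ValueError(f"{name} must be between 1 and 5")

    @property
    def score(self):
        # Assumption: the product is used only to order the register.
        return self.probability * self.severity


def compile_register(factors):
    """Return the risk factors ordered from highest to lowest score."""
    return sorted(factors, key=lambda factor: factor.score, reverse=True)
```

An organisation that prefers to rank by severity first and probability second could replace the sort key without changing the record itself.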
5 Contingency Planning

The following section defines the holistic Contingency Planning approach and identifies the elements an organisation requires to create an appropriate and relevant Contingency Plan, with an emphasis on the energy transmission sector.

5.1 Introduction

As a critical element of an organisation's Risk Management process, Contingency Planning is an essential tool to assist with the prevention and the management of incidents. This includes the minimisation of the potential effects of an incident should one occur, the management of an active incident and the mitigation factors required to reduce the effects of an active incident.

The formal implementation by an organisation of an effective Contingency Planning process will also ensure that, should an incident occur, it can be readily identified, that the organisation conducts regular Contingency Planning reviews & exercises and that the appropriate levels of training are provided to all the relevant personnel involved with the Contingency Planning processes. The failure to implement all of the required elements within a Contingency Plan will result in the failure to manage and react to an incident, should one occur, in an appropriate and effective manner.

The need for a Holistic Contingency Planning process is to ensure that an organisation has considered and incorporated all of the relevant elements into the Contingency Plan, therefore providing the organisation with as comprehensive a plan as possible within the scope of the organisation's Contingency Plan. A holistic approach is more appropriate for Energy sector organisations because they, by default, have multiple interactions with other organisations and agencies, and the effect of an incident may be significantly negative and cause considerable disruption for their customers and the general public across Local, Regional and National areas, Cross Border within the EU and even internationally.
However, this document will focus on a single organisational level (the operator).

5.2 EURACOM WP 2.2 Desktop Study

The primary scope of work for EURACOM WP 2.2 was to undertake a comparative analysis and a desktop study and review of current Contingency Planning and Business Continuity Management methodologies from various sources, encompassing international, national and domain-specific standards and guidelines.

WP 2.2 made a qualitative evaluation of existing standards and guidelines for contingency planning and business continuity management based on a defined set of analysis criteria, and assessed their suitability for application in the energy sector, having in mind the goal of developing under this project a common, holistic EURACOM methodology for risk assessment and contingency planning.

The WP 2.2 desktop study highlighted one major area of concern: the general lack of available Contingency Planning methodologies (most of those found were ICT centric). By contrast, a reasonable number of Business Continuity Management methodologies and standards were available, providing a good representative sample that could be analysed to identify good practice. The study identified the common elements between Contingency Planning and Business Continuity Management methodologies and standards, and this facilitated the highlighting of the common good practices. As a result, the study created a comparison matrix (criteria vs. standards & guidelines) in the Conclusions section of D 2.2 to help the reader readily identify these key common good practice elements from the methodologies and standards.

The following is the approach undertaken by WP 2.2:
1. Selection of resources for analysis
2. Definition of criteria for assessment and comparative analysis
3. Execution of the analysis
4. Conclusion and recommendations

The conclusion of the WP 2.2 study (D 2.2) was that the analysis demonstrated the maturity of the available holistic BCM standards and guidelines and of the underlying BCM frameworks and process models. From this perspective, a common approach for contingency planning under EURACOM (i.e. for application to energy infrastructures) should incorporate the critical, common elements of these existing models and should also be integrated with a common approach for risk and vulnerability assessment.

The study also identified that, as the interconnectivity of networks and their dependencies are key issues for EURACOM, the question arises as to how joint exercising, maintenance and review is organised, allowing for a continuous improvement of the established practices within the Energy Operators. The common approach should propose tools and methods tailored to the purpose of the analysis, e.g. for risk assessment, impact analysis, definition of continuity requirements, incident prevention and response etc., providing guidance for the adoption and execution of the proposed approach.
5.3 The Contingency Planning Approach at a glance

The following is a recommendation for an approach to Contingency Planning for the Energy Sector that is based on a formal qualitative analysis of the current standards, methodologies and good practice, including industry requirements. This approach is broken down into 3 primary phases: the Preparation Phase, the Test, Exercise and Training Phase, and the Maintenance Phase. This is illustrated by the following diagram.

Figure 22: The Contingency Planning 3 Phase Approach
5.4 Preparation Phase

The preparation phase is where an organisation defines the scope, the objectives and the structure of Contingency Planning and then initiates the processes that are required for the proper mitigation of risk, in terms of measures to prevent, protect, respond and recover from incidents, which are relevant to the organisation's needs and in line with the organisation's defined objectives and priorities. The primary elements that are considered fundamental within the Contingency Planning preparation process are illustrated in Figure 22: The Contingency Planning 3 Phase Approach, and are described below.

Figure 23: Contingency Planning Preparation Phase structure
5.4.1 The Objectives and scope

Objective

The objective of this initial step of the preparation phase is to set the objectives and scope for the contingency planning activities.

Approach

Contingency Planning first has to define the scope of the analysis. This scope has to be defined holistically in terms of the part of the organisation covered, the people, processes, physical infrastructures, ICT infrastructures and, more generally, the resources that are involved. Where transport (water/rail/road) is a primary requirement and/or a contingency requirement, the details of the transport requirements should be included within the scope and the objectives for the delivery of energy by transport, as this has to be addressed within the contingency planning process. More generally, outputs from the dependency analysis would be used to address external dependencies of the organisation within the contingency planning process.

For practical reasons the scope can be reduced for the first implementation of the Contingency Planning process and then expanded in future iterations of the process (see Maintenance Phase: section 5.6). The scope can be set using the inputs of the Risk Assessment activities (please refer to section 4.2.3.2): either by reusing the scope of the risk assessment or by refining it to the areas that are identified as having predominant risk profiles (areas of high threat exposure or areas where occurrences of risk factors reach particularly high severity).

Once the scope is set, the objectives of Contingency Planning have to be defined. This is for the business to set its expectations about the level of resilience or risk mitigation it wants to, or is required to, achieve. This can be pragmatically performed by setting the risk appetite of the organisation, that is, by identifying the level of risk the organisation is ready to accept.
Depending on the scales selected for the assessment of risk (please refer to the Risk Assessment approach), this risk acceptance setting process will set a threshold for the risk level which is generally considered acceptable. In addition to this, external influences need to be considered. These could be legislative, contractual, environmental, financial, etc., and may set a target risk level in designated areas which could be lower than the organisation's own.
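The interaction between the organisation's own risk acceptance threshold and externally imposed targets can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the function names, the use of a single numeric risk level per area and the rule that the stricter (lower) of the two values applies are assumptions made for the example.

```python
def effective_threshold(organisation_threshold, external_targets, area):
    """Return the acceptable risk level for an area.

    external_targets maps an area to a target imposed from outside
    (legislative, contractual, etc.). Assumption: the stricter (lower)
    of the two values applies.
    """
    external = external_targets.get(area)
    if external is None:
        return organisation_threshold
    return min(organisation_threshold, external)


def needs_mitigation(risk_levels, organisation_threshold, external_targets):
    """Return the areas whose assessed risk level exceeds the acceptable level."""
    return sorted(
        area
        for area, risk in risk_levels.items()
        if risk > effective_threshold(organisation_threshold, external_targets, area)
    )
```

For instance, with an organisational threshold of 3 and an external target of 2 for one designated area, a risk level of 3 is acceptable everywhere except in that area.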
Figure 24: Risk Assessment Scale
Stakeholders
At this initial stage, it is important to note that senior management business representatives need to be involved in order to validate the objectives that will drive this activity, for which the primary objective is to serve business and operations.
Business Requirements
Impact of Failure
Restoration Requirements
5.4.2 Organisation for Contingency Planning
The objective of this second step of the preparation phase is to ensure that all stakeholders required for the roll-out of contingency planning are identified along with their responsibilities. As this is an enterprise-wide initiative, most of the functions of the organisation are represented. In addition, when dependencies are taken into account, relevant points of contact have to be identified outside the organisational scope, e.g. within the supply chain or at downstream facilities. This would ensure rapid access to, and passing of, information in case of complex, large-scale disturbances.
The level of stakeholder involvement is one or a combination of the following:
- Lead: The participant either leads the overall process or is responsible for a key area of the Contingency Planning process within their area of responsibility or expertise.
- Validation: The participant validates that the Contingency Planning processes are relevant, accurate and fit for purpose within their area of responsibility and expertise.
- Contribute: The participant contributes to the overall Contingency Planning process within their area of responsibility and expertise.
- Observe: The participant observes the processes from an observer's position or from a compliance perspective.
- Receive: The participant receives relevant information specific to their needs and requirements.
This is demonstrated in the following two matrices, which also identify the different participants needed to make up the contingency teams:
Contingency Process: Risk Mitigation Strategy | Prevention & Protection Measures | Response & Recovery Measures | Contingency Training | Contingency Testing | Contingency Exercises | Contingency Maintenance
Internal Stakeholders:
Executive Management: O/V C/R C C/O
Contingency Planning Managers: L L L/C/V L/C/V L/C/V L
Business Management: C C/V C C/R C C/V
Contingency Plan Responders: L/C C/R C C
Risk Manager: L/C O O C/V
ICT Security Management: C C/V C C/R C C C/V
Physical Security Management: C C/V C C/R C C C/V
Disaster Recovery Manager: L/C C
Operations Representatives: C C C C C
Site(s) manager(s): C C C/R C C
ICT & Network Managers: C C C C
Facilities Management: C C C/R C C C
Maintenance Manager: C C C C C
Maintenance Engineers: C C
Departmental Managers: C C C C/R C/V C C
Staff Representatives: R
Figure 25: Role vs. Contribution Internal Matrix
Contingency Process: Risk Mitigation Strategy | Prevention & Protection Measures | Response & Recovery Measures | Contingency Training | Contingency Testing | Contingency Exercises | Contingency Maintenance
External Stakeholders:
Contingency Planning Consultants: C
The Industry Regulator(s): C/O C C/O C/O R
The local Emergency Services: C/R L/C
Industry Peers: C C/O
Suppliers: C
Partners: C C C C
Regional Government: O/R C/R C/O R
Customers: R R
National Government: O/R C/R C/O R
EU: O/R C/R C/O R
Figure 26: Role vs. Contribution External Matrix
5.4.3 Risk Mitigation Strategy Setting
Objective
The primary objective is to take the output from the Risk Assessment exercises and introduce appropriate Risk Treatment(s) to provide the most appropriate mitigation possible.
Approach
This process is initiated by a review of the risk factors identified in the scope of the analysis as part of the Risk Assessment exercise (please refer to section 4.2.3.7). The first objective is to set the Risk Mitigation strategy. This is achieved by identifying the risk mitigation approach for each of the risk categories in the scope in order to reduce the risk level below the acceptable risk level defined in the objectives. Risk mitigation solutions should give priority to prevention and protection measures, as the subsequent options are in most cases less effective and more costly.
- Prevention: putting in place prevention measures directly aimed at reducing the probability of the occurrence of a risk.
- Protection: putting in place protection measures which are tasked with reducing the severity of a risk should it occur.
- Response: for risk factors which cannot be prevented or protected against, an organisation sets a Contingency Plan aimed at reacting to the risk should it occur.
- Recovery: this groups together the resources and processes that need to be activated in order to return to a normal state of operation after an incident has occurred and first response procedures have been activated.
- Risk Avoidance: for risk factors which are considered unacceptably high and for which no suitable or adequate prevention, protection, response or recovery measures are available, an option can be to avoid the risk.
Depending on the circumstances, risk avoidance can be achieved by modifying the way the organisation operates so as to avoid the areas where the risk could occur (use of uncertain technologies, operations in risky regions, risky activities, etc.). Another option for risk avoidance is to transfer the risk, for example by outsourcing the risky activities to an external partner. In a similar vein, organisations can offset part of their risk factors to insurance companies by insuring against them.
- Risk Acceptance: this is the final option of risk mitigation, either because the risk level falls below the acceptability threshold or because there are no cost-effective or even feasible measures to mitigate a risk that cannot be avoided.
This process should be documented by a formal risk treatment plan identifying, for each risk factor, the mitigation options which are retained in all the above categories. This risk treatment plan should include an evaluation of the residual risk level of each risk factor after mitigation. It should also document the acceptance of the residual risk and in particular highlight all risk factors that are above the agreed risk acceptability threshold.
It has to be re-emphasised that the whole approach presented within this document needs to be holistic, which at this juncture means that security measures need to be identified in all areas (i.e. the Organisational, Physical, IT and Human aspects of security).
In addition, risk mitigation strategies must take into consideration the existence of external dependencies, their nature (e.g. basic supplies for energy, water, communication etc., possibly organisational dependencies), their impact, and their relevance with regard to specific risks (e.g. communication means may also be impacted by significant natural hazards). A good risk mitigation strategy may for example attempt to limit the level and reduce the impact of these dependencies (e.g. through the availability of alternative communication means, simultaneous suppliers, back-up systems etc.).
This risk treatment plan constitutes the strategy of the organisation for risk mitigation. As such, this plan, along with the necessary investments and the impacts on operations implied by the mitigation measures, needs to be presented to and validated by senior management, to ensure senior executive backing and to secure the resources required to implement the Contingency Planning project and processes.
Stakeholders
The lead stakeholder for the Risk Mitigation Strategy is the organisation's Risk Manager, who acts on behalf of the Executive Management. It is the Executive Management who will validate the Risk Mitigation Strategy.
Other stakeholders include key areas of the business that will provide the input required for the formulation of the strategy and, potentially, the industry regulator, who may want some level of assurance that regulatory requirements are being addressed and met, and who may also provide input to the process.
Output
Risk Treatment Plan (including mitigation measures, residual risk and required investment) validated by senior management.
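As an illustration of what a Risk Treatment Plan record might hold, the following Python sketch stores, for each risk factor, the retained mitigation options, the residual risk and its acceptance, and lists the risk factors whose residual risk remains above the acceptability threshold. The class name, field names, the 1 to 5 scale and the example entries are all hypothetical and are not part of the EURACOM approach.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TreatmentEntry:
    """One line of a risk treatment plan (hypothetical structure)."""
    risk_factor: str
    initial_risk: int           # assumed scale: 1 (lowest) to 5 (highest)
    residual_risk: int          # evaluated after the retained measures
    # Retained options keyed by category, e.g. "prevention", "protection",
    # "response", "recovery", "avoidance", "acceptance".
    measures: Dict[str, List[str]] = field(default_factory=dict)
    accepted_by: Optional[str] = None   # who accepted the residual risk


def factors_above_threshold(plan, threshold):
    """Risk factors whose residual risk exceeds the agreed threshold and
    which therefore have to be highlighted in the plan."""
    return [e.risk_factor for e in plan if e.residual_risk > threshold]


# Hypothetical example entries.
plan = [
    TreatmentEntry("Loss of SCADA link", 4, 2,
                   {"protection": ["redundant communication path"]},
                   accepted_by="Executive Management"),
    TreatmentEntry("Flooding of substation", 5, 4,
                   {"response": ["site evacuation procedure"]},
                   accepted_by="Executive Management"),
]
print(factors_above_threshold(plan, threshold=3))
```

Keeping the acceptance alongside the residual risk makes it straightforward to present the highlighted entries to senior management for validation.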
5.4.4 Implementation of Prevention and Protection measures
Objective
The objective is primarily for an organisation to adopt a proactive approach to Risk Mitigation activities in order to facilitate the prevention of a major incident. The proactive prevention of a major incident is preferable to the reactive management of an incident. This is why the prevention and protection measures have been grouped together. This is achieved by using appropriate countermeasures to effectively mitigate risk and therefore reduce the probability of occurrence of a risk (prevention) and/or limit the impact or severity of a risk if it occurs (protection).
Approach
The approach for the implementation of the prevention and protection measures is similar to the approach undertaken to roll out any other project within the organisation. To this end, this activity should follow the practices and processes in place within the organisation for the governance, management and control of projects. This includes objectives setting and communication, definition of a project plan identifying the main phases, resources involved, timescales, activities, dependencies and the main milestones of the project, setting up of a steering committee, procedures for validating the results, etc.
It is not the objective of this approach to go into detail on this part, as it would be counterproductive to devise a project management framework which would not be fully adapted to the existing processes of an organisation which wishes to follow the EURACOM approach.
Although this detailed approach is not described here, it is important to underline the key performance indicators, the drivers and other criteria that are crucial for the successful management and implementation of the project:
- Risk Mitigation: Risk mitigation is the principal driver of this activity and therefore the various activities and solutions need to be designed to satisfy this goal and to achieve the objectives set out in the Risk Treatment Plan.
- Residual Risk: The level of residual risk is the main result of this phase, where the prevention and protection measures are implemented to achieve a target residual risk level. It is therefore important to follow this measure as a Key Performance Indicator (KPI) for the project, both for the final result and for the intermediate milestones, to identify how the level of risk mitigation is built up throughout the project execution.
- Holistic implementation: This aspect needs to be closely monitored to ensure that security is not implemented in a multitude of isolated silos and that, on the contrary, all measures form a complete security posture for prevention and protection.
- Prioritising implementation of prevention and protection measures: Organisations operate with finite resources, so it is impossible to implement all the prevention and protection measures at the same time. Clear guidelines should be identified by management to prioritise the actions. These guidelines need to be established by the organisation, but they should assess the driving and the limiting factors for each action in order to identify priorities. The main driving factors to take into account are the risk mitigation level and compliance with regulatory requirements. The limiting factors to consider include time, cost and complexity of implementation.
- Acceptance of security: As expressed above, any acceptance process should follow the principles of the organisation's project management practices (e.g. Factory Acceptance, User Acceptance, etc.). When it comes to security, specific additional steps should be added to the usual acceptance practices:
  o Security penetration testing: In addition to the functional testing of security, aimed at verifying that the security solutions deliver what they are expected to, penetration testing is aimed at verifying the robustness of these security functions and at checking that they cannot be bypassed. These penetration tests can take various forms: ICT penetration tests (ethical hacking), social engineering, etc.
  o Security audits or reviews: This is the use of a third party to verify and validate the actual implementation of the security measures.
Stakeholders
The key stakeholders are the Contingency Planning Managers, whose role is to ensure that adequate Prevention and Protection measures are in place to help prevent an incident, or incidents, from occurring.
In order to facilitate this process, a number of key management stakeholders from within the organisation are utilised and cover the different organisational aspects such as ICT, Physical and Logical Security, Facilities, etc. External stakeholders may include industry partners, the Regulator and Government, who may observe and review the implemented controls.
Output
A comprehensive and formal Prevention and Protection measures implementation process and also the actual implementation of the validated Prevention and Protection measures.
The risk profile of the organisation has undergone a level of mitigation. This output should be fed back into the risk assessment process to show the evolution of the risk profile on the scope of the analysis.
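The prioritisation of prevention and protection measures by driving and limiting factors, as described in this section, could be sketched as follows. This is only one possible scoring scheme: the 1 to 5 scales, the regulatory bonus, the ratio used as a score and the example measures are all assumptions, and each organisation is expected to establish its own guidelines.

```python
from dataclasses import dataclass


@dataclass
class Measure:
    """A candidate prevention or protection measure (hypothetical fields)."""
    name: str
    risk_reduction: int   # driving factor, assumed scale 1-5
    regulatory: bool      # driving factor: needed for regulatory compliance
    time: int             # limiting factor, assumed scale 1-5
    cost: int             # limiting factor, assumed scale 1-5
    complexity: int       # limiting factor, assumed scale 1-5


REGULATORY_BONUS = 2  # assumed weight given to regulatory compliance


def priority_score(m):
    """Ratio of driving factors to limiting factors (higher is better)."""
    driving = m.risk_reduction + (REGULATORY_BONUS if m.regulatory else 0)
    limiting = m.time + m.cost + m.complexity
    return driving / limiting


def prioritise(measures):
    """Order measures from highest to lowest priority score."""
    return sorted(measures, key=priority_score, reverse=True)


# Hypothetical example measures.
candidates = [
    Measure("Perimeter fencing upgrade", 4, True, 2, 3, 2),
    Measure("Access badge review", 3, False, 1, 1, 1),
    Measure("Control room relocation", 2, False, 3, 4, 3),
]
for m in prioritise(candidates):
    print(f"{m.name}: {priority_score(m):.2f}")
```

A ratio is used here so that a cheap, quick measure with moderate benefit can outrank an expensive one; an organisation might equally prefer weighted sums or a simple ranking agreed by management.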
5.4.5 Implementation of Response and Recovery measures
This process aims at developing the Response and Recovery capabilities of an organisation against incidents that have been identified through the risk assessment process. The objective of this section is to develop the necessary processes, solutions and resources to constitute this Response and Recovery capability.
This section is presented after the Prevention and Protection measures implementation to reflect the general fact that responsive measures should always be considered as a complement to preventive measures. However, this does not mean that this process should only start once the previous one is over. On the contrary, this activity should happen in parallel to the previous one, within the limits of the resources available in the organisation.
5.4.5.1 Approach - Scenarios selection
The approach is initiated following the results of the Business Impact Analysis (evaluated during the Risk Assessment) and of the Risk Treatment phase. The risk treatment phase will provide the actual risks for which response and recovery measures have been selected for risk mitigation. The business impact analysis performed during the risk assessment will provide the details of the risk scenarios to be considered:
- The scenario of occurrence of the risk (e.g. an attack concept or a hazard scenario) with the threat agents, the vulnerabilities exploited and the assets that are directly affected; external dependencies should also be considered and analysed from the perspective of their impact on operational continuity.
- The evaluation of likelihood.
- The evaluation of the impact for the business across the various dimensions of impact (e.g. impact on supply, financial impact, regulatory obligations, legal proceedings, corporate image impact, impact on human beings, etc.).
The objective of this report has to be understood in the context of the EURACOM objectives, which are targeted at the resilience of the supply chain of interconnected energy networks. The rest of the approach will therefore focus on impacts on supply only. As a result, response measures to mitigate financial impact (e.g. capital provisioning) or measures for managing brand image impacts (e.g. corporate communication plans) will not be covered by the approach.
It is not always practical or even feasible for an organisation to develop its Contingency Plan to cover all possible scenarios. It is therefore often preferable to start with only a subset of scenarios, selected to dimension the Contingency Plan. The organisation can then add new scenarios when iterating and maintaining its contingency plan.
5.4.5.2 Continuity of Supply objectives
For each of these risk scenarios, objectives need to be set in terms of Continuity of Supply. These Continuity of Supply objectives are derived from the objectives set in 5.4.1, by making sure that the operational impact remains within a limit that keeps the risk at an acceptable level. In this way, risk scenarios which have a higher probability of occurrence will have more ambitious Continuity of Supply objectives.
Supply continuity objectives need to be expressed in terms of:
- Time for Restoration of Supply: Return Time Objectives (RTO) expressed in time;
- Level of Supply: Return Point Objectives (RPO) expressed in terms of percentage of nominal supply and priorities for restoration of Energy.
In more elaborate models, these supply continuity objectives can, for any given scenario, be expressed in a more granular manner by plotting the profile of recovery of supply against time.
Figure 27: Continuity Objective Profile
5.4.5.3 Derive Supply Continuity Objectives in the infrastructure
Once the objectives have been set, the organisation has to analyse the RTO and RPO required for all the elements of the supporting infrastructure to allow for the supply to be recovered in time. This analysis will be supported by the dependency analysis performed as part of the risk assessment, to show how the various elements of the infrastructure, and also external dependencies and supplies, contribute to the supply of the end product. This will result in expressing continuity objectives for elements like assets of the energy network, systems, relocation of key personnel, etc.
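A continuity objective profile of the kind described above (level of supply to be recovered against time) can be checked against a planned recovery with a short Python sketch. The representation as lists of (hours after incident, percentage of nominal supply) pairs, the step-wise recovery model, the function names and the figures are assumptions chosen for illustration only.

```python
def supply_at(recovery, t):
    """Supply level (% of nominal) reached at time t, for a step-wise
    recovery given as (hour, percent) pairs; 0% before the first step."""
    level = 0.0
    for hour, percent in sorted(recovery):
        if hour <= t:
            level = percent
        else:
            break
    return level


def unmet_milestones(objective, recovery):
    """Milestones (hour, minimum percent) of the continuity objective
    profile that the planned recovery does not reach in time."""
    return [(h, p) for h, p in objective if supply_at(recovery, h) < p]


# Hypothetical objective: 50% within 4 h, 80% within 24 h, 100% within 72 h.
objective = [(4, 50), (24, 80), (72, 100)]
# Hypothetical planned recovery for one scenario.
recovery = [(0, 20), (3, 60), (30, 90), (60, 100)]

print(unmet_milestones(objective, recovery))
```

In this hypothetical case the plan restores 60% of supply after 3 hours but only reaches 90% after 30 hours, so the 24-hour milestone is reported as unmet. The same check could be applied to the continuity objectives derived for individual infrastructure elements.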
Figure 28: Continuity Objective
5.4.5.4 Define possible strategies to meet the Supply Continuity Objectives
For each scenario, several strategies can be identified in terms of the technical and organisational response and recovery measures that can be implemented. At this stage, only high level strategies need to be described, the level of detail being set to the right level for decisions to be made. These strategies should at least include:
- The actual continuity levels that can be achieved,
- The cost and effort required to develop the associated contingency plan,
- The cost and effort required to operate against the contingency plan,
- Qualitative information on the strategy describing how the plan would operate (plan reliant on technologies, people, external partners, backup infrastructure, emergency resources, etc.).
5.4.5.5 Selection of strategies
A review of the various strategies needs to be undertaken in order to validate the actual solutions that will be implemented. It is difficult to stipulate a finite list of criteria to be reviewed for this exercise, but the following aspects should be given consideration:
- The continuity levels expected with each strategy, to be analysed against the Supply Continuity Objectives,
- The cost and effort in relation to the actual risks which are mitigated,
- The use of common strategy elements to cover a maximum of risk scenarios, as opposed to resources which can only cover a certain type of risk scenario,
- Having flexible solutions which can be adapted to actual events. This recognises the fact that real-life situations will always deviate from the scenarios used to dimension Contingency Plans and that situations can evolve.
In this respect, on the organisational side of the Contingency Plan, the organisational structure to manage incidents should allow for a core organisation and set of processes which are common whatever the scenario, with possible variations around this core to cope with the specifics of each situation. To illustrate this, the procedure for alerting and mobilising a crisis management cell should be the same for all scenarios; however, the composition of the crisis management cell would vary depending on the situation and therefore on the disciplines and expertise that need to be represented.
5.4.5.6 Implementation of Response and Recovery Measures: the contingency plan
Once the strategies have been selected, a detailed plan for their implementation has to be developed. From there, similarly to step 5.4.4 Implementation of Prevention and Protection Measures, this aspect is closely linked to the practice of the organisation in terms of project management. In addition to general project management principles, below are the specific aspects of a contingency plan development project.
- Focusing on Supply Continuity Objectives, the scenarios and the strategy. The contingency plan development is an organisation-wide project ranging from tactical to strategic levels and covering all resources and operations of the business (IT, facilities, energy network, etc.).
The organisational complexity of the project therefore has the potential to be extremely high. A good cohesion factor for all actions across the organisation is to use the scenario and the strategy as the main reference and shared main objective for all stakeholders.
- Hold regular cross-organisation project reviews. To further address the complexity described in the previous point, it is important to organise cross-organisation project progress reviews in order to give representatives of each of the Contingency work streams an understanding of the big picture.
- Testing and maintenance. This is a key part of Contingency Planning: even more than in other projects, testing, training and maintenance have to be faultless. This is why these aspects have been developed in a dedicated section (see section 5.5.2).
Stakeholders
The Contingency Planning Managers, Plan Responders and the Disaster Recovery Manager lead any Response and Recovery measures. To respond effectively during an incident, they will need to be supported by all relevant elements of the organisation along with suppliers, Industry Peers, Government and potentially the Emergency Services. Customers of the Energy Provider will need to be advised of, and updated with, restoration of service details.
5.4.5.7 Supporting data: the key elements of a Contingency Plan
The main elements of a Contingency Plan can be organised around the following concepts:
- Incident Management,
- Crisis Management,
- Business Continuity Management,
- Disaster Recovery Management.
Other elements of a Contingency Plan which are not covered as part of the project because they are outside of EURACOM's remit are:
- Media Communication Plan,
- Legal dimensions of the Contingency Plan,
- Social and HR dimensions of the Contingency Plan,
- Finance and Markets dimensions of the Contingency Plan.
5.4.5.7.1 Incident Management
Incident Management is the process through which organisations deal with incidents. It covers the reporting of incidents, the escalation/resolution of incidents and the review of incidents. Organisations can have different processes for incident management depending on the nature of the incidents:
- A structure for reporting by customers of incidents linked to perturbation of supply
- A structure for reporting physical security incidents
- A structure for managing operational incidents through telemetry
- A structure for managing ICT incidents
All these processes are separate, but they all have in common the capability to escalate to the crisis management process an incident which grows out of proportion.
5.4.5.7.2 Crisis Management
Crisis Management is the set of processes, organisational structure and resources for an organisation to actively manage a crisis (as opposed to processes linked to tactical resolution actions) from detection to the exit of crisis mode. Crisis Management objectives are principally aimed at ensuring that there is a chain of command for the management of the crisis, enabling decision making in extraordinary circumstances and ensuring that decisions are relayed and that information is reported to the decision makers.
The basic processes cover alert, crisis cell mobilisation, crisis cell operation, decision making, decision communication and situation awareness building. These processes are very stable whatever the type of crisis or incident, with allowance for variations to adapt to different types of situations. All these processes are documented as procedures of the crisis management plan. The response and recovery measures specific to scenarios are found in the supporting Business Continuity Plans and Disaster Recovery Plans (see below).
Some of the elements constituting a crisis management plan are:
1. Organisation in terms of crisis (internal and external) covering roles and responsibilities
2. Main procedures from alert to end of crisis
3. Special conditions: reflex actions for specific scenarios
4. Supporting tools:
   a. Directory of internal contacts
   b. Directory of external contacts
   c. Dashboards supporting crisis management
Crisis management plans can be set at various levels of an organisation:
- Group
- Branch
- Site
- Activity
The plan is also supported via dedicated resources for crisis management. Typical resources are:
- Alert system,
- Crisis Management Room(s) in several locations (the level of equipment needs to be adapted to the needs),
- Video/phone conference facilities (available in time of crisis),
- Special communication means in case of ICT or electricity blackout,
- Activation of an external information line,
- Crisis Management Case (containing all essentials for crisis management),
- Etc.
5.4.5.7.3 Business Continuity Management
Business Continuity Management is the set of processes and resources an organisation has identified and provisioned to be activated following adverse events in order to ensure an acceptable level of continuity of operational activities. The Business Continuity Plan contains provisions about management, but these aspects are mainly covered in the crisis management plan(s). The content of Business Continuity is focused on more practical operational and tactical measures directly aimed at responding to and recovering from specific incident scenarios. All these measures are documented through procedures in the Business Continuity Plan.
A business continuity plan includes some of the following elements:
1. List of scenarios covered,
2. Overall strategy for each scenario along with organisation, roles and responsibilities. The strategy should restate the target continuity objectives,
3. Supporting procedures for the various actions and stakeholders (fall-back procedures, BCM activation, recovery actions, etc.),
4. Supporting tools.
The plan is also supported via dedicated resources for business continuity. Typical resources are:
- Deployable units for intervention on the energy network,
- Backup infrastructures and systems for energy supply, ICT and other fundamental, external resources and supplies,
- Alternate sites for employees in case of unavailability of the primary site,
- Etc.
5.4.5.7.4 Disaster Recovery Management
Disaster Recovery Management operates at both the operational and the tactical level and focuses on the measures and resources to be activated in order to recover the required level of ICT capabilities to support the business functions. Due to the complexity of ICT systems, these provisions are usually dealt with separately from the other resources managed in the Business Continuity Plan. Concerning the energy sector, it is probable that Disaster Recovery Management will be separated into two main scopes:
- Corporate ICT,
- Process Control, Network management, ICT (SCADA, Telemetry, etc.).
A disaster recovery plan includes some of the following elements:
1. List of scenarios covered,
2. Overall strategy for each scenario along with organisation, roles and responsibilities. The strategy should restate the target continuity objectives,
3. Supporting procedures for the various actions and stakeholders (swap of production environment to DR site, restoration of data, systems, applications, etc.),
4. Supporting tools.
The plan is also supported via dedicated resources for disaster recovery. Typical resources are:
- Disaster recovery site,
- Backup data (on tape, disk, etc.),
- Backup communication methods,
- Spare hardware,
- Alternate control room,
- Etc.
5.5 The Test, Exercise & Training Phase
These phases are critical to the successful execution, management and maintenance of a Contingency Plan, as they ensure that all the processes developed within the Contingency Plan can be successfully implemented and that the Contingency Plan implementers for Response & Recovery can effectively undertake their roles and fulfil their responsibilities. Any issues that arise from the Test, Exercise & Training phases are fed back into the Contingency Plan in the form of lessons learnt and are used to enhance and stabilise the processes surrounding Contingency Planning and the Contingency Plan.
Figure 29: The test, exercise and training phase
5.5.1 Contingency Planning Training
Objective
The objective is that, as a prerequisite, all personnel involved with Contingency Planning are fully conversant with their roles and responsibilities and with what is required from them during an incident to facilitate a successful implementation of the Contingency Plan.
Approach
The use of subject matter experts without the appropriate Contingency Planning training will not provide an organisation with the level of skills required to manage incidents and to execute the
appropriate level of response to an incident. The use of subject matter experts without appropriate Contingency Planning training may in fact impede the successful execution of an organisation's Contingency Plan during an incident and would most certainly delay the recovery and restoration processes for the organisation.
The only way to effectively manage this requirement is to provide an individual or group of individuals with a level of training that is relevant to their role when executing the contingency plan. The training has to be refreshed at regular intervals, when an individual's role changes, when new personnel take on contingency responsibilities or if the organisation evolves. Failure to provide the necessary and appropriate training to personnel will potentially lead to an organisation's Contingency Plan failing, and therefore increase the potential for an incident to grow significantly in magnitude.
The training of personnel with responsibilities under the contingency plan should be conducted along with testing. Personnel should reach a stage of competence where they are able to execute their roles without the need to refer to a guide; the level of training must be sufficient for contingency plan execution to become second nature to personnel.
There are three main training groups in Contingency Planning:
1. The training of strategic personnel for the implementation, the execution and the management of the organisation's Contingency Plan.
2. The training for personnel who are responsible for the management and implementation of defined recovery processes.
3.
The training for personnel who have a role in assisting with the implementation of the recovery processes.
Stakeholders
Contingency Planning Managers are the key stakeholders (on behalf of the Executive Management) and will lead this element to ensure that all relevant parties within the organisation receive appropriate and relevant Contingency Training in order for them to fulfil their duties within the Contingency Plan.
Output
A comprehensive training plan that accommodates all personnel involved in Contingency Planning and the provision for refresher training or updated training when key factors within the organisation change.
5.5.2 Test the Contingency Plan

Objective
Testing the contingency plan is important to provide the appropriate level of assurance that the assumptions made about the quality and effectiveness of the Contingency Plan are tried and tested, and to satisfy the organisation that the Test Plan, as well as the Contingency Plan itself, is fit for purpose and meets the organisation's Continuity objectives.

Approach
The testing can be scenario based or it can be tailored to test explicit elements or functions of a Plan, including the ability to accommodate changes or deviations to the plan. The following types of testing can be undertaken:
a. Call Tree: The function of a call tree is to provide a list of the personnel, and their contact details, needed to execute and operate the Contingency Plan when an incident occurs. When contact is initiated with the first group of contacts, they cascade the process onwards by contacting their own contacts, who in turn contact theirs, and so on. This type of test will verify whether or not the Contingency Plan's Call Tree works, and it should introduce elements that address the loss of mobile and/or fixed telecoms or of key/senior team members. This test is also relevant if the call tree process is automated.
b. Walkthrough: A walkthrough is primarily a formal peer review of an element (or elements) of a contingency plan, where each stage is discussed, its merits and deficiencies are identified and potential improvements are considered. This is required in order to validate the plan's ability to deliver and meet its objectives and to address shortcomings in a proactive manner. The output from this test/review is the most critical of the tests, and the lessons learnt from it feed directly into the preparation phase.
c. Table Top: A Table Top exercise involves a contingency scenario being presented to the contingency team (or teams), who describe and discuss the actions they would undertake during an incident without actually executing them. This test follows on from the Call Tree and the Walkthrough and leads towards the execution of a contingency exercise.

The tests should include the following to identify the contingency and the actions that need to be taken to address it:
1. Contingency: What is the contingency?
2. Activation: What are the activation criteria?
3. Severity: What is the impact of the contingency?
4. Action: What action is undertaken to address the contingency?
5. Result: What is the expected result?

The testing has to be undertaken in a formalised manner and personnel should be fully de-briefed so that any lessons learnt or shortcomings are captured and fed into the contingency plan maintenance phase.

Stakeholders
The Contingency Planning Managers are the key stakeholders for Contingency Testing, as they have a responsibility to the Executive Management, and potentially to the Industry Regulator, to validate that their Contingency Testing plans and activities are fit for purpose. As such they will lead this element to ensure that all relevant parties within the organisation validate their Contingency Plans through appropriate testing.

Output
The output will be a formal Test Plan, including test scripts and scenarios, to validate the assumptions about the Contingency Plan's suitability and effectiveness and to confirm that it is fit for purpose and meets the organisation's objectives.
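The Call Tree test described in 5.5.2 can be illustrated with a minimal sketch. It assumes a call tree held as a simple mapping from each person to the contacts they are expected to call; the role names, the `CALL_TREE` structure and the `run_call_tree` function are hypothetical and are not part of the EURACOM approach. The sketch shows how the loss of a key team member can be introduced into the test to reveal who would no longer be reached.

```python
from collections import deque

# Hypothetical call tree: each person maps to the contacts they must call.
CALL_TREE = {
    "cp_manager": ["ops_lead", "ict_lead"],
    "ops_lead": ["field_team_a", "field_team_b"],
    "ict_lead": ["dr_engineer"],
    "field_team_a": [],
    "field_team_b": [],
    "dr_engineer": [],
}


def run_call_tree(tree, root, unreachable=frozenset()):
    """Simulate the cascade starting at root.

    Returns (reached, missed). A person who is unreachable cannot pass
    the call on, so their contacts are missed unless another caller
    also lists them.
    """
    reached = []
    seen = set()
    queue = deque([root])
    while queue:
        person = queue.popleft()
        if person in seen or person in unreachable:
            continue
        seen.add(person)
        reached.append(person)
        queue.extend(tree.get(person, []))
    missed = sorted(set(tree) - seen)
    return reached, missed


# Nominal test: everyone is reachable.
print(run_call_tree(CALL_TREE, "cp_manager"))
# Degraded test: a key team member cannot be contacted.
print(run_call_tree(CALL_TREE, "cp_manager", unreachable={"ops_lead"}))
```

In the degraded run, the two field teams are missed because only one caller lists them. A finding of this kind is what the de-briefing would capture, for instance by proposing an alternate caller for those contacts.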
5.5.3 Contingency Exercises

Objective
Contingency Exercises act as a comprehensive method to complement the Testing and Training phases, to validate that these have achieved their goals and to confirm that they satisfy the organisation's requirements with respect to the Contingency Plan.

Approach
Periodic contingency exercises are important not only as part of testing and maintenance, but also to ensure that personnel fully understand their roles and responsibilities in an actual incident and are conversant with the procedures and protocols should a Contingency Plan be activated. Contingency Exercises should be organised to simulate real world scenarios in a structured and defined format in order to validate the effectiveness of the contingency planning processes, including testing and training. The exercise should include as many elements of the organisation as possible, but must maintain a focus on the organisation's most critical elements, which have previously been identified in the Risk Assessment. Each exercise should have clearly defined objectives stating what the organisation wants to achieve, and a formal set of evaluation criteria to measure the level of success the organisation has achieved in its response and recovery processes.

Ideally the exercises should take a scenario based approach to a specific incident within the energy sector that will activate a broad range of response and recovery processes, using multiple teams from different elements of the organisation's structure to facilitate effective co-operation, communication and decision making. These should include elements from the Test Plan, including executing the Call Tree processes and utilising the scripts and scenarios developed for the Contingency Test Plan.
Such a simulation, albeit a customised one, of a previous event (within the energy sector) would deliver a more realistic scenario and would also make it possible to validate whether the lessons learnt (by the energy sector) have been implemented and incorporated within their risk management processes following the post mortem of the original incident.

Exercises can be undertaken in either Real or Accelerated time, with the former being more appropriate for a larger exercise involving multiple parties and agencies, whilst the latter would be more suitable for a smaller internal exercise, e.g. at a single site. Accelerated time exercises are also more suitable for simulating business responses during an incident.

One primary and critical element for any form of contingency exercise is that there must be inbuilt and comprehensive safeguards against the simulated incident being mistaken for a real incident, which could cause distress and panic for those not realising that an exercise was being executed. Poorly trained personnel could also take panic driven actions and implement real responses to simulated incidents, which could potentially lead to an actual incident occurring and place the organisation in Contingency mode. This could in turn create a situation where a number of personnel still think they are in the contingency exercise and do not execute the appropriate responses during the actual contingency.

Organisations should also introduce additional elements within a contingency exercise to identify potential shortcomings following:
• The loss of a large number of key staff
• Severe weather conditions
• Premises being made unavailable
• Industrial action, strike
• Full or partial loss of communications

Following any contingency exercise or simulation, a comprehensive de-briefing process should be undertaken in order to capture the positive and negative aspects of the exercise (what worked well and what didn't work very well), and this should be recorded in a formal and structured manner. The information gained from the de-briefing(s) should then be analysed, and the results of the analysis will feed into the lessons learnt process and therefore into the contingency plan's maintenance phase, to ensure that all captured issues that need attention are addressed appropriately.

Extended Approach
Complex exercises based on major incident scenarios involving multiple organisations (Sector, Cross Sector, Emergency Services, Local, Regional, National, Cross Border and International) should be undertaken to validate the compatibility, interconnectivity and interdependency of the different contingency plans and to help reduce the probability of an incident escalating.
This type of exercise helps to ensure that, in major incidents involving multiple organisations, the multiple contingency plans can be implemented and managed effectively under a single overarching contingency plan, and that no isolated contingency plan can severely impede the response and recovery processes. These aspects are further developed in section 7, Managing Dependencies of the energy sector in Risk Assessment and Contingency planning.
Stakeholders
As with Contingency Testing, the Contingency Planning Managers are the key stakeholders for Contingency Exercises, as they have a responsibility to the Executive Management, and also potentially to the Industry Regulator, to validate that their Contingency Plans are fit for purpose. Contingency Exercises follow on from Contingency Testing and not only involve an organisation's internal resources, but also have the potential to involve a number of external organisations such as the Emergency Services, Industry peers and Government. Customers will need to be advised that an exercise is being staged.

Output
A structured and formal scenario based programme that will test the organisation's Contingency Plan to ensure that the contingency testing and training are adequate and that the decision making processes are in place, embedded in the organisation's business processes and implemented effectively within the Contingency Plan.
5.6 The Maintenance Phase
The maintenance phase is where the Contingency Plan is modified and updated to meet an organisation's new objectives and changes in its operating environment. It is also where the lessons learnt are introduced, along with inputs from the Risk Assessment element. Additional results from ICT Security Testing (Risk Assessment), e.g. Penetration Tests, are input into the maintenance phase, as the compromise and/or loss of ICT services can lead to the loss of service delivery for an energy provider, for example by preventing the generation and distribution of energy.

Figure 30: The maintenance phase

5.6.1 Contingency Planning Maintenance

Objective
The objective of Contingency Planning Maintenance is to ensure that the contingency plan, and its associated processes, are subjected to a constant formal evaluation and maintenance cycle, to provide an organisation with a high level of assurance that the Contingency Plan is up to date and satisfies the organisation's Contingency objectives and requirements.

Approach
Contingency Plans need to be maintained on a periodic basis to ensure that all dispositions are up to date and reflect the latest changes in the organisation and its environment. For this purpose a maintenance plan should be developed to ensure that the various components are reviewed and maintained at an appropriate frequency.
Further to periodic maintenance tasks, a stronger approach to Contingency Plan maintenance is to identify all events that can trigger a change in the contingency plan. An organisation can embed contingency planning into all of its business processes to ensure that, when a decision is taken to introduce, update or remove a process or service, the relevant changes are reflected within the Contingency Plan. This should include aspects like:
1. Personnel: Personnel will leave and join an organisation as part of normal business activity, but when the personnel concerned have a role within the Contingency Plan this should be controlled. It should be reflected in the appropriate business records, such as the company directory, telephone numbers and roles, as this is critical for the Call Tree. This could be embedded within the organisation's Starters & Leavers processes, where personnel movements are managed.
2. ICT: Changes to key ICT equipment and services can have a significant impact on the ability to support a business's Contingency Plan and, as such, changes made to the structure of key ICT infrastructure components must be incorporated into the contingency plan. This process could be embedded within the internal ICT Change Management process, where part of the process involves updating the contingency plan and the associated Disaster Recovery Processes.
3. Physical Infrastructure: The infrastructure is critical to the delivery of service for the majority of the energy sector, and changes made to this infrastructure, including comprehensive component descriptors, need to be maintained and incorporated into the contingency plan. This process could be embedded into the infrastructure works and maintenance (new or replacement) project plans, so that changes to the infrastructure are recorded as they are made.
In the energy sector, the asset management process is identified as one of the prevalent processes for infrastructure management and maintenance; it is therefore recommended that this process contains dispositions for Contingency Planning impact analysis when changes are applied to the infrastructure.
4. Transport: Where the energy sector is reliant on transport (Water, Rail or Road), or where there is an alternative contingency requirement to utilise transport when infrastructure (pipes & transmission lines) isn't available, these requirements need to be introduced into the contingency plan. Whether transport is a primary requirement and/or a contingency requirement, the details of transport requirements and the contingency expectations for the delivery of energy need to be addressed within the contingency plan. This is best achieved by embedding contingency plan update requirements into the processes of adding, removing and updating suppliers on the organisation's financial systems.
5. Third parties involved in Contingency Planning (Suppliers, authorities, emergency services, etc.): Where there is a change within third party organisations, this needs to be reflected within the contingency plan. This can best be achieved at the contract agreement stage, with the contingency plan changes undertaken as part of the process of adding/amending suppliers on the financial systems. Where the need for transport is solely to provide a contingency solution following the loss of distribution infrastructure (gas/oil), the contingency plan needs updating when the requirement for the delivery of supplies changes (who, amount, priority, etc.). This process should be embedded so that changes are reflected in the contingency plan as and when they are agreed.
6. External Advice: Advice from external agencies could take the form of Regulatory requirements, industry recommendations, Standards, Government advice (Local, Regional, National, EU & International) and research. Most of this information would take the form of documents, advisories, bulletins, etc. and should be reviewed internally by the organisation's subject matter experts before formal recommendations are submitted for changes to the contingency planning process and the contingency plan. Formal methods are required to embed the capture of this information into the business processes; this could be achieved by making key contingency planning personnel responsible for this process within their fields of expertise and responsibility.

In addition to maintaining the strategy and processes of the contingency plan, it is crucial to ensure the maintenance of critical contingency resources which are not in use in day to day operations. For this purpose, the most efficient approach is not to create new maintenance regimes for these specific assets but rather to assign them to existing maintenance processes.
For example, additional fleet resources would fall under the fleet maintenance regime of the organisation, Disaster Recovery systems under the ICT maintenance process, office generators under the facility management maintenance process, etc.

Stakeholders
Contingency Planning Managers are the key stakeholders of Contingency Planning Maintenance, as they have overall responsibility for Contingency Planning Management within the organisation. They will include most elements of the organisation in the Contingency Planning Maintenance Life Cycle and will also take input from external elements such as Industry, the Regulator and Government in order to ensure that satisfactory maintenance levels are achieved.

Output
The output is a series of formal maintenance update processes covering the different elements of the contingency plan, with defined responsibilities.
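The event-driven maintenance approach of 5.6.1 can be sketched as a lookup from change events to the parts of the contingency plan that should be reviewed. This is only an illustration under stated assumptions: the event names, the plan component names and the `components_to_review` function are hypothetical, and an organisation would derive its own mapping from its business processes.

```python
# Hypothetical mapping from business change events (the aspects listed in
# 5.6.1) to the contingency plan components that should be reviewed.
TRIGGERS = {
    "personnel_change": ["call_tree", "roles_and_responsibilities"],
    "ict_change": ["disaster_recovery_processes", "ict_dependencies"],
    "infrastructure_change": ["component_descriptors", "recovery_processes"],
    "supplier_change": ["transport_arrangements", "third_party_contacts"],
    "external_advice": ["strategy_review"],
}


def components_to_review(events):
    """Return (components, unmapped) for a list of change events.

    components: sorted plan components that need a review.
    unmapped: events with no defined trigger, which indicate a gap
    in the maintenance process itself.
    """
    components = set()
    unmapped = []
    for event in events:
        if event in TRIGGERS:
            components.update(TRIGGERS[event])
        else:
            unmapped.append(event)
    return sorted(components), unmapped


print(components_to_review(["personnel_change", "ict_change", "office_move"]))
```

Returning the unmapped events separately is a deliberate choice in this sketch: an event that triggers nothing is a candidate for a new maintenance trigger rather than something to be silently ignored.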
5.6.2 Lessons Learnt
Following testing, an exercise, or even a contingency incident (within the organisation or external to it), there will be lessons learnt that need to be studied, analysed and, where appropriate, introduced into the organisation's Contingency Plan.

Objective
To ensure that shortcomings identified in the contingency plan, or areas identified as lacking the required contingency coverage, are addressed in an effective manner in order to mitigate the identified weaknesses within the contingency plan.

Approach
The approach should be holistic and organic, aiming to understand weaknesses within the contingency planning processes and taking input from a number of different sources, including:
1. Contingency Tests
2. Contingency Exercises
3. Contingency Incidents
   a. Internal
   b. External
4. Good Practices
5. Industry Research
6. Security Assessments
7. Risk Assessments
8. Peer Reviews

Stakeholders
As with Contingency Planning Maintenance, Contingency Planning Managers are the key stakeholders of Contingency Lessons Learnt. As this is an important process, they will receive feedback from most elements of the organisation in the Maintenance Life Cycle and will also take input from external elements such as Industry, the Regulator and Government.
Output
The output should be a formal approach and set of processes that allow the organisation to incorporate the relevant Lessons Learnt within the contingency plan, to ensure the effectiveness of the organisation's implemented contingency plan and contingency planning process.
6 The EURACOM Combined Risk Assessment and Contingency Planning Approach
From the principles defined in section 3.2, the two sets of practices for Risk Assessment and Contingency Planning have been designed to be implemented in a combined approach, building on the generic links identified in section 2. The general relationship between Risk Assessment and Contingency Planning can be illustrated by putting the two approaches on a single diagram, linked by the preparation loop on the one hand and by the lessons learnt loop through the maintenance process on the other hand.

Figure 31: The Combined Approach Taken by EURACOM to Risk Assessment and Contingency Planning.
6.1 The preparation loop
On the direct linear sequence of steps, there is a link between the results of the Risk Assessment and the implementation of Contingency Planning. This first link is called the preparation loop, as it mainly occurs during the preparation activities, when moving from the evaluation of the risk factors to the definition of a Risk Treatment Strategy.

Figure 32: The preparation loop

The relationship in detail is as follows:
• Definition of scope of Contingency Planning as described in 5.4.1: The scope can be set using the inputs of the Risk Assessment activities (please refer to section 4), either by reusing the scope of the risk assessment or by refining it to the areas that are identified as having predominant risk profiles (areas of high threat exposure or areas where occurrences of risk factors reach particularly high severity).
• Definition of the objectives of Contingency Planning as described in 5.4.1: This can be pragmatically performed by setting the risk appetite of the organisation, i.e. by identifying the level of risk the organisation is ready to accept, depending on the scales selected for the assessment of risk (please refer to the Risk Assessment approach).
• Risk Mitigation Strategy Setting as described in 5.4.3: This process is initiated by a review of the risk factors identified in the scope of the analysis (please refer to section 4.2.3.7) as part of the Risk Assessment exercise.

6.2 The lessons learnt loop
The lessons learnt loop characterises the feedback from tests and exercises into the risk assessment process.

Figure 33: The lessons learnt loop

The relationship in detail:
• starts from the results of Tests and Exercises of Contingency Planning as described in 5.5.3 (the information gained from the de-briefing(s) should be analysed, and the results of the analysis feed into the lessons learnt process and therefore into the contingency plan's maintenance phase, to ensure that all captured issues that need attention are addressed appropriately),
• goes through the Maintenance process to update the contingency plan, and
• where necessary feeds into the Risk Assessment for the update of risk factors that are better understood (in terms of effect, for example) or newly identified from test and exercise activities.
7 Managing Dependencies of the energy sector in Risk Assessment and Contingency planning
The objective of this section is to describe how risk assessment and contingency planning activities focusing on dependencies in the energy sector can be implemented within a wider multi-stakeholder framework (sector, cross sector, region, country, etc.).

7.1 Introduction
In systems theory, an interdependency exists when a change in the state of one system element induces a change in the state of another system element (i.e. dependency), which in turn induces further changes in the state of the first system element via feedback mechanisms (i.e. interdependency). Dependency and interdependency relationships are particularly prevalent in the case of energy infrastructures, which display the characteristics of highly structured, complex and highly interconnected networks. These relationships can create subtle interactions and feedback mechanisms that have the capability to lead to unintended behaviours and consequences. Problems in one infrastructure can cascade to other infrastructures, potentially inducing large-scale effects. Interdependencies therefore introduce an additional level of vulnerabilities in the system, and they should be treated as such within a risk and contingency management framework.

Additionally, interdependencies may vary considerably both in their scale and complexity, and they also typically involve numerous system components. As the levels of complexity increase and interdependency becomes more prevalent, increased levels of risk may occur with higher levels of uncertainty.
The process of identifying and analysing dependency and interdependency relationships requires a detailed understanding of the overall system, and in particular how the components of each infrastructure and their associated functions or activities depend on, or are supported by, or interact with, each of the other infrastructures in each mode of operation [CROSS REFERENCE TNO PAPER/BOOK CHAPTER IFIP 2008]. This process provides valuable and essential information for risk assessment and contingency planning.
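The distinction made in 7.1 between a dependency (one-way) and an interdependency (with feedback) can be made concrete with a small directed graph. The sketch below is illustrative only: the infrastructure names and the `DEPENDS_ON` relationships are assumptions chosen for the example, not results of the EURACOM analysis, and the function names are hypothetical.

```python
# Illustrative "depends on" relationships between infrastructures.
# An entry X: [Y] means that X depends on Y.
DEPENDS_ON = {
    "electricity": ["gas_transmission", "telecom"],
    "gas_transmission": ["electricity"],
    "telecom": ["electricity"],
    "oil_transmission": ["electricity"],
}


def reachable(graph, start):
    """All elements that start depends on, directly or indirectly."""
    seen = set()
    stack = list(graph.get(start, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen


def is_interdependent(graph, a, b):
    """True when a depends on b and b depends on a (a feedback loop)."""
    return b in reachable(graph, a) and a in reachable(graph, b)


def affected_by(graph, failed):
    """Elements that could be hit by a cascade if failed goes down."""
    return sorted(
        node for node in graph
        if node != failed and failed in reachable(graph, node)
    )


print(is_interdependent(DEPENDS_ON, "electricity", "gas_transmission"))
print(is_interdependent(DEPENDS_ON, "oil_transmission", "electricity"))
print(affected_by(DEPENDS_ON, "telecom"))
```

In this example, electricity and gas transmission are interdependent, whereas oil transmission only depends on electricity. The `affected_by` call gives a first view of the cascading effects mentioned above, without any notion of time or severity.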
Figure 34: The High Level Analysis of Risk Assessment and Contingency Planning.

7.2 Managing dependencies in risk assessment (EURAM)
The risk assessment process can be conducted at a multi-stakeholder level to address complex dependency paths and interdependencies within the sector, region, or at higher scales. The general approach described in section 4 can be applied at higher levels of analysis by reconsidering the assets, threats and vulnerabilities in order to fit the complexity of larger scopes (sector, cross-sector, region, country, etc.). A number of specificities related to a multi-stakeholder risk assessment process are described below.

7.2.1 Defining the scope of the analysis and the risk assessment team
The risk assessment scope has to take into account critical dependency paths at a wider level than the organisational one, spanning an entire sector or following a cross-sectoral approach. The stakeholders involved in the risk assessment process have to be identified at this stage, along with an overall supervising authority to coordinate/lead the entire process. This could be an industry association within the sector if the scope is sector-specific, or a local or governmental authority if the scope is a region or a country. Once the scope and coordinating authority are defined, the risk assessment activities would require the support of a workgroup made up of business and security experts from the various infrastructures included in the scope.

7.2.2 Identifying vulnerabilities stemming from interdependency situations within a wider scope
The identification of vulnerabilities and the evaluation of risks at this higher level of analysis are based on the results of previous risk and interdependency analyses carried out by each organisation/infrastructure. A harmonised approach to the identification of dependencies and risks at the organisational level would ensure a well-coordinated implementation of this very important stage in the multi-stakeholder risk assessment process. Each member of the workgroup will be in charge of contributing information concerning their specific area of expertise.
The objective of this approach is not to sum up point risk estimates assessed by each Infrastructure Operator (as this information is also confidential for each operator). The objective is to aggregate:
• High level information from the operators about their level of resilience over time to high level categories of risk factors, without giving the detail of the associated vulnerabilities in the infrastructure;
• Information about the external dependencies of each infrastructure, with information about their level of resilience over time.

If the risk assessment aims at aggregating more precisely the results of individual risk assessments carried out by each organisation, this raises the question of the comparability of results. It therefore requires that all organisations use:
• The same approach (e.g. EURACOM), with the same guidelines or references (e.g. EURACOM's threats and asset lists + similar sources for vulnerability analysis); and
• The same scales for risk evaluation.

Concerning the scales for risk assessment, if a single set of scales is to be applied across a wide scope involving many organisations, the severity scale (please refer to 4.2.3.3) must be wide enough to cater for the situation of the various organisations that are supposed to use it. To illustrate this with a simple example, let's consider two organisations A and B operating Critical Infrastructures in the energy sector, A being a rather small and focused organisation and B a large organisation concentrating many activities:
• For their own internal risk assessment, these companies could use a limited scale directly linked to the size of their operations, with severity levels corresponding to similar ratios of their turnover for financial impact, of their customers for energy supply disruption, etc. These scales are then appropriate to each organisation's context but cannot be compared, as they are built on the specific situation of each organisation. In our case an Impact Level 5 for company A would be far lower than the same Impact Level 5 for company B.
• If their risk assessment results are to be aggregated, the scales for severity evaluation need to be based on absolute values that can be compared across the board. In this case it is unlikely that a 5 level scale, as advised in the Risk Assessment approach (section 4), would be sufficient to cover with enough granularity the full spectrum of severity levels that could arise, from the smallest organisations like company A to the largest like company B. It is therefore necessary to expand the scale for severity assessment.

Figure 35: A unique severity scale for multi-stakeholders scopes

Concerning the probability scale, the adoption of shared levels is less of an issue, as this dimension of the risk is less influenced by scale (size). Therefore the area of risk when going to a wider scale of analysis would only expand in one dimension, as illustrated in the following figure.
Figure 36: A unique severity scale for multi-stakeholders scopes

7.2.3 Evaluating (inter)dependency risks
Vulnerabilities identified according to the approach described above, along with inputs from an (inter)dependency analysis carried out on the same scope, will support:
• Identification of the risks which have the most relevance in the wider scope of the study.
• Information sharing between the various parties, which will allow operators to identify risks that they had not initially considered.

It should be noted that, as a precondition for evaluating new risks, the scale for risk evaluation should also be reconsidered in the frame of the wider risk assessment scope. The overall output of these risk assessment activities would then feed into the contingency planning process.
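The need to reconsider the severity scale, as discussed in 7.2.2 for companies A and B, can be shown with a short worked example. All figures are hypothetical: the turnover values, the internal 5 level scale expressed as shares of turnover and the shared 7 level scale in euros are assumptions made only to illustrate why internal levels cannot be compared directly.

```python
from bisect import bisect_right

# Hypothetical internal scale: lower bound of levels 1..5, expressed as a
# share of annual turnover (each organisation applies it to its own size).
RELATIVE_LOWER_BOUNDS = [0.0, 0.001, 0.01, 0.05, 0.25]

# Hypothetical shared scale: lower bound of levels 1..7 in euros, i.e.
# absolute values that can be compared across organisations.
ABSOLUTE_LOWER_BOUNDS = [
    0, 10_000, 100_000, 1_000_000,
    10_000_000, 100_000_000, 1_000_000_000,
]


def internal_level_floor(level, turnover):
    """Smallest financial impact (euros) rated at this internal level."""
    return RELATIVE_LOWER_BOUNDS[level - 1] * turnover


def absolute_level(impact_eur):
    """Level on the shared scale for an impact expressed in euros."""
    return bisect_right(ABSOLUTE_LOWER_BOUNDS, impact_eur)


TURNOVER_A = 10_000_000       # assumed: small, focused organisation
TURNOVER_B = 2_000_000_000    # assumed: large organisation

for name, turnover in (("A", TURNOVER_A), ("B", TURNOVER_B)):
    floor = internal_level_floor(5, turnover)
    print(name, floor, absolute_level(floor))
```

Under these assumptions, the threshold for an internal Impact Level 5 falls at level 4 of the shared scale for company A and at level 6 for company B. The shared scale therefore needs more than five levels to represent both organisations with adequate granularity.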
7.3 Managing dependencies in contingency planning
Contingency and response plans also need to be assessed from an infrastructure interdependencies perspective. In a situation of interdependence, concerted action comes mainly through coordination, seen as the process of managing dependencies and interdependencies. Coordination mechanisms create linkages across system components and facilitate communication and linked action between the various entities with preparedness and planning responsibilities. Several mechanisms can be implemented to address (inter)dependencies within a multi-stakeholder framework. The overall aim of these mechanisms is to facilitate and enhance coordination among stakeholders in complex emergency situations through harmonised information management (sharing) and contingency planning, as well as to maximise the capacity and effectiveness to manage dependencies at the organisational level.

The following sections describe how the EURACOM approach as described in section 5 should be applied in a multi-operator framework in order to incorporate the management of interdependencies in interconnected networks into the contingency planning process.

7.3.1 Preparation Phase

7.3.1.1 The Objectives and scope
Coordinated alignment of contingency plan objectives among stakeholders helps to ensure that multiple plan components are focused on the same, or very similar, outcomes. Hence, for a joint (multi-operator) approach to risk management and contingency planning, the development and alignment of strategic goals is an obvious but important condition, and is key to deploying an inter-organisational planning effort that takes cross-cutting dependencies into consideration. This implies, for example, that operators on each side (we assume for the sake of clarity a scenario of two interconnected networks, e.g. in a cross-border environment) share fundamental objectives and requirements with regard to risk acceptance (level of resilience or risk mitigation) and risk mitigation strategies, business continuity (maximum outage time etc.), restoration, etc. On the other hand, differences in the respective national regulatory and legislative frameworks, and possibly in contractual and financial conditions, have to be considered.

The formulation of a joint strategy and corresponding objectives is based on a formal, mutual agreement and commitment, and requires the participation of representatives from both sides (senior management and assigned lead participants).
7.3.1.2 Organisation for Contingency Planning

Common approaches to risk assessment and contingency planning have the potential to increase the level of coordination where several organisations and their dependencies are involved. It is therefore necessary that the participating operators agree on a common organisational framework and management approach (i.e. EURACOM). This would cover the following arrangements:

• Organisational model (set-up of a joint expert team) and key roles and responsibilities,
• Documentation plan (essential documents that sustain a shared contingency management process, guidance documents etc.),
• Fundamental communication means and procedures.

7.3.1.3 Risk Mitigation Strategy Setting

The identification of the risk mitigation approach for each considered risk category relies on the outputs of a joint risk assessment (see section 7.2). With regard to the management of interdependencies, this also requires prior agreement on the scope, i.e. the boundaries of the dependency analysis, taking into account all modes of operation, the types of scenarios (e.g. intra-sector, cross-sector, cross-border), the types of dependencies (namely physical dependencies as in power infrastructures, cyber dependencies, and resource dependencies such as people, materials and transport), the level of detail of the underlying risk analysis etc.

In particular, a holistic, all-hazards approach to risk assessment is expected to include a focus on:

• Upstream infrastructure assets (e.g. supply chain, operational partners) that, if lost or degraded, could adversely impact the performance of own infrastructures in the case of:
  o Normal and stressed operations
  o Disruptions (including coincident events, e.g. N-2)
  o Repair and restoration
• Identifying how interdependencies may change as a function of outage duration, frequency, and other factors in all modes of operation;
• Cascading effects vs. risk of simultaneous failure through common vulnerabilities;
• Time characteristics of degradation and restoration;
• Identifying how backup systems or other mitigation mechanisms can limit and/or reduce interdependence problems in all modes of operation;
• Identifying the linkages between own infrastructure and downstream (in particular community) assets (as potential grounds for major large-scale disturbances).

Even though it is outside the scope of the contingency planning approach itself, it should be stressed that a detailed knowledge and understanding of the specific (inter)dependency risk scenarios is required in order to identify appropriate and effective risk mitigation solutions (see 5.4.3).
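As an illustration of the dependency analysis outlined above, the following minimal sketch shows how cascading effects and coincident (N-2) disruptions could be explored on a small dependency map. It is not part of the EURACOM methodology: the asset names, the `ASSETS` structure and the rule that an asset is lost when fewer than a minimum number of its upstream suppliers remain in service are all assumptions made for the example.

```python
from itertools import combinations

# Hypothetical assets (all names are illustrative, not taken from the report).
# Each entry maps an asset to (upstream suppliers, minimum number of
# suppliers that must remain in service for the asset to keep operating).
ASSETS = {
    "substation_A": ([], 0),
    "substation_B": ([], 0),
    "gas_compressor": (["substation_A"], 1),
    "control_centre": (["substation_A", "substation_B"], 1),  # redundant feeds
    "gas_power_plant": (["gas_compressor"], 1),
}


def cascade(initial_failures):
    """Return every asset lost once the initial failures have propagated."""
    failed = set(initial_failures)
    changed = True
    while changed:  # repeat until no further asset is lost
        changed = False
        for asset, (suppliers, needed) in ASSETS.items():
            if asset in failed:
                continue
            in_service = sum(1 for s in suppliers if s not in failed)
            if in_service < needed:
                failed.add(asset)
                changed = True
    return failed


def coincident_scenarios(k):
    """Map each set of k coincident initial failures (N-k) to its total losses."""
    return {combo: cascade(combo) for combo in combinations(ASSETS, k)}


if __name__ == "__main__":
    for combo, lost in coincident_scenarios(2).items():
        secondary = sorted(lost - set(combo))
        print(f"{' + '.join(combo)} -> secondary losses: {secondary}")
```

In this toy map the control centre survives the loss of a single substation because of its redundant feed, but is lost when both substations fail together, which is the kind of difference between single and coincident events that a joint dependency analysis would be expected to surface. A real analysis would also need the time characteristics of degradation and restoration, which this sketch ignores.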
The conclusions and decisions should be documented in a Risk Treatment Plan or equivalent as part of the overall documentation plan shared and used by all involved operators / stakeholders.

7.3.1.4 Implementation of Prevention and Protection measures

In this step, the participating stakeholders agree on distinct prevention and protection measures, applying the formal process described in 5.4.4. These will naturally focus on the prevention, mitigation, acceptance etc. of identified risks, namely with respect to operational continuity in all modes of operation, e.g.:

• (physical, cyber) protection of shared network infrastructure components,
• reducing the likelihood and/or periods of disruptions (outage times of critical components),
• reducing the likelihood of cascading effects,
• backup systems and mitigation mechanisms to limit the impact of interdependence problems,
• ...

7.3.1.5 Implementation of Response and Recovery measures

The planning and implementation of response and recovery measures should follow the formal approach described in section 5.4.5, with a focus on interdependency in all modes of operation. This is based on the selection and analysis of relevant, realistic scenarios, with a focus on network connectivity and supply continuity (as already identified and subject to the underlying risk analysis; see 7.3.1.3 above). Specific response and recovery measures should for example address:

• Formation of organisational structures for incident response and recovery (joint teams), technical means, coordination (operational procedures, communication) between participating stakeholders,
• Set-up and/or activation of back-up infrastructures and components (network supply, information technology and telecommunications) vs. use of alternative (external, third-party) systems and components (fail-over strategy),
• Coordination of communication and reporting to external stakeholders,
• ...
In a multi-stakeholder scenario, the planning of response and recovery measures must pay particular attention to communication as well as monitoring and information sharing (see section 7.3.3.3), ensuring that when a risk materialises, each operator becomes immediately aware of the incident and can act as fast as possible according to the established plans (as part of the overall documentation plan). In addition, the flow of communication between stakeholders has to be mutually agreed and the relevant contact persons within every organisation identified accordingly, along with established procedures for information exchange and a predefined frequency of exchanges in different scenarios (e.g. incident, accident, disaster, etc.).

It is important to consider that the management of incident response and recovery for network connectivity and related interdependency issues should not stay isolated from the general contingency plan (of each single operator) and its elements (incident management, crisis management, business continuity management). This becomes clear when considering that in a real-world scenario certain risks may relate to each other and therefore occur together or in some sequence. In other words, operators should implement integrated contingency planning processes that satisfy the needs for a joint management of risks related to network connectivity and interdependencies (multi-stakeholder framework) on the one hand, and that also fit into and extend already established processes at the level of each participating organisation on the other. This integrated approach to risk management and contingency planning is a major challenge from the organisational perspective.

7.3.2 Test Exercise and Training Phase

7.3.2.1 Contingency Planning Training

In a multi-stakeholder scenario, the need for joint training and exercising is evident and has to be supported by a documented contingency training plan.
For each participating operational partner, such a joint plan should be elaborated as an extension to the general contingency training plan (in line with the considerations for integrated contingency process management made in section 7.3.1.5). The planning and organisation of training follows the general approach (section 5.5.1).

Inter-organisational resource alliances, e.g. through collective groups for incident response management and recovery, can be a powerful means to increase flexibility and resilience in crisis situations through joint synergy and planning. Each member of the alliance may take advantage of multiple strengths to address both shared and individual weaknesses, thereby increasing its level of resilience. These alliances and forms of cooperation are considered by the training approach, addressing key aspects such as:

• training of individuals according to their role and responsibility vs. scenario-based training (groups, up to all participants),
• efficient exchange of information and better communications during incident management and recovery,
• efficient sharing of resources.

The process of designing such common exercises also has some ancillary benefits, such as gradually building mutual trust and thereby facilitating and supporting future exchanges in real-life situations.

7.3.2.2 Test the Contingency Plan

Contingency plans should also be jointly tested and reviewed, thus ensuring consistency, completeness, effectiveness, adequacy and quality of the contingency plan and its main elements. As for the testing of contingency plans in general, there are several types of exercises such as formal reviews and auditing, walk-through, simulation, table-top exercise, etc. The roles and responsibilities of each partner should be clearly established and periodically reviewed.

For instance, table-top exercises are useful tools to test and brainstorm, against a series of pre-defined scenarios involving cascading contingencies, the operational and communication arrangements within a multi-stakeholder decision-making procedure. This gives partners an opportunity to think in advance about possible complex contingencies and to plan their anticipated responses in the context of an agreed chain of command, so that they are better prepared.

7.3.2.3 Contingency Exercises

Exercises for incident response and recovery management should be executed on the basis of simulated (invoked, but controlled) real-world scenarios. This is particularly important for (inter)dependency scenarios as these have by default a higher level of complexity. As a prerequisite for these exercises, the organisation's contingency plan should take full consideration of the interface with other similar plans involving external parties (commercial/operational partners, providers, local/regional authorities etc.).
In addition, the commitment from senior and executive management must be continuous in order to support the implementation of such common programmes and also their continued operation. Exercises should always be followed by a joint review (including lessons learnt, see section 5.6.2) with participation of the lead and other assigned participants.
7.3.3 Maintenance Phase

7.3.3.1 Contingency Planning Maintenance

In order to incorporate the conclusions of training, testing and exercising and to keep the contingency planning process up to date, the participating operators should implement a formal maintenance process. This encompasses all plans and other documents that compose the jointly managed documentation plan. Typical changes and aspects related to interdependencies in all modes of operation are:

• Personnel, assigned responsibilities etc.,
• Resources (technical infrastructure, communication means, ...),
• Regulation, legislation, other procedural and organisational aspects,
• Training
• ...

7.3.3.2 Lessons Learnt

Maintenance also considers the need for modifications to the planning process and specific arrangements. These would address any kind of shortcoming identified in the scope of testing and exercising (lessons learnt), with respect to responsible elements, notification procedures, information and communication flows, alternative techniques for response, available resources, resource management during contingency situations, etc.

7.3.3.3 Monitoring and Information Sharing

Information sharing concerns the willingness of an organisation to make strategic or tactical data available to others. To plan for and react to various contingencies effectively, organisations should aim to build a common memory of methodologies, actual planning elements, best practices, lessons learnt from past disruptive events, etc. This would provide relevant information for the design and preparation of contingency plans, potentially enhancing the likelihood of a well-informed and successful solution. It is important to mention here that a prerequisite for information sharing is to establish trust, transparency and clear mechanisms for handling sensitive information. Without this, organisations will be reluctant to share information about their risk and response strategies.
Collaboration, on the other hand, reflects the ability of an organisation to share and use information exchanged with others. It includes the ability to deploy joint information exchange systems, providing a means to collect, disseminate and utilise information in a timely and efficient manner, in both normal and stressed conditions. Timeliness and efficiency refer to inputs being inserted in the optimal sequence in the process. Collaboration therefore involves an interdependent relationship engaging all the parties to work closely together and create mutually beneficial outcomes, such as establishing and sharing joint knowledge and expertise, as well as reaching a common understanding and vision of the planning process.

As an illustration of how the information exchange process can be implemented at the sector level, the ENTSO-E Transparency Platform publishes data on congestion management, system vertical load, planned schedule evolutions, day-ahead Net Transfer Capacity, balance management and interconnection outage information. More than thirty European Transmission System Operators (TSOs) participate actively in this exercise by publishing information on e.g. cross-border physical flows, cross-border commercial schedules and auction information. This is a good example of successful collaboration at the sector level, engaging all the parties in a collaborative effort to establish and share joint knowledge and information.

Building on existing efforts such as this one, sector-specific information exchange platforms can be extended to address dependency issues not only from a market perspective, but also at the operational level, including cross-sector insights on infrastructure vulnerabilities, threats, impacts and protective measures. Furthermore, it is important to mention the underlying effort in developing common standards for data supply and publication, along with data provision agreements and rigorous information exchange policies.

The benefits of implementing a multi-stakeholder contingency planning process are numerous, among which:

• readiness and availability of resources (notably human and material),
• a tested mechanism for rapid decision making involving several partners, targeted to address complex contingencies,
• added value in terms of flexibility and focused effort,
• access to joint knowledge and expertise.
It should also be mentioned that these processes are dynamic in nature, requiring effort on monitoring and maintenance, as well as on providing continuity to the process. There is however a danger, for instance in the case of multiple staff changes, that the process could cease and fail through neglect. Constant support from senior and executive management and robust procedures for feedback and maintenance are vital to the successful implementation of such schemes.

7.4 Current Framework for Operational Practices

The electric power sector in Europe is undergoing a series of very important changes which have a strong impact on power system security. Current regulatory developments in the electricity sector are mainly focused on market-based mechanisms, such as congestion management and inter-TSO compensation mechanisms. However, topics such as congestion management have a strong impact on both power system security and market liquidity. The requirement for market-based solutions to cross-border congestion management was introduced by Regulation (EC) No 1228/2003, which restricts the choice of eligible methods to implicit auctions/market splitting and explicit auctions.
This means that TSOs are responsible for ensuring that capacity allocation complies with security requirements and for defining power transfer distributions which are consistent with the appropriate security standards, i.e. maximising available cross-border transmission capacity without compromising system security.

Within this perspective, and with the aim of supporting collaboration and coordination between operators, the EtsoVista Transparency Platform was launched in 2006 and further expanded in 2008 to include information on balance area profile and network capacity. This process can therefore serve as a means to support risk assessment at interconnectors. Although the scope of the information currently shared is limited with respect to security issues, this ongoing process can serve as a basis for implementing wider collaborative processes, also including a focus on system security.

Furthermore, security issues at the wider European level were addressed by ERGEG in the 2008 Guidelines of Good Practice for Operational Security, which focus on how technical rules and operational procedures could work at a regional level to address interconnectivity issues, without however specifying implementation details. The document covers four areas, namely:

• Roles and responsibilities of different stakeholders and market players.
• Organisational framework for synchronous power system operation.
• Technical framework for operational security.
• Training and certification of TSO staff.

The focus is mainly on information exchange among TSOs regarding operational experiences, data relevant to the secure operation of the power system, commercial data, information on specific methods applied e.g. to calculate capacity, outcomes of contingency analyses, etc. This exchange of information would also include regular joint training between operators to improve knowledge of the characteristics of neighbouring grids, along with overall communication and coordination.
The guidelines also envisage a common monitoring system for increased efficiency in disturbance prevention and system defence in cases of disturbed conditions.

These regulatory attempts targeted at the electricity sector illustrate a wider effort to guide current approaches addressing complex risks in the energy sector, namely through partnership agreements and collaborative efforts involving many stakeholders. This includes an overall effort engaging the many actors in the field to formalise good cooperation and targeted mechanisms and procedures aimed at addressing interdependencies from a system-wide perspective.
8 Conclusion

This document presents some candidate principles for the wide adoption of risk assessment and contingency planning approaches in the energy sector, in order to contribute to enhancing the resilience level of the interconnected energy networks. It is also foreseen that the wide adoption of similar risk management practices across the energy sectors would have some benefits in terms of interoperability of practices and overall efficiency of the security and resilience posture of the whole sector.

This positive contribution should be achieved in two ways.

First, by enhancing risk management practices in the energy industry through the implementation of Risk Assessment and Contingency Planning at the level of each operator. This has the benefit of providing operators with a clear reference on which aspects they have to consider and through which process. In this way, it provides a good aid and benchmark to risk managers and operations managers when asking themselves the question "Have I covered everything?", which is one of the most common worries for these often lonely positions.

Second, by providing some mechanisms to develop these risk management practices on larger scopes of applicability, including multi-stakeholder settings and interconnected energy infrastructures.

This first version of the EURACOM methodology for holistic, all-hazards and combined approaches to Risk Assessment and Contingency Planning will then be put to discussion in the EURACOM community through six workshops gathering experts from the industry sub-sectors of oil, gas and electricity, as well as their regulators and associated national institutions. The results of these exchanges will allow the EURACOM methodology to be improved in a subsequent version, which will be one of the main results of the EURACOM project.