Risk Mitigation Strategies for Critical Infrastructures Based on Graph Centrality Analysis


George Stergiopoulos a, Panayiotis Kotzanikolaou b, Marianthi Theocharidou c, Dimitris Gritzalis a

a Information Security & Critical Infrastructure Protection Laboratory, Dept. of Informatics, Athens University of Economics & Business, 76 Patission Ave., GR-10434, Athens, Greece
b Dept. of Informatics, University of Piraeus, 85 Karaoli & Dimitriou, GR-18534, Piraeus, Greece
c European Commission, Joint Research Center (JRC), Institute for the Protection and the Security of the Citizen (IPSC), Security Technology Assessment Unit, via E. Fermi 2749, Ispra, I-21027, Italy

Abstract

Dependency Risk Graphs have been proposed as a tool to analyze the cascading failures of Critical Infrastructure (CI) dependency chains. However, dependency chain analysis is not by itself sufficient for developing an efficient risk mitigation strategy, i.e. a strategy that defines which CI nodes should have high priority for the application of mitigation controls, in order to achieve the optimal overall risk reduction. In this paper we extend our previous dependency risk analysis methodology towards efficient risk mitigation, by exploring the relation between dependency risk paths and graph centrality characteristics. We explore how graph centrality metrics can be applied to design and to evaluate the effectiveness of risk mitigation strategies. We examine alternative mitigation strategies and empirically evaluate them through simulations. Our experiments are based on random graphs that simulate common CI dependency characteristics, as identified by recent empirical studies. Based on our experimental findings, we propose an algorithm that can be used to prioritize CI nodes for the application of mitigation controls, in order to achieve efficient risk mitigation.
Corresponding author. Email addresses: geostergiop@aueb.gr (George Stergiopoulos), pkotzani@unipi.gr (Panayiotis Kotzanikolaou), marianthi.theocharidou@jrc.ec.europa.eu (Marianthi Theocharidou), dgrit@aueb.gr (Dimitris Gritzalis)

Preprint submitted to International Journal of Critical Infrastructure Protection, May 4, 2015

Keywords: Critical Infrastructure, Dependency Risk Graphs, Graph Centrality, Cascading failure, Information gain, Feature selection, Mitigation

1. Introduction

(Inter)dependencies between Critical Infrastructures (CIs) are a key factor for Critical Infrastructure Protection, since CI dependencies may allow a seemingly isolated failure in a single CI to cascade across multiple CIs. One way to analyze CI dependencies is by using Dependency Risk Graphs, i.e. graphs in which the nodes represent (modules of) CIs and the directed edges represent the potential risk that the destination node may suffer due to its dependency on the source node, in case a failure is realized at the source CI. In our previous work [1, 2, 3, 4, 5] we have proposed a risk-based methodology that can be used to assess the cumulative risk of dependency risk paths, i.e. of chains of CI nodes which are (inter)connected due to their dependencies. Our methodology uses as input the existing risk assessment results from each CI operator and, based on the 1st-order dependencies between the CI nodes, it assesses the potential risk values of all the n-order dependency risk chains. By computing and sorting all the dependency risk paths, the assessor may identify the most critical dependency chains, i.e. all potential dependency risk paths with a cumulative risk above a predefined threshold.

Although the identification of the most important dependency chains is an important step towards efficient risk mitigation, there are still open problems. A simple mitigation strategy would be to apply security controls at the root node of each critical dependency chain. However, this may not always be a cost-effective strategy, since there are cases of nodes whose effect is not properly measured. For example, consider the case where a small subset of nodes (not necessarily roots of the critical chains) exists that affects a large number of critical dependency paths.
Decreasing the probability of failure in these nodes, by selectively applying security controls to them, may have a greater overall benefit in risk reduction. Another example is when nodes of high importance exist outside the most critical dependency paths; for example, nodes that may concurrently affect many other nodes which belong to the set of the most critical paths, or nodes that affect the overall dependency risk of the entire structure/graph.

Motivation. Existing risk mitigation methodologies are usually empirical, and they focus on CIs which are possible initiators of cascading failures and usually belong to specific sectors, such as Energy and ICT. By studying actual large-scale failures, it is obvious that these two sectors often initiate serious cascading effects amongst interdependent CIs (e.g. the famous California blackout scenario or the recent New York blackout scenario of 2015). However, focusing only on certain sectors and on potential initiators may not be suitable in every case. The impact of each dependency should be taken into account together with the position of each CI within a network of interdependent CIs. The systematic identification of the most important nodes and the prioritization of nodes for the application of security controls can therefore be a complex task. The need for a high-level and efficient risk mitigation technique which takes into account multiple characteristics of the interdependent CIs is well known [6]. An optimal risk mitigation methodology would allow the detection of the smallest subset of CIs that has the highest effect on the overall risk reduction in a risk graph, even though the candidate nodes might not initiate a critical path or may not even belong to the most critical paths.

Contribution. Towards this direction, in this paper we explore the use of graph centrality metrics in Dependency Risk Graphs, in order to effectively prioritize CIs when applying risk mitigation controls. Our goal is to design, implement and evaluate the effectiveness of alternative risk mitigation strategies. Each examined mitigation strategy is implemented algorithmically and empirically evaluated through simulations.
The ultimate goal is to identify the minimum subset of CI nodes within a graph, whose risk treatment (application of security controls) would result in the maximum risk reduction in the entire graph of interdependent CIs. Our extensions are threefold: (a) we run various data mining experiments to identify correlations between centrality metrics and CI nodes that have high impact in a dependency risk graph; (b) we use the optimum centrality metrics, as identified in the previous step, in order to develop and test various risk mitigation strategies that maximize risk reduction; results show that each mitigation strategy is most effective in achieving specific goals; (c) we take into consideration nodes with a high number of inbound (sinkholes) and outbound connections and compare risk reduction results.

In order to validate the proposed mitigation strategies, we run experiments on hundreds of random graphs with randomly selected dependencies. These tests demonstrate the efficiency of the proposed risk mitigation strategies. All random graphs match criteria found in real-world graphs of interconnected infrastructures. The proposed mitigation strategies can be used as a proactive tool for the analysis of large-scale infrastructure interconnections, pinpointing underestimated infrastructures that are critical to the overall risk of a dependency graph.

2. Building Blocks

Our methodology is based on three main building blocks: (i) dependency risk graphs and the multi-risk dependency analysis methodology, in order to model cascading failures; (ii) graph centrality analysis over dependency risk graphs; and (iii) feature selection techniques, in order to evaluate the effect of various centrality metrics in risk mitigation. We briefly describe these building blocks below.

2.1. Multi-risk dependency analysis methodology

Dependency can be defined as the one-directional reliance of an asset, system, network, or collection thereof within or across sectors on an input, interaction, or other requirement from other sources in order to function properly [7]. Our work extends the dependency risk methodology of [1, 2], used for the analysis of multi-order cascading failures.
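As a minimal illustration of this building block, a dependency risk graph can be represented as directed edges annotated with likelihood and impact values (a Python sketch; the node names and numeric values are hypothetical, not taken from the paper):

```python
# Toy dependency risk graph: each directed edge (source, destination) carries
# a likelihood L of the disruption cascading and an impact I on the destination.
# The 1st-order dependency risk received by the destination is R = L * I.
edges = {
    ("CI_A", "CI_B"): {"L": 0.8, "I": 6.0},
    ("CI_B", "CI_C"): {"L": 0.5, "I": 4.0},
    ("CI_A", "CI_C"): {"L": 0.2, "I": 9.0},
}

def first_order_risk(edge):
    """Dependency risk R_ij = L_ij * I_ij received by the destination node."""
    return edge["L"] * edge["I"]

for (src, dst) in sorted(edges):
    print(src, "->", dst, "R =", first_order_risk(edges[(src, dst)]))
```

The 1st-order risk of each edge is simply the product of its two annotations; the n-order extension is described next.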

First-order dependency risk. In [1, 2], the 1st-order dependencies between CIs are initially identified and are then extended to model the n-order relations. Each dependency from a node CI_i to a node CI_j is assigned an impact value, denoted as I_{i,j}, and a likelihood L_{i,j} of a disruption being realized¹. The product of these two values defines the dependency risk R_{i,j} caused to the infrastructure CI_j due to its dependency on CI_i. Dependencies are visualized through graphs G = (N, E), where N is the set of nodes (infrastructures or components) and E is the set of edges (dependencies). The graph is directed and the destination CI receives a risk from the source CI due to its dependency.

¹ The impact and likelihood values are assumed to be extracted from the organization-level risk assessments performed by each CI operator.

Extending to n-order dependency risk. Let CI = (CI_1, ..., CI_m) be the set of all the examined infrastructures. The algorithm examines each CI as a potential root of a cascading effect. Let CI_{Y_0} denote a critical infrastructure examined as the root of a dependency chain and CI_{Y_0} → CI_{Y_1} → ... → CI_{Y_n} denote a chain of length n. Then the algorithm computes, for each examined chain, the cumulative Dependency Risk of the n-order dependency chain as:

DR_{Y_0,...,Y_n} = \sum_{i=1}^{n} R_{Y_0,...,Y_i} = \sum_{i=1}^{n} \left( \prod_{j=1}^{i} L_{Y_{j-1},Y_j} \right) I_{Y_{i-1},Y_i}    (1)

Informally, equation (1) computes the dependency risk contributed by each affected node in the chain, due to a failure realized in the source node. The computation of the risk is based on a risk matrix that combines the likelihood and the incoming impact values of each vertex in the chain. Interested readers are referred to [2] for a detailed analysis.
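Equation (1) translates directly into code. The sketch below (our own helper names; edge values are illustrative) accumulates, for each hop i of the chain, the product of the likelihoods up to that hop multiplied by the incoming impact:

```python
def cumulative_dependency_risk(chain, L, I):
    """Cumulative dependency risk DR_{Y0,...,Yn} of a chain [Y0, ..., Yn]
    per equation (1): sum over i of (prod_{j<=i} L_{Yj-1,Yj}) * I_{Yi-1,Yi}."""
    total = 0.0
    likelihood = 1.0                # running product of edge likelihoods
    for i in range(1, len(chain)):
        edge = (chain[i - 1], chain[i])
        likelihood *= L[edge]       # Π_{j=1..i} L_{Yj-1,Yj}
        total += likelihood * I[edge]
    return total

# A 2nd-order chain A -> B -> C with illustrative values:
L = {("A", "B"): 0.8, ("B", "C"): 0.5}
I = {("A", "B"): 6.0, ("B", "C"): 4.0}
# DR = 0.8*6.0 + (0.8*0.5)*4.0 = 4.8 + 1.6 = 6.4
print(cumulative_dependency_risk(["A", "B", "C"], L, I))
```

Note how the running likelihood product captures the observation used later in the paper: nodes early in a chain weigh on every subsequent partial risk term.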

2.2. Graph centrality metrics

Graph centrality analysis attempts to quantify the position of a node in relation to the other nodes in a graph. In other words, graph centrality metrics are used to estimate the relative importance of a node within a graph. Various centrality metrics are used, each of them measuring a different characteristic:

Degree centrality measures the number of edges attached to each node. Given a node u, degree centrality is defined as C_d(u) = deg(u), where deg(u) is the total number of its outgoing and ingoing edges.

Closeness centrality quantifies the intuitive central or peripheral placement of a node in a two-dimensional region, based on geodesic distances. It is defined as C_c(u) = \sum_{v \in V(G)} \delta(u, v), where \delta(u, v) is the average shortest path between the examined node u and any other node v in the graph.

Betweenness centrality measures the number of paths a node participates in, and is defined as C_b(u) = \sum_{i \neq u \neq j \in V} \delta_{ij}(u), where \delta_{ij}(u) = \sigma_{ij}(u) / \sigma_{ij}. Here, \sigma_{ij}(u) denotes the number of geodesic paths from i to j in which node u is present and \sigma_{ij} the total number of geodesic paths from i to j.

Bonacich (Eigenvector) centrality [8] attempts to measure the influence of a node in a network as c_i(\alpha, \beta) = \sum_j (\alpha + \beta c_j) R_{i,j}, or in matrix form c(\alpha, \beta) = \alpha (I - \beta R)^{-1} R \mathbf{1}, where \alpha is a scaling factor, \beta reflects the extent to which centrality is weighted, R is the node adjacency matrix, I is the identity matrix and \mathbf{1} is a vector of ones. An adjacency matrix is an N \times N matrix with values of 1 if there is an edge between two nodes and 0 otherwise.

Eccentricity centrality is similar to closeness centrality. Essentially, it is based on the greatest distance over all shortest paths between u and any other vertex (the geodesic distance).

2.3. Feature selection

Feature selection (also called variable selection or attribute selection) is used in data mining for the automatic selection of attributes in a data set that are most relevant to a predictive modeling problem. Feature selection is critical for the effective analysis of characteristics in data sets, since data sets frequently contain redundant information that is not necessary to build a predictive model [9]. It increases both effectiveness and efficiency, since it removes non-informative terms according to corpus statistics [10].

In our case, we explore centrality metrics as potential features in a dependency risk graph. Our predictive model aims to detect which features (i.e. centrality metrics) are able to characterize nodes that mostly affect the cumulative risk of dependency paths. This model will identify which centrality metrics are the most useful (i.e. provide the most information) about whether a CI node greatly affects the overall risk of a graph or not. The various centrality types are ranked lower if they are common in both the so-called positive set (nodes that have a high effect on the risk) and the negative set (nodes with low impact on risk paths). The features are ranked higher if they are effective discriminators for a class.

Our feature selection tests use two well-known selection techniques, namely Information Gain and Gain Ratio. Information Gain [11] is a statistical measure that has been successfully applied for feature selection in information retrieval. The authors in [12, 13] used this algorithm for feature selection before training a binary classifier. This technique is used to classify high centrality metrics in nodes, in an effort to distinguish which centrality metrics tend to appear often in nodes that greatly affect the cumulative risk of graphs. Gain Ratio is a variation of the Information Gain algorithm. The difference is that the Information Gain measure prefers to select attributes having a large number of values [14]. Gain Ratio overcomes this bias by taking into consideration the statistical appearance of a feature in both the negative and positive sets (i.e. if a specific centrality metric is detected in dangerous CI nodes scarcely but is never found in safe nodes, it will have a rather low Information Gain rank but a very high Gain Ratio one). We use this technique to pinpoint (combinations of) centrality metrics that might appear scarcely in graphs, yet significantly affect the cumulative risk in paths when present.
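For concreteness, both measures can be computed from scratch for a single boolean feature over a labeled node set (a minimal sketch of the standard entropy-based definitions, not of Weka's implementation; all names are ours):

```python
import math

def entropy(pos, neg):
    """Binary entropy of a set with `pos` positive and `neg` negative items."""
    h, total = 0.0, pos + neg
    for c in (pos, neg):
        if c:
            p = c / total
            h -= p * math.log2(p)
    return h

def information_gain(labels, feature):
    """IG of a boolean feature over boolean labels (True = 'dangerous' node):
    label entropy minus the entropy remaining after splitting on the feature."""
    pos = sum(labels)
    base = entropy(pos, len(labels) - pos)
    cond = 0.0
    for value in (True, False):
        part = [l for l, f in zip(labels, feature) if f == value]
        if part:
            cond += (len(part) / len(labels)) * entropy(sum(part),
                                                        len(part) - sum(part))
    return base - cond

def gain_ratio(labels, feature):
    """Gain Ratio: IG normalized by the split information of the feature,
    penalizing features that produce many or very uneven splits."""
    t = sum(feature)
    split = entropy(t, len(feature) - t)
    return information_gain(labels, feature) / split if split > 0 else 0.0
```

A feature that perfectly separates dangerous from safe nodes yields IG = 1.0, while a constant feature yields 0 for both measures.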

3. Exploring Centrality Metrics in Dependency Risk Graphs

We will explore the effect of centrality metrics on the importance of nodes in dependency risk graphs. Recall that in such graphs the edges denote directed risk relations between nodes, and not topological connections. Intuitively, nodes with high centrality measures are expected to have a high effect on the overall dependency risk. The question that needs to be answered is which centrality metrics have the higher effect on risk propagation. Then, the nodes exhibiting the highest values of these centrality metrics will be good candidates for the implementation of risk mitigation controls. An algorithm capable of identifying these nodes will be able to support a cost-effective risk mitigation strategy.

3.1. Analysis of centrality metrics

We will examine each centrality metric in dependency risk graphs, in order to analyze its effect on the overall risk.

Degree centrality. A node with high degree centrality is a node with many dependencies. Since the edges in risk graphs are directional, degree centrality should be examined for two different cases:

- Inbound degree centrality, i.e. the number of edges ending at a node. A high inbound degree centrality indicates nodes known as cascading resulting nodes [15].

- Outbound degree centrality, i.e. the number of edges starting from a node. This is an indication of a cascading initiating node [15].

Nodes with high inbound degree centrality in a risk graph are natural sinkhole points of incoming dependency risk. Nodes with high outbound degree centrality seem to be suitable nodes to examine when prioritizing mitigation controls. Indeed, if proper mitigation controls are applied to such nodes, then multiple cumulative dependency risk chains will simultaneously be reduced. Such a strategy could be a much more cost-effective alternative to a mitigation strategy aiming at applying controls to particular high-risk edges or high-risk paths. Obviously, it is not certain that applying one or more security controls to a node with high outbound degree centrality will positively affect many (or all) outgoing dependency chains using this node. However, a mitigation strategy would benefit if it were to initially examine such possible security controls.

Closeness centrality. Nodes with high closeness centrality are nodes that have short average distances from most nodes in a graph. In the case of a dependency risk graph, nodes with high closeness are nodes that tend to be part of many dependency chains, and sometimes even initiate them. Since cascading effects tend to affect relatively short chains (empirical evidence indicates that cascades rarely propagate deeply [16]), intuitively, nodes with high closeness centrality will have a bigger effect on the overall risk of the dependency chains than nodes with low closeness centrality. To formalize this, consider equation (1), used to compute the cumulative risk of a dependency chain: the closer a node is to the initiator of a cascading event, the greater its effect on the cumulative dependency risk, since the likelihood of its outgoing dependency affects all the partial risk values of the following dependencies (edges).

A more effective way to exploit closeness centrality in mitigation decisions is to compute the closeness of every node with respect to the subset of the most important initiator nodes. Regardless of the underlying methodology, the risk assessors will already have a priori knowledge or intuition of the most important nodes for cascading failure scenarios. For example, empirical results show that energy and ICT nodes are the most common cascade initiators [16].
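This closeness-to-initiators idea can be sketched with a plain breadth-first search over the dependency graph (hop-count distances on an unweighted directed graph; the adjacency-list format, the averaging choice and the example nodes are our illustrative assumptions):

```python
from collections import deque

def distances_from(graph, source):
    """BFS hop distances from `source` in a directed graph given as
    {node: [successor, ...]} adjacency lists."""
    dist = {source: 0}
    q = deque([source])
    while q:
        u = q.popleft()
        for v in graph.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def closeness_to_initiators(graph, node, initiators):
    """Average hop distance from the given initiator nodes to `node`
    (smaller = closer to the likely cascade initiators). Initiators that
    cannot reach the node are ignored; the node itself is skipped."""
    ds = []
    for s in initiators:
        d = distances_from(graph, s).get(node)
        if d:                       # skips unreachable (None) and self (0)
            ds.append(d)
    return sum(ds) / len(ds) if ds else float("inf")

# Illustrative use: "E" plays the role of a known energy/ICT initiator.
graph = {"E": ["A"], "A": ["B"], "B": ["C"]}
print(closeness_to_initiators(graph, "B", ["E"]))
```

Ranking nodes by this score gives the secondary mitigation-prioritization criterion described above.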
In addition, nodes with high outgoing degree centrality will probably be nodes that participate in multiple dependency risk chains. Thus it is possible to first identify the subset of the most important nodes for cascading failures and then compute the closeness of all other nodes in relation to this subset, as a secondary criterion for mitigation prioritization.

Eccentricity centrality. Similar to closeness centrality, this is a measure of the centrality of a node in a graph which is based on having a small maximum distance from the node to every other reachable node. Here, the maximum distances are the greatest distances detected over all shortest paths between v and any other vertex (the geodesic distance). If the eccentricity centrality of a CI node is high, then all other CI nodes are in proximity.

Betweenness centrality. In a dependency risk graph, a node with high betweenness centrality will lie on a high proportion of dependency risk paths. This means that even though such nodes may not be initiating nodes of a cascading failure (high outbound centrality) and may not belong to a path with high cumulative dependency risk, they tend to contribute to multiple risk paths and thus play an important role in the overall risk. Applying mitigation measures to such nodes (in the form of security controls) will probably decrease the dependency risk of multiple chains simultaneously. Comparing closeness centrality with betweenness centrality, it seems that closeness should precede betweenness as a mitigation criterion. Although nodes that are between multiple paths will eventually affect multiple chains, it is possible that a node lies in multiple paths but tends to be at the end of the chains, practically not affecting the cumulative dependency risk (recall that nodes with high-order dependency are rarely affected).

Bonacich (Eigenvector) centrality. A node with high Bonacich [17] (eigenvector) centrality is a node that is adjacent to nodes with very high or very low influence, depending on whether β > 0 or β < 0. In a dependency risk graph, we are interested in nodes with high eigenvector centrality where β > 0, since these nodes are connected to other nodes which also have high connectivity. This is an interesting measure for CI risk graphs, as such nodes can not only cause cascading failures to more nodes, but can also cause multiple cascade chains of high risk. In contrast, a less connected node shares fewer dependencies with other nodes and is affected only by specific nodes in the graph. This means that applying mitigation measures to such a node may not significantly affect the overall risk. Conversely, if one applies mitigation controls to a node with high eigenvector centrality (when β > 0), the most powerful (or critical) node is modified and this, in turn, affects several other important nodes.

3.2. Applying feature selection to classify centrality metrics

In order to validate our intuitive analysis of centrality metrics in dependency risk graphs, described in Section 3.1, we use a data mining technique named feature selection to empirically study possible relations between the risk level of nodes in dependency paths (as assessed using the aforementioned methodology from [1]) and the various centrality metrics of nodes. This experiment will show how often nodes concurrently appear in: (a) the set of the most critical paths (i.e., the paths with the highest cumulative dependency risk value) and (b) the sets of nodes with the highest centrality values in a dependency risk graph.

For our experiments, we simulated graphs that were randomized based on restrictions proposed in recent empirical studies [16, 15], in order to resemble CI dependencies based on real data. In particular:

Occasional tight coupling. Some node relationships in the dependency risk graph are selected to have high impact values. To achieve this, in our experiments randomization applies random risk values with relatively high lower and upper likelihood and impact bounds.

Interactive complexity. In reality, dependencies between CIs have high interactive complexity.
Informally, this implies that we cannot foresee all the ways things can go wrong. In our experiments, we constructed random graphs of 50 nodes with high complexity; the critical paths up to 4th-order dependencies have possible combinations ranging from 230,300 to 2,118,760 chains.

Number of dependencies. We randomly simulated CI nodes having from 1 to 7 connections (dependencies).

Path length. Critical paths of length 3 to 4 hops are used.

Number of initiators. More than 60% of the nodes in the graph act as initiators.

Connectivity of initiators. In our experiments, the initiators tend to have higher connectivity than the other nodes.

In our experiments we used the Weka tool (version 3.6). All experiments were conducted using the available implementations of the Information Gain and Gain Ratio feature selection algorithms in Weka, using the program's Attribute Selection Ranker to rank output results.

3.3. Results of feature selection for centrality metrics

We analyzed a data set of 32,950 nodes extracted from about 700 graphs containing about 774,015,270 paths. Dependency graphs were created with the aforementioned restrictions. Feature selection algorithms were used to detect correlations between high centrality metrics and CI nodes that greatly affect the overall risk. Weka's output rankings provide insight on how high centrality features characterize dangerous nodes that affect the risk in the top critical paths. Detailed output values for each test are presented in Tables 1 and 2. Table 1 depicts the output from the Information Gain algorithm, whereas Table 2 provides the Gain Ratio output:

INFORMATION GAIN                     Inbound Test   Outbound Test
Betweenness                          ...            ...
Eccentricity                         ...            ...
Closeness                            ...            ...
Eigenvector                          ...            ...
Intersection of all Centralities     ...            ...
Inbound degree (sinkholes)           ...            ...
Outbound degree                      ...            ...

Table 1: Weka's output ranking using the Information Gain algorithm

GAIN RATIO                           Inbound Test   Outbound Test
Betweenness                          ...            ...
Eccentricity                         ...            ...
Closeness                            ...            ...
Eigenvector                          ...            ...
Intersection of all Centralities     ...            ...
Inbound degree (sinkholes)           ...            ...
Outbound degree                      ...            ...

Table 2: Weka's output ranking using the Gain Ratio algorithm

Tests are divided into two sets: Inbound tests and Outbound tests. The difference between the two is that Inbound tests use Inbound Degree Centrality measures, whereas Outbound tests compute everything using Outbound Degree Centrality measures. Results from Weka provided valuable information as to how nodes with high centrality values affect dependency risk graphs:

1. The number of critical paths that had at least one node with high centrality measurements is very high: an average of 74.4% of the most critical paths in all graphs contained a node with at least one high centrality metric. This percentage increases dramatically to 99.9% for the top 10% of most critical paths, which leads to the conclusion that the top 10% of paths frequently pass through the same nodes as the top 1% of paths. If we analyze even larger sample sets (more than 10% of critical paths), almost all nodes with high centrality are part of some critical path.

2. Gain Ratio showed that CI nodes with high centrality measures in all types of centralities rarely appear in graphs but, when they do appear, they definitely affect the cumulative risk of critical paths. CI nodes that have high measures in all centrality metrics are of high importance.

3. Nodes with high Closeness centrality seem to be the best candidates for detecting important CI nodes. Yet, results clearly show that all centrality types seem to be useful indicators of important CI nodes (Information Gain is above 20% for all features). Thus, all centrality metrics must be taken into account in each graph.

4. Nodes with high Closeness and/or Degree centrality, both outbound and inbound (sinkholes), tend to mitigate high amounts of risk. An average of 49.6% of all dangerous nodes tested had a high Closeness centrality value and an average of 45% had a high Degree centrality value.

5. Risk mitigation is highly dependent on the graph itself. For example, in graphs that contain nodes acting like single points of failure, high centrality values are excellent indicators of important CI nodes. If, however, we examine graphs in which, for example, all nodes have equal degree, the identification of nodes for risk mitigation is often less successful. Luckily, CI dependency nodes tend to have specific structural characteristics [16, 15].

4. An Algorithm for an Efficient Risk Mitigation Strategy

We have concluded from the previous experiment that, even though nodes with high centrality are only a fraction of the nodes in the most critical risk paths, they greatly affect all critical paths. Also, feature analysis results suggest that all centrality measures can play an important role, depending on the graph analyzed. Thus, it seems essential for a risk mitigation strategy not only to take into consideration which centrality measures provide the best results (like high Closeness measures), but also to consider the particularities of each graph and adapt accordingly, by choosing the correct combination of centrality metrics to pinpoint nodes for applying risk mitigation controls.

Algorithm 1: Decision-tree traversal using Information Gain IG_x(N) to select nodes for Risk Mitigation

Data:   High-centrality subsets: C = {C_C, C_B, C_I, C_Eg, C_E}
        Head set of all nodes: N
        Subset of dangerous nodes: D ⊆ N
        Information Gain of centrality subset c over set D: IG_c(D)
Result: Set of selected nodes for risk mitigation S ⊆ N

1. Start: Mark all sets in C as unused; position at the root node.
2. Leaf:
     if C is empty then output remaining nodes in D; END;
     else if IG_c(D) = 0 for all c ∈ C then output remaining nodes in D; END;
     else go to Step 5.
3. Decide case:
     if there are no unused sets in C then go to Step 4;
     else go to Step 5.
4. Backtrack:
     if at the root then STOP;
     else return to the vertex just above this one and go to Step 3.
5. Decision:
     if IG_c1(D) = IG_c2(D) for all c1, c2 ∈ C then select the left-most c from set C;
     else select the c from set C with the highest IG_c(D);
     N ← N ∩ c; D ← D ∩ c; remove the selected c from C; go to Step 2.

We incorporate the results of the previous feature selection analysis (Section 3.2), in order to design a node selection algorithm for efficient risk mitigation. The proposed algorithm is heuristic and recursively chooses the best set of nodes, by limiting its selection options, until it reaches a set of selected nodes of a predefined size.

Modeling feature selection results into algorithmic restrictions

Recall from Section 3.2 that the most important conclusions are: (i) nodes with high centrality measures in all types of centrality metrics always affect the total risk of a graph; (ii) nodes with high Closeness and high Betweenness centrality seem to have a greater impact than the other metrics; (iii) if a node concurrently has high centrality values for more centrality metrics, it tends to have a higher effect on the risk graph. To model these observations into a dynamic node selection algorithm for risk mitigation, we implemented the following steps:

1. Model definition: Define the subsets of nodes which exhibit the highest centrality values for each centrality metric, as well as their Information Gain over the entire node set.

2. Setup: Pre-compute all centrality measures for each node in a graph. Using these centrality measures, generate all high-centrality subsets. These subsets will act as features for feature selection. In addition, pre-compute all dependency risk paths of the graph and sort them according to their weight (the total cumulative risk value of each path).

3. Use decision trees and Information Gain for node selection: Define the Information Gain method and its use in splitting node sets.

4. Traverse a decision tree using Information Gain splits: Traverse a decision tree to select the optimal subset of nodes for applying risk mitigation controls.

Model Definition

Let C_A denote the subset of the top X% of nodes with the highest centrality measurements from all centrality sets; C_E denote the subset of the top X% of nodes with the highest Eccentricity; C_C denote the subset of the top X% of nodes with the highest Closeness centrality; C_I denote the subset of the top X% of nodes with the highest Inbound Degree centrality (sinkholes); C_B denote the subset of the top X% of nodes with the highest Betweenness centrality; and finally C_Eg denote the subset of the top X% of nodes with the highest Eigenvector centrality. Let N denote the set of all nodes in the risk graph. Let D ⊆ N denote the subset of nodes that participate in the top 20 most critical risk paths of the graph. Finally, let IG_x(N) denote the information gain when choosing subset C_x as a feature of the node set N. For example, IG_B(N) denotes the information gain when using high Betweenness centrality as a feature of set N. Information Gain values are calculated for each centrality feature subset separately. The high-centrality subset C_x with the highest Information Gain over N is considered to affect the risk graph the most, in comparison with the other high-centrality subsets.

Example: In a Dependency Risk Graph, there exist nodes that act as initiators of cascading events in the three most critical paths (i.e., the critical paths with the highest cumulative dependency risk value). Yet there also exist nodes that, even though they have a smaller influence on the top three most critical paths, affect the overall risk of the entire Dependency Risk Graph more than the aforementioned initiators. Our risk mitigation algorithm will be able to suggest these nodes over the three initiator nodes, using the Information Gain output from all high-centrality node subsets.

Setup

The centrality measures of each node are pre-computed, along with all the dependency risk paths of the graph. For each centrality measure, the algorithm uses the pre-computed centrality values and creates the subsets C_E, C_C, C_I, C_B and C_Eg. Then, the algorithm computes all risk paths in the graph and sorts them according to their total accumulated risk. All output is used as input to the risk mitigation algorithm, in order to choose the most effective nodes for risk mitigation using countermeasures.
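The setup step, i.e. extracting the top X% of nodes per centrality metric, can be sketched as follows (a hypothetical helper; the score tables and the 25% cut-off are illustrative):

```python
def top_percent(scores, x):
    """Subset of nodes in the top x% by a centrality score
    (ties broken by node id, so the result is deterministic)."""
    k = max(1, round(len(scores) * x / 100))
    ranked = sorted(scores, key=lambda n: (-scores[n], n))
    return set(ranked[:k])

# Illustrative per-metric score tables (node -> centrality value):
closeness   = {"A": 0.9, "B": 0.7, "C": 0.3, "D": 0.1}
betweenness = {"A": 12.0, "B": 3.0, "C": 9.0, "D": 0.0}

C_C = top_percent(closeness, 25)    # high-Closeness subset
C_B = top_percent(betweenness, 25)  # high-Betweenness subset
```

Each such subset then acts as one boolean feature ("node is in the top X% for this metric") for the Information Gain computation of Algorithm 1.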

Using decision trees and Information Gain for node selection

A decision can be made (and, practically, a decision tree can be traversed) by creating a split over the head set N of CI nodes, producing smaller subsets (the tree leaves). In a Dependency Risk Graph, the algorithm chooses the specific centrality node subset with the highest impact on the graph. Then, it outputs a new subset which is the intersection of the chosen subset with the original set. This process is repeated on each derived subset in a recursive manner called recursive partitioning. The recursion is complete when the Information Gain calculation at a specific point in the decision tree produces the same Information Gain values for the target subset of nodes, or when splitting no longer adds information to the predictions.

Algorithm 1 is used to define a sequence of high centrality subsets that act as features. The risk mitigation algorithm investigates this sequence to rapidly narrow down the group of CI nodes with the highest impact on the Dependency Risk Graph. This feature selection increases both effectiveness and efficiency, since it removes non-informative CI nodes according to corpus statistics [10].

Example: Traversing a decision tree using Information Gain splits

The above algorithm aims to choose the nodes with the highest impact on a Dependency Risk Graph. The risk mitigation strategy can be summarized in a single sentence: always choose the subset of nodes with the highest Information Gain at each step of the decision tree. Let us assume a Dependency Risk Graph with all centrality measures calculated and all high centrality subsets defined. Algorithm 1 would then run as follows:

1. Step 1: We need to find which centrality feature produces the most Information Gain against the root set D ⊆ N of nodes.
The Information Gain is calculated for all five high centrality subsets (features):

IG_C(D) = … // Gain of the Closeness subset over D
IG_B(D) = … // Gain of the Betweenness subset over D

IG_S(D) = … // Gain of the Sinkholes subset over D
IG_Eg(D) = … // Gain of the Eigenvector subset over D
IG_E(D) = … // Gain of the Eccentricity subset over D

The Closeness high centrality subset provides the highest Information Gain; therefore, it is used as the decision attribute at the root set D. From here on, D is the subset D ∩ C_C of nodes and N becomes the subset N ∩ C_C.

2. Step 2: The next question is which feature should be tested at the second branch. Since we have used C_C at the root, we only decide among the remaining four centrality subsets: Betweenness, Sinkholes, Eigenvector and Eccentricity.

IG_B(D) = …
IG_Eg(D) = …
IG_S(D) = …
IG_E(D) = …

In this step, the Betweenness subset shows the highest Information Gain; therefore, it is used as the decision node. From here on, D is the subset D ∩ C_B of nodes and N becomes the subset N ∩ C_B.

3. Steps 3 to N: This process goes on until all nodes are classified and D is left with three nodes, or until the algorithm runs out of centrality attributes.

5. Validation of the proposed algorithm

We empirically study the efficiency of the risk mitigation algorithm presented in Section 4. We ran 2000 random experiments; in each experiment the graph was produced based on the limitations described in Section 3.2, and each graph contained up to 50 nodes.
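The Information Gain splits described in the walkthrough above can be sketched as a short, self-contained example (our own simplification, not the paper's implementation: the target label is taken to be membership in the critical-path set D, and the node sets are invented):

```python
from math import log2

def entropy(nodes, critical):
    """Binary entropy of the label 'node participates in the top critical paths'."""
    if not nodes:
        return 0.0
    p = sum(1 for n in nodes if n in critical) / len(nodes)
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def information_gain(nodes, feature_subset, critical):
    """IG of splitting `nodes` on membership in a high centrality subset."""
    inside = nodes & feature_subset
    outside = nodes - feature_subset
    split = (len(inside) / len(nodes)) * entropy(inside, critical) \
          + (len(outside) / len(nodes)) * entropy(outside, critical)
    return entropy(nodes, critical) - split

def best_feature(nodes, features, critical):
    """Pick the high centrality subset (feature) with the highest Information Gain."""
    return max(features, key=lambda name: information_gain(nodes, features[name], critical))

N = set("ABCDEFGH")
D = {"A", "B", "C"}                          # nodes in the top critical paths
features = {"Closeness":   {"A", "B", "C", "D"},
            "Betweenness": {"A", "E", "F"}}
print(best_feature(N, features, D))          # Closeness
```

Here the Closeness subset almost perfectly separates critical-path nodes from the rest, so it wins the root split; the procedure then recurses on N ∩ C_Closeness, as in Step 2 above.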

5.1. Efficiency analysis of the proposed mitigation strategy

We know from previous results that centrality measures can indeed characterize important CI nodes that greatly affect the overall risk in a dependency risk graph. To evaluate the effectiveness of the proposed risk mitigation strategy, we need to emulate the implementation of mitigation controls in the nodes identified as important (dangerous) by the algorithm presented in Section 4. In practice, examples of controls can be the repair prioritization of nodes (which node to send a repair crew to first), the increase of asset redundancy on a node, or any other control that may reduce the likelihood or the consequence of a failure.

In our experiments, we emulate the implementation of mitigation controls on a node CI_i by reducing the values L_i,j that define the likelihood that a failure experienced at node CI_i cascades to another node CI_j that depends on it. All likelihood values L_i,j of all the dependencies that originate from CI_i were reduced by 20%. To measure the result of applying risk mitigation controls on each selected subset of nodes, we calculated the dependency risk values in the same graph before and after the implementation of risk mitigation, and we calculated the risk reduction achieved in each case.

We measured the efficiency of the proposed risk mitigation algorithm by calculating the percentage of risk reduction in graphs using the aforementioned emulation of mitigation controls. The effect on three risk metrics was examined using the proposed Algorithm 1 from Section 4: (i) the risk reduction achieved in the most critical path; (ii) the risk reduction in the total sum of the risk of the top 20 paths with the highest cumulative dependency risk; and (iii) the risk reduction in the total risk from all graph paths.
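The emulation described above can be sketched as follows. The 20% reduction comes from the text; the dependency-risk formula used here (the product of the likelihoods along the chain, multiplied by the impact at the final node) is a simplified stand-in for the paper's metric, so the numbers are purely illustrative:

```python
def apply_controls(likelihood, node, reduction=0.20):
    """Emulate mitigation controls on `node`: reduce every outgoing
    dependency likelihood L[node][j] by the given fraction."""
    mitigated = {i: dict(out) for i, out in likelihood.items()}
    for j in mitigated.get(node, {}):
        mitigated[node][j] *= (1.0 - reduction)
    return mitigated

def path_risk(path, likelihood, impact):
    """Illustrative dependency risk of a cascade path: product of the
    edge likelihoods along the chain times the impact at the last node."""
    p = 1.0
    for a, b in zip(path, path[1:]):
        p *= likelihood[a][b]
    return p * impact[path[-1]]

# Tiny example graph: A -> B -> C, with per-node impact values.
L = {"A": {"B": 0.5}, "B": {"C": 0.8}, "C": {}}
I = {"A": 1.0, "B": 2.0, "C": 10.0}

before = path_risk(["A", "B", "C"], L, I)          # 0.5 * 0.8 * 10 = 4.0
after = path_risk(["A", "B", "C"], apply_controls(L, "A"), I)
print(round(100 * (before - after) / before, 1))   # percentage risk reduction: 20.0
```

Under this multiplicative formula, mitigating one node on a path reduces that path's risk by exactly the likelihood reduction applied; paths that do not traverse the mitigated node are unaffected, which is why node selection matters.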
Only acyclic paths were taken into consideration, to reduce overlapping: for example, if a high risk path were cyclic, its acyclic part would most likely appear as a high risk path too. Thus, we avoided calculating cyclic paths in our experiments. Mitigation controls were implemented on three (3) out of the fifty (50) CI nodes in each experiment (6% of the nodes in the entire graph).
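Path enumeration under the acyclic restriction can be sketched like this (a self-contained illustration of ours; the ranking weight uses a simplified multiplicative risk formula, product of likelihoods times final impact, as an assumed stand-in for the paper's exact metric):

```python
def acyclic_paths(likelihood, start):
    """Enumerate all simple (acyclic) dependency paths starting from `start`."""
    paths = []
    def walk(path):
        if len(path) > 1:
            paths.append(path)
        for nxt in likelihood.get(path[-1], {}):
            if nxt not in path:        # the acyclic restriction: never revisit a node
                walk(path + [nxt])
    walk([start])
    return paths

def top_k_paths(likelihood, impact, k=20):
    """Rank all acyclic paths by their (illustrative) cumulative risk."""
    def risk(path):
        p = 1.0
        for a, b in zip(path, path[1:]):
            p *= likelihood[a][b]
        return p * impact[path[-1]]
    all_paths = [p for n in likelihood for p in acyclic_paths(likelihood, n)]
    return sorted(all_paths, key=risk, reverse=True)[:k]

# Tiny graph with a cycle A -> B -> A; the cycle is never followed.
L = {"A": {"B": 0.9}, "B": {"C": 0.5, "A": 0.3}, "C": {}}
I = {"A": 1.0, "B": 4.0, "C": 10.0}
print(top_k_paths(L, I, k=3))   # [['B', 'C'], ['A', 'B', 'C'], ['A', 'B']]
```

Note that enumerating all simple paths is exponential in the worst case, which is consistent with the paper's choice to pre-compute and sort the paths once during the setup step.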

Figure 1: Risk reduction in the most critical path

Effect of the algorithm on the most critical path

Using the proposed algorithm from Section 4 to select CI nodes for implementing mitigation controls, we achieved an average risk reduction of 12.1%. The highest risk reduction achieved in all experiments was 43.7%; almost half the risk of the most critical path was mitigated. This was the highest risk mitigation achieved among all the different mitigation strategies tested. A more thorough comparison of the results can be found in Section 5.2.

As shown in Figure 1, in about 25% of the tests (498 out of 2000) there was no risk reduction in the most critical path. This, however, is not a primary concern, since our goal is to reduce the overall risk of the dependency risk graph, and not necessarily the risk of the most critical path, whose nodes are easy to identify. We note, however, that when mitigation controls were applied on nodes belonging to the most critical path, the algorithm achieved better risk reduction results than alternative well-known mitigation strategies, such as the implementation of mitigation controls on the initiator of the most critical path.

Figure 2: Risk reduction in the top 20 most critical paths

Effect of the algorithm on the top 20 risk paths

In Figure 2, we see that the highest risk reduction achieved in the sum of all the risk values derived from the top 20 critical paths was 37.5%. The average risk reduction achieved using the proposed algorithm was 9.8%. In addition, we see that, out of the 2000 experiments, only 12 graphs had a structure that prohibited our algorithm from achieving any risk reduction in the top 20 paths; 99.4% of the algorithm's applications achieved some risk mitigation, and 47.3% of the applications achieved a high risk reduction (10% or more).

These are interesting results: the top 20 most critical paths can comprise up to 1000 nodes in each experiment. Even though paths share CI nodes, choosing only 3 out of 50 nodes for risk mitigation across 20 paths is far more complex than choosing the correct nodes only in the most critical path, which comprises 5 nodes. Again, the proposed algorithm had the best performance of the three different mitigation strategies tested.

In addition, by monitoring the algorithm's execution and CI node choices, we validated some preliminary results from the feature analysis: the risk mitigation algorithm was mostly choosing CI nodes with high centrality measures in all

centrality metrics (i.e., the intersection of all high centrality sets for all centrality metrics). Besides aggregating all centrality metrics, the second most frequent combination of centrality types was nodes with both high Degree centrality (e.g., sinkholes) and high Closeness centrality.

Effect of the algorithm on the entire dependency risk graph

Figure 3: Risk reduction in the complete set of paths

Last but not least, we analyzed the efficiency of the proposed algorithm when trying to reduce the risk of the entire risk graph, not only of the top 20 paths. Here, we tested risk mitigation using the proposed node selection algorithm on the sum of all paths in the dependency graphs. The results from the 2000 experiments are shown in Figure 3. We see that the highest risk reduction achieved was 12.2%, with an average risk reduction of 7.5% across all experiments. Interestingly, in 86.4% of the tests the risk reduction achieved was between 5% and 10%, with an average of around 8.7%. With such a low dispersion of results, the proposed algorithm's output shows strong signs of uniformity and homogeneity. Again, the proposed algorithm had the best performance of the three different mitigation strategies tested.

5.2. Comparing results with alternative mitigation strategies

After presenting the results of the proposed algorithm, we now compare it with alternative mitigation mechanisms to evaluate its performance. Two alternative mechanisms for choosing CI nodes were used, both well-known and frequently proposed scenarios.

Using initiators from the most critical paths

The first strategy is based on the idea that initiators are critical points of failure in dependency risk chains. This alternative mitigation strategy implements mitigation controls on the CI nodes that are the initiators of critical dependency paths. Thus, in our experiments, three (3) out of the fifty (50) CI nodes are chosen (6% of the nodes in the entire graph), namely those that act as initiators of the corresponding top 3 most critical paths.

Using nodes with high Degree centrality (sinkholes and outbound)

The second strategy reflects the common idea amongst auditors that CI nodes with high inbound and outbound Degree centrality (i.e., nodes with a high number of incoming and/or outgoing connections) strongly affect dependency risk paths. For this reason, the second mitigation strategy chooses the top 6% of nodes with the highest inbound and outbound Degree centrality measures to apply mitigation controls.

Comparison results

Strategy          | Most critical path       | Top 20 critical paths    | Entire graph
Information Gain  | 43.7% (max), 12.1% (avg) | 37.5% (max), 9.8% (avg)  | 12.2% (max), 7.5% (avg)
Top Initiators    | 38.4% (max), 11.8% (avg) | 28.7% (max), 10.0% (avg) | 10.1% (max), 5.3% (avg)
Top Sinkholes     | 34.5% (max), 10.3% (avg) | 29.8% (max), 7.3% (avg)  | 10.8% (max), 6.7% (avg)

Table 3: Comparison of the results from all mitigation strategies

Table 3 shows the results from all mitigation strategies. Clearly, in our tests, the proposed decision-making algorithm using Information Gain had a clear advantage over the rest as far as pure numbers are concerned. Only the top initiators mechanism scored a higher average in reducing risk at the top 20 most critical paths, albeit only marginally.

5.3. Implementation details

We briefly describe the technical aspects of the executed tests: platforms, technologies and algorithm implementations. All tests were performed on an Intel Core i7 with 4 cores and 16 GB of RAM. In order to test the proposed risk mitigation strategies, we developed an automatic Dependency Risk Graph generator in Java. The application randomly creates risk graphs and then runs the mitigation algorithms on the generated models. Finally, it reports the percentage of risk reduction achieved by each of the mitigation mechanisms.

We selected Neo4J [18] as the technology for our dependency analysis modeling tool, due to its adaptability, scalability and efficiency [19, 20]. Neo4J builds upon the property graph model: nodes may have various labels, and each label can serve as an informational entity. Nodes are connected via directed, typed relationships, and both nodes and relationships can hold arbitrary properties (key-value pairs). Using Neo4J, the developed framework can represent complex graphs of thousands of dependent CIs through a weighted, directed graph.

The developed application is stand-alone: it builds a random Dependency Risk Graph using the aforementioned restrictions to imitate real-world CI dependencies. It then pre-calculates all necessary graph data and executes the selected risk mitigation algorithm. When finished, it outputs the percentage of risk reduction achieved in the current experiment/graph.

6. Conclusions

In this paper, we extended our previous method on dependency risk graph analysis, using graph centrality measures as additional criteria for the evaluation

of alternative risk mitigation strategies. We used feature selection algorithms and confirmed that the most critical paths in dependency risk graphs tend to involve nodes with high centrality measures. Since various centrality metrics exist, we performed multiple tests in order to define which measures and combinations are the most effective. Our results showed that aggregating all centrality sets to identify nodes with high overall centrality provides an optimal mitigation strategy. However, by comparing the different centrality metrics, we showed that nodes with high Closeness and high Degree centrality measures seem to have the highest impact on the overall risk of a graph.

Still, simply choosing nodes from these sets is not always a viable choice, as there exist dependency graphs where no nodes appear in specific high centrality sets, or there may be contextual reasons that inhibit the application of controls on these nodes. For this reason, we developed a risk mitigation algorithm that uses a decision tree built from subsets of nodes with high centrality measures for each centrality metric. The Information Gain algorithm is used to traverse the tree and choose the subsets of high centrality nodes that best characterize each graph under test. Thus, the algorithm is able to dynamically pinpoint the optimal subset of nodes with high centrality features for overall risk mitigation.

6.1. Advantages and limitations of the proposed mitigation method

Although the proposed mitigation algorithm achieved the highest maximum risk mitigation when compared with well-known mitigation strategies, our experiments showed that there are cases where the algorithm will not select nodes belonging to the most critical path. This can be considered a potential limitation and, at the same time, a desired characteristic. On the one hand, the proposed algorithm will not always reduce the risk of the most critical path.
Note, however, that if this is required, it can be trivially achieved by selecting the initiator of the most critical path. On the other hand, the proposed algorithm outperforms its competitors because it searches for nodes that have a great effect on the overall risk, and not merely on a subset of critical paths. This is the main advantage of the proposed algorithm; it can be used to identify CI nodes

which may not appear in the top critical path(s) but greatly affect the overall risk of a dependency graph. While the exhaustive computation of the complete set of dependency risk paths may provide useful information and reveal underestimated CI nodes, assessors may also use the algorithm to examine particularly complex scenarios where manually selecting nodes for implementing mitigation controls might be very time consuming and/or impractical.

A limitation of this approach is the need for input data from risk assessments performed at each of the CIs. This is an inherent problem of all empirical risk approaches, since they analyze dependencies based on previous incidents. Sector coordinators or regulators may collect data concerning a specific sector (such as ICT or energy), while national CIP authorities or CERTs may be able to collect such information.

References

[1] P. Kotzanikolaou, M. Theoharidou, D. Gritzalis, Cascading effects of common-cause failures in critical infrastructures, in: J. Butts, S. Shenoi (Eds.), Critical Infrastructure Protection, Vol. 417 of IFIP Advances in Information and Communication Technology, Springer, 2013.

[2] P. Kotzanikolaou, M. Theoharidou, D. Gritzalis, Assessing n-order dependencies between critical infrastructures, IJCIS 9 (1/2) (2013).

[3] P. Kotzanikolaou, M. Theoharidou, D. Gritzalis, Interdependencies between critical infrastructures: Analyzing the risk of cascading effects, in: Critical Information Infrastructure Security, Springer, 2013.

[4] M. Theoharidou, P. Kotzanikolaou, D. Gritzalis, Risk assessment methodology for interdependent critical infrastructures, International Journal of Risk Assessment and Management 15 (2) (2011).

[5] M. Theoharidou, P. Kotzanikolaou, D. Gritzalis, A multi-layer criticality assessment methodology based on interdependencies, Computers & Security 29 (6) (2010).


More information

Echidna: Efficient Clustering of Hierarchical Data for Network Traffic Analysis

Echidna: Efficient Clustering of Hierarchical Data for Network Traffic Analysis Echidna: Efficient Clustering of Hierarchical Data for Network Traffic Analysis Abdun Mahmood, Christopher Leckie, Parampalli Udaya Department of Computer Science and Software Engineering University of

More information

System Interconnect Architectures. Goals and Analysis. Network Properties and Routing. Terminology - 2. Terminology - 1

System Interconnect Architectures. Goals and Analysis. Network Properties and Routing. Terminology - 2. Terminology - 1 System Interconnect Architectures CSCI 8150 Advanced Computer Architecture Hwang, Chapter 2 Program and Network Properties 2.4 System Interconnect Architectures Direct networks for static connections Indirect

More information

Static IP Routing and Aggregation Exercises

Static IP Routing and Aggregation Exercises Politecnico di Torino Static IP Routing and Aggregation xercises Fulvio Risso August 0, 0 Contents I. Methodology 4. Static routing and routes aggregation 5.. Main concepts........................................

More information

Fault Analysis in Software with the Data Interaction of Classes

Fault Analysis in Software with the Data Interaction of Classes , pp.189-196 http://dx.doi.org/10.14257/ijsia.2015.9.9.17 Fault Analysis in Software with the Data Interaction of Classes Yan Xiaobo 1 and Wang Yichen 2 1 Science & Technology on Reliability & Environmental

More information

InfiniteGraph: The Distributed Graph Database

InfiniteGraph: The Distributed Graph Database A Performance and Distributed Performance Benchmark of InfiniteGraph and a Leading Open Source Graph Database Using Synthetic Data Objectivity, Inc. 640 West California Ave. Suite 240 Sunnyvale, CA 94086

More information

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear

More information

Distributed Dynamic Load Balancing for Iterative-Stencil Applications

Distributed Dynamic Load Balancing for Iterative-Stencil Applications Distributed Dynamic Load Balancing for Iterative-Stencil Applications G. Dethier 1, P. Marchot 2 and P.A. de Marneffe 1 1 EECS Department, University of Liege, Belgium 2 Chemical Engineering Department,

More information

Social Media Mining. Graph Essentials

Social Media Mining. Graph Essentials Graph Essentials Graph Basics Measures Graph and Essentials Metrics 2 2 Nodes and Edges A network is a graph nodes, actors, or vertices (plural of vertex) Connections, edges or ties Edge Node Measures

More information

Classification algorithm in Data mining: An Overview

Classification algorithm in Data mining: An Overview Classification algorithm in Data mining: An Overview S.Neelamegam #1, Dr.E.Ramaraj *2 #1 M.phil Scholar, Department of Computer Science and Engineering, Alagappa University, Karaikudi. *2 Professor, Department

More information

Scheduling Shop Scheduling. Tim Nieberg

Scheduling Shop Scheduling. Tim Nieberg Scheduling Shop Scheduling Tim Nieberg Shop models: General Introduction Remark: Consider non preemptive problems with regular objectives Notation Shop Problems: m machines, n jobs 1,..., n operations

More information

Fault Localization in a Software Project using Back- Tracking Principles of Matrix Dependency

Fault Localization in a Software Project using Back- Tracking Principles of Matrix Dependency Fault Localization in a Software Project using Back- Tracking Principles of Matrix Dependency ABSTRACT Fault identification and testing has always been the most specific concern in the field of software

More information

A Brief Introduction to Property Testing

A Brief Introduction to Property Testing A Brief Introduction to Property Testing Oded Goldreich Abstract. This short article provides a brief description of the main issues that underly the study of property testing. It is meant to serve as

More information

Local outlier detection in data forensics: data mining approach to flag unusual schools

Local outlier detection in data forensics: data mining approach to flag unusual schools Local outlier detection in data forensics: data mining approach to flag unusual schools Mayuko Simon Data Recognition Corporation Paper presented at the 2012 Conference on Statistical Detection of Potential

More information

Load balancing Static Load Balancing

Load balancing Static Load Balancing Chapter 7 Load Balancing and Termination Detection Load balancing used to distribute computations fairly across processors in order to obtain the highest possible execution speed. Termination detection

More information

How To Find Local Affinity Patterns In Big Data

How To Find Local Affinity Patterns In Big Data Detection of local affinity patterns in big data Andrea Marinoni, Paolo Gamba Department of Electronics, University of Pavia, Italy Abstract Mining information in Big Data requires to design a new class

More information

Lecture 10: Regression Trees

Lecture 10: Regression Trees Lecture 10: Regression Trees 36-350: Data Mining October 11, 2006 Reading: Textbook, sections 5.2 and 10.5. The next three lectures are going to be about a particular kind of nonlinear predictive model,

More information

Load Balancing and Termination Detection

Load Balancing and Termination Detection Chapter 7 Load Balancing and Termination Detection 1 Load balancing used to distribute computations fairly across processors in order to obtain the highest possible execution speed. Termination detection

More information

Asking Hard Graph Questions. Paul Burkhardt. February 3, 2014

Asking Hard Graph Questions. Paul Burkhardt. February 3, 2014 Beyond Watson: Predictive Analytics and Big Data U.S. National Security Agency Research Directorate - R6 Technical Report February 3, 2014 300 years before Watson there was Euler! The first (Jeopardy!)

More information

A Locality Enhanced Scheduling Method for Multiple MapReduce Jobs In a Workflow Application

A Locality Enhanced Scheduling Method for Multiple MapReduce Jobs In a Workflow Application 2012 International Conference on Information and Computer Applications (ICICA 2012) IPCSIT vol. 24 (2012) (2012) IACSIT Press, Singapore A Locality Enhanced Scheduling Method for Multiple MapReduce Jobs

More information

Baseline Code Analysis Using McCabe IQ

Baseline Code Analysis Using McCabe IQ White Paper Table of Contents What is Baseline Code Analysis?.....2 Importance of Baseline Code Analysis...2 The Objectives of Baseline Code Analysis...4 Best Practices for Baseline Code Analysis...4 Challenges

More information

root node level: internal node edge leaf node CS@VT Data Structures & Algorithms 2000-2009 McQuain

root node level: internal node edge leaf node CS@VT Data Structures & Algorithms 2000-2009 McQuain inary Trees 1 A binary tree is either empty, or it consists of a node called the root together with two binary trees called the left subtree and the right subtree of the root, which are disjoint from each

More information

A Decision Theoretic Approach to Targeted Advertising

A Decision Theoretic Approach to Targeted Advertising 82 UNCERTAINTY IN ARTIFICIAL INTELLIGENCE PROCEEDINGS 2000 A Decision Theoretic Approach to Targeted Advertising David Maxwell Chickering and David Heckerman Microsoft Research Redmond WA, 98052-6399 dmax@microsoft.com

More information

A Non-Linear Schema Theorem for Genetic Algorithms

A Non-Linear Schema Theorem for Genetic Algorithms A Non-Linear Schema Theorem for Genetic Algorithms William A Greene Computer Science Department University of New Orleans New Orleans, LA 70148 bill@csunoedu 504-280-6755 Abstract We generalize Holland

More information

Schedule Risk Analysis Simulator using Beta Distribution

Schedule Risk Analysis Simulator using Beta Distribution Schedule Risk Analysis Simulator using Beta Distribution Isha Sharma Department of Computer Science and Applications, Kurukshetra University, Kurukshetra, Haryana (INDIA) ishasharma211@yahoo.com Dr. P.K.

More information

Xiaoqiao Meng, Vasileios Pappas, Li Zhang IBM T.J. Watson Research Center Presented by: Payman Khani

Xiaoqiao Meng, Vasileios Pappas, Li Zhang IBM T.J. Watson Research Center Presented by: Payman Khani Improving the Scalability of Data Center Networks with Traffic-aware Virtual Machine Placement Xiaoqiao Meng, Vasileios Pappas, Li Zhang IBM T.J. Watson Research Center Presented by: Payman Khani Overview:

More information

Simulating a File-Sharing P2P Network

Simulating a File-Sharing P2P Network Simulating a File-Sharing P2P Network Mario T. Schlosser, Tyson E. Condie, and Sepandar D. Kamvar Department of Computer Science Stanford University, Stanford, CA 94305, USA Abstract. Assessing the performance

More information

Resource Allocation Schemes for Gang Scheduling

Resource Allocation Schemes for Gang Scheduling Resource Allocation Schemes for Gang Scheduling B. B. Zhou School of Computing and Mathematics Deakin University Geelong, VIC 327, Australia D. Walsh R. P. Brent Department of Computer Science Australian

More information

Offline sorting buffers on Line

Offline sorting buffers on Line Offline sorting buffers on Line Rohit Khandekar 1 and Vinayaka Pandit 2 1 University of Waterloo, ON, Canada. email: rkhandekar@gmail.com 2 IBM India Research Lab, New Delhi. email: pvinayak@in.ibm.com

More information

NEW VERSION OF DECISION SUPPORT SYSTEM FOR EVALUATING TAKEOVER BIDS IN PRIVATIZATION OF THE PUBLIC ENTERPRISES AND SERVICES

NEW VERSION OF DECISION SUPPORT SYSTEM FOR EVALUATING TAKEOVER BIDS IN PRIVATIZATION OF THE PUBLIC ENTERPRISES AND SERVICES NEW VERSION OF DECISION SUPPORT SYSTEM FOR EVALUATING TAKEOVER BIDS IN PRIVATIZATION OF THE PUBLIC ENTERPRISES AND SERVICES Silvija Vlah Kristina Soric Visnja Vojvodic Rosenzweig Department of Mathematics

More information

Efficient Data Structures for Decision Diagrams

Efficient Data Structures for Decision Diagrams Artificial Intelligence Laboratory Efficient Data Structures for Decision Diagrams Master Thesis Nacereddine Ouaret Professor: Supervisors: Boi Faltings Thomas Léauté Radoslaw Szymanek Contents Introduction...

More information

A Comparison of General Approaches to Multiprocessor Scheduling

A Comparison of General Approaches to Multiprocessor Scheduling A Comparison of General Approaches to Multiprocessor Scheduling Jing-Chiou Liou AT&T Laboratories Middletown, NJ 0778, USA jing@jolt.mt.att.com Michael A. Palis Department of Computer Science Rutgers University

More information

Network Performance Monitoring at Small Time Scales

Network Performance Monitoring at Small Time Scales Network Performance Monitoring at Small Time Scales Konstantina Papagiannaki, Rene Cruz, Christophe Diot Sprint ATL Burlingame, CA dina@sprintlabs.com Electrical and Computer Engineering Department University

More information

Methods for Firewall Policy Detection and Prevention

Methods for Firewall Policy Detection and Prevention Methods for Firewall Policy Detection and Prevention Hemkumar D Asst Professor Dept. of Computer science and Engineering Sharda University, Greater Noida NCR Mohit Chugh B.tech (Information Technology)

More information

Laboratory work in AI: First steps in Poker Playing Agents and Opponent Modeling

Laboratory work in AI: First steps in Poker Playing Agents and Opponent Modeling Laboratory work in AI: First steps in Poker Playing Agents and Opponent Modeling Avram Golbert 01574669 agolbert@gmail.com Abstract: While Artificial Intelligence research has shown great success in deterministic

More information

Chapter 13: Query Processing. Basic Steps in Query Processing

Chapter 13: Query Processing. Basic Steps in Query Processing Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing

More information

Load Balancing. Load Balancing 1 / 24

Load Balancing. Load Balancing 1 / 24 Load Balancing Backtracking, branch & bound and alpha-beta pruning: how to assign work to idle processes without much communication? Additionally for alpha-beta pruning: implementing the young-brothers-wait

More information

A Partition-Based Efficient Algorithm for Large Scale. Multiple-Strings Matching

A Partition-Based Efficient Algorithm for Large Scale. Multiple-Strings Matching A Partition-Based Efficient Algorithm for Large Scale Multiple-Strings Matching Ping Liu Jianlong Tan, Yanbing Liu Software Division, Institute of Computing Technology, Chinese Academy of Sciences, Beijing,

More information

Data Mining Classification: Decision Trees

Data Mining Classification: Decision Trees Data Mining Classification: Decision Trees Classification Decision Trees: what they are and how they work Hunt s (TDIDT) algorithm How to select the best split How to handle Inconsistent data Continuous

More information

Mining Social-Network Graphs

Mining Social-Network Graphs 342 Chapter 10 Mining Social-Network Graphs There is much information to be gained by analyzing the large-scale data that is derived from social networks. The best-known example of a social network is

More information

2004 Networks UK Publishers. Reprinted with permission.

2004 Networks UK Publishers. Reprinted with permission. Riikka Susitaival and Samuli Aalto. Adaptive load balancing with OSPF. In Proceedings of the Second International Working Conference on Performance Modelling and Evaluation of Heterogeneous Networks (HET

More information

Simulation of Heuristic Usage for Load Balancing In Routing Efficiency

Simulation of Heuristic Usage for Load Balancing In Routing Efficiency Simulation of Heuristic Usage for Load Balancing In Routing Efficiency Nor Musliza Mustafa Fakulti Sains dan Teknologi Maklumat, Kolej Universiti Islam Antarabangsa Selangor normusliza@kuis.edu.my Abstract.

More information

COMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction

COMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction COMP3420: Advanced Databases and Data Mining Classification and prediction: Introduction and Decision Tree Induction Lecture outline Classification versus prediction Classification A two step process Supervised

More information

A Spectral Clustering Approach to Validating Sensors via Their Peers in Distributed Sensor Networks

A Spectral Clustering Approach to Validating Sensors via Their Peers in Distributed Sensor Networks A Spectral Clustering Approach to Validating Sensors via Their Peers in Distributed Sensor Networks H. T. Kung Dario Vlah {htk, dario}@eecs.harvard.edu Harvard School of Engineering and Applied Sciences

More information

Compact Representations and Approximations for Compuation in Games

Compact Representations and Approximations for Compuation in Games Compact Representations and Approximations for Compuation in Games Kevin Swersky April 23, 2008 Abstract Compact representations have recently been developed as a way of both encoding the strategic interactions

More information

Topology-based network security

Topology-based network security Topology-based network security Tiit Pikma Supervised by Vitaly Skachek Research Seminar in Cryptography University of Tartu, Spring 2013 1 Introduction In both wired and wireless networks, there is the

More information

Less naive Bayes spam detection

Less naive Bayes spam detection Less naive Bayes spam detection Hongming Yang Eindhoven University of Technology Dept. EE, Rm PT 3.27, P.O.Box 53, 5600MB Eindhoven The Netherlands. E-mail:h.m.yang@tue.nl also CoSiNe Connectivity Systems

More information

Chapter 6. The stacking ensemble approach

Chapter 6. The stacking ensemble approach 82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

More information

Building a Question Classifier for a TREC-Style Question Answering System

Building a Question Classifier for a TREC-Style Question Answering System Building a Question Classifier for a TREC-Style Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given

More information

Distributed Computing over Communication Networks: Topology. (with an excursion to P2P)

Distributed Computing over Communication Networks: Topology. (with an excursion to P2P) Distributed Computing over Communication Networks: Topology (with an excursion to P2P) Some administrative comments... There will be a Skript for this part of the lecture. (Same as slides, except for today...

More information

Optimal Binary Search Trees Meet Object Oriented Programming

Optimal Binary Search Trees Meet Object Oriented Programming Optimal Binary Search Trees Meet Object Oriented Programming Stuart Hansen and Lester I. McCann Computer Science Department University of Wisconsin Parkside Kenosha, WI 53141 {hansen,mccann}@cs.uwp.edu

More information

Predicting Influentials in Online Social Networks

Predicting Influentials in Online Social Networks Predicting Influentials in Online Social Networks Rumi Ghosh Kristina Lerman USC Information Sciences Institute WHO is IMPORTANT? Characteristics Topology Dynamic Processes /Nature of flow What are the

More information

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus 1. Introduction Facebook is a social networking website with an open platform that enables developers to extract and utilize user information

More information