Repair strategies for a survivable Water-Treatment Facility
B.R. Haverkort, A. Remke, S. Roolvink
October 4-6, 2010
Background
  Dependability of critical infrastructures is essential
    Stable society and healthy economy
    Recovery is costly and takes time
  Increased attention by governments [Dutch Ministry of Internal Affairs, 2005] [U.S. Department of Energy, 2002]
  Legal demand: 24x7 availability
  Repair crew labour cost
  Using Arcade-PRISM [Haverkort et al., 2010]
Water-Treatment Facility
Approach
  Survivability [Cloth and Haverkort, 2005]
    Survivability is the ability of a system to recover to a predefined service level in a timely manner after the occurrence of a disaster.
  Quantitative survivability [Haverkort et al., 2010]
  Smarter repair strategies
  Engineering solution for state-space reduction
Least Redundancy First
  A phase with less redundancy can form a bottleneck
  Internal non-determinism:
    Phase with most failed components
    Using other strategies (FFF, FRF, FRL, FFL)

  Phase          Components   MTTF      MTTR
  Softeners      3            2000 h    5 h
  Sand filters   2            1000 h    100 h
  Reservoir      1            6000 h    12 h
  Pumps          2+1          500 h     1 h
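The selection rule on this slide can be sketched in a few lines. This is a minimal illustration and not the Arcade-PRISM model: the `Phase` record, the reading of "redundancy" as the number of components installed in a phase, and the example failure counts are all assumptions made for the sketch. Ties are broken by the phase with the most failed components, as the slide suggests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    """Hypothetical record for one phase of a treatment line."""
    name: str
    total: int   # components installed in the phase (spares included)
    failed: int  # components currently down


def pick_lrf(phases):
    """Return the phase to repair next under Least Redundancy First, or None."""
    candidates = [p for p in phases if p.failed > 0]
    if not candidates:
        return None
    # Fewest installed components first; ties go to the most failed components.
    return min(candidates, key=lambda p: (p.total, -p.failed))


# Assumed failure pattern, for illustration only.
line_two = [
    Phase("softeners", total=3, failed=1),
    Phase("sand filters", total=2, failed=1),
    Phase("reservoir", total=1, failed=0),
    Phase("pumps", total=3, failed=2),
]
print(pick_lrf(line_two).name)  # sand filters
```

Any tie that survives both criteria would still need one of the other strategies (FFF, FRF, FRL, FFL) to resolve it; the sketch simply keeps the first candidate in that case.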
Drawback of Least Redundancy First
  Service level is computed using logic gates
  No awareness of the system architecture
  [Figure: example with phases A (1), C (2) and B (3)]
Largest Service Increase First
  Repair the bottleneck in the system
  Quantitative survivability [Haverkort et al., 2010]
  Use quantitative survivability logic gates
  Internal non-determinism:
    Using other strategies (FFF, FRF, FRL, FFL)

  Phase          Components   MTTF      MTTR
  Softeners      3            2000 h    5 h
  Sand filters   2            1000 h    100 h
  Reservoir      1            6000 h    12 h
  Pumps          2+1          500 h     1 h
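A rough sketch of the idea follows: try each possible repair and keep the one that raises the service level most. The `service_level` function here is a hypothetical stand-in (the weakest phase limits the line); the slides compute the service level with quantitative logic gates instead. The `installed`, `required`, and `state` values are assumed for illustration.

```python
def service_level(state, required):
    """Hypothetical service level: the weakest phase limits the whole line."""
    return min(min(1.0, state[p] / required[p]) for p in required)


def pick_lsif(state, installed, required):
    """Return the phase whose single repair raises the service level most."""
    base = service_level(state, required)
    best, best_gain = None, -1.0
    for phase in state:
        if state[phase] >= installed[phase]:
            continue  # nothing has failed in this phase
        after = {**state, phase: state[phase] + 1}
        gain = service_level(after, required) - base
        # Ties keep the first phase seen; the slides resolve them with
        # another strategy (FFF, FRF, FRL, FFL).
        if gain > best_gain:
            best, best_gain = phase, gain
    return best


installed = {"softeners": 3, "sand filters": 2, "reservoir": 1, "pumps": 3}
required = {"softeners": 3, "sand filters": 2, "reservoir": 1, "pumps": 2}
# Operational components per phase: one sand filter and the spare pump are down.
state = {"softeners": 3, "sand filters": 1, "reservoir": 1, "pumps": 2}
print(pick_lsif(state, installed, required))  # sand filters
```

Repairing the spare pump would not change the service level in this example, so the sand filter is chosen even though both phases have one failed component.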
State-space reduction

  Phase          Line 1   Line 2   MTTF      MTTR
  Softeners      3        3        2000 h    5 h
  Sand filters   3        2        1000 h    100 h
  Reservoir      1        1        6000 h    12 h
  Pumps          3+1      2+1      500 h     1 h

  Strategy    Line       States       Transitions
  Dedicated   line one   2048         22528
  Dedicated   line two   512          4606
  FRF         line one   111809       388478
  FRF         line two   8129         25838
  FCFS        line one   108505112    217010222
  FCFS        line two   986410       1972818

  Weak bisimulation
  Symmetry reduction
  Property-driven state-space reduction
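The sizes reported for the dedicated strategy can be approximated with a short calculation. The sketch assumes that every component, spare pumps included, is an independent two-state (up/down) element with its own repair unit, so a line with n components has 2^n states and each state has n outgoing transitions. Under that assumption line one reproduces the slide exactly, while line two gives 4608 transitions against the 4606 shown on the slide; the sketch does not account for that difference.

```python
def dedicated_state_space(components):
    """States and transitions when every component has its own repair unit."""
    states = 2 ** components
    # In each state every component can change status: fail or be repaired.
    transitions = components * states
    return states, transitions


# Softeners + sand filters + reservoir + pumps (spare included).
line_one = 3 + 3 + 1 + 4
line_two = 3 + 2 + 1 + 3

print(dedicated_state_space(line_one))  # (2048, 22528)
print(dedicated_state_space(line_two))  # (512, 4608)
```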
Bare Disaster Recovery
  State-space reduction: engineering solution
    Remove operational components
    Remove remaining failures
  Leads to computational error
  Reduces the CSL formulae
  State space:
    Lower bound: N + 1
    Upper bound: 2^N
  [Figure: components marked up or down]
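One way to read the two bounds is to count the states reachable from "all N failed components down" when only repairs can occur. The sketch below is an illustration under that reading, not the authors' construction: `fixed_order` models a single repair unit with a deterministic priority (a chain of N + 1 states), and `any_order` lets any failed component be repaired next (every subset, 2^N states). Both helper names are hypothetical.

```python
def reachable_states(n_failed, repairable):
    """Count states reachable from 'all down' when only repairs occur.

    repairable(down) returns the components that may be repaired next.
    """
    start = frozenset(range(n_failed))
    seen, frontier = {start}, [start]
    while frontier:
        down = frontier.pop()
        for component in repairable(down):
            nxt = down - {component}
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return len(seen)


def fixed_order(down):
    """Single repair unit that always repairs the lowest-numbered component."""
    return [min(down)] if down else []


def any_order(down):
    """Any failed component may be the next one to come back up."""
    return list(down)


n = 4
print(reachable_states(n, fixed_order))  # 5  (N + 1)
print(reachable_states(n, any_order))    # 16 (2^N)
```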
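Computational error (1/2)
  $\sum_{i=1}^{n} \bigl( A(i) - \tilde{A}(i) \bigr)$  (1)
  $A(i) = \tfrac{1}{2} \bigl( \Pr\{\mathit{rec};\, t_i\} + \Pr\{\mathit{rec};\, t_{i-1}\} \bigr)$  (2)
  $\tilde{A}(i) = \tfrac{1}{2} \bigl( \widetilde{\Pr}\{\mathit{rec};\, t_i\} + \widetilde{\Pr}\{\mathit{rec};\, t_{i-1}\} \bigr)$  (3)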
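A small sketch of how such an error could be accumulated from two sampled recovery curves is given below. It assumes that each term is the two-point average of the recovery probability at consecutive time points, that both curves are sampled at the same unit-spaced time points, and that the error is the first curve's averages minus the second's; the order of subtraction and the example values are assumptions, not taken from the slides.

```python
def interval_averages(curve):
    """Two-point averages (p[i] + p[i-1]) / 2 for i = 1..n."""
    return [(curve[i] + curve[i - 1]) / 2 for i in range(1, len(curve))]


def accumulated_error(curve_a, curve_b):
    """Sum over all intervals of the difference between the two averages."""
    if len(curve_a) != len(curve_b):
        raise ValueError("curves must be sampled at the same time points")
    pairs = zip(interval_averages(curve_a), interval_averages(curve_b))
    return sum(a - b for a, b in pairs)


# Made-up recovery probabilities at t_0, t_1, t_2 for two model variants.
reduced = [0.0, 0.6, 1.0]
full = [0.0, 0.5, 0.8]
print(accumulated_error(reduced, full))
```

With these made-up curves the per-interval differences are 0.05 and 0.15, so the accumulated error is approximately 0.2.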
Computational error (2/2)
  Computed error for the 4 and 8 component models
  Separate errors for the removal of components and the removal of failures
  Fixed MTTR = 500 hours
  Using factor = MTTF / MTTR
  Three scenarios for the variable MTTF: all, operational, failed
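The relation between the factor and the model parameters can be worked out directly. The sketch below assumes the factor values used in the result tables (1000 down to 0.01) and exponentially distributed failure and repair times, so that rates are the reciprocals of MTTF and MTTR; the function name `rates_for` is hypothetical.

```python
MTTR_HOURS = 500.0
FACTORS = [1000, 100, 10, 1, 0.1, 0.01]


def rates_for(factor, mttr=MTTR_HOURS):
    """Return (MTTF, failure rate, repair rate) for factor = MTTF / MTTR."""
    mttf = factor * mttr
    return mttf, 1.0 / mttf, 1.0 / mttr


for f in FACTORS:
    mttf, failure_rate, repair_rate = rates_for(f)
    print(f"factor {f:>7}: MTTF = {mttf:>9.0f} h, "
          f"failure rate = {failure_rate:.2e}/h, repair rate = {repair_rate:.2e}/h")
```

A factor of 1000 thus corresponds to an MTTF of 500000 hours, and a factor of 0.01 to an MTTF of 5 hours, far shorter than the repair time.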
Computational error, 4 component model

  Scenario               Factor   All - only failed   All - no failures
  All variable           1000     0.501014            0.748142
  All variable           100      5.100960            7.573423
  All variable           10       60.863636           85.588937
  All variable           1        2201.908829         2445.115942
  All variable           0.1      360026.495549       362465.290727
  All variable           0.01     471459.423933       496790.947154
  Failed variable        1000     0.501014            0.748142
  Failed variable        100      0.510008            2.999116
  Failed variable        10       0.599306            25.523268
  Failed variable        1        1.335231            250.849079
  Failed variable        0.1      6.229353            2502.548662
  Failed variable        0.01     53.307423           25021.242601
  Operational variable   1000     0.501014            0.748142
  Operational variable   100      5.009667            5.241404
  Operational variable   10       49.733335           49.868457
  Operational variable   1        418.190579          417.712125
  Operational variable   0.1      2859.439232         2843.079725
  Operational variable   0.01     26300.982631        26232.361480
Computational error, 8 component model

  Scenario               Factor   All - only failed      All - no failures
  All variable           1000     4.9999999987449e-05    0.00000000
  All variable           100      0.000600000000019918   0.00000000
  All variable           10       4.57135053087927       4.42465000
  All variable           1        38303.2013005532       38296.57775020
  All variable           0.1      49048.3449828958       49157.75733291
  All variable           0.01     47568.1432514447       49909.71685010
  Failed variable        1000     4.9999999987449e-05    0.00000000
  Failed variable        100      0.00000000             0.00000000
  Failed variable        10       0.000200000000006639   0.00000000
  Failed variable        1        0.0465500000000247     0.25890000
  Failed variable        0.1      1.88405000566246       117.07245000
  Failed variable        0.01     22.0107500251144       2330.11605007
  Operational variable   1000     4.9999999987449e-05    0.00000000
  Operational variable   100      0.000400000000013279   0.00000000
  Operational variable   10       1.12605000090451       1.05180000
  Operational variable   1        8265.09155004313       8258.16155023
  Operational variable   0.1      49156.67105003         49149.74105022
  Operational variable   0.01     49916.60528867         49909.67528886
Water-Treatment Facility

  Phase               Components   MTTF       MTTR
  Intake pumps        2            800 h      2 h
  Coagulation         2            12000 h    6 h
  Fast sand filters   6            7000 h     24 h
  Ozonation           2            5000 h     3 h
  Softeners           3            2000 h     5 h
  Slow sand filters   3            8000 h     48 h
  Reservoir           1            6000 h     12 h
  Pumps               3+1          500 h      1 h
Facility phases (1/2)
  Intake pumps
    Water intake from lakes/rivers
  Coagulation
    Chemicals bond contaminants
    Removal of phosphates, organic materials, viruses, bacteria and heavy metals
  Fast sand filters
    Gravel and sand filters
    Removal of organic materials, manganese and iron
  Ozonation
    Oxidation using ozone
    Removal of organic materials, viruses and bacteria
Facility phases (2/2)
  Softeners
    Removal of calcium
  Slow sand filters
    Slow moving water
    Remove last remaining bacteria
  Reservoir
    Temporary storage
  Pumps
    Distribute to customers
Disaster 1
  [Figure: facility overview with the same phases, component counts, and MTTF/MTTR values as on the Water-Treatment Facility slide]
Results for disaster 1
  [Figure: results per service level: Bronze, Silver, Gold]
Disaster 2
  [Figure: facility overview with the same phases, component counts, and MTTF/MTTR values as on the Water-Treatment Facility slide]
Results for disaster 2
  [Figure: results per service level: Bronze, Silver, Gold]
Conclusions and Future work
  Conclusions:
    LRF and LSIF perform equally well
    LSIF is a smart strategy: it takes the architecture into account
    BDR is a useful approximation technique
  Future work:
    Analyse the influence of more repair crews
    Use/create CTMDP model checking
    Automate BDR and CSL reduction
The end... Questions?
L. Cloth and B.R. Haverkort. Model Checking for Survivability! In Proc. of QEST 2005, pages 145-154, 2005.
Dutch Ministry of Internal Affairs. Raport bescherming vitale infrastructuur. Technical report, 2005. http://www.minbzk.nl/actueel/kamerstukken?actitmidt=54878.
B.R. Haverkort, M. Kuntz, A. Remke, S. Roolvink, and M.I.A. Stoelinga. Evaluating repair strategies for a water-treatment facility using Arcade. In The 40th Annual IEEE/IFIP International Conference on Dependable Systems & Networks (DSN 2010), Chicago, IL, USA, 2010.
U.S. Department of Energy. 21 steps to improve cyber security of SCADA networks. Technical report, 2002. www.oe.netl.doe.gov/docs/prepare/21stepsbooklet.pdf.