Bilfinger HSG Facility Management GmbH
Independent analysis methods for Data Centers
Max Altmeyer / Marvin Köhler
August 19th, 2015
Agenda
1. Weak point analysis
2. Energy efficiency analysis
3. Combined FMECA / RAM / ENERGY analysis
4. Spare parts management
Identification of weak points using a modular concept

The three modules are ordered by increasing level of detail.

SPOF Quick Check: risk assessment using specific questions
- Identification and assessment of weak points using a tailored questionnaire covering all relevant systems
- Derivation of optimization measures

FME(C)A: failure mode and effects analysis
- Detailed system analysis
- Identification of failure modes and investigation of their potential effects
- Qualitative assessment with the help of a risk matrix
- Qualitative assessment of optimization measures

RAM: system modeling
- Simulation of system availability and reliability within a defined observation period
- Optimization of maintenance strategy
- Optimization of spare parts inventory
- Quantitative assessment of optimization measures
The SPOF Quick Check evaluates the technical infrastructure of Data Centers using a tailored questionnaire

SPOF Quick Check methodology

Assessment of
- Fire protection system
- Air-conditioning/cooling system
- Electrical system
as well as
- physical security
- external risks
- energy efficiency
using a tailored questionnaire and a coordinated risk matrix to identify Single Points Of Failure and deduce optimization measures.

Risk matrix (figure): probability of occurrence versus severity.

1. Identification of relevant risks using a tailored questionnaire
2. Risk assessment using a coordinated risk matrix
3. Define need for action
4. Create results report
FMECA is a qualitative risk assessment of all system components

FMECA methodology

FMECA = Failure Mode, Effects and Criticality Analysis

Assessment of all system components with regard to
- Repair times
- Spare parts availability
- Redundancy concept
- Maintenance strategy
- Component age
- Failure rate
- Failure detection
and their corresponding criticality to answer the following questions:
- What can fail?
- What is the cause of the failure?
- What are the effects of the failure?
- What can be done in a preventive way?
Subsequently a catalog of measures for the compensation of critical components will be prepared.

Risk matrix FMECA (MTBF = mean time between failures, a = years)

Occurrence (O)
criterion                     rank 1            rank 2            rank 3            rank 4
mean time between failures    MTBF > 50a        10a < MTBF < 50a  1a < MTBF < 10a   MTBF < 1a
maintenance strategy          proactive         reactive
age of the component          < life cycle      >= life cycle
sum O: 3 to 8

Severity (S)
ranking   remaining redundancy   spare parts availability
0         2N
1         N+1                    on site / agreement with supplier
2         N                      set-up time > 12h
3         N-1                    no availability
8         IT outage
sum S: 1 to 6 and 9 to 11

S x O
sum S \ sum O    3    4    5    6    7    8
 1               3    4    5    6    7    8
 2               6    8   10   12   14   16
 3               9   12   15   18   21   24
 4              12   16   20   24   28   32
 5              15   20   25   30   35   40
 6              18   24   30   36   42   48
 9              27   36   45   54   63   72
10              30   40   50   60   70   80
11              33   44   55   66   77   88

1. Breaking down the system into its components
2. Identification and assessment of potential failures
3. Inclusion of the operational employees
4. Define need for action
5. Create results report
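The scoring behind the FMECA risk matrix can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rank values are read from the slide, the assignment of rank 1 and 2 to the two-level criteria follows the slide's column order, the handling of exact boundary values (for example an MTBF of exactly 50a) is not defined on the slide, and all function and key names are hypothetical.

```python
# Ranks transcribed from the slide's risk matrix. Function names, dictionary
# keys and the treatment of exact boundary values are assumptions of this sketch.
MTBF_LIMITS = [(50.0, 1), (10.0, 2), (1.0, 3)]  # MTBF in years (a)
MAINTENANCE_RANK = {"proactive": 1, "reactive": 2}
REDUNDANCY_RANK = {"2N": 0, "N+1": 1, "N": 2, "N-1": 3, "IT outage": 8}
SPARES_RANK = {"on site": 1, "set-up > 12h": 2, "none": 3}


def occurrence(mtbf_years, maintenance, age_years, life_cycle_years):
    """Sum O: MTBF rank + maintenance strategy rank + component age rank."""
    mtbf_rank = 4  # MTBF below 1a
    for limit, rank in MTBF_LIMITS:
        if mtbf_years > limit:
            mtbf_rank = rank
            break
    age_rank = 1 if age_years < life_cycle_years else 2
    return mtbf_rank + MAINTENANCE_RANK[maintenance] + age_rank


def severity(redundancy, spares):
    """Sum S: remaining redundancy rank + spare parts availability rank."""
    return REDUNDANCY_RANK[redundancy] + SPARES_RANK[spares]


def criticality(sum_o, sum_s):
    """Cell of the S x O matrix."""
    return sum_s * sum_o


# Hypothetical component: MTBF 12a, reactive maintenance, 8 years old with a
# 10 year life cycle, redundancy N, spare part with a set-up time above 12h.
o = occurrence(12, "reactive", 8, 10)
s = severity("N", "set-up > 12h")
score = criticality(o, s)
```

For this hypothetical component the sketch yields sum O = 5 and sum S = 4, which corresponds to the matrix cell 20.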
RAM is a quantitative methodology to calculate the reliability, availability and maintainability of a system

RAM methodology

RAM = Reliability, Availability and Maintainability

The RAM analysis uses a realistic system image (model) to identify reliability parameters such as
- System availability and
- Number of system failures
via a Monte Carlo simulation.

Process of a RAM analysis:
- Modeling the DC
- Mapping of all components as blocks within a Reliability Block Diagram (RBD)
- Definition of failure models for each block including failure rate, repair times, maintenance activities etc.
- Model simulation via Isograph Availability Workbench

(Figure: excerpt of an RBD)

1. As-is system model based on the FMECA
2. Parameterization and simulation of the as-is model based on the FMECA
3. Definition, modeling and simulation of optimization measures
4. Comparison of measures with regard to availability and reliability
5. Creation of a detailed results report including the most promising measures
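The idea of a Monte Carlo simulation over a Reliability Block Diagram can be illustrated with a minimal Python sketch. It is not the Isograph Availability Workbench model named on the slide but a plain stand-in under strong assumptions: exponentially distributed times to failure and repair times, blocks that fail independently, and an RBD that is a series of parallel groups in which one working block per group is sufficient. All names and parameter values are hypothetical.

```python
import random


def down_intervals(mtbf, mttr, horizon, rng):
    """Sample one block's outages as (start, end) pairs within [0, horizon]."""
    t, intervals = 0.0, []
    while True:
        t += rng.expovariate(1.0 / mtbf)  # time until the next failure
        if t >= horizon:
            return intervals
        end = min(t + rng.expovariate(1.0 / mttr), horizon)  # repair finished
        intervals.append((t, end))
        t = end


def block_is_up(intervals, t):
    """A block is up at time t if t lies in none of its outages."""
    return not any(start <= t < end for start, end in intervals)


def simulate(groups, horizon, runs, seed=0):
    """groups: parallel groups connected in series; each group is a list of
    (mtbf, mttr) blocks. Returns (mean availability, mean system failures per run)."""
    rng = random.Random(seed)
    total_down, total_failures = 0.0, 0
    for _ in range(runs):
        sampled = [[down_intervals(mtbf, mttr, horizon, rng) for mtbf, mttr in group]
                   for group in groups]
        # The system state can only change when a block fails or is repaired.
        points = {0.0, horizon}
        for group in sampled:
            for intervals in group:
                for start, end in intervals:
                    points.update((start, end))
        points = sorted(points)
        was_up = True
        for a, b in zip(points, points[1:]):
            mid = (a + b) / 2
            up = all(any(block_is_up(iv, mid) for iv in group) for group in sampled)
            if not up:
                total_down += b - a
                if was_up:
                    total_failures += 1  # count each transition from up to down
            was_up = up
    return 1.0 - total_down / (runs * horizon), total_failures / runs
```

Comparing, for example, `simulate([[(1000.0, 10.0)]], 10000.0, 200)` with `simulate([[(1000.0, 10.0), (1000.0, 10.0)]], 10000.0, 200)` shows how adding a redundant block changes the simulated availability and the number of system failures, which mirrors the comparison of optimization measures in steps 3 and 4.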
2. Energy efficiency analysis
Our approach towards a better energy efficiency in a Data Center

Analysis methodology

Specific task in Data Centers: increase efficiency and thereby improve PUE and other performance figures.

An energy efficiency analysis tailored to data centers provides a structured identification of energy saving potentials based on the DC-specific Bilfinger Best Practice for energy efficiency.

The analysis follows the energy flow in the data center: starting at the grid connection, via transformers, emergency power systems, UPS and PDU to the server, and from there via the CRAC unit, the piping system, the pumps, the heat exchangers, the coolers, the chillers or other heat sinks to the heat dissipation into the environment.
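PUE is commonly defined as the total facility energy divided by the energy delivered to the IT equipment. The following Python sketch shows how the figure reacts to the overheads along the energy flow. The overhead categories and all numbers are hypothetical and only serve as an illustration.

```python
def pue(it_energy_kwh, overhead_kwh):
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    total = it_energy_kwh + sum(overhead_kwh.values())
    return total / it_energy_kwh


# Hypothetical annual figures in kWh, grouped along the energy flow.
overheads = {
    "transformer, UPS and PDU losses": 80_000,
    "cooling (CRAC units, pumps, chillers)": 350_000,
    "lighting and other": 20_000,
}
baseline = pue(1_000_000, overheads)

# Hypothetical measure: extended free cooling reduces the cooling energy.
improved = pue(1_000_000, {**overheads, "cooling (CRAC units, pumps, chillers)": 250_000})
```

With these assumed figures the PUE drops from 1.45 to 1.35.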
Bilfinger Best Practice for energy efficiency in a Data Center

UPS-System
- Highly efficient systems
- Graduation / Shutdown
- Use of modular systems
- Alternative energy storage

In the server room
- Cover plates
- Raised floor sealing
- Rack orientation
- Hot and cold aisle containment
- Management of perforated plates

Air cooling units
- Retrofit of FC-controlled / EC fans
- Increase of temperature difference (supply / return air)
- Shutdown of excessively redundant plant
- Air flow optimization
- Self actuating flaps

General electrical supply
- On site power generation
- CHP unit with absorption chillers
- Lighting system

Outside the server room
- Optimize the cooling medium temperature
- Extension of free cooling
- Efficiency increase at partial-load operation
- Frequency converter or EC technology for actuators
- Alternative heat sinks
- Thermal energy storage
- Subsequent use of waste heat
3. Combined FMECA / RAM / ENERGY analysis
The FMECA / RAM / ENERGY analysis considers availability and energy efficiency to find the optimized solutions for your data center

Methodology

Availability analysis and energy efficiency analysis
- Identification of existing risks
- Identification of critical components / SPOFs
- Employee training
- Identification of energy potentials
- ROI calculation for identified measures
- Detailed report illustrating the results

Modeling steps
- System modeling and calculation of current availability and reliability using the Monte Carlo simulation.
- Modeling of measures which affect availability to quantify their impact on availability and reliability.
- Modeling of energy efficiency measures to quantify their impact on availability and reliability.
- Modeling and simulation of the most promising measures to find an optimal combination.

(Figure: measures 1 to 8 compared by availability and by energy consumption. Series: actual availability; availability with measures in place; availability with energy measures in place; actual energy consumption; energy consumption with energy measures in place. Targets: availability increase; reduction of energy consumption.)
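How an ROI calculation can be combined with the availability check may be sketched as follows. This is an assumption-laden illustration, not the method used in the analysis: simple payback (investment divided by annual saving) stands in for the ROI calculation, the availability of each measure is taken as a given result of a prior simulation, and all names and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Measure:
    name: str
    investment: float     # one-off cost
    annual_saving: float  # energy cost saved per year
    availability: float   # simulated availability with the measure in place

    @property
    def payback_years(self):
        """Simple payback period, used here as a stand-in for the ROI figure."""
        return self.investment / self.annual_saving


def shortlist(measures, min_availability):
    """Keep measures that do not fall below the required availability,
    ordered by shortest payback first."""
    accepted = [m for m in measures if m.availability >= min_availability]
    return sorted(accepted, key=lambda m: m.payback_years)


# Hypothetical energy measures with assumed simulation results.
candidates = [
    Measure("EC fan retrofit", 10_000, 5_000, 0.9995),
    Measure("shutdown of redundant cooling unit", 3_000, 3_000, 0.9990),
    Measure("extended free cooling", 8_000, 2_000, 0.9996),
]
ranked = shortlist(candidates, min_availability=0.9993)
```

In this example the measure with the shortest payback is rejected because its assumed availability falls below the required value, which is the kind of trade-off the combined analysis is meant to expose.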
Added value for all involved parties

Optimized operation of a Data Center (diagram labels: Operator, Data Center)

Energy efficiency analysis
- Quantification of the energy efficiency of existing technical plant
- Optimized parameterization and operating mode of plant
- Knowledge transfer in terms of legal requirements

FMECA / RAM analysis
- Quantification of reliability and availability
- Information about the criticality of all components
- Improvement of the maintenance management

Both analyses combined
- Consideration of technical plant with a view to reliability and energy efficiency
- Energy saving potentials additionally checked for their influence on availability
4. Spare parts management
Efficient spare parts management guarantees a minimum total cost of ownership (TCO)

Weak point analysis
1. System FMECA to identify critical plant
2. Plant FMECA to identify critical components of the previously identified plant

Spare parts analysis
3. Modeling of the critical plant and parameterization using the FMECA
4. Comparison of different spare parts concepts to achieve a minimum TCO
5. Creation of a detailed results report with a recommendation for the most cost-effective spare parts concept

Model parameters
- Setup times
- Repair times
- Storage costs
- Downtime costs
- Maintenance costs
- Hazard rate
- Observation time

(Figures: identification of critical plant (Plant 1 to 3); identification of critical components (Comp. 1 to 3); excerpt of an RBD; TCO over the number of spare parts with the cost optimum K opt at n opt.)
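The search for the cost-optimal number of spare parts can be illustrated with a small Python sketch. It is a deliberately simplified stand-in for the simulation-based comparison: failures are assumed to follow a Poisson process with a constant hazard rate, spares are not restocked within the observation time, and every failure without a spare on stock incurs a fixed downtime cost. Setup, repair and maintenance costs are left out. All names and numbers are hypothetical.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k failures for a Poisson distributed demand."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def expected_shortage(n, mean):
    """Expected number of failures not covered by n spares, E[max(D - n, 0)],
    computed as E[D] - n + E[max(n - D, 0)] so that the sum is finite."""
    return mean - n + sum((n - k) * poisson_pmf(k, mean) for k in range(n + 1))


def tco(n, hazard_rate, observation_years, storage_cost, downtime_cost):
    """Storage cost of n spares plus expected downtime cost of uncovered failures."""
    mean = hazard_rate * observation_years
    storage = n * storage_cost * observation_years
    return storage + expected_shortage(n, mean) * downtime_cost


def optimal_spares(hazard_rate, observation_years, storage_cost, downtime_cost, n_max=50):
    """Number of spares with the lowest TCO among 0..n_max."""
    return min(
        range(n_max + 1),
        key=lambda n: tco(n, hazard_rate, observation_years, storage_cost, downtime_cost),
    )


# Hypothetical component: 0.5 failures per year, 4 years of observation,
# 100 per spare and year for storage, 5000 per failure without a spare.
n_opt = optimal_spares(0.5, 4.0, 100.0, 5000.0)
k_opt = tco(n_opt, 0.5, 4.0, 100.0, 5000.0)
```

Plotting `tco(n, ...)` over `n` for such assumed parameters gives a curve of the kind indicated on the slide, with the minimum cost at the optimal number of spare parts.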
Contact
Bilfinger HSG Facility Management GmbH
Max Altmeyer
An der Gehespitz 50
63263 Neu-Isenburg
Germany
Phone +49 6102 45-3433
E-Mail max.altmeyer@bilfinger.com
www.datacenters.bilfinger.com