Tier Classification of Data Centres
Eric Maddison, Senior Consultant, The Uptime Institute
Uptime Institute
Unbiased and vendor neutral
Global thought leadership for the data center industry
Founded 1993
Standards and Training
  > Created the global data center Tier Classifications System
  > Certification of Designs, Facilities, Operations
  > Training of data center engineers and operators
  > Data Center Due Diligence Assessment
  > Management & Operations Stamp of Approval
  > Facilities Management Program Development
Uptime Institute Network
  > North America, EMEA, APAC and LATAM
Symposium: annual industry event

Uptime Institute EMEA Team
Established 2012
Based in London and Dubai
Commercial and service delivery capabilities
Part of global Uptime Institute organisation
  > Global standards
  > Local execution
Uptime Institute Tier Standards
Why does Tier Certification matter? Quality and reliability of the data center > Independent third party review by Uptime Institute > Globally understood standard (Tier I, II, III and IV) > Protects against loss of weakness in infrastructure > Ensures consistent solution Enterprise Recognizes organizational accomplishment > Demonstrate to senior management that performance capability is there Positioning and differentiation > An Uptime Institute certified data center has an expected level of resilience corresponding to business requirements > Service Level Agreements can be quantified 5
Tier Certifications Worldwide

Uptime Institute Tier Standard Owners Advisory Committee Members
Owners Advisory Committee (OAC) is a consortium of data center owners/operators having received Uptime Institute Tier Certifications
  > OAC is a formally organized group created to validate and endorse the contents and direction of the Tier Standards
OAC represents global leaders in the Financial, Healthcare, Insurance, Manufacturing, Retail, and Government industries
OAC members are worldwide: Australia, Brazil, Canada, Costa Rica, India, Kingdom of Saudi Arabia, Luxembourg, Russia, South Africa, Spain, Switzerland, Taiwan, UAE, UK, and the U.S.

Tier Classification Genesis
An Owner's Request: Data Center Performance and Investment Criteria
An Industry Solution: Tier Classifications Define Site Infrastructure Performance
An International Standard: Data Center Site Infrastructure Tier Standard
Annually Adjudicated Standard: Owners Advisory Committee (OAC)
Evolution of Tier Documentation
2008 and previous
  > White paper: Industry Standard Tier Classifications Define Site Infrastructure Performance
Included attributes at request of operations teams
Provided illustrations at request of engineering community

Illustrations Removed
Attributes Table Eliminated

Attribute                                           TIER I              TIER II             TIER III            TIER IV
Building Type                                       Tenant              Tenant              Standalone          Standalone
Staffing                                            None                1 Shift             1+ Shifts           24 by Forever
Usable for critical load                            100%N               100%N               90%N                90%N
Initial build-out UPS output, watts/ft² (typical)   20-30               40-50               40-60               50-80
Ultimate UPS output, watts/ft² (typical)            20-30               40-50               100-150             150+
Class A uninterruptible cooling                     No                  No                  Maybe               Yes
Support space to raised floor ratio                 20%                 30%                 80-90+%             100+%
Raised floor height, inches (typical)               12                  18                  30-36               30-36
Floor loading, lbs/ft² (typical)                    85                  100                 150                 150+
Utility voltage (typical)                           208, 480 V          208, 480 V          12-15 kV            12-15 kV
Single points-of-failure                            Many + human error  Many + human error  Some + human error  Fire + EPO + human error
Annual site caused IT downtime (actual field data)  28.8 hours          22.0 hours          1.6 hours           0.8 hours
Representative site availability                    99.67%              99.75%              99.98%              99.99%
Months to implement                                 3                   3 to 6              15 to 20            15 to 20
Year first deployed                                 1965                1970                1985                1995
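The "Representative site availability" figures in the eliminated attributes table appear to follow directly from the "Annual site caused IT downtime" figures. The sketch below is only an arithmetic check, assuming a non-leap year of 8,760 hours (an assumption, not stated on the slide); the function and variable names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760; assumes a non-leap year (not stated on the slide)

# Annual site-caused IT downtime in hours, as listed in the attributes table.
DOWNTIME_HOURS = {"Tier I": 28.8, "Tier II": 22.0, "Tier III": 1.6, "Tier IV": 0.8}


def availability_percent(hours_down: float) -> float:
    """Convert annual downtime in hours to availability in percent."""
    return 100.0 * (1.0 - hours_down / HOURS_PER_YEAR)


availability = {tier: round(availability_percent(h), 2) for tier, h in DOWNTIME_HOURS.items()}

for tier, pct in availability.items():
    print(f"{tier}: {pct:.2f}%")
```

Rounded to two decimals, the results match the availability row of the table, which suggests the two rows are the same field data expressed two ways.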
Tier Standards
Tier Standard: Topology
  > Defines Tier Classification System
  > Performance Requirements for each functionality objective
Tier Standard: Operational Sustainability
  > Defines Expected Operational Behaviors
  > Aligned by Tier
Both Are Owner Standards

Tier Standards: The Current Versions
http://www.uptimeinstitute.com/publications#tier-classification

Tier Standard: Topology
Definitions
N = Required number of units (components) necessary to meet need
  = Capacity of system when discussing the design load or demand (excludes IT-level architecture redundancy)
R = Number of redundant components
Outage = A loss of IT equipment
Loss of utility power, water, gas, or a hot day are expected events
Tier Classification Objectives
Provide a common understanding and language of data center infrastructure concepts
Identify expected data center performance by differences in topology
  > Recognize that all data centers are not alike nor need to be
  > Refers to a single operations site
Tier concepts are simple; application requires extreme diligence
Tier Topology Concepts
Tier Classifications represent broad topology concepts
  > Redundant capacity components
  > Redundant (diverse) distribution paths
  > Classification based on Maintenance opportunity and Failure response
Fractional concepts are not recognized
  > No Standard for Tier III.6
  > Tier III+ is undefined
  > Tier rating tied to lowest system
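The rule that the Tier rating is tied to the lowest system can be sketched as a simple minimum over per-system ratings. The system names and ratings below are hypothetical, and `site_tier` is an illustrative helper, not part of any Uptime Institute tool.

```python
from enum import IntEnum
from typing import Mapping


class Tier(IntEnum):
    """Whole Tier levels only; fractional ratings are not recognized."""
    I = 1
    II = 2
    III = 3
    IV = 4


def site_tier(system_tiers: Mapping[str, Tier]) -> Tier:
    """Return the site rating: the lowest Tier among its systems."""
    if not system_tiers:
        raise ValueError("at least one system rating is required")
    return min(system_tiers.values())


# Hypothetical site: a Tier IV power system does not lift Tier III cooling.
example_site = {"power": Tier.IV, "cooling": Tier.III, "fuel": Tier.III}
print(f"Site rating: Tier {site_tier(example_site).name}")
```

Using an integer enumeration makes it impossible to express "Tier III+" or "Tier III.6", which mirrors the point that fractional concepts are undefined.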
Tiers Pertain to Design Topology
Configuration of the site infrastructure equipment
  > Maintenance opportunities
  > Fault response
Driven by owner's tolerance for an outage
  > Planned or unplanned downtime
Do not address operation or location of the facility

Key Tier Topology Principles
Begin and end at the IT Critical Environment
  > Owner decides what is critical or not
Build upon the previous Tier
Provide facility operation and maintenance opportunities
Considers only the built environment
  > Temporary, roll-up, emergency, truck-mounted equipment not factored

Tier Classifications
Tier I: Basic Capacity
Tier II: Redundant Components
Tier III: Concurrently Maintainable
  > Applies to Each and Every component and path
Tier IV: Fault Tolerant
  > Considers a Single event, but Consequential impact
Tier I: Basic Capacity
Summary
  > Non-redundant capacity components (N only): Critical Environment power and cooling systems
  > Single distribution path
Operations and Maintenance Considerations
  > Site infrastructure and Critical Environments must be shut down for annual maintenance and repair work
  > Installation or construction of capacity may disrupt the Critical Environment

Tier I: Operational Risks
Any capacity component or distribution path element failure will disrupt the Critical Environment
All or portions of the Critical Environment are susceptible to disruption due to planned and unplanned activities
Operations (Human) errors have high likelihood of site disruption
Deferred maintenance to avoid downtime increases the risk and severity of disruptions in the Critical Environment
Tier II: Redundant Components
Summary
  > Redundant capacity components (N+R): engine generators, UPS modules, IT and UPS cooling
  > Single distribution path
Operations and Maintenance Considerations
  > Some capacity components can be maintained or repaired with limited impact to the Critical Environment
  > Site infrastructure and Critical Environments must be shut down for annual maintenance and repair work
  > Installation or replacement of capacity components may disrupt the Critical Environment

Tier II: Operational Risks
A capacity component failure may disrupt the Critical Environment
A distribution path element failure will disrupt the Critical Environment
All or portions of the Critical Environment are susceptible to disruption due to planned and unplanned activities
Operations (Human) errors have high likelihood of site disruption
Deferred maintenance to avoid downtime increases the risk and severity of disruptions in the Critical Environment
Tier III: Concurrently Maintainable
Summary
  > Redundant capacity components and independent distribution paths (transformers and transfer switches are path elements)
  > Elements of a distribution path may be inactive
  > Predicated on dual-cord IT equipment
  > No runtime limits on engine-generator capacity at design load
Operations and Maintenance Considerations
  > Each and Every capacity component and distribution path element can be taken out of service for maintenance, repair, or replacement without impacting the Critical Environment or IT processes

Tier III: Practical Insight
Each and Every extends to:
  > Valves and fittings
  > Switchgear and panels
Maintenance focus requires:
  > Dead lugs for safety during electrical activities
  > Dry pipes to avoid liquid spills
Single Points-of-Failure are not eliminated

Tier III: Operational Risks
All or portions of the Critical Environment are susceptible to disruption due to failures or unplanned activities
Scheduled maintenance activities occur on redundant components, distribution paths, and systems, which will reduce redundancy and may elevate risk of disruption
Operations (Human) errors may lead to site disruption
Single-cord IT equipment or incorrect installation may defeat Tier III infrastructure
Tier IV: Fault Tolerant
Summary
  > Redundant capacity components
  > Redundant active distribution paths
  > Compartmentalization of both capacity components and distribution paths
  > N after any failure
  > Continuous Cooling for critical IT and UPS systems
  > No runtime limits on engine-generator capacity at design load
Operations and Maintenance Considerations
  > Each and Every capacity component and distribution path element can sustain a failure, error, planned, or unplanned event without impacting the Critical Environment or IT processes

Tier IV: Practical Insight
Single event with consequential impact
  > Loss of a switchboard impacts everything downstream powered by that switchboard
  > Replacing a valve requires a dry pipe on both sides
Continuous Cooling must be consistent with UPS for IT equipment power
Most human errors are considered failure events
  > Exceptions: Emergency Power Off (EPO) activations, fire suppression activations, failure to properly connect IT loads

Tier IV: Operational Risks
The Critical Environment is not susceptible to disruption due to failure of any single capacity component, distribution element, site infrastructure system, or single human error
Scheduled maintenance activities occur on redundant components, elements, and systems, which may create a risk of disruption
Operation of the EPO system, activation of the fire protection system, or malicious human interaction may lead to site disruption
Single-cord IT equipment or incorrect installation may defeat Tier IV infrastructure

Tier IV: Autonomous Response
Operator intervention shall not be required to respond to single system failure
Control system failure shall not disrupt Critical Environment
  > Critical Environment must remain stable with failed control system
Tier IV data center facility infrastructure control systems
  > Detect system failure
  > Isolate and contain failure
  > Sustain N capacity after failure of any component or path
Tier Standard: Topology
Application to Mechanical and Electrical Systems

N
Either kW or number of capacity components
  > Capacity of system to meet the load
  > Required number of units to meet the load

Nominal Capacity: 300 kW Example
  > Components: N = 1; Capacity: N = 300 kW
  > Components: N = 2; Capacity: N = 300 kW
N+1 Components
N = 300 kW
N + 1 = component count
  > With N = 1 component: 600 kW installed
  > With N = 2 components: 450 kW installed
Stranded capacity is underutilized investment, but running all pumps at reduced speed saves energy
2N Capacity
N = 300 kW
600 kW installed
N and Tiers
No direct relationship between N and Tiers
N+R or 2N does not guarantee functionality
If N is one piece of equipment, N+1 = 2N = S+S
  > What Tier?
2N does not speak to Concurrent Maintainability or Fault Tolerant criteria
N is often applied to capacity components, but not distribution paths
  > Common shortfall
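The 300 kW examples for N, N+1, and 2N reduce to simple arithmetic. The sketch below assumes equal-size components that together just meet the design load (an assumption made for illustration; `installed_capacity_kw` is a hypothetical helper name).

```python
def installed_capacity_kw(load_kw: float, n_units: int, redundancy: str) -> float:
    """Installed capacity when n_units equal-size components meet load_kw exactly.

    redundancy is one of "N", "N+1", or "2N".
    """
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    unit_kw = load_kw / n_units  # size of each component (equal-size assumption)
    if redundancy == "N":
        count = n_units
    elif redundancy == "N+1":
        count = n_units + 1
    elif redundancy == "2N":
        count = 2 * n_units
    else:
        raise ValueError(f"unknown redundancy scheme: {redundancy}")
    return unit_kw * count


for n in (1, 2):
    for scheme in ("N", "N+1", "2N"):
        print(f"N = {n} component(s), {scheme}: {installed_capacity_kw(300, n, scheme):.0f} kW installed")
```

With a single component, N+1 and 2N install the same equipment, which is why the component count alone says nothing about the Tier.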
Component Count Does Not Determine Tier Level
The Uptime Institute:
  > N+1, N+2, N+N or 2(N+1) does not determine Tier level
  > It is possible to achieve Tier IV with just N+1 components for some systems
[Figure: Tier I Chilled Water Distribution, Basic Capacity. Components: N = 2. Chilled Water Supply, Chilled Water Return.]
[Figure: Tier II Chilled Water Distribution, Redundant Components. Components: N = 2. Chilled Water Supply, Chilled Water Return.]
[Figure: Tier III Chilled Water Distribution, Concurrently Maintainable. Components: N = 2. Chilled Water Supply, Chilled Water Return.]
[Figure: Tier III Chilled Water Distribution, Concurrently Maintainable. Components: N = 2. Chilled Water Supply A and B, Chilled Water Return A and B.]
[Figure: Tier IV Chilled Water Distribution, Concurrently Maintainable and Fault Tolerant. Components: N = 2. Chilled Water Supply A and B, Chilled Water Return A and B.]
[Figure: Tier I Chilled Water System, Basic Capacity.]
[Figure: Tier II Chilled Water System, Redundant Components.]
[Figure: Tier III Chilled Water System, Concurrently Maintainable.]
[Figure: Tier IV Chilled Water System, Fault Tolerant and Concurrently Maintainable. Components: N = 2.]
2.4.1.d) Continuous Cooling
Continuous Cooling is the capability to maintain steady state in the Critical Environments during a UPS discharge when neither utility nor engine-generator power is available
  > Computer rooms, Network rooms, UPS rooms
Corollary to uninterrupted power for IT devices
Continuous Cooling is required to meet Tier IV criteria
  > Part of the Tier Standard: Topology
[Figure: Tier I Power Backbone, Basic Capacity.]
[Figure: Tier II Power Backbone, Redundant Components.]
[Figure: Tier III Power Backbone, Concurrently Maintainable.]
[Figure: Tier IV Power Backbone, Concurrently Maintainable and Fault Tolerant.]
[Figure: Tier III Engine-Generator Concept (2N), N = 2, Concurrently Maintainable.]
[Figure: Tier III Engine-Generator Concept (N+1), N = 2, Concurrently Maintainable.]
[Figure: Tier IV Engine-Generator Concept (N+1), N = 2, Concurrently Maintainable and Fault Tolerant.]
Utility Power
For Tier III and IV, engine-generator systems are considered the source of reliable power for the data center
Utility power is an economic alternative
The utility power system does not have to be Concurrently Maintainable
Multiple utility feeds for redundancy are NOT required for any Tier
  > Multiple utility feeds may be required for capacity
Engine-Generator Ratings
International Organization for Standardization (ISO) 8528-1 is the governing document
Rating classifications
  > Emergency Standby
  > Prime
  > Continuous
Major differences in operating hours and power output capacities
Standby Rating
Definition
  > The maximum power available during a variable electrical power sequence, under the stated operating conditions, for which a generating set is capable of delivering in the event of a utility power outage or under test conditions for up to 200 hours of operation per year (ISO 8528-1)
But
  > The permissible average power output over 24 hours of operation cannot exceed 70% of the standby rating unless otherwise agreed by the manufacturer (ISO 8528-1)

Prime Rating
Definition
  > The maximum power which a generating set is capable of delivering continuously while supplying a variable electrical load when operated for an unlimited number of hours per year (ISO 8528-1)
But
  > The permissible average power output over 24 hours of operation cannot exceed 70% of the Prime rating unless otherwise agreed by the manufacturer (ISO 8528-1)

Continuous Rating
Definition
  > The maximum power which the generating set is capable of delivering continuously while supplying a constant electrical load when operated for an unlimited number of hours per year (ISO 8528-1)
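The 70% average-output limit quoted for the Standby and Prime ratings matters because a data center load is close to constant. The sketch below checks a 24-hour load profile against that limit; the 2,000 kW rating and both load profiles are hypothetical values chosen for illustration, and the helper ignores any manufacturer-agreed exception.

```python
from statistics import mean
from typing import Sequence

MAX_AVERAGE_FRACTION = 0.70  # 24-hour average limit quoted for Standby and Prime ratings


def within_average_limit(hourly_load_kw: Sequence[float], rating_kw: float,
                         limit: float = MAX_AVERAGE_FRACTION) -> bool:
    """Return True if the 24-hour average load stays within limit * rating."""
    if len(hourly_load_kw) != 24:
        raise ValueError("expected exactly 24 hourly load values")
    return mean(hourly_load_kw) <= limit * rating_kw


RATING_KW = 2000  # hypothetical Standby or Prime nameplate rating

constant_load = [1500] * 24                 # steady IT load, averages 1500 kW
variable_load = [1000] * 12 + [1600] * 12   # varying load, averages 1300 kW

print("constant 1500 kW load within limit:", within_average_limit(constant_load, RATING_KW))
print("variable load within limit:", within_average_limit(variable_load, RATING_KW))
```

In this hypothetical case the steady 1,500 kW load exceeds the 1,400 kW average allowed, even though it is well below the nameplate figure, which is why the rating class has to be checked against the design load and not the nameplate alone.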
Rating Examples (Same Frame)

Tier Standard: Topology
Additional Considerations
2.4.1.c) Compartmentalization
Applies to complementary systems and distribution paths in Tier IV topology
Tier IV requires physical isolation to prevent a single event from simultaneously impacting more than the number of redundant components or systems
Each compartment shall contain no more than the number of redundant components
  > Where there are N+R components, no more than R components inside a single room
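The counting rule for compartmentalization (no more than R components of an N+R system in one room) can be sketched as a check over a room assignment. The component and room names below are hypothetical, and the check covers only the count rule, not the physical isolation itself.

```python
from collections import Counter
from typing import Mapping


def compartments_compliant(room_of_component: Mapping[str, str], redundant_count: int) -> bool:
    """Check that no single room holds more than R components of an N+R system."""
    per_room = Counter(room_of_component.values())
    return all(count <= redundant_count for count in per_room.values())


# Hypothetical N+R system with N = 2 and R = 1 (three chillers in total).
R = 1
separate_rooms = {"CH-1": "Room A", "CH-2": "Room B", "CH-3": "Room C"}
shared_room = {"CH-1": "Room A", "CH-2": "Room A", "CH-3": "Room B"}

print("one chiller per room:", compartments_compliant(separate_rooms, R))
print("two chillers share a room:", compartments_compliant(shared_room, R))
```

In the second layout a single event in Room A removes two chillers while only one is redundant, so the remaining capacity would fall below N.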
[Figure: Equipment Compartmentalization, Not Compliant with Tier IV Requirements.]
[Figure: Effective Compartmentalization, Tier IV Compliant. Secondary Chilled Water Pumps N=3 (R=1); Chilled Water Machines and Primary Pumps N=4 (R=2).]
[Figure: Electrical Compartmentalization, Not Compliant with Tier IV Requirements.]
[Figure: Electrical Compartmentalization, Tier IV Compliant.]
[Figure: Electrical Compartmentalization, Not Compliant with Tier IV Requirements.]
[Figure: Electrical Compartmentalization, Tier IV Compliant.]
[Figure: Distribution Path, Not Compliant with Tier IV Requirements.]
[Figure: Distribution Path, Tier IV Compliant.]

Site Communications Path
[Figure: POP A and POP B, Not Compliant with Tier IV requirements.]
Concurrently Maintainable paths required for Tier III
Common vault is not Fault Tolerant
Compartmentalized path required for Tier IV
2.6 Ambient Temperature Design Points
For all Tiers, equipment to be selected and sized per ASHRAE Handbook Fundamentals extreme maximums
  > Dry Bulb: N=20 value
  > Wet Bulb: Extreme Maximum Value
Ambient temperatures impact the capacity of cooling equipment and engine-generator radiators
Equipment must be sized to meet the extreme maximum temperatures
Altitude over 3,000 feet may also impact capacity
ASHRAE Design Conditions
Reference: 2009 ASHRAE Handbook Fundamentals (updated and published every 4 years)
DALLAS, TEXAS, USA
Columns 2%, 1%, 0.4%: Monthly Design Dry Bulb Temperature Profiles (July)
Column N=20 years: Extreme Annual Design Conditions (required for all Tiers)

Design condition           2%          1%          0.4%        N=20 years
Dry bulb (°C)              35.7        36.8        38.0        43.3
Dry bulb (°F)              96.3        98.2        100.4       109.9
(Probable) hours exceeded  15 hours    7.5 hours   3 hours     Unlikely (20-year period)
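The "hours exceeded" row of the Dallas example can be roughly reproduced by treating each monthly percentage as the share of the hours in July during which the temperature is exceeded. That interpretation, and the helper names, are assumptions made for this sketch; the Celsius to Fahrenheit conversion is standard.

```python
JULY_HOURS = 31 * 24  # 744 hours in July


def hours_exceeded(percent: float, hours_in_month: int = JULY_HOURS) -> float:
    """Hours in the month during which the design temperature is exceeded."""
    return percent / 100.0 * hours_in_month


def celsius_to_fahrenheit(temp_c: float) -> float:
    """Standard Celsius to Fahrenheit conversion."""
    return temp_c * 9 / 5 + 32


# Monthly design dry bulb values for July, keyed by percentile (from the slide).
JULY_DRY_BULB_C = {2.0: 35.7, 1.0: 36.8, 0.4: 38.0}
EXTREME_N20_C = 43.3  # extreme annual design condition, N = 20 years

for pct, temp_c in JULY_DRY_BULB_C.items():
    print(f"{pct}%: {temp_c} C = {celsius_to_fahrenheit(temp_c):.1f} F, "
          f"exceeded about {hours_exceeded(pct):.1f} hours in July")
print(f"N=20 years: {EXTREME_N20_C} C = {celsius_to_fahrenheit(EXTREME_N20_C):.1f} F")
```

The computed hours come out close to the slide's rounded figures of 15, 7.5, and 3 hours, which illustrates how often equipment sized only to a monthly design value would be operating beyond its design point.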
Manufacturing Tolerances
This means that a nominal 100 kW unit could actually deliver only 92 kW
Manufacturer's note: "The performances are obtained through calculations and they are therefore subject to the consequent variations. Declared performances according to EN14511:2011"
Fuel System Tier Progression
Tier I: Fuel storage to support engine generator
Tier II: Redundant tanks and pumps
Tier III: Redundant fuel supply paths to N engine generators
Tier IV: Autonomous control response to component or path failure

Fuel System Tier Criteria
Tanks, piping, and pumps
  > Minimum of 12 hours of on-site fuel storage for all Tiers
  > Concurrently Maintainable for Tier III
  > Fault Tolerant for Tier IV while engines are in operation!
Tier III fuel system must provide fuel from N tanks to N engines during scheduled maintenance on any fuel system component
Tier IV fuel system controls must respond to system failures autonomously
  > Fuel system must provide fuel to N engines after any failure
  > Isolate and contain a leak or other failure
Fire Detection and Suppression Systems
Application of Tiers focuses on the connection between the fire detection/suppression system and the HVAC and electronic systems supporting the critical environment
Does not include the physical suppression system
  > e.g., sprinkler piping, sprinkler valves, etc.
Tier III requires that the Critical Environment must not be impacted by any fire detection/suppression component taken out of service for calibration, repair, or replacement on a scheduled basis
Principal Tier IV consideration
  > Tier IV requires autonomous response to failure

Building Automation
Includes
  > Supervisory Control and Data Acquisition (SCADA)
  > Plant Controls (BAS or BMS)
  > Emergency Power Off (EPO)
Tier III requires that the Critical Environment must not be impacted by any control element taken out of service for calibration, repair, or replacement on a scheduled basis
Principal Tier IV consideration
  > Tier IV requires autonomous response to failure
Other Ancillary Systems
No Tier-level specific or Certification criteria
  > Building Pressurization (Makeup Air Systems)
  > Battery Room Ventilation
  > Water Treatment Systems
  > Free-cooling or Economizer Systems
  > Lightning Protection
  > Grounding
  > Fuel Polishing
Integrate carefully!

Component Labeling
All Tier levels require that each and every critical component is uniquely labeled
  > e.g., CH-1 (chiller #1), UPS-1A, etc.
  > Includes breakers and valves
Required to develop commissioning plans, preventive maintenance program, and operational procedures
Tier Standard: Operational Sustainability

Operational Sustainability
The behaviors and risks beyond Design Topology that impact the ability of a data center to meet its Business Objectives over the long term

Indicators of Operational Sustainability Shortfalls
[Photo: Computer room or storage space?]
[Photo: Accident or poor planning?]
[Photo: Office space in the computer room]
[Photo: Mercedes in a data center support space]
Genesis of Operational Sustainability
Over the years, Uptime Institute has observed management issues posing a larger risk to uptime than physical infrastructure
  > Inadequate staffing
  > Ineffective or non-existent maintenance and training programs
  > Lacking processes and procedures
  > Resulting in the majority of outages being caused by human error
No standard existed to help owners/operators determine
  > Common language/vocabulary of data center operations
  > Focus of data center management
  > Resource allocation
  > Justification of additional resources

Genesis of Operational Sustainability
Failures caused by human error: 73% (and thus avoidable)
Source: Uptime Institute Abnormal Incident Reports through 1 January 2014
Purpose of the Standard
Addresses behaviors and risks to:
  > Reduce failures due to human error (cause of 70% of failures)
  > Achieve maximum potential from the facility infrastructure
Provides a tool to measure a data center's Operational Sustainability using these behaviors and risks
Retains focus on those items that will most improve the performance of a data center
Encourages "doing it your way": results oriented
  > Behaviors, not requirements

An Owner's Standard
Developed by Uptime Institute team with hands-on site operations experience
Tier Standard: Operational Sustainability (1 July 2010)
  > Measures effectiveness of data center management
  > Assists owners to maximize the investment in infrastructure
  > Gives owners an indication of where the data center stands operationally in relation to others
  > Supports efforts to maximize uptime and minimize risk
Adjudicated by the Owners Advisory Committee
Elements of Operational Sustainability

Operational Sustainability Rating System

Relationship Between Tiers and Operational Sustainability
Based on Business Objectives
Increased rigor with increased uptime requirement
  > Greater management rigor required to achieve the design potential of Tier III and IV infrastructure
  > Change opportunities become more complex and require more planning to achieve
Both Tier Classification System and Operational Sustainability required to meet business/uptime objectives

Tiers Summary
Mapping Business Objective to Tiers
Can your organization afford to take the computer room down to perform infrastructure maintenance (planned downtime)?
  > If yes, Tier I or II
  > If no, Tier III or IV
Can your organization afford unplanned downtime taking your computer room down?
  > If yes, Tier III
  > If no, Tier IV
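The two questions above can be read as a small decision rule. The sketch below is one possible encoding: it assumes the second question refines the first, and that an organization unable to afford unplanned downtime maps to Tier IV regardless of its answer on planned downtime (the slide does not cover that combination explicitly). The function name is illustrative.

```python
from typing import Tuple


def candidate_tiers(planned_downtime_ok: bool, unplanned_downtime_ok: bool) -> Tuple[str, ...]:
    """Map tolerance for planned and unplanned downtime to candidate Tiers."""
    if not unplanned_downtime_ok:
        # Assumption: no tolerance for unplanned downtime always points to Tier IV.
        return ("Tier IV",)
    if not planned_downtime_ok:
        # Maintenance must not take the computer room down, but faults are tolerated.
        return ("Tier III",)
    # Both planned and unplanned downtime are acceptable.
    return ("Tier I", "Tier II")


for planned_ok in (True, False):
    for unplanned_ok in (True, False):
        tiers = " or ".join(candidate_tiers(planned_ok, unplanned_ok))
        print(f"planned downtime ok={planned_ok}, unplanned downtime ok={unplanned_ok}: {tiers}")
```

The rule only narrows the choice; selecting between Tier I and Tier II, and confirming any Tier, still depends on the business case and the topology review.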
Tiers Certification Process

Tier Certification Process
Tier Certification of Design Documents: Design Documents Meet the Tier Objective
Tier Certification of Constructed Facility: Data Center Meets Functionality for the Tier Objective
Tier Certification of Operational Sustainability: Data Center Is Being Managed/Operated to Meet the Tier Objective
For more information: http://uptimeinstitute.com/contact-us
Tier Gap Analysis
Before starting the formal Tier Certification process of existing data centers
  > Single in-office review of select design documents to identify common significant design shortfalls
  > Memo documents major gaps to the Tier objective
  > Conference call with owner team to discuss findings

Tier Certification of Design Documents
Review of 100% design document package in Uptime Institute offices
Deliverable of Tier deficiencies and Operational Sustainability enhancements
Conference call with owner and design team to discuss report
Compliance review of revised drawings
Award letter and foil

Tier Certification of Constructed Facility
On-site visit by team of consultants
Identify discrepancies between design drawings and installed equipment
Observe tests and demonstrations to prove Tier compliance
Deliverable of Tier deficiencies and Operational Sustainability enhancements
Conference call with owner team
Award letter, foil, and plaque

Tier Certification of Operational Sustainability
Site visit to review the facilities management
Evaluate presence and effectiveness of staffing, training, maintenance program, processes, and procedures
Scorecard and Gold, Silver, Bronze rating
Certification becomes suffix to Tier
  > Tier III Gold
Global Tier Certifications
Tier II 20, Tier III 224, Tier IV 50
Tier II 4, Tier III 76, Tier IV 14
Tier III Gold 4, Tier IV Gold 2
Certifications Underway Worldwide: 185
Unbiased and vendor neutral thought leadership, research, and publications
Thank you!
Eric Maddison
emaddison@uptimeinstitute.com
http://uptimeinstitute.com
2013 Uptime Institute Professional Services, LLC