IT Equipment Design Evolution & Data Center Operation Optimization Don Beaty Roger Schmidt
Copyright Materials Copyright 2015 by ASHRAE. All rights reserved. No part of this presentation may be reproduced without written permission from ASHRAE, nor may any part of this presentation be reproduced, stored in a retrieval system or transmitted in any form or by any means (electronic, photocopying, recording or other) without written permission from ASHRAE. 2
AIA/CES Registered Provider ASHRAE is a Registered Provider with The American Institute of Architects Continuing Education Systems. Credit earned on completion of this program will be reported to CES Records for AIA members. Certificates of Completion for non AIA members are available on request. This program is registered with the AIA/CES for continuing professional education. As such, it does not include content that may be deemed or construed to be an approval or endorsement by the AIA of any material of construction or any method or manner of handling, using, distributing, or dealing in any material or product. Questions related to specific materials, methods, and services will be addressed at the conclusion of this presentation. 3
USGBC Education Partner IT Equipment Design Evolution & Data Center Operation Optimization Approved for: 3 General CE hours By ASHRAE GBCI cannot guarantee that course sessions will be delivered to you as submitted to GBCI. However, any course found to be in violation of the standards of the program, or otherwise contrary to the mission of GBCI, shall be removed. Your course evaluations will help us uphold these standards. 0 LEED-specific hours Approval date: July 2014 Course ID: 00920001026 4
Opening Comments Was the dominant computer cooling method liquid or air in 1980, in 2000? What will it be in 2020? IT hardware has been REALLY CHANGING and EVOLVING (for example, Extreme Low Energy Servers, CPUs, GPUs, Multi Core, Disaggregation, Variable Speed, Software Defined Networks) IT manufacturers respond to customer needs and demands, which vary greatly and can be any combination of the following: LOWER cost MORE storage MORE energy efficient MORE computing capabilities There have been significant changes in hardware, including hardware operating conditions. Customer needs, including operational needs, continue to drive the evolution and optimization of hardware. Operational data received in recent years, combined with IT manufacturers' analysis, has produced some surprising and important discoveries, including unintended consequences. 5
Opening Comments Tower Server (tall & narrow) 1U Server (short & wide) Blade Server (square) Is Hardware Eating Itself? Is Software Eating Hardware? All indications are that Software is Eating Hardware; what does this mean? 6
Presentation Overview PRESENTATION TITLE: IT Equipment Design Evolution & Data Center Operation Optimization PART 1 Hardware Overview Including Trends Previous Trends PART 2 Hardware Overview Including Trends Current / Projected Trends PART 3 Hardware Basics PART 4 Hardware Requirements, Discoveries & Concerns PART 5 Facilities Air Cooling Architecture PART 6 Liquid Cooling Closing Comments 7
Part 1: Hardware Overview Including Trends Previous Trends History has consistently proven that many aspects of hardware continue to aggressively trend upward. Two exceptions to the upward trend are Electronic Packaging & Compute Energy 8
Incredible Performance Improvements As the Number of Transistors goes up, Energy per Transistor goes down (chart of Number of Transistors and Energy per Transistor, 1970 to 2010). ~1 Million Factor Reduction in Energy / Transistor Over 30+ Years. IBM Graphic Modified By DLB 9
Chip Cooling (Bipolar vs. CMOS) Historical Trend (chart of Module Heat Flux in Watts/cm², 1960 to 2010, showing the shift from Bipolar to CMOS and the End of Bipolar Water Cooling). IBM Graphic Modified By DLB 10
Moore's Law (Microprocessor Transistor Counts) Historical & Predictive Trend (chart of Number of Transistors, from 2.3k to 2.6B, vs. Date of Introduction, 1970 to 2010). 24 Month Doubling reduced to 18 Month Doubling. Source: Wikipedia, Modified By DLB 11
Kryder's Law (Moore's Law for Storage) Historical & Predictive Trend (chart of Capacity in GB, 0,001 to 10.000, 1980 to 2015). Source: Wikipedia / Scientific American 2005, Modified By DLB 12
Internet Traffic Trend Historical Trend (chart): Amsterdam International Internet Exchange Monthly Input Traffic (TB), July 2001 to January 2013, compared with Intel (Doubling every 18 months) and Moore's Law (Doubling every 24 months). Source: ams ix.com, Modified by DLB 13
(Jakob) Nielsen's Law of Internet Bandwidth Historical & Predictive Trend (chart, 1983 to 2013): A high end user's connection speed grows by 50% per year (www.nngroup.com). nngroup.com, Modified By DLB 14
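All of the trend laws above reduce to simple exponential growth. A minimal sketch (not from the presentation) of how a doubling period or a fixed annual growth rate translates into a multi-year projection; the 10-year horizon is illustrative:

```python
# Minimal sketch: exponential growth projections behind Moore's, Kryder's
# and Nielsen's laws. The horizons and starting assumptions are illustrative only.

def growth_from_doubling(years, doubling_period_years):
    """Growth multiplier after `years` given a fixed doubling period."""
    return 2 ** (years / doubling_period_years)

def growth_from_annual_rate(years, annual_rate):
    """Growth multiplier after `years` at a constant annual growth rate."""
    return (1 + annual_rate) ** years

# Moore's-law-style doubling every 18 months over a decade
print(f"10-yr transistor multiplier: {growth_from_doubling(10, 1.5):.0f}x")
# Nielsen's law: high-end connection speed grows ~50% per year
print(f"10-yr bandwidth multiplier: {growth_from_annual_rate(10, 0.50):.0f}x")
```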
2000 & 2005 IT Equipment Power Predictive Trends (Uptime Institute Graphic and ASHRAE Graphic, both Modified By DLB; 0,65 m² product footprint noted on each). Thermal Management Consortium in 2000: Published Through The Uptime Institute. ASHRAE 2005 Publication: Datacom Equipment Power Trends & Cooling Applications. History Validated These Trends 15
Data Center Generational Impact 2007 to 2011 (IBM; Generations 1 through 4 of a 4U, 4 Processor server). Generations 1 to 4 have the SAME IT Performance.
Metric Comparative Change | Generation 1 (Year 1 Baseline) | Generation 4 (Year 4) | Generation 1 to 4 Comparison
IT Performance | 100% | 100% | SAME
IT Power per Rack | 100% | 119% | +19%
Rack Count | 100% (60) | 13% (8) | -87%
Total IT Power | 100% | 18% | -82%
Space Required | 100% | 15% | -85%
Cooling Required | 100% | 20% | -80%
16
ITE Environment ASHRAE Psychrometric Chart 2004 / 2008 Recommended / Allowable Criteria (prior to 2004: typically 20°C ± 0,6°C)
Criterion | 2004 | 2008
Low End Temp. | 20°C (68°F) | 18°C (64,4°F)
High End Temp. | 25°C (77°F) | 27°C (80,6°F)
Low End Moisture | 40% RH | 5,5°C DP (41,9°F)
High End Moisture | 55% RH | 60% RH & 15°C (59°F) DP
17
Part 2: Hardware Overview Including Trends Current / Projected Trends Compute Performance Growth is faster than Compute Power Growth but power is still growing ASHRAE Trend Charts and TGG Maturity Model are valuable for Planning the Future 18
ITE Environment ASHRAE Psychrometric Chart 2011 Classes A1 and A2 are EXACTLY the SAME as previous Classes 1 & 2: Apply to new AND legacy equipment New Classes A3 and A4 do NOT include legacy equipment 19
Temperature Rate of Change Clarification NOTE: For tape storage equip., no more than: 5 C in an hour For all other IT equip., no more than: 20 C in an hour AND 5 C in any 15 minute period of time (The 5 C and 20 C temperature changes are NOT instantaneous temperature rates of change) 20
Temperature Rate of Change Example Graphs (IBM Graphics Modified By DLB): Examples of Conforming Rate of Change for Tape Drives (Inlet Air Temp changing no more than 5°C over each 1 hour period) and for Other IT Equipment, e.g. Servers (no more than 20°C over each 1 hour period AND no more than 5°C in any 15 minute period). 21
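A minimal sketch of how the two rate-of-change limits above could be checked against a logged inlet temperature series; the sampling interval and the temperature values are assumptions for illustration, not part of the guideline:

```python
# Sketch: check the ASHRAE temperature rate-of-change limits on a logged series.
# Assumes samples are evenly spaced `interval_min` minutes apart.

def max_swing(temps, window_samples):
    """Largest (max - min) temperature swing inside any rolling window."""
    swings = [max(temps[i:i + window_samples]) - min(temps[i:i + window_samples])
              for i in range(len(temps) - window_samples + 1)]
    return max(swings) if swings else 0.0

def conforms(temps_c, interval_min=5, tape=False):
    per_hour = max_swing(temps_c, int(60 / interval_min) + 1)
    if tape:
        return per_hour <= 5.0                    # tape storage: <= 5 C in an hour
    per_15min = max_swing(temps_c, int(15 / interval_min) + 1)
    return per_hour <= 20.0 and per_15min <= 5.0  # other ITE: 20 C/hr AND 5 C/15 min

inlet = [22, 23, 24, 26, 27, 27, 28, 28, 29, 30, 30, 31, 31]   # 5-minute samples
print(conforms(inlet), conforms(inlet, tape=True))             # True False
```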
Volume Server Power Demand & Performance Growth (excludes extreme low energy servers)
Volume Server | CAGR | Growth in 3 Years | Growth in 5 Years
Power Demand Growth | 2% to 6% | 5% to 20% | 10% to 30%
Performance Growth | 25% to 30% | 90% to 120% | 200% to 250%
Compounded Annual Growth Rate (CAGR); Growth Projections are Rounded. IBM 22
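The 3- and 5-year columns follow from compounding the CAGR. A quick sketch of that arithmetic; because the table values are rounded and the CAGR itself is a range, the compounded figures land near, not exactly on, the published ranges:

```python
# Sketch: compound a CAGR range into multi-year growth, mirroring the table above.
def compounded_growth(cagr, years):
    """Total growth over `years` at a constant annual growth rate `cagr`."""
    return (1 + cagr) ** years - 1

for label, low, high in [("Power demand", 0.02, 0.06), ("Performance", 0.25, 0.30)]:
    g3 = (compounded_growth(low, 3), compounded_growth(high, 3))
    g5 = (compounded_growth(low, 5), compounded_growth(high, 5))
    print(f"{label}: 3 yr {g3[0]:.0%} to {g3[1]:.0%}, 5 yr {g5[0]:.0%} to {g5[1]:.0%}")
```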
Volume Server Power Trends to 2020 (fully configured, fully utilized max load)
Height | No. of Sockets | Heat Load / Chassis (Watts) 2010 / 2015 / 2020 | Heat Load / 42U Rack 2010 / 2015 / 2020 | Increase 2010 to 2020
1U | 1s | 255 / 290 / 330 | 10.710 / 12.180 / 13.860 | 29%
1U | 2s | 600 / 735 / 870 | 25.200 / 30.870 / 36.540 | 45%
1U | 4s | 1.000 / 1.100 / 1.200 | 42.000 / 46.200 / 50.400 | 20%
2U | 2s | 750 / 1.100 / 1.250 | 15.750 / 23.100 / 26.250 | 67%
2U | 4s | 1.400 / 1.800 / 2.000 | 29.400 / 37.800 / 42.000 | 43%
4U | 2s | 2.300 / 3.100 / 3.300 | 23.000 / 31.000 / 33.000 | 43%
7U (Blade) | 2s | 5.500 / 6.500 / 7.500 | 33.000 / 39.000 / 45.000 | 36%
9U (Blade) | 2s | 6.500 / 8.000 / 9.500 | 26.000 / 32.000 / 38.000 | 46%
ASHRAE Table Reformatted by DLB Associates
Market Requirements force IT manufacturers to maximize performance/volume (creating high heat load/rack). These rack heat loads will result in increased focus on improving data center ventilation solutions and localized liquid cooling solutions. High Risk to Generalize; One Shoe Definitely Does NOT Fit All 23
Volume Server Power Trends Simple Adjustment Factor Example
Adjustment Factor | Volume Server Configuration | Heat Load / Chassis (Watts) 2010 / 2015 / 2020 | Heat Load / 42U Rack 2010 / 2015 / 2020
1.00 (Original Value) | 1U, 2s | 600 / 735 / 870 | 25.200 / 30.870 / 36.540
0.50 (Your Data Center) | 1U, 2s | 300 / 368 / 435 | 12.600 / 15.435 / 18.270
How to adjust the published Trends for your environment (see the sketch below): 1) Trend Chart Value for a 1U, 2s Volume Server in 2010: 600 Watts 2) ACTUAL MEASURED Value for YOUR 1U, 2s Server: 300 Watts 3) Calculated Adjustment Factor for YOUR 1U, 2s Server = 300 Watts / 600 Watts = 0.50 24
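A minimal sketch of the three-step adjustment described above, using the measured value and published trend figures from this slide's example (the adjusted 2015 chassis value appears rounded to 368 W in the table):

```python
# Sketch: scale the published ASHRAE power trend by a measured adjustment factor.
published_2010_w = 600          # trend chart value, 1U 2s volume server, 2010
measured_w = 300                # YOUR measured value for the same class of server

factor = measured_w / published_2010_w              # step 3: 300 / 600 = 0.50

trend_chassis_w = {2010: 600, 2015: 735, 2020: 870}
adjusted = {year: w * factor for year, w in trend_chassis_w.items()}
adjusted_rack = {year: w * 42 for year, w in adjusted.items()}   # 42 x 1U per rack

print(factor)           # 0.5
print(adjusted)         # {2010: 300.0, 2015: 367.5, 2020: 435.0}
print(adjusted_rack)    # {2010: 12600.0, 2015: 15435.0, 2020: 18270.0}
```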
TGG Data Center Maturity Model Concept Color Coding
Levels: Level 5: Visionary (5 Years Away); Levels 4 & 3: Reasonable Steps (between current best practices & the visionary 5 year projection); Level 2: Best Practice; Level 1: Part Best Practice; Level 0: Minimal / No Progress
Color Coding: Clear: Not a DCMM level; Black: Theoretical Max.; Yellow: Target; Green: Achieved
DLB Associates 25
TGG Data Center Maturity Model Concept Color Coding Level 5 (Visionary: 5 Yrs. Away) Level 4 (Reasonable Step) Level 3 (Reasonable Step) Level 2 (Best Practice) Level 1 (Part Best Practice) Level 0 (Minimal / No Progress) Targeted, Theoretical Max. and Non Levels VARY with a Given Aspect of the Data Center DLB Associates 26
TGG Data Center Maturity Model Definitions
FACILITY: 1. Power (Critical Power Path Efficiency Building Entrance to IT load, Architecture, Operations, Generation) 2. Cooling (PUE Cooling Contribution, RCI (high) & RCI (low) if applicable, Mechanical / Refrigerant Cooling reduction, Environmental set point range at inlet conditions to IT equipment, Environmental monitoring and control, Operations) 3. Other Facility (Operational Resilience, Resilience vs. Need, Lighting, Building/Shell, M&E Waste, Procurement) 4. Management (Monitoring, PUE, Waste heat reuse (as measured by ERF/ERE), CUE, WUE, xUE/additional metrics)
IT: 5. Compute (Utilization, Workload Management, Operations, Power Management, Server population) 6. Storage (Workload, Architecture, Operations, Technology, Provisioning) 7. Network (Utilization, Workload, Operations, Technology, Base performance, Provisioning) 8. Other IT (Overall, Utilization, IT sizing, Internal Power Supply Efficiency, Service Catalog / SLA's, Incentivizing change for efficient behavior, E Waste, Procurement) 27
TGG Data Center Maturity Model Definitions Matrix
Topic: Facility Cooling: 2.1 PUE Cooling Contribution (Annual Average)
Level 0: 1.0 | Level 1: 0.5 | Level 2: 0.35 | Level 3: 0.2 | Level 4: 0.1 | Level 5: 0.05
Data Center Efficiency & Sustainability increase from Level 0 to Level 5 with Investment (Financial, Time & Resource). TGG Graphic Modified by DLB Associates 28
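The matrix levels above are expressed as the cooling plant's contribution to PUE, which can be read as annual cooling energy divided by annual IT energy. A minimal sketch of that bookkeeping; the kWh figures and the level-mapping interpretation are illustrative assumptions, not TGG-published values:

```python
# Sketch: PUE cooling contribution = annual cooling energy / annual IT energy.
# The energy figures below are illustrative only.
annual_it_kwh = 8_760_000          # e.g. a steady 1 MW IT load for a year
annual_cooling_kwh = 1_752_000     # chillers, CRAHs, pumps, towers

cooling_contribution = annual_cooling_kwh / annual_it_kwh
print(f"PUE cooling contribution: {cooling_contribution:.2f}")     # 0.20

# Map onto the DCMM matrix row above (lower contribution = higher level)
levels = [(0.05, 5), (0.1, 4), (0.2, 3), (0.35, 2), (0.5, 1), (1.0, 0)]
dcmm_level = next((lvl for limit, lvl in levels if cooling_contribution <= limit), 0)
print(f"DCMM level achieved: {dcmm_level}")                        # 3
```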
Data Center Maturity Model ASHRAE TC 9.9 Envelope Examples (TEMPERATURE ONLY Upper Limit)
Level | Description | Example 1: Normal Temperature / Excursion / Annual Hours | Example 2: Normal Temperature / Excursion / Annual Hours
0 | Minimal / No Progress | 21°C ±1°C (R) / None / None | 21°C ±1°C (R) / None / None
1 | Part Best Practice | 20-25°C (R) / 25-27°C (R) / < 10 hrs. | 20-27°C (R) / 27-29°C (A1) / < 10 hrs.
2 | Best Practice | 20-27°C (R) / 27-29°C (A1) / < 10 hrs. | 20-28°C (A1) / 28-32°C (A1) / < 10 hrs.
3 | Reasonable Step | 20-27°C (R) / 27-29°C (A1) / < 100 hrs. | 20-28°C (A1) / 28-32°C (A1) / < 100 hrs.
4 | Reasonable Step | 20-27°C (R) / 27-32°C (A1) / < 10 hrs. | 20-29°C (A1) / 29-32°C (A1) / < 10 hrs.
5 | Visionary (5 Yrs. Away) | 20-27°C (R) / 27-33°C (A2) / < 25 hrs. | 20-29°C (A1) / 29-35°C (A2) / < 25 hrs.
(R) = ASHRAE Recommended Envelope, (A1) = ASHRAE Allowable A1, (A2) = ASHRAE Allowable A2 DLB Associates 29
Part 3: Hardware Basics Some basics on server design, configuration and misconceptions 30
Server Power & IT Hardware Airflow Rate (IBM Graphic Modified By DLB) SERVER POWER: Idle Power 25 to 50%, Production Power 100%. IT HARDWARE AIRFLOW RATE: Normal Conditions 75 CFM/kW, Worst Case Conditions 150 CFM/kW. 31
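The CFM/kW figures above are set by the air temperature rise across the ITE. A minimal sketch of that relationship using standard air properties; the ΔT values are assumptions chosen so the results land roughly on the 75 and 150 CFM/kW figures, they are not from the slide:

```python
# Sketch: airflow required per kW of heat load, from q = rho * Vdot * cp * dT.
RHO_AIR = 1.2             # kg/m^3, approximate air density near sea level
CP_AIR = 1006.0           # J/(kg*K)
M3S_PER_CFM = 0.000471947

def cfm_per_kw(delta_t_c):
    """Airflow (CFM) needed to carry 1 kW at a given air temperature rise."""
    m3_per_s = 1000.0 / (RHO_AIR * CP_AIR * delta_t_c)
    return m3_per_s / M3S_PER_CFM

# ~23 C and ~12 C rises roughly reproduce the 75 and 150 CFM/kW figures above
for dt in (23, 12):
    print(f"dT = {dt} C -> {cfm_per_kw(dt):.0f} CFM/kW")
```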
Thermal Report Example: Generic Server
Description | Configuration (Model) | Typical Heat Release (Watts @ 120 V) | Nominal Airflow cfm (m³/h) | Airflow Max. @ 35°C (95°F) cfm (m³/h)
Minimum Configuration | 1-way 1.5 GHz Processor, 16 GB memory | 420 | 26 (44) | 40 (68)
Full | 2-way 1.65 GHz Processor, Max. memory | 600 | 30 (51) | 45 (76)
Typical | 1-way 1.65 GHz Processor, 16 GB memory | 450 | 26 (44) | 40 (68)
NOTE: Most new server fans are variable speed 32
Thermal Report Comparison to Nameplate: Nameplate 920 W (1 kVA w/ PF = 0.92) vs. ASHRAE Thermal Report 420 to 600 W 33
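A quick sketch of the arithmetic behind this comparison and why sizing to nameplate overstates the actual heat load in this example (numbers from the slide):

```python
# Sketch: nameplate rating vs. measured (thermal report) heat release.
nameplate_w = 1000 * 0.92            # 1 kVA at PF = 0.92 -> 920 W
thermal_report_w = (420, 600)        # minimum to full configuration

oversize = [nameplate_w / w for w in thermal_report_w]
print(f"Nameplate: {nameplate_w:.0f} W")
print(f"Nameplate is {oversize[1]:.2f}x to {oversize[0]:.2f}x the reported heat release")
```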
Thermal Report Energy Star Example 34
2012 ASHRAE Whitepaper ASHRAE TC 9.9 Whitepaper on IT Equipment Thermal Management & Controls (free download at www.tc99.ashraetcs.org) The purpose of the whitepaper was to: 1) Describe mainstream ITE cooling systems 2) Describe ITE power and thermal management 3) Describe interactions between ITE equipment and the data center 35
Common Misconceptions with Respect to Servers MISCONCEPTION 1: ITE FANS consume 10% to 20% of Total ITE power (the typical perception, shown as fan power rising from idle across the IT workload range). Actual fan power: Idle Conditions (as low as 1% of total load); Typical Conditions (2% to 4% of total load); Extreme Conditions (8% to 15% of total load, with limited to no fan speed control). MISCONCEPTION 2: ITE is managed based upon Chassis Temperature Rise. Thermal management within servers is primarily driven to ensure compliance to component temperature specifications. Component temperatures are often very similar over a wide range of ambient temperatures. Temperature rise is not generally a consideration in the thermal management scheme. Exhaust temperature may be a consideration for safety reasons, in which case the temperature rise of the air passing through the chassis must be determined in some manner. 36
Common Misconceptions with Respect to Servers (cont.) MISCONCEPTION 3: All mainstream IT equipment is Designed and Performs SIMILARLY. Poor Design: IT equipment designed WITHOUT Precision Thermal Management; Low End Volume Servers typically do not monitor all available sensors and have simple fan speed control (FSC) algorithms. Sensors: Today's well designed servers integrate a large number of thermal sensors along with activity and power sensors to drive fan speeds as low as possible to reduce wall power. Server Fans: Power consumption of server fans has improved significantly over time. 37
Requirements for Server Cooling Thermal Design ASHRAE 3 Levels of Limiting Component Temperature 38
Platform Power Thermal Management Power Management Used for Thermal Compliance 39
Component Temperature Driven by Three Effects ASHRAE 3 Sources of Component Temperature: Self Heating Air Heating System Ambient 40
Boundaries for Thermal Management Design considerations include: Usage models Environmental conditions Rack & room level airflow protocols Component selection, location & configuration ASHRAE Design considerations must be evaluated against: Cost Performance Energy objectives 41
Component Packaging to Meet Thermal Requirements Processor Package: TIM 2, TIM 1, Case/IHS, Die, Substrate. Processor Package plus Cooling Components, with temperatures T_LA (Local Ambient), T_s (sink), T_c (case/IHS) and T_j (junction). Abbreviations: T = Temperature, TIM = Thermal Interface Material, IHS = Integrated Heat Spreader 42
System Considerations Good Thermal Design: Integrates Real time Optimization; Delivers PRECISELY the Performance Needed; Consumes the LOWEST Power while meeting Component Specifications; Incorporates the best acoustic performance without compromising power / performance. SHOW & TELL ASHRAE 43
Server Control Process & Sensing for Thermal Control SERVER CONTROL PROCESS (diagrammatic, with a Sensor Location Example): INPUTS (Temperature, Power, Activity, Fan Conditions) feed ALGORITHMS (Fan Speed Control; Power Management State, Traffic Limitation), which drive OUTPUTS (Fan Speeds, Performance Settings, Power Settings). ASHRAE Graphics Modified By DLB. Common for a 2 socket server to have more than 30 sensors 44
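A highly simplified sketch of the kind of fan speed control loop the diagram describes: read sensors, compute margin to component limits, set fan duty. The sensor names, limits, gains and duty range are illustrative assumptions, not any vendor's algorithm:

```python
# Sketch: pick fan speed from the worst-case margin to component temperature limits.
# Real servers blend many more sensors plus power/activity inputs; this is illustrative.

LIMITS_C = {"cpu0": 95, "cpu1": 95, "dimm": 85, "inlet": 45}   # assumed spec limits

def fan_duty(readings_c, min_duty=20, max_duty=100, span_c=15.0):
    """Map the smallest temperature margin onto a fan PWM duty cycle (%)."""
    margins = [LIMITS_C[name] - t for name, t in readings_c.items()]
    worst = min(margins)
    if worst <= 0:
        return float(max_duty)                    # at or over a limit: full speed
    # less margin -> more airflow, linear between max_duty and min_duty
    duty = max_duty - (worst / span_c) * (max_duty - min_duty)
    return max(float(min_duty), min(float(max_duty), duty))

print(f"{fan_duty({'cpu0': 78, 'cpu1': 74, 'dimm': 70, 'inlet': 27}):.0f}%")  # modest speed
print(f"{fan_duty({'cpu0': 93, 'cpu1': 90, 'dimm': 80, 'inlet': 35}):.0f}%")  # near limit
```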
Testing and Validation Classical Testing and Industry Standards: 1) Acoustics 2) Electromagnetic Compatibility 3) Shock and Vibrations 4) Environmental and Thermal Stress 5) Volatile Organic Compounds 6) Product Safety 45
Testing and Validation Acoustics IBM IBM Emission Sound Pressure Level (Semi Anechoic Chamber) Sound Power Level (Reverberation Room) Measurements in accordance with ISO 7779: Measurement of airborne noise emitted by information technology and telecommunications equipment 46
Testing and Validation Electromagnetic Compatibility IBM IBM Radiated & Conducted Emissions (10 meter Semi Anechoic & OATS Chamber) Radiated Immunity (3 meter Semi Anechoic Chamber) Regulatory Compliance: Emissions Tests Immunity Tests 47
Testing and Validation Shock & Vibration Show & Tell 48
Testing and Validation Environmental / Thermal Stress Chamber IBM 49
Testing and Validation Environmental Chamber Verification of extremes of environmental envelope IBM IBM Worst case cabling configurations considered for testing 50
Testing and Validation Volatile Organic Compounds Nested Testing Strategy (VOCs, Ozone, Particulates) Test top model in each product line Perform thorough emissions analysis Pass remaining models on substantial equivalency Track Emissions Profiles Polymers IC Card substrate Supplies Dynamic Testing Targets Regulations/Standards Government Agency Guidelines Product Emissions Chamber (15 m x 8 m x 3 m) IBM 51
Testing and Validation Product Safety Product Safety Requirements driven by: 1) Legal/Regulatory Requirements Manufacturer based Marketing/Sales based Importer/Exporter based Customer based 2) Good Corporate Citizenship Process for Obtaining Certifications: 1) Early involvement in the design to assure the product will be compliant 2) Building block approach to certify safety critical components 3) Accreditation of local labs to carry out testing 52
ITE Environment ASHRAE Psychrometric Chart 2011 Classes A1 and A2 are EXACTLY the SAME as previous Classes 1 & 2: Apply to new AND legacy equipment New Classes A3 and A4 do NOT include legacy equipment 53
ITE Environment 2011 Environment Specifications Table (Partial) 54
Table of Contents EXECUTIVE SUMMARY 1) Introduction 2) Survey of Maximum Temperature Ratings 3) Cooling Design of Networking Equipment 4) Equipment Power and Exhaust Temperatures 5) Environmental Specifications 6) Reliability 7) Practical Installation Considerations 8) ASHRAE TC9.9 Recommendations 9) Summary 10) References APPENDIX A: DEFINITION OF ACRONYMS AND KEY TERMS APPENDIX B: ACOUSTICS APPENDIX C: TOUCH TEMPERATURE Data Center Networking Equipment Issues and Best Practices Whitepaper prepared by ASHRAE Technical Committee (TC) 9.9 Mission Critical Facilities, Data Centers, Technology Spaces, and Electronic Equipment 2013, American Society of Heating, Refrigerating and Air Conditioning Engineers, Inc. All rights reserved. This publication may not be reproduced in whole or in part; may not be distributed in paper or digital form; and may not be posted in any form on the Internet without ASHRAE's expressed written permission. Inquiries for use should be directed to publisher@ashrae.org ASHRAE TC 9.9 55
Networking Recommendations New networking equipment designs draw cooling air from the front face of the rack, with the air flow direction from the front of the rack to the rear, and the hot exhaust exiting the chassis at the rear face of the rack. This front to rear cooled equipment should be rated to a minimum of ASHRAE Class A3 [40 C (104 F)] and preferably ASHRAE Class A4 [45 C (113 F)]. The development of new products that do not adhere to a front to rear cooling design is not recommended. It is recommended that networking equipment, where the chassis does not span the full depth of the rack, have an air flow duct that extends all of the way to the front face of the rack. The equipment should be designed to withstand a higher inlet air temperature than the data center cooling supply air if: 1) the equipment is installed in an enclosed space that does not have direct access to the data center air cooling stream, or 2) the equipment has a side to side air flow configuration inside an enclosed cabinet. 56
Networking Recommendations (cont.) Networking equipment manufacturers should provide very specific information on the types of installations for which their equipment is designed. Users should follow the manufacturer installation recommendations carefully. Any accessories needed for installation, such as ducting, should either be provided with the equipment or should be readily available. By following these recommendations, the risk of equipment overheating can largely be avoided and the compatibility of networking equipment with other types of equipment in rack and data center level solutions will be significantly improved. 57
Common Air Flow & Mechanical Design Configurations Side View of Rack (side panel of rack and chassis removed) Front View of Rack Large Switch, Typically Full & Half Size Rack with Front to Rear Air Flow 58
Common Air Flow & Mechanical Design Configurations (cont.) Side View of Rack (side panel of rack and chassis removed) Front View of Rack Mid Size Switch with Side to Side Air Flow 59
Common Air Flow & Mechanical Design Configurations (cont.) pp Top View of Equipment (top panel of chassis removed) Front View of Equipment (ports can sometimes also be in rear) Small Networking Equipment with S Shaped Air Flow (other networking equipment diagrams are shown in the whitepaper) 60
Part 4: Hardware Requirements, Discoveries & Concerns 61
IT Equipment Environment Envelope Definitions RECOMMENDED The purpose of the recommended envelope is to give guidance to data center operators on maintaining high reliability and also operating their data centers in the most energy efficient manner. ALLOWABLE The allowable envelope is where the IT manufacturers test their equipment in order to verify that the equipment will function within those environmental boundaries. PROLONGED EXPOSURE Prolonged exposure of operating equipment to conditions outside its recommended range, especially approaching the extremes of the allowable operating environment, can result in decreased equipment reliability and longevity. Occasional short term excursions into the allowable envelope MAY be acceptable. OPERATING AT COLDER TEMPERATURES WASTES ENERGY NEEDLESSLY! 62
ITE Environment 2011 Environment Specifications Table (Partial) 63
2011 ALLOWABLE Environmental Envelopes (IBM) High RH Effects: IT Reliability (Enhanced Corrosion). High Operating Temp. Effects: IT Reliability, DC Airflow, Transient Response. Low RH Effects: IT Reliability (ESD). 64
Ambient Inlet Temperature Impacts Power Increase (Extrapolation From Graph)
Power Increase Due to Temperature | 20°C (68°F) | 25°C (77°F) | 27°C (80,6°F) | 32°C (89,6°F) | 35°C (95°F)
Lowest | 1.00 | 1.00 | 1.02 | 1.04 | 1.06
Highest | 1.02 | 1.03 | 1.04 | 1.11 | 1.20
Airflow and total power increase with temperature. Fan power required increases with the cube of fan speed (rpm). Total power increase includes both fan and component power. 65
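The cube-law relationship quoted above can be sketched directly. The baseline fan fraction and the speed increases below are assumptions for illustration (real servers also see component power rise with temperature through leakage, which this sketch holds constant):

```python
# Sketch: fan affinity laws -- airflow ~ rpm, fan power ~ rpm^3.
# Assumed baseline: fans are 4% of total ITE power at the lower inlet temperature.
base_total_w = 500.0
base_fan_w = 0.04 * base_total_w

for speed_ratio in (1.0, 1.2, 1.5):            # fan rpm relative to baseline
    fan_w = base_fan_w * speed_ratio ** 3      # cube law
    total_w = (base_total_w - base_fan_w) + fan_w   # component power held constant
    print(f"rpm x{speed_ratio}: fan {fan_w:.0f} W, total x{total_w / base_total_w:.3f}")
```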
IT Equipment Airflow Trends IT equipment AIRFLOW demands are increasing PROPORTIONAL to POWER. This SIGNIFICANT increase in airflow demand has not been obvious (many have been caught off guard). How fast will end users and facility operators discover this AIRFLOW INCREASE? Significant airflow perturbation exists when HPC applications are initialized (can be on the order of >1,000 CFM / rack). 66
Data Center Air Flow Problem? Increase Aisle Pitch? Change Vent Type? Reduce Server Count? Front Rack Plenum? Liquid Cooling? Rear Door Heat Exchanger? WHAT DO YOU DO? Add Row Cooling to Supplement? Passive Chimney? Active Chimney? 67
How to Choose? Choice depends largely on facility design: 1) Existing or new facility, rack density, redundancy, flexibility, etc. 2) Lifecycle cost (TCO) is essential 3) Likely to use more than one approach Separate high and low density. Scale cooling resources as you scale demand. 68
Close Coupled Cooling HEAT EXCHANGER HEAT EXCHANGER RACK RACK IBM Rear Door Heat Exchanger (Side View) RACK RACK RACK HEX RACK RACK RACK Overhead Heat Exchanger (Side View) IBM In Row Heat Exchanger (Top View) IBM 69
Impact of Additional Hardware Connected to a Rack (IBM): Side View of Rack and Blowup of Rack & Server showing the series of pressure drops (dp1 through dp5) across the rack cover, server and additional hardware. 70
Bypass Air Internal to Rack: Side View of Rack and Blowup of Servers, illustrating internal bypass air paths when one server is OFF while the others run at 100%. 71
Multi Vendor Pressure Testing (ASHRAE; 2011-2013 products: current generation 1U, 2U and blade servers plus a last generation 1U): chart of airflow change (CFM %, -60% to +100%) vs. external pressure (0.3 inches H2O inhibiting flow to 0.3 inches H2O aiding flow). * Component temperatures essentially constant over the range of pressures. 72
Server Delta T Reduction at Increased Operating Temperature (IBM charts showing 58°C and 60°C Effective Ceilings). 73
Higher Temperature Impacts on Data Center Operation (IBM): Hardware Failure Rate for Volume Servers, expressed as an X-Factor (approximately 0.6 to 2.0) vs. Inlet Temperature from 15°C (59°F) to 45°C (113°F). 74
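A minimal sketch of how an X-factor curve like the one above is typically applied: weight each inlet temperature bin's relative failure rate by the hours spent there over a year. The bin hours and X-factor values below are illustrative placeholders, not the IBM data behind the chart:

```python
# Sketch: time-weighted average X-factor for a varying inlet temperature profile.
# (temperature_C, hours_per_year, x_factor) -- placeholder values for illustration.
bins = [
    (17.5, 2000, 0.90),
    (20.0, 3000, 1.00),
    (22.5, 2500, 1.10),
    (25.0, 1000, 1.20),
    (27.5,  260, 1.30),
]

total_hours = sum(h for _, h, _ in bins)                      # 8760
weighted_x = sum(h * x for _, h, x in bins) / total_hours
print(f"{total_hours} h, time-weighted X-factor = {weighted_x:.2f}")
```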
Higher Temperature Impacts on Data Center Operation (cont.) (IBM diagrams: Hot Aisle Containment, Cold Aisle Containment and Direct Rack Exhaust, each with CRAHs and racks). Impacts to Air Flow & Temperature Inlet to Servers: Loss of utility power resulting in all CRAHs being off until generator power engages; Loss of chilled water to some or all the CRAHs; Airside economizer room where inlet air temperature goes above ASHRAE recommended by design. 75
Raw Data Example Insulating floor/shoe: large voltages, accumulation (who sits up 10 times?). ESD mitigating floor/shoe: much lower voltages, much quicker discharge. ASHRAE 76
Definition of Event Voltage (taking off and dropping a sweater) ASHRAE 77
Higher Temperature Impacts on Data Center Operation (cont.) ANSI / ESD STM 97.2 Test Procedure: the induced voltage on the body of a person is measured and recorded while walking. Test setup (IBM): recording device, charge plate monitor, and 91 cm x 91 cm support material; a stainless steel, brass or copper electrode (or wrist strap) can be used. 78
Main Experiment 1) Well defined walking experiment Awareness Experiments 2) Random walking and scraping 3) Sweater taking off and drop 4) Sitting up from chairs 5) Cart experiment The test setup for the random walking and scraping experiment is similar to the well defined walking experiment; the person walks randomly and fast, sometimes scraping his feet. The principal setup of the human walking test is in accordance with ANSI/ESD S20.20. 79
Low RH Effects on Data Center Operation: test results at 15°C (59°F) and 15% RH for combinations of floor materials and shoes (3M 4530 Asia, 3M China Slip On, 3M 6432 Cond, 3M Full Sole, 3M 8413 Running Shoe, 3M Green Diss, DESCO 2 Meg Sole, 3M Low Diss, DESCO Heel, 3M Thin VPI, DESCO Full Sole, 2 Meg Epoxy, 1A Hush Puppy, Flexco Rubber, Red Wing, HPL F, Stat a Rest, HPL N, Sperry, Korean Vinyl, Heel Strap, Standard Tile Wax). IBM 80
Probability of an ESD Related Risk In order to derive the relative rate of ESD related failures, it is necessary to obtain the probability of observing a voltage above a certain value. Three threshold values have been selected: 500 V (Service Test Limit), 4,000 V and 8,000 V (Operational Test Limits). 81
ESD Risks at the Three Threshold Values (ASHRAE, Pattern Walking)
Cumulative Probability P (V > V0) with ESD Floors and ESD Shoes (BEST)
Environmental Condition | V0 = 500 V | V0 = 4,000 V | V0 = 8,000 V
45% RH at 80.6°F (27°C) | 1.47e-9% | 1.69e-17% | 3.82e-20%
25% RH at 80.6°F (27°C) | 9.74e-3% | 3.05e-7% | 9.61e-9%
8% RH at 80.6°F (27°C) | 3.76e-4% | 6.80e-10% | 8.30e-12%
Cumulative Probability P (V > V0) with Non ESD Floors and Non ESD Shoes (WORST)
Environmental Condition | V0 = 500 V | V0 = 4,000 V | V0 = 8,000 V
45% RH at 80.6°F (27°C) | 4.7% | 0.013% | 0.0018%
25% RH at 80.6°F (27°C) | 23% | 1.13% | 0.27%
8% RH at 80.6°F (27°C) | 48.8% | 2.28% | 0.43%
Cumulative Probability P (V > V0) with ESD Floors and Non ESD Shoes
Environmental Condition | V0 = 500 V | V0 = 4,000 V | V0 = 8,000 V
45% RH at 80.6°F (27°C) | 0.15% | 7.44e-9% | 1.17e-11%
25% RH at 80.6°F (27°C) | 5.8% | 7.14e-9% | 2.12e-8%
8% RH at 80.6°F (27°C) | 12.2% | 2.38e-4% | 3.01e-7%
82
Dust Can Degrade Computer Reliability Dust is everywhere. Even with our best filtration efforts, fine dust will be present in a data center and will settle on electronic hardware. Dust settled on printed circuit boards can lead to electrical short circuiting of closely spaced features by absorbing water, getting wet and, thus, becoming electrically conductive. Dust can enter electrical connectors causing: Power connections to overheat, and Signal connections to become electrically noisy. IBM 83
Higher Temperature Impacts on Data Center Operation (cont.) Dust Contaminants & Relative Humidity (IBM chart of corrosion vs. no corrosion regions). Each salt contaminant has a deliquescent relative humidity above which the salt will absorb moisture, becoming wet and electrically conductive. Wet salts can corrode metals. 84
Gaseous Pollutants/RH Effects http://www.atmos-chem-phys.net/14/1929/2014/acp-14-1929-2014.pdf IBM 85
Two Common Present Failure Modes Due to Gaseous Contamination 1) Copper creep corrosion on circuit boards 2) Corrosion of silver metallization in miniature surface mounted components Sulfur bearing gases are mainly responsible for the corrosion related hardware failures. 86
ISA 71.04 1986 Classified Corrosion into 4 Severity Levels
ISA S71.04 Corrosion Severity Levels: Level | Severity | Corrosion Rate
G1 | Mild | < 300 Å / month
G2 | Moderate | < 1,000 Å / month
G3 | Harsh | < 2,000 Å / month
GX | Severe | > 2,000 Å / month
The ASHRAE white paper added Silver Coupons; now updated to include a silver corrosion rate < 200 Å / month and a copper corrosion rate < 300 Å / month. 1 angstrom (Å) is a unit of length equal to 0.1 nanometer or 1x10^-10 meters. ASHRAE Whitepaper, "Gaseous and Particulate Contamination Guidelines for Data Centers" http://tc99.ashraetcs.org 87
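A minimal sketch of classifying measured coupon corrosion rates against the severity levels in the table; the combined copper/silver check uses the updated limits quoted on this slide (copper < 300 and silver < 200 Å/month):

```python
# Sketch: ISA-71.04 style severity classification from coupon corrosion rates.
def copper_severity(angstrom_per_month):
    """Map a copper coupon corrosion rate onto the G1..GX severity levels."""
    if angstrom_per_month < 300:
        return "G1 (Mild)"
    if angstrom_per_month < 1000:
        return "G2 (Moderate)"
    if angstrom_per_month < 2000:
        return "G3 (Harsh)"
    return "GX (Severe)"

def meets_g1(copper_rate, silver_rate):
    """G1 check using copper < 300 and silver < 200 angstroms/month."""
    return copper_rate < 300 and silver_rate < 200

print(copper_severity(250))                          # G1 (Mild)
print(meets_g1(copper_rate=250, silver_rate=180))    # True
```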
ASHRAE Recommendations for Controlling Particulate and Gaseous Contamination Data center managers must do their part in maintaining hardware reliability by monitoring, preventing and controlling the particulate and gaseous contamination in their data centers. Data centers must be kept clean to ISO 14644 1 Class 8. This level of cleanliness can generally be achieved by an appropriate filtration scheme as outlined here: 1) The room air may be continuously filtered with MERV 8 filters as recommended by ANSI / ASHRAE Standard 127 2007, Method of Testing for Rating Computer and Data Processing Room Unitary Air Conditioners. 2) Air entering a data center may be filtered with MERV 11 or MERV 13 filters as recommended by the 2008 ASHRAE book Particulate and Gaseous Contamination in Datacom Environments. Sources of dust inside data centers should be reduced. Every effort should be made to filter out dust that has deliquescent relative humidity greater than the maximum allowable relative humidity in the data center. The gaseous contamination should be within the ANSI / ISA 71.04 2013 severity level G1 that meets: 1) A copper reactivity rate of less than 300 angstroms / month, and 2) A silver reactivity rate of less than 300 angstroms / month. For data centers with higher gaseous contamination levels, gas phase filtration of the inlet air and the air in the data center is highly recommended. 88
Part 5: Facilities Air Cooling Architecture 89
Facilities Air Cooling Architecture Air Cooled System Definition: 1) Air is supplied to the inlets of the rack for convection cooling of the heat rejected by the components of the IT equipment within the rack. 2) Within the rack itself, the transport of heat from the actual source component (for example, a processor) can be either liquid or air based. 3) The heat rejection media from the rack to the terminal cooling device outside of the rack is air. Liquid Cooled System Definition: 1) Liquid (for example, water usually above the dew point) is channeled to the actual heat producing IT equipment components. 2) The liquid is used to transport heat from those components and is rejected via a heat exchanger (air to liquid or liquid to liquid) or extended to the terminal cooling device outside of the rack. 90
Facilities Air Cooling Architecture 91
Facilities Air Cooling Architecture 92
Facilities Air Cooling Architecture 93
Facilities Air Cooling Architecture Raised Floor Implementation most commonly found in Data Centers using CRAC Units 94
Facilities Air Cooling Architecture Raised Floor Implementation using Building Air from a Central Plant 95
Facilities Air Cooling Architecture Raised Floor Implementation using 2 Story Configuration with CRAC Units on the Lower Floor 96
Facilities Air Cooling Architecture Overhead Cooling Distribution commonly found in Central Office Environments 97
Facilities Air Cooling Architecture Raised Floor Implementation using a Dropped Ceiling as a Hot Air Return Plenum 98
Facilities Air Cooling Architecture Raised Floor Implementation using Panels to Limit Air Mixing by Containing the Cold Aisle Supply 99
Facilities Air Cooling Architecture Raised Floor Implementation using Panels to Limit Air Mixing by Containing the Hot Aisle Exhaust 100
Facilities Air Cooling Architecture Raised Floor Implementation using Inlet & Outlet Plenums / Ducts Integral to the Rack 101
Facilities Air Cooling Architecture Raised Floor Implementation using Outlet Plenums / Ducts Integral to the Rack 102
Facilities Air Cooling Architecture Local Cooling Distribution using Overhead Cooling Units mounted to the Ceiling above the Cold Aisle 103
Facilities Air Cooling Architecture Local Cooling Distribution using Overhead Cooling Units mounted to the Ceiling above the Hot Aisle 104
Facilities Air Cooling Architecture Local Cooling Distribution using Overhead Cooling Units mounted to the Tops of the Racks 105
Facilities Air Cooling Architecture Local Cooling via Integral Rack Cooling Units on the Exhaust Side of the Rack 106
Facilities Air Cooling Architecture Local Cooling via Integral Rack Cooling Units on the Inlet Side of the Rack 107
Facilities Air Cooling Architecture Local Cooling Units Interspersed within a Row of Racks 108
Show & Tell 109
Facilities Air Cooling Architecture Air Cooled System Definition: 1) Air is supplied to the inlets of the rack for convection cooling of the heat rejected by the components of the IT equipment within the rack. 2) Within the rack itself, the transport of heat from the actual source component (such as a processor) can be either liquid or air based. 3) The heat rejection media from the rack to the terminal cooling device outside of the rack is air. Liquid Cooled System Definition: 1) Liquid (such as water usually above the dew point) is channeled to the actual heat producing IT equipment components. 2) The liquid is used to transport heat from those components and is rejected via a heat exchanger (air to liquid or liquid to liquid) or extended to the terminal cooling device outside of the rack. 110
Part 6: Liquid Cooling 111
Overview of ASHRAE s Liquid Cooling Guidelines Chapter 1 Introduction Chapter 2 Facility Cooling Systems Chapter 3 Facility Piping Design Chapter 4 Liquid Cooling Implementation for Datacom Equipment Chapter 5 Liquid Cooling Infrastructure Requirements for Chilled Water Systems Chapter 6 Liquid Cooling Infrastructure Requirements for Technology Cooling Systems Appendix 112
Liquid Cooling Book Contributors APC Aavid Cray Inc. Dell Computers Department of Defense DLB Associates Consulting Engineers EYPMCF Lytron Mallory & Evans Inc. NCR Panduit Rittal Sanmina SGI Hewlett Packard IBM Intel Corporation Lawrence Berkeley National Labs Liebert Corporation 113 Spraycool Syska & Hennessy Group Inc. Sun Microsystems Trane
Second Edition The second edition now includes the following changes: 1) Chilled Water System (CHWS) has been replaced with Facility Water System (FWS), in recognition that a chiller is NOT always a requirement for delivering cooling water to a datacom facility. 2) Discussions on approach temperatures have been added to Chapter 2 3) Discussion on Liquid Immersion Cooling (section 4.3) 4) In Chapter 5, the Facility Water System requirements now refer to classes W1 through W5 (section 5.1.1) 5) The guidance on water quality problems and wetted material requirements in Chapter 5 has been updated (sections 5.1.2.4 and 5.1.2.5) 6) A discussion on liquid cooling for NEBS compliant spaces has been added to Chapter 5 114
Immersion Cooling Implementation Open/semi open bath immersion cooling of an array of servers in a tank of dielectric fluid that cools the board components via vaporization. 1) Dielectric vapor is condensed using a bath level water cooled condenser. Open/semi open bath immersion cooling of an array of servers in a tank of mineral oil that cools the board components through natural and forced convection. 1) The pumped oil is cooled using an oil to water heat exchanger. Sealed immersion cooling of individual servers or components with refrigerants or other suitable dielectric fluids that cool components via vaporization. 1) Vapor generated is condensed using a server or rack level water cooled condenser. Sealed immersion cooling of individual servers or components with rack level dielectric fluids or mineral oil via sensible (that is, single phase) heat transfer. 1) The pumped dielectric fluid is cooled using a dielectric fluid to water heat exchanger. 115
Liquid Cooling Overview Heat Transfer Rate: 35 kW (10 Tons); ΔT: 6.7°C (12°F)
Heat Transfer Medium | Fluid Flow Rate | Conduit Size | Theoretical Power
Forced Air | 15.700 m³/hr (9.200 cfm) | 86 cm ɸ (34 in.) | 2.71 kW (3.63 hp)
Water | 1.26 L/s (20 gpm) | 5 cm ɸ (2 in.) | 0.18 kW (0.25 hp)
Water and other liquids (dielectrics, glycols and refrigerants) may be used for Datacom Equipment heat removal. 1) Heat rejection with liquids typically uses LESS transport energy (14.36 Air to Water power ratio for the example above). 2) Liquid to liquid heat exchangers have closer approach temps than Liquid to air (coils), yielding increased economizer hours. 116
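The flow rates in the table follow from q = ṁ·cp·ΔT. A minimal sketch using standard fluid properties (assumed values near room temperature, not from the book) reproduces them approximately:

```python
# Sketch: flow needed to move 35 kW at a 6.7 C rise, for forced air vs. water.
Q_W, DT_C = 35_000.0, 6.7

# Assumed fluid properties near room temperature
RHO_AIR, CP_AIR = 1.19, 1006.0        # kg/m^3, J/(kg*K)
RHO_H2O, CP_H2O = 998.0, 4186.0       # kg/m^3, J/(kg*K)

air_m3_s = Q_W / (RHO_AIR * CP_AIR * DT_C)          # volumetric flow of air
water_l_s = Q_W / (RHO_H2O * CP_H2O * DT_C) * 1000  # volumetric flow of water

print(f"Air:   {air_m3_s * 3600:,.0f} m3/hr ({air_m3_s / 0.000471947:,.0f} cfm)")
print(f"Water: {water_l_s:.2f} L/s ({water_l_s * 15.85:.0f} gpm)")
```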
Liquid Cooling ASHRAE 2011 Thermal Guidelines (ASHRAE Table reformatted by DLB Associates)
Liquid Cooling Class | Typical Infrastructure Design: Main Cooling Equipment | Supplemental Cooling Equipment | Facility Supply Water Temp.
W1 | Chiller / Cooling Tower | Water side Economizer (cooling tower or drycooler) | 2-17°C (36-63°F)
W2 | Chiller / Cooling Tower | Water side Economizer (cooling tower or drycooler) | 2-27°C (36-81°F)
W3 | Cooling Tower | Chiller | 2-32°C (36-90°F)
W4 | Water side Economizer (cooling tower or drycooler) | N/A | 2-45°C (36-113°F)
W5 | Building Heating System | Cooling Tower | > 45°C (> 113°F)
117
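A small sketch mapping a facility supply water temperature onto the class upper limits in the table above; this is an illustrative reading of the temperature column only (it ignores the 2°C lower bound and the infrastructure columns, which also drive class selection):

```python
# Sketch: narrowest ASHRAE liquid-cooling class whose upper limit covers a given
# facility supply water temperature (degrees C).
CLASS_UPPER_LIMIT_C = [("W1", 17), ("W2", 27), ("W3", 32), ("W4", 45)]

def water_class(supply_temp_c):
    for name, upper in CLASS_UPPER_LIMIT_C:
        if supply_temp_c <= upper:
            return name
    return "W5"        # warmer than 45 C

print(water_class(15), water_class(30), water_class(50))   # W1 W3 W5
```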
Liquid Cooling Typical Infrastructures for Data Centers Class W1, W2, W3 Class W4 Class W5 ASHRAE 118
Definitions: Liquid Cooling Systems / Loops Within a Data Center 119
Definitions: Direct Liquid Cooling Solutions Direct Water Cooling IBM 120
Closed Air Cooled Datacom Equipment in a Liquid Cooled Cabinet 121
Definitions: Rack Level Liquid Cooling Solutions HEAT EXCHANGER HEAT EXCHANGER RACK RACK IBM Rear Door Heat Exchanger (Side View) IBM Overhead Heat Exchanger (Side View) RACK RACK RACK HEX RACK RACK RACK IBM In Row Heat Exchanger (Top View) 122
Definitions: Coolant Distribution Unit (CDU) (IBM diagram: heat exchanger, pumps, expansion tank, mixing valve, customer side supply / return, rack) Coolant Distribution Unit (CDU) Buffers Computer System Water from Customer's Facility Water: 1) Facility's water does not meet system requirements (flow, pressure, temperature, cleanliness and quality) 2) Facility's water volume is comparatively large; a leak could be a disaster 3) Clear demarcation between customer and computer (responsibility, control) 123
S/360 Model 91 Introduced 1966 (IBM Installation at Columbia University) Deployed intra-board air to water heat exchangers to remove the heat. Total heat load of System/360 Model 91 was 79 kW. Coolant Distribution Unit supplied distilled water. 50% of the heat load went to water. IBM intra-board air to water heat exchanger 124
Why Water Cooling (vs. Air Cooling)? Water Advantages: 1) Order of magnitude lower unit thermal resistance 2) 3500X heat carrying capacity 3) Total control of the flow 4) Lower temperature Higher clock frequency Less power (leakage current) Better reliability 5) Less power consumption in the Data Center Greater Performance Greater Efficiency Better Quality 6) Less heat to transfer to outside ambient 7) Less or no computer room air handlers Water Disadvantages: 1) Added complexity 2) Added cost (but not necessarily cost/performance) 3) The perception of water cooling 125
Comparison: Water Cooling vs. Air Cooling (IBM) Water Cooling (copper cold plate): High Heat Capacity (~ 4200 kJ/m³·K). Air Cooling (aluminum frame): Low Heat Capacity (~ 1,15 kJ/m³·K). 126
Distribution of Chilled Water Modular Liquid Cooling Unit Rack Rack Rack 127
Combination Air and Liquid Cooled Rack or Cabinet with CDU 128
Multiple Modular CDU Units in Data Center 129
Single Data Center Level CDU IBM IBM Redundant Plate & Frame Heat Exchanger Redundant Pumps 130
Data Center Energy Strategies Which power saving strategies have you deployed / plan to deploy in the next 12 months?
Strategy | LARGE Companies | SMALL Companies
Cold aisle / hot aisle containment | 78% | 62%
Raising inlet air temperatures | 75% | 42%
Detailed power monitoring, benchmarking improvement | 55% | 42%
Power management features on servers | 44% | 43%
VFDs on chillers, CRAH, or pumps | 62% | 21%
Modular data center design (smaller floor plans, modular components, etc.) | 33% | 24%
Air side economization | 36% | 19%
Water side economization | 31% | 10%
Liquid cooling | 19% | 11%
Direct Current power | 8% | 5%
131
Closing Comments IT hardware has been REALLY CHANGING and EVOLVING in response to customer needs & demands. IT loads can be very difficult to predict. Some considerations include: 1) Maximum load is a hardware driven metric. 2) Average load or Maximum Operating load is a software driven metric. 3) Software workload is really a moving target (ranges from idle to maximum operating load). 4) Idle loads can be as low as 25% of the maximum operating load (4 to 1 turndown). This makes for an infinite number of combinations and an increased risk of overgeneralizing. Overconfidence through generalization (avoiding the detail) creates high stakes, and it is important for us to understand the risk. EXPERTS WITH IN DEPTH ASHRAE EXPERIENCE HELP MITIGATE THE RISK. 132
Questions & Contact Information QUESTIONS? Former CHAIRS of ASHRAE Technical Committee TC 9.9 Don Beaty DLB Associates Email: dbeaty@dlbassociates.com Roger Schmidt IBM Email: c28rrs@us.ibm.com ASHRAE TC 9.9 Website www.tc99.ashraetcs.org 133
TC 9.9 Datacom Book Series 1) Thermal Guidelines for Data Processing Environments, 3rd Edition (2012) 2) Datacom Equipment Power Trends & Cooling Applications, 2nd Edition (2012) 3) Design Considerations for Datacom Equipment Centers (2006) 4) Liquid Cooling Guidelines for Datacom Equipment Centers, 2nd Edition (2014) 5) Structural & Vibration Guidelines for Datacom Equipment Centers (2008) 6) Best Practices for Datacom Facility Energy Efficiency (2008) 7) High Density Data Centers Case Studies & Best Practices (2008) 8) Particulate & Gaseous Contamination in Datacom Environments, 2nd Edition (2013) 9) Real Time Energy Consumption Measurements in Data Centers (2009) 10) Green Tips for Data Centers (2011) 11) PUE: A Comprehensive Examination of the Metric (2014) 134
Evaluation and Certificate Please fill out the course evaluation form and return it to the instructor. You will receive your Certificate of Attendance when you complete the evaluation form. NOTE: You must submit your license numbers to Kelly Arnold (karnold@ashrae.org) within 5 days after the course date to ensure you receive the proper continuing education credit. If you have any questions about ASHRAE courses, please contact Martin Kraft, Managing Editor, at mkraft@ashrae.org 135
ASHRAE Career Enhancement Curriculum Program Expand your knowledge of IAQ and Energy Savings Practices through a select series of ASHRAE Learning Institute courses Receive up to date instruction on new technology from industry experts Gain valuable HVAC knowledge Accelerate your career growth Receive a certificate for successful completion of the course series Visit www.ashrae.org/careerpath to learn more. 136
ASHRAE Professional Certification Do you want to stand out from the crowd? Become ASHRAE certified. ASHRAE certification serves as a springboard for your continued professional development. Assure employers and clients that you have mastered the body of knowledge that subject matter experts have identified as reflecting best practices. Please visit the following URL to learn more about our programs: www.ashrae.org/certification o o o o o o Building Energy Assessment Professional Building Energy Modeling Professional Commissioning Process Management Professional Healthcare Facility Design Professional High Performance Building Design Professional Operations & Performance Management Professional 137