Managing Cooling Capacity & Redundancy In Data Centers Today
About AdaptivCOOL
15+ years of thermal and airflow expertise
Global presence: U.S., India, Japan, China
Standards & compliance: ISO 9001:2008, RoHS compliant, JIG 101 compliant, REACH compliant
Environments: Enterprise, Design/Build, Colocation, Managed Services, Containerized Environments
Industries: Financial/Insurance, Healthcare, Colocation/Hosting-Cloud, Telecommunications/Media, Manufacturing
About Electronic Environments Corp.
Over 25 years of providing critical facility services to data center and telecom facilities of all sizes, including:
Engineering & Construction Services
Data Center Power & Cooling Assessments
Data Center Cooling & Energy Efficiency Solutions
24x7 Maintenance & Repair Services for Critical Power & Cooling Systems
Agenda
The Data Center Landscape
Why Capacity Visibility Is So Important
Knowing Your IT Capacity and Gaining Visibility
Increasing Your IT Cooling Capacity
The Truth About Redundancy Management
Redundancy and IT Capacity Case Study
The Data Center Landscape
High-density IT equipment stresses the cooling capacity of many data centers.
The continued proliferation of IT equipment can lead to unexpected problems with power and cooling infrastructure, including overheating, overloads, and loss of redundancy.
Heat load and cooling resources must be matched at both the rack and room level to use the cooling infrastructure efficiently.
Data Center Challenges
Managing the data center for today and the future can be a formidable task.
Knowing your infrastructure capacities, and how to manage them, is paramount to success.
How much IT capacity is available while maintaining redundancy?
Space / Power / Cooling
Cooling Visibility
Data center managers today need more granular data for thermal and environmental management.
Many variables reduce cooling capacity.
Additional IT load stresses cooling resources and amplifies:
Uneven power distribution
Thermal hot spots
Loss of cooling redundancy and availability
What Data Centers Struggle With
Power and space capacity are relatively easy to define and measure.
Cooling capacity depends on airflow, which cannot be seen:
It is sensitive to small-scale changes.
Lost capacity accumulates without warning as the IT configuration evolves.
The server environment is dynamic.
30% improvement in infrastructure efficiency from improved airflow management
Why Cooling Capacity Management Is Important
IT expansion capability
Staying within redundancy limits
Knowing the remaining service life of the data center
Knowing when to expand or build new
Controlling operating costs
Keeping your IT thermally safe
Overall peace of mind
How Do We Gain Visibility into Our Data Center Capabilities?
PDU Monitoring
Server Monitoring
Thermal Monitoring
Building Management System (BMS)
Computational Fluid Dynamics (CFD)
Thermal Imaging
Cooling Resource Manager
Environmental Management Services
PDU & Server Monitoring
PDU Monitoring
First step in data center performance monitoring
IP-addressable PDUs allow monitoring from one machine
Automated alerting
Parameters: amps, temperature, humidity
Server Monitoring
Software-based server management
Hardware asset inventory and health reporting
Controls hundreds of servers at once (powering on/off based on computational load)
Predictive failure analysis and alerting
View sensor and system event logs
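The automated alerting described above amounts to comparing each polled reading against limits. The sketch below assumes the readings have already been collected from the PDUs (for example over SNMP). The `PduReading` type, the `LIMITS` values, and the `check` function are hypothetical illustrations, not any vendor's API.

```python
from dataclasses import dataclass


@dataclass
class PduReading:
    """One polled sample from an IP-addressable PDU (hypothetical structure)."""
    pdu_id: str
    amps: float
    temp_f: float
    humidity_pct: float


# Hypothetical alert limits as (low, high); replace with your own circuit
# ratings and site policy.
LIMITS = {
    "amps": (0.0, 16.0),          # e.g. 80% of a 20 A branch circuit
    "temp_f": (64.4, 80.6),       # 18-27 C, the ASHRAE recommended inlet range
    "humidity_pct": (20.0, 80.0),
}


def check(reading: PduReading) -> list[str]:
    """Return one alert message per parameter that is outside its limits."""
    alerts = []
    for name, (low, high) in LIMITS.items():
        value = getattr(reading, name)
        if not low <= value <= high:
            alerts.append(f"{reading.pdu_id}: {name}={value} outside [{low}, {high}]")
    return alerts


print(check(PduReading("pdu-a1", amps=17.2, temp_f=75.0, humidity_pct=45.0)))
# prints ['pdu-a1: amps=17.2 outside [0.0, 16.0]']
```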
Thermal & BMS Monitoring
Thermal Monitoring
Real-time visibility
Accuracy depends on the number of sensors
Great for the proactive manager
Parameters: temperature, humidity, pressure
BMS Monitoring
Limited capacity
Complex to implement
Does not natively support SNMP
Allows historical trending
CFD Analysis & Thermal Imaging
CFD Analysis
Extremely powerful predictive analysis
Output quality depends on data gathering and CFD experience
Thermal Imaging
Point-in-time imaging
Finds leaks and air mixing
Relatively low cost
Limited to temperature analysis
Polling Question: Have you had a CFD study done on your data center? A.) Yes B.) No, but we are considering it C.) No, don't plan to D.) Not Sure
The Missing Link: Real-Time Airflow & Cooling Resource Management
HotSpotr Air-Movers
HT-510T Under-Floor Air-Mover
Thermostatically controlled
Supplied with two temperature sensors
Networked to AdaptivCOOL's monitoring and control system
Fits under a standard-size perforated tile
HT-710 Overhead Air-Mover
Overhead installation
Returns hot air directly to the CRAC intake
Thermostatically controlled
Supplied with two temperature sensors
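The slide states that the air-movers are thermostatically controlled from two temperature sensors, but it does not say how. One common way to implement this kind of control is an on/off thermostat with a deadband (hysteresis) that acts on the hotter of the two sensors, and the sketch below shows that technique. The class name, the 80°F setpoint, and the 3°F deadband are assumptions for illustration. This is not AdaptivCOOL's control logic.

```python
class ThermostaticFan:
    """On/off fan control with hysteresis, driven by two temperature sensors.

    Hypothetical sketch: setpoint and deadband are illustrative values.
    """

    def __init__(self, on_above_f: float = 80.0, deadband_f: float = 3.0):
        self.on_above_f = on_above_f
        self.off_below_f = on_above_f - deadband_f
        self.running = False

    def update(self, sensor_a_f: float, sensor_b_f: float) -> bool:
        # Act on the hotter sensor so a local hot spot is not averaged away.
        hottest = max(sensor_a_f, sensor_b_f)
        if not self.running and hottest > self.on_above_f:
            self.running = True
        elif self.running and hottest < self.off_below_f:
            self.running = False
        # Between the two thresholds the fan keeps its current state,
        # which prevents rapid on/off cycling.
        return self.running


fan = ThermostaticFan()
print([fan.update(a, b) for a, b in [(76, 79), (78, 81), (77, 79), (74, 76)]])
# [False, True, True, False]
```

The third reading (79°F) is below the turn-on point but above the turn-off point, so the fan stays on. Without the deadband it would switch off and could cycle rapidly.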
Demand Based Cooling Components
Engineered Thermal Solution
Recommended Solution
Install networked under-floor fan tiles in key locations
Install under-floor air velocity reducers
Place a CRAC unit in hot standby
Redistribute perforated tiles
[Figure: Options 1, 2, and 3]
Cooling Resource Management
Real-time environmental interface
Rack and room level
Automated alerting
Easy to implement, no downtime needed
Parameters: temperature, humidity, pressure
Unlike other monitoring solutions, cooling resource management reports in real time and automatically adjusts the cooling systems to adapt to changing conditions.
Cooling Resource Manager
Easy-to-use dashboard
Real-time environmentals
Zone-of-influence cooling management
CRAC shedding (energy savings)
CRAC failure management
Web-enabled
Safety and redundancy: alarms for specified out-of-normal conditions
Automatic restarts, when needed, of any CRACs placed in reserve
Interfaces to BACnet, LonWorks, SNMP, and others
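CRAC shedding and automatic restarts of reserve units can be pictured as a simple supervisory loop. The sketch below is a deliberately simplified, hypothetical version. It reacts only to the hottest rack-intake temperature in the room, whereas the product described here works with zones of influence. The class, thresholds, and minimum-running count are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class CracSupervisor:
    """Hypothetical shed/restart logic for CRAC units (room level only)."""
    running: list[str]
    reserve: list[str] = field(default_factory=list)
    min_running: int = 1           # never shed below this many units
    shed_below_f: float = 72.0     # room is comfortably cool: shed a unit
    restart_above_f: float = 80.0  # hot spot or failure: bring a unit back

    def step(self, max_intake_f: float) -> str:
        """Decide one action from the hottest measured rack-intake temperature."""
        if max_intake_f > self.restart_above_f and self.reserve:
            unit = self.reserve.pop(0)
            self.running.append(unit)
            return f"restart {unit}"
        if max_intake_f < self.shed_below_f and len(self.running) > self.min_running:
            unit = self.running.pop()
            self.reserve.append(unit)
            return f"shed {unit}"
        return "hold"


sup = CracSupervisor(running=["CRAC-1", "CRAC-2", "CRAC-3"], min_running=2)
print([sup.step(t) for t in (70.0, 70.0, 82.0)])
# ['shed CRAC-3', 'hold', 'restart CRAC-3']
```

A production controller would also need minimum on/off times per unit and per-zone checks, not just a single room maximum.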
Cooling Resource Manager Interface
[Screenshots: Data Center Dashboard, CRAC Status, Fan Tile Status, Zone Status Dashboard, Overhead Air-Mover Status]
Environmental Management Services
Remote monitoring of data center thermal conditions
Monthly reports and analyses outlining conditions and trends
Hardware and software configuration updates
CFD model updates based on the pace of change
Managed-growth consulting to maximize IT load and cooling capacity
Increasing Your IT Cooling Capacity
Maximize Traditional Passive Cooling
Proper tile placement
Proper CRAC placement
Best practices:
Blanking panels
Closing floor cutouts
Hot row/cold row
CRACs perpendicular to rows
Best Practices Can't Solve
Under-floor obstructions
Poor return path
Severe legacy placement issues
Difficult site envelope issues
U servers: ~160 CFM/kW
Blades: ~90 CFM/kW
It's a fundamental airflow DISTRIBUTION problem!
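The airflow figures above follow from the standard sensible-heat relation for air, q [BTU/hr] = 1.08 × CFM × ΔT [°F], with 1 kW = 3,412 BTU/hr. A lower airflow per kW (as with blades) means a larger temperature rise across the equipment. The sketch below uses the slide's CFM/kW values. The 10 kW rack and the 500 CFM delivered per perforated tile are hypothetical, because real tile airflow depends on plenum pressure and tile open area.

```python
import math

BTU_PER_HR_PER_KW = 3412.14
SENSIBLE_HEAT_FACTOR = 1.08  # BTU/(hr * CFM * degF), standard sea-level air

# Approximate airflow demand, from the slide.
CFM_PER_KW = {"U server": 160.0, "blade": 90.0}


def required_cfm(rack_kw: float, kind: str) -> float:
    """Airflow a rack of this kind needs to remove rack_kw of heat."""
    return rack_kw * CFM_PER_KW[kind]


def implied_delta_t_f(cfm_per_kw: float) -> float:
    """Temperature rise across the equipment implied by a CFM/kW figure."""
    return BTU_PER_HR_PER_KW / (SENSIBLE_HEAT_FACTOR * cfm_per_kw)


def tiles_needed(rack_kw: float, kind: str, cfm_per_tile: float = 500.0) -> int:
    """Perforated tiles needed, assuming a hypothetical airflow per tile."""
    return math.ceil(required_cfm(rack_kw, kind) / cfm_per_tile)


for kind, cfm in CFM_PER_KW.items():
    print(kind, required_cfm(10, kind), round(implied_delta_t_f(cfm), 1), tiles_needed(10, kind))
# U server 1600.0 19.7 4
# blade 900.0 35.1 2
```

A U-server rack therefore needs roughly 1.8 times the airflow of a blade rack at the same power. This is why a floor layout that serves one can starve the other.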
Solving Airflow Distribution
Design requirements for data center applications:
Overcome fundamental DISTRIBUTION issues
Robust, reliable, user-friendly
Non-intrusive, non-disruptive, no downtime to install
Dynamically adjust to the changing data center
Modular, scalable, reconfigurable
How Additional Capacity Can Be Achieved
Baseline: 166 tons of cooling, 323 kW of IT load, 17 cabinets above 80°F
Adding more load will only compound these issues.
Additional Capacity Achieved
DBC solution implemented with two CRACs off: 122 tons of cooling, 323 kW of IT load, 0 cabinets above 80°F
The room is thermally safe, and cooling capacity is available for additional IT load.
[Figure: CRACs (OFF)]
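Putting the baseline and DBC figures on a common unit makes the gain easier to see. One ton of refrigeration is 12,000 BTU/hr, or about 3.517 kW. The sketch treats the slide's tonnages as installed running capacity, since the slides do not say whether they are nameplate or delivered. It also assumes the two CRACs that were switched off are equal in size.

```python
KW_PER_TON = 3.517  # 1 ton of refrigeration = 12,000 BTU/hr


def cooling_to_it_ratio(cooling_tons: float, it_kw: float) -> float:
    """Running cooling capacity divided by IT load (both in kW)."""
    return cooling_tons * KW_PER_TON / it_kw


IT_KW = 323.0
baseline = cooling_to_it_ratio(166, IT_KW)  # all CRACs running
with_dbc = cooling_to_it_ratio(122, IT_KW)  # DBC with two CRACs off

# Assumes the two CRACs switched off have equal capacity.
tons_per_crac_off = (166 - 122) / 2

print(round(baseline, 2), round(with_dbc, 2), tons_per_crac_off)
# 1.81 1.33 22.0
```

The room went from running about 1.8 kW of cooling per kW of IT load to about 1.3, with no cabinets above 80°F. The two idle units are the capacity the slide counts as available for additional IT load.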
Increased Capacity
[Figure: Before / After]
Accomplished by using a CFD model, incorporating air-moving floor tiles, and maximizing CRAC return air temperatures.
The room is capable of increased IT load (an increase of 50 W/sq ft).
Redundancy Management
What Does Cooling Redundancy Mean to You?
Uptime. Uptime. Uptime.
What Is Seen in Data Centers Today
Most data centers employ multiple CRAC units to keep the data center area cool.
Generally, an extra unit is installed so that the failure of any single unit goes unnoticed by the end user.
This is known as an N+1 setup: N is the number of units needed to carry the load, and the +1 denotes an extra unit running as a backup.
While this design is sound, the CRAC units are not the only critical part of the cooling system.
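The N+1 definition reduces to a short check. The sketch below assumes a single room-level heat load and known per-unit capacities; the 300 kW load and 100 kW units are made-up numbers. It only tests whether total capacity survives the loss of the largest unit, so it ignores where that capacity is delivered.

```python
import math


def units_needed(load_kw: float, unit_kw: float) -> int:
    """N: the number of equal-size units needed to carry the load."""
    return math.ceil(load_kw / unit_kw)


def is_room_level_n_plus_1(unit_kw: list[float], load_kw: float) -> bool:
    """True if the load is still covered after losing the largest unit."""
    return sum(unit_kw) - max(unit_kw) >= load_kw


n = units_needed(300, 100)
print(n, n + 1)                                   # 3 4  (N, then N+1 to install)
print(is_room_level_n_plus_1([100.0] * 4, 300))   # True
print(is_room_level_n_plus_1([100.0] * 3, 300))   # False
```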
Variables in Redundancy (Raised Floor)
Cooling supply is affected by:
Plenum depth and pressure
Under-floor obstructions
Perforated tile placement
Room / rack / CRAC location
Plenum leakage
The Truth About Redundancy
Having an additional CRAC does not mean that you are N+1 redundant.
A redundant CRAC can only provide cooling redundancy to the immediate area in which it is located.
This effect is amplified when cold-aisle containment is used.
How Containment Is Affected
Contained zones are high-pressure zones compared with the rest of the data center.
They require the small server fans to pull air through the floor and into the racks.
Performance depends on the closest CRAC.
If the contained area is far from the redundant CRAC, redundancy does not exist.
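A small zone-of-influence model shows why a spare CRAC does not guarantee N+1. Each zone receives only a fraction of each CRAC's capacity, so the real question is whether every zone stays covered when each CRAC fails in turn. All names, capacities, loads, and influence fractions below are hypothetical. In practice the fractions would have to be estimated, for example from a CFD model or from measurements.

```python
def zone_failures(influence, crac_kw, zone_load_kw):
    """Return (failed CRAC, zone) pairs where the zone is under-cooled.

    influence[zone][crac] is the fraction of that CRAC's capacity that
    actually reaches the zone (hypothetical values).
    """
    problems = []
    for failed in crac_kw:
        for zone, shares in influence.items():
            delivered = sum(
                frac * crac_kw[crac]
                for crac, frac in shares.items()
                if crac != failed
            )
            if delivered < zone_load_kw[zone]:
                problems.append((failed, zone))
    return problems


crac_kw = {"A": 100.0, "B": 100.0, "C": 100.0}
zone_load_kw = {"west": 85.0, "east": 90.0}
influence = {
    "west": {"A": 0.9, "B": 0.4},
    "east": {"A": 0.1, "B": 0.6, "C": 0.9},
}

# Room level: 300 kW installed, 200 kW left after any single failure,
# 175 kW of load, so the room "looks" N+1.
print(sum(crac_kw.values()) - max(crac_kw.values()) >= sum(zone_load_kw.values()))  # True
print(zone_failures(influence, crac_kw, zone_load_kw))  # [('A', 'west'), ('C', 'east')]
```

The room passes the room-level N+1 test, yet the west zone loses its cooling when CRAC A fails and the east zone loses it when CRAC C fails.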
Plenty of Cooling, But Where Is My Redundancy?
Baseline model of a room with very few thermal issues, but does redundancy exist?
Don't Turn Off That Unit!
Baseline condition when 2 CRACs are put into hot standby: the room is unable to maintain thermal safety.
Redundancy Found and Utilized
DBC solution implemented: all racks are adequately cooled and in compliance with ASHRAE.
Redundancy realized.
Polling Question: Which of these are you experiencing in your data center? A.) Thermal Issues B.) Redundancy Issues C.) IT Growth D.) None
Disaster Waiting?
CRAC #3 fails: the area is thermally unsafe.
Disaster Averted!
CRAC #3 fails: the area is thermally SAFE.
Additional cold air due to HotSpotr
Demand Based Cooling: Redundancy and IT Capacity Case Study
Background
5,000 sq ft data center running at capacity
Frequent local thermal issues
Some areas not fully loaded
Marginal thermal redundancy
Additional IT load coming
Solution Goal
Add 175 kW of IT load safely with the current cooling infrastructure, with improved reliability and redundancy
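For scale, the case-study goal translates into the added density and heat-removal requirement below. The arithmetic uses only the slide's 175 kW and 5,000 sq ft figures plus the standard conversion of 3.517 kW per ton. It assumes the new load is averaged over the whole floor.

```python
KW_PER_TON = 3.517  # 1 ton of refrigeration = 12,000 BTU/hr

added_it_kw = 175.0
floor_sq_ft = 5000.0

added_w_per_sq_ft = added_it_kw * 1000 / floor_sq_ft
added_tons = added_it_kw / KW_PER_TON

print(added_w_per_sq_ft, round(added_tons, 1))  # 35.0 49.8
```

Because nearly all IT power ends up as heat, the existing cooling infrastructure has to absorb roughly 50 additional tons.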
Normal State of Data Center Before DBC
[Figure: temperatures at 6 feet and maximum rack intake]
CRACs shown OFF are supposed to be redundant.
Temperature Profile During CRAC Failures
[Figure panels: Baseline vs. CRAC #3 failure; Baseline vs. CRAC #4 failure. Labels: CRAC failure; CRAC manually turned on by user; CRAC powered off by user]
Remote CRAC cannot compensate for local CRAC failure.
Temperature Profile With DBC and an Additional 175 kW of IT Load
Normal (no-failure) state
[Figure panels: Baseline temperatures, room in normal state before DBC; DBC solution temperatures, room with DBC solution and an ADDITIONAL 175 kW. Labels: powered off by user; primary redundant unit on hot standby]
Better cooling with higher IT load
True N+1 Cooling Redundancy with DBC
[Figure panels, CRAC #3 failure: temperatures in the data center without DBC; temperatures in the data center with the DBC solution and an additional 175 kW. Labels: CRAC failure; powered off by user]
Without DBC, the remote CRAC cannot compensate for the local CRAC failure.
With DBC, the remote CRAC compensates for the local CRAC failure.
Questions? Comments?
Mark Meyer, AdaptivCOOL, (510) 543-6909, mark.meyer@adaptivcool.com
Brad Morgan, Electronic Environments Inc., (508) 229-1446, bmorgan@eecnet.com
Polling Question: Would you like to learn more about airflow and cooling solutions from AdaptivCOOL and Electronic Environments? A.) Yes B.) No