Case study: End-to-end data centre infrastructure management

Situation: A leading public sector organisation suspected that their air conditioning units were not cooling the data centre efficiently. Consequently, excessive power was being consumed and a number of servers were potentially at risk of overheating very quickly should more serious problems with the cooling system arise.

Solution: Concurrent COMMAND provided detailed energy monitoring and a visual temperature profile of the room. This allowed the impact of various configurations of the air conditioning units and air flow conditions to be assessed and an optimal profile to be determined. Once the cooling system was functioning more effectively, the ambient temperature of the data centre was increased to deliver further reductions in power consumption and CO2 emissions, while ensuring that mission-critical IT systems were maintained within thermal bounds.

Situation: A leading public sector organisation identified the need to review their server estate with a view to optimising usage and potential system consolidation.

Solution: Concurrent COMMAND was used to determine the extent to which the estate was being used efficiently and which servers could be retired or converted to virtual machines in order to reduce energy consumption.
Key facts
70 m² floor space
38 standard racks
Approximately 170 servers
3 air conditioning units

A leading public sector organisation suspected the cooling in their data centre could work significantly more effectively. Initial investigations confirmed that the three air conditioning units in the data centre, while working hard and consuming significant amounts of power, were not working efficiently. The client suspected that there were multiple hot and cold spots in the data centre due to poor air-flow management and was concerned that this might pose a risk to the health of their server estate. They realised they needed a Data Centre Infrastructure Management (DCIM) tool to help them mitigate the risk of fluctuating temperatures within the data centre, as well as to deliver meaningful energy and cost savings.

Baseline power monitoring

The first step was to provide accurate figures on energy use across the data centre. This was achieved by using branch circuit monitoring hardware and Concurrent COMMAND's energy management capabilities. The result was detailed information on the energy use of individual air conditioning units. Combined with temperature information and air-flow measurements, this information allowed the client to understand their operating efficiency. The instantaneous power drawn by each rack of IT equipment was also monitored.

A visual temperature profile

The next task was to develop a temperature profile of the room to identify any hot or cold spots. A hundred of Concurrent Thinking's low-cost, one-wire temperature sensors were daisy-chained using Cat 5 cable along the front and back of the racks, and were then monitored continuously. Concurrent COMMAND also collected data from those computer systems that reported their own inlet and CPU temperatures over the SNMP and IPMI protocols. The result was a detailed visualisation of the temperature conditions across the room.
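As an illustration of how a temperature profile can expose hot and cold spots, the following is a minimal sketch rather than Concurrent COMMAND's own method. It assumes a set of rack inlet readings and flags any sensor that deviates from the room median by more than a tolerance. The sensor labels, the readings and the 3 °C tolerance are all hypothetical.

```python
from statistics import median


def find_hot_and_cold_spots(readings, tolerance_c=3.0):
    """Flag sensors whose reading deviates from the room median.

    readings: mapping of sensor label -> inlet temperature in degrees Celsius.
    tolerance_c: hypothetical threshold, not a figure from the case study.
    """
    room_median = median(readings.values())
    hot = sorted(s for s, t in readings.items() if t - room_median > tolerance_c)
    cold = sorted(s for s, t in readings.items() if room_median - t > tolerance_c)
    return room_median, hot, cold


# Illustrative inlet readings only; the labels and values are made up.
sample_readings = {
    "rack01-front": 21.0,
    "rack02-front": 20.5,
    "rack03-front": 26.5,
    "rack04-front": 21.5,
    "rack05-front": 16.0,
    "rack06-front": 22.0,
}

room_median, hot_spots, cold_spots = find_hot_and_cold_spots(sample_readings)
print("Room median:", room_median)
print("Hot spots:", hot_spots)
print("Cold spots:", cold_spots)
```

Comparing against the median rather than the mean keeps a single extreme sensor from shifting the reference point. Front and back sensors would normally be profiled separately, since exhaust air is expected to be warmer than inlet air.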
Various tests were undertaken and changes made in order to find the optimum configuration of the air conditioning units as well as to maximise air flow beneath the raised floor. With constant feedback from Concurrent COMMAND, the ambient air temperature was raised from 20 °C to 24 °C in steps of 1 °C. As the temperature was adjusted, the impact on the servers and the room as a whole was monitored in real time until an optimal temperature profile was found.
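The stepwise approach described above can be sketched as a simple loop: raise the setpoint by one step, check the hottest server inlet, and stop before a limit is exceeded. This is an assumption-laden illustration, not the procedure the client followed. The function name, the inlet limit and the simulated relationship between setpoint and inlet temperature are all hypothetical, and in practice each step would be held long enough for the room to settle.

```python
def raise_setpoint_in_steps(start_c, target_c, step_c, max_inlet_at, inlet_limit_c):
    """Raise the setpoint step by step, stopping before the inlet limit is breached.

    max_inlet_at: callable returning the hottest server inlet temperature
    observed at a given setpoint (a stand-in for live monitoring).
    """
    setpoint = start_c
    while setpoint + step_c <= target_c:
        candidate = setpoint + step_c
        if max_inlet_at(candidate) > inlet_limit_c:
            break  # keep the last setpoint that stayed within bounds
        setpoint = candidate
    return setpoint


def simulated_max_inlet(setpoint_c):
    # Hypothetical model: the hottest inlet sits 2 degrees above the setpoint.
    return setpoint_c + 2.0


# 20 to 24 degrees in 1 degree steps matches the text; the limits are assumed.
reached = raise_setpoint_in_steps(20.0, 24.0, 1.0, simulated_max_inlet, 27.0)
stopped_early = raise_setpoint_in_steps(20.0, 24.0, 1.0, simulated_max_inlet, 25.5)
print(reached, stopped_early)
```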
20% power saving

The changes that were made removed the worst hot and cold spots and stabilised the temperature profile of the room as a whole. Previously, the room temperature had been found to vary continuously as the air conditioning units competed against each other, adding to power consumption. Concurrent COMMAND monitored the power consumption of the air conditioning units during this optimisation process. Once complete, the client was able to reduce the power consumption of the cooling system by 20%.

[Chart: 20% power reduction for air conditioning units]
[Graph: Average room temperature over time]

Increased IT resilience

Not only does detailed temperature monitoring result in a meaningful energy, cost and CO2 reduction, but the client's data centre and IT managers continue to benefit from timely warnings of any potentially critical events that relate to their cooling systems. With continuous monitoring and automated alarms, modest variations in both cooling efficiency and local temperature can now be identified and resolved well before mission-critical IT equipment is affected.
[Figure: Temperature distribution in the data centre]

Conclusion

The use of Concurrent COMMAND allowed the public sector organisation to confirm that their data centre cooling systems were not functioning optimally. Concurrent COMMAND provided them with the detailed, visual information that they needed to make significant improvements, with the on-going knowledge that their data centre continues to operate efficiently. The results were a safe increase in the overall temperature profile of the room, with corresponding energy, CO2 and cost savings.

Annual savings
Energy: 20% reduction of cooling overhead / 55,000 kWh
CO2: 30,000 kg
Costs: 5,500
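The annual savings figures above imply a few further quantities that the case study does not state directly. The short calculation below derives them under two assumptions: that the 55,000 kWh saving is exactly 20% of the original cooling energy, and that the cooling load is spread evenly across the year. The results are illustrative only, and the cost figure is used in whatever currency the original figure was quoted in.

```python
HOURS_PER_YEAR = 8760

# Figures quoted in the annual savings summary.
saved_kwh = 55_000
saved_fraction = 0.20
saved_co2_kg = 30_000
saved_cost = 5_500

# Derived values; these are inferences, not numbers from the case study.
baseline_cooling_kwh = saved_kwh / saved_fraction        # about 275,000 kWh
avg_baseline_cooling_kw = baseline_cooling_kwh / HOURS_PER_YEAR
implied_cost_per_kwh = saved_cost / saved_kwh            # cost units per kWh
implied_co2_kg_per_kwh = saved_co2_kg / saved_kwh

print(f"Baseline cooling energy: {baseline_cooling_kwh:,.0f} kWh per year")
print(f"Average baseline cooling power: {avg_baseline_cooling_kw:.1f} kW")
print(f"Implied energy cost: {implied_cost_per_kwh:.2f} per kWh")
print(f"Implied carbon factor: {implied_co2_kg_per_kwh:.3f} kg CO2 per kWh")
```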
Key facts
70 m² floor space
38 standard racks
Over 170 servers

A powerful DCIM toolkit

Like most data centres that have grown naturally over time, the client suspected their IT systems were not operating very efficiently. As a public sector organisation, they are under considerable pressure to reduce both OPEX and CO2 emissions, so optimising energy use within their data centres was a high priority. They decided to review server utilisation and consider consolidation strategies that could drive down energy costs and reduce the need for space and cooling infrastructure.

To do this, they used Concurrent COMMAND, a robust Data Centre Infrastructure Management (DCIM) tool that is able to analyse the effectiveness of their servers in detail. Concurrent COMMAND uses protocols such as Modbus, SNMP and IPMI to monitor power usage at the distribution board, rack PDU and server level, or indeed wherever hardware support for remote monitoring is available. It can also use the SNMP and WMI protocols to obtain detailed information such as CPU, network and I/O usage by interrogating the operating system itself.

Data can be manipulated and presented in multiple ways using dashboard widgets, data centre plan and rack views, and historical graphs, both for individual devices and for groups of devices. This intuitive GUI allows the user to obtain high-level management information and then drill down to the detailed, highly granular technical information that is often needed to make operational decisions.

Monitoring system utilisation

Concurrent COMMAND was used to monitor the CPU load of over 80 Windows and Linux systems, each a potential target for consolidation, over the course of a week. This was replicated three times, both during and outside term time, to ensure the findings were consistent. Underperforming systems were categorised by moderate CPU usage, low-to-moderate CPU usage and low CPU usage.
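The categorisation described above can be sketched as a simple bucketing of each system's mean CPU load over the monitoring period. The case study does not state the boundaries between the categories, so the thresholds below are hypothetical, as are the host names and sample values.

```python
from statistics import mean


def categorise_cpu_usage(mean_cpu_percent, low_max=5.0,
                         low_to_moderate_max=15.0, moderate_max=40.0):
    """Place a system in a usage category based on its mean CPU load.

    All three thresholds are assumptions made for illustration.
    """
    if mean_cpu_percent <= low_max:
        return "low"
    if mean_cpu_percent <= low_to_moderate_max:
        return "low-to-moderate"
    if mean_cpu_percent <= moderate_max:
        return "moderate"
    return "high"


# Made-up CPU load samples (percent) for three hypothetical hosts.
weekly_samples = {
    "srv-a": [2, 3, 1, 4],
    "srv-b": [10, 12, 8, 14],
    "srv-c": [30, 25, 35, 30],
}

categories = {
    host: categorise_cpu_usage(mean(samples))
    for host, samples in weekly_samples.items()
}
print(categories)
```

Repeating the classification for each monitoring week, as the study did, would show whether a system stays in the same category during and outside busy periods.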
[Figure: CPU usage over 24 hours by category (moderate, low-to-moderate and low CPU usage)]
The data provided by Concurrent COMMAND showed that servers in the low and low-to-moderate usage categories were still consuming large amounts of power while performing very little useful work. The client was also able to review system activity data and identify trends on a weekly and daily basis. For example, in the low usage group, a peak period of daily activity was identified, but beyond this many of the servers were virtually idle.

Asset management and comprehensive system monitoring

While simple metrics such as CPU usage, CPU usage per watt, or CPU usage per unit cost of energy are useful indicators, they do not tell the whole story. In particular, CPUs vary enormously in terms of application performance: a three-year-old CPU is likely to be significantly less efficient than a state-of-the-art CPU, and a modern server may have four or eight times as many CPU cores. For this reason, it is useful to combine utilisation data with information about particular servers from Concurrent COMMAND's built-in asset database in order to make more meaningful comparisons.

In this study, publicly available benchmark figures were assigned to groups of servers of a particular type and manufacturer within the asset database. Normalised CPU usage metrics were then compared. Surprisingly, the total combined load of the servers within the low and low-to-moderate usage categories equated to just 1.7 times the peak performance of a modern CPU core, and yet they were consuming 3.9 kW of power.

[Figure: Weekly usage for a server in the low usage category]

Detailed, easy to access data

Concurrent COMMAND also allowed the client to delve deeper into the results to further analyse the power consumption and utilisation of each individual server. With this information, they were able to identify peaks and trends in utilisation.
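The benchmark-based normalisation described above can be illustrated with a short sketch. It expresses the combined load of a group of servers as a number of "modern CPU core equivalents" by weighting each server's utilisation by its core count and a per-core benchmark score. This is one plausible reading of the approach, not the exact formula used in the study. The data class, the benchmark scores and the server names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    cpu_utilisation: float     # mean utilisation as a fraction, 0.0 to 1.0
    cores: int
    per_core_benchmark: float  # benchmark score per core for this server model


def modern_core_equivalents(servers, modern_core_benchmark):
    """Express the combined load as a multiple of one modern core's peak."""
    total_work = sum(
        s.cpu_utilisation * s.cores * s.per_core_benchmark for s in servers
    )
    return total_work / modern_core_benchmark


# Made-up servers and scores, for illustration only.
estate = [
    Server("old-a", cpu_utilisation=0.05, cores=2, per_core_benchmark=10.0),
    Server("old-b", cpu_utilisation=0.10, cores=4, per_core_benchmark=10.0),
    Server("old-c", cpu_utilisation=0.02, cores=2, per_core_benchmark=25.0),
]

equivalents = modern_core_equivalents(estate, modern_core_benchmark=40.0)
print(f"Combined load: {equivalents:.2f} modern core equivalents")
```

Dividing the result by the measured power draw of the same group would give a normalised work-per-watt figure that can be compared fairly across server generations.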
[Figure: Daily usage chart of one server]

The above graph shows CPU load in blue and power used in red. This information allowed the client to identify major performance spikes, which could most likely be attributed to specific tasks such as a scheduled virus check or system backup.

Informed decisions reduce costs and increase efficiency

The client was able to use the information collated by Concurrent COMMAND to make decisions on how best to optimise their data centre assets. Additional investigations into the role and workload of each machine are required before any action is taken, including their potential roles in fail-over and disaster recovery. However, through the use of Concurrent COMMAND, it is now clear that many of the servers in the low and low-to-moderate usage categories could be converted to virtual machines or retired. Furthermore, with the detailed historical information provided by Concurrent COMMAND, the requirements of individual virtual machines and the servers that will be needed can be accurately scoped.

In a best case scenario, with a combined peak load of less than 10 modern CPU cores and an average of 6 modern CPU cores, it is possible that all the servers in the two low-usage categories could be replaced by virtual machines on a single modern server. This would significantly reduce the overall power needed to run these services from circa 9.3 kW to 0.3 kW, saving circa 30% of the power used by all the IT equipment in the data centre. Such a reduction would also have a knock-on saving with respect to cooling requirements, resulting in a total annual saving of 14,000. When combined with savings made in part 1 to optimise stand-alone cooling costs, the total potential annual saving is nearly 20,000 or 35% of the initial total energy cost.

Annual savings
Power: 30% reduction
Costs: 14,000

Total annual savings
Energy: 35% reduction
Costs: 20,000
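The power figures quoted for the consolidation scenario can be turned into an approximate annual energy saving. The calculation below is a rough sketch that assumes the quoted power levels are drawn continuously throughout the year and treats the "circa 30%" figure as exact. It covers IT power only and leaves out the knock-on cooling saving mentioned above.

```python
HOURS_PER_YEAR = 8760

# Figures quoted in the text (both are described as approximate).
power_before_kw = 9.3
power_after_kw = 0.3
share_of_it_power_saved = 0.30

# Derived values; these are inferences, not numbers from the case study.
saved_kw = power_before_kw - power_after_kw
annual_kwh_saved = saved_kw * HOURS_PER_YEAR
implied_total_it_load_kw = saved_kw / share_of_it_power_saved

print(f"Power saved: {saved_kw:.1f} kW")
print(f"Energy saved: {annual_kwh_saved:,.0f} kWh per year (IT power only)")
print(f"Implied total IT load: {implied_total_it_load_kw:.0f} kW")
```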
About Concurrent COMMAND

How is our DCIM solution different?

Concurrent Thinking's Data Centre Infrastructure Management (DCIM) product suite, Concurrent COMMAND, saves money by reducing risk, delivering significant operational efficiencies and cutting energy costs. It's a unique, easy-to-use and modular DCIM solution that allows you to manage all your data centre facility and IT assets within a single framework.

Scalability in DCIM is a fundamental requirement; Concurrent COMMAND inherently manages hundreds of thousands of metrics every 15 seconds. This provides invaluable support to your business as you increase the number of sites, devices, racks, servers and virtual machines that you manage. It supports your entire existing and future infrastructure and utilises industry standard protocols such as Modbus, SNMP, WMI, IPMI and 1-wire technology, as well as key vendor-specific protocols, such as Intel Node Manager. It's vendor neutral and truly customisable, allowing you to tailor Concurrent COMMAND to your specific needs and to monitor and control virtually all third party devices through an extensible scripting interface.

Drive savings of over 20%

Our customers measure their return on investment in months rather than years. Typically reported energy savings and improved operational efficiencies are over 20% and often significantly higher.

A modular DCIM solution that grows with you

Our modular licensing approach caters for the DCIM needs of SMEs, corporate data centres, colocation providers and cloud providers alike. It allows you to choose the modules that meet your current budget and requirements while being able to scale up your service as required.

Contact us

To find out more about Concurrent COMMAND or to request a demo, contact us on.