Technical white paper
HP ProLiant BL660c Gen9 and Microsoft SQL Server 2014 technical brief
Scale up your Microsoft SQL Server environment to new heights

Table of contents
Executive summary
Introduction
HP ProLiant BL660c Gen9 overview
HP ProLiant BL660c at a glance
Solution components
Server
Storage
Design principles
Performance
High availability and disaster recovery (HADR)
Capacity and sizing
Workload description
Analysis and recommendations
Core baseline
SQL Server 2014 deployment
Summary
Implementing a proof-of-concept
For more information
Executive summary
The HP ProLiant BL660c Gen9 Server Blade is a new offering in the x86 blade family designed to provide superior performance and scalability. The BL660c is a versatile and scalable platform. It is ideally suited to two roles: as a building block for large-scale virtualization farms, or as a high-performance standalone physical server for applications that require scalability and performance, such as Microsoft SQL Server 2014 databases. Increases in processor scaling and memory speed make this server a solid choice for traditional SQL Server deployments. Its higher-density memory design also makes it useful in both physical and virtualized environments.
The BL660c Gen9 has an increased memory footprint of up to 2TB. This makes the server capable of supporting very large traditional OLTP instances as well as the SQL Server 2014 In-Memory OLTP engine. Some customers have standardized on a blade infrastructure form factor but require servers that can scale to 2TB of memory and 18 cores per processor, or that support three different network fabrics for greater throughput. For these customers, the BL660c is an ideal server platform.
In this paper, we characterize the BL660c server blade as part of a SQL Server 2014 solution with the HP 3PAR StoreServ 7440c, focusing on OLTP database deployments. This characterization will help customers better understand their hardware choices when deciding between two- and four-socket blade platforms for their critical SQL Server 2014 environments.
Target audience: Chief information officers (CIOs), chief technology officers (CTOs), data center managers, and others wishing to learn more about scaling up Microsoft SQL Server 2014 with this recommended configuration from HP. A working knowledge of server architecture, networking architecture, storage design, Microsoft Windows Server, and Microsoft SQL Server is recommended.
This white paper describes testing performed by HP in May 2015.
Introduction
The HP ProLiant BL660c Gen9 server blade has been designed to increase performance. It also provides a refresh or scale-up option for prior server blades such as the BL680c G7 and BL460c G7. In this evaluation we characterize these performance improvements and the overall platform architecture. We also highlight options as they relate to Microsoft SQL Server 2014 database servers in the OLTP domain. This characterization focuses on:
- Scale-up OLTP performance across different processor options
- SAN and DAS storage options
- SQL Server high availability and disaster recovery topologies when deployed in an HP BladeSystem environment with other server blades
As part of this characterization we also map different SQL Server workload requirements to server hardware options. This mapping helps customers size and configure an HP ProLiant BL660c Gen9 server blade as a SQL Server host.
HP ProLiant BL660c Gen9 overview
The HP ProLiant BL660c Gen9 server blade is a full-height, four-socket blade compatible with the HP BladeSystem c7000 enclosure. Each blade can host two or four Intel Xeon E5-4600 v3 family processors. The c7000 enclosure can accommodate up to eight BL660c Gen9 server blades. Together, these components provide a highly scalable, high-density platform that can host multiple large-scale OLTP databases. The same platform can instead serve as a virtualized consolidation platform capable of hosting multiple databases across a large number of virtual machines. The server has three mezzanine card slots, further increasing I/O scalability to meet varying workload demands. As a result, the server can scale processor, memory, and I/O as needed to match Microsoft SQL Server 2014 OLTP database demands.
HP ProLiant BL660c at a glance
An internal view of the HP BL660c Gen9 server blade is shown in figure 1.
Figure 1. HP ProLiant BL660c
Processor options
The HP ProLiant BL660c Gen9 server blade uses Intel's E5-4600 v3 family of processors. Intel classifies these processors into the following market segments: basic, standard, advanced, frequency optimized, and segment optimized. In this characterization we tested one processor from each of two of these segments:
- Frequency optimized processors: low core count (4-8) and high frequency, designed for workloads licensed by the core, such as SQL Server 2014.
- Segment optimized processors: high core count (14-18) and low frequency, designed for workload consolidation.
In addition to core count, the size of the level 3 (L3) cache is important for OLTP databases, because this cache provides much faster access to data than local memory. The processors tested were:
- E5-4655 v3: 6 cores, 2.9 GHz base / 3.2 GHz turbo, 30MB L3 cache
- E5-4669 v3: 18 cores, 2.1 GHz base / 2.9 GHz turbo, 45MB L3 cache
Memory
The HP ProLiant BL660c Gen9 server blade memory benefits from E5-4600 v3 processor enhancements and DDR4 memory compatibility. The server provides 8 DIMM slots per processor socket, for a total of up to 32 DIMMs and 2TB of RAM. This doubles the 1TB maximum RAM available in the BL660c Gen8 and BL460c Gen9 servers. It also provides 33%-50% more bandwidth than DDR3 in the BL660c Gen8, depending on the maximum memory frequency supported by the chosen processor. Maximum memory frequency is 2133 MHz, depending on DIMM and processor selection.
I/O options
The HP ProLiant BL660c Gen9 server blade provides two FlexibleLOM slots and three mezzanine slots for I/O. The tested configuration includes a single FlexibleLOM adapter providing both network and FCoE connectivity to external storage.
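The memory capacity and bandwidth figures above are simple arithmetic. The following Python sketch reproduces them under two assumptions not stated in the paper. First, it assumes 64GB DIMMs in all 32 slots, which is the size implied by 2TB divided by 32 slots. Second, it assumes DDR3-1600 as the Gen8 baseline for the bandwidth comparison. Peak bandwidth per channel scales with transfer rate, so the ratio of rates approximates the bandwidth gain. The paper does not say which DDR3 speed its 33%-50% range is measured against, so only the 33% end is illustrated.

```python
# Sketch: BL660c Gen9 memory capacity and DDR3-to-DDR4 transfer-rate uplift.
# Assumption: 64GB DIMMs in every slot (the size implied by 2TB / 32 slots).

SOCKETS = 4
DIMM_SLOTS_PER_SOCKET = 8
DIMM_SIZE_GB = 64  # assumed DIMM size


def total_memory_gb(sockets: int, slots_per_socket: int, dimm_gb: int) -> int:
    """Installed memory in GB when every DIMM slot is populated."""
    return sockets * slots_per_socket * dimm_gb


def transfer_rate_uplift(old_mts: int, new_mts: int) -> float:
    """Fractional increase in peak transfer rate between two memory speeds."""
    return (new_mts - old_mts) / old_mts


capacity = total_memory_gb(SOCKETS, DIMM_SLOTS_PER_SOCKET, DIMM_SIZE_GB)
print(f"{SOCKETS * DIMM_SLOTS_PER_SOCKET} DIMMs, {capacity} GB")
# Assumed comparison point: DDR3-1600 in the Gen8 blade versus DDR4-2133.
print(f"Uplift: {transfer_rate_uplift(1600, 2133):.0%}")
```

With these assumptions the sketch reports 32 DIMMs and 2048 GB (2TB), and an uplift of about 33%.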
Solution components
Server
The HP ProLiant BL660c Gen9 server blades used for testing were configured as follows:
- 4x E5-4669 v3 (18-core) for the high core count processor test, or 4x E5-4655 v3 (6-core) for the low core count processor test
- 2TB RAM
- HP 1.6TB I/O Accelerator
- HP FlexFabric 10Gb 2-port 536FLB FlexibleLOM Adapter
- Windows Server 2012 R2 Standard Edition
- SQL Server 2014 Enterprise Edition with Cumulative Update 6 (CU6)
- HP driver update (March 2015 edition)
Storage
The HP BladeSystem c7000 enclosure supports several storage protocols, such as FC, FCoE, and iSCSI. With the HP 3PAR StoreServ 7440c four-node storage array in particular, the c7000 blade enclosure can be configured in a Flat SAN (direct-attached) topology. This eliminates the need for external SAN switches. The BL660c Gen9 server blades can also store data internally on 4 local drives and optional HP flash I/O accelerators.
For this testing, a single HP 3PAR StoreServ 7440c configured with 32 480GB cMLC drives was used. The following diagram illustrates the component placement in a rack.
Figure 2. BladeSystem c7000 enclosure and HP 3PAR StoreServ 7440c storage array
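The paper does not state usable capacity for the tested array. The deployment later places data on a RAID 5 CPG and logs on a RAID 1 CPG, so a rough raw-versus-usable estimate can help when sizing a similar setup. This Python sketch assumes RAID 5 is laid out as a 3+1 set and RAID 1 as a mirrored pair. It ignores 3PAR sparing, metadata, and thin-provisioning effects, so treat the numbers as upper bounds rather than array-reported values.

```python
# Sketch: raw and RAID-adjusted capacity for the tested HP 3PAR StoreServ 7440c.
# Assumptions: RAID 5 modeled as a 3+1 set and RAID 1 as 1+1; spare space,
# metadata, and thin-provisioning effects are ignored.

DRIVES = 32
DRIVE_GB = 480


def raw_capacity_gb(drives: int, drive_gb: int) -> int:
    """Total raw capacity before any RAID or sparing overhead."""
    return drives * drive_gb


def usable_fraction(data_members: int, redundancy_members: int) -> float:
    """Share of a RAID set that holds data rather than parity or mirror copies."""
    return data_members / (data_members + redundancy_members)


raw = raw_capacity_gb(DRIVES, DRIVE_GB)
print(f"Raw: {raw} GB")
print(f"If all RAID 5 (3+1): {raw * usable_fraction(3, 1):.0f} GB")
print(f"If all RAID 1 (1+1): {raw * usable_fraction(1, 1):.0f} GB")
```

A real deployment mixes both CPG types, so actual usable space falls between the two estimates.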
Design principles
Designing a SQL Server database solution requires knowledge of the target workload characteristics, which can vary greatly depending on the business application. A workload can be CPU intensive, I/O intensive, memory intensive, or some combination of these. Covering all workload scenarios is beyond the scope of this paper. Instead, it identifies several of the more typical OLTP database workload scenarios. This paper does not cover other SQL Server workloads such as data warehousing, analytics services, or related BI workloads.
Given the BL660c hardware specifications highlighted earlier, we first look at each critical component in the system to understand how it relates to and affects the SQL Server database workload. The analysis section later in this paper illustrates this with actual test results that validate our expectations.
Performance
CPU intensive SQL workloads
The HP ProLiant BL660c Gen9 server blade packs four sockets into a compact blade form factor. It provides high scalability per server together with higher core density than larger dedicated servers. These physical considerations prove valuable both for large physical SQL Server instance deployments and for multiple virtualized SQL Server instances. Consider eight BL660c Gen9 server blades with E5-4669 v3 18-core CPUs installed in a c7000 enclosure. The enclosure then holds 576 cores (8 blades x 4 sockets x 18 cores) in 10U of rack space, a core density of 57.6 cores per U.
Low latency workloads
For workloads requiring low latency and fewer threads, the server can be populated with Intel's frequency optimized processors. For example, the E5-4655 v3 has a base frequency of 2.9 GHz in a 6-core package with 30MB of L3 cache, so it executes individual threads faster. It also provides 5MB of L3 cache per core, compared with standard processors that provide only 2.5MB per core.
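The density and cache-per-core figures above can be checked with a few lines of Python. The only input not taken directly from the paper is the 10U height of the c7000 enclosure. The cache figure is a normalized value (total L3 divided evenly across cores), not a measurement of how much cache each core actually uses.

```python
# Sketch: core density of a c7000 enclosure and normalized L3 cache per core.

ENCLOSURE_HEIGHT_U = 10   # the c7000 enclosure occupies 10U of rack space
BLADES_PER_ENCLOSURE = 8  # maximum number of full-height BL660c Gen9 blades
SOCKETS_PER_BLADE = 4


def cores_per_u(cores_per_cpu: int) -> float:
    """Cores per rack unit for a fully populated enclosure."""
    total_cores = BLADES_PER_ENCLOSURE * SOCKETS_PER_BLADE * cores_per_cpu
    return total_cores / ENCLOSURE_HEIGHT_U


def l3_mb_per_core(l3_mb: float, cores: int) -> float:
    """Shared L3 divided evenly by core count: a normalized figure only."""
    return l3_mb / cores


print(cores_per_u(18))         # E5-4669 v3: 576 cores / 10U
print(l3_mb_per_core(30, 6))   # E5-4655 v3
print(l3_mb_per_core(45, 18))  # E5-4669 v3
```

This prints 57.6 cores per U, 5.0MB per core for the 6-core part, and 2.5MB per core for the 18-core part.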
Note that E5 v3 processors have a shared L3 cache, so the amount of cache each core actually uses varies. Dividing the total cache evenly by the number of cores gives a normalized figure that helps assess overall caching ability. This configuration offers the added benefit of lower SQL Server licensing costs. However, it may not outperform a higher core count configuration under a large number of concurrent threads.
Memory intensive workloads
Mega instances that actively use large amounts of RAM benefit from a platform that offers high-speed RAM and efficient interprocessor communication. For sizing a large instance, the BL660c offers QPI links of up to 9.6 GT/s and DDR4 memory frequency of up to 2133 MHz, depending on the processor used. Performance can be further enhanced for OLTP databases in two ways: by setting instance-level and/or database-level NUMA alignment, and by choosing the processor with the largest L3 cache for the core count the workload requires. In addition, the large memory capacity of up to 2TB further enhances the server's ability to host mega instances, or numerous virtualized instances with ample RAM in each virtual machine.
I/O intensive workloads
For I/O intensive workloads, the choice of storage and mezzanine cards is influenced by the high availability and disaster recovery (HADR) topology needed for the solution. The topology dictates whether SAN or DAS will be used, and in either case the BL660c Gen9 offers several options. In addition, the server has three mezzanine slots that can be used to scale external or internal I/O. For SAN storage, FCoE-capable network modules of up to 20Gb and 16Gb HBA mezzanine cards provide high-bandwidth external I/O. For DAS storage, internal storage accelerator mezzanine cards add fast local storage compatible with Always On availability groups. In our lab server, 1.6TB accelerator mezzanine cards were installed. The HADR section below outlines key topologies.
The HP ProLiant BL660c Gen9 server blade also scales further than a BL460c Gen9, which has only two mezzanine slots.
Read intensive workloads
Heavy read I/O can exceed a server's I/O capability. In that case, Microsoft SQL Server 2014 Always On offers readable secondary replicas that can offload heavy reads from the primary server. Scaling reads out in this way preserves scale-up performance on the primary. In this workload scenario, business applications may need to be modified so that reads use the proper connection string and are served by the read-only secondary. For example, a BL660c Gen9 server blade can host the primary replica of the availability group. A BL460c Gen9 server blade can then serve as a lighter secondary for the read-only portions of the workload. This scale-out model improves the latency of key write transactions on the primary node while allowing reads and backups to run on the secondary.
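The connection-string change mentioned above usually means adding the `ApplicationIntent=ReadOnly` keyword for read sessions that connect through the availability group listener. This Python sketch only builds the strings. The listener name, database name, and installed ODBC driver are hypothetical placeholders, and read-only routing is assumed to be configured on the availability group already.

```python
# Sketch: separate ODBC connection strings for read-write and read-only work
# against an availability group listener. Listener and database names are
# hypothetical; read-only routing must already be configured on the AG.

LISTENER = "ag-listener.example.local"      # hypothetical listener DNS name
DATABASE = "OLTPDB1"                        # hypothetical database name
DRIVER = "{ODBC Driver 17 for SQL Server}"  # assumed installed ODBC driver


def connection_string(read_only: bool) -> str:
    """Build an ODBC connection string; read_only adds ApplicationIntent=ReadOnly."""
    parts = {
        "Driver": DRIVER,
        "Server": LISTENER,
        "Database": DATABASE,  # read-only routing needs a database in the string
        "Trusted_Connection": "yes",
    }
    if read_only:
        # Asks the listener to route this session to a readable secondary replica.
        parts["ApplicationIntent"] = "ReadOnly"
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


print(connection_string(read_only=False))
print(connection_string(read_only=True))
```

An application using a driver such as pyodbc could pass either string to `pyodbc.connect(...)`. This keeps write transactions on the primary and sends reporting reads to the secondary.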
Figure 3. Scale-up read-only secondary
High availability and disaster recovery (HADR)
There are several options for implementing HADR for Microsoft SQL Server 2014, depending on requirements and the deployment topology used (physical or virtual server, etc.). The HP ProLiant BL660c server blade supports Windows Server 2012 R2. When used as a physical server, it can implement HADR using SQL Server Always On. The following figures illustrate the conceptual mapping for two of the key HADR topologies that can be implemented using SQL Server 2014 Always On. The choice of HADR topology hinges on availability requirements. A traditional SQL Server failover cluster instance (FCI) provides automatic instance-level protection. Because it relies on a single copy of the databases on shared storage, it does not necessarily provide database protection. Availability groups don't provide instance-level protection, but they do protect databases by maintaining redundant copies on each replica.
Figure 4. SQL Server Failover cluster Instance and availability group topologies
Figure 5. SQL Server hybrid topology (both FCI and Always On availability groups)
Note: Microsoft SQL Server 2014 licensing has changed since SQL Server 2012. Passive availability group nodes must be licensed unless Software Assurance is purchased.
Capacity and sizing
In this tech brief we tested the HP ProLiant BL660c Gen9 using a CPU intensive OLTP workload. The goal was to characterize the performance of the server blade itself, rather than of other components such as storage.
Workload description
The workload is driven by a three-tier application deployed on 16 HP ProLiant BL460c G7 blades in a separate c7000 chassis and rack. Eight individual blades interact with the database server by issuing SQL queries. The workload application drives OLTP transactions against four 500GB OLTP databases.
Figure 6. Test environment
Analysis and recommendations
The evaluation tests were performed to validate and compare OLTP performance using different processors, to show the impact of reducing core count in CPU intensive workloads. The following initial tuning was performed to establish a baseline system configuration for subsequent comparison tests.
Initial hardware tuning
We evaluated Hyper-Threading by testing SQL Server performance with Hyper-Threading on versus off. Slightly better performance was achieved with it on.
Initial SQL Server 2014 tuning
We tested SQL Server 2014 with large pages enabled and disabled. Slightly better performance was achieved with large pages enabled. Max degree of parallelism (MAXDOP) was not varied. However, we recommend testing this setting beyond the default of 0 (zero), which means use all available cores for parallelism. For OLTP systems, 1 is a good starting point, but each workload should be tested with higher values. We recommend testing up to the core count of an individual socket.
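One way to follow the MAXDOP recommendation above is to generate a small test plan that starts at 1 and stops at the core count of one socket. This Python sketch is a hypothetical test-plan generator. Stepping by powers of two is our own choice, not the paper's method. The emitted `sp_configure` statements use the standard server option names, which require `show advanced options` to be enabled first.

```python
# Sketch: a hypothetical MAXDOP test plan from 1 up to the cores in one socket.
# Stepping by powers of two is an assumption, not the method used in the paper.


def maxdop_candidates(cores_per_socket: int) -> list[int]:
    """Powers of two starting at 1, always ending at cores_per_socket."""
    if cores_per_socket < 1:
        raise ValueError("cores_per_socket must be at least 1")
    values = []
    value = 1
    while value < cores_per_socket:
        values.append(value)
        value *= 2
    values.append(cores_per_socket)
    return values


def sp_configure_script(maxdop: int) -> str:
    """T-SQL text that sets the server-wide max degree of parallelism."""
    return (
        "EXEC sp_configure 'show advanced options', 1; RECONFIGURE;\n"
        f"EXEC sp_configure 'max degree of parallelism', {maxdop}; RECONFIGURE;"
    )


plan = maxdop_candidates(18)  # E5-4669 v3: 18 cores per socket
print(plan)
print(sp_configure_script(plan[0]))
```

For the 18-core processor the plan is 1, 2, 4, 8, 16, 18. For the 6-core processor it is 1, 2, 4, 6. Each step would be run under the same load and compared on batch requests per second.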
Core baseline
A CPU intensive OLTP test was run on a single SQL Server 2014 instance containing four identical 128GB databases. The following results are not intended as a benchmark, but rather as a means to measure the differential performance between two processor models.
The first test was performed using the E5-4669 v3 18-core CPU. The database instance averaged 26K batch requests per second.
The second test was performed using the E5-4655 v3 6-core CPU. The database instance averaged 18K batch requests per second.
The tests were run after allowing full buffer pool warm-up and workload ramp-up to 70% processor time. Measurements were averaged over a 10-minute window after steady state was reached. The chart in figure 7 illustrates the differential. The 6-core processor delivers about 70% of the 18-core performance (18K versus 26K batch requests per second) despite having only 33% as many cores.
Figure 7. Transaction performance differential between 6 and 18 core processors under similar load (batch requests per second, 6-core versus 18-core)
SQL Server 2014 deployment
Four 500GB OLTP database files were deployed on the HP 3PAR StoreServ 7440c SAN storage array, using a virtual volume provisioned from a RAID 5 Common Provisioning Group (CPG). The database log files were deployed on a virtual volume provisioned from a RAID 1 CPG. Default Microsoft SQL Server settings were modified as follows:
- Max Degree of Parallelism: changed from 0 to 1
- Trace Flag 836: added to startup parameters
- Max Worker Threads: increased to 3000
- Lock pages in memory: right granted to the SQL Server service user
Because the BL660c Gen9 server blade is NUMA capable, each database was given affinity to one NUMA node. Since there are four sockets in the server, there is a one-to-one affinity mapping between databases and NUMA nodes, and each node maps to a physical socket. The affinity is configured in SQL Server Configuration Manager. Four additional connection ports, beyond port 1433, are added in the IPAll section of the TCP/IP properties.
Each port's NUMA affinity is set with a bitmask, written in hexadecimal, in which bit n selects NUMA node n.
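The following Python sketch shows how a port definition string with one single-bit mask per NUMA node can be assembled. The starting port, the number of nodes, and listing the default port 1433 without a mask all follow the configuration described in this paper. The function itself is only an illustration, not an HP or Microsoft tool.

```python
# Sketch: building a TCP port-to-NUMA-node string for SQL Server Configuration Manager.


def numa_port_string(default_port: int, first_port: int, numa_nodes: int) -> str:
    """Default port without a mask, then one port per node with a single-bit mask."""
    entries = [str(default_port)]  # default port listed without an affinity mask
    for node in range(numa_nodes):
        mask = 1 << node  # node 0 -> 0x1, node 1 -> 0x2, node 2 -> 0x4, ...
        entries.append(f"{first_port + node}[{mask:#x}]")
    return ",".join(entries)


print(numa_port_string(1433, 1501, 4))
```

For four NUMA nodes this yields `1433,1501[0x1],1502[0x2],1503[0x4],1504[0x8]`. A blade with only two processors installed would use `numa_nodes=2`.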
In SQL Server Configuration Manager, expand SQL Server Network Configuration, select the protocols for the instance, and open the TCP/IP properties to set individual port affinity for each database.
Figure 8. NUMA affinity setup using SQL Server Configuration Manager
The actual port definition string is:
1433,1501[0x1],1502[0x2],1503[0x4],1504[0x8]
Summary
This tech brief has outlined the key design factors to consider when choosing a server platform for Microsoft SQL Server 2014. In particular, the limited CPU performance comparison in this paper illustrates the benchmarking process needed to evaluate the correct processor and I/O options when sizing a SQL Server configuration. The test results show that the frequency optimized 6-core CPU delivers up to 70% of the performance of an 18-core processor while requiring 67% fewer core licenses. The CPU deep dive in this evaluation also provides insight into measurement metrics that can be used in platform evaluation.
Finally, the evaluation shows the BL660c Gen9 to be a capable, scalable server blade, suitable as a refresh for G7 or older server blades. Two platform changes stand out: performance improvements in the Intel Xeon E5 v3 (Haswell) processor family, and the introduction of DDR4 RAM. Together they provide a gain of up to 50% in memory bandwidth (frequency), which can improve both traditional and In-Memory OLTP transaction latency. Combined with the BL660c Gen9 design and options, the solution provides a highly scalable and flexible building block for deploying mission-critical SQL Server 2014 databases.
Implementing a proof-of-concept
As a matter of best practice for all deployments, HP recommends implementing a proof-of-concept using a test environment that matches the planned production environment as closely as possible. In this way, appropriate performance and scalability characterizations can be obtained.
For help with a proof-of-concept, contact an HP Services representative (http://www8.hp.com/us/en/business-services/it-services/it-services.html) or your HP partner.
For more information
HP ProLiant BL660c Gen9, hp.com/servers/bl660cgen9
HP BladeSystem, hp.com/go/bladesystem
To help us improve our documents, please provide feedback at hp.com/solutions/feedback.
Sign up for updates: hp.com/go/getupdated
Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.
Microsoft and Windows Server are trademarks of the Microsoft group of companies. Intel and Xeon are trademarks of Intel Corporation in the U.S. and other countries.
4AA5-9057ENW, June 2015