HP reference configuration for entry-level SAS Grid Manager solutions
Up to 864 simultaneous SAS jobs and more than 3 GB/s I/O throughput

Technical white paper

Table of contents
Executive summary
Introduction
Solution criteria
Reference configuration
System/Environment setup
Bill of materials
Implementing a proof-of-concept
Appendix A: Quantum StorNext data management solutions
For more information
Executive summary

SAS Grid Computing customers require a solution that delivers enterprise-class performance and is easy to scale and manage. SAS Grid Manager manages the SAS workload dynamically across all nodes in the SAS computing grid, providing dynamic load balancing, job prioritization and SAS job control.

The reference configuration described herein is suitable for an entry-level SAS Grid Manager solution that requires 288 to 864 simultaneous SAS jobs and/or I/O throughput of 2.5 to 3.01 GB/s. The configuration utilizes HP ProLiant server blades, HP 3PAR Storage Systems, Red Hat Enterprise Linux and a highly available, scalable shared file system by Quantum.

Target audience: The target audience for this reference configuration is the Information Technology (IT) community evaluating solutions for their environment. Business users and IT professionals who are interested in implementing SAS Grid Computing may find this configuration useful as an entry-level SAS Grid Computing configuration with the Quantum StorNext File System, as well as an example of the scalability of HP servers and storage.

This white paper describes a reference configuration used in testing performed in August 2011.

Introduction

This configuration was used in HP's performance benchmark testing and will support a customer with entry-level SAS Grid Manager needs of up to 864 simultaneous SAS jobs and up to 3.01 GB/s sustained I/O throughput. SAS supplied the mixed analytics workload test suite, certified the correctness of the test suite employed, and validated that the benchmark used to develop this reference configuration met the needs of a typical SAS customer.

Solution criteria

The SAS Grid Manager mixed analytics benchmark scenario simulates the types of jobs received from various SAS clients such as display manager, batch, SAS Data Integration Studio, SAS Enterprise Miner, SAS Add-In for Microsoft Office, and SAS Enterprise Guide. Many customer environments have large numbers of ad hoc SAS users or jobs that utilize analytics in support of their company's day-to-day business activities.

The SAS Grid Manager analytic benchmark scenario consisted of typical analytic jobs designed to replicate a light to heavy workload. These jobs were launched via a script that included time delays to simulate scheduled jobs and interactive users launching at different times.

The SAS Grid Manager workload used to develop this reference configuration had the following characteristics:
- 50% CPU-intensive jobs and 50% I/O-intensive jobs
- Utilized SAS procedures including the SAS DATA step, PROC RISK, PROC LOGISTIC, PROC GLM (general linear model), PROC REG, PROC SQL, PROC MEANS, PROC SUMMARY, PROC FREQ and PROC SORT
- SAS program input sizes up to 50 GB per job*
- Input data types of text, SAS data sets, and SAS transport files
- Memory use per job of up to 1 GB
- Approximately 25 MB/s of I/O throughput per SAS job
- Varied job runtimes (short- and long-running tasks)

* File sizes were chosen to eliminate the caching effect of SAN storage and system RAM. Larger file sizes would not have affected results.

Testing of this configuration showed that an optimal level of I/O throughput (3.01 GB/s) was achieved during the 4-server tests, running 72 simultaneous SAS jobs on each server (288 simultaneous SAS jobs across all 4 servers). This throughput level was maintained for the entire test duration of 52 minutes and 49 seconds.

Note: Maximum throughput of the 3PAR T800 Storage System is 5.3 GB/s.
This workload simulated a mix of users and scheduled jobs accessing the grid servers.
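The script-driven, staggered launch described above can be approximated with a short driver. The sketch below is illustrative only; the job names, delay value and direct batch invocation of sas are assumptions for the example, not details of the SAS-supplied test harness.

```python
#!/usr/bin/env python3
"""Illustrative sketch of a staggered job launcher. Job names, the delay
value and the direct 'sas' batch call are assumptions for illustration;
the actual SAS-supplied test harness was not published in this paper."""
import subprocess
import time

# Hypothetical 72-job mix for one server: half CPU-intensive, half I/O-intensive.
JOBS = ["cpu_job%02d.sas" % i for i in range(36)] + \
       ["io_job%02d.sas" % i for i in range(36)]

LAUNCH_DELAY_SECONDS = 15  # staggers launches to mimic scheduled jobs and interactive users

procs = []
for job in JOBS:
    # In the benchmark, submission goes through SAS Grid Manager; a plain
    # batch invocation is shown here only to keep the sketch self-contained.
    procs.append(subprocess.Popen(["sas", "-sysin", job]))
    time.sleep(LAUNCH_DELAY_SECONDS)

for p in procs:
    p.wait()  # wait for all jobs to finish before recording elapsed time
```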
Table 1. I/O throughput by number of servers and number of simultaneous SAS jobs

Number of servers | Simultaneous jobs per server | Total simultaneous grid jobs | Elapsed time (hh:mm:ss) | I/O throughput (GB/s)
4 | 72 | 288 | 00:52:49 | 3.01
4 | 144 | 576 | 01:56:21 | 2.70
4 | 216 | 864 | 03:08:49 | 2.50

For a fixed number of resources (in this case, 4 servers), elapsed time scales roughly linearly with the number of simultaneous SAS jobs submitted, while aggregate I/O throughput remains between 2.5 and 3.01 GB/s.

Figure 1. Test scalability of SAS Grid Manager running on HP server blades, 3PAR Storage System and Quantum StorNext File System: I/O throughput (GB/s) versus number of simultaneous SAS jobs (288, 576 and 864) on 4 servers
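To make the scaling in Table 1 and Figure 1 concrete, the short calculation below restates the published figures: doubling and tripling the job count increases elapsed time by roughly 2.2x and 3.6x while aggregate throughput stays between 2.5 and 3.01 GB/s. The numbers are taken directly from Table 1; only the arithmetic is added here.

```python
# Figures taken directly from Table 1: (jobs per server, elapsed seconds, GB/s).
runs = [
    (72,  52 * 60 + 49,            3.01),  # 288 total jobs
    (144, 1 * 3600 + 56 * 60 + 21, 2.70),  # 576 total jobs
    (216, 3 * 3600 + 8 * 60 + 49,  2.50),  # 864 total jobs
]
servers = 4
base_jobs, base_secs, _ = runs[0]

for jobs_per_server, secs, gbs in runs:
    total_jobs = servers * jobs_per_server
    print("%3d jobs/server  %4d total jobs  %.1fx jobs  %.1fx elapsed time  %.2f GB/s"
          % (jobs_per_server, total_jobs,
             jobs_per_server / base_jobs, secs / base_secs, gbs))
```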
Reference configuration

Figure 2. Architectural diagram of the entry-level reference configuration

The reference configuration comprises SAS Platform Suite and SAS Grid Computing on 4 HP ProLiant BL460c G7 server blades, each with two 6-core processors for a total of 48 cores in the HP BladeSystem grid, together with HP's 3PAR T800 Storage System. Quantum's StorNext File System acts as the centralized clustered file share for SAS data across the 4 ProLiant server blades running Red Hat Enterprise Linux.

The entire logical data storage environment for this configuration consists of a single file system running the high-performance Quantum StorNext File System, used for SAS input, SAS output and even SAS work (temporary) storage. While the size of the StorNext File System should be dictated by solution requirements, file system sizes of up to 100 TB will deliver the results documented herein. The fifth server blade in this reference configuration functions as the Quantum StorNext Metadata Controller.
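Because every grid node reads and writes the same StorNext file system, a quick capacity check on each node can confirm that the shared mount is visible and sized as expected. The sketch below is a minimal example; the mount point path is an assumption for illustration, not a value from this configuration.

```python
#!/usr/bin/env python3
"""Minimal capacity check for the shared StorNext mount on a grid node.
The mount point path is a hypothetical example; substitute the path used
in your environment."""
import os

MOUNT_POINT = "/stornext/sasdata"  # assumed mount point for the shared file system

st = os.statvfs(MOUNT_POINT)
total_tb = st.f_blocks * st.f_frsize / 1024 ** 4
free_tb = st.f_bavail * st.f_frsize / 1024 ** 4
print("%s: %.2f TB total, %.2f TB free" % (MOUNT_POINT, total_tb, free_tb))
```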
Figure 3. Rear view, showing Fibre Channel cabling

Two T-Class Controller nodes are included in this reference configuration, with each controller node containing three 4-port Fibre Channel I/O cards.
A 3PAR T800 Storage System includes 3 internal PCI-X buses, illustrated in Figure 4 as PCI-X Buses 2, 1 and 0. Within each Multifunction Controller, there are 6 PCI-X slots. To obtain maximum data throughput for this reference configuration, it is necessary to fully utilize all 3 PCI-X buses, which is accomplished through the slot positioning of the 3 Fibre Channel I/O cards within each Multifunction Controller. Controller-to-drive chassis Fibre Channel cabling should utilize PCI-X Bus 2 (slot 0) and PCI-X Bus 1 (slot 2) within each Controller Node. Array-to-server Fibre Channel cabling should utilize PCI-X Bus 0 (slot 5) within each Controller Node.

Note: This is a non-standard configuration. If a customer wishes to implement such a configuration, the requirement must be specified at the time of ordering.

Figure 4. Non-standard cabling utilizing 3 PCI-X buses
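The slot and bus assignments above can be summarized in a small lookup table. The sketch below simply restates the cabling rules from this section; the slot numbers and bus assignments are as given in the text.

```python
# Slot-to-bus assignments within each T800 controller node, as described above.
SLOT_TO_BUS = {0: 2, 2: 1, 5: 0}  # PCI-X slot number -> PCI-X bus number

# Cabling roles used by this reference configuration.
CABLING = {
    "controller-to-drive-chassis": [0, 2],  # slots on PCI-X buses 2 and 1
    "array-to-server": [5],                 # slot on PCI-X bus 0
}

for role, slots in CABLING.items():
    buses = sorted(SLOT_TO_BUS[s] for s in slots)
    print("%-28s slots %-6s -> PCI-X buses %s" % (role, slots, buses))
```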
System/Environment setup

The reference configuration includes several different software components:
- SAS Foundation 9.3 (installed on 4 server blades)
- SAS Grid Manager
- Red Hat Enterprise Linux 6.1
- Quantum StorNext File System version 4.2 (shared across 4 server blades)
- Quantum StorNext Metadata Controller server (1 server blade)

Bill of materials

Table 2. Bill of materials

Qty | Description
Server configuration
1 | HP BladeSystem c7000 Enclosure
5 | HP ProLiant BL460c G7 server blades, each with 2 x 6-core Intel Xeon 2800 MHz processors
5 | PC3-10600 DDR3 RDIMM (12 DIMMs x 4 GB)
5 | HP 300GB 6G SAS 10K rpm SFF (2.5-inch) Dual Port Hard Disk
Storage configuration
1 | HP T800 Base Configuration (includes two 2.33 GHz T-Class Controller Nodes)
6 | 4-Port Fibre Channel Adapter (4 Gbit)
2 | 4GB Control Cache (2 x 2GB DIMMs)
4 | 6GB Data Cache (3 x 2GB DIMMs)
3 | Drive Chassis (40-disk, 4 Gb/s)
9 | 4 x 300GB Drive Magazine (15K RPM, 4 Gb/s)
1 | 2M Cabinet Kit (with redundant PDU pair)
12 | 10M Fiber Cable 50/125 (LC-LC)
2 | 2M Fiber Cable 50/125 (LC-LC)
1 | Service Processor
Implementing a proof-of-concept

While this reference configuration produced the documented results, SAS solutions vary greatly between implementations. As a matter of best practice for all deployments, HP recommends implementing a proof-of-concept or sizing exercise using a test environment that matches the planned production environment as closely as possible. In this way, appropriate performance and scalability characterizations can be obtained. HP provides this as a complimentary service; you can reach us at sastech@hp.com.
Appendix A: Quantum StorNext data management solutions

Quantum's StorNext data management solutions let customers share data faster and store data at a lower cost. Proven in the world's most demanding environments, Quantum's StorNext sets the standard for high-performance workflow operations and scalable data retention. It offers resilient, high-speed access to shared pools of digital files from both local area network (LAN)- and storage area network (SAN)-attached servers, so projects like SAS analytics are completed faster.

For SAS Grid deployments with StorNext on the 3PAR T800, the latest StorNext File System (version 4.2 at the time of writing) should be installed on each of the servers in the HP BladeSystem environment. The reference configuration is based on Red Hat Enterprise Linux 6.1 in a SAN-connected environment. While StorNext also supports Microsoft Windows and various distributions of UNIX (including HP-UX) as SAN-attached clients, as well as a high-throughput proprietary LAN client (the StorNext Distributed LAN Client), neither the StorNext for Windows client, the UNIX SAN clients nor the StorNext Distributed LAN Client were used in this testing.

It is recommended that 2 server blades function as StorNext Metadata Controllers, with one active and the second operating in stand-by mode. The other server blades can be configured with the StorNext File System, participating as nodes running the actual SAS analytics processing application. The StorNext Metadata Controller server acts as a traffic cop for the rest of the servers accessing the single pool of storage on the 3PAR Storage System. A single StorNext File System on the array has been shown to be the simplest to manage while still yielding the highest levels of performance. This file system can handle all the SAS input, SAS output and even the local SAS work processing storage for all working nodes in the SAS Grid cluster.

The StorNext shared file system should be configured with different settings for the transaction-heavy metadata and journal LUNs than for the large, sequential, high-throughput data LUNs that hold the SAS analytic files. Stripe groups should be configured across multiple physical LUNs to increase performance. A stripe group is a RAID 0 (performance) stripe across multiple hardware LUNs, allowing multiple simultaneous writes to be spread out across multiple spindles.

The StorNext metadata and journal LUNs consist of 4 LUNs set up as RAID 1 mirrored pairs; each LUN can be configured as 200 GB in size. The SAS analytic data LUNs consist of 16 LUNs, each 500 GB in size and set up as RAID 5. A total of 4 stripe groups are configured, each containing four 500 GB LUNs, allowing 4 simultaneous writes to occur. These 16 data LUNs make up the file system that all server blades access for reads and writes of the SAS analytic data.

StorNext configuration details
- Metadata StripeGroup: 4 x 200 GB LUNs (RAID 1 mirrored pairs), StripeBreadth 128KB
- Data StripeGroups: 4 StripeGroups, 4 LUNs per StripeGroup, StripeBreadth 128KB

Modified StorNext configuration file settings

The values below are modified from their StorNext default settings. Any values not mentioned should be left at the StorNext default configuration file setting.

Global section of the configuration file:
- FsBlockSize 64K
- JournalSize 64M
- BufferCacheSize 64M
- InodeCacheSize 128K
- MaxConnections 128
- ThreadPoolSize 128
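As a quick sanity check on the LUN layout described above, the sketch below totals the configured capacities; it only restates the figures given in this appendix.

```python
# LUN layout figures taken from this appendix.
metadata_luns, metadata_lun_gb = 4, 200                  # RAID 1 mirrored pairs
stripe_groups, luns_per_group, data_lun_gb = 4, 4, 500   # RAID 5 data LUNs

data_luns = stripe_groups * luns_per_group               # 16 data LUNs in total
print("Metadata/journal: %d LUNs x %d GB = %d GB"
      % (metadata_luns, metadata_lun_gb, metadata_luns * metadata_lun_gb))
print("SAS data: %d LUNs x %d GB = %.1f TB across %d stripe groups"
      % (data_luns, data_lun_gb, data_luns * data_lun_gb / 1000.0, stripe_groups))
```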
For more information

HP and SAS: www.hp.com/go/sas
HP Complimentary Customized Sizing: sastech@hp.com
HP BladeSystem c-Class server blades: http://www.hp.com/servers/cclass
HP 3PAR Storage Systems: http://www.hp.com/go/3par
Quantum StorNext File System: http://www.quantum.com/stornext
SAS Partner Directory: http://www.sas.com/partners/directory/hp
SAS Grid Computing: http://support.sas.com/rnd/scalability/grid/

To help us improve our documents, please provide feedback at http://h71019.www7.hp.com/activeanswers/us/en/solutions/technical_tools_feedback.html.

Copyright 2007, 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.

Microsoft and Windows are U.S. registered trademarks of Microsoft Corporation. UNIX is a registered trademark of The Open Group. Intel and Xeon are trademarks of Intel Corporation in the U.S. and other countries.

4AA1-2765ENW, Created October 2007; Updated February 2012, Rev. 1