High Performance Oracle RAC Clusters
A Study of SSD SAN Storage
A Datapipe White Paper
Contents

Introduction
Disclaimer
Problem Statement
Storage Definitions
Testing Method
Test Cluster Implementation Diagram
Hardware List
    HP DL380g5 (db cluster)
    HP DL385g2 (test engine)
    QLogic InfiniBand Switch
    HP 3PAR T-Class
    Pure Storage FlashArray FA-320
HammerOra
TPC-C
Database Configuration
Test 1. Light Workload
Test 2. Medium Workload
Test 3. Heavy Workload
Test 4. Extreme Workload
Test 5. RMAN Backup and Restore
Workload Test Results Summary
Summary
Observations
Conclusions
Sources
Introduction

Performance tuning is a complex topic, especially when it comes to Real Application Cluster (RAC) databases. A typical DBA is often confronted with the need to obtain more performance from an existing database workload. In many cases, this workload cannot be recoded in any way. This paper describes one approach to increasing performance without recoding the application: flash-based SAN storage compared with spinning-disk SAN storage.

Disclaimer

Datapipe has a large investment in 3par SAN storage. The purpose of this investigation is not to disparage or give a negative impression of the 3par storage system; 3par SAN storage is our primary solution. We are looking for solutions that are faster or less costly than traditional Fibre Channel or SSD based systems to use as a supplement to the existing SAN infrastructure.

Problem Statement

Determine whether a flash-memory-based, fibre-attached SAN can provide a noticeable benefit to an OLTP database.

Storage Definitions

3par FC Storage
3par FC (Tier 1) storage refers to spinning-disk storage located on a 3par T-Series SAN. The disks are 15k Fibre Channel in RAID 1+0. The term 3parFC will be used from now on.

3par SSD Storage
3par SSD (Tier SSD) storage refers to SSD-based storage located on a 3par T-Series SAN. The disks are SSDs in RAID 1+0. The term 3parSSD will be used from now on.

Pure SSD Storage
Pure SSD storage refers to the Pure FlashArray. All disks are SSDs. The term PureSSD will be used from now on.

Testing Method

All test results used in the creation of this report were generated using HammerOra, a tool that makes it easy to create and run TPC-C benchmark runs. For all tests, the SYSTEM and SYSAUX tablespaces and the database control files resided on a 3par LUN. Only the schema tablespaces, the undo tablespaces, the archivelog location, and the redo logs were relocated between storage types.
Our goal is not to run the entire database on a particular storage type, but to observe the performance gains from relocating parts of the database to faster storage in a mixed-use setting. This method of testing allows for rapid adjustments without the downtime required to relocate control files and system tablespaces.
Test Cluster Implementation Diagram

A two-node cluster with multiple SAN storage vendors.

Figure 1: Oracle RAC cluster with 3par and Pure storage.
Hardware List

Quantity  Device
2         HP DL380g5 servers
2         QLogic 12300 InfiniBand switches
2         QLogic Embedded Subnet Management Suite
2         InfiniBand HCA: Mellanox Technologies MHQH29C-XTC
6         QSFP 3-meter InfiniBand cables (The Mate Company P/N C9797-3M-IB-28)
2         HBA: QLogic Corp. ISP2432-based 4 Gb Fibre Channel
1         3par SAN: 100 GB Fibre Channel RAID 1, 10 GB Fibre Channel RAID 1
1         3par SAN: 100 GB SSD RAID 1, 10 GB SSD RAID 1
1         Pure FlashArray: 100 GB RAID 1, 10 GB RAID 1
1         HP DL385g2 server

HP DL380g5 (db cluster)
Intel Xeon CPU E5345 @ 2.33 GHz, 8 GB RAM
The HP ProLiant DL380 delivers on its proven history of design excellence with enterprise-class uptime and manageability, proven two-socket Intel Xeon performance, and 2U density for a variety of rack deployments and applications.

HP DL385g2 (test engine)
Dual-Core AMD Opteron Processor 2214 HE @ 2.2 GHz, 4 GB RAM

QLogic InfiniBand Switch
Model: QLogic 12300 Edge
The QLogic 12000 InfiniBand Switch Series of products adapts to a wide range of customer High Performance Computing (HPC) environments. Based on the QLogic TrueScale ASIC platform, these advanced 40 Gbps QDR InfiniBand switches offer the highest port count, the highest port density, and the most flexibility for organizations with HPC needs that are continually evolving.
40 Gb (QDR) InfiniBand switch offerings from 18 to 864 ports
QLogic TrueScale ASIC architecture, with scalable, predictable performance
Comprehensive set of advanced InfiniBand features
HP 3PAR T-Class

HP 3PAR T-Class Storage Systems are designed to deliver enterprise IT as a utility service in order to drive agility and efficiency in enterprise-class, virtual, and cloud data centers. The new Tier 1 storage for cloud computing, HP 3PAR T-Class Storage Systems deliver enterprise IT as a utility service simply, efficiently, and flexibly. T-Class arrays feature a tightly coupled clustered architecture, secure multi-tenancy, and mixed-workload support to fuel enterprise-class virtual and cloud data centers. Use of unique thin technologies reduces acquisition and operational costs by up to 50 percent, while autonomic management features improve administrative efficiency by up to tenfold. The HP 3PAR Gen3 ASIC in each of the system's controller nodes provides a hyper-efficient, silicon-based engine that drives on-the-fly storage optimization to maximize capacity utilization while delivering high service levels. T-Class arrays are built from the ground up to enable agile and efficient response to the changing business needs present in today's most demanding data centers.

(Quoted from HP's web site.)
Pure Storage FlashArray FA-320

The Pure Storage FlashArray is an all-flash enterprise storage array designed from the ground up for 100% MLC flash memory. The FlashArray was designed to balance performance and economics, delivering hundreds of thousands of IOPS with consistent <1 ms latency, but at a price point at or below traditional Tier 1 disk storage. The FlashArray drives down the cost of flash memory using a combination of MLC-grade flash memory (less expensive than traditional SLC flash) and inline data reduction technologies (deduplication, compression, pattern removal, and thin provisioning). The data reduction technologies are inline, high-performance, and a core part of the FlashArray's architecture (so much so that they cannot be turned off). The FlashArray was designed for enterprise reliability, employing active/active high availability via clustered controllers, non-volatile RAM for persistence of in-flight data through power loss, and dual-parity-or-better drive-loss protection via its unique RAID architecture, RAID-3D.
HammerOra

HammerOra is an open source load-testing tool for Oracle, Microsoft SQL Server, and MySQL databases and web applications. More information about HammerOra can be found on its web site: http://hammerora.sourceforge.net/

For the tests conducted for this paper, the HammerOra tool was instructed to create a 40-warehouse database schema. Once completed, the schema was exported using Data Pump and then imported into new schemas that were placed on each of the storage tiers to be tested. In this way, the test schemas are all identical.

TPC-C

The HammerOra tool utilizes the TPC-C benchmarking methodology. TPC-C models a computer system that fulfills customer orders for a company's products. The company sells 100,000 items and keeps its stock in warehouses. Each warehouse has 10 sales districts, and each district serves 3,000 customers. The customers call the company, whose operators take the orders, each order containing a number of items. Orders are usually satisfied from the local warehouse; however, a small number of items are not in stock at a particular point in time and are supplied by an alternative warehouse. Figure 1 shows this company structure.

Database Configuration

Six ASM disk groups were created, two for each storage type.

Two disk groups for PureSSD:

CREATE DISKGROUP PURE100 EXTERNAL REDUNDANCY DISK 'ORCL:PURE100';
CREATE DISKGROUP PURE10 EXTERNAL REDUNDANCY DISK 'ORCL:PURE10';

Two disk groups for 3parFC (Tier 1):

CREATE DISKGROUP IDATA6 EXTERNAL REDUNDANCY DISK 'ORCL:IDATA6';
CREATE DISKGROUP DATA EXTERNAL REDUNDANCY DISK 'ORCL:DATA';

Two disk groups for 3parSSD:

CREATE DISKGROUP SSD0 EXTERNAL REDUNDANCY DISK 'ORCL:SSD0';
CREATE DISKGROUP SSD1 EXTERNAL REDUNDANCY DISK 'ORCL:SSD1';
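As a quick sanity check on the scale of the test schema, the TPC-C ratios above can be turned into expected row counts. A minimal sketch in Python (illustrative only; the constant names are mine, not HammerOra's):

```python
# TPC-C scale ratios from the description above, applied to the
# 40-warehouse schema built for these tests.
WAREHOUSES = 40
DISTRICTS_PER_WAREHOUSE = 10   # sales districts per warehouse
CUSTOMERS_PER_DISTRICT = 3000  # customers served per district
ITEMS = 100_000                # fixed item catalog size in TPC-C

districts = WAREHOUSES * DISTRICTS_PER_WAREHOUSE
customers = districts * CUSTOMERS_PER_DISTRICT

print(districts)   # 400
print(customers)   # 1200000
```

The 1,200,000 figure matches the row count returned by the verification query run against the customer table after the RMAN restore in Test 5.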
Test 1. Light Workload

Figure 2: TPC-C benchmark; 3parFC
Figure 3: TPC-C benchmark; 3parSSD
Figure 4: TPC-C benchmark; PureSSD

Test 1 results
                      3ParFC          3ParSSD         PureSSD
Virtual Users         10              10              10
Iterations            5               5               5
Transactions          2,000           2,000           2,000
Total Transactions    100,000         100,000         100,000
Average TPM           30,303          59,405          76,923
Test Duration         03 min 18 sec   01 min 41 sec   01 min 18 sec
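The Average TPM column in these result tables follows directly from the total transaction count and the elapsed time. A small Python sketch (the helper function is my own, not part of HammerOra) reproduces the Test 1 figures:

```python
def average_tpm(total_transactions: int, seconds: int) -> int:
    """Transactions per minute over the whole run, truncated as in the tables."""
    return int(total_transactions * 60 / seconds)

# Test 1: 100,000 total transactions on each storage type
print(average_tpm(100_000, 3 * 60 + 18))  # 3ParFC,  198 sec -> 30303
print(average_tpm(100_000, 1 * 60 + 41))  # 3ParSSD, 101 sec -> 59405
print(average_tpm(100_000, 1 * 60 + 18))  # PureSSD,  78 sec -> 76923
```

The same arithmetic reproduces the Average TPM values in Tests 2 through 4 from their durations.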
Test 2. Medium Workload

Figure 5: TPC-C benchmark; 3parFC
Figure 6: TPC-C benchmark; 3parSSD
Figure 7: TPC-C benchmark; PureSSD

Test 2 results
                      3ParFC          3ParSSD         PureSSD
Virtual Users         10              10              10
Iterations            50              50              50
Transactions          2,000           2,000           2,000
Total Transactions    1,000,000       1,000,000       1,000,000
Average TPM           45,454          49,958          82,644
Test Duration         22 min 00 sec   20 min 01 sec   12 min 06 sec
Test 3. Heavy Workload

Figure 8: TPC-C benchmark; 3parFC
Figure 9: TPC-C benchmark; 3parSSD
Figure 10: TPC-C benchmark; PureSSD

Test 3 results
                      3ParFC          3ParSSD         PureSSD
Virtual Users         20              20              20
Iterations            50              50              50
Transactions          2,000           2,000           2,000
Total Transactions    2,000,000       2,000,000       2,000,000
Average TPM           74,349          89,485          117,073
Test Duration         26 min 54 sec   22 min 21 sec   17 min 05 sec
Test 4. Extreme Workload

Figure 11: TPC-C benchmark; 3parFC
Figure 12: TPC-C benchmark; 3parSSD
Figure 13: TPC-C benchmark; PureSSD

Test 4 results
                      3ParFC          3ParSSD         PureSSD
Virtual Users         100             100             100
Iterations            10              10              10
Transactions          5,000           5,000           5,000
Total Transactions    5,000,000       5,000,000       5,000,000
Average TPM           99,667          88,261          105,894
Test Duration         50 min 10 sec   56 min 39 sec   47 min 13 sec
Test 5. RMAN Backup and Restore

Any storage method we utilize has to function with Oracle's backup and recovery tool, RMAN. An RMAN restore test was performed to validate the Pure FlashArray SAN.

Delete the HORAPURE datafile:

ASMCMD> rm +PURE100/ibrac/horapure.dbf

Restore and recover the tablespace:

RMAN> connect target /

connected to target database: IBRAC (DBID=1040728569)
using target database control file instead of recovery catalog

RMAN> restore tablespace HORAPURE;

Starting restore at 13-JUL-12
using channel ORA_DISK_1
channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00007 to +PURE100/ibrac/horapure.dbf
channel ORA_DISK_1: reading from backup piece /u01/app/oracle/product/11.2.0/dbhome_1/dbs/6bng0tld_1_1
channel ORA_DISK_1: piece handle=/u01/app/oracle/product/11.2.0/dbhome_1/dbs/6bng0tld_1_1 tag=tag20120713t201853
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:19:05
Finished restore at 13-JUL-12

RMAN> recover tablespace HORAPURE;

Starting recover at 13-JUL-12
using channel ORA_DISK_1
starting media recovery
media recovery complete, elapsed time: 00:00:01
Finished recover at 13-JUL-12

RMAN> sql 'alter tablespace HORAPURE online';

sql statement: alter tablespace HORAPURE online

Verify availability:

SQL> select count(*) from HORAPURE.customer;

  COUNT(*)
----------
   1200000
Workload Test Results Summary

Table 1: Execution times in seconds

                      Elapsed Time   Transactions   TPS     % Improvement
PureSSD Light Run     78 sec         100,000        1,282   60%
3ParSSD Light Run     101 sec        100,000        990
3ParFC Light Run      198 sec        100,000        505
PureSSD Medium Run    726 sec        1,000,000      1,377   45%
3ParSSD Medium Run    1201 sec       1,000,000      832
3ParFC Medium Run     1320 sec       1,000,000      757
PureSSD Heavy Run     1025 sec       2,000,000      1,951   36%
3ParSSD Heavy Run     1341 sec       2,000,000      1,491
3ParFC Heavy Run      1614 sec       2,000,000      1,239
PureSSD Extreme Run   2833 sec       5,000,000      1,764   5%
3ParSSD Extreme Run   3399 sec       5,000,000      1,471
3ParFC Extreme Run    3010 sec       5,000,000      1,661

The formula for % improvement is: (Tier 1 - Tier 0) / Tier 1 * 100, where Tier 1 is the 3ParFC elapsed time and Tier 0 is the PureSSD elapsed time for the same workload.
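The TPS and % improvement columns can be reproduced from the elapsed times alone. A short Python sketch (the dictionary layout is mine; the elapsed times and transaction counts are taken from the summary table) compares the PureSSD (Tier 0) run against the 3ParFC (Tier 1) run for each workload:

```python
# (PureSSD sec, 3ParFC sec, total transactions) per workload
runs = {
    "Light":   (78,   198,  100_000),
    "Medium":  (726,  1320, 1_000_000),
    "Heavy":   (1025, 1614, 2_000_000),
    "Extreme": (2833, 3010, 5_000_000),
}

for name, (tier0, tier1, txns) in runs.items():
    tps = int(txns / tier0)                    # PureSSD transactions per second
    pct = int((tier1 - tier0) / tier1 * 100)   # % improvement formula above
    print(f"{name}: {tps} TPS, {pct}% improvement over 3ParFC")
```

Running this reproduces the 60%, 45%, 36%, and 5% figures reported in the table.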
Summary

Figures 2 through 10 are snapshots of the graphs generated by the testing engine. These graphs were captured at various points during the testing process and were included in this report to show trending information as well as the maximum TPM values calculated during each test run (the numbers on the left side of each graph are the TPM values at one particular time reference; the numbers along the bottom are time stamps).

As a reminder, keep in mind that this is a mixed-storage cluster. The goal is to demonstrate that adding faster storage to specific areas can improve overall performance. For these tests, the schema data, the redo logs, the archivelogs, and the undo tablespaces were relocated to each of the three storage types in turn so as to record the performance metrics from each adjustment. The rest of the database, consisting of the temporary tablespace, system tablespace, and control files, was always placed on the same storage volume.

The first three tests show a pronounced improvement. Test 1 results indicate a 60% improvement for 10 virtual users completing a total of 100,000 database transactions. Test 2 results indicate a 45% improvement for 10 virtual users completing a total of 1,000,000 database transactions. Test 3 results indicate a 36% improvement for 20 virtual users completing a total of 2,000,000 database transactions.

Test 4 results indicate little improvement, and even a slight degradation, for 100 virtual users completing a total of 5,000,000 transactions. However, it is my belief that these test results should be discounted. A further inspection of the db cluster and test engine server indicated that 25% of the time used during the Test 4 runs on all storage layers was consumed by CPU waits on both the db cluster and the test engine server. Another 30% of the time involved network waits between the test engine and the db cluster.
In short, all indications are that the Test 4 workloads simply overwhelmed the infrastructure.

Test 5 was conducted to ensure that Oracle's native backup and recovery tools function properly against this storage. All RMAN testing was successful.
Observations

The native asmca GUI tool that Oracle provides for ASM disk group management was easily able to detect and utilize disk LUNs provisioned from the Pure FlashArray. Oracle Linux device-mapper-multipath was able to properly detect and manage the disk LUNs provisioned from the Pure FlashArray. Oracle ASMLIB was able to label and present to the Oracle Grid software the devices created on top of the Pure FlashArray LUNs.

Conclusions

My overall opinion of the Pure FlashArray storage device is very favorable. Performance obtained from the device exceeded expectations. The ability to add this device directly into our existing fibre fabric and to use it in a mixed storage environment alongside our existing SAN infrastructure is a huge plus. Our original goal of demonstrating at least a 10% performance gain was easily achieved.
Sources

Pure FlashArray: http://www.purestorage.com/flash-array/
InfiniBand: http://en.wikipedia.org/wiki/infiniband
Switched topology: http://en.wikipedia.org/wiki/switched_fabric
Quad data rate: http://en.wikipedia.org/wiki/quad_data_rate
HP DL380g5: http://h10010.www1.hp.com/wwpc/ca/en/sm/wf05a/15351-15351-3328412-241475-241475-1121516.html
QLogic 12300 InfiniBand switch: http://www.qlogic.com/products/switches/pages/infinibandswitches.aspx
Mellanox InfiniBand HCA: http://www.mellanox.com/content/pages.php?pg=products_dyn&product_family=4&menu_section=41#tab-two
Oracle Unbreakable Enterprise Kernel: http://www.oracle.com/us/technologies/linux/ubreakable-enterprise-kernel-linux-173350.html
HammerOra, the open source Oracle load test tool: http://hammerora.sourceforge.net/
3par T-Class Storage System: http://h10010.www1.hp.com/wwpc/us/en/sm/wf05a/12169-304616-5044010-5044010-5044010-5044216.html?dnr=1
TPC-C on-line transaction processing benchmark: http://www.tpc.org/tpcc/