DELL EMC ELASTIC CLOUD STORAGE
Single Site CAS API Performance Analysis

ABSTRACT
This white paper analyzes read and write performance for a single ECS system using the CAS API protocol to read and write objects of different sizes, and includes performance levels and results for the ECS 2.2 release.

December 2016

WHITE PAPER
The information in this publication is provided "as is." EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose. Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

EMC², EMC, the EMC logo, and ECS are registered trademarks or trademarks of EMC Corporation in the United States and other countries. All other trademarks used herein are the property of their respective owners.

Copyright 2016 EMC Corporation. All rights reserved. Published in the USA. 12/16, white paper, H14927.1

EMC believes the information in this document is accurate as of its publication date. The information is subject to change without notice.

EMC is now part of the Dell group of companies.
TABLE OF CONTENTS

EXECUTIVE SUMMARY
    Audience
    Scope
TEST ENVIRONMENT
    ECS Setup
    Centera Setup
    Client and Network Setup
PERFORMANCE DATA AND ANALYSIS
    Optimal Thread Counts for Performance
    Single Thread Performance
    Peak Performance Results
    Performance Results on Ramping Threads
    Comparing ECS 2.2 with CentraStar
    Comparing ECS 2.2 to the 2.1 and 2.0 Releases
CONCLUSION
EXECUTIVE SUMMARY
This document analyzes read and write performance for a single Elastic Cloud Storage (ECS) system using the CAS API protocol to read and write objects of different sizes. The performance levels for this release are included in the results below.

AUDIENCE
This white paper is intended for internal technical pre-sales personnel who help design ECS solutions for customers.

SCOPE
The scope of this paper is predominantly the single-site performance analysis of the CAS API.

TEST ENVIRONMENT
This section describes the test environment used to achieve these results.

ECS SETUP
The ECS tests were run on a single ECS U1500 Appliance running ECS software 2.2. The U1500 includes the following hardware:
- Four nodes with 60 drives per node, for 240 drives total. The nodes are Phoenix blades in a 2U blade server. Each blade has dual 4-core Intel 2.4GHz Ivy Bridge (E5-2609) CPUs, 64GB RAM, and dual 10Gb Ethernet ports.
- Two 10GbE switches as the top-of-rack switches that connect the ECS nodes to the load-generating clients.
- One 1GbE switch used for internal communication among the ECS nodes and for out-of-band management.
- 60 x 6TB SATA drives per ECS node. There are 60 drives in each Disk Array Enclosure (DAE), and each DAE has a single 6Gb/s SAS connection to an ECS node.

CENTERA SETUP
The CentraStar tests were run on a single Centera cluster configured with 16 Gen4LP 12TB nodes running CentraStar software version 4.3P3. Each node has four drives, for 64 drives in total.
- Nodes with combined storage and access roles: 4
- Storage-only nodes: 12

CLIENT AND NETWORK SETUP
The test clients use Cubetools, a performance tool that EMC developed specifically for testing ECS performance. Cubetools is similar to other performance tools such as CosBench and Grinder. For each of the four ECS nodes, a single client delivers the I/O load using Cubetools. Four clients write to four ECS nodes simultaneously, which simulates the behavior of a load balancer running in front of the ECS nodes. The clients write to the nodes in random order to simulate the way a load balancer would evenly distribute loads across the ECS nodes. A single 10Gb Ethernet connection links each client to the 10Gb top-of-rack ECS switches, as shown in Figure 1: ECS Performance Lab Topology. Each client in the test is an Intel server with the following equipment:
- One Xeon E5-2609 2.5GHz processor
- 64GB of RAM
- SUSE Linux 11
- A single 10GbE connection to the ECS appliance
We used the Cubetools automation capability to do the following:
- Vary the number of threads for some tests, and keep the number of threads constant for others
- Create objects of different sizes
- Execute different operations, such as creating, reading, and deleting objects
- Calculate metrics such as TPS, bandwidth, and latency
- Flush the cache

Figure 1: ECS Performance Lab Topology
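Cubetools is an internal EMC tool, and its interface is not documented here. As a rough illustration of the measurement pattern described above (worker threads writing to randomly chosen nodes while TPS, bandwidth, and latency are computed), the following is a minimal Python sketch. The cas_write function, node addresses, and object sizes are hypothetical placeholders, not actual Cubetools or CAS SDK calls.

```python
import random
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical placeholders: a real test would call Cubetools or the CAS
# (Centera) SDK here; these names are illustrative only.
NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]

def cas_write(node, payload):
    """Stand-in for a CAS API write; replace with a real client call."""
    time.sleep(0.001)  # simulate network and storage latency

def run_load(threads, object_size, ops_per_thread=100):
    payload = b"x" * object_size
    latencies = []  # list.append is atomic in CPython, safe across threads

    def worker():
        for _ in range(ops_per_thread):
            node = random.choice(NODES)  # mimic a load balancer spreading I/O
            start = time.perf_counter()
            cas_write(node, payload)
            latencies.append(time.perf_counter() - start)

    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(worker) for _ in range(threads)]
        for f in futures:
            f.result()  # propagate any worker errors
    elapsed = time.perf_counter() - t0

    ops = threads * ops_per_thread
    tps = ops / elapsed
    mb_per_s = tps * object_size / 1e6
    print(f"{threads:4d} threads  {tps:8.1f} TPS  {mb_per_s:8.1f} MB/s  "
          f"mean latency {statistics.mean(latencies) * 1000:.2f} ms")

# Ramp the thread count the way the tests below do.
for t in (1, 25, 50, 100, 150):
    run_load(threads=t, object_size=10_000)  # 10KB objects
```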
PERFORMANCE DATA AND ANALYSIS
This section describes operational aspects of the tests.

OPTIMAL THREAD COUNTS FOR PERFORMANCE
In many environments, customers can control the number of threads that applications use to access ECS, either by changing the configuration of a multi-threaded application or by changing the number of clients. ECS performance depends on the number of threads clients use to read from and write to ECS. Up to a certain point, performance is directly proportional to the number of threads: the more threads, the better the performance. Beyond a certain threshold, however, performance plateaus or decreases because processing the threads becomes resource-intensive from the ECS perspective.

For these tests, the performance-peak threshold is set at 150 threads, spread across the four clients, for both smaller and larger objects. With the U1500 hardware configuration, you can increase the thread count up to 100 threads per node, for a total of 400 threads for CAS API CRUD operations.

SINGLE THREAD PERFORMANCE
Figure 2 shows the single-thread performance of ECS 2.2, with TPS on the left axis and bandwidth on the right axis. For read performance of small objects, the number of objects per second ranges from about 14 to 18; this rate holds steady from 1KB objects to 500KB objects, after which it falls to about 7 TPS for 5MB objects. For read performance of larger objects, it is more relevant to look at bandwidth, because large objects result in fewer transactions per second. For objects 100MB and larger, read performance is 59 MB/s.

Figure 2: ECS 2.2 Single Thread Performance Results

PEAK PERFORMANCE RESULTS
Figure 3 shows the peak performance of ECS 2.2, with TPS on the left axis and bandwidth on the right axis. For read performance of small objects, the number of objects per second is about 2900; this rate holds almost steady from 1KB objects to 500KB objects, after which it falls to about 348 TPS for 5MB objects. For read performance of larger objects, it is more relevant to look at bandwidth, because large objects result in fewer transactions per second. For larger objects, read bandwidth is 1100 MB/s and holds steady as object size increases. In general, as the object size increases, the transactions per second (TPS) decrease. This does not indicate a decrease in performance, however, because the bandwidth, the bytes transferred per second, is higher for larger object sizes.

Figure 3: ECS 2.2 Peak Performance
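The trade-off between the two metrics is a simple identity: bandwidth equals TPS multiplied by object size. A small illustrative calculation follows, using approximate figures quoted above; the helper function is ours, not part of any test tool.

```python
def bandwidth_mbps(tps, object_size_bytes):
    """Bandwidth (MB/s) = transactions per second x bytes per transaction."""
    return tps * object_size_bytes / 1e6

# Approximate single-thread read of 5MB objects (Figure 2): ~7 TPS
print(bandwidth_mbps(7, 5_000_000))   # ~35 MB/s
# Approximate peak read of 10KB objects (Figure 3): ~2900 TPS
print(bandwidth_mbps(2900, 10_000))   # ~29 MB/s: high TPS, modest bandwidth
```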
PERFORMANCE RESULTS ON RAMPING THREADS
The next set of figures shows general read/write throughput and bandwidth at different object sizes and thread counts.
Figure 4 shows the ramping-thread results for write throughput.
Figure 5 shows the ramping-thread results for write bandwidth.
Figure 6 shows the ramping-thread results for read throughput.
Figure 7 shows the ramping-thread results for read bandwidth.

Figure 4: Ramping Threads for Write Throughput
Figure 5: Ramping Threads for Write Bandwidth

Figure 6: Ramping Threads for Read Throughput
Figure 7: Ramping Threads for Read Bandwidth

COMPARING ECS 2.2 WITH CENTRASTAR
The following figures compare ECS 2.2 and CentraStar, with graphs of both throughput and bandwidth for each system.
Figure 8 compares single-thread write performance.
Figure 9 compares single-thread read performance.
Figure 10 compares peak-thread write performance.
Figure 11 compares peak-thread read performance.

Figure 8: Single Thread Write Performance Comparison
Figure 9: Single Thread Read Performance Comparison

Figure 10: Peak Thread Write Performance Comparison
Figure 11: Peak Thread Read Performance Comparison

COMPARING ECS 2.2 TO THE 2.1 AND 2.0 RELEASES
This section compares performance results among ECS versions 2.2, 2.1, and 2.0 for single-thread and peak read/write operations. ECS 2.2 achieves considerable performance improvements over 2.1 and 2.0, as shown in Figures 12, 13, 14, and 15. These improvements are entirely software-driven, because the hardware for all appliance versions is the same.

For single-thread write performance, Figure 12 shows about an 80% improvement in TPS for smaller objects (1KB to 500KB) and up to a 30% improvement for large object sizes. Bandwidth performance in ECS 2.2 improved by about 15% for object sizes ranging from 500KB up to 20MB. For small objects, ECS 2.2 bandwidth performance is the same as in ECS 2.1.

Figure 12: Comparison of Single Thread Write Performance between ECS Releases
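When reading these percentages, note that an N% improvement in TPS corresponds to a speedup factor of 1 + N/100. A quick sketch, with made-up round numbers rather than values from the figures:

```python
def improvement_pct(old_tps, new_tps):
    """Relative TPS improvement of a newer release over an older one."""
    return (new_tps - old_tps) / old_tps * 100

# Made-up round numbers, not taken from the figures:
print(improvement_pct(10, 18))  # 80.0 -> an "80% improvement" is 1.8x the TPS
```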
For single-thread read performance, Figure 13 shows about a 100% improvement in TPS for smaller objects (1KB to 1MB) and up to a 30% improvement for large object sizes. Bandwidth performance improved a great deal for large object sizes. For small object sizes (1KB to 50KB), ECS 2.2 bandwidth performance is the same as in 2.1.

Figure 13: Comparison of Single Thread Read Performance between ECS Releases

Figure 14 shows a comparison of performance results among ECS versions 2.2, 2.1, and 2.0 for peak-thread write operations.

Figure 14: Comparison of Peak Thread Write Performance between ECS Releases
Figure 15 shows a comparison of performance results among ECS versions 2.2, 2.1, and 2.0 for peak-thread read operations.

Figure 15: Comparison of Peak Thread Read Performance between ECS Releases

CONCLUSION
ECS 2.2 provides significant software-based performance increases over previous releases, with significantly better performance for small object sizes in both read and write operations. At lower thread counts, small-object reads are about three times faster than in the prior ECS release.