EMC Unified Storage for Microsoft SQL Server 2008 Enabled by EMC CLARiiON and EMC FAST Cache Reference
Copyright 2010 EMC Corporation. All rights reserved. Published October, 2010.

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

Benchmark results are highly dependent upon workload, specific application requirements, and system design and implementation. Relative system performance will vary as a result of these and other factors. Therefore, this workload should not be used as a substitute for a specific customer application benchmark when critical capacity planning and/or product evaluation decisions are contemplated.

All performance data contained in this report was obtained in a rigorously controlled environment. Results obtained in other operating environments may vary significantly. EMC Corporation does not warrant or represent that a user can or will achieve similar performance expressed in transactions per minute. No warranty of system performance or price/performance is expressed or implied in this document.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license. For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com. All other trademarks used herein are the property of their respective owners.

Part number: H8002.1
Table of Contents

Reference architecture overview
Solution architecture
Key components
Validated environment profile
Hardware and software resources
Conclusion
Reference architecture overview

Document purpose

EMC's commitment to consistently maintain and improve quality is led by the Total Customer Experience (TCE) program, which is driven by Six Sigma methodologies. As a result, EMC has built Customer Integration Labs in its Global Solutions Centers to reflect real-world deployments in which TCE use cases are developed and executed. These use cases provide EMC with an insight into the challenges currently facing its customers.

This document describes a proven reference architecture validated at EMC's Unified Storage Solutions Lab in Research Triangle Park, North Carolina. The architecture provides a known starting point for the development of a customer implementation as well as guidelines for how to apply best practices in the real world.

Solution purpose

The purpose of this reference architecture is to build a unified storage solution using EMC's unified storage platform and describe the benefits of the EMC Celerra Fully Automated Storage Tiering (FAST) Cache functionality in an online transaction processing (OLTP) database environment. This reference architecture validates the performance of all aspects of the solution and provides guidelines for building similar solutions.

This reference architecture is not intended to be a comprehensive guide to every aspect of the EMC Unified Storage for Microsoft SQL Server 2008 Enabled by EMC CLARiiON and EMC FAST Cache solution.

The business challenge

Databases are an ever-present and ever-growing facet of any modern business system. These databases hold critical data and are the backbone of most, if not all, corporate information systems. They are updated and upgraded on a regular basis to meet constantly changing and ever-increasing user requirements. Surveys of customer environments indicate that storage is one area that requires continual resource allocation to keep the system meeting those requirements.
Most databases use spinning-disk media as the destination of all their data in production. Most people think of disks in terms of how much data they can store. We will call this storage capacity. We consider the growth of data in terms of gigabytes (GB) and terabytes (TB) to be a measurement of the required storage capacity of the underlying disk system. There is, however, another constraint in the design of storage systems: how quickly data can be accessed. We will call this performance capacity. Because the performance capacity of the disk directly translates into user experience, this constraint often presents challenges in planning and maintaining any production system.

Traditionally, storage administrators have added both performance and storage capacity to a system in the same way: by adding disk spindles and applying various volume management techniques to harness the newly expanded limits of the system. This method, while functional, is inefficient. It often leads to a practice called short-stroking a drive, whereby only a small portion of the drive's storage capacity is used while its entire performance capacity is consumed. The remaining storage capacity on the drive is essentially inaccessible, because using it would negatively impact the performance of the database residing on the drive.
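The gap between storage capacity and performance capacity described above can be made concrete with a small sizing calculation. The sketch below is illustrative only: the workload figure of 6,000 random I/Os per second and the per-drive figure of 180 I/Os per second are hypothetical values chosen for the example, not measurements from this reference architecture, and RAID overhead is ignored.

```python
import math


def drives_needed(capacity_gb, iops, drive_gb, drive_iops):
    """Return (drives required for capacity, drives required for performance)."""
    by_capacity = math.ceil(capacity_gb / drive_gb)
    by_performance = math.ceil(iops / drive_iops)
    return by_capacity, by_performance


def used_fraction(capacity_gb, drive_count, drive_gb):
    """Fraction of the raw capacity of drive_count drives that the data occupies."""
    return capacity_gb / (drive_count * drive_gb)


# Hypothetical workload: 400 GB of data, 6,000 random IOPS (assumed).
# Hypothetical drive: 450 GB, 180 random IOPS (assumed).
by_cap, by_perf = drives_needed(400, 6000, 450, 180)
spindles = max(by_cap, by_perf)
print(f"drives for capacity:    {by_cap}")
print(f"drives for performance: {by_perf}")
print(f"capacity actually used: {used_fraction(400, spindles, 450):.1%}")
```

With these assumed figures, the spindle count is set entirely by the performance requirement, and only a few percent of the purchased capacity holds data. That is the short-stroking situation the text describes.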
The technology solution

The introduction of FAST Cache into a database storage system dramatically changes the basic performance calculation that has been familiar to storage administrators in the past. Storage administrators are encouraged to consider the storage array cache as a mechanism for ensuring consistent performance during increased load and spikes in activity. In contrast, most database administrators understand that even the highest-performing storage system cannot match the memory of a local system in terms of the time taken to retrieve data.

A core part of the job for many data architects and other database professionals is to optimize data structures within their system so that input/output (I/O) requests to the disk can be minimized and queries can be serviced by data that is already in memory. However, unless the data structures are small enough to be entirely contained in memory, some disk I/O will still occur.

The obvious solution is to increase the size of the server memory. This is either expensive or, more likely, not possible given the architecture of the server. In addition, server memory can only help workloads running on that particular server. If there are multiple database servers, the process must be repeated for each one of them.

FAST Cache provides the complementary ability to add large amounts of read-write cache at the storage array level. This additional cache can be associated with a designated set of logical unit numbers (LUNs) so that, in effect, it is dedicated to the workloads that require it. From the database server perspective, the disks simply respond much faster to requests, and this directly improves the query response time of the system.

The solution benefits

The solution provides the following benefits to an OLTP database environment:

- More efficient use of disk spindle resources
- Lower power consumption
- Faster response time
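The effect of an array-level cache on response time can be approximated with the standard weighted-average model, in which the average service time is the hit ratio times the cache latency plus the miss ratio times the disk latency. The sketch below is a simplified illustration, not a description of how FAST Cache is implemented; the 1 ms cache latency and 8 ms disk latency are hypothetical values assumed for the example.

```python
def effective_latency_ms(hit_ratio, cache_ms, disk_ms):
    """Average I/O latency when a fraction hit_ratio of requests is served from cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * disk_ms


# Hypothetical latencies (assumed): 1 ms from flash cache, 8 ms from an FC spindle.
CACHE_MS = 1.0
DISK_MS = 8.0

for hit_ratio in (0.0, 0.5, 0.9):
    latency = effective_latency_ms(hit_ratio, CACHE_MS, DISK_MS)
    print(f"hit ratio {hit_ratio:.0%}: average latency {latency:.2f} ms")
```

The model shows why the benefit depends on how much of the active working set the cache can hold: the average latency falls in direct proportion to the fraction of requests that the cache absorbs.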
Solution architecture

Architecture diagram

The following diagram shows the overall physical architecture of the solution.

[Figure: overall physical architecture of the solution]

Reference architecture overview

This reference architecture contains a single Microsoft SQL Server instance running against storage on an EMC storage array. The simplicity of the setup makes it possible to clearly observe the impact of FAST Cache on the performance of the system. In practice, this feature may be used with many other databases concurrently, as long as the planning process accounts for the fact that each database will contribute to the overall workload of the system.
Storage layout overview

For testing purposes, a database was created using LUNs from a RAID 10 group for the database files and another RAID 10 group for the transaction log. The allocation of these LUNs followed the applicable best practices regarding the optimal number of data files with regard to the number of processor cores. In a real-world environment, this area will be sized based on the expected workload of the system.

Network layout overview

The Internet Protocol (IP) and Fibre Channel (FC) networks, as shown in the architecture diagram, are configured to be consistent with the relevant best practices and are architected for high availability. This requires redundant links between all the system components and no single point of failure in the design. The network was further sized so that it could easily handle the projected traffic of the test workload.
Key components

Introduction

This section briefly describes the key components of this solution:

- Microsoft SQL Server
- EMC unified storage
- FAST Cache

The Hardware and software resources section provides details on all the components that make up the reference architecture.

Microsoft SQL Server

Microsoft SQL Server is a relational database management system (RDBMS) that is commonly used in many business-critical applications to store, retrieve, and manage application data. It is sometimes considered an application, but is more properly considered an application environment. Each individual business application has a dedicated set of database tables and associated query patterns that make up the workload on the system as seen from SQL Server.

While each application has a unique workload, workloads can be grouped into common classifications based on the type of data they use. From these common classifications, one can arrive at benchmark standards, which can be used to compare the performance of systems. While these are not a substitute for detailed analysis of the actual workload, they can help start the planning process.

The solution presented here is targeted at an OLTP workload consisting of many small transactions that are random in nature. The test workload is based on the TPC-C benchmark standard.

EMC unified storage

The EMC Celerra unified storage platform is a dedicated network server optimized for file and block access, delivering high-end features in a scalable, easy-to-use package. This makes it possible to dynamically grow, share, and cost-effectively manage multi-protocol file systems and also provide multi-protocol block access.
Administrators can take advantage of simultaneous support for Network File System (NFS) and Common Internet File System (CIFS) protocols, which enables Windows and Linux or UNIX clients to share files using sophisticated file-locking mechanisms, and can use iSCSI or FC for high-bandwidth or latency-sensitive applications.

This solution is proven using FC connections. However, there is nothing in the solution to prevent it from being successfully implemented using iSCSI if an FC storage area network (SAN) is not available.
FAST Cache

The EMC FAST Cache feature makes it possible to dramatically expand the read and write cache available in the storage array by using EMC Enterprise Flash Drive (EFD) devices as a cache location. Testing has shown that random access to data on EFDs is much faster than random access to data on a comparable FC disk spindle.

Prior to the introduction of FAST Cache, database administrators had to do considerable work to spread their data out and store only the most critical data on EFDs. This involved careful analysis of access patterns and partitioning of tables to isolate the data that could most benefit from faster access. This was necessary because storing the whole database in the EFD tier was not considered cost-effective.

The FAST Cache feature removes the need for this rearchitecting of data. Instead of requiring manual intervention and monitoring to continually move database partitions between storage tiers, the array simply uses the EFD devices like an extension of the array cache. This enables the currently active workload to obtain maximum benefit from the EFD devices with no interaction from the storage administrator.
Validated environment profile

Profile characteristics

The solution was validated with the following environment profile.

Profile characteristic | Value
---------------------- | -----
SQL database size | 400 GB
Instances and databases | Single instance and single database
Number of database files | Four files, each on a different LUN
Workload | OLTP (TPC-C-like)
Storage for SQL database | FC storage
Production SQL 2008 databases, RAID type, physical drive size, and speed | RAID 10, 450 GB FC drives (15k rpm)
FAST Cache configuration | 4 x 100 GB EFDs, RAID 1 configuration
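The profile above implies a simple relationship between the FAST Cache size and the database size. The following sketch works through that arithmetic using nominal drive capacities only. It assumes that the four EFDs form mirrored pairs, so half of the raw capacity is usable, and it ignores any formatting or system overhead, so the capacity actually reported by the array may be lower.

```python
def raid1_usable_gb(drive_count, drive_gb):
    """Nominal usable capacity of drives arranged as RAID 1 mirrored pairs."""
    if drive_count <= 0 or drive_count % 2 != 0:
        raise ValueError("RAID 1 pairs require a positive, even number of drives")
    return (drive_count // 2) * drive_gb


# Values taken from the validated environment profile.
EFD_COUNT = 4
EFD_SIZE_GB = 100
DATABASE_GB = 400

cache_gb = raid1_usable_gb(EFD_COUNT, EFD_SIZE_GB)
cache_to_db = cache_gb / DATABASE_GB
print(f"nominal FAST Cache capacity: {cache_gb} GB")
print(f"cache as a share of database size: {cache_to_db:.0%}")
```

On these nominal figures, the cache is roughly half the size of the database, which is consistent with the point that the whole database does not need to reside on EFDs for the active workload to benefit.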
Hardware and software resources

Hardware

The following table lists the hardware used to validate the solution.

Equipment | Quantity | Configuration | Notes
--------- | -------- | ------------- | -----
Storage | 1 | CLARiiON CX4-480; 450 GB FC drives (15k rpm); 100 GB EFDs | Primary database storage
Enterprise-class FC switch | 1 | 4 Gb FC switch | Production systems may require additional hardware for high-availability purposes
Enterprise network switch | 1 | Gigabit Ethernet (GbE) switch | Production systems may require additional hardware for high-availability purposes
Dell PowerEdge R710 | 1 | Two quad-core Intel Xeon X5550 @ 2.67 GHz 64-bit processors; 64 GB RAM | Primary database server
Dell PowerEdge 2950 | 4 | Two quad-core Intel Xeon E5440 @ 2.83 GHz | Load generation for test workload

Software

The following table lists the software used to validate the solution.

Software | Version
-------- | -------
Microsoft Windows Server | Windows 2008 x64 Enterprise Edition SP2; Windows 2003 x32 Enterprise Edition R2 SP2
Microsoft SQL Server Enterprise Edition | 2008 SP1
EMC CLARiiON FLARE | 4.30.000.5.004
Conclusion

Summary

The addition of EMC FAST Cache to a well-designed database environment can dramatically improve the performance of the system without incurring proportional increases in cost. Most database environments are too large for a complete migration to EFD-based data LUNs. The EMC FAST Cache functionality enables a set of these devices to function as an additional caching mechanism on the storage array, providing the performance benefits of Flash-based storage without the expense of relocating the entire database or the architectural redesign that is often required for performance partitioning.

This solution describes how the introduction of FAST Cache in a simple SQL Server OLTP environment significantly improves the performance of the database environment with no additional DBA activity. In scenarios where multiple databases share the FAST Cache, a similar aggregate performance benefit is expected once the specific workloads are taken into account.

Next steps

EMC can help accelerate assessment, design, implementation, and management while lowering the implementation risks and costs of a unified storage solution for a Microsoft SQL Server environment. To learn more about this and other solutions, contact an EMC representative.