Data Center Performance Insurance
How NFS Caching Guarantees Rapid Response Times During Peak Workloads
November 2010
Saving Millions By Making It Easier And Faster

Every year, slow data centers and application failures cost companies hundreds of millions of dollars. Centralized storage caching applies the well-known concept of caching using high-speed DRAM and flash memory, but adds a new and innovative architecture that offers data center performance insurance.

Data Center Challenge: Surviving Peak Workloads

Typically, a data center's inability to process peak workloads stems from the I/O bottleneck inherent in traditional storage architectures. Facing the pain of slow, sequential data access on mechanical hard disk drives, attempts to solve the problem have ranged from over-provisioning parallel disks to placing cache memory directly in compute servers or storage devices. All of the proposed solutions have been expensive and unable to close the widening server-storage performance gap.[1]

Shortfall Of Existing Solutions

1. Parallelizing disk I/O does not accelerate response time. It still takes milliseconds to access data on a mechanical disk drive, no matter how many drives are available.
2. Traditional cache capacity is very limited in servers and storage systems. Storage experts recommend sizing a disk cache at ten percent of the disk's capacity. Following this rule of thumb, a one-terabyte disk would need 100GB of cache, which is unheard of.
3. Server and storage devices contain closed caches: the cache resource is not usable by any other device.

Disk Drive Performance Shortfall

Using multiple disk drives and striping data across them to increase I/O operations per second (IOPS) can improve throughput but cannot reduce I/O response times.[2] The root cause is the mechanical process of accessing data on disk.

[1] The Server Storage Performance Gap, Whitepaper, Violin Memory; The I/O Performance Gap, StorageIO Group
[2] The Disk Drive Shortfall, Technical Whitepaper, Violin Memory
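To make the latency argument concrete, the following back-of-the-envelope sketch in Python shows why striping multiplies aggregate IOPS while each individual request still pays the full mechanical delay. The seek, rotation, and transfer figures are illustrative assumptions, not measurements from this paper.

    # Illustrative figures only; real drives vary.
    AVG_SEEK_MS = 4.0      # assumed average seek time
    AVG_ROTATION_MS = 2.0  # assumed average rotational latency
    TRANSFER_MS = 0.1      # assumed transfer time for a small block

    service_time_ms = AVG_SEEK_MS + AVG_ROTATION_MS + TRANSFER_MS
    iops_per_drive = 1000.0 / service_time_ms

    for drives in (1, 10, 100):
        total_iops = drives * iops_per_drive
        print(f"{drives:3d} drives: ~{total_iops:8.0f} aggregate IOPS, "
              f"but each random I/O still takes ~{service_time_ms:.1f} ms")

Under these assumptions a single drive tops out at a few hundred IOPS, and adding drives never shrinks the multi-millisecond cost of an individual random I/O.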
Figure 1. Typical Disk Drive Based Storage Device Performance Profile

Moving physical parts (rotating the magnetic platter and positioning the actuator) imposes a significant millisecond-scale delay on every response. As additional activity or application workload arrives, subsequent I/O requests stall and an I/O request queue builds up. Although queues can be shortened by parallel processing, individual I/O response times stay the same. For typical drive-based storage devices, I/O response time increases as the number of IOPS grows. When I/O bottlenecks emerge, response time exceeds acceptable Service Level Agreements (SLAs), as shown in Figure 1. To date there has been no way to insure a specific service level for IOPS.

Caching

Caching is a well-known method that has proven extremely effective in many areas of computer design. Mitigating I/O bottlenecks with local caches that hold duplicates of the original data has long been used to accelerate data access. Once data is stored in a local cache, subsequent operations access the cached copy rather than re-fetching the data from the mechanical disk drive. Until now, the caching concept has been applied primarily within compute servers and storage devices, both of which have strict limitations.
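The general mechanism can be pictured with a minimal read-through cache sketch in Python. The LRU eviction policy, block-oriented interface, and capacity figure are illustrative assumptions for explanation only, not part of any product described in this paper.

    from collections import OrderedDict

    class ReadThroughCache:
        """Minimal LRU read-through cache: serve hits from memory, fetch misses from disk."""
        def __init__(self, fetch_from_disk, capacity_blocks=1024):
            self.fetch = fetch_from_disk         # callable: block_id -> data
            self.capacity = capacity_blocks
            self.blocks = OrderedDict()          # block_id -> data, in LRU order

        def read(self, block_id):
            if block_id in self.blocks:          # cache hit: no disk I/O at all
                self.blocks.move_to_end(block_id)
                return self.blocks[block_id]
            data = self.fetch(block_id)          # miss: pay the millisecond disk penalty once
            self.blocks[block_id] = data
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)  # evict the least recently used block
            return data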
Server-based Caching Shortfall

Server-based caching uses part of a compute server's main memory to cache data, either within the application or in the storage device driver. The amount of usable memory within a compute server is typically constrained because the application itself consumes most of the memory.

Figure 2. Server-based Caching

Server-based caching does not scale past the compute server, making it a non-sharable, limited resource, as shown in Figure 2.

Storage Device Caching Shortfall

Storage device caching equips the storage subsystem with memory to cache frequently accessed data. Typically the cache is proprietary and small. For example, a 300GB disk contains 16MB of cache but would actually need about 30GB to be consistent with the sizing recommendation discussed earlier. A 100TB storage system may need 10TB of cache, yet a typical NFS storage system supports only 1TB of relatively slow MLC flash. Storage device caching does not scale past the particular storage subsystem, making it a non-sharable resource, as shown in Figure 3.
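Putting the ten-percent sizing rule next to the installed cache sizes quoted above makes the gap explicit. The short Python sketch below simply plugs those same numbers into the rule; no additional data is assumed.

    # The ten-percent sizing rule applied to the capacities quoted above.
    def recommended_cache_gb(capacity_gb, ratio=0.10):
        return capacity_gb * ratio

    examples = [
        ("300GB disk drive",     300,      16 / 1024.0),  # ships with 16MB of cache
        ("100TB storage system", 100_000,  1024.0),       # typical 1TB of MLC flash
    ]

    for name, capacity_gb, installed_gb in examples:
        needed_gb = recommended_cache_gb(capacity_gb)
        print(f"{name}: recommended ~{needed_gb:,.0f}GB, installed ~{installed_gb:,.2f}GB "
              f"({installed_gb / needed_gb:.2%} of the recommendation)")

In both cases the installed cache is a small fraction of what the rule of thumb calls for, which is the shortfall the rest of this paper addresses.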
Figure 3. Storage Device Caching

Moreover, placing cache in a storage subsystem is a costly affair that customers are forced to accept because of the proprietary nature of most storage systems.

Solution: Centralized Storage Caching

Centralized storage caching applies the well-known concept of caching to create a central, sharable, and scalable caching resource that works with existing data center architectures. It keeps frequently accessed data in a very large central memory pool instead of relying solely on traditional hard disk drives. For example, the vcache system leverages flash technology to enable 1-15 terabytes of NFS cache. Centralized storage caching enables high-performance data access by avoiding time-consuming disk I/O and accelerates applications through minimal I/O response times and increased data throughput.

Figure 4. Centralized Storage Caching

Centralized storage caching can be implemented as a sharable and scalable caching system that transparently integrates with existing data center architectures: no software has to be installed on, and no hardware added to, existing compute servers or storage subsystems. It can keep frequently accessed data from hundreds of storage systems at hand and service I/O requests from thousands of concurrent clients in parallel with minimal response times.
Consolidating cache resources maximizes their use through sharing and simplifies management as a single scalable resource.
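The sketch below illustrates, in simplified form, how a transparent in-path NFS cache shared by many clients and many filers could decide whether to serve a read locally or forward it to the backing storage. The class, function names, and keying scheme are hypothetical illustrations of the general idea, not the actual vcache design.

    # Illustrative sketch of a central, shared, in-path NFS read cache.
    class CentralNfsCache:
        def __init__(self, cache_store, forward_to_filer):
            self.cache = cache_store          # shared DRAM/flash pool, dict-like
            self.forward = forward_to_filer   # callable: (filer, fh, offset, length) -> bytes

        def handle_read(self, filer, file_handle, offset, length):
            key = (filer, file_handle, offset, length)
            data = self.cache.get(key)
            if data is not None:              # hit: served from the central memory pool
                return data
            data = self.forward(filer, file_handle, offset, length)  # miss: go to the filer once
            self.cache[key] = data            # every subsequent client shares this copy
            return data

Because the cache sits in the NFS data path, clients and filers stay unmodified, which is what makes the resource both central and transparent.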
Violin Memory vcache Technologies

The Violin Memory architecture for scalable NFS caching systems, called vcache, is based on a number of indispensable technologies that provide minimal response times and high-performance parallelism for a large cache.

Figure 5. vcache Technology Architecture

Connected Memory Architecture

The Connected Memory Architecture combines DRAM and flash memory into a unified, large cache pool. This patent-pending technology is the foundation for scalable caching appliances and is responsible for sharing data across all modules.

Memory Coherence Protocol

The Memory Coherence Protocol ensures constant response times across a large number of DRAM-based cache modules.

Real Time Compressor

The real-time data compressor dramatically increases effective internal network throughput and cache memory capacity. This patent-pending technology allows the solution to go beyond the traditional physically available limits.

Cache Directory

The Cache Directory is a shared resource, available across all caching modules, that contains the data and intelligence about current cache content. It is used by the policy engine, managed by the cache manager, and accessed by storage clients.
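As one way to picture a directory shared across cache modules, the sketch below places each key on a module by deterministic hashing and records where cached data currently lives. This is an illustrative construction in Python, not a description of the patent-pending Connected Memory Architecture or the actual Cache Directory implementation.

    import hashlib

    class CacheDirectory:
        """Illustrative shared directory: maps a cached object's key to the module holding it."""
        def __init__(self, module_ids):
            self.modules = sorted(module_ids)  # e.g. ["module-0", "module-1", ...]
            self.entries = {}                  # key -> module currently holding the data

        def owner(self, key):
            # Deterministic placement so every module resolves a key the same way.
            digest = hashlib.md5(repr(key).encode()).hexdigest()
            return self.modules[int(digest, 16) % len(self.modules)]

        def locate(self, key):
            # Directory lookup of the kind a policy engine or storage client would issue.
            return self.entries.get(key)

        def record(self, key, module_id):
            self.entries[key] = module_id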
Cache Manager

The Cache Manager provides a simple cache resource management framework in which cache memory policies can be created or modified.

System Manager

The System Manager provides a simple framework for managing cache modules.

Policy Engine

The Policy Engine enforces heuristic or user-defined policies for caching algorithms, as well as event-driven caching.

Application Caching Profiles

Application Caching Profiles are settings optimized for particular workloads and data center applications, such as databases, operations on a small number of large files, or operations on a large number of small files.

Data Center Performance Insurance

Service level agreements for data center applications commit to an acceptable response time, with the acceptable threshold typically based on customer requirements that specify measurable objectives. When additional workload hits an I/O-constrained, disk-based storage subsystem, response time increases and performance suffers. The more severe the bottleneck and the higher the peak workload, the faster response time deteriorates (i.e., increases) beyond acceptable levels. In many cases, once data center applications drive more than 100K IOPS, the performance SLA falls short, as shown in the disk drive profile of Figure 6.
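To see why response time deteriorates so quickly as peak workload approaches the storage system's capacity, the toy single-queue model below contrasts a disk-based back end with a cache-speed service time. The service times, drive count, and cache ceiling are illustrative assumptions, not measurements of any specific system or of the curves in Figure 6.

    # Toy single-queue (M/M/1-style) sketch; all figures are illustrative assumptions.
    DISK_SERVICE_MS = 6.0     # assumed per-I/O service time on a mechanical drive
    CACHE_SERVICE_MS = 0.2    # assumed per-I/O service time from a DRAM/flash cache

    def response_time_ms(service_ms, offered_iops, capacity_iops):
        utilization = min(offered_iops / capacity_iops, 0.99)
        return service_ms / (1.0 - utilization)  # blows up as load approaches capacity

    disk_capacity_iops = (1000.0 / DISK_SERVICE_MS) * 120  # e.g. 120 striped drives
    cache_capacity_iops = 500_000                          # assumed cache ceiling

    for load in (5_000, 15_000, 19_000):
        disk_rt = response_time_ms(DISK_SERVICE_MS, load, disk_capacity_iops)
        cache_rt = response_time_ms(CACHE_SERVICE_MS, load, cache_capacity_iops)
        print(f"{load:6d} IOPS: disk ~{disk_rt:6.1f} ms, cache ~{cache_rt:5.2f} ms")

In this toy model the disk-based response time climbs steeply as the offered load nears the array's ceiling, while the cache, with a much lower service time and far higher ceiling, stays nearly flat.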
Figure 6. Data Center with Performance Insurance

Centralized storage caching provides protection by guaranteeing minimal response times (under 1ms, even above 100K IOPS). This keeps response times at consistently acceptable levels, even through peak workloads.

Conclusion

Centralized storage caching provides data center performance insurance by neutralizing and eliminating I/O bottlenecks. Its seamless integration with existing infrastructure makes it easy to deploy without disrupting IT or business operations. This central, sharable, and scalable approach ensures that robust service level agreements can be met in the face of escalating data center demands.
Violin Memory accelerates storage and delivers real-time application performance with vcache NFS caching. Deployed in the data center, Violin Memory's scalable vcache systems provide transparent acceleration for existing storage infrastructures to speed up applications, eliminate peak-load disruptions, and simplify enterprise configurations.

© 2010 Violin Memory. All rights reserved. All other trademarks and copyrights are property of their respective owners. Information provided in this paper may be subject to change. For more information, visit www.violin-memory.com

Contact Violin
Violin Memory, Inc. USA
2700 Garcia Ave, Suite 100, Mountain View, CA 94043
33 Wood Ave South, 3rd Floor, Iselin, NJ 08830
(888) 9-VIOLIN Ext 10 or (888) 984-6546 Ext 10
Email: sales@violin-memory.com
www.violin-memory.com