E-Guide
DATA REDUCTION FOR ALL-FLASH STORAGE
SearchSolidStateStorage
When deploying flash technology, the most efficient system is one that performs data reduction techniques inline. George Crump offers criteria to help IT pros decide whether performance or function is most important when choosing all-flash storage arrays.
DATA REDUCTION METHODS FOR THE ALL-FLASH DATA CENTER
George Crump

The concept of an all-flash data center is appealing because it would eliminate time-consuming storage tuning exercises. It would also allow organizations to achieve maximum virtual machine density while keeping application owners happy with storage response times. Data reduction methods such as deduplication, compression and thin provisioning, along with the general decrease in flash per-gigabyte (GB) prices, are moving the all-flash data center from concept to reality. Few vendors deliver all three pieces of the data reduction puzzle, so it is important to know which method, if any, is best for your organization.

When considering data reduction to make flash more affordable, you need to weigh the possible performance impact. Adding any layer on top of a near-zero-latency storage medium will affect performance, but the critical question is: Will the applications or users notice the impact of that layer? You can always lessen the performance impact with additional processing power or memory.
PICK YOUR DATA REDUCTION METHOD

For the vast majority of data centers, the overhead associated with available data reduction techniques will be virtually unnoticeable. These systems have performance to spare that most environments can't take advantage of, so spending a few of those cycles to drive down the cost of a flash storage system is worth it.

Thin provisioning is a sound investment for almost every environment. There is overhead in dynamically adding to a volume's capacity, but it is minimal. This technique is important because other forms of data reduction can't optimize capacity that has been allocated but never written. Without thin provisioning, that capacity is hard allocated to a given LUN and can't be shared.

Deduplication eliminates redundant segments of data across files. The deduplication payoff can be significant, especially in virtual environments where there is so much commonality between guest operating systems. Deduplication can exact a significant performance toll, however. It creates a large amount of metadata to track unique data and the pointers that replace what would be redundant data. Quickly traversing the metadata that deduplication requires is critical for overall system performance. While flash memory certainly helps, tracking redundancy as the system scales requires CPU power, which may raise the price of the storage system.
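The metadata mechanics described above can be sketched in a few lines. This is a toy model, not any vendor's implementation: each incoming chunk is hashed, stored once, and every logical chunk keeps only a pointer to the stored copy. The chunk contents and names are illustrative.

```python
import hashlib

def dedupe(chunks):
    """Store each unique chunk once; return (store, pointers).

    store maps a content hash to the chunk's bytes. pointers records,
    per logical chunk, which stored hash to read back -- this pointer
    table is the metadata the article warns about: it grows with every
    chunk written and must be traversed on every read.
    """
    store, pointers = {}, []
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # physically write only if unseen
        pointers.append(digest)
    return store, pointers

# Two VMs sharing a common guest OS block: 4 logical chunks, 3 stored.
chunks = [b"guest-os", b"app-data-1", b"guest-os", b"app-data-2"]
store, pointers = dedupe(chunks)
print(len(store), len(pointers))
```

Reading a logical chunk back means following its pointer into the store, which is why fast metadata traversal dominates deduplicated read performance.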
Compression reduces storage capacity consumption by eliminating redundancy within files instead of across files. While compression does not provide the impressive 9:1 type of reduction offered by deduplication, it provides a more consistent result because it operates on all files and does not require redundancy across files. This in-file efficiency makes compression ideal for databases and other single-file data sets.

THE INLINE REQUIREMENT

Data reduction brings two distinct benefits to all-flash and hybrid flash storage systems. The first is a reduction in the total capacity required. Many all-flash array vendors claim a price point of less than $3 per GB, and some even claim a price point below $1 per GB. The actual result will vary based on the level of efficiency realized, and each environment will be somewhat unique in how efficient these technologies can be. The second benefit is that data reduction, if done inline, should extend the life expectancy of the flash modules. The write limitations of flash modules are well documented: each module can endure only a predetermined number of writes.
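The within-file versus across-file distinction drawn above can be demonstrated with the standard zlib codec. The page contents here are synthetic and deliberately repetitive; real database pages compress less dramatically, but the principle is the same.

```python
import zlib

# A synthetic database page with heavy in-file redundancy: the same
# column values repeated many times, as a real table page often would be.
page = b"status=ACTIVE;owner=app1;" * 100

compressed = zlib.compress(page)
print(len(page), "->", len(compressed), "bytes")

# Unlike deduplication, compressing two identical pages separately
# captures no cross-file redundancy: each copy still consumes its own
# compressed footprint on flash.
separate = len(zlib.compress(page)) + len(zlib.compress(page))
print(separate)
```

This is why the article pairs the two techniques: compression shrinks every file on its own, while deduplication is needed to collapse copies across files.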
Performing all three data reduction methods before data is written to flash is called inline data efficiency. For example, if using all three data reduction methods achieved a 5:1 efficiency ratio -- a reasonable result -- that would translate into a fivefold (80%) potential reduction in write traffic, since only one of every five logical writes reaches flash, extending the life of the flash modules significantly.

Which method is best depends on the use case -- and most organizations are now looking to deploy flash across a wide variety of workloads. At one time or another, each data reduction method will be best for a given workload. For mixed workloads, the most efficient system is one that has all three capabilities and performs its data reductions inline. But few systems, at this point, provide all three capabilities.

For specific use cases, the answer will vary. For example, in a database environment, a system that just performs compression is adequate. If that database is on the extreme edge of demanding performance, then a system with no data reduction, or the ability to turn off data reduction, may be necessary. Virtual environments may be able to leverage a system that can only provide deduplication.
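The effect of an efficiency ratio on cost and write traffic is simple arithmetic, sketched below with hypothetical numbers: the raw capacity and list price are assumptions for illustration, not any vendor's quote; only the 5:1 ratio comes from the text.

```python
# Hypothetical inputs -- illustrative only.
raw_gb = 10_000        # raw flash capacity purchased
price_per_gb = 3.00    # assumed $/GB before data reduction
ratio = 5.0            # combined inline efficiency ratio (5:1)

effective_gb = raw_gb * ratio           # logical capacity presented to hosts
effective_price = price_per_gb / ratio  # effective $/GB after reduction
write_fraction = 1 / ratio              # share of logical writes reaching flash

print(effective_gb, effective_price, write_fraction)
```

At 5:1, a $3/GB array behaves like a $0.60/GB array, and the flash modules absorb only 20% of the logical write traffic.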
DATA REDUCTION ALTERNATIVE: NATIVE CAPACITY

An alternative to data reduction is native capacity. In the past, forgoing data reduction made a general-purpose flash array too expensive. But now, thanks to new high-density flash technologies like triple-level cell (TLC) and 3D NAND, storage systems that use them can break the $1 per GB barrier. While the durability of these technologies is an even greater concern, they can be combined with a more reliable single-level cell (SLC) tier of flash that acts as a shock absorber for the more write-sensitive TLC tier. The advantages of this approach are that the organization knows exactly what the cost per GB is, there is no data reduction variable and there is no concern about performance overhead.

Without a doubt, data reduction has made the concept of an all-flash data center more realistic. Each pillar of data reduction -- deduplication, compression and thin provisioning -- has value. However, these methods are most effective when flash arrays can provide all three at the same time and do so inline, before data is written to the flash modules.
ALL-FLASH STORAGE ARRAYS: PERFORMANCE VS. FUNCTION
George Crump

Nearly every storage vendor now offers all-flash storage arrays, and IT professionals are beginning to recognize the need for these high-performance storage systems. But how does an IT pro decide which of the many all-flash arrays is best suited for their organization and performance demands?

PERFORMANCE VS. FUNCTION

As the all-flash storage array market begins to mature, two categories of arrays are emerging. The first is all-flash arrays that were designed from the ground up to be all-flash arrays. They typically have optimized hardware designs that focus on extracting the maximum possible performance from the flash within the array. The vendors in this space are almost all emerging technology companies or startups. In most cases, their focus on hardware and performance comes at the expense of storage software services. These are the features that many storage
administrators now count on to do their jobs, providing capabilities like snapshots, replication and cloning.

These arrays are known for generating millions of IOPS per system. However, there really is no established method for how those high IOPS numbers are obtained. They can be generated from a single workload or from multiple workloads accessing the system at the same time.

The other category is made up of all-flash arrays that are more feature-oriented. These are typically systems from established vendors, as well as a few startups, that choose to focus on software functionality (providing a feature-rich experience), often at the expense of maximum performance. Typically, these systems either use legacy hardware from the established vendor, retrofitting old arrays with solid-state drives (SSDs), or, in the case of a startup, use off-the-shelf hardware to keep costs down.

These systems can often generate 200,000 to 400,000 IOPS per system. Some scale-out, software-rich systems will claim an aggregate performance of millions of IOPS as well but, as mentioned above, the devil is in the details. They typically have a performance limit per volume or per node within the scale-out cluster. This means they can scale to millions of IOPS like the performance-focused systems described above, but it takes many nodes to get there, and to see that
extreme performance requires multiple workloads all running concurrently. A scale-out system cannot deliver millions of IOPS to a single workload or thread.

WHICH IS BEST?

We are often asked which approach is best. The answer, as usual, depends on the needs of the organization and the specific applications that are running. Most organizations, while performance-constrained, are not constrained to the point that they will typically exceed the baseline performance of a feature-rich all-flash array. Also, most organizations will take great comfort in the availability of the feature sets they have become accustomed to from legacy hard disk arrays.

There are environments with a need for more than a half million IOPS, but it's how those IOPS are needed that will help determine the best system for a particular organization. If the need for performance is distributed across more than a few workloads, the all-flash systems that provide scale-out linear performance growth are ideal. If the environment has a single workload that needs more than half a million IOPS, then the performance-focused systems are needed. As stated above, these systems can provide millions of IOPS to a single workload.
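The scale-out arithmetic behind this distinction can be sketched with hypothetical figures: the per-node throughput is an assumed mid-range value from the 200,000 to 400,000 IOPS band cited above, and the aggregate target stands in for a "millions of IOPS" marketing claim.

```python
import math

# Assumed, illustrative figures -- not any vendor's published numbers.
per_node_iops = 300_000       # per-node ceiling within the cluster
target_aggregate = 2_000_000  # advertised aggregate IOPS

# Aggregate performance scales with node count...
nodes_needed = math.ceil(target_aggregate / per_node_iops)
print(nodes_needed)

# ...but a single workload pinned to one node never exceeds that
# node's ceiling, regardless of how large the cluster grows.
single_workload_ceiling = per_node_iops
print(single_workload_ceiling)
```

Seven nodes can truthfully advertise 2 million aggregate IOPS, yet any one workload still tops out at 300,000, which is why the single-workload question matters when comparing claims.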
MIDDLE GROUND?

Is there room in the middle? Does a storage system exist that can meet the needs of a performance-demanding workload, yet still provide the feature-rich environment that more traditional applications require? There are several vendors that provide this class of solution. This type of system must be designed first as a performance-focused system, then have software added to it. While the addition of that software will add some latency, it will not impact most applications. These systems typically have performance to spare.

This software can be added in several ways. Some vendors provide an appliance that the performance-focused system can be connected to, allowing it to take advantage of all the features the appliance provides. This storage virtualization approach also allows the all-flash array to be somewhat integrated from a software services perspective. Other vendors can load storage software onto a co-processor within the flash array itself. This provides tighter integration and saves the cost of an external appliance.

Finally, all of these hardware-focused systems could work with any of the software-defined storage solutions on the market today, including converged solutions that run within the hypervisor architecture. The
key, though, is to make sure the software-defined solution can support external, shared storage (not all do). When combining a hardware-focused solution with either an appliance or a hypervisor that delivers the storage services, one big challenge remains: the hardware-focused flash solution must be delivered at a price point (including software) that is in the same range as the feature-rich solutions described above. In most cases, the feature-rich solutions are still the most cost-effective, and again, 400,000 IOPS is more than enough for most organizations.

All-flash arrays are becoming mainstream. Many vendors in the space claim price parity with performance-focused hard drive arrays -- that is, arrays from name-brand vendors using 15K RPM drives. This claim is generally true, so any organization looking to buy a performance-focused disk array should seriously consider an all-flash array. The choice within the all-flash segment is largely dependent on the needs of the organization. For most organizations, the feature-rich solutions will be all they need. But it may be worth the investigative step to confirm that, and then to determine whether they need a scale-up or scale-out system.