Deploy smarter, faster infrastructure with IBM storage solutions

IBM FlashSystem and Storage deliver extreme performance for big-data applications

Contents
Storage for the information age
The problem: Mechanical limitations
The solution: Synergy of FlashSystem and Storage
Proof points: Scientific and medical high-performance computing
Beyond disk
Empowering growth and innovation, today and tomorrow
For more information

Today's organizations are awash in high-volume, high-velocity information, and the organizations that can transform that data into actionable insights can gain a clear competitive advantage. By pairing IBM FlashSystem storage systems with code name Storage software (see note 1), organizations can achieve rapid value from all their data. Each solution has a long history of engineering innovation, product maturity and customer deployment success. Working together, they enable commercial, scientific and governmental groups to tackle the largest and most complex computing problems on a global scale.

This technical white paper explains how Storage, a high-performance enterprise file management platform, and the IBM FlashSystem family of all-flash storage arrays can help organizations harness the power of big data. It looks at how Storage and FlashSystem can increase system-wide efficiency, lower power consumption and slash the computing costs of current tasks while also offering the scalability to support future growth and innovation.

In addition, the paper examines how Storage and FlashSystem are delivering real-world value for the Research Computing and Cyberinfrastructure group at Penn State University and a large US research hospital. You'll learn how Storage data management capabilities enable FlashSystem to accelerate not only metadata stores,
but also entire data working sets. Together, Storage and FlashSystem can help researchers do more with big data and complete tasks that seemed nearly impossible before.

Storage for the information age

This section introduces the key features of Storage and FlashSystem and how they can help your critical applications operate with high availability and peak performance.

Storage: Key features and benefits

Storage is a proven, scalable, high-performance data and file management solution used extensively across multiple industries worldwide. It provides simplified data management and integrated information lifecycle tools capable of managing petabytes of data and billions of files.

As a type of clustered file system, Storage removes data-related bottlenecks by providing parallel access to data, eliminating single-filer choke points or hot spots. Storage is compliant with the Portable Operating System Interface (POSIX), so organizations can implement enterprise-class, file-based storage solutions without modifying their applications. Plus, Storage supports a wide range of file system block sizes and types to match I/O requirements.

Storage also performs important data storage and management functions, including information lifecycle management (ILM), disk caching, snapshots, replication, striping and clustering. This means that organizations can store the right data in the right place at the right time, with the right performance and at the right cost, automatically.

Storage enables fast access to local copies of data, while synchronizing data stores located around the world.

Storage enables management of the ever-escalating waves of big data by sharing virtualized storage pools with practically limitless file system scaling.
Built-in intelligent resource utilization enables optimized placement of data on persistent storage options (flash, spinning disk or tape) based on desired attributes such as cost per gigabyte (GB), input/output operations per second (IOPS), latency and capacity. Organizations can add capacity with standard hardware while continuing to manage storage as a single enterprise-class storage system.
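To make attribute-based placement concrete, the following minimal Python sketch picks the cheapest pool that still meets a workload's IOPS, latency and capacity needs. It is an illustration only, not the product's policy language: the `Pool` and `Requirement` types, the `place` function and every numeric value are hypothetical and were chosen purely for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Pool:
    """A storage pool described by the attributes named in the text (values are hypothetical)."""
    name: str
    cost_per_gb: float   # relative cost per GB
    iops: int            # sustained I/O operations per second
    latency_ms: float    # typical access latency in milliseconds
    free_gb: float       # remaining capacity in GB


@dataclass(frozen=True)
class Requirement:
    """What a dataset needs from its storage (hypothetical workload description)."""
    size_gb: float
    min_iops: int
    max_latency_ms: float


def place(req: Requirement, pools: Sequence[Pool]) -> Optional[Pool]:
    """Return the cheapest pool that satisfies the requirement, or None if none does."""
    candidates = [
        p for p in pools
        if p.free_gb >= req.size_gb
        and p.iops >= req.min_iops
        and p.latency_ms <= req.max_latency_ms
    ]
    return min(candidates, key=lambda p: p.cost_per_gb, default=None)


# Illustrative pools only; none of these figures come from the paper.
POOLS = [
    Pool("flash", cost_per_gb=5.0, iops=500_000, latency_ms=0.2, free_gb=40_000),
    Pool("disk", cost_per_gb=0.5, iops=20_000, latency_ms=8.0, free_gb=500_000),
    Pool("tape", cost_per_gb=0.05, iops=10, latency_ms=60_000.0, free_gb=5_000_000),
]

hot = place(Requirement(size_gb=200, min_iops=100_000, max_latency_ms=1.0), POOLS)
archive = place(Requirement(size_gb=10_000, min_iops=1, max_latency_ms=120_000.0), POOLS)
```

With these made-up numbers, the latency-sensitive dataset can only land on the flash pool, while the archive dataset qualifies for every pool and therefore goes to the cheapest one.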
Storage also features Active File Management (AFM) distributed disk-caching technology to extend low-latency access to data from anywhere in the world. As data is written or modified at one location, all other locations can access that same data (for example, users in Tokyo can easily access the same data as users in New York). Each local cluster provides low-latency, local read-write performance.

In addition, to help protect data accessed globally, native encryption is built into Storage. Plus, its Secure Erase feature enables the destruction of large subsets of a file system using a cryptographic operation; that is, no digital shredding or time-consuming overwriting is required.

FlashSystem: Key features and benefits

FlashSystem storage arrays are designed to deliver extreme performance, microlatency, macro efficiency, and enterprise-grade reliability and serviceability. They are powerful, cost-effective tools for accelerating leading-edge scientific, academic, governmental and commercial computing environments where Storage is often deployed.

IBM FlashSystem 840 offers industry-leading performance, reliability and low latency, while IBM FlashSystem V840 Enterprise Performance Solution adds the full spectrum of enterprise-grade management and feature-rich storage services. Together, these IBM systems provide multiple options for addressing high-velocity data requirements, removing performance bottlenecks and increasing productivity. Key features include:

IBM MicroLatency: To speed response times, FlashSystem supports 135 µs reads and 90 µs writes with a purpose-built and highly parallel design. This provides fast access to insights and customers while reducing operational costs.

Extreme performance: With optimized IOPS and bandwidth performance, FlashSystem works well in demanding environments. It can easily support a single application with thousands of concurrent users as well as multiple applications with diverse workloads.
Macro efficiency: FlashSystem provides high storage density, low energy consumption, and greater utilization of existing resources, offering up to 40 TB of usable capacity in only two units of rack space. It also uses only 625 watts of power, making it one of the most power-efficient products on the market.

Enterprise reliability: By employing enterprise-grade multi-level cell (eMLC) NAND flash technology, plus two RAID dimensions (IBM Variable Stripe RAID at the flash module level as well as system-wide RAID), FlashSystem delivers enterprise-grade data protection. It includes hot-swappable components for rapid servicing; plus, software and firmware updates can be completed with the system up and running. In addition, FlashSystem supports AES 256 hardware-based encryption, while FlashSystem V840 offers advanced storage services such as snapshots and replication.
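The figures quoted in the feature list above can be turned into a few derived metrics. The short Python sketch below does only that arithmetic. It assumes the "up to 40 TB" and 625-watt figures apply to the same configuration, and it treats the quoted latencies as the service time of a single outstanding request (queue depth 1), which is a simplifying assumption: a highly parallel design serves many requests at once, so real IOPS would be far higher.

```python
# Figures quoted in the text above.
READ_LATENCY_US = 135
WRITE_LATENCY_US = 90
USABLE_TB = 40
RACK_UNITS = 2
POWER_W = 625


def serial_iops(latency_us: float) -> float:
    """IOPS ceiling when requests are issued strictly one at a time.

    Assumption for illustration only: queue depth 1, no parallelism.
    """
    return 1_000_000 / latency_us


tb_per_rack_unit = USABLE_TB / RACK_UNITS           # capacity density
watts_per_tb = POWER_W / USABLE_TB                  # power per usable TB
read_iops_qd1 = serial_iops(READ_LATENCY_US)        # one reader, one request in flight
write_iops_qd1 = serial_iops(WRITE_LATENCY_US)      # one writer, one request in flight

print(f"{tb_per_rack_unit:.1f} TB per rack unit")
print(f"{watts_per_tb:.3f} W per usable TB")
print(f"{read_iops_qd1:.0f} reads/s and {write_iops_qd1:.0f} writes/s at queue depth 1")
```

Under these assumptions the array works out to 20 TB per rack unit and roughly 15.6 watts per usable terabyte, and even a single serial requester could issue several thousand small I/Os per second.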
IBM Systems and Technology

The problem: Mechanical limitations

Processors have increased in speed by orders of magnitude over the years, but spinning hard disk drives (HDDs) have not. This difference has created a substantial performance gap between how fast processors demand data and how quickly an HDD-based system can respond. HDD speed lags behind processor speed because it is constrained by physical components.

Data and file management solutions such as Storage have gone to extraordinary lengths over the years to mitigate the mechanical limitations of spinning disks. Brute-force horizontal scaling has been the most common strategy. As a parallel file system, Storage helps resolve data throughput or bandwidth challenges by enabling massively parallel access to enormous numbers of HDDs. For example, Storage is the file system of the National Center for Atmospheric Research (NCAR) Yellowstone Supercomputer, which includes more than 72,000 processors and approximately 6,800 3 TB disks totaling 16 PB of disk storage.

Bandwidth is not the only storage challenge for HDD-based environments. In fact, there are two main components that create the structure of a parallel file system: the file system data blocks and the metadata that describes the file system structure and block detail. Massively parallel file systems can generate enormous and complex metadata stores. These databases become throttles on file system performance and thus can impact overall compute performance. Although Storage can help mitigate metadata performance challenges using distributed file system controllers, the fact remains that files can move only as fast as their metadata lets them.

If metadata stores are extremely large in parallel file systems, the data stores themselves are also likely to be proportionately enormous, as illustrated by the 6,800-disk storage system noted for the NCAR Yellowstone Supercomputer. The costs of constructing, managing, maintaining and cooling such massive HDD-based storage systems, plus the need for high-velocity metadata performance, are driving the search for solutions with much smaller physical footprints, lower costs and, of course, higher performance.

The solution: Synergy of FlashSystem and Storage

When it comes to performance, parallel file systems such as Storage are mainly limited by the metadata store, and this is exactly where FlashSystem offers the most value. In Storage environments, metadata usually comprises less than two percent of the capacity of the file system, but it is, of course, heavily involved in file system operations.

[Figure labels: Storage cluster; Data and metadata; Data; Metadata; Storage cluster: Primary storage; IBM FlashSystem. Caption: IBM FlashSystem accelerates metadata stores and thus Storage primary data workloads as well.]
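As a rough sense of scale, the Yellowstone figures and the "less than two percent" metadata share cited in this section can be combined in a back-of-the-envelope calculation. The Python sketch below is only that: it assumes decimal units (1 PB = 1,000 TB), treats two percent as an upper bound, and reuses the 40 TB-per-enclosure figure from the FlashSystem feature list. The paper does not say how Yellowstone's metadata is actually stored, and it does not explain the gap between the raw disk total and the stated 16 PB, so neither is assumed here.

```python
import math

# Figures quoted in the text.
DISK_COUNT = 6_800            # "approximately 6,800" disks
DISK_TB = 3                   # 3 TB each
STATED_TOTAL_TB = 16_000      # "16 PB of disk storage", taking 1 PB = 1,000 TB
METADATA_PERCENT_MAX = 2      # "less than two percent", used as an upper bound
FLASH_ENCLOSURE_TB = 40       # "up to 40 TB of usable capacity" per 2U enclosure

raw_tb = DISK_COUNT * DISK_TB                                       # raw capacity of the disks
metadata_tb_upper = STATED_TOTAL_TB * METADATA_PERCENT_MAX / 100    # metadata upper bound
enclosures_upper = math.ceil(metadata_tb_upper / FLASH_ENCLOSURE_TB)

print(f"Raw disk capacity: {raw_tb:,} TB")
print(f"Metadata upper bound: {metadata_tb_upper:,.0f} TB")
print(f"Hypothetical enclosures needed to hold that bound: {enclosures_upper}")
```

Under these assumptions, the metadata for a file system of that size would be bounded by a few hundred terabytes, a capacity that a small number of 2U flash enclosures could hold in place of a share of thousands of spinning disks.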
Using FlashSystem storage for metadata acceleration can dramatically reduce the time required for file system maintenance operations and processing jobs that create many small files. Separating the metadata onto FlashSystem storage can also accelerate the entire file system, because the metadata small-block I/O operations no longer interfere with the large streaming accesses that many parallel compute workloads generate. In fact, compute tasks such as batch processing and nightly backups have been significantly shortened by moving metadata stores to FlashSystem storage.

In addition to metadata acceleration, Storage also offers the scalable, efficient performance of its ILM toolset. By using storage pools, filesets and user-defined policies, organizations can match the performance and cost of storage to the value of data. This way, they can create tiers of storage based on specific requirements. For example, one pool can be reserved for high-performance FlashSystem storage and another for HDD storage or even tape. When data is placed in or moved between storage pools, all of the data management is done by Storage.

[Figure labels: IBM FlashSystem; HDD storage; Hot files; All other files.]

High-capacity flash deployments are becoming more and more common, and they are expanding the field of problems that parallel file systems can solve. Many workloads have been designed assuming that the latency and IOPS of the disks are so poor that only large-block access should be used to grab enough data to fill the compute node memory, process it locally and stream out the results. This design limits the problem sets organizations can tackle to those that can easily be partitioned and processed independently on different compute nodes. FlashSystem can remove these limitations by providing low-latency, high-IOPS performance and bandwidth, even with small-block random I/O.
This offers a tremendous advantage to parallel processing architectures because each compute node can now access the entire dataset during processing, resulting in dramatically faster performance. Using FlashSystem to hold entire data working sets simplifies the architecture and allows Storage-powered clusters to efficiently tackle much broader ranges of problems.

FlashSystem offers another important advantage in parallel compute environments: the Quad Data Rate (QDR) InfiniBand (IB) interface, which enables Storage to handle workloads with exceptionally high bandwidth requirements. These applications differ from more traditional workloads because there is no way to partition the data into chunks that can be sent to many compute nodes for processing in local memory before being recombined to synthesize results. Instead, each compute node needs access to the entire dataset at every processing step. The ability to place every CPU in a compute cluster close to all of the data is a unique advantage of architectures enabled by FlashSystem and QDR IB backbones because of the high bandwidth, low latency and ability to handle high-velocity random I/O.

By using Storage and FlashSystem together, organizations can easily create storage pools that optimize performance and efficiency.
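The idea of keeping hot files on a flash pool and all other files on HDD, described earlier in this section, can be sketched as a simple tiering pass. The Python example below is a hypothetical illustration rather than the product's ILM policy engine: the `FileInfo` record, the `assign_pools` function, the seven-day "hot" threshold and the sample files are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class FileInfo:
    """Minimal file record for the sketch (hypothetical fields)."""
    path: str
    size_gb: float
    days_since_access: int


def assign_pools(
    files: Iterable[FileInfo],
    flash_capacity_gb: float,
    hot_days: int = 7,
) -> Tuple[List[str], List[str]]:
    """Greedy tiering: the most recently accessed files go to flash while they fit.

    A file lands on the flash pool only if it was accessed within `hot_days`
    and still fits in the remaining flash capacity; everything else goes to HDD.
    """
    flash: List[str] = []
    hdd: List[str] = []
    remaining = flash_capacity_gb
    for f in sorted(files, key=lambda item: item.days_since_access):
        if f.days_since_access <= hot_days and f.size_gb <= remaining:
            flash.append(f.path)
            remaining -= f.size_gb
        else:
            hdd.append(f.path)
    return flash, hdd


# Made-up files; sizes and ages are for illustration only.
FILES = [
    FileInfo("a.dat", size_gb=100, days_since_access=1),
    FileInfo("b.dat", size_gb=300, days_since_access=2),
    FileInfo("c.dat", size_gb=50, days_since_access=3),
    FileInfo("d.dat", size_gb=10, days_since_access=90),
]

flash_files, hdd_files = assign_pools(FILES, flash_capacity_gb=200)
print("flash:", flash_files)
print("hdd:", hdd_files)
```

In this example `b.dat` is recent but too large for the remaining flash capacity, and `d.dat` is small but cold, so both stay on HDD. A production policy would weigh more attributes than access age and size.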
Finally, pooled or shared, extremely fast, low-latency, high-bandwidth FlashSystem storage offers cost and management advantages over in-server flash solutions. For application datasets that can be federated, where each compute node has an isolated working set, FlashSystem shared storage provides the ability to time-slice access to a more efficient capacity rather than dedicating flash to every compute node at the maximum required capacity. FlashSystem provides the latency of internal flash, but with the flexibility and infrastructure of traditional shared storage solutions.

Proof points: Scientific and medical high-performance computing

This section looks at the organizations achieving real-world results by using Storage software and FlashSystem storage arrays.

Metadata accelerator

The Research Computing and Cyberinfrastructure (RCC) group at Penn State University provides systems and services that are used extensively in computational and data-driven research and teaching by more than 5,000 faculty members and students. By deploying FlashSystem to accelerate Storage metadata stores, RCC administrators were able to increase overall system performance by 600 percent (see note 2).

According to Jason Holmes, lead systems administrator at RCC, Penn State absolutely made the right decision in choosing FlashSystem to accelerate its metadata. "We've had zero problems with FlashSystem. During the evaluation we brought them in, plugged them in, zoned them into our Fibre Channel SAN, created LUNs and they've just worked ever since then. They're very well made and a very mature product. Plus, their support has been exceptional."

Workload accelerator

A large research hospital based in New York City is no stranger to the rigors, challenges and rewards of medical research backed by high-performance computational and data services. For example, the hospital's medical students are allowed to fully sequence their own genomes, which requires extreme computational horsepower.
At the hospital, each genome processing project requires its own directory, often with more than a million files and ten thousand folders, typically comprising 200 GB in total stored data. The sequencing application processing profile involves thousands of processors randomly accessing many small files. In this type of processing environment, HDD performance maxed out when handling the smaller data processing operations, leaving little room for larger operations.

The hospital deployed Storage to create tiers of production data on FlashSystem storage, including the raw genome data and files, common related scientific data and other reference sources. All of this data was moved by Storage into storage pools powered by FlashSystem. System IOPS escalated dramatically. As the IT administrators tuned block sizes and processing threads, they saw a significant increase in IOPS performance.

Beyond disk

Storage and FlashSystem offer practical, cost-effective solutions to many of the information challenges currently faced by organizations of all sizes. So where does this leave HDD-based storage?

HDD systems are priced lower per unit of storage when compared to flash. But when you consider other factors, such as mean time between failures (how often you must replace them), power consumption, value of performance, and even data-center real estate, flash storage is less costly than HDD.
Integrated deployments of Storage and FlashSystem are enabling renowned organizations to cost-effectively move away from HDD. The IBM solutions are particularly effective when disk storage is restricting performance (and can be replaced by flash) or results in excess capacity (driving efficiency down and costs up). In these cases, FlashSystem has become more and more attractive for storing all production data.

And Storage, especially when integrated with products such as IBM Tivoli Storage Manager or IBM Linear Tape File System (LTFS), provides the tiering, pooling, management and data pre-fetch capabilities to make tape quite viable. Storage can uniquely manage the full data lifecycle, incorporating tape to deliver dramatically lower storage costs through policy-driven automation and tiered storage management. Disk essentially gets squeezed out between flash moving down from above and tape moving up from below.

Plus, once storage is cost-, efficiency- and performance-optimized, your organization can take full advantage of the cloud. Many IT environments are challenged by the network latency inherent in public and commercial cloud storage offerings. But a storage architecture employing Storage, FlashSystem and tape is already insulated from this issue. In fact, IBM technologies such as Storage and FlashSystem integrate well with solutions such as IBM Platform Computing cloud services and SoftLayer (see note 3) infrastructures to enable a wide range of cloud-based solutions and architectures, including private, public and hybrid clouds. Moving Storage and FlashSystem-based compute environments from legacy tape into the cloud may be one of the most natural IT migrations yet.

Empowering growth and innovation, today and tomorrow

Not only do Storage software and FlashSystem storage systems have long histories of success, but IBM is also continually updating the solutions with industry-leading features and functionality.
Organizations around the globe have already deployed them together, and this trend is accelerating. Why? Storage and FlashSystem bring the capabilities to address many of the most pressing data storage challenges. What's more, these leading-edge solutions deliver extreme performance for a smarter infrastructure, empowering growth and innovation for today and tomorrow.
For more information

To learn more about IBM FlashSystem storage, please contact your IBM representative or IBM Business Partner, or visit: ibm.com/systems/storage/flash/

Additionally, IBM Global Financing can help you acquire the IT solutions that your business needs in the most cost-effective and strategic way possible. We'll partner with credit-qualified clients to customize an IT financing solution to suit your business goals, enable effective cash management, and improve your total cost of ownership. IBM Global Financing is your smartest choice to fund critical IT investments and propel your business forward. For more information, visit: ibm.com/financing

Copyright IBM Corporation 2014

IBM Corporation
Systems and Technology Group
Route 100
Somers, NY 10589

Produced in the United States of America

IBM, the IBM logo, ibm.com, FlashSystem, GPFS, Linear Tape File System, MicroLatency, Platform, Tivoli, and Variable Stripe RAID are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at "Copyright and trademark information" at ibm.com/legal/copytrade.shtml

SoftLayer is a registered trademark of SoftLayer, Inc., an IBM Company.

This document is current as of the initial date of publication and may be changed by IBM at any time.

The performance data discussed herein is presented as derived under specific operating conditions. Actual results may vary. THE INFORMATION IN THIS DOCUMENT IS PROVIDED "AS IS" WITHOUT ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING WITHOUT ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OR CONDITION OF NON-INFRINGEMENT. IBM products are warranted according to the terms and conditions of the agreements under which they are provided.
Actual available storage capacity may be reported for both uncompressed and compressed data and will vary and may be less than stated.

1 Storage is based on IBM General Parallel File System (GPFS) technology.
2 IBM, "Penn State slashes backup time by 80 percent," December 2012. ibm.com/common/ssi/cgi-bin/ssialias?subtype=wh&infotype=SA&appname=STGE_TS_DS_USEN&htmlfid=TSC03192USEN&attachment=TSC03192USEN.PDF
3 SoftLayer Technologies was acquired by IBM in July of 2013.

TSW03269-USEN-00