Big data management with IBM General Parallel File System




Optimize storage management and boost your return on investment

Highlights

- Handles the explosive growth of structured and unstructured data
- Offers proven commercial-grade reliability for disruption-free maintenance and capacity upgrades
- Automates information lifecycle management with policy-driven file placement for all tiers of storage, from SSD to tape
- Enables a worldwide global namespace with active file management for asynchronous access and control of local and remote files

The explosive growth of big data, transactions and digitally aware devices is straining IT infrastructure and operations. At the same time, storage budgets are shrinking and user expectations continue to multiply. Managing file storage becomes more complicated as the user community grows and is scattered across the globe.

In the past, organizations addressed data management challenges by adding file servers or using network-attached storage. Traditional network-attached storage solutions can be restricted in performance, security and scalability. A single file server doesn't scale, and even a roomful of file servers means complex storage management. Neither environment is well suited to the continuous data access that a data-intensive computing environment requires. This is especially true when files are physically located in different buildings, towns or countries. To overcome these issues, you need a new, more effective approach to managing data.

The IBM General Parallel File System (IBM GPFS) can help you move beyond simply adding storage to optimizing data management. GPFS is a high-performance, shared-disk file management solution that can provide faster, more reliable access to a common set of file-based data. GPFS is also designed to provide:

- Online storage management and efficient storage utilization
- Integrated information lifecycle tools capable of managing petabytes of data and billions of files
- Centralized administration
- Shared access to file systems from remote GPFS clusters
- High-performance remote file caching that is integrated with the GPFS cluster file system

Handling the explosive growth of big data more efficiently

GPFS provides the essential services that allow you to effectively manage growing quantities of unstructured data. GPFS leverages its scalable architecture to provide fast access to your file data. File data is automatically spread across multiple storage devices, making better use of your available storage and improving performance.

GPFS differs significantly from a Network File System (NFS). With GPFS, there is no single-server bottleneck or high protocol overhead for data transfer. GPFS takes file management beyond a single system by providing scalable access from multiple systems to a global namespace. GPFS interacts with applications like a local file system, but is designed to deliver higher performance, scalability and fault tolerance by allowing access to the data from multiple systems directly and in parallel.

GPFS allows multiple applications or users to share access to a single file simultaneously while maintaining file-data integrity. For example, multiple animators or editors can work on different parts of a single video file from multiple workstations at the same time. This capability helps you reduce the storage and management overhead that would otherwise be needed to maintain several copies of the source file.

GPFS lock management is highly optimized and fine-grained to help ensure data consistency and integrity during concurrent access. Expanding beyond a storage area network (SAN), a single GPFS file system can be accessed by nodes using a TCP/IP or InfiniBand connection. Using this block-based network data access, GPFS can outperform network-based sharing technologies like NFS and even local file systems such as the ext3 journaling file system on Linux or the Journaled File System (JFS) on AIX.
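
The idea of spreading file data across multiple storage devices, described above, can be pictured with a small sketch. This is only an illustration under simple assumptions: it uses plain round-robin placement, and the function name `stripe_blocks` and the disk names are hypothetical. It is not the allocation logic GPFS actually uses.

```python
from collections import defaultdict


def stripe_blocks(file_size, block_size, disks):
    """Assign each block of a file to a disk in round-robin order.

    Hypothetical helper for illustration only; it does not reflect
    the real GPFS block allocator.
    """
    num_blocks = -(-file_size // block_size)  # ceiling division
    layout = defaultdict(list)
    for block in range(num_blocks):
        # Consecutive blocks land on consecutive disks, so a large
        # sequential read can be served by all disks in parallel.
        layout[disks[block % len(disks)]].append(block)
    return dict(layout)


# A 10 MiB file in 1 MiB blocks spread over four (made-up) disks.
layout = stripe_blocks(
    file_size=10 * 1024 * 1024,
    block_size=1024 * 1024,
    disks=["disk0", "disk1", "disk2", "disk3"],
)
for disk, blocks in layout.items():
    print(disk, blocks)
```

Because neighboring blocks sit on different disks, no single device has to serve the whole file, which is the intuition behind the performance benefit claimed for striping.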

Providing greater data access to optimize performance for demanding applications

GPFS is designed for high-performance parallel workloads. Data and metadata flow from all the nodes to all the disks in parallel under control of a distributed lock manager. Its flexible cluster architecture enables the design of a data storage solution that not only meets current needs but can also be adapted quickly to new requirements or technologies. GPFS configurations include direct-attached storage, network block input and output (I/O) or a combination of the two. The data can reside within a single data center or be spread across multisite operations with synchronous data replication or asynchronous global namespace integration.

Applications requiring the highest per-node throughput often deploy the direct-attached model, where disks are physically attached to all nodes in the cluster that access the file system. The direct-attached model requires an operating system device interface to the storage, for example, using a Fibre Channel SAN or InfiniBand connection.

To optimize the cost of your storage infrastructure, deploy a large number of systems or seamlessly integrate multiple platforms and storage interconnect technologies, you can use the network block I/O model, also called the network shared disk (NSD) model. Network block I/O is a software layer that transparently forwards block I/O requests from a GPFS client application node to an NSD server node, which performs the disk I/O operation and then passes the data back to the client. Using a network block I/O configuration can be more cost effective than a full-access SAN. You can use this type of configuration to tie together systems across a wide area network (WAN). You can also develop a data workflow within one location or across the WAN to help save time and the cost of managing file data.
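
The request forwarding in the network block I/O model described above can be sketched as follows. This is a toy model under stated assumptions: the class names `NsdServer` and `NsdClient` are hypothetical, and a direct method call stands in for the TCP/IP or InfiniBand transport. It does not represent any real GPFS interface.

```python
class NsdServer:
    """Hypothetical node that has a direct path to the disk."""

    def __init__(self, blocks=None):
        # A dict mapping block number to bytes stands in for the disk.
        self._blocks = dict(blocks or {})

    def read_block(self, block_no):
        return self._blocks[block_no]

    def write_block(self, block_no, data):
        self._blocks[block_no] = data


class NsdClient:
    """Hypothetical client node with no local path to the disk."""

    def __init__(self, server):
        self._server = server

    def read_block(self, block_no):
        # In a real cluster this request would cross the network;
        # here the forwarding is modeled as a plain method call.
        return self._server.read_block(block_no)

    def write_block(self, block_no, data):
        self._server.write_block(block_no, data)


server = NsdServer({0: b"hello"})
client = NsdClient(server)
client.write_block(1, b"world")
print(client.read_block(0), client.read_block(1))
```

The application on the client issues ordinary block reads and writes and never needs to know that the disk is attached to a different node, which is what "transparently forwards" means in the text.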

To exploit disk parallelism, GPFS intelligently prefetches data, issuing I/O requests in parallel to as many disks as necessary to achieve the peak bandwidth of the underlying storage hardware infrastructure. GPFS recognizes multiple I/O patterns, including sequential, reverse sequential and various other access patterns. In addition, for high-bandwidth environments like digital media, GPFS can read or write large blocks of data in a single operation.

Configuring file systems for high availability and reliability

For optimal reliability, GPFS can be configured to eliminate single points of failure. The file system can be configured to remain available automatically in the event of a disk, server or even an entire site failure.

GPFS is designed to transparently fail over token (lock) operations and other GPFS cluster services, which can be distributed throughout the entire cluster to eliminate the need for dedicated metadata servers. GPFS can be configured to automatically recover from node, storage and other infrastructure failures.

GPFS provides this functionality by supporting:

- Data replication to increase availability in the event of a storage media failure
- Multiple paths to the data in the event of a communications or server failure
- File system activity logging, enabling consistent, fast recovery after system failures

In addition, GPFS supports snapshots to provide a space-efficient image of a file system at a specified time, which allows online backup and can help protect against user error.

GPFS offers time-tested reliability and has been installed on thousands of nodes across industries ranging from weather research, broadcasting and retail to financial services, data analytics and web service providers. GPFS is the basis of many cloud storage offerings.

Simplifying data management with advanced tools

GPFS provides an information lifecycle management toolset that helps simplify data management by giving you more control over data placement. The toolset includes storage pooling and a high-performance, scalable, rule-based policy engine.

Storage pools enable you to transparently manage multiple tiers of storage based on performance or reliability. You can use storage pools to transparently provide the appropriate type of storage to multiple applications or different portions of a single application within the same directory. For example, GPFS can be configured to use low-latency disks for index operations and high-capacity disks for data operations of a relational database. In addition, storage pools can simplify storage management by allowing you to transparently re-task older storage systems, moving them from a Tier 1 role to Tier 2.
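
Rule-based placement into storage pools, as described above, can be illustrated with a short sketch. Everything here is an assumption made for illustration: the `FileInfo` and `Rule` types, the pool names and the `.idx` naming convention are hypothetical, and the sketch is written in Python rather than in the policy rule language that GPFS itself provides.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FileInfo:
    """Hypothetical file attributes a placement rule might inspect."""
    name: str
    size_bytes: int
    days_since_access: int


@dataclass
class Rule:
    """Hypothetical rule: send matching files to the named pool."""
    pool: str
    matches: Callable[[FileInfo], bool]


def choose_pool(file: FileInfo, rules: List[Rule],
                default_pool: str = "capacity") -> str:
    """Return the pool of the first matching rule, else the default."""
    for rule in rules:
        if rule.matches(file):
            return rule.pool
    return default_pool


# Index files go to low-latency storage; files untouched for more
# than a year go to tape; everything else stays on capacity disks.
rules = [
    Rule("ssd", lambda f: f.name.endswith(".idx")),
    Rule("tape", lambda f: f.days_since_access > 365),
]

for f in [
    FileInfo("orders.idx", 4096, 2),
    FileInfo("orders.dat", 10_000_000, 2),
    FileInfo("archive.dat", 10_000_000, 400),
]:
    print(f.name, "->", choose_pool(f, rules))
```

Rules are evaluated in order and the first match wins, so the more specific rules belong at the top of the list.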

The high-performance policy engine leverages the parallel metadata infrastructure of GPFS to provide highly scalable policy processing, which allows you to actively manage billions of files across multiple storage tiers.

Data sharing anywhere, anytime

GPFS includes active file management, a scalable, high-performance remote file data caching solution that is integrated within a GPFS file system. If massive amounts of data are gathered at separate locations and the results are analyzed by people at other locations, you need a solution that transparently and automatically moves file data to where it is needed. Using active file management, you can implement a global namespace that gives every site the same view of all the data. This is especially useful for collaborative projects, applications and workflows that are managed globally but need access to the same files. Applications can see the same set of files and have local read/write performance to those files, no matter the location.

Active file management leverages the inherent scalability of GPFS, which results in a high-performance, location-independent solution that masks failures and hides wide-area latencies and outages.

Easing file system administration

GPFS provides a common management interface that simplifies administration even in very large environments. The administration model is easy to use and consistent with standard file system administration. Cluster operations can be managed from any node in the GPFS cluster. The administration commands support cluster management and other standard file system administration functions, including user quotas, snapshots and storage management.

You can run multiple versions of GPFS in the same cluster, which means that you can do rolling upgrades of operating systems and GPFS versions. Rolling upgrades provide a way to upgrade cluster nodes without downtime. Heterogeneous servers and storage systems can be added to and removed from a GPFS cluster while the file system remains online, further simplifying management and helping enable 24x7 operations. When storage is added or removed, the data can be dynamically rebalanced to maintain optimal performance.

Helping reduce the cost of managing your storage environment

GPFS can reduce data duplication and make more efficient use of storage components by combining isolated islands of information into a centralized, high-performance storage infrastructure. It can also help improve server hardware utilization by allowing dynamic storage access to all data from any node. By improving storage use, optimizing process workflow and simplifying storage administration, GPFS lets you take a multipronged approach to reducing storage costs.

Summary

Information growth is exploding. Petabyte-scale systems are becoming the norm. GPFS already supports petabytes of data today and can scale to meet the dynamic needs of your business, providing a solid application infrastructure with NFS file serving capabilities.

GPFS provides world-class reliability, scalability and availability for your file data and is designed to optimize the use of storage, support scale-out applications and provide a highly available platform for data-intensive applications. A dynamic infrastructure that includes GPFS can help your company streamline data workflows, improve service, reduce costs and manage risk, delivering real business results today while positioning you for future growth.

For more information

To learn more about IBM GPFS, contact your IBM representative or IBM Business Partner, or visit: ibm.com/systems/gpfs

Additionally, IBM Global Financing can help you acquire the IT solutions that your business needs in the most cost-effective and strategic way possible. We'll partner with credit-qualified clients to customize an IT financing solution to suit your business goals, enable effective cash management, and improve your total cost of ownership. IBM Global Financing is your smartest choice to fund critical IT investments and propel your business forward. For more information, visit: ibm.com/financing

© Copyright IBM Corporation 2012

IBM Corporation
IBM Systems and Technology Group
Route 100
Somers, NY 10589

Produced in the United States of America
April 2012

IBM, the IBM logo, ibm.com, and GPFS are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at "Copyright and trademark information" at ibm.com/legal/copytrade.shtml

This document is current as of the initial date of publication and may be changed by IBM at any time. Not all offerings are available in every country in which IBM operates.

THE INFORMATION IN THIS DOCUMENT IS PROVIDED "AS IS" WITHOUT ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING WITHOUT ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OR CONDITION OF NON-INFRINGEMENT. IBM products are warranted according to the terms and conditions of the agreements under which they are provided.

Actual available storage capacity may be reported for both uncompressed and compressed data and will vary and may be less than stated.

Please Recycle

POS03096-USEN-00