An Alternative Storage Solution for MapReduce
Eric Lomascolo, Director, Solutions Marketing
MapReduce Breaks the Problem Down
Data analysis distributes processing work (Map) across compute nodes and accumulates the results (Reduce)
Hadoop is a popular open-source MapReduce software stack
Processes unstructured and semi-structured data
HDFS uses location information to replicate data between nodes (3 copies by default)
*Hadoop Demystified, Rare Mile Technologies
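To make the Map/Reduce split concrete, here is a minimal word-count sketch in the style of Hadoop Streaming, which lets plain executables act as the map and reduce steps; the script name, flag, and usage are illustrative assumptions, not part of the original deck.

```python
#!/usr/bin/env python3
# wordcount.py -- minimal Hadoop Streaming-style word count sketch.
# The same file serves as mapper ("--mode map") and reducer; the
# framework sorts mapper output by key before the reducer sees it.
import sys

def mapper():
    # Map: emit (word, 1) for every word read from stdin.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    # Reduce: accumulate counts per word (input arrives key-sorted).
    current, count = None, 0
    for line in sys.stdin:
        word, n = line.rstrip("\n").split("\t")
        if word != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = word, 0
        count += int(n)
    if current is not None:
        print(f"{current}\t{count}")

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["--mode", "map"] else reducer()
```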
About the Hadoop File System (HDFS)
WORM (write-once, read-many) access model
Uses commodity hardware with the expectation that failures will occur
Reads data in large, contiguous blocks and processes very large files
Hardware agnostic
Assumes that moving computation is cheaper than moving data
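The large, contiguous read pattern can be sketched in plain Python; the 128 MB chunk size mirrors a common HDFS block size, and the file path is hypothetical.

```python
# Sketch of HDFS-style sequential access: stream a very large file in
# big contiguous chunks rather than many small random reads.
# 128 MB matches a common HDFS block size; the path is hypothetical.
BLOCK_SIZE = 128 * 1024 * 1024

def stream_blocks(path, block_size=BLOCK_SIZE):
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)   # one large, contiguous read
            if not block:
                break
            yield block

for i, block in enumerate(stream_blocks("/data/weblogs.txt")):
    print(f"block {i}: {len(block)} bytes")
```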
HDFS Performance Is Limited
The HDFS premise: moving computation is cheaper than moving data
But the data always has to be moved, either from local disk or over the network
This includes replication operations for availability and the movement of results data
And with a good network, the network wins
Hadoop performance is gated by file system performance
Hadoop File System (HDFS) Challenges
Performance
No caching for random loads
Slow file modifications due to WORM semantics and synchronous replication
HTTP used for data transfer, which cannot use DMA
Scalability
Large block sizes limit the number of files (see the sketch below)
Resources go underused when data is not local to the CPU
HDFS RAID can eliminate the need for replication, but at a CPU cost
Storage
Not POSIX compliant; no general-purpose access
Data must be transferred into and out of the Hadoop environment
Replication multiplies storage costs
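One way to see the file-count limit: HDFS keeps every file and block as an in-memory object on a single NameNode, commonly estimated at roughly 150 bytes each. A back-of-envelope sketch, with both the per-object cost and the heap size as assumed rule-of-thumb figures:

```python
# Back-of-envelope: how many files can a single HDFS NameNode track?
# Rule of thumb: ~150 bytes of NameNode heap per namespace object
# (file or block); both numbers below are illustrative assumptions.
BYTES_PER_OBJECT = 150
namenode_heap = 64 * 1024**3              # assume a 64 GB heap

# Each file costs at least two objects: the file entry plus one block.
max_files = namenode_heap // (2 * BYTES_PER_OBJECT)
print(f"~{max_files / 1e6:.0f} million files per 64 GB NameNode heap")
# Large blocks keep blocks-per-file low for big files, but the
# per-file object cost still caps the total file count.
```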
Lustre: High-Performance File System Alternative
[Architecture diagram: 1-100,000 Lustre clients, plus CIFS and NFS clients via a gateway, connect through routers supporting multiple network types (Gemini, Myrinet, IB, GigE) to 1-1,000s of Object Storage Servers (OSS), each with Object Storage Targets (OSTs), and to Metadata Servers (MDS) with a Metadata Target (MDT), backed by disk arrays and SAN fabric]
Comparing HDFS to Lustre: Cluster Setup Scenario
100 clients, 100 disks, InfiniBand
Disks: 1 TB high-capacity SAS drives (Seagate Barracuda), 80 MB/s bandwidth with cache off
Network: 4x SDR InfiniBand, 1 GB/s
HDFS: 1 drive per client
Lustre: 10 OSSs with 10 OSTs each
HDFS Setup
[Diagram: each client has a local 80 MB/s disk; clients connect through an IB switch at 1 GB/s]
Lustre Setup
[Diagram: diskless clients connect through an IB switch at 1 GB/s to OSSs serving 80 MB/s OSTs]
Comparing HDFS to Lustre: Theoretical, Part I
100 clients, 100 disks, SDR InfiniBand
HDFS: 1 drive per client; local client bandwidth is 80 MB/s
Lustre: each OSS has 10 OSTs, so OSS bandwidth is 800 MB/s aggregate (80 MB/s * 10), assuming enough bus bandwidth to access all drives simultaneously
Network bandwidth is 1 GB/s (IB is point to point)
With 10 OSSs we have the same capacity and bandwidth
The network is not the limiting factor! (See the arithmetic sketch below.)
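The slide's arithmetic, restated as a sketch using only the numbers given above:

```python
# Back-of-envelope from the scenario above: 100 clients, 100 disks,
# 80 MB/s per disk, 1 GB/s SDR InfiniBand links (point to point).
DISK_MBPS, LINK_MBPS = 80, 1000
N_DISKS = 100

# HDFS: one local drive per client.
hdfs_per_client = DISK_MBPS                  # 80 MB/s
# Lustre: 10 OSSs x 10 OSTs; each OSS aggregates its drives.
oss_aggregate = 10 * DISK_MBPS               # 800 MB/s per OSS
# An OSS's 800 MB/s fits within one 1 GB/s IB link, so the network
# is not the limiting factor.
print(oss_aggregate <= LINK_MBPS)            # True
# Cluster-wide raw disk bandwidth is identical in both setups:
print(N_DISKS * DISK_MBPS, "MB/s either way")
```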
Comparing HDFS to Lustre: Theoretical, Part II - Striping
In terms of raw bandwidth, the network does not limit the data access rate
By striping each Hadoop data block, we can focus bandwidth on delivering a single block
HDFS limit for any one node: 80 MB/s
Lustre limit for any one node: 800 MB/s, assuming striping across 10 OSTs
And Lustre can deliver that to 10 nodes simultaneously
A typical MapReduce workload is not simultaneous access (after the initial job kickoff)
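A sketch of what striping buys a single reader, using the per-disk rate and stripe count from above; the 128 MB block size is an illustrative assumption, and on a real Lustre file system the stripe count would be set with a tool such as lfs setstripe:

```python
# Time for ONE node to pull one data block, per the slide's limits.
# Block size is illustrative; the 80 MB/s disk rate and 10-way stripe
# come from the scenario above.
BLOCK_MB = 128
DISK_MBPS = 80
STRIPE_COUNT = 10            # block striped across 10 OSTs

hdfs_secs = BLOCK_MB / DISK_MBPS                       # one local disk
lustre_secs = BLOCK_MB / (DISK_MBPS * STRIPE_COUNT)    # 10 disks in parallel
print(f"HDFS:   {hdfs_secs:.2f} s per block")          # 1.60 s
print(f"Lustre: {lustre_secs:.2f} s per block")        # 0.16 s
```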
MapReduce I/O Benchmark
[Benchmark configuration: 8 nodes, QDR IB, 8 drives at 80 MB/s each; HDFS: 8 nodes with 1 disk each; Lustre: 2-4 OST disks]
MR Sort Benchmark
Hadoop data movement is limited to local disk and HTTP protocols
Lustre Advantages for Hadoop
Performance
Caching file system with complete cache coherence
High-performance file modifications; replication not required
Uses high-speed DMA for data transfers
Scalability
Support for billions of files (2.5 billion)
All compute clients have access to the data
Can leverage standard data and system availability techniques
Storage
POSIX compliant
No data transfer required for pre- and post-processing
Reduces the need to manage multiple copies between analytic systems
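Because Lustre is POSIX compliant, pre- and post-processing can happen in place with ordinary file operations; a minimal sketch, with a hypothetical mount point:

```python
# With a POSIX file system, standard system calls work directly on the
# same data the MapReduce jobs read -- no import/export step. The
# Lustre mount point below is a hypothetical example.
import os

LUSTRE_MOUNT = "/mnt/lustre/analytics"

# Ordinary directory listing and file I/O, no HDFS-specific client:
for name in os.listdir(LUSTRE_MOUNT):
    path = os.path.join(LUSTRE_MOUNT, name)
    print(name, os.path.getsize(path), "bytes")

# In-place post-processing: append results to an existing file,
# something HDFS's WORM model does not allow.
with open(os.path.join(LUSTRE_MOUNT, "results.log"), "a") as f:
    f.write("post-processing pass complete\n")
```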
ClusterStor 6000: A Big Data Scale-Out Solution
Delivering the ultimate in HPC data storage with:
Optimized time to productivity: efficiency, application availability, results
Unmatched file system performance, delivered! The industry's fastest just got two times faster
Highest reliability, availability, and serviceability: enterprise-level resiliency
ClusterStor Solutions
An integrated and scalable HPC data storage solution designed to be easy to deploy, use, and manage
Delivering efficiency, application availability, and massive results
Lustre Community and Xyratex: Roles in the Lustre Community
OpenSFS & EOFS board member: direct funding of Lustre tree & roadmap development
Active contributor to Lustre source & roadmap: world-class Lustre development team on staff
Integration of Lustre into ClusterStor: industry-leading HPC storage solutions
Lustre support services: ClusterStor, Lustre & 3rd-party hardware
ClusterStor 6000: Optimized Time to Productivity
Uses Xyratex's exclusive parallel scale-out file system processing and I/O architecture
Leverages the latest Xyratex application platform technologies and Lustre integration
Optimized HW/SW: fully integrated, factory tested, shipped ready to go
Results in increased file system throughput and capacity efficiency per rack unit
ClusterStor Delivers Scale-Out Lustre
[Architecture diagram, as on the earlier Lustre slide: 1-100,000 Lustre clients (plus CIFS/NFS clients via a gateway) connect over routers supporting multiple network types (Gemini, Myrinet, IB, GigE) to 1-1,000s of Object Storage Servers (OSS) with OSTs and to Metadata Servers with a Metadata Target (MDT); here the OSS/OST building block is the ClusterStor Scalable Storage Unit (SSU) and the MDS pair is the ClusterStor HA-MDS]
ClusterStor 6000 Scale-Out Building Blocks
Unmatched file system performance, delivered! The industry's fastest just got two times faster
Each ClusterStor 6000 Scalable Storage Unit (SSU) produces 6 GB/s of file system performance
Linear scalability supports installations up to 1 TB/s of file system throughput and tens of PBs of storage capacity
ClusterStor Scalable Storage Unit (SSU)
[Diagram: SSU internals]
*Xyratex ClusterStor White Paper
ClusterStor 6000
ClusterStor 6000 SSU produces 6.0 GB/s IOR, doubling SSU performance
ClusterStor Embedded Server Module: two modules per SSU for high availability
Increased performance: 42 GB/s per rack (see the rack math sketched below)
Latest processor technology, 2x memory, FDR InfiniBand
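A back-of-envelope check on the rack figure, using only the numbers on this slide; the implied rack composition is an inference, not something the deck states:

```python
# Back-of-envelope from this slide's numbers: 6 GB/s per SSU and
# 42 GB/s per rack imply 7 SSUs per rack; scaling toward the 1 TB/s
# figure quoted earlier is straight division (storage racks only,
# ignoring metadata and networking -- an assumption on my part).
SSU_GBPS, RACK_GBPS = 6.0, 42.0
ssus_per_rack = RACK_GBPS / SSU_GBPS      # 7 SSUs per rack
racks_for_1tbps = 1000 / RACK_GBPS        # ~24 storage racks
print(ssus_per_rack, round(racks_for_1tbps))
```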
ClusterStor Family Performance and Capacity
More performance and storage capacity in less space
[Chart: user-level sustained IOR Lustre file system performance (GB/s, up to 360) vs. number of SSUs (up to 150) and user-level storage capacity (5.76-28.80 PB); the ClusterStor 6000 curve doubles the per-SSU performance of the ClusterStor 3000]
ClusterStor 6000: Highest Reliability, Availability, and Serviceability
Fully resilient software-hardware integration with low-level diagnostics, embedded monitoring, an enterprise-level data protection architecture, and proactive alerts
Easy to manage, with real-time monitoring
ClusterStor: Powering the Fastest Storage System in the World (Q3 2012)
>1 TB/second aggregate bandwidth
Xyratex CS-6000 system
Number of racks: 36
Square footage: 644 ft²
Hard drives: 17,280
Power: ~0.443 MW
Heat dissipation: 1,165,600 BTUs
Far less cost, space, cooling, and power than the competition!
Links
Xyratex: http://www.xyratex.com/
NCSA: http://www.ncsa.illinois.edu/
Hadoop Demystified: http://blog.raremile.com/2012/06/hadoop-demystified/
Wikibon on Big Data: http://wikibon.org/wiki/v/big_data and http://wikibon.org/blog/taming-big-data/
Thank You