HPC data becomes Big Data
Peter Braam, peter.braam@braamresearch.com

Me
- 1983-2000: academia, maths & computer science
- Entrepreneur with startups (5x); 4 startups sold; Lustre emerged
- Held executive jobs with the acquirers
- 2014: independent; advise, research
  - advise SKA SDP @ Cambridge
  - research on automatic parallelization with the Haskell community
  - help others

Contents
- Introduction: market & key questions
- Some Big Data problems & algorithms
- HPC storage
- Cloud storage
- Conclusions

Key questions & market trends

Two Questions
- Given an HPC storage system, how can it be used for Big Data analysis?
- What storage platforms are candidates to meet both HPC and Big Data requirements?

IDC market data

    Fact                                                2011     2013
    % of sites using co-processors                      28.2%    76.9%
    HPC sites performing big data analysis                       67%
    % of compute cycles dedicated to big data                    30%
    % of sites using cloud infrastructure for HPC       18.8%    23.5%
    Year-over-year growth in high-density servers ($)            25.5%
    Year-over-year growth in servers ($)                         -6.2%

Other facts
- Flash and much faster persistent-memory tiers are inevitably coming. Multiple software challenges arise from this:
  - management of tiers
  - much faster storage software, to keep up with the devices
- The gap between disk performance and the rest of the system continues to increase.
- There is embedded processing on servers with attached storage, and client-server processing with clients networked to servers. The pros & cons of each are somewhat unclear.

Big Data Problems & Algorithms

Big Data Problems: samples
- Input generally comes from simulation or sensors.
- Climate modeling: simulate, then
  - find the hottest day each year in Cape Town (see the sketch after this list)
  - find very-low-pressure spots (typhoons) on Earth
- Genomics, astronomy:
  - find patterns (e.g. strings, galaxies) in huge data sets
  - pre-process data at TB/sec rates
- Data management:
  - move all files with data on a particular server
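The "hottest day" task is the archetypal reduce-by-key problem. A minimal single-node sketch in Python; the map/reduce framing is mine, not from the slides, and the sample readings are hypothetical:

```python
def hottest_day_per_year(records):
    """records: iterable of (date 'YYYY-MM-DD', temperature in C).

    Map step: key each record by year. Reduce step: keep the maximum.
    On a real cluster each partition would reduce locally and the
    partial maxima would then be merged, so the reduction scales.
    """
    best = {}  # year -> (date, temp)
    for date, temp in records:
        year = date[:4]
        if year not in best or temp > best[year][1]:
            best[year] = (date, temp)
    return best

# Hypothetical sample readings; real input streams from sensor archives.
readings = [("2011-01-17", 34.2), ("2011-02-03", 36.8), ("2012-12-29", 35.1)]
print(hottest_day_per_year(readings))
# {'2011': ('2011-02-03', 36.8), '2012': ('2012-12-29', 35.1)}
```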

Big Data Problems: samples 2
- Social networks, advertising & intelligence
  - Most of these become graph problems, some very hard.
- Non-compliance in stock market transaction logs
- Replacing legacy consumer-information data warehousing with modern analytics
  - Replacements of Teradata / Netezza are sometimes difficult.
  - Modern platforms lack an easy-to-use analytics language.

Wide variations
- Some problems (e.g. some graph problems) must be executed in RAM.
  - Graph500 benchmark: 2000x speedup in 2.5 years
- Other problems require many iterations through disk-resident data.
- Netezza analytics systems use FPGAs for accelerated streaming (e.g. filtering, compressing).

Big Data Algorithms
Considerable variation:
- Machine learning
- Bayesian analysis
- Indexing, sorting: DB-like
- Graph algorithms
- Maximal Information Coefficients (generalize regressions)
- Compressed sensing (aka sparse recovery)
- Topological Data Analysis

Ogres
Analogously to the Berkeley Dwarfs, big data problems have been classified: see "Understanding Big Data Applications and Architectures", 1st JTC 1 SGBD Meeting, SDSC, San Diego, March 19, 2014; Geoffrey Fox, Judy Qiu, Shantenu Jha (Rutgers).

So...
Given these variations, a single architecture is not likely to address all big data problems well.

HPC Storage

HPC data
- Traditional model: a cluster file system with either
  - a Single Shared File (with #cores readers / writers), or
  - a File Per Process (and 1 process per core).
- Tightly coupled problems allow little scheduling of tasklets or redistribution of I/O.
- Problems:
  - Throughput == #server nodes x (speed of the slowest node); see the sketch after this list
  - Very sensitive to component variation
  - Monitoring tools fail to find root causes
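Why throughput tracks the slowest node: in a tightly coupled job every node waits at the next synchronization point, so one degraded component drags the aggregate down. A toy illustration of the slide's formula, with made-up node speeds:

```python
def aggregate_throughput(node_speeds_gb_s):
    """Slide's rule of thumb for tightly coupled I/O: all nodes advance
    at the pace of the slowest, so the aggregate is
    #nodes * min(speeds), not sum(speeds)."""
    return len(node_speeds_gb_s) * min(node_speeds_gb_s)

uniform  = [3.0] * 100          # 100 servers at 3 GB/s each
one_slow = [3.0] * 99 + [1.0]   # a single degraded server

print(aggregate_throughput(uniform))   # 300.0 GB/s
print(aggregate_throughput(one_slow))  # 100.0 GB/s: one node costs 2/3
```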

Results quite reasonable
- Systems like Lustre, GPFS, Panasas
  - use carefully configured and tested hardware
  - use fast networks
  - deliver 80% of the slowest hardware component
- Pipelines from clients to disk are uniformly wide.
- Servers can deliver ~3 GB/sec per controller.
- Achilles heels: metadata, availability, data management

A sample of hard cases
- First write, then read: why the gap?
- Opening & creating files is too slow; it should run >2x faster! First seen at ORNL in 2006.
- Metadata performance on Sequoia and on Cove (50 & 5 SSD drives):
  - low 1000s to ~15K ops/sec
  - maximum ever seen: ~50K ops/sec

HPC hard cases (ctd.)
- Larger numbers of concurrent metadata clients are not easy.
Conclusions:
1. Problems in systems like Lustre remain.
2. Sensitivity to uniformly good hardware.
3. Honest data from the users & understanding exists.
4. It has been used at very large scale.
Acknowledgement: graphs from a variety of presentations given at LADD 2013

Cloud data into an HPC file system
Intel's FastForward project:
- Ingest massive ACG graphs through Hadoop.
- Represent the ACG using an HDF5 adaptation layer (HAL) & in Lustre DAOS objects.
- Then compute.
Acknowledgement: figure from Intel's hpdd.intel.com wiki

Cloud Storage

Hybrid solutions may be best
- TACC Wrangler system: the Big Data companion to Stampede
  - DSSD storage is PCI-connected and has a KV interface.
  - A 120-node Dell cluster with DSSD storage: 275M IOPS
- Undoubtedly:
  - This will solve many big data problems well.
  - There will be problems that don't fit, or for which flash is too slow.

Typical Cloud Storage
- Combines:
  - memcached
  - key-value stores or DBs: relational, distributed key-value, embedded key-value (MySQL; Cassandra / HBase; RocksDB / LevelDB)
  - object stores (Swift, Ceph, ...)
- Results:
  - Read-heavy loads from one cluster: 100s of servers serving 10Ms of requests/sec
  - Only the embedded DBs keep up with flash and NVRAM. Flash means ~10us / read or write; RAM means <1us. (See the sketch after this list.)
  - Flexible schemas for metadata
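Why only embedded DBs keep up: a synchronous request path caps the per-thread rate at 1/latency, and a networked store adds software and round-trip overhead on top of the device. A back-of-envelope sketch; the 100 us network/software overhead is an assumed figure for illustration:

```python
def max_ops_per_thread(latency_us):
    """One synchronous thread can issue at most 1/latency ops per second,
    so latency, not bandwidth, caps the per-thread request rate."""
    return 1_000_000 / latency_us

print(max_ops_per_thread(1))    # RAM,   ~1 us:  ~1,000,000 ops/s/thread
print(max_ops_per_thread(10))   # flash, ~10 us: ~100,000 ops/s/thread
print(max_ops_per_thread(110))  # flash behind a network hop adding an
                                # assumed ~100 us: ~9,000 ops/s/thread,
                                # i.e. the software stack wastes the device
```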

Manageability
- The AWS elastic cloud is a masterpiece.
- Open-source solutions do something similar: Cassandra, Ceph, OpenStack.

Tiered storage
- When is tiered storage important?
  - For HPC, dumping RAM requires a flash cache.
- Likely of increasing importance: L1/2/3 - PCM - flash - disk - tape
- Tiered storage can use a container concept (see the sketch after this list):
  - cache misses fetch a container to faster memory
  - high bandwidth transfers a container relatively quickly
  - a one-time latency (e.g. 1 sec), then the speed of the faster tier
- Key point: neither cloud nor HPC has this now.
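The container idea is amortization: pay the fetch latency once, then run at the faster tier's speed. A minimal model under assumed numbers (a 1 s container fetch, 1 us per access in the fast tier):

```python
def effective_latency_us(accesses, fetch_s=1.0, tier_us=1.0):
    """Average latency per access when the first miss pulls the whole
    container into the faster tier and later accesses hit that tier."""
    total_us = fetch_s * 1_000_000 + accesses * tier_us
    return total_us / accesses

for n in (1_000, 1_000_000, 100_000_000):
    print(n, round(effective_latency_us(n), 2))
# 1,000 accesses -> ~1001 us each: the one-time fetch dominates
# 1e6 accesses   -> ~2 us each:    the fetch is amortized away
# 1e8 accesses   -> ~1.01 us each: essentially fast-tier speed
```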

Cloud object stores: Ceph
- An object is a file with an id, not with a name.
- Ceph manages:
  - removal and addition of storage
  - failed nodes and racks
- Quite clever load balancing and data placement: CRUSH data placement is perfect for management (see the sketch below).
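What makes CRUSH-style placement "perfect for management" is that locations are computed, not looked up: any client derives an object's placement from the object id and the cluster map alone. This is not CRUSH itself, just a rendezvous-hashing sketch of the same idea:

```python
import hashlib

def place(object_id, nodes, replicas=3):
    """Rank nodes by a hash of (object id, node); the top `replicas`
    hold the object. No central table is needed, and removing a node
    only re-homes the objects that ranked it highest."""
    def score(node):
        digest = hashlib.sha256(f"{object_id}:{node}".encode()).hexdigest()
        return int(digest, 16)
    return sorted(nodes, key=score, reverse=True)[:replicas]

nodes = [f"osd{i}" for i in range(8)]
print(place("object-42", nodes))        # e.g. ['osd5', 'osd1', 'osd7']
print(place("object-42", nodes[:-1]))   # drop one OSD: most placements stay put
```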

Cloud objects still have to demonstrate:
- HPC bandwidth == #nodes x BW/node; only limited testing at scale, no models
- how they integrate with tiered storage (not yet clear)
- dealing with mixed workloads

Data layout: placement
- How to place many stripes?
- Bottleneck in RAID arrays: rebuilding a drive goes at the bandwidth of 1 drive and takes days.
- Parity de-clustering & a distributed spare:
  - rebuild at the bandwidth of N drives (N = 60 / 600 / 6000?)
  - for e.g. 10+2 redundancy: speedup 60/10, 600/10, etc.
  - the benefit is large: 5x - 100x+ (see the sketch after this list)
- The algorithms & math are hard: block mappings.
- Somewhat unproven for HPC loads.
- Cloud objects have a form of parity declustering.
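The 5x-100x claim follows from spreading one rebuild across many drives. A rough model, using assumed drive numbers (4 TB at 0.15 GB/s) and ignoring the contention with foreground I/O that stretches classic rebuilds into days:

```python
def rebuild_hours(drive_tb, drive_gb_s, rebuild_width):
    """Streaming-bound rebuild time: classic RAID rewrites the spare at
    one drive's bandwidth; parity declustering spreads the work over
    rebuild_width drives."""
    seconds = drive_tb * 1000 / (drive_gb_s * rebuild_width)
    return seconds / 3600

print(rebuild_hours(4, 0.15, 1))    # ~7.4 hours at a single drive's speed
print(rebuild_hours(4, 0.15, 60))   # declustered over 60 drives: ~7 minutes
```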

Data layout: erasure codes
- How to rebuild a single stripe faster: generalizes RAID, Reed-Solomon codes, etc. (see the sketch after this list)
- Benefits: stripe-reconstruction I/O 1-2x
- Tons of attention and publications
- If the network is the slowest component, this is important: parity de-clustering is hard on the network.
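The simplest erasure code is single parity, as in RAID-5: XOR the data blocks to get parity, and XOR the survivors with the parity to rebuild a lost block. Reed-Solomon generalizes this so that any k of n blocks reconstruct the stripe. A minimal sketch:

```python
def xor_blocks(blocks):
    """XOR equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

data = [b"AAAA", b"BBBB", b"CCCC"]   # three data blocks in one stripe
parity = xor_blocks(data)            # the redundancy block

lost = data[1]                       # pretend one data block failed
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == lost               # survivors + parity recover the block
```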

Conclusions

Conclusions
- There are many Big Data algorithms.
- There are many cloud storage solutions.
- Big data on HPC: several vendors.
- New specialized solutions (DSSD).
- More attention is needed to modeling the problems & solutions.
- Inevitably, mileage will vary depending on the problem.

Thank you

(C) 2013 Braam Research, All Rights Reserved