Wrangler: A New Generation of Data-intensive Supercomputing. Christopher Jordan, Siva Kulasekaran, Niall Gaffney

Project Partners. Academic partners: TACC, primary system design, deployment, and operations; Indiana U., hosting/operating the replicated system and end-to-end network tuning; U. of Chicago, Globus Online integration and high-speed data transfer from user and XSEDE sites. Vendors: Dell and DSSD (a subsidiary of EMC).

SYSTEM DESIGN AND OVERVIEW

Some Observations. There are fundamental differences in data access patterns between data analytics and HPC: small reads of many files vs. large sequential writes to a few files. Many data researchers want to work with data, not with MPI, Lustre striping, vectorization, or code optimization. What's wrong with creating 4 million 1 KB files and working with them at random?

Goals for Wrangler. To address the data problem in multiple dimensions: data at large and small scales, reliable and secure; many data types, structured and unstructured; fast, but not just for large files and sequential access, with high transaction rates and random access too. To support a wide range of applications and interfaces: Hadoop, but not *just* Hadoop; traditional languages, but also R, GIS, databases, and other workflows that do not perform in the typical HPC style. To support the full data lifecycle: more than scratch, with metadata and collection management support.

TACC Ecosystem. Stampede: Top 10, petaflop-class traditional cluster HPC system. Stockyard and Corral: 25 petabytes of combined disk storage for all data needs. Ranch: 160 petabytes of tape archive storage. Maverick/Rustler/Rodeo: niche systems for visualization, Hadoop, VMs, etc.

Wrangler Hardware. Three primary subsystems: a 10 PB, replicated disk mass storage subsystem; an embedded access and analysis capability of 96 nodes (several thousand cores, 128 GB+ memory, Haswell CPUs); and a high speed global flash store of 500+ TB delivering 1 TB/s and 250M+ IOPS, connected by a non-blocking, 120-lane (56 Gb/s) IB interconnect with 1 TB/s throughput.

Wrangler At Large. The TACC site hosts a 10 PB (replicated) mass storage subsystem, the 120-lane (56 Gb/s) non-blocking IB interconnect with 1 TB/s throughput, the 96-node access and analysis system (128 GB+ memory, Haswell CPUs), and the 500+ TB high speed storage system (1 TB/s, 250M+ IOPS). The Indiana site hosts the replicated 10 PB mass storage subsystem and a 24-node access and analysis system (128 GB+ memory, Haswell CPUs). The sites are linked over 40 Gb/s Ethernet and a 100 Gbps public network, with Globus for data movement.

Storage. The disk storage system consists of more than 20 PB of raw disk for project-term storage, with ~75 GB/s sequential write performance. It is a Lustre-based file system with 34 OSS nodes and 272 storage targets, exposed to users both as a traditional filesystem and through an iRODS-based data management system.

Analysis Hardware. The high speed storage will be directly connected to 96 nodes for embedded processing. Each analytics node will have 24 Intel Haswell cores, 128 GB of RAM, 40 Gb Ethernet, and Mellanox FDR InfiniBand networking.

DSSD Storage. The flash storage provides the truly innovative capability of Wrangler. It is not SSD: a direct-attached PCIe interface allows access to the NAND flash, so it is not limited by 40 Gb/s Ethernet or 56 Gb/s IB networking. The flash is not tied to individual nodes, unlike PCIe or SAS storage inside a node. It provides more than half a petabyte of usable storage space once RAIDed, and could handle continuous writes for 5+ years without loss due to memory wear.

DSSD Storage (2). While the aggregate speed is great, it is the per-node speed, and the speed for small, non-sequential transactions, that make this special for big data applications. We expect up to 12 GB/s and 2 million+ IOPS to a single compute node: more than 100x a decent local hard drive and 5-6x a pretty good fileserver with a 48-drive RAID array. I/O performance that used to require scaling to thousands of nodes will now require just a handful. This is great for traditional databases, some Hadoop applications, and other transaction-intensive workloads. Per-node performance is key: you can get benefits from Wrangler with, for example, Postgres on one node.
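As a hedged illustration of that single-node database case (not from the presentation; the mount point, database name, user, and table are hypothetical), a Postgres tablespace could be placed on the flash-backed file system so that transaction-heavy tables benefit from the per-node IOPS:

```python
# Minimal sketch, assuming a flash file system mounted at /flash and a running
# single-node Postgres instance. Names and paths are illustrative only.
import psycopg2

conn = psycopg2.connect(dbname="mydb", user="tg_user", host="localhost")
conn.autocommit = True  # CREATE TABLESPACE cannot run inside a transaction block
cur = conn.cursor()

# Place a tablespace on the flash-backed mount (hypothetical path).
cur.execute("CREATE TABLESPACE flashspace LOCATION '/flash/tg_user/pgdata'")

# Transaction-intensive tables go on flash; bulk/archival tables can stay on disk.
cur.execute("""
    CREATE TABLE trades (
        trade_id   bigserial PRIMARY KEY,
        symbol     text        NOT NULL,
        price      numeric     NOT NULL,
        traded_at  timestamptz NOT NULL
    ) TABLESPACE flashspace
""")

cur.close()
conn.close()
```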

External Connectivity. Wrangler will connect externally through both the existing public network connections and the 100 Gbps connections at both TACC and Indiana. Fast network paths will be available to Stampede and other TACC systems for migration of large datasets. Globus Online will be configured on Wrangler on day one.

Wrangler File Systems. /flash: GPFS-based parallel file system utilizing multiple DSSD units. /data: Lustre-based 10 PB disk file system. /data/published: long-term storage/publication area. /work: TACC's 20 PB global file system. /corral-repl: TACC's 5 PB data management file system.

Why Use Wrangler. You are limited by storage system capabilities; have lots of small files to process or analyze; have lots of random I/O rather than sequential read/write; need to analyze large datasets produced on other systems quickly; need a more on-demand, interactive analysis environment; need to work with databases at high transaction rates; have a Hadoop or Spark workflow that needs a large, high-performance backing HDFS datastore; have a dataset that many users will compute with or analyze and need a system with data management capabilities; or have a job that is currently I/O bound.

Why Not Wrangler. If you need a very specific compute environment: cloud systems such as the upcoming Jetstream. If you need lots of compute capacity (>2K cores): HPC systems like Stampede or Comet. If you need a large shared-memory environment: Stampede's 1 TB large-memory nodes, or Bridges is coming.

WRANGLER PORTAL AND SERVICES

Motivations. Need to support: long-term reservations (months) for traditional and Hadoop job types; persistent database provisioning; iRODS provisioning and data management service controls (fixity, audit, etc.); and new workflows as they are developed.

Portals for Everything. The TACC team has contributed significantly to our own portal and the XSEDE user portal, both very successful with users. The newly developed Wrangler portal provides the interface and backend infrastructure to manage services, reservations, and workflows.

Wrangler Portal Architecture. A Django-based website with a MySQL backend; a RabbitMQ/AMQP-based messaging system; scheduler reservation and job scripts; system-level deployment scripts; and information services.
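As a rough sketch of how such a portal might hand work to backend deployment scripts over AMQP (an assumption for illustration, not the project's actual code; the queue name, broker host, and message fields are hypothetical), a Django view could publish a reservation request to RabbitMQ with pika:

```python
# Hypothetical sketch: the portal publishes a reservation request to RabbitMQ,
# where a backend worker picks it up and runs scheduler/deployment scripts.
import json
import pika

def submit_reservation(username: str, nodes: int, days: int, workflow: str) -> None:
    connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="wrangler.reservations", durable=True)

    message = {
        "user": username,
        "nodes": nodes,
        "duration_days": days,
        "workflow": workflow,   # e.g. "hadoop", "postgres", "irods"
    }
    channel.basic_publish(
        exchange="",
        routing_key="wrangler.reservations",
        body=json.dumps(message),
        properties=pika.BasicProperties(delivery_mode=2),  # persist the message
    )
    connection.close()

submit_reservation("tg_user", nodes=4, days=30, workflow="hadoop")
```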

Wrangler Services. AutoCuration and other data management services; data dock and other ingest services; Globus Online dataset services; persistent and temporary service management (Postgres/MariaDB/MongoDB); open web publishing and web-based data sharing; and other science gateways as appropriate.

Wrangler and Databases. Databases are a natural area of focus for Wrangler. Persistent databases: data collections and resources, stream processing. Transient databases: a database used as a temporary engine for processing, with SQLite and other node-local options.

Wrangler Workflows. The Hadoop framework needs special deployment steps at the time of reservation/execution, and temporary database deployments require a special setup process. Once you have the framework to do these things...

Wrangler Planned Workflows. Bioinformatics: BLAST, ubiquitous in genetic preprocessing; OrthoMCL for protein grouping; iPlant integration to support community workflows. Media curation: large-scale image analysis/conversion, and video and audio processing.

EARLY APPLICATION RESULTS

Some Application Modes. Persistent databases: data hosting, shared curation, web application backends. Temporary databases: SQL or NoSQL as an application engine. Traditional MPI applications with I/O-heavy requirements. Hadoop applications with IOPS-heavy workloads. Bioinformatics: Perl/Python/Java operating on large datasets in a mostly serial fashion.

[Chart: GPFS parallel read performance, scaling from 1 to 35 nodes, comparing 1 and 2 processes per node with 4, 8, and 64 NSDs.]

Persistent Database and Web. The Arctos collections website has a >100 GB database and a ~10 TB image collection. Site performance is directly attributable to database performance, which is storage performance. The database goes to flash, the images to disk. We also intend to use the multi-site nature of Wrangler to provide higher reliability.

[Chart: TPC-H and Star Schema Benchmark on Postgres, seconds per query, comparing SSBM300-Wrangler-D5, TPCH300-Wrangler-HDD, TPCH300-Wrangler-D5, TPCH100-Corral-HDD, and TPCH100-Wrangler-D5.]

OrthoMCL. A protein-grouping application in which all significant computational effort is expressed in SQL and performed in the database, and not necessarily optimized SQL. Execution time dropped from >10 hours to ~1 hour when moving from a similar disk-based system.

Image Collection Curation. Multiple projects are working with tens or hundreds of thousands of high-quality images (~1-5 GB each). These images may require conversion, resizing, or analysis en masse. Initial development was on Stampede; Wrangler provides a ~3x performance improvement.
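A minimal sketch of the kind of mostly serial, per-image conversion pass this describes, fanned out across the cores of one node (the library, paths, file types, and target size are assumptions, not the projects' actual pipelines):

```python
# Hypothetical sketch of an "en masse" image conversion/resizing job.
# Assumes Pillow is available; paths and parameters are illustrative only.
from multiprocessing import Pool
from pathlib import Path
from PIL import Image

SRC = Path("/data/myproject/raw_images")
DST = Path("/data/myproject/derivatives")

def convert_one(path: Path) -> str:
    img = Image.open(path)
    img.thumbnail((4096, 4096))          # resize in place, preserving aspect ratio
    out = DST / (path.stem + ".jpg")
    img.convert("RGB").save(out, quality=90)
    return out.name

if __name__ == "__main__":
    DST.mkdir(parents=True, exist_ok=True)
    files = sorted(SRC.glob("*.tif"))
    with Pool(24) as pool:               # one worker per Haswell core on a node
        for name in pool.imap_unordered(convert_one, files):
            print("wrote", name)
```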

Social Science/Economics Analysis. Databases are commonly used as really big spreadsheets for survey results and similar data, and data subsets are retrieved using R, Python, SAS, etc. for further analysis. Example: a researcher who has daily stock market activity data for the last 100+ years.

Stock Market Analysis Workflow. Create and load a transient database on the flash file system; create derived databases with data subsets; save the resulting database and restart from it in future job executions; save database state like checkpoints. A sketch of this pattern follows below.
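A minimal sketch of that transient-database pattern using Python's built-in sqlite3 module (the slide names SQLite only as one node-local option; the file paths, table, and query here are hypothetical):

```python
# Hypothetical sketch: load a transient SQLite database on the flash file
# system, derive a subset, and save the state like a checkpoint for later jobs.
import sqlite3

FLASH_DB = "/flash/tg_user/market.db"               # transient working copy on flash
CHECKPOINT = "/data/tg_user/market_checkpoint.db"   # durable copy on disk

conn = sqlite3.connect(FLASH_DB)
conn.executescript("""
    CREATE TABLE IF NOT EXISTS daily_prices (
        symbol TEXT, trade_date TEXT, close REAL
    );
""")
# ... bulk-load a century of daily prices here ...

# Derive a subset into its own table for downstream R/Python analysis.
conn.execute("""
    CREATE TABLE IF NOT EXISTS post_1990 AS
    SELECT * FROM daily_prices WHERE trade_date >= '1990-01-01'
""")
conn.commit()

# "Save database state like checkpoints": copy the live database to disk.
backup = sqlite3.connect(CHECKPOINT)
conn.backup(backup)
backup.close()
conn.close()
```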

FUTURE DEVELOPMENT ON WRANGLER

Wrangler Status. DSSD products are not quite in general availability yet; dual controllers should double performance. The system is currently in friendly-user mode, and interested users can request startup/early user access through the XSEDE user portal.

Ongoing Wrangler Tasks. Development efforts were always expected to continue beyond deployment: data management/curation task automation; more flexible data publication models; workflow capture and reproduction; and gateway hosting (bring your code + data).

Importance of Data Publication. Wrangler has an inherent notion of the data life cycle, including long-term storage and publication. It will provide mechanisms for acquiring DOIs and publishing one or more files, and will support varying levels of openness.

Acknowledgments. The Wrangler project is supported by the Division of Advanced Cyberinfrastructure at the National Science Foundation, Award #ACI-1447307, "Wrangler: A Transformational Data Intensive Resource for the Open Science Community."

Acknowledgments 2