Accelerating Lustre! with Cray DataWarp
Steve Woods, Solutions Architect


Accelerate Your Storage!
Agenda: the problem; a new storage hierarchy; DataWarp overview; end user perspectives; use cases; features; examples; configuration considerations; summary.

The Problem: Buying Disk for Bandwidth is Expensive
HPCwire, May 1, 2014; attributed to Gary Grider, LANL.

New Storage Hierarchy
Traditional: CPU and memory (DRAM) on node; storage (HDD) off node.
Today: CPU, near memory (HBM/HMC) and far memory (DRAM/NVDIMM) on node; near storage (SSD) and far storage (HDD) off node.
The hierarchy runs from highest effective cost and lowest latency at the top to lowest effective cost and highest latency at the bottom.

New Storage Hierarchy: the Cray approach
DataWarp: software defined storage, a high performance storage pool, sized for the bandwidth needed (near storage, SSD).
Sonexion: a scalable, resilient file system, sized for the capacity needed (far storage, HDD).
Problem solved: scale bandwidth separately from capacity, reduce overall solution cost, and improve application run time.

Blending Flash with Disk for High Performance Lustre
Blended solution: DataWarp satisfies the bandwidth needs and Sonexion satisfies the capacity needs, driving down the cost of bandwidth ($/GB/s).
Sonexion-only solution: lots of SSUs just for bandwidth, driving up the cost of bandwidth ($/GB/s).

DataWarp Overview
Hardware: Intel server, block-based SSD, Aries I/O blade = raw performance.
Software: virtualizes the underlying hardware, a single solution of flash and HDD, automation via policy, an intuitive interface = harnesses the performance.

Software Phases of DataWarp
Phase 0 (available 2014): statically configured compute node swap; single server file systems, /flash/.
Phase 1 (fall 2015) [CLE 5.2UP04 + patches]: dynamic allocation and configuration of DataWarp storage to jobs (WLM support); application controlled explicit movement of data between DataWarp and the parallel file system (stage_in and stage_out); DVS striping across DataWarp nodes. See the sketch below.
Phase 2 (late 2016) [CLE 6.0UP02]: DVS client caching; implicit movement of data between DataWarp and PFS storage (cache); no application changes required.
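As a sketch of the Phase 1 explicit-staging model, a batch script can stage a directory into the scratch instance before the job runs and back out afterwards. The paths, sizes and application name below are hypothetical, and exact staging syntax can vary by WLM and release:

#!/bin/bash
#PBS -l walltime=1:00:00 -joe -l nodes=8
#DW jobdw type=scratch access_mode=striped capacity=1TiB
#DW stage_in source=/lus/scratch/user/input destination=$DW_JOB_STRIPED/input type=directory
#DW stage_out source=$DW_JOB_STRIPED/output destination=/lus/scratch/user/output type=directory

cd $PBS_O_WORKDIR
# The application touches only the DataWarp copy; the WLM performs
# the parallel file system transfers before and after the job runs.
aprun -n 32 ./myapp $DW_JOB_STRIPED/input $DW_JOB_STRIPED/output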

DataWarp Hardware Package
Standard XC I/O blade with SSDs instead of PCIe cables = plugs right into the Aries network.
Capacity: 2 nodes per blade, 2 SSDs per node = 12.6 TB per blade (as shown).
Performance: the node processors are already optimized for I/O and the Cray Aries network.
[Blade diagram: two DataWarp (DW) nodes, each with two 3.2 TB SSDs, giving 12.6 TB per blade alongside Lustre storage.]

DataWarp Software
Service layer (DWS): defines the user experience.
Data Virtualization Service layer (DVS): virtualizes I/O.
File system layer (DWFS): virtualizes the pool of flash.
[Stack diagram: the WLM and user drive the DataWarp Service; application I/O passes through file presentation, the Data Virtualization Service, DWFS (built on open source file system components), the Logical Volume Manager and the devices, with the PFS alongside.]

DataWarp User Perspectives
Transparent (new user): no change to their experience, e.g. PFS cache.
Active (experienced user): WLM script commands; common for most use cases.
Optimized (power user): control via library/CLI, e.g. asynchronous workflows.

DataWarp User Perspectives: Workload Manager (WLM) Integration
The researcher or engineer inserts DataWarp commands into the job script: I need this much space in the DataWarp pool; I need the space in DataWarp to be shared; I need the results saved out to the parallel file system.
The job script requests resources via the WLM: DataWarp capacity, compute nodes, files and file locations.
The WLM automates clean up after the application completes.
WLM integration is the key: ease of use and dynamic provisioning.

DataWarp User Perspectives: Supported Workload Managers
SLURM, Moab/Torque and PBS-Pro.
[Stack diagram: the WLM and user sit above the DataWarp Service; application I/O passes through XFS, the Data Virtualization Service, DWFS, the Logical Volume Manager and the devices, with the PFS alongside.]

Use Cases for DataWarp
Shared storage (we'll focus here): reference files, file interchange, high performance scratch.
Local storage: private scratch space, swap space.
PFS cache: local cache for the PFS, transparent user model.
Burst buffer: checkpoint restart.

Use Cases for DataWarp: Shared Storage
Reference files: read intensive, commonly used by multiple compute nodes at once.
DataWarp offers user directed behavior and automated provisioning of resources.
[Diagram: Cray HPC compute nodes backed by DataWarp nodes.]

Use Cases for DataWarp: Shared Storage
File interchange: sharing intermediate work between compute nodes.
DataWarp offers user directed behavior and automated provisioning of resources.

Use Cases for DataWarp: Shared Storage
High performance scratch: files are striped across the DataWarp pool.
DataWarp offers user directed behavior and automated provisioning of resources.

Use Cases for DataWarp
Shared storage: reference files, file interchange, high performance scratch.
Local storage: private scratch space, swap space.
PFS cache: local cache for the PFS, transparent user model.
Burst buffer: checkpoint restart.

DataWarp Application Flexibility
[Diagram: four deployment models side by side, burst buffer, shared storage, local storage and PFS cache. In each, Cray HPC compute nodes burst data to DataWarp nodes; in the burst buffer and PFS cache models, the DataWarp nodes then trickle the data out to Sonexion Lustre.]

#DW jobdw ...
Requests a job DataWarp instance. Its lifetime is the same as the batch job, and it is usable only by that batch job.
capacity=<size>: indirect control over server count, based on granularity; it might help to request more space than you need.
type=scratch: selects use of the DWFS file system.
type=cache: selects use of the DWFS file system in caching mode.

#DW jobdw ... (continued)
access_mode=striped: all compute nodes see the same file system. Files are striped across all allocated DW server nodes and are visible to all compute nodes using the instance, aggregating both capacity and bandwidth per file.
access_mode=private: each compute node sees a different file system. Files go to a single DW server node; a compute node keeps using the same DW node, and its files are seen only by that compute node.
access_mode=striped,private: two mount points are created on each compute node, sharing the same space (see the sketch below).
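As an illustration of the combined mode, the fragment below requests striped,private and uses both mount points; the application name and its option flags are hypothetical:

#DW jobdw type=scratch access_mode=striped,private capacity=1TiB

# Every compute node gets two mount points into the same allocation:
# $DW_JOB_STRIPED is visible to all nodes, $DW_JOB_PRIVATE is node-local.
aprun -n 32 -N 4 ./myapp --shared-dir $DW_JOB_STRIPED/results \
                         --local-scratch $DW_JOB_PRIVATE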

Simple DataWarp job with Moab

#!/bin/bash
#PBS -l walltime=2:00 -joe -l nodes=8
#DW jobdw type=scratch access_mode=striped capacity=790GiB

. /opt/modules/default/init/bash
module load dws
dwstat most                        # show DW space available and allocated

cd $PBS_O_WORKDIR
aprun -n 1 df -h $DW_JOB_STRIPED   # only visible on compute nodes

IOR=/home/users/dpetesch/bin/IOR.X
aprun -n 32 -N 4 $IOR -F -t 1m -b 2g -o $DW_JOB_STRIPED/IOR_file

DataWarp scratch vs. cache

Scratch (Phase 1):

#!/bin/bash
#PBS -l walltime=4:00:00 -joe -l nodes=1
#DW jobdw type=scratch access_mode=striped capacity=200GiB
cd $PBS_O_WORKDIR
export TMPDIR=$DW_JOB_STRIPED
NAST="/msc/nast20131/bin/nast20131 scr=yes bat=no sdir=$TMPDIR"
ccmrun ${NAST} input.dat mem=16gb mode=i8 out=dw_out

Cache (Phase 2):

#!/bin/bash
#PBS -l walltime=4:00:00 -joe -l nodes=1
#DW jobdw type=cache access_mode=striped pfs=/lus/scratch/dw_cache capacity=200GiB
cd $PBS_O_WORKDIR
export TMPDIR=$DW_JOB_STRIPED_CACHE
NAST="/msc/nast20131/bin/nast20131 scr=yes bat=no sdir=$TMPDIR"
ccmrun ${NAST} input.dat mem=16gb mode=i8 out=dw_cache_out
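The practical difference between the two scripts: with type=scratch, the application (or explicit staging) must place data under the DataWarp mount point itself, while with type=cache the pfs=/lus/scratch/dw_cache file system is presented through DataWarp and data moves implicitly, which is what lets the Phase 2 model accelerate applications without any changes.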

DataWarp Bandwidth
The DataWarp bandwidth seen by an application depends on multiple factors:
- Transfer size of the I/O requests.
- Number of active streams (files) per DataWarp server; for file-per-process I/O, this equals the number of processes.
- Number of DataWarp server nodes, which is related to the capacity requested (see the worked note below).
- Other activity on the DW server nodes, both administrative and other user jobs: it is a shared resource.
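To make the server-count factor concrete with hypothetical numbers: DataWarp space is handed out in units of the pool's allocation granularity, so with a granularity of 200 GiB a request of capacity=790GiB rounds up to four allocation units and can land on up to four server nodes, while a request of 1600 GiB engages up to eight servers and roughly doubles the peak bandwidth available to striped files. This is why requesting more space than you strictly need can help.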

Minimize Compute Residence Time with DataWarp
[Timeline chart comparing node count against wall time. With Lustre only, the compute nodes sit idle through the initial data load, each round of timestep writes and the final data writes. With DataWarp, the DW nodes handle the preload before the job and the post dump after it, the timestep writes go to DataWarp, and the compute nodes' wall time shrinks accordingly.]

DataWarp with MSC NASTRAN
Cray blog reference: http://www.cray.com/blog/io-accelerator-boosts-msc-nastran-simulations/
Job wall clock time was reduced by 2x with DataWarp compared to Lustre only.

[Chart: elapsed seconds for the Abaqus 2016 standard s4e benchmark, 24M elements, 2 ranks per node, on 16-core 2.3 GHz Haswell nodes with 128 GB of memory. Series: XC40 ABI Lustre, CS400 Lustre, XC40 ABI DataWarp, CS400 /tmp. Scaling from cpus=128 (4 nodes) to cpus=1536 (48 nodes).]

DataWarp Considerations
Know your workload: capacity requirement, bandwidth requirement, iteration interval.
Calculate the ratio of DataWarp to spinning disk: what percentage of the calculated bandwidth is needed from DataWarp versus HDD; whether excess bandwidth is needed to sync to HDD; and what percentage of storage capacity DataWarp needs to maintain performance capacity across multiple iterations. A worked example follows.
Budget.
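A worked example with hypothetical numbers: suppose an application dumps 100 TB per iteration, must finish each dump in 5 minutes, and iterates every 2 hours. The burst bandwidth DataWarp must supply is 100 TB / 300 s ≈ 333 GB/s, but the sustained rate needed to drain each dump to disk before the next one arrives is only 100 TB / 7200 s ≈ 14 GB/s, so the spinning-disk tier can be sized for roughly 1/24 of the burst bandwidth. If, say, three iterations must stay resident in flash while draining, the DataWarp pool needs at least 300 TB of capacity.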

DataWarp Bottom Line
It is about reducing time to solution: returning control back to compute, and reducing the cost of time to solution.

DataWarp Summary
1. Faster time to insight
2. Easy to use
3. Accelerates performance
Dynamic. Flexible.

Questions?