Use Cases for Large Memory Appliance/Burst Buffer



Use Cases for Large Memory Appliance/Burst Buffer
Rob Neely, Bert Still, Ian Karlin, Adam Bertsch
LLNL-PRES-648613
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Security, LLC under contract DE-AC52-07NA27344.

Assumptions in these use cases

- Large or huge amounts of memory are directly available to applications.
- Access (latency and bandwidth) to this memory is slower than to regular main memory.
- Memory may or may not be persistent (few of these use cases rely on it).
- There may or may not be dedicated compute resources near or in the memory.
- Memory is possibly shared by multiple compute nodes.
- Access is either via a block-device interface or a byte-addressed interface (use-case dependent).

Each use case that follows lists a priority. This is our best attempt at estimating how likely we would be to pursue the case, and is the opinion of LLNL only.

Use case #1: Defensive I/O (checkpointing)
Priority: 10/10

- Use for fast checkpointing using standard techniques.
  - Use the LLNL SCR (Scalable Checkpoint-Restart) package or a system API.
  - The system API must maintain a low barrier to entry for apps; SCR uses memory-mapped files.
  - Drain every Nth checkpoint to the filesystem asynchronously.
- Use for regular defensive I/O.
  - If compute were attached, a user-defined management thread could run asynchronously.
  - Allows rapid rollback/recovery to be implemented in code.
- Concept and software (SCR) exist today; low barrier to entry for apps.
- Non-volatile memory allows for a potentially fast restart if the job is re-launched.
- Taking advantage of non-volatile properties would require re-launching on the same set of nodes (or clever shuffling / a unified view of the data).
- POSIX filesystem security is required to protect NV data.
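The checkpoint-to-buffer-then-drain pattern above can be sketched as follows. This is a minimal illustration, not SCR itself: a memory-mapped local file stands in for the burst buffer, a directory stands in for the parallel filesystem, and `BurstBufferCheckpoint` is a hypothetical name.

```python
import mmap
import os
import shutil
import threading

class BurstBufferCheckpoint:
    """Sketch of SCR-style defensive I/O: checkpoints land in a
    memory-mapped file (standing in for the burst buffer) and every
    Nth one is drained to a filesystem directory asynchronously."""

    def __init__(self, bb_path, pfs_dir, size, drain_every=4):
        self.bb_path, self.pfs_dir = bb_path, pfs_dir
        self.drain_every, self.count = drain_every, 0
        with open(bb_path, "wb") as f:
            f.truncate(size)                    # pre-size the buffer file
        self.f = open(bb_path, "r+b")
        self.mm = mmap.mmap(self.f.fileno(), size)

    def checkpoint(self, state: bytes):
        """Write one checkpoint; returns the drain thread if one started."""
        self.mm.seek(0)
        self.mm.write(state)
        self.mm.flush()                         # msync: durable in the buffer
        self.count += 1
        if self.count % self.drain_every == 0:  # drain every Nth asynchronously
            t = threading.Thread(target=self._drain, args=(self.count,))
            t.start()
            return t
        return None

    def _drain(self, n):
        shutil.copy(self.bb_path, os.path.join(self.pfs_dir, f"ckpt_{n}.bin"))
```

The application blocks only on the (fast) memory-mapped write; the copy to slower permanent storage overlaps with continued computation.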

Use case #2: Visualization I/O
Priority: 9/10

- Use for fast output using standard techniques.
  - Drain plot files to the filesystem asynchronously.
  - If compute were attached, a user-defined management thread (e.g., to generate movie frames) could run asynchronously.
- Allows for asynchronous use of permanent storage.
- Should co-exist with the defensive I/O case.
- Asynchronous use of permanent storage improves the ratio of time spent on compute.
- Asynchronous computation (if available) could save a lot of time and space in generating plots.
- Managing asynchronous computation (if available) could add complexity to the workflow.

Use case #3: Accelerated reads
Priority: 8/10

- Use for fast application startup using standard read techniques.
  - Stage data in advance of application requirements/launch.
- Use for large or commonly read data, such as:
  - Shared libraries
  - Input decks
  - Material databases
- Probably a low barrier to entry for apps.
- Some effort is in place (SPINDLE) at LLNL to automate some of this for dynamic libraries.
- Not clear how storage management would be handled (staging, persistence, ...).
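A minimal sketch of the staging idea, under the assumption that a local directory stands in for the fast tier; `stage_in` and `open_staged` are hypothetical helper names, not part of SPINDLE or any existing tool.

```python
import os
import shutil

def stage_in(paths, fast_dir):
    """Copy large, commonly read inputs (input decks, material databases,
    shared libraries) into the fast tier ahead of launch, returning a map
    from each original path to its staged location."""
    os.makedirs(fast_dir, exist_ok=True)
    staged = {}
    for p in paths:
        dest = os.path.join(fast_dir, os.path.basename(p))
        shutil.copy(p, dest)
        staged[p] = dest
    return staged

def open_staged(path, staged):
    """Open the staged copy if one exists, else fall back to the original."""
    return open(staged.get(path, path), "rb")
```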

Use case #4: In-situ visualization
Priority: 6/10

- Do on-the-fly staged visualization and/or analysis.
  - Keep a copy of application state in memory for in-situ processing.
  - Feature extraction / topological analysis.
  - Preprocessing or data reduction before writing to long-term storage.
  - Create movie frames asynchronously while the calculation continues.
  - Pause the simulation and examine it.
- If compute is available on storage, process asynchronously.
- Existing research (e.g., VisIt) could take advantage.
- Greatly reduces the amount of state written to disk.
- Asynchrony provides performance gains, but adds complexity.
- Research is required before production use.
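As a toy example of the data-reduction step, the sketch below summarizes a field with a few statistics and a coarse histogram instead of writing the full state to long-term storage. The function name and binning scheme are illustrative choices, not taken from VisIt or any production tool.

```python
def in_situ_reduce(field, nbins=8):
    """Data reduction before long-term storage: keep only summary
    statistics and a coarse histogram of the field, rather than
    the full in-memory state."""
    lo, hi = min(field), max(field)
    width = (hi - lo) / nbins or 1.0   # guard against a constant field
    hist = [0] * nbins
    for v in field:
        # clamp the top edge into the last bin
        hist[min(int((v - lo) / width), nbins - 1)] += 1
    return {"min": lo, "max": hi, "hist": hist}
```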

Use case #5: Data lookup server
Priority: 4/10

- Storage of large shared databases/tables.
  - Easily allows sharing of large data tables (e.g., EOS, cross-sections, opacities) across multiple nodes.
  - With attached compute, calculations (e.g., interpolations) can be done locally to minimize data motion.
  - Large storage allows for additional pre-calculation of lookup coefficients, derivatives, etc.
- Faster lookup times.
- Clear breakdown of client/server roles; a well-understood programming model.
- Tables could be pre-initialized and kept in memory across runs, avoiding re-initialization costs.
- Bursty access will align across nodes in an SPMD model: potential for hot spots and/or not enough compute.
- There may not be enough room to overlap request/use to make asynchronous access pay off.
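The "interpolate locally, ship only the answer" idea can be sketched as below, assuming a one-dimensional table for simplicity; `LookupServer` is a hypothetical name, and a real server would hold multi-dimensional EOS tables and answer remote requests.

```python
import bisect

class LookupServer:
    """Holds a large shared table (e.g. a 1-D EOS curve) in appliance
    memory; with attached compute it interpolates locally, so only the
    scalar answer, not table rows, moves over the network."""

    def __init__(self, xs, ys):
        self.xs, self.ys = list(xs), list(ys)   # xs must be sorted ascending

    def query(self, x):
        """Linear interpolation, clamped at the table edges."""
        i = bisect.bisect_left(self.xs, x)
        if i == 0:
            return self.ys[0]
        if i == len(self.xs):
            return self.ys[-1]
        x0, x1 = self.xs[i - 1], self.xs[i]
        t = (x - x0) / (x1 - x0)
        return self.ys[i - 1] * (1 - t) + self.ys[i] * t
```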

Use case #6: Large out-of-core storage
Priority: 3/10

- Use as an extension of main memory.
  - Allows for greater physics fidelity; for example, greatly increase the number of particles in a Monte Carlo simulation, staging them into near memory in a pipelined fashion.
  - Near memory becomes another level of cache, which could be managed automatically through the OS via small pages (e.g., the DI-MMAP project at LLNL).
- Reduce dynamic memory management.
  - Just keep temporary arrays around; malloc/free are slow and usually done in serial (non-threaded) sections of code.
- Memory-capacity-intensive applications are not required to strong-scale as much.
- The cache approach would be largely invisible to the user.
- Programmer-managed placement and movement between memories is as yet undefined.
- Many problems will want additional compute resources to go with all that additional memory.
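The "OS as cache manager" idea can be sketched with a plain memory map: the array lives in a file on the near-memory tier, and the OS pages pieces in and out transparently. This is only an illustration of the mmap pattern, not DI-MMAP itself, and `OutOfCoreArray` is a hypothetical name.

```python
import mmap
import struct

class OutOfCoreArray:
    """Out-of-core array of doubles: the data lives in a file (standing
    in for the near-memory tier) and is memory-mapped, so the OS treats
    it as another level of cache, invisibly to the application."""
    ITEM = struct.Struct("<d")   # one little-endian double per element

    def __init__(self, path, n):
        with open(path, "wb") as f:
            f.truncate(n * self.ITEM.size)   # sparse, zero-filled backing file
        self.f = open(path, "r+b")
        self.mm = mmap.mmap(self.f.fileno(), n * self.ITEM.size)

    def __setitem__(self, i, v):
        self.ITEM.pack_into(self.mm, i * self.ITEM.size, v)

    def __getitem__(self, i):
        return self.ITEM.unpack_from(self.mm, i * self.ITEM.size)[0]
```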

Use case #7: Speculative execution
Priority: 2/10

- Keep micro-checkpoints in memory to support transaction-like rollback.
  - Most multi-physics codes have physics-based triggers that will cause the code to self-abort, e.g.:
    - Over-advection of materials
    - Failure to converge in a solver
  - Allowing a code to roll back to an earlier part of the timestep would allow for a re-try, e.g., loosen constraints on the problem, reduce the timestep, etc.
- More robust code behavior.
- Much smaller and faster than a full checkpoint/restart.
- Memory capacity may not be what's preventing us from doing this today; it's simply hard to implement.
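The micro-checkpoint/rollback/retry loop can be sketched as below. This is a toy model of the control flow only (a real code would snapshot to the large-memory tier rather than deep-copy); `advance_with_rollback` and its parameters are hypothetical names.

```python
import copy

def advance_with_rollback(state, step, trigger, relax, max_tries=3):
    """Speculative execution: snapshot the in-memory state (a
    micro-checkpoint), attempt a timestep, and if a physics-based
    trigger fires (e.g. failure to converge), roll back and retry
    with loosened constraints such as a smaller timestep."""
    for _ in range(max_tries):
        snapshot = copy.deepcopy(state)   # micro-checkpoint in memory
        try:
            return step(state)
        except trigger:
            state.clear()
            state.update(snapshot)        # rollback to the micro-checkpoint
            relax(state)                  # e.g. reduce the timestep
    raise RuntimeError("step failed after retries")
```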

Use case #8: Master task / worker queue model
Priority: 1/10

- Keep state in NVRAM; fire off worker tasks on nodes.
  - A task queue managed on the NVRAM node assumes some level of compute.
- Potentially a good model for older/legacy codes that did lots of work on task 0.
- Good for software designed around a priority task queue.
- Tasks must be long-running enough to absorb the latency cost of spawning.
- Must still manage the distributed state of the task queue at scale.
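The master/worker pattern itself can be sketched as below, with threads standing in for worker nodes and an in-process queue standing in for the NVRAM-resident task queue; `run_task_farm` is a hypothetical name.

```python
import queue
import threading

def run_task_farm(tasks, worker_fn, nworkers=4):
    """Master/worker sketch: the task queue lives with the master (the
    NVRAM node in this use case); workers, modeled here as threads,
    pull tasks until the queue drains and push results back."""
    todo, done = queue.Queue(), queue.Queue()
    for t in tasks:
        todo.put(t)

    def worker():
        while True:
            try:
                task = todo.get_nowait()
            except queue.Empty:
                return                  # queue drained: worker exits
            done.put(worker_fn(task))

    threads = [threading.Thread(target=worker) for _ in range(nworkers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    results = []
    while not done.empty():
        results.append(done.get())
    return results
```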

Open Questions

- How does one access data in NVRAM? Block or byte addressable? Memory map? Other?
- What is the security model, to ensure persistent data in NVRAM can be accessed by the user in subsequent runs, but not by other users?
  - Perhaps solvable with block storage via file system permissions; a much harder problem for byte-addressable storage models.
  - This is a big-deal issue for machines running in a classified environment.
- What is the potential impact of accessing data in a memory space largely dedicated to another user?
- What's a notional ratio of memory between near and far?
- If we use NVRAM simply as additional capacity without a commensurate boost in compute, aren't we greatly exacerbating the data motion / memory bandwidth bottleneck?
- For the UQ example, what's the advantage over having a global view of the data set on disk?