Modeling Big Data/HPC Storage Using Massively Parallel Simulation

Modeling Big Data/HPC Storage Using Massively Parallel Simulation
Chris Carothers (CCNI), Misbah Mubarak (CS), Rensselaer Polytechnic Institute, chrisc@cs.rpi.edu
Rob Ross, Phil Carns, MCS/ANL, rross@mcs.anl.gov
Image: "Connections of all the sub-networks in the world" by Bill Cheswick, Lumeta Corp, 1998

Brief History of PDES and NM&S
1994: wireless PCS network model. 32x32 square grid → 1024 LPs. Mobile subscribers modeled as events sent to grid LPs; no physical layer modeled. Performance: ~25K ev/sec on 8 DECstation workstations.
2003: packet-level TCP over the AT&T backbone. ~1 million packet-level TCP flows over the AT&T AS. Leveraged Intel P4 HT, multi-core processors. Performance: 200K to 400K ev/sec on 4-way systems.
2007: slice-level BitTorrent model over the NBC network. Up to swarms of 256K BitTorrent clients with 2 to 16 seeders. Consumed upwards of 48 GB. Serial performance only: ~50K ev/sec.
2007 to present has seen a significant bump in PDES performance!

Blue Gene/P Layout
In ~2009: ALCF/ANL Intrepid
163K cores / 40 racks @ ~500 TFLOPS
~80 TB RAM
~8 PB of disk over GPFS
Custom OS kernel

NSF MRI Balanced Cyberinstrument @ CCNI
Blue Gene/Q Phase 1: 400+ teraflops @ 2+ GF/watt
#1 on Green 500 list (same architecture as the 10 PF and 20 PF DOE systems)
Exec model: 64K threads / 16K cores
32 TB RAM
32 I/O nodes (4x over typical BG/Qs)
RAM storage accelerator: 8 TB @ 60+ GB/sec, 32 servers @ 128 GB each
Disk storage: 32 servers @ 24 TB disk; bandwidth: 5 to 24 GB/sec
Viz systems: CCNI: 16 servers w/ dual GPUs; EMACS: display wall + servers
WHAT CAN WE DO WITH THIS COMPUTE POWER FOR STORAGE M&S?

ROSS is an optimistic Time Warp discrete-event simulation engine designed for massively parallel systems.
PHOLD benchmark on ROSS w/ 1M LPs @ 10 events each:
12.27 billion ev/sec for 10% remote events on 65,536 cores!!
4 billion ev/sec for 100% remote events on 65,536 cores!!
Observed similar scaling on Blue Gene/Q using 256K MPI ranks.
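
For context, PHOLD is a synthetic benchmark in which every LP simply reschedules events, some fraction of which are sent to remote LPs. The sketch below is a minimal sequential toy version of that pattern, illustrative only: the real ROSS runs it optimistically across many MPI ranks, and all constants here are assumptions.

```c
/* Minimal sequential PHOLD sketch (illustrative only; real ROSS runs
 * this pattern optimistically across many MPI ranks). */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define NLP        1024     /* number of logical processes (LPs) */
#define START_EVS  10       /* initial events per LP             */
#define REMOTE_PCT 0.10     /* probability an event goes remote  */
#define END_TIME   100.0    /* simulated end time                */

typedef struct { double ts; int dest_lp; } event_t;

static event_t heap[NLP * START_EVS * 4];   /* min-heap on timestamp */
static int heap_n = 0;

static void heap_push(event_t e) {          /* sift-up */
    int i = heap_n++;
    heap[i] = e;
    while (i && heap[(i-1)/2].ts > heap[i].ts) {
        event_t t = heap[i]; heap[i] = heap[(i-1)/2]; heap[(i-1)/2] = t;
        i = (i-1)/2;
    }
}

static event_t heap_pop(void) {             /* sift-down */
    event_t top = heap[0];
    heap[0] = heap[--heap_n];
    for (int i = 0; ; ) {
        int c = 2*i + 1;
        if (c >= heap_n) break;
        if (c+1 < heap_n && heap[c+1].ts < heap[c].ts) c++;
        if (heap[i].ts <= heap[c].ts) break;
        event_t t = heap[i]; heap[i] = heap[c]; heap[c] = t;
        i = c;
    }
    return top;
}

static double expo(double mean) {           /* exponential delay */
    return -mean * log(1.0 - drand48());
}

int main(void) {
    long processed = 0;
    srand48(42);
    for (int lp = 0; lp < NLP; lp++)        /* seed START_EVS events per LP */
        for (int k = 0; k < START_EVS; k++)
            heap_push((event_t){ expo(1.0), lp });

    while (heap_n > 0) {
        event_t e = heap_pop();
        if (e.ts > END_TIME) break;
        processed++;
        /* PHOLD handler: reschedule one event, remote with prob REMOTE_PCT */
        int dest = (drand48() < REMOTE_PCT) ? (int)(drand48() * NLP) : e.dest_lp;
        heap_push((event_t){ e.ts + expo(1.0), dest });
    }
    printf("processed %ld events\n", processed);
    return 0;
}
```

In the parallel optimistic setting, the "remote" fraction is what forces cross-rank event sends and potential rollbacks, which is why the 10% and 100% remote rates above differ so sharply.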

CODES Project: Co-Design of Exascale Storage
Top I/O job: plasma physics. 67 TB per job, 10+ hrs to execute, with an over 2 hr idle period. Overall, bursty I/O!
How do we design an exascale storage system? (e.g., today that's a system of ~1,000,000 3 TB hard disks)

Modeling Complexity @ Every Level: e.g., File Open
Each box represents an event in the model: application-level request, CIOD level, file system level, storage level.
PVFS clients talk to PVFS servers iteratively to find the entry, then randomly select a file server and create the object.
If we just model the file system complexity, what happens?
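
To illustrate what "each box represents an event" means in practice, a model might tag each hop of the open path with an event type along these lines (hypothetical names, not the actual CODES types):

```c
#include <stdio.h>

/* Illustrative only: the layered file-open path as a chain of typed
 * events. Hypothetical names, not the actual CODES model types. */
typedef enum {
    EV_APP_OPEN_REQ,      /* application-level open() request           */
    EV_CIOD_FORWARD,      /* I/O-forwarding (CIOD) layer hand-off       */
    EV_FS_LOOKUP,         /* PVFS client iteratively resolves the entry */
    EV_FS_CREATE_OBJ,     /* create object on a randomly chosen server  */
    EV_STORAGE_MD_WRITE,  /* storage level persists the metadata        */
    EV_DONE
} open_stage_t;

static const char *stage_name[] = {
    "app open request", "CIOD forward", "FS lookup",
    "FS create object", "storage metadata write", "done"
};

int main(void) {
    /* Each stage below would be one "box" (one simulation event). */
    for (open_stage_t s = EV_APP_OPEN_REQ; s != EV_DONE; s++)
        printf("event %d: %s\n", (int)s, stage_name[s]);
    return 0;
}
```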

Model vs. Data: Shared Unaligned Read Test
Used the POSIX interface, with 4 MB (4 * 10^6 byte) accesses for a total of 64 MB per process.
Requests span multiple file stripe units, requiring that each request be serviced by two storage nodes rather than one.
The simulated read performance curve is similar to the simulated write performance curve because the striping algorithm is the same.
Max error is 30-40%. Possible reasons: lack of queuing at the file server and at the Myricom network layer.
Underscores the need for co-design with real experimental performance data!!
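
To make the "spans multiple stripe units" point concrete, the sketch below shows generic round-robin striping arithmetic for deciding which servers a request touches; the stripe unit size, server count, and example request are assumptions, not the parameters of the original experiment.

```c
/* Which servers does a request [offset, offset+size) touch under
 * round-robin striping? Generic sketch; parameters are assumptions. */
#include <stdio.h>

#define STRIPE_UNIT (4u * 1000 * 1000)   /* 4 * 10^6 bytes, as in the test */
#define NUM_SERVERS 8

static void servers_for_request(unsigned long long offset,
                                unsigned long long size) {
    unsigned long long first_su = offset / STRIPE_UNIT;
    unsigned long long last_su  = (offset + size - 1) / STRIPE_UNIT;
    for (unsigned long long su = first_su; su <= last_su; su++)
        printf("stripe unit %llu -> server %llu\n", su, su % NUM_SERVERS);
}

int main(void) {
    /* An unaligned 4 MB request starting mid-stripe spans two units,
     * hence two storage nodes must service it. */
    servers_for_request(2u * 1000 * 1000, 4u * 1000 * 1000);
    return 0;
}
```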

The Dragonfly Network Topology
A two-level directly connected topology that uses high-radix routers: a large number of ports per router, each with moderate bandwidth.
p: number of compute nodes connected to a router
a: number of routers in a group
h: number of global channels per router
Router radix: k = a + p + h - 1
a = 2p = 2h (recommended configuration)
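
Under the balanced configuration above, the full system scale follows directly from p, a, and h. A small sketch using the usual dragonfly sizing relations (a group reaches a*h other groups, so the maximum system has a*h + 1 groups); the example values are ours, not a specific machine:

```c
/* Dragonfly sizing sketch: derive system scale from (p, a, h).
 * Example values only; relations follow the slide's definitions. */
#include <stdio.h>

int main(void) {
    int p = 4;                      /* compute nodes per router            */
    int a = 2 * p;                  /* routers per group (balanced: a=2p)  */
    int h = p;                      /* global channels per router (a=2h)   */

    int  k       = a + p + h - 1;   /* router radix                        */
    long groups  = (long)a * h + 1; /* each group reaches a*h other groups */
    long routers = groups * a;
    long nodes   = routers * p;

    printf("radix k=%d, groups=%ld, routers=%ld, nodes=%ld\n",
           k, groups, routers, nodes);
    return 0;
}
```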

ROSS Dragonfly Performance Results on BG/P vs. BG/Q (for a 50-million-node model)
The event efficiency stays high on both BG/P and BG/Q, as each MPI task has a substantial workload.
The computation performed at each MPI task dominates the number of rolled-back events.
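
As background, the "event efficiency" reported by Time Warp simulators is usually some variant of committed events over total processed events. A minimal sketch of that calculation, using one common definition and hypothetical counters, not necessarily ROSS's exact accounting:

```c
#include <stdio.h>

/* Time Warp event efficiency from end-of-run counters: the fraction of
 * processed events that were committed (i.e., not undone by rollback).
 * One common definition; a sketch, not ROSS's exact accounting code. */
static double event_efficiency(long long committed, long long rolled_back) {
    return 100.0 * (double)committed / (double)(committed + rolled_back);
}

int main(void) {
    /* Hypothetical counters: heavy per-event computation keeps rollbacks low. */
    printf("efficiency = %.2f%%\n", event_efficiency(1000000000LL, 2500000LL));
    return 0;
}
```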

Billion-Node Torus Network Model Using ROSS
ROSS is a massively parallel discrete-event simulator. It has scaled to 131,072 cores and yields very good strong scaling / extreme execution-time compression.
For accurate storage simulations, the network is clearly important! So, can we model an exascale-like network at the packet level?
A 32^6 (~1 billion) node torus topology consumes > 2 TB. A small torus model was validated against the Blue Gene/L torus network.

number of processors                 4,096    8,192    16,384   32,768   65,536   131,072
efficiency                           99.83%   99.90%   99.83%   99.55%   98.89%   97.51%
event rate (M/sec)                   639      1,192    2,260    4,002    7,307    12,359
remote event percentage              11.71%   12.39%   13.77%   16.53%   16.88%   17.22%
secondary rollback rate (x 10^-5)    1.06     0.254    0.0429   0.51     3.87     21.7
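
As a rough sense check on the > 2 TB figure, this back-of-the-envelope sketch (our own arithmetic, with the 2 TB total assumed) computes the node count of the 32^6 torus and the implied per-node state budget:

```c
/* Back-of-the-envelope sizing for a 32^6 torus model (assumed totals). */
#include <stdio.h>

int main(void) {
    long long nodes = 1;
    for (int dim = 0; dim < 6; dim++)   /* k-ary n-cube: k=32, n=6 */
        nodes *= 32;

    double total_bytes = 2e12;          /* > 2 TB reported for the model */
    printf("nodes = %lld (~%.2f billion)\n", nodes, nodes / 1e9);
    printf("state budget ~ %.0f bytes/node at 2 TB total\n",
           total_bytes / (double)nodes);
    return 0;
}
```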

Summary & Forward Challenges
1. TAKE-AWAY: Big Data/HPC storage systems can be effectively modeled using massively parallel simulation tools and techniques. ROSS is open source and available at our wiki site: odin.cs.rpi.edu
2. Need models and co-design around hybrid HPC/cloud storage systems.
3. Need power, failure, and recovery models.
4. Need to exploit simulation's out-of-band capabilities.
5. Extend parallel simulation engines for parallel I/O and data collection.
6. Massively parallel dynamic load balancing of models, especially under irregular Big Data/HPC workloads.
7. Make simulation engines and models easier to use and configure.
8. Validation/verification techniques at scale when you don't have real experimental data.