EXPLOITING SHARED MEMORY TO IMPROVE PARALLEL I/O PERFORMANCE


1 EXPLOITING SHARED MEMORY TO IMPROVE PARALLEL I/O PERFORMANCE
Andrew B. Hastings, Sun Microsystems, Inc.
Alok Choudhary, Northwestern University
September 19, 2006
This material is based on work supported by DARPA under Contract No. NBCH

2 Outline
Motivation
Previous work
New shared memory solutions
Performance evaluation
Conclusion and future work

3 Why Shared Memory?
Because it's there!
  > For Phase II of DARPA's High Productivity Computer Systems program, Sun proposed a petascale shared memory system
Opportunity to improve performance without altering applications
  > Shared memory typically has lower latency and lower overhead (especially for small payloads) than messages
  > Change just the library to use shared memory
Interesting research area
  > Most previous work on parallel I/O focuses on clusters

4 A Common Parallel I/O Problem
Application accesses may be noncontiguous in memory and in the file
  > If not optimized, can result in tens of thousands of small POSIX I/O operations
For MPI-IO, two MPI derived datatypes specify the file and memory access patterns
[Figure: Process 1 (P1) memory and Process 2 (P2) memory mapped onto the file; 8 I/O requests, each arrow representing one request]

5 Previous Solutions 1
Data sieving I/O
  > Each process locks and reads a contiguous block, fills in the altered data, then writes the block back and unlocks
[Figure: data sieving, showing process memory data, the buffer, and the resulting I/O requests to the file]
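
The read-modify-write cycle described for data sieving can be sketched in C with POSIX pread/pwrite. This is a minimal illustration under stated assumptions, not ROMIO's implementation: `struct piece` and `sieve_write` are hypothetical names, the pieces are assumed sorted by offset and non-overlapping, the covered block is assumed to exist in the file already, and file locking is left out.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical descriptor: `len` bytes destined for file offset `off`. */
struct piece { off_t off; size_t len; };

/*
 * Data-sieving write (sketch): read the contiguous file block that covers
 * all pieces, patch the altered bytes in memory, then write the block back.
 * Locking is omitted; a real implementation would lock [start, end)
 * around the whole read-modify-write. Returns 0 on success, -1 on error.
 */
int sieve_write(int fd, const struct piece *p, size_t n, const char *data)
{
    assert(n > 0);
    off_t start = p[0].off;
    off_t end = p[n - 1].off + (off_t)p[n - 1].len;
    size_t span = (size_t)(end - start);

    char *buf = malloc(span);
    if (buf == NULL)
        return -1;

    /* One large read instead of n small ones. */
    if (pread(fd, buf, span, start) != (ssize_t)span) {
        free(buf);
        return -1;
    }

    /* Fill in the altered data; bytes in the gaps keep their old values. */
    for (size_t i = 0; i < n; i++) {
        memcpy(buf + (p[i].off - start), data, p[i].len);
        data += p[i].len;
    }

    /* One large write back. */
    ssize_t written = pwrite(fd, buf, span, start);
    free(buf);
    return written == (ssize_t)span ? 0 : -1;
}
```

The cost of this approach is visible in the sketch: the whole span is read and rewritten even when the pieces cover only a small part of it.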

6 Previous Solutions 2
List I/O
  > Each process creates a list of memory regions and a list of file regions, then calls a new filesystem interface
Datatype I/O
  > Each process creates a small data structure describing repeating regions in memory and in the file, then calls a new filesystem interface
[Figure: P1 and P2 each pass (memory list, file list) along with their memory data; 2 I/O requests reach the file]

7 Previous Solutions 3
Two-phase collective
  > Each process sends a round of data to each aggregator
  > Aggregator(s) receive and merge into a buffer, then make large write call(s) to the filesystem
  > Repeat until done
[Figure: P1 and P2 (aggregator) send memory data; the aggregator receives, merges into its buffer, and writes to the file]

8 Using Shared Memory: mmap
Each process maps the file into its address space and copies data to the appropriate location in the mapped file
  > Similar to List I/O but mostly implemented in the library
[Figure: P1 and P2 use loads/stores to move memory data into the mapped file]
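
The mmap approach can be sketched with the POSIX mmap, msync, and munmap calls. This is only an illustration: `struct piece` and `mmap_write` are hypothetical names, the file is assumed to be open for reading and writing and already at least `file_size` bytes long, and synchronization between processes writing to the same mapping is not shown.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical descriptor: `len` bytes destined for file offset `off`. */
struct piece { off_t off; size_t len; };

/*
 * Write noncontiguous pieces through a shared file mapping (sketch).
 * Each piece becomes a plain memcpy (loads/stores) into the mapped file
 * instead of a separate write call. Returns 0 on success, -1 on error.
 */
int mmap_write(int fd, size_t file_size, const struct piece *p, size_t n,
               const char *data)
{
    char *map = mmap(NULL, file_size, PROT_READ | PROT_WRITE, MAP_SHARED,
                     fd, 0);
    if (map == MAP_FAILED)
        return -1;

    for (size_t i = 0; i < n; i++) {
        /* Precondition: every piece lies inside the mapped range. */
        assert((size_t)p[i].off + p[i].len <= file_size);
        memcpy(map + p[i].off, data, p[i].len);
        data += p[i].len;
    }

    /* Flush the modified pages to the file before unmapping. */
    int rc = msync(map, file_size, MS_SYNC);
    if (munmap(map, file_size) != 0)
        rc = -1;
    return rc;
}
```

As with the list-based interfaces, the caller still supplies a list of regions, but here the copying happens entirely in user space.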

9 Using Shared Memory: Collectives
Collective Shared Data: each aggregator copies data between its working buffer and shared application memory
Collective Shared Buffer: each process copies data between its application memory and the shared working buffer(s) of the aggregator(s)
[Figure: P1 uses loads/stores to move memory data into the buffer of P2 (aggregator), which writes to the file]

10 Datatype Iterators
Problem: copy driven by (offset, length) list
  > Huge list thrashes processor cache
  > List generation is expensive and delays I/O
Solution: datatype iterator tracks position in MPI datatype, returns next (offset, length) on demand
  > State fits in a handful of cache lines
  > Tiny startup cost; higher traversal cost can overlap I/O
[Figure: an (offset, length) list labeled 0 ... 983,039, contrasted with a datatype iterator and its datatype stack over the MPI datatype]
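
The idea of producing (offset, length) pairs on demand can be sketched for a single strided layout. Everything named here (`vec_type`, `vec_iter`, `region`, `vec_iter_init`, `vec_iter_next`) is hypothetical and is not the authors' interface; a real datatype iterator would also need a stack to walk nested MPI datatypes, which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical strided layout: `count` blocks of `blocklen` bytes whose
 * starts are `stride` bytes apart. */
struct vec_type { size_t count, blocklen, stride; };

/* Iterator state: the layout plus the index of the next block.
 * The state stays this small no matter how many blocks the layout has. */
struct vec_iter { struct vec_type t; size_t next_block; };

/* One contiguous region, as an (offset, length) pair. */
struct region { size_t off, len; };

void vec_iter_init(struct vec_iter *it, struct vec_type t)
{
    assert(t.stride >= t.blocklen);   /* blocks must not overlap */
    it->t = t;
    it->next_block = 0;
}

/* Store the next region in *r and return 1, or return 0 when exhausted. */
int vec_iter_next(struct vec_iter *it, struct region *r)
{
    if (it->next_block >= it->t.count)
        return 0;
    r->off = it->next_block * it->t.stride;
    r->len = it->t.blocklen;
    it->next_block++;
    return 1;
}
```

A list-based version would allocate and fill `count` entries before any copying could start; here the first region is available immediately and each later one costs a few arithmetic operations.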

11 Overlapping I/O
Strategy: split working buffer into sub-buffers
  > After a sub-buffer is filled, initiate asynchronous I/O
  > Before filling the next sub-buffer, wait for the previous asynchronous I/O on it to complete
  > Overlaps I/O and data rearrangement!
Performance gain for collective shared buffer on FLASH I/O benchmark:
  > 60% with lists
  > 90% with datatype iterators
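
A rough way to see why sub-buffering helps is an idealized two-stage pipeline model. The symbols are assumptions introduced here, not taken from the slides: n is the number of sub-buffer fills needed, t_f is the time to fill one sub-buffer, and t_w is the time to write one. Both times are treated as constant, and at least two sub-buffers are assumed.

```latex
T_{\text{serial}} = n\,(t_f + t_w)
\qquad
T_{\text{overlap}} \approx t_f + (n-1)\,\max(t_f, t_w) + t_w
```

For large n the overlapped time approaches n max(t_f, t_w), so the cheaper stage is hidden behind the more expensive one. The model says nothing about the measured gains quoted above, which depend on the actual fill and write costs.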

12 Performance Evaluation
Hardware: Sun Fire 6800, MHz processors, 150 MHz system bus, 96 GB memory, 4 1Gb FC channels, 4 Sun StorEdge T3 disk arrays (T3 cache disabled)
Software: LAM 7.1.1, ROMIO 1.2.4, Solaris 9, Sun StorageTek QFS 4.5 (3 data + 1 metadata), 64-bit execution model
Bandwidth to data arrays: < 300 MB/s
Caveat: buffered reads benefit from warm buffer cache!
Sun, StorageTek, Sun Fire, Sun StorEdge, and Solaris are trademarks or registered trademarks of Sun Microsystems, Inc., in the United States and other countries.

13 Tile Reader Benchmark
Tiled display simulation
  > File size 7 to 37 MB
  > From Parallel I/O Benchmarking Consortium, Argonne
[Figure: data distribution for a 2 × 2 tile array, highlighting the data read by one process]
[Chart: aggregate read bandwidth (MB/s) versus tile array dimensions (number of processes = product). Series: CSB-dt (dir), CSB-list (dir), CSD (dir), mmap (buf), *List I/O (buf), 2PC (dir), 2PC (buf), *DS (dir), *DS (buf). *omitted due to poor performance]

14 ROMIO 3D Block Test
Array of ints block-distributed to processes
  > Uneven data distribution for some process counts
  > Fixed file size: 824 MB
[Figure: data distribution for 8 processes, highlighting the data accessed by one process]

15 ROMIO 3D Block Test Results
[Charts: aggregate write bandwidth (MB/s) and aggregate read bandwidth (MB/s) versus number of processes. Series: CSB-dt (dir), CSB-list (dir), CSD (dir), mmap (buf), *List I/O (buf), 2PC (dir), 2PC (buf), *DS (dir), *DS (buf). *omitted due to poor performance]

16 FLASH I/O Benchmark
From Argonne/Northwestern
Checkpoint reorganizes to group values by variable
80 blocks per process
Each element has 24 variables
[Figure: memory organization (FLASH block structure along the X, Y, and Z axes, with guard cells, blocks to access in the X-axis and Y-axis, and Variable 0 through Variable 23 per element) and file organization (labels: Var 0 ... Var 23, Block 0 ... Block 79, Proc 0 ... Proc N)]

17 FLASH I/O Benchmark Results
[Chart: aggregate write bandwidth (MB/s) versus number of cells along block edge, for a given number of processes; file size 165MB to 15GB]
[Chart: aggregate write bandwidth (MB/s) versus number of processes, for a given block size in cells; file size 469MB to 2.8GB]
Series: CSB-dt (dir), CSB-list (dir), CSD (dir), mmap (buf), List I/O (buf), 2PC (dir), 2PC (buf), *DS (dir), *DS (buf). *omitted due to poor performance

18 Conclusion
Combination of collective shared buffer, datatype iterators, and sub-buffering offered best aggregate performance for several application I/O patterns
  > Achieved 90% of available disk bandwidth
  > 5× improvement over two-phase collective
Rediscovered streaming I/O principles:
  1. Reduce startup overhead (datatype iterators)
  2. Overlap I/O and computation when possible (sub-buffering)

19 Future Work
Apply datatype iterators to MPI messages
  > Direct sender-to-receiver copy if shared memory
Apply datatype iterators to data sieving and two-phase collective in ROMIO (currently list-based)
  > Could benefit traditional clusters
Possible standardization of datatype iterators
  > Required for use of datatype iterators in ROMIO if ROMIO is to remain portable across MPI implementations

20 Acknowledgements
Harriet Coverston and Anton Rang of Sun Microsystems also contributed to this work.
This material is based on work supported by the US Defense Advanced Research Projects Agency under Contract No. NBCH


22 Datatype Iterators Interface
Interfaces:
  > dtc_next: advance cursor to next contiguous block, return (offset, length)
  > dtc_size_seek/dtc_extent_seek: position cursor to size or extent within datatype
  > dtc_size_tell/dtc_extent_tell: return size or extent within datatype corresponding to cursor position
Simplifies implementation:
  > Collective shared buffer 62% fewer code lines with datatype iterators compared to lists

23 Datatype Iterators Example
Copy (non-)contiguous application data directly to (non-)contiguous shared working buffer:

    while (file_off + file_len <= end_off) {          // Entire file block still fits in current chunk
        while (file_len >= mem_len) {                 // Mem block fits in file block
            src = app_buf + mem_off;
            memcpy(dest, src, mem_len);               // Copy remaining mem block
            file_off += mem_len;
            file_len -= mem_len;
            dest += mem_len;
            (mem_off, mem_len) = dtc_next(mem_dtc);   // Get next mem block
        }
        while (mem_len >= file_len) {                 // File block fits in mem block
            dest = temp_buf + file_off - start_off;
            memcpy(dest, src, file_len);              // Copy remaining file block
            mem_off += file_len;
            mem_len -= file_len;
            src += file_len;
            (file_off, file_len) = dtc_next(file_dtc); // Get next file block
            if (file_off + file_len > end_off)
                break;
        }
    }
    // Elided: post-loop handling of tail end of file block

24 Legend
CSB-dt: collective shared buffer with datatype iterators
  > 1 aggregator, 32MB buffer, 4 sub-buffers
CSB-list: collective shared buffer with lists
  > 1 aggregator, 32MB buffer, 4 sub-buffers
CSD: collective shared data (lists)
  > All processes aggregators, 32MB buffer, no sub-buffers
2PC: two-phase collective (lists)
  > All processes aggregators, 16MB buffer
DS: data sieving
  > 8 MB buffer

THE EXPAND PARALLEL FILE SYSTEM A FILE SYSTEM FOR CLUSTER AND GRID COMPUTING. José Daniel García Sánchez ARCOS Group University Carlos III of Madrid

THE EXPAND PARALLEL FILE SYSTEM A FILE SYSTEM FOR CLUSTER AND GRID COMPUTING. José Daniel García Sánchez ARCOS Group University Carlos III of Madrid THE EXPAND PARALLEL FILE SYSTEM A FILE SYSTEM FOR CLUSTER AND GRID COMPUTING José Daniel García Sánchez ARCOS Group University Carlos III of Madrid Contents 2 The ARCOS Group. Expand motivation. Expand

More information

InterferenceRemoval: Removing Interference of Disk Access for MPI Programs through Data Replication

InterferenceRemoval: Removing Interference of Disk Access for MPI Programs through Data Replication InterferenceRemoval: Removing Interference of Disk Access for MPI Programs through Data Replication Xuechen Zhang and Song Jiang The ECE Department Wayne State University Detroit, MI, 4822, USA {xczhang,

More information

Performance and scalability of a large OLTP workload

Performance and scalability of a large OLTP workload Performance and scalability of a large OLTP workload ii Performance and scalability of a large OLTP workload Contents Performance and scalability of a large OLTP workload with DB2 9 for System z on Linux..............

More information

Cray DVS: Data Virtualization Service

Cray DVS: Data Virtualization Service Cray : Data Virtualization Service Stephen Sugiyama and David Wallace, Cray Inc. ABSTRACT: Cray, the Cray Data Virtualization Service, is a new capability being added to the XT software environment with

More information

Lecture 5: GFS & HDFS! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl

Lecture 5: GFS & HDFS! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl Big Data Processing, 2014/15 Lecture 5: GFS & HDFS!! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl 1 Course content Introduction Data streams 1 & 2 The MapReduce paradigm Looking behind

More information

Cloud Computing at Google. Architecture

Cloud Computing at Google. Architecture Cloud Computing at Google Google File System Web Systems and Algorithms Google Chris Brooks Department of Computer Science University of San Francisco Google has developed a layered system to handle webscale

More information

SWARM: A Parallel Programming Framework for Multicore Processors. David A. Bader, Varun N. Kanade and Kamesh Madduri

SWARM: A Parallel Programming Framework for Multicore Processors. David A. Bader, Varun N. Kanade and Kamesh Madduri SWARM: A Parallel Programming Framework for Multicore Processors David A. Bader, Varun N. Kanade and Kamesh Madduri Our Contributions SWARM: SoftWare and Algorithms for Running on Multicore, a portable

More information

COSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters

COSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters COSC 6374 Parallel I/O (I) I/O basics Fall 2012 Concept of a clusters Processor 1 local disks Compute node message passing network administrative network Memory Processor 2 Network card 1 Network card

More information

SCORPIO: A Scalable Two-Phase Parallel I/O Library With Application To A Large Scale Subsurface Simulator

SCORPIO: A Scalable Two-Phase Parallel I/O Library With Application To A Large Scale Subsurface Simulator SCORPIO: A Scalable Two-Phase Parallel I/O Library With Application To A Large Scale Subsurface Simulator Sarat Sreepathi, Vamsi Sripathi, Richard Mills, Glenn Hammond, G. Kumar Mahinthakumar Oak Ridge

More information

Q & A From Hitachi Data Systems WebTech Presentation:

Q & A From Hitachi Data Systems WebTech Presentation: Q & A From Hitachi Data Systems WebTech Presentation: RAID Concepts 1. Is the chunk size the same for all Hitachi Data Systems storage systems, i.e., Adaptable Modular Systems, Network Storage Controller,

More information

Removing Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays. Red Hat Performance Engineering

Removing Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays. Red Hat Performance Engineering Removing Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays Red Hat Performance Engineering Version 1.0 August 2013 1801 Varsity Drive Raleigh NC

More information

CSAR: Cluster Storage with Adaptive Redundancy

CSAR: Cluster Storage with Adaptive Redundancy CSAR: Cluster Storage with Adaptive Redundancy Manoj Pillai, Mario Lauria Department of Computer and Information Science The Ohio State University Columbus, OH, 4321 Email: pillai,lauria@cis.ohio-state.edu

More information

Agenda. HPC Software Stack. HPC Post-Processing Visualization. Case Study National Scientific Center. European HPC Benchmark Center Montpellier PSSC

Agenda. HPC Software Stack. HPC Post-Processing Visualization. Case Study National Scientific Center. European HPC Benchmark Center Montpellier PSSC HPC Architecture End to End Alexandre Chauvin Agenda HPC Software Stack Visualization National Scientific Center 2 Agenda HPC Software Stack Alexandre Chauvin Typical HPC Software Stack Externes LAN Typical

More information

COSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters

COSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters COSC 6374 Parallel Computation Parallel I/O (I) I/O basics Spring 2008 Concept of a clusters Processor 1 local disks Compute node message passing network administrative network Memory Processor 2 Network

More information

BMC Recovery Manager for Databases: Benchmark Study Performed at Sun Laboratories

BMC Recovery Manager for Databases: Benchmark Study Performed at Sun Laboratories WHITE PAPER BMC Recovery Manager for Databases: Benchmark Study Performed at Sun Laboratories BMC delivers extraordinarily fast Oracle backup and recovery performance close to two terabytes per hour Table

More information

CS 6290 I/O and Storage. Milos Prvulovic

CS 6290 I/O and Storage. Milos Prvulovic CS 6290 I/O and Storage Milos Prvulovic Storage Systems I/O performance (bandwidth, latency) Bandwidth improving, but not as fast as CPU Latency improving very slowly Consequently, by Amdahl s Law: fraction

More information

Chapter 6. 6.1 Introduction. Storage and Other I/O Topics. p. 570( 頁 585) Fig. 6.1. I/O devices can be characterized by. I/O bus connections

Chapter 6. 6.1 Introduction. Storage and Other I/O Topics. p. 570( 頁 585) Fig. 6.1. I/O devices can be characterized by. I/O bus connections Chapter 6 Storage and Other I/O Topics 6.1 Introduction I/O devices can be characterized by Behavior: input, output, storage Partner: human or machine Data rate: bytes/sec, transfers/sec I/O bus connections

More information

Binary search tree with SIMD bandwidth optimization using SSE

Binary search tree with SIMD bandwidth optimization using SSE Binary search tree with SIMD bandwidth optimization using SSE Bowen Zhang, Xinwei Li 1.ABSTRACT In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous

More information

The IntelliMagic White Paper: Storage Performance Analysis for an IBM Storwize V7000

The IntelliMagic White Paper: Storage Performance Analysis for an IBM Storwize V7000 The IntelliMagic White Paper: Storage Performance Analysis for an IBM Storwize V7000 Summary: This document describes how to analyze performance on an IBM Storwize V7000. IntelliMagic 2012 Page 1 This

More information

Overlapping Data Transfer With Application Execution on Clusters

Overlapping Data Transfer With Application Execution on Clusters Overlapping Data Transfer With Application Execution on Clusters Karen L. Reid and Michael Stumm reid@cs.toronto.edu stumm@eecg.toronto.edu Department of Computer Science Department of Electrical and Computer

More information

- An Essential Building Block for Stable and Reliable Compute Clusters

- An Essential Building Block for Stable and Reliable Compute Clusters Ferdinand Geier ParTec Cluster Competence Center GmbH, V. 1.4, March 2005 Cluster Middleware - An Essential Building Block for Stable and Reliable Compute Clusters Contents: Compute Clusters a Real Alternative

More information

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms Distributed File System 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributed File System Don t move data to workers move workers to the data! Store data on the local disks of nodes

More information

Exploiting Transparent Remote Memory Access for Non-Contiguous- and One-Sided-Communication

Exploiting Transparent Remote Memory Access for Non-Contiguous- and One-Sided-Communication Workshop for Communication Architecture in Clusters, IPDPS 2002: Exploiting Transparent Remote Memory Access for Non-Contiguous- and One-Sided-Communication Joachim Worringen, Andreas Gäer, Frank Reker

More information

Large File System Backup NERSC Global File System Experience

Large File System Backup NERSC Global File System Experience Large File System Backup NERSC Global File System Experience M. Andrews, J. Hick, W. Kramer, A. Mokhtarani National Energy Research Scientific Computing Center at Lawrence Berkeley National Laboratory

More information

September 25, 2007. Maya Gokhale Georgia Institute of Technology

September 25, 2007. Maya Gokhale Georgia Institute of Technology NAND Flash Storage for High Performance Computing Craig Ulmer cdulmer@sandia.gov September 25, 2007 Craig Ulmer Maya Gokhale Greg Diamos Michael Rewak SNL/CA, LLNL Georgia Institute of Technology University

More information

LLNL s Parallel I/O Testing Tools and Techniques for ASC Parallel File Systems

LLNL s Parallel I/O Testing Tools and Techniques for ASC Parallel File Systems UCRL-CONF-203489 LAWRENCE LIVERMORE NATIONAL LABORATORY LLNL s Parallel I/O Testing Tools and Techniques for ASC Parallel File Systems W. E. Loewe, R. M. Hedges, T. T. McLarty, and C. J. Morrone April

More information

Distributed File Systems

Distributed File Systems Distributed File Systems Paul Krzyzanowski Rutgers University October 28, 2012 1 Introduction The classic network file systems we examined, NFS, CIFS, AFS, Coda, were designed as client-server applications.

More information

Indexing on Solid State Drives based on Flash Memory

Indexing on Solid State Drives based on Flash Memory Indexing on Solid State Drives based on Flash Memory Florian Keusch MASTER S THESIS Systems Group Department of Computer Science ETH Zurich http://www.systems.ethz.ch/ September 2008 - March 2009 Supervised

More information

Remote Copy Technology of ETERNUS6000 and ETERNUS3000 Disk Arrays

Remote Copy Technology of ETERNUS6000 and ETERNUS3000 Disk Arrays Remote Copy Technology of ETERNUS6000 and ETERNUS3000 Disk Arrays V Tsutomu Akasaka (Manuscript received July 5, 2005) This paper gives an overview of a storage-system remote copy function and the implementation

More information

SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011

SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011 SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications Jürgen Primsch, SAP AG July 2011 Why In-Memory? Information at the Speed of Thought Imagine access to business data,

More information

Best Practices for Data Sharing in a Grid Distributed SAS Environment. Updated July 2010

Best Practices for Data Sharing in a Grid Distributed SAS Environment. Updated July 2010 Best Practices for Data Sharing in a Grid Distributed SAS Environment Updated July 2010 B E S T P R A C T I C E D O C U M E N T Table of Contents 1 Abstract... 2 1.1 Storage performance is critical...

More information

Arrow ECS sp. z o.o. Oracle Partner Academy training environment with Oracle Virtualization. Oracle Partner HUB

Arrow ECS sp. z o.o. Oracle Partner Academy training environment with Oracle Virtualization. Oracle Partner HUB Oracle Partner Academy training environment with Oracle Virtualization Technology Oracle Partner HUB Overview Description of technology The idea of creating new training centre was to attain light and

More information

Chapter 6, The Operating System Machine Level

Chapter 6, The Operating System Machine Level Chapter 6, The Operating System Machine Level 6.1 Virtual Memory 6.2 Virtual I/O Instructions 6.3 Virtual Instructions For Parallel Processing 6.4 Example Operating Systems 6.5 Summary Virtual Memory General

More information

The search engine you can see. Connects people to information and services

The search engine you can see. Connects people to information and services The search engine you can see Connects people to information and services The search engine you cannot see Total data: ~1EB Processing data : ~100PB/day Total web pages: ~1000 Billion Web pages updated:

More information

Introduction History Design Blue Gene/Q Job Scheduler Filesystem Power usage Performance Summary Sequoia is a petascale Blue Gene/Q supercomputer Being constructed by IBM for the National Nuclear Security

More information

Virtuoso and Database Scalability

Virtuoso and Database Scalability Virtuoso and Database Scalability By Orri Erling Table of Contents Abstract Metrics Results Transaction Throughput Initializing 40 warehouses Serial Read Test Conditions Analysis Working Set Effect of

More information

Opportunistic Data-driven Execution of Parallel Programs for Efficient I/O Services

Opportunistic Data-driven Execution of Parallel Programs for Efficient I/O Services Opportunistic Data-driven Execution of Parallel Programs for Efficient I/O Services Xuechen Zhang ECE Department Wayne State University Detroit, MI, 4822, USA xczhang@wayne.edu Kei Davis CCS Division Los

More information

COLO: COarse-grain LOck-stepping Virtual Machine for Non-stop Service

COLO: COarse-grain LOck-stepping Virtual Machine for Non-stop Service COLO: COarse-grain LOck-stepping Virtual Machine for Non-stop Service Eddie Dong, Yunhong Jiang 1 Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,

More information

Implementing Network Attached Storage. Ken Fallon Bill Bullers Impactdata

Implementing Network Attached Storage. Ken Fallon Bill Bullers Impactdata Implementing Network Attached Storage Ken Fallon Bill Bullers Impactdata Abstract The Network Peripheral Adapter (NPA) is an intelligent controller and optimized file server that enables network-attached

More information

What is RAID--BASICS? Mylex RAID Primer. A simple guide to understanding RAID

What is RAID--BASICS? Mylex RAID Primer. A simple guide to understanding RAID What is RAID--BASICS? Mylex RAID Primer A simple guide to understanding RAID Let's look at a hard disk... Several platters stacked on top of each other with a little space in between. One to n platters

More information

Computer Systems Structure Input/Output

Computer Systems Structure Input/Output Computer Systems Structure Input/Output Peripherals Computer Central Processing Unit Main Memory Computer Systems Interconnection Communication lines Input Output Ward 1 Ward 2 Examples of I/O Devices

More information

Interconnection Networks. Interconnection Networks. Interconnection networks are used everywhere!

Interconnection Networks. Interconnection Networks. Interconnection networks are used everywhere! Interconnection Networks Interconnection Networks Interconnection networks are used everywhere! Supercomputers connecting the processors Routers connecting the ports can consider a router as a parallel

More information

Dependable Systems. 9. Redundant arrays of. Prof. Dr. Miroslaw Malek. Wintersemester 2004/05 www.informatik.hu-berlin.de/rok/zs

Dependable Systems. 9. Redundant arrays of. Prof. Dr. Miroslaw Malek. Wintersemester 2004/05 www.informatik.hu-berlin.de/rok/zs Dependable Systems 9. Redundant arrays of inexpensive disks (RAID) Prof. Dr. Miroslaw Malek Wintersemester 2004/05 www.informatik.hu-berlin.de/rok/zs Redundant Arrays of Inexpensive Disks (RAID) RAID is

More information

NAND Flash Architecture and Specification Trends

NAND Flash Architecture and Specification Trends NAND Flash Architecture and Specification Trends Michael Abraham (mabraham@micron.com) NAND Solutions Group Architect Micron Technology, Inc. August 2012 1 Topics NAND Flash Architecture Trends The Cloud

More information

The IntelliMagic White Paper on: Storage Performance Analysis for an IBM San Volume Controller (SVC) (IBM V7000)

The IntelliMagic White Paper on: Storage Performance Analysis for an IBM San Volume Controller (SVC) (IBM V7000) The IntelliMagic White Paper on: Storage Performance Analysis for an IBM San Volume Controller (SVC) (IBM V7000) IntelliMagic, Inc. 558 Silicon Drive Ste 101 Southlake, Texas 76092 USA Tel: 214-432-7920

More information

CS2510 Computer Operating Systems

CS2510 Computer Operating Systems CS2510 Computer Operating Systems HADOOP Distributed File System Dr. Taieb Znati Computer Science Department University of Pittsburgh Outline HDF Design Issues HDFS Application Profile Block Abstraction

More information

CS2510 Computer Operating Systems

CS2510 Computer Operating Systems CS2510 Computer Operating Systems HADOOP Distributed File System Dr. Taieb Znati Computer Science Department University of Pittsburgh Outline HDF Design Issues HDFS Application Profile Block Abstraction

More information

Computer Engineering and Systems Group Electrical and Computer Engineering SCMFS: A File System for Storage Class Memory

Computer Engineering and Systems Group Electrical and Computer Engineering SCMFS: A File System for Storage Class Memory SCMFS: A File System for Storage Class Memory Xiaojian Wu, Narasimha Reddy Texas A&M University What is SCM? Storage Class Memory Byte-addressable, like DRAM Non-volatile, persistent storage Example: Phase

More information

Sun Constellation System: The Open Petascale Computing Architecture

Sun Constellation System: The Open Petascale Computing Architecture CAS2K7 13 September, 2007 Sun Constellation System: The Open Petascale Computing Architecture John Fragalla Senior HPC Technical Specialist Global Systems Practice Sun Microsystems, Inc. 25 Years of Technical

More information

An Architectural study of Cluster-Based Multi-Tier Data-Centers

An Architectural study of Cluster-Based Multi-Tier Data-Centers An Architectural study of Cluster-Based Multi-Tier Data-Centers K. VAIDYANATHAN, P. BALAJI, J. WU, H. -W. JIN, D. K. PANDA Technical Report OSU-CISRC-5/4-TR25 An Architectural study of Cluster-Based Multi-Tier

More information

Price/performance Modern Memory Hierarchy

Price/performance Modern Memory Hierarchy Lecture 21: Storage Administration Take QUIZ 15 over P&H 6.1-4, 6.8-9 before 11:59pm today Project: Cache Simulator, Due April 29, 2010 NEW OFFICE HOUR TIME: Tuesday 1-2, McKinley Last Time Exam discussion

More information

EFFICIENT EXTERNAL SORTING ON FLASH MEMORY EMBEDDED DEVICES

EFFICIENT EXTERNAL SORTING ON FLASH MEMORY EMBEDDED DEVICES ABSTRACT EFFICIENT EXTERNAL SORTING ON FLASH MEMORY EMBEDDED DEVICES Tyler Cossentine and Ramon Lawrence Department of Computer Science, University of British Columbia Okanagan Kelowna, BC, Canada tcossentine@gmail.com

More information

Operating Systems, 6 th ed. Test Bank Chapter 7

Operating Systems, 6 th ed. Test Bank Chapter 7 True / False Questions: Chapter 7 Memory Management 1. T / F In a multiprogramming system, main memory is divided into multiple sections: one for the operating system (resident monitor, kernel) and one

More information

FAWN - a Fast Array of Wimpy Nodes

FAWN - a Fast Array of Wimpy Nodes University of Warsaw January 12, 2011 Outline Introduction 1 Introduction 2 3 4 5 Key issues Introduction Growing CPU vs. I/O gap Contemporary systems must serve millions of users Electricity consumed

More information

Demand Attach / Fast-Restart Fileserver

Demand Attach / Fast-Restart Fileserver . p.1/28 Demand Attach / Fast-Restart Fileserver Tom Keiser Sine Nomine Associates . p.2/28 Introduction Project was commissioned by an SNA client Main requirement was to reduce fileserver restart time

More information

Comparing Dynamic Disk Pools (DDP) with RAID-6 using IOR

Comparing Dynamic Disk Pools (DDP) with RAID-6 using IOR Comparing Dynamic Disk Pools (DDP) with RAID-6 using IOR December, 2012 Peter McGonigal petermc@sgi.com Abstract Dynamic Disk Pools (DDP) offer an exciting new approach to traditional RAID sets by substantially

More information

Scalable Data Analysis in R. Lee E. Edlefsen Chief Scientist UserR! 2011

Scalable Data Analysis in R. Lee E. Edlefsen Chief Scientist UserR! 2011 Scalable Data Analysis in R Lee E. Edlefsen Chief Scientist UserR! 2011 1 Introduction Our ability to collect and store data has rapidly been outpacing our ability to analyze it We need scalable data analysis

More information

Benchmarking Hadoop & HBase on Violin
