EXPLOITING SHARED MEMORY TO IMPROVE PARALLEL I/O PERFORMANCE
1 EXPLOITING SHARED MEMORY TO IMPROVE PARALLEL I/O PERFORMANCE Andrew B. Hastings Sun Microsystems, Inc. Alok Choudhary Northwestern University September 19, 2006 This material is based on work supported by DARPA under Contract No. NBCH
2 Outline
Motivation
Previous work
New shared memory solutions
Performance evaluation
Conclusion and future work
3 Why Shared Memory?
Because it's there!
> For Phase II of DARPA's High Productivity Computing Systems program, Sun proposed a petascale shared memory system
Opportunity to improve performance without altering applications
> Shared memory typically has lower latency and lower overhead (especially for small payloads) than messages
> Change just the library to use shared memory
Interesting research area
> Most previous work on parallel I/O focuses on clusters
4 A Common Parallel I/O Problem
Application accesses may be noncontiguous in memory and in the file
> If not optimized, can result in tens of thousands of small POSIX I/O operations
For MPI-IO, two MPI derived datatypes specify the file and memory access patterns
[Diagram: Process 1 (P1) and Process 2 (P2) memory regions mapping to interleaved file regions; 8 I/O requests, each arrow representing a request]
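A minimal MPI-IO sketch of the situation (not taken from the talk): a strided file pattern is described once with a derived datatype and handed to the I/O layer, instead of being issued as many small POSIX operations. The file name and sizes are illustrative.

/* Minimal sketch (not from the slides): a strided, noncontiguous file pattern
 * described with an MPI derived datatype so the MPI-IO layer sees the whole
 * access at once. Error handling omitted; sizes and "output.dat" are
 * illustrative only. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    enum { BLOCK = 256, COUNT = 64 };          /* illustrative sizes */
    int buf[BLOCK * COUNT];                    /* application data (contents omitted) */

    /* File pattern: COUNT blocks of BLOCK ints, strided by nprocs*BLOCK ints. */
    MPI_Datatype filetype;
    MPI_Type_vector(COUNT, BLOCK, BLOCK * nprocs, MPI_INT, &filetype);
    MPI_Type_commit(&filetype);

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    /* Each rank starts at its own block offset; the view hides the gaps. */
    MPI_Offset disp = (MPI_Offset)rank * BLOCK * sizeof(int);
    MPI_File_set_view(fh, disp, MPI_INT, filetype, "native", MPI_INFO_NULL);
    MPI_File_write_all(fh, buf, BLOCK * COUNT, MPI_INT, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Type_free(&filetype);
    MPI_Finalize();
    return 0;
}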
5 Previous Solutions 1: Data sieving I/O
Each process locks and reads a contiguous block, fills in the altered data, then writes back and unlocks
[Diagram: P1 and P2 memory data staged through per-process sieving buffers and written to the file]
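A hedged sketch of the data-sieving idea in POSIX terms, not ROMIO's actual implementation: lock and read one contiguous extent covering all of the process's small regions, patch the changed bytes in memory, write the extent back, and unlock. The region list and extent bounds are assumed inputs; error handling is omitted.

/* Sketch of data sieving only (not ROMIO's code): read-modify-write of one
 * contiguous extent under a POSIX advisory lock. fd, the extent bounds, and
 * the (off, len, src) region list are assumed inputs. */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

struct region { off_t off; size_t len; const char *src; };

static void sieve_write(int fd, off_t start, size_t extent,
                        const struct region *regs, int nregs)
{
    char *buf = malloc(extent);
    struct flock lk = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                        .l_start = start, .l_len = (off_t)extent };
    fcntl(fd, F_SETLKW, &lk);                 /* lock the extent */
    pread(fd, buf, extent, start);            /* one large read */
    for (int i = 0; i < nregs; i++)           /* patch the altered pieces */
        memcpy(buf + (regs[i].off - start), regs[i].src, regs[i].len);
    pwrite(fd, buf, extent, start);           /* one large write-back */
    lk.l_type = F_UNLCK;
    fcntl(fd, F_SETLK, &lk);                  /* unlock */
    free(buf);
}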
6 Previous Solutions 2: List I/O
Each process creates a list of memory regions and a list of file regions; calls a new filesystem interface
[Diagram: P1 and P2 each pass a (memory list, file list) pair; 2 I/O requests to the file]
Datatype I/O
Each process creates a small data structure describing repeating regions in memory and in file; calls a new filesystem interface
7 Previous Solutions 3: Two-phase collective
Each process sends a round of data to each aggregator. Aggregator(s) receive and merge into a buffer, then make large write call(s) to the filesystem; repeat until done.
[Diagram: P1 sends; P2 (aggregator) receives, merges into its buffer, and writes to the file]
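A deliberately simplified sketch of the two-phase idea, assuming a single aggregator (rank 0) and a single round: non-aggregators ship their already-ordered pieces to the aggregator, which issues one large contiguous write. The real algorithm cycles through multiple rounds and interleaves data from all ranks per round.

/* Simplified two-phase sketch: Phase 1 exchanges data to the aggregator,
 * Phase 2 performs one large write. Assumes each rank's piece is already in
 * file order and rank order matches file order. */
#include <mpi.h>
#include <stdlib.h>
#include <unistd.h>

void two_phase_write(int fd, off_t file_off, const char *piece, int piece_len,
                     MPI_Comm comm)
{
    int rank, nprocs;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nprocs);

    int *lens = NULL, *displs = NULL;
    char *agg_buf = NULL;
    if (rank == 0) {
        lens = malloc(nprocs * sizeof(int));
        displs = malloc(nprocs * sizeof(int));
    }

    /* Phase 1: exchange -- gather every rank's piece at the aggregator. */
    MPI_Gather(&piece_len, 1, MPI_INT, lens, 1, MPI_INT, 0, comm);
    int total = 0;
    if (rank == 0) {
        for (int i = 0; i < nprocs; i++) { displs[i] = total; total += lens[i]; }
        agg_buf = malloc(total);
    }
    MPI_Gatherv((void *)piece, piece_len, MPI_BYTE,
                agg_buf, lens, displs, MPI_BYTE, 0, comm);

    /* Phase 2: I/O -- one large write instead of many small ones. */
    if (rank == 0) {
        pwrite(fd, agg_buf, total, file_off);
        free(agg_buf); free(lens); free(displs);
    }
}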
8 Using Shared Memory: mmap
Each process maps the file into its address space, copies data to the appropriate location in the mapped file
> Similar to List I/O but mostly implemented in the library
[Diagram: P1 and P2 copy memory data into the mapped file with loads/stores]
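A hedged sketch of the mmap approach: each process maps the shared file and copies its pieces straight to their file offsets with loads/stores, so the noncontiguous pattern never becomes a stream of small write() calls. It assumes the file has already been sized and that regions written by different processes do not overlap.

/* Sketch of the mmap approach; the region list is an assumed input and error
 * handling is minimal. */
#include <sys/mman.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

struct region { off_t off; size_t len; const char *src; };

int mmap_write(const char *path, size_t file_size,
               const struct region *regs, int nregs)
{
    int fd = open(path, O_RDWR);
    if (fd < 0) return -1;
    char *map = mmap(NULL, file_size, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) { close(fd); return -1; }

    for (int i = 0; i < nregs; i++)     /* stores go straight to the file image */
        memcpy(map + regs[i].off, regs[i].src, regs[i].len);

    msync(map, file_size, MS_SYNC);     /* flush dirty pages */
    munmap(map, file_size);
    close(fd);
    return 0;
}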
9 Using Shared Memory: Collectives
Collective Shared Data: Each aggregator copies data between its working buffer and shared application memory
Collective Shared Buffer: Each process copies data between its application memory and aggregator(s)' shared working buffer(s)
[Diagram: P1 copies with loads/stores into the buffer of P2 (aggregator), which writes to the file]
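A hedged sketch of the collective-shared-buffer idea using POSIX shared memory (the talk's implementation on the Sun system differs in detail): the aggregator's working buffer lives in a shared segment, every process copies its own data into it, and after a barrier the aggregator performs one large write. The segment name "/csb_buf", the 32 MB size, and the displacement argument are illustrative.

/* Sketch only: shared working buffer via shm_open, copies by all processes,
 * one large write by the aggregator (rank 0). Synchronization is reduced to
 * two barriers. */
#include <mpi.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define BUF_BYTES (32u << 20)                   /* illustrative 32 MB buffer */

void csb_write(int fd, off_t file_off, size_t total_len,
               const char *piece, size_t piece_len, size_t piece_disp,
               MPI_Comm comm)
{
    int rank;
    MPI_Comm_rank(comm, &rank);

    int shm_fd = shm_open("/csb_buf", O_CREAT | O_RDWR, 0600);
    if (rank == 0) ftruncate(shm_fd, BUF_BYTES);
    MPI_Barrier(comm);                          /* buffer exists and is sized */
    char *buf = mmap(NULL, BUF_BYTES, PROT_READ | PROT_WRITE,
                     MAP_SHARED, shm_fd, 0);

    /* Every process copies its own data into the shared working buffer. */
    memcpy(buf + piece_disp, piece, piece_len);
    MPI_Barrier(comm);                          /* all copies complete */

    if (rank == 0) {                            /* aggregator does the large write */
        pwrite(fd, buf, total_len, file_off);
        shm_unlink("/csb_buf");
    }
    munmap(buf, BUF_BYTES);
    close(shm_fd);
}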
10 Datatype Iterators
Problem: copy driven by (offset, length) list:
> Huge list thrashes processor cache
> List generation expensive; delays I/O
Solution: datatype iterator tracks position in MPI datatype, returns next (offset, length) on demand
> State fits in a handful of cache lines
> Tiny startup cost; higher traversal cost can overlap I/O
[Diagram: a datatype iterator keeps a small datatype stack over the MPI datatype, in contrast to an explicit (offset, length) list spanning entries 0 ... 983,039]
11 Overlapping I/O
Strategy: Split working buffer into sub-buffers
> After a sub-buffer is filled, initiate asynchronous I/O
> Before filling the next sub-buffer, wait for the previous asynchronous I/O on it to complete
> Overlaps I/O and data rearrangement!
Performance gain for collective shared buffer on FLASH I/O benchmark:
> 60% with lists
> 90% with datatype iterators
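A hedged sketch of the sub-buffering strategy using POSIX AIO (the talk does not say which asynchronous interface was used): each filled sub-buffer is handed to aio_write, and a sub-buffer is only refilled once its previous write has completed, so data rearrangement overlaps I/O. fill_subbuf stands in for the datatype-iterator copy loop and is an assumed helper.

/* Sub-buffer pipeline sketch with POSIX AIO; error checks omitted. */
#include <aio.h>
#include <string.h>
#include <sys/types.h>

#define NSUB 4
#define SUB_BYTES (8u << 20)            /* 4 x 8 MB = 32 MB working buffer */

static char subbuf[NSUB][SUB_BYTES];
static struct aiocb cb[NSUB];
static int busy[NSUB];

/* Assumed helper: rearranges the next chunk of application data into the
 * sub-buffer and returns the number of bytes placed (0 when done). */
extern size_t fill_subbuf(char *dst, size_t cap);

void overlapped_write(int fd, off_t file_off)
{
    for (int i = 0; ; i = (i + 1) % NSUB) {
        if (busy[i]) {                           /* reclaim this sub-buffer */
            const struct aiocb *list[1] = { &cb[i] };
            aio_suspend(list, 1, NULL);
            aio_return(&cb[i]);
            busy[i] = 0;
        }
        size_t n = fill_subbuf(subbuf[i], SUB_BYTES);   /* rearrange next chunk */
        if (n == 0) break;

        memset(&cb[i], 0, sizeof cb[i]);
        cb[i].aio_fildes = fd;
        cb[i].aio_buf    = subbuf[i];
        cb[i].aio_nbytes = n;
        cb[i].aio_offset = file_off;
        aio_write(&cb[i]);                       /* I/O proceeds while we refill */
        busy[i] = 1;
        file_off += n;
    }
    for (int i = 0; i < NSUB; i++)               /* drain remaining writes */
        if (busy[i]) {
            const struct aiocb *list[1] = { &cb[i] };
            aio_suspend(list, 1, NULL);
            aio_return(&cb[i]);
        }
}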
12 Performance Evaluation
Hardware: Sun Fire 6800, MHz processors, 150 MHz system bus, 96 GB memory, 4 1-Gb FC channels, 4 Sun StorEdge T3 disk arrays (T3 cache disabled)
Software: LAM 7.1.1, ROMIO 1.2.4, Solaris 9, Sun StorageTek QFS 4.5 (3 data + 1 metadata), 64-bit execution model
Bandwidth to data arrays: < 300 MB/s
Caveat: Buffered reads benefit from a warm buffer cache!
Sun, StorageTek, Sun Fire, Sun StorEdge, and Solaris are trademarks or registered trademarks of Sun Microsystems, Inc., in the United States and other countries.
13 Tile Reader Benchmark
Tiled display simulation
> File size 7 37 MB
> From Parallel I/O Benchmarking Consortium, Argonne
Data distribution for a 2x2 tile array: data read by one process
[Chart: Aggregate Read Bandwidth (MB/s) vs. tile array dimensions (number of processes = product); series: CSB-dt (dir), CSB-list (dir), CSD (dir), mmap (buf), *List I/O (buf), 2PC (dir), 2PC (buf), *DS (dir), *DS (buf); *omitted due to poor performance]
14 ROMIO 3D Block Test
Array of ints block-distributed to processes
> Uneven data distribution for some process counts
> Fixed file size: 824 MB
Data distribution for 8 processes: data accessed by one process
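A hedged sketch, not the benchmark's actual source, of how such a 3D block distribution is commonly described to MPI-IO: one MPI_Type_create_subarray call per process defines its block within the global int array, and that datatype becomes the file view for a collective write. The dimensions are assumed inputs.

/* Sketch: describe one process's block of a 3D int array as a subarray
 * datatype for use as an MPI-IO file view. */
#include <mpi.h>

MPI_Datatype block_filetype(int gsizes[3],   /* global array dimensions */
                            int lsizes[3],   /* this process's block size */
                            int starts[3])   /* block origin in the array */
{
    MPI_Datatype filetype;
    MPI_Type_create_subarray(3, gsizes, lsizes, starts,
                             MPI_ORDER_C, MPI_INT, &filetype);
    MPI_Type_commit(&filetype);
    return filetype;
    /* Typical use: MPI_File_set_view(fh, 0, MPI_INT, filetype, "native", info);
       followed by a collective MPI_File_write_all of the local block. */
}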
15 ROMIO 3D Block Test Results
[Charts: Aggregate Write Bandwidth (MB/s) and Aggregate Read Bandwidth (MB/s) vs. number of processes; series: CSB-dt (dir), CSB-list (dir), CSD (dir), mmap (buf), *List I/O (buf), 2PC (dir), 2PC (buf), *DS (dir), *DS (buf); *omitted due to poor performance]
16 FLASH I/O Benchmark
From Argonne/Northwestern
Memory organization: FLASH block structure with X, Y, and Z axes and guard cells; each element has 24 variables (Variable 0 ... Variable 23); 80 blocks per process
Checkpoint reorganizes to group values by variable
File organization: Var 0, Var 1, Var 2, ..., Var 23; within each variable, Block 0, Block 1, ..., Block 79 for Proc 0, Proc 1, ..., Proc N
17 FLASH I/O Benchmark Results
[Charts: Aggregate Write Bandwidth (MB/s) vs. number of cells along block edge at a fixed number of processes (file size 165 MB - 15 GB), and vs. number of processes at a fixed block size (file size 469 MB - 2.8 GB); series: CSB-dt (dir), CSB-list (dir), CSD (dir), mmap (buf), List I/O (buf), 2PC (dir), 2PC (buf), *DS (dir), *DS (buf); *omitted due to poor performance]
18 Conclusion
Combination of collective shared buffer, datatype iterators, and sub-buffering offered the best aggregate performance for several application I/O patterns
> Achieved 90% of available disk bandwidth
> 5x improvement over two-phase collective
Rediscovered streaming I/O principles:
1. Reduce startup overhead (datatype iterators)
2. Overlap I/O and computation when possible (sub-buffering)
19 Future Work
Apply datatype iterators to MPI messages
> Direct sender-to-receiver copy if shared memory
Apply datatype iterators to data sieving and two-phase collective in ROMIO (currently list-based)
> Could benefit traditional clusters
Possible standardization of datatype iterators
> Required for use of datatype iterators in ROMIO if ROMIO is to remain portable across MPI implementations
20 Acknowledgements Harriet Coverston and Anton Rang of Sun Microsystems also contributed to this work. This material is based on work supported by the US Defense Advanced Research Projects Agency under Contract No. NBCH
21 EXPLOITING SHARED MEMORY TO IMPROVE PARALLEL I/O PERFORMANCE Andrew B. Hastings Alok Choudhary This material is based on work supported by DARPA under Contract No. NBCH
22 Datatype Iterators Interface
Interfaces:
> dtc_next: advance cursor to next contiguous block, return (offset, length)
> dtc_size_seek/dtc_extent_seek: position cursor to a size or extent within the datatype
> dtc_size_tell/dtc_extent_tell: return the size or extent within the datatype corresponding to the cursor position
Simplifies implementation:
> Collective shared buffer required 62% fewer lines of code with datatype iterators than with lists
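A plausible set of C declarations for the interface named on this slide; the cursor type, the out-parameter style of dtc_next, and the create/free calls are assumptions, since the slide lists only the operation names and their meanings.

/* Assumed declarations; not the actual Sun/Northwestern header. */
#include <mpi.h>

typedef struct dtc_cursor dtc_cursor_t;          /* opaque iterator state */

/* Not named on the slide; added here for completeness of the sketch. */
dtc_cursor_t *dtc_create(MPI_Datatype dtype, int count);
void          dtc_free(dtc_cursor_t *dtc);

/* Advance to the next contiguous block; returns 0 at end of the datatype. */
int dtc_next(dtc_cursor_t *dtc, MPI_Offset *offset, MPI_Offset *length);

/* Reposition the cursor by size (bytes of data) or by extent (span in the
   datatype's layout), and report the current position in either measure. */
void       dtc_size_seek(dtc_cursor_t *dtc, MPI_Offset size);
void       dtc_extent_seek(dtc_cursor_t *dtc, MPI_Offset extent);
MPI_Offset dtc_size_tell(const dtc_cursor_t *dtc);
MPI_Offset dtc_extent_tell(const dtc_cursor_t *dtc);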
23 Datatype Iterators Example
Copy (non-)contiguous application data directly to (non-)contiguous shared working buffer (slide pseudocode; "(a, b) = dtc_next(c)" is shorthand for fetching the next (offset, length) pair from iterator c):

    while (file_off + file_len <= end_off) {     // Entire file block still fits in current chunk
        while (file_len >= mem_len) {            // Mem block fits in file block
            src = app_buf + mem_off;
            memcpy(dest, src, mem_len);          // Copy remaining mem block
            file_off += mem_len;  file_len -= mem_len;  dest += mem_len;
            (mem_off, mem_len) = dtc_next(mem_dtc);      // Get next mem block
        }
        while (mem_len >= file_len) {            // File block fits in mem block
            dest = temp_buf + file_off - start_off;
            memcpy(dest, src, file_len);         // Copy remaining file block
            mem_off += file_len;  mem_len -= file_len;  src += file_len;
            (file_off, file_len) = dtc_next(file_dtc);   // Get next file block
            if (file_off + file_len > end_off)
                break;
        }
    }
    // Elided: post-loop handling of tail end of file block
24 Legend
CSB-dt: collective shared buffer with datatype iterators
> 1 aggregator, 32 MB buffer, 4 sub-buffers
CSB-list: collective shared buffer with lists
> 1 aggregator, 32 MB buffer, 4 sub-buffers
CSD: collective shared data (lists)
> All processes aggregators, 32 MB buffer, no sub-buffers
2PC: two-phase collective (lists)
> All processes aggregators, 16 MB buffer
DS: data sieving
> 8 MB buffer