COSC 6374 Parallel Computation. Parallel I/O (I): I/O basics. Concept of a cluster




COSC 6374 Parallel I/O (I): I/O basics
Fall 2012

Concept of a cluster
[Figure: a compute node with Processor 1, Processor 2, Memory, local disks, and two network cards, connected to a message-passing network and an administrative network.]

The I/O problem (I)
- Every node has its own local disk.
- Most applications require the data and the executable to be locally available; e.g. an MPI application using multiple nodes requires the executable to be available on all nodes, in the same directory and under the same name.
- Multiple processes need to access the same file, potentially different portions of it, efficiently.

Basic characteristics of storage devices
- Capacity: amount of data a device can store.
- Transfer rate or bandwidth: amount of data a device can read/write in a certain amount of time.
- Access time or latency: delay before the first byte is moved.

Prefix       Abbreviation   Base ten   Base two
kilo, kibi   K, Ki          10^3       2^10 = 1024
mega, mebi   M, Mi          10^6       2^20
giga, gibi   G, Gi          10^9       2^30
tera, tebi   T, Ti          10^12      2^40
peta, pebi   P, Pi          10^15      2^50
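As a rough illustration of how latency and bandwidth interact (my addition, with made-up numbers), the time to service a request of s bytes can be approximated as t ≈ latency + s / bandwidth: small requests are dominated by the access time, large requests by the transfer rate.

```c
#include <stdio.h>

/* Rough service-time model: t = latency + size / bandwidth.              */
/* The latency and bandwidth values below are illustrative, not measured. */
int main(void)
{
    const double latency_s    = 5e-3;    /* 5 ms access time         */
    const double bandwidth_Bs = 100e6;   /* 100 MB/s transfer rate   */

    const double sizes[] = { 4e3, 1e6, 100e6 };   /* 4 KB, 1 MB, 100 MB */
    for (int i = 0; i < 3; i++) {
        double t = latency_s + sizes[i] / bandwidth_Bs;
        printf("%10.0f bytes -> %.4f s\n", sizes[i], t);
    }
    return 0;
}
```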

UNIX file access model
- A file is a sequence of bytes.
- When a program opens a file, the file system establishes a file pointer. The file pointer is an integer indicating the position in the file where the next byte will be read/written.
- Disk drives read and write data in fixed-size units (disk sectors).
- File systems allocate space in blocks; a block is a fixed number of contiguous disk sectors.
- In UNIX-based file systems, the blocks that hold data are listed in an inode. An inode contains the information needed to find all the blocks that belong to a file.
- If a file is too large and the inode cannot hold the whole list of blocks, intermediate nodes (indirect blocks) are introduced.

Write operations
- Write: the file system copies bytes from the user buffer into a system buffer. When the buffer fills up, the system sends the data to disk.
- System buffering:
  + allows the file system to collect full blocks of data before sending them to disk
  + the file system can send several blocks at once to the disk (delayed write or write-behind)
  - data is not actually saved in the case of a system crash
  - for very large write operations, the additional copy from the user buffer to the system buffer could/should be avoided
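A minimal POSIX sketch (my addition, not part of the slides) of the file pointer in action: each write() advances the pointer, lseek() queries or repositions it. The file name and contents are placeholders.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Open (and create) a file; the file pointer starts at offset 0. */
    int fd = open("example.dat", O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    write(fd, "hello", 5);                      /* pointer advances to 5      */

    off_t pos = lseek(fd, 0, SEEK_CUR);         /* query the current position */
    printf("file pointer is now at offset %lld\n", (long long)pos);

    lseek(fd, 0, SEEK_SET);                     /* reposition to the start    */
    write(fd, "H", 1);                          /* overwrite the first byte   */

    close(fd);
    return 0;
}
```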

Read operations
- Read: the file system determines which blocks contain the requested data, reads those blocks from disk into the system buffer, and copies the data from the system buffer into user memory.
- System buffering:
  + the file system always reads a full block (file caching)
  + if the application reads data sequentially, prefetching (read-ahead) can improve performance
  - prefetching is harmful to performance if the application has a random access pattern

Dealing with disk latency
- Caching and buffering:
  - avoid repeated access to the same block
  - allow a file system to smooth out I/O behavior
  - help to hide the latency of the hard drives
  - lower the performance of I/O operations for irregular access patterns
- Non-blocking I/O gives users control over prefetching and delayed writing (see the sketch below):
  - initiate read/write operations as soon as possible
  - wait for the read/write operations to finish only when absolutely necessary
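As an illustration of the non-blocking idea (my addition, not from the slides), POSIX asynchronous I/O lets an application start a read early and wait for it only when the data is actually needed. The file name and buffer size are placeholders; on older glibc the program may need to be linked with -lrt.

```c
#include <aio.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fd = open("input.dat", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    char buf[4096];
    struct aiocb cb;
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = sizeof(buf);
    cb.aio_offset = 0;

    aio_read(&cb);                  /* initiate the read as soon as possible */

    /* ... overlap the pending read with computation here ... */

    const struct aiocb *const list[1] = { &cb };
    aio_suspend(list, 1, NULL);     /* wait only when the data is needed     */

    printf("read %zd bytes\n", aio_return(&cb));
    close(fd);
    return 0;
}
```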

Improving disk bandwidth: disk striping
- Utilize multiple hard drives.
- Split a file into constant-size chunks and distribute them across all disks.
- Three relevant parameters (see the sketch below):
  - stripe factor: the number of disks
  - stripe depth: the size of each block
  - which disk contains the first block of the file
[Figure: blocks 1, 2, 3, ..., n of a file distributed round-robin over disks 1-4.]

Disk striping: bandwidth
- Ideal assumption: b(N, p) = p * b(N/p, 1), where
  N: number of bytes to be written
  b: bandwidth
  p: number of disks
- Realistically, b(N, p) < p * b(N/p, 1), since
  - N is often not large enough to fully utilize p hard drives
  - there is networking overhead
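A small sketch (my addition) of the round-robin placement implied by the figure: given the stripe factor, stripe depth, and the disk holding the first block, it computes on which disk, and at which local offset, a given byte of the file lives. All names and values are illustrative.

```c
#include <stdio.h>

/* Map a file offset to (disk, offset on that disk) under round-robin striping. */
/* stripe_factor: number of disks; stripe_depth: bytes per block;               */
/* first_disk:    disk that holds block 0 of the file.                          */
static void locate(long long offset, int stripe_factor, long long stripe_depth,
                   int first_disk, int *disk, long long *disk_offset)
{
    long long block        = offset / stripe_depth;   /* file block index */
    long long within_block = offset % stripe_depth;

    *disk        = (int)((first_disk + block) % stripe_factor);
    *disk_offset = (block / stripe_factor) * stripe_depth + within_block;
}

int main(void)
{
    int disk;
    long long off;
    locate(1000000, 4, 65536, 0, &disk, &off);   /* 4 disks, 64 KiB stripe depth */
    printf("byte 1000000 -> disk %d, local offset %lld\n", disk, off);
    return 0;
}
```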

Two levels of disk striping (I)
- Using a RAID controller:
  - hardware, typically a single box
  - number of disks: 3 ... n

Redundant arrays of independent disks (RAID)
- Goals:
  - improve the reliability of an I/O system
  - improve the performance of an I/O system
- Several RAID levels are defined.

RAID 0: disk striping without redundant storage (JBOD = just a bunch of disks)
- No fault tolerance.
- Good for high transfer rates, i.e. the read/write bandwidth of a single large file.
- Good for high request rates, i.e. the access time to many (small) files.

RAID 1: mirroring
- All data is replicated on two or more disks.
- Does not improve write performance, and improves read performance only moderately.

RAID level 2
- RAID 2: Hamming codes.
- Each group of data bits has several check bits appended to it, forming Hamming code words.
- Each bit of a Hamming code word is stored on a separate disk.
- Very high additional cost: e.g. up to 50% additional capacity required.
- Hardly used today, since parity-based codes are faster and easier.

RAID level 3
- Parity-based protection:
  - based on the exclusive OR (XOR)
  - reversible
- Example:
      01101010   (data byte 1)
  XOR 11001001   (data byte 2)
  --------------
      10100011   (parity byte)
- Recovery:
      11001001   (data byte 2)
  XOR 10100011   (parity byte)
  --------------
      01101010   (recovered data byte 1)
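A small C sketch (my addition) of the same XOR parity idea applied to whole blocks: the parity block is the XOR of all data blocks, and any single lost block can be rebuilt by XOR-ing the surviving blocks with the parity. The block count and size are placeholders; the bytes in main() are the ones from the example above.

```c
#include <stdio.h>
#include <stddef.h>

#define NDATA 2     /* number of data blocks (illustrative) */
#define BLOCK 8     /* block size in bytes (illustrative)   */

/* parity[i] = XOR of data[0][i] ... data[NDATA-1][i] */
static void compute_parity(unsigned char data[NDATA][BLOCK], unsigned char parity[BLOCK])
{
    for (size_t i = 0; i < BLOCK; i++) {
        parity[i] = 0;
        for (int d = 0; d < NDATA; d++)
            parity[i] ^= data[d][i];
    }
}

/* Rebuild block 'lost' by XOR-ing the surviving blocks with the parity. */
static void recover(unsigned char data[NDATA][BLOCK], unsigned char parity[BLOCK], int lost)
{
    for (size_t i = 0; i < BLOCK; i++) {
        unsigned char v = parity[i];
        for (int d = 0; d < NDATA; d++)
            if (d != lost)
                v ^= data[d][i];
        data[lost][i] = v;
    }
}

int main(void)
{
    unsigned char data[NDATA][BLOCK] = { { 0x6A }, { 0xC9 } };  /* bytes from the slide */
    unsigned char parity[BLOCK];

    compute_parity(data, parity);   /* parity[0] becomes 0xA3           */
    data[0][0] = 0;                 /* simulate losing data block 1     */
    recover(data, parity, 0);       /* rebuilds 0x6A from 0xC9 and 0xA3 */
    printf("recovered 0x%02X, parity 0x%02X\n", data[0][0], parity[0]);
    return 0;
}
```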

RAID level 3 (cont.)
- Data is divided evenly into N subblocks (N = number of disks, typically 4 or 5).
- Computing the parity bytes generates an additional subblock.
- Subblocks are written in parallel to N+1 disks.
- For best performance, data should be of size N * sector size.
- Problems with RAID level 3:
  - all disks participate in every operation => contention for applications with high access rates
  - if the data size is less than N * sector size, the system has to read the old subblocks to calculate the parity bytes
- RAID level 3 is good for high transfer rates.

RAID level 4
- Parity bytes for N disks are calculated and stored on a separate parity disk.
- Files are not necessarily distributed over N disks.
- For read operations:
  - determine the disks holding the requested blocks
  - read the data from these disks
- For write operations (see the sketch below):
  - retrieve the old data from the sector being overwritten
  - retrieve the parity block from the parity disk
  - remove the old data from the parity block using XOR operations
  - add the new data to the parity block using XOR
  - store the new data
  - store the new parity block
- Bottleneck: the parity disk is involved in every write operation.
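A brief sketch (my addition) of the read-modify-write parity update that the RAID 4 write steps describe: the new parity follows from the old parity, the old data, and the new data with two XORs per byte, so only the affected data disk and the parity disk are touched. The byte values are illustrative.

```c
#include <stdio.h>
#include <stddef.h>

/* Small-write parity update: new_parity = old_parity ^ old_data ^ new_data. */
/* In a real array, old_data and the old parity are read from disk first,    */
/* then the new data and the new parity are written back.                    */
static void update_parity(const unsigned char *old_data, const unsigned char *new_data,
                          unsigned char *parity, size_t len)
{
    for (size_t i = 0; i < len; i++)
        parity[i] ^= old_data[i] ^ new_data[i];
}

int main(void)
{
    unsigned char old_data = 0x6A, new_data = 0xFF, parity = 0xA3;
    update_parity(&old_data, &new_data, &parity, 1);
    printf("new parity byte: 0x%02X\n", parity);   /* 0xA3 ^ 0x6A ^ 0xFF = 0x36 */
    return 0;
}
```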

RAID level 5
- Same as RAID 4, but the parity blocks are distributed over the different disks:

  Disk 1    Disk 2    Disk 3    Disk 4       Disk 5
  Block 1   Block 2   Block 3   Block 4      P(1,2,3,4)
  Block 5   Block 6   Block 7   P(5,6,7,8)   Block 8

RAID level 6
- Tolerates the loss of more than one disk.
- A collection of several techniques, e.g.:
  - P+Q parity: compute the parity bytes with two different algorithms and store the two parity blocks on different disks
  - two-dimensional parity
[Figure: two-dimensional parity arrangement with dedicated parity disks.]
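One common way to rotate the parity (an illustrative choice; the slides do not fix a particular scheme, and real controllers differ) is to place the parity of stripe s on disk n - 1 - (s mod n), which matches the layout sketched above.

```c
#include <stdio.h>

/* Illustrative RAID 5 parity rotation: stripe s over n disks keeps its */
/* parity block on disk n - 1 - (s % n).                                */
static int parity_disk(int stripe, int ndisks)
{
    return ndisks - 1 - (stripe % ndisks);
}

int main(void)
{
    for (int s = 0; s < 5; s++)
        printf("stripe %d -> parity on disk %d\n", s, parity_disk(s, 5));
    return 0;
}
```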

RAID level 10
- RAID 10 combines RAID level 1 + RAID level 0: RAID 1 mirroring layered with RAID 0 striping.
- Also available: RAID 53 (RAID 0 + RAID 3).

Comparing RAID levels

RAID level   Protection      Space usage           Good at...          Poor at...
0            none            N                     performance         data protection
1            mirroring       2N                    data protection     space efficiency
2            Hamming codes   ~1.5N                 transfer rate       request rate
3            parity          N+1                   transfer rate       request rate
4            parity          N+1                   read request rate   write performance
5            parity          N+1                   request rate        transfer rate
6            P+Q or 2-D      (N+2) or (MN+M+N)     data protection     write performance
10           mirroring       2N                    performance         space efficiency
53           parity          N + striping factor   performance         space efficiency

Two levels of disk striping (II)
- Using a parallel file system:
  - exposes the individual units capable of handling data, often called storage servers, I/O nodes, etc.
  - each storage server might use multiple hard drives under the hood to increase its read/write bandwidth
  - a metadata server keeps track of which parts of a file are on which storage server
  - a single disk failure is less of a problem if each storage server uses a RAID 5 storage system under the hood

Parallel file systems: conceptual overview
[Figure: compute nodes connected over the network to a metadata server and storage servers 0-3.]

File access on a parallel file system
1. The application on the compute node calls write().
2. The OS requests the list of I/O nodes relevant for this write operation from the metadata server.
3. The metadata server replies with storage IDs, offsets, etc.
4. The OS sends the data to the storage servers.

Disk striping
- Requirements for improving the performance of I/O operations using disk striping:
  - multiple physical disks
  - network bandwidth and I/O bandwidth have to be balanced
- Problem of simple disk striping: for a fixed file size, the number of disks that can be used in parallel is limited.

Prominent parallel file systems
- PVFS2
- Lustre
- GPFS
- NFS v4.2 (new standard currently being ratified)
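From the application side, such striped writes to a shared file are often issued through MPI-IO rather than plain write(). The following minimal sketch (my addition, not from the slides) has every process write its own contiguous block of one shared file; on a parallel file system those blocks end up striped across the storage servers. The file name and block size are placeholders.

```c
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int count = 1 << 20;                  /* 1 Mi ints per process */
    int *buf = malloc(count * sizeof(int));
    for (int i = 0; i < count; i++) buf[i] = rank;

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Each rank writes its block at a disjoint offset of the shared file. */
    MPI_Offset offset = (MPI_Offset)rank * count * sizeof(int);
    MPI_File_write_at(fh, offset, buf, count, MPI_INT, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}
```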

Distributed vs. parallel file systems
- Distributed file systems:
  - offer access to a collection of files on remote machines
  - typically a client-server based approach
  - transparent for the user

NFS: The Network File System
- Protocol for a remote file service.
- Stateless server (v3).
- Communication based on RPC (Remote Procedure Call).
- NFS provides session semantics: changes to an open file are initially only visible to the process that modified the file.
- File locking is not part of the NFS protocol (v3), but is often available through a separate protocol/daemon.
- Client caching is not part of the NFS protocol (v3); the behavior is implementation dependent.

Network File System (NFS): write path
1. The application on the compute node (= NFS client) calls write().
2. The OS forwards the data to the NFS server.
3. The NFS daemon receives the data.
4. The NFS daemon calls write() on the server's local file system.

Parallel vs. distributed file systems
- In distributed file systems:
  - concurrent access to the same file from several processes is considered an unlikely event
  - different (i.e. lower) numbers of processes are assumed to access a file
  - different security requirements apply