COSC 6374 Parallel Computation. Parallel I/O (I): I/O basics. Concept of a cluster




COSC 6374 Parallel Computation
Parallel I/O (I): I/O basics
Spring 2008

Concept of a cluster
[Figure: a compute node containing Processor 1, Processor 2, memory, local disks, Network card 1 and Network card 2, connected to the message-passing network and the administrative network]

I/O problem (I)
- Every node has its own local disk, i.e. no globally visible file system
- Most applications require data and the executable to be locally available
  - e.g. an MPI application using 100 nodes requires the executable to be available on all nodes in the same directory under the same name

I/O problem (II)
- Current processor performance: e.g. Pentium 4, 3 GHz ~ 6 GFLOPS
- Memory bandwidth: 133 MHz * 4 * 64 bit ~ 4.26 GB/s
- Current network performance:
  - Gigabit Ethernet: latency ~ 40 µs, bandwidth ~ 125 MB/s
  - InfiniBand 4x: latency ~ 5 µs, bandwidth ~ 1 GB/s
- Disk performance:
  - Latency: 7-12 ms
  - Bandwidth: ~20 MB/s - 60 MB/s
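As a quick sanity check of the quoted memory-bandwidth figure (assuming a quad-pumped, 64-bit = 8-byte front-side bus at 133 MHz):

    133 * 10^6 transfers/s * 4 * 8 bytes ≈ 4.26 GB/s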

Basic characteristics of storage devices
- Capacity: amount of data a device can store
- Transfer rate or bandwidth: amount of data a device can read/write in a certain amount of time
- Access time or latency: delay before the first byte is moved

  Prefix       Abbreviation   Base ten   Base two
  kilo, kibi   K, Ki          10^3       2^10 = 1024
  mega, mebi   M, Mi          10^6       2^20
  giga, gibi   G, Gi          10^9       2^30
  tera, tebi   T, Ti          10^12      2^40
  peta, pebi   P, Pi          10^15      2^50

UNIX file access model (I)
- A file is a sequence of bytes
- When a program opens a file, the file system establishes a file pointer. The file pointer is an integer indicating the position in the file where the next byte will be read or written.
- Multiple processes can open a file concurrently; each process has its own file pointer
- No conflicts occur when multiple processes read the same file
- If several processes write to the same location, most UNIX file systems guarantee sequential consistency: the data of one of the processes ends up in the file, but not a mixture of the data from several processes
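A minimal sketch of the per-open file pointer described above; for brevity it uses two descriptors in one process rather than two processes, omits error checking, and the file name data.bin is only illustrative.

    /* Each open() of a file gets its own, independent file pointer. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        char a, b;
        int fd1 = open("data.bin", O_RDONLY);
        int fd2 = open("data.bin", O_RDONLY);   /* independent file pointer */

        read(fd1, &a, 1);                 /* fd1's pointer advances to 1 */
        read(fd2, &b, 1);                 /* fd2 still reads byte 0      */

        printf("fd1 offset: %ld, fd2 offset: %ld\n",
               (long)lseek(fd1, 0, SEEK_CUR), (long)lseek(fd2, 0, SEEK_CUR));

        close(fd1);
        close(fd2);
        return 0;
    }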

UNIX file access model (II)
- Disk drives read and write data in fixed-sized units (disk sectors)
- File systems allocate space in blocks, each a fixed number of contiguous disk sectors
- In UNIX-based file systems, the blocks that hold data are listed in an inode. An inode contains the information needed to find all the blocks that belong to a file.
- If a file is too large and an inode cannot hold the whole list of blocks, intermediate nodes (indirect blocks) are introduced

Write operations
- Write: the file system copies bytes from the user buffer into a system buffer. When the buffer fills up, the system sends the data to disk.
- System buffering
  + allows the file system to collect full blocks of data before sending them to disk
  + the file system can send several blocks at once to the disk (delayed write or write-behind)
  - data is not actually saved in the case of a system crash
  - for very large write operations, the additional copy from the user buffer to the system buffer could/should be avoided
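Because a delayed write sits in the system buffer until the kernel writes it back, an application that needs the data on disk has to force the write-back itself. A minimal sketch using fsync() (the file name out.dat is illustrative, error checking omitted):

    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        const char msg[] = "checkpoint data\n";
        int fd = open("out.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);

        write(fd, msg, strlen(msg));  /* copies into the system buffer cache    */
        fsync(fd);                    /* blocks until the data reaches the disk */

        close(fd);
        return 0;
    }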

Read operations
- Read:
  - the file system determines which blocks contain the requested data
  - the blocks are read from disk into a system buffer
  - the data is copied from the system buffer into user memory
- System buffering:
  + the file system always reads a full block (file caching)
  + if the application reads data sequentially, prefetching (read-ahead) can improve performance
  - prefetching harms performance if the application has a random access pattern

File system operations
- Caching and buffering improve performance by
  - avoiding repeated access to the same block
  - allowing the file system to smooth out I/O behavior
- Non-blocking I/O gives users control over prefetching and delayed writing:
  - initiate read/write operations as soon as possible
  - wait for completion of the read/write operations only when absolutely necessary
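One possible way to express this "start early, wait late" pattern is POSIX asynchronous I/O; a minimal sketch is shown below. The file name, buffer size and offset are illustrative, error checking is omitted, and on Linux the program is linked with -lrt.

    #include <aio.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[4096];
        int fd = open("input.dat", O_RDONLY);

        struct aiocb cb;
        memset(&cb, 0, sizeof(cb));
        cb.aio_fildes = fd;
        cb.aio_buf    = buf;
        cb.aio_nbytes = sizeof(buf);
        cb.aio_offset = 0;

        aio_read(&cb);                       /* initiate the read as early as possible */

        /* ... overlap computation with the I/O here ... */

        const struct aiocb *list[1] = { &cb };
        aio_suspend(list, 1, NULL);          /* wait only when the data is required */
        ssize_t n = aio_return(&cb);
        printf("read %zd bytes\n", n);

        close(fd);
        return 0;
    }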

Distributed file systems vs. parallel file systems
- Distributed file systems offer access to a collection of files on remote machines
  - typically a client-server based approach
  - transparent for the user
- Concurrent access to the same file from several processes is considered an unlikely event, in contrast to parallel file systems, where it is considered a standard operation
- Distributed file systems assume different numbers of processors than parallel file systems
- Distributed file systems have different security requirements than parallel file systems

NFS - Network File System
- Protocol for a remote file service
  - client-server based approach
  - stateless server (v3)
  - communication based on RPC (Remote Procedure Call)
- NFS provides session semantics: changes to an open file are initially only visible to the process that modified the file
- File locking is not part of the NFS protocol (v3)
  - file locking is handled by a separate protocol/daemon
  - locking of blocks often supported
- Client caching is not part of the NFS protocol (v3) and depends on the implementation
  - e.g. allowing cached data to be stale for up to 30 seconds

NFS in a cluster
- The front-end node hosts the file server

NFS in a cluster (II)
- All file operations are remote operations: the file server (= NFS server) becomes a bottleneck
- Extensive use of file locking is required to implement the sequential consistency of UNIX I/O
- Communication between client and server typically uses the slow communication channel of the cluster
- Do we use several disks at all?
- Some inefficiencies in the specification, e.g. a read operation involves two RPC operations:
  - lookup of the file handle
  - read request

Parallel I/O
- Basic idea: disk striping
  - stripe factor: number of disks
  - stripe depth: size of each block

Disk striping
- Requirements for improving disk performance:
  - multiple physical disks
  - separate I/O channels to each disk
  - data transfer to all disks simultaneously
- Problems of simple disk striping:
  - a minimum stripe depth (the sector size) is required for optimal disk performance; since the file size is limited, the number of disks that can be used in parallel is limited as well
  - loss of a single disk makes the entire file useless
  - the risk of losing a disk is proportional to the number of disks used
  -> RAID (Redundant Arrays of Independent Disks, see lecture 2)
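A minimal sketch of the round-robin mapping implied by striping: a logical file offset is translated into a disk index and an offset within that disk. All names (map_offset, stripe_depth, stripe_factor) are illustrative, not taken from a real file system.

    #include <stdio.h>

    typedef struct {
        int  disk;            /* which disk holds the byte            */
        long offset_on_disk;  /* byte offset within that disk's data  */
    } stripe_loc_t;

    stripe_loc_t map_offset(long logical_offset, long stripe_depth, int stripe_factor)
    {
        long block  = logical_offset / stripe_depth;   /* logical stripe block    */
        long in_blk = logical_offset % stripe_depth;   /* position inside a block */
        stripe_loc_t loc;
        loc.disk           = (int)(block % stripe_factor);
        loc.offset_on_disk = (block / stripe_factor) * stripe_depth + in_blk;
        return loc;
    }

    int main(void)
    {
        /* e.g. 64 KB stripe depth on 4 disks */
        stripe_loc_t loc = map_offset(300000, 65536, 4);
        printf("disk %d, offset %ld\n", loc.disk, loc.offset_on_disk);
        return 0;
    }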

Parallel file systems
- Goals
  - several processes should be able to access the same file concurrently
  - several processes should be able to access the same file efficiently
- Problems
  - UNIX sequential consistency semantics
  - handling of file pointers
  - caching and buffering

Concurrent file access: logical view
- The number of compute nodes and I/O nodes need not match
[Figure: blocks written by compute nodes 1-4 form a single shared file in the logical view and are distributed round-robin across the I/O nodes and their disks]

Concurrent file access: opening a file
- Each I/O node has a subset of the blocks
- The file system needs to look up where the file resides
  - each I/O node maintains its own directory information, or
  - a centralized name service is used
- The file system needs to look up the striping factor (often fixed)
- When creating a new file, the file system has to choose different I/O nodes for holding the first block, to avoid contention

Concurrent write operations
- How to ensure sequential consistency?
- File locking
  - prevents parallelism even if processes write to different locations in the same file (false sharing)
  - better: locking of individual blocks (see the sketch below)
- Parallel file systems often offer two consistency models:
  - sequential consistency
  - a relaxed consistency model: the application is responsible for preventing overlapping write operations
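A minimal sketch of locking only the region a process writes rather than the whole file, using POSIX byte-range locks (fcntl) as a stand-in for block locking. The file name, offset and length are illustrative and error checking is omitted.

    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("shared.dat", O_RDWR | O_CREAT, 0644);

        struct flock lk;
        memset(&lk, 0, sizeof(lk));
        lk.l_type   = F_WRLCK;       /* exclusive write lock       */
        lk.l_whence = SEEK_SET;
        lk.l_start  = 65536;         /* lock only this byte range, */
        lk.l_len    = 65536;         /* not the entire file        */

        fcntl(fd, F_SETLKW, &lk);    /* block until the range is free */

        lseek(fd, 65536, SEEK_SET);
        write(fd, "my block of data", 16);

        lk.l_type = F_UNLCK;         /* release the range */
        fcntl(fd, F_SETLKW, &lk);

        close(fd);
        return 0;
    }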

File pointers
- In UNIX: every process has a separate file pointer (individual file pointers)
- Shared file pointers are often useful (e.g. reading the next piece of work, writing a parallel log file)
  - on distributed-memory machines: slow, since somebody has to coordinate the file pointer
  - can be fast on shared-memory machines
  - general problems: file pointer atomicity, non-blocking I/O
- Explicit file offset operations: each process tells the file system where to read/write in the file, with no update to file pointers (a sketch follows below)

Buffering and caching
- Client buffering: buffering at the compute nodes
  - consistency problems (e.g. one node writes, another tries to read the same data)
- Server buffering: buffering at the I/O nodes
  - prevents several small requests from being concatenated into a single large one => produces lots of network traffic
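A minimal sketch of explicit-offset I/O using POSIX pread/pwrite: each process states where in the file it reads and writes, and no file pointer is updated. The variable rank stands in for an MPI-style process id, the file name and chunk size are illustrative, and error checking is omitted.

    #include <fcntl.h>
    #include <unistd.h>

    #define CHUNK 4096

    int main(void)
    {
        int  rank = 3;                       /* would come from MPI_Comm_rank */
        char buf[CHUNK] = { 0 };
        int  fd = open("shared.dat", O_RDWR | O_CREAT, 0644);

        /* each process works on its own, non-overlapping region */
        off_t my_offset = (off_t)rank * CHUNK;

        pread(fd, buf, CHUNK, my_offset);    /* read at explicit offset  */
        /* ... compute on buf ... */
        pwrite(fd, buf, CHUNK, my_offset);   /* write at explicit offset */

        close(fd);
        return 0;
    }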

Example of a parallel file system: xfs (Anderson et al., 1995)
- Storage server: stores parts of a file
- Metadata manager: keeps track of data blocks
- Client: processes user requests
[Figure: clients, managers, and several storage servers as the roles in the system]

xfs continued
- Communication based on active messages
  - uses the fast networking infrastructure
- Log-based file system
  - modifications to a file are written to a log file and collectively written to disk
  - to find a data block, a separate table (imap) holds inode references to positions in the log file
- The log file is distributed among several processes using RAID techniques
  - storage servers are organized in stripe groups, i.e. not all storage servers participate in all operations
  - a globally replicated table stores which server belongs to which stripe group
- Each file has a manager associated with it
  - manager map: identifies the manager for a specific file

xfs continued again
- Locating a block, starting point: file name and data offset
  - the directory returns a file id
  - the manager map returns the metadata manager
  - the metadata manager returns the exact location of the inode in the log: stripe group id, segment id and offset in the segment
  - the client computes on which server the block really is
[Figure: lookup chain - file/data -> directory -> file id -> manager map -> imap -> stripe group map]

Client caching in xfs
- xfs maintains a local block cache
  - based on block caching
  - a request for write permission transfers ownership
- The manager keeps track of where a file block is cached
- Collaborative caching
  - the manager transfers the most recent version of a data block directly from one cache into another cache

xfs versus NFS

  Issue               NFS v3                     xfs
  Design goals        Access transparency        Server-less system
  Access model        Remote                     Log-based
  Communication       RPC                        Active messages
  Client process      Thin                       Fat
  Server groups       No                         Yes
  Name space          Per client                 Global
  Sharing semantics   Session                    UNIX
  Caching unit        Implementation dependent   Block
  Fault tolerance     Reliable communication     Striping

Summary
- Parallel I/O is a means to decrease file I/O access times in parallel applications
- Performance-relevant factors:
  - stripe factor
  - stripe depth
  - buffering and caching
  - non-blocking I/O
- Parallel file systems offer support for concurrent access by several processes to the same file:
  - individual file pointers
  - shared file pointers
  - explicit offsets
- Distributed file systems are a poor replacement for parallel file systems