Review. Lecture 21: Reliable, High Performance Storage. CSC 468 / CSC 2204, 11/23/2006




CSC 468 / CSC 2204 Review
Lecture 21: Reliable, High Performance Storage
CSC 469HF Fall 2006, Angela Demke Brown

We've looked at fault tolerance via server replication:
- Continue operating with up to f failures
- Recovery requires restoring state on the failed server
- With replicated state machines, can rebuild state from non-faulty replicas unless another fault occurs
- May require each server to maintain local state on stable storage to allow recovery to a consistent global state following catastrophic failure

Overview
Now we're going to look at some example file and storage systems designed for high performance and reliability:
- BSD Unix Fast File System (FFS) (review)
- Log-structured File System (LFS)
- Soft Updates
- Redundant Array of Inexpensive Disks (RAID)

Basic Disk & File System Properties
Disks provide large-scale non-volatile storage. Access time is determined by:
- Seek time
- Rotational latency
- Transfer time
Good performance depends on reducing seeks and creating large transfers. File systems provide logical organization and management of data on disk.
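The three access-time components above can be made concrete with a back-of-the-envelope estimate. This is a minimal sketch; the drive parameters (8 ms average seek, 7200 RPM, 50 MB/s transfer) are illustrative assumptions, not figures from the lecture:

```python
# Hypothetical drive parameters (illustrative assumptions):
AVG_SEEK_MS = 8.0       # average seek time
RPM = 7200              # spindle speed
TRANSFER_MB_S = 50.0    # sustained transfer rate

def access_time_ms(request_kb: float) -> float:
    """Estimate access time = seek + average rotational latency + transfer."""
    rotational_ms = 0.5 * (60_000 / RPM)                  # half a rotation on average
    transfer_ms = (request_kb / 1024) / TRANSFER_MB_S * 1000
    return AVG_SEEK_MS + rotational_ms + transfer_ms

# A 4 KB read is dominated by positioning; a 4 MB read amortizes it.
small = access_time_ms(4)       # ~12.2 ms, almost all seek + rotation
large = access_time_ms(4096)    # ~92.2 ms, almost all transfer
```

The comparison shows why reducing seeks and creating large transfers dominate disk performance: the large request moves 1024x the data for only about 7.5x the time.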

Unix Inodes: Indirection & Independence
Original Unix File System: Directory → Inode (block lists) → Data blocks. File size grows dynamically, and allocations are independent. Problem: hard to achieve closeness and amortization.

Recall the FS sees storage as a linear array of blocks; each block has a logical block number (LBN). Default usage of LBN space: Superblock, Bitmap, Inodes, Data Blocks.
- Simple, straightforward implementation; easy to implement and understand
- Poor bandwidth utilization (lots of seeking)
- Poor reliability (lose the superblock, lose everything)

Cylinder Groups
BSD FFS addressed placement problems using the notion of a cylinder group (aka allocation groups in many modern file systems):
- Data blocks in the same file allocated in the same cylinder group
- Files in the same directory allocated in the same cylinder group
- Inodes for files allocated in the same cylinder group as the file's data blocks
- Superblock replicated to improve reliability

Log-structured File System
The Log-structured File System (LFS) was designed in response to two trends in workload and technology (Rosenblum and Ousterhout, Berkeley, '91):
1. Disk bandwidth is scaling significantly (40% a year); latency is not.
2. Machines have large main memories. Large buffer caches absorb a large fraction of read requests, and can be used for writes as well, coalescing small writes into large writes.
LFS takes advantage of both of these to increase FS performance.

LFS vs. FFS
LFS addresses some problems with FFS:
- Placement is improved, but there are still many small seeks: possibly related files are physically separated, inodes are separated from files, and directory entries are separate from inodes.
- Metadata requires synchronous writes. With small files, most writes are to metadata, and synchronous writes are very slow.

LFS Approach
Treat the disk as a single log for appending. Collect writes in the disk cache, then write out the entire collection in one large disk request:
- Leverages disk bandwidth; no seeks (assuming the read/write head is at the end of the log)
- All info written to disk is appended to the log: data blocks, attributes, inodes, directories, etc.
Simple, eh? Alas, only in the abstract.

Example: Update a file in FFS: blocks (data & metadata) are updated in place.
Example: Update a file in LFS: modified blocks (data & metadata) are written to a new location at the end of the log.
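The "collect writes, then append them all at once" idea can be sketched in a few lines. This is a toy in-memory model, not LFS's actual data structures; the class and block names are illustrative:

```python
# Toy sketch of LFS-style write coalescing (names are illustrative):
# dirty blocks accumulate in a cache, then flush as one contiguous append.

class Log:
    def __init__(self):
        self.disk = []     # the on-disk log, append-only
        self.cache = {}    # dirty blocks buffered in memory

    def write(self, block_id, data):
        self.cache[block_id] = data            # no disk I/O yet

    def flush(self):
        """One large sequential write instead of many small seeks.
        Returns the new log address of each block."""
        locations = {}
        for block_id, data in self.cache.items():
            locations[block_id] = len(self.disk)   # new address at log end
            self.disk.append((block_id, data))
        self.cache.clear()
        return locations

log = Log()
log.write("inode:7", b"metadata")
log.write("data:7.0", b"hello")
locs = log.flush()   # both blocks land contiguously at the end of the log
```

Note that a rewritten block gets a fresh address at the end of the log rather than overwriting its old location, which is exactly what creates the two challenges discussed next: finding the latest copy, and reclaiming the stale ones.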

LFS Challenges
LFS has two challenges it must address for it to be practical:
1. Locating data written to the log. FFS places files at known locations; LFS writes data at the end of the log.
2. Managing free space on the disk. The disk is finite, so the log is finite and cannot always append; deleted blocks in old parts of the log must be recovered.

LFS: Locating Data
FFS uses inodes to locate data blocks: inodes are pre-allocated in each cylinder group, and directories contain the locations of inodes. LFS appends inodes to the end of the log just like data, which makes them hard to find.
Approach: use another level of indirection, inode maps:
- Inode maps map file #s to inode locations
- Inode map blocks are written to the log like any other block
- Locations of inode map blocks are kept in a checkpoint region, which has a fixed location
- Inode maps are cached in memory for performance

LFS: Free Space Management
LFS is append-only, so it quickly runs out of disk space and needs to recover deleted blocks. Approach: fragment the log into segments.
- Thread active segments on disk; segments can be anywhere
- Reclaim space by cleaning segments: read a segment, copy its live data to the end of the log, and reuse the now-free segment
Cleaning is a big problem: costly overhead.

LFS: Crash Recovery
In a traditional Unix FS, changes can be made anywhere on disk, so the entire disk must be scanned to restore metadata consistency after a crash. In LFS, the most recent changes are at the end of the log, so it is much faster/easier to reach a consistent state. LFS borrows two database techniques for recovery after a crash:
- Checkpoints define a consistent file system state
- Roll-forward recovers information written since the last checkpoint
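The inode-map indirection chain described above (checkpoint region → inode map → inode in the log → data blocks) can be traced in a small sketch. All addresses and the dict-based layout here are illustrative assumptions, not LFS's on-disk format:

```python
# Sketch of the inode-map indirection (addresses are illustrative):
# checkpoint region -> inode map -> inode's position in the log -> data.

checkpoint_region = {"inode_map_addr": 100}    # fixed, well-known location
inode_map = {7: 230}                           # file number -> inode's log address
log = {230: {"data_blocks": [231, 232]}}       # the inode itself lives in the log

def find_data_blocks(file_no):
    inode_addr = inode_map[file_no]   # consult the (cached) inode map
    inode = log[inode_addr]           # read the inode from its log position
    return inode["data_blocks"]
```

The payoff of the extra level of indirection: when file 7's inode is rewritten at the end of the log, only its inode-map entry changes; directories keep referring to file number 7 and never need rewriting.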

LFS Checkpoints
A special, fixed location on disk containing:
- Addresses of all inode map blocks
- Segment usage table (for cleaning)
- Address of the last segment written
- Timestamp
A crash can occur while writing the checkpoint, so keep two checkpoint regions and write the timestamp out last. On recovery, use the region with the most recent timestamp.

LFS Roll-Forward
Can recover data written since the last checkpoint. Examine the segments written since the segment identified by the checkpoint region. Segments contain special summary blocks and directory operation logs that record changes to file system metadata. An operation log entry is guaranteed to appear before the modified directory or inode blocks, so if a directory or inode is inconsistent, the operation log entry can be used to correct it.

Write-ahead Logging
The benefit of logging is strictly for reliability/recovery. Used by JFS, ReiserFS. This approach journals file system updates to an intent log (1) before modifying the file system blocks on disk themselves (2).

Soft Updates
Protect the consistency of disks by controlling the order of writes:
- Use write-back caching for all (non-fsync) updates
- Make sure updates propagate to disk in the correct order
Works great, but only goes as far as update ordering can go.
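The two-region checkpoint trick above is small enough to sketch directly. The field names and timestamp values are illustrative; the key point is that the timestamp is written last, so a torn checkpoint write leaves the old timestamp in place and recovery simply picks the newest complete region:

```python
# Sketch of two-region checkpointing (field names are illustrative).
# Regions are written alternately; the timestamp goes out last, so a
# crash mid-write leaves the stale timestamp and the region is ignored.

regions = [
    {"timestamp": 41, "inode_map_addr": 100, "last_segment": 7},
    {"timestamp": 42, "inode_map_addr": 180, "last_segment": 9},
]

def recovery_checkpoint(regions):
    """On recovery, trust the region with the most recent timestamp."""
    return max(regions, key=lambda r: r["timestamp"])

# Recovery resumes from segment 9, then rolls forward through any
# segments written after it.
cp = recovery_checkpoint(regions)
```

If the crash had interrupted the write of region 1, its timestamp would still read something older than 41, and recovery would fall back to region 0's consistent state.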

Basic Update Ordering Rules
Purpose: integrity of metadata pointers in the face of unpredictable system failures. Similar to the rules of programming with pointers:
- Resource allocation: initialize the resource before setting the pointer to it
- Resource de-allocation: nullify the previous pointer before reuse
- Resource movement: set the new pointer before nullifying the old one
Notice that something is always left dangling after a badly-timed crash.

RAID
Redundant Array of Inexpensive Disks (RAID): a storage system, not a file system (Patterson, Katz, and Gibson, Berkeley, '88). Idea: use many disks in parallel to increase storage bandwidth and improve reliability.
- Files are striped across disks
- Each stripe portion is read/written in parallel
- Bandwidth increases with more disks

RAID Level 0: Disk Striping
(Figure: one stripe of blocks 0-7 laid out round-robin across four disks: blocks 0 and 4 on the first disk, 1 and 5 on the next, and so on.)
How disk striping works:
- Break up the total space into fixed-size stripe units
- Distribute the stripe units among the disks in round-robin fashion
- Compute the location of block # B as follows: disk# = B % N (% = modulo, N = # of disks) and LBN# = B / N (the LBN on the given disk)
Key design decision: picking the stripe unit size. Too big: no parallel transfers and imperfect load balancing. Too small: small transfers span stripe unit boundaries. Also, it should be a multiple of the block size to assist alignment. No redundancy.
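The two striping formulas above translate directly into code. This is a direct transcription of disk# = B % N and LBN# = B / N, with the disk count and block numbers chosen to match the four-disk stripe figure:

```python
def locate(block: int, n_disks: int) -> tuple[int, int]:
    """Map a logical block number to (disk#, LBN on that disk),
    per the round-robin formulas: disk# = B % N, LBN# = B // N."""
    return block % n_disks, block // n_disks

# With 4 disks, blocks 0..7 land as in the stripe figure:
# disk 0 holds blocks 0 and 4, disk 1 holds 1 and 5, and so on.
assert locate(5, 4) == (1, 1)   # block 5 -> disk 1, second block there
assert locate(3, 4) == (3, 0)   # block 3 -> disk 3, first block there
```

Eight consecutive logical blocks thus touch each of the four disks exactly twice, which is what lets a large sequential read proceed on all disks in parallel.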

RAID 0 Challenges
- Small files (small writes of less than a full stripe): need to read the entire stripe, update it with the small write, then write the entire stripe back out to the disks.
- Reliability: more disks increase the chance of media failure (lower MTTF).
Turn the reliability problem into a feature: use one disk to store parity data, the XOR of all data blocks in the stripe. Any data block can be recovered from all the others plus the parity block; hence the redundant in the name. This introduces overhead, but, hey, disks are inexpensive.

RAID Level 1: Mirroring
Redundancy via replication: two (or more) copies (mirroring, shadowing, duplexing, etc.). Write both, read either.

RAID Levels 2 & 3: Parity Disks
Both use a very small stripe unit (a single byte or word), all writes update the parity disk(s), and single-bit errors can be corrected.
- Level 2: #parity disks = log2(#data disks); overkill
- Level 3: one extra disk

RAID Levels 4-6
Levels 4-6 introduce independence: each disk may be writing different data.
- Level 4 is similar to Level 3
- Level 5 removes the parity disk bottleneck by distributing parity across all the disks
- Level 6 tolerates 2 errors
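The XOR parity recovery claim ("can recover any data block from all others + parity block") follows from XOR being its own inverse: XORing the surviving blocks with the parity cancels them out, leaving the lost block. A minimal sketch, with two-byte blocks chosen purely for illustration:

```python
from functools import reduce

def parity(blocks: list[bytes]) -> bytes:
    """Parity block = byte-wise XOR of all data blocks in the stripe."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def recover(survivors: list[bytes], parity_block: bytes) -> bytes:
    """Any single lost block is the XOR of the survivors and the parity,
    since each surviving block cancels its own contribution to the parity."""
    return parity(survivors + [parity_block])

stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
p = parity(stripe)
# Lose stripe[1]; rebuild it from the other data blocks plus parity:
assert recover([stripe[0], stripe[2]], p) == b"\x10\x20"
```

This is also why every small write must update the parity disk: the parity is a function of all the data blocks in the stripe, so changing any one of them invalidates it.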