RAID # The performance of different RAID levels # read/write, reliability (fault tolerance), failure recovery, space overhead. Tiffany Yu-Han Chen (these slides modified from Hao-Hua Chu, National Taiwan University)
RAID 0 - Striping Stripe data across all drives (minimum 2 drives). Sequential blocks of data are written across multiple disks in stripes.
RAID 0 - Striping Stripe data across all drives (minimum 2 drives). Sequential blocks of data are written across multiple disks in stripes. Read/write? Reliability? Failure recovery? Space redundancy overhead?
RAID 0 - Striping Stripe data across all drives (minimum 2 drives). Sequential blocks of data are written across multiple disks in stripes. Read/write? Concurrent reads/writes across disks. Reliability? Failure recovery? No/No (any single disk failure loses data). Space redundancy overhead? 0%
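As a rough sketch of how striping places data, the following round-robin mapping from logical block number to (disk, offset) is an illustrative assumption, not something specified on the slides:

```python
# RAID 0: map a logical block number to (disk, stripe offset) with round-robin striping.
# Illustrative sketch only; names and layout are assumptions.

def raid0_locate(block_num: int, num_disks: int) -> tuple[int, int]:
    """Return (disk index, block offset on that disk) for a logical block."""
    disk = block_num % num_disks        # sequential blocks land on different disks
    offset = block_num // num_disks     # position of the block within that disk
    return disk, offset

# Example: 4 disks, logical blocks 0..7
# block 0 -> disk 0, block 1 -> disk 1, ..., block 4 -> disk 0 again
for b in range(8):
    print(b, raid0_locate(b, 4))
```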
RAID 1 - Mirrored Redundancy by duplicating data on multiple disks Copy each block to both disks
RAID 1 - Mirrored Redundancy by duplicating data on different disks Copy each block to both disks Read? Write? Reliability? Failure recovery? Space redundancy overhead?
RAID 1 - Mirrored Redundancy by duplicating data on different disks Copy each block to both disks Read? Concurrent reads (either copy can serve a read) Write? Need to write to both disks Reliability? Failure recovery? Yes (data survives a single disk failure) Space redundancy overhead? 50%
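A small sketch of the mirroring rule (every write goes to both disks, a read can be served by either copy); the class and method names here are illustrative:

```python
# RAID 1: every block is written to both disks; reads can go to either copy.
# Illustrative sketch; the in-memory "disks" stand in for real devices.

import random

class Raid1:
    def __init__(self):
        self.disks = [dict(), dict()]     # two mirrored disks: block_num -> data

    def write(self, block_num, data):
        for disk in self.disks:           # a write must update both copies
            disk[block_num] = data

    def read(self, block_num):
        disk = random.choice(self.disks)  # either copy can serve the read
        return disk[block_num]

r = Raid1()
r.write(7, b"hello")
assert r.read(7) == b"hello"
```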
RAID 3 / RAID 5 Parity Disk for Error Correction Disk controllers can detect incorrect disk blocks while reading them, so the check (parity) disk does not need to do error detection; error correction alone is sufficient.
RAID 3 / RAID 5 Parity Disk for Error Correction One parity disk for error correction. Parity bit value = XOR across all data bit values. Example: data bits 0, 1, 0 → parity bit = 0 XOR 1 XOR 0 = 1.
RAID 3 / RAID 5 Parity Disk for Error Correction One parity disk for error correction. Parity bit value = XOR across all data bit values. If one disk fails, recover the lost data: XOR across all good data bit values and the parity bit value. Example: surviving data bits 0, 0 and parity bit 1 → lost bit = 0 XOR 0 XOR 1 = 1.
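The XOR arithmetic above can be checked with a few lines of code; this is only a sketch of the bit-level idea, reusing the 0/1/0 example from the slide:

```python
# Parity = XOR of all data bits; a lost bit is recovered by XORing the
# surviving data bits with the parity bit.

from functools import reduce
from operator import xor

def parity(bits):
    return reduce(xor, bits)

data = [0, 1, 0]
p = parity(data)                 # 0 ^ 1 ^ 0 = 1

# Suppose the disk holding data[1] fails: recover it from the survivors + parity.
survivors = [data[0], data[2]]   # [0, 0]
recovered = parity(survivors + [p])
assert recovered == data[1]      # 0 ^ 0 ^ 1 = 1
```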
RAID 4 - Block-Interleaved Parity Coarse-grained striping at the block level One parity disk
RAID 4 - Block-Interleaved Parity Coarse-grained striping at the block level One parity disk If one disk fails, # of disk accesses?
RAID 4 - Block-Interleaved Parity Coarse-grained striping at the block level One parity disk If one disk fails, # of disk accesses: read the corresponding block from all good disks (including the parity disk) and XOR them to reconstruct the lost block.
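A sketch of that reconstruction at the block level: XOR the corresponding block from every surviving disk, parity disk included. The helper name and block sizes are illustrative assumptions:

```python
# RAID 4 recovery: the lost block is the byte-wise XOR of the same block
# on every surviving disk, parity disk included.

def reconstruct_block(surviving_blocks: list[bytes]) -> bytes:
    """XOR the corresponding blocks from all good disks (incl. parity)."""
    out = bytearray(len(surviving_blocks[0]))
    for block in surviving_blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

# Example with 3 data blocks + 1 parity block; pretend disk 1 failed.
d = [b"\x01\x02", b"\x0f\x00", b"\xf0\xff"]
parity = bytes(a ^ b ^ c for a, b, c in zip(*d))
lost = d[1]
assert reconstruct_block([d[0], d[2], parity]) == lost
```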
RAID 4 - Block-Interleaved Parity Read? 1-block read, # of disk accesses? Reliability? Failure recovery? Space redundancy overhead?
RAID 4 - Block-Interleaved Parity Read? Concurrent reads. 1-block read: 1 disk access (only the disk holding that block). Reliability? Failure recovery? Can tolerate 1 disk failure. Space redundancy overhead? 1 parity disk
RAID 4 - Block-Interleaved Parity Write A write also needs to update the parity block, but there is no need to read the other data disks!
RAID 4 - Block-Interleaved Parity Write A write also needs to update the parity block, but there is no need to read the other data disks! Can compute the new parity from the old data, new data, and old parity: new parity = (old data XOR new data) XOR old parity. 1-block write, # of disk accesses?
RAID 4 - Block-Interleaved Parity Write A write also needs to update the parity block, but there is no need to read the other data disks! Can compute the new parity from the old data, new data, and old parity: new parity = (old data XOR new data) XOR old parity. 1-block write, # of disk accesses? 4: read old data, read old parity, write new data, write new parity. This results in a bottleneck on the parity disk (it must be updated by every write, so only one write can proceed at a time).
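A sketch of the small-write parity update described above; it shows why only 4 disk accesses are needed (read old data, read old parity, write new data, write new parity). Function and variable names are assumptions for illustration:

```python
# RAID 4 / RAID 5 small write: update one data block and the parity block
# without reading the other data disks.

def new_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    # new parity = (old data XOR new data) XOR old parity, byte-wise
    return bytes((od ^ nd) ^ op for od, nd, op in zip(old_data, new_data, old_parity))

# Sanity check with 3 data blocks: recompute parity after updating block 1.
d0, d1, d2 = b"\x01", b"\x02", b"\x04"
p_old = bytes(a ^ b ^ c for a, b, c in zip(d0, d1, d2))   # full-stripe parity
d1_new = b"\x0a"
p_new = new_parity(d1, d1_new, p_old)
assert p_new == bytes(a ^ b ^ c for a, b, c in zip(d0, d1_new, d2))
```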
RAID 5 - Block-Interleaved Distributed-Parity Removes the parity-disk bottleneck of RAID 4 by distributing the parity uniformly over all of the disks. Compared to RAID 4: Read? Write? Reliability, space redundancy?
RAID 5 - Block-Interleaved Distributed-Parity Removes the parity-disk bottleneck of RAID 4 by distributing the parity uniformly over all of the disks. Compared to RAID 4: Read? Similar. Write? Better (parity updates are spread over all disks, so writes to different stripes can proceed in parallel). Reliability, space redundancy? Similar.
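One common way to rotate the parity block across disks is sketched below; the exact placement scheme varies between implementations and is an assumption here, the point being only that no single disk holds all the parity:

```python
# RAID 5: the parity block for stripe s lives on a different disk for each stripe,
# so parity updates are spread over all disks instead of hitting one parity disk.

def parity_disk(stripe: int, num_disks: int) -> int:
    # simple right-rotating placement; real arrays may use other layouts
    return (num_disks - 1 - stripe) % num_disks

# With 4 disks: stripe 0 -> disk 3, stripe 1 -> disk 2, stripe 2 -> disk 1, ...
for s in range(5):
    print(s, parity_disk(s, 4))
```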
Questions?
Google File System (GFS) # Observations -> design decisions (These slides modified from Alex Moshchuk, University of Washington)
Assumptions GFS is built from commodity hardware, so high component failure rates are expected. GFS stores a modest number of huge files: a few million files, each 100MB or larger.
Workloads Read: large streaming reads (> 1MB); small random reads (a few KB)
Workloads Read: large streaming reads (> 1MB); small random reads (a few KB). Write: files are write-once and mostly appended to; a file may be appended to concurrently by multiple clients (e.g., MapReduce)
GFS Design Decisions Files are stored as / divided into chunks; chunk size: 64MB. Reliability through replication: each chunk replicated across 3+ chunkservers. Single master to coordinate access and keep metadata (simple). Record append operation added to support concurrent appends.
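A sketch of the client-side arithmetic implied by fixed 64MB chunks: a byte offset maps to a chunk index, which the client sends to the master to obtain the chunk handle and replica locations. Function names are illustrative:

```python
# GFS client side: a file is a sequence of fixed-size chunks, so a byte offset
# maps to a chunk index; the client asks the master for that chunk's handle
# and replica locations, then talks to chunkservers directly for the data.

CHUNK_SIZE = 64 * 1024 * 1024   # 64MB chunks (from the slides)

def chunk_index(file_offset: int) -> int:
    return file_offset // CHUNK_SIZE

def chunk_offset(file_offset: int) -> int:
    return file_offset % CHUNK_SIZE

# Example: byte 200,000,000 of a file lives in chunk 2
off = 200_000_000
print(chunk_index(off), chunk_offset(off))   # -> 2 65782272
```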
Architecture Single master Multiple chunkservers [architecture figure, built up over four steps]
Single Master - Simple Problem: Single point of failure Scalability bottleneck
Single Master - Simple Problem: Single point of failure Scalability bottleneck Solutions: Shadow masters
Single Master - Simple Problem: Single point of failure Scalability bottleneck Solutions: Shadow masters Minimize master involvement: chunk leases (the master delegates authority over data mutations to primary replicas); never move data through the master, use it only for metadata
Write Algorithm 1. Application originates the request 2. GFS client translates request and sends it to master 3. Master responds with chunk handle and replica locations
Write Algorithm 4. Client pushes write data to all locations. Data is stored in chunkservers' internal buffers
Write Algorithm 5. Client sends write command to primary 6. Primary determines serial order for data mutations in its buffer and writes the mutations in that order to the chunk 7. Primary sends the serial order to the secondaries and tells them to perform the write
Write Algorithm 8. Secondaries respond back to primary 9. Primary responds back to the client
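The nine steps above can be condensed into a small toy simulation from the client's point of view; the Master/Chunkserver classes and method names below are hypothetical stand-ins for the real RPCs, not the actual GFS interfaces:

```python
# GFS write path condensed from steps 1-9 above, as a runnable toy simulation.
# All class and method names are illustrative stand-ins, not the real API.

class Chunkserver:
    def __init__(self):
        self.buffers = {}          # handle -> pending data (step 4)
        self.chunks = {}           # handle -> committed mutations in serial order

    def push(self, handle, data):              # step 4: buffer the data
        self.buffers[handle] = data

    def apply(self, handle, seq):              # steps 6-7: commit in serial order
        self.chunks.setdefault(handle, []).append((seq, self.buffers.pop(handle)))
        return "ok"                            # steps 8-9: acknowledgement

class Master:
    def lookup(self, filename, chunk_idx):     # steps 2-3: metadata only
        return ("handle-1", primary, secondaries)

primary, secondaries = Chunkserver(), [Chunkserver(), Chunkserver()]
master = Master()

def gfs_write(filename, chunk_idx, data, seq=0):
    handle, prim, secs = master.lookup(filename, chunk_idx)   # 1-3: get chunk metadata
    for s in [prim] + secs:                                   # 4: push data to all replicas
        s.push(handle, data)
    prim.apply(handle, seq)                                   # 5-6: primary picks order, writes
    acks = [s.apply(handle, seq) for s in secs]               # 7-8: secondaries follow, ack
    return all(a == "ok" for a in acks)                       # 9: reply to the client

print(gfs_write("/logs/web", 0, b"record"))    # -> True
```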
Write Failure Applications need to deal with failures: rewrite (retry) the failed record. As a result, chunk replicas are not bitwise identical. Use checksums to skip inconsistent file regions. [figure: the same region of each replica, blocks D1 D2 D3 D4, with the inconsistent region marked xxxxx]
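A sketch of how a reader might use per-record checksums to skip inconsistent or padded regions; the record layout (length, CRC, payload) is an assumption for illustration, not the actual GFS format:

```python
# Readers skip inconsistent/padded regions by validating a per-record checksum.
# The record layout (4-byte length + 4-byte CRC + payload) is illustrative only.

import struct
import zlib

def make_record(payload: bytes) -> bytes:
    return struct.pack(">II", len(payload), zlib.crc32(payload)) + payload

def read_valid_records(blob: bytes):
    """Yield payloads whose checksum matches; slide past anything that doesn't."""
    i = 0
    while i + 8 <= len(blob):
        length, crc = struct.unpack_from(">II", blob, i)
        payload = blob[i + 8 : i + 8 + length]
        if length and len(payload) == length and zlib.crc32(payload) == crc:
            yield payload
            i += 8 + length
        else:
            i += 1   # corrupted or padded bytes: move forward and resync

blob = make_record(b"D1") + b"\x00" * 16 + make_record(b"D2")   # garbage in the middle
print(list(read_valid_records(blob)))                           # -> [b'D1', b'D2']
```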
Questions?