Data Corruption In Storage Stack - Review

Theoretical Aspects of Storage Systems, Autumn 2009
Chapter 2: Double Disk Failures
André Brinkmann

Data Corruption in the Storage Stack
- What are latent sector errors?
- What is silent data corruption?
- Checksum mismatches, identity discrepancies and parity inconsistencies
- Experiences concerning silent data corruption
- How can these errors be detected and how do they occur?
- Are there any differences between nearline disks and enterprise-class disks?
- Is the age of a disk significant?
- Is there any temporal / spatial locality?
- Is the block number important?
- What are the influences of silent data corruption and latent sector errors on RAID recovery?

Outline
- Importance of multi-error correcting codes
- RAID 6 strategies
- Reed-Solomon codes and Galois fields
- RAID with double-parity encodings
- Disk arrays
- Row-Diagonal Parity
- EVENODD

Double Disk Failures
- Assumptions for a storage cluster environment:
  - 1 PByte of data stored on 2000 computers
  - Environment is grouped into 200 RAID 5 sets with 10 disks each
  - MTBF of each computer (including disks) is 1000 days
  - Recovery of a computer takes 1 day
- $\mathrm{MTTDL} = \frac{(1000\,\mathrm{d})^2}{200 \cdot 10 \cdot 9 \cdot 1\,\mathrm{d}} \approx 55\,\mathrm{d}$
- Protection against single disk failures is not enough in large-scale environments
- RAID 6 is mandatory in large environments
- Recovery time has to be minimized
Example taken from the Lustre Manual v1.6, August 2007
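The estimate on this slide can be reproduced in a few lines. This is a minimal sketch that assumes independent failures; the function name `mttdl_raid5_days` and its parameter names are illustrative and do not come from the slides or the Lustre manual.

```python
def mttdl_raid5_days(mtbf_days, repair_days, num_sets, members_per_set):
    """Approximate mean time to data loss for a group of RAID 5 sets.

    Data is lost when a second member of a set fails while the first
    one is still being rebuilt:
        MTTDL = MTBF^2 / (sets * n * (n - 1) * repair time)
    """
    n = members_per_set
    return mtbf_days ** 2 / (num_sets * n * (n - 1) * repair_days)


# Values from the slide: 200 sets of 10 members, MTBF 1000 d, repair 1 d.
print(round(mttdl_raid5_days(1000, 1, 200, 10), 1))  # 55.6
```

Halving the repair time doubles the result, which is why the slide stresses that recovery time has to be minimized.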

RAID 6
- Any form of RAID that can continue to execute read and write requests to all of an array's virtual disks in the presence of two concurrent disk failures. Both dual check data computations (parity and Reed-Solomon) and orthogonal dual parity have been proposed for RAID Level 6.
  SNIA: The Dictionary of Storage Networking Terminology
- Coding / implementation is not standardized

Terms and Definitions
- Number of data disks: n
- Number of coding disks: m
- Rate of a code: R = n/(n+m)
- Identifiable failure: erasure

Issues with Erasure Coding
- Failure coverage: four ways to specify
  - Specified by a threshold (e.g. 3 erasures always tolerated)
  - Specified by an average (e.g. can recover from an average of 11.84 erasures)
  - Specified as MDS (Maximum Distance Separable): threshold = average = m; space optimal
  - Specified by overhead factor f: f = factor from MDS = m/average; f is always >= 1; f = 1 is MDS
J. Plank: Erasure Codes for Storage Applications. Tutorial given at FAST-2005

Problem Definition
- Partitioning of (n+m) disks into n data disks $d_1, \dots, d_n$ and m checksum devices $c_1, \dots, c_m$
- Every disk can store k bytes
- Aim: up to m disks can fail without data loss
- Capacity of each disk is given in chunks of the size of a word: $l = (k\ \text{bytes}) \cdot \frac{8\ \text{bits}}{1\ \text{byte}} \cdot \frac{1\ \text{word}}{w\ \text{bits}} = \frac{8k}{w}$
- Encoding function $F_i$ for chunk j on checksum disk i is based only on the corresponding words on the data disks: $c_{i,j} = F_i(d_{1,j}, d_{2,j}, \dots, d_{n,j})$
- An update from data word $d_{x,j}$ to $d'_{x,j}$ on disk x only requires an update of the checksums of the same row: $c'_{i,j} = G_i(d_{x,j}, d'_{x,j}, c_{i,j})$
J. Plank: A Tutorial on Reed-Solomon Coding for Fault-Tolerant RAID-like Systems

RAID 5 properties
- For RAID 5, the number of checksum devices is m = 1 and the word length is w = 1
- The checksum is computed as follows: $c_1 = F_1(d_1, \dots, d_n) = d_1 \oplus d_2 \oplus \dots \oplus d_n$
- $c_1$ can be recalculated from the parity of its old value and the old and new data word: $c_1' = G_{1,j}(d_j, d_j', c_1) = c_1 \oplus d_j \oplus d_j'$
- Each word of a failed device can be restored as the parity of the corresponding words on the remaining devices: $d_j = d_1 \oplus \dots \oplus d_{j-1} \oplus d_{j+1} \oplus \dots \oplus d_n \oplus c_1$
J. Plank: A Tutorial on Reed-Solomon Coding for Fault-Tolerant RAID-like Systems
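The three RAID 5 operations on this slide (checksum, small update, reconstruction) can be sketched as follows. The function names are illustrative, not taken from the tutorial. The sketch works on byte-sized integers instead of single bits; since XOR acts bitwise, this is the w = 1 scheme applied to eight bits at once.

```python
from functools import reduce


def xor_words(words):
    """XOR a sequence of integer words together (0 for an empty sequence)."""
    return reduce(lambda a, b: a ^ b, words, 0)


def raid5_checksum(data):
    """c_1 = d_1 xor d_2 xor ... xor d_n."""
    return xor_words(data)


def raid5_update(c1, old_word, new_word):
    """c_1' = c_1 xor d_j xor d_j'; no other data word has to be read."""
    return c1 ^ old_word ^ new_word


def raid5_recover(data, c1, failed):
    """Rebuild the word of the failed device from the survivors and c_1."""
    survivors = [d for j, d in enumerate(data) if j != failed]
    return xor_words(survivors) ^ c1


data = [0x12, 0xA7, 0x3C, 0xFF]
c1 = raid5_checksum(data)

# Device 2 fails: its word is the parity of everything that is left.
assert raid5_recover(data, c1, failed=2) == 0x3C

# Small write on device 1: the incremental update matches a full recompute.
c1_new = raid5_update(c1, data[1], 0x55)
data[1] = 0x55
assert c1_new == raid5_checksum(data)
```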

Reed-Solomon Codes
- The only MDS coding technique for arbitrary n and m
- This means that m erasures are always tolerated
- Have been around for decades
- Operate on binary words of data, composed of w bits, where $2^w \ge n+m$
- Expensive
J. Plank: Erasure Codes for Storage Applications. Tutorial given at FAST-2005

Multi-error correcting Reed-Solomon codes
- Standard approach for disk protection
- Use matrices A, E with [equation missing in the transcript] and the following properties:
  - The system stays linearly independent after the elimination of m rows in A or E
  - Data can be reconstructed using Gaussian elimination
  - Derived from a Vandermonde matrix
J. Plank: A Tutorial on Reed-Solomon Coding for Fault-Tolerant RAID-like Systems

Challenges of Reed-Solomon Codes
- It has to hold that $2^w > n + m$
- Word size is an issue:
  - If $n+m \le 256$, we can use bytes as words
  - If $n+m \le 65{,}536$, we can use shorts as words
- Arithmetic has to be closed under addition and multiplication (and has to contain the corresponding inverses)
- This holds without problems for infinite-precision real numbers, but not for fixed-size words or even integers
- Frequent error: calculating with integers modulo $2^w$
  - Division is not defined for all elements, so the elimination method is not possible
- Use Galois fields with $2^w$ elements: $GF(2^w)$
J. Plank: A Tutorial on Reed-Solomon Coding for Fault-Tolerant RAID-like Systems
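The "frequent error" can be made concrete with a brute-force check: in the integers modulo $2^w$, every even residue lacks a multiplicative inverse, so Gaussian elimination may need a division that does not exist. A small illustrative sketch for w = 8 (the helper name `inverses_mod` is made up for this example):

```python
W = 8
MODULUS = 2 ** W


def inverses_mod(modulus):
    """Map each nonzero residue to its multiplicative inverse, if it has one."""
    inverses = {}
    for a in range(1, modulus):
        for b in range(1, modulus):
            if (a * b) % modulus == 1:
                inverses[a] = b
                break
    return inverses


inv = inverses_mod(MODULUS)
print(len(inv))  # 128: only the odd residues are invertible
print(2 in inv)  # False: dividing by 2 is undefined modulo 256
```

In $GF(2^w)$, by contrast, every nonzero element has an inverse, which is what makes the elimination method work.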

Galois Field Arithmetic
- $GF(2^w)$ has elements $0, 1, 2, \dots, 2^w - 1$
- Addition = XOR
  - Easy to implement
  - Nice and fast
- Multiplication is hard to explain
  - If w is small ($\le 8$), use a multiplication table
  - If w is bigger ($\le 16$), use log/anti-log tables
  - Otherwise, use an iterative process
J. Plank: Erasure Codes for Storage Applications. Tutorial given at FAST-2005
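A log/anti-log implementation for $GF(2^8)$ might look like the sketch below. The primitive polynomial 0x11D ($x^8 + x^4 + x^3 + x^2 + 1$) and the generator 2 are a common choice but an assumption here; the slides do not name a polynomial, and the table and function names are illustrative.

```python
PRIM_POLY = 0x11D   # x^8 + x^4 + x^3 + x^2 + 1 (assumed, not from the slides)
ORDER = 255         # number of nonzero elements in GF(2^8)

EXP = [0] * (2 * ORDER)   # anti-log table, doubled to avoid a modulo in gf_mul
LOG = [0] * (ORDER + 1)   # log table; LOG[0] is unused

x = 1
for i in range(ORDER):
    EXP[i] = x
    LOG[x] = i
    x <<= 1               # multiply by the generator 2
    if x & 0x100:         # reduce modulo the primitive polynomial
        x ^= PRIM_POLY
for i in range(ORDER, 2 * ORDER):
    EXP[i] = EXP[i - ORDER]


def gf_add(a, b):
    """Addition (and subtraction) in GF(2^w) is XOR."""
    return a ^ b


def gf_mul(a, b):
    """Multiply by adding logarithms."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_div(a, b):
    """Divide by subtracting logarithms; defined for every nonzero b."""
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^8)")
    if a == 0:
        return 0
    return EXP[(LOG[a] - LOG[b]) % ORDER]


# Unlike integers modulo 256, every nonzero element has an inverse.
assert all(gf_mul(a, gf_div(1, a)) == 1 for a in range(1, 256))
```

The two tables together take well under a kilobyte for w = 8; for w = 16 they grow to 65,536 entries each, which is still practical, whereas a full multiplication table would not be.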

Orthogonal Parity RAID
- Disks are organized as a 2-dimensional matrix
- Parity is computed for each row and each column
- Advantages: allows failure of many (at least 2) disks
- Disadvantages:
  - More parity blocks needed
  - Slow writes: each write requires 3 read and 4 write operations
- Single disk failure is handled similarly to RAID 5
- All double disk failures can be handled by row and/or column parity
[Figure: blocks numbered 0 to 15; each block represents a dedicated disk]
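The recovery rule for orthogonal parity can be sketched as a simple loop: any row or column with exactly one missing block is repaired from its parity, and this repeats until nothing is missing. The sketch below is illustrative only. It assumes that only data disks fail (the parity disks survive) and marks a lost block with `None`; the function names are made up for this example.

```python
from functools import reduce
from operator import xor


def parities(grid):
    """Return (row parities, column parities) of a 2-D grid of words."""
    row_par = [reduce(xor, row) for row in grid]
    col_par = [reduce(xor, col) for col in zip(*grid)]
    return row_par, col_par


def recover(grid, row_par, col_par):
    """Fill in None entries using any row or column with a single gap."""
    rows, cols = len(grid), len(grid[0])
    progress = True
    while progress:
        progress = False
        for r in range(rows):
            gaps = [c for c in range(cols) if grid[r][c] is None]
            if len(gaps) == 1:
                known = [v for v in grid[r] if v is not None]
                grid[r][gaps[0]] = reduce(xor, known, row_par[r])
                progress = True
        for c in range(cols):
            gaps = [r for r in range(rows) if grid[r][c] is None]
            if len(gaps) == 1:
                known = [grid[r][c] for r in range(rows) if r != gaps[0]]
                grid[gaps[0]][c] = reduce(xor, known, col_par[c])
                progress = True
    return grid


data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
row_par, col_par = parities(data)

# Two disks of the same row fail: the row parity alone cannot help,
# but each affected column still has only one gap.
damaged = [[None, None, 3], [4, 5, 6], [7, 8, 9]]
assert recover(damaged, row_par, col_par) == data
```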

Row-Diagonal Parity (RDP) RAID
- Dedicated RAID Double Parity implementation from NetApp
- Reduces the number of necessary parity disks at the cost that not every failure can be resolved directly
- The first parity dimension is computed as in RAID 5 over each row, and parities are calculated for each stripe
- A stripe consists of p disks, where p is a prime number; disks 0, ..., p-1 include parity disks
- Data blocks are stored on disks 0, ..., p-2
- Block (i,k) belongs to parity set (i+k) mod p
- The first or last diagonal cannot be used to build its own parity
[Figure: RDP stripe layout with diagonal parity (DP) blocks]
Corbett et al.: Row-Diagonal Parity for Double Disk Failure Correction, FAST 2004

Erasures and RDP
- Possible cases:
  - Only a single disk fails: no difference compared to RAID 4
  - Two disks fail:
    - Block (0,1) can be restored based on a diagonal
    - Block (0,0) can be reconstructed AFTERWARDS based on a stripe
    - Block (3,0) will be restored based on a diagonal
    - ...
[Figure: RDP stripe layout with diagonal parity (DP) blocks]
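The alternation between diagonal and row steps can be sketched in code. The layout below is an assumption chosen to match the RDP description: p - 1 rows, data on disks 0 to p-2, row parity on disk p-1, diagonal parity on an extra disk p, and block (i,k) on diagonal (i+k) mod p, with diagonal p-1 not stored. The function names are illustrative and this is not NetApp's implementation.

```python
def rdp_encode(data, p):
    """data: (p-1) rows x (p-1) data disks. Returns rows of p+1 blocks."""
    rows = p - 1
    stripe = [list(row) + [0, 0] for row in data]
    for r in range(rows):
        for k in range(p - 1):
            stripe[r][p - 1] ^= stripe[r][k]       # row parity on disk p-1
    for r in range(rows):
        for k in range(p):                         # data and row parity
            d = (r + k) % p
            if d != p - 1:                         # diagonal p-1 is not stored
                stripe[d][p] ^= stripe[r][k]       # diagonal parity on disk p
    return stripe


def rdp_recover(stripe, p, failed):
    """Rebuild two failed disks (indices < p) in place."""
    rows = p - 1
    missing = {(r, k) for r in range(rows) for k in failed}
    while missing:
        progress = False
        for (r, k) in sorted(missing):
            d = (r + k) % p
            diag = [(i, (d - i) % p) for i in range(rows)]
            row = [(r, j) for j in range(p)]
            if d != p - 1 and sum(b in missing for b in diag) == 1:
                value, others = stripe[d][p], diag  # diagonal step
            elif sum(b in missing for b in row) == 1:
                value, others = 0, row              # row step
            else:
                continue
            for (i, j) in others:
                if (i, j) != (r, k):
                    value ^= stripe[i][j]
            stripe[r][k] = value
            missing.discard((r, k))
            progress = True
        if not progress:
            raise ValueError("no block is recoverable from a row or diagonal")
    return stripe


p = 5
data = [[r * 16 + k + 1 for k in range(p - 1)] for r in range(p - 1)]
good = rdp_encode(data, p)
damaged = [row[:] for row in good]
for r in range(p - 1):
    damaged[r][0] = damaged[r][1] = None            # disks 0 and 1 fail
assert rdp_recover(damaged, p, failed=(0, 1)) == good
```

Under this layout the first step always succeeds because some diagonal crosses only one of the two failed disks; each restored block then leaves a row with a single gap, and so on.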

RDP and Parity Distribution
- The original layout of RDP has the same bottleneck as RAID 4
- A rotation of the parity disk after each stripe seems difficult at the least
- If the RDP pattern consists of m rows, it is possible to rotate the roles of the disks every m rows

EVENODD
- EVENODD was proposed in 1994
- First MDS code that is based solely on parities and corrects two erasures
- Can be seen as the foundation of RDP
- Higher number of operations for update and reconstruction than RDP
- Every storage node forms one column
- Number of data nodes: m, where m is prime
- Number of nodes: m + 2
Blaum, Brady et al.: EVENODD: An Optimal Scheme for Tolerating Double Disk Failures in RAID Architectures

EVENODD Codes
- Definition of the blocks as a matrix of dimension (m-1) x (m+2) with elements $a_{i,j}$
- Element $a_{i,j}$ with $0 \le i \le m-2$ and $0 \le j \le m-1$ is symbol i on disk j
- Disks m and m+1 store redundant information
- Imaginary zero row $a_{m-1,j}$ as last row
- Calculation of stripe parities: [equation missing in the transcript]
- Calculation of diagonal parities: [equation missing in the transcript]
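The two parity formulas did not survive in the transcript. As a hedged reconstruction, the definitions usually given for EVENODD in the paper by Blaum et al. are shown below, with $\langle x \rangle_m = x \bmod m$ and the imaginary row $a_{m-1,j} = 0$. The index names l and t are a notational choice and may differ from the original slide.

```latex
% Stripe (horizontal) parity, stored on disk m
a_{l,m} = \bigoplus_{t=0}^{m-1} a_{l,t}, \qquad 0 \le l \le m-2

% Adjuster: parity of the one diagonal that has no parity cell of its own
S = \bigoplus_{t=1}^{m-1} a_{\langle m-1-t \rangle_m,\, t}

% Diagonal parity including the adjuster, stored on disk m+1
a_{l,m+1} = S \oplus \bigoplus_{t=0}^{m-1} a_{\langle l-t \rangle_m,\, t}, \qquad 0 \le l \le m-2
```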

EVENODD Codes
[Figure: EVENODD layout showing parity I, parity II and the adjuster]
- Parity node 1: simple horizontal parity
- Parity node 2: diagonal parity including an additional adjuster
Slide based on: Huang, Xu: STAR: An Efficient Coding Scheme for Correcting Triple Storage Node Failures, FAST 2005
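The two parity nodes can be computed as in the following sketch. It follows the EVENODD construction as commonly described (m prime, m - 1 rows, an imaginary all-zero row m - 1, and an adjuster S taken from the diagonal that has no parity cell of its own). The function name and the row-major data layout are assumptions made for illustration.

```python
def evenodd_encode(data, m):
    """data: (m-1) rows x m data columns. Returns rows of length m+2:
    the data, then the horizontal parity, then the diagonal parity."""
    rows = m - 1

    def cell(i, j):
        # Row m-1 is the imaginary zero row.
        return 0 if i == m - 1 else data[i][j]

    # Adjuster S: XOR of the data cells on diagonal m-1.
    s = 0
    for t in range(1, m):
        s ^= cell((m - 1 - t) % m, t)

    stripe = []
    for i in range(rows):
        horizontal = 0
        diagonal = s                      # every diagonal parity includes S
        for t in range(m):
            horizontal ^= data[i][t]
            diagonal ^= cell((i - t) % m, t)
        stripe.append(list(data[i]) + [horizontal, diagonal])
    return stripe


m = 5
data = [[(3 * i + 7 * j + 1) % 256 for j in range(m)] for i in range(m - 1)]
stripe = evenodd_encode(data, m)

# Parity node 1 is the plain XOR of each row of data.
for row in stripe:
    check = 0
    for word in row[:m]:
        check ^= word
    assert check == row[m]
```

Because S is folded into every diagonal parity, one diagonal can go without its own parity cell while the code stays MDS with only two parity columns.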

EVENODD Decoding
- Note that S is parity of second parity column
- There will be at least one diagonal that is missing just one data word: decode it / them
- Then there will be at least one row missing just one data word: decode it / them
- Continue this process until all the data words are decoded
J. Plank: Erasure Codes for Storage Applications. Tutorial given at FAST-2005

STAR: 3-error correcting code
- Extension of EVENODD
- Additional parity row
[Figure: STAR layout showing parity III]
Slide from: Huang, Xu: STAR: An Efficient Coding Scheme for Correcting Triple Storage Node Failures, FAST 2005

Decoding Complexity
- Comparison between STAR, EVENODD, an additional code from Blaum, and a purely XOR-based RS code from Blömer et al.
Slide based on: Huang, Xu: STAR: An Efficient Coding Scheme for Correcting Triple Storage Node Failures, FAST 2005

Decoding Performance
Slide based on: Huang, Xu: STAR: An Efficient Coding Scheme for Correcting Triple Storage Node Failures, FAST 2005