An In-Depth Look at Variable Stripe RAID (VSR)



Similar documents
An In-Depth Look at the RamSan-810 Flash Solid State Disk

High-Performance SSD-Based RAID Storage. Madhukar Gunjan Chakhaiyar Product Test Architect

PIONEER RESEARCH & DEVELOPMENT GROUP

Advantages of e-mmc 4.4 based Embedded Memory Architectures

The Technologies & Architectures. President, Demartek

Ensuring Robust Data Integrity

Important Differences Between Consumer and Enterprise Flash Architectures

Understanding endurance and performance characteristics of HP solid state drives

Endurance Models for Cactus Technologies Industrial-Grade Flash Storage Products. White Paper CTWP006

SSD Performance Tips: Avoid The Write Cliff

Trends in NAND Flash Memory Error Correction

Don t Let RAID Raid the Lifetime of Your SSD Array

Database!Fatal!Flash!Flaws!No!One! Talks!About!!!

Solid State Storage in Massive Data Environments Erik Eyberg

NAND Flash FAQ. Eureka Technology. apn5_87. NAND Flash FAQ

Flash 101. Violin Memory Switzerland. Violin Memory Inc. Proprietary 1

Flash s Role in Big Data, Past Present, and Future OBJECTIVE ANALYSIS. Jim Handy

Definition of RAID Levels

NAND Basics Understanding the Technology Behind Your SSD

Yaffs NAND Flash Failure Mitigation

All-Flash Storage Solution for SAP HANA:

Solid State Drive Technology

SLC vs MLC: Proper Flash Selection for SSDs in Industrial, Military and Avionic Applications. A TCS Space & Component Technology White Paper

Destroying Flash Memory-Based Storage Devices (draft v0.9)

NAND Flash Media Management Through RAIN

Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery

SLC vs MLC: Which is best for high-reliability apps?

O&O Defrag and Solid State Drives

In-Block Level Redundancy Management for Flash Storage System

Disks and RAID. Profs. Bracy and Van Renesse. based on slides by Prof. Sirer

Solid State Technology What s New?

MCF54418 NAND Flash Controller

Choosing the Right NAND Flash Memory Technology

Building Flexible, Fault-Tolerant Flash-based Storage Systems

Impact of Stripe Unit Size on Performance and Endurance of SSD-Based RAID Arrays

21 st Century Storage What s New and What s Changing

COS 318: Operating Systems. Storage Devices. Kai Li and Andy Bavier Computer Science Department Princeton University

A Flash Storage Technical. and. Economic Primer. and. By Mark May Storage Consultant, By James Green, vexpert Virtualization Consultant

End to End Data Path Protection

RAID Technology Overview

VERY IMPORTANT NOTE! - RAID

Addressing Fatal Flash Flaws That Plague All Flash Storage Arrays

Database Hardware Selection Guidelines

Booting from NAND Flash Memory

OCZ s NVMe SSDs provide Lower Latency and Faster, more Consistent Performance

RAID Implementation for StorSimple Storage Management Appliance

Storage Technologies for Video Surveillance

PowerProtector: Superior Data Protection for NAND Flash

SSDs and RAID: What s the right strategy. Paul Goodwin VP Product Development Avant Technology

Samsung 3bit 3D V-NAND technology

RAID Overview

DELL SOLID STATE DISK (SSD) DRIVES

XtremIO DATA PROTECTION (XDP)

Technologies Supporting Evolution of SSDs

The What, Why and How of the Pure Storage Enterprise Flash Array

Introduction. What is RAID? The Array and RAID Controller Concept. Click here to print this article. Re-Printed From SLCentral

Physical Data Organization

DELL RAID PRIMER DELL PERC RAID CONTROLLERS. Joe H. Trickey III. Dell Storage RAID Product Marketing. John Seward. Dell Storage RAID Engineering

Technology Update White Paper. High Speed RAID 6. Powered by Custom ASIC Parity Chips

enabling Ultra-High Bandwidth Scalable SSDs with HLnand

Fault Tolerance & Reliability CDA Chapter 3 RAID & Sample Commercial FT Systems

Advanced Knowledge and Understanding of Industrial Data Storage

Solid State Drives Data Reliability and Lifetime. Abstract

SLC vs. MLC: An Analysis of Flash Memory

an analysis of RAID 5DP

Comparison of NAND Flash Technologies Used in Solid- State Storage

Industrial Flash Storage Module

HP Smart Array Controllers and basic RAID performance factors

How To Create A Flash-Enabled Storage For Virtual Desktop 2.5 (Vdi) And 3.5D (Vdi) With Nimble Storage

Technical Note Memory Management in NAND Flash Arrays

Flash Fabric Architecture

Operating Systems. RAID Redundant Array of Independent Disks. Submitted by Ankur Niyogi 2003EE20367

Price/performance Modern Memory Hierarchy

NAND Flash Memory Reliability in Embedded Computer Systems

Enterprise Flash Drive

RAID Made Easy By Jon L. Jacobi, PCWorld

The Evolving NAND Flash Business Model for SSD. Steffen Hellmold VP BD, SandForce

COS 318: Operating Systems. Storage Devices. Kai Li Computer Science Department Princeton University. (

Flash for Databases. September 22, 2015 Peter Zaitsev Percona

Disk Storage & Dependability

1 / 25. CS 137: File Systems. Persistent Solid-State Storage

RAID Overview: Identifying What RAID Levels Best Meet Customer Needs. Diamond Series RAID Storage Array

Factors rebuilding a degraded RAID

Solid State Drives. Separating myths from facts

Storage and File Structure

Solid State Drive Architecture

Understanding Flash SSD Performance

NAND Flash Architecture and Specification Trends

STORAGE SOLID-STATE STORAGE ARRAYS BUYER S CHECKLIST

File System Design and Implementation

Flash-optimized Data Progression

Transcription:

An In-Depth Look at Variable Stripe RAID (VSR) A white paper by Robbie Stevens, Senior Marketing/Technical Writer 10777 Westheimer Rd. Houston, TX 77042 www.ramsan.com Scan this QR code with your smartphone for more information on Variable Stripe RAID. Download a QR scanner here: http://2dscan.com

Table of Contents Introduction... 3 Background Information... 4 Flash Architecture... 4 Bad Blocks Lead to Plane Failure... 5 RAID Arrays... 5 Prior RamSan RAID Implementation... 5 Variable Stripe RAID... 7 Advantages vs. Traditional RAID... 7 Advantages vs. Competition... 8 Conclusion... 9 Variable Stripe RAID (VSR) 2 June 2011

Introduction Variable Stripe RAID (VSR) is a new feature from Texas Memory Systems (TMS) that reduces the risk of uncorrectable errors occurring in user data on a NAND Flash chip while preserving as much storage space as possible. VSR maximizes chip storage space since it is not necessary to relocate an entire stripe following a failure. TMS now offers products with VSR incorporated into their Flash controllers, allowing stripes containing bad planes to be seamlessly rebuilt and efficiently relocated. By dynamically altering the stripe size, VSR can drastically reduce wasted planes. Variable Stripe RAID (VSR) 3 June 2011

Background Information Flash Architecture Flash is a type of non-volatile, solid state storage technology. In enterprise applications, multiple Flash chips are used together to produce solid state disks (SSDs) in the form of external rackmount systems or internal cards or drives. A Flash memory chip is divided into multiple nested entities. The 32nm Toshiba Flash chips used by the TMS Series-7 RamSan systems are divided as follows: One chip: is divided into 8 dies: One die is divided into 2 planes: One plane is divided into 2,048 blocks: One block is divided into 64 pages: Variable Stripe RAID (VSR) 4 June 2011

One page is divided into 8,192 bytes: In total, each chip contains 8 dies, 16 planes, 32,768 blocks, 2,097,152 pages, and 17,179,869,184 bytes! Bad Blocks Lead to Plane Failure A block may fail for a number of reasons. While many of these errors are automatically corrected using an error-correcting code (ECC), sometimes errors occur that ECC cannot handle. For example, a block may be worn past its usable life or data may be unintentionally altered through instances called read-, write-, and erase-disturb errors. If a block cannot be erased or written to, or if a read fails with uncorrectable ECC errors (that is, ECC cannot correct errors because too many bytes have failed), the PowerPC CPU in the Series-7 Flash Controller marks the block as bad and attempts to recover its data, if possible. Plane failure occurs if enough blocks within a plane go bad (256 for TMS systems). Failed planes can no longer be used to store data. RAID Arrays RAID (Redundant Array of Independent Disks) arrays are a popular way to mitigate the risks of losing data on an SSD in the event of a failed chip. In many RAID implementations, data is striped across all chips in an SSD, and if one chip fails the information on the remaining chips can be used to rebuild the missing data. The types of RAID arrays used in RamSan systems stripe data across all chips. For example, a RamSan-70 uses 40 chips and four Flash controllers in its base configuration. Each RamSan-70 Flash controller connects to ten Flash chips and is initially set up to use a 9+1 rolling XOR RAID across all ten of its Flash chips. Nine data pages within each RAID stripe are protected by a tenth page of additional information that can be used to rebuild lost data in the event of uncorrectable data errors. Prior RamSan RAID Implementation All RamSan systems operate as RAID arrays (in most cases, they have used RAID-5). If a bad plane was detected, the controller automatically rebuilt the missing block and moved all data from that stripe to a new location. All blocks used by that stripe were then deactivated. Variable Stripe RAID (VSR) 5 June 2011

Many RAID-5 controllers automatically detect plane failure and rebuild the data elsewhere much like VSR does, increasing data stability. However, the reduction in usable capacity within one device limits the overall number of stripes that can be created within the entire group. If a single device loses one eighth of its available blocks due to a failure, then all other devices within the RAID group also see a capacity reduction of one eighth. Once all available capacity in the failing device has been utilized, then no more stripes may be created, even if unused capacity still remains within the functioning devices. In the case of a 9+1 stripe, one bad plane results in nine good planes going to waste. In an extreme case, an entire chip may fail. Such a condition prevents the creation of any new stripes, and renders the entire RAID group unusable. Over time, the number of good planes trapped in bad stripes can reduce storage capacity by a significant amount. Variable Stripe RAID (VSR) 6 June 2011

Variable Stripe RAID When a bad plane occurs, VSR technology allows a plane to be removed from use without impacting the available capacity of other devices within the RAID stripe. Upon detection of a failure, the failing plane is removed from use (no further writes are allowed) and all used pages within the affected stripe are marked as critical to move. Information from the affected stripe is then gradually relocated to knowngood stripes, a process that is performed as a background task to minimize the required processing power. Unlike traditional RAID, the loss of capacity in one device does not affect other devices. The functioning devices those having more available capacity than that of the failing device simply create stripes with fewer pages (such as 8 +1 instead of 9+1) to work around the failing plane. Even in the event of complete chip loss, new stripes may be created across the remaining good chips and the failing chip is simply taken out of circulation. With VSR technology, device failures result in far less capacity loss than traditional RAID. Advantages vs. Traditional RAID In a traditional RAID-5 array, a stripe can only survive the failure of one plane used by the stripe. If a second plane in a stripe fails and the data has not been relocated by the controller, the entire stripe is corrupted, resulting in data loss. In addition, if the stripe is relocated by the controller, the planes used by the original stripe are deactivated. For example, suppose one plane in a 9+1 rolling XOR RAID-5 array fails (in this case, plane 6 1 ): When the failed plane is detected, the data is rebuilt and moved to a new stripe, along with the other data in the stripe: 1 2 2 2 3 2 4 2 5 2 6 2 7 2 8 2 9 2 P 2 The original ten planes can then no longer be used: In a VSR array, when a plane fails, the data from that stripe is rebuilt and moved to a new stripe in a similar fashion. However, the failed plane is removed from the original stripe, allowing the stripe to be used for new data. While the failed plane cannot be reused, the other nine planes are retained and additional data can be striped across them. For example, if the stripe was originally set up using a 9+1 Variable Stripe RAID (VSR) 7 June 2011

rolling XOR RAID, once the failed plane is removed the stripe functions as an 8+1 rolling XOR stripe. Suppose one plane in a 9+1 rolling XOR VSR RAID array fails (again, plane 6 1 ): When the failed plane is detected, the data is rebuilt and moved (along with the other data in the stripe) to a new stripe, similar to RAID-5: 1 2 2 2 3 2 4 2 5 2 6 2 7 2 8 2 9 2 P 2 However, VSR striping allows the original stripe to be reused. Only the failed plane is deactivated, resulting in an 8+1 rolling XOR VSR RAID stripe for new data: Furthermore, the new 8+1 RAID stripe can survive another bad plane. Suppose a plane fails in the new stripe: The stripe is rebuilt and moved to a new location, and the remaining 7+1 stripe is made available for new data. 1 3 2 3 3 3 4 3 5 3 6 3 7 3 8 3 9 3 P 3 Since VSR salvages the good planes in a stripe containing a failed plane, much more storage space is retained. Advantages vs. Competition This technology is only available from Texas Memory Systems, a company committed to producing the fastest, most reliable enterprise storage available. TMS uses only single-level cell (SLC) Flash chips, which have a much longer (10x) expected life than the multi-level cell (MLC) chips used in competing products. By reducing space lost to failed planes, VSR technology is another way TMS extends the usable life and value of its products. Variable Stripe RAID (VSR) 8 June 2011

Conclusion Variable Stripe RAID technology from Texas Memory Systems extends the usable life of its storage solutions, helping enterprise SSD users retain as much storage space as possible while protecting the integrity of their data. When VSR striping is employed, data from a failed plane is rebuilt and the plane is relocated. More importantly, the stripe containing the failed plane is not discarded; instead the failed plane is simply removed from the stripe, allowing new data to be written to the stripe. Since VSR technology automatically and dynamically adjusts stripe width following plane failure, this new RAID strategy can reduce wasted planes and efficiently protect valuable data. Variable Stripe RAID (VSR) 9 June 2011