FREE-p: Protecting Non-Volatile Memory against both Hard and Soft Errors



Similar documents
The Evolving NAND Flash Business Model for SSD. Steffen Hellmold VP BD, SandForce

Chapter Introduction. Storage and Other I/O Topics. p. 570( 頁 585) Fig I/O devices can be characterized by. I/O bus connections

Disk Storage & Dependability

Endurance Models for Cactus Technologies Industrial-Grade Flash Storage Products. White Paper CTWP006

Naveen Muralimanohar Rajeev Balasubramonian Norman P Jouppi

NAND Flash FAQ. Eureka Technology. apn5_87. NAND Flash FAQ

NAND Flash Architecture and Specification Trends

COS 318: Operating Systems. Storage Devices. Kai Li Computer Science Department Princeton University. (

A Comparison of Client and Enterprise SSD Data Path Protection

Price/performance Modern Memory Hierarchy

Storage Class Memory and the data center of the future

Flash Memory. Jian-Jia Chen (Slides are based on Yuan-Hao Chang) TU Dortmund Informatik 12 Germany 2015 年 01 月 27 日. technische universität dortmund

Solid State Drive Technology

Impact of Stripe Unit Size on Performance and Endurance of SSD-Based RAID Arrays

Physical Data Organization

File Systems for Flash Memories. Marcela Zuluaga Sebastian Isaza Dante Rodriguez

Yaffs NAND Flash Failure Mitigation

File System & Device Drive. Overview of Mass Storage Structure. Moving head Disk Mechanism. HDD Pictures 11/13/2014. CS341: Operating System

Understanding endurance and performance characteristics of HP solid state drives

RAID. RAID 0 No redundancy ( AID?) Just stripe data over multiple disks But it does improve performance. Chapter 6 Storage and Other I/O Topics 29

18-548/ Associativity 9/16/98. 7 Associativity / Memory System Architecture Philip Koopman September 16, 1998

Flash for Databases. September 22, 2015 Peter Zaitsev Percona

Disks and RAID. Profs. Bracy and Van Renesse. based on slides by Prof. Sirer

Reduced Precision Hardware for Ray Tracing. Sean Keely University of Texas, Austin

COS 318: Operating Systems. Storage Devices. Kai Li and Andy Bavier Computer Science Department Princeton University

Accelerating Server Storage Performance on Lenovo ThinkServer

Performance Analysis of Flash Storage Devices and their Application in High Performance Computing

COS 318: Operating Systems. Virtual Memory and Address Translation

SSDs and RAID: What s the right strategy. Paul Goodwin VP Product Development Avant Technology

TECHNOLOGY BRIEF. Compaq RAID on a Chip Technology EXECUTIVE SUMMARY CONTENTS

Motivation: Smartphone Market

In-Block Level Redundancy Management for Flash Storage System

NAND Flash Architecture and Specification Trends

Flash-optimized Data Progression

1 / 25. CS 137: File Systems. Persistent Solid-State Storage

SOLID STATE DRIVES AND PARALLEL STORAGE

What Types of ECC Should Be Used on Flash Memory?

NAND Basics Understanding the Technology Behind Your SSD

Outline. Database Management and Tuning. Overview. Hardware Tuning. Johann Gamper. Unit 12

The Quest for Speed - Memory. Cache Memory. A Solution: Memory Hierarchy. Memory Hierarchy

Technologies Supporting Evolution of SSDs

Enhancing IBM Netfinity Server Reliability

Important Differences Between Consumer and Enterprise Flash Architectures

WEBTECH EDUCATIONAL SERIES

Low-Power Error Correction for Mobile Storage

Promise of Low-Latency Stable Storage for Enterprise Solutions

Linux flash file systems JFFS2 vs UBIFS

Computer Systems Structure Main Memory Organization

Managing the evolution of Flash : beyond memory to storage

With respect to the way of data access we can classify memories as:

RAID 5 rebuild performance in ProLiant

Error Control Coding and Ethernet

Hardware Based Flash Memory Failure Characterization Platform

Don t Let RAID Raid the Lifetime of Your SSD Array

FAST 11. Yongseok Oh University of Seoul. Mobile Embedded System Laboratory

Energy-Efficient, High-Performance Heterogeneous Core Design

Data Distribution Algorithms for Reliable. Reliable Parallel Storage on Flash Memories

Enterprise Flash Drive

Accelerating I/O- Intensive Applications in IT Infrastructure with Innodisk FlexiArray Flash Appliance. Alex Ho, Product Manager Innodisk Corporation

Solid State Drives Data Reliability and Lifetime. Abstract

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

HETEROGENEOUS SYSTEM COHERENCE FOR INTEGRATED CPU-GPU SYSTEMS

Comparison of NAND Flash Technologies Used in Solid- State Storage

Memory Basics. SRAM/DRAM Basics

Flash In The Enterprise

SSD Performance Tips: Avoid The Write Cliff

ioscale: The Holy Grail for Hyperscale

EMC XTREMIO EXECUTIVE OVERVIEW

Board Notes on Virtual Memory

Virtual Memory. How is it possible for each process to have contiguous addresses and so many of them? A System Using Virtual Addressing

Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery

1 Storage Devices Summary

Storing Data: Disks and Files

Implications of Storage Class Memories (SCM) on Software Architectures

Designing a Cloud Storage System

Lecture 17: Virtual Memory II. Goals of virtual memory

OC By Arsene Fansi T. POLIMI

1. Memory technology & Hierarchy

Solid State Drive (SSD) FAQ

Understanding Flash SSD Performance

Flash & DRAM Si Scaling Challenges, Emerging Non-Volatile Memory Technology Enablement - Implications to Enterprise Storage and Server Compute systems

361 Computer Architecture Lecture 14: Cache Memory

Thread level parallelism

In-memory database systems, NVDIMMs and data durability

CS 61C: Great Ideas in Computer Architecture. Dependability: Parity, RAID, ECC

Rethinking SIMD Vectorization for In-Memory Databases

WHITE PAPER Guide to 50% Faster VMs No Hardware Required

Slide Set 8. for ENCM 369 Winter 2015 Lecture Section 01. Steve Norman, PhD, PEng

Maximizing Your Server Memory and Storage Investments with Windows Server 2012 R2

HP Smart Array Controllers and basic RAID performance factors

Transcription:

FREE-p: Protecting Non-Volatile Memory against both Hard and Soft Errors Doe Hyun Yoon Naveen Muralimanohar Jichuan Chang Parthasarathy Ranganathan Norman P. Jouppi Mattan Erez Electrical and Computer Engineering The University of Texas at Austin Intelligent Infrastructure Lab. Hewlett-Packard Labs.

N Challenge in Emerging NVRAM Finite write endurance PCRAM cells wear out after 10 8 writes on average Process variation exacerbates the problem Prior solutions Custom wear-out tolerance mechanisms Coarse-grained remapping Dynamically Replicated Memory (DRM) Error Correcting Pointer (ECP) Stuck-At-Failure Error Recovery (SAFER) 2

N Limitations of Prior Solutions Only HARD cell errors Can t detect/correct SOFT errors Resistance drift in PCRAM Can t detect/correct errors on periphery, wires, and packaging Error detection/correction logic within NVRAM devices Memory industry favors SIMPLE and CHEAP devices Error detection/correction in DRAM DRAM chips only store data Additional DRAM chips for storing redundant information Error detection/correction is done at the memory controller Better follow the same design strategy in NVRAM systems 3

N FREE-p Fine-grained Remapping with ECC and Embedded-pointer Multi-tiered ECC Fine-grained remapping Detection/correction at the memory controller 4

N Multi-tiered ECC 6EC-7ED BCH code 6-bit error correcting 7-bit error detecting 61 bits for 64B data (less than 12.5% overhead) Tolerate up to 4 bit wear-out failures and 2 bit soft errors Multi-tiered decoding Quick-, Slow-, and Mem-ECC Extend Hi-ECC [Wilkerson+ ISCA 10] Fast-path decoding for most initial periods 5

N Dealing with Intolerable Failures Eventually, some blocks become faulty More than 4 wear-out failures per block Coarse-grained remapping (prior solutions) Leverage virtual-to-physical mapping Mapping unit: 4kB or larger A block with intolerable failure maps out the whole page Fine-grained remapping Disable only a faulty block Mapping unit: 64B Effectively handle both random and concentrated errors 6

N Fine-grained Remapping (FR) with Embedded-pointer Embed a 64-bit pointer within a faulty block There are still-functional bits in a faulty block Use 7-Modular Redundancy to tolerate the failures 1-bit D/P flag per 64B block Identify a block is remapped or not D/P flag Data page ptr data Non-Volatile Memories Workshop Page 2011for remapping 7

N How to Mitigate This Penalty? Remap pointer cache Cache remap pointers Avoid reading remap pointer from NVRAM when cache hit Hash based index cache Pre-defined hash functions for remapping Compute, not cache, remap pointer H-idx: which hash function is used for remapping? 0: not remapped 1 or 2: remapped using one of the hash functions 3: hash collision All candidate locations are already used for other blocks Need to read the remap pointer 2 bits per 64B block 8

PA N Hash-Based Index Cache PFN CI CO 36 6 6 Hash2 Hash1 Offset Tag Base 64 H-idx s H-idx cache =? hit/miss H-idx remap_base + Remapped_address 9

N Memory System Organization with FREE-p Memory controller Write Req Queue ABUS 8 devices for data 1 device for ECC and D/P flag Read Req Queue Read Data Queue decode x8 x8 x8 x8 x8 x8 x8 x8 x8 ECC decoding ECC encoding Write Data Queue 72-bit wide DBUS NVRAM 10

Evaluation

Normalized Capacity Normalized Capacity N Capacity vs. Lifetime 1.2 1.0 0.8 0.6 0.4 0.2 0.0 1.2 1.0 0.8 0.6 0.4 0.2 0.0 CoV = 0.25 (moderate process variation) ECP6 FREE-p 0 1 2 3 4 Year 5 6 7 8 9 10 CoV = 0.35 (severe process variation) 26% 0 1 2 3 4 5 6 7 8 9 10 Year 11.5% Assume 16GB capacity, perfect wear-leveling, 12.8GB/s channel, constant write traffic (1/3 traffic of peak rate) 12

Faulty blocks / page N Performance Evaluation In-order core with PCRAM Performance depends on NVRAM wear-out status 64 56 48 40 32 24 16 8 0 1 remapped block per page after 7.3 years Almost no performance penalty for initial 7 years 0 2 4 Year 6 8 10 13

FFT RADIX OCEAN RAYTRACE facesim canneal streamcluster Average Performance overhead N After 7.3 Years 6.0% 5.0% 4.0% FREE-p:Baseline FREE-p:Cache FREE-p:IndexCache FREE-p:PerfectCache+Locality 3.0% 2.0% 1.0% 0.0% SPLASH2 PARSEC 14

FFT RADIX OCEAN RAYTRACE facesim canneal streamcluster Average Performance overhead N After 7.3 Years 6.0% 5.0% 4.0% FREE-p:Baseline FREE-p:Cache FREE-p:IndexCache FREE-p:PerfectCache+Locality 3.0% 2.0% 1.0% 0.0% SPLASH2 PARSEC 15

FFT RADIX OCEAN RAYTRACE facesim canneal streamcluster Average Performance overhead N After 7.3 Years 6.0% 5.0% 4.0% FREE-p:Baseline FREE-p:Cache FREE-p:IndexCache FREE-p:PerfectCache+Locality 1.8% 3.0% 2.0% 1.0% 0.0% SPLASH2 PARSEC 16

N Conclusions FREE-p combines multi-tiered ECC and FR Protect against both Hard AND Soft errors 11.5% longer lifetime over ECP6 Less than 2% performance degradation, even at 7 years 12.5% storage overhead (same as current DRAM protection) Everything is implemented at the memory controller End-to-end protection Hard and soft errors in the cell array Errors on the periphery, wires, packaging, System designers determine protection level Can be extended to chipkill-correct Simple and cheap (commodity) NVRAM devices 17

FREE-p: Protecting Non-Volatile Memory against both Hard and Soft Errors Doe Hyun Yoon Naveen Muralimanohar Jichuan Chang Parthasarathy Ranganathan Norman P. Jouppi Mattan Erez Electrical and Computer Engineering The University of Texas at Austin Intelligent Infrastructure Lab. Hewlett-Packard Labs.