SRAM Scaling Limit: Its Circuit & Architecture Solutions

Similar documents
Static-Noise-Margin Analysis of Conventional 6T SRAM Cell at 45nm Technology

Intel Labs at ISSCC Copyright Intel Corporation 2012

Efficient Interconnect Design with Novel Repeater Insertion for Low Power Applications

數 位 積 體 電 路 Digital Integrated Circuits

Leakage Power Reduction Using Sleepy Stack Power Gating Technique

DRG-Cache: A Data Retention Gated-Ground Cache for Low Power 1

Design and analysis of flip flops for low power clocking system

Chapter 2 Heterogeneous Multicore Architecture

Power consumption is now the major technical

On the Use of Strong BCH Codes for Improving Multilevel NAND Flash Memory Storage Capacity

DESIGN CHALLENGES OF TECHNOLOGY SCALING

Test Solution for Data Retention Faults in Low-Power SRAMs

Power Reduction Techniques in the SoC Clock Network. Clock Power

A New Programmable RF System for System-on-Chip Applications

«A 32-bit DSP Ultra Low Power accelerator»

ISSCC 2003 / SESSION 4 / CLOCK RECOVERY AND BACKPLANE TRANSCEIVERS / PAPER 4.7

International Journal of Electronics and Computer Science Engineering 1482

Price/performance Modern Memory Hierarchy

Low-Power Error Correction for Mobile Storage

On-chip clock error characterization for clock distribution system

All Programmable Logic. Hans-Joachim Gelke Institute of Embedded Systems. Zürcher Fachhochschule

Efficient Motion Estimation by Fast Three Step Search Algorithms

Chapter Introduction. Storage and Other I/O Topics. p. 570( 頁 585) Fig I/O devices can be characterized by. I/O bus connections

Digital Design for Low Power Systems

路 論 Chapter 15 System-Level Physical Design

Naveen Muralimanohar Rajeev Balasubramonian Norman P Jouppi

Low Power and Reliable SRAM Memory Cell and Array Design

CHARGE pumps are the circuits that used to generate dc

ISSCC 2003 / SESSION 13 / 40Gb/s COMMUNICATION ICS / PAPER 13.7

S. Venkatesh, Mrs. T. Gowri, Department of ECE, GIT, GITAM University, Vishakhapatnam, India

A 1-GSPS CMOS Flash A/D Converter for System-on-Chip Applications

2B1459 System-on-Chip Applications (5CU/7.5ECTS) Course Number for Graduate Students (2B5476, 4CU)

Reconfigurable Architecture Requirements for Co-Designed Virtual Machines

{The Non-Volatile Memory Technology Database (NVMDB)}, UCSD-CSE Techreport CS

NAND Flash Architecture and Specification Trends

Evaluating Embedded Non-Volatile Memory for 65nm and Beyond

1. Memory technology & Hierarchy

3D NAND Technology Implications to Enterprise Storage Applications

A Dual-Mode NAND Flash Memory: 1-Gb Multilevel and High-Performance 512-Mb Single-Level Modes

Energy aware RAID Configuration for Large Storage Systems

McPAT: An Integrated Power, Area, and Timing Modeling Framework for Multicore and Manycore Architectures

Data Storage Time Sensitive ECC Schemes for MLC NAND Flash Memories

Computer Architecture

Low leakage and high speed BCD adder using clock gating technique

Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery

An Analytical Model for the Calculation of the Expected Miss Ratio in Faulty Caches

Implementation Details

Linux flash file systems JFFS2 vs UBIFS

FAULT TOLERANCE FOR MULTIPROCESSOR SYSTEMS VIA TIME REDUNDANT TASK SCHEDULING

TECHNOLOGY BRIEF. Compaq RAID on a Chip Technology EXECUTIVE SUMMARY CONTENTS

Nanotechnologies for the Integrated Circuits

Model-Based Synthesis of High- Speed Serial-Link Transmitter Designs

A Survey on ARM Cortex A Processors. Wei Wang Tanima Dey

A CDMA Based Scalable Hierarchical Architecture for Network- On-Chip

SLC vs MLC NAND and The Impact of Technology Scaling. White paper CTWP010

Architectures and Design Methodologies for Micro and Nanocomputing

Enabling Technologies for Distributed Computing

Module 2. Embedded Processors and Memory. Version 2 EE IIT, Kharagpur 1

This Unit: Putting It All Together. CIS 501 Computer Architecture. Sources. What is Computer Architecture?

Energy Constrained Resource Scheduling for Cloud Environment

760 Veterans Circle, Warminster, PA Technical Proposal. Submitted by: ACT/Technico 760 Veterans Circle Warminster, PA

Slide Set 8. for ENCM 369 Winter 2015 Lecture Section 01. Steve Norman, PhD, PEng

How To Improve Performance On A Single Chip Computer

The Bleak Future of NAND Flash Memory

Offloading file search operation for performance improvement of smart phones

Semiconductor Memories

A PC-BASED TIME INTERVAL COUNTER WITH 200 PS RESOLUTION

OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC

STMicroelectronics. Deep Sub-Micron Processes 130nm, 65 nm, 40nm, 28nm CMOS, 28nm FDSOI. SOI Processes 130nm, 65nm. SiGe 130nm

IC-EMC Simulation of Electromagnetic Compatibility of Integrated Circuits

Operating Systems. RAID Redundant Array of Independent Disks. Submitted by Ankur Niyogi 2003EE20367

NAND Flash FAQ. Eureka Technology. apn5_87. NAND Flash FAQ

Low Power AMD Athlon 64 and AMD Opteron Processors

Highly Scalable NAND Flash Memory Cell Design Embracing Backside Charge Storage

DDR subsystem: Enhancing System Reliability and Yield

Chapter 10: Virtual Memory. Lesson 03: Page tables and address translation process using page tables

TRUE SINGLE PHASE CLOCKING BASED FLIP-FLOP DESIGN

ISSCC 2004 / SESSION 17 / MEMS AND SENSORS / 17.4

Secured Embedded Many-Core Accelerator for Big Data Processing

A Novel Low Power, High Speed 14 Transistor CMOS Full Adder Cell with 50% Improvement in Threshold Loss Problem

Certification Document macle GmbH Grafenthal-S1212M 24/02/2015. macle GmbH Grafenthal-S1212M Storage system

Design of a Reliable Broadband I/O Employing T-coil

FLASH TECHNOLOGY DRAM/EPROM. Flash Year Source: Intel/ICE, "Memory 1996"

On-Chip Interconnection Networks Low-Power Interconnect

Enabling Technologies for Distributed and Cloud Computing

Memory Architecture and Management in a NoC Platform

8 Gbps CMOS interface for parallel fiber-optic interconnects

Architekturen und Einsatz von FPGAs mit integrierten Prozessor Kernen. Hans-Joachim Gelke Institute of Embedded Systems Professur für Mikroelektronik

A Storage Architecture for High Speed Signal Processing: Embedding RAID 0 on FPGA

DELL RAID PRIMER DELL PERC RAID CONTROLLERS. Joe H. Trickey III. Dell Storage RAID Product Marketing. John Seward. Dell Storage RAID Engineering

A Survey on Sequential Elements for Low Power Clocking System

Switch Fabric Implementation Using Shared Memory

Transcription:

SRAM Scaling Limit: Its Circuit & Architecture Solutions Nam Sung Kim, Ph.D. Assistant Professor Department of Electrical and Computer Engineering University of Wisconsin - Madison

SRAM VCC min Challenges 1.2V V CC 0.7V 0.6V (stand-by) VCC/Freq scaling for power efficient computing requires SRAMs to operate at low VCC I i i ti b t SRAM f il li iti th Increasing process variations exacerbate SRAM failures limiting the lowest core operating VCC VCC min

SRAM VCC min Challenges 1.2V V CC 0.7V 0.6V (stand-by) VCC/Freq scaling for power efficient computing requires SRAMs to operate at low VCC SRAM area scaling is getting harder because of process variations and voltage scaling! Increasing process variations exacerbate SRAM failures limiting iti the lowest core operating VCC VCC min

SRAM Failure Mechanisms Read Failure Write Failure Voltage( V) 0.8 0.6 0.4 02 0.2 WL Voltage(V V) 0.8 0.6 0.4 02 0.2 0 0 250 500 750 Time (ps) 0 0 250 500 750 Time (ps) WL WL - V T + V T + V T - V T 0 1 0 1 - V T + V T + V T + V T + V T - V T - V T + V T BL BL# BL BL#

Process Variation Trend Threshold variation: Gate length variation: ITRS Projection for Vth and Leff Vairations Corresponding SRAM Failure Probability vs VCC Courtesy: K. Cao

Process Variation Trend Threshold variation: Increasing random variations & decreasing VCC w/ technology scaling begin to limit SRAM size & VCC scaling! Gate length variation: ITRS Projection for Vth and Leff Vairations Corresponding SRAM Failure Probability vs VCC Courtesy: K. Cao

Circuit Solution Dynamic/adaptive techniques 6T SRAM Dual supply column-based technique 1 Assisted read/write techniques 2,3,4 1. K. Zhang et al. A 3-GHz 70-Mb SRAM in 65-nm CMOS Technology With Integrated Column-Based Dynamic Power Supply. IEEE J. Solid-State Circuits vol 41 no 1, pp 146 151, 2006. 2. M. Khellah, N. Kim, et al. PVT-Variations and Supply-Noise Tolerant 45nm Dense Cache Arrays with Diffusion-Notch-Free (DNF) 6T SRAM Cells and Dynamic Multi-Vcc Circuits. it In Proc. IEEE VLSI Circuit it Symposium, Jun 2008. 3. F. Hamzaoglu, K. Zhang, et al. A 153Mb-SRAM Design with Dynamic Stability Enhancement and Leakage Reduction in 45nm High-κ Metal-Gate CMOS Technology. ISSCC 2008. 4. S. Ohbayashi. A 65-nm SoC Embedded 6T-SRAM Designed for manufacturability With Read and Write Operation Stabilizing Circuits. IEEE J. Solid-State Circuits vol 42 no 4, pp 820 829, 2007. SRAM cell sizing +ECCs 6T SRAM cell area vs. failure rate trade-off Carefully sized 6T SRAM cells for large caches have been more area efficient than 8T 1 and 10T 2,3 SRAMs at the same VCC min Stronger ECCs allow us to continue VCC min scaling (for now) 1. N. Verma, A. Chandrakasan. A 65nm 8T Sub-Vt SRAM Employing Sense-Amplifier Redundancy. ISSCC 2007. 2. B. Calhoun, A. Chandrakasan. A 256kb Sub-threshold SRAM in 65nm CMOS. ISSCC 2006. 3. I. Chang, J. Kim, K. Roy. A 32kb 10T Subthreshold SRAM Array with Bit-Interleaving and Differential Read Scheme in 90nm CMOS. ISSCC 2008. 4. Z. Chishti, et al. Improving Cache Lifetime Reliability at Ultra-low Voltages. MICRO 2009.

Circuit Solution Dynamic/adaptive techniques 6T SRAM Dual supply column-based technique 1 Assisted read/write techniques 2,3,4 1. K. Zhang et al. A 3-GHz 70-Mb SRAM in 65-nm CMOS Technology With Integrated Column-Based Dynamic Power Supply. IEEE J. Solid-State Circuits vol 41 no 1, pp 146 151, 2006. 2. M. Khellah, N. Kim, et al. PVT-Variations and Supply-Noise Tolerant 45nm Dense Cache Arrays with Diffusion-Notch-Free (DNF) 6T Order-of-magnitude SRAM Cells and Dynamic Multi-Vcc Circuits. In Proc. IEEE failure VLSI Circuit it Symposium, rate Jun reduction 2008. w/ 3. F. Hamzaoglu, K. Zhang, et al. A 153Mb-SRAM Design with Dynamic Stability Enhancement and Leakage Reduction in 45nm High-κ Metal-Gate conventional CMOS Technology. ISSCC 6T 2008. SRAM + small overhead! 4. S. Ohbayashi. A 65-nm SoC Embedded 6T-SRAM Designed for manufacturability With Read and Write Operation Stabilizing Circuits. IEEE J. Solid-State Circuits vol 42 no 4, pp 820 829, 2007. SRAM cell sizing +ECCs 6T SRAM cell area vs. failure rate trade-off Carefully sized 6T SRAM cells for large caches have been more area efficient than 8T 1 and 10T 2,3 SRAMs at the same VCC min Stronger ECCs allow us to continue VCC min scaling (for now) 1. N. Verma, A. Chandrakasan. A 65nm 8T Sub-Vt SRAM Employing Sense-Amplifier Redundancy. ISSCC 2007. 2. B. Calhoun, A. Chandrakasan. A 256kb Sub-threshold SRAM in 65nm CMOS. ISSCC 2006. 3. I. Chang, J. Kim, K. Roy. A 32kb 10T Subthreshold SRAM Array with Bit-Interleaving and Differential Read Scheme in 90nm CMOS. ISSCC 2008. 4. Z. Chishti, et al. Improving Cache Lifetime Reliability at Ultra-low Voltages. MICRO 2009. Can we continue the current trend w/ 6T SRAM? Probably not.

Architecture Solution Norm Failure Rate 1.E+21 1E 1.E+1818 1.E+15 1.E+12 1.E+09 1.E+06 1E+03 1.E+03 100mV difference @ iso-failure point w/ 15% LLC size difference 4MB-Small Cell 4MB-Medium Cell 4MB- ULV 1.E+00 0.5 0.7 0.9 1.1 Voltage 4MB-Small Cell w/ 1-way disabled Small cell is 15% smaller, but 100mV higher VCC min than medium one Allowing failure in any one LLC way in each set w/ small cell give 100mV lower VCC min while 15% smaller overall cache area.

Dynamic Cache Resizing Designing a large cache operating at both high and low voltages is very challenging Lower operating voltage requires a larger area per bit Can we design a configurable cache? Allow as big cache size as possible when performance is important Allow as low voltage as possible at the expense of cache capacity when power is important t At lower voltages and frequencies Processor performance is less sensitive to on-chip cache size due to reduced frequency gap b/w main memory and on-chip cache Reduce cache size to lower VCC min at lower freq since performance impact is very small!

Conclusion VCC/Freq scaling for power efficient computing Require SRAMs to operate at low VCC Increasing random variations & decreasing VCC w/ technology scaling Begin to limit it SRAM size scaling! Various adaptive/dynamic + sizing + ECC techniques Have reduced the SRAM failure rate by order-of-magnitude of failure rate w/ conventional 6T SRAM + small overhead. So far, 6T SRAM has been more area efficient than 8T and 10T SRAM for large cache structures Incorporating architecture techniques Lower VCC min by trading cache capacity w/ lower VCC min The performance impact is very small due to reduced frequency gap b/w main memory and on-chip caches