Highlights of the High- Bandwidth Memory (HBM) Standard

Similar documents

Tuning DDR4 for Power and Performance. Mike Micheletti Product Manager Teledyne LeCroy

Tuning DDR4 for Power and Performance. Mike Micheletti Product Manager Teledyne LeCroy

Computer Architecture

Table 1: Address Table

Features. DDR SODIMM Product Datasheet. Rev. 1.0 Oct. 2011

User s Manual HOW TO USE DDR SDRAM

Table 1 SDR to DDR Quick Reference

V58C2512(804/404/164)SB HIGH PERFORMANCE 512 Mbit DDR SDRAM 4 BANKS X 16Mbit X 8 (804) 4 BANKS X 32Mbit X 4 (404) 4 BANKS X 8Mbit X 16 (164)

Features. DDR3 Unbuffered DIMM Spec Sheet

Technical Note DDR2 Offers New Features and Functionality

Memory unit. 2 k words. n bits per word

DDR subsystem: Enhancing System Reliability and Yield

GR2DR4B-EXXX/YYY/LP 1GB & 2GB DDR2 REGISTERED DIMMs (LOW PROFILE)

A Mixed Time-Criticality SDRAM Controller

are un-buffered 200-Pin Double Data Rate (DDR) Synchronous DRAM Small Outline Dual In-Line Memory Module (SO-DIMM). All devices

Technical Note FBDIMM Channel Utilization (Bandwidth and Power)

JEDEC STANDARD. Double Data Rate (DDR) SDRAM Specification JESD79C. (Revision of JESD79B) JEDEC SOLID STATE TECHNOLOGY ASSOCIATION MARCH 2003

Technical Note. Initialization Sequence for DDR SDRAM. Introduction. Initializing DDR SDRAM

DDR3 SDRAM UDIMM MT8JTF12864A 1GB MT8JTF25664A 2GB

DDR2 Specific SDRAM Functions

Memory Module Specifications KVR667D2D4F5/4G. 4GB 512M x 72-Bit PC CL5 ECC 240-Pin FBDIMM DESCRIPTION SPECIFICATIONS

JEDEC STANDARD DDR2 SDRAM SPECIFICATION JESD79-2B. (Revision of JESD79-2A) JEDEC SOLID STATE TECHNOLOGY ASSOCIATION. January 2005

Performance Measurements MEMCON 2014

Intel X38 Express Chipset Memory Technology and Configuration Guide

Features. DDR3 SODIMM Product Specification. Rev. 1.7 Feb. 2016

ADQYF1A08. DDR2-1066G(CL6) 240-Pin O.C. U-DIMM 1GB (128M x 64-bits)

Intel Q35/Q33, G35/G33/G31, P35/P31 Express Chipset Memory Technology and Configuration Guide

Mobile SDRAM. MT48H16M16LF 4 Meg x 16 x 4 banks MT48H8M32LF 2 Meg x 32 x 4 banks

D1 D2 D3 D4 D5 D6 D7. Benefits: Stable system operation No trace length matching Low PCB cost

DDR SDRAM SODIMM. MT9VDDT1672H 128MB 1 MT9VDDT3272H 256MB MT9VDDT6472H 512MB For component data sheets, refer to Micron s Web site:

SOLVING HIGH-SPEED MEMORY INTERFACE CHALLENGES WITH LOW-COST FPGAS

DDR3 DIMM Slot Interposer

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

Intel 965 Express Chipset Family Memory Technology and Configuration Guide

DDR SDRAM SODIMM. MT8VDDT3264H 256MB 1 MT8VDDT6464H 512MB For component data sheets, refer to Micron s Web site:

Contents. 1. Trends 2. Markets 3. Requirements 4. Solutions

FS1140 & FS1141 DDR Protocol Checking & Performance Tool. FuturePlus Systems. Power Tools For Bus Analysis

White Paper Utilizing Leveling Techniques in DDR3 SDRAM Memory Interfaces

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Emerging storage and HPC technologies to accelerate big data analytics Jerome Gaysse JG Consulting

DDR SDRAM UDIMM MT16VDDT6464A 512MB MT16VDDT12864A 1GB MT16VDDT25664A 2GB

Eureka Technology. Understanding SD, SDIO and MMC Interface. by Eureka Technology Inc. May 26th, Copyright (C) All Rights Reserved

DDR SDRAM SODIMM MT16VDDF6464H 512MB MT16VDDF12864H 1GB

RAID. RAID 0 No redundancy ( AID?) Just stripe data over multiple disks But it does improve performance. Chapter 6 Storage and Other I/O Topics 29

1 Storage Devices Summary

IS42/45R86400D/16320D/32160D IS42/45S86400D/16320D/32160D

Technical Note. Micron NAND Flash Controller via Xilinx Spartan -3 FPGA. Overview. TN-29-06: NAND Flash Controller on Spartan-3 Overview

DDR4 Memory Technology on HP Z Workstations

Table Of Contents. Page 2 of 26. *Other brands and names may be claimed as property of others.

ThinkServer PC DDR2 FBDIMM and PC DDR2 SDRAM Memory options boost overall performance of ThinkServer solutions

DDR3(L) 4GB / 8GB UDIMM

Input / Ouput devices. I/O Chapter 8. Goals & Constraints. Measures of Performance. Anatomy of a Disk Drive. Introduction - 8.1

DDR2 SDRAM SODIMM MT8HTF12864HZ 1GB MT8HTF25664HZ 2GB. Features. 1GB, 2GB (x64, SR) 200-Pin DDR2 SDRAM SODIMM. Features

DDR2 SDRAM SODIMM MT8HTF6464HDZ 512MB MT8HTF12864HDZ 1GB. Features. 512MB, 1GB (x64, DR) 200-Pin DDR2 SODIMM. Features

Configuring Memory on the HP Business Desktop dx5150

DDR2 SDRAM SODIMM MT4HTF6464HZ 512MB. Features. 512MB (x64, SR) 200-Pin DDR2 SODIMM. Features. Figure 1: 200-Pin SODIMM (MO-224 R/C C)

enabling Ultra-High Bandwidth Scalable SSDs with HLnand

Computer Architecture

Microprocessor & Assembly Language

DDR2 SDRAM SODIMM MT16HTF12864H 1GB MT16HTF25664H 2GB

DigiPoints Volume 1. Student Workbook. Module 4 Bandwidth Management

OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC

OpenSPARC T1 Processor

Memory Testing. Memory testing.1

A New Chapter for System Designs Using NAND Flash Memory

DDR2 Device Operations & Timing Diagram DDR2 SDRAM. Device Operations & Timing Diagram

Models Smart Array 6402A/128 Controller 3X-KZPEC-BF Smart Array 6404A/256 two 2 channel Controllers

DDR2 SDRAM SODIMM MT8HTF3264HD 256MB MT8HTF6464HD 512MB MT8HTF12864HD 1GB For component data sheets, refer to Micron s Web site:

Memory technology evolution: an overview of system memory technologies

Remote Copy Technology of ETERNUS6000 and ETERNUS3000 Disk Arrays

Samsung DDR4 SDRAM DDR4 SDRAM

DQ0 DQ1 DQ2 DQ3 DQ4 DQ5 DQ6 DQ7 WE# RAS# A0 A1 A2 A3

HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK

The Leader in Memory Technology

DDR3 memory technology

DDR2 SDRAM UDIMM MT18HTF6472AY 512MB MT18HTF12872AY 1GB MT18HTF25672AY 2GB MT18HTF51272AY 4GB. Features

DDR2 SDRAM SODIMM MT16HTF12864HZ 1GB MT16HTF25664HZ 2GB MT16HTF51264HZ 4GB. Features. 1GB, 2GB, 4GB (x64, DR) 200-Pin DDR2 SDRAM SODIMM.

1 Gbit, 2 Gbit, 4 Gbit, 3 V SLC NAND Flash For Embedded

MICROPROCESSOR AND MICROCOMPUTER BASICS

Memory Hierarchy. Arquitectura de Computadoras. Centro de Investigación n y de Estudios Avanzados del IPN. adiaz@cinvestav.mx. MemoryHierarchy- 1

Computer Organization & Architecture Lecture #19

ThinkServer PC DDR3 1333MHz UDIMM and RDIMM PC DDR3 1066MHz RDIMM options for the next generation of ThinkServer systems TS200 and RS210

Dual DIMM DDR2 and DDR3 SDRAM Interface Design Guidelines

PCI Express* Ethernet Networking

Dell PowerEdge Servers Memory

MICROPROCESSOR. Exclusive for IACE Students iacehyd.blogspot.in Ph: /422 Page 1

DDR Memory Overview, Development Cycle, and Challenges

PT973216BG. 32M x 4BANKS x 16BITS DDRII. Table of Content-

Memory Module Specifications KVR667D2D8F5/2GI. 2GB 256M x 72-Bit PC CL5 ECC 240-Pin FBDIMM DESCRIPTION SPECIFICATIONS

COMPUTER HARDWARE. Input- Output and Communication Memory Systems

Motivation: Smartphone Market

ADATA Technology Corp. DDR3-1600(CL11) 240-Pin VLP ECC U-DIMM 4GB (512M x 72-bit)

NAND Flash & Storage Media

Note: Data Rate (MT/s) CL = 3 CL = 4 CL = 5 CL = 6. t RCD (ns) t RP (ns) t RC (ns) t RFC (ns)

Chapter Introduction. Storage and Other I/O Topics. p. 570( 頁 585) Fig I/O devices can be characterized by. I/O bus connections

QuickSpecs. HP Smart Array 5312 Controller. Overview

Memory ICS 233. Computer Architecture and Assembly Language Prof. Muhamed Mudawar

1.55V DDR2 SDRAM FBDIMM

What Every Programmer Should Know About Memory

DDR3 SDRAM SODIMM MT16JSF25664HZ 2GB MT16JSF51264HZ 4GB. Features. 2GB, 4GB (x64, DR) 204-Pin Halogen-Free DDR3 SODIMM. Features

Transcription:

Highlights of the High- Bandwidth Memory (HBM) Standard Mike O Connor Sr. Research Scientist

What is High-Bandwidth Memory (HBM)? Memory standard designed for needs of future GPU and HPC systems: Exploit very large number of signals available with diestacking technologies for very high memory bandwidth Reduce I/O energy costs Enable higher fraction of peak bandwidth to be exploited by sophisticated memory controllers Enable ECC/Resilience Features JEDEC standard JESD235, adopted Oct 2013. Initial work on standard started in 2010

What is High-Bandwidth Memory (HBM)? Enables systems with extremely high bandwidth requirements like future high-performance GPUs

HBM Overview Standard defines an HBM stack Bonding footprint Interface Signaling Commands & Protocol Some optional features: ECC support Base-layer logic/redistribution/io die Standard does not define Internal architecture of the stack Precise DRAM timing parameters

HBM Overview Channel 0 Channel 1 4 DRAM dies with 2 channels per die Optional Base Logic Die *Figure from JEDEC Standard High Bandwidth Memory (HBM) DRAM, JESD 235, Oct. 2013 Each HBM stack provides 8 independent memory channels These are completely independent memory interfaces Independent clocks & timing Independent commands Independent memory arrays In short, nothing one channel does affects another channel

HBM Overview - Bandwidth Each channel provides a 128-bit data interface Data rate of 1 to 2 Gbps per signal (500-1000 MHz DDR) 16-32 GB/sec of bandwidth per channel 8 Channels per stack 128-256 GB/sec of bandwidth per stack For comparison: Highest-end GPU today (NVIDIA GeForce GTX TITAN Black) 384b wide GDDR5 (12 x32 devices) @ 7 Gbps = 336 GB/s Future possible GPU with 4 stacks of HBM Four stacks of HBM @ 1-2 Gbps = 512 GB/s - 1 TB/s cost

HBM Overview - Bandwidth Each channel provides a 128-bit data interface Data rate of 1 to 2 Gbps per signal (500-1000 MHz DDR) 16-32 GB/sec of bandwidth per channel 8 Channels per stack At lower overall DRAM system power. 128-256 GB/sec of bandwidth per stack For comparison: ~6-7 pj/bit vs. ~18-22 pj/bit for GDDR5 Highest-end GPU today (NVIDIA GeForce GTX TITAN Black) 384b wide GDDR5 (12 x32 devices) @ 7 Gbps = 336 GB/s cost Future possible GPU with 4 stacks of HBM Four stacks of HBM @ 1-2 Gbps = 512 GB/s - 1 TB/s power cost

HBM Overview - Capacity Per-channel capacities supported from 1-32 Gbit Stack capacity of 1 to 32GBytes Near-term, at lower-end of range e.g. 4 high stack of 4Gb dies = 2GBytes/stack 8 or 16 banks per channel 16 banks when > 4Gbit per channel (> 4GBytes/stack) Not including optional additional ECC bits A stack providing ECC storage may have 12.5% more bits

HBM Channel Overview Each channel is similar to a standard DDR interface Data interface is bi-directional Still requires delay to turn the bus around between RD and WR Burst-length of 2 (32B per access) Requires traditional command sequences Activates required to open rows before read/write Precharges required before another activate Traditional dram timings still exist (trc, trrd, trp, tfaw, etc.) but are entirely per-channel

HBM Channel Summary Function # of µbumps Notes Data 128 DDR, bi-directional Column Command/Addr. 8 DDR Row Command/Addr. 6 DDR Data Bus Inversion 16 1 for every 8 Data bits, bi-directional Data Mask/Check Bits 16 1 for every 8 Data bits, bi-directional Strobes 16 Differential RD & WR strobes for every 32 Data bits Clock 2 Differential Clock Clock Enable 1 Enable low-power mode Total 193

New: Split Command Interfaces 2 semi-independent command interfaces per channel Column Commands Read / Write Row Commands ACT / PRE / etc. Key reasons to provide separate row command i/f: 100% col. cmd bandwidth to saturate the data bus w/ BL=2 Simplifies memory controller Better performance (issue ACT earlier or not delay RD/WR) Still need to enforce usual ACT RD/WR PRE timings

New: Single-Bank Refresh Current DRAMs require refresh operations Refresh commands require all banks to be closed ~ 1 refresh command every few µsec Can consume 5-10% of potential bandwidth Increasing overheads with larger devices Sophisticated DRAM controllers work hard to overlap ACT/PRE in one bank with traffic to other banks Can manage the refresh similarly Added Refresh Single Bank command Like an ACT, but w/ internal per-bank row counter Can be issued to any banks in any order Memory controller responsible for ensuring all banks get enough refreshes each refresh period

New: Single-Bank Refresh Bank 0 Bank 1 PRE PRE Refresh (All Banks) ACT ACT RD RD Bank n PRE ACT Traditional Precharge-All and Refresh-All Bank 0 PRE ACT RD PRE REFSB ACT Bank 1 RD PRE ACT RD PRE ACT Bank n PRE REFSB ACT RD PRE Arbitrary Single-Bank Refresh

New: RAS Support HBM standard supports ECC Optional: Not all stacks required to support it ECC and non-ecc stacks use same interface Key insight: Per-byte data mask signals and ECC not simultaneously useful Data Mask Signals can carry ECC data - makes them bi-directional on HBM stacks that support ECC Parity check of all cmd/addr busses also supported

Other HBM Features HBM supports Temperature Compensated Self Refresh Temperature dependent refresh rates with several temperature ranges (e.g. cool/standby, normal, extended, emergency) Temperature sensor can be read by memory controller to adjust its refresh rates as well Data Bus Inversion coding to reduce number of simultaneously switching signals No more than 4 of 9 (DQ[0..7], DBI) signals switch DBI computation maintained across consecutive commands

Thank You QUESTIONS? moconnor@nvidia.com

BACKUP

Footprint BACKUP

HBM Footprint *Figure from JEDEC Standard High Bandwidth Memory (HBM) DRAM, JESD 235, Oct. 2013

HBM Footprint Half of One channel Data i/f Four channels Command i/f *Figure from JEDEC Standard High Bandwidth Memory (HBM) DRAM, JESD 235, Oct. 2013

Commands BACKUP

Column Commands Command Clock C[0:7] Column NOP Rising CNOP / XXXXX Falling XXXXXXX / Parity Read Rising RD / Autoprecharge / Bank Falling Column Address / Parity Write Rising RD / Autoprecharge / Bank Falling Column Address / Parity Mode Register Set Rising MRS / Mode Reg Falling Opcode

Row Commands Command Clock R[0:5] Row NOP Rising RNOP / XXX Falling XXXXX / Parity Activate Rising ACT / Bank Falling Row Address[15:11] / Parity Rising Row Address[10:5] Falling Row Address[4:0] / Parity Precharge Rising PRE / Bank Falling XXXXX / Parity Precharge All Banks Rising PREA / XXX Falling XXXXX / Parity Refresh (single bank) Rising REFSB / Bank Falling XXXXX / Parity Refresh (all banks) Rising REF / XXX Falling XXXXX / Parity

RAS BACKUP

HBM RAS Challenges Stacked Memory has some challenges with respect to RAS requirements Traditional DRAM DIMMs get only a subset of bits (e.g. 4) from each burst from a single DRAM device HBM gives you all the bits of a burst from a single row of a single bank of a single DRAM device Good for power, but RAS-wise all our eggs are in one basket Including the ECC bits Need techniques to detect failures (e.g. row decode fault) Need techniques to recover from failures (e.g. RAID-like schemes)