3D GPU ARCHITECTURE USING CACHE STACKING: PERFORMANCE, COST, POWER AND THERMAL ANALYSIS


1 3D GPU ARCHITECTURE USING CACHE STACKING: PERFORMANCE, COST, POWER AND THERMAL ANALYSIS
Ahmed Al Maashri, Guangyu Sun, Xiangyu Dong, Vijay Narayanan and Yuan Xie
Department of Computer Science and Engineering, Penn State University

2 MOTIVATION
Studies have shown that small cache sizes and low cache bandwidth limit GPU performance.
Problems:
Increasing the GPU cache size brings higher access latency, which must be mitigated.
As the computational capabilities of GPUs increase, so does their power consumption.

3 SOLUTION: 3D ARCHITECTURE
Benefits:
Reduced latency in circuits.
Shorter wires, which reduce power consumption and footprint.
Heterogeneous integration becomes possible.

4 BACKGROUND: 3D INTEGRATION
In a 3D IC, multiple device layers are stacked together and connected by direct vertical interconnects, Through-Silicon Vias (TSVs), that pass through them.
[Figure: Conceptual 3D IC]

5 BACKGROUND CONT'D
3D architecture has already been used in processor-cache-memory systems.
[Figure: Schematic view]

6 BACKGROUND CONT'D
A 3D architecture allows the main memory to be kept on-chip, which reduces the latency of accessing it. The on-chip interconnections that replace the off-chip buses have a much smaller delay, so the memory bus can run at a higher frequency.
Problem: die stacking increases power density, which raises the chip temperature.

7 DESIGN SPACE EXPLORATION
We investigate how changing the organization of the GPU caches (Streamer caches, Texture Unit (TU) caches, Z and Stencil Test (ZST) caches and Color Write caches) affects the hit rate.
The simulation results show a negligible impact on the hit rate for all caches except the TU and ZST caches.

8 TU CACHE
The texture cache is a read-only cache that stores the image data used for putting images onto triangles, a process called texture mapping.
The texture cache has a high hit rate because there is heavy reuse between neighboring pixels (temporal locality).

9 ZST CACHE
Z and Stencil Test caches take advantage of spatial locality: by the nature of the depth buffer, neighboring fragments in the X-Y frame grid are likely to be fetched together.
Depth buffer: when an object is rendered, the depth (z coordinate) of each generated pixel is stored in a buffer (the z-buffer or depth buffer). This buffer is usually arranged as a two-dimensional (x-y) array with one element for each screen pixel.
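The depth-buffer description can be illustrated with a minimal Python sketch. It assumes a less-than depth comparison (smaller z is closer) and uses hypothetical helper names (`make_depth_buffer`, `depth_test`); the slides specify neither.

```python
# Minimal z-buffer sketch. Assumptions (not from the slides): smaller z means
# closer to the viewer, and the buffer is a row-major list of lists.

def make_depth_buffer(width, height, far=1.0):
    """Create a height x width buffer initialised to the far-plane depth."""
    return [[far for _ in range(width)] for _ in range(height)]


def depth_test(buffer, x, y, z):
    """Store z and return True if the fragment at (x, y) is closer than the stored depth."""
    if z < buffer[y][x]:
        buffer[y][x] = z
        return True
    return False


depth = make_depth_buffer(4, 4)

# Neighbouring fragments of one primitive touch adjacent buffer entries,
# which is the spatial locality a ZST cache can exploit.
results = [depth_test(depth, x, 1, 0.5) for x in range(4)]

# A farther fragment at an already-covered pixel fails the test.
hidden = depth_test(depth, 2, 1, 0.8)
```

Because consecutive fragments read and write adjacent entries of the same row, a cache line holding several depth values is likely to be reused before it is evicted.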

10

11 DESIGN SPACE EXPLORATION CONT'D
We use the 3DCacti simulator to determine the extra cycles incurred by increasing the cache size.
The 2-layer and 4-layer caches were die-stacked by dividing the word lines.
Increasing the cache size increases the latency; however, dividing a cache across several layers reduces it.

12 3D COST MODEL
Of the techniques for stacking dies, Wafer-to-Wafer (W2W) and Die-to-Wafer (D2W) are the most common. Unlike W2W, D2W stacks individual dies onto another wafer, which gives higher flexibility and higher yield.
Cost components:
Die cost: the cost of fabricating a single die before 3D bonding.
Bonding cost: the cost incurred by bonding (we assume a bonding cost of $150 per wafer).
Die yield: inversely proportional to the die area.
Bonding yield: our 3D bonding cost model is based on the 3D process from our industry partners, with the assumption that the yield of each 3D process step is 99%.
Known-Good-Die (KGD) testing cost.
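The slide lists the cost components but gives no formula for combining them. The following Python sketch shows one plausible way to do so for D2W stacking, and it is an assumption rather than the authors' model: each die is charged its fabrication cost divided by its yield plus a KGD test cost, the per-wafer bonding cost is spread over a hypothetical `dies_per_wafer`, and the total is divided by the compound yield of `process_steps` bonding steps. Only the $150 bonding cost and the 99% step yield come from the slide; every other number is illustrative.

```python
def stacked_chip_cost(die_costs, die_yields, kgd_test_cost, dies_per_wafer,
                      process_steps, bonding_cost_per_wafer=150.0,
                      step_yield=0.99):
    """Estimate the cost of one good stacked chip (illustrative model only)."""
    # Cost of obtaining one known-good die per layer: bad dies are paid for
    # but discarded, and every die carries a test cost.
    good_dies = sum(cost / y + kgd_test_cost
                    for cost, y in zip(die_costs, die_yields))
    # One bonding operation per interface between adjacent layers.
    bonds = len(die_costs) - 1
    bonding = bonds * bonding_cost_per_wafer / dies_per_wafer
    # Each 3D process step succeeds with probability step_yield.
    bonding_yield = step_yield ** process_steps
    return (good_dies + bonding) / bonding_yield


# Hypothetical two-layer stack: a logic die and a cache die.
cost = stacked_chip_cost(
    die_costs=[40.0, 10.0],   # assumed fabrication cost per die
    die_yields=[0.8, 0.9],    # assumed die yields
    kgd_test_cost=1.0,        # assumed test cost per die
    dies_per_wafer=100,       # assumed
    process_steps=3,          # assumed number of 3D process steps
)
```

In this model, adding layers raises the cost through both extra bonding operations and a lower compound bonding yield, which is consistent with the motivation for fitting all caches into a single layer in Scenario II.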

13 ISO-CYCLE TIME RESULTS
We assume an iso-cycle time of 0.75 ns. This cycle time captures the typical frequency range of current GPUs.
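For reference, the clock frequency implied by this cycle time follows directly from the reciprocal relation between period and frequency (a worked conversion, not a figure quoted in the slides):

```latex
f = \frac{1}{T_{\text{cycle}}} = \frac{1}{0.75\ \text{ns}} = \frac{1}{0.75 \times 10^{-9}\ \text{s}} \approx 1.33\ \text{GHz}
```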

14 SCENARIO I
A 2D GPU vs. a 3D-stacked cache GPU. Both GPUs contain 128 shaders and use 65nm technology.
The first layer of the 3D GPU contains the GPU processing units, while the other two layers contain the partitioned ZST and TU caches.
The 3D architecture achieves up to 45% speedup over the 2D planar architecture.
Total power: 106.4W
Maximum temperature: ºC (HotSpot simulation tool)

15 SCENARIO II: HETEROGENEOUS INTEGRATION
The first layer of the 3D design implements the GPU units in 65nm technology, while the second layer uses 45nm technology.
The smaller feature size lets all the caches fit into one layer, which saves bonding cost.
The 3D design outperformed the 2D design by a 19% geometric mean speedup.
Total power: 82.1W
Maximum temperature: 82.24ºC (HotSpot simulation tool)
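A geometric mean speedup summarizes per-benchmark speedups multiplicatively, so a single outlier weighs less than it would in an arithmetic mean. The sketch below shows the calculation; the per-benchmark values are hypothetical placeholders and are not results from the slides.

```python
import math


def geometric_mean(values):
    """Geometric mean computed as the exponential of the mean logarithm."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be a non-empty sequence of positive numbers")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-benchmark speedups of a 3D design over a 2D baseline.
# These are illustrative numbers only, NOT measurements from the study.
speedups = [1.05, 1.10, 1.45, 1.20]
overall = geometric_mean(speedups)
```

A reported "19% geometric mean speedup" corresponds to `geometric_mean(speedups)` evaluating to about 1.19 over the benchmark set that was actually measured.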

16 MRAM VS. SRAM
Since leakage power is an important component of power consumption, we consider non-volatile Magnetic Random Access Memory (MRAM), which has zero standby power, as a candidate for implementing caches.
Leakage power: a gradual loss of energy from a charged capacitor.
Standby power: the electric power consumed by electronic and electrical appliances while they are switched off or in standby mode.

17 MAGNETORESISTIVE RANDOM-ACCESS MEMORY (MRAM)
Unlike conventional RAM chip technologies, MRAM stores data not as electric charge or current flow but in magnetic storage elements.
The heart of an MRAM memory cell is the magnetic tunnel junction (MTJ), a small device with two ferromagnetic layers separated by a thin dielectric layer.
[Figure: MTJ structure]
The resistance of the MTJ is low if the magnetizations of the two layers are parallel (1) and high if they are antiparallel (0).
MRAM retains its contents with the power turned off, and it has no constant power draw. However, the write process is slower and requires more power, because it must overcome the existing field stored in the junction.

18 MRAM VS. SRAM CONT'D
For caches with fewer writes than reads, we observed a performance gain. However, because MRAM writes are slower than SRAM writes, performance degrades when the number of writes is large.
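The write-related degradation can be illustrated with a simple weighted-average latency model. This is a sketch under stated assumptions: the cycle counts are hypothetical, and the model covers only the write penalty. It does not account for the gain observed on read-dominated caches, which must come from factors outside this model.

```python
def avg_access_cycles(write_fraction, read_cycles, write_cycles):
    """Average cycles per cache access for a given fraction of writes."""
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    return (1.0 - write_fraction) * read_cycles + write_fraction * write_cycles


# Hypothetical latencies in cycles (assumed for illustration only).
SRAM = {"read_cycles": 3, "write_cycles": 3}
MRAM = {"read_cycles": 3, "write_cycles": 12}

# Extra cycles per access that MRAM pays relative to SRAM at two write mixes.
overhead_read_heavy = (avg_access_cycles(0.05, **MRAM)
                       - avg_access_cycles(0.05, **SRAM))
overhead_write_heavy = (avg_access_cycles(0.50, **MRAM)
                        - avg_access_cycles(0.50, **SRAM))
```

With these placeholder numbers the MRAM penalty grows linearly with the write fraction, which matches the qualitative trend described on the slide.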

19 MRAM VS. SRAM CONT'D
The power benefits of MRAM over SRAM make it the more appealing choice for power-conserving applications.

20 CONTRIBUTIONS
Performance evaluation of 3D-stacked caches on GPUs.
Comparison of 3D-stacked SRAMs and MRAMs in GPUs in terms of power consumption.
Power and thermal analysis of the proposed architectural designs.

21 Questions?

Memory Technology: Putting the nano in your ipod

Memory Technology: Putting the nano in your ipod Memory Technology: Putting the nano in your ipod Eric Pop Dept. of Electrical & Computer Engineering http://poplab.ece.uiuc.edu 1 Applications of Memory 1 Gigabyte = 1 GB = 1 billion bytes 1024 3 bytes

More information

Graphics Processing Unit (GPU) Memory Hierarchy. Presented by Vu Dinh and Donald MacIntyre

Graphics Processing Unit (GPU) Memory Hierarchy. Presented by Vu Dinh and Donald MacIntyre Graphics Processing Unit (GPU) Memory Hierarchy Presented by Vu Dinh and Donald MacIntyre 1 Agenda Introduction to Graphics Processing CPU Memory Hierarchy GPU Memory Hierarchy GPU Architecture Comparison

More information

Reducing memory latency using MRAM. Suzanne Reed Lisa Trova

Reducing memory latency using MRAM. Suzanne Reed Lisa Trova Reducing memory latency using MRAM Suzanne Reed Lisa Trova Outline Problems with existing memory DRAM SRAM MRAM - Mechanics Read/Write Strategies Benefits/Cons of MRAM Alternatives to MRAM DRAM Created

More information

Lecture Notes: Memory Systems

Lecture Notes: Memory Systems Lecture Notes: Memory Systems Rajeev Balasubramonian March 29, 2012 1 DRAM vs. SRAM On a processor chip, data is typically stored in SRAM caches. However, an SRAM cell is large enough that a single processor

More information

Technology Roadmap of DRAM for Three Major manufacturers: Samsung, SK-Hynix and Micron May 2013

Technology Roadmap of DRAM for Three Major manufacturers: Samsung, SK-Hynix and Micron May 2013 Technology Roadmap of DRAM for Three Major manufacturers: Samsung, SK-Hynix and Micron May 2013 Ch#47043 Table of Contents 1. Challenges of DRAM and Future of DRAM (ITRS) 2. Different types of scaling

More information

Page 1 of 5. IS 335: Information Technology in Business Lecture Outline Computer/Data Storage Technology

Page 1 of 5. IS 335: Information Technology in Business Lecture Outline Computer/Data Storage Technology Lecture Outline Computer/Data Storage Technology Objectives Describe the distinguishing characteristics of primary and secondary storage Describe the devices used to implement primary storage Compare secondary

More information

Photonic Networks for Data Centres and High Performance Computing

Photonic Networks for Data Centres and High Performance Computing Photonic Networks for Data Centres and High Performance Computing Philip Watts Department of Electronic Engineering, UCL Yury Audzevich, Nick Barrow-Williams, Robert Mullins, Simon Moore, Andrew Moore

More information

1. Memory technology & Hierarchy

1. Memory technology & Hierarchy 1. Memory technology & Hierarchy RAM types Advances in Computer Architecture Andy D. Pimentel Memory wall Memory wall = divergence between CPU and RAM speed We can increase bandwidth by introducing concurrency

More information

The Memory Hierarchy & Cache Review of Memory Hierarchy & Cache Basics (from 550):

The Memory Hierarchy & Cache Review of Memory Hierarchy & Cache Basics (from 550): Review From 550 The Memory Hierarchy & Cache Review of Memory Hierarchy & Cache Basics (from 550): Motivation for The Memory Hierarchy: CPU/Memory Performance Gap The Principle Of Locality Cache Basics:

More information

Random-Access Memory (RAM)

Random-Access Memory (RAM) class12.ppt Memory Random-Access Memory (RAM) Key features 2 RAM is packaged as a chip. Basic storage unit is a cell (one bit per cell). Multiple RAM chips form a memory. Static RAM (SRAM( SRAM) Each cell

More information

MRAM Improvements to Automotive Non- Volatile Memory Storage

MRAM Improvements to Automotive Non- Volatile Memory Storage MRAM Improvements to Automotive Non- Volatile Memory Storage August 5, 2016 Author(s): Chinh Nguyen (Ford) Dona Burkard (Ford) Kelvin Dobbins (Ford) Chuck Bohac (Everspin) 1 Table of Contents Abstract...

More information

Computer Architecture. Computer Architecture. Topics Discussed

Computer Architecture. Computer Architecture. Topics Discussed Computer Architecture Babak Kia Adjunct Professor Boston University College of Engineering Email: bkia -at- bu.edu ENG SC757 - Advanced Microprocessor Design Computer Architecture Computer Architecture

More information

A Brief Review of Processor Architecture. Why are Modern Processors so Complicated? Basic Structure

A Brief Review of Processor Architecture. Why are Modern Processors so Complicated? Basic Structure A Brief Review of Processor Architecture Why are Modern Processors so Complicated? Basic Structure CPU PC IR Regs ALU Memory Fetch PC -> Mem addr [addr] > IR PC ++ Decode Select regs Execute Perform op

More information

State-of-the-Art Flash Memory Technology, Looking into the Future

State-of-the-Art Flash Memory Technology, Looking into the Future State-of-the-Art Flash Memory Technology, Looking into the Future April 16 th, 2012 大 島 成 夫 (Jeff Ohshima) Technology Executive Memory Design and Application Engineering Semiconductor and Storage Products

More information

08 - Address Generator Unit (AGU)

08 - Address Generator Unit (AGU) September 30, 2013 Todays lecture Memory subsystem Address Generator Unit (AGU) Memory subsystem Applications may need from kilobytes to gigabytes of memory Having large amounts of memory on-chip is expensive

More information

Lecture 12: DRAM Basics. Today: DRAM terminology and basics, energy innovations

Lecture 12: DRAM Basics. Today: DRAM terminology and basics, energy innovations Lecture 12: DRAM Basics Today: DRAM terminology and basics, energy innovations 1 DRAM Main Memory Main memory is stored in DRAM cells that have much higher storage density DRAM cells lose their state over

More information

Orthogonal Spin Transfer (OST) A Better Approach

Orthogonal Spin Transfer (OST) A Better Approach Orthogonal Spin Transfer (OST) A Better Approach Flash Memory Summit August 2014 Company Background History Technology Opportunity Formed in 2007 by Allied Minds and NYU to commercialize Orthogonal Spin

More information

Click to edit Master title style Thinking outside of the chip Using co-design to optimize interconnect between IC, Package and PCB.

Click to edit Master title style Thinking outside of the chip Using co-design to optimize interconnect between IC, Package and PCB. Thinking outside of the chip Using co-design to optimize interconnect between IC, Package and PCB John Park Click Current to Over-the-wall edit Master design title process style IC Layout Package design

More information

Heterogeneous 3-D stacking, can we have the best of both (technology) worlds

Heterogeneous 3-D stacking, can we have the best of both (technology) worlds Heterogeneous 3-D stacking, can we have the best of both (technology) worlds Liam Madden Corporate Vice President March 25 th, 2013 The Chameleon Chip Field Programmable Gate Array (FPGA) Page 2 Moore

More information

Computer Graphics Hardware An Overview

Computer Graphics Hardware An Overview Computer Graphics Hardware An Overview Graphics System Monitor Input devices CPU/Memory GPU Raster Graphics System Raster: An array of picture elements Based on raster-scan TV technology The screen (and

More information

Silicon Memories. Why store things in silicon? It s fast!!! Compatible with logic devices (mostly) The main goal is to be cheap

Silicon Memories. Why store things in silicon? It s fast!!! Compatible with logic devices (mostly) The main goal is to be cheap Silicon Memories Why store things in silicon? It s fast!!! Compatible with logic devices (mostly) The main goal is to be cheap Dense -- The smaller the bits, the less area you need, and the more bits you

More information

Computer Fundamentals Lecture 3. Dr Robert Harle. Michaelmas 2013

Computer Fundamentals Lecture 3. Dr Robert Harle. Michaelmas 2013 Computer Fundamentals Lecture 3 Dr Robert Harle Michaelmas 2013 Today's Topics Motherboards, buses, peripherals Memory hierarchy (S)RAM cells Spinning HDDs Flash and SSDs Graphics Cards and GPUs RISC and

More information

Three-Dimensional Integration Technology and Integrated Systems

Three-Dimensional Integration Technology and Integrated Systems Three-Dimensional Integration Technology and Integrated Systems M. Koyanagi, T. Fukishima and T. Tanaka Tohoku University, Japan Department of Bioengineering and Robotics Outline 1. Background 2. Wafer-to-Wafer

More information

Modeling, Architecture, and Applications for Emerging Memory Technologies

Modeling, Architecture, and Applications for Emerging Memory Technologies Future Landscape of Embedded Memories Modeling, Architecture, and Applications for Emerging Memory Technologies Yuan Xie Pennsylvania State University Editor s note: Spin-transfer torque RAM and phase-change

More information

Topics. Caches and Virtual Memory. Cache Operations. Cache Operations. Write Policies on Cache Hit. Read and Write Policies.

Topics. Caches and Virtual Memory. Cache Operations. Cache Operations. Write Policies on Cache Hit. Read and Write Policies. Topics Caches and Virtual Memory CS 333 Fall 2006 Cache Operations Placement strategy Replacement strategy Read and write policy Virtual Memory Why? General overview Lots of terminology Cache Operations

More information

Storage Class Memory: Technology Overview & System Impacts

Storage Class Memory: Technology Overview & System Impacts : Technology Overview & System Impacts Zhichao Liang frankey0207@gmail.com Outline Why & what is storage class memory? A typical storage class memory device: PCM The impacts of SCM on database system Conclusion

More information

3D IC Design and CAD Challenges

3D IC Design and CAD Challenges 3D IC Design and CAD Challenges Ruchir Puri IBM T J Watson Research Center Yorktown Heights, NY 10598 Precedent for 3D Integration: When Real Estate Becomes Pricey 1900 Vertical Integration isn t new!

More information

GPU Architecture. Michael Doggett ATI

GPU Architecture. Michael Doggett ATI GPU Architecture Michael Doggett ATI GPU Architecture RADEON X1800/X1900 Microsoft s XBOX360 Xenos GPU GPU research areas ATI - Driving the Visual Experience Everywhere Products from cell phones to super

More information

CHAPTER 6 Memory. CMPS375 Class Notes (Chap06) Page 1 / 17 by Kuo-pao Yang

CHAPTER 6 Memory. CMPS375 Class Notes (Chap06) Page 1 / 17 by Kuo-pao Yang CHAPTER 6 Memory 6.1 Memory 313 6.2 Types of Memory 313 6.3 The Memory Hierarchy 315 6.3.1 Locality of Reference 318 6.4 Cache Memory 319 6.4.1 Cache Mapping Schemes 321 6.4.2 Replacement Policies 333

More information

Low power GPUs a view from the industry. Edvard Sørgård

Low power GPUs a view from the industry. Edvard Sørgård Low power GPUs a view from the industry Edvard Sørgård 1 ARM in Trondheim Graphics technology design centre From 2006 acquisition of Falanx Microsystems AS Origin of the ARM Mali GPUs Main activities today

More information

Outline - Microprocessors

Outline - Microprocessors Outline - Microprocessors General Concepts Memory Bus Structure Central Processing Unit Registers Instruction Set Clock Architecture Von Neuman vs. Harvard CISC vs. RISC General e Concepts - Computer Hardware

More information

Slot Machine Memory Devices. Week # 5

Slot Machine Memory Devices. Week # 5 Slot Machine Memory Devices Week # 5 Overview Items to be covered: Memory Devices Terminology General Operation CPU Memory Connection Read Only Memory (ROM) Overview ROM Architecture Types of ROMs Random

More information

State-of-Art (SoA) System-on-Chip (SoC) Design HPC SoC Workshop

State-of-Art (SoA) System-on-Chip (SoC) Design HPC SoC Workshop Photos placed in horizontal position with even amount of white space between photos and header State-of-Art (SoA) System-on-Chip (SoC) Design HPC SoC Workshop Michael Holmes Manager, Mixed Signal ASIC/SoC

More information

NAND Flash FAQ. Eureka Technology. apn5_87. NAND Flash FAQ

NAND Flash FAQ. Eureka Technology. apn5_87. NAND Flash FAQ What is NAND Flash? What is the major difference between NAND Flash and other Memory? Structural differences between NAND Flash and NOR Flash What does NAND Flash controller do? How to send command to

More information

LPDDR3 and Wide I/O DRAM:

LPDDR3 and Wide I/O DRAM: LPDDR3 and Wide I/O DRAM: Interface Changes that give PC-Like Memory Performance to Mobile Devices Marc Greenberg, Director, Product Marketing - Memcon 2012 San Jose Sept 18, 2012 Agenda What is PC-like

More information

COMPUTER HARDWARE. Input- Output and Communication Memory Systems

COMPUTER HARDWARE. Input- Output and Communication Memory Systems COMPUTER HARDWARE Input- Output and Communication Memory Systems Computer I/O I/O devices commonly found in Computer systems Keyboards Displays Printers Magnetic Drives Compact disk read only memory (CD-ROM)

More information

From physics to products

From physics to products From physics to products From MRAM to MLU and beyond memory Magnetic Random Access Memory Magnetic Logic Unit Lucien Lombard Crocus-Technology Overview 1 - The semiconductor industry 2 - Crocus-Technology

More information

Module 2. Embedded Processors and Memory. Version 2 EE IIT, Kharagpur 1

Module 2. Embedded Processors and Memory. Version 2 EE IIT, Kharagpur 1 Module 2 Embedded Processors and Memory Version 2 EE IIT, Kharagpur 1 Lesson 5 Memory-I Version 2 EE IIT, Kharagpur 2 Instructional Objectives After going through this lesson the student would Pre-Requisite

More information

361 Computer Architecture Lecture 14: Cache Memory

361 Computer Architecture Lecture 14: Cache Memory 1 361 Computer Architecture Lecture 14 Memory cache.1 The Motivation for s Memory System Processor DRAM Motivation Large memories (DRAM) are slow Small memories (SRAM) are fast Make the average access

More information

Stratix II Device System Power Considerations

Stratix II Device System Power Considerations Stratix II Device System Power Considerations June 2004, ver. 1.0 Application Note 355 Introduction Power Components Altera developed Stratix II devices using a 90-nm process technology optimized for performance

More information

Embedded STT-MRAM for Mobile Applications:

Embedded STT-MRAM for Mobile Applications: Embedded STT-MRAM for Mobile Applications: Enabling Advanced Chip Architectures Seung H. Kang Qualcomm Inc. Acknowledgments I appreciate valuable contributions and supports from Kangho Lee, Xiaochun Zhu,

More information

3D Graphics Hardware Graphics II Spring 1999

3D Graphics Hardware Graphics II Spring 1999 3D Graphics Hardware 15-463 Graphics II Spring 1999 Topics Graphics Architecture Uniprocessor Acceleration Front-End Multiprocessing Pipelined Parallel Back-End Multiprocessing Pipelined Parallel Graphics

More information

With respect to the way of data access we can classify memories as:

With respect to the way of data access we can classify memories as: Memory Classification With respect to the way of data access we can classify memories as: - random access memories (RAM), - sequentially accessible memory (SAM), - direct access memory (DAM), - contents

More information

Introduction to Computer Graphics 8. Buffers and Mapping techniques (A)

Introduction to Computer Graphics 8. Buffers and Mapping techniques (A) Introduction to Computer Graphics 8. Buffers and Mapping techniques (A) National Chiao Tung Univ, Taiwan By: I-Chen Lin, Assistant Professor Textbook: Hearn and Baker, Computer Graphics, 3rd Ed., Prentice

More information

Memory and Programmable Logic

Memory and Programmable Logic Chapter 7 Memory and Programmable Logic 7 Outline! Introduction! RandomAccess Memory! Memory Decoding! Error Detection and Correction! ReadOnly Memory! Programmable Devices! Sequential Programmable Devices

More information

Shader Model 3.0, Best Practices. Phil Scott Technical Developer Relations, EMEA

Shader Model 3.0, Best Practices. Phil Scott Technical Developer Relations, EMEA Shader Model 3.0, Best Practices Phil Scott Technical Developer Relations, EMEA Overview Short Pipeline Overview CPU Bound new optimization opportunities Obscure bits of the pipeline that can trip you

More information

The Memory Gap and the Future of High Performance Memories

The Memory Gap and the Future of High Performance Memories Revised June 2001 The Memory Gap and the Future of High Performance Memories by Maurice V.Wilkes AT&T Research Laboratories - Cambridge, UK The first main memories to be used on digital computers were

More information

Bi-directional FlipFET TM MOSFETs for Cell Phone Battery Protection Circuits

Bi-directional FlipFET TM MOSFETs for Cell Phone Battery Protection Circuits Bi-directional FlipFET TM MOSFETs for Cell Phone Battery Protection Circuits As presented at PCIM 2001 Authors: *Mark Pavier, *Hazel Schofield, *Tim Sammon, **Aram Arzumanyan, **Ritu Sodhi, **Dan Kinzer

More information

Outline. Lecture 6: EITF20 Computer Architecture. CPU Performance Equation - Pipelining. Dynamic scheduling, speculation - summary.

Outline. Lecture 6: EITF20 Computer Architecture. CPU Performance Equation - Pipelining. Dynamic scheduling, speculation - summary. Outline 1 Reiteration Lecture 6: EITF20 Computer Architecture 2 Memory hierarchy Anders Ardö EIT Electrical and Information Technology, Lund University November 13, 2013 3 Cache memory 4 Cache performance

More information

Hugh de Lacy - Technical Manager Alun Jones Technical Director Micross Components Ltd., Given at CMSE, Portsmouth 2008

Hugh de Lacy - Technical Manager Alun Jones Technical Director Micross Components Ltd.,  Given at CMSE, Portsmouth 2008 Shrinking Silicon Feature Sizes Consequences for Reliability Hugh de Lacy - Technical Manager Alun Jones Technical Director Micross Components Ltd., www.micross.com Given at CMSE, Portsmouth 2008 1 The

More information

Radeon HD 2900 and Geometry Generation. Michael Doggett

Radeon HD 2900 and Geometry Generation. Michael Doggett Radeon HD 2900 and Geometry Generation Michael Doggett September 11, 2007 Overview Introduction to 3D Graphics Radeon 2900 Starting Point Requirements Top level Pipeline Blocks from top to bottom Command

More information

Chapter 6 The Memory System. Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan

Chapter 6 The Memory System. Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan Chapter 6 The Memory System Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan Outline Basic Concepts Semiconductor Random Access Memories Read Only Memories Speed,

More information

Avoiding Unnecessary Write Operations in STT-MRAM for Low Power Implementation

Avoiding Unnecessary Write Operations in STT-MRAM for Low Power Implementation Avoiding Unnecessary Write Operations in STT-MRAM for Low Implementation Rajendra Bishnoi, Fabian Oboril, Mojtaba Ebrahimi and Mehdi B. Tahoori Chair of Dependable Nano Computing (CDNC), Karlsruhe Institute

More information

Samsung emcp. WLI DDP Package. Samsung Multi-Chip Packages can help reduce the time to market for handheld devices BROCHURE

Samsung emcp. WLI DDP Package. Samsung Multi-Chip Packages can help reduce the time to market for handheld devices BROCHURE Samsung emcp Samsung Multi-Chip Packages can help reduce the time to market for handheld devices WLI DDP Package Deliver innovative portable devices more quickly. Offer higher performance for a rapidly

More information

GPU(Graphics Processing Unit) with a Focus on Nvidia GeForce 6 Series. By: Binesh Tuladhar Clay Smith

GPU(Graphics Processing Unit) with a Focus on Nvidia GeForce 6 Series. By: Binesh Tuladhar Clay Smith GPU(Graphics Processing Unit) with a Focus on Nvidia GeForce 6 Series By: Binesh Tuladhar Clay Smith Overview History of GPU s GPU Definition Classical Graphics Pipeline Geforce 6 Series Architecture Vertex

More information

DRAM and Memory System Trends

DRAM and Memory System Trends DRAM and System Trends Steven Woo Rambus Inc. October 24, 2004 Moore s Law Driving Performance Clock Speed Performance driven by increasing clock speeds and functionality Increasing transistor counts need

More information

Comparing Technologies: MRAM vs. FRAM

Comparing Technologies: MRAM vs. FRAM MRAM TECHNOLOGY MRAM or Magnetic Random Access Memory uses a 1 transistor 1 magnetic tunnel junction (1T-1MTJ) architecture with the magnetic state of a ferromagnetic material as the data storage element.

More information

DIMM Technologies DIMM (dual inline memory module) Has independent pins on opposite sides of module

DIMM Technologies DIMM (dual inline memory module) Has independent pins on opposite sides of module 1 2 3 4 5 6 7 8 9 A+ Guide to Hardware, 4e Chapter 6 Upgrading Memory Objectives Learn about the different kinds of physical memory and how they work Learn how to upgrade memory Learn how to troubleshoot

More information

Why Hybrid Storage Strategies Give the Best Bang for the Buck

Why Hybrid Storage Strategies Give the Best Bang for the Buck JANUARY 28, 2014, SAN JOSE, CA Tom Coughlin, Coughlin Associates & Jim Handy, Objective Analysis PRESENTATION TITLE GOES HERE Why Hybrid Storage Strategies Give the Best Bang for the Buck 1 Outline Different

More information

CSEE 3827: Fundamentals of Computer Systems, Spring Memory Arrays

CSEE 3827: Fundamentals of Computer Systems, Spring Memory Arrays CSEE 3827: Fundamentals of Computer Systems, Spring 2011 6. Memory Arrays Prof. Martha Kim (martha@cs.columbia.edu) Web: http://www.cs.columbia.edu/~martha/courses/3827/sp11/ Outline (H&H 5.5-5.6) Memory

More information

GPGPU Computing. Yong Cao

GPGPU Computing. Yong Cao GPGPU Computing Yong Cao Why Graphics Card? It s powerful! A quiet trend Copyright 2009 by Yong Cao Why Graphics Card? It s powerful! Processor Processing Units FLOPs per Unit Clock Speed Processing Power

More information

3D Cache Hierarchy Optimization

3D Cache Hierarchy Optimization 3D Cache Hierarchy Optimization Leonid Yavits, Amir Morad, Ran Ginosar Department of Electrical Engineering Technion Israel Institute of Technology Haifa, Israel yavits@tx.technion.ac.il, amirm@tx.technion.ac.il,

More information

Intel s Revolutionary 22 nm Transistor Technology

Intel s Revolutionary 22 nm Transistor Technology Intel s Revolutionary 22 nm Transistor Technology Mark Bohr Intel Senior Fellow Kaizad Mistry 22 nm Program Manager May, 2011 1 Key Messages Intel is introducing revolutionary Tri-Gate transistors on its

More information

Main Memory Background

Main Memory Background ECE 554 Computer Architecture Lecture 5 Main Memory Spring 2013 Sudeep Pasricha Department of Electrical and Computer Engineering Colorado State University Pasricha; portions: Kubiatowicz, Patterson, Mutlu,

More information

Non-Volatile Memory. Non-Volatile Memory & its use in Enterprise Applications. Contents

Non-Volatile Memory. Non-Volatile Memory & its use in Enterprise Applications. Contents Non-Volatile Memory Non-Volatile Memory & its use in Enterprise Applications Author: Adrian Proctor, Viking Technology [email: adrian.proctor@vikingtechnology.com] This paper reviews different memory technologies,

More information

1 / 25. CS 137: File Systems. Persistent Solid-State Storage

1 / 25. CS 137: File Systems. Persistent Solid-State Storage 1 / 25 CS 137: File Systems Persistent Solid-State Storage Technology Change is Coming Introduction Disks are cheaper than any solid-state memory Likely to be true for many years But SSDs are now cheap

More information

Alpha CPU and Clock Design Evolution

Alpha CPU and Clock Design Evolution Alpha CPU and Clock Design Evolution This lecture uses two papers that discuss the evolution of the Alpha CPU and clocking strategy over three CPU generations Gronowski, Paul E., et.al., High Performance

More information

AN OFFSET-CHARGE INDEPENDENT SINGLE- ELECTRONICS RS FLIP-FLOP

AN OFFSET-CHARGE INDEPENDENT SINGLE- ELECTRONICS RS FLIP-FLOP AN OFFSET-CHARGE INDEPENDENT SINGLE- ELECTRONICS RS FLIP-FLOP P. HADLEY, E. H. VISSCHER, Y. CHEN, and J. E. MOOIJ Applied Physics, Delft University of Technology Lorentzweg 1, 2628 CJ Delft, The Netherlands

More information

1.Introduction. Introduction. Most of slides come from Semiconductor Manufacturing Technology by Michael Quirk and Julian Serda.

1.Introduction. Introduction. Most of slides come from Semiconductor Manufacturing Technology by Michael Quirk and Julian Serda. .Introduction If the automobile had followed the same development cycle as the computer, a Rolls- Royce would today cost $00, get one million miles to the gallon and explode once a year Most of slides

More information

Lecture 12: MOS Decoders, Gate Sizing

Lecture 12: MOS Decoders, Gate Sizing Lecture 12: MOS Decoders, Gate Sizing MAH, AEN EE271 Lecture 12 1 Memory Reading W&E 8.3.1-8.3.2 - Memory Design Introduction Memories are one of the most useful VLSI building blocks. One reason for their

More information

Qualcomm Technologies, Inc. Designing Mobile Devices for Low Power and Thermal Efficiency

Qualcomm Technologies, Inc. Designing Mobile Devices for Low Power and Thermal Efficiency Qualcomm Technologies, Inc. Designing Mobile Devices for Low Power and Thermal Efficiency October 2013 1 Qualcomm Technologies Inc. Qualcomm, Krait, and Hexagon are trademarks of Qualcomm Incorporated,

More information

Interconnection technologies

Ron Ho VLSI Research Group Sun Microsystems Laboratories 1 Acknowledgements Many contributors to the work described here > Robert Drost, David Hopkins, Alex Chow, Tarik Ono,

Samsung 2bit 3D V-NAND technology

Gain more capacity, speed, endurance and power efficiency Traditional NAND technology cannot keep pace with growing data demands Introduction Data traffic continues to

GPUs: Doing More Than Just Games. Mark Gahagan CSE 141 November 29, 2012

Outline Introduction: Why multicore at all? Background: What is a GPU? Quick Look: Warps and Threads (SIMD) NVIDIA Tesla: The First

9 Memory Devices & Chip Area

18-548/15-548 Memory System Architecture Philip Koopman September 30, 1998 Required Reading: Understanding SRAM (App. Note) What's All This Flash Stuff? (App. Note) Assignments

The Quest for Speed - Memory. Cache Memory. A Solution: Memory Hierarchy. Memory Hierarchy

CSE 4, Spring 25 Computer Systems http://www.cs.washington.edu/4 If all memory accesses (IF/lw/sw) accessed main memory, programs would run 20 times slower And

Layout, Fabrication, and Elementary Logic Design

Introduction to CMOS VLSI Design Layout, Fabrication, and Elementary Logic Design Adapted from Weste & Harris CMOS VLSI Design Overview Implementing switches with CMOS transistors How to compute logic

Aeroflex Solutions for Stacked Memory Packaging Increasing Density while Decreasing Area

Authors: Ronald Lake Tim Meade, Sean Thorne, Clark Kenyon, Richard Jadomski www.aeroflex.com/memories Military and

Hardware. What's inside the box?

Inside the case Motherboard CPU Hard Disk Memory Ethernet Card Optical Drive Power Supply Fan Video Card Sound Card http://www.youtube.com/watch?v=-gqmtitmdas Motherboard

Slide Set 8. for ENCM 369 Winter 2015 Lecture Section 01. Steve Norman, PhD, PEng

Electrical & Computer Engineering Schulich School of Engineering University of Calgary Winter Term, 2015 ENCM 369 W15 Section

Memory Design. Random Access Memory. Row decoder. n bit address. 2 m+k memory cells wide. n-1:k. Column Decoder. k-1:0.

Memory Design Random Access Memory Row decoder 2 m+k memory cells wide n-1:k k-1:0 Column Decoder n bit address Sense Amplifier m bit data word Memory Timing: Approaches Address bus Row Address Column

Agenda. Michele Taliercio, Il circuito Integrato, Novembre 2001

Agenda Introduction The market From the integrated circuit to the System on a Chip (SoC) Designing an SoC The technology An integrated circuit factory 28 How to handle complexity G The engineering

Homework # 2. Solutions. 4.1 What are the differences among sequential access, direct access, and random access?

ECE337 / CS341, Fall 2005 Introduction to Computer Architecture and Organization Instructor: Victor Manuel Murray Herrera Date assigned: 09/19/05, 05:00 PM Due back: 09/30/05, 8:00 AM Homework # 2 Solutions

Architectures and Platforms

Hardware/Software Codesign Arch&Platf. - 1 Architectures and Platforms 1. Architecture Selection: The Basic Trade-Offs 2. General Purpose vs. Application-Specific Processors 3. Processor Specialisation

Digital Circuits. Frequently Asked Questions

Module 1: Digital & Analog Signals 1. What is a signal? Signals carry information and are defined as any physical quantity that varies with time, space, or any

In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller

The Free Lunch Is Over 2 Number of transistors per CPU increases Clock frequency

Low-Cost Epoch-Based Correlation Prefetching for Commercial Applications. Yuan Chou Architecture Technology Group Microelectronics Division

Motivation Performance of many commercial applications limited

Semiconductor Memories

Chapter 8 Semiconductor Memories (based on Kang, Leblebici. CMOS Digital Integrated Circuits) 8.1 General concepts Data storage capacity available on a single integrated circuit grows exponentially being

Chapter 1 Computer System Overview

Operating Systems: Internals and Design Principles Chapter 1 Computer System Overview Eighth Edition By William Stallings Operating System Exploits the hardware resources of one or more processors Provides

The Evolving NAND Flash Business Model for SSD. Steffen Hellmold VP BD, SandForce

Flash Forward: Flash Memory Storage Solutions Solid State Storage - Vision Solid State Storage

Multicore Architectures

Week 1, Lecture 2 Multicore Landscape Intel Dual and quad-core Pentium family. 80-core demonstration last year. AMD Dual, triple (?!), and quad-core Opteron family. IBM Dual and

Phase-state Low Electron-number Drive Random Access Memory (PLEDM)

TA 7.4 Kazuo Nakazato, Kiyoo Itoh 1, Haroon Ahmed 2, Hiroshi Mizuta, Teruaki Kisu 3, Masataka Kato 4, Takeshi Sakata 1 Hitachi Cambridge

Energy-Efficient Manycore Architectures for Big Data

Department of Computer Science University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu BPOE April 2015 Wanted: Energy-Efficient Computing

Parallel Simplification of Large Meshes on PC Clusters

Hua Xiong, Xiaohong Jiang, Yaping Zhang, Jiaoying Shi State Key Lab of CAD&CG, College of Computer Science Zhejiang University Hangzhou, China April

Memory Basics. SRAM/DRAM Basics

RAM: Random Access Memory historically defined as memory array with individual bit access refers to memory with both Read and Write capabilities ROM: Read Only Memory no capabilities for

Writing Applications for the GPU Using the RapidMind Development Platform

Contents: Introduction, Graphics Processing Units, RapidMind Development Platform, Writing RapidMind Enabled Applications

Processing Unit. Backing Store

SYSTEM UNIT Basic Computer Structure Input Unit Central Processing Unit Main Memory Output Unit Backing Store The Central Processing Unit (CPU) is the unit in the computer which operates the whole computer

Flash & DRAM Si Scaling Challenges, Emerging Non-Volatile Memory Technology Enablement - Implications to Enterprise Storage and Server Compute systems

Jung H. Yoon, Hillery C. Hunter, Gary A. Tressler

Computer Systems Structure Main Memory Organization

Peripherals Computer Central Processing Unit Main Memory Computer Systems Interconnection Communication lines Input Output Ward 1 Ward 2 Storage/Memory

Outline. Introduction Interconnect Modeling Wire Resistance Wire Capacitance Wire RC Delay Crosstalk Wire Engineering Repeaters. 4th Ed.

Lecture 14: Wires Outline Introduction Interconnect Modeling Wire Resistance Wire Capacitance Wire RC Delay Crosstalk Wire Engineering Repeaters 2 Introduction Chips are mostly made of wires called interconnect