Memory Hierarchy. Computer Architecture (Arquitectura de Computadoras). Centro de Investigación y de Estudios Avanzados del IPN. adiaz@cinvestav.mx




Memory Hierarchy. Arturo Díaz Pérez. Centro de Investigación y de Estudios Avanzados del IPN. adiaz@cinvestav.mx

The Big Picture: Where Are We Now?
The five classic components of a computer: Processor (Control and Datapath), Memory, Input, Output.
Today's topics: locality and the memory hierarchy; memory organization.

Memory Trends

Technology      Typical access time        $ per GB in 2004
SRAM            0.5-5 ns                   $4000-$10,000
DRAM            50-70 ns                   $100-$200
Magnetic disk   5,000,000-20,000,000 ns    $0.50-$2

Technology Trends

          Capacity         Speed (latency)
Logic:    2x in 3 years    2x in 3 years
DRAM:     4x in 3 years    2x in 10 years
Disk:     4x in 3 years    2x in 10 years

DRAM generations:

Year    Size      Cycle time
1980    64 Kb     250 ns
1983    256 Kb    220 ns
1986    1 Mb      190 ns
1989    4 Mb      165 ns
1992    16 Mb     145 ns
1995    64 Mb     120 ns
2004    2 GB      50-70 ns

From 1980 to 1995, size improved by about 1000:1 while cycle time improved by only about 2:1.
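As a rough check on the table above, the sketch below turns the 1980 and 1995 DRAM rows into compound annual improvement factors. It assumes that Kb and Mb are powers of two and that growth was smooth between the two endpoints; the function name `annual_rate` is made up for this illustration.

```python
# DRAM generations from the table: (year, capacity in bits, cycle time in ns).
# Assumption: Kb = 2**10 bits and Mb = 2**20 bits.
dram = [
    (1980, 64 * 2**10, 250),
    (1983, 256 * 2**10, 220),
    (1986, 1 * 2**20, 190),
    (1989, 4 * 2**20, 165),
    (1992, 16 * 2**20, 145),
    (1995, 64 * 2**20, 120),
]


def annual_rate(first, last, years):
    """Compound annual improvement factor that takes `first` to `last`."""
    return (last / first) ** (1 / years)


years = dram[-1][0] - dram[0][0]

# Capacity grows, so the ratio is last / first.
capacity_ratio = dram[-1][1] / dram[0][1]
capacity_rate = annual_rate(dram[0][1], dram[-1][1], years)

# Cycle time shrinks, so the improvement ratio is first / last.
speed_ratio = dram[0][2] / dram[-1][2]
speed_rate = annual_rate(dram[-1][2], dram[0][2], years)

print(f"capacity: {capacity_ratio:.0f}:1 overall, x{capacity_rate:.2f} per year")
print(f"cycle time: {speed_ratio:.1f}:1 overall, x{speed_rate:.2f} per year")
```

Under these assumptions the capacity factor per year equals the cube root of 4, which is the "4x in 3 years" figure from the first table.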

Who Cares About the Memory Hierarchy?
Processor-DRAM gap (latency).
[Figure: performance (log scale, 1 to 1000) versus time (1980 to 2000). CPU curve, labeled "Moore's Law": µProc 60%/yr (2X/1.5 yr). DRAM curve, labeled "Less' Law?": 9%/yr (2X/10 yrs). The processor-memory performance gap grows 50% / year.]
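The growth rates quoted on this slide can be combined to see where the "50% / year" gap figure comes from. The sketch below is only an arithmetic check under the assumption of steady compound growth; the names `gap_after` and `doubling_time` are introduced here for illustration. The slide's figures are rounded, so the computed values will not match them exactly.

```python
import math

cpu_growth = 1.60   # µProc: 60% per year (from the slide)
dram_growth = 1.09  # DRAM: 9% per year (from the slide)

# Each year the CPU improves by cpu_growth and DRAM by dram_growth,
# so the ratio between them is multiplied by this factor.
gap_growth = cpu_growth / dram_growth


def gap_after(years):
    """Relative processor-memory gap after `years`, starting from 1."""
    return gap_growth ** years


def doubling_time(growth):
    """Years needed to double at a given compound annual growth factor."""
    return math.log(2) / math.log(growth)


print(f"gap grows by {100 * (gap_growth - 1):.0f}% per year")
print(f"gap after 10 years: about {gap_after(10):.0f}x")
print(f"CPU doubling time: {doubling_time(cpu_growth):.1f} years")
print(f"DRAM doubling time: {doubling_time(dram_growth):.1f} years")
```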

Current Microprocessors
Rely on caches to bridge the gap.
Microprocessor-DRAM performance gap: time of a full cache miss, measured in instructions executed:
  1st Alpha (7000): 340 ns / 5.0 ns = 68 clks x 2, or 136 instructions
  2nd Alpha (8400): 266 ns / 3.3 ns = 80 clks x 4, or 320 instructions
  3rd Alpha: 180 ns / 1.7 ns = 108 clks x 6, or 648 instructions
1/2X latency x 3X clock rate x 3X instr/clock gives a miss that costs about 5X as many instructions.
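The per-processor arithmetic on this slide follows one pattern: divide the miss latency by the clock period to get clocks, then multiply by the number of instructions issued per clock. A minimal sketch, with the hypothetical helper `miss_cost`, is shown below. The slide rounds its intermediate values, so the unrounded results for the second and third Alpha differ slightly from the slide.

```python
# (name, full-miss latency in ns, clock period in ns, instructions per clock)
# Values are taken from the slide.
alphas = [
    ("1st Alpha (7000)", 340, 5.0, 2),
    ("2nd Alpha (8400)", 266, 3.3, 4),
    ("3rd Alpha", 180, 1.7, 6),
]


def miss_cost(miss_ns, cycle_ns, issue_width):
    """Return (clocks lost, instruction slots lost) for one full cache miss."""
    clocks = miss_ns / cycle_ns
    return clocks, clocks * issue_width


for name, miss_ns, cycle_ns, width in alphas:
    clocks, instructions = miss_cost(miss_ns, cycle_ns, width)
    print(f"{name}: {clocks:.0f} clocks, {instructions:.0f} instructions")
```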

Impact on Performance
Suppose a processor executes at:
  Clock rate = 200 MHz (5 ns per cycle)
  Ideal CPI = 1.1
  Instruction mix: 50% arith/logic, 30% ld/st, 20% control
Suppose that 10% of memory operations get a 50-cycle miss penalty.
CPI = ideal CPI + average stalls per instruction
    = 1.1 (cycles) + (0.30 (data mem ops/instr) x 0.10 (misses/data mem op) x 50 (cycles/miss))
    = 1.1 cycles + 1.5 cycles
    = 2.6
58% of the time the processor is stalled waiting for memory!
A 1% instruction miss rate would add an additional 0.5 cycles to the CPI!
[Figure: pie chart of the CPI including instruction misses (3.1 in total): Ideal CPI (1.1) 35%, Data Miss (1.5) 49%, Inst Miss (0.5) 16%.]
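The CPI calculation above can be written as a small function so the effect of each parameter is easy to vary. This is a sketch using the slide's numbers; the function and parameter names are made up for illustration, and it assumes every instruction is fetched once so that the instruction miss rate applies per instruction.

```python
def cpi_with_stalls(ideal_cpi, data_ops_per_instr, data_miss_rate,
                    miss_penalty, instr_miss_rate=0.0):
    """Ideal CPI plus average memory stall cycles per instruction."""
    data_stalls = data_ops_per_instr * data_miss_rate * miss_penalty
    instr_stalls = instr_miss_rate * miss_penalty  # one fetch per instruction
    return ideal_cpi + data_stalls + instr_stalls


# Slide values: ideal CPI 1.1, 30% loads/stores, 10% data miss rate,
# 50-cycle miss penalty.
cpi = cpi_with_stalls(1.1, 0.30, 0.10, 50)
stall_fraction = (cpi - 1.1) / cpi

# Same machine with an additional 1% instruction miss rate.
cpi_with_imiss = cpi_with_stalls(1.1, 0.30, 0.10, 50, instr_miss_rate=0.01)

print(f"CPI = {cpi:.1f}, stalled {100 * stall_fraction:.0f}% of the time")
print(f"CPI with 1% instruction misses = {cpi_with_imiss:.1f}")
```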

The Goal: the illusion of large, fast, cheap memory
Fact: large memories are slow; fast memories are small.
How do we create a memory that is large, cheap, and fast (most of the time)?
  Hierarchy
  Parallelism

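An Expanded View of the Memory System
[Figure: the processor (control and datapath) connected to a chain of memories of increasing size.]
From the memory nearest the processor to the memory farthest from it:
  Speed: fastest to slowest
  Size: smallest to biggest
  Cost: highest to lowest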

Why the hierarchy works
The Principle of Locality: programs access a relatively small portion of the address space at any instant of time.
[Figure: probability of reference plotted over the address space, from 0 to 2^n - 1.]

Memory Hierarchy: How Does It Work?
Temporal locality (locality in time):
  Clustering in time: items referenced in the immediate past have a high probability of being re-referenced in the immediate future.
  => Keep the most recently accessed data items closer to the processor.
Spatial locality (locality in space):
  Clustering in space: items located physically near an item referenced in the immediate past have a high probability of being referenced in the immediate future.
  => Move blocks consisting of contiguous words to the upper levels.
[Figure: the processor exchanges data with the upper-level memory, which holds Blk X; the lower-level memory holds Blk Y.]
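To make the two kinds of locality concrete, the sketch below counts hits in a very small upper level while replaying an address trace. It assumes a direct-mapped placement rule (block number modulo the number of slots), which is a choice made for this illustration and is not specified on the slide; the sizes and traces are also made up.

```python
def hit_rate(addresses, num_blocks=8, words_per_block=4):
    """Hit rate of a tiny direct-mapped upper level on a word-address trace."""
    cache = [None] * num_blocks          # block number held in each slot
    hits = 0
    for addr in addresses:
        block = addr // words_per_block  # memory block containing this word
        slot = block % num_blocks        # direct-mapped placement (assumption)
        if cache[slot] == block:
            hits += 1
        else:
            cache[slot] = block          # miss: bring in the whole block
    return hits / len(addresses)


# Spatial locality: 64 consecutive words, each touched once.
sequential = list(range(64))
# Temporal and spatial locality: the same 16 words swept four times.
loop = list(range(16)) * 4

print(hit_rate(sequential))                     # 4-word blocks
print(hit_rate(sequential, words_per_block=1))  # 1-word blocks
print(hit_rate(loop))
```

With 4-word blocks the sequential sweep misses once per block and hits on the other three words. With 1-word blocks nothing is reused, so every access misses. The looped trace also benefits from re-referencing words that are already in the upper level.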

Visualizing Locality
[Figure: the memory map, from Hatfield and Gerald 1971.]

Memory Hierarchy of a Modern Computer System
By taking advantage of the principle of locality:
  Present the user with as much memory as is available in the cheapest technology.
  Provide access at the speed offered by the fastest technology.

Level                                          Speed (ns)                   Size (bytes)
Registers (in the processor)                   1s                           100s
On-chip cache and second-level cache (SRAM)    10s                          Ks
Main memory (DRAM)                             100s                         Ms
Secondary storage (disk)                       10,000,000s (10s ms)         Gs
Tertiary storage (tape)                        10,000,000,000s (10s sec)    Ts

How is the hierarchy managed?
Registers <-> memory: by the compiler (programmer?)
Cache <-> memory: by the hardware
Memory <-> disks:
  by the hardware and operating system (virtual memory)
  by the programmer (files)

Memory Hierarchy: Terminology
Hit: the data appears in some block in the upper level (example: Block X).
  Hit rate: the fraction of memory accesses found in the upper level.
  Hit time: time to access the upper level, which consists of RAM access time + time to determine hit/miss.
Miss: the data needs to be retrieved from a block in the lower level (Block Y).
  Miss rate = 1 - (hit rate).
  Miss penalty: time to replace a block in the upper level + time to deliver the block to the processor.
Hit time << miss penalty.
[Figure: the processor exchanges data with the upper-level memory, which holds Blk X; the lower-level memory holds Blk Y.]
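One common way to combine these terms into a single number is the average memory access time: hit time plus miss rate times miss penalty. That formula is not stated on the slide; the sketch below adds it as an illustration, and the input values (1-cycle hit time, 10% miss rate, 50-cycle miss penalty) are hypothetical.

```python
def miss_rate_from_counts(hits, accesses):
    """Miss rate = 1 - hit rate, computed from access counts."""
    return 1 - hits / accesses


def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average access time: every access pays the hit time, misses pay extra."""
    return hit_time + miss_rate * miss_penalty


# Hypothetical example: 900 hits out of 1000 accesses.
miss_rate = miss_rate_from_counts(900, 1000)
amat = average_access_time(hit_time=1, miss_rate=miss_rate, miss_penalty=50)
print(f"miss rate = {miss_rate:.2f}, average access time = {amat:.1f} cycles")
```

Because the hit time is much smaller than the miss penalty, even a modest miss rate dominates the average in this example.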

Memory Hierarchy Technology
Random access:
  "Random" is good: access time is the same for all locations.
  DRAM: Dynamic Random Access Memory
    High density, low power, cheap, slow.
    Dynamic: needs to be refreshed regularly.
  SRAM: Static Random Access Memory
    Low density, high power, expensive, fast.
    Static: content lasts "forever" (until power is lost).
Not-so-random access technology:
  Access time varies from location to location and from time to time.
  Examples: disk, CD-ROM.
Sequential access technology: access time is linear in location (e.g., tape).
We will concentrate on random access technology:
  Main memory: DRAMs. Caches: SRAMs.

Main Memory Background
Performance of main memory:
  Latency: cache miss penalty
    Access time: time between a request and the arrival of the word
    Cycle time: time between requests
  Bandwidth: I/O and large-block miss penalty (L2)
Main memory is DRAM: Dynamic Random Access Memory
  Dynamic since it needs to be refreshed periodically (8 ms)
  Addresses divided into 2 halves (memory as a 2D matrix):
    RAS or Row Access Strobe
    CAS or Column Access Strobe
Cache uses SRAM: Static Random Access Memory
  No refresh (6 transistors/bit vs. 1 transistor)
  Size: DRAM/SRAM is about 4-8
  Cost and cycle time: SRAM/DRAM is about 8-16
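The idea of dividing a DRAM address into two halves can be sketched as a bit split. The example below assumes a square array, an even number of address bits, and that the upper half is the row (sent with RAS) and the lower half is the column (sent with CAS); that ordering and the name `split_address` are assumptions made for this illustration.

```python
def split_address(addr, addr_bits):
    """Split a cell address into (row, column) halves for a square array."""
    if addr_bits % 2:
        raise ValueError("this sketch assumes an even number of address bits")
    if not 0 <= addr < (1 << addr_bits):
        raise ValueError("address out of range")
    half = addr_bits // 2
    row = addr >> half               # upper half, sent first with RAS
    col = addr & ((1 << half) - 1)   # lower half, sent second with CAS
    return row, col


# An 8-bit address addresses a 16 x 16 array through 4 shared address pins.
row, col = split_address(0b10110110, 8)
print(row, col)
```

Sending the two halves one after the other over the same pins halves the number of address pins the chip needs, at the cost of a two-step access.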

Typical SRAM Organization
[Figure: block symbol of a 16 x 4 SRAM with data inputs Din 3 to Din 0, write enable WrEn, address inputs A0 to A3, and data outputs Dout 3 to Dout 0.]

Typical SRAM Organization: 16-word x 4-bit
[Figure: internal organization. An address decoder driven by A0 to A3 selects one of the word lines Word 0 to Word 15. Each of the four bit columns has a write driver and precharger (inputs Din 3 to Din 0, controlled by Precharge and WrEn) and a sense amplifier producing Dout 3 to Dout 0.]

Classical DRAM Organization
[Figure: a RAM cell array. A row decoder driven by the row address selects a word (row) line; bit (data) lines run to the column selector and I/O circuits, which are driven by the column address and produce the data output. Each intersection represents a 1-T DRAM cell.]
Row and column address together select 1 bit at a time.

Main Memory Performance
Simple: CPU, cache, bus, and memory are all the same width (32 bits).
Wide: CPU/mux 1 word; mux/cache, bus, and memory N words (Alpha: 64 bits & 256 bits).
Interleaved: CPU, cache, and bus 1 word; memory N modules (4 modules); the example is word interleaved.
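In a word-interleaved organization, consecutive word addresses fall in different memory modules, so a block of N words can be fetched from N modules that work in parallel. The sketch below shows one usual mapping (module = word address modulo N); the slide does not give the mapping explicitly, so treat it and the name `module_and_offset` as assumptions.

```python
def module_and_offset(word_addr, num_modules=4):
    """Map a word address to (module number, word index inside that module)."""
    return word_addr % num_modules, word_addr // num_modules


# Consecutive words rotate through the 4 modules.
placement = [module_and_offset(a) for a in range(8)]
for addr, (module, offset) in enumerate(placement):
    print(f"word {addr} -> module {module}, offset {offset}")
```

With this mapping, a 4-word block starting at a multiple of 4 touches each module exactly once.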

Summary: First Part
Two different types of locality:
  Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again soon.
  Spatial locality (locality in space): if an item is referenced, items whose addresses are close by tend to be referenced soon.
By taking advantage of the principle of locality:
  Present the user with as much memory as is available in the cheapest technology.
  Provide access at the speed offered by the fastest technology.
DRAM is slow but cheap and dense: a good choice for presenting the user with a BIG memory system.
SRAM is fast but expensive and not very dense: a good choice for providing the user with FAST access time.