Direct mapped cache. Adapted from Computer Organization and Design, 4th edition, Patterson and Hennessy


Lecture 14: Direct Mapped Cache. Adapted from Computer Organization and Design, 4th edition, Patterson and Hennessy

A Typical Memory Hierarchy

Take advantage of the principle of locality to present the user with as much memory as is available in the cheapest technology, at the speed offered by the fastest technology.

[Figure: on-chip components (control, datapath, register file, ITLB, DTLB, instruction cache, data cache), backed by a second-level cache (SRAM), main memory (DRAM), and secondary memory (disk).]

Level                  Speed (cycles)   Size (bytes)
Register file          1/2              100s
L1 caches              1                10Ks
L2 cache (SRAM)        10s              Ms
Main memory (DRAM)     100s             Gs
Secondary mem (disk)   10,000s          Ts

Cost per byte is highest at the top of the hierarchy and lowest at the bottom.

Characteristics of the Memory Hierarchy

Access time increases with distance from the processor, and so does the transfer unit between levels:

- Processor <-> L1$: 4-8 bytes (a word)
- L1$ <-> L2$: 8-32 bytes (a block)
- L2$ <-> Main Memory: 1 to 4 blocks
- Main Memory <-> Secondary Memory: 1,024+ bytes (a disk sector = a page)

The hierarchy is inclusive: what is in L1$ is a subset of what is in L2$, which is a subset of what is in main memory, which is a subset of what is in secondary memory. The (relative) size of the memory also grows at each level.

Cache Basics

Two questions to answer (in hardware):
- Q1: How do we know if a data item is in the cache?
- Q2: If it is, how do we find it?

Direct mapped: each memory block is mapped to exactly one block in the cache, so lots of lower-level blocks must share blocks in the cache.

Address mapping (to answer Q2): (block address) modulo (# of blocks in the cache).

Each cache block has an associated tag containing the address information (the upper portion of the address) required to identify the block (to answer Q1).
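These two answers translate almost directly into logic. Here is a minimal C sketch, assuming one-word blocks and a hypothetical 4-block cache; the names (NUM_BLOCKS, index_of, is_hit) are illustrative, not from the slides:

#include <stdint.h>
#include <stdbool.h>

#define NUM_BLOCKS 4   /* hypothetical cache: four one-word blocks */

typedef struct {
    bool     valid;    /* has this entry ever been filled? */
    uint32_t tag;      /* upper portion of the block address */
    uint32_t data;     /* the cached 32-bit word */
} CacheBlock;

static CacheBlock cache[NUM_BLOCKS];

/* Q2: where could this memory block live? */
static uint32_t index_of(uint32_t block_addr) {
    return block_addr % NUM_BLOCKS;   /* (block address) modulo (# of blocks) */
}

/* Q1: is the block actually there? Needs the valid bit plus a tag match. */
static bool is_hit(uint32_t block_addr) {
    const CacheBlock *b = &cache[index_of(block_addr)];
    return b->valid && b->tag == block_addr / NUM_BLOCKS;
}

int main(void) {
    (void)is_hit(0);   /* every lookup misses while the cache is all-invalid */
    return 0;
}

The index answers Q2 by construction; the valid bit and stored tag together answer Q1.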

Caching: A Simple First Example

Main memory holds sixteen one-word blocks (addresses 0000xx through 1111xx); the two low-order address bits define the byte in the 32-bit word. The cache has four entries (index 00 to 11), each with a valid bit, a tag, and one word of data.

Q2: How do we find it? Use the next two low-order memory address bits, the index, to determine which cache block to check, i.e., (block address) modulo (# of blocks in the cache).

Q1: Is it there? Compare the cache tag to the high-order two memory address bits to tell if the memory block is in the cache.


MIPS Direct Mapped Cache Example

One-word blocks, cache size = 1K words (or 4KB). The 32-bit byte address splits into:

- Byte offset: bits 1-0 (2 bits)
- Index: bits 11-2 (10 bits), selecting one of the 1024 cache entries
- Tag: bits 31-12 (20 bits), compared against the 20-bit tag stored in the indexed entry; a hit requires a tag match and a set valid bit, and the 32-bit data word is returned on a hit

What kind of locality are we taking advantage of? (Temporal locality: one-word blocks only pay off when the same word is re-referenced.)
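As a concrete check of the field widths above, this C sketch splits a 32-bit byte address into the three fields (the constant names are illustrative):

#include <stdint.h>
#include <stdio.h>

/* Field widths for the cache above: 1024 one-word blocks. */
enum { BYTE_OFFSET_BITS = 2, INDEX_BITS = 10 };   /* tag = remaining 20 bits */

int main(void) {
    uint32_t addr        = 0x12345678u;  /* arbitrary example byte address */
    uint32_t byte_offset = addr & ((1u << BYTE_OFFSET_BITS) - 1);                 /* bits 1..0 */
    uint32_t index       = (addr >> BYTE_OFFSET_BITS) & ((1u << INDEX_BITS) - 1); /* bits 11..2 */
    uint32_t tag         = addr >> (BYTE_OFFSET_BITS + INDEX_BITS);               /* bits 31..12 */

    /* Prints: tag=0x12345 index=414 byte_offset=0 */
    printf("tag=0x%05x index=%u byte_offset=%u\n",
           (unsigned)tag, (unsigned)index, (unsigned)byte_offset);
    return 0;
}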

Direct Mapped Cache

Consider the main memory word reference string 0 1 2 3 4 3 4 15, starting with an empty cache (all blocks initially marked as not valid):

ref  0: miss - install Mem(0) at index 00, tag 00
ref  1: miss - install Mem(1) at index 01, tag 00
ref  2: miss - install Mem(2) at index 10, tag 00
ref  3: miss - install Mem(3) at index 11, tag 00
ref  4: miss - replace Mem(0) with Mem(4) at index 00, tag 01
ref  3: hit
ref  4: hit
ref 15: miss - replace Mem(3) with Mem(15) at index 11, tag 11

8 requests, 6 misses.
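The trace is easy to reproduce in software. A sketch of the 4-entry, one-word-block cache above (illustrative, not production code); running it prints the hit/miss sequence and the total of 6 misses:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_BLOCKS 4   /* matches the 4-entry cache in the example */

int main(void) {
    bool valid[NUM_BLOCKS] = { false };
    uint32_t tag[NUM_BLOCKS] = { 0 };
    uint32_t refs[] = { 0, 1, 2, 3, 4, 3, 4, 15 };   /* word reference string */
    int misses = 0;

    for (size_t i = 0; i < sizeof refs / sizeof refs[0]; i++) {
        uint32_t idx = refs[i] % NUM_BLOCKS;   /* which entry to check */
        uint32_t t   = refs[i] / NUM_BLOCKS;   /* upper address bits */
        if (valid[idx] && tag[idx] == t) {
            printf("%2u hit\n", (unsigned)refs[i]);
        } else {
            printf("%2u miss\n", (unsigned)refs[i]);
            valid[idx] = true;   /* fetch the block and install it */
            tag[idx]   = t;
            misses++;
        }
    }
    printf("%d misses\n", misses);   /* 6 of 8 */
    return 0;
}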

Multiword Block Direct Mapped Cache

Four words/block, cache size = 1K words. The 32-bit byte address now splits into:

- Byte offset: bits 1-0 (2 bits)
- Block offset: bits 3-2 (2 bits), selecting the word within the block
- Index: bits 11-4 (8 bits), selecting one of the 256 cache entries
- Tag: bits 31-12 (20 bits), compared against the stored tag as before

What kind of locality are we taking advantage of? (Spatial locality within the block, in addition to temporal locality.)

Taking Advantage of Spatial Locality

Let the cache block hold more than one word: the same 4-word capacity, now organized as two 2-word blocks. Consider the same reference string 0 1 2 3 4 3 4 15, starting with an empty cache (all blocks initially marked as not valid):

ref  0: miss - install Mem(1) Mem(0) at index 0, tag 00
ref  1: hit  (fetched along with word 0)
ref  2: miss - install Mem(3) Mem(2) at index 1, tag 00
ref  3: hit
ref  4: miss - replace Mem(1) Mem(0) with Mem(5) Mem(4) at index 0, tag 01
ref  3: hit
ref  4: hit
ref 15: miss - replace Mem(3) Mem(2) with Mem(15) Mem(14) at index 1, tag 11

8 requests, 4 misses.
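One changed line in the earlier simulation, dividing the word address by the words per block before indexing, reproduces this trace. A sketch under the same assumptions:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define WORDS_PER_BLOCK 2   /* two-word blocks, as in the example above */
#define NUM_BLOCKS      2   /* same 4-word capacity as before */

int main(void) {
    bool valid[NUM_BLOCKS] = { false };
    uint32_t tag[NUM_BLOCKS] = { 0 };
    uint32_t refs[] = { 0, 1, 2, 3, 4, 3, 4, 15 };
    int misses = 0;

    for (size_t i = 0; i < sizeof refs / sizeof refs[0]; i++) {
        uint32_t block = refs[i] / WORDS_PER_BLOCK;  /* strip the block offset */
        uint32_t idx   = block % NUM_BLOCKS;
        uint32_t t     = block / NUM_BLOCKS;
        if (valid[idx] && tag[idx] == t) {
            printf("%2u hit\n", (unsigned)refs[i]);
        } else {
            printf("%2u miss\n", (unsigned)refs[i]);
            valid[idx] = true;   /* a miss brings in both words of the block */
            tag[idx]   = t;
            misses++;
        }
    }
    printf("%d misses\n", misses);   /* 4 of 8 */
    return 0;
}

The two extra hits (words 1 and 3) come from neighbors fetched for free on earlier misses: that is spatial locality at work.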

Miss Rate vs Block Size vs Cache Size

[Figure: miss rate (%) versus block size (16 to 256 bytes) for cache sizes of 8 KB, 16 KB, 64 KB, and 256 KB.]

Miss rate goes up if the block size becomes a significant fraction of the cache size, because the number of blocks that can be held in the same size cache is smaller (increasing capacity misses).

Handling Cache Misses (Single Word Blocks)

Read misses (I$ and D$): stall the pipeline, fetch the block from the next level in the memory hierarchy, install it in the cache, and send the requested word to the processor; then let the pipeline resume.

Write misses (D$ only), three options (sketched in code below):
1. Stall the pipeline, fetch the block from the next level in the memory hierarchy, install it in the cache (which may involve evicting a dirty block if using a write-back cache), write the word from the processor to the cache, then let the pipeline resume; or
2. Write allocate: just write the word into the cache, updating both the tag and data; no need to check for a cache hit, no need to stall; or
3. No write allocate: skip the cache write (but invalidate that cache block, since it would otherwise hold stale data) and just write the word to the write buffer (and eventually to the next memory level); no need to stall if the write buffer isn't full.
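Options 2 and 3 are simple enough to sketch in C for one-word blocks (option 1 is just the read-miss sequence followed by a cache write). The names here (Line, write_buffer_enqueue) are illustrative stand-ins, not a real memory-system API:

#include <stdbool.h>
#include <stdint.h>

#define NUM_BLOCKS 1024   /* illustrative direct-mapped D$, one-word blocks */

typedef struct { bool valid; uint32_t tag, data; } Line;
static Line cache[NUM_BLOCKS];

/* Hypothetical stand-in for the write buffer to the next memory level. */
static void write_buffer_enqueue(uint32_t addr, uint32_t word) {
    (void)addr; (void)word;   /* would queue the write toward memory */
}

/* Option 2, write allocate: with one-word blocks, overwrite tag and data
 * unconditionally. No hit check or stall is needed because the write
 * replaces the entire "block" anyway. */
static void write_allocate(uint32_t addr, uint32_t word) {
    Line *l = &cache[(addr >> 2) % NUM_BLOCKS];   /* word address -> index */
    l->valid = true;
    l->tag   = (addr >> 2) / NUM_BLOCKS;
    l->data  = word;
}

/* Option 3, no write allocate: skip the cache write, but invalidate a
 * matching entry (it would otherwise hold stale data), then hand the word
 * to the write buffer; no stall unless the buffer is full. */
static void no_write_allocate(uint32_t addr, uint32_t word) {
    Line *l = &cache[(addr >> 2) % NUM_BLOCKS];
    if (l->valid && l->tag == (addr >> 2) / NUM_BLOCKS)
        l->valid = false;
    write_buffer_enqueue(addr, word);
}

int main(void) {
    write_allocate(0x1000u, 42u);      /* installs the word directly */
    no_write_allocate(0x1000u, 43u);   /* invalidates, then buffers the write */
    return 0;
}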

Multiword Block Considerations

Read misses (I$ and D$):
- Processed the same as for single-word blocks: a miss returns the entire block from memory.
- Miss penalty grows as block size grows. Mitigations:
  - Early restart: the processor resumes execution as soon as the requested word of the block is returned.
  - Requested word first: the requested word is transferred from memory to the cache (and processor) first.
  - A nonblocking cache allows the processor to continue to access the cache while the cache is handling an earlier miss.

Write misses (D$):
- If using write allocate, we must first fetch the block from memory and then write the word into the block; otherwise we could end up with a garbled block in the cache (e.g., for 4-word blocks: a new tag, one word of data from the new block, and three words of data from the old block).