Lecture 5: Cache Operation




Lecture: Cache Operation. ECE / Fall. Edward F. Gehringer. Based on notes by Drs. Eric Rotenberg & Tom Conte of NCSU.

Outline
- Review of cache parameters
- Example of operation of a direct-mapped cache
- Example of operation of a set-associative cache
- Simulating a generic cache
- Write-through vs. write-back caches
- Write-allocate vs. no-write-allocate
- Victim caches

Cache Parameters
SIZE = total amount of cache data storage, in bytes
BLOCKSIZE = total number of bytes in a single block
ASSOC = associativity, i.e., # of blocks in a set

Cache Parameters (cont.)
Equation for # of cache blocks in the cache:
    # cache blocks = SIZE / BLOCKSIZE
Equation for # of sets in the cache:
    # sets = # cache blocks / ASSOC = SIZE / (BLOCKSIZE × ASSOC)

Address Fields
An address is divided into three fields: tag, index, and block offset.
- Tag: compared to the tag(s) of the indexed cache line(s). If it matches, the block is there (hit). If it doesn't match, the block is not there (miss).
- Index: used to look up a set, whose lines contain one or more memory blocks. (The # of blocks per set is the associativity.)
- Block offset: once a block is found, the offset selects a particular byte or word of data in the block.

Address Fields (cont.)
Widths of the address fields (# bits):
    # index bits = log2(# sets)
    # block offset bits = log2(block size)
    # tag bits = total # address bits − # index bits − # block offset bits
For a fixed address width, the remaining high-order bits of the address form the tag.
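The equations above can be sketched in Python. The function name is my own, and the example configuration (a 256-byte direct-mapped cache with 32-byte lines) is an assumption chosen to match the worked example that the surviving numbers suggest, not something stated explicitly in the transcription.

```python
import math

# Illustrative sketch of the address-field equations; the function name and
# the example configuration below are assumptions, not part of the lecture.
def address_fields(addr, size, blocksize, assoc):
    """Split an address into (tag, index, block offset)."""
    num_blocks = size // blocksize            # # cache blocks = SIZE / BLOCKSIZE
    num_sets = num_blocks // assoc            # # sets = # cache blocks / ASSOC
    offset_bits = int(math.log2(blocksize))   # # block offset bits = log2(block size)
    index_bits = int(math.log2(num_sets))     # # index bits = log2(# sets)
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)  # remaining high-order bits
    return tag, index, offset

# Assumed example: 256-byte direct-mapped cache with 32-byte lines.
tag, index, offset = address_fields(0xFFE0, 256, 32, 1)
```

With this assumed configuration, address 0xFFE0 splits into tag 0xFF, index 7, offset 0.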

[Figure: cache lookup — the address from the processor is split into tag, index, and block offset; the tag comparison on the indexed line produces hit/miss, and a byte-or-word select returns the requested data (byte or word) to the processor.]

Example of Cache Operation
Example: The processor accesses a 256-byte direct-mapped cache, which has 32-byte lines, with the following sequence of addresses. Show the contents of the cache after each access. Count the # of hits and # of replacements.

Example address sequence (hex): xffe, xbeefc, xffe, xffe8, x8, x8e, x, xc, x
The tag, index, and comment for each access are worked out below.

Example (cont.)
    # sets = SIZE / (BLOCKSIZE × ASSOC) = 8
    # index bits = log2(# sets) = log2(8) = 3
    # block offset bits = log2(block size) = log2(32 bytes) = 5
    # tag bits = total # address bits − # index bits − # block offset bits
Thus the lower two nibbles (8 bits) of the address form the index and block offset fields, and the remaining upper bits of the address form the tag.

Address (hex)   Tag (hex)   Comment
xffe            xff         Miss
xbeefc          xbeef       Miss
xffe            xff         Hit
xffe8           xff         Hit
x8              x           Miss
x8e             x8          Miss/Replace
x               x           Hit
xc              x           Miss/Replace
x               x           Hit

[Diagram: cache contents after access 1 — the block with tag FF is fetched from memory (slow).]

[Diagrams: cache contents after accesses 2-4 — the BEEF block is fetched from memory on the second access (slow); the next two accesses hit on the FF block, leaving the FF and BEEF blocks resident.]

[Diagrams: cache contents after accesses 5-7 — two more blocks are fetched from memory (slow), the second of which replaces the FF block; the seventh access hits.]

[Diagrams: cache contents after accesses 8-9 — the eighth access misses and replaces the BEEF block (another slow fetch from memory); the ninth access hits.]

Set-Associative Example
Example: The processor accesses a set-associative cache with the following sequence of addresses. Show the contents of the cache after each access. Count the # of hits and # of replacements.

[Figure: set-associative lookup — the address from the processor is split into tag, index, and block offset; the index selects a set, the tag comparison selects a block within the set (hit or miss), and the offset selects certain bytes.]

Example (cont.)
    # sets = SIZE / (BLOCKSIZE × ASSOC)
    # index bits = log2(# sets)
    # block offset bits = log2(block size)
    # tag bits = total # address bits − # index bits − # block offset bits

Address (hex)   Tag (hex)   Comment
xffe            xfe8        Miss
xbeefc          xdde        Miss
x8              x           Miss
xffe            xfe8        Hit
x8              x           Hit
x8e             x           Miss/Replace
x               x           Hit

[Diagrams: cache contents after accesses 1-3 — the FE8 and DDE blocks are brought in by the first two misses, followed by a third miss; sets not involved in the trace are not shown, for convenience.]

[Diagrams: cache contents after accesses 4-6 — accesses 4 and 5 hit on resident blocks; access 6 misses and replaces a block in its set.]

[Diagram: cache contents after the final access.]

Generic Cache
Every cache is an n-way set-associative cache:
- Direct-mapped: 1-way set-associative.
- n-way set-associative: n-way set-associative.
- Fully-associative: n-way set-associative, where there is only 1 set, containing n blocks.
    # index bits = log2(1) = 0 (the equation still works!)

Generic Cache (cont.)
The same equations hold for any cache type.
Equation for # of cache blocks in the cache:
    # cache blocks = SIZE / BLOCKSIZE
Equation for # of sets in the cache:
    # sets = # cache blocks / ASSOC = SIZE / (BLOCKSIZE × ASSOC)
Fully-associative: ASSOC = # cache blocks
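As a sketch of the point that one set of equations covers every cache type, the hypothetical helper below derives the geometry for all three cases; only ASSOC changes. The 1-KB / 32-byte-block configuration is an assumption for illustration.

```python
import math

# Hypothetical helper showing that the same equations cover all cache types.
def cache_geometry(size, blocksize, assoc):
    num_blocks = size // blocksize     # # cache blocks = SIZE / BLOCKSIZE
    num_sets = num_blocks // assoc     # # sets = SIZE / (BLOCKSIZE * ASSOC)
    return {"sets": num_sets,
            "index_bits": int(math.log2(num_sets)),
            "offset_bits": int(math.log2(blocksize))}

# Assumed 1-KB cache with 32-byte blocks:
dm = cache_geometry(1024, 32, 1)           # direct-mapped: 1-way
sa = cache_geometry(1024, 32, 4)           # 4-way set-associative
fa = cache_geometry(1024, 32, 1024 // 32)  # fully-associative: ASSOC = # blocks
```

The fully-associative case comes out with 1 set and 0 index bits, so the index-bit equation still works at that extreme.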

Generic Cache (cont.)
What this means: you don't have to treat the three cache types differently in your simulator! Support a generic n-way set-associative cache, and you don't have to worry specifically about the two extremes (direct-mapped & fully-associative). E.g., even a DM cache has LRU information (in the simulator), even though it is not used.

Replacement Policy
Which block in a set should be replaced when a new block has to be allocated? LRU (least recently used) is the most common policy.
Implementation:
- Real hardware maintains LRU bits with each block in the set.
- Simulator: cheat by using sequence numbers as timestamps. This makes simulation easier and gives the same effect as the hardware.

LRU Example
Assume that blocks A, B, C, D, and E all map to the same (4-way) set.
Trace: A B C D D D B D E
Assign sequence numbers, incrementing by 1 for each new access. This yields
    A(0) B(1) C(2) D(3) D(4) D(5) B(6) D(7) E(8)
After the first four accesses the set holds A(0) B(1) C(2) D(3). The hits only refresh the timestamps, giving A(0) B(6) C(2) D(7). When E(8) arrives, A has the smallest sequence number, so it is the LRU block and is replaced:
    E(8) B(6) C(2) D(7)
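The sequence-number trick can be sketched as follows (class and method names are my own). Replaying the lecture's trace A B C D D D B D E through a 4-way set reproduces the result above: E replaces A, the block with the oldest timestamp.

```python
class LRUSet:
    """One cache set with LRU tracked via sequence numbers (the simulator 'cheat')."""
    def __init__(self, assoc):
        self.assoc = assoc
        self.seq_of = {}                  # tag -> sequence number of last access

    def access(self, tag, seq):
        if tag in self.seq_of:            # hit: just refresh the timestamp
            self.seq_of[tag] = seq
            return "hit"
        if len(self.seq_of) == self.assoc:
            # Miss with a full set: the block with the smallest sequence
            # number is the least recently used, so replace it.
            lru = min(self.seq_of, key=self.seq_of.get)
            del self.seq_of[lru]
            self.seq_of[tag] = seq
            return "miss/replace"
        self.seq_of[tag] = seq
        return "miss"

s = LRUSet(4)
outcomes = [s.access(t, i) for i, t in enumerate("ABCDDDBDE")]
# The final access, E(8), replaces A(0), the block with the oldest timestamp.
```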

Handling Writes
What happens when a write occurs? Two issues:
1. Is just the cache updated with the new data, or are lower levels of the memory hierarchy updated at the same time?
2. Do we allocate a new block in the cache if the write misses?

The Write-Update Question
Write-through (WT) policy: writes that go to the cache are also written through to the next level in the memory hierarchy.

The Write-Update Question, cont.
Write-back (WB) policy: writes go only to the cache, and are not (immediately) written through to the next level of the hierarchy.

The Write-Update Question, cont.
Write-back, cont. What happens when a line previously written to needs to be replaced?
1. We need to have a dirty bit (D) with each line in the cache, and set it when the line is written to.
2. When a dirty block is replaced, we need to write the entire block back to the next level of memory (a "write-back").
With the write-back policy, replacement of a dirty block triggers a write-back of the entire line to the next level in the memory hierarchy.

The Write-Allocation Question
Write-allocate (WA): bring the block into the cache if the write misses (handled just like a read miss). Typically used with the write-back policy: WBWA.
Write-no-allocate (NA): do not bring the block into the cache if the write misses. Must be used in conjunction with write-through: WTNA.
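A minimal sketch of the dirty-bit bookkeeping for a WBWA cache follows. The names and single-set structure are illustrative assumptions, and victim selection is deliberately simplified; a real simulator would pick the victim with the LRU policy described earlier.

```python
# Minimal WBWA sketch: one set modeled as a dict of tag -> dirty bit.
# Names are illustrative; victim selection is simplified (a real simulator
# would use LRU).
def write(cache_set, tag, assoc, writebacks):
    if tag in cache_set:
        cache_set[tag] = True               # write hit: set the dirty bit only
        return
    if len(cache_set) == assoc:             # write miss with a full set
        victim_tag, victim_dirty = cache_set.popitem()
        if victim_dirty:                    # dirty victim: write the entire
            writebacks.append(victim_tag)   # block back to the next level
    cache_set[tag] = True                   # write-allocate: block enters dirty

wb = []
s = {}
write(s, 0xFF, 1, wb)   # miss, allocate; block 0xFF is now dirty
write(s, 0xAB, 1, wb)   # miss, replace dirty 0xFF -> triggers a write-back
```

Note that the write-through/no-allocate combination needs none of this: the write simply goes to the next level, and the cache is untouched on a miss.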

The Write-Allocation Question, cont.
WTNA (scenario: the write misses): the write bypasses the cache and goes directly to the next level in the memory hierarchy.
WBWA (scenario: the write misses): the block is allocated in the cache, written, and marked dirty (D); nothing goes to the next level until the block is replaced.

Victim Caches
A victim cache is a small fully-associative cache that sits alongside the primary cache; e.g., it holds a few cache blocks. Management is slightly weird compared to conventional caches:
- When the main cache evicts (replaces) a block, the victim cache takes the evicted (replaced) block. The evicted block is called the victim.
- When the main cache misses, the victim cache is searched for recently discarded blocks. A victim-cache hit means the main cache doesn't have to go to memory for the block.

Victim-Cache Example
Suppose we have a 2-entry victim cache. It initially holds blocks X and Y; Y is the LRU block in the victim cache. The main cache is direct-mapped, and blocks A and B map to the same set in the main cache.
Trace: A B A B A B

Victim-Cache Example, cont.
Initially, the main cache holds A, and the victim cache holds X and Y (LRU).

Victim-Cache Example, cont.
Access B:
1. B misses in the main cache and evicts A.
2. A goes to the victim cache & replaces Y (the previous LRU).
3. X becomes LRU.
The main cache now holds B, and the victim cache holds X (LRU) and A.

Victim-Cache Example, cont.
Access A:
1. A misses in L1 but hits in the victim cache, so A and B swap positions.
2. A is moved from the victim cache to L1, and B (the victim) goes to the victim cache, where A was located. (Note: we don't replace the LRU block, X, in the case of a victim-cache hit.)
The L1 cache now holds A, and the victim cache holds X (LRU) and B.

Victim Caches: Why?
Direct-mapped caches suffer badly from repeated conflicts. The victim cache provides the illusion of set associativity; it is a poor man's version of set associativity. A victim cache does not have to be large to be effective; even an 8-entry victim cache will frequently remove a large fraction of conflict misses.
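The victim-cache management described above can be sketched as follows (class and method names are assumptions, and LRU order is approximated by insertion order). Replaying the start of the example's trace shows A returning from the victim cache without a trip to memory, with Y, the LRU entry, being the one displaced.

```python
from collections import OrderedDict

class VictimCache:
    """Small fully-associative cache of recently evicted ('victim') blocks.
    LRU is approximated by insertion order; names are illustrative."""
    def __init__(self, entries):
        self.entries = entries
        self.tags = OrderedDict()         # oldest entry first = LRU order

    def insert(self, tag):
        """Accept a block evicted by the main cache, replacing our LRU if full."""
        if len(self.tags) == self.entries:
            self.tags.popitem(last=False)  # drop the LRU victim-cache entry
        self.tags[tag] = True

    def probe(self, tag):
        """On a main-cache miss, check for a recently discarded block.
        A hit removes the block so it can swap back into the main cache."""
        return self.tags.pop(tag, None) is not None

vc = VictimCache(2)
vc.insert("Y")           # initial contents: Y (LRU) ...
vc.insert("X")           # ... and X
vc.insert("A")           # B evicts A from the main cache; A replaces Y
hit = vc.probe("A")      # next access to A misses in L1 but hits here
vc.insert("B")           # B, the new victim, takes the slot A vacated
```

On the victim-cache hit, X (the LRU entry) is untouched, matching the note in the example: only a full `insert` replaces the LRU block.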