This Unit: Main Memory. CIS 501 Introduction to Computer Architecture.




This Unit: Main Memory
CIS 501 Introduction to Computer Architecture, Unit 4: Memory Hierarchy II: Main Memory
[Course layers: Application, OS, Compiler, Firmware, CPU, I/O, Memory, Digital Circuits, Gates & Transistors]
- Memory hierarchy review
- Virtual memory
  - Address translation and page tables
  - Virtual memory's impact on caches
  - Page-based protection
- Organizing a memory system
  - Bandwidth matching
  - Error correction

Readings
- P+H Chapter 5.8-5.18
  - Skip the Intel Pentium example in 5.11
  - Skim 5.14, 5.15, 5.18

Memory Hierarchy Review
- Storage: registers, memory, disk
  - Memory is the fundamental element
- Memory component performance: t_avg = t_hit + %_miss * t_miss
  - Can't get both low t_hit and low %_miss in a single structure
- Memory hierarchy
  - Upper components: small, fast, expensive
  - Lower components: big, slow, cheap
  - t_avg of the hierarchy is close to t_hit of the upper (fastest) component
  - 10/90 rule: 90% of accesses are found in the fastest component
  - Temporal/spatial locality: automatic up-down data movement
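A quick illustration of the t_avg formula; this is a minimal sketch, and the latencies and miss rates are assumed example numbers, not figures from the slides.

#include <stdio.h>

/* Average access time of a two-level hierarchy: t_avg = t_hit + %miss * t_miss.
 * All numbers below are illustrative assumptions. */
int main(void) {
    double l1_hit = 2.0;          /* ns */
    double l1_miss_rate = 0.10;
    double l2_hit = 15.0;         /* ns */
    double l2_miss_rate = 0.05;
    double mem_latency = 100.0;   /* ns */

    /* Work bottom-up: the t_miss of a level is the t_avg of the level below it. */
    double l2_avg = l2_hit + l2_miss_rate * mem_latency;
    double l1_avg = l1_hit + l1_miss_rate * l2_avg;

    printf("t_avg(L2) = %.2f ns, t_avg(L1) = %.2f ns\n", l2_avg, l1_avg);
    return 0;
}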

Concrete Memory Hierarchy
- CPU, I$/D$, L2, Main Memory, Disk
- 1st/2nd levels: caches (I$, D$, L2)
  - Made of SRAM (last unit)
- 3rd level: main memory
  - Made of DRAM, managed in software (this unit)
- 4th level: disk (swap space)
  - Made of magnetic iron oxide discs, managed in software (next unit)

Memory Organization
- Paged virtual memory
  - Gives programs a conceptual view of a memory of unlimited size
  - Uses disk as a backing store when physical memory is exhausted
  - Memory acts like a cache, managed (mostly) by software
- How is memory-as-a-cache organized?
  - Block size? Pages, typically 4KB or larger
  - Associativity? Fully associative
  - Replacement policy? In software
  - Write-back vs. write-through? Write-back
  - Write-allocate vs. write-non-allocate? Write-allocate

Low %_miss At All Costs
- For a memory component: t_hit vs. %_miss tradeoff
- Upper components (I$, D$) emphasize low t_hit
  - Frequent access → minimal t_hit important
  - t_miss is not bad → minimal %_miss less important
  - Low capacity/associativity/block size, write-back or write-through
- Moving down (L2), emphasis turns to %_miss
  - Infrequent access → minimal t_hit less important
  - t_miss is bad → minimal %_miss important
  - High capacity/associativity/block size, write-back
- For main memory, emphasis is entirely on %_miss
  - t_miss is disk access time (measured in ms, not ns)

Memory Organization Parameters

  Parameter            I$/D$       L2          Main Memory
  t_hit                1-2ns       5-15ns      100ns
  t_miss               5-15ns      100ns       10ms (10M ns)
  Capacity             8-64KB      256KB-8MB   256MB-4GB
  Block size           16-32B      32-256B     8-64KB pages
  Associativity        1-4         4-16        Full
  Replacement policy   LRU/NMRU    LRU/NMRU    working set
  Write-through?       Either      No          No
  Write-allocate?      Either      Yes         Yes
  Write buffer?        Yes         Yes         No
  Victim buffer?       Yes         No          No
  Prefetching?         Either      Yes         Either

Software Managed Memory
- Isn't full associativity difficult to implement?
  - Yes, in hardware
  - So implement fully associative memory in software
- Let's take a step back...

Virtual Memory
- Idea of treating memory like a cache
  - Contents are a dynamic subset of the program's address space
  - Dynamic content management is transparent to the program
- Actually predates caches (by a little)
- Original motivation: compatibility
  - IBM System/370: a family of computers with one software suite
  + Same program could run on machines with different memory sizes
  - Caching mechanism made it appear as if memory was 2^N bytes
    - Regardless of how much there actually was
  - Prior to this, programmers explicitly accounted for memory size
- Virtual memory
  - Virtual: in effect, but not in actuality (i.e., appears to be, but isn't)

Virtual Memory
- Program (code, heap, stack), Main Memory, Disk
- Programs use virtual addresses (VAs)
  - 0 ... 2^N - 1
  - VA size also referred to as machine size
  - E.g., Pentium 4 is 32-bit, Alpha is 64-bit
- Memory uses physical addresses (PAs)
  - 0 ... 2^M - 1 (typically M < N, especially if N = 64)
  - 2^M is the most physical memory the machine supports
- VA → PA mapping at page granularity (VP → PP)
  - Mapping maintained by the system
  - Mapping need not preserve contiguity
  - A VP need not be mapped to any PP
  - Unmapped VPs live on disk (swap)

Uses of Virtual Memory
- Virtual memory is quite a useful feature
  - Automatic, transparent memory management is just one use
  - "Functionality problems are solved by adding levels of indirection"
- Example: program isolation and multiprogramming
  - Each process thinks it has 2^N bytes of address space
  - Each thinks its stack starts at address 0xFFFFFFFF
  - System maps VPs from different processes to different PPs
  + Prevents processes from reading/writing each other's memory
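A minimal sketch of the isolation idea, using hypothetical structures rather than the slides' code: two processes map the same VPN to different PPNs through their own page tables, so neither can touch the other's memory.

#include <stdio.h>

#define NUM_VPS 16

typedef struct { int valid; int ppn; } PTE;

static int translate(const PTE *pt, int vpn) {
    return pt[vpn].valid ? pt[vpn].ppn : -1;   /* -1: unmapped (on disk or unused) */
}

int main(void) {
    PTE pt_procA[NUM_VPS] = {{0}}, pt_procB[NUM_VPS] = {{0}};
    pt_procA[3] = (PTE){1, 7};    /* process A: VP 3 -> PP 7 */
    pt_procB[3] = (PTE){1, 12};   /* process B: VP 3 -> PP 12 */

    printf("A: VP 3 -> PP %d\n", translate(pt_procA, 3));
    printf("B: VP 3 -> PP %d\n", translate(pt_procB, 3));
    return 0;
}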

More Uses of Virtual Memory
- Isolation and protection
  - Piggy-back the translation mechanism to implement page-level protection
  - Map a virtual page to a physical page and to Read/Write/Execute protection bits in the page table
  - In multi-user systems
    - Prevent one user from accessing another's memory
    - Only the operating system can see all system memory
  - Attempt an illegal access, to execute data, or to write read-only data? Exception → OS terminates the program
  - More later
- Inter-process communication
  - Map virtual pages in different processes to the same physical page
  - Share files via the UNIX mmap() call

Address Translation
- VA → PA mapping is called address translation
- [Figure: virtual address[31:0] = VPN[31:16] + POFS[15:0]; the VPN translates to PPN[27:16]; POFS[15:0] is passed through untouched]
- Split the VA into virtual page number (VPN) and page offset (POFS)
- Translate the VPN into a physical page number (PPN)
- POFS is not translated
- VA → PA = [VPN, POFS] → [PPN, POFS]
- Example above
  - 64KB pages → 16-bit POFS
  - 32-bit machine → 32-bit VA → 16-bit VPN
  - Maximum 256MB memory → 28-bit PA → 12-bit PPN

Mechanics of Address Translation
- How are addresses translated?
  - In software (for now), but with hardware acceleration (a little later)
- Each process is allocated a page table (PT)
  - Managed by the operating system
  - Maps VPs to PPs or to disk (swap) addresses
  - VP entries are empty if the page was never referenced
- Translation is a table lookup:

struct PTE {
    union { int ppn, disk_block; };
    int is_valid, is_dirty;
};
struct PTE pt[NUM_VIRTUAL_PAGES];

int translate(int vpn) {
    if (pt[vpn].is_valid)
        return pt[vpn].ppn;
    /* otherwise: the page lives on disk (swap) */
}

Page Table Size
- How big is a page table on the following machine?
  - 4B page table entries (PTEs)
  - 32-bit machine
  - 4KB pages
  - 32-bit machine → 32-bit VA → 4GB virtual memory
  - 4GB virtual memory / 4KB page size → 1M VPs
  - 1M VPs * 4B per PTE → 4MB
- How big would the page table be with 64KB pages?
- How big would it be for a 64-bit machine?
- Page tables can get big
  - There are ways of making them smaller
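A small sketch of the flat-page-table sizing arithmetic above; the helper function and its name are illustrative, and the numbers follow the slide's example.

#include <stdio.h>
#include <stdint.h>

/* Size of a flat (single-level) page table = (2^VA_bits / page_size) * PTE_size. */
static uint64_t flat_pt_size(unsigned va_bits, uint64_t page_size, uint64_t pte_size) {
    uint64_t num_vps = ((uint64_t)1 << va_bits) / page_size;
    return num_vps * pte_size;
}

int main(void) {
    printf("32-bit VA, 4KB pages:  %llu MB\n",
           (unsigned long long)(flat_pt_size(32, 4096, 4) >> 20));    /* 4MB */
    printf("32-bit VA, 64KB pages: %llu KB\n",
           (unsigned long long)(flat_pt_size(32, 65536, 4) >> 10));   /* 256KB */
    /* A 64-bit VA with 4KB pages would need 2^52 PTEs, i.e. petabytes of PTEs,
     * which is one reason multi-level page tables are used. */
    return 0;
}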

Multi-Level Page Table
- One way to shrink page tables: multi-level page tables
  - Tree of page tables
  - Lowest-level tables hold PTEs
  - Upper-level tables hold pointers to lower-level tables
  - Different parts of the VPN index different levels
- Example: two-level page table for the machine on the last slide
  - Compute the number of pages needed for the lowest level (PTEs)
    - 4KB pages / 4B PTEs → 1K PTEs/page
    - 1M PTEs / (1K PTEs/page) → 1K pages
  - Compute the number of pages needed for the upper level (pointers)
    - 1K lowest-level pages → 1K pointers
    - 1K pointers * 32 bits (4B) → 4KB → 1 upper-level page

Multi-Level Page Table
- 20-bit VPN
  - Upper 10 bits index the 1st-level table
  - Lower 10 bits index the 2nd-level table
- [Figure: page table root → 1st-level pointers (indexed by VPN[19:10]) → 2nd-level PTEs (indexed by VPN[9:0])]

struct PTE {
    union { int ppn, disk_block; };
    int is_valid, is_dirty;
};
struct L2PT {
    struct PTE ptes[1024];
};
struct L2PT *pt[1024];   /* 1st-level table: pointers to 2nd-level tables */

int translate(int vpn) {
    struct L2PT *l2pt = pt[vpn >> 10];
    if (l2pt && l2pt->ptes[vpn & 1023].is_valid)
        return l2pt->ptes[vpn & 1023].ppn;
    /* otherwise: page fault */
}

Multi-Level Page Table (PT)
- Have we saved any space?
  - Isn't the total size of the 2nd-level tables the same as a single-level table (i.e., 4MB)?
  - Yes, but...
- Large virtual address regions are unused
  - Corresponding 2nd-level tables need not exist
  - Corresponding 1st-level pointers are null
- Example: 2MB code, 64KB stack, 16MB heap
  - Each 2nd-level table maps 4MB of virtual addresses
  - 1 for code, 1 for stack, 4 for heap (+1 1st-level table)
  - 7 total pages = 28KB, much less than 4MB (see the sketch after this block)

Address Translation Mechanics
- The six questions
  - What? Address translation
  - Why? Compatibility, multi-programming, protection
  - How? Page table
  - Who performs it?
  - When do you translate?
  - Where does the page table reside?
- Conceptual view:
  - Translate the virtual address before every cache access
  - Walk the page table for every load/store/instruction-fetch
  - Disallow the program from modifying its own page table entries
- Actual approach:
  - Cache translations in a translation cache to avoid repeated lookups
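A sketch of the space computation for the 2MB code / 64KB stack / 16MB heap example, assuming each region occupies its own aligned 4MB slices of the address space; the constants and names are illustrative.

#include <stdio.h>

#define L2_COVERAGE (4u << 20)   /* each 2nd-level table covers 4MB of virtual space */
#define PAGE_SIZE   (4u << 10)   /* each table occupies one 4KB page */

static unsigned tables_for(unsigned region_bytes) {
    return (region_bytes + L2_COVERAGE - 1) / L2_COVERAGE;  /* round up */
}

int main(void) {
    unsigned n = tables_for(2u << 20)     /* code:  2MB  -> 1 table  */
               + tables_for(64u << 10)    /* stack: 64KB -> 1 table  */
               + tables_for(16u << 20)    /* heap:  16MB -> 4 tables */
               + 1;                       /* plus the single 1st-level table */
    printf("%u pages = %u KB\n", n, n * PAGE_SIZE / 1024);  /* 7 pages = 28KB */
    return 0;
}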

Translation Lookaside Buffer
- Functionality problem? Add indirection. Performance problem? Add a cache.
- Address translation too slow?
  - Cache translations in a translation lookaside buffer (TLB)
  - Small cache: 16-64 entries, often fully associative
  - [TLB entry: tag = VPN, data = PPN]
  + Exploits temporal locality in the page table (PT)
  - What if an entry isn't found in the TLB? Invoke the TLB miss handler
    (a lookup sketch follows this block)

TLB Misses and Miss Handling
- TLB miss: the requested PTE is not in the TLB, so search the page table
- Software routine, e.g., Alpha
  - Special instructions for accessing the TLB directly
  - Latency: one or two memory accesses + an OS call
- Hardware finite state machine (FSM), e.g., x86
  - Store the page table root in a hardware register
  - Page table root and table pointers are physical addresses
  + Latency: saves the cost of the OS call
- In both cases, page table reads use the standard cache hierarchy
  + Allows the caches to help speed up the page table search
- Nested TLB miss: the TLB miss handler itself misses in the TLB
  - Solution #1: allow recursive TLB misses (very tricky)
  - Solution #2: lock the page table's entries into the TLB
  - Solution #3: avoid the problem by using physical addresses in the page table

Page Faults
- Page fault: PTE not in the TLB or the page table
  - The page is simply not in memory
  - Starts out as a TLB miss, detected by the OS handler/hardware FSM
- OS routine
  - Choose a physical page to replace
    - "Working set": a more refined software version of LRU
      - Tries to see which pages are actively being used
      - Balances the needs of all currently running applications
  - If dirty, write it to disk
  - Read the missing page from disk
    - Takes so long (~10ms) that the OS schedules another task
  - Treat it like a normal cache miss from here

Physical Caches
- Memory hierarchy so far: physical caches
  - Indexed and tagged by PAs
  - Translate VA to PA at the outset
  + Cached inter-process communication works
    - Single copy indexed by PA
  - Slow: adds at least one cycle to t_hit
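A minimal sketch of a fully associative TLB in front of a flat page table; the entry count, round-robin replacement, and fake mapping are assumptions for illustration, not the slides' design.

#include <stdio.h>

#define TLB_ENTRIES 4
#define NUM_VPS     64

typedef struct { int valid; int vpn; int ppn; } TLBEntry;
static TLBEntry tlb[TLB_ENTRIES];
static int next_victim;                     /* trivial round-robin replacement */
static int pt[NUM_VPS];                     /* flat page table: VPN -> PPN */

static int translate(int vpn) {
    for (int i = 0; i < TLB_ENTRIES; i++)   /* fully associative: compare all tags */
        if (tlb[i].valid && tlb[i].vpn == vpn)
            return tlb[i].ppn;              /* TLB hit */
    int ppn = pt[vpn];                      /* TLB miss: walk the page table */
    tlb[next_victim] = (TLBEntry){1, vpn, ppn};
    next_victim = (next_victim + 1) % TLB_ENTRIES;
    return ppn;
}

int main(void) {
    for (int v = 0; v < NUM_VPS; v++) pt[v] = v + 100;   /* fake VPN -> PPN mapping */
    printf("VP 5 -> PP %d (miss), again -> PP %d (hit)\n", translate(5), translate(5));
    return 0;
}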

Virtual Caches
- Alternative: virtual caches
  - Indexed and tagged by VAs
  - Translate VAs to PAs only to access the L2
  + Fast: avoids translation latency in the common case
- Problem: the same VA in different processes names distinct physical locations (with different values)
  - What to do on process switches?
    - Flush the caches? Slow
    - Add process IDs to the cache tags
- Does inter-process communication work?
  - Aliasing: multiple VAs map to the same PA
    - Can't allow the same PA to live in the cache twice
  - Also a problem for DMA I/O
  - Can be handled, but it is very complicated

Hybrid: Parallel TLB/Cache Access
- Compromise: access the TLB in parallel with the cache
  - In small caches, the index bits of the VA and PA are the same
    - Use the VA to generate the index
  - Tagged by PAs
  - Cache access and address translation proceed in parallel
  + No context-switching/aliasing problems
  + Fast: no additional t_hit cycles
  - Common organization in processors today
- Note: the TLB itself can't use this trick
  - It must have virtual tags
  - Adds process IDs to its tags

Parallel Cache/TLB Access
- Two ways to look at the VA
  - Cache: tag + index + offset
  - Fully associative TLB: VPN + page offset
- Parallel cache/TLB access works if address translation doesn't change the index
  - The VPN and index bits must not overlap
- [Figure: VA with virtual tag [31:12], index [11:5], offset [4:0] feeding the cache, and VPN [31:16], page offset [15:0] feeding the TLB; TLB hit/miss and cache hit/miss are resolved in parallel]

Cache Size And Page Size
- Relationship between page size and L1 cache size
  - Forced by the non-overlap between the VPN and index portions of the VA
    - Which is required for parallel TLB/cache access
  - Rule: (cache size) / (associativity) ≤ page size
  - Result: higher associativity allows larger caches
- Systems are moving towards bigger (64KB) pages
  - To use parallel translation with bigger caches
  - To amortize disk latency
- Example: Pentium 4 has 4KB pages and an 8KB, 2-way set-associative L1 data cache
- If the cache is too big, it has the same issues as virtually-indexed caches
- No relationship between page size and L2 cache size
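A tiny sketch of the sizing rule above; the helper name is made up, and the numbers follow the Pentium 4 example plus a hypothetical larger cache.

#include <stdio.h>

/* Parallel TLB/cache access requires (cache size) / (associativity) <= page size,
 * i.e. the set-index and offset bits must fit inside the untranslated page offset. */
static int parallel_access_ok(unsigned cache_bytes, unsigned assoc, unsigned page_bytes) {
    return cache_bytes / assoc <= page_bytes;
}

int main(void) {
    printf("8KB 2-way, 4KB pages:  %s\n",
           parallel_access_ok(8 * 1024, 2, 4 * 1024) ? "OK" : "index overlaps VPN");
    printf("64KB 2-way, 4KB pages: %s\n",
           parallel_access_ok(64 * 1024, 2, 4 * 1024) ? "OK" : "index overlaps VPN");
    return 0;
}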

TLB Organization
- Like caches, TLBs also have ABCs
  - Capacity
  - Associativity (at least 4-way associative; fully associative is common)
  - What does it mean for a TLB to have a block size of two?
    - Two consecutive VPs share a single tag
- Like caches, there can be L2 TLBs
  - Why? Think about this
- Rule of thumb: the TLB should cover the L2 cache's contents
  - In other words: (#PTEs in TLB) * page size ≥ L2 size
  - Why? Think about the relative miss latency in each

Virtual Memory
- Virtual memory is ubiquitous today
  - Certainly in general-purpose (in a computer) processors
  - But even many embedded (in non-computer) processors support it
- Several forms of virtual memory
  - Paging (aka flat memory): equal-sized translation blocks
    - Most systems do this
  - Segmentation: variable-sized (overlapping?) translation blocks
    - x86 used this, rather than more address bits, to break the 16-bit (64KB) limit
    - Makes life hell
  - Paged segments: don't ask
- How does virtual memory work when the system starts up?

Memory Protection and Isolation
- Most important role of virtual memory today
- Virtual memory protects applications from one another
  - OS uses indirection to isolate applications
  - One buggy program should not corrupt the OS or other programs
  + Comes for free with translation
- However, the protection is limited
  - What about protection from...
    - Viruses and worms? Stack smashing
    - Malicious/buggy services?
    - Other applications with which you want to communicate?

Stack Smashing via Buffer Overflow

int i = 0;
char buf[128];
while ((buf[i++] = getchar()) != '\n')
    ;            /* no length check: can write past buf[] onto the return address */
return;

- Stack smashing via buffer overflow
  - Oldest trick in the virus book
- Exploits the stack frame layout and sloppy code
  - A length-unchecked copy into a stack buffer
- "Attack string": attack code (128B) + &buf[0] (4B)
  - The caller's return address is replaced with a pointer to the attack code
  - The caller's return then executes the attack code at the caller's privilege level
- [Figure: stack frame with the return address above buf[128]; after the overflow, the return address holds &buf[0], which points to the attack code]
- Vulnerable programs: gzip-1.2.4, sendmail-8.7.5
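For contrast, a length-checked version of the copy loop; this is a minimal sketch of the fix, not code from the slides.

#include <stdio.h>

void read_line_safe(void) {
    char buf[128];
    int i = 0, c;
    while (i < (int)sizeof(buf) - 1 && (c = getchar()) != EOF && c != '\n')
        buf[i++] = (char)c;     /* bounded write: at most 127 data bytes */
    buf[i] = '\0';              /* always NUL-terminate */
}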

Page-Level Protection

struct PTE {
    union { int ppn, disk_block; };
    int is_valid, is_dirty, permissions;
};

- Page-level protection
  - Piggy-backs on the translation infrastructure
  - Each PTE is associated with permission bits: Read, Write, eXecute
    - Read/execute (RX): for code
    - Read (R): read-only data
    - Read/write (RW): read-write data
  - An access traps on illegal operations (e.g., a write to an RX page)
    (a permission-check sketch follows this block)
- To defeat stack smashing? Set stack permissions to RW
  - Will trap if you try to execute &buf[0]
  + No-execute (NX) bits were recently added to x86 for this specific purpose
  - Unfortunately, hackers have many other tricks

Safe and Efficient Services
- Scenario: module (application) A wants a service that B provides
  - A doesn't trust B and vice versa (e.g., B is the kernel)
  - How is the service provided?
- Option I: conventional call in the same address space
  + Can easily pass data back and forth (pass pointers)
  - An untrusted module can corrupt your data
- Option II: trap or cross-address-space call
  - Copy data across address spaces: slow, hard if the data uses pointers
  + Data is not vulnerable
- Page-level protection helps somewhat, but...
  - Page-level protection can be too coarse-grained
  - If modules share an address space, both can change the protections

Research: Mondriaan Memory Protection
- Mondriaan Memory Protection (MMP)
  - Research project from MIT [Witchel+, ASPLOS 02]
  - Separates translation from protection
- MMP translation: as usual
  - One translation structure (page table) per address space
  - Hardware acceleration: the TLB caches translations
- MMP protection
  - Protection domains: orthogonal to address spaces
    - Services run in the same address space but in different protection domains
  - One protection structure (protection table) per protection domain
  - Protection bits represented at word level
    - Two bits per 32-bit word, only 6.25% overhead
  - Hardware acceleration: the PLB caches protection bits

Brief History of DRAM
- DRAM (memory): a major force behind the computer industry
  - Modern DRAM came with the introduction of the IC (1970)
- Preceded by magnetic core memory (1950s)
  - More closely resembles today's disks than memory
- And by mercury delay lines before that (ENIAC)
  - Re-circulating vibrations in mercury tubes
- "...the one single development that put computers on their feet was the invention of a reliable form of memory, namely the core memory... Its cost was reasonable, it was reliable, and because it was reliable it could in due course be made large."
  - Maurice Wilkes, Memoirs of a Computer Pioneer, 1985
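A sketch of how the permission bits might be checked during translation; the PERM_* encoding, table size, and fault() handler are hypothetical, not the slides' code.

#include <stdio.h>
#include <stdlib.h>

#define PERM_R 4
#define PERM_W 2
#define PERM_X 1
#define NUM_VPS 16

struct PTE { int is_valid, ppn, permissions; };
static struct PTE pt[NUM_VPS];

static void fault(const char *why) {        /* the OS would raise an exception here */
    printf("trap: %s\n", why);
    exit(1);
}

static int translate_checked(int vpn, int need) {
    if (!pt[vpn].is_valid)
        fault("page fault");
    if ((pt[vpn].permissions & need) != need)
        fault("protection violation");       /* e.g., execute request on an RW stack page */
    return pt[vpn].ppn;
}

int main(void) {
    pt[2] = (struct PTE){1, 9, PERM_R | PERM_W};      /* an RW (non-executable) page */
    printf("write to VP 2 -> PP %d\n", translate_checked(2, PERM_W));
    translate_checked(2, PERM_X);                     /* executing it traps */
    return 0;
}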

DRAM Technology
- DRAM is optimized for density (bits per chip)
  - One capacitor and one transistor per bit
    - In contrast, SRAM uses six transistors per bit
  - The capacitor stores charge (for 1) or no charge (for 0)
  - The transistor controls access to the charge
  - Analogy: balloon + valve
- Destructive read
  - Sensing whether the capacitor is charged destroys the value
  - Solution: every read is immediately followed by a write
- Refresh
  - Charge leaks away over time
  - Occasionally read and then write back the values
  - Not needed for SRAM
- High latency (50ns to 150ns)

DRAM Bandwidth
- Use multiple DRAM chips to increase bandwidth
  - Recall: memory accesses are the same size as a second-level cache block
  - Example: 16 two-byte-wide chips for a 32B access
- DRAM density is increasing faster than demand
  - Result: the number of memory chips per system is decreasing
- Need to increase the bandwidth per chip
  - Especially important in game consoles
  - SDRAM → DDR → DDR2
  - Rambus: high-bandwidth memory, used by several game consoles

DRAM Reliability
- One last thing about DRAM technology: errors
  - DRAM fails at a higher rate than SRAM (CPU logic)
    - Very few electrons are stored per bit
    - Bit flips from energetic α-particle strikes
    - Many more bits
- Modern DRAM systems: built-in error detection/correction
  - Today, all servers; desktops and laptops soon
- Key idea: checksum-style redundancy
  - Main DRAM chips store data; additional chips store f(data)
    - |f(data)| < |data|
  - On read: re-compute f(data) and compare with the stored f(data) (sketched after this block)
    - Different? Error
  - Option I (detect): kill the program
  - Option II (correct): enough information to fix the error? Fix it and go on

DRAM Error Detection and Correction
- [Figure: the memory controller sends the address to four 4M x 2B data chips (0-3) plus one additional 4M x 2B chip storing f(data); data and f are read together and compared to flag an error]
- Performed by the memory controller (not the DRAM chip)
- Error detection/correction schemes are distinguished by
  - How many (simultaneous) errors they can detect
  - How many (simultaneous) errors they can correct
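A minimal sketch of the detect-on-read idea; here f is a simple XOR checksum chosen only for illustration, whereas real memory systems use parity or Hamming codes, as described next.

#include <stdio.h>
#include <stdint.h>

/* Store f(data) alongside the data, recompute it on every read, and compare. */
static uint8_t f(const uint8_t *data, int n) {
    uint8_t x = 0;
    for (int i = 0; i < n; i++) x ^= data[i];
    return x;
}

int main(void) {
    uint8_t data[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    uint8_t stored_f = f(data, 8);       /* written to the extra ECC chip */

    data[3] ^= 0x10;                     /* simulate a bit flip in DRAM */

    if (f(data, 8) != stored_f)
        puts("error detected on read");  /* Option I: kill; Option II: correct if possible */
    return 0;
}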

Error Detection: Parity
- Parity: the simplest scheme
  - f(data[N-1:0]) = XOR(data[N-1], ..., data[1], data[0])
  + Single-error detect: detects a single bit flip (the common case)
    - Will miss two simultaneous bit flips
    - But what are the odds of that happening?
  - Zero-error correct: no way to tell which bit flipped

Error Correction: Hamming Codes
- Hamming distance
  - H(A,B) = number of 1s in A^B (the number of bits in which A and B differ)
- Use D data bits + C check bits to construct a set of codewords
  - Check bits are parities over different subsets of the data bits
- For all codewords A, B: H(A,B) ≥ Δ (the code's minimum distance)
  - No combination of Δ-1 bit flips transforms one codeword into another
  - For simple parity: Δ = 2
- Errors of δ bits (or fewer) can be detected if Δ = δ + 1
- Errors of ε bits (or fewer) can be corrected if Δ = 2ε + 1
- Errors of δ bits can be detected and errors of ε bits corrected if Δ = ε + δ + 1

SEC Hamming Code
- SEC: single-error correct
  - C = log2(D) + 1
  + Relative overhead decreases as D grows
- Example: D = 4 → C = 3
  - d1 d2 d3 d4 + c1 c2 c3, stored as: c1 c2 d1 c3 d2 d3 d4
  - c1 = d1 ^ d2 ^ d4, c2 = d1 ^ d3 ^ d4, c3 = d2 ^ d3 ^ d4
  - Syndrome = stored check bits XOR recomputed check bits: 0 means no error; otherwise it points to the flipped bit
- Working example (a code sketch of it follows this block)
  - Original data = 0110 → c1 = 1, c2 = 1, c3 = 0
  - Flip d2 → data = 0010 → recomputed c1 = 0, c2 = 1, c3 = 1
    - Syndrome = 101 (binary 5) → 5th codeword bit? d2
  - Flip c2 instead → check bits read back as c1 = 1, c2 = 0, c3 = 0
    - Syndrome = 010 (binary 2) → 2nd codeword bit? c2

SECDED Hamming Code
- SECDED: single-error correct, double-error detect
  - C = log2(D) + 2
  - Additional parity bit to detect a second error
- Example: D = 4 → C = 4
  - d1 d2 d3 d4 + c1 c2 c3, stored as: c1 c2 d1 c3 d2 d3 d4 c4
  - c4 = c1 ^ c2 ^ d1 ^ c3 ^ d2 ^ d3 ^ d4
  - Syndrome == 0 and c4 == c4' → no error
  - Syndrome != 0 and c4 != c4' → 1-bit error
  - Syndrome != 0 and c4 == c4' → 2-bit error
  - Syndrome == 0 and c4 != c4' → c4 error
- Many machines today use a 64-bit SECDED code
  - C = 8 (one additional byte, 12% overhead)
  - ChipKill: correct any aligned 4-bit error
    - If an entire DRAM chip dies, the system still works!
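A small sketch of the D = 4, C = 3 SEC Hamming code worked through above; the codeword layout and check-bit equations follow the slide, while the function and type names are made up.

#include <stdio.h>

/* Codeword bit positions 1..7 hold c1 c2 d1 c3 d2 d3 d4, with
 * c1 = d1^d2^d4, c2 = d1^d3^d4, c3 = d2^d3^d4. */
typedef struct { int bit[8]; } Codeword;   /* index 1..7 used, index 0 ignored */

static Codeword encode(int d1, int d2, int d3, int d4) {
    Codeword w = {{0}};
    w.bit[3] = d1; w.bit[5] = d2; w.bit[6] = d3; w.bit[7] = d4;
    w.bit[1] = d1 ^ d2 ^ d4;   /* c1 */
    w.bit[2] = d1 ^ d3 ^ d4;   /* c2 */
    w.bit[4] = d2 ^ d3 ^ d4;   /* c3 */
    return w;
}

/* XOR each stored check bit with the check bit recomputed from the received data
 * bits; the 3-bit result (c3 c2 c1) names the flipped position, 0 = no error. */
static int syndrome(const Codeword *w) {
    int s1 = w->bit[1] ^ (w->bit[3] ^ w->bit[5] ^ w->bit[7]);
    int s2 = w->bit[2] ^ (w->bit[3] ^ w->bit[6] ^ w->bit[7]);
    int s3 = w->bit[4] ^ (w->bit[5] ^ w->bit[6] ^ w->bit[7]);
    return (s3 << 2) | (s2 << 1) | s1;
}

int main(void) {
    Codeword w = encode(0, 1, 1, 0);     /* data = 0110 -> c1=1, c2=1, c3=0 */
    w.bit[5] ^= 1;                       /* flip d2 (codeword position 5) */
    int s = syndrome(&w);
    printf("syndrome = %d -> ", s);      /* prints 5: position 5 is d2 */
    if (s) { w.bit[s] ^= 1; puts("corrected"); } else puts("no error");
    return 0;
}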

Summary
- Virtual memory
  - Page tables and address translation
  - Virtual, physical, and hybrid caches
  - TLBs
- DRAM technology
- Error detection and correction
  - Parity and Hamming codes