Time ECE7660 Memory.2 Fall 2005
|
|
- Lorraine Lamb
- 7 years ago
- Views:
Transcription
1 ECE7660 Advanced Computer Architecture Basics of Memory Hierarchy ECE7660 Memory.1 Fall 2005 Who Cares About the Memory Hierarchy? Processor-DRAM Memory Gap (latency) Performance Moore s Law CPU µproc 60%/yr. (2X/1.5yr) Processor-Memory Performance Gap (grows 50% / year) DRAM DRAM 9%/yr. (2X/10 yrs) Time ECE7660 Memory.2 Fall 2005
2 Memory Hierarchy of a Modern Computer System Processor Datapath Control Registers On-Chip Second Level (SRAM) Main Memory (DRAM) Secondary Storage (Disk) Level/Name Typical size Implementation technology Access time (ns) 1/RF < 1 KB Custom memory w. multiple ports, CMOS /C < 16 MB On-chip or off-chip CMOS SRAM /MM < 16 GB CMOS DRAM /DM > 100GB Magnetic disk 5,000,000 Bandwidth 20, ,000 (MB/s) ,000 (MB/s) (MB/s) (MB/s) Managed by Compiler Hardware Operating system OS/operator Backed by Main memory Disk CD or tape ECE7660 Memory.3 Fall 2005 Memory Hierarchy Principles of Operation At any given time, data is copied between only 2 adjacent levels Upper Level the one closer to the processor - Smaller, faster, and uses more expensive technology Lower Level the one further away from the processor - Bigger, slower, and uses less expensive technology Block The minimum unit of information that can either be present or not present in the two level hierarchy To Processor From Processor Upper Level Memory Blk X Lower Level Memory Blk Y ECE7660 Memory.4 Fall 2005
3 Memory Hierarchy Terminology Hit data appears in some block in the upper level (example Block X) Hit Rate the fraction of memory access found in the upper level Hit Time Time to access the upper level which consists of RAM access time + Time to determine hit/miss Miss data needs to be retrieve from a block in the lower level (Block Y) Miss Rate = 1 - (Hit Rate) Miss Penalty Time to replace a block in the upper level + Time to deliver the block to the processor Hit Time << Miss Penalty To Processor From Processor Upper Level Memory Block X Lower Level Memory Block Y ECE7660 Memory.5 Fall 2005 Memory Hierarchy How Does it Work? Principle of Locality Program access a relatively small portion of the address space at any instant of time. Example 90% of time in 10% of the code Temporal Locality (Locality in Time) If an item is referenced, it will tend to be referenced again soon. Keep more recently accessed data items closer to the processor Spatial Locality (Locality in Space) If an item is referenced, items whose addresses are close by tend to be referenced soon. Move blocks consists of contiguous words to the upper levels To Processor From Processor Upper Level Memory Blk X Lower Level Memory Blk Y ECE7660 Memory.6 Fall 2005
4 Memory Hierarchy Technology Random Access Random is good access time is the same for all locations DRAM Dynamic Random Access Memory - High density, low power, cheap, slow - Dynamic need to be refreshed regularly SRAM Static Random Access Memory - Low density, high power, expensive, fast - Static content will last forever Non-so-random Access Technology Access time varies from location to location and from time to time Examples Disk, tape drive, CDROM The lectures will concentrate on random access technology The Main Memory DRAMs s SRAMs ECE7660 Memory.7 Fall 2005 Memory Hierarchy System Virtual Memory System ECE7660 Memory.8 Fall 2005
5 What is a cache? Small, fast storage used to improve average access time to slow memory. Exploits spacial and temporal locality In computer architecture, almost everything is a cache! Registers a cache on variables First-level cache a cache on second-level cache Second-level cache a cache on memory Memory a cache on disk (virtual memory) TLB a cache on page table Branch-prediction a cache on prediction information? Proc/Regs Bigger L1- L2- Memory Faster Disk, Tape, etc. ECE7660 Memory.9 Fall 2005 The Simplest Direct Mapped Memory Address A B C D E F Memory 4 Byte Direct Mapped Index Location 0 can be occupied by data from Memory location 0, 4, 8,... etc. In general any memory location whose 2 LSBs of the address are 0s Address<10> => cache index (Block Addr) module (#cache blocks) How can we tell which one is in the cache? Which one should we place in the cache? ECE7660 Memory.10 Fall 2005
6 Example Consider an eight-word direct mapped cache. Show the contents of the cache as it responds to a series of requests (decimal addresses) 22, 26, 22, 26, 16, 3, 16, miss miss hit hit miss miss hit miss ECE7660 Memory.11 Fall 2005 Organization Tag and Index Assume a 32-bit memory (byte ) address N A 2 bytes direct mapped cache - Index The lower N bits of the memory address - Tag The upper (32 - N) bits of the memory address 31 N 0 Tag Example 0x50 Index Ex 0x03 Valid Bit Stored as part of the cache state 0x50 2 N Bytes Direct Mapped Byte 0 Byte 1 Byte 2 Byte 3 N Byte N 2-1 ECE7660 Memory.12 Fall 2005
7 Access Example Start Up V Tag Data Access (miss) Access (HIT) 000 M [00001] 010 M [01010] Miss Handling Load Data Write Tag & Set V Access (miss) 000 M [00001] Access (HIT) 000 M [00001] 010 M [01010] Load Data Write Tag & Set V 000 M [00001] 010 M [01010] Sad Fact of Life A lot of misses at start up Compulsory Misses - (Cold start misses) ECE7660 Memory.13 Fall 2005 Definition of a Block Block the cache data that has in its own cache tag Our previous extreme example (see slide 10) 4-byte Direct Mapped cache Block Size = 1 Byte Take advantage of Temporal Locality If a byte is referenced, it will tend to be referenced soon. Did not take advantage of Spatial Locality If a byte is referenced, its adjacent bytes will be referenced soon. In order to take advantage of Spatial Locality increase the block size Valid Tag Direct Mapped Data Byte 0 Byte 1 Byte 2 Byte 3 Question Can we say the larger, the better for the block size? ECE7660 Memory.14 Fall 2005
8 Example 1 KB Direct Mapped with 32 B Blocks For a 2 ** N byte cache The uppermost (32 - N) bits are always the Tag The lowest M bits are the Byte Select (Block Size = 2 ** M) 31 Tag Example 0x50 Stored as part of the cache state 9 Index Ex 0x01 4 Byte Select Ex 0x00 0 Valid Bit Tag 0x50 Data Byte 31 Byte 63 Byte 1 Byte 33 Byte 0 Byte Byte 1023 Byte ECE7660 Memory.15 Fall 2005 Mapping an Address to a Multiword Block Address Tag is divided into three fields tag, index, offset ttttttttttttttttt iiiiiiiiii oooo tag index byte to check to offset if have select within correct block block block Example Consider a cache with 64 Blocks and a block size of 16 bytes. What block number does byte address 1200 map to? [1200/16] module 64 = 11 ECE7660 Memory.16 Fall 2005
9 DM Example One Suppose we have a 16KB of data in a direct-mapped cache with 4 word blocks Determine the size of the tag, index and offset fields if we re using a 32-bit architecture # rows/cache =# blocks/cache (since there s one block/row) = 2 10 rows/cache need 10 bits to specify this many rows Tag use remaining bits as tag tag length = mem addr length - offset - index = bits = 18 bits so tag is leftmost 18 bits of memory address Why not full 32 bit address as tag? All bytes within block need same address (-4b) Index must be same for every address within a block, so its redundant in tag check, thus can leave off to save memory (- 10 bits in this example) ECE7660 Memory.17 Fall 2005 DM Example Two (Bits in a cache) How many total bits are required for a directed mapped cache with 64 KB of data and one-word blocks, assuming a 32-bit address? 2^14 (32 + ( )+1) = 2^14*49 = 784 Kbits ECE7660 Memory.18 Fall 2005
10 Block Size Tradeoff In general, larger block size take advantage of spatial locality BUT Larger block size means larger miss penalty - Takes longer time to fill up the block If block size is too big relative to cache size, miss rate will go up Average Access Time = Hit Time x (1 - Miss Rate) + Miss Penalty x Miss Rate Miss Penalty Miss Rate Exploits Spatial Locality Fewer blocks compromises temporal locality Average Access Time Increased Miss Penalty & Miss Rate Block Size Block Size Block Size ECE7660 Memory.19 Fall 2005 Another Extreme Example Valid Bit Tag Data Byte 3 Byte 2 Byte 1 Byte 0 0 Size = 4 bytes Only ONE entry in the cache Block Size = 4 bytes True If an item is accessed, likely that it will be accessed again soon But it is unlikely that it will be accessed again immediately!!! The next access will likely to be a miss again - Continually loading data into the cache but discard (force out) them before they are used again - Worst nightmare of a cache designer Ping Pong Effect Conflict Misses are misses caused by Different memory locations mapped to the same cache index - Solution 1 make the cache size bigger - Solution 2 Multiple entries for the same Index ECE7660 Memory.20 Fall 2005
11 How Do you Design a? Set of Operations that must be supported read data <= Mem[Physical Address] write Mem[Physical Address] <= Data Physical Address Read/Write Data Memory Black Box Inside it has Tag-Data Storage, Muxes, Comparators,... Determine the internal register transfers Design the Datapath Design the Controller Address Data In Data Out DataPath Control Points Signals Controller ECE7660 Memory.21 Fall 2005 R/W Active wait Impact on time I - PC CPI = Ideal CPI + memory stall cycles CPU time = (CPU execution clock cycles + Memory stall cycles) x clock cycle time miss invalid IR IRex IRm A R B Memory stall clock cycles = (Reads x Read miss rate x Read miss penalty + Writes x Write miss rate x Write miss penalty) IRwb D T Miss Memory stall clock cycles = Memory accesses x Miss rate x Miss penalty Average Memory Access time (AMAT) = Hit Time + (Miss Rate x Miss Penalty) Memory hit time is included in execution cycles. ECE7660 Memory.22 Fall 2005
12 Example 1 Assume an instruction cache miss rate for gcc of 2% and a data cache miss rate of 4%. If a machine has a CPI of 2 without any memory stalls and the miss penalty is 40 cycles for all misses, determine how much faster a machine would run with a perfect cache that never missed. (It is known that the frequency of all loads and stores in gcc is 36%, unused) Answer Instruction miss cycles = I * 2% * 40 = 0.80I Data miss cycles = I * 36% * 4% * 40 = 0.56I The CPI with memory stalls is = 3.36 The performance with the perfect cache is better by 1.68 ECE7660 Memory.23 Fall 2005 Example 2 Suppose a processor executes at Clock Rate = 200 MHz (5 ns per cycle), Ideal (no misses) CPI = % arith/logic, 30% ld/st, 20% control Suppose that 10% of memory operations get 50 cycle miss penalty Suppose that 1% of instructions get same miss penalty Questions what is CPI? What s the percentage of proc time stalled waiting for memory? What is average memory access time? CPI = ideal CPI + average stalls per instruction 1.1(cycles/ins) + [ 0.30 (DataMops/ins) x 0.10 (miss/datamop) x 50 (cycle/miss)] + [ 1 (InstMop/ins) x 0.01 (miss/instmop) x 50 (cycle/miss)] = ( ) cycle/ins = 3.1 Percentage of the memory stall time = ( )/3.1 = 64.5% AMAT=(1/1.3)x[1+0.01x50]+(0.3/1.3)x[1+0.1x50]=2.54 cycles/access ECE7660 Memory.24 Fall 2005
13 Impact on Time Vector Product Example float dot_prod(float x[1024], y[1024]) { float sum = 0.0; int i; for (i = 0; i < 1024; i++) sum += x[i]*y[i]; return sum; } Machine Processor with 64KB direct-mapped cache, 16 B line size Performance Good case 24 cycles / element Bad case 66 cycles / element ECE7660 Memory.25 Fall 2005 Thrashing Example x[0] x[1] x[2] x[3] Line y[0] y[1] y[2] y[3] Line Line Line x[1020] x[1021] x[1022] x[1023] Line y[1020] y[1021] y[1022] y[1023] Line Access one element from each array per iteration ECE7660 Memory.26 Fall 2005
14 Thrashing Example Good Case x[0] x[1] x[2] x[3] y[0] y[1] y[2] y[3] Line Access Sequence Read x[0] - x[0], x[1], x[2], x[3] loaded Read y[0] - y[0], y[1], y[2], y[3] loaded Read x[1] - Hit Read y[1] - Hit 2 misses / 8 reads Analysis x[i] and y[i] map to different cache lines Miss rate = 25% - Two memory accesses / iteration - On every 4th iteration have two misses Timing 10 cycle loop time 28 cycles / cache miss Average time / iteration = * 2 * 28 ECE7660 Memory.27 Fall 2005 Thrashing Example Bad Case x[0] x[1] x[2] x[3] y[0] y[1] y[2] y[3] Line Access Pattern Read x[0] - x[0], x[1], x[2], x[3] loaded Read y[0] - y[0], y[1], y[2], y[3] loaded Read x[1] - x[0], x[1], x[2], x[3] loaded Read y[1] - y[0], y[1], y[2], y[3] loaded 8 misses / 8 reads Analysis x[i] and y[i] map to same cache lines Miss rate = 100% - Two memory accesses / iteration - On every iteration have two misses Timing 10 cycle loop time 28 cycles / cache miss Average time / iteration = * 2 * 28 ECE7660 Memory.28 Fall 2005
15 General Approaches to Improve Perf. Average Memory Access time (AMAT) = Hit Time + (Miss Rate x Miss Penalty) 1. Reduce the miss rate, I - PC 2. Reduce the miss penalty, or miss IR 3. Reduce the time to hit in the cache. invalid IRex A B IRm IRwb D Miss R T ECE7660 Memory.29 Fall 2005 Design to Reduce Miss Rate Four Key Questions Q1 Where can a block be placed in the upper level? (Block placement) Fully Associative, Set Associative, Direct Mapped Q2 How is a block found if it is in the upper level? (Block identification) Tag/Block Q3 Which block should be replaced on a miss? (Block replacement) Random, LRU Q4 What happens on a write? (Write strategy) Write Back or Write Through (with Write Buffer) ECE7660 Memory.30 Fall 2005
16 Where can a block be place? Block 12 placed in 8 block cache Fully associative, direct mapped, 2-way set associative S.A. Mapping = Block Number Modulo Number Sets Block no. Fully associative block 12 can go anywhere Block no. Direct mapped block 12 can go only into block 4 (12 mod 8) Block no. Set associative block 12 can go anywhere in set 0 (12 mod 4) Block-frame address Set 0 Set 1 Set Set 2 3 Block no ECE7660 Memory.31 Fall 2005 And yet Another Extreme Example Fully Associative Fully Associative Forget about the Index Compare the Tags of all cache entries in parallel Example Block Size = 32 B blocks, we need N 27-bit comparators By definition Conflict Miss = 0 for a fully associative cache 31 Tag (27 bits long) 4 Byte Select Ex 0x01 0 X X X X Tag Valid Bit Data Byte 31 Byte 63 Byte 1 Byte 33 Byte 0 Byte 32 X ECE7660 Memory.32 Fall 2005
17 A Two-way Set Associative N-way set associative N entries for each Index N direct mapped caches operates in parallel Valid Example Two-way set associative cache Index selects a set from the cache The two tags in the set are compared in parallel Data is selected based on the tag result Tag Data Block 0 Index Data Block 0 Tag Valid Adr Tag Compare Sel1 1 Mux 0 Sel0 Compare OR Hit Block ECE7660 Memory.33 Fall 2005 Disadvantage of Set Associative N-way Set Associative versus Direct Mapped N comparators vs. 1 Extra MUX delay for the data Data comes AFTER Hit/Miss In a direct mapped cache, Block is available BEFORE Hit/Miss Possible to assume a hit and continue. Recover later if miss. Valid Tag Data Block 0 Index Data Block 0 Tag Valid Adr Tag Compare Sel1 1 Mux 0 Sel0 Compare OR Block Hit ECE7660 Memory.34 Fall 2005
18 Sources of Misses Compulsory (cold start, first reference) first access to a block Cold fact of life not a whole lot you can do about it Conflict (collision) Multiple memory locations mapped to the same cache location Solution 1 increase cache size Solution 2 increase associativity Capacity cannot contain all blocks access by the program Solution increase cache size Invalidation other process (e.g., I/O, processors) updates memory ECE7660 Memory.35 Fall 2005 The Need to Make a Decision! Direct Mapped Each memory location can only mapped to 1 cache location No need to make any decision -) - Current item replaced the previous item in that cache location N-way Set Associative Each memory location have a choice of N cache locations Fully Associative Each memory location can be placed in ANY cache location miss in a N-way Set Associative or Fully Associative Bring in new block from memory Throw out a cache block to make room for the new block ====> We need to make a decision on which block to throw out! ECE7660 Memory.36 Fall 2005
19 Block Replacement Policy Random Replacement Hardware randomly selects a cache item and throw it out Least Recently Used Hardware keeps track of the access history Replace the entry that has not been used for the longest time First-in, first out (FIFO) replace the oldest block Example of a Simple Pseudo Least Recently Used Implementation Assume 64 Fully Associative Entries Hardware replacement pointer points to one cache entry Whenever an access is made to the entry the pointer points to - Move the pointer to the next entry Otherwise do not move the pointer Replacement Pointer Entry 0 Entry 1 ECE7660 Memory.37 Fall 2005 Entry 63 Write Policy Write Through versus Write Back read is much easier to handle than cache write Instruction cache is much easier to design than data cache write How do we keep data in the cache and memory consistent? Two options (decision time again -) Write Back write to cache only. Write the cache block to memory when that cache block is being replaced on a cache miss. - Need a dirty bit for each cache block, so as to the frequency of writing back blocks - Greatly reduce the memory bandwidth requirement - Control can be complex Write Through write to cache and memory at the same time. - The cache is always clean - What!!! How can this be? Isn t memory too slow for this? ECE7660 Memory.38 Fall 2005
20 Write Buffer for Write Through Processor DRAM Write Buffer Write stall the time when cpu s waiting for writes to complete A Write Buffer is needed between the and Memory Processor writes data into the cache and the write buffer Memory controller write contents of the buffer to memory Write buffer is just a FIFO Typical number of entries 4 Works fine if Store frequency (w.r.t. time) << 1 / DRAM write cycle Memory system designer s nightmare Store frequency (w.r.t. time) > 1 / DRAM write cycle Write buffer saturation ECE7660 Memory.39 Fall 2005 Write Buffer Saturation Processor DRAM Write Buffer Store frequency (w.r.t. time) > 1 / DRAM write cycle If this condition exist for a long period of time (CPU cycle time too quick and/or too many store instructions in a row) - Store buffer will overflow no matter how big you make it - The CPU Cycle Time <= DRAM Write Cycle Time Solution for write buffer saturation Use a write back cache Install a second level (L2) cache Processor L2 DRAM Write Buffer ECE7660 Memory.40 Fall 2005
21 Write Allocate versus Not Allocate Assume a 16-bit write to memory location 0x0 and causes a Write Miss Do we read in the rest of the block (Byte 2, 3,... 31)? Yes Write Allocate No Write Not Allocate 31 Tag Example 0x00 9 Index Ex 0x00 4 Byte Select Ex 0x00 0 Valid Bit Tag 0x00 Data Byte 31 Byte 63 Byte 1 Byte 33 Byte 0 Byte Byte 1023 Byte ECE7660 Memory.41 Fall 2005 How important are caches? Floorplan Register files in middle of execution units 64k instr cache 64k data cache s take up a large fraction of the die (Figure from Jim Keller, Compaq Corp.) ECE7660 Memory.42 Fall 2005
22 The Processor Itself ECE7660 Memory.43 Fall 2005 Summary of Basics The Principle of Locality Temporal Locality vs Spatial Locality Four Questions For Any Where to place in the cache How to locate a block in the cache Replacement Write policy Write through vs Write back - Write miss Three Major Categories of Misses Compulsory Misses sad facts of life. Example cold start misses. Conflict Misses increase cache size and/or associativity. Nightmare Scenario ping pong effect! Capacity Misses increase cache size ECE7660 Memory.44 Fall 2005
361 Computer Architecture Lecture 14: Cache Memory
1 361 Computer Architecture Lecture 14 Memory cache.1 The Motivation for s Memory System Processor DRAM Motivation Large memories (DRAM) are slow Small memories (SRAM) are fast Make the average access
More informationMemory Hierarchy. Arquitectura de Computadoras. Centro de Investigación n y de Estudios Avanzados del IPN. adiaz@cinvestav.mx. MemoryHierarchy- 1
Hierarchy Arturo Díaz D PérezP Centro de Investigación n y de Estudios Avanzados del IPN adiaz@cinvestav.mx Hierarchy- 1 The Big Picture: Where are We Now? The Five Classic Components of a Computer Processor
More informationThe Big Picture. Cache Memory CSE 675.02. Memory Hierarchy (1/3) Disk
The Big Picture Cache Memory CSE 5.2 Computer Processor Memory (active) (passive) Control (where ( brain ) programs, Datapath data live ( brawn ) when running) Devices Input Output Keyboard, Mouse Disk,
More informationMemory ICS 233. Computer Architecture and Assembly Language Prof. Muhamed Mudawar
Memory ICS 233 Computer Architecture and Assembly Language Prof. Muhamed Mudawar College of Computer Sciences and Engineering King Fahd University of Petroleum and Minerals Presentation Outline Random
More informationThe Quest for Speed - Memory. Cache Memory. A Solution: Memory Hierarchy. Memory Hierarchy
The Quest for Speed - Memory Cache Memory CSE 4, Spring 25 Computer Systems http://www.cs.washington.edu/4 If all memory accesses (IF/lw/sw) accessed main memory, programs would run 20 times slower And
More informationOperating Systems. Virtual Memory
Operating Systems Virtual Memory Virtual Memory Topics. Memory Hierarchy. Why Virtual Memory. Virtual Memory Issues. Virtual Memory Solutions. Locality of Reference. Virtual Memory with Segmentation. Page
More informationCOMPUTER HARDWARE. Input- Output and Communication Memory Systems
COMPUTER HARDWARE Input- Output and Communication Memory Systems Computer I/O I/O devices commonly found in Computer systems Keyboards Displays Printers Magnetic Drives Compact disk read only memory (CD-ROM)
More informationChapter 1 Computer System Overview
Operating Systems: Internals and Design Principles Chapter 1 Computer System Overview Eighth Edition By William Stallings Operating System Exploits the hardware resources of one or more processors Provides
More informationHY345 Operating Systems
HY345 Operating Systems Recitation 2 - Memory Management Solutions Panagiotis Papadopoulos panpap@csd.uoc.gr Problem 7 Consider the following C program: int X[N]; int step = M; //M is some predefined constant
More informationComputer Organization and Architecture. Characteristics of Memory Systems. Chapter 4 Cache Memory. Location CPU Registers and control unit memory
Computer Organization and Architecture Chapter 4 Cache Memory Characteristics of Memory Systems Note: Appendix 4A will not be covered in class, but the material is interesting reading and may be used in
More informationSlide Set 8. for ENCM 369 Winter 2015 Lecture Section 01. Steve Norman, PhD, PEng
Slide Set 8 for ENCM 369 Winter 2015 Lecture Section 01 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary Winter Term, 2015 ENCM 369 W15 Section
More informationSecondary Storage. Any modern computer system will incorporate (at least) two levels of storage: magnetic disk/optical devices/tape systems
1 Any modern computer system will incorporate (at least) two levels of storage: primary storage: typical capacity cost per MB $3. typical access time burst transfer rate?? secondary storage: typical capacity
More informationHomework # 2. Solutions. 4.1 What are the differences among sequential access, direct access, and random access?
ECE337 / CS341, Fall 2005 Introduction to Computer Architecture and Organization Instructor: Victor Manuel Murray Herrera Date assigned: 09/19/05, 05:00 PM Due back: 09/30/05, 8:00 AM Homework # 2 Solutions
More informationLecture 17: Virtual Memory II. Goals of virtual memory
Lecture 17: Virtual Memory II Last Lecture: Introduction to virtual memory Today Review and continue virtual memory discussion Lecture 17 1 Goals of virtual memory Make it appear as if each process has:
More informationMemory Basics. SRAM/DRAM Basics
Memory Basics RAM: Random Access Memory historically defined as memory array with individual bit access refers to memory with both Read and Write capabilities ROM: Read Only Memory no capabilities for
More informationBindel, Spring 2010 Applications of Parallel Computers (CS 5220) Week 1: Wednesday, Jan 27
Logistics Week 1: Wednesday, Jan 27 Because of overcrowding, we will be changing to a new room on Monday (Snee 1120). Accounts on the class cluster (crocus.csuglab.cornell.edu) will be available next week.
More informationComputers. Hardware. The Central Processing Unit (CPU) CMPT 125: Lecture 1: Understanding the Computer
Computers CMPT 125: Lecture 1: Understanding the Computer Tamara Smyth, tamaras@cs.sfu.ca School of Computing Science, Simon Fraser University January 3, 2009 A computer performs 2 basic functions: 1.
More informationBoard Notes on Virtual Memory
Board Notes on Virtual Memory Part A: Why Virtual Memory? - Letʼs user program size exceed the size of the physical address space - Supports protection o Donʼt know which program might share memory at
More information18-548/15-548 Associativity 9/16/98. 7 Associativity. 18-548/15-548 Memory System Architecture Philip Koopman September 16, 1998
7 Associativity 18-548/15-548 Memory System Architecture Philip Koopman September 16, 1998 Required Reading: Cragon pg. 166-174 Assignments By next class read about data management policies: Cragon 2.2.4-2.2.6,
More informationComputer Systems Structure Main Memory Organization
Computer Systems Structure Main Memory Organization Peripherals Computer Central Processing Unit Main Memory Computer Systems Interconnection Communication lines Input Output Ward 1 Ward 2 Storage/Memory
More informationComputer Architecture
Cache Memory Gábor Horváth 2016. április 27. Budapest associate professor BUTE Dept. Of Networked Systems and Services ghorvath@hit.bme.hu It is the memory,... The memory is a serious bottleneck of Neumann
More informationWhat is a bus? A Bus is: Advantages of Buses. Disadvantage of Buses. Master versus Slave. The General Organization of a Bus
Datorteknik F1 bild 1 What is a bus? Slow vehicle that many people ride together well, true... A bunch of wires... A is: a shared communication link a single set of wires used to connect multiple subsystems
More informationQuiz for Chapter 6 Storage and Other I/O Topics 3.10
Date: 3.10 Not all questions are of equal difficulty. Please review the entire quiz first and then budget your time carefully. Name: Course: Solutions in Red 1. [6 points] Give a concise answer to each
More informationCS 6290 I/O and Storage. Milos Prvulovic
CS 6290 I/O and Storage Milos Prvulovic Storage Systems I/O performance (bandwidth, latency) Bandwidth improving, but not as fast as CPU Latency improving very slowly Consequently, by Amdahl s Law: fraction
More informationOutline. Cache Parameters. Lecture 5 Cache Operation
Lecture Cache Operation ECE / Fall Edward F. Gehringer Based on notes by Drs. Eric Rotenberg & Tom Conte of NCSU Outline Review of cache parameters Example of operation of a direct-mapped cache. Example
More informationIn-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller
In-Memory Databases Algorithms and Data Structures on Modern Hardware Martin Faust David Schwalb Jens Krüger Jürgen Müller The Free Lunch Is Over 2 Number of transistors per CPU increases Clock frequency
More informationRecord Storage and Primary File Organization
Record Storage and Primary File Organization 1 C H A P T E R 4 Contents Introduction Secondary Storage Devices Buffering of Blocks Placing File Records on Disk Operations on Files Files of Unordered Records
More informationChapter 12. Paging an Virtual Memory Systems
Chapter 12 Paging an Virtual Memory Systems Paging & Virtual Memory Virtual Memory - giving the illusion of more physical memory than there really is (via demand paging) Pure Paging - The total program
More informationCOS 318: Operating Systems. Virtual Memory and Address Translation
COS 318: Operating Systems Virtual Memory and Address Translation Today s Topics Midterm Results Virtual Memory Virtualization Protection Address Translation Base and bound Segmentation Paging Translation
More informationCSCA0102 IT & Business Applications. Foundation in Business Information Technology School of Engineering & Computing Sciences FTMS College Global
CSCA0102 IT & Business Applications Foundation in Business Information Technology School of Engineering & Computing Sciences FTMS College Global Chapter 2 Data Storage Concepts System Unit The system unit
More information1. Memory technology & Hierarchy
1. Memory technology & Hierarchy RAM types Advances in Computer Architecture Andy D. Pimentel Memory wall Memory wall = divergence between CPU and RAM speed We can increase bandwidth by introducing concurrency
More informationComputer Architecture
Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 11 Memory Management Computer Architecture Part 11 page 1 of 44 Prof. Dr. Uwe Brinkschulte, M.Sc. Benjamin
More informationStoring Data: Disks and Files
Storing Data: Disks and Files (From Chapter 9 of textbook) Storing and Retrieving Data Database Management Systems need to: Store large volumes of data Store data reliably (so that data is not lost!) Retrieve
More informationA3 Computer Architecture
A3 Computer Architecture Engineering Science 3rd year A3 Lectures Prof David Murray david.murray@eng.ox.ac.uk www.robots.ox.ac.uk/ dwm/courses/3co Michaelmas 2000 1 / 1 6. Stacks, Subroutines, and Memory
More informationIntroduction Disks RAID Tertiary storage. Mass Storage. CMSC 412, University of Maryland. Guest lecturer: David Hovemeyer.
Guest lecturer: David Hovemeyer November 15, 2004 The memory hierarchy Red = Level Access time Capacity Features Registers nanoseconds 100s of bytes fixed Cache nanoseconds 1-2 MB fixed RAM nanoseconds
More informationCS/COE1541: Introduction to Computer Architecture. Memory hierarchy. Sangyeun Cho. Computer Science Department University of Pittsburgh
CS/COE1541: Introduction to Computer Architecture Memory hierarchy Sangyeun Cho Computer Science Department CPU clock rate Apple II MOS 6502 (1975) 1~2MHz Original IBM PC (1981) Intel 8080 4.77MHz Intel
More informationPrice/performance Modern Memory Hierarchy
Lecture 21: Storage Administration Take QUIZ 15 over P&H 6.1-4, 6.8-9 before 11:59pm today Project: Cache Simulator, Due April 29, 2010 NEW OFFICE HOUR TIME: Tuesday 1-2, McKinley Last Time Exam discussion
More informationWe r e going to play Final (exam) Jeopardy! "Answers:" "Questions:" - 1 -
. (0 pts) We re going to play Final (exam) Jeopardy! Associate the following answers with the appropriate question. (You are given the "answers": Pick the "question" that goes best with each "answer".)
More informationCOMPUTER ORGANIZATION ARCHITECTURES FOR EMBEDDED COMPUTING
COMPUTER ORGANIZATION ARCHITECTURES FOR EMBEDDED COMPUTING 2013/2014 1 st Semester Sample Exam January 2014 Duration: 2h00 - No extra material allowed. This includes notes, scratch paper, calculator, etc.
More informationCPS104 Computer Organization and Programming Lecture 18: Input-Output. Robert Wagner
CPS104 Computer Organization and Programming Lecture 18: Input-Output Robert Wagner cps 104 I/O.1 RW Fall 2000 Outline of Today s Lecture The I/O system Magnetic Disk Tape Buses DMA cps 104 I/O.2 RW Fall
More informationPhysical Data Organization
Physical Data Organization Database design using logical model of the database - appropriate level for users to focus on - user independence from implementation details Performance - other major factor
More informationData Storage - I: Memory Hierarchies & Disks
Data Storage - I: Memory Hierarchies & Disks W7-C, Spring 2005 Updated by M. Naci Akkøk, 27.02.2004 and 23.02.2005, based upon slides by Pål Halvorsen, 11.3.2002. Contains slides from: Hector Garcia-Molina,
More informationINTRODUCTION The collection of data that makes up a computerized database must be stored physically on some computer storage medium.
Chapter 4: Record Storage and Primary File Organization 1 Record Storage and Primary File Organization INTRODUCTION The collection of data that makes up a computerized database must be stored physically
More informationCS 61C: Great Ideas in Computer Architecture Virtual Memory Cont.
CS 61C: Great Ideas in Computer Architecture Virtual Memory Cont. Instructors: Vladimir Stojanovic & Nicholas Weaver http://inst.eecs.berkeley.edu/~cs61c/ 1 Bare 5-Stage Pipeline Physical Address PC Inst.
More informationVirtual Memory. How is it possible for each process to have contiguous addresses and so many of them? A System Using Virtual Addressing
How is it possible for each process to have contiguous addresses and so many of them? Computer Systems Organization (Spring ) CSCI-UA, Section Instructor: Joanna Klukowska Teaching Assistants: Paige Connelly
More informationTechnology Update White Paper. High Speed RAID 6. Powered by Custom ASIC Parity Chips
Technology Update White Paper High Speed RAID 6 Powered by Custom ASIC Parity Chips High Speed RAID 6 Powered by Custom ASIC Parity Chips Why High Speed RAID 6? Winchester Systems has developed High Speed
More informationThis Unit: Caches. CIS 501 Introduction to Computer Architecture. Motivation. Types of Memory
This Unit: Caches CIS 5 Introduction to Computer Architecture Unit 3: Storage Hierarchy I: Caches Application OS Compiler Firmware CPU I/O Memory Digital Circuits Gates & Transistors Memory hierarchy concepts
More informationPipeline Hazards. Structure hazard Data hazard. ComputerArchitecture_PipelineHazard1
Pipeline Hazards Structure hazard Data hazard Pipeline hazard: the major hurdle A hazard is a condition that prevents an instruction in the pipe from executing its next scheduled pipe stage Taxonomy of
More informationWith respect to the way of data access we can classify memories as:
Memory Classification With respect to the way of data access we can classify memories as: - random access memories (RAM), - sequentially accessible memory (SAM), - direct access memory (DAM), - contents
More informationOutline: Operating Systems
Outline: Operating Systems What is an OS OS Functions Multitasking Virtual Memory File Systems Window systems PC Operating System Wars: Windows vs. Linux 1 Operating System provides a way to boot (start)
More informationUniversity of Dublin Trinity College. Storage Hardware. Owen.Conlan@cs.tcd.ie
University of Dublin Trinity College Storage Hardware Owen.Conlan@cs.tcd.ie Hardware Issues Hard Disk/SSD CPU Cache Main Memory CD ROM/RW DVD ROM/RW Tapes Primary Storage Floppy Disk/ Memory Stick Secondary
More informationChapter 11 I/O Management and Disk Scheduling
Operating Systems: Internals and Design Principles, 6/E William Stallings Chapter 11 I/O Management and Disk Scheduling Dave Bremer Otago Polytechnic, NZ 2008, Prentice Hall I/O Devices Roadmap Organization
More informationCSE 160 Lecture 5. The Memory Hierarchy False Sharing Cache Coherence and Consistency. Scott B. Baden
CSE 160 Lecture 5 The Memory Hierarchy False Sharing Cache Coherence and Consistency Scott B. Baden Using Bang coming down the home stretch Do not use Bang s front end for running mergesort Use batch,
More information6. Storage and File Structures
ECS-165A WQ 11 110 6. Storage and File Structures Goals Understand the basic concepts underlying different storage media, buffer management, files structures, and organization of records in files. Contents
More informationChapter 2 Basic Structure of Computers. Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan
Chapter 2 Basic Structure of Computers Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan Outline Functional Units Basic Operational Concepts Bus Structures Software
More informationQuiz for Chapter 1 Computer Abstractions and Technology 3.10
Date: 3.10 Not all questions are of equal difficulty. Please review the entire quiz first and then budget your time carefully. Name: Course: Solutions in Red 1. [15 points] Consider two different implementations,
More informationBig Picture. IC220 Set #11: Storage and I/O I/O. Outline. Important but neglected
Big Picture Processor Interrupts IC220 Set #11: Storage and Cache Memory- bus Main memory 1 Graphics output Network 2 Outline Important but neglected The difficulties in assessing and designing systems
More informationStoring Data: Disks and Files. Disks and Files. Why Not Store Everything in Main Memory? Chapter 7
Storing : Disks and Files Chapter 7 Yea, from the table of my memory I ll wipe away all trivial fond records. -- Shakespeare, Hamlet base Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Disks and
More informationTechnical Note. Micron NAND Flash Controller via Xilinx Spartan -3 FPGA. Overview. TN-29-06: NAND Flash Controller on Spartan-3 Overview
Technical Note TN-29-06: NAND Flash Controller on Spartan-3 Overview Micron NAND Flash Controller via Xilinx Spartan -3 FPGA Overview As mobile product capabilities continue to expand, so does the demand
More informationMemory Management Outline. Background Swapping Contiguous Memory Allocation Paging Segmentation Segmented Paging
Memory Management Outline Background Swapping Contiguous Memory Allocation Paging Segmentation Segmented Paging 1 Background Memory is a large array of bytes memory and registers are only storage CPU can
More informationMemory management basics (1) Requirements (1) Objectives. Operating Systems Part of E1.9 - Principles of Computers and Software Engineering
Memory management basics (1) Requirements (1) Operating Systems Part of E1.9 - Principles of Computers and Software Engineering Lecture 7: Memory Management I Memory management intends to satisfy the following
More informationCS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen
CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen LECTURE 14: DATA STORAGE AND REPRESENTATION Data Storage Memory Hierarchy Disks Fields, Records, Blocks Variable-length
More informationByte Ordering of Multibyte Data Items
Byte Ordering of Multibyte Data Items Most Significant Byte (MSB) Least Significant Byte (LSB) Big Endian Byte Addresses +0 +1 +2 +3 +4 +5 +6 +7 VALUE (8-byte) Least Significant Byte (LSB) Most Significant
More informationComputer Performance. Topic 3. Contents. Prerequisite knowledge Before studying this topic you should be able to:
55 Topic 3 Computer Performance Contents 3.1 Introduction...................................... 56 3.2 Measuring performance............................... 56 3.2.1 Clock Speed.................................
More informationCS:APP Chapter 4 Computer Architecture. Wrap-Up. William J. Taffe Plymouth State University. using the slides of
CS:APP Chapter 4 Computer Architecture Wrap-Up William J. Taffe Plymouth State University using the slides of Randal E. Bryant Carnegie Mellon University Overview Wrap-Up of PIPE Design Performance analysis
More informationThe Classical Architecture. Storage 1 / 36
1 / 36 The Problem Application Data? Filesystem Logical Drive Physical Drive 2 / 36 Requirements There are different classes of requirements: Data Independence application is shielded from physical storage
More informationRAM & ROM Based Digital Design. ECE 152A Winter 2012
RAM & ROM Based Digital Design ECE 152A Winter 212 Reading Assignment Brown and Vranesic 1 Digital System Design 1.1 Building Block Circuits 1.1.3 Static Random Access Memory (SRAM) 1.1.4 SRAM Blocks in
More informationOutline. CS 245: Database System Principles. Notes 02: Hardware. Hardware DBMS ... ... Data Storage
CS 245: Database System Principles Notes 02: Hardware Hector Garcia-Molina Outline Hardware: Disks Access Times Solid State Drives Optimizations Other Topics: Storage costs Using secondary storage Disk
More informationThis Unit: Virtual Memory. CIS 501 Computer Architecture. Readings. A Computer System: Hardware
This Unit: Virtual CIS 501 Computer Architecture Unit 5: Virtual App App App System software Mem CPU I/O The operating system (OS) A super-application Hardware support for an OS Virtual memory Page tables
More informationCHAPTER 7: The CPU and Memory
CHAPTER 7: The CPU and Memory The Architecture of Computer Hardware, Systems Software & Networking: An Information Technology Approach 4th Edition, Irv Englander John Wiley and Sons 2010 PowerPoint slides
More informationEFFICIENT EXTERNAL SORTING ON FLASH MEMORY EMBEDDED DEVICES
ABSTRACT EFFICIENT EXTERNAL SORTING ON FLASH MEMORY EMBEDDED DEVICES Tyler Cossentine and Ramon Lawrence Department of Computer Science, University of British Columbia Okanagan Kelowna, BC, Canada tcossentine@gmail.com
More informationSolid State Drive Architecture
Solid State Drive Architecture A comparison and evaluation of data storage mediums Tyler Thierolf Justin Uriarte Outline Introduction Storage Device as Limiting Factor Terminology Internals Interface Architecture
More informationModule 2. Embedded Processors and Memory. Version 2 EE IIT, Kharagpur 1
Module 2 Embedded Processors and Memory Version 2 EE IIT, Kharagpur 1 Lesson 5 Memory-I Version 2 EE IIT, Kharagpur 2 Instructional Objectives After going through this lesson the student would Pre-Requisite
More informationStorage in Database Systems. CMPSCI 445 Fall 2010
Storage in Database Systems CMPSCI 445 Fall 2010 1 Storage Topics Architecture and Overview Disks Buffer management Files of records 2 DBMS Architecture Query Parser Query Rewriter Query Optimizer Query
More informationOperating Systems, 6 th ed. Test Bank Chapter 7
True / False Questions: Chapter 7 Memory Management 1. T / F In a multiprogramming system, main memory is divided into multiple sections: one for the operating system (resident monitor, kernel) and one
More informationCentral Processing Unit (CPU)
Central Processing Unit (CPU) CPU is the heart and brain It interprets and executes machine level instructions Controls data transfer from/to Main Memory (MM) and CPU Detects any errors In the following
More informationVirtual Memory Paging
COS 318: Operating Systems Virtual Memory Paging Kai Li Computer Science Department Princeton University (http://www.cs.princeton.edu/courses/cos318/) Today s Topics Paging mechanism Page replacement algorithms
More informationCommunication Protocol
Analysis of the NXT Bluetooth Communication Protocol By Sivan Toledo September 2006 The NXT supports Bluetooth communication between a program running on the NXT and a program running on some other Bluetooth
More informationMemory Management CS 217. Two programs can t control all of memory simultaneously
Memory Management CS 217 Memory Management Problem 1: Two programs can t control all of memory simultaneously Problem 2: One program shouldn t be allowed to access/change the memory of another program
More informationArchitecture of Hitachi SR-8000
Architecture of Hitachi SR-8000 University of Stuttgart High-Performance Computing-Center Stuttgart (HLRS) www.hlrs.de Slide 1 Most of the slides from Hitachi Slide 2 the problem modern computer are data
More informationRAID. RAID 0 No redundancy ( AID?) Just stripe data over multiple disks But it does improve performance. Chapter 6 Storage and Other I/O Topics 29
RAID Redundant Array of Inexpensive (Independent) Disks Use multiple smaller disks (c.f. one large disk) Parallelism improves performance Plus extra disk(s) for redundant data storage Provides fault tolerant
More informationChapter 13: Query Processing. Basic Steps in Query Processing
Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing
More informationCAs and Turing Machines. The Basis for Universal Computation
CAs and Turing Machines The Basis for Universal Computation What We Mean By Universal When we claim universal computation we mean that the CA is capable of calculating anything that could possibly be calculated*.
More informationMemory. The memory types currently in common usage are:
ory ory is the third key component of a microprocessor-based system (besides the CPU and I/O devices). More specifically, the primary storage directly addressed by the CPU is referred to as main memory
More informationChoosing a Computer for Running SLX, P3D, and P5
Choosing a Computer for Running SLX, P3D, and P5 This paper is based on my experience purchasing a new laptop in January, 2010. I ll lead you through my selection criteria and point you to some on-line
More information150127-Microprocessor & Assembly Language
Chapter 3 Z80 Microprocessor Architecture The Z 80 is one of the most talented 8 bit microprocessors, and many microprocessor-based systems are designed around the Z80. The Z80 microprocessor needs an
More informationOPERATING SYSTEM - VIRTUAL MEMORY
OPERATING SYSTEM - VIRTUAL MEMORY http://www.tutorialspoint.com/operating_system/os_virtual_memory.htm Copyright tutorialspoint.com A computer can address more memory than the amount physically installed
More informationADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-17: Memory organisation, and types of memory
ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-17: Memory organisation, and types of memory 1 1. Memory Organisation 2 Random access model A memory-, a data byte, or a word, or a double
More informationx64 Servers: Do you want 64 or 32 bit apps with that server?
TMurgent Technologies x64 Servers: Do you want 64 or 32 bit apps with that server? White Paper by Tim Mangan TMurgent Technologies February, 2006 Introduction New servers based on what is generally called
More information2
1 2 3 4 5 For Description of these Features see http://download.intel.com/products/processor/corei7/prod_brief.pdf The following Features Greatly affect Performance Monitoring The New Performance Monitoring
More informationwhat operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored?
Inside the CPU how does the CPU work? what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored? some short, boring programs to illustrate the
More informationCase for storage. Outline. Magnetic disks. CS2410: Computer Architecture. Storage systems. Sangyeun Cho
Case for storage CS24: Computer Architecture Storage systems Sangyeun Cho Computer Science Department Shift in focus from computation to communication & storage of information Eg, Cray Research/Thinking
More informationSwitch Fabric Implementation Using Shared Memory
Order this document by /D Switch Fabric Implementation Using Shared Memory Prepared by: Lakshmi Mandyam and B. Kinney INTRODUCTION Whether it be for the World Wide Web or for an intra office network, today
More informationCISC, RISC, and DSP Microprocessors
CISC, RISC, and DSP Microprocessors Douglas L. Jones ECE 497 Spring 2000 4/6/00 CISC, RISC, and DSP D.L. Jones 1 Outline Microprocessors circa 1984 RISC vs. CISC Microprocessors circa 1999 Perspective:
More informationMulti-Threading Performance on Commodity Multi-Core Processors
Multi-Threading Performance on Commodity Multi-Core Processors Jie Chen and William Watson III Scientific Computing Group Jefferson Lab 12000 Jefferson Ave. Newport News, VA 23606 Organization Introduction
More information14.1 Rent-or-buy problem
CS787: Advanced Algorithms Lecture 14: Online algorithms We now shift focus to a different kind of algorithmic problem where we need to perform some optimization without knowing the input in advance. Algorithms
More informationHow NAND Flash Threatens DRAM
How NAND Flash Threatens DRAM Jim Handy OBJECTIVE ANALYSIS Outline Why even think about DRAM vs. NAND? The memory/storage hierarchy What benchmarks tell us What about 3D XPoint memory? The system of the
More informationChapter 2: Computer-System Structures. Computer System Operation Storage Structure Storage Hierarchy Hardware Protection General System Architecture
Chapter 2: Computer-System Structures Computer System Operation Storage Structure Storage Hierarchy Hardware Protection General System Architecture Operating System Concepts 2.1 Computer-System Architecture
More information(Refer Slide Time: 00:01:16 min)
Digital Computer Organization Prof. P. K. Biswas Department of Electronic & Electrical Communication Engineering Indian Institute of Technology, Kharagpur Lecture No. # 04 CPU Design: Tirning & Control
More informationChapter 9: Peripheral Devices: Magnetic Disks
Chapter 9: Peripheral Devices: Magnetic Disks Basic Disk Operation Performance Parameters and History of Improvement Example disks RAID (Redundant Arrays of Inexpensive Disks) Improving Reliability Improving
More information