Lecture 14: Memory Hierarchy. Housekeeping

Size: px
Start display at page:

Download "Lecture 14: Memory Hierarchy. Housekeeping"

Transcription

1 S 16 L Lecture 14: Memory Hierarchy James C. Hoe Department of ECE Carnegie Mellon University Housekeeping S 16 L14 2 Your goal today big picture intro to memory subsystem understand the general concept of memory hierarchy Notices I don t teach SRAM/DRAM/Flash except that they exist one faster than another one more capacity than another one cheaper (per bit) than another I also don t teach delay line, Selectron, drum, or core Readings P&H Ch6 for the next many lectures

2 Wishful Memory S 16 L14 3 So far we imagined a program sees a contiguous 4GB memory access anywhere in memory in 1 proc. cycle We are in good company Burks, Goldstein, von Neumann, 1946 The Reality S 16 L14 4 Can t afford and don t need as much memory as the size of the user address space (think about 64 bit ISAs) Most machines are multi tasked between several programs You can t find memory technology that is affordable in GByte and also cycle in GHz The magic memory abstraction are nevertheless very useful approximation of reality due to memory hierarchy: large and fast virtual memory: contiguous and private

3 The Law of Storage S 16 L14 5 Bigger is slower SRAM 512 sub nanosec SRAM nanosec DRAM ~50 nanosec Hard Disk ~10 millisec Faster is more expensive (dollars and chip area) SRAM ~$10K per GByte DRAM ~$10 per GByte Hard Disk ~$0.1 per GByte ***Note*** these sample values scale with time How to make memory Bigger, Faster, and Cheaper? S 16 L14 6 Principles behind the solution

4 #1: The Locality Principle One s recent past is a very good predictor of one s near future. Temporal Locality: If you just did something, it is very likely that you will do the same thing again soon since you are here today, there is a good chance you will be here again and again regularly inverse is also true Spatial Locality: If you just did something, it is very likely you will do something similar/related every time I find you in this room, you are probably sitting in the same seat (or nearby) you are probably sitting near the same people S 16 L14 7 Programs are even more predictable than people #1: Memory Locality Typical programs have strong locality in memory references *** typical programs are composed of loops Temporal: programs tend to reference (read and write) the same memory location many times and within a small window of time Spatial: programs tend to reference a cluster of near by memory locations (most notable examples 1. instruction memory references and 2. array/data structure references) Corollary: a program may reference a large number of different memory locations over its lifetime but not all around the same time S 16 L14 8

5 #2: Memoization If something is expensive to compute, remember the answer for a while, just in case it is needed again, soon Memoization needs locality to be effective Without locality storing a large number of different answers (many of which never reused) locating an answer from a large number of stored answers can be more expensive than recomputing it With locality small number of answers gets reused all the time! store a small number of frequent answers can avoid most recomputations S 16 L14 9 #3: Cost Amortization S 16 L14 10 overhead cost : one time cost to set something up per unit cost : cost for per unit of operation total cost = overhead + per unit cost x N average cost = total cost / N = ( overhead / N ) + per unit cost the essence of amortization It is often okay to have a high overhead cost if the cost can be distributed over a large number of units lower the average cost

6 S 16 L14 11 Putting the principles to work Memory Hierarchy S 16 L14 12 move what you use here fast small with strong locality appears as fast as and as large as faster per byte cheaper per byte backup everything here big but slow

7 Managing Memory Hierarchy S 16 L14 13 Manage data movement across hierarchies manually vacuum tubes vs Selectron discussed in von Neumann paper core vs drum memory in the 50 s too painful for programmers on substantial programs still done in some embedded processors on chip scratchpad SRAM in lieu of a cache Automatic management simple heuristic: keep most recently used items in fast mem dates back to ATLAS, 1962 in every modern computer (fast processor, slow DRAM) average programmer doesn t need to know about it You don t need to know how big the cache is to write a correct program! You may if you want a fast program. Memory Abstraction Modern Memory Hierarchy Register File 32 words, sub nsec L1 cache ~32 KB, ~nsec S 16 L14 14 manual register spilling L2 cache 512 KB ~ 1MB, many nsec L3 cache,... automatic cache management Main memory (DRAM), GB, ~100 nsec Swap Disk 100 GB~TB, ~10 msec automatic demand paging

8 S 16 L14 15 Hierarchical Performance Analysis For a given memory hierarchy level i it has a technologyintrinsic access time of t i Perceived access time T i is longer than t i Except for the outer most hierarchy, when looking up a given address a chance (hit rate h i ) you hit and access time is t i a chance (miss rate m i ) you miss and access time t i +T i+1 h i + m i = 1 Thus T i = h i t i + m i (t i + T i+1 ) T i = t i + m i T i+1 think this of as miss penalty **Note**, h i and m i are defined to be the hit rate and miss rate of just the references that missed at L i 1 Hierarchy Design Compromises S 16 L14 16 Recursive latency equation T i = t i + m i T i+1 The goal: achieve desired T 1 within allowed cost T i t i is desirable but not necessary Keep m i low increase capacity C i lowers m i, but increases t i lower m i by smarter management, e.g., replacement: anticipate what you don t need prefetching: anticipate what you will need Keep T i+1 low faster lower hierarchies help, but at increased cost and/or reduced capacity often better to introduce intermediate hierarchies

9 Hierarchy Design Considerations S 16 L14 17 DRAM optimized for capacity/dollar T DRAM is essentially same regardless of capacity SRAM optimized first for capacity/latency; second for capacity/dollar different compromise between capacity and latency possible t i = O( C i ) Hierarchies bridge the difference between CPU speed and DRAM speed T pclk T DRAM no hierarchy needed T pclk << T DRAM one or more levels of SRAM hierarchies to minimize T 1 while staying within cost Intel P4 Example (very fast, very deep pipeline) 90nm P4, 3.6 GHz if m 1 =0.1, m 2 =0.1 L1 D cache T 1 =7.6, T 2 =36 C 1 = 16K if m 1 =0.01, m 2 =0.01 t 1 = 4 cyc int / 9 cycle fp T 1 =4.2, T 2 =19.8 (note: pipelined) L2 D cache if m 1 =0.05, m 2 =0.01 C 2 =1024 KB T 1 =5.00, T 2 =19.8 t 2 = 18 cyc int / 18 cyc fp if m 1 =0.01, m 2 =0.50 Main memory T 1 =5.08, T 2 =108 t 3 = ~ 50ns or 180 cyc Notice best case latency is not 1 anymore Why not? worst case access latency are into 300+ cyc, depending exactly what happens S 16 L14 18

10 Aside: Why is DRAM slow? S 16 L14 19 DRAM fabrication at the forefront of VLSI technology, but scaled with Moore s law in capacity and cost, not speed Between 1980 ~ 2004 DRAM 64K bit 1024M bit (exponential ~55% annual) 250ns 50ns (linear) But, remember, this is a very deliberate choice. We can engineer faster DRAM if we needed to Memory capacity needs to grow linearly with CPU speed to keep a balanced system Amdahl s Other Law DRAM/processor speed difference reconciled through memory hierarchies (L1, L2, L3,...) L2 became common place in the 1990s L3 became common place in the 2000s Pop Quiz S 16 L14 20 What does the principle of say about hierarchical memory? Memory Locality Memoization Amortization

11 S 16 L14 21 Cache Design Basics Cache S 16 L14 22 Generically in computing, any structure that memoizes frequently used results to avoid repeating the long latency operations required to reproduce the results from scratch, e.g. a web cache In computer architecture, an automatically managed memory hierarchy (usually) based on SRAM memoize in a small SRAM the most frequently accessed DRAM memory locations to avoid repeatedly paying for the DRAM access latency

12 Cache Interface for Dummies ready MemWrite S 16 L14 23 Instruction address Instruction memory Instruction valid Address Write data Data memory Read data valid MemRead Like the magic memory we assumed earlier present address, R/W command, etc most of the time result or update valid after a short/fixed latency (1 cyc?) Except, cache may not be valid/ready on every cycle eventually will become valid/ready but what happens to the pipeline until then? [Based on figures from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.] The Basic Problem S 16 L14 24 Potentially M=2 m bytes of memory, how to keep the most frequently used ones in C bytes of fast storage where C << M Basic issues (intertwined) (1) where to cache a memory location? (2) how to find a cached memory location? (3) granularity of management: large, small, uniform? (4) when to bring a memory location into cache? (5) which cached memory location to evict to free up space? Optimizations

13 address Basic Operation S 16 L14 25 (2) cache lookup (1, 3, 5) yes hit? no choose location occupied? no yes return data update cache fetch new from L i+1 evict old to L i+1 data Ans to (4): memory location brought into cache on demand. What about prefetch? Basic Cache Parameters S 16 L14 26 Let M = 2 m be the size of the address space in bytes sample values: 2 32, 2 64 Let G=2 g be the cache access granularity in bytes sample values: 4, 8 Let C be the capacity of the cache in bytes sample values: 16 KByte (L1), 1 MByte (L2) 100% hit rate working set size (W) C

14 Direct Mapped Cache (v1) lg 2 M bit address tag idx g Tag Bank Data Bank S 16 L14 27 t bits lg 2 (C/G) bits C/G lines by t bits valid C/G lines by G bytes t bits = What about writes? G bytes let t= lg 2 M lg 2 (C) hit? data Storage Overhead S 16 L14 28 For each cache block of G bytes, must also store additional t+1 bits where t=lg 2 M lg 2 (C) if M=2 32, G=4, C=16K=2 14 t=18 bits for each 4 byte block 60% storage overhead 16KB cache really needs 25.5KB of SRAM Solution: let multiple G byte words share a common tag each B byte block holds B/G words if M=2 32, B=16, G=4, C=16K t=18 bits for each 16 byte block 15% storage overhead 16KB cache needs 18.4KB of SRAM 15% of 16KB is small, 15% of 1MB is 152KB larger block size for lower/larger hierarchies

15 Direct Mapped Cache (final) lg 2 M bit address tag idx bo g S 16 L14 29 lg 2 (C/B) bits Tag Bank C/B by t bits valid Data Bank C/B by B bytes t bits lg 2 (B/G) bits = t bits B bytes let t= lg 2 M lg 2 (C) hit? G bytes data Direct Mapped Cache C bytes of storage divided into C/B blocks A block of memory is mapped to one particular cache block according the address block index field All addresses with the same block index field map to the same cache block 2 t such addresses; can cache only one such block at a time even if C > working set size, collision is possible given 2 random addresses, chance for collision is 1/(C/B) Notice likelihood for collision decreases with increasing number of cache blocks (C/B) 100% S 16 L14 30 hit rate working set size (W) C

16 Block Size and m i S 16 L14 31 Bytes that share a common tag are all in or all out Loading a multi word block at a time has the effect of prefetching for spatial locality pay miss penalty only once per block works especially well in instruction caches effective up to the limit of spatial locality But, increasing block size (while holding C constant) reduces the number of blocks increases possibility for collision hit rate B Block Size and T i+1 S 16 L14 32 Loading a large block can increase T i+1 if I want the last word on a block, I have to wait for the entire block to be loaded solution 1 critical word first reload L i+1 returns the requested word first then rotate around the complete block supply requested word to pipeline as soon as available solution 2: sub blocking individual valid bits for different sub blocks reload only requested sub block on demand note: all sub blocks stall share common tag tag v s block 0 v s block 1 v s block 2

361 Computer Architecture Lecture 14: Cache Memory

361 Computer Architecture Lecture 14: Cache Memory 1 361 Computer Architecture Lecture 14 Memory cache.1 The Motivation for s Memory System Processor DRAM Motivation Large memories (DRAM) are slow Small memories (SRAM) are fast Make the average access

More information

The Quest for Speed - Memory. Cache Memory. A Solution: Memory Hierarchy. Memory Hierarchy

The Quest for Speed - Memory. Cache Memory. A Solution: Memory Hierarchy. Memory Hierarchy The Quest for Speed - Memory Cache Memory CSE 4, Spring 25 Computer Systems http://www.cs.washington.edu/4 If all memory accesses (IF/lw/sw) accessed main memory, programs would run 20 times slower And

More information

The Big Picture. Cache Memory CSE 675.02. Memory Hierarchy (1/3) Disk

The Big Picture. Cache Memory CSE 675.02. Memory Hierarchy (1/3) Disk The Big Picture Cache Memory CSE 5.2 Computer Processor Memory (active) (passive) Control (where ( brain ) programs, Datapath data live ( brawn ) when running) Devices Input Output Keyboard, Mouse Disk,

More information

Price/performance Modern Memory Hierarchy

Price/performance Modern Memory Hierarchy Lecture 21: Storage Administration Take QUIZ 15 over P&H 6.1-4, 6.8-9 before 11:59pm today Project: Cache Simulator, Due April 29, 2010 NEW OFFICE HOUR TIME: Tuesday 1-2, McKinley Last Time Exam discussion

More information

Slide Set 8. for ENCM 369 Winter 2015 Lecture Section 01. Steve Norman, PhD, PEng

Slide Set 8. for ENCM 369 Winter 2015 Lecture Section 01. Steve Norman, PhD, PEng Slide Set 8 for ENCM 369 Winter 2015 Lecture Section 01 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary Winter Term, 2015 ENCM 369 W15 Section

More information

what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored?

what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored? Inside the CPU how does the CPU work? what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored? some short, boring programs to illustrate the

More information

COMPUTER HARDWARE. Input- Output and Communication Memory Systems

COMPUTER HARDWARE. Input- Output and Communication Memory Systems COMPUTER HARDWARE Input- Output and Communication Memory Systems Computer I/O I/O devices commonly found in Computer systems Keyboards Displays Printers Magnetic Drives Compact disk read only memory (CD-ROM)

More information

Bindel, Spring 2010 Applications of Parallel Computers (CS 5220) Week 1: Wednesday, Jan 27

Bindel, Spring 2010 Applications of Parallel Computers (CS 5220) Week 1: Wednesday, Jan 27 Logistics Week 1: Wednesday, Jan 27 Because of overcrowding, we will be changing to a new room on Monday (Snee 1120). Accounts on the class cluster (crocus.csuglab.cornell.edu) will be available next week.

More information

Computer Architecture

Computer Architecture Cache Memory Gábor Horváth 2016. április 27. Budapest associate professor BUTE Dept. Of Networked Systems and Services ghorvath@hit.bme.hu It is the memory,... The memory is a serious bottleneck of Neumann

More information

Introduction Disks RAID Tertiary storage. Mass Storage. CMSC 412, University of Maryland. Guest lecturer: David Hovemeyer.

Introduction Disks RAID Tertiary storage. Mass Storage. CMSC 412, University of Maryland. Guest lecturer: David Hovemeyer. Guest lecturer: David Hovemeyer November 15, 2004 The memory hierarchy Red = Level Access time Capacity Features Registers nanoseconds 100s of bytes fixed Cache nanoseconds 1-2 MB fixed RAM nanoseconds

More information

Memory Hierarchy. Arquitectura de Computadoras. Centro de Investigación n y de Estudios Avanzados del IPN. adiaz@cinvestav.mx. MemoryHierarchy- 1

Memory Hierarchy. Arquitectura de Computadoras. Centro de Investigación n y de Estudios Avanzados del IPN. adiaz@cinvestav.mx. MemoryHierarchy- 1 Hierarchy Arturo Díaz D PérezP Centro de Investigación n y de Estudios Avanzados del IPN adiaz@cinvestav.mx Hierarchy- 1 The Big Picture: Where are We Now? The Five Classic Components of a Computer Processor

More information

Memory Management CS 217. Two programs can t control all of memory simultaneously

Memory Management CS 217. Two programs can t control all of memory simultaneously Memory Management CS 217 Memory Management Problem 1: Two programs can t control all of memory simultaneously Problem 2: One program shouldn t be allowed to access/change the memory of another program

More information

Choosing a Computer for Running SLX, P3D, and P5

Choosing a Computer for Running SLX, P3D, and P5 Choosing a Computer for Running SLX, P3D, and P5 This paper is based on my experience purchasing a new laptop in January, 2010. I ll lead you through my selection criteria and point you to some on-line

More information

In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller

In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller In-Memory Databases Algorithms and Data Structures on Modern Hardware Martin Faust David Schwalb Jens Krüger Jürgen Müller The Free Lunch Is Over 2 Number of transistors per CPU increases Clock frequency

More information

COMPUTER ORGANIZATION ARCHITECTURES FOR EMBEDDED COMPUTING

COMPUTER ORGANIZATION ARCHITECTURES FOR EMBEDDED COMPUTING COMPUTER ORGANIZATION ARCHITECTURES FOR EMBEDDED COMPUTING 2013/2014 1 st Semester Sample Exam January 2014 Duration: 2h00 - No extra material allowed. This includes notes, scratch paper, calculator, etc.

More information

COS 318: Operating Systems. Virtual Memory and Address Translation

COS 318: Operating Systems. Virtual Memory and Address Translation COS 318: Operating Systems Virtual Memory and Address Translation Today s Topics Midterm Results Virtual Memory Virtualization Protection Address Translation Base and bound Segmentation Paging Translation

More information

Writing Assignment #2 due Today (5:00pm) - Post on your CSC101 webpage - Ask if you have questions! Lab #2 Today. Quiz #1 Tomorrow (Lectures 1-7)

Writing Assignment #2 due Today (5:00pm) - Post on your CSC101 webpage - Ask if you have questions! Lab #2 Today. Quiz #1 Tomorrow (Lectures 1-7) Overview of Computer Science CSC 101 Summer 2011 Main Memory vs. Auxiliary Storage Lecture 7 July 14, 2011 Announcements Writing Assignment #2 due Today (5:00pm) - Post on your CSC101 webpage - Ask if

More information

Computer Organization and Architecture. Characteristics of Memory Systems. Chapter 4 Cache Memory. Location CPU Registers and control unit memory

Computer Organization and Architecture. Characteristics of Memory Systems. Chapter 4 Cache Memory. Location CPU Registers and control unit memory Computer Organization and Architecture Chapter 4 Cache Memory Characteristics of Memory Systems Note: Appendix 4A will not be covered in class, but the material is interesting reading and may be used in

More information

Lecture 17: Virtual Memory II. Goals of virtual memory

Lecture 17: Virtual Memory II. Goals of virtual memory Lecture 17: Virtual Memory II Last Lecture: Introduction to virtual memory Today Review and continue virtual memory discussion Lecture 17 1 Goals of virtual memory Make it appear as if each process has:

More information

Board Notes on Virtual Memory

Board Notes on Virtual Memory Board Notes on Virtual Memory Part A: Why Virtual Memory? - Letʼs user program size exceed the size of the physical address space - Supports protection o Donʼt know which program might share memory at

More information

Secondary Storage. Any modern computer system will incorporate (at least) two levels of storage: magnetic disk/optical devices/tape systems

Secondary Storage. Any modern computer system will incorporate (at least) two levels of storage: magnetic disk/optical devices/tape systems 1 Any modern computer system will incorporate (at least) two levels of storage: primary storage: typical capacity cost per MB $3. typical access time burst transfer rate?? secondary storage: typical capacity

More information

Quiz for Chapter 6 Storage and Other I/O Topics 3.10

Quiz for Chapter 6 Storage and Other I/O Topics 3.10 Date: 3.10 Not all questions are of equal difficulty. Please review the entire quiz first and then budget your time carefully. Name: Course: Solutions in Red 1. [6 points] Give a concise answer to each

More information

Measuring Cache and Memory Latency and CPU to Memory Bandwidth

Measuring Cache and Memory Latency and CPU to Memory Bandwidth White Paper Joshua Ruggiero Computer Systems Engineer Intel Corporation Measuring Cache and Memory Latency and CPU to Memory Bandwidth For use with Intel Architecture December 2008 1 321074 Executive Summary

More information

Homework # 2. Solutions. 4.1 What are the differences among sequential access, direct access, and random access?

Homework # 2. Solutions. 4.1 What are the differences among sequential access, direct access, and random access? ECE337 / CS341, Fall 2005 Introduction to Computer Architecture and Organization Instructor: Victor Manuel Murray Herrera Date assigned: 09/19/05, 05:00 PM Due back: 09/30/05, 8:00 AM Homework # 2 Solutions

More information

Logical Operations. Control Unit. Contents. Arithmetic Operations. Objectives. The Central Processing Unit: Arithmetic / Logic Unit.

Logical Operations. Control Unit. Contents. Arithmetic Operations. Objectives. The Central Processing Unit: Arithmetic / Logic Unit. Objectives The Central Processing Unit: What Goes on Inside the Computer Chapter 4 Identify the components of the central processing unit and how they work together and interact with memory Describe how

More information

More on Pipelining and Pipelines in Real Machines CS 333 Fall 2006 Main Ideas Data Hazards RAW WAR WAW More pipeline stall reduction techniques Branch prediction» static» dynamic bimodal branch prediction

More information

18-548/15-548 Associativity 9/16/98. 7 Associativity. 18-548/15-548 Memory System Architecture Philip Koopman September 16, 1998

18-548/15-548 Associativity 9/16/98. 7 Associativity. 18-548/15-548 Memory System Architecture Philip Koopman September 16, 1998 7 Associativity 18-548/15-548 Memory System Architecture Philip Koopman September 16, 1998 Required Reading: Cragon pg. 166-174 Assignments By next class read about data management policies: Cragon 2.2.4-2.2.6,

More information

Solid State Drive Architecture

Solid State Drive Architecture Solid State Drive Architecture A comparison and evaluation of data storage mediums Tyler Thierolf Justin Uriarte Outline Introduction Storage Device as Limiting Factor Terminology Internals Interface Architecture

More information

ELE 356 Computer Engineering II. Section 1 Foundations Class 6 Architecture

ELE 356 Computer Engineering II. Section 1 Foundations Class 6 Architecture ELE 356 Computer Engineering II Section 1 Foundations Class 6 Architecture History ENIAC Video 2 tj History Mechanical Devices Abacus 3 tj History Mechanical Devices The Antikythera Mechanism Oldest known

More information

A3 Computer Architecture

A3 Computer Architecture A3 Computer Architecture Engineering Science 3rd year A3 Lectures Prof David Murray david.murray@eng.ox.ac.uk www.robots.ox.ac.uk/ dwm/courses/3co Michaelmas 2000 1 / 1 6. Stacks, Subroutines, and Memory

More information

Chapter 1 Computer System Overview

Chapter 1 Computer System Overview Operating Systems: Internals and Design Principles Chapter 1 Computer System Overview Eighth Edition By William Stallings Operating System Exploits the hardware resources of one or more processors Provides

More information

Why Computers Are Getting Slower (and what we can do about it) Rik van Riel Sr. Software Engineer, Red Hat

Why Computers Are Getting Slower (and what we can do about it) Rik van Riel Sr. Software Engineer, Red Hat Why Computers Are Getting Slower (and what we can do about it) Rik van Riel Sr. Software Engineer, Red Hat Why Computers Are Getting Slower The traditional approach better performance Why computers are

More information

Introduction to Cloud Computing

Introduction to Cloud Computing Introduction to Cloud Computing Parallel Processing I 15 319, spring 2010 7 th Lecture, Feb 2 nd Majd F. Sakr Lecture Motivation Concurrency and why? Different flavors of parallel computing Get the basic

More information

Virtual Memory. How is it possible for each process to have contiguous addresses and so many of them? A System Using Virtual Addressing

Virtual Memory. How is it possible for each process to have contiguous addresses and so many of them? A System Using Virtual Addressing How is it possible for each process to have contiguous addresses and so many of them? Computer Systems Organization (Spring ) CSCI-UA, Section Instructor: Joanna Klukowska Teaching Assistants: Paige Connelly

More information

CSE 160 Lecture 5. The Memory Hierarchy False Sharing Cache Coherence and Consistency. Scott B. Baden

CSE 160 Lecture 5. The Memory Hierarchy False Sharing Cache Coherence and Consistency. Scott B. Baden CSE 160 Lecture 5 The Memory Hierarchy False Sharing Cache Coherence and Consistency Scott B. Baden Using Bang coming down the home stretch Do not use Bang s front end for running mergesort Use batch,

More information

CHAPTER 7: The CPU and Memory

CHAPTER 7: The CPU and Memory CHAPTER 7: The CPU and Memory The Architecture of Computer Hardware, Systems Software & Networking: An Information Technology Approach 4th Edition, Irv Englander John Wiley and Sons 2010 PowerPoint slides

More information

Why Latency Lags Bandwidth, and What it Means to Computing

Why Latency Lags Bandwidth, and What it Means to Computing Why Latency Lags Bandwidth, and What it Means to Computing David Patterson U.C. Berkeley patterson@cs.berkeley.edu October 2004 Bandwidth Rocks (1) Preview: Latency Lags Bandwidth Over last 20 to 25 years,

More information

Physical Data Organization

Physical Data Organization Physical Data Organization Database design using logical model of the database - appropriate level for users to focus on - user independence from implementation details Performance - other major factor

More information

Lecture 1: the anatomy of a supercomputer

Lecture 1: the anatomy of a supercomputer Where a calculator on the ENIAC is equipped with 18,000 vacuum tubes and weighs 30 tons, computers of the future may have only 1,000 vacuum tubes and perhaps weigh 1½ tons. Popular Mechanics, March 1949

More information

Module 2. Embedded Processors and Memory. Version 2 EE IIT, Kharagpur 1

Module 2. Embedded Processors and Memory. Version 2 EE IIT, Kharagpur 1 Module 2 Embedded Processors and Memory Version 2 EE IIT, Kharagpur 1 Lesson 5 Memory-I Version 2 EE IIT, Kharagpur 2 Instructional Objectives After going through this lesson the student would Pre-Requisite

More information

CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen

CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen LECTURE 14: DATA STORAGE AND REPRESENTATION Data Storage Memory Hierarchy Disks Fields, Records, Blocks Variable-length

More information

Chapter 2 - Computer Organization

Chapter 2 - Computer Organization Chapter 2 - Computer Organization CPU organization Basic Elements and Principles Parallelism Memory Storage Hierarchy I/O Fast survey of devices Character Codes Ascii, Unicode Homework: Chapter 1 # 2,

More information

Memory Management Outline. Background Swapping Contiguous Memory Allocation Paging Segmentation Segmented Paging

Memory Management Outline. Background Swapping Contiguous Memory Allocation Paging Segmentation Segmented Paging Memory Management Outline Background Swapping Contiguous Memory Allocation Paging Segmentation Segmented Paging 1 Background Memory is a large array of bytes memory and registers are only storage CPU can

More information

Computers. Hardware. The Central Processing Unit (CPU) CMPT 125: Lecture 1: Understanding the Computer

Computers. Hardware. The Central Processing Unit (CPU) CMPT 125: Lecture 1: Understanding the Computer Computers CMPT 125: Lecture 1: Understanding the Computer Tamara Smyth, tamaras@cs.sfu.ca School of Computing Science, Simon Fraser University January 3, 2009 A computer performs 2 basic functions: 1.

More information

1. Memory technology & Hierarchy

1. Memory technology & Hierarchy 1. Memory technology & Hierarchy RAM types Advances in Computer Architecture Andy D. Pimentel Memory wall Memory wall = divergence between CPU and RAM speed We can increase bandwidth by introducing concurrency

More information

Outline: Operating Systems

Outline: Operating Systems Outline: Operating Systems What is an OS OS Functions Multitasking Virtual Memory File Systems Window systems PC Operating System Wars: Windows vs. Linux 1 Operating System provides a way to boot (start)

More information

CS:APP Chapter 4 Computer Architecture. Wrap-Up. William J. Taffe Plymouth State University. using the slides of

CS:APP Chapter 4 Computer Architecture. Wrap-Up. William J. Taffe Plymouth State University. using the slides of CS:APP Chapter 4 Computer Architecture Wrap-Up William J. Taffe Plymouth State University using the slides of Randal E. Bryant Carnegie Mellon University Overview Wrap-Up of PIPE Design Performance analysis

More information

Central Processing Unit (CPU)

Central Processing Unit (CPU) Central Processing Unit (CPU) CPU is the heart and brain It interprets and executes machine level instructions Controls data transfer from/to Main Memory (MM) and CPU Detects any errors In the following

More information

Input / Ouput devices. I/O Chapter 8. Goals & Constraints. Measures of Performance. Anatomy of a Disk Drive. Introduction - 8.1

Input / Ouput devices. I/O Chapter 8. Goals & Constraints. Measures of Performance. Anatomy of a Disk Drive. Introduction - 8.1 Introduction - 8.1 I/O Chapter 8 Disk Storage and Dependability 8.2 Buses and other connectors 8.4 I/O performance measures 8.6 Input / Ouput devices keyboard, mouse, printer, game controllers, hard drive,

More information

Quiz for Chapter 1 Computer Abstractions and Technology 3.10

Quiz for Chapter 1 Computer Abstractions and Technology 3.10 Date: 3.10 Not all questions are of equal difficulty. Please review the entire quiz first and then budget your time carefully. Name: Course: Solutions in Red 1. [15 points] Consider two different implementations,

More information

Chapter 7: Distributed Systems: Warehouse-Scale Computing. Fall 2011 Jussi Kangasharju

Chapter 7: Distributed Systems: Warehouse-Scale Computing. Fall 2011 Jussi Kangasharju Chapter 7: Distributed Systems: Warehouse-Scale Computing Fall 2011 Jussi Kangasharju Chapter Outline Warehouse-scale computing overview Workloads and software infrastructure Failures and repairs Note:

More information

18-742 Lecture 4. Parallel Programming II. Homework & Reading. Page 1. Projects handout On Friday Form teams, groups of two

18-742 Lecture 4. Parallel Programming II. Homework & Reading. Page 1. Projects handout On Friday Form teams, groups of two age 1 18-742 Lecture 4 arallel rogramming II Spring 2005 rof. Babak Falsafi http://www.ece.cmu.edu/~ece742 write X Memory send X Memory read X Memory Slides developed in part by rofs. Adve, Falsafi, Hill,

More information

An Introduction to Computer Science and Computer Organization Comp 150 Fall 2008

An Introduction to Computer Science and Computer Organization Comp 150 Fall 2008 An Introduction to Computer Science and Computer Organization Comp 150 Fall 2008 Computer Science the study of algorithms, including Their formal and mathematical properties Their hardware realizations

More information

Introduction to Microprocessors

Introduction to Microprocessors Introduction to Microprocessors Yuri Baida yuri.baida@gmail.com yuriy.v.baida@intel.com October 2, 2010 Moscow Institute of Physics and Technology Agenda Background and History What is a microprocessor?

More information

This Unit: Caches. CIS 501 Introduction to Computer Architecture. Motivation. Types of Memory

This Unit: Caches. CIS 501 Introduction to Computer Architecture. Motivation. Types of Memory This Unit: Caches CIS 5 Introduction to Computer Architecture Unit 3: Storage Hierarchy I: Caches Application OS Compiler Firmware CPU I/O Memory Digital Circuits Gates & Transistors Memory hierarchy concepts

More information

Memory ICS 233. Computer Architecture and Assembly Language Prof. Muhamed Mudawar

Memory ICS 233. Computer Architecture and Assembly Language Prof. Muhamed Mudawar Memory ICS 233 Computer Architecture and Assembly Language Prof. Muhamed Mudawar College of Computer Sciences and Engineering King Fahd University of Petroleum and Minerals Presentation Outline Random

More information

Design-space exploration of flash augmented architectures

Design-space exploration of flash augmented architectures Design-space exploration of flash augmented architectures Thanumalayan S 1, Vijay Chidambaram V 1, Ranjani Parthasarathi 2 College of Engineering, Guindy, Anna University Abstract Flash technologies are

More information

Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com

Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com CSCI-GA.3033-012 Graphics Processing Units (GPUs): Architecture and Programming Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com Modern GPU

More information

File System Management

File System Management Lecture 7: Storage Management File System Management Contents Non volatile memory Tape, HDD, SSD Files & File System Interface Directories & their Organization File System Implementation Disk Space Allocation

More information

Memory Basics. SRAM/DRAM Basics

Memory Basics. SRAM/DRAM Basics Memory Basics RAM: Random Access Memory historically defined as memory array with individual bit access refers to memory with both Read and Write capabilities ROM: Read Only Memory no capabilities for

More information

CS 61C: Great Ideas in Computer Architecture Virtual Memory Cont.

CS 61C: Great Ideas in Computer Architecture Virtual Memory Cont. CS 61C: Great Ideas in Computer Architecture Virtual Memory Cont. Instructors: Vladimir Stojanovic & Nicholas Weaver http://inst.eecs.berkeley.edu/~cs61c/ 1 Bare 5-Stage Pipeline Physical Address PC Inst.

More information

Computer Performance. Topic 3. Contents. Prerequisite knowledge Before studying this topic you should be able to:

Computer Performance. Topic 3. Contents. Prerequisite knowledge Before studying this topic you should be able to: 55 Topic 3 Computer Performance Contents 3.1 Introduction...................................... 56 3.2 Measuring performance............................... 56 3.2.1 Clock Speed.................................

More information

Introducción. Diseño de sistemas digitales.1

Introducción. Diseño de sistemas digitales.1 Introducción Adapted from: Mary Jane Irwin ( www.cse.psu.edu/~mji ) www.cse.psu.edu/~cg431 [Original from Computer Organization and Design, Patterson & Hennessy, 2005, UCB] Diseño de sistemas digitales.1

More information

Understanding Data Locality in VMware Virtual SAN

Understanding Data Locality in VMware Virtual SAN Understanding Data Locality in VMware Virtual SAN July 2014 Edition T E C H N I C A L M A R K E T I N G D O C U M E N T A T I O N Table of Contents Introduction... 2 Virtual SAN Design Goals... 3 Data

More information

Processor Architectures

Processor Architectures ECPE 170 Jeff Shafer University of the Pacific Processor Architectures 2 Schedule Exam 3 Tuesday, December 6 th Caches Virtual Memory Input / Output OperaKng Systems Compilers & Assemblers Processor Architecture

More information

Performance with the Oracle Database Cloud

Performance with the Oracle Database Cloud An Oracle White Paper September 2012 Performance with the Oracle Database Cloud Multi-tenant architectures and resource sharing 1 Table of Contents Overview... 3 Performance and the Cloud... 4 Performance

More information

Lecture 3: Evaluating Computer Architectures. Software & Hardware: The Virtuous Cycle?

Lecture 3: Evaluating Computer Architectures. Software & Hardware: The Virtuous Cycle? Lecture 3: Evaluating Computer Architectures Announcements - Reminder: Homework 1 due Thursday 2/2 Last Time technology back ground Computer elements Circuits and timing Virtuous cycle of the past and

More information

Communicating with devices

Communicating with devices Introduction to I/O Where does the data for our CPU and memory come from or go to? Computers communicate with the outside world via I/O devices. Input devices supply computers with data to operate on.

More information

COS 318: Operating Systems. Storage Devices. Kai Li Computer Science Department Princeton University. (http://www.cs.princeton.edu/courses/cos318/)

COS 318: Operating Systems. Storage Devices. Kai Li Computer Science Department Princeton University. (http://www.cs.princeton.edu/courses/cos318/) COS 318: Operating Systems Storage Devices Kai Li Computer Science Department Princeton University (http://www.cs.princeton.edu/courses/cos318/) Today s Topics Magnetic disks Magnetic disk performance

More information

Computer Architecture

Computer Architecture Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 11 Memory Management Computer Architecture Part 11 page 1 of 44 Prof. Dr. Uwe Brinkschulte, M.Sc. Benjamin

More information

Operating Systems. Virtual Memory

Operating Systems. Virtual Memory Operating Systems Virtual Memory Virtual Memory Topics. Memory Hierarchy. Why Virtual Memory. Virtual Memory Issues. Virtual Memory Solutions. Locality of Reference. Virtual Memory with Segmentation. Page

More information

Optimizing matrix multiplication Amitabha Banerjee abanerjee@ucdavis.edu

Optimizing matrix multiplication Amitabha Banerjee abanerjee@ucdavis.edu Optimizing matrix multiplication Amitabha Banerjee abanerjee@ucdavis.edu Present compilers are incapable of fully harnessing the processor architecture complexity. There is a wide gap between the available

More information

Random Access Memory (RAM) Types of RAM. RAM Random Access Memory Jamie Tees SDRAM. Micro-DIMM SO-DIMM

Random Access Memory (RAM) Types of RAM. RAM Random Access Memory Jamie Tees SDRAM. Micro-DIMM SO-DIMM Random Access Memory (RAM) Sends/Receives data quickly between CPU This is way quicker than using just the HDD RAM holds temporary data used by any open application or active / running process Multiple

More information

CS161: Operating Systems

CS161: Operating Systems CS161: Operating Systems Matt Welsh mdw@eecs.harvard.edu Lecture 18: RAID April 19, 2007 2007 Matt Welsh Harvard University 1 RAID Redundant Arrays of Inexpensive Disks Invented in 1986-1987 by David Patterson

More information

a storage location directly on the CPU, used for temporary storage of small amounts of data during processing.

a storage location directly on the CPU, used for temporary storage of small amounts of data during processing. CS143 Handout 18 Summer 2008 30 July, 2008 Processor Architectures Handout written by Maggie Johnson and revised by Julie Zelenski. Architecture Vocabulary Let s review a few relevant hardware definitions:

More information

SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011

SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011 SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications Jürgen Primsch, SAP AG July 2011 Why In-Memory? Information at the Speed of Thought Imagine access to business data,

More information

Architecture of Hitachi SR-8000

Architecture of Hitachi SR-8000 Architecture of Hitachi SR-8000 University of Stuttgart High-Performance Computing-Center Stuttgart (HLRS) www.hlrs.de Slide 1 Most of the slides from Hitachi Slide 2 the problem modern computer are data

More information

CS 6290 I/O and Storage. Milos Prvulovic

CS 6290 I/O and Storage. Milos Prvulovic CS 6290 I/O and Storage Milos Prvulovic Storage Systems I/O performance (bandwidth, latency) Bandwidth improving, but not as fast as CPU Latency improving very slowly Consequently, by Amdahl s Law: fraction

More information

Indexing on Solid State Drives based on Flash Memory

Indexing on Solid State Drives based on Flash Memory Indexing on Solid State Drives based on Flash Memory Florian Keusch MASTER S THESIS Systems Group Department of Computer Science ETH Zurich http://www.systems.ethz.ch/ September 2008 - March 2009 Supervised

More information

This Unit: Virtual Memory. CIS 501 Computer Architecture. Readings. A Computer System: Hardware

This Unit: Virtual Memory. CIS 501 Computer Architecture. Readings. A Computer System: Hardware This Unit: Virtual CIS 501 Computer Architecture Unit 5: Virtual App App App System software Mem CPU I/O The operating system (OS) A super-application Hardware support for an OS Virtual memory Page tables

More information

Thread level parallelism

Thread level parallelism Thread level parallelism ILP is used in straight line code or loops Cache miss (off-chip cache and main memory) is unlikely to be hidden using ILP. Thread level parallelism is used instead. Thread: process

More information

EFFICIENT EXTERNAL SORTING ON FLASH MEMORY EMBEDDED DEVICES

EFFICIENT EXTERNAL SORTING ON FLASH MEMORY EMBEDDED DEVICES ABSTRACT EFFICIENT EXTERNAL SORTING ON FLASH MEMORY EMBEDDED DEVICES Tyler Cossentine and Ramon Lawrence Department of Computer Science, University of British Columbia Okanagan Kelowna, BC, Canada tcossentine@gmail.com

More information

This Unit: Putting It All Together. CIS 501 Computer Architecture. Sources. What is Computer Architecture?

This Unit: Putting It All Together. CIS 501 Computer Architecture. Sources. What is Computer Architecture? This Unit: Putting It All Together CIS 501 Computer Architecture Unit 11: Putting It All Together: Anatomy of the XBox 360 Game Console Slides originally developed by Amir Roth with contributions by Milo

More information

INTRODUCTION The collection of data that makes up a computerized database must be stored physically on some computer storage medium.

INTRODUCTION The collection of data that makes up a computerized database must be stored physically on some computer storage medium. Chapter 4: Record Storage and Primary File Organization 1 Record Storage and Primary File Organization INTRODUCTION The collection of data that makes up a computerized database must be stored physically

More information

Concept of Cache in web proxies

Concept of Cache in web proxies Concept of Cache in web proxies Chan Kit Wai and Somasundaram Meiyappan 1. Introduction Caching is an effective performance enhancing technique that has been used in computer systems for decades. However,

More information

Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2

Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2 Lecture Handout Computer Architecture Lecture No. 2 Reading Material Vincent P. Heuring&Harry F. Jordan Chapter 2,Chapter3 Computer Systems Design and Architecture 2.1, 2.2, 3.2 Summary 1) A taxonomy of

More information

Parallel Programming Survey

Parallel Programming Survey Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture Shared-Memory

More information

Understanding the Benefits of IBM SPSS Statistics Server

Understanding the Benefits of IBM SPSS Statistics Server IBM SPSS Statistics Server Understanding the Benefits of IBM SPSS Statistics Server Contents: 1 Introduction 2 Performance 101: Understanding the drivers of better performance 3 Why performance is faster

More information

1 Storage Devices Summary

1 Storage Devices Summary Chapter 1 Storage Devices Summary Dependability is vital Suitable measures Latency how long to the first bit arrives Bandwidth/throughput how fast does stuff come through after the latency period Obvious

More information

CPU Organization and Assembly Language

CPU Organization and Assembly Language COS 140 Foundations of Computer Science School of Computing and Information Science University of Maine October 2, 2015 Outline 1 2 3 4 5 6 7 8 Homework and announcements Reading: Chapter 12 Homework:

More information

HY345 Operating Systems

HY345 Operating Systems HY345 Operating Systems Recitation 2 - Memory Management Solutions Panagiotis Papadopoulos panpap@csd.uoc.gr Problem 7 Consider the following C program: int X[N]; int step = M; //M is some predefined constant

More information

The continuum of data management techniques for explicitly managed systems

The continuum of data management techniques for explicitly managed systems The continuum of data management techniques for explicitly managed systems Svetozar Miucin, Craig Mustard Simon Fraser University MCES 2013. Montreal Introduction Explicitly Managed Memory systems lack

More information

Rambus Smart Data Acceleration

Rambus Smart Data Acceleration Rambus Smart Data Acceleration Back to the Future Memory and Data Access: The Final Frontier As an industry, if real progress is to be made towards the level of computing that the future mandates, then

More information

Chapter 2 Basic Structure of Computers. Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan

Chapter 2 Basic Structure of Computers. Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan Chapter 2 Basic Structure of Computers Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan Outline Functional Units Basic Operational Concepts Bus Structures Software

More information

The Reduced Address Space (RAS) for Application Memory Authentication

The Reduced Address Space (RAS) for Application Memory Authentication The Reduced Address Space (RAS) for Application Memory Authentication David Champagne, Reouven Elbaz and Ruby B. Lee Princeton University, USA Introduction Background: TPM, XOM, AEGIS, SP, SecureBlue want

More information

Parallel Algorithm Engineering

Parallel Algorithm Engineering Parallel Algorithm Engineering Kenneth S. Bøgh PhD Fellow Based on slides by Darius Sidlauskas Outline Background Current multicore architectures UMA vs NUMA The openmp framework Examples Software crisis

More information

TPC-W * : Benchmarking An Ecommerce Solution By Wayne D. Smith, Intel Corporation Revision 1.2

TPC-W * : Benchmarking An Ecommerce Solution By Wayne D. Smith, Intel Corporation Revision 1.2 TPC-W * : Benchmarking An Ecommerce Solution By Wayne D. Smith, Intel Corporation Revision 1.2 1 INTRODUCTION How does one determine server performance and price/performance for an Internet commerce, Ecommerce,

More information

Putting it all together: Intel Nehalem. http://www.realworldtech.com/page.cfm?articleid=rwt040208182719

Putting it all together: Intel Nehalem. http://www.realworldtech.com/page.cfm?articleid=rwt040208182719 Putting it all together: Intel Nehalem http://www.realworldtech.com/page.cfm?articleid=rwt040208182719 Intel Nehalem Review entire term by looking at most recent microprocessor from Intel Nehalem is code

More information

Multi-Threading Performance on Commodity Multi-Core Processors

Multi-Threading Performance on Commodity Multi-Core Processors Multi-Threading Performance on Commodity Multi-Core Processors Jie Chen and William Watson III Scientific Computing Group Jefferson Lab 12000 Jefferson Ave. Newport News, VA 23606 Organization Introduction

More information

Lecture 16: Storage Devices

Lecture 16: Storage Devices CS 422/522 Design & Implementation of Operating Systems Lecture 16: Storage Devices Zhong Shao Dept. of Computer Science Yale University Acknowledgement: some slides are taken from previous versions of

More information