CS61: Systems Programming and Machine Organization Harvard University, Fall 2009 Lecture 9: Memory and Storage Technologies October 1, 2009
Announcements Lab 3 has been released! You are welcome to switch partners In fact, we often suggest working alone Start early on Lab 3!!! 2
Topics for today Memory: SRAM, DRAM, different memory technologies Disks: How they work, why they're so slow I/O and memory buses: Connecting things together 3
What's inside a computer anyway? 4
Audio Ethernet USB PCIe (graphics) PCI slots Northbridge CPU Southbridge Memory slots IDE SATA Floppy Power 5
What's inside a computer anyway? http://en.wikipedia.org/wiki/file:motherboard_diagram.svg 6
Random-Access Memory (RAM) Key features RAM is traditionally packaged as a chip. Basic storage unit is normally a cell (one bit per cell). Multiple RAM chips form a memory. Static RAM (SRAM) Each cell stores a bit with a four or six-transistor circuit. Retains value indefinitely, as long as it is kept powered. Relatively insensitive to electrical noise (EMI), radiation, etc. Faster and more expensive than DRAM can be 100x more expensive! Dynamic RAM (DRAM) Each cell stores bit with a capacitor. One transistor is used for access Stored electric charge decays over time -- must be refreshed every 10-100 ms. More sensitive to disturbances (EMI, radiation, ) than SRAM. Slower, but cheaper than SRAM. 7
Conventional DRAM Organization d x w DRAM: d*w total bits organized as d supercells of size w bits 16 x 8 DRAM chip cols 0 1 2 3 0 2 bits / addr 1 rows memory controller 2 supercell (2,1) (to CPU) 3 8 bits / data internal row buffer 8
Reading DRAM Supercell (2,1) Step 1(a): Row access strobe (RAS) selects row 2. Step 1(b): Row 2 copied from DRAM array to row buffer. 16 x 8 DRAM chip cols 0 RAS = 2 2 / 1 2 3 0 addr 1 rows memory controller 2 8 / 3 data internal row buffer 9
Reading DRAM Supercell (2,1) Step 2(a): Column access strobe (CAS) selects column 1. Step 2(b): Supercell (2,1) copied from buffer to data lines, and eventually back to the CPU. 16 x 8 DRAM chip cols 0 CAS = 1 2 / 1 2 3 0 addr To CPU 1 rows memory controller supercell (2,1) 2 8 / 3 data supercell (2,1) internal row buffer 10
Memory Modules addr (row = i, col = j) : supercell (i,j) DRAM 0 64 MB memory module consisting of eight 8Mx8 DRAMs DRAM 7 bits bits bits 56-63 48-55 40-47 63 56 55 48 47 40 39 bits bits bits bits 32-39 24-31 16-23 8-15 32 31 24 23 16 15 8 7 bits 0-7 0 Memory controller 64-bit doubleword at main memory address A 64-bit doubleword 11
DRAM packaging Dual In-Line Package (DIP) Single In-Line Pin Package (SIPP) Single In-Line Memory Module (SIMM) 30-pin 72-pin Double In-Line Memory Module (DIMM) 168-pin Double Data Rate (DDR) DIMM (184-pin) http://en.wikipedia.org/wiki/file:ram_n.jpg 12
Enhanced DRAMs DRAM Cores with better interface logic and faster I/O : Synchronous DRAM (SDRAM) Uses a conventional clock signal instead of asynchronous control Double data-rate synchronous DRAM (DDR SDRAM) Double edge clocking sends two bits per cycle per pin RamBus DRAM (RDRAM) Uses faster signaling over fewer wires with a transaction oriented interface protocol Obsolete Technologies : Fast page mode DRAM (FPM DRAM) Allowed re-use of row-addresses Extended data out DRAM (EDO DRAM) Enhanced FPM DRAM with more closely spaced CAS signals. Video RAM (VRAM) Dual ported FPM DRAM with a second, concurrent, serial interface Extra functionality DRAMS (CDRAM, GDRAM) Added SRAM (CDRAM) and support for graphics operations (GDRAM) 13
Nonvolatile Memories DRAM and SRAM are volatile memories Lose information if powered off. Nonvolatile memories retain value even if powered off Read-only memory (ROM): programmed during production Magnetic RAM (MRAM): stores bit magnetically (in development) Ferro-electric RAM (FERAM): uses a ferro-electric dielectric Programmable ROM (PROM): can be programmed once Eraseable PROM (EPROM): can be bulk erased (UV, X-Ray) Electrically eraseable PROM (EEPROM): electronic erase capability Flash memory: EEPROMs with partial (sector) erase capability Uses for Nonvolatile Memories Firmware programs stored in a ROM (BIOS, controllers for disks, network cards, graphics accelerators, security subsystems, ) Solid state disks (flash cards, memory sticks, etc.) Smart cards, embedded systems, appliances Disk caches 14
Traditional Bus Structure Connecting CPU and Memory A bus is a collection of parallel wires that carry address, data, and control signals. Buses are typically shared by multiple devices. CPU chip register file ALU system bus bus interface North bridge memory bus main memory 15
Memory Read Transaction (1) CPU places address A on the memory bus. register file %eax Load operation: movl A, %eax ALU North bridge bus interface A main memory 0 x A 16
Memory Read Transaction (2) Main memory reads A from the memory bus, retrieves word x, and places it on the bus. register file %eax Load operation: movl A, %eax ALU North bridge bus interface x main memory 0 x A 17
Memory Read Transaction (3) CPU read word x from the bus and copies it into register %eax. register file %eax x Load operation: movl A, %eax ALU North bridge bus interface main memory 0 x A 18
Memory Write Transaction (1) CPU places address A on bus. Main memory reads it and waits for the corresponding data word to arrive. register file %eax y Store operation: movl %eax, A ALU North bridge bus interface A main memory 0 A 19
Memory Write Transaction (2) CPU places data word y on the bus. register file %eax y Store operation: movl %eax, A ALU main memory North bridge bus interface y 0 A 20
Memory Write Transaction (3) Main memory reads data word y from the bus and stores it at address A. register file %eax y Store operation: movl %eax, A ALU North bridge bus interface main memory 0 y A 21
Errors happen! Electrical or magnetic interference can cause bits to flip from 1 to 0 (or vice versa) Can be caused by cosmic rays passing through the memory chips. Error rates go up as densities increase: Up to one bit error per GB per HOUR One solution: Error-Correcting Code (ECC) RAM Contains redundant information used to detect and correct bit errors: Hamming code. Can detect and correct single bit errors; can detect (but not correct) 2-bit errors Costs more: need more memory chips to store ECC information Slower: Requires time to check and correct bit errors 22
What happens to DRAM when you turn off the power? The contents are erased... right? Not so fast. 23
What happens to DRAM when you turn off the power? The contents are erased... right? Not so fast. 30 sec 60 sec 5 min http://citp.princeton.edu/memory/ 24
A Disk Primer Disks consist of one or more platters divided into tracks Each platter may have one or two heads that perform read/write operations Each track consists of multiple sectors The set of sectors across all platters is a cylinder Platter Aperture Sector Heads Track 25
Disk Operation (Single-Platter View) The disk surface spins at a fixed rotational rate The read/write head is attached to the end of the arm and flies over the disk surface on a thin cushion of air. spindle spindle spindle spindle By moving radially, the arm can position the read/write head over any track. 26
Disk Operation (Multi-Platter View) read/write heads move in unison from cylinder to cylinder arm spindle 27
Hard Disk Evolution IBM 305 RAMAC (1956) First commercially produced hard drive 5 Mbyte capacity, 50 platters each 24 in diameter! 28
Hard Disk Evolution 29
Disk access time Command overhead: Seek time: Time for head position to stabilize on the selected track Rotational latency: Time to move disk arm to the appropriate track Depends on how fast you can physically move the disk arm These times are not improving rapidly! Settle time: Time to issue I/O, get the HDD to start responding, select appropriate head Time for the appropriate sector to move under the disk arm Depends on the rotation speed of the disk (e.g., 7200 RPM) Transfer time Time to transfer a sector to/from the disk controller Depends on density of bits on disk and RPM of disk rotation Faster for tracks near the outer edge of the disk why? Modern drives have more sectors on the outer tracks! 30
Example disk characteristics IBM Ultrastar 36XP drive form factor: 3.5 capacity: 36.4 GB rotation rate: 7,200 RPM (120 RPS, musical note C3) platters: 10 surfaces: 20 sector size: 512-732 bytes cylinders: 11,494 cache: 4MB transfer rate: 17.9 MB/s (inner) 28.9 MB/s (outer) full seek: 14.5 ms head switch: 0.3 ms Disk interface speeds SCSI: From 5 MB/sec to 320 MB/sec ATA: from 33 MB/sec to 100 MB/sec Serial ATA (single wire): Starting at 150 MB/sec Firewire: 50 MB/sec 31
Disk Access Time Example Given: Rotational rate = 7,200 RPM Average seek time = 9 ms. Avg # sectors/track = 400. Derived: T(avg rotation) = 1/2 x (60 secs/7200 RPM) x 1000 ms/sec = 4 ms. T(avg transfer) = 60/7200 RPM x 1/400 sectors/track x 1000 ms/sec = 0.02 ms T(access) = 9 ms + 4 ms + 0.02 ms = 13.02 ms total Important points: Access time dominated by seek time and rotational latency. First bit in a sector is the most expensive, the rest are free. SRAM access time is about 4 ns/doubleword, DRAM about 60 ns Disk is about 40,000 times slower than SRAM, 2,500 times slower then DRAM. 32
Disks are messy and slow Low-level interface for reading and writing sectors Access times are still very slow Generally allow OS to read/write an entire sector at a time No notion of files or directories -- just raw sectors So, what do you do if you need to write a single byte to a file? Disk may have numerous bad blocks OS may need to mask this from filesystem Disk seek times are around 10 ms Although raw throughput has increased dramatically Compare to several nanosec to access main memory Requires careful scheduling of I/O requests Tom's Hardware Disk Comparisons 33
What about flash disks? (SSDs) Non-volatile, solid state storage Wicked expensive: about $2.50/GB versus ~$0.25/GB for HDD. Can read and write individual bytes at a time However, must erase a whole block before writing to it Limited number of erase/write cycles No moving parts! Fast access times (about 0.1 msec!!) Lower power (no moving parts) Most flash on the market today can withstand up to 1 million erase/write cycles This has an impact on how the OS manages storage. 34
Logical Disk Blocks Modern disks present a simpler abstract view of the complex sector geometry: The set of available sectors is modeled as a sequence of b-sized logical blocks (0, 1, 2,...) Mapping between logical blocks and actual (physical) sectors Maintained by hardware/firmware device called disk controller. Converts requests for logical blocks into (surface,track,sector) triples. Allows controller to set aside spare cylinders for each zone. Accounts for the difference in formatted capacity and maximum capacity. 35
ATA and IDE Interfaces IDE stands for Integrated (or Intelligent ) Drive Electronics Enhanced IDE (EIDE) and ATA-2 Faster version of ATA/IDE that supports Direct Memory Access (DMA) transfers Ultra ATA: Speed enhancements to ATA standard Same as ATA (Advanced Technology Attachment) Standard interface to hard drives that integrate a drive controller in the drive itself 1 or 2 drives on a chain Versions running at 33, 66, and 100 Mbytes/sec Serial ATA: Emerging standard using a serial (not parallel) interface Speeds starting at 150 Mbyte/sec Can drive longer cables at much higher clock speeds than parallel cable Rounded parallel ATA Serial ATA Parallel ATA 36
SCSI Interface Standard hardware interface to wide range of I/O devices Access model using logical blocks on disk Supported up to 8 devices on a single bus Lots of problems with termination: required physical connector on end of cable to avoid signal refraction! SCSI-2: The next generation On-disk controller maps logical block # to sector/track/head combination SCSI-1: 8-bit bus, 5 Mhz = 5 Mbytes/sec max speed Disks, CDs, DVDs, tapes, etc. Bus-based design: single shared set of I/O lines that all devices connect to Fast SCSI: 10 Mhz clock speed Wide SCSI: 16 bit bus width Fast wide SCSI: 10 Mhz + 16 bit bus = 20 MB/sec throughput SCSI-3: Ramping up on speed and bus width Highest speed now is Ultra320 SCSI : 160 Mhz x 16 bits = 320 MB/sec max speed 37
Relative Interconnect Speeds (from macspeedzone.com) 38
I/O Bus CPU chip register file ALU system bus memory bus main memory North bridge bus interface South bridge I/O bus USB controller mouse keyboard graphics adapter disk controller Expansion slots for other devices such as network adapters. monitor disk 39
Reading a Disk Sector (1) CPU chip register file ALU CPU initiates a disk read by writing a command, logical block number, and destination memory address to a port (address) associated with disk controller. main memory bus interface I/O bus USB controller mouse keyboard graphics adapter disk controller monitor disk 40
Reading a Disk Sector (2) CPU chip Disk controller reads the sector and performs a direct memory access (DMA) transfer into main memory. register file ALU main memory bus interface I/O bus USB controller mouse keyboard graphics adapter disk controller monitor disk 41
Reading a Disk Sector (3) CPU chip register file ALU When the DMA transfer completes, the disk controller notifies the CPU with an interrupt (i.e., asserts a special interrupt pin on the CPU) main memory bus interface I/O bus USB controller mouse keyboard graphics adapter disk controller monitor disk 42
Storage Trends SRAM metric 1980 1985 1990 1995 2000 2005 2005:1980 $/MB access (ns) 19,200 300 2,900 150 320 35 256 15 100 12 75 10 256 x 30 x 1980 1985 1990 1995 2000 2005 2005:1980 $/MB 8,000 access (ns) 375 typical size(mb) 0.064 880 200 0.256 100 100 4 30 70 16 1 60 64 0.20 50 1,000 40,000 x 8x 15,000 x 1985 1990 1995 2000 2005 2005:1980 100 75 10 8 28 160 0.30 10 1,000 0.05 8 9,000 0.001 10,000 x 4 22 x 400,000 400,000 x DRAM metric Disk metric 1980 $/MB 500 access (ms) 87 typical size(mb) 1 43
CPU Clock Rates 1980 1985 1990 1995 processor 8080 286 386 Pentium P-III P-4 clock rate(mhz) 1 6 20 150 3,000 2000 750 2005 2005:1980 3,000 x 44
The CPU-Memory Gap The gap widens between DRAM, disk, and CPU speeds. 100,000,000 10,000,000 1,000,000 100,000 10,000 ns Disk seek time DRAM access time SRAM access time CPU cycle time 1,000 100 10 1 0 1980 1985 1990 1995 2000 2005 Year 45
Next lecture Dynamic Memory Allocation: Getting Started on Lab 3! 46