Problem 1 (3 parts, 20 points) Pipelining Speedup

Suppose a program running on a RISC machine executes 16,000,000 instructions. The total time it takes to execute an instruction is 200 ns, independent of the clock cycle time. The total amount of work that needs to be performed on each instruction is infinitely divisible, so there may be any number of pipeline stages.

1a) [6 points] Complete the table below by computing the stage time, total execution time, and speedup (relative to the non-pipelined case) for the different pipelining depths. Ignore all hazards (i.e., assume ideal pipelining for this part). Neglect stage time increases caused by pipeline register delays, etc., for this part.

Pipeline Depth    Stage Time    Total Execution Time    Pipeline Speedup
1                 200 ns        3.2 sec                 1
2                 100 ns        1.6 sec                 2
4                 50 ns         0.8 sec                 4
8                 25 ns         0.4 sec                 8

Execution Time = 16 M instructions * 1 cycle/instruction * stage_time

1b) [6 points] Now suppose pipeline register delays and processor control overhead add 10 ns to the latency of each pipeline stage. (So, for example, if there are four pipeline stages, each instruction will have an execution latency of 240 ns and the pipelined machine produces 1 instruction every 60 ns.) What is the maximum speedup that can be obtained through pipelining? Assume there are no hazards (ideal pipelining).

Shortest possible stage time = 10 ns (as the pipeline depth grows, the useful work per stage shrinks toward zero and only the 10 ns overhead remains)
Total execution time = 16 M instructions * 1 cycle/instruction * 10 ns = 0.16 sec
Original execution time = 3.2 sec
Speedup = 3.2 / 0.16 = 20

OR

Shortest possible stage time = 10 ns
Maximum throughput is 1 instruction per 10 ns. Compare this to the throughput of the non-pipelined case, 1 instruction per 200 ns:
(1 / 10 ns) / (1 / 200 ns) = 20

Maximum Speedup: 20
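The arithmetic in 1a and 1b can be checked with a short Python sketch. The function names and the `overhead_ns` parameter are illustrative choices, not part of the exam; the baseline is taken to be the non-pipelined 200 ns machine with no overhead, which is what the 3.2 sec figure above assumes.

```python
INSTRUCTIONS = 16_000_000   # instruction count from the problem statement
WORK_NS = 200.0             # total work per instruction, in ns


def stage_time_ns(depth, overhead_ns=0.0):
    """Stage time when the 200 ns of work is split evenly across `depth` stages."""
    return WORK_NS / depth + overhead_ns


def execution_time_s(depth, overhead_ns=0.0):
    """Ideal pipelining: one instruction completes per cycle (CPI = 1)."""
    return INSTRUCTIONS * stage_time_ns(depth, overhead_ns) * 1e-9


def speedup(depth, overhead_ns=0.0):
    """Speedup relative to the non-pipelined machine (depth 1, no overhead)."""
    return execution_time_s(1) / execution_time_s(depth, overhead_ns)


# Part 1a: ideal pipelining, no per-stage overhead.
for depth in (1, 2, 4, 8):
    print(depth, stage_time_ns(depth), execution_time_s(depth), speedup(depth))

# Part 1b: with 10 ns of overhead per stage the speedup approaches,
# but never reaches, 200 / 10 = 20 as the depth grows.
for depth in (4, 20, 200, 2000):
    print(depth, speedup(depth, overhead_ns=10.0))
```

With the overhead included, the speedup works out to 200 / (200/n + 10) for n stages, which is why the limit is 20 rather than unbounded.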

1c) [8 points] Now take into account stalls caused by hazards in the pipeline. Complete the table below using the average stall cycles per instruction listed for each pipeline depth. Ignore stage time increases caused by pipeline register delays, control overhead, etc., for this part.

Pipeline Depth    Average # Stall Cycles/Instruction    Stage Time    Total Execution Time    Pipeline Speedup
1                 0.0                                   200 ns        3.2 sec                 1
2                 0.6                                   100 ns        2.56 sec                1.25
4                 1.4                                   50 ns         1.92 sec                1.67
8                 4.1                                   25 ns         2.04 sec                1.57

Execution Time = Instruction_Count * CPI * stage_time, where CPI = 1 + average stall cycles per instruction

16 M instructions * 1.6 CPI * 100 ns = 2.56 sec
16 M instructions * 2.4 CPI * 50 ns = 1.92 sec
16 M instructions * 5.1 CPI * 25 ns = 2.04 sec
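
The 1c table can be reproduced with the sketch below, which assumes CPI = 1 + average stall cycles per instruction and no per-stage overhead, as the problem states. The names `STALLS`, `stalled_time_s`, and `stalled_speedup` are illustrative only.

```python
INSTRUCTIONS = 16_000_000
WORK_NS = 200.0

# Average stall cycles per instruction for each pipeline depth (from the table).
STALLS = {1: 0.0, 2: 0.6, 4: 1.4, 8: 4.1}


def stalled_time_s(depth, stalls_per_instruction):
    """Execution time = instruction count * CPI * stage time."""
    cpi = 1.0 + stalls_per_instruction
    stage_ns = WORK_NS / depth
    return INSTRUCTIONS * cpi * stage_ns * 1e-9


def stalled_speedup(depth, stalls_per_instruction):
    """Speedup relative to the non-pipelined, stall-free machine."""
    return stalled_time_s(1, 0.0) / stalled_time_s(depth, stalls_per_instruction)


for depth, stalls in STALLS.items():
    print(depth, round(stalled_time_s(depth, stalls), 2),
          round(stalled_speedup(depth, stalls), 2))

# The depth with the highest speedup once stalls are counted.
best_depth = max(STALLS, key=lambda d: stalled_speedup(d, STALLS[d]))
print("best depth:", best_depth)
```

The point the table makes is that deeper is not always faster: the 8-stage pipeline loses to the 4-stage one because its stall cycles grow faster than its stage time shrinks.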

Problem 2 (5 parts, 20 points) Video CD-ROMs

Consider a video CD-ROM system. Suppose it takes 1 byte per pixel to represent the pixel's color and a single image frame in a movie contains 16K pixels (the size of the frame is approximately 128 pixels by 128 pixels). A CD-ROM drive (1X) has a 150 KB/second transfer rate and a total storage capacity of 600 MB per disk. A typical flicker-free movie must run at 30 frames/second.

2a) [4 points] How many frames per second can be provided with a 1X CD-ROM? (show work)

150 KB/sec / 16 KB/frame = 9.375
9.375 frames/second.

2b) [4 points] In general, an nX CD-ROM drive spins n times as fast and provides n times the transfer rate of a 1X CD-ROM. (For example, a 2X CD-ROM drive spins the CD-ROM twice as fast and has twice the transfer rate.) How many times faster than the original (1X) CD-ROM do we need to spin our CD-ROM to get the transfer rate necessary for a flicker-free movie? (show work)

9.375 frames/sec * N = 30 frames/sec; N = 3.2, round up to 4X
OR
30 frames/sec * 16 KB/frame = 480 KB/sec; 480 / 150 = 3.2; round up to 4X
4X CD-ROM.

2c) [4 points] How many frames of a movie can be stored on a CD-ROM (1X)? (show work)

600 MB / 16 KB/frame = 37.5 K frames
37,500 frames.

2d) [4 points] If we are able to run our movie at 30 frames per second, how many minutes of a movie can be stored on the CD-ROM? (show work)

37.5 K frames / 30 frames/sec = 1250 sec = 20.83 min
20.83 minutes.

2e) [4 points] How much data compression do we need to do to fit a 120 minute movie on the CD-ROM? (show work)

120 / 20.83 = 5.76X; round up to 6X
6 times reduction in the data.
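
The Problem 2 figures can be recomputed with the sketch below. It assumes 1 MB = 1000 KB, which is the convention that reproduces the 37.5 K frame count above; the variable names are illustrative.

```python
import math

FRAME_KB = 16            # 16K pixels at 1 byte per pixel
RATE_1X_KB_S = 150       # 1X transfer rate in KB/sec
CAPACITY_KB = 600_000    # 600 MB, assuming 1 MB = 1000 KB
TARGET_FPS = 30          # flicker-free frame rate
MOVIE_MINUTES = 120

# 2a: frames per second a 1X drive can deliver.
fps_1x = RATE_1X_KB_S / FRAME_KB

# 2b: required speed multiple, rounded up to a whole "nX" drive.
required_multiple = TARGET_FPS * FRAME_KB / RATE_1X_KB_S
drive_speed = math.ceil(required_multiple)

# 2c: frames that fit on one disk.
frames_on_disk = CAPACITY_KB / FRAME_KB

# 2d: playing time at 30 frames/sec, in minutes.
minutes_on_disk = frames_on_disk / TARGET_FPS / 60

# 2e: compression factor needed for a 120 minute movie.
compression = MOVIE_MINUTES / minutes_on_disk

print(fps_1x, required_multiple, drive_speed)
print(frames_on_disk, round(minutes_on_disk, 2), round(compression, 2))
```

Note that the disk capacity does not depend on the drive speed, so the 2d and 2e results hold for the 4X drive as well.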

Problem 3 (2 parts, 15 points) The Hazards of Multi-cycle Functional Units

Consider the following program fragment executing on a basic 5-stage DLX pipeline with all stages taking 1 cycle, except the Execute stage, which takes a variable number of cycles, depending on the functional unit used:

Functional unit              Number of EX cycles
Integer ALU                  1
Floating Point Add           5
Floating Point Load/Store    2
Floating Point Multiply      3

Assume registers are written in the first half of the clock cycle and read in the second half.

3a) [9 points] Suppose the instructions enter the pipeline in order, with a new instruction starting on each cycle. (That is, assume there is no hazard detection mechanism being used and no stalls are introduced to avoid hazards.) Determine which data hazards occur in executing this program fragment. Indicate the hazards as in the following example: if there is a WAR (antidependence) hazard between instructions 3 and 4, involving register F8, and 3 precedes 4, put 3(F8) in the WAR column to the right of instruction 4. An instruction may cause more than one hazard. Assume there are no instructions previous to instruction 1.

Instruction               RAW (true)      WAR (anti)    WAW (output)
1: SUBF  F1, F2, F3       -               -             -
2: ADDF  F1, F4, F5       -               -             -
3: MULTF F6, F3, F1       2(F1)           -             -
4: SF    100(R1), F6      3(F6)           -             -
5: LF    F1, 0(R1)        -               -             2(F1)
6: ADDF  F2, F1, F6       3(F6), 5(F1)    -             -

3b) [6 points] In this part, assume there is no forwarding hardware. If the instructions are executed in order, determine how many stall cycles are required for each instruction (i.e., how many bubbles must be inserted BEFORE each instruction to avoid all data hazards).

Instruction               Number of Stalls
1: SUBF  F1, F2, F3       0
2: ADDF  F1, F4, F5       0
3: MULTF F6, F3, F1       6
4: SF    100(R1), F6      4
5: LF    F1, 0(R1)        0
6: ADDF  F2, F1, F6       3
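
The hazard table in 3a can be cross-checked with a small timing model. This is only a sketch under assumptions that the problem does not spell out: instruction i is fetched in cycle i, reads its source registers in ID (cycle i + 1), spends n cycles in EX, then one cycle each in MEM and WB, and writes its destination in WB. A RAW hazard is reported only against the most recent earlier writer of a register, and two writes landing in the same cycle are counted as a WAW hazard. All names in the code are illustrative.

```python
# EX latencies from the functional-unit table.
EX_CYCLES = {"SUBF": 5, "ADDF": 5, "MULTF": 3, "SF": 2, "LF": 2}

# (opcode, destination register or None, source registers)
PROGRAM = [
    ("SUBF", "F1", ("F2", "F3")),
    ("ADDF", "F1", ("F4", "F5")),
    ("MULTF", "F6", ("F3", "F1")),
    ("SF", None, ("R1", "F6")),
    ("LF", "F1", ("R1",)),
    ("ADDF", "F2", ("F1", "F6")),
]


def timing(index, opcode):
    """Return (ID cycle, WB cycle) for the instruction fetched in cycle `index`.

    Assumed layout: IF = index, ID = index + 1, EX = next n cycles, MEM, WB.
    """
    id_cycle = index + 1
    wb_cycle = index + 3 + EX_CYCLES[opcode]
    return id_cycle, wb_cycle


def find_hazards(program):
    """Map each instruction number to its RAW/WAR/WAW hazards, with no stalls."""
    times = [timing(i, op) for i, (op, _, _) in enumerate(program, start=1)]
    hazards = {i: {"RAW": [], "WAR": [], "WAW": []}
               for i in range(1, len(program) + 1)}
    for j, (_, dest_j, srcs_j) in enumerate(program, start=1):
        id_j, wb_j = times[j - 1]
        # RAW: the most recent earlier writer has not written back before j reads.
        # A write and a read in the same cycle are fine (write first half, read second).
        for reg in srcs_j:
            for i in range(j - 1, 0, -1):
                if program[i - 1][1] == reg:
                    if times[i - 1][1] > id_j:
                        hazards[j]["RAW"].append(f"{i}({reg})")
                    break
        if dest_j is None:
            continue
        for i in range(1, j):
            _, dest_i, srcs_i = program[i - 1]
            id_i, wb_i = times[i - 1]
            # WAR: j writes the register no later than the earlier i reads it.
            if dest_j in srcs_i and wb_j <= id_i:
                hazards[j]["WAR"].append(f"{i}({dest_j})")
            # WAW: j writes the register no later than the earlier i writes it.
            if dest_i == dest_j and wb_j <= wb_i:
                hazards[j]["WAW"].append(f"{i}({dest_j})")
    for entry in hazards.values():
        for kind in entry:
            entry[kind].sort()
    return hazards


hazards = find_hazards(PROGRAM)
for number, entry in hazards.items():
    print(number, entry)
```

Under this model no WAR hazards appear because every write happens several cycles after the instruction is fetched, long after any earlier instruction has read its operands in ID.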

Problem 4 (4 parts, 20 points) Disk Technology

Suppose we have a magnetic disk with the following parameters.

Controller overhead    3 ms
Average seek time      10 ms
Rotation rate          5400 revolutions/minute
Transfer rate          2.88 MB/second
# sectors per track    32 sectors/track
Sector size            1 KByte

4a) [5 points] What is the average time to read or write a single sector? (show work)

3 ms + 10 ms + ½(60/5400) sec + 1 KB / (2.88 MB/sec) = 3 + 10 + 5.56 + 0.35 ms = 18.9 ms
18.9 ms

4b) [5 points] What is the average time to read or write 16 KB in 16 consecutive sectors in the same cylinder? (show work)

3 ms + 10 ms + ½(60/5400) sec + 16 KB / (2.88 MB/sec) = 3 + 10 + 5.56 + 5.56 ms = 24.12 ms
24.12 ms

4c) [5 points] What is the average time to read or write an entire track (32 consecutive KBytes)? Assume sectors can be read or written in any order. (show work)

Because the sectors can be transferred in any order, the transfer can start with whichever sector arrives under the head first, so the rotational latency is 0 ms.
3 ms + 10 ms + 0 ms + 32 KB / (2.88 MB/sec) = 3 + 10 + 11.11 ms = 24.11 ms
24.11 ms

4d) [5 points] Now suppose we have an array of 8 of these magnetic disks. The disks are synchronized so that the arms on all the disks are always over the same track and the same sector within the track. The data is striped across the disks in the array so that 8 consecutive sectors can be read in parallel. What is the average time to read or write 16 consecutive KB in the disk array system? (show work)

Each disk transfers 16 KB / 8 = 2 KB.
3 ms + 10 ms + ½(60/5400) sec + 2 KB / (2.88 MB/sec) = 3 + 10 + 5.56 + 2(0.35) ms = 19.25 ms
19.25 ms
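
The four access times can be recomputed with one helper, sketched below. It assumes 1 MB = 1000 KB (so 2.88 MB/sec is 2.88 KB/ms) and uses unrounded intermediate values, so a result may differ in the last digit from the hand-rounded sums above. The function and variable names are illustrative.

```python
CONTROLLER_MS = 3.0
SEEK_MS = 10.0
RPM = 5400
TRANSFER_KB_PER_MS = 2.88          # 2.88 MB/sec, assuming 1 MB = 1000 KB
DISKS_IN_ARRAY = 8

# Average rotational latency is half a revolution, in ms.
HALF_ROTATION_MS = 0.5 * 60_000 / RPM


def access_time_ms(kilobytes, rotational_ms=HALF_ROTATION_MS):
    """Controller overhead + seek + rotational latency + transfer time."""
    transfer_ms = kilobytes / TRANSFER_KB_PER_MS
    return CONTROLLER_MS + SEEK_MS + rotational_ms + transfer_ms


single_sector = access_time_ms(1)                       # 4a, about 18.9 ms
sixteen_sectors = access_time_ms(16)                    # 4b, about 24.1 ms
full_track = access_time_ms(32, rotational_ms=0.0)      # 4c, no rotational wait
striped_16kb = access_time_ms(16 / DISKS_IN_ARRAY)      # 4d, 2 KB per disk

for label, value in [("4a", single_sector), ("4b", sixteen_sectors),
                     ("4c", full_track), ("4d", striped_16kb)]:
    print(label, round(value, 2), "ms")
```

Striping shortens only the transfer term; controller overhead, seek time, and rotational latency are paid in full, which is why the array saves roughly 5 ms rather than giving an 8x improvement.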