[Figure: SEQ hardware. Fetch (PC, instruction memory, PC increment), Decode and Write back (register file with ports A, B, M, E; signals srcA, srcB, dstE, dstM), Execute (ALU with inputs ALU A and ALU B, ALU fun., CC, Bch), Memory (data memory with Addr, Data, Mem. control read/write, data out), and PC update (New PC). Signals: icode, ifun, rA, rB, valC, valP, valA, valB, valE, valM, newPC.]

rmmovl rA, D(rB)

Fetch
  icode:ifun <-- M1[PC]
  rA:rB <-- M1[PC+1]
  valC <-- M4[PC+2]
  valP <-- PC+6
Decode
  valA <-- R[rA]
  valB <-- R[rB]
Execute
  valE <-- valB + valC
Memory
  M4[valE] <-- valA
Write back
  (no register update)
PC update
  PC <-- valP
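The stage-by-stage steps for rmmovl can be traced in a few lines of Python. This is only a sketch under assumptions taken from the Y86 instruction encoding rather than from the slide itself: the first byte holds icode 4 and ifun 0, the second byte packs rA:rB, the displacement D is a 4-byte little-endian value, and registers are numbered %eax=0, %ecx=1, %edx=2, %ebx=3. The helper names (read_le, write_word, step_rmmovl) are hypothetical.

```python
def read_le(mem, addr, n):
    """Read n bytes starting at addr as a little-endian unsigned integer."""
    return int.from_bytes(bytes(mem.get(addr + i, 0) for i in range(n)), "little")


def write_word(mem, addr, value):
    """Store a 4-byte little-endian word at addr."""
    for i, b in enumerate((value & 0xFFFFFFFF).to_bytes(4, "little")):
        mem[addr + i] = b


def step_rmmovl(pc, regs, mem):
    """Run one rmmovl rA, D(rB) through the SEQ stages and return the new PC."""
    # Fetch
    byte0 = read_le(mem, pc, 1)
    icode, ifun = byte0 >> 4, byte0 & 0xF
    byte1 = read_le(mem, pc + 1, 1)
    rA, rB = byte1 >> 4, byte1 & 0xF
    valC = read_le(mem, pc + 2, 4)
    valP = pc + 6
    assert icode == 4 and ifun == 0, "sketch only handles rmmovl"
    # Decode
    valA = regs[rA]
    valB = regs[rB]
    # Execute
    valE = (valB + valC) & 0xFFFFFFFF
    # Memory
    write_word(mem, valE, valA)
    # Write back: rmmovl updates no register
    # PC update
    return valP


# rmmovl %edx, 8(%ebx) placed at address 0 (assumed encoding: 40 23 08 00 00 00)
mem = {0: 0x40, 1: 0x23, 2: 0x08, 3: 0x00, 4: 0x00, 5: 0x00}
regs = [0] * 8
regs[2] = 0x200  # %edx
regs[3] = 0x100  # %ebx
new_pc = step_rmmovl(0, regs, mem)
stored = read_le(mem, 0x108, 4)
```

With these inputs the word 0x200 lands at address 0x108 and the PC advances by 6 bytes.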
SEQ Operation #1

State
  PC register
  Cond. Code register
  Data memory
  Register file
  All updated as clock rises

Combinational Logic
  ALU
  Control logic
  Memory reads
    Instruction memory
    Register file
    Data memory

[Figure: combinational logic connected to CC, PC (0x00c), data memory, and register file through read and write ports]
SEQ Operation #2

Cycle 1:  0x000: irmovl $0x100,%ebx  # %ebx <-- 0x100
Cycle 2:  0x006: irmovl $0x200,%edx  # %edx <-- 0x200
Cycle 3:  0x00c: addl %edx,%ebx      # %ebx <-- 0x300 CC <-- 000
Cycle 4:  0x00e: je dest             # Not taken

[Figure: clock timeline for Cycles 1 to 4; state elements show CC = 100, PC = 0x00c, register file %ebx = 0x100]

State set according to second irmovl instruction
Combinational logic starting to react to state changes
SEQ Operation #3

Cycle 1:  0x000: irmovl $0x100,%ebx  # %ebx <-- 0x100
Cycle 2:  0x006: irmovl $0x200,%edx  # %edx <-- 0x200
Cycle 3:  0x00c: addl %edx,%ebx      # %ebx <-- 0x300 CC <-- 000
Cycle 4:  0x00e: je dest             # Not taken

[Figure: clock timeline for Cycles 1 to 4; state elements still hold CC = 100, PC = 0x00c, %ebx = 0x100, while the combinational logic presents the new values CC = 000, PC = 0x00e, %ebx = 0x300 at the write ports]

State set according to second irmovl instruction
Combinational logic generates results for addl instruction
SEQ Operation #4

Cycle 1:  0x000: irmovl $0x100,%ebx  # %ebx <-- 0x100
Cycle 2:  0x006: irmovl $0x200,%edx  # %edx <-- 0x200
Cycle 3:  0x00c: addl %edx,%ebx      # %ebx <-- 0x300 CC <-- 000
Cycle 4:  0x00e: je dest             # Not taken

[Figure: clock timeline for Cycles 1 to 4; state elements show CC = 000, PC = 0x00e, register file %ebx = 0x300]

State set according to addl instruction
Combinational logic starting to react to state changes
SEQ Operation #5

Cycle 1:  0x000: irmovl $0x100,%ebx  # %ebx <-- 0x100
Cycle 2:  0x006: irmovl $0x200,%edx  # %edx <-- 0x200
Cycle 3:  0x00c: addl %edx,%ebx      # %ebx <-- 0x300 CC <-- 000
Cycle 4:  0x00e: je dest             # Not taken

[Figure: clock timeline for Cycles 1 to 4; state elements hold CC = 000, PC = 0x00e, %ebx = 0x300, while the combinational logic presents the new value PC = 0x013]

State set according to addl instruction
Combinational logic generates results for je instruction
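The clocking discipline in the SEQ Operation slides (combinational logic computes new values, state changes only when the clock rises) can be mimicked with a pure next-state function. The sketch below is illustrative only: the tuple-based instruction table, the initial CC of 100 (ZF=1, SF=0, OF=0), and the jump target 0x100 are assumptions, since the slides do not give the value of dest.

```python
# Each entry: address -> (opcode, operand a, operand b, instruction length in bytes)
PROGRAM = {
    0x000: ("irmovl", 0x100, "ebx", 6),
    0x006: ("irmovl", 0x200, "edx", 6),
    0x00C: ("addl", "edx", "ebx", 2),
    0x00E: ("je", 0x100, None, 5),  # 0x100 is a hypothetical value for dest
}


def next_state(state):
    """Combinational logic: derive the next state without touching the current one."""
    pc, cc, regs = state["pc"], state["cc"], state["regs"]
    op, a, b, length = PROGRAM[pc]
    new_regs = dict(regs)
    new_cc = cc
    new_pc = pc + length
    if op == "irmovl":
        new_regs[b] = a
    elif op == "addl":
        result = (regs[a] + regs[b]) & 0xFFFFFFFF
        new_regs[b] = result
        sign_a, sign_b, sf = regs[a] >> 31, regs[b] >> 31, result >> 31
        zf = int(result == 0)
        of = int(sign_a == sign_b and sf != sign_a)
        new_cc = (zf, sf, of)
    elif op == "je":
        if cc[0] == 1:  # taken only when ZF is set
            new_pc = a
    return {"pc": new_pc, "cc": new_cc, "regs": new_regs}


# states[k] is the state visible at the beginning of cycle k + 1.
states = [{"pc": 0x000, "cc": (1, 0, 0), "regs": {"ebx": 0, "edx": 0}}]
for _ in range(4):
    # Rising clock edge: the computed values replace the stored state.
    states.append(next_state(states[-1]))
```

Here states[2] corresponds to the start of cycle 3 (PC = 0x00c, CC = 100, %ebx = 0x100) and states[3] to the start of cycle 4 (PC = 0x00e, CC = 000, %ebx = 0x300); after the untaken je the PC becomes 0x013.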
SEQ Summary

Implementation
  Express every instruction as series of simple steps
  Follow same general flow for each instruction type
  Assemble registers, memories, predesigned combinational blocks
  Connect with control logic

Limitations
  Too slow to be practical
  In one cycle, must propagate through instruction memory, register file, ALU, and data memory
  Would need to run clock very slowly
  Hardware units only active for fraction of clock cycle
Pipeline Overview

General Principles of Pipelining
  Goal
  Difficulties

Creating a Pipelined Y86 Processor
  Rearranging SEQ
  Inserting pipeline registers
  Problems with data and control hazards
Real-World Pipelines: Car Washes

[Figure: three car-wash layouts labeled Sequential, Parallel, and Pipelined]

Idea
  Divide process into independent stages
  Move objects through stages in sequence
  At any given time, multiple objects being processed
Computational Example

[Figure: combinational logic (300 ps) followed by a register (20 ps), driven by a clock]

Delay = 320 ps
Throughput = 3.12 GOPS (Giga-Operations per Second)

System
  Computation requires total of 300 picoseconds
  Additional 20 picoseconds to save result in register
  Must have clock cycle of at least 320 ps

Delay = Latency = 320 ps = 1/Throughput
3-Way Pipelined Version

[Figure: three stages A, B, C, each 100 ps of combinational logic followed by a 20 ps register, driven by a clock]

Delay = 360 ps
Throughput = 8.33 GOPS

System
  Divide combinational logic into 3 blocks of 100 ps each
  Can begin new operation as soon as previous one passes through stage A
    Begin new operation every 120 ps
  Overall latency increases
    360 ps from start to finish
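The delay and throughput figures for the unpipelined and 3-way pipelined systems follow from two simple relations: the clock cycle is the slowest stage plus the register delay, and latency is the cycle time multiplied by the number of stages. A minimal Python sketch (the function name pipeline_timing is hypothetical) reproduces the numbers:

```python
def pipeline_timing(stage_delays_ps, register_delay_ps=20):
    """Return (cycle time in ps, latency in ps, throughput in GOPS)."""
    cycle_ps = max(stage_delays_ps) + register_delay_ps
    latency_ps = cycle_ps * len(stage_delays_ps)
    throughput_gops = 1000.0 / cycle_ps  # one operation per cycle; 1 / ps = 1000 GOPS
    return cycle_ps, latency_ps, throughput_gops


unpipelined = pipeline_timing([300])          # single 300 ps block
three_way = pipeline_timing([100, 100, 100])  # three 100 ps blocks

print(unpipelined)
print(three_way)
```

The unpipelined system yields a 320 ps cycle, 320 ps latency, and 3.125 GOPS (shown truncated as 3.12 on the slide); the 3-way version yields a 120 ps cycle, 360 ps latency, and about 8.33 GOPS.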
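Pipeline Diagrams

Unpipelined
  [Figure: OP1, OP2, OP3 executed one after another along the Time axis]
  Cannot start new operation until previous one completes

3-Way Pipelined
  [Figure: OP1, OP2, OP3 overlapped along the Time axis]
  Up to 3 operations in process simultaneously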
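Operating a Pipeline

[Figure: clock waveform and timeline (0, 120, 240, 360, 480, 640) for OP1, OP2, OP3 flowing through stages A, B, C (100 ps of logic plus a 20 ps register each), with snapshots of the pipeline at times 239, 241, 300, and 359]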
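The start and finish times in a pipeline diagram can be tabulated directly. The sketch below assumes one new operation enters per clock cycle and that every operation passes through all stages without stalling; the function name schedule is hypothetical.

```python
def schedule(num_ops, num_stages, cycle_ps):
    """Return a (start, finish) time pair in ps for each operation."""
    return [(i * cycle_ps, (i + num_stages) * cycle_ps) for i in range(num_ops)]


pipelined = schedule(num_ops=3, num_stages=3, cycle_ps=120)
unpipelined = schedule(num_ops=3, num_stages=1, cycle_ps=320)

for name, (start, finish) in zip(["OP1", "OP2", "OP3"], pipelined):
    print(f"{name}: starts at {start} ps, finishes at {finish} ps")
```

In the 3-way pipeline OP2 starts at 120 ps and OP3 at 240 ps, while OP1 is still in flight until 360 ps. In the unpipelined system OP2 cannot start before 320 ps.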
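Limitations: Nonuniform Delays

[Figure: three-stage pipeline A, B, C with unequal combinational delays, each stage followed by a 20 ps register; pipeline diagram for OP1, OP2, OP3 along the Time axis]

Delay = 510 ps
Throughput = 5.88 GOPS

Throughput limited by slowest stage
Other stages sit idle for much of the time
Challenging to partition system into balanced stages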
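To see why the slowest stage sets the pace, the sketch below computes cycle time, latency, throughput, and per-stage idle time. The stage delays of 50, 150, and 100 ps are an assumption: the individual values did not survive in the figure, but this split is consistent with the stated 510 ps delay and 5.88 GOPS throughput.

```python
REGISTER_DELAY_PS = 20
stage_delays_ps = [50, 150, 100]  # assumed split of the 300 ps of logic

cycle_ps = max(stage_delays_ps) + REGISTER_DELAY_PS
latency_ps = cycle_ps * len(stage_delays_ps)
throughput_gops = 1000.0 / cycle_ps

# Time each stage's logic sits idle in every cycle, waiting for the slowest stage.
idle_ps = [max(stage_delays_ps) - d for d in stage_delays_ps]

print(cycle_ps, latency_ps, round(throughput_gops, 2), idle_ps)
```

Under this assumption stage A is idle for 100 ps of every 170 ps cycle, which illustrates why balanced partitioning matters.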
Limitations: Register Overhead

Delay = 420 ps, Throughput = 14.29 GOPS

As try to deepen pipeline, overhead of loading registers becomes more significant
Percentage of clock cycle spent loading register:
  1-stage pipeline: 6.25%
  3-stage pipeline: 16.67%
  6-stage pipeline: 28.57%
High speeds of modern processor designs obtained through very deep pipelining
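The overhead percentages can be checked with a short calculation. The sketch assumes the 300 ps of combinational logic divides evenly among the stages and that each pipeline register costs 20 ps, as in the earlier examples; the function name register_overhead is hypothetical.

```python
TOTAL_LOGIC_PS = 300
REGISTER_DELAY_PS = 20


def register_overhead(num_stages):
    """Return (cycle time in ps, percent of cycle spent loading the register)."""
    cycle_ps = TOTAL_LOGIC_PS / num_stages + REGISTER_DELAY_PS
    return cycle_ps, 100.0 * REGISTER_DELAY_PS / cycle_ps


overheads = {k: register_overhead(k) for k in (1, 3, 6)}
six_stage_cycle_ps = overheads[6][0]
six_stage_latency_ps = 6 * six_stage_cycle_ps
six_stage_throughput_gops = 1000.0 / six_stage_cycle_ps

for k, (cycle_ps, percent) in overheads.items():
    print(f"{k}-stage pipeline: cycle {cycle_ps:.0f} ps, register overhead {percent:.2f}%")
```

Going from 3 to 6 stages raises throughput from about 8.33 to about 14.29 GOPS, less than double, because the fixed 20 ps register delay takes a growing share of each cycle.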
What could possibly go wrong?

1  irmovl $50, %eax
2  addl %eax, %ebx
3  mrmovl 100(%ebx), %edx
Data Dependencies

[Figure: combinational logic and register with the result fed back as input; OP1, OP2, OP3 along the Time axis]

System
  Each operation depends on result from preceding one
Data Hazards

[Figure: three-stage pipeline A, B, C with feedback; OP1, OP2, OP3, OP4 along the Time axis]

Result does not feed back around in time for next operation
Pipelining has changed behavior of system
Data Dependencies in Processors

1  irmovl $50, %eax
2  addl %eax, %ebx
3  mrmovl 100(%ebx), %edx

Result from one instruction used as operand for another
  Read-after-write (RAW) dependency
Very common in actual programs
Must make sure our pipeline handles these properly
  Get correct results
  Minimize performance impact
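The read-after-write dependencies in the three-instruction example can be found mechanically by tracking which instruction last wrote each register. The sketch below is illustrative: the source and destination register sets are written out by hand for these three instructions, and the function name find_raw_dependencies is hypothetical.

```python
# (line number, text, registers read, registers written)
INSTRUCTIONS = [
    (1, "irmovl $50, %eax", set(), {"eax"}),
    (2, "addl %eax, %ebx", {"eax", "ebx"}, {"ebx"}),
    (3, "mrmovl 100(%ebx), %edx", {"ebx"}, {"edx"}),
]


def find_raw_dependencies(instructions):
    """Return (writer line, reader line, register) for each RAW dependency."""
    last_writer = {}  # register -> line number of the most recent write
    dependencies = []
    for line, _text, sources, destinations in instructions:
        for reg in sorted(sources):
            if reg in last_writer:
                dependencies.append((last_writer[reg], line, reg))
        for reg in destinations:
            last_writer[reg] = line
    return dependencies


raw = find_raw_dependencies(INSTRUCTIONS)
for writer, reader, reg in raw:
    print(f"instruction {reader} reads %{reg} written by instruction {writer}")
```

Instruction 2 depends on instruction 1 through %eax, and instruction 3 depends on instruction 2 through %ebx, so each instruction needs a result its immediate predecessor has not yet written back in a simple pipeline.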