Parallel and Distributed Programming

Performance

- Two main goals to be achieved with the design of parallel applications are:
  - Performance: the capacity to reduce the time to solve the problem when the computing resources increase;
  - Scalability: the capacity to increase performance when the complexity, or size of the problem, increases.
- The main factors limiting the performance and the scalability of an application are:
  - Architectural Limitations
  - Algorithmic Limitations

Factors Limiting Performance

- Architectural Limitations
  - Latency and Bandwidth
  - Data Coherency
  - Memory Capacity
- Algorithmic Limitations
  - Missing Parallelism (sequential code)
  - Communication Frequency
  - Synchronization Frequency
  - Poor Scheduling (task granularity/load balancing)

Performance Metrics

- There are 2 distinct classes of performance metrics:
  - Performance Metrics for Processors: assess the performance of a processor, normally by measuring the speed or the number of operations that it performs in a certain period of time.
  - Performance Metrics of Parallel Applications: assess the performance of a parallel application, normally by comparing the execution time with multiple processors and the execution time with just one processor.
- We are mostly interested in metrics that allow the performance evaluation of parallel applications.

Performance Metrics for Processors

- Some of the best known metrics to measure the performance of a processor architecture:
  - MIPS: Millions of Instructions Per Second.
  - FLOPS: FLoating point Operations Per Second.
  - SPECint: SPEC (Standard Performance Evaluation Corporation) benchmarks that evaluate processor performance on integer arithmetic (1992).
  - SPECfp: SPEC benchmarks that evaluate processor performance on floating point operations (2000).
  - Whetstone: synthetic benchmarks to assess processor performance on floating point operations (1972).
  - Dhrystone: synthetic benchmarks to assess processor performance on integer arithmetic (1984).

Performance Metrics for Parallel Applications

- There are a number of metrics, the best known are:
  - Speedup
  - Efficiency
  - Redundancy
  - Utilization
  - Quality
- There are also some laws/metrics that try to explain and assert the potential performance of a parallel application. The best known are:
  - Amdahl's Law
  - Gustafson-Barsis Law
  - Karp-Flatt Law
  - Isoefficiency Law
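
Speedup

- Speedup is a measure of performance. It measures the ratio between the sequential execution time and the parallel execution time.

$$S(p) = \frac{T(1)}{T(p)}$$

  where $T(1)$ is the execution time with one processor and $T(p)$ is the execution time with $p$ processors.

          1 CPU   2 CPUs   4 CPUs   8 CPUs   16 CPUs
  T(p)    1000    520      280      160      100
  S(p)    1       1,92     3,57     6,25     10,00

Efficiency

- Efficiency is a measure of the usage of the computational resources. It measures the ratio between performance and the resources used to achieve that performance.

$$E(p) = \frac{S(p)}{p} = \frac{T(1)}{p \times T(p)}$$

  where $S(p)$ is the speedup for $p$ processors.

          1 CPU   2 CPUs   4 CPUs   8 CPUs   16 CPUs
  S(p)    1       1,92     3,57     6,25     10,00
  E(p)    1       0,96     0,89     0,78     0,63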
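
As a minimal sketch, the speedup and efficiency definitions can be checked against the tabulated execution times. The function names are illustrative choices, not part of the original material, and the code uses Python's decimal point where the tables use a decimal comma.

```python
def speedup(t1: float, tp: float) -> float:
    """Speedup S(p) = T(1) / T(p)."""
    return t1 / tp


def efficiency(t1: float, tp: float, p: int) -> float:
    """Efficiency E(p) = S(p) / p."""
    return speedup(t1, tp) / p


# Execution times T(p) from the speedup table, keyed by number of CPUs.
times = {1: 1000, 2: 520, 4: 280, 8: 160, 16: 100}

for p, tp in times.items():
    s = speedup(times[1], tp)
    e = efficiency(times[1], tp, p)
    print(f"{p:2d} CPUs: S = {s:6.3f}, E = {e:5.3f}")
```

Three decimals are printed so that the values can be compared with the two-decimal entries of the tables after rounding.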

Redundancy

- Redundancy measures the increase in the required computation when using more processors. It measures the ratio between the number of operations performed by the parallel execution and by the sequential execution.

$$R(p) = \frac{O(p)}{O(1)}$$

  where $O(1)$ is the total number of operations performed with 1 processor and $O(p)$ is the total number of operations performed with $p$ processors.

          1 CPU   2 CPUs   4 CPUs   8 CPUs   16 CPUs
  O(p)    10000   10250    11000    12250    15000
  R(p)    1       1,03     1,10     1,23     1,50

Utilization

- Utilization is a measure of the good use of the computational capacity. It measures the ratio between the computational capacity utilized during execution and the capacity that was available.

$$U(p) = R(p) \times E(p)$$

          1 CPU   2 CPUs   4 CPUs   8 CPUs   16 CPUs
  R(p)    1       1,03     1,10     1,23     1,50
  E(p)    1       0,96     0,89     0,78     0,63
  U(p)    1       0,99     0,98     0,96     0,95
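
The redundancy and utilization definitions can be sketched in the same way. This is only an illustration: the function names are assumptions, the execution times are the ones used in the speedup table, and because the tables appear to combine already rounded values, the last digit printed here can differ slightly from the tabulated one (for instance with 16 CPUs).

```python
def redundancy(o1: float, op: float) -> float:
    """Redundancy R(p) = O(p) / O(1)."""
    return op / o1


def efficiency(t1: float, tp: float, p: int) -> float:
    """Efficiency E(p) = T(1) / (p * T(p))."""
    return t1 / (p * tp)


def utilization(o1: float, op: float, t1: float, tp: float, p: int) -> float:
    """Utilization U(p) = R(p) * E(p)."""
    return redundancy(o1, op) * efficiency(t1, tp, p)


cpus = [1, 2, 4, 8, 16]
ops = [10000, 10250, 11000, 12250, 15000]  # O(p) from the redundancy table
times = [1000, 520, 280, 160, 100]         # T(p) from the speedup table

for p, op, tp in zip(cpus, ops, times):
    r = redundancy(ops[0], op)
    u = utilization(ops[0], op, times[0], tp, p)
    print(f"{p:2d} CPUs: R = {r:5.3f}, U = {u:5.3f}")
```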
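
Quality

- Quality is a measure of the relevancy of using parallel computing.

$$Q(p) = \frac{S(p) \times E(p)}{R(p)}$$

          1 CPU   2 CPUs   4 CPUs   8 CPUs   16 CPUs
  S(p)    1       1,92     3,57     6,25     10,00
  E(p)    1       0,96     0,89     0,78     0,63
  R(p)    1       1,03     1,10     1,23     1,50
  Q(p)    1       1,79     2,89     3,96     4,20

Amdahl's Law

- The computations performed by a parallel application are of 3 types:
  - $C(\text{seq})$: computations that can only be realized sequentially.
  - $C(\text{par})$: computations that can be realized in parallel.
  - $C(\text{com})$: computations related to communication/synchronization/initialization.
- Using these 3 classes, the speedup of an application can be defined as:

$$S(p) = \frac{T(1)}{T(p)} = \frac{C(\text{seq}) + C(\text{par})}{C(\text{seq}) + \frac{C(\text{par})}{p} + C(\text{com})}$$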
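
Amdahl's Law

- Since $C(\text{com}) \ge 0$ then:

$$S(p) \le \frac{C(\text{seq}) + C(\text{par})}{C(\text{seq}) + \frac{C(\text{par})}{p}}$$

- If $f$ is the fraction of the computation that can only be realized sequentially, then:

$$f = \frac{C(\text{seq})}{C(\text{seq}) + C(\text{par})} \quad \text{and} \quad S(p) \le \frac{\frac{C(\text{seq})}{f}}{C(\text{seq}) + \frac{C(\text{seq}) \times \left(\frac{1}{f} - 1\right)}{p}}$$

- Simplifying:

$$S(p) \le \frac{\frac{1}{f}}{1 + \frac{\frac{1}{f} - 1}{p}} \;\Rightarrow\; S(p) \le \frac{1}{f + \frac{1 - f}{p}}$$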

Amdahl's Law

- Let $0 \le f \le 1$ be the fraction of the computation that can only be realized sequentially. Amdahl's law tells us that the maximum speedup that a parallel application can attain with $p$ processors is:

$$S(p) \le \frac{1}{f + \frac{1 - f}{p}}$$

- Amdahl's law can also be used to determine the limit of the maximum speedup that a given application can achieve regardless of the number of processors used.
- Suppose one wants to determine if it is advantageous to develop a parallel version of a certain sequential application. Through experimentation, it was verified that 90% of the execution time is spent in procedures that may be parallelizable. What is the maximum speedup that can be achieved with a parallel version of the problem executing on 8 processors?

$$S(8) \le \frac{1}{0{,}1 + \frac{1 - 0{,}1}{8}} \approx 4{,}71$$

- And the limit of the maximum speedup that can be attained?

$$\lim_{p \to \infty} \frac{1}{0{,}1 + \frac{1 - 0{,}1}{p}} = 10$$
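
The worked example can be reproduced with a few lines of code. This is a sketch under the assumption that the sequential fraction is known; `amdahl_speedup` and `amdahl_limit` are hypothetical helper names introduced only for illustration.

```python
def amdahl_speedup(f: float, p: int) -> float:
    """Upper bound on speedup from Amdahl's law: 1 / (f + (1 - f) / p)."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be in [0, 1]")
    if p < 1:
        raise ValueError("p must be at least 1")
    return 1.0 / (f + (1.0 - f) / p)


def amdahl_limit(f: float) -> float:
    """Limit of the bound when the number of processors grows without bound."""
    return float("inf") if f == 0.0 else 1.0 / f


# 90% of the execution time is parallelizable, so f = 0.1.
print(f"8 processors: {amdahl_speedup(0.1, 8):.2f}")
print(f"limit:        {amdahl_limit(0.1):.2f}")
```

Evaluating the bound for increasing values of `p` shows how quickly it flattens towards `1 / f`, which is the practical message of the law.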

Limitations of Amdahl's Law

- Amdahl's law ignores the cost of the communication/synchronization operations associated with the introduction of parallelism in an application. For this reason, Amdahl's law can result in predictions that are not very realistic for certain problems.
- Consider a parallel application, with complexity $O(n^2)$, whose execution pattern is the following, where $n$ is the size of the problem:
  - Execution time of the sequential part (input and output of data): $18.000 + n$
  - Execution time of the parallel part: $\frac{n^2}{100}$
  - Total communication/synchronization points per processor: $\lceil \log n \rceil$
  - Execution time due to communication/synchronization ($n \ge 10.000$): $10.000 \lceil \log p \rceil + \frac{n}{10}$
- What is the maximum speedup attainable?
  - Using Amdahl's law:

$$f = \frac{18.000 + n}{18.000 + n + \frac{n^2}{100}} \quad \text{and} \quad S(p) \le \frac{18.000 + n + \frac{n^2}{100}}{18.000 + n + \frac{n^2}{100 p}}$$

  - Using the speedup measure:

$$S(p) = \frac{18.000 + n + \frac{n^2}{100}}{18.000 + n + \frac{n^2}{100 p} + \lceil \log n \rceil \times \left(10.000 \lceil \log p \rceil + \frac{n}{10}\right)}$$
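
The Amdahl bound for this example can be evaluated numerically. The sketch below assumes the sequential part costs 18.000 + n and the parallel part n²/100 time units, as in the execution pattern described for this application; the function names are illustrative, and only the communication-free bound is computed.

```python
def sequential_fraction(n: int) -> float:
    """f = seq / (seq + par) with seq = 18000 + n and par = n**2 / 100."""
    seq = 18_000 + n
    par = n ** 2 / 100
    return seq / (seq + par)


def amdahl_bound(n: int, p: int) -> float:
    """Amdahl's bound 1 / (f + (1 - f) / p) for problem size n on p processors."""
    f = sequential_fraction(n)
    return 1.0 / (f + (1.0 - f) / p)


for n in (10_000, 20_000, 30_000):
    row = ", ".join(f"{amdahl_bound(n, p):5.2f}" for p in (1, 2, 4, 8, 16))
    print(f"n = {n}: {row}")
```

The bound improves as `n` grows because the sequential fraction shrinks with the problem size, while the bound itself still says nothing about communication costs.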
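
Limitations of Amdahl's Law

                               1 CPU   2 CPUs   4 CPUs   8 CPUs   16 CPUs
  Amdahl's law   n = 10.000    1       1,95     3,70     6,72     11,36
                 n = 20.000    1       1,98     3,89     7,51     14,02
                 n = 30.000    1       1,99     3,94     7,71     14,82
  Speedup        n = 10.000    1       1,61     2,11     2,22     2,57
                 n = 20.000    1       1,87     3,21     4,71     6,64
                 n = 30.000    1       1,93     3,55     5,89     9,29

Gustafson-Barsis Law

- Consider again the speedup measure defined previously:

$$S(p) \le \frac{C(\text{seq}) + C(\text{par})}{C(\text{seq}) + \frac{C(\text{par})}{p}}$$

- If $f$ is the fraction of the parallel computation spent executing sequential computations, then $(1 - f)$ is the fraction of the time spent in the parallel part:

$$f = \frac{C(\text{seq})}{C(\text{seq}) + \frac{C(\text{par})}{p}} \quad \text{and} \quad (1 - f) = \frac{\frac{C(\text{par})}{p}}{C(\text{seq}) + \frac{C(\text{par})}{p}}$$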
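
Gustafson-Barsis Law

- Then:

$$C(\text{seq}) = f \times \left(C(\text{seq}) + \frac{C(\text{par})}{p}\right) \quad \text{and} \quad C(\text{par}) = p \times (1 - f) \times \left(C(\text{seq}) + \frac{C(\text{par})}{p}\right)$$

- Simplifying:

$$S(p) \le \frac{\left(f + p \times (1 - f)\right) \times \left(C(\text{seq}) + \frac{C(\text{par})}{p}\right)}{C(\text{seq}) + \frac{C(\text{par})}{p}} \;\Rightarrow\; S(p) \le f + p \times (1 - f) \;\Rightarrow\; S(p) \le p + f \times (1 - p)$$

- Let $0 \le f \le 1$ be the fraction of the parallel computation spent executing sequential computations. The Gustafson-Barsis law tells us that the maximum speedup that a parallel application with $p$ processors can attain is:

$$S(p) \le p + f \times (1 - p)$$

- While Amdahl's law starts from the time of the sequential execution to estimate the maximum speedup that can be attained with multiple processors, the Gustafson-Barsis law does the opposite: it starts from the parallel execution time to estimate the maximum speedup in comparison with the sequential execution.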

Gustafson-Barsis Law

- Consider that a certain application executes in 220 seconds on 64 processors. What is the maximum speedup of the application knowing, by experimentation, that 5% of the execution time is spent on sequential computations?

$$S(64) \le 64 + 0{,}05 \times (1 - 64) = 64 - 3{,}15 = 60{,}85$$

- Suppose that a certain company wants to buy a supercomputer with 16.384 processors to achieve a speedup of 15.000 in an important fundamental problem. What is the maximum fraction of the parallel execution that can be spent in sequential computations to attain the expected speedup?

$$15.000 \le 16.384 + f \times (1 - 16.384) \;\Rightarrow\; f \times 16.383 \le 1.384 \;\Rightarrow\; f \le 0{,}084$$

Gustafson-Barsis Law Limitations

- By using the execution time of the parallel execution as a starting point, instead of the sequential execution, the Gustafson-Barsis law assumes that the execution with one processor is, in the worst case, $p$ times slower than the execution with $p$ processors.
- This may not be true if the memory available for the execution with one processor is insufficient when compared to the computation with $p$ processors. For this reason, the speedup estimated by the Gustafson-Barsis law is normally designated as scaled speedup.
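
Both Gustafson-Barsis examples can be checked with a short sketch. The helper names are hypothetical; the second function simply solves the bound S ≤ p + f × (1 − p) for f, which assumes p > 1.

```python
def gustafson_speedup(f: float, p: int) -> float:
    """Scaled speedup bound from the Gustafson-Barsis law: p + f * (1 - p)."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be in [0, 1]")
    return p + f * (1 - p)


def max_sequential_fraction(target_speedup: float, p: int) -> float:
    """Largest f for which the bound still reaches target_speedup (p > 1)."""
    if p <= 1:
        raise ValueError("p must be greater than 1")
    return (p - target_speedup) / (p - 1)


# 64 processors, 5% of the parallel execution time is sequential.
print(f"scaled speedup: {gustafson_speedup(0.05, 64):.2f}")

# 16384 processors, desired scaled speedup of 15000.
print(f"maximum f:      {max_sequential_fraction(15_000, 16_384):.3f}")
```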
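
Karp-Flatt Metric

- Let us consider again the definition of sequential execution time and parallel execution time:

$$T(1) = C(\text{seq}) + C(\text{par}) \qquad T(p) = C(\text{seq}) + \frac{C(\text{par})}{p} + C(\text{com})$$

- Let $e$ be the experimentally determined sequential fraction of a parallel computation:

$$e = \frac{C(\text{seq})}{T(1)}$$

- Then:

$$C(\text{seq}) = e \times T(1) \qquad C(\text{par}) = (1 - e) \times T(1)$$

- If one considers that $C(\text{com})$ is negligible then:

$$T(p) = e \times T(1) + \frac{(1 - e) \times T(1)}{p}$$

- On the other hand:

$$S(p) = \frac{T(1)}{T(p)} \;\Rightarrow\; T(1) = S(p) \times T(p)$$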
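
Karp-Flatt Metric

- Simplifying:

$$\begin{aligned}
T(p) &= e \times S(p) \times T(p) + \frac{(1 - e) \times S(p) \times T(p)}{p} \\
1 &= e \times S(p) + \frac{(1 - e) \times S(p)}{p} \\
\frac{1}{S(p)} &= e + \frac{1 - e}{p} = e + \frac{1}{p} - \frac{e}{p} \\
\frac{1}{S(p)} - \frac{1}{p} &= e \times \left(1 - \frac{1}{p}\right) \\
e &= \frac{\frac{1}{S(p)} - \frac{1}{p}}{1 - \frac{1}{p}}
\end{aligned}$$

- Let $S(p)$ be the speedup of a parallel application with $p > 1$ processors. The Karp-Flatt metric tells us that the experimentally determined sequential fraction is:

$$e = \frac{\frac{1}{S(p)} - \frac{1}{p}}{1 - \frac{1}{p}}$$

- The smaller the value of $e$, the better the parallelization.
- The Karp-Flatt metric is interesting because, by neglecting the costs of the communication/synchronization/initialization operations associated with parallelism, it allows us to determine, a posteriori, the relevance of the $C(\text{com})$ component in the eventual decrease of the application's efficiency.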

Karp-Flatt Metric

- By definition, the experimentally determined sequential fraction is a constant value that does not depend on the number of processors.

$$e = \frac{C(\text{seq})}{T(1)}$$

- On the other hand, the Karp-Flatt metric is a function of the number of processors.

$$e = \frac{\frac{1}{S(p)} - \frac{1}{p}}{1 - \frac{1}{p}}$$

- Considering that the efficiency of an application is a decreasing function of the number of processors, the Karp-Flatt metric allows us to determine the importance of $C(\text{com})$ in that decrease.
  - If the values of $e$ are constant when the number of processors increases, that means that the $C(\text{com})$ component is also constant. Therefore, the efficiency decrease is due to the scarce parallelism available in the application.
  - If the values of $e$ increase with the number of processors, it means that the decrease is due to the $C(\text{com})$ component, that is, to the excessive costs associated with the parallel computation (communication costs, synchronization and/or computation initialization).

Karp-Flatt Metric

- For example, the Karp-Flatt metric allows us to detect sources of inefficiency not considered by the model, which assumes that $p$ processors execute the parallel part $p$ times faster than when executing with just one processor.
  - If we have 5 processors to solve a problem decomposed in 20 atomic tasks, then all processors can execute 4 tasks. If all tasks take the same time to execute, then the parallel execution time should be a fraction of 5.
  - On the other hand, if we have 6 processors to solve the same problem, 4 processors can execute 3 tasks but the other 2 must necessarily execute 4. This makes the execution time again a fraction of 5 and not of 6.
- Consider the following speedups obtained by a certain parallel application:

          2 CPUs   3 CPUs   4 CPUs   5 CPUs   6 CPUs   7 CPUs   8 CPUs
  S(p)    1,82     2,50     3,08     3,57     4,00     4,38     4,71
  e       0,099    0,100    0,100    0,100    0,100    0,100    0,100

- What is the main reason for the application to achieve a speedup of just 4,71 with 8 processors?
  - Given that $e$ does not increase with the number of processors, the main reason for the small speedup is the little parallelism available in the problem.
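
The values of e in the table can be recomputed from the measured speedups. The following is a minimal sketch; `karp_flatt` is an illustrative name, and the speedups are the ones listed for 2 to 8 CPUs.

```python
def karp_flatt(speedup: float, p: int) -> float:
    """Experimentally determined sequential fraction e for p > 1 processors."""
    if p <= 1:
        raise ValueError("the metric is defined for p > 1")
    return (1.0 / speedup - 1.0 / p) / (1.0 - 1.0 / p)


# Measured speedups S(p), keyed by number of CPUs.
speedups = {2: 1.82, 3: 2.50, 4: 3.08, 5: 3.57, 6: 4.00, 7: 4.38, 8: 4.71}

for p, s in speedups.items():
    print(f"{p} CPUs: S = {s:.2f}, e = {karp_flatt(s, p):.3f}")
```

A value of e that stays flat as p grows points to limited parallelism; a value that climbs with p points to parallel overhead.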

Karp-Flatt Metric

- Consider the following speedups obtained by a certain parallel application:

          2 CPUs   3 CPUs   4 CPUs   5 CPUs   6 CPUs   7 CPUs   8 CPUs
  S(p)    1,87     2,61     3,23     3,73     4,14     4,46     4,71
  e       0,070    0,075    0,079    0,085    0,090    0,095    0,100

- What is the main reason for the application to achieve a speedup of just 4,71 with 8 processors?
  - Given that $e$ increases slightly with the number of processors, the main reason for the small speedup is the cost associated with the parallel computation.

Efficiency and Scalability

- From the previous results, we can conclude that the efficiency of an application is:
  - A decreasing function of the number of processors.
  - Typically, an increasing function of the size of the problem.
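
Efficiency and Scalability

- An application is said to be scalable when its efficiency is maintained when we increase proportionally the number of processors and the size of the problem.
- The scalability of an application reflects its capacity to make effective use of the available resources.

                             1 CPU   2 CPUs   4 CPUs   8 CPUs   16 CPUs
  Efficiency   n = 10.000    1       0,81     0,53     0,28     0,16
               n = 20.000    1       0,94     0,80     0,59     0,42
               n = 30.000    1       0,96     0,89     0,74     0,58

Isoefficiency Metric

- The efficiency of an application is typically an increasing function of the size of the problem, since the complexity of communication is normally smaller than the complexity of the computation. That is, to maintain the same level of efficiency when we increase the number of processors, one needs to increase the size of the problem. The isoefficiency metric formalizes this idea.
- Let us consider again the definition of speedup:

$$S(p) = \frac{C(\text{seq}) + C(\text{par})}{C(\text{seq}) + \frac{C(\text{par})}{p} + C(\text{com})} = \frac{p \times \left(C(\text{seq}) + C(\text{par})\right)}{p \times C(\text{seq}) + C(\text{par}) + p \times C(\text{com})} = \frac{p \times \left(C(\text{seq}) + C(\text{par})\right)}{C(\text{seq}) + C(\text{par}) + (p - 1) \times C(\text{seq}) + p \times C(\text{com})}$$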
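
Isoefficiency Metric

- Let $T_0(p)$ be the execution time spent by the $p$ processors of the parallel algorithm performing computations not done in the sequential algorithm:

$$T_0(p) = (p - 1) \times C(\text{seq}) + p \times C(\text{com})$$

- Simplifying:

$$S(p) = \frac{p \times \left(C(\text{seq}) + C(\text{par})\right)}{C(\text{seq}) + C(\text{par}) + T_0(p)}$$

$$E(p) = \frac{C(\text{seq}) + C(\text{par})}{C(\text{seq}) + C(\text{par}) + T_0(p)} = \frac{1}{1 + \frac{T_0(p)}{C(\text{seq}) + C(\text{par})}} = \frac{1}{1 + \frac{T_0(p)}{T(1)}}$$

- Then:

$$E(p) = \frac{1}{1 + \frac{T_0(p)}{T(1)}} \;\Rightarrow\; \frac{T_0(p)}{T(1)} = \frac{1 - E(p)}{E(p)} \;\Rightarrow\; T(1) = \frac{E(p)}{1 - E(p)} \times T_0(p)$$

- If one wants to maintain the same level of efficiency when the number of processors increases, then:

$$\frac{E(p)}{1 - E(p)} = c \quad \text{and} \quad T(1) = c \times T_0(p)$$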

Isoefficiency Metric

- Let $E(p)$ be the efficiency of a parallel application with $p$ processors. The isoefficiency metric tells us that, to maintain the same level of efficiency when we increase the number of processors, the size of the problem must be increased so that the following inequality is satisfied:

$$T(1) \ge c \times T_0(p), \quad \text{with} \quad c = \frac{E(p)}{1 - E(p)} \quad \text{and} \quad T_0(p) = (p - 1) \times C(\text{seq}) + p \times C(\text{com})$$

- The applicability of the isoefficiency metric may depend on the available memory, considering that the maximum size of the problem that can be solved is limited by that quantity.
- Suppose that the isoefficiency metric for a problem of size $n$ is given as a function of the number of processors $p$:

$$n \ge f(p)$$

- If $M(n)$ designates the quantity of memory required to solve a problem of size $n$ then:

$$M(n) \ge M(f(p))$$

- That is, to maintain the same level of efficiency, the quantity of memory required per processor is:

$$\frac{M(n)}{p} \ge \frac{M(f(p))}{p}$$
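
Isoefficiency Metric

[Figure: memory per processor (vertical axis) as a function of the number of processors (horizontal axis), with curves for $c\,p \log p$, $c\,p$, $c \log p$ and $c$, and a horizontal line marking the memory limit. Below the memory limit, efficiency can be maintained; above it, efficiency cannot be maintained and should decrease.]

- Consider that the sequential version of a certain application has complexity $O(n^3)$, and that the execution time spent by each of the $p$ processors of the parallel version in communication/synchronization operations is $O(n^2 \log p)$. If the amount of memory necessary to represent a problem of size $n$ is $n^2$, what is the scalability of the application in terms of memory?

$$n^3 \ge c \times p \times n^2 \log p \;\Rightarrow\; n \ge c \times p \log p$$

$$M(n) = n^2 \;\Rightarrow\; \frac{M(c \times p \log p)}{p} = \frac{c^2 \times p^2 \log^2 p}{p} = c^2 \times p \log^2 p$$

- Then, the scalability of the application is low.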
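
The growth of the memory requirement in this example can be made concrete with a small sketch. It assumes a base-2 logarithm and a constant c = 1, neither of which is fixed by the text, and the function names are illustrative.

```python
import math


def min_problem_size(p: int, c: float = 1.0) -> float:
    """Smallest n allowed by the isoefficiency relation n >= c * p * log2(p)."""
    return c * p * math.log2(p)


def memory_per_processor(p: int, c: float = 1.0) -> float:
    """M(n) / p with M(n) = n**2, evaluated at n = c * p * log2(p)."""
    n = min_problem_size(p, c)
    return n ** 2 / p


for p in (2, 4, 8, 16, 32, 64):
    print(f"p = {p:2d}: n >= {min_problem_size(p):6.0f}, "
          f"memory per processor >= {memory_per_processor(p):7.0f}")
```

Since the memory needed per processor keeps growing with p, any fixed amount of memory per processor is eventually exceeded, which is why the scalability is considered low.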

Superlinear Speedup

- The speedup is said to be superlinear when the ratio between the sequential execution time and the parallel execution time with $p$ processors is greater than $p$.

$$\frac{T(1)}{T(p)} > p$$

- Some factors that may make the speedup superlinear are:
  - Communication/synchronization/initialization costs are almost nonexistent.
  - Tolerance to communication latency.
  - Increased memory capacity (the problem may come to fit entirely in memory).
  - Subdivision of the problem (smaller tasks may generate fewer cache misses).
  - Computation randomness in optimization problems or in problems with multiple solutions.

Superlinear Speedup

- If just one computer (processor) can solve a problem in N seconds, could N computers (processors) solve the same problem in 1 second?