Parallel and Distributed Programming. Performance Metrics


Performance

- Two main goals to be achieved with the design of parallel applications are:
  - Performance: the capacity to reduce the time to solve the problem when the computing resources increase;
  - Scalability: the capacity to increase performance when the complexity, or size of the problem, increases.
- The main factors limiting the performance and the scalability of an application are:
  - Architectural limitations
  - Algorithmic limitations

Factors Limiting Performance

- Architectural limitations
  - Latency and bandwidth
  - Data coherency
  - Memory capacity
- Algorithmic limitations
  - Missing parallelism (sequential code)
  - Communication frequency
  - Synchronization frequency
  - Poor scheduling (task granularity/load balancing)

Performance Metrics

- There are two distinct classes of performance metrics:
  - Metrics for processors: assess the performance of a processor, normally by measuring the speed or the number of operations that it performs in a certain period of time.
  - Metrics for parallel applications: assess the performance of a parallel application, normally by comparing the execution time with multiple processors against the execution time with just one processor.
- We are mostly interested in metrics that allow the performance evaluation of parallel applications.

Performance Metrics for Processors

- Some of the best known metrics to measure the performance of a processor architecture:
  - MIPS: Millions of Instructions Per Second.
  - FLOPS: FLoating point Operations Per Second.
  - SPECint: SPEC (Standard Performance Evaluation Corporation) benchmarks that evaluate processor performance on integer arithmetic (1992).
  - SPECfp: SPEC benchmarks that evaluate processor performance on floating point operations (2000).
  - Whetstone: synthetic benchmarks to assess processor performance on floating point operations (1972).
  - Dhrystone: synthetic benchmarks to assess processor performance on integer arithmetic (1984).

Performance Metrics for Parallel Applications

- There are a number of metrics; the best known are:
  - Speedup
  - Efficiency
  - Redundancy
  - Utilization
  - Quality
- There are also some laws/metrics that try to explain and assert the potential performance of a parallel application. The best known are:
  - Amdahl's law
  - Gustafson-Barsis law
  - Karp-Flatt metric
  - Isoefficiency metric

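Speedup

- Speedup is a measure of performance. It measures the ratio between the sequential execution time and the parallel execution time:

$$S(p) = \frac{T(1)}{T(p)}$$

where $T(1)$ is the execution time with one processor and $T(p)$ is the execution time with $p$ processors.

        1 CPU   2 CPUs   4 CPUs   8 CPUs   16 CPUs
T(p)    1000    520      280      160      100
S(p)    1       1.92     3.57     6.25     10.00

Efficiency

- Efficiency is a measure of the usage of the computational resources. It measures the ratio between performance and the resources used to achieve that performance:

$$E(p) = \frac{S(p)}{p} = \frac{T(1)}{p \times T(p)}$$

where $S(p)$ is the speedup for $p$ processors.

        1 CPU   2 CPUs   4 CPUs   8 CPUs   16 CPUs
S(p)    1       1.92     3.57     6.25     10.00
E(p)    1       0.96     0.89     0.78     0.63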
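The two definitions above can be checked against the tabulated execution times with a few lines of Python. This is only a minimal sketch: the function names `speedup` and `efficiency` are illustrative and not part of the original slides.

```python
def speedup(t1: float, tp: float) -> float:
    """Speedup S(p) = T(1) / T(p)."""
    return t1 / tp


def efficiency(t1: float, tp: float, p: int) -> float:
    """Efficiency E(p) = S(p) / p."""
    return speedup(t1, tp) / p


# Execution times T(p) from the speedup table, keyed by processor count.
times = {1: 1000, 2: 520, 4: 280, 8: 160, 16: 100}

for p, tp in times.items():
    s = speedup(times[1], tp)
    e = efficiency(times[1], tp, p)
    print(f"p={p:2d}  S(p)={s:6.3f}  E(p)={e:5.3f}")
```

Rounding the printed values to two decimals gives the figures in the two tables.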

Redundancy

- Redundancy measures the increase in the required computation when using more processors. It measures the ratio between the number of operations performed by the parallel execution and by the sequential execution:

$$R(p) = \frac{O(p)}{O(1)}$$

where $O(1)$ is the total number of operations performed with one processor and $O(p)$ is the total number of operations performed with $p$ processors.

        1 CPU   2 CPUs   4 CPUs   8 CPUs   16 CPUs
O(p)    10000   10250    11000    12250    15000
R(p)    1       1.03     1.10     1.23     1.50

Utilization

- Utilization is a measure of the good use of the computational capacity. It measures the ratio between the computational capacity utilized during execution and the capacity that was available:

$$U(p) = R(p) \times E(p)$$

        1 CPU   2 CPUs   4 CPUs   8 CPUs   16 CPUs
R(p)    1       1.03     1.10     1.23     1.50
E(p)    1       0.96     0.89     0.78     0.63
U(p)    1       0.99     0.98     0.96     0.95
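As a small illustration of the two formulas above, the sketch below computes redundancy from operation counts and utilization as the product R(p) × E(p). The function names are illustrative. The inputs are unrounded, so the last digit can differ from the table: for 16 CPUs the unrounded product is 0.9375, whereas the tabulated 0.95 matches the product of the rounded values 1.50 and 0.63.

```python
def redundancy(ops_p: float, ops_1: float) -> float:
    """Redundancy R(p) = O(p) / O(1)."""
    return ops_p / ops_1


def utilization(r: float, e: float) -> float:
    """Utilization U(p) = R(p) * E(p)."""
    return r * e


# Operation counts O(p) and execution times T(p), keyed by processor count.
ops = {1: 10000, 2: 10250, 4: 11000, 8: 12250, 16: 15000}
times = {1: 1000, 2: 520, 4: 280, 8: 160, 16: 100}

for p in ops:
    r = redundancy(ops[p], ops[1])
    e = times[1] / (p * times[p])  # efficiency E(p) = T(1) / (p * T(p))
    print(f"p={p:2d}  R(p)={r:5.3f}  E(p)={e:5.3f}  U(p)={utilization(r, e):6.4f}")
```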

Quality

- Quality is a measure of the relevancy of using parallel computing:

$$Q(p) = \frac{S(p) \times E(p)}{R(p)}$$

        1 CPU   2 CPUs   4 CPUs   8 CPUs   16 CPUs
S(p)    1       1.92     3.57     6.25     10.00
E(p)    1       0.96     0.89     0.78     0.63
R(p)    1       1.03     1.10     1.23     1.50
Q(p)    1       1.79     2.89     3.96     4.20

Amdahl's Law

- The computations performed by a parallel application are of 3 types:
  - $C_{seq}$: computations that can only be carried out sequentially.
  - $C_{par}$: computations that can be carried out in parallel.
  - $C_{com}$: computations related to communication/synchronization/initialization.
- Using these 3 classes, the speedup of an application can be defined as:

$$S(p) = \frac{T(1)}{T(p)} = \frac{C_{seq} + C_{par}}{C_{seq} + \frac{C_{par}}{p} + C_{com}}$$

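Amdahl's Law

- Since $C_{com} \geq 0$, then:

$$S(p) \leq \frac{C_{seq} + C_{par}}{C_{seq} + \frac{C_{par}}{p}}$$

- If $f$ is the fraction of the computation that can only be carried out sequentially, then:

$$f = \frac{C_{seq}}{C_{seq} + C_{par}}, \qquad C_{par} = C_{seq}\left(\frac{1}{f} - 1\right) \qquad \text{and} \qquad S(p) \leq \frac{\frac{C_{seq}}{f}}{C_{seq} + \frac{C_{seq}\left(\frac{1}{f} - 1\right)}{p}}$$

- Simplifying:

$$S(p) \leq \frac{\frac{C_{seq}}{f}}{C_{seq} + \frac{C_{seq}\left(\frac{1}{f} - 1\right)}{p}} \;\Rightarrow\; S(p) \leq \frac{\frac{1}{f}}{1 + \frac{\frac{1}{f} - 1}{p}} \;\Rightarrow\; S(p) \leq \frac{1}{f + \frac{1 - f}{p}}$$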

Amdahl's Law

- Let $0 \leq f \leq 1$ be the fraction of the computation that can only be carried out sequentially. Amdahl's law tells us that the maximum speedup that a parallel application can attain with $p$ processors is:

$$S(p) \leq \frac{1}{f + \frac{1 - f}{p}}$$

- Amdahl's law can also be used to determine the limit of the maximum speedup that a given application can achieve regardless of the number of processors used.
- Suppose one wants to determine whether it is advantageous to develop a parallel version of a certain sequential application. Through experimentation, it was verified that 90% of the execution time is spent in procedures that may be parallelizable. What is the maximum speedup that can be achieved with a parallel version of the problem executing on 8 processors?

$$S(8) \leq \frac{1}{0.1 + \frac{1 - 0.1}{8}} \approx 4.71$$

- And the limit of the maximum speedup that can be attained?

$$\lim_{p \to \infty} \frac{1}{0.1 + \frac{1 - 0.1}{p}} = 10$$
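The worked example above can be reproduced with a short Python sketch. The names `amdahl_speedup` and `amdahl_limit` are illustrative only; the formulas are the bound and its limit for p → ∞ as stated above.

```python
def amdahl_speedup(f: float, p: int) -> float:
    """Upper bound on speedup with p processors when a fraction f is sequential."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if p < 1:
        raise ValueError("p must be at least 1")
    return 1.0 / (f + (1.0 - f) / p)


def amdahl_limit(f: float) -> float:
    """Limit of the bound as p grows without bound: 1 / f (infinite if f == 0)."""
    return float("inf") if f == 0 else 1.0 / f


# 90% of the execution time is parallelizable, so f = 0.1.
print(f"S(8) <= {amdahl_speedup(0.1, 8):.2f}")
print(f"limit  = {amdahl_limit(0.1):.2f}")
```

Even with an unlimited number of processors, the 10% sequential part caps the speedup at 10, which is why the bound for 8 processors is already close to half of the limit.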

Limitations of Amdahl's Law

- Amdahl's law ignores the cost of the communication/synchronization operations associated with the introduction of parallelism in an application. For this reason, Amdahl's law can produce predictions that are not very realistic for certain problems.
- Consider a parallel application with complexity $O(n^2)$, whose execution pattern is the following, where $n$ is the size of the problem:
  - Execution time of the sequential part (input and output of data): $18{,}000 + n$
  - Execution time of the parallel part: $\frac{n^2}{100}$
  - Total communication/synchronization points per processor: $\lceil \log n \rceil$
  - Execution time due to communication/synchronization at each point ($n \geq 10{,}000$): $10{,}000 \lceil \log p \rceil + \frac{n}{10}$
- What is the maximum speedup attainable?
- Using Amdahl's law:

$$f = \frac{18{,}000 + n}{18{,}000 + n + \frac{n^2}{100}} \qquad \text{and} \qquad S(p) \leq \frac{18{,}000 + n + \frac{n^2}{100}}{18{,}000 + n + \frac{n^2}{100\,p}}$$

- Using the speedup measure:

$$S(p) = \frac{18{,}000 + n + \frac{n^2}{100}}{18{,}000 + n + \frac{n^2}{100\,p} + \lceil \log n \rceil \left(10{,}000 \lceil \log p \rceil + \frac{n}{10}\right)}$$
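As a rough numerical check of the two estimates above, the sketch below evaluates both expressions in Python. The function names are illustrative. The base of the logarithm is an assumption: the text does not state it, and the natural logarithm is used here because it reproduces the speedup figures tabulated for this example (for instance 1.61 for n = 10,000 on 2 processors).

```python
import math


def amdahl_bound(n: int, p: int) -> float:
    """Amdahl bound: communication/synchronization costs are ignored."""
    seq = 18_000 + n
    par = n ** 2 / 100
    return (seq + par) / (seq + par / p)


def modelled_speedup(n: int, p: int) -> float:
    """Speedup including the communication/synchronization term.

    Assumption: 'log' is the natural logarithm (math.log).
    """
    seq = 18_000 + n
    par = n ** 2 / 100
    points = math.ceil(math.log(n))                      # synchronization points
    per_point = 10_000 * math.ceil(math.log(p)) + n / 10  # cost of each point
    return (seq + par) / (seq + par / p + points * per_point)


for n in (10_000, 20_000, 30_000):
    for p in (2, 4, 8, 16):
        print(f"n={n:6d} p={p:2d}  Amdahl={amdahl_bound(n, p):5.2f}  "
              f"with overhead={modelled_speedup(n, p):5.2f}")
```

The gap between the two columns widens as p grows, because the communication term increases with the number of processors while the parallel work per processor shrinks.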

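Limitations of Amdahl's Law

                             1 CPU   2 CPUs   4 CPUs   8 CPUs   16 CPUs
Amdahl's law   n = 10,000    1       1.95     3.70     6.72     11.36
               n = 20,000    1       1.98     3.89     7.51     14.02
               n = 30,000    1       1.99     3.94     7.71     14.82
Speedup        n = 10,000    1       1.61     2.11     2.22     2.57
               n = 20,000    1       1.87     3.21     4.71     6.64
               n = 30,000    1       1.93     3.55     5.89     9.29

Gustafson-Barsis Law

- Consider again the speedup measure defined previously:

$$S(p) \leq \frac{C_{seq} + C_{par}}{C_{seq} + \frac{C_{par}}{p}}$$

- If $f$ is the fraction of the parallel computation spent executing sequential computations, then $(1 - f)$ is the fraction of the time spent in the parallel part:

$$f = \frac{C_{seq}}{C_{seq} + \frac{C_{par}}{p}} \qquad \text{and} \qquad 1 - f = \frac{\frac{C_{par}}{p}}{C_{seq} + \frac{C_{par}}{p}}$$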

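Gustafson-Barsis Law

- Then:

$$C_{seq} = f \times \left(C_{seq} + \frac{C_{par}}{p}\right) \qquad \text{and} \qquad C_{par} = p \times (1 - f) \times \left(C_{seq} + \frac{C_{par}}{p}\right)$$

- Simplifying:

$$S(p) \leq \frac{\left(f + p\,(1 - f)\right) \times \left(C_{seq} + \frac{C_{par}}{p}\right)}{C_{seq} + \frac{C_{par}}{p}} \;\Rightarrow\; S(p) \leq f + p\,(1 - f) \;\Rightarrow\; S(p) \leq p + f\,(1 - p)$$

- Let $0 \leq f \leq 1$ be the fraction of the parallel computation spent executing sequential computations. The Gustafson-Barsis law tells us that the maximum speedup that a parallel application with $p$ processors can attain is:

$$S(p) \leq p + f \times (1 - p)$$

- While Amdahl's law starts from the sequential execution time to estimate the maximum speedup that can be attained with multiple processors, the Gustafson-Barsis law does the opposite: it starts from the parallel execution time to estimate the maximum speedup in comparison with the sequential execution.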

Gustafson-Barsis Law

- Consider that a certain application executes in 220 seconds on 64 processors. What is the maximum speedup of the application knowing, by experimentation, that 5% of the execution time is spent on sequential computations?

$$S(64) \leq 64 + (1 - 64) \times 0.05 = 64 - 3.15 = 60.85$$

- Suppose that a certain company wants to buy a supercomputer with 16,384 processors to achieve a speedup of 15,000 on an important fundamental problem. What is the maximum fraction of the parallel execution that can be spent in sequential computations to attain the expected speedup?

$$15{,}000 \leq 16{,}384 - 16{,}383 \times f \;\Rightarrow\; f \leq \frac{1{,}384}{16{,}383} \;\Rightarrow\; f \leq 0.084$$

Gustafson-Barsis Law Limitations

- By using the parallel execution time as the starting point, instead of the sequential execution time, the Gustafson-Barsis law assumes that the execution with one processor is, in the worst case, $p$ times slower than the execution with $p$ processors.
- This may not be true if the memory available for the execution with one processor is insufficient when compared to the computation with $p$ processors. For this reason, the speedup estimated by the Gustafson-Barsis law is normally designated as scaled speedup.
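Both worked examples above follow directly from the bound S(p) ≤ p + f(1 − p). The sketch below evaluates the bound and also inverts it to obtain the largest admissible sequential fraction for a target speedup. The function names are illustrative.

```python
def scaled_speedup(f: float, p: int) -> float:
    """Gustafson-Barsis bound: S(p) <= p + f * (1 - p)."""
    return p + f * (1 - p)


def max_sequential_fraction(target_speedup: float, p: int) -> float:
    """Largest f for which the bound still reaches target_speedup.

    Obtained by solving target = p + f * (1 - p) for f (requires p > 1).
    """
    if p <= 1:
        raise ValueError("p must be greater than 1")
    return (p - target_speedup) / (p - 1)


print(f"S(64) <= {scaled_speedup(0.05, 64):.2f}")
print(f"f <= {max_sequential_fraction(15_000, 16_384):.3f}")
```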

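Karp-Flatt Metric

- Let us consider again the definition of sequential execution time and parallel execution time:

$$T(1) = C_{seq} + C_{par} \qquad \text{and} \qquad T(p) = C_{seq} + \frac{C_{par}}{p} + C_{com}$$

- Let $e$ be the experimentally determined sequential fraction of a parallel computation:

$$e = \frac{C_{seq}}{T(1)}$$

- Then:

$$C_{seq} = e \times T(1) \qquad \text{and} \qquad C_{par} = (1 - e) \times T(1)$$

- If one considers that $C_{com}$ is negligible, then:

$$T(p) = e \times T(1) + \frac{(1 - e) \times T(1)}{p}$$

- On the other hand:

$$S(p) = \frac{T(1)}{T(p)} \;\Rightarrow\; T(1) = S(p) \times T(p)$$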

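Karp-Flatt Metric

- Simplifying:

$$T(p) = e \times S(p) \times T(p) + \frac{(1 - e) \times S(p) \times T(p)}{p}$$

$$\Rightarrow\; 1 = e \times S(p) + \frac{(1 - e) \times S(p)}{p} \;\Rightarrow\; \frac{1}{S(p)} = e + \frac{1 - e}{p} \;\Rightarrow\; \frac{1}{S(p)} - \frac{1}{p} = e \times \left(1 - \frac{1}{p}\right)$$

$$\Rightarrow\; e = \frac{\frac{1}{S(p)} - \frac{1}{p}}{1 - \frac{1}{p}}$$

- Let $S(p)$ be the speedup of a parallel application with $p > 1$ processors. The Karp-Flatt metric tells us that the experimentally determined sequential fraction is:

$$e(p) = \frac{\frac{1}{S(p)} - \frac{1}{p}}{1 - \frac{1}{p}}$$

- The smaller the value of $e$, the better the parallelization.
- The Karp-Flatt metric is interesting because, by neglecting the costs of the communication/synchronization/initialization operations associated with parallelism, it allows us to determine, a posteriori, the relevance of the $C_{com}$ component in the eventual decrease of the application's efficiency.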

Karp-Flatt Metric

- By definition, the experimentally determined sequential fraction is a constant value that does not depend on the number of processors:

$$e = \frac{C_{seq}}{T(1)}$$

- On the other hand, the Karp-Flatt metric is a function of the number of processors:

$$e(p) = \frac{\frac{1}{S(p)} - \frac{1}{p}}{1 - \frac{1}{p}}$$

- Considering that the efficiency of an application is a decreasing function of the number of processors, the Karp-Flatt metric allows us to determine the importance of $C_{com}$ in that decrease.
  - If the values of $e$ are constant when the number of processors increases, that means that the $C_{com}$ component is also constant. Therefore, the efficiency decrease is due to the scarce parallelism available in the application.
  - If the values of $e$ increase with the number of processors, it means that the decrease is due to the $C_{com}$ component, that is, due to the excessive costs associated with the parallel computation (communication costs, synchronization and/or computation initialization).

Karp-Flatt Metric

- For example, the Karp-Flatt metric allows us to detect sources of inefficiency not considered by the model, which assumes that $p$ processors execute the parallel part $p$ times faster than when executing with just one processor.
  - If we have 5 processors to solve a problem decomposed into 20 atomic tasks, then all processors can execute 4 tasks. If all tasks take the same time to execute, then the parallel execution time should be a fraction of 1/5.
  - On the other hand, if we have 6 processors to solve the same problem, 4 processors can execute 3 tasks but the other 2 must necessarily execute 4. This makes the execution time again a fraction of 1/5 and not of 1/6.
- Consider the following speedups obtained by a certain parallel application:

        2 CPUs   3 CPUs   4 CPUs   5 CPUs   6 CPUs   7 CPUs   8 CPUs
S(p)    1.82     2.50     3.08     3.57     4.00     4.38     4.71
e(p)    0.099    0.100    0.100    0.100    0.100    0.100    0.100

- What is the main reason for the application to achieve a speedup of just 4.71 with 8 processors?
  - Given that $e$ does not increase with the number of processors, the main reason for the small speedup is the little parallelism available in the problem.
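The e(p) row of the table above can be recomputed from the measured speedups, and the 20-task example can be checked by counting how many rounds of tasks the busiest processor must run. The sketch below is illustrative: the names `karp_flatt` and `rounds_needed` are not from the original slides, and the second helper assumes equal-length tasks distributed as evenly as possible.

```python
import math


def karp_flatt(s: float, p: int) -> float:
    """Experimentally determined sequential fraction e(p) for speedup s on p processors."""
    if p <= 1:
        raise ValueError("the metric is defined for p > 1")
    return (1 / s - 1 / p) / (1 - 1 / p)


def rounds_needed(tasks: int, p: int) -> int:
    """Tasks executed by the busiest processor (assumes equal-length tasks)."""
    return math.ceil(tasks / p)


# Measured speedups from the table, keyed by processor count.
speedups = {2: 1.82, 3: 2.50, 4: 3.08, 5: 3.57, 6: 4.00, 7: 4.38, 8: 4.71}

for p, s in speedups.items():
    print(f"p={p}  S(p)={s:4.2f}  e(p)={karp_flatt(s, p):5.3f}")

# 20 atomic tasks: 5 and 6 processors both need 4 rounds, so 6 is no faster than 5.
print(rounds_needed(20, 5), rounds_needed(20, 6))
```

A roughly constant e(p), as in this data set, points to limited parallelism rather than to growing communication overhead.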

Karp-Flatt Metric

- Consider the following speedups obtained by a certain parallel application:

        2 CPUs   3 CPUs   4 CPUs   5 CPUs   6 CPUs   7 CPUs   8 CPUs
S(p)    1.87     2.61     3.23     3.73     4.14     4.46     4.71
e(p)    0.070    0.075    0.079    0.085    0.090    0.095    0.100

- What is the main reason for the application to achieve a speedup of just 4.71 with 8 processors?
  - Given that $e$ increases slightly with the number of processors, the main reason for the small speedup is the cost associated with the parallel computation.

Efficiency and Scalability

- From the previous results, we can conclude that the efficiency of an application is:
  - A decreasing function of the number of processors.
  - Typically, an increasing function of the size of the problem.

Efficiency and Scalability

- An application is said to be scalable when its efficiency is maintained as we proportionally increase the number of processors and the size of the problem.
- The scalability of an application reflects its capacity to make effective use of the available resources.

                           1 CPU   2 CPUs   4 CPUs   8 CPUs   16 CPUs
Efficiency   n = 10,000    1       0.81     0.53     0.28     0.16
             n = 20,000    1       0.94     0.80     0.59     0.42
             n = 30,000    1       0.96     0.89     0.74     0.58

Isoefficiency Metric

- The efficiency of an application is typically an increasing function of the size of the problem, since the complexity of communication is normally smaller than the computation complexity. That is, to maintain the same level of efficiency when we increase the number of processors, one needs to increase the size of the problem. The isoefficiency metric formalizes this idea.
- Let us consider again the definition of speedup:

$$S(p) = \frac{C_{seq} + C_{par}}{C_{seq} + \frac{C_{par}}{p} + C_{com}} = \frac{p \times (C_{seq} + C_{par})}{p \times C_{seq} + C_{par} + p \times C_{com}} = \frac{p \times (C_{seq} + C_{par})}{C_{seq} + C_{par} + (p - 1) \times C_{seq} + p \times C_{com}}$$

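Isoefficiency Metric

- Let $T_0(p)$ be the execution time spent by the $p$ processors of the parallel algorithm performing computations not done in the sequential algorithm:

$$T_0(p) = (p - 1) \times C_{seq} + p \times C_{com}$$

- Simplifying:

$$S(p) = \frac{p \times (C_{seq} + C_{par})}{C_{seq} + C_{par} + T_0(p)}$$

$$E(p) = \frac{S(p)}{p} = \frac{C_{seq} + C_{par}}{C_{seq} + C_{par} + T_0(p)} = \frac{1}{1 + \frac{T_0(p)}{C_{seq} + C_{par}}} = \frac{1}{1 + \frac{T_0(p)}{T(1)}}$$

- Then:

$$E(p) = \frac{1}{1 + \frac{T_0(p)}{T(1)}} \;\Rightarrow\; \frac{T_0(p)}{T(1)} = \frac{1 - E(p)}{E(p)} \;\Rightarrow\; T(1) = \frac{E(p)}{1 - E(p)} \times T_0(p)$$

- If one wants to maintain the same level of efficiency when the number of processors increases, then $\frac{E(p)}{1 - E(p)}$ must stay at a constant value $c$, which requires:

$$\frac{E(p)}{1 - E(p)} = c \;\Rightarrow\; T(1) \geq c \times T_0(p)$$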

Isoefficiency Metric

- Let $E(p)$ be the efficiency of a parallel application with $p$ processors. The isoefficiency metric tells us that, to maintain the same level of efficiency when we increase the number of processors, the size of the problem must be increased so that the following inequality is satisfied:

$$T(1) \geq c \times T_0(p), \qquad \text{with } c = \frac{E(p)}{1 - E(p)} \text{ and } T_0(p) = (p - 1) \times C_{seq} + p \times C_{com}$$

- The applicability of the isoefficiency metric may depend on the available memory, considering that the maximum size of the problem that can be solved is limited by that quantity.
- Suppose that the isoefficiency metric for a problem of size $n$ is given as a function of the number of processors $p$:

$$n \geq f(p)$$

- If $M(n)$ designates the quantity of memory required to solve a problem of size $n$, then:

$$M(n) \geq M(f(p))$$

- That is, to maintain the same level of efficiency, the quantity of memory required per processor is:

$$\frac{M(n)}{p} \geq \frac{M(f(p))}{p}$$
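The isoefficiency condition above can be illustrated numerically. The sketch below uses hypothetical cost values (they are not taken from the slides) and illustrative function names; it computes the constant c for a target efficiency, the overhead T0(p), and checks whether T(1) ≥ c × T0(p) holds.

```python
def isoefficiency_constant(e: float) -> float:
    """c = E / (1 - E) for a target efficiency 0 < E < 1."""
    if not 0.0 < e < 1.0:
        raise ValueError("target efficiency must lie strictly between 0 and 1")
    return e / (1.0 - e)


def total_overhead(p: int, c_seq: float, c_com: float) -> float:
    """T0(p) = (p - 1) * C_seq + p * C_com."""
    return (p - 1) * c_seq + p * c_com


def efficiency(t1: float, t0: float) -> float:
    """E(p) = 1 / (1 + T0(p) / T(1))."""
    return 1.0 / (1.0 + t0 / t1)


def maintains_efficiency(t1: float, t0: float, target_e: float) -> bool:
    """True when T(1) >= c * T0(p), i.e. efficiency is at least target_e."""
    return t1 >= isoefficiency_constant(target_e) * t0


# Hypothetical costs: C_seq = 10, C_com = 5 time units, sequential time T(1) = 1000.
t0 = total_overhead(p=4, c_seq=10, c_com=5)
print(t0, round(efficiency(1000, t0), 3), maintains_efficiency(1000, t0, 0.9))
```

With these assumed numbers the overhead is 50 time units, so the efficiency is about 0.95 and a target of 0.9 is met; a larger p raises T0(p) and eventually breaks the inequality unless T(1), that is, the problem size, grows too.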

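Isoefficiency Metric

[Figure: memory required per processor versus number of processors, with curves for $c\,p \log p$, $c\,p$, $c \log p$ and $c$, and a horizontal line marking the memory limit. Below the limit, efficiency can be maintained; above it, efficiency cannot be maintained and should decrease.]

- Consider that the sequential version of a certain application has complexity $O(n^3)$, and that the execution time spent by each of the $p$ processors of the parallel version in communication/synchronization operations is $O(n^2 \log p)$. If the amount of memory necessary to represent a problem of size $n$ is $n^2$, what is the scalability of the application in terms of memory?

$$n^3 \geq c \times p \times n^2 \log p \;\Rightarrow\; n \geq c \times p \log p$$

$$M(n) = n^2 \;\Rightarrow\; \frac{M(c \times p \log p)}{p} = \frac{c^2 \times p^2 \log^2 p}{p} = c^2 \times p \log^2 p$$

- The memory required per processor grows with $p$, so the scalability of the application is low.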
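To make the growth in the example above concrete, the sketch below tabulates the minimum problem size n = c p log p and the resulting memory per processor n²/p for a few processor counts. The constant c = 1 and the base-2 logarithm are assumptions made only for illustration; the function names are hypothetical.

```python
import math


def min_problem_size(p: int, c: float = 1.0) -> float:
    """Smallest n that keeps efficiency constant: n = c * p * log p (base 2 assumed)."""
    return c * p * math.log2(p)


def memory_per_processor(p: int, c: float = 1.0) -> float:
    """M(n) / p with M(n) = n^2, which equals c^2 * p * (log p)^2."""
    n = min_problem_size(p, c)
    return n ** 2 / p


for p in (2, 4, 8, 16, 32, 64):
    print(f"p={p:2d}  n>={min_problem_size(p):7.1f}  "
          f"memory/processor>={memory_per_processor(p):8.1f}")
```

Because the per-processor requirement keeps increasing with p, any fixed memory limit per processor is eventually exceeded, which is the sense in which the scalability is low.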

Superlinear Speedup

- The speedup is said to be superlinear when the ratio between the sequential execution time and the parallel execution time with $p$ processors is greater than $p$:

$$\frac{T(1)}{T(p)} > p$$

- Some factors that may make the speedup superlinear are:
  - Communication/synchronization/initialization costs are almost nonexistent.
  - Tolerance to communication latency.
  - Increased memory capacity (the problem may come to fit entirely in memory).
  - Subdivision of the problem (smaller tasks may generate fewer cache misses).
  - Computation randomness in optimization problems or in problems with multiple solutions.

Superlinear Speedup

If just one computer (processor) can solve a problem in N seconds, could N computers (processors) solve the same problem in 1 second?