Efficient Implementation of Concurrent Programming Languages
Uppsala Dissertations from the Faculty of Science and Technology 43

ERIK STENMAN

Efficient Implementation of Concurrent Programming Languages

ACTA UNIVERSITATIS UPSALIENSIS
UPPSALA 2002
Dissertation for the Degree of Doctor of Philosophy in Computer Science presented at Uppsala University

Abstract

Stenman, E. 2002: Efficient Implementation of Concurrent Programming Languages, Acta Universitatis Upsaliensis. Uppsala Dissertations from the Faculty of Science and Technology 43. Uppsala.

This thesis proposes and experimentally evaluates techniques for efficient implementation of languages designed for high availability concurrent systems. This experimental evaluation has been done while developing the High Performance Erlang (HiPE) system, a native code compiler for SPARC and x86. The two main goals of the HiPE system are to provide efficient execution of Erlang programs, and to provide a research vehicle for evaluating implementation techniques for concurrent functional programming languages.

The focus of the thesis is the evaluation of two techniques that enable interprocess optimization through dynamic compilation. The first technique is a fast register allocator called linear scan, and the second is a memory architecture where processes share memory.

The main contributions of the thesis are:

- An evaluation of linear scan register allocation in a different language setting. In addition the performance of linear scan on the register poor x86 architecture is evaluated for the first time.
- A description of three different heap architectures (private heaps, shared heap, and a hybrid of the two), with a systematic investigation of implementation aspects and an extensive discussion on the associated performance trade-offs of the heap architectures. The description is accompanied by an experimental evaluation of the private vs. the shared heap setting.
- A novel approach to optimizing a concurrent program, by merging code from a sender with code from a receiver, is presented together with other methods for reducing the overhead of context switching.
- A description of the implementation aspects of a complete and robust native code Erlang system, which makes it possible to test compiler optimizations on real world programs.

Erik Stenman, Department of Information Technology, Uppsala University, Box 337, Uppsala, Sweden.

© Erik Stenman 2002
Printed in Sweden by Elanders Gotab, Stockholm 2002.
To my parents and my wife.
Acknowledgments

The days are all empty and the nights are Unreal.
(cookie)

First and foremost I would like to thank my supervisor Konstantinos (Kostis) Sagonas. Without his guidance, advice, and will to help this thesis would never have been completed, or even started. I should also note that this research has been supported in part by the ASTEC (Advanced Software Technology) competence center with matching funds by Ericsson Development.

A big compiler project like the HiPE compiler is a team effort, and I am grateful to several people for the completion of the HiPE compiler. First of all I am grateful to my original supervisor Håkan Millroth for letting me do a Master's thesis on compilation of Erlang, and to Thomas Lindgren for taking over the project when Håkan left. Much of the initial version of the HiPE system is due to my Master's thesis partner Christer Jonsson. In the JERICO compiler he wrote the back-end and runtime system support. As we started on the HiPE compiler we switched roles and he wrote the front-end, the graph coloring register allocator, and other optimizations. Christer was a great source of inspiration, and it felt secure to have such a knowledgeable partner. Without him by my side I would probably never have undertaken such an ambitious project as the HiPE system.

Mikael Pettersson, who took over the project when Thomas Lindgren left, is also a very knowledgeable compiler hacker, and he always had an answer and could suggest a solution to any implementation problem I encountered. He is responsible for most of the runtime system support and he is the main designer of the x86 back-end.

Several others have also hacked on and around the HiPE compiler: Kostis Sagonas wrote the BEAM disassembler, and the BEAM to Icode translation. Richard Carlsson wrote help libraries such as prop-lists and worked on the interface to the ordinary Erlang compiler. Sven-Olof Nyström wrote the generalized balanced tree implementation which is the core of many data structures in the HiPE compiler.

There have also been many Master's students and students in the Compilers-2 course hacking on the system, for example: Andreas Wallin, Thorild Selén, Ingemar Åberg who wrote the first version of the coalescing register allocator. Ulf Magnusson who worked on the x86 back-end. Christoffer Vikström, Daniel Deogun, and Jesper Bengtsson who implemented the SSA conversion in Icode. Per Gustafsson who implemented fast support for binaries. Tobias Lindahl who wrote the native support for floating point operations. Last, but not least, I would like to thank Jesper Wilhelmsson who implemented most of the runtime support for the shared heap system.

We have also had much valuable help from the Erlang/OTP team at Ericsson, and I would especially like to thank Björn Gustafsson for his patience with all my questions about the system, and for all the effort he put into making the Erlang/OTP system support HiPE.

I have also had a lot of help writing this thesis, and the papers it is based on, and I would again like to thank Kostis Sagonas for all the help with the writing and for helping me see what is important. I would also like to thank Sven-Olof Nyström for many valuable comments on the thesis. Since much of this thesis is taken more or less straight from previously published papers some of the wordings might be those of my coauthors: Kostis Sagonas, Mikael Pettersson, Sven-Olof Nyström, Jesper Wilhelmsson, and Thomas Lindgren. I also got valuable comments on my thesis from my opponent Simon Peyton-Jones, and linguistic help from my wife Cecilia Stenman.

My friends at the department have made my time as a Ph.D. student enjoyable. In the beginning Per Mildner, Christer Jonsson, Gustaf "Gaffe" Naeser, Greger Ottosson, and during the years Jakob Engblom, and Jesper Wilhelmsson. Most valuable throughout this whole time has been my growing friendship with Richard Carlsson. He has been a great help, both as a research colleague to discuss new (and old) ideas with, and as a fun friend to reminisce about vintage home computers (for us, vintage is the beginning of the eighties) over a beer.

My time in Uppsala has been made fun by Birkalan, the Games-Domain gang who irregularly joined me in all night LAN-parties, and all my other friends. I am also grateful to have found my closest friend, Fredrik Ström, here in Uppsala. He has inspired and encouraged me since the first day of our computer science studies.

My family has always been very important to me. For always believing in me, and for buying me my first three computers, enabling me to take this path in life, I am eternally grateful to my parents.

Words are not enough to express my gratitude toward my ever loving and supporting wife who helped me through all the tough times of my Ph.D. studies. Having to give up my last name was a small price to pay to get her as my wife. Thank you, Cilla!
Foreword

The tools we use have a profound (and devious!) influence on our thinking habits, and, therefore, on our thinking abilities.
(Edsger Dijkstra)

At the age of 17 I was fortunate enough to spend one year as an exchange student in the USA. This was indeed an educational year, and believe it or not, one of the most memorable experiences was related to the actual purpose of the trip: the learning of a language. When I arrived in the States my grasp of the English language left much to desire, and my vocabulary was quite limited, but I got by. As the days went by the constant writing, reading, speaking, and listening to English finally forced me to start thinking in English. At first I was delighted, this was great, I could interact with my environment much faster when I no longer had to constantly translate every sentence to Swedish, think up a response and then translate that back to English. The joy was not long lasting though; I soon found that I had become less witted. My limited English vocabulary made it impossible to think certain thoughts; I simply lacked the words.

This taught me an important lesson about the power of language: If your language (or your grasp of it) is not up to the task, you put your intelligence and creativity at risk. I have carried this insight with me ever since, even into the world of programming languages. Hence, I have always strived to find languages that are powerful enough to easily express the concepts at hand. To me, Erlang with its built-in support for concurrency is such a language. In the absence of a scientific study of the productivity in different languages I hope that my personal motivation to the importance of Erlang is enough to arouse your interest in the efficient implementation of Erlang and other similar languages.
Prior Publications

This thesis is to a large extent based on the following papers. (Note that the author has changed name from Erik Johansson to Erik Stenman.)

I. E. Johansson, S.-O. Nyström. Profile-guided optimization across process boundaries. Proceedings of ACM SIGPLAN Workshop on Dynamic and Adaptive Compilation.
II. E. Johansson, M. Pettersson and K. Sagonas. A High Performance Erlang System. Proceedings of the 2nd ACM SIGPLAN International Conference on Principles and Practice of Declarative Programming.
III. E. Johansson and K. Sagonas. Linear Scan Register Allocation in a High-Performance Erlang Compiler. Proceedings of the 4th International Symposium, Practical Aspects of Declarative Languages.
IV. E. Johansson, K. Sagonas, and J. Wilhelmsson. Heap Architectures for Concurrent Languages using Message Passing. Proceedings of the ACM SIGPLAN International Symposium on Memory Management.
V. M. Pettersson, K. Sagonas, and E. Johansson. The HiPE/x86 Erlang Compiler: System Description and Performance Evaluation. Sixth International Symposium on Functional and Logic Programming.
VI. E. Stenman and K. Sagonas. On Reducing Interprocess Communication Overhead in Concurrent Programs. Proceedings of ACM SIGPLAN Erlang Workshop.
VII. E. Johansson, M. Pettersson, K. Sagonas, and T. Lindgren. The Development of the HiPE System: Design and Experience Report. Accepted, will appear in the Springer International Journal on Software Tools for Technology Transfer.
VIII. E. Stenman and K. Sagonas. Experimental evaluation and improvements to linear scan register allocation. Submitted for publication.
Contents

I Preface

1 Introduction
    Problem statement
    Contributions of this thesis
    Thesis overview

2 Background
    Erlang
    Concurrency in Erlang
    Memory management in Erlang and other concurrent languages
    Uses of Erlang
    Goals of HiPE
    A brief history of HiPE
    JERICO: The first prototype
    Compiler
    Calling conventions and stack frames
    Backpatching
    Performance of the JERICO compiler
    The HiPE system before Open Source Erlang
    Open source HiPE
    HiPE 1.0/OTP-R
    HiPE 2.0/OTP-R
    Some special HiPE features

II Implementation

3 The compiler infrastructure
    Phases in the compiler
    To BEAM code
    BEAM to Icode
    Icode to RTL
    Symbolic SPARC
    Symbolic IA-32
    Register allocation
    Frame management
    Linearizing the code
    Assembling the code
    Interface issues
    Tailcalls
    Exception handling
    Stack descriptors
    Garbage collection and generational stack scanning
    Mode switching
    Built-in functions
    Process switching
    Code loading
    Pattern matching implementation

4 Register allocation
    Global register allocation
    Graph coloring register allocation
    Iterated register coalescing
    Linear scan register allocation
    Implemented register allocators in HiPE
    Graph coloring register allocator
    Iterated register coalescing allocator
    Linear scan register allocator
    A naïve register allocator
    The SPARC back-end
    The x86 back-end
    Tweaks for linear scan on the x86
    Related work
    Discussion

5 Heap architectures
    An Architecture with Private Heaps
    Process communication
    Garbage collection
    Pros and cons
    An Architecture with a Shared Heap
    Process communication
    Garbage collection
    Pros and cons
    Optimizations
    Proposing A Hybrid Architecture
    Allocation strategy
    Process communication
    Garbage collection
    Pros and cons
    Performance of a prototype
    Related Work
    Discussion

6 Process optimization
    Rescheduling Send
    Direct Dispatch
    Interprocess Inlining
    The transformation
    Further considerations
    Return messages
    Experiences from a prototype
    Potential gains
    Related Work

III Evaluation

7 Performance of HiPE
    Erlang vs. other functional languages
    Comparison of Erlang implementations
    Comparison of native vs. emulated code
    Discussion

8 Performance of register allocators
    Benchmarks
    Compilation times
    Speed of Execution
    Spills on SPARC
    Spills on x86

9 A deeper look on linear scan
    Impact of instruction ordering
    Impact of performing liveness analysis
    Impact of spilling heuristics
    Lifetime holes and live range splitting

10 A comparison of heap architectures
    The benchmarks and the setting
    A comparison of a private heap vs. a shared heap architecture
    Time performance
    Stop times
    Space performance
    Summary

IV Conclusion

11 Conclusion
    Summary of contributions
    Discussion
    Future Research

References
List of Figures

- A native stack frame in JERICO
- Backpatching when code for the function g in the call chain f -> g -> h is reloaded. The new code calls q (not shown) instead of h
- Structure of a HiPE-enabled Erlang/OTP system
- Recursive calls (f -> g) and tailcalls (g -tail-> h)
- The Icode CFGs for the functions g and f from Program 3.1. Note that the call to g from f is protected by an exception handler (basic block 3)
- The call stack for the function g/m in the call chain f/l -> g/m -> h/n
- A stack descriptor for a call (on SPARC) to h/0 from g/17. Note that 16 of the arguments to g/17 are passed in registers and that 1 local variable is live during the call to h/0
- Mode-switch frames created in a call f -> g -> h
- Code backpatching done by the HiPE linker
- Control-flow graph and two of its possible linearizations
- Memory architecture with private heaps
- Message passing in a private heap system
- Memory architecture with shared heap
- Message passing in a shared heap system
- A hybrid memory architecture
- Message passing in a hybrid architecture
- Process merging
- Compilation times on SPARC
- Compilation times, with SSA conversion, on SPARC
- Compilation times on x86
- Compilation times, with SSA conversion, on x86
- Execution times on SPARC
- Execution times, with SSA conversion, on SPARC
- Execution times on x86
- Execution times, with SSA conversion, on x86
- Estone ranking on SPARC and x86
- Estone ranking on SPARC and x86 with SSA conversion
- A simple control flow graph
- A control-flow graph and its orderings
- Normalized times for the procs benchmark
- Normalized execution times
- Max garbage collection stop times (ms)
List of Tables

- Use of SPARC registers in HiPE
- Use of x86 registers in HiPE
- Performance of functional languages on three recursive programs and one concurrent. Execution times in seconds
- Times (in seconds) for sequential benchmarks in different Erlang implementations
- Speedup of different Erlang implementations compared to JAM
- Times (in seconds) and speedup over JAM for concurrent benchmarks in different Erlang implementations
- Times (in seconds) and speedup over JAM for large benchmarks in different Erlang implementations
- Speedup of HiPE-1.0 over BEAM R
- Description of benchmark programs
- Sizes of benchmark programs
- Number of spilled temporaries and SPARC instructions after allocation
- Number of spilled temporaries and SPARC instructions after allocation (with SSA)
- Number of spilled temporaries and x86 instructions after allocation
- Number of spilled temporaries and x86 instructions after allocation (with SSA)
- Number of spilled temporaries using different basic block orderings
- Number of spilled temporaries using different basic block orderings (with SSA)
- Impact of spilling heuristics
- Number of processes and messages
- Heap sizes allocated and used (1,000 words)
Part I

Preface
Chapter 1

Introduction

Obviously, the median language has enormous momentum. I'm not proposing that you can fight this powerful force. What I'm proposing is exactly the opposite: that, like a practitioner of Aikido, you can use it against your opponents.
(Paul Graham)

This thesis proposes and experimentally evaluates techniques for efficient implementation of languages designed for high availability concurrent systems. A concurrent system is a system that is designed as a collection of independent processes. From the view of the designer these processes are performing their tasks simultaneously, but in reality their execution might be interleaved on a single processor. We distinguish between concurrent processes, i.e., processes that conceptually are executing simultaneously, and parallel processes, i.e., processes that in reality are executing simultaneously on, e.g., a multiprocessor machine.

Many systems lend themselves naturally to a concurrent implementation, notably interactive systems in which external events dictate the execution order, and distributed systems where tasks are executed in parallel on different nodes. The concept of processes is also important as an abstraction. A process encapsulates state in a natural way; in this respect processes resemble objects in an object-oriented language. It is hence not surprising that in recent years, concurrency as a form of abstraction has become increasingly popular, and many modern programming languages (such as Occam [62], CML [78], Oz [90], Erlang [10], Java [39], and C#) come with some form of built-in support for concurrent processes (or threads). Many of these languages belong to the category that we will call concurrent functional programming languages (CFPL). A CFPL is a functional programming language with built-in support for concurrency.

One application area, with requirements that lend themselves naturally to the use of a concurrent functional programming language, is
that of the tele-communications (telecom) industry. Most modern systems provided by the telecom industry, such as telephone exchanges, Internet servers, and routers, have very high availability requirements. Usually, these systems require five nines availability, that is, 99.999% uptime. Or put another way: less than five minutes downtime per year, including all planned stops for maintenance and updates. Another characteristic of these systems is that it is often natural to model them as a large set of concurrent tasks or subsystems. The competitiveness of the telecom industry also demands short development times. Since large parts of these systems are implemented in software, the telecom industry has a need for software development environments that can support the fast development of highly concurrent, fault-tolerant systems.

Another requirement of telecom systems is that they are supposed to run virtually forever. This is far from the kind of web-applications that one writes in, e.g., Java, which execute for a few minutes or hours at most. It is also far from usual user level programs such as one would build with, e.g., Microsoft Visual Studio and run under Windows. Nobody expects these systems to execute continuously for years. The lifetime of a backbone telephone exchange, on the other hand, has these kinds of requirements.

Few software development tools claim to cater for such extreme requirements. But the Open Telecom Platform (OTP) provided by Ericsson is such a tool and it has proved itself in several telecom projects during the last ten years. The core technology of OTP is the concurrent functional programming language Erlang.

It is important that the runtime performance of applications developed in Erlang is as good as possible. Faster execution can directly be turned into the ability to handle more users, calls, connections, requests, etc., or the ability to handle the same number of users with cheaper hardware. Also, if the performance of the Erlang system is too low or certain language features give suboptimal performance, the developer will be tempted to use unnatural abstractions in order to achieve acceptable performance.
Therefore, we believe that it is important to find generally applicable implementation techniques for concurrent functional programming languages that ensure high runtime performance. To tackle this problem we have started the HiPE (High Performance Erlang) project. The main contribution of the HiPE project is the HiPE system with a native code Erlang compiler for SPARC and x86. Our main goal with the HiPE system is to provide the most efficient execution of Erlang programs. Another aim of the HiPE system is to provide a research
vehicle that can be used to evaluate implementation techniques for concurrent functional programming languages. We have tried to make a well structured, open, and modular system which allows a programming language implementor to plug and play parts of the system, in order to evaluate different implementations of one component while keeping the rest of the system unchanged.

We believe that the nature of distributed interactive systems makes the use of static analysis sub-optimal. These systems have a huge code base and are developed in modular units, making whole program static analysis problematic. As mentioned, these systems have to cater for code updates in a running system, which complicates the implementation of a static analysis and optimization scheme. And finally, these systems are very dynamic in nature, making one-time static analysis imprecise. Instead of using static analyses we suggest that some optimization of such systems should be profile-guided and performed dynamically (done at runtime). With the HiPE system we hope to create a runtime system for Erlang that has the ability to reconfigure and re-optimize itself in a running system without forcing the system to go offline. For example, a fast compiler, which spends little time on register allocation, can be used in just-in-time compilers and for systems with dynamic recompilation. This opens up new opportunities for optimization and can ultimately lead to faster execution compared to a system that compiles the program only once, even if this compiler produces an optimal register allocation.

To achieve the most efficient execution of Erlang programs we need to find out which parts of the Erlang system need to be improved and how the execution of Erlang programs can be optimized. Then we need to find techniques to implement these optimizations, and finally we need to tune the implementation of these techniques so that they can be applied at runtime without disturbing the execution of the application.

In this thesis several techniques to achieve efficient execution of concurrent functional programming languages are studied.
Even though the focus of the thesis is on the applicability of these techniques to Erlang, we stress that they are applicable to other programming languages and systems as well.

1.1 PROBLEM STATEMENT

As stated previously, the goal of this thesis is to find and evaluate efficient implementation techniques for CFPLs through profile-guided dynamic recompilation. Dynamic recompilation requires a fast compiler, so we have studied register allocation since this part of the compiler often is a compilation time bottleneck. In order to make inter-process
communication more efficient we would like to have access to shared memory, and therefore we study the impact of shared heap architectures. With a fast compiler and shared memory new process optimization techniques can be developed. Hence, this thesis mainly focuses on three sub-problems, 1) register allocation, 2) heap architectures supporting communication through message passing, and 3) process optimization, but it also presents some general insights on the development of programming language systems. Let us look at these three problems in a little more detail.

Register allocation

The fastest memory locations in modern computers are the registers. Unfortunately the number of registers is limited, hence it is important to use them as efficiently as possible. The register allocator is the part of a compiler responsible for finding an effective use of registers. This is one of the hardest and most central problems to a compiler and extensive research has been conducted in this area.

We study three different register allocators and compare their performance to each other and against a naïve allocator, which keeps all temporaries on the stack. Two of the allocators are variants of the allocation technique we suspect is the most commonly used technique in modern compilers, namely graph coloring. The third, relatively new technique, called linear scan, is designed with fast compilation times in mind, and hence is well suited for just-in-time compilation. A somewhat surprising result of this comparison is that even though the impact on performance of register allocation compared to the naïve approach is significant, when comparing the three allocators with each other their performances are similar.

We extend previous research done on linear scan register allocation by applying it to a new problem area, namely just-in-time compilation of functional programming languages. We show that linear scan performs well even though the underlying implementation requires a relatively large number of precolored registers. We also provide the first application of linear scan to a register poor architecture such as x86, and show that it performs reasonably well in that context too. Finally, we also evaluate several variants to the algorithm to find the most efficient implementation.
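As a rough illustration of why linear scan is cheap at compile time, the following sketch allocates registers in a single pass over live intervals sorted by start point. It follows the textbook formulation of the algorithm by Poletto and Sarkar rather than the HiPE implementation; the module name `linear_scan`, the `{Temp, Start, End}` interval format, and the register names are assumptions made for this example, and refinements such as precolored registers and lifetime holes are left out.

```erlang
-module(linear_scan).
-export([allocate/2]).

%% Hypothetical input format: Intervals = [{Temp, Start, End}],
%% Regs = list of available register names.
%% Returns [{Temp, {reg, Reg} | spill}] sorted by Temp.
allocate(Intervals, Regs) ->
    Sorted = lists:keysort(2, Intervals),      % increasing start point
    scan(Sorted, [], Regs, []).

%% Active holds {End, Temp, Reg}, kept sorted by increasing end point.
scan([], _Active, _Free, Acc) ->
    lists:keysort(1, Acc);
scan([{T, Start, End} | Rest], Active0, Free0, Acc) ->
    {Active1, Free1} = expire(Start, Active0, Free0),
    case Free1 of
        [R | Free2] ->
            Active2 = lists:sort([{End, T, R} | Active1]),
            scan(Rest, Active2, Free2, [{T, {reg, R}} | Acc]);
        [] ->
            spill(T, End, Rest, Active1, Acc)
    end.

%% Release the registers of intervals that end before Start.
expire(Start, [{End, _T, R} | Active], Free) when End < Start ->
    expire(Start, Active, [R | Free]);
expire(_Start, Active, Free) ->
    {Active, Free}.

%% No free register: spill whichever interval ends last,
%% either the current one or the last one in Active.
spill(T, End, Rest, Active, Acc) ->
    case lists:reverse(Active) of
        [{LastEnd, Victim, R} | RevRest] when LastEnd > End ->
            %% The victim loses its register to the current interval.
            Acc1 = lists:keyreplace(Victim, 1, Acc, {Victim, spill}),
            Active1 = lists:sort([{End, T, R} | lists:reverse(RevRest)]),
            scan(Rest, Active1, [], [{T, {reg, R}} | Acc1]);
        _ ->
            scan(Rest, Active, [], [{T, spill} | Acc])
    end.
```

With two registers, `linear_scan:allocate([{a,1,10},{b,2,4},{c,3,6},{d,5,8}], [r1,r2])` returns `[{a,spill},{b,{reg,r2}},{c,{reg,r1}},{d,{reg,r2}}]`: the long-lived `a` is spilled when `c` arrives, and `d` reuses `r2` once `b` has expired. Each interval is visited once, which is what keeps the allocator fast compared to building and coloring an interference graph.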
Heap architectures

A key issue in the design of a concurrent language implementation is the memory architecture of the runtime system. There exist many different ways of structuring the architecture of the runtime system, each having its pros and cons.
We present three memory architectures for high-level programming languages that implement concurrency through message passing. (Even languages, such as Java, that implement process communication through shared structures can use these three architectures, but the trade-offs for such a language are different than for a language that uses message passing.) The three architectures are 1) a private heap system, in which each process has its own private memory, 2) a shared heap system in which all processes share the memory in one common heap, and 3) a hybrid system with private heaps for private data and a shared heap for messages. We systematically investigate aspects that influence the choice between them, and extensively discuss the associated performance tradeoffs. Moreover, in an implementation setting where the rest of the runtime system is unchanged, we present a detailed experimental comparison between two of these architectures both on large highly concurrent programs and on synthetic benchmarks.

Process optimization

The use of concurrency often hides parts of the data flow from the compiler. This makes many of the common compiler optimizations of today hard or even impossible in the case of interprocess communication, in the same way as the use of procedures limits the optimizations in compilers that do not employ inter-procedural optimizations. As stated, the dynamic nature of the applications we are investigating makes static analysis hard and imprecise. Hence, we propose the use of profiling to determine the inter-process data flow in an application. The collected information can then be used to optimize the code by, for example, complete or partial process merging. We present a method for partial process merging where the code sending a message is merged with the code that will receive the message.

1.2 CONTRIBUTIONS OF THIS THESIS

To summarize the contributions of this thesis by area, they are:

Register allocation

- A thorough evaluation of linear scan register allocation in a setting different from the imperative one that it has been applied to previously.
- A comparison of linear scan with three other register allocators.
- The first evaluation of the performance of linear scan on the register poor x86 architecture.
- An evaluation of the effect of options to the basic algorithm.

Heap architectures

- A description of three different heap architectures (private heaps, shared heaps, and a hybrid of the two) with a systematic investigation of implementation aspects.
- An extensive discussion on the associated performance tradeoffs of each of the three heap architectures.
- An experimental evaluation of the two extreme architectures on both real world programs and artificial benchmarks, performed in an otherwise unchanged runtime system.

Process optimization

- A novel approach to optimizing a concurrent program by merging code from a sender with code from the corresponding receiver.
- Methods for reducing the overhead of context switching.

System development

- A description of the implementation aspects of a complete and robust native code Erlang system that makes it possible to test compiler optimizations on real world programs.
- Proofs of the usefulness of this system as a research vehicle by using the system to evaluate register allocation strategies and different heap architectures.

1.3 THESIS OVERVIEW

The thesis is divided into four parts, Preface, Implementation, Evaluation, and Conclusion. In the rest of this first part some background material is covered: the language Erlang, the goals of the HiPE project, and the history of developing the HiPE compiler. The section on the history of HiPE, Section 2.3, also contains some implementation details of versions prior to the current one.

Part II begins with a chapter describing the current implementation of HiPE, Chapter 3. This is followed by a description of register allocators in HiPE, Chapter 4, a presentation of different heap architectures in Chapter 5, and a presentation of process optimization strategies in Chapter 6.

In Part III, the performance of the HiPE compiler, the register allocators, and the heap architectures are evaluated through benchmarking.
In the final part of the thesis directions for future work are presented, some conclusions are drawn and the thesis is summarized.

The thesis covers many different areas and there is no specific chapter on related work; instead discussions of related work appear throughout the thesis as appropriate.
Chapter 2

Background

Rem tene, verba sequentur.
(Cato)

In this chapter we present background material needed for the understanding of the main parts of the thesis. This chapter is divided into four sections: Erlang (Section 2.1), Goals of HiPE (Section 2.2), A brief history of HiPE (Section 2.3), and Some special HiPE features (Section 2.4). In the first section, beside presenting aspects of the Erlang language that influence the HiPE implementation, we also present related work on memory architectures for concurrent languages (Section 2.1.2). In the second section we present the goals of the HiPE project and the HiPE compiler, laying down the philosophy that has guided us through the implementation. Then in Section 2.3 we present the history of the HiPE project; most of this section sets the background for the current implementation. This section also presents some interesting aspects of the HiPE compiler, such as the ability to compile one function at a time (Section 2.3.2) and the use of backpatching to facilitate hot-code loading (Section 2.3.4). The last section presents some additional features in the HiPE system that make it possible to instrument, profile, and measure different aspects of both the runtime system and of Erlang applications.

2.1 ERLANG

Erlang (named after the Danish mathematician Agner Krarup Erlang) is a dynamically typed, strict, concurrent, higher-order functional language. The language started out as an experimental implementation, which has grown into an industrial implementation. There is no formal definition of the language, but the basic features of Erlang are described in the so called Erlang book ("Concurrent Programming in Erlang" [10]). Many new features have been added to the language
since that book was written, and hence the language is primarily defined by the latest implementation from Ericsson.

In this section we will describe the parts of the language that are needed for the understanding of the rest of the thesis. (We will try to describe the implementation independent aspects of Erlang, but since there is no formal definition and in principle only one defining implementation, we will also describe some aspects that are best regarded as aspects of the implementation.)

Erlang's basic data types are atoms, numbers (floats and arbitrary precision integers), process identifiers (or PIDs), references, and binaries (byte arrays). These data types can be combined into the compound data types lists and tuples. There is no destructive assignment of variables or data, and the first occurrence of a variable is its binding instance. Function rule selection is done with pattern matching. Erlang inherits some ideas from concurrent constraint logic programming languages [83], such as the restriction to flat guards in function clauses.

For programming-in-the-large, Erlang comes with a module system. An Erlang module defines a number of functions. Only explicitly exported functions may be called from other modules. Calls to functions in different modules, called remote calls, are done by supplying the name of the module of the called function. Tailcall optimization is a required feature of Erlang. As in other functional languages, memory management in Erlang is the responsibility of the runtime system. Erlang provides a catch/throw-style exception mechanism for error handling; any runtime error, such as a type error or division by zero, will result in an exception that can be caught by a catch. A simple tail-recursive Erlang program for calculating the length of a list might look like Program 2.1.

Erlang programs execute within an Erlang node. Several processes can execute concurrently on one Erlang node, and several nodes can be connected in a distributed network.

As mentioned, Erlang is used in five nines high-availability (i.e., 99.999% of the time available) systems, where downtime is required to be less than five minutes per year. Such systems cannot be taken down, upgraded, and restarted when software patches and upgrades arrive, since that would not respect the availability requirement.
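To make the availability figure concrete, the yearly downtime budget implied by a given number of nines can be computed directly. The sketch below is only an illustration: the module and function names are made up for this example, and it assumes a 365-day year.

```erlang
-module(availability).
-export([downtime_minutes_per_year/1]).

%% Nines is the number of nines in the availability figure,
%% e.g. 5 for "five nines". Assumes a 365-day year.
downtime_minutes_per_year(Nines) when is_integer(Nines), Nines >= 1 ->
    MinutesPerYear = 365 * 24 * 60,            % 525600 minutes
    MinutesPerYear / math:pow(10, Nines).      % allowed fraction is 10^-Nines
```

Under these assumptions `availability:downtime_minutes_per_year(5)` comes out at roughly 5.26 minutes, which is the origin of the "about five minutes per year" rule of thumb for five nines systems.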
The applications built with Erlang are often intended to execute continuously for years without exhausting resources, crashing or stopping for any other reason. To perform system upgrading while allowing continuous operation, an Erlang system needs to cater for the ability to change the code of a module while the system is running, so called hot-code loading. Processes
that execute old code can continue to run, but are expected to eventually switch to the new version of the module by issuing a remote call (which will always invoke the most recent version of that module). Once the old code is no longer in use, the old module can be unloaded.

Program 2.1 A program for calculating the length of a list.

    -module(length).        %% Defines the name of the module.
    -export([length/1]).    %% Exports the function length.

    %% Returns the number of elements in the list List.
    length(List) ->         %% Note, variables start with a capital.
        length(List, 0).    %% It is OK to define several functions with the same
                            %% name as long as their arity differs.

    %% At the end of the list ([]) the accumulator N holds the length.
    length([], N) -> N;
    %% The length of [_|Rest] is the length(Rest) + 1.
    length([_|Rest], N) -> length(Rest, N+1).

The Erlang language was purposely designed to be small, but it comes with a large set of built-in functions (known as BIFs) and a big standard library. With the Open Telecom Platform (OTP) middleware [88], Erlang is further extended with a library of standard solutions to common requirements in telecommunication applications (distributed real-time databases, servers, state machines, process monitors, load balancing), standard interfaces (CORBA), and standard communication protocols (e.g., HTTP, FTP).

Concurrency in Erlang

Erlang is by some called an actor language [3], since the concurrency is supplied through autonomous processes that communicate asynchronously through message passing. Processes in Erlang are extremely light-weight, much lighter than OS or Java threads [42]. It is not uncommon to have thousands of Erlang processes running on each Erlang node. The memory requirements of Erlang processes may vary dynamically during runtime. Erlang's concurrency primitives spawn, ! (send), and receive allow a process to create new processes and communicate with other processes through asynchronous message passing. That is, the send operation is non-blocking, but the receive is blocking. Note though that Erlang provides a mechanism for allowing a process to timeout
while waiting for messages. (This makes it possible to implement a non-blocking receive by supplying a timeout of zero.) Any data value can be sent as a message and processes may be located on any Erlang node, i.e., any machine in an Erlang network. Distribution is hence almost invisible in Erlang. Each process has a mailbox, essentially a message queue, where each message sent to the process will arrive. Message selection from the mailbox occurs through pattern matching. There is no shared memory between processes, or from a different perspective, since there are no destructive updates in Erlang any sharing can never be observed. One important feature of Erlang used to support robust systems is process linking, that is, a process can register to receive a message when another process terminates. It is not necessarily the father (the spawning process) that receives this message and processes can be mutually linked. This makes it easy to create supervising process structures that can restart crashing processes.

In the current implementation, processes are handled by the runtime system scheduler, which selects an Erlang process from a ready queue. The process is assigned a number of reductions to execute, called the time-slice of the process. Each time the process does a function call a reduction is consumed. The process is suspended when the time-slice is used up (i.e., the number of remaining reductions reaches zero), or when the process reaches a receive and there are no matching messages in its mailbox. In the HiPE system, the scheduler is implemented in C as a function that can be called either by the BEAM emulator or directly from native compiled code. The scheduler takes as arguments the process that has been running and the number of executed reduction steps, and returns the next process to execute.

Processes in Erlang are not garbage collected Erlang objects (although the process identifiers are recycled). They can keep on living even though no other process has access to them. Instead a process will live as long as it has code to execute. The runtime system keeps information about a process in a process control block or PCB.
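The concurrency primitives described above can be illustrated with a small sketch. The module name mailbox_demo and its functions are hypothetical and chosen only for this illustration; they are not part of HiPE or Erlang/OTP. The sketch shows a non-blocking send, a blocking receive with a timeout, a receive with a timeout of zero, and a linked process whose termination is delivered as a message.

```erlang
-module(mailbox_demo).   %% Hypothetical module, for illustration only.
-export([start/0, echo/0, poll/0, watch/0]).

%% A process that answers every {From, Msg} with {self(), Msg}.
echo() ->
    receive
        {From, Msg} ->
            From ! {self(), Msg},    %% The send does not block.
            echo();
        stop ->
            ok
    end.

%% Spawns an echo process, sends it a message, and waits for the reply.
start() ->
    Pid = spawn(?MODULE, echo, []),
    Pid ! {self(), hello},
    Result = receive
                 {Pid, Reply} -> Reply   %% Blocks until a matching message arrives,
             after 1000 -> timeout       %% or gives up after one second.
             end,
    Pid ! stop,
    Result.

%% A non-blocking receive: a timeout of zero returns at once
%% when the mailbox holds no message.
poll() ->
    receive
        Msg -> {ok, Msg}
    after 0 ->
        empty
    end.

%% Process linking: the calling process traps exits, so the termination
%% of the linked process arrives as an ordinary message.
watch() ->
    process_flag(trap_exit, true),
    Pid = spawn_link(fun() -> exit(crashed) end),
    receive
        {'EXIT', Pid, Reason} -> {exited, Reason}
    after 1000 ->
        timeout
    end.
```

A supervising process would typically react to the 'EXIT' message by spawning a replacement; that step is left out here to keep the sketch short.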
When a process dies, its PCB is deallocated.

Memory management in Erlang and other concurrent languages

As in other functional languages, memory management in Erlang is a responsibility of the runtime system and happens through garbage collection.
Note that since there are no destructive updates, the heap in an Erlang system is unidirectional, i.e., there are no circular structures and all pointers on the heap always point toward older objects. The soft real-time concerns of Erlang call for bounded time garbage collection techniques [91, 54]. Armstrong and Virding propose such a technique [9]. This technique, based on a mark-and-sweep algorithm, takes advantage of the unidirectionality of the heap but imposes a significant overhead and was never fully implemented. In practice, in a tuned Erlang system with a generational copying garbage collector, garbage collection latency is usually low (less than 10 milliseconds) as most processes are short-lived or small in size. Longer pauses are quite infrequent. However, a blocking collector provides no guarantees for real-time responsiveness.

In the current implementation all Erlang terms are tagged and at each garbage collection all roots are known, allowing the system to do precise garbage collection. That means that the collector knows the type of each term and does not need to be conservative [54].

In the context of strict, concurrent functional language implementations, there has been work that aims at achieving low garbage collection latency without paying the full price in performance that a guaranteed real-time garbage collector usually requires. Notable among them is the work of Doligez and Leroy [31] who combine a fast, asynchronous copying collector for the thread-specific young generations with a non-disruptive concurrent mark-and-sweep collector for the old generation (which is shared among all threads). The result is a quasi-real-time collector for Concurrent Caml Light. Also, Larose and Feeley [35] describe the design of a near-real-time compacting collector in the context of the Gambit-C Scheme compiler. This garbage collector was intended to be used in the Erlang to Scheme (Etos) system, but to the best of our knowledge, it has not yet made it to an Etos distribution.

To achieve low garbage collection pause times, concurrent or real-time multiprocessor collectors have also been proposed; both for (concurrent) variants of ML [47, 67, 24], and recently for Java [12, 46].
An issue which is to a large extent orthogonal to the choice of garbage collection technique is the memory organization of a concurrent system: Should one use an architecture which facilitates sharing, or one that requires copying of data? The issue often attracts heated debates both in
the programming language implementation community and elsewhere (see footnote 2). We will investigate this issue further in Chapter 5.

Footnote 2: For example, in the networking community an issue which is related to those discussed in this thesis is whether packets will be passed up and down the stack by reference or by copying [4]. Also, during the mid-80's the issue of whether files can be passed in shared memory was investigated by the operating systems community in the context of user-level kernel extensions, for example in the Mach Operating System [94].

Until the fall of 2001, the Ericsson Erlang implementation had exclusively a private heap architecture, that is, a memory architecture where each process allocates and manages its own memory area. We describe this architecture in Section 5.1. The main reason why this architecture was chosen is that it is believed it results in lower garbage collection latency. As we wanted to investigate the validity of this belief, we have designed and implemented a shared heap memory architecture for Erlang processes. We describe this architecture in Section 5.2; it is already included in the Erlang/OTP release.

Uses of Erlang

Erlang is currently used industrially both by Ericsson Telecom and by other companies for the development of high-availability servers and networking equipment. Some examples of products built using the Erlang/OTP system are: AXD/301, a scalable ATM switching system [17], ANx, an ADSL delivery system [68], a switching hardware control system, a next-generation call center, and a suite of scalable Internet servers. Since 1994, the annual Erlang User Conference is the principal forum for reporting work done in Erlang and provides a record of Erlang's evolving industrial use; additional information about Erlang applications can be obtained through the relevant web pages.

2.2 GOALS OF HiPE

The main goal of the HiPE project is to find generally applicable techniques for efficient implementations of concurrent programming languages. Another goal is to provide a transfer of technology from academia to industry by providing techniques, which are the result of academic research, in an industrial language implementation. As a means to that end we are developing the HiPE compiler and runtime system. To use this system to evaluate new ideas and techniques the system has to be complete, so that real world programs can be used in the evaluation. It also has to be efficient so that the aspects we want to evaluate are not shadowed by the rest of the implementation. Finally it has to be robust and bug free so that we know that we are measuring the correct behavior.

Even though our aim is to find techniques that are applicable to any programming language and useful for the development of any application, we are concentrating our efforts on the runtime performance of Erlang and the type of applications primarily developed in Erlang, namely large control systems. The typical Erlang application is very dynamic in nature and hence very hard to analyze statically with good precision. We believe that such applications could benefit from profile-driven just-in-time compilation; hence it is important that the compilation times are kept low. Another characteristic of control applications is that they often consist of a huge code base out of which a large chunk is the code for operation and maintenance, which is not time critical. For this reason we feel that it is important to have both a compact code format combined with efficient, but large, native code. We achieve this by allowing very small compilation units when compiling to native code; the user can for each function decide whether it should be emulated or in native code.

To reach these goals in reasonable time and to ensure that the outcome is industrially relevant, we have based our implementation on the Erlang system provided by Ericsson. This is an industrial strength system that has been under constant development by a team of engineers at Ericsson for more than 10 years.

Our top-level goals have resulted in three somewhat contradictory requirements on the HiPE compiler:

1) The system should be open and modular in order to let us plug-and-play parts of the system to evaluate different implementation techniques.
2) The compilation times should be kept low in order to allow for dynamic compilation.
3) The system should be complete in order to allow us to use real programs.

The first requirement has led to a layered solution with several intermediate codes in the compiler; these will be described in Chapter 3. The second requirement has led us to examine techniques such as linear scan register allocation described in Chapter 4. This requirement is also one of the motivations for allowing the compilation of single functions.
By reducing the scope of compilation the compilation times can be reduced. The third requirement has led to the long and thorough development of the system, as described in the next section.

2.3 A BRIEF HISTORY OF HiPE

In this section we will describe the history of HiPE, while briefly addressing some implementation details and the rationale behind some design
decisions we took. We divide the description into five parts, corresponding to the five major revisions of the HiPE system:

1. A first attempt, written in C, gave some insight on how to address the problem of efficiently implementing Erlang and showed that considerable speedup could be achieved using relatively simple methods.
2. A flexible and more easily extensible compiler design, mostly written in Erlang, made it possible to experiment with different optimization techniques and measure their impact on some real-world applications of Erlang.
3. An Open Source Erlang distribution from Ericsson made it possible for HiPE to be publicly released, get some users and input from the outside world.
4. A strong coupling of the HiPE compiler with the Erlang/OTP system resulted in HiPE becoming a standard component in Open Source Erlang.
5. Testing, cleanup, and "productification"; aiming at making HiPE a supported component in the commercial version of Erlang/OTP.

JERICO: The first prototype

The starting point of the HiPE system was a Master's thesis project in the summer of 1996 [49]. The goal was to develop an optimizing compiler, called JERICO, that would substantially improve the performance of Erlang programs. One approach that was briefly considered was to use the Java Virtual Machine (JVM [60]) as a back-end; this was at the time when Java was just starting to become a popular language. It was soon realized that the architecture of the JVM is not well-suited for a dynamically typed language such as Erlang. The JVM provides no support for tagged data items, so for example integers have to be wrapped, and it is awkward to get proper tail-recursion, which is a required feature of Erlang. In addition, compiling to JVM implies losing control over the efficiency of light-weight threads, a feature critical for the performance of typical Erlang applications; see also [42] which compares the performance of Erlang processes and Java threads. Consequently, the idea to compile to JVM was quickly abandoned and instead we decided to aim for direct compilation to native code. The chosen architecture was SPARC V8; according to Ericsson this was the most common general purpose platform for Erlang applications at the time.
We decided to implement our own back-end for several reasons. First and foremost we wanted a system with support for on-the-fly compilation; a system able to assemble, link, and load the compiled code directly into a running system, without having to rely on any external programs. Also, since we need to support backpatching (Section 2.3.4) to allow hot-code loading, we needed fine grained control over the object code format. A smaller but similar problem is the handling of atoms in the code. Since atom values are only known at load time the loader has to be able to instantiate these values. Having fine grained control would also allow us to do more advanced switching on atoms, as described in Chapter 3.

Even if we would have liked to use some standard tool for producing dynamically linked libraries the choices were not that many. Using C as a portable assembler would have been possible. We did not want to try it though, since it is very hard to get full and efficient support for tail-recursion in C. And to also make it work together with emulated code, hot-code loading, and garbage collection would be a nightmare. Another possibility would have been to use ML-RISC [37]. To support backpatching, precise garbage collection, and stack maps (Section 3.2.3) we would not have been able to use it out of the box without hacking the ML-RISC implementation. A back-end solution that nowadays looks promising is C-- [56]. The intention of C-- is to be a portable assembler to be used by compiler back-ends, and it is intended to supply everything needed to handle both garbage collection and concurrency [77]. Unfortunately C-- did not exist at the time when we started the project.

Since the goal was to develop a compiler that worked for the complete Erlang language and not just a toy compiler for a subset of Erlang, we decided to base our compiler on the stable and working Erlang runtime system made by Ericsson. At that time there were two Erlang systems concurrently being developed at Ericsson:

JAM: The older system with a stack-based abstract machine.
BEAM: A relatively new system based on a register abstract machine, influenced by the Warren Abstract Machine (WAM) [92] used in many Prolog implementations.
At that time, the BEAM system had an option to compile Erlang programs to native code via C [43]; this option was not very robust and was later removed. Both systems used the same runtime system and similar data representations [40]. The BEAM system was quite complex and not really stable. Also, at that time, BEAM had not proven itself substantially faster than JAM. The JAM system on the other hand was quite stable and significantly simpler. For example, there were far fewer than 256
different byte-code instructions in the JAM, while the BEAM had over 400 different instructions. We decided that this would be a good starting point for our compiler: we could translate the generated JAM byte-code to an internal intermediate representation and then optimize it before generating native code.

Compiler

In the Ericsson implementations of Erlang, the smallest unit of compilation is a module, but we decided early on that the user or the system should be able to choose to selectively compile a single, presumably time-critical, function at a time to native code. This way, the compact representation of emulated byte-code can be combined with the efficiency of (usually larger) native code. This feature is potentially very important for large telecom applications, where typically only a small portion of the code is time-critical while the remaining code deals with error correction and maintenance.

The translation from JAM code to the compiler's intermediate three-address code was done in a straightforward way and left some opportunities for optimization. For example, since JAM was a stack machine there would be a push each time a variable was referenced. This push would be translated to a register copy which would often be unnecessary. To improve code quality, the JERICO compiler performed constant propagation, constant folding, unreachable code elimination, and dead code removal [5, 66]. A simple delay slot filler which only looked in the basic block preceding the branch for suitable instructions was also implemented. Register allocation was based on a simple graph coloring algorithm.

Calling conventions and stack frames

The JAM instruction set is simple and the instructions contain no information about the current frame size. Instead several JAM-machine registers were used to keep track of the location of local variables, arguments, and the stack top. All these pointers had to be saved on the stack at function calls. The native code on the other hand passed the first five arguments in real machine registers. Apart from local variables, only the return address was saved on the stack. The format of a native stack frame is shown in Figure 2.1. The JERICO runtime system used the same memory area for the native and the JAM stacks, stacking native frames and JAM frames on top of each other.
Small dummy frames were placed between frames of different types to indicate a transition between emulated and native code. Most bugs we encountered originated from the emulated/native-code integration and the hairy stack we ended up with. This scheme was later abandoned.

Figure 2.1: A native stack frame in JERICO. (Labels recovered from the figure: Argument 6, ..., Argument N, return address, CallerSave 1, ..., CallerSave M, and the stack pointer SP.)

Backpatching

To facilitate hot-code loading and recompilation in an interactive system we implemented a scheme where call sites were patched when the target of the call was updated. For each function, we kept a list of all call sites in the function and their destinations, and another list of all the callers to the function. In Figure 2.2 these lists are shown for three functions f, g, and h together with the code for the functions. When a function, e.g., g, is recompiled and loaded, the list of callers for g is used to find and update (backpatch) all calls to the function (in Figure 2.2b the call from f to g is updated). Since the set of call destinations of the new function might be different from the old function, the calls list is used to find functions where g is in the callers list and remove these references (e.g., callers for h in Figure 2.2b). Then all call-sites in the new version of g are inserted in the calls list (and the corresponding information added to the destinations' caller lists). The resulting data structures are shown in Figure 2.2c. This scheme has worked well and has been used, with minor variations, in all versions of the HiPE compiler.

Performance of the JERICO compiler

The JERICO compiler performed quite well on small benchmark programs. The produced code was frequently attaining a factor of 10 speedup over JAM code [49] and was slightly faster than the BEAM system even when the BEAM compiler was generating native code using
gcc. On the other hand, JERICO had problems with scaling up to compile large systems (e.g., tens of thousands of lines of code) and it was difficult to develop and debug new optimizations rapidly in it. Another point confirmed by our measurements was that, even in concurrent applications, most time is spent running sequential code ("between process communications"). The same measurements also indicated that process communication would start being a bottleneck for those applications only when the system became 2-3 times faster. We therefore decided to focus on sequential optimizations.

Figure 2.2: Backpatching when code for the function g in the call-chain f -> g -> h is reloaded. The new code calls q (not shown) instead of h.

a) Original code and supporting data structures for functions f, g, and h.
   f: calls: [{g,o}]    callers: []
   g: calls: [{h,o}]    callers: [f]
   h: calls: []         callers: [g]
   Code: f(X) -> g(X).  g() -> h().  h() -> 42.

b) New code for g is loaded, the call in f is patched.
   f: calls: [{g,o}]    callers: []
   g: calls: [{h,o}]    callers: [f]
   h: calls: []         callers: [g]
   Code: f(X) -> g(X).  g() -> h().  h() -> 42.  New: g() -> m:q().

c) After the load all data structures are updated.
   f: calls: [{g,o}]    callers: []
   g: calls: [{m:q,o}]  callers: [f]
   h: calls: []         callers: []
   Code: f(X) -> g(X).  h() -> 42.  g() -> m:q().

The HiPE system before Open Source Erlang

With the JERICO compiler's performance evaluated on small programs, the natural next step was to make the system more robust, add more code optimizations to the compiler, and evaluate its performance on industrial applications instead of small benchmarks. It was soon realized that, especially in the context of an academic project with limited manpower, the optimizations we wanted to add would be much easier to develop in Erlang than in C. To get a more flexible system that would allow us to easily add new optimizations we decided to rewrite the compiler from scratch in Erlang. At the same time, the project (and the system) was named HiPE.

HiPE: Implementation issues

Since we were starting from scratch, we acquired the then-latest Erlang system from Ericsson, version
(This was still before Erlang became open source.) That Erlang system could be configured either as a JAM- or BEAM-based system at installation time. The BEAM implementation had matured at this point, with a better compiler than JAM (see footnote 3), but it was a moving target and had still not proven itself much faster than JAM. Furthermore, we had shown that our straightforward compilation to native code was faster than BEAM. Since basing our compiler on JAM byte-codes was easier (fewer instructions to handle), we chose to stay with the JAM.

Runtime System

To minimize the number of bugs in this first version of the HiPE system we decided to put some effort into making the interface between emulated and native code as simple and clean as possible. Our performance measurements had shown that reducing the stack frame sizes gave a significant performance improvement, so the cost of maintaining the different types of frames was justified, but on the other hand dealing with them in the same memory area was error-prone. To avoid these problems, we decided to separate the native code stack from the emulated code stack (see footnote 4). This way we could have different stack frame formats for emulated and native code, but on either stack we would only have to deal with one type of stack frame. With separate stacks, special effort had to be put into maintaining tailcall optimization as required by Erlang.

In JERICO, the garbage collector had to keep a state variable indicating the mode (emulated or native) of the stack frame being scanned, and for each frame it had to check if the frame marked a switch to the other mode. With separate stacks, the garbage collector can scan each stack quickly and easily, knowing it will find only one type of frame on each stack. One problem with this scheme was that the exception handling mechanism was implemented by a set of catch frames linked together on the stack; this meant that there were links (pointers) between the two stacks. This caused a slight complication when a stack needed to be relocated (in order to expand or shrink it), since all catch frame links in the other stack must be updated.

Compiler

Rewriting the HiPE compiler in Erlang was not the only step in making it extensible.
Instead of the single intermediate format that the JERICO compiler used, several intermediate representation levels were introduced; these are described in detail in Chapter 3.

Footnote 3: Although technically compilers, they still generate virtual machine code for each system's emulator.
Footnote 4: Erlang code does not run on the runtime system's C stack, except when calling primitives implemented in C.
In addition to the optimizations mentioned in Section 2.3.1, the HiPE compiler optimized constant data structures to data references and implemented spilling. The use of Erlang and several intermediate formats allowed us to experiment relatively easily and incorporate many compiler optimizations. On the other hand, while implementing some optimization algorithms in Erlang, we experienced performance problems in the HiPE compiler itself. The compiler frequently updated its internal data structures; however, since it was written in Erlang, these updates were implemented by creating new versions of the data structures. This spurred us to implement fast declarative arrays and hash tables, which used destructive updates internally, and gave considerable performance benefits. But as we shall see, these imperative structures would later lead to some problems.

Benchmarking and performance

By 1998, HiPE was slowly becoming a stable system and it was time to measure its performance and the effect of various compiler optimizations on large industrial Erlang applications. A quick survey of current Erlang projects led us to the Ericsson AXD/301 [17] project which was implementing a scalable ATM switch. This was a large project with several hundred people involved, and their software base consisted of large amounts of Erlang code (i.e., several hundreds of thousands of lines). Furthermore, the developers of AXD/301 were benchmarking conscious, mainly due to the competitiveness of the ATM switch market, and willing to provide real data to use as benchmarks for HiPE's performance evaluation and spend some time explaining how to run these programs.

One of the main problems with benchmarking industrial Erlang applications is that they are often connected to a specific hardware and software platform, which is frequently proprietary; most probably the situation is similar for much embedded control software. In practice, this means that benchmarking has to be conducted on-site, with all the problems that this entails. The AXD/301 benchmark, SCCT, could however be run on a stand-alone workstation. SCCT was a mere 50,000 lines of code, a fifth of the AXD/301 software at that time, but did contain the time-critical portion of AXD/301. Our performance measurements are summarized in [50, 51].
Basically, we found that the AXD/301 system runs in a huge inner loop, spanning hundreds of functions, none of which stands out in the profile. Because of this, rewriting SCCT in some lower-level language (e.g., C) is not an attractive option for improving AXD's performance. On the other hand, due to the size of the program, we found that in native code
there is little reuse in the I-cache, and that the system frequently stalls waiting for instructions. Our second finding was that SCCT spends much time inside built-in functions manipulating byte arrays and the internal database. Finally, considerable time is spent in the OS kernel. For these reasons, the performance speedup from compiling SCCT to native rather than to emulated code, although noticeable, is considerably lower than that obtained for small benchmark programs, only about 30% lower execution time. Still this speedup probably justifies the increased code size and compilation time.

Open source HiPE

In December 1998, Ericsson released their current Erlang/OTP system as Open Source Erlang (OSE), which opened the possibility to also distribute the HiPE system as Open Source. Unfortunately, contacts between the different Erlang development groups (HiPE's and Ericsson's) had been infrequent and often indirect, which meant that HiPE and Ericsson's Erlang system had evolved independently for about two years (since Erlang 4.5.3). In short, HiPE was based on an old and obsolete system and had to be ported to OSE before we could release it.

The task of porting HiPE to OSE turned out to be significantly harder than we anticipated (or hoped). For example, Ericsson's Erlang system had switched to a different tagging scheme, using the low bits of the word rather than the high ones. The syntactic changes to the Erlang source code since then were massive; our only option was to settle for a mostly manual, and thus extremely slow and painful, diff & merge process.

When porting HiPE to the second OSE release of the JAM system, we were confronted with more surprises! The Ericsson system now featured a generational garbage collector [54] which, besides needing modifications for the native code stack, was incompatible with our compiler's use of imperative data structures. Generational collectors place objects in separate memory areas depending on their age, and they concentrate their efforts on reclaiming memory among the younger objects since they tend to be short-lived. Updates can cause old objects to contain references to young objects.
These references need special treatment in most generational collectors: additional data structures are needed to record them, and code must be generated to maintain the data structures [54, Chapter 7.5]. Our problem was that this support did not exist in the runtime system, since the base Erlang system did not need it. At the time we did not have time to implement this support ourselves, so we reverted
to using purely functional implementations of the compiler's data structures, which slowed down the compilation times considerably.

The HiPE compiler optimizes constant data structures to references to statically-allocated literals. This too was incompatible with the new generational garbage collector, and we had to change it to explicitly not move objects residing in the constant data area. This would have been easy but for the fact that the youngest generation of an Erlang process is scattered over several distinct memory areas, while its older generation is a single memory area. (It is usually the other way around.) This means that the collector cannot easily test if a pointer refers to the young generation. Instead, it tests if the pointer refers to the older generation, and if not, it assumes that it must point to the young generation.

On the positive side, the porting effort gave us the opportunity to review and revise some design decisions that in retrospect were not entirely satisfactory. In particular, we re-implemented the mode-switch interface to use new JAM instructions instead of explicit tests, and tidied up the mode-switch stack frame management. In addition, we rewrote HiPE's code server and dynamic linker in C.

In March 2000, HiPE version 0.92 was finally released as Open Source, based on OSE. The released system consisted of about 30,000 lines of Erlang code and 3,000 lines of C and assembly code, added to an otherwise mostly unchanged JAM system.

HiPE 1.0/OTP-R8

Unfortunately, just as we had released HiPE 0.92 as open source, Ericsson decided to abandon the development of the JAM in favor of the now much faster BEAM. This meant that we had to also start supporting BEAM code in HiPE. Fortunately, since we used our simple functional intermediate code Icode in the front end of the compiler, the transition to BEAM was mostly painless. (Still, the process provided some challenges since there was no real documentation of BEAM code and the BEAM compiler used some strange tricks.) To ensure that we would not suffer from more unpleasant surprises we started a much tighter integration with OTP, including frequent code exchanges between the two development teams.
With this new integration we could start to give something back to the OTP system, not just in the form of ideas but now also concrete code. The first major contribution from our side was a completely reworked tagging scheme [71] which gave Erlang/OTP access to the full 32 bit address space.
During the following year we developed an x86 back-end (aka the IA-32 back-end), and made some major redesigns of the intermediate representations. We also got a chance to do thorough evaluations of different implementation techniques and experimented with register allocation and heap architectures. The details of these efforts and experiments are reported in the rest of the thesis. Our efforts also led to an integrated release of HiPE 1.0 in the Open Source version of Erlang/OTP-R8.

HiPE 2.0/OTP-R9

As soon as the R8 system was released Ericsson started to work on R9, and we started to improve upon HiPE. Since the last release we have for example implemented stack descriptors (see Section 3.2.3) and increased the number of arguments passed in registers (from 5 to 16 on the SPARC, and from 0 to 3 on the x86). We are also working on native floating point support [59] and faster inlined handling of the new binary syntax [41]. Our ambition is to release HiPE 2.0 with the release of OTP-R9. (At the time of writing HiPE is scheduled for release with R9, but this release is a couple of weeks into the future. At the time of printing this release might already have taken place.)

2.4 SOME SPECIAL HiPE FEATURES

Instrumentation

To help users decide which functions to compile to native code, the HiPE system provides some simple profiling tools. One of these measures the number of times an emulated function is called. Since recursion is the only way to implement a loop in Erlang, this means that simply counting the number of calls to functions will quickly identify program hot-spots (i.e., program points where most of the execution time is spent) which can then be compiled to native code. Based on this feature, it is easy to build a (just-in-time) hot-spot compiler for Erlang. Indeed, already in JERICO one could set a trigger level on the number of calls to emulated functions before these would automatically be compiled to native code. Since the compiler was implemented in C and integrated in the runtime environment, this meant that execution of Erlang code would be temporarily interrupted while compilation takes place.
We did no thorough measurements on the performance impact of this feature, but we often enabled it while using the interactive Erlang shell, and did not notice any degradation in system responsiveness. In the current HiPE system, where the compiler is implemented in Erlang, this compilation can take place concurrently with the execution of the application in a separate Erlang process.
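The call-count trigger described above can be sketched in a few lines of Erlang. This is only an illustration under stated assumptions: the module hotspot_demo, its functions, and the use of a map from {Module, Function, Arity} to a call count are all hypothetical, and the actual counters in JERICO and HiPE live inside the runtime system rather than in Erlang code.

```erlang
-module(hotspot_demo).   %% Hypothetical module, for illustration only.
-export([new/0, call/3]).

%% The state maps {Module, Function, Arity} to the number of
%% emulated calls observed so far.
new() -> #{}.

%% Records one call to MFA. Returns {compile, Counts1} exactly when the
%% count reaches Threshold (the trigger level), and {ok, Counts1} otherwise.
call(MFA, Threshold, Counts) ->
    N = maps:get(MFA, Counts, 0) + 1,
    Counts1 = maps:put(MFA, N, Counts),
    case N =:= Threshold of
        true  -> {compile, Counts1};
        false -> {ok, Counts1}
    end.
```

In a system built along these lines, the compile result would be the point at which the function is handed to the native code compiler, possibly in a separate Erlang process so that the application keeps running.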
In many versions of HiPE the runtime system was enhanced with performance instrumentation features that could be selectively included or excluded at the system's installation. These instrumentation features came in two forms:

1. Software counters: These counters kept track of how often various operations of interest were performed. For example, the number of times each Erlang function was called, either locally, remotely, or through a meta-call (apply). They could also count calls to built-in functions, how many times each JAM instruction was executed, and how many times control passed between emulated and native code.
2. Performance instrumentation counters (PICs): These were based on the performance instrumentation facilities of the Sun UltraSPARC [86]. PICs were made accessible to the user through a built-in function, and they were typically used to measure how much time was spent in a specific region of code, and for accessing hardware-specific information, for example the amount of time lost due to stalls and cache misses. The reason for a stall could also be determined: data cache miss, instruction cache miss, external cache miss, or branch misprediction.

HiPE used PICs to measure time spent in garbage collection, in each built-in function, in native code, and in each time-slice. The instrumentation measured both elapsed cycles and issued instructions, making it possible to determine the CPI (cycles per instruction) ratio.

For more details on the instrumentation and an analysis of some benchmarking that used this instrumentation, the reader is referred to the Licentiate thesis of the author [48], and an even more detailed technical report [50].

Some of these features were removed when we added the x86 backend. Instead the SPARC low-level performance counters were made accessible as built-in functions in the basic runtime system provided by Ericsson. The ability to count the number of calls to a function is also being moved into the basic Erlang/OTP system in R9.
Part II Implementation
Chapter 3 The compiler infrastructure

Brevis esse laboro, obscurus fio.

In this chapter we will mostly present background information necessary for the understanding of the rest of the thesis. We will also present a few noteworthy details of the HiPE compiler which are not present in most other compilers. This chapter is divided into two parts, Phases in the compiler (Section 3.1), and Interface Issues (Section 3.2). In the first section we will present the intermediate representations (Icode, RTL, x86, and SPARC) and the translations done on each intermediate representation. Note that we have chosen to divide the intermediate representation into several layers with distinct interfaces between them, making the implementation modular.

In the second section of this chapter we will present how we have solved the integration of native code with the existing runtime system. Note that the HiPE system supports user controlled mixing of emulated and native code execution at the granularity of individual functions. We know of no other system that does this and at the same time maintains tailcall optimization. How this is implemented is described in Section 3.2, where we also present some background information on how we handle built-in functions, process switching, and code loading. The chapter ends with a section describing how we implement efficient switching on atoms whose values are unknown at compile time.

3.1 PHASES IN THE COMPILER

The HiPE system consists of a compiler from BEAM virtual machine code to native machine code (SPARC or x86). The overall structure of the HiPE system is shown in Figure 3.1. Even though the compiler has been kept modular by having several separate intermediate representations, they all use the same control-flow graph structure. This has made
it possible to use the same implementation of liveness calculation, extended basic blocks, and removal of unreachable code for all intermediate representations.

[Figure 3.1: Structure of a HiPE-enabled Erlang/OTP system. The Erlang run-time system contains the BEAM emulator, memory (BEAM bytecode, native code, other data) and the HiPE loader; the HiPE compiler contains the BEAM disassembler and the symbolic BEAM, Icode, RTL, SPARC and x86 stages.]

To BEAM code

We use the Erlang compiler of the underlying Erlang/OTP system to compile Erlang source code to BEAM code. The HiPE compilation starts from BEAM files; at this point the Erlang source code is not necessary for the compilation. This makes it possible to compile third-party Erlang modules to native code even if they are only distributed as precompiled BEAM files.

The BEAM code intermediate representation (IR) is simply a symbolic representation of BEAM virtual machine code, and is produced by disassembling the functions or module being compiled. BEAM operates on a largely implicit heap and call-stack, a set of global registers, and a set of slots in the current stack frame. There are different classes of registers: X registers for temporaries, Y registers for local variables (aliases for stack slots), and F registers for temporary floating-point values. X and Y registers always contain fully tagged Erlang values. BEAM is semi-functional: composite values are immutable, but registers and stack slots can be assigned freely.

BEAM to Icode

Icode is an idealized Erlang assembly language. The stack is implicit, any number of temporaries may be used, and temporaries survive function calls. Icode is a very simplistic language with only 12 different instructions to make it an easy target to translate to. Most computations
are expressed as function calls. All bookkeeping operations, including memory management and process scheduling, are implicit.

BEAM is translated to Icode mostly one instruction at a time. However, function calls and the creation of tuples are sequences of instructions in BEAM but single instructions in Icode, requiring the translator to recognize those sequences. Temporaries are also renamed through SSA-conversion [6, 30], to avoid false dependencies between different live ranges.

The Icode form is then improved by application of constant propagation, constant folding, and dead-code elimination [66]. In a last stage explicit heap overflow tests are added where needed and then merged by a backward propagation pass.

Icode helps to insulate the HiPE compiler from changes in its front-end. For example, as mentioned HiPE was previously based on JAM, a completely different virtual machine, and in the future it might be worthwhile to compile directly from Erlang to Icode.

Icode to RTL

RTL is a generic three-address register transfer language. RTL itself is target-independent, but the code is target-specific, due to references to target-specific registers and primitive procedures. RTL has tagged registers for proper Erlang values, and untagged registers for arbitrary machine values. To simplify the garbage collector interface, function calls only preserve live tagged registers. The runtime system cannot handle derived pointers: untagged values that point into heap-allocated Erlang terms. If the garbage collector relocates an Erlang term, it would also have to locate and update all its derived pointers.

In the translation from Icode to RTL, many operations (e.g., arithmetic, data construction, or tests) are inlined. Data tagging operations [40, 71] are made explicit, data accesses and initializations are turned into loads and stores, etc. Icode-level switch instructions for switching on basic values are translated into code that implements the switches (see Section 3.2.9). Optimizations applied to RTL include common subexpression elimination, constant propagation and folding.
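As a rough illustration of the constant propagation and folding mentioned above, the following Python sketch works on one straight-line block of three-address instructions. The tuple encoding, the operator names and the function name `propagate_and_fold` are assumptions made for this example; they are not HiPE's RTL data structures, and the sketch ignores tagging and control flow.

```python
import operator

# Hypothetical three-address form: (dst, op, src1, src2); "mov" ignores src2.
# Operands are temporary names (str) or integer constants (int).
FOLDABLE = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def propagate_and_fold(block):
    """Constant propagation and folding over one basic block (no branches)."""
    known = {}  # temporary -> constant value known at this point
    out = []
    for dst, op, a, b in block:
        # Propagation: replace temporaries whose value is a known constant.
        a = known.get(a, a) if isinstance(a, str) else a
        b = known.get(b, b) if isinstance(b, str) else b
        if op == "mov" and isinstance(a, int):
            known[dst] = a
            out.append((dst, "mov", a, None))
        elif op in FOLDABLE and isinstance(a, int) and isinstance(b, int):
            # Folding: both operands are constants, so compute the result now.
            value = FOLDABLE[op](a, b)
            known[dst] = value
            out.append((dst, "mov", value, None))
        else:
            known.pop(dst, None)  # dst now holds a value unknown at compile time
            out.append((dst, op, a, b))
    return out
```

For instance, `t1 = 4; t2 = t1 + 6` becomes `t1 = 4; t2 = 10`, while an instruction with an operand that is not a known constant is kept and only has its constant operands substituted.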
The RTL code is translated either to symbolic SPARC code or symbolic x86 code, depending on the target of the compilation.

Symbolic SPARC

The symbolic SPARC assembly language is similar to RTL, but uses SPARC-specific operators with some pseudo-operations (e.g., loading an atom or function address) whose operands are resolved by the linker.
Symbolic IA-32

The x86 intermediate representation is a simple abstract x86 assembly language. It differs from RTL in two major ways:

• Arithmetic operations are in two-address form, with the destination operand also being a source operand (i.e., x += y instead of x = x + y).
• Memory operands described by simple addressing modes (base register plus offset) are permitted in most operand positions.

Register allocation

Register allocation is applied to try to map temporaries to actual machine registers. Spilled temporaries remain unallocated, and are mapped to stack slots in the subsequent frame management pass.

Before register allocation, every temporary is semantically equivalent to a register, and any operand may be a temporary. After register allocation, any unallocated temporary is an implicit memory operand, but the instruction set architecture (ISA) of the machine places some restrictions on the use of memory operands. A fix-up pass is run after register allocation to ensure that the requirements of the ISA are met. The HiPE system has several different register allocators. These are described in detail in Chapter 4.

Frame management

Stack frames are introduced to the code after register allocation, when the set of spilled temporaries is known. The frame management pass performs the following tasks:

• A map is built which maps each spilled temporary and stacked parameter to a stack slot. The mapping is given as an offset relative to a virtual frame pointer having the initial value of the stack pointer on entry to the function.
• The frame size and maximal stack usage for calls are computed, and code is added to the function prologue to check for stack overflow and set up the stack frame.
• All instructions are processed while recording the current stack depth (offset from the virtual frame pointer), and references to spilled temporaries are rewritten: on the x86 as memory operands using offsets from the stack pointer, and on the SPARC as loads or stores.
• A stack descriptor is created for each call site, describing which stack slots correspond to live temporaries (using the results of a liveness analysis) and whether the call is in the context of a local exception handler.
• At each tailcall, code is generated to shuffle the actual parameters to the initial portion of the stack frame.

Linearizing the code

During most parts of the compiler, the code is represented in a control-flow graph form. Before assembly, the CFG must be linearized by ordering the basic blocks and redirecting jump instructions accordingly. In modern architectures, it is important that this linearization takes into account the likelihood of a conditional jump being taken or not, and the static branch prediction algorithm used in hardware (forward conditional: not taken, backward conditional: taken). If this is not done, performance is likely to suffer due to mispredicted branches. How this is done is described below.

First, the HiPE compiler always annotates conditional jumps with probabilities for taken/not-taken. These are usually accurate, since many conditionals are type safety checks and stack or heap overflow checks inserted by the compiler itself. Second, the CFG module is careful to represent all conditional jumps as unlikely. Third, the linearization attempts to order blocks following the likely path. The net effect is that in most functions, the likely path is a straight sequence of instructions, with some forward conditional jumps to blocks implementing less likely special cases (such as non-fixnum arithmetic or calling the garbage collector on heap overflow).

Assembling the code

The assembler converts the final symbolic code to machine code, which is either loaded into the runtime system or saved in an object file.

3.2 INTERFACE ISSUES

By basing the HiPE system on the Erlang/OTP system we have been able to get an industrial strength research compiler which can be used to benchmark real world applications. Such a compiler requires that all features of Erlang are implemented and special care has to be taken to integrate HiPE into the Erlang/OTP system. In this section we describe some such features of Erlang and the Erlang/OTP system and how we have implemented and integrated them.
[Figure 3.2: Recursive calls (f calls g) and tailcalls (g tailcalls h). Panels (a) to (e) show the stack before the call to g, after pushing the arguments and calling g, after the stack shuffle, after dropping g's frame and jumping to h, and after h drops its frame and executes ret $n.]

Tailcalls

Erlang, in contrast to many other (non-functional) programming languages, requires proper tailcalls. Unfortunately there is little support in hardware for function calls that reuse the current stack-frame. This forces insertion of special code that shuffles the stack contents at tailcalls.

To illustrate how calls and tailcalls are implemented by HiPE, assume that f calls g, g tailcalls h, and h finally returns to f. Figure 3.2 shows the stack layout changes in this process. State (a) shows the stack before f calls g. The function f evaluates the parameters to g and pushes them on the stack. On the x86 a call instruction is executed. This instruction pushes a return address and jumps to g, which allocates its frame (the dashed portion). On the SPARC, the call does not push the return address; it is instead saved in a register. Then the header of g saves the return address on the top of the stack as g's frame is set up. So in both cases we get to state (b), before the real code of g starts executing. Then g evaluates the parameters to h, and shuffles the stack to overwrite the argument area and possibly parts of its frame with the new parameters, leading to state (c). Then g completes its tailcall to h by dropping its frame and jumping to h, leading to state (d). In state (d), the stack is exactly as if f had called h directly. Eventually h returns to f. On the x86 a ret $n instruction is executed, popping the return address and n stacked parameters and returning to f. On the SPARC the return address is explicitly loaded into the return address register and the stack pointer is adjusted, in both cases leading to state (e). Register parameters are omitted from this description.

Figure 3.2 also illustrates why it is the callee who must deallocate the stacked parameters. In the presence of tailcalls, the caller (f in this example) does not know which function finally returns to it, and it does
not know how many parameters there are on the stack upon return. Therefore, the caller cannot deallocate the stacked parameters, but the returning function can since it knows how many parameters it takes. We point this out because this is the opposite of the calling convention normally used by C and Unix.

A disadvantage of this calling convention is that the stack shuffle step during tailcalls must also consider the return address as a stacked parameter that will have to be moved to a different stack slot if the caller and callee (g and h in the example) have different numbers of stacked parameters. Fortunately the arities of both the caller and the callee are known and the shuffling is only needed when both the caller and callee have arguments on the stack. On the SPARC, which passes the first 16 arguments in registers, this situation is very rare, but it does occur on the x86, where at most 5 arguments can be passed in registers.

Exception handling

In Erlang, an exception thrown in one function, f, can be caught by an exception handler in a function g, calling f (see Program 3.1). This non-local return from f might be the only way to reach the code for the exception handler in g. In HiPE each call has an extra successor label if it is within the scope of a local exception handler (see Figure 3.3). This simplifies the handling of function-level CFGs, but on the other hand requires that calls in the context of local exception handlers end basic blocks.

Program 3.1 Any exception thrown by g is caught by f.

    f(X) ->
        %% The catch will set up an exception handler
        %% which in this simplest form will just
        %% turn an exception into an Erlang term.
        catch g(X).

    g(X) ->
        %% This operation will throw an exception
        %% if X is not a number.
        X + 1.

    %% Example execution:
    %% > f(1).
    %% 2
    %% > f(foo).
    %% {'EXIT', {badarith, []}}

Exception handlers in HiPE do not need any other special
representation. A local exception handler is just represented by the basic blocks implementing it, and by an exception edge in the CFG from calls that might fail to it. An inlined BIF with an exception handler just jumps to the basic block of the handler instead of throwing an exception in case of failure. The last stage of the compiler inserts some code in edges between calls and exception handlers in order to move exception values to the right local temporary.

    f/1(v0)
      1: redtest()
         v1 := g(v0) -> [5,3]      (the edge to block 3 is the fail edge)
      3: v1 := restore_catch(3)
         goto 5
      5: return(v1)

    g/1(v0)
      1: redtest()
         v2 := 1
         v1 := '+'(v2, v0)
         return(v1)

Figure 3.3: The Icode CFGs for the functions g and f from Program 3.1. Note that the call to g from f is protected by an exception handler (basic block 3).

The loader recognizes calls with exception handlers and registers their address together with the address of the exception handler in a stack descriptor (see Section 3.2.3). This way there is no runtime cost for setting up an exception handler. When an exception is thrown the stack map is used while traversing the call stack and if a return address has an exception handler the control is transferred to the handler.

Stack descriptors

The stack frame of a function is composed of two parts: a fixed-size part at the top for caller-save registers and spilled temporaries, and a variable-size part at the bottom for pushing the outgoing parameters in recursive calls (see Figure 3.4). On entry, the function first checks that enough stack space is available for the largest possible frame, calling a runtime system primitive if this is not the case, then the fixed-size part is set up.

The main benefit of fixed-size frames is their low maintenance cost. On the other hand, they may contain dead or uninitialized stack
slots, which can increase their sizes and complicate garbage collection and exception handling.

[Figure 3.4: The call stack for the function g/m in the call chain f/l, g/m, h/n. In the direction of stack growth the figure shows f's frame, then g's frame with its fixed part (return address and caller-save slots) and variable part (outgoing arguments), then h's frame.]

HiPE uses stack descriptors (also known as stack maps) [18, 54] to support exception handling and precise garbage collection. For each call site, the compiler constructs a stack descriptor describing that call site to the runtime system:

• the call site's return address (lookup key)
• the caller's frame size (excluding this call site's actual parameters)
• the caller's arity
• the caller's local exception handler, if present
• the live slots in the caller's frame

This data enables the runtime system to inspect stack frames and traverse call stacks.

A stack descriptor is represented by a three-word header containing the return address, hash link, frame size, and arity, followed by a bitmask of the live stack slots. The exception handler is described by a single bit in the header; if set, the handler address is stored in the word immediately before the header. In Figure 3.5 the SPARC stack descriptor for a call from function g/17 to function h/0 is shown. In this example there is one live value and one dead value on the stack at the call to h. Note that on the SPARC the 16 first arguments are passed
in registers. This gives a stack arity of 1, that is, one argument is passed on the stack. The size of the frame, excluding the return address (which is always present), is 2.

[Figure 3.5: A stack descriptor for a call (on SPARC) to h/0 from g/17. Note that 16 of the arguments to g/17 are passed in registers and that 1 local variable is live during the call to h/0. The descriptor, found through hash(RA), holds size: 2, arity: 1, hash_link, RA, catch: false, and livemap: LIVE, DEAD.]

On the x86 the memory overhead for the stack descriptors is currently about 35% of the code size. However, without stack descriptors additional code would have to be generated to remove or nullify dead stack slots. There are techniques for compacting stack descriptor tables even further, e.g., [87]. However, they also tend to increase stack descriptor lookup costs, and we have found those costs to have a measurable performance impact.

Garbage collection and generational stack scanning

HiPE uses a safe points garbage collection strategy: the compiler emits code to check for heap overflow and to call the collector in that case. These are normal recursive calls with associated stack descriptors. The garbage collector uses the stack descriptors to traverse the call stack and identify slots containing live Erlang values.

Repeated stack scanning can cause high runtime overheads in deeply recursive programs; to tackle this problem HiPE implements generational stack scanning [25].

Mode switching

In HiPE, a mode-switch occurs whenever there is a transfer of control from native to emulated code, or vice-versa. We made the design decision that the mere presence of multiple execution modes should not impose any runtime overheads, as long as no mode-switches occur. This design requirement calls for great care when implementing mode-switches, not only for performance, but also for correctness.

The first question which must be answered is: where do mode-switches occur? Since HiPE compiles individual functions to native
code, a mode-switch occurs whenever there is a flow of control from one function to another, and the two functions are in different modes. Thus, mode-switches occur at call and return sites. Erlang's exception mechanism also introduces mode-switches, viz., when an exception is thrown from code executing in one mode, and the most recent handler is in a different mode. We will refer to these cases as call, return, and throw events, respectively.

The second question which must be answered is: how does the system discover that a particular instance of a call, return, or throw event must perform a mode-switch? The answer depends on the type of event:

Call events. HiPE uses a pseudo-static approach in which calls always use the mode of the caller. As described in Section 3.2.8, if a native-code caller refers to an emulated-mode callee, then the linker redirects the call instruction to instead invoke a native-code stub, which in turn causes a switch to emulated mode. If an emulated function is compiled to native code, then the start of the original BEAM code is overwritten with a special emulator instruction which causes a switch to native mode. (The asymmetry between these cases is due to the fact that the HiPE linker only has knowledge about call sites in native code.)

Return events. Whenever a recursive function call causes a mode-switch, the return sequence must be augmented to perform the inverse mode-switch. HiPE uses a same-mode convention for returns. When a call causes a mode switch, a new continuation (stack frame) is created in the mode of the callee. The return address in this continuation points to code which causes a switch back to the caller's mode. For returns from native to emulated code, the return address points to machine code in the runtime system. For returns from emulated to native code, the return address points to a special emulator instruction. This causes no overhead except during mode-switches, and it minimized the amount of changes needed in the existing emulators.

Throw events. HiPE deals with exception throws in the same way as it deals with function returns: a same-mode convention augmented with mode-switching stack frames. When a call causes a mode-switch, a new exception catch frame is created in the mode of the callee.
In this catch frame the handler address points to code which causes a switch back to the caller's mode, and then re-throws the exception. Thus, when a call causes a mode-switch, two frames are pushed: first a catch
frame, then a return frame. The code at the return address in the return frame knows that it also has to remove the catch frame beneath it before switching mode.

In addition to the call, return, and throw events described above, HiPE may also need to perform mode-switches when a process is suspended and the next process to run executes in another mode.

[Figure 3.6: Mode-switch frames created in the call chain f, g, h. The emulator stack (SP) holds f's frame (1), the return-to-f mode-switch frame (2), the mode-switch return frame (4) and h's frame (5); the native stack (NSP) holds g's frame (3), with RA = mode-switch.]

Figure 3.6 illustrates the use of mode-switch frames and return addresses. There are three functions: f and h are in emulated code, g is in native code. First f calls g, via the trap-to-native instruction planted in g's original emulated code by the linker. At the call, f pushes an emulated-mode return frame (2) on top of its own frame (1). The mode-switch transfers control to g, and sets the native-code return address register to point to the native-to-emulated mode-switch routine. Then g calls h, via h's trap-to-emulated native-code stub. At the call, g saves its live registers, including its return address, in frame (3). The mode-switch pushes a mode-switch return frame (4) on the estack and invokes h, which then creates a frame for its local variables (5).

Maintaining tail-recursion

The calling convention with special stack frames for mode-switching is efficient and easy to implement. For many programming languages, this would be enough. However, Erlang, like most other functional programming languages, relies on tail-recursive function calls for expressing iteration. Consider the following sequence of tailcalls, where each $f_i^e$ is an emulated function, and each $f_j^n$ is a native code function:

$f_1^e \xrightarrow{tail} f_2^n \xrightarrow{tail} f_3^e \xrightarrow{tail} f_4^n \xrightarrow{tail} \cdots$

A correct implementation is expected to execute such a sequence in constant stack space, regardless of its length.
Unfortunately, at each call, a new mode-switch stack frame is pushed, to make the return perform the inverse mode-switch. Thus, stack space usage will grow linearly with the length of the sequence of tailcalls, and tail-recursion optimization is lost.

HiPE solves this problem as follows. The return address in a mode-switch stack frame will always have a known value: either the address of the return mode-switch routine (in native mode), or the address of the return mode-switch instruction (in emulated mode). Thus, a simple runtime test is able to distinguish mode-switch stack frames from normal stack frames. Now, consider the following call sequence:

$f^e \rightarrow g^n \xrightarrow{tail} h^e$

When $f^e$ calls $g^n$, it pushes a mode-switch frame on the native-code stack. When $g^n$ tailcalls $h^e$, the system would normally push a new mode-switch frame, on the emulated-code stack. Instead, HiPE implements a mode-switch call event as follows:

1. If the current return frame is a mode-switch frame, then:
   (a) pop the mode-switch return frame from the caller's stack;
   (b) invoke the callee.
   Otherwise:
2. push a mode-switch return frame on the callee's stack;
3. invoke the callee.

The initial test prevents adjacent mode-switches from being created, and thus restores proper tail-recursive behavior. The test itself is not expensive, and it is only executed when there is a mode-switch call.

Similar methods for maintaining proper tail-recursion in the context of mixed mode execution have been used in some Prolog implementations (e.g., ProLog by BIM, SICStus Prolog [76]) and perhaps elsewhere.

Built-in functions

Erlang defines a number of built-in functions (so called BIFs) that each Erlang implementation should provide. In addition to this there are also the standard libraries with more functions that must be implemented in an Erlang system. Some of these functions are implemented in Erlang, but some are implemented in C in the runtime system. To call C functions from native code, such calls have to be compiled to use
C's calling convention, and the loader has to be able to find the address of the C code at load time. Calls to some of these BIFs (such as element/2) are recognized by the translation to RTL and inlined directly in RTL code.

HiPE also supports calls to special primops (primitive operators). These are low level functions implemented in C or assembler for, e.g., invoking the GC, increasing the native stack or suspending the process.

Process switching

Currently, the HiPE system is not parallel but concurrent; there is just one Erlang process executing at a time. Each process gets a time-slice to execute before it has to yield to another process. This is implemented by a reduction counter: the process starts with a number of reductions and for each function call this number is decremented; when the number of reductions reaches zero the process is suspended. Then a scheduler is responsible for choosing a process to execute next.

In Erlang/OTP-R8 the scheduler was implemented as the toplevel loop in the runtime system, calling the emulator with the process to execute next. The emulator could in turn call native code through the mode-switch interface. When native code needed to suspend a process it had to go through a mode-switch and the emulator to get to the scheduler. In Erlang/OTP-R9 we have turned this around so that the scheduler now is a function that can be called either from the emulator or from native code directly to suspend the current process and get the next process to execute. This has enabled us to do faster process switches from native code.

Code loading

As described before, Erlang requires the ability to upgrade code at runtime, without affecting processes currently executing the old version of that code. The underlying Erlang runtime system maintains a global table of all loaded modules. Each module descriptor contains a name, a list of exported functions, and the locations of its current and previous code segments. The exported functions always refer to the current code segment.

At a remote function call (module:function(parameters...)), the emulator first performs a lookup based on the module and function name (this lookup is optimized to an indirect load via a table).
If the function is found, the emulator starts executing that code; otherwise, an error handler is invoked.
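The remote-call lookup just described can be pictured with a small Python model of the module table. This is only a sketch under stated assumptions: the names `ModuleDescriptor`, `load_module`, `remote_call` and `error_handler` are invented for illustration, Python callables stand in for code segments, and the real runtime system is written in C and uses an indirect load rather than dictionary lookups.

```python
class ModuleDescriptor:
    """Hypothetical model of one entry in the global module table."""

    def __init__(self, name):
        self.name = name
        self.current = None   # exports of the current code segment
        self.previous = None  # exports of the previous code segment


MODULES = {}  # global table: module name -> descriptor


def load_module(name, exports):
    """Install a new version; the old current version becomes the previous one."""
    desc = MODULES.setdefault(name, ModuleDescriptor(name))
    desc.previous = desc.current
    desc.current = dict(exports)


def error_handler(module, function, args):
    """Stand-in for the error handler invoked on undefined functions."""
    raise LookupError(f"undefined function {module}:{function}/{len(args)}")


def remote_call(module, function, *args):
    """Model of module:function(args...): always resolved in the current segment."""
    desc = MODULES.get(module)
    code = None
    if desc is not None and desc.current is not None:
        code = desc.current.get((function, len(args)))
    if code is None:
        return error_handler(module, function, args)
    return code(*args)
```

In this model, loading a second version of a module redirects later remote calls to the new code, while the previous segment stays available to whatever is still executing it.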
[Figure 3.7: Code backpatching done by the HiPE linker. (a) Before compiling f to native code: g (native code) calls f's native stub, which traps to the emulated code of f. (b) After compiling f to native code: g (native code) calls f's native code, and f's emulated code traps to native.]

In native code, each function call is implemented as a machine-level call to an absolute address. When the caller's code is being linked, the linker initializes the call to directly invoke the callee. If the callee has not yet been loaded, the linker will instead direct the call to a stub which performs the appropriate error handling. If the callee exists, but only in emulated code, the linker directs the call to a stub which in turn will invoke the emulator.

To handle hot-code loading and dynamic compilation at runtime, the linker also maintains information about all call sites in native code. This information is used for dynamic code patching, as follows:

• When a module is updated with a new version of the emulated code, all remote function calls from native code to that module are located. These call sites are then patched to call the new emulated code, via new native-to-emulated code stubs.
• When an emulated function is compiled to native code, each native code call site which refers to this function is patched to call the new native code. The first instruction in the BEAM code is also replaced by a new instruction which will cause the native code version to be invoked. Finally, the native-to-emulated stub used to invoke it from native code is deallocated.
• When a module is unloaded and its memory is freed, all native code call sites referring to this module are patched to instead invoke an
error handling stub. All native code call sites within this now nonexistent module are also removed from the linker's data structures, to prevent future attempts to update them.

Figure 3.7 illustrates the actions of the HiPE linker. Initially, the function f exists only as emulated code, and the native code function g calls it via a trap-to-emulated stub; see Figure 3.7(a). After compiling f to native code, the call site in g is backpatched to invoke f's native code, and the first instruction in f's original emulated code is replaced with a trap-to-native emulator instruction; see Figure 3.7(b) (footnote 1).

Both the standard Erlang system and the HiPE system support load-on-demand of modules. When invoked, the error handler for undefined function calls will attempt to load the code for that module from the file system. If this is successful, the call continues as normal. As a side-effect of loading the module, the HiPE linker will patch native code call sites as described above.

Pattern matching implementation

The front end of the compiler does most of the work associated with pattern matching compilation using known techniques [72]. The compiler can reorder mutually exclusive patterns in order to group patterns of the same type into one comparison operation, represented as a switch instruction in BEAM and Icode. These switches are of the form: compare the contents of a temporary to a set of constants and jump to a label corresponding to the matching constant. A default label that is used when no constant matches is also given.

In the translation to RTL these instructions are translated to sequences of lower level instructions. If the set of constants is too sparse the switch is divided into several smaller switches [15, 57]. There are then several ways these switches may be translated: 1) as an inlined binary search, 2) as a direct jump table, or 3) as a binary search in a table. The method to use is determined by the number of elements to search (and the density of the constants). We have measured the performance of these different methods on switches of different sizes and come up with threshold values for each method.
The first method, the inlined binary search, which builds a code tree of if instructions that search through the constants in a binary fashion, is only used when the number of constants is low. For large and dense sets of small integers a direct jump table is used instead.

Footnote 1: We ensure that the code for each BEAM function is larger than the trap-to-native emulator instruction.
In other cases the constants are sorted into a dense array through which a binary search is conducted; another array of the same size is used to hold the corresponding addresses. Since the size of the table is known at compile time the loop of the binary search can be unrolled. Basically, the search is done by keeping a low pointer (all indexes below are known to point to small keys) and a lookahead pointer. In each step, if the constant at the lookahead pointer is smaller than the searched constant, move low to lookahead (footnote 2). Then set lookahead to low + increment, where increment is a power of two that in each step is divided by two. In each step the invariant holds that the key is somewhere between low and low + increment. If the table is of size N then after log N steps increment is 0 and hence the key is located at index low. (The algorithm is derived from Programming Pearls; see Section 8.3 in [14].) The size of this inlined code is O(log N), where N is the number of constants, and hence significantly smaller than inlining the whole search tree.

Atoms are problematic since their runtime values differ between invocations of the runtime system, so switches on atoms are translated to semi-symbolic code which is finalized by the object code loader. For this reason the data section of the code can contain some rather complex structures, such as arrays of pairs of atoms and code labels that at load time are sorted on the runtime representation of the atoms.

Footnote 2: When our back-ends have full support for conditional moves the search code could be further improved by replacing all branches by conditional moves.
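The table-based binary search can be sketched in Python as follows. This is an illustration rather than HiPE's generated code: it follows the power-of-two stepping of the Programming Pearls search, keeps the two parallel arrays described above, and uses a loop where the compiler would emit the unrolled comparisons. The function names and the string labels standing in for code addresses are assumptions of this example.

```python
def build_switch_table(cases):
    """Sort (key, label) pairs by key and split them into two parallel arrays.

    For atoms this sorting would have to happen at load time, once the
    runtime representation of each atom is known.
    """
    ordered = sorted(cases, key=lambda case: case[0])
    keys = [key for key, _ in ordered]
    labels = [label for _, label in ordered]
    return keys, labels


def switch_lookup(keys, labels, value, default):
    """Binary search with a power-of-two increment that is halved each step."""
    n = len(keys)
    if n == 0:
        return default
    step = 1
    while step * 2 <= n:  # largest power of two not exceeding n
        step *= 2
    low = -1  # invariant: every key at an index <= low is smaller than value
    if keys[step - 1] < value:
        low = n - step
    while step > 1:  # a compiler would unroll this loop, since n is known
        step //= 2
        if keys[low + step] < value:
            low += step
    pos = low + 1  # first index whose key is not smaller than value
    if pos < n and keys[pos] == value:
        return labels[pos]
    return default
```

A switch over the constants 10, 20, 30, 40 and 50 then needs at most three key comparisons plus the final equality test, whatever value is looked up.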
Chapter 4 Register allocation

Natura abhorret vacuo.

In this chapter we will describe the register allocators used in the evaluation presented in Chapters 8 and 9. We begin with a general description of the techniques used in Section 4.1 followed by a description of the specific implementations used in the HiPE system in Section 4.2. Most of this material is background material needed for the understanding of the evaluation in the following chapters. One section also presents some experiences from applying linear scan register allocation on a register poor architecture such as the x86. Note that the description of linear scan in [75] assumed a register rich architecture such as Digital's Alpha. We are the first to describe its use both on SPARC and on the x86, and we are also the first to do an evaluation of the performance of linear scan on x86.

4.1 GLOBAL REGISTER ALLOCATION

Register allocation aims at finding a mapping of source program or compiler generated variables (henceforth referred to as temporaries) to a limited set of physical machine registers. Local register allocation algorithms restrict their attention to the set of temporaries within a single basic block. In this case, efficient algorithms for optimal register allocation exist; see e.g. [82, 5]. When aiming to find such an allocation for temporaries whose lifetimes span across basic block boundaries (e.g., for all temporaries of a single function), the process is known as global register allocation. In this case, control-flow enters the picture and obtaining an optimal such mapping becomes an NP-complete problem; see [21, 66]. Since the early 80's, global register allocation has been studied extensively in the literature, and different approximations to the optimal allocation, using heuristic methods from related NP-complete problems such as bin-packing and graph coloring, have been used to solve the problem.
Graph coloring register allocation

The idea behind coloring-based register allocation schemes is to formulate register allocation as a graph coloring problem [21] by representing liveness information with an interference graph. Nodes in the interference graph represent temporaries that are to be allocated to registers. Edges connect temporaries that are simultaneously live and thus cannot use the same physical register. By using as many colors as allocatable physical registers, the register allocation problem can be solved by assigning colors to nodes in the graph such that all directly connected nodes receive different colors.

The classic heuristics-based method by Chaitin et al. [21, 22] iteratively builds an interference graph, aggressively coalesces any pair of non-interfering, move-related nodes, and heuristically attempts to color the resulting graph by simplification (i.e., removal of nodes with degree less than the number of available machine registers). If the graph is not colorable in this way, nodes are deleted from the graph, the corresponding temporaries are spilled to memory, and the process is repeated until the graph becomes colorable. Since Chaitin's paper, many variations [27, 44] or improvements [19, 29, 69] to the basic scheme have emerged and some of them have been incorporated in production compilers.

Iterated register coalescing

Iterated register coalescing, proposed by George and Appel [38], is a coloring-based technique aiming at more aggressive elimination of redundant move instructions: When the source and destination node of a move do not interfere (i.e., are not directly connected in the graph), these nodes can be coalesced into one, and the move instruction can be removed. This coalescing of nodes a and b is iteratively performed during simplification if, for every neighbor t of a, either t already interferes with b or t is of insignificant degree. Like the coalescing strategy of Briggs, this coalescing criterion is also conservative.

In practice, coloring-based allocation schemes usually produce good code. However, the cost of register allocation is often dominated by the construction of the interference graph, which can take time (and space) quadratic in the number of nodes.
Moreover, since the coloring process is based on heuristics, there is no guarantee that the number of iterations will be bounded (by a constant). When compilation time is a concern, as in just-in-time compilers or interactive development environments, graph coloring or iterated register coalescing may not be the best method to employ for register allocation.
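To make the simplification step of coloring-based allocation concrete, here is a deliberately reduced Python sketch of a simplify/select colorer. It is not Chaitin's full algorithm nor the iterated coalescing allocator: there is no coalescing, and a node picked for spilling is simply set aside instead of triggering spill-code insertion and a rebuild of the graph. The function name, the graph encoding (a symmetric dictionary of neighbor sets) and the highest-degree spill heuristic are assumptions of this example.

```python
def color_graph(interference, k):
    """Simplify/select coloring with k colors. Returns (coloring, spilled).

    interference maps every node to the set of its neighbors (symmetric).
    """
    graph = {node: set(adj) for node, adj in interference.items()}
    stack, spilled = [], []
    while graph:
        # Simplify: a node with fewer than k neighbors can always be colored.
        node = next((n for n in sorted(graph) if len(graph[n]) < k), None)
        if node is None:
            # No such node: set aside the node of highest degree (crude heuristic).
            node = max(sorted(graph), key=lambda n: len(graph[n]))
            spilled.append(node)
        else:
            stack.append(node)
        for neighbor in graph.pop(node):
            graph[neighbor].discard(node)
    coloring = {}
    while stack:
        # Select: color nodes in reverse order of removal.
        node = stack.pop()
        used = {coloring[m] for m in interference[node] if m in coloring}
        coloring[node] = next(c for c in range(k) if c not in used)
    return coloring, spilled
```

With three mutually interfering temporaries and only two colors, one temporary ends up in the spill list and the other two receive different colors; with three colors nothing is spilled.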
4.1. GLOBAL REGISTER ALLOCATION

(a) Control-flow graph.

    {t1, t2}
    L1: if t1 > t2 then L3 else L2;
    L3: t1 = t1 - 1; goto L1;
    L2: t3 = t1 + 1; t0 = t3; return t0;

(b) Linearization 1.

    1: L1: if t1 > t2 then L3 else L2;
    2: L3: t1 = t1 - 1;
    3:     goto L1;
    4: L2: t3 = t1 + 1;
    5:     t0 = t3;
    6:     return t0;

(c) Linearization 2.

    1: L1: if t1 > t2 then L3 else L2;
    2: L2: t3 = t1 + 1;
    3:     t0 = t3;
    4:     return t0;
    5: L3: t1 = t1 - 1;
    6:     goto L1;

Figure 4.1: Control-flow graph and two of its possible linearizations.

Linear scan register allocation

The linear scan allocation algorithm [75] is simple to understand and implement. Moreover, as its name implies, its execution time is linear in the number of instructions and temporaries. It is based on the notion of the live interval of a temporary, which is an approximation of its liveness region. The live interval of a temporary is defined so that the temporary is dead at all instructions outside the live interval. This is an approximation as the temporary might also be dead at some instructions within the interval. The idea is that finding this approximation will be much faster than building a complete interference graph. The algorithm can be broken down into the following four steps: (1) order all instructions linearly; (2) calculate the set of live intervals; (3) allocate a register to each interval (or spill the corresponding temporary); and finally (4) rewrite the code with the obtained allocation. Let us look at each step, using Figure 4.1 as our example.

Ordering instructions linearly

As long as the calculation of live intervals is correct, an arbitrary linear ordering of the instructions can be chosen. In our example, the simple control-flow graph of Figure 4.1(a) can be linearized in many ways; Figures 4.1(b) and 4.1(c) show two possible orderings. Different orderings will of course result in different approximations of live intervals, and the choice of ordering might impact the allocation and the number of spilled temporaries. An optimal ordering is one with as few contemporaneous live intervals as possible, but finding this information at compile-time is time-consuming and as such contrary to the spirit of the linear scan algorithm. It is therefore important to a priori choose an ordering that performs best on the average. Poletto and Sarkar [75] suggest the use of a depth-first ordering of instructions as the most natural ordering and only compare it with the ordering in which the instructions appear in the intermediate code representation. They conclude that these two orderings produce roughly similar code for their benchmarks. We have experimented with many other different orderings and discuss their impact on the quality of the produced code in Section 9.1.

Calculation of live intervals

Given a linear ordering of the code, there is a minimal live interval for each temporary. For temporaries not in a loop, this interval starts with the first definition of the temporary and ends with its last use. For temporaries live at the entry of a loop, the interval must be extended to the end of the loop. The optimal interval can be found by first doing a precise liveness analysis and then by traversing the code in linear order extending the intervals of each temporary to include all instructions where the temporary is live. For the first linearization of our example, a valid set of live intervals would be:

    t0:[5,6], t1:[1,4], t2:[1,3], t3:[4,5]

and for the second linearization, a valid set of live intervals would be:

    t0:[3,4], t1:[1,6], t2:[1,6], t3:[2,3]

In the first set of intervals, t2 is only live at the same time as t1, but in the second one, t2 is simultaneously live with all other temporaries.

A natural improvement to the above calculation of lifetime intervals is to also employ a scheme such as that described in [89] for utilizing lifetime holes or to perform some form of live range splitting. We will remark on the use of these methods in Section 9.4. Also, there are alternatives to performing the somewhat costly liveness analysis that give correct but sub-optimal live intervals. One approach is to use strongly connected components in the control-flow graph; see [75]. We will look closer at this alternative in Section 9.2.

Allocation of registers to intervals

When all intervals are computed, the resulting data structure (Intervals) gets ordered in increasing start-points so as to make the subsequent allocation scan efficient.
For our first linearization this would result in:

    t1:[1,4], t2:[1,3], t3:[4,5], t0:[5,6]

Allocation is then done by keeping a set of allocatable free physical registers (FreeRegs), a set of already allocated temporaries (Allocated), and a list containing a mapping of active intervals to registers (Active). The active intervals are ordered on increasing end-points while traversing the start-point-ordered list of intervals. For each interval in Intervals, that is, for each temporary t_i (with interval [start_i, end_i]) do:

* For each interval j in Active which ends before or at the current interval (i.e., end_j ≤ start_i), free the corresponding register and move the mapping to Allocated.

* If there is a free register, r, in FreeRegs, remove r from FreeRegs, and add t_i with the interval t_i:[start_i, end_i] to Active (sorted on end). If, on the other hand, there are no free registers and the end-point end_k of the first temporary t_k in Active is further away than the current (i.e., end_k > end_i), then spill t_k, otherwise spill t_i. By choosing the temporary whose live interval ends last, the number of spilled temporaries is hopefully kept low. (Another way to choose the spilled temporary is discussed in Section 9.3.)

Rewrite of the code

Finally, when the allocation is completed, the code is rewritten so that each use or definition of a temporary involves the physical register to which the temporary is allocated. On RISC architectures, if the temporary is spilled, a load or store instruction is added to the code and a precolored physical register is used instead of the temporary. On architectures with complex addressing modes, such as the x86, the location of the spilled temporary can often be used directly in the instruction, in which case no extra instruction for loading or storing the temporary is needed.

4.2 IMPLEMENTED REGISTER ALLOCATORS IN HiPE

Graph coloring register allocator

The graph coloring register allocator is a simple variant of Briggs' allocator [19], which has been in daily use in the HiPE compiler for many years. During this time the implementation has been tuned, mainly by supplying more efficient data structures. The allocator uses the simple spill cost function: the static count of uses. This cost function works quite well in our context since there are no loop constructs in Erlang; either there is no loop in a function or the function is self-recursive, in which case more or less the whole function is the loop.
A natural improvement to the cost function would be to use the static prediction to give a higher spill cost to registers used in the most likely taken path. We have not investigated this issue since this allocator seldom needs to spill on register-rich architectures such as the SPARC.
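As a small illustration of the "static count of uses" spill cost mentioned above, the following sketch counts uses per temporary and picks the cheapest spill candidate. The instruction representation (pairs of defined and used temporaries) and the helper names `spill_costs` and `cheapest_to_spill` are assumptions made for this example; that the candidate with the lowest cost is the one spilled is likewise an assumption, not a description of the HiPE code.

```python
from collections import Counter


def spill_costs(instructions):
    """Static use count per temporary.

    instructions: list of (defs, uses) pairs, each a list of temporary
    names (hypothetical representation).
    """
    costs = Counter()
    for _defs, uses in instructions:
        costs.update(uses)  # every static use adds one to the cost
    return costs


def cheapest_to_spill(candidates, costs):
    """Pick the candidate with the fewest static uses."""
    return min(candidates, key=lambda t: costs[t])


# The code of Figure 4.1, written as (defs, uses) pairs.
FIGURE_4_1 = [
    ([], ["t1", "t2"]),   # if t1 > t2 then L3 else L2
    (["t1"], ["t1"]),     # t1 = t1 - 1
    ([], []),             # goto L1
    (["t3"], ["t1"]),     # t3 = t1 + 1
    (["t0"], ["t3"]),     # t0 = t3
    ([], ["t0"]),         # return t0
]
```

For the code of Figure 4.1 this gives t1 a cost of three and every other temporary a cost of one, so t2 would be preferred over t1 as a spill candidate.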
In the graph coloring register allocator, the following three steps are iterated until no new temporaries are added: 1) build interference graph; 2) color the graph; and 3) rewrite instructions with spills to use new temporaries. These steps are described in more detail below.

Build interference graph

To build the interference graph we first calculate liveness information for the temporaries and we then traverse the instructions inserting edges between interfering temporaries into the interference graph. While doing this, we also calculate the number of uses of each temporary. This number is used in the spill cost function.

Color the graph

Coloring is done straightforwardly. We use a work-list Low, a stack of colorable registers Stack, and apply the algorithm shown as Algorithm 4.1.

Algorithm 4.1 Heuristic graph coloring implementation

    Low is initialized to contain all nodes of insignificant degree  /* i.e., trivially colorable */
    While the interference graph is non-empty
        While Low is non-empty
            Remove X from Low
            Push {X, colorable} on the Stack
            Decrement degree of neighbors of X
            For each neighbor Y of low degree, put Y on Low
        If the interference graph is empty, return Stack; otherwise
            Select a node Z to spill
            Push {Z, spilled} on the Stack
            Decrement the degree of neighbors of Z
            Add all insignificant degree neighbors of Z to Low
    While the stack is not empty
        Pop node N from the stack
        If N is not marked as spilled choose a free color for N.

Rewrite of the code

The final step of the graph coloring implementation traverses the code and rewrites each instruction (except move instructions, whose handling is postponed) that defines or uses a spilled temporary. For each use we first insert an instruction that moves the spilled temporary to a new temporary and then we rewrite the original instruction so that it uses the new temporary instead. For instructions defining a spilled temporary, we do a similar rewrite.
After this rewrite, spilled temporaries are only referenced by move instructions which can be replaced by loads and stores in a later stage. If no new temporaries were added, we are done (all temporaries are allocated); otherwise we repeat the allocation with the new code, with the added constraint that none of the new temporaries may be spilled.

Iterated register coalescing allocator

The iterated register coalescing allocator closely follows the algorithm described by George and Appel [38]. This allocator is optimistic in its spilling (similar to the strategy described in [69]). The main reason for implementing this allocator was to be able to get really good runtime performance of the generated code when the speed of compilation is not critical. This allocator establishes an approximate (practical) lower bound on the number of spills for the benchmarks. The main structure of the iterated coalescing allocator (shown as Algorithm 4.2) is similar to that of the graph coloring allocator; the difference lies mainly in the coloring of the interference graph.

Algorithm 4.2 Iterated register coalescing implementation

    While the interference graph is non-empty:
    /* In each step below, the interference graph is updated */
        if possible: Simplify the graph by removing all insignificant degree nodes
        Repeat coalescing move-related nodes (while possible)
        Freeze nodes.  /* Roughly, this un-coalesces part of a node */
        Otherwise spill a node

Linear scan register allocator

The first implementation of the linear scan register allocator in HiPE was based directly on the description by Poletto and Sarkar [75]. Afterwards, we experimented with various options and tuned the first implementation considerably. In this chapter, we only describe the chosen default implementation (i.e., the one using the options that seem to work best in most cases) by looking at how each step of the algorithm is implemented in HiPE. In Chapter 9 we will improve on the work by Poletto and Sarkar [75] by describing and quantifying the impact of the alternatives that we tried.

Ordering instructions linearly

Since there is no reason to reorder the instructions within a basic block, we order only the basic blocks. The default ordering of blocks is depth-first ordering (also called reverse postorder).

Calculation of live intervals

Each live interval consists of a start position and an end position: these are instruction numbers corresponding to the first definition and last use of the temporary in a breadth-first traversal of the basic blocks of the control-flow graph. We use liveness analysis to find out the live-in and live-out sets for each basic block. This information is then used to set up the live intervals for each temporary by traversing the set of basic blocks. All temporaries in the live-in set for a basic block starting at instruction i have a live interval that includes i. All temporaries in the live-out set for a basic block ending at instruction j have a live interval that includes j. Furthermore, if a temporary, t, is not included in both the live-in and the live-out set, then the live interval of t needs to be extended to either the first instruction defining t or the last instruction using t within the basic block.

Allocation of registers to intervals

The allocation of registers to intervals is performed by traversing the list of intervals sorted on increasing start-points. During the traversal we use four data structures:

Intervals: A list of {Temporary,StartPoint,EndPoint} triples. This is a sorted (on StartPoint) representation of the interval structure calculated in the previous step.

FreeRegs: Allocatable physical registers (PhysReg) which are currently not allocated to a temporary.

Active: A list of {Temporary,PhysReg,EndPoint} triples sorted on increasing EndPoint, used to keep track of which temporary is allocated to which physical register for what period, so as to deallocate the physical registers when the allocation has passed the EndPoint of the temporary. This list is also used to find the temporary with the longest live interval when a temporary needs to be spilled.

Allocation: An unsorted list containing the final allocation of temporaries to registers or to spill positions. Elements of the list are either allocated, on the form {Temporary,{reg,PhysReg}}, or spilled, on the form {Temporary,{spill,Location}}.

For each interval in the Intervals list, the traversal does the following:

1. Move the information about each interval in the Active list that ends before or at StartPoint to the Allocation structure. Also add the physical register assigned to the interval to the FreeRegs list.

2. Find an allocation for the current interval:

   * If there is a physical register in the FreeRegs list then tentatively assign this register to the current interval by inserting the interval and the physical register into the Active list, and by removing the physical register from the FreeRegs list.

   * If there is no free register then spill the interval with the furthest EndPoint and move this interval to the Allocation list. If the spilled interval is not the current one, then assign the physical register of the interval that was spilled to the current interval.

Rewrite of the code

When all intervals are processed, the Allocation structure is turned into a tuple that can be used for O(1) mapping from a temporary to its allocated physical register (or spill position). In contrast to the graph coloring and iterated coalescing allocators, the linear scan register allocator does not use an iterative process to handle spills. Instead two registers are reserved so that they are not used during allocation; these registers can then be used to rewrite instructions that use spilled registers. The downside of doing so is that our implementation of the linear scan register allocator will spill slightly more often than really necessary. On the other hand, we found that this keeps compilation times down for functions that spill, requiring just one more linear pass over the code to rewrite instructions in accordance with the allocation mapping.

A naïve register allocator

To establish a base line for the comparisons (presented in Chapter 8) of the presented register allocators we have also implemented a naïve register allocator. This allocator allocates all temporaries to memory positions and rewrites the code in just one pass. For example, on the SPARC, every use of a temporary is preceded by a load of that temporary to a register, and every definition is followed by a store to memory. This means that the number of added load and store instructions is equal to the number of uses and defines in the program. This register allocator is very fast since it only needs one pass over the code, but on the other hand the added loads and stores increase the code size which in turn increases the total compilation time. Obviously, we recommend this register allocator to nobody!
We simply use it to establish a lower bound on the register allocation time and an upper bound on the number of spills in order to evaluate the performance and effectiveness of the other register allocators.
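To make the linear scan traversal described earlier in this section more concrete, here is a minimal sketch of the allocation step over the Intervals, FreeRegs, Active and Allocation structures. It is written in Python rather than Erlang and is not the HiPE implementation: the function name `linear_scan`, the dictionary-based data layout and the numbering of spill slots are assumptions made for illustration, and the two registers that HiPE reserves for rewriting spills are not modelled.

```python
def linear_scan(intervals, registers):
    """Allocate registers to live intervals in one linear pass.

    intervals: dict temporary -> (start, end)
    registers: list of allocatable register names
    Returns a dict temporary -> ("reg", name) or ("spill", slot).
    """
    free_regs = list(registers)
    active = []       # (end, temporary, register), sorted on increasing end
    allocation = {}
    next_slot = 0

    # Traverse the intervals sorted on increasing start-point.
    for temp, (start, end) in sorted(intervals.items(),
                                     key=lambda item: item[1][0]):
        # Step 1: expire intervals that end before or at the current start.
        while active and active[0][0] <= start:
            _, old_temp, reg = active.pop(0)
            allocation[old_temp] = ("reg", reg)
            free_regs.append(reg)

        # Step 2: find an allocation for the current interval.
        if free_regs:
            active.append((end, temp, free_regs.pop(0)))
            active.sort()
        elif active and active[-1][0] > end:
            # The active interval ending last is spilled; the current
            # interval takes over its register.
            _, victim, reg = active.pop()
            allocation[victim] = ("spill", next_slot)
            next_slot += 1
            active.append((end, temp, reg))
            active.sort()
        else:
            allocation[temp] = ("spill", next_slot)
            next_slot += 1

    # Whatever is still active keeps its register.
    for _, temp, reg in active:
        allocation[temp] = ("reg", reg)
    return allocation
```

With the intervals of the first linearization of Figure 4.1 (t0:[5,6], t1:[1,4], t2:[1,3], t3:[4,5]) two registers are enough to avoid spilling in this sketch, while with a single register the interval of t1 is the one spilled, since it ends later than that of t2.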
The SPARC back-end

    reg   Name     M  Note             reg   Name     M  Note
    %g0   ZERO     0  0                %l0   ARG6     A  (caller-save)
    %g1   TEMP0    R  Scratch          %l1   ARG7     A  (caller-save)
    %g2   ARG11    A  (caller-save)    %l2   ARG8     A  (caller-save)
    %g3   ARG12    A  (caller-save)    %l3   ARG9     A  (caller-save)
    %g4   ARG13    A  (caller-save)    %l4   ARG10    A  (caller-save)
    %g5   ARG14    A  (caller-save)    %l5   TEMP3    A  Local scratch
    %g6   [OS]     -  OS-Reserved      %l6   TEMP2    A  emu native
    %g7   [OS]     -  OS-Reserved      %l7   TEMP1    A  Local scratch
    %o0   ARG16    A  Return value     %i0   P        G  Process pointer
    %o1   ARG1     A  (caller-save)    %i1   HP       G  Heap pointer
    %o2   ARG2     A  (caller-save)    %i2   H-limit  G  Heap limit
    %o3   ARG3     A  (caller-save)    %i3   SP       G  Stack pointer
    %o4   ARG4     A  (caller-save)    %i4   S-limit  G  Stack limit
    %o5   ARG5     A  (caller-save)    %i5   FCALLS   G  Reduction count
    %o6   [sp]     -  C-stack SP       %i6   [fp]     -  C-frame pointer
    %o7   RA/CP    G  Ret. address     %i7   ARG15    A  (caller-save)

Table 4.1: Use of SPARC registers in HiPE. A = Allocatable, R = Reserved, G = Global, - = Reserved by C/OS, 0 = zero.

Recall that, as presented in Section 3.2.3, HiPE uses stack descriptors to indicate to the garbage collector which stack slots are live and which are dead. This makes it possible to allocate a stack slot for each spilled temporary, and mark that slot live or dead as appropriate at each function call.

HiPE is a just-in-time native code compiler extension to a virtual-machine-based runtime system. This has influenced the compiler in several ways: There are for example several special data structures that are part of the virtual machine. These structures are heavily used and important for the execution of Erlang programs, therefore we would like to keep them in registers. On the register-rich SPARC architecture, we have chosen to cache six data structures in registers (the stack pointer, stack limit, heap pointer, heap limit, a pointer to the process control block, and the reduction counter); see Table 4.1. We have reserved one register (%g1) that the assembler can use to shuffle arguments on the stack at tailcalls. There are another five registers that the HiPE compiler cannot use (Zero (%g0), C's SP (%o6) and FP (%i6), and the two SPARC ABI reserved registers (%g6 and %g7)). Since we are using the ordinary SPARC call instruction, the return address is saved in %o7.
(At the moment we do not let the register allocators use this register even in non-leaf functions where the return address is also saved on the stack.) We use 16 registers to pass arguments to Erlang functions, but since these can also be used by the register allocators, we get a total of 19 allocatable registers. For linear scan, two of these 19 registers (%l6 and %l7) are reserved to handle spills without having to iterate the allocation process, as described previously. Note that linear scan has not previously been applied [74, 89] in a setting where the set of allocatable registers is limited.

The x86 back-end

On the x86, the allocators assume that the call instruction defines all physical registers, thus preventing temporaries that are live across a function call from being allocated to a physical register. This means that all these temporaries will be allocated on (spilled to) the stack. The approaches used by the two back-ends differ when a temporary that lives across function calls needs to be read from or written to memory. In the worst case, on the SPARC, a read might be needed after each call; on the x86, a read is needed at each use.¹ On the SPARC, a write is needed at each call; on the x86, a write is needed at each definition. We suspect that, in a functional language without destructive updates, the number of uses plus the number of defines is less than two times the number of calls a temporary is live over. If so, the approach used by the x86 back-end is a winner.

¹ On the SPARC, we do the obvious and easy optimization of eliminating redundant load/store pairs, e.g., ld [%sp+N],M; ... ; st M,[%sp+N] where none of the instructions between the ld and the st accesses M. The analogous optimization of removing the second ld in the x86 equivalent of ld [%sp+N],M; <use M>; ld [%sp+N],M; <use M> is currently not performed because the spilled value is most often not read into a register (it is used directly by the instruction instead). Consequently, it is seldom possible to perform this optimization on the x86.

We pre-allocate much fewer, only three, registers on the x86: The stack pointer is allocated to %esp, the process pointer to %ebp, and the heap pointer is allocated to %esi; see Table 4.2. At function calls, all arguments are passed on the stack and the return value is passed in %eax. The register allocator will not try to allocate the arguments in registers but keep them in memory (on the stack). The return value on the other hand is always moved to a new temporary directly after the return of the function. Hence, we can use the %eax register as a general purpose register, leaving five registers for the register allocators.

    reg   Name  M  Note             reg   Name  M  Note
    %eax        A  (caller-save)    %esp  SP    G  Stack pointer
    %ebx        A  (caller-save)    %ebp  P     G  Process pointer
    %ecx        A  (caller-save)    %esi  HP    G  Heap pointer
    %edx        A  (caller-save)    %edi        A  (caller-save)

Table 4.2: Use of x86 registers in HiPE. A = Allocatable, G = Global.

Most instructions of the x86 instruction set architecture (ISA) can take a memory location (e.g., register+immediate offset) as one of the operands (or the destination). By using these addressing modes, we can in many cases use spilled temporaries directly, without having to first load them from memory to a register.

Prior to register allocation, the code is in a pseudo-IA-32 intermediate form in which arbitrary operands are allowed in each instruction. After register allocation, a post-pass ensures that each instruction complies with the real IA-32 ISA, e.g., that no binary operation uses more than one memory operand. This is done by rewriting the code to use loads and stores to new temporaries. If any instruction has to be rewritten in this way, then the register allocator is called again with the additional constraint that none of the newly introduced temporaries may be spilled.

The naïve register allocator also uses the memory operands of the x86. Thus, despite the fact that each temporary is considered spilled by the allocator, an additional load or store instruction is not always needed. (If loads or stores are inserted, they use a pre-allocated physical register instead of introducing a new temporary.)

Tweaks for linear scan on the x86

All published accounts of experience using the linear scan register allocator have so far been in the context of register-rich architectures using a calling convention similar to the one used by our SPARC back-end. When adapting linear scan to work in the context of the x86 back-end, we found that some small adjustments to the basic algorithm were needed.

The first adjustment was due to the difference in the calling convention used by the x86 back-end, where the register allocator is responsible for saving all live temporaries at call sites. This is done by defining all physical registers at each function call, forcing all temporaries that are live at the point of the call to be spilled. This meant that just applying the same interval calculation as on the SPARC would have led to the live intervals of all physical registers ranging over most of the function (at least from the first call to the last call). This would have made allocation of other temporaries, and thus use of linear scan, impossible.
Our solution was to let temporaries that are defined several times but never used have several one-instruction intervals. This effectively gave us a sort of live range splitting (see, e.g., [29]) on physical registers.

We also discovered that extra care has to be taken when pre-colored registers are used with the linear scan algorithm. Note that this is a generic issue, but it manifests itself more often on a register-poor architecture such as the x86. The crude approximation of live ranges by live intervals used by linear scan forces a temporary to appear live from its first define to its last use. If some physical register is used often, for example for parameter passing, then this register will appear live throughout most of the code, preventing any other temporary from being allocated to the register. If many, or even all, of the physical registers are precolored, special care must be taken or the program will not be allocatable at all. A solution to this problem is to handle physical registers separately, for example by allowing them to have several live intervals. Another similar solution would be to allow for each physical register an infinite amount of precolored registers. Then each separate use of the register could use a different precolored register. (A non-precolored temporary that is live through most of the code is not a problem, since it can be handled by, e.g., live range splitting, or by simply spilling the temporary. A precolored register on the other hand cannot be spilled since its use indicates that the value really has to be in that register.) We have not tried any of these solutions since we avoid the problem by not allowing all registers to be precolored on the x86; this means that we currently cannot use all physical registers for argument passing with the linear scan allocator.

4.3 RELATED WORK

Our aim has been to evaluate the linear scan register allocation algorithm, examining both the speed of compilation and the performance of the compiled code. We are the first to do an evaluation of linear scan on SPARC and in particular on the x86. Hanspeter Mössenböck and Michael Pfeiffer [65] have published some measurements on compilation times of linear scan on the x86 after our first publication on the subject. They do not, however, present any measurements on the runtime performance of linear scan.
As mentioned before, register allocation is an important problem and much research has been done in this area. In recent years much attention has also been paid to the increasingly important but register poor x86 architecture. There have for example been proposals to use integer linear programming [8] or formulating the problem as a partitioned boolean quadratic optimization problem [80] in order to get (near) optimal allocation. These techniques show promising results as far as the quality of produced code is concerned, producing better allocations on register poor architectures than a coalescing graph coloring approach. Even though Appel and George [8] show that in practice the compilation time grows almost linearly (O(n^1.3)) with the size of their benchmark programs, they cannot guarantee linear time allocation. They report that the total allocation time for all their benchmarks grows from 57 seconds with iterated register coalescing to 11,306 seconds with optimal spilling. The long compilation time can partly be attributed to the fact that this allocator requires an (external) solver for the integer linear programming problem. Despite the optimality of the allocation we do not find this approach attractive, since extensive engineering would be required in order to implement this allocator in a compiler. Since our goal has been to find an allocator with low allocation times suitable for dynamic compilation, not an optimal allocation, we have not examined these allocators further. Also, the iterated coalescing allocator, while not guaranteeing an optimal allocation, seldom spills even on the x86. As we will see in the performance evaluation (Chapter 8), the difference in runtime performance between the iterated coalescing allocator and the linear scan allocator is still low.

4.4 DISCUSSION

With these four register allocators in place we set out to compare the performance, both in terms of compilation times and performance of the resulting code. As can be seen from the details of the comparison, presented in Chapter 8, code generated by linear scan performs reasonably well compared to the short compilation times. In Chapter 9 we will also present the impact of some variations to the implementation of linear scan.

With the implementation and tuning of the linear scan allocator we have taken one big step toward a system in which we can perform dynamic compilation. As we will see later this can be used for dynamic optimization of process communication. But, before we start looking at these techniques we will examine the foundation for the implementation of concurrency, that is, ways to structure the underlying memory architecture of a concurrent system.
Chapter 5

Heap architectures

Elegance is not optional.
Richard O'Keefe

A key issue in the design of a concurrent language implementation is that of the runtime system's memory architecture. Clearly, there are many different ways of structuring the architecture of the runtime system. In this chapter we present different ways of implementing the memory architecture of processes in a concurrent programming language.

We begin by presenting two radically different ways of implementing the memory architecture, with private heaps, or with one shared heap. These architectures are not new but hopefully the characterization of their behaviors will shed some new light on the choices available to an implementor of concurrent programming languages. Both these architectures are implemented and integrated in the HiPE system and this description also sets the scene for the evaluation of heap architectures presented in Chapter 10.

As we shall see both architectures have their pros and cons. In response to this we also propose a new hybrid architecture which we hope will have the strength of the other two architectures without their weaknesses. We will only present very preliminary measurements of the performance of this architecture since it requires an escape analysis which is beyond the scope of this thesis to present and evaluate.

The three architectures presented in this chapter are:

* A private heap system, where each process has its own private memory.
* A shared heap system in which all processes share one common heap.
* A proposal for a hybrid system with private heaps for private data and a shared heap for shared messages.
Figure 5.1: Memory architecture with private heaps. (Diagram labels: PCB, STACK, HEAP, sp and hp for the processes P1, P2 and P3.)

Throughout this chapter, if not stated otherwise, we make the following assumptions:

1. The system is running on a uniprocessor machine.
2. The heap garbage collector is similar to the collector currently used in Erlang/OTP: a Cheney-style semi-space stop and copy collector [23] with two generations.
3. Message passing and garbage collection are operations that cannot be interrupted.

5.1 AN ARCHITECTURE WITH PRIVATE HEAPS

The first memory architecture we examine is process-centric. In this architecture, each process allocates and manages its own memory area which typically includes a process control block (PCB), a private stack, and a private heap. Other memory areas, e.g., a space for large objects, might also exist either on a per-process basis or as a global area. This is the default architecture of the Erlang/OTP R8 system, the version of Erlang released by Ericsson in the fall of …

The stack is used for function arguments, return addresses, and local variables. Compound terms such as lists, tuples, and objects, such as floating point numbers and arbitrary precision integers (bignums) which are larger than a machine word, are stored on the heap.

In this system the memory areas are organized with the heap co-located with the stack (i.e., the stack and the heap growing toward each other). The advantage of doing this is that stack and heap overflow tests become cheap, just a comparison between the stack and heap pointers which can usually be kept in machine registers. A disadvantage is that expansion or relocation of the heap or stack involves both areas.
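The cheap overflow test that follows from co-locating the stack and the heap can be sketched as below. This is only a model under assumed names (`ProcessArea`, `has_room`, word-granular sizes); it does not reproduce the actual runtime system, where hp and sp would be machine registers and a failed test would trigger a garbage collection or an expansion of the area.

```python
class ProcessArea:
    """One contiguous memory area shared by a heap and a stack.

    The heap grows upward from index 0 and the stack grows downward
    from the top, so the free space is the gap between hp and sp.
    """

    def __init__(self, size_words):
        self.hp = 0            # next free heap word
        self.sp = size_words   # lowest used stack word

    def has_room(self, words):
        # A single comparison serves as both the heap and the stack
        # overflow test.
        return self.hp + words <= self.sp

    def alloc_heap(self, words):
        if not self.has_room(words):
            return None        # caller would garbage collect or expand
        address = self.hp
        self.hp += words
        return address

    def push_stack(self, words):
        if not self.has_room(words):
            return None
        self.sp -= words
        return self.sp
```

The sketch also hints at the disadvantage noted above: growing either region means relocating or resizing the single area that holds both.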
Figure 5.2: Message passing in a private heap system. (Diagram panels: before send and after send, each showing the PCB, stack and heap of the processes P1 and P2.)

As mentioned, Erlang also supports large vectors of bytes (binaries). These are not stored on the heap; instead they are reference-counted and stored in a separate global memory area. Henceforth, we ignore the possible existence of a large object space as the issue is completely orthogonal to our discussion.

Figure 5.1 shows an instance of this architecture when three processes (P1, P2, and P3) are present; shaded areas represent unused memory.

Process communication

Message passing is performed by copying the term to be sent from the heap of the sender to the heap of the receiver, and then inserting a pointer to the message in the mailbox of the receiver which is contained in its PCB; see Figure 5.2. As shown in the figure, a local data structure might share the same copy of a subterm, but when that data structure is sent to another process each subterm will be copied separately. As a result, the copied message occupies more space than the original.¹ This phenomenon could be avoided by using some marking technique and forwarding pointers, but note that doing so would make the message passing operation even slower.

¹ However, message expansion due to loss of sharing is quite rare in practice. In particular it does not occur in the benchmarks used for the experimental evaluation.

Garbage collection

When a process runs out of heap (or stack) space, the process's private heap is garbage collected. In this memory architecture, the root set of the garbage collection is the process stack and mailbox. Recall from Section … that a two-generational (young and old) Cheney-style stop-and-copy collector is being used. A new heap, local to a process, where live data will be placed, is allocated at the beginning of the collection. The old heap contains a high water mark (the top of the heap after the last garbage collection) and during a minor collection data below this mark is forwarded to the old generation while data above the mark is put on the new heap. During a major collection the old generation is also collected to the new heap. At the end of the garbage collection the stack is moved to the area containing the new heap and the old heap is freed.

To distinguish between runtime systems that are themselves implemented on top of concurrency (that is, they could be executing in parallel on multiple CPUs) from sequential implementations that would only be utilizing one processor even when running on a multi-processor machine, we call these systems multi-threaded and non-multi-threaded (or not multi-threaded) respectively. In a system which is non-multi-threaded, like the current Erlang/OTP system, the mutator will be stopped and all other processes will also be blocked during garbage collection in a private heap system. (There is only one thread of control and it is busy doing garbage collection.) In a multi-threaded system the garbage collection would not necessarily be blocking in a private heap system.

Pros and cons

This design has a number of advantages:

+ No cost memory reclamation: When a process terminates, its memory can be freed directly without the need for garbage collection. Thus, one can use processes for some simple form of memory management: a separate process can be spawned for computations that will produce a lot of garbage.

+ Small root sets: Since each process has its own heap, the root set for a garbage collection is the stack and mailbox of the current process only. This is expected to help keeping the GC stop times short. However, without a real-time garbage collector there is no guarantee for this.

+ Improved cache locality: Since each process has all its data in one contiguous (and often small) stack/heap memory area, the cache locality for each process is expected to be good.
+ Cheaper tests for stack/heap overflow: With a per-process heap, the heap and stack overflow tests can be combined and fewer frequently accessed pointers need to be kept in machine registers.

Unfortunately this design also has some disadvantages:

- Costly message passing: Messages between processes must be copied between the heaps. The cost of interprocess communication is proportional to the size of the message. In some implementations, the message might need to be traversed more than once: one pass to calculate its size (so as to avoid overflow of the receiver's heap and trigger its garbage collection or expansion if needed) and another to perform the actual copy.

- More space needs: Since messages are copied, they require space on each heap they are copied to. As shown, if the message contains the same subterm several times, there can even be non-linear growth when sending messages. Also, if a (sub-)term is sent back and forth between two processes a new copy of the term is created for each send even though the term already resides on the appropriate heap before the send.

- High memory fragmentation: A process cannot utilize the memory (e.g., the heap) of another process even if there are large amounts of unused space in that memory area. This typically implies that processes can allocate only a small amount of memory by default. This in turn usually results in a large number of calls to the garbage collector.

From a software development perspective, a private heap architecture can have an impact on how programs are written. When performance of message passing is a concern the programmer might have to code the message or come up with other tricks (like passing messages in binaries as suggested in Section 5.2 of the Erlang/OTP documentation [33]) in order to reduce the overhead for copying.

5.2 AN ARCHITECTURE WITH A SHARED HEAP

The problems associated with costly message passing in a private heap system can be avoided by a memory architecture where the heap is shared. In such a system each process can still have its own stack, but there is only one global heap, shared by all processes. The shared heap contains both messages and all compound terms. Figure 5.3 depicts such an architecture.
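The space cost of copying in the private heap architecture, in particular the non-linear growth when a message contains the same subterm several times, can be illustrated with a small model. Terms are modelled as nested Python tuples and every tuple is assumed to cost one header word plus one word per element; both the model and the names `shared_size` and `copied_size` are assumptions made for this sketch, not measurements of the Erlang runtime.

```python
def shared_size(term, seen=None):
    """Words occupied on the sender's heap, counting shared subterms once."""
    if seen is None:
        seen = set()
    if not isinstance(term, tuple) or id(term) in seen:
        return 0                      # immediates and already-counted terms
    seen.add(id(term))
    return 1 + len(term) + sum(shared_size(t, seen) for t in term)


def copied_size(term):
    """Words needed by a copy that copies every subterm occurrence."""
    if not isinstance(term, tuple):
        return 0
    return 1 + len(term) + sum(copied_size(t) for t in term)


def nested(levels):
    """Build a term whose two elements share one subterm at each level."""
    term = (0,)
    for _ in range(levels):
        term = (term, term)
    return term
```

For a term built by `nested(n)` the shared representation grows by a constant number of words per level, whereas the size of a sharing-unaware copy roughly doubles per level. As the text notes, such expansion is rare in practice, so the sketch shows a worst case rather than typical behavior.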
[Figure 5.3: Memory architecture with shared heap.]

[Figure 5.4: Message passing in a shared heap system (before send, after send).]

Process communication

Message passing is done by just placing a pointer to the message in the receiver's mailbox (located in its PCB); see Figure 5.4. The shared heap remains unchanged, and neither copying nor traversal of the message is needed. In this architecture, message passing is a constant time operation.

Garbage collection

Conceptually, the garbage collector for this system is the same as in the private heap one, the only difference being that the root set includes the stacks and mailboxes of all processes; not just those of the process forcing the garbage collection. This implies that, even in a multi-threaded system, all processes get blocked by GC.

Pros and cons

This design avoids the disadvantages of the private heap system, which are now turned into advantages:

+ Fast message passing. As mentioned, message passing only involves updating a pointer; an operation which is independent of the message size.

+ Less space needed. Since data passed as messages is shared on the global heap, the total memory requirements are lower than in a private heap system. Also, note that since nothing is changed on the heap, shared subterms of messages remain of course shared within a message.

+ Low fragmentation. The whole memory in the shared heap is available to any process that needs it.

Unfortunately, even this system has disadvantages:

- Larger root set. Since all processes share the heap, the root set for each GC conceptually includes the stacks of all processes. Unless a concurrent garbage collector is used, all processes remain blocked during GC.

- Larger to-space. With a copying collector a to-space as large as the heap which is being collected needs to be allocated. One would expect that in general this area is larger when there is a shared heap than when collecting the heap of each process separately.

- Higher GC times. When a copying collector is used, all live data will be moved during garbage collection. As an extreme case, a sleeping process that is about to die with lots of reachable data will affect the garbage collection times for the whole system. With private heaps, the live data of only the process that forces the garbage collection needs to be moved during GC.

- Separate and probably more expensive tests for heap and stack overflows.

The following difference between the two memory architectures also deserves to be mentioned: In a private heap system, it is easy to impose limits on the space resources that a particular (type of) process can use. Doing this in a shared heap system is significantly more complicated and probably quite costly. Currently, this ability is not required by Erlang.
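The difference in message passing cost between the private heap and the shared heap architectures can be illustrated with a small model. The following is only a sketch in Python, not code from the runtime system: terms are modelled as nested tuples, a heap is a plain list of cells, and the names `copy_term`, `send_private`, `send_shared` and `shared_chain` are hypothetical. It assumes a copy routine that does not preserve sharing, which is the situation in which the non-linear growth mentioned above can occur.

```python
# Sketch only: terms are Python tuples (heap cells) or ints (immediates).

def copy_term(term, heap):
    """Copy `term` cell by cell onto `heap`, ignoring any sharing."""
    if not isinstance(term, tuple):
        return term                      # immediates need no heap space
    copied = tuple(copy_term(sub, heap) for sub in term)
    heap.append(copied)                  # one new cell on the receiver's heap
    return copied

def send_private(message, receiver_heap, mailbox):
    """Private heaps: the message is copied, so the cost grows with its size."""
    mailbox.append(copy_term(message, receiver_heap))

def send_shared(message, mailbox):
    """Shared heap: only a reference is placed in the mailbox."""
    mailbox.append(message)

def shared_chain(depth):
    """Build `depth` cells where each cell refers to the previous one twice."""
    term = 0
    for _ in range(depth):
        term = (term, term)
    return term

msg = shared_chain(10)                   # 10 cells on the sender's heap

heap_b, mailbox_b = [], []
send_private(msg, heap_b, mailbox_b)
cells_copied = len(heap_b)               # 2**10 - 1 cells after the copy

mailbox_c = []
send_shared(msg, mailbox_c)
same_object = mailbox_c[0] is msg        # no copy was made
```

In this model the private heap send turns 10 shared cells into 1023 cells on the receiver's heap, while the shared heap send performs the same amount of work regardless of the message size.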
Optimizations

The problems due to the large root set can be to a large extent remedied by some simple optimizations. For the frequent minor collections, the root set need only consist of those processes that have touched the shared heap since the last garbage collection. Since each process has its own stack, a safe approximation, which is cheap to maintain and is the one we currently use in our implementation, is to consider as root set the set of processes that have been active (have executed some code or received a message in their mailbox) since the last garbage collection (see footnote 2).

A natural refinement is to further reduce the size of the root set by using generational stack collection techniques [25] so that, for processes which have been active since the last GC, their entire stack is not rescanned multiple times. Notice however that this is an optimization which is applicable to all memory architectures.

Finally, the problem of having to move the live data of sleeping processes could be remedied by employing a non-moving garbage collector for the oldest generation.

5.3 PROPOSING A HYBRID ARCHITECTURE

The chief advantages of the systems described are that the private heap system allows for cheap reclamation of memory upon process termination and for garbage collection to occur independently of other processes, while the shared heap system optimizes interprocess communication and does not require unnecessary traversals of messages. Ideally, we want an architecture that combines the advantages of both systems without inheriting any of their disadvantages. Hence, we propose a hybrid system in which there is one shared memory area where messages (i.e., data which is exchanged between processes) are placed, but each process has its private heap for the rest of its data (which is local to the process).

To make it possible to collect the private heap of a process without touching data in the global area, and thus without having to block other processes during GC, there should not be any pointers from the shared message area to a process heap. Pointers from private heaps (or stacks) to the shared area are allowed.
Figure 5.5 shows this memory architecture: The three processes P1, P2, and P3 each have their own PCB, stack, and private heap. There is also a shared area for messages. The picture shows pointers of all allowed types. Notice that there are no pointers out of the shared area, and no pointers between private heaps.

[Figure 5.5: A hybrid memory architecture.]

Footnote 2: In our setting, this optimization turns out to be quite effective independently of application characteristics. This is because in an Erlang/OTP system there is always a number of system processes (spawned at system start-up and used for monitoring, code upgrading, or exception handling) that typically stay active throughout program execution.

Allocation strategy

This hybrid architecture requires information about whether data is local to a process or will be sent as a message (and thus is shared). It is desirable that such information is available at compile time and can be obtained either by programmer annotations, or automatically through the use of an escape analysis. Such analyses have been previously developed for allowing stack allocation of data structures in functional languages [70] and more recently for synchronization removal from Java programs [16, 26, 79]. In practice any analysis would to some extent be imprecise, hence a hybrid system, which depends on such an analysis, has to be designed with the ability to handle imprecise escape information. It is likely that separate compilation, dynamically linked libraries, and other language constructs (e.g., in Erlang the ability to dynamically update the code of a particular module) will lead to lower precision, making it important to handle imprecise escape information efficiently.

More specifically, the information returned by such an escape analysis is that at a particular program point either an allocation is of type local to a process, or escapes from the process (i.e., is part of a message), or is of unknown type (i.e., might be sent as a message). The system should then decide where data of unknown type is to be placed.

If allocation of unknown data is done on the local heap, then each send
operation has to test whether its message argument resides on the local heap or in the message area. If the data is already global, a pointer can be passed to the receiver. Otherwise the data has to be copied from the local heap to the message area. This design minimizes the amount of data on the shared message area. Still, some messages will need to be copied with all the disadvantages of copying data. If, on the other hand, allocation of unknown data happens on the shared memory area, then no test is needed and no data ever needs to be copied. The downside is that some data that is really local to a process might end up on the shared area where they can only be reclaimed by garbage collection.

Process communication

Provided that the message resides in the shared message area, message passing in this architecture happens exactly as in the shared heap system and is a constant time operation. For uniformity, Figure 5.6 depicts the operation. As mentioned, if a piece of data which is actually used as a message is somehow not recognized as such by the escape analysis, it first has to be copied from the private heap of the sender to the shared message area.

[Figure 5.6: Message passing in a hybrid architecture (before send, after send).]

Garbage collection

Since there exist no external pointers into a process' private area, neither from another process nor from the shared message area, local minor and major collections (i.e., those caused by overflow of a private heap) can happen independently from other processes (no synchronization is needed) and need not block the system. This is contrary to Steensgaard's
scheme [85] for Java with thread-specific heaps for thread-specific data and a shared heap for shared data. In this scheme the GC always collects the shared area and thus locking is required.

In our scheme, garbage collection of the shared message area requires synchronization. To avoid the problems of repeated traversals of long-lived messages and of having to update pointers in the private heaps of processes, the shared message area (or just its old generation) can be collected with a non-moving mark-and-sweep collector. This type of collector has the added advantage that it is typically easier to make incremental (and hence also concurrent) than a copying collector. Another alternative could be to collect messages using reference counting. As an aside, we note that since there are no destructive updates in Erlang there can be no cyclic data structures, which normally is a problem for reference counting GC.

Pros and cons

As mentioned, with this hybrid architecture we get most of the advantages of both other systems:

+ Fast message passing.

+ Less space needs. The memory for data passed as messages between processes is shared.

+ No cost memory reclamation. When a process dies, its stack and heap can be freed directly without the need for garbage collection.

+ Small root sets for the frequent local collections. Since each process has its own heap, the root set for a local garbage collection is only the stack of the process which is forcing the collection.

+ Cheap stack/heap overflow tests.

Still, this hybrid system has some disadvantages:

- Memory fragmentation.

- Large root set for the shared message area. A garbage collection of the shared area needs to examine all processes' stacks and local heaps, rendering the collection costly. In the worst case, the cost of GC will be as big as in the shared heap system. However, since in many applications messages typically occupy only a small fraction of the data structures created during a program's evaluation and since this shared area can be quite large, it is expected that these global GCs will be infrequent. Moreover, the root set can be further reduced with the optimizations described in Section 5.2.
- Requires escape analysis. The system's performance is to a large extent dependent on the precision of the analysis which is employed.

Performance of a prototype

To test the effectiveness of this hybrid system we have implemented a prototype of such a system with both local heaps and one global heap. We have not yet implemented the escape analysis needed to find out where to allocate data. But by rewriting some programs by hand we have been able to test the system on small benchmarks. This gives us an indication of potential gains with such a system. (In this prototype we still use the same copying collector for all private heaps.)

Without the escape analysis we have not been able to do measurements on any large programs. Instead we have tried it on three artificial benchmarks that send messages in a ring structure: 1) keepalive, where each process keeps all its incoming messages alive, 2) garbage, where each process throws away the incoming messages and instead creates a new message that it sends to the next process, and 3) sendsame, where a single message is created which is distributed to all the processes in the ring and then passed around. By rewriting these by hand in the way we hope will be possible for the compiler to do given information from an escape analysis we could do some initial measurements.

These measurements confirmed our characterization of pros and cons of the architectures. The shared heap system did not behave well when many processes kept a large amount of data alive, and the private heap system did not behave well when the same message had to be copied several times. The hybrid system on the other hand never behaved really badly, but without also changing the old generation GC and without knowing what precision the escape analysis will give, we feel that it is premature to single out the hybrid system as a clear winner. For further information on the evaluation described here see [53]. It remains future work to do a thorough evaluation of this system.

5.4 RELATED WORK

Traditionally, operating systems allocate memory on a per-process basis. The architecture of KaffeOS [11] uses process-specific heaps for Java processes and shared heaps for data shared among processes.
Objects in the shared heaps are not allowed to reference objects in process-specific heaps. This restriction is enforced with page protection mechanisms.

In the context of a multi-threaded Java implementation, the same architecture is also proposed by Steensgaard [85] who argues for thread-specific heaps for thread-specific data and a shared heap for shared data. The paper reports statistics showing that, in a small set of multi-threaded Java programs, there are very few conflicts between threads, but provides no experimental comparison of this memory architecture with other architectures.

Domani et al. [32] also suggest an architecture with both thread-specific heaps and a shared heap. In the basic version of their architecture, data is first allocated locally but if an object becomes global (e.g., a reference to the object is inserted into a global object) it is marked as global. To improve on this approach they propose a method where profiling is used to find allocation sites where global data is allocated; these sites are then rewritten to allocate directly in a global area. They compare this approach on one benchmark to a base system with a global shared heap. Their evaluation showed that garbage collection times were on average cut in half. Although the overall performance was not improved, the number of long garbage collection stop times decreased with this approach.

In most concurrent functional programming languages process communication occurs through shared memory and not through message passing. Also, most concurrent functional programming languages allow some way of updating data structures (explicitly with references, e.g., in CML [78], or implicitly through suspended evaluation in lazy languages such as Concurrent Haskell [55]). In these languages it would be unnatural to have private heaps and copy data between processes, since the system would have to propagate updates in one copy to all other copies. Still, in parallel and distributed variants of these languages similar, albeit more complicated, memory architectures do come up [61]. Our work differs from such approaches in that we are evaluating the impact of the heap architecture on the performance of a concurrent implementation on a single processor machine.

Approaches similar to our proposed hybrid heap system have been used in some implementations of concurrent functional languages. For example Doligez and Leroy describe Concurrent Caml Light [31] (a version of ML) where each thread has a thread-specific young generation but they all share the same heap for the old generation.
To allow separate garbage collection of the private heaps there may be no pointers from the shared heap to a private heap nor from one private heap to another private heap. This approach requires some extra machinery since ML has mutable objects. All mutable objects are allocated on the shared heap, which is more expensive than allocating on the private heap, but Doligez and Leroy argue that mutable objects are not common in ML.
No analysis is needed in order to find allocations of mutable objects since all such objects are explicitly declared as mutable. Also, if an object is stored in a mutable object it has to be copied to the shared heap in order to maintain the invariant that there are no pointers from the shared heap to a private heap. This is done by forwarding the object and all its children from the private heap to the shared heap. One extra word is used for each object on the private heap in order to facilitate this extra forwarding. The aim of this design is to provide low garbage collection latency in a system where threads execute in parallel, and their measurements indicate that their implementation achieves this, but they do not provide any comparison with any alternative design.

Concurrently with our work, Feeley [34] argued the case for a unified memory architecture for Erlang, an architecture where all processes get to share the same stack and heap. This is the architecture used in the Etos system that implements concurrency through a call/cc (call-with-current-continuation) mechanism [1]. The case for the architecture used in Etos is argued convincingly by Feeley. Unfortunately, it is very difficult to draw conclusions from the small experimental comparison between Etos and the Ericsson Erlang/OTP implementation for several reasons. First of all, these two systems are completely different and implement concurrency in very different ways. Even if the same Erlang program is executed in both systems, the behavior of the mutator is not the same: neither its execution time nor its allocation behavior. Secondly, each system uses its own garbage collection implementation with completely different policies for when and how to resize the heap. Finally, since Etos is not a complete Erlang implementation, e.g., the module system is not implemented, the evaluation only used very small artificial programs.

We believe that to be able to attribute differences in the behavior of a system to a specific aspect of the implementation it is important to have a system where one can perform an experimental evaluation where just the aspect in question is changed.
One of the aims in this thesis is to compare memory architectures for concurrent languages in a setting where the rest of the system is unchanged.

5.5 DISCUSSION

The two extremes in the private-shared architecture spectrum, the private heap system and the shared heap system, are both implemented in the HiPE system. By reconfiguring and recompiling the runtime system the user can choose between these two implementations. No recompilation of the application code is required; the changes are confined
to the kernel of the runtime system and the interface to emulated and native code is unchanged. This makes it possible to experimentally evaluate the impact of the heap architecture by running exactly the same Erlang program in each of the two runtime systems. In Chapter 10 we present such an evaluation. Without going into details, the evaluation shows that the pros and cons discussed in this chapter are evident, and that to some extent the requirements of the application should dictate the user's choice of architecture.

Still, an architecture where the data is shared between processes opens up for other types of optimizations. In the next chapter we will look at some techniques for process optimization, and although these techniques do not require a shared memory architecture, such an architecture makes their implementation much easier.
Chapter 6

Process optimization

The continuation that obeys only obvious stack semantics, O grasshopper, is not the true continuation.
Guy Steele Jr.

In this chapter we will present techniques for reducing the overhead of concurrency. We will start by looking at two simple techniques for lowering the latency of message passing by tuning the scheduler. Both these techniques are well known from the Operating System community; our contribution here is to present how they can be applied to Erlang. These techniques will lead up to a more ambitious new technique for inter-process inlining guided by profile information about the inter-process communication patterns of an application. Our goal here is to present the technique together with some preliminary experiences from a prototype implementation.

6.1 RESCHEDULING SEND

Interprocess communication in Erlang is asynchronous, and the send operation is non-blocking. However, these are actually conceptual aspects on the language level, and there are several ways to implement them in the underlying runtime system. The current Erlang system is implemented in the natural way, that is, the send operation just places the message in the mailbox of the receiver and then the sending process continues executing until it either blocks in a receive statement or has exhausted its time-slice.

In most cases, when a process sends a message it is because it (see footnote 1) wants the receiver to act upon the sent information. Hence, it would probably be in the best interest of the sender to yield to the receiver in this case, and let the receiver act on the message. We will refer to this type of send as a rescheduling send operation.

Footnote 1: Strictly, it is of course not the intention of the process, but the intention of the programmer/application that we refer to here.
A way to implement this is by having the send operation, at least in some cases, also suspend the sending process. This would lead to lower message passing latency since the receiver can start executing directly when a message is sent. Also, the cache behavior would be better when the receiver gets the message while it still is hot in the cache. In a private heap system the message is hot in the cache right after a send, since the message has to be copied. Hence, it is important to directly switch to the receiving process before the sender starts producing new data. In a shared heap system, the message does not need to be copied but the sooner the receiver gets a message after its creation the greater the chance that it still is in the cache.

The real benefits of this design will probably depend both on the underlying hardware and on the communication characteristics of the Erlang program. The benefits of this optimization will likely not be very significant in isolation, but the ability to suspend a process directly after a send opens up possibilities for further optimizations.

6.2 DIRECT DISPATCH

The idea to let the send operation suspend the process can be taken one step further by completely bypassing the scheduler. Since it is often the case that the sender is suspended waiting for the receiver to react on the sent message, a natural action for the sender to take is to contribute its remaining time-slice to the receiving process hoping that this will lead to a faster response.

We therefore propose a direct dispatch send operation: After send has placed the message in the mailbox of the receiver, any reductions left could be passed to the receiving process, which could be woken up directly (bypassing the order of the ready-queue).

With this approach, some of the overhead of the scheduler could be eliminated and the latency of message passing would be reduced even further. Since this approach would also guarantee that it really is the receiver of the message that will execute next, the effects of having the message in the cache will hopefully also become more evident.
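The direct dispatch send proposed above can be sketched with a toy scheduler model. This is a minimal Python illustration under stated assumptions, not the runtime system's code: the `Process` class, the `ready_queue` deque and both send functions are hypothetical names, reductions are a plain integer, and the sketch assumes that the sender stays runnable and is placed last in the ready queue.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass(eq=False)                 # eq=False: processes are compared by identity
class Process:
    name: str
    mailbox: list = field(default_factory=list)
    reductions: int = 0              # what is left of the current time-slice

def plain_send(sender, receiver, msg):
    """Natural implementation: deliver, and the sender keeps running."""
    receiver.mailbox.append(msg)
    return sender                    # process that executes next

def direct_dispatch_send(sender, receiver, msg, ready_queue):
    """Deliver, donate the remaining reductions, and run the receiver next."""
    receiver.mailbox.append(msg)
    receiver.reductions = sender.reductions   # pass on what is left
    sender.reductions = 0
    if receiver in ready_queue:               # bypass the ready-queue order
        ready_queue.remove(receiver)
    ready_queue.append(sender)                # assumption: sender stays runnable
    return receiver

a = Process("a", reductions=300)
b = Process("b")
c = Process("c")
ready = deque([c, b])                # c would normally run before b
running = direct_dispatch_send(a, b, "request", ready)
queue_order = [p.name for p in ready]
```

In this model `b` runs immediately with the 300 reductions left over from `a`, although `c` was ahead of it in the ready queue.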
As with any process, the receiver is allowed to execute until it blocks in a receive, or the reduction count reaches zero, or it performs a direct dispatch send of its own. If the receiver was taken from the ready queue and then is suspended because of any of the two latter reasons (i.e., the receiver is still runnable), it is important to reinsert it into the ready queue in the same position as it was taken from, lest it might starve.

If the receiver performs a direct dispatch send back to the original sender then that sender can get back the remaining reductions and can keep on executing as usual. This way the common case, where one
process sends a request to another and then receives a reply to the request, can be almost as efficient as a function call.

6.3 INTERPROCESS INLINING

Process optimization can be taken beyond just tweaking the behavior of the send operation in the runtime system, to actually optimize the code executed before a send and after the accompanying receive. The goals of this optimization are to reduce the overhead of message creation (for example, by avoiding enclosing parts of a message in a tuple), reduce context switching overhead, and open up possibilities for further optimizations by considering the code of the receiver in combination with the code of the sender.

The optimization is performed on a pair of functions, the function containing the send and the function containing the receive. We will refer to these functions as f and g respectively, and the pair as a candidate pair. The code at the point of the receive statement in g is inserted into the code of f at the point of the send. The resulting code is then optimized using standard optimization techniques. To perform this optimization we have to respect the following requirements:

1. Find a program point where a send is performed.
2. Find out at which receive statement this message is received.
3. Ensure that, at the time of the send, the receiving process is suspended at the receive statement found in step 2.
4. Ensure that the mailbox of the receiving process is empty.

Since this process communication behavior can be hard to analyze statically in any concurrent language, and in a dynamically typed language such as Erlang in particular, we propose the use of profiling and dynamic optimization to implement this interprocess code merging. To do this we take advantage of two features of Erlang: hot code loading and concurrency. The presence of concurrency makes it possible to implement supervision and recompilation processes in a way which is separate from the application. Hot code loading ensures that there are methods for linking and loading re-optimized code into a running system in an orderly way. We also use a special HiPE extension that makes it possible to replace code on a per function basis.

We first instrument the system in order to profile the aspects that can trigger recompilation. During normal execution a supervision
process monitors the profile. When the profile indicates that a part of the program should be recompiled, the supervisor starts a separate process for the compilation.

The profile information is used to choose candidates for inter-process optimization. These candidates consist of pairs of program points; one program point refers to a send statement, and the other refers to the corresponding receive statement. These pairs are found by profiling each send to collect information during execution. The collected information has two components: information about the destination (Dest), and the number of times the instruction is executed (Count). The Dest field is initialized to none, and the Count field to 0 (zero). When the send is executed, the Count field is increased and the receiving process is checked. If the mailbox of the receiver is empty then the program counter (PC) of the receiver is checked; if the PC is equal to Dest or if Dest is equal to none then Dest is set to PC. Otherwise Dest is set to unknown. With this simple profiling, a send with only one receiver destination will result in a send/receive pair considered as a candidate for the optimization.

Initial experiments with this profiling method on the AXD 301 ATM switch mentioned earlier and on Eddie (an HTTP server described in Section 7.2) found that all their sends were to only one specific receive [48].

A potential problem with this profiling is that it classifies a send with two different destinations as unknown. This could be solved with staged profiling. When the profile classifies an important send as unknown the system could turn on a more ambitious profiler which would record several different receiver destinations. The optimizer could then create one specialized send for each receiver.

When an often executed candidate pair is found, the functions containing the send and the receive are compiled to intermediate code. The intermediate code fragments of the two functions are then merged. In short, the merging is done so that the program point of the send is connected with the program point of the receive. The resulting code is optimized and compiled to native code.
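The per-send profiling described above (the Dest and Count fields) is small enough to sketch directly. The following Python model is an illustration only: the class name `SendProfile`, the `threshold` parameter used to decide what counts as "often executed", and the representation of program counters as integers are assumptions. It also reads the "otherwise" case as covering both a non-empty mailbox and a mismatching PC.

```python
NONE = "none"
UNKNOWN = "unknown"

class SendProfile:
    """Profile for one send site, with the two fields named in the text."""

    def __init__(self):
        self.dest = NONE     # receive PC seen so far, or NONE / UNKNOWN
        self.count = 0       # number of times the send has executed

    def record(self, receiver_pc, mailbox_empty):
        """Update the profile when the send is executed."""
        self.count += 1
        if mailbox_empty and self.dest in (NONE, receiver_pc):
            self.dest = receiver_pc
        else:
            self.dest = UNKNOWN      # once unknown, it stays unknown

    def is_candidate(self, threshold):
        """Hypothetical selection rule: hot send with one known destination."""
        return self.count >= threshold and self.dest not in (NONE, UNKNOWN)

stable = SendProfile()
for _ in range(3):
    stable.record(receiver_pc=0x40, mailbox_empty=True)

mixed = SendProfile()
mixed.record(receiver_pc=0x40, mailbox_empty=True)
mixed.record(receiver_pc=0x80, mailbox_empty=True)   # second destination
mixed.record(receiver_pc=0x40, mailbox_empty=True)
```

Here `stable` ends up with a single destination and qualifies as a candidate, while `mixed` is classified as unknown after its second destination, which is the limitation that staged profiling is meant to address.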
To ensure correct behavior, execution of the optimized version is guarded by a run-time test. This test checks that requirements 3) and 4) in the above list hold; otherwise the original unoptimized version is executed.

The transformation

We will refer to the sender (the process executing f) as α and the receiver (the process executing g) as β.
For a given send the function f can be divided into the following abstract blocks of code:

1. Head (code preceding the send)
2. Message creation
3. send
4. Tail (the rest of the code)

The function g is divided into:

1. Head (code preceding the receive)
2. receive
3. Tail (the rest of the code)

The intention of the transformation is to allow process α to execute code that would otherwise have been executed by process β. Thus, the resulting code for α, function f', will contain fragments of the code from g; see Figure 6.1. The merged function f' is a copy of the function f with these six additions:

1. Test. A test is inserted before the send in f. This test checks whether β is suspended at the right program point (at the receive in g) with an empty mailbox. If this test succeeds the execution continues with the optimized code (item 2), otherwise the execution continues with the original code of f.
2. (Message copying.) In a system with a private heap architecture the message is copied from the heap of process α to the heap of process β using an explicit copy instruction. (In a shared heap system, no copying is needed.)
3. Restore state. All live temporaries of process β are read from the stack of β. (This is done by consulting a mapping from intermediate code temporaries to stack positions.)
4. Code from g. The code from g that is suitable for external execution is then executed.
5. Save state. All live β temporaries are written back to the stack of β.
6. f tail. A copy of the tail of f is executed.
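The six additions listed above can be lined up in a small executable model. This Python sketch is not the compiler's output: the `Proc` class, the label string used to identify the receive, and the callables `g_tail` and `f_tail` are hypothetical stand-ins for intermediate code, and `copy.deepcopy` stands in for the explicit copy instruction. Setting the continuation of β after the extracted code is left out.

```python
import copy
from dataclasses import dataclass, field

@dataclass
class Proc:
    mailbox: list = field(default_factory=list)
    stack: dict = field(default_factory=dict)   # temporary name -> saved value
    waiting_at: str = ""                        # receive the process is suspended at

def merged_f(alpha, beta, message, g_receive, g_tail, f_tail, private_heaps=True):
    """Model of the merged function from the point of the send onwards."""
    # 1. Test: beta suspended at the expected receive, with an empty mailbox.
    if beta.waiting_at != g_receive or beta.mailbox:
        beta.mailbox.append(message)            # original, unoptimized send
        return f_tail(alpha)
    # 2. Message copying (only needed with private heaps).
    if private_heaps:
        message = copy.deepcopy(message)
    # 3. Restore state: read beta's live temporaries from its stack.
    temps = dict(beta.stack)
    # 4. Code from g, executed by alpha on behalf of beta.
    temps = g_tail(temps, message)
    # 5. Save state: write beta's temporaries back to its stack.
    beta.stack = temps
    # 6. Copy of the tail of f.
    return f_tail(alpha)

def count_tail(temps, msg):
    """Example g tail: add the length of the received list to a counter."""
    updated = dict(temps)
    updated["count"] = updated.get("count", 0) + len(msg)
    return updated

def done_tail(proc):
    """Example f tail."""
    return "f done"

alpha = Proc()
beta = Proc(stack={"count": 0}, waiting_at="g:receive")
fast = merged_f(alpha, beta, [1, 2, 3], "g:receive", count_tail, done_tail)

busy = Proc(mailbox=["old"], waiting_at="g:receive")
slow = merged_f(alpha, busy, [4], "g:receive", count_tail, done_tail)
```

In the first call the test succeeds and the state of `beta` is updated without the message ever entering its mailbox; in the second call the mailbox is not empty, so the model falls back to the ordinary send.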
[Figure 6.1: Before the merging, function f is executed by α and function g is executed by β. After the merging, f' is executed by α.]

Since we can rely on a subsequent optimization pass to clean things up, the merging is straightforward. The subsequent optimization pass can remove unused paths from g. By applying a simple variant of constant propagation which also propagates the structure of terms such as lists and tuples, even if they are not true constants, subsequent tests and operations on the terms can be folded. With the structure of the message available the pattern matching used for message selection can be short-circuited. Often in Erlang, some parts of the messages are just used for switching on the type of message. The proposed optimization would make creation, copying (in a private heap system), and switching on that part of the message unnecessary.

The code from g has to be rewritten so that it can be executed externally, that is, from within process α. This means that the primitives we want to inline have to be rewritten for external execution.
We can extract almost all instructions from g for merging with f, as long as the code fulfills four prerequisites:

1. Code explosion must not occur.
2. The code may not suspend.
3. The control flow may not escape the included code.
4. The extracted code must terminate.

To make sure that these prerequisites are fulfilled some instructions are not extracted:

1. A call to another function, a meta call (apply), or a return cannot be extracted since the control would be passed to code that is not adapted for external execution.
2. Instructions that lead to the suspension of the process, such as the explicit suspension instruction or a receive.
3. Some built-in functions are large and uncommon and not worth the effort to adapt for external execution.
4. Non-terminating code is unacceptable. If some bug in the code of process β makes it loop forever, we do not want this bug to propagate to the process α. One way of ensuring that the extracted code terminates is to not accept any loops in the control flow graph of the extracted code. Note that this is not such a harsh restriction as it may sound, since the only way to get a loop in the intermediate code is by making a tail-recursive call where the caller and the callee are the same. If there is a loop it will probably contain the receive that caused the extraction in the first place. In this case the control-flow graph will be cut at this point and the loop will be broken.

The instructions in the g tail that do not belong to any of the categories listed above are extracted. A control flow path that contains an instruction that is not extractable is cut just before that instruction.

To propagate changes in the state of β we have to save the new state at the end of the extracted code. To this end, we write all live temporaries back to the stack at the end of each path of the extracted code. At the end of each of these paths, the continuation pointer of β is set to point to a stub containing the instructions from that path that could not be extracted from g.
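The rule that a path is cut just before the first instruction that cannot be extracted can be sketched as follows. This is a simplified Python illustration: instructions are modelled as tuples whose first element is an opcode name, the opcode names in `NOT_EXTRACTABLE` are hypothetical, and only one straight-line path is handled (large built-in functions and loop detection are not modelled).

```python
# Hypothetical opcode names for the categories that are never extracted.
NOT_EXTRACTABLE = {"call", "apply", "return", "suspend", "receive"}

def split_path(path):
    """Split one path of the g tail into (extracted part, stub for beta)."""
    for index, (opcode, *_operands) in enumerate(path):
        if opcode in NOT_EXTRACTABLE:
            # Cut just before the first instruction that cannot be extracted.
            return path[:index], path[index:]
    return path, []          # the whole path can run inside alpha

g_tail_path = [
    ("move", "x", "y"),
    ("add", "y", 1),
    ("call", "handle", "y"),     # control would leave the adapted code here
    ("move", "a", "b"),
]
extracted, stub = split_path(g_tail_path)
```

With this input the two leading instructions are extracted for execution by α, and the stub that β continues from starts at the call.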
To simplify optimization we duplicate the tail of f. From the end of each path of the extracted control flow graph we insert a jump to this copy. This ensures that when the code in the copy is reached, the execution is guaranteed to have passed through the code extracted from g.

Further considerations

In a runtime system architecture where each process allocates its private heap, the garbage collector typically relies on the fact that all data structures accessed by a process are allocated on the heap of that process. This invariant is temporarily broken while the process α accesses the state of process β, but since we have control over when α is suspended and when garbage collection is triggered, we can ensure that the invariant is maintained at these points. In a shared heap architecture, this is not a problem.

Our inter-process optimizer changes the scheduling behavior. One might suspect that this could lead to a change in the concurrency semantics of the program. However, note that since in the optimized code we do not allow the code from g to loop, and we count each reduction that would have been counted before the optimization, the observable behavior will remain unchanged.

The inter-process optimizer will merge code from two functions (f and g). If the module of g is updated with hot-code loading, old code from g will remain inside f (actually f'). However, it will never be executed, since the run-time test in f' only succeeds when the receiver is suspended in the old code. If the module containing f is replaced then all optimized code is removed and there is no problem at all.

Return messages

The situation where the receiver of a message sends a message back to the original sender is so common that it is worthwhile to handle this situation specially. The technique we have devised requires the following criteria to be fulfilled:

1. There is a send in g tail.
2. The destination of the send in g is the process α.
3. All paths through f tail contain a receive.
4. The mailbox of α is empty.

Hence, the code starts with a runtime check that ensures that the mailbox of process α is empty. By doing this check in the beginning, we get a very simplified control flow graph for f.
In a private heap system we add instructions that copy the message from the heap of process β to the heap of process α, if the destination of the send is α. In a shared heap system no copying is needed; the pointer to the message can be put directly in the temporary containing the received message. Now, the nice thing is that through a simple analysis we can often remove the test that process α is the destination. And, depending on how the message is used, we might also get rid of the copying between the processes completely even in a private heap architecture.

Experiences from a prototype

We developed a small prototype process merger in 1999 when the HiPE system still was based on the JAM. With this prototype we performed two types of measurements: measurements of the communication behavior of Erlang programs and measurements of the performance gains of inter-process optimization on Erlang programs.

We tested the simple profiler described above on some real world programs and on applications in the OTP libraries to find the communication behavior of Erlang programs. These measurements were very encouraging: almost all sends we encountered were always to the same destination. The only sends that had multiple destinations were sends within a generic server application in OTP. (This was expected since the same code for sending was used by several different servers, and could be handled by specializing the code for each server.)

The second type of measurements turned out to be a bit more complicated though. For example a lot of machinery was needed to communicate to the compiler exactly which send-receive pair to optimize. To make the prototype simple we only identified the functions that contained the send and the receive. This meant that we could not handle any function with multiple sends or receives in them, and hence not all types of programs.

Also, message passing in Erlang is quite fast already; our measurements on a 140 MHz UltraSPARC I running Solaris (Sun OS 5.6) indicated that it took less than 7 microseconds to send a message in HiPE code. To execute a loop and send 1,000,000 messages from one process to another and back took on average 21.6 seconds for emulated JAM code, and 14.3 seconds for native code generated by HiPE.
So there is not too much room for improvement; still, on a synthetic benchmark where we use a process with a state to count the length of a 10,000 element long list, we got a 1.8 times speedup with the prototype. This indicates that inter-process optimization could be an interesting way to go, at least if we want to enable the use of processes in situations where an Erlang programmer today would hesitate to use them.

There were a few other things that made the implementation of the prototype somewhat tricky. The optimization has to guarantee that process β is in a consistent state before and after the optimization. In order to achieve this in the prototype we had to enforce the same stack frame layout in all paths in function g, disabling other optimizations and stack trimming. In the current version of HiPE, which uses stack descriptors, this can be solved much more easily and elegantly.

It also proved tricky to enforce consistency in the heap data: after a completed optimized send there could be no pointers between the heaps of α and β. It was possible to enforce this, but it meant that all optimizations and transformations on the code had to be aware of this invariant. Since we chose to introduce the merging on the Icode level in the prototype, all parts of the compiler down to the back-end had to be made aware of this invariant. With a shared heap system no such invariant would have to be maintained, and no copying of data would have to be performed. This was one of the reasons we wanted to investigate the shared heap architecture.

Unfortunately this prototype never evolved with the rest of HiPE, and it is now completely outdated and will have to be re-implemented from scratch.

6.4 POTENTIAL GAINS

With interprocess inlining we can reduce the overhead of process communication in four different ways:

1. Short-circuit switches on messages
We can use information from the sending process about the form of the message to short-circuit the pattern matching in the receiver. Since the switching usually is made up of several tests on heap allocated data, short-circuiting results in a control flow path with fewer load, compare, and branch instructions. We also expect that this will make the hardware prefetching mechanisms work better. If the receiver can receive several different messages that have the same frequency, then the switch will go different ways each time, rendering the prediction useless, which results in pipeline stalls.

2. Reduce message passing
It is common in Erlang programs that a process creates a message, sends it to another process, which subsequently performs some matching on the structure of the message, accesses some components of the message and never looks at the whole message again. By short-circuiting switching on the message we can avoid the creation of the message (and also reduce the time spent in the garbage collector).

3. Reduce context switching
We can, in the cases where the receiver immediately answers, remove the context switch completely. This not only means that the receiver does not need to be scheduled, but it also means that the executing process does not need to be suspended. Measurements indicate that in many concurrent Erlang programs processes do not use their whole time-slice but are instead suspended on a receive. If the sender can keep on running until the time-slice is used up then the expensive scheduler would be executed less. Letting the same process execute longer also results in better cache behavior.

4. Enabling of further optimizations
The most significant gain can come from the ability to do optimizations on the merged code, just as the real gain from procedure inlining comes from the optimizations done after the inlining. We get the possibility to do, for example, constant propagation, common subexpression elimination, and register allocation on merged code from the sender and the receiver.

6.5 RELATED WORK

With the increased interest in concurrent programming languages the importance of efficient implementations has become evident. Consequently many methods for optimizing concurrent programs have been proposed.

In Concurrent Logic Programming and Concurrent Object-Oriented Programming, where concurrency is fine-grained and implicit, programs have many small processes and intensive process communication. Efforts have been made to reduce the cost of concurrency in Concurrent Logic Programming languages by using a kind of dependency analysis (mode analysis) to discover situations where one process need not be started until another process has terminated [58, 63].

Plevyak, Zhang, and Chien [73] describe an optimization of communication between (concurrent) objects in a concurrent object-oriented language. Their technique, which allows method operations to be inlined, resembles ours in the use of a run-time test to determine whether the optimization can be applied. However, their optimization relies on the fact that a message send always provides the name of the message, and on the ability to determine through static analysis the type of an object, and thus the code the object will execute. In contrast, our technique handles dynamically created messages and processes that may execute code that is not present when the optimization is applied.

McNamee and Olsson [64] describe and evaluate a number of source-level transformations for optimizing process communication in imperative languages. They do not, however, describe how the optimizations could be integrated in a compiler.

Concurrent ML (CML) [78] has a concurrency semantics that resembles that of Erlang in that processes can be started dynamically. An analysis method for Concurrent ML has been developed by Christopher Colby [28]. His analysis can give an answer to the same question that our profiling tries to answer: Which occurrences of transmit can match which occurrences of receive? Since CML is statically typed and the communication is synchronous and takes place through shared channels, the problem is easier than for Erlang. For example, if two processes share two channels of different types, the types of the channels identify which transmit matches which receive. Since there are no typed channels in Erlang, it is very common to send a PID in a message to a process which then replies by sending a message to the supplied PID. Finding the destination of these kinds of messages through static analysis would be hard.

Agesen and Hölzle [2] compare profiling with static analysis (concrete type inference) in the context of optimizing dynamic dispatch of the object-oriented programming language Self. They find that the two techniques offer similar precision, but there are reasons for suspecting that static analysis would be problematic in real Erlang applications. These are usually huge, and to be completely safe static analysis would have to analyze the whole program. This holds especially given the way safety critical Erlang programs are structured, with supervisors that have access to the PIDs of all processes. Also, these systems have to cater for code updates in a running system.
This means that any optimization would have to be able to handle the invalidation of assumptions based on code that is updated.

To summarize, we would have to have a static analysis, which would have to be able to handle huge amounts of tricky code with dynamic process structures and messages with untyped data often containing PIDs, coupled with a conservative optimization able to handle code updates at runtime.
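The profiling alternative to such a static analysis can be pictured with a short sketch. This is a hypothetical Python illustration, not the profiler used in the prototype: the `SendProfile` class, its method names, and the site labels are assumptions. It records, for every send site, which receive sites consumed its messages, and then reports the sends that always matched a single receive, which are the candidates for merging.

```python
from collections import Counter, defaultdict


class SendProfile:
    """Records which receive site consumed the messages of each send site."""

    def __init__(self):
        # send site -> Counter of receive sites that matched it
        self._matches = defaultdict(Counter)

    def record(self, send_site, receive_site):
        """Note one message sent at send_site and consumed at receive_site."""
        self._matches[send_site][receive_site] += 1

    def stable_pairs(self):
        """Return {send_site: receive_site} for sends with a single partner."""
        return {
            send_site: next(iter(receivers))
            for send_site, receivers in self._matches.items()
            if len(receivers) == 1
        }


profile = SendProfile()
profile.record("client:call/2", "server:loop/1")
profile.record("client:call/2", "server:loop/1")
profile.record("gen:send/2", "db:loop/1")
profile.record("gen:send/2", "log:loop/1")
candidates = profile.stable_pairs()  # only client:call/2 has one partner
```

A send site shared by several servers, like the hypothetical `gen:send/2` above, shows up with more than one partner and would need specialization before it could be merged.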
Part III
Evaluation
Evaluation method

In this part of the thesis we will evaluate the techniques described previously in the thesis. We will start with a general performance analysis of the HiPE system to set the scene for the following main evaluation chapters on the performance of register allocators and heap architectures.

The evaluations presented in the following chapters have been conducted at different points in time with slightly different hardware and software systems. Each chapter begins with a presentation of which HiPE version and what hardware has been used. The benchmarks used are also presented in each chapter since they also have changed slightly between evaluations.

Whenever possible and relevant we have applied our evaluation not only on benchmark programs but also on real world applications such as Eddie (a web server), AXD/SCCT (the time-critical part of an ATM switch), and NetSim (an application for simulating operation and maintenance of a large network). The set of industrial Erlang applications has been somewhat limited since most such applications use special hardware and cannot be set up in an off-site evaluation. Also, the open source nature of Erlang has encouraged some users to tweak their systems slightly, preventing their applications from running on other systems than their own.
Chapter 7
Performance of HiPE

Qui nimium probat, nihil probat.

In this chapter we will take a look at the performance of HiPE and compare it to other systems. The goal is to put its performance in context for the following chapters where register allocators and heap architectures are evaluated. Note that these measurements have been carried out with different versions of the HiPE system and that they are not intended to be compared directly to each other. The intention is not to give a definite performance evaluation of the latest version of HiPE but to assert that the HiPE system has performance which is comparable with that of other functional programming language implementations.

7.1 ERLANG VS. OTHER FUNCTIONAL LANGUAGES

Functional programming languages differ significantly in design philosophy (lazy vs. strict, statically vs. dynamically typed), features they provide (e.g., being concurrent or not), as well as performance characteristics. For these reasons, comparisons between them cannot be very conclusive. The intention of this section is to just get a feeling about the performance of Erlang implementations by comparing an early version of HiPE (version 0.92) and the JAM system upon which that HiPE version was based against high-performance implementations of other functional languages.

Systems used in this comparison are: the Bigloo version 2.1c Scheme compiler [81] (compiling to native code via gcc -O3; the Bigloo optimization option -fstack was also used), SML/NJ release 110 with the CML extensions [78], and CLEAN [20]. Like Erlang, Scheme is a strict, dynamically typed language. CML is concurrent, statically typed, and strict. CLEAN is statically typed and lazy.

This experiment was conducted on a two-processor 248 MHz Sun Ultra-Enterprise 3000 with 1.2 GB of primary memory running Solaris 2.7 using the following four small benchmark programs:
qsort  Recursive implementation of quicksort. Sorts a short list 50,000 times.
fib  A recursive Fibonacci function. Calculates fib(30) 50 times.
huff  A version of a Huffman encoder. Encodes and decodes a file with 32,026 characters 5 times. The time taken to read the file is not included.
ring  This concurrent benchmark creates a ring of 10 processes and sends 100,000 messages. The benchmark is executed 5 times. In Tables 7.4 and 7.1, the number of iterations is shown in parentheses. As this benchmark tests the concurrency features of a language, it is run only on implementations that support concurrency.

Table 7.1: Performance of functional languages on three recursive programs and one concurrent. Execution times in seconds. (Columns: qsort, fib, huff, ring(5). Rows: JAM, HiPE, Bigloo, CML, CLEAN.)

Performance results (in seconds) are shown in Table 7.1. As seen, the JAM implementation of Erlang is quite slow compared to implementations of other functional languages; HiPE-0.92 brings the gap down significantly.

7.2 COMPARISON OF ERLANG IMPLEMENTATIONS

In this section we will compare the old HiPE-0.92 system with four other Erlang systems of that time: JAM, BEAM, JERICO, and Etos. Note that this version of HiPE was based on JAM.

The JAM and BEAM systems used in our measurements are from Ericsson's Open Source Erlang system upon which HiPE is based. Compared with JAM, the translation of Erlang code to BEAM abstract machine instructions is more advanced. For example, the treatment of pattern matching is considerably better in the BEAM system, even though a full pattern matching compiler is not implemented. Also, BEAM uses a direct-threaded emulator [13] using gcc's labels as first-class objects extension [84]: instructions in the abstract machine code are addresses of the part of the emulator that implements the instruction.

The JERICO system has been described earlier in the thesis.

Etos [36] is a system from the University of Montreal based on the Gambit-C Scheme compiler. It translates Erlang functions to Scheme functions which are then compiled to native code via C. The translation from Erlang to Scheme is fairly direct. Thus, taking advantage of the similarities of the two languages, many optimizations in Gambit-C are effective when compiling Erlang code. Among these optimizations are inlining of function calls (currently only within a single module) and unboxing of floating-point temporaries. Etos also performs some optimizations in its Erlang to Scheme translation; e.g., simplification of pattern-matching.

Process suspension in Etos is done using call/cc implemented using a lazy copying strategy; see [45]. When a process is suspended, the stack is frozen so that no frame currently on the stack can be deallocated. When control returns to a suspended process, its stack frames are copied to the top of the stack. When the stack overflows, the garbage collector moves all reachable frames from the stack to the heap. In general, suspending and resuming a process will require its stack to be copied at least once. In contrast, the JAM/BEAM/HiPE runtime systems handle processes explicitly; saving or restoring the state of a process involves storing or loading only a small number of registers.

The Etos compiler is work in progress, and it is not yet a full Erlang implementation. We have therefore been able to run only relatively small benchmarks on Etos. The version of Etos used is 2.3.

This performance comparison was conducted on a 143 MHz single-processor Sun UltraSPARC 1/140 with 128 MB of primary memory running Solaris 2.6. In addition to fib, qsort, and ring, the following small sequential and concurrent benchmarks were used:

huff erl  A slightly different version of the Huffman encoder compressing and uncompressing a short string 5000 times. The difference from huff lies mainly in how the input is provided (for the sake of Etos which does not currently handle file I/O), but the program is also a bit more Erlang-specific; e.g., it uses polymorphic lists.
nrev  Naive reverse of a 100 element list 20,000 times.
smith  The Smith-Waterman DNA sequence matching algorithm. The benchmark matches one sequence against 100 others; all of length 32. This is done 30 times.
decode  Part of a telecommunications protocol. Decodes an incoming binary message 500,000 times. This benchmark is about 400 lines.
life  A concurrent benchmark executing 1000 generations in Conway's game of life on a 10 by 10 board where each square is implemented as a process.

Besides benchmarks, we also report on the performance of OTP-based systems on two industrial applications of Erlang:

Eddie  An HTTP parser handling 30 complex HTTP-get requests. Excluding the OTP libraries used, it consists of 6 modules for a total of 1,882 lines of Erlang code. This is done 1,000 times.
AXD/SCCT  This is the time-critical software part of the AXD 301 ATM switch mentioned earlier. It sets up and tears down a number of connections 100 times. The benchmark consists of around 50,000 lines of Erlang code.

Table 7.2: Times (in seconds) for sequential benchmarks in different Erlang implementations. (Columns: fib, huff erl, nrev, qsort, smith, decode. Rows: JAM, BEAM, JERICO, HiPE, ETOS.)

Table 7.3: Speedup of different Erlang implementations compared to JAM. (Columns: fib, huff erl, nrev, qsort, smith, decode. Rows: BEAM, JERICO, HiPE, ETOS.)

Table 7.4: Times (in seconds) and speedup over JAM for concurrent benchmarks in different Erlang implementations. (Columns: time and speedup for ring(100) and for life. Rows: JAM, BEAM, JERICO, HiPE, ETOS.)

Table 7.5: Times (in seconds) and speedup over JAM for large benchmarks in different Erlang implementations. (Columns: time and speedup for Eddie and for AXD/SCCT. Rows: JAM, BEAM, HiPE.)

Tables 7.2 to 7.5 contain the results of the comparison. In all sequential benchmarks, HiPE and Etos are the fastest systems: in small programs they are between 7 and 20 times faster than JAM and 3 to 8 times faster than the BEAM implementation. The performance difference between HiPE and Etos on small programs is not significant. In decode, where it is probably more difficult for Etos to optimize operations and pattern matching on binary objects (i.e., on immutable sequences of binary data), HiPE is twice as fast as Etos. HiPE is faster than JAM and BEAM, but not to the same extent as for the other benchmarks.

When processes enter the picture, Etos does not seem to be significantly faster than JAM and it is slower than BEAM. We suspect that this is due to the implementation of concurrency in Etos via call/cc [45].

As we move from benchmarks to real-world applications of Erlang, programs tend to spend more and more of their execution time in built-ins from the standard library. For example, as mentioned, the benchmark program AXD/SCCT extensively uses the built-ins to access the shared database on top of the Erlang term storage. As the implementation of these built-ins is currently shared by JAM, BEAM, and HiPE, the percentage of execution spent in these built-ins becomes a bottleneck and HiPE's speedup is less than before. Still, HiPE version 0.92 is 24% faster than BEAM on SCCT, and considerably faster than the JAM implementation on which it is based.

The goal with this performance evaluation was to show that HiPE version 0.92 was comparable with other Erlang implementations even though it was based on JAM. The goals of each of these Erlang systems are quite different and their merits lie not only in the absolute performance of the system. The JAM system produces very small bytecode files; the BEAM system gives faster execution at the price of slightly larger code size. The main goal of both these systems has been to provide a full, portable, and robust Erlang implementation for use in real industrial applications. The Etos system implements processes as continuations stored on a shared heap, a design that is radically different from the design of all the other systems. This allows Etos to, e.g., reclaim unused stack space more promptly than the other systems.¹ The goal with the HiPE system has been to have a complete modular Erlang system where implementation techniques can be evaluated.

7.3 COMPARISON OF NATIVE VS. EMULATED CODE

In this section we compare HiPE-1.0 to BEAM-R8 on a set of standard Erlang benchmarks. The Erlang benchmark programs used are:

fib  A recursive Fibonacci function. Calculates fib(30) 30 times.
tak  Takeuchi function, uses recursion and integer arithmetic intensely. 1,000 repetitions of computing tak(18,12,6).
length  A tail-recursive list length function finding the length of a 2,000 element list 50,000 times.
qsort  Recursive implementation of quicksort. Sorts a list 100,000 times.
smith  The Smith-Waterman DNA sequence matching algorithm. The benchmark matches a sequence against 100 others; all of length 32. This is done 30 times.
huff  A Huffman encoder which encodes and decodes a 32,026 character string 5 times.
decode  Part of a telecommunications protocol. 500,000 repetitions of decoding an incoming binary message.
ring  A concurrent benchmark which creates a ring of 10 processes and sends 100,000 small messages.
life  A concurrent benchmark executing 10,000 generations in Conway's game of life on a 10 by 10 board where each square is implemented as a process.
prettypr  Formats a large source program for pretty-printing, repeated 4 times. Recurses very deeply.

¹ In the HiPE system native code does not free up stack space until a process terminates. The JAM and JERICO systems free unused stack space after GC if the percentage of used stack space drops under a certain threshold. The BEAM allocates stacks together with heaps and shrinks this area as needed after GC. It would also be easy to implement shrinking of the native stack in HiPE but we have chosen to prioritize execution speed.
estone  Computes an Erlang system's Estone number by running a number of common Erlang tasks and reporting a weighted ranking of its performance on these tasks. This benchmark stresses all parts of an Erlang implementation, including its runtime system and concurrency primitives.

Table 7.6: Speedup of HiPE-1.0 over BEAM R8. (Columns: x86, SPARC. Rows: fib, tak, length, qsort, smith, huff, decode, ring, life, prettypr, estone.)

Table 7.6 shows the speedup for HiPE over BEAM/OTP-R8 both on the x86 and on the SPARC platform. The x86 evaluation was conducted on a Dell Inspiron 7000, with a 333 MHz Pentium-II processor, 128 MB memory, running Linux, and the SPARC evaluation was conducted on (one processor of) a Sun Enterprise 3000 with two 248 MHz UltraSPARC-II processors, 1.2 GB memory, running Solaris 7. HiPE's modest speedup on ring and life is because these programs spend most of their time in the runtime system scheduling processes, and compute very little on their own.

7.4 DISCUSSION

The main focus of the development of HiPE up to HiPE-1.0 was to get a robust system with good sequential performance. As we have shown in this chapter, HiPE can run real industrial programs like AXD/SCCT, and the speedup over BEAM on sequential code is significant. On highly concurrent programs HiPE-1.0 does not provide the same advantage over BEAM as on sequential programs, hence it is justified to look at optimizations of concurrent programs. In the next three chapters we will evaluate some building blocks needed to implement dynamic compilation and to achieve an efficient implementation of concurrency.
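The observation that time spent in shared built-ins and in the runtime system limits the benefit of native code can be approximated with Amdahl's law. The sketch below is an illustration only: the function name, its parameters, and the sample fractions are assumptions, not measurements from this chapter.

```python
def overall_speedup(compiled_fraction, local_speedup):
    """Amdahl's law: speedup of a whole program when only a fraction of its
    original execution time is accelerated by the factor local_speedup.

    compiled_fraction is the share of the original run time spent in code
    that native compilation speeds up; the remainder (built-ins, scheduler,
    garbage collector) is assumed to run at unchanged speed.
    """
    if not 0.0 <= compiled_fraction <= 1.0:
        raise ValueError("compiled_fraction must be between 0 and 1")
    if local_speedup <= 0:
        raise ValueError("local_speedup must be positive")
    unchanged = 1.0 - compiled_fraction
    return 1.0 / (unchanged + compiled_fraction / local_speedup)


# Hypothetical programs, both with a tenfold speedup of their Erlang code:
mostly_erlang = overall_speedup(0.9, 10.0)   # little time in the runtime
mostly_runtime = overall_speedup(0.2, 10.0)  # most time in the runtime
```

With these made-up numbers, the program that spends only a fifth of its time in compiled code gains about 1.22 times overall, however fast the compiled part becomes.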
Chapter 8
Performance of register allocators

Fiat experimentum in corpore vili.

In this chapter we examine the performance of different register allocators both on a register rich architecture (the SPARC) and on a register poor architecture (the x86). We examine compilation times, execution times, and the actual number of spills.

The register allocator implementations described previously (Chapter 4) have been evaluated by compiling and running a set of benchmarks. All four register allocators are integrated in the HiPE system (version 1.1) and a compiler option indicates which one to use. For each allocator the code is compiled the same way before and after applying each allocator: The code is compiled to our internal representation of SPARC or pseudo-x86 code as a control flow graph. Some simple optimizations are applied to the intermediate stages, such as constant propagation, constant folding, dead code elimination, and removal of unreachable code; see e.g. [66].

False dependencies are created when the register allocator maps temporaries to physical registers, since different temporaries are mapped to the same physical register. This means that register allocation may interfere with optimization passes that come after it; instruction scheduling in particular. In our experimental evaluation we have thus turned off the (anyway quite limited form of) instruction scheduling that the HiPE compiler performs on the SPARC. Still, register dependencies due to allocation might affect the scheduling performed dynamically in hardware.

A similar problem can be noted in the front-end of the HiPE compiler, since the compilation starts from code for a register-based virtual machine. This code is already register allocated, but for registers of the virtual machine, and hence contains dependencies between virtual registers. These artificial dependencies follow the code during the translation from the code to HiPE's intermediate code representation (that has an unlimited number of temporaries) and can have a negative impact on the performance of some optimizations in the compiler. To remedy this, we perform a conversion to static single assignment (SSA) form early in the compiler. Since this conversion introduces many new temporaries, and hence has a big impact on the performance of the register allocators, we present most of the measurements of this chapter in two views: one without SSA conversion and one with SSA conversion. In doing so, we also evaluate in detail the impact of a systematic renaming pass prior to register allocation in general and to the linear scan algorithm in particular.

The two platforms we used are: a Pentium-III 850 MHz, 256 MB memory Dell Latitude laptop running Linux, and a 2-processor Sun Enterprise 3000, 1.2 GB main memory running Solaris 7. Each processor is a 248 MHz UltraSPARC-II. (However, the HiPE system uses only one processor.)

8.1 BENCHMARKS

The set of benchmarks we used together with a brief description of them appears in Table 8.1. Some of them (decode, eddie) have been chosen from the benchmarks of the previous chapter because they incur spilling when compiled with linear scan. We note in passing that, on the SPARC, most other benchmarks from the previous chapter incur no spilling. We have also included a module of the HiPE compiler (beam2icode) containing a very large function which is quite troublesome for some register allocators.

Sizes of benchmark programs (lines of source code, the number of temporaries and the number of instructions before register allocation) for both SPARC and x86 are shown in Table 8.2. Benchmark programs that are specially marked in the table use functions from standard Erlang libraries that are also dynamically compiled. The lines reported are the number of lines excluding functions from libraries, but the other columns in the table (and compilation times in the subsequent tables) include measurements for library functions. The table shows the number of temporaries and instructions without SSA conversion and with SSA conversion. Note that the SSA conversion often adds a significant number of temporaries (e.g., over 2,500 for beam2icode).
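Why a renaming pass multiplies temporaries can be seen on straight-line code. The following Python sketch is a simplified illustration, not the HiPE SSA pass: the function name and the tuple encoding of instructions are assumptions, and because there is no control flow it needs no φ-nodes. Every definition receives a fresh temporary, so a heavily reused virtual register such as x0 is split into one name per assignment.

```python
from itertools import count


def rename_straight_line(code):
    """Give every definition in straight-line three-address code a fresh name.

    code is a list of (dest, op, src1, src2) tuples. Values that are live on
    entry also get a temporary of their own the first time they are used.
    """
    fresh = (f"t{i}" for i in count(1))
    current = {}  # original register -> name holding its latest value

    def use(name):
        if name not in current:
            current[name] = next(fresh)  # live-in value
        return current[name]

    renamed = []
    for dest, op, src1, src2 in code:
        a, b = use(src1), use(src2)      # read operands before redefining dest
        current[dest] = next(fresh)      # every definition gets a new name
        renamed.append((current[dest], op, a, b))
    return renamed


original = [("x0", "+", "x0", "x1"), ("x0", "+", "x0", "x1")]
renamed = rename_straight_line(original)
names_before = {n for d, _, a, b in original for n in (d, a, b)}
names_after = {n for d, _, a, b in renamed for n in (d, a, b)}
```

In this small example two virtual registers become four temporaries, while the reuse of x0 as both source and destination, which is what creates the false dependency, disappears.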
However, due to better opportunities for optimizations, the number of instructions on the SPARC is often reduced with SSA conversion. Smaller programs, where there are not as many opportunities for optimizations as in large programs, might increase in size due to added move instructions at φ-nodes during the SSA conversion.

The BEAM virtual machine has one register (x0) that is heavily used. It is often the case that BEAM code looks like:

    x0 := x0 + x1

Without SSA conversion, this maps nicely to the 2-address instructions on the x86. However, after SSA conversion a new temporary is introduced and the above code is turned into:

    t3 := t1 + t2

This code maps nicely to the 3-address instructions on the SPARC, but on the x86, it has to be translated to:

    t3 := t1
    t3 := t3 + t2

As a result, SSA conversion tends to increase the code sizes more for x86 than for SPARC.

Table 8.1: Description of benchmark programs.

quicksort  Recursive implementation of quicksort. Sorts a list with 45,000 elements 30 times.
spillstress  A synthetic benchmark consisting of a recursive function with several continuously live variables; its only purpose is to stress the register allocators.
smith  The Smith-Waterman DNA sequence matching algorithm. Matches one sequence against 100 others; all of length 32.
life  Executes 1000 generations in Conway's game of life on a 10 by 10 board where each square is implemented as a process.
decode  Part of a telecommunications protocol. Decodes an incoming message.
huff  A Huffman encoder compressing and uncompressing a short string.
MD5  Calculates an MD5-checksum on a file. The benchmark takes a file of size 32,026 bytes and concatenates it 10 times before calculating its checksum twice.
prettypr  Consists mainly of a very large function which formats its input (a large file) for printing, using a strict-style context passing implementation.
estone  Measures the number of Estones that an Erlang system can produce. This is a benchmark that aims at stressing all parts of an Erlang implementation.
beam2icode  The part of the HiPE compiler that translates BEAM code to intermediate code. The program contains a very big function handling different combinations of instructions. Because of its size, this function is problematic for some register allocators. To get measurable execution times, we run this benchmark 10 times.
raytracer  A ray tracer that traces a scene with 11 objects (two of them with textures) and two light sources to an 80x70 24-bit color ppm file.
eddie  An Erlang implementation of an HTTP parser which handles http-get requests.

Table 8.2: Sizes of benchmark programs. (Columns: lines of source code, then temporaries and instructions for SPARC and for x86, each without (Direct) and with SSA conversion. Rows: quicksort, spillstress, smith, life, decode, huff, MD5, prettypr, estone, beam2icode, raytracer, eddie.)

8.2 COMPILATION TIMES

We have measured both the time to perform register allocation and the time to complete the entire compilation for each program. The results (minimum of three compilations) are presented in Figures 8.1, 8.2, 8.3, and 8.4 where bars show the total compilation time and their striped part stands for the time spent in the register allocator.

Figure 8.1: Compilation times on SPARC. (Bar charts of compile time in seconds, split into register allocation (RA) and other, for the naïve, linear scan, graph coloring, and coalescing allocators on each benchmark.)

Figure 8.2: Compilation times, with SSA conversion, on SPARC. (Same layout as Figure 8.1.)

Figure 8.3: Compilation times on x86. (Same layout as Figure 8.1.)

Figure 8.4: Compilation times, with SSA conversion, on x86. (Same layout as Figure 8.1.)

In general, both compilation times and register allocation times increase with SSA conversion, even when the number of instructions in the generated code is reduced. The complexity of the graph coloring and the coalescing register allocator is not directly dependent on what one could naively consider as the size of the program. Instead the complexity depends on the number of edges in the interference graph (corresponding to the number of simultaneously live temporaries), which is for example high for the decode and prettypr benchmarks. On the other hand, the linear scan allocator is not affected much by the number of simultaneously live temporaries; the allocation time is instead dominated by the time to traverse the code.

The compilation and register allocation times of beam2icode stick out since, as mentioned, this program contains a large function with many simultaneously live temporaries. This becomes troublesome either when many iterations are needed to avoid spilling (which is what happens with iterated register coalescing and SSA conversion on the SPARC), or when the number of available registers is low, the produced allocation does not respect the constraints imposed by the ISA, and small corrections to the allocation are needed (such is the case on the x86). On the other hand, estone, raytracer, and eddie, which are also big programs, consist of a large number of small functions that do not exhibit this behavior to the same extent.

Compilation-time-wise, linear scan performs very well: compared to graph coloring, the time for register allocation is significantly reduced (by at least 50% in general), and pathological cases such as beam2icode are avoided. In fact, for eddie and especially for beam2icode with SSA conversion, compilation with linear scan is even faster than the naïve algorithm; see Figure 8.2. This is due to the time needed for rewrite of the code with the allocation. Due to excessive spilling this code is larger for the naïve allocator than it is for linear scan; cf. also the instruction counts in the spill tables.

8.3 SPEED OF EXECUTION

The execution times, in seconds, for each benchmark and allocator are presented in Figures 8.5, 8.6, 8.7, and 8.8. They correspond to the minimum of nine executions of the programs.
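Reporting the minimum over repeated runs can be sketched as follows. This is a generic Python illustration of the measurement protocol, not the harness used for the thesis: the function name, the use of a wall-clock timer, and the default of nine runs (taken from the text above) are assumptions about how one might reproduce it.

```python
import time


def min_execution_time(benchmark, runs=9):
    """Run benchmark() `runs` times and return the fastest wall-clock time.

    Taking the minimum filters out runs that were disturbed by other
    activity on the machine, such as interrupts or cache pollution.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        benchmark()
        timings.append(time.perf_counter() - start)
    return min(timings)


# Example use with a stand-in workload:
fastest = min_execution_time(lambda: sum(range(10_000)), runs=9)
```

The minimum is a reasonable summary for deterministic benchmarks; for workloads with deliberate delays a different metric is needed.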
Fo the estone benchmk, which conts tificil delys, we epot the numbe of estones ssigned to ech execution Figues 8.9 nd Note tht this cse moe estones mens fste execution. Even though le scn nd gph colog spill moe thn the iteted colescg llocto (see dt Tbles 8.3, 8.4, 8.5, nd 8.6), the effect of spillg on execution times is limited. On egiste-ich chitectue such s the SPARC, le scn offes most cses pefomnce compble to tht of the gph colog nd iteted colescg lloc-
135 112CHAPTER 8. PERFORMANCE OF REGISTER ALLOCATORS 20 Nïve Gph Colog Le scn Colescg 15 ) ( s e t im n t io u c e x E quicksot spillstess smith life decode huff MD5 pettyp bem2icode ytce eddie Figue 8.5: Execution times on SPARC. 20 Nïve Gph Colog Le scn Colescg 15 ) ( s e t im n t io u c e x E quicksot spillstess smith life decode huff md5 pettyp bem2icode ytce eddie Figue 8.6: Execution times, with SSA convesion, on SPARC.
136 8.3. SPEED OF EXECUTION Nïve Gph Colog Le scn Colescg 5 ) ( s e 4 t im n t io u c e 3 x E quicksot spillstess smith life decode huff MD5 pettyp bem2icode ytce eddie Figue 8.7: Execution times on x Nïve Gph Colog Le scn Colescg 5 ) ( s e 4 t im n t io u c e 3 x E quicksot spillstess smith life decode huff md5 pettyp bem2icode ytce eddie Figue 8.8: Execution times, with SSA convesion, on x86.
137 114CHAPTER 8. PERFORMANCE OF REGISTER ALLOCATORS 200 Nïve Le scn Gph Colog Colescg 1000 Nïve Le scn Gph Colog Colescg Estones (k) Estones (k) Figue 8.9: Estone nkg on SPARC (left) nd x86 (ight). Highe estone numbes epesents bette pefomnce Nïve Le scn Gph Colog Colescg 1000 Nïve Le scn Gph Colog Colescg Estones (k) Estones (k) Figue 8.10: Estone nkg on SPARC (left) nd x86 (ight) with SSA convesion. 0 tos. Le scn lso pefoms well on the x86; this is ptly due to the diffeent cllg convention (pssg guments on the stck), nd lso ptly due to the L1 cche beg ccessed lmost s fst s the egistes on the Pentium, 1 nd x86 s bility to ccess spilled tempoies diectly fom the stck most stuctions. Also, note tht diffeent egiste ssignments might ffect the dynmic stuction schedulg done by the hdwe, cusg smll diffeences execution times. See fo exmple the execution times of quicksot fo which none of the lloctos spills on SPARC (Tbles 8.3 nd 8.4), but still the execution time of the code poduced by the gph colog llocto diffes fom the execution time of the code poduced by the othe lloctos. 8.4 SPILLS ON SPARC Tble 8.3 shows the numbe of tempoies spilled nd the numbe of stuctions fte lloction without SSA convesion while Tble 8.4 shows 1 Mesuements on n Intel Pentium 4 Model 2.4 MHz with hdwe pefomnce mesuements dictes tht egiste to egiste move tke 0.45 clock cycles, stck to egiste move tkes 0.85 cycles nd egiste to stck move tkes 1.62 cycles. Still, when dependencies entes the pictue memoy ccesses cn tke 4 to 10 cycles while egiste moves still just tke ound hlf clock cycle.
Table 8.3: Number of spilled temporaries and SPARC instructions after allocation. Columns: Spills and Insts for each of Naïve, Linear Scan, Graph Color, and Iter. Coalesc. Rows: quicksort, spillstress, smith, life, decode, huff, MD5, prettypr, estone, beam2icode, raytracer, eddie.

Table 8.4: Number of spilled temporaries and SPARC instructions after allocation (with SSA). Same columns and rows as Table 8.3.

the same information for programs after SSA conversion. From these numbers, one can see that even though linear scan spills fewer temporaries than the graph colorer on decode and eddie, the total number of instructions for graph coloring is lower when not using SSA conversion. This is because the linear scan allocator has a tendency to spill long live intervals with many uses, while the graph colorer spills a larger number of temporaries, but ones with shorter live ranges and fewer uses. When applying SSA conversion, the number of live ranges increases but they also become shorter, which means that the number of instructions can decrease even though the number of spilled temporaries increases.

As expected, the iterated coalescing allocator always generates fewer spilled temporaries. Also, since the coalescing allocator is usually able to coalesce moves, the resulting number of instructions is smaller for
coalescing even with the same amount of spills.

As mentioned, the naïve allocator spills all non-precolored temporaries, adding load and store instructions at each use or define site.

The number of instructions should be compared to the numbers in Table 8.2 to see the increase in size caused by spills introduced by each algorithm. Note that the number of instructions might decrease after register allocation, as some move instructions are removed.

8.5 SPILLS ON x86

Table 8.5 reports, for each benchmark, the number of temporaries that are placed on the stack by the calling convention, the number of additional spills, and the number of instructions after allocation. That is, the first column shows the number of temporaries that are live over a function call and hence have to be saved on the stack during the call. The number of spills in the following columns shows the additional number of temporaries that are stored on the stack. Table 8.6 shows the corresponding numbers with SSA conversion turned on.

Table 8.5: Number of spilled temporaries and x86 instructions after allocation. Columns: temporaries on stack, then Spills and Insts for each of Naïve, Linear Scan, Graph Color, and Iter. Coalesc. Rows: quicksort, spillstress, smith, life, decode, huff, MD5, prettypr, estone, beam2icode, raytracer, eddie.

Our results are as follows: When the number of available registers is low, the iterated coalescing algorithm is the clear winner as far as its ability to avoid spills is concerned. It manages to spill minimally on this benchmark set. Compared with graph coloring, this is partly due to the fact that the coalescing allocator is optimistic in its spilling strategy. With only a few available registers, the linear scan register allocator has trouble keeping temporaries in registers and the number of spills is high compared to coalescing; sometimes an order of magnitude higher. Compared to the graph colorer, even though the number of spills is often much lower, the number of instructions in the resulting code is lower to
Table 8.6: Number of spilled temporaries and x86 instructions after allocation (with SSA). Columns: temporaries on stack, then Spills and Insts for each of Naïve, Linear Scan, Graph Color, and Iter. Coalesc. Rows: quicksort, spillstress, smith, life, decode, huff, MD5, prettypr, estone, beam2icode, raytracer, eddie.

a smaller extent, suggesting that the choice of spilled temporaries is not a good one.

We stress that, due to the different calling conventions used by the SPARC and x86 back-ends, the numbers of spills in Tables 8.3 and 8.4 are not comparable with the numbers in Tables 8.5 and 8.6.
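The observation in Section 8.4 that fewer spilled temporaries need not mean fewer instructions can be illustrated with a small cost model. The sketch below is not taken from HiPE; it assumes, as a simplification for a load/store architecture such as the SPARC, that every spilled temporary costs one store per definition and one load per use. The function name and the example counts are hypothetical.

```python
def added_instructions(spilled):
    """Estimate the loads and stores added by spilling.

    `spilled` maps a temporary's name to a (definitions, uses) pair.
    Assumption: one store per definition and one load per use.
    """
    return sum(defs + uses for defs, uses in spilled.values())


# Hypothetical allocator outcomes for the same function.
# One long interval with many uses is spilled (a linear-scan-like choice).
few_spills_many_uses = {"t1": (1, 12)}
# Three short live ranges with few uses are spilled (a graph-colorer-like choice).
more_spills_few_uses = {"a": (1, 2), "b": (1, 2), "c": (1, 2)}

print(added_instructions(few_spills_many_uses))   # 13
print(added_instructions(more_spills_few_uses))   # 9
```

Under these assumptions the allocation with a single spill adds more instructions than the one with three spills. On the x86 this model is too pessimistic, since most instructions can access a spilled temporary directly on the stack.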
Chapter 9

A deeper look on linear scan: Impact of some alternatives

I think language designers would do better to consider their target user to be a genius who will need to do things they never anticipated, rather than a bumbler who needs to be protected from himself. The bumbler will shoot himself in the foot anyway. You may save him from referring to variables in another package, but you can't save him from writing a badly designed program to solve the wrong problem, and taking forever to do it.
Paul Graham

In order to find the most efficient implementation of linear scan we have experimented with a number of options that one can consider when implementing linear scan. One of these (spilling heuristics) is also considered in [75], the question of whether or not to perform liveness analysis comes naturally, and some others (various orderings) are of our own invention. As experimenting with all these options is time-consuming, we hope that our reporting on them will prove helpful to other implementors. All experiments of this section are conducted on the SPARC.

We note in passing a couple of experiments that we will not present in detail here. One set of experiments is on the use of different data structures which allow both fast updates and fast lookups. In this case the choice of data structures is very language and implementation dependent, and these experiments are only interesting if you don't have access to O(1) destructive updates. In this case general balanced trees perform well [7]. We have also measured the effect of performing an ad hoc renaming pass before register allocation. We do not report on
B1: T = foo(); if (T > 0) then B2 else B3;
B2: T2 = 42; return T2;
B3: T3 = T + 42; return T3;

Figure 9.1: A simple control flow graph.

that experiment here, as the effect of a more systematic renaming pass, based on SSA conversion, has already been extensively presented in the previous section. The results of this experiment can be found in the Proceedings of the PADL 2002 Symposium [52].

9.1 IMPACT OF INSTRUCTION ORDERING

The linear scan algorithm relies on a linear approximation of the execution order of the code to determine simultaneously live temporaries. To spill as few registers as possible, it is important that this approximation introduces as few false interferences as possible. An interference is false when the linearization places a basic block (B2) where a temporary (T) is not live between two blocks (B1, B3) which define and use T. The live interval for T will then include all instructions in B2, resulting in false interferences between T and any temporaries defined in B2; see Figure 9.1. If, instead, the linearization places B2 and B3 in the opposite order, there will not be a false interference.

Finding the optimal ordering (i.e., the one with the least number of false interferences) of a CFG at compile time is not feasible; this would seriously increase the complexity of the algorithm. It is therefore important to find a method to order the graph which gives a good result. To determine such a method, we have applied the linear scan algorithm on eight different generic orderings and counted the number of spills and the number of added instructions on each benchmark.

We will exemplify the following orderings which we have tried (most of them are standard; see, e.g. [5, 66]) using the control-flow graph shown in Figure 9.2, where edges are annotated with a static prediction (taken/not-taken).

Postorder: All children are visited before the node is visited.

Reverse postorder (or Depth-first ordering): The reverse of the order in which nodes are visited in a postorder traversal.
Postorder        6, 7, 5, 4, 9, 3, 2, 8, 1
Rev. postorder   1, 8, 2, 3, 9, 4, 5, 7, 6
Inorder          6, 5, 7, 4, 3, 9, 2, 1, 8
Rev. inorder     8, 1, 2, 9, 3, 4, 7, 5, 6
Prediction       1, 2, 3, 4, 5, 7, 6, 9, 8
Preorder         1, 2, 3, 4, 5, 6, 7, 9, 8
Breadth-first    1, 2, 8, 3, 4, 9, 5, 6, 7
Random           1, 2, 3, 4, 5, 6, 7, 8, 9

Figure 9.2: A control-flow graph and its orderings. (Graph with nodes 1 to 9; branch edges are annotated {t} for taken and {n} for not-taken.)

Table 9.1: Number of spilled temporaries using different basic block orderings. Columns: Rev PO, Post-, Pre-, Predict, Rev IO, In-, Breadth-, Random. Rows: quicksort, spillstress, smith, life, decode, huff, MD5, prettypr, estone, beam2icode, raytracer, eddie, Sum.

Table 9.2: Number of spilled temporaries using different basic block orderings (with SSA). Same columns and rows as Table 9.1.
Preorder: First the node is visited, then the children.

Prediction: The static prediction of branches is used to order the basic blocks in a depth-first order. This should correspond to the most executed path being explored first.

Inorder: The left (fallthrough) branch is visited first, then the node, followed by the other children.

Reverse inorder: The reverse of the inorder traversal.

Breadth-first ordering: The start node is placed first, then its children, followed by the grandchildren and so on.

Random (or rather an approximation of the source code order): The blocks are ordered by converting the hash-table of basic blocks to a list; this list is approximately ordered on an increasing basic block numbering, which in turn corresponds to the order in which the basic blocks were created.

The style in which a program is written has a big impact on which ordering performs best. Factors such as how nested the code is, or the size of each function, come into play. The results therefore, as expected, vary from benchmark to benchmark, but provided the range of benchmarks is large a winner can be found.

Tables 9.1 and 9.2 show the number of spilled temporaries for each benchmark and ordering. (The number of added instructions is omitted as it shows a similar picture.) As can be seen from these tables, the reverse postorder gives the best result. In HiPE, we are currently using it as the default.

9.2 IMPACT OF PERFORMING LIVENESS ANALYSIS

In [75], a fast live interval analysis is described that does not use an iterative liveness analysis. Instead, it extends the intervals of all live temporaries in a strongly connected component (SCC) to include the whole SCC. After presenting the method, the authors conclude that although compilation using linear scan based on this method is sometimes faster than normal linear scan, the resulting allocation is usually much worse. We have independently confirmed their findings. In fact, the allocation is sometimes so bad that excessive spilling increases the time it takes to rewrite the code so much that the SCC-based linear scan allocator becomes slower than linear scan with a full liveness analysis. In our benchmark set, even compilation-time-wise, linear scan with liveness analysis is faster on more than half of the benchmarks.
Execution-time-wise, performing liveness analysis pays off.
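The false interference described in Section 9.1 can be reproduced on the control flow graph of Figure 9.1. The following sketch is illustrative only and is not the HiPE implementation: it approximates each live interval as the span from the first to the last occurrence of a temporary in the linearized code, and it treats two intervals that merely touch at one instruction as non-interfering. The instruction encoding, the helper names, and the order in which successors are visited are assumptions.

```python
from itertools import combinations

# Each instruction is a (defined temporaries, used temporaries) pair.
BLOCKS = {
    "B1": [({"T"}, set()), (set(), {"T"})],     # T = foo(); if (T > 0) ...
    "B2": [({"T2"}, set()), (set(), {"T2"})],   # T2 = 42; return T2
    "B3": [({"T3"}, {"T"}), (set(), {"T3"})],   # T3 = T + 42; return T3
}
SUCCESSORS = {"B1": ["B2", "B3"], "B2": [], "B3": []}


def live_intervals(order):
    """Map each temporary to (first position, last position) in `order`."""
    intervals = {}
    pos = 0
    for name in order:
        for defs, uses in BLOCKS[name]:
            for temp in defs | uses:
                start, _ = intervals.get(temp, (pos, pos))
                intervals[temp] = (start, pos)
            pos += 1
    return intervals


def interferences(order):
    """Return the pairs of temporaries whose intervals strictly overlap."""
    iv = live_intervals(order)
    result = set()
    for a, b in combinations(sorted(iv), 2):
        (s1, e1), (s2, e2) = iv[a], iv[b]
        if s1 < e2 and s2 < e1:
            result.add((a, b))
    return result


def reverse_postorder(entry):
    """Depth-first traversal of SUCCESSORS; returns the reversed postorder."""
    seen, post = set(), []

    def visit(node):
        seen.add(node)
        for child in SUCCESSORS[node]:
            if child not in seen:
                visit(child)
        post.append(node)

    visit(entry)
    return post[::-1]


print(interferences(["B1", "B2", "B3"]))  # {('T', 'T2')}
print(interferences(["B1", "B3", "B2"]))  # set()
print(reverse_postorder("B1"))            # ['B1', 'B3', 'B2']
```

In this tiny graph, reverse postorder happens to place B3 before B2 and so avoids the false interference between T and T2. With the successors of B1 listed in the other order it would not, which fits the observation that the best ordering varies from program to program.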
                 Spills (Instructions)               Execution Time
Benchmark     Interval length   Usage count     Interval length   Usage count
quicksort        0 (355)           0 (355)
spillstress     11 (449)          32 (452)
smith            1 (1169)          8 (1182)
life             1 (1450)          2 (1446)
decode          35 (1910)        200 (2114)
huff             6 (2120)         83 (2292)
MD5             21 (2988)        382 (3614)
prettypr        31 (7109)       1277 (7326)
beam2icode      96 (18562)      2786 (22373)
raytracer       94 (20448)       485 (20843)
eddie           83 (21616)       426 (21988)
estone          16 (8656)        125 (8839)        183 k             155 k

Table 9.3: Impact of spilling heuristics (with SSA conversion on the SPARC).

9.3 IMPACT OF SPILLING HEURISTICS

We have also experimented with the use of a spilling heuristic based on usage counts instead of interval length. Since information about the length of intervals is needed by the linear scan algorithm anyway (in order to free registers when an interval ends), this information can be used for free to guide spilling. The usage count heuristic is slightly more complicated to implement since it needs some extra information: the usage count of each temporary. There is also a cost in finding the temporary with the least use.

As Table 9.3 shows, the usage count heuristic spills more (as expected) but spills temporaries which are not used much, so the size of the resulting code is not much bigger, and for life it is even smaller. However, looking at the performance of the generated code, one can see that the heuristics perform on par in many cases, but the usage count heuristic performs much worse on, e.g., decode and prettypr. We thus do not recommend the use of usage counts.

9.4 LIFETIME HOLES AND LIVE RANGE SPLITTING

Our motivation for implementing linear scan was to have a fast, provably linear, register allocator to be used in a just-in-time compilation setting. Besides liveness analysis, we have purposely avoided using any technique which requires iteration and could subtly undermine the linear time characteristics of the linear scan algorithm.
We have therefore not considered the use of lifetime holes as proposed in [89] (a technique which requires an iterative dataflow calculation and thus does not have a linear time bound), nor have we tried to integrate a separate live range splitting [29] pass in the linear scan register allocator. We feel that either of these approaches would make the allocation slower and more complicated.

In our implementation, by using SSA conversion, we usually get most of the benefits from live range splitting, namely mostly short live ranges. Even though there might still be situations where, e.g., a temporary is defined only once and then has several late uses giving it a long live range which might force it to be spilled, we do not think that splitting this live range would result in a significant improvement in execution performance. Some evidence why this is so can be seen from the fact that even though the coalescing register allocator spills much less than our other allocators, the execution time performance of the generated code is not significantly better.
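To make the two spilling heuristics of Section 9.3 concrete, here is a minimal linear scan sketch with a pluggable victim choice. It is a simplified illustration in the spirit of the algorithm, not the HiPE allocator: intervals are given directly as (start, end, usage count) triples, a spilled temporary stays on the stack for its whole interval, there are no precolored registers, and the names `linear_scan`, `longest_interval`, and `fewest_uses` are hypothetical.

```python
def linear_scan(intervals, num_regs, pick_victim):
    """Allocate registers in one pass over intervals sorted by start.

    intervals: dict mapping a temporary to (start, end, usage_count).
    Returns (assignment, spilled): register per temporary, and the
    list of temporaries that ended up on the stack.
    """
    free = list(range(num_regs))
    active = []                      # temporaries currently in a register
    assignment, spilled = {}, []
    for name in sorted(intervals, key=lambda n: intervals[n][0]):
        start = intervals[name][0]
        # Expire intervals that ended before this one starts.
        for old in [a for a in active if intervals[a][1] < start]:
            active.remove(old)
            free.append(assignment[old])
        if free:
            assignment[name] = free.pop()
            active.append(name)
            continue
        # No free register: the heuristic picks who goes to the stack.
        victim = pick_victim(active + [name], intervals)
        spilled.append(victim)
        if victim != name:
            assignment[name] = assignment.pop(victim)
            active.remove(victim)
            active.append(name)
    return assignment, spilled


def longest_interval(candidates, intervals):
    """Interval-length heuristic: spill the longest live interval."""
    return max(candidates, key=lambda n: intervals[n][1] - intervals[n][0])


def fewest_uses(candidates, intervals):
    """Usage-count heuristic: spill the least used temporary."""
    return min(candidates, key=lambda n: intervals[n][2])


# Hypothetical input: `a` is long-lived and heavily used.
EXAMPLE = {"a": (0, 10, 8), "b": (1, 3, 1), "c": (2, 4, 2)}

print(linear_scan(EXAMPLE, 2, longest_interval)[1])  # ['a']
print(linear_scan(EXAMPLE, 2, fewest_uses)[1])       # ['b']
```

In this toy input the interval-length heuristic spills the heavily used `a`, while the usage-count heuristic spills `b`. This only shows how the victim choice differs; it does not reproduce the measurements in Table 9.3, where the usage-count heuristic spills more temporaries and is much slower on some benchmarks.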
Chapter 10

A comparison of heap architectures

And therefore education at the University mostly worked by the age-old method of putting a lot of young people in the vicinity of a lot of books and hoping that something would pass from one to the other, while the actual young people put themselves in the vicinity of inns and taverns for exactly the same reason.
Terry Pratchett, Interesting Times

In this chapter we present a performance evaluation of two memory architectures, one based on private heaps and one based on a shared heap. Both these architectures have been fully implemented and were released as part of Erlang/OTP R8. (The user chooses between them through a configure option.) Our goal is to investigate the impact of the pros and cons of these architectures as discussed in Chapter 5.

In this chapter we refrain from discussing issues related to the expansion/resizing policy or the impact of the initial memory size of each architecture. We instead use the same expansion policy in all architectures and fix a priori what we believe are reasonable, albeit very conservative, initial sizes for all memory areas.

More specifically, in all experiments the private heap architecture is started with an initial combined stack/heap size of 233 words per process. We note that this is the default setting in Erlang/OTP and thus the setting most frequently used in the Erlang community. In the comparison between the private and the shared heap architecture (Section 10.2), the shared heap system is started with a stack of 233 words and an initial shared heap size of 10,946 words. At first glance it might seem unfair to use a bigger heap for the shared heap system, but since all processes in this system share a single heap, there is no real reason to start with a small heap size as in the private heap system. In a private heap architecture a process that allocates a large heap hogs memory from other processes, hence there is a need to keep heaps small in order
to avoid running out of memory and reduce fragmentation. In any case, note that these heap sizes are extremely small by today's standards (even for most embedded systems). In all systems, the expansion policy is to increase the heap to the closest pre-calculated Fibonacci number which is bigger than the size of the live data¹ plus the additional memory need.

10.1 THE BENCHMARKS AND THE SETTING

The performance evaluation was based on the following benchmarks:

ring: A concurrent benchmark which creates a ring of 100 processes and sends 100,000 messages.

life: Conway's game of life on a 10 by 10 board where each square is implemented as a process.

procs(number of processes, message size): This synthetic benchmark sends messages in a ring of processes. Each process creates a new message when it is spawned and sends it to the next process in the ring (its child). Each message has a counter that ensures it will be sent exactly 10 times to other processes.

In addition, we used the following real-life Erlang programs:

eddie: A medium-sized (about 2,000 lines of code) application implementing an HTTP parser which handles http-get requests.

BEAM compiler: A large program (about 30,000 lines of code excluding code for libraries) which is mostly sequential; processes are used only for I/O. The benchmark compiles the file gstk_generic.erl of the Erlang/OTP R8 distribution to BEAM code.

NETSim (Network Element Test Simulator): A large commercial application (about 630,000 lines of Erlang code) mainly used to simulate the operation and maintenance behavior of a network. In the actual benchmark, a network with 20 nodes is started and then each node sends 100 alarm bursts through the network. The NETSim application consists of several different Erlang nodes. Only three of these nodes are used as benchmarks, namely a network TMOS server, a network coordinator, and the alarm server.

Some additional information about the benchmarks is contained in Table 10.1.

Due to licensing reasons, the platform we had to use for the NETSim program was a SUN Ultra 10 with a 300 MHz Sun UltraSPARC-IIi processor and 384 MB of RAM running Solaris 2.7. The machine was

¹ The size of live data is the size of the heap after GC.
Benchmark              Processes   Messages
ring                                 ,000
life                                 ,396
eddie                  2            2,121
BEAM compiler          6            2,481
NETSim TMOS            4,066       58,853
NETSim coordinator                   ,730
NETSim alarm server    12,           ,675
procs 100x                           ,262
procs 1000x100         1,            ,512
procs 100x                           ,262
procs 1000x1000        1,            ,512

Table 10.1: Number of processes and messages.

otherwise idle during the benchmark runs: no other users, no window system. Because of this, and so as to get a consistent picture, we decided to use this machine for all other benchmarks too.

In the rest of this section, all figures containing execution times present the data in the same form. Measurements are grouped by benchmark, and times have been normalized so that the execution time for the private heap system (leftmost bar in each group and identified by P) is 1. Bars to its right show the relative execution time for the shared heap (S) and, wherever applicable, the hybrid (H) system. For each system, the execution time is subdivided into time spent in the mutator, time spent in the send operation, time spent copying messages, and time taken by the garbage collector, further subdivided into time for minor and major collections. For the private heap system, in Figures 10.1 and 10.2 we also explicitly show the time to traverse the message in order to calculate its size (this is part of the send operation).

10.2 A COMPARISON OF A PRIVATE HEAP VS. A SHARED HEAP ARCHITECTURE

Time performance

As can be seen in Figure 10.1(a), in the synthetic procs benchmark, the shared heap system is much faster when it comes to sending small-sized messages among 100 Erlang processes. This is partly due to the send operation being faster and partly because the shared heap system starts with a bigger heap and hence does not need to do as much garbage collection. When messages are small, increasing the number of processes to 1000 does not change the picture much, as can be seen in Figure 10.1(b). On the other hand, if the size of the message is increased so that the
Figure 10.1: Normalized times for the procs benchmark. (Panels (a) to (d), one per procs configuration; bars P (private heap) and S (shared heap), each subdivided into mutator, send, size, copy, minor, and major.)

shared heap system also requires frequent garbage collection, the effect of the bigger root set becomes visible; see Figures 10.1(c) and 10.1(d). This is expected, since the number of processes which have been active between garbage collections (i.e., the root set) is quite high.

The performance of the two architectures on real programs shows a more mixed picture (Figure 10.2). The shared heap architecture outperforms the private heap architecture on many real-world programs. For eddie, the gain is unrelated to the initial heap sizes; cf. [93]. Instead, it is due to the shared heap system having better cache behavior by sharing messages and by avoiding garbage collections. In the truly concurrent programs, ring and life, the private heap system spends 18% and 25% of the execution time in interprocess communication. In contrast, the shared heap system spends less than 12% of its time in message passing. The speedup for the BEAM compiler can be explained by the larger initial heap size for the shared heap system, which reduces the total time spent in garbage collection to one third.

The performance of the shared heap architecture is worse than that of the private heap system in two of the NETSim programs, and there is a speedup only in the case where the number of processes is moderate. This is to some extent expected, since NETSim is a commercial product developed over many years using a private heap-based Erlang/OTP system and tuned in order to avoid garbage collection and reduce send times. For example,
Figure 10.2: Normalized execution times. (Bars P (private heap) and S (shared heap) for ring, life, eddie, BEAM compiler, NETSim TMOS server, NETSim coordinator, and NETSim alarm server, each subdivided into mutator, send, size, copy, minor, and major.)

from the number of processes in Table 10.1 and the maximum total heap sizes which these programs allocate (data shown in Table 10.2), it is clear that in the NETSim programs either the majority of the processes do not trigger garbage collection in the private heap system as their heaps are small, or processes are used as a means to get no-cost heap reclamation. As a result, the possible gain from a different memory architecture is small. Indeed, as observed in the case of the NETSim alarm server, the large root set (cf. Table 10.1) can seriously increase the time spent in garbage collection and slow down execution of a program which has been tuned for a private heap architecture.

We suspect that the general speedup for the mutator in the shared heap system is due to better cache locality: partly due to requiring fewer garbage collections by sharing data between processes and partly due to having heap data in cache when switching between processes. Note that this is contrary to the general belief in the Erlang community, and perhaps elsewhere, that a private heap architecture results in better cache behavior. To verify our hunch, we measured the number of data cache misses of some of these benchmarks using the UltraSPARC hardware performance counters. In programs that required garbage collection, the number of data cache misses of the shared heap system is indeed smaller than that of the private heap system; however only by
Figure 10.3: Max garbage collection stop times (ms). (Series: Private Minor, Private Major, Shared Minor, Shared Major; benchmarks: ring, life, eddie, BEAM compiler, NETSim TMOS, NETSim coordinator, NETSim alarm.)

about 3%. Although this confirms that a shared heap system can have better cache behavior, we are not sure whether the difference in cache misses accounts for all the mutator speedup we observe or not.

Stop times

Figure 10.3 shows the longest garbage collection stop time in milliseconds for each benchmark. As can be seen, the concern that many processes can lead to a larger root set and hence longer garbage collection latency is justified. When the root set consists of many processes, the stop times for the shared heap system are slightly longer than those of the private heap system.

As the memory requirements of a program increase (data shown in Table 10.2), the garbage collection stop times also increase. Also, with a copying collector, more live data will lead to worse cache behavior. Bigger memory requirements also mean that collection is required more often, which increases the likelihood that GC will be triggered at a moment when the root set is large or there is a lot of live data (which is the worst case for the type of collector being used). We mention that, although the general picture is similar, the GC latency decreases when starting the systems with bigger initial heap sizes.

Notice that the difference in maximum stop times between the two systems is not very big and that a private heap system is no guarantee of short GC stop times. True real-time GC latency can only be obtained using a real-time garbage collector.
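The difference between the send operations of the two architectures, discussed under time performance above, can be sketched as follows. This is a conceptual model and not the Erlang/OTP runtime: messages are modeled as nested Python tuples, the word count assumes one word per element plus one header word per tuple, and all function names are hypothetical.

```python
def message_size(term):
    """Traverse the message to compute its size in words (assumed layout)."""
    if isinstance(term, tuple):
        return 1 + sum(message_size(element) for element in term)
    return 1


def copy_message(term):
    """Build a structurally identical copy, as a private heap send must."""
    if isinstance(term, tuple):
        return tuple(copy_message(element) for element in term)
    return term


def send_private(term):
    """Private heaps: size the message, then copy it to the receiver's heap."""
    words = message_size(term)   # first traversal: how much room is needed
    copy = copy_message(term)    # second traversal: the actual copy
    return copy, words


def send_shared(term):
    """Shared heap: the receiver gets a reference; no words are copied."""
    return term, 0


MESSAGE = ("alarm", (1, 2, 3), "node7")

copied, words = send_private(MESSAGE)
print(words, copied is MESSAGE)              # 7 False
print(send_shared(MESSAGE)[0] is MESSAGE)    # True
```

The private heap send touches every word of the message twice, so its cost grows with message size, while the shared heap send is constant time. The price of sharing shows up elsewhere, in the larger root set that the garbage collector has to scan.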
Table 10.2: Heap sizes allocated and used (in 1,000 words). Columns: Allocated and Used for each of Private and Shared. Rows: ring, life, eddie, BEAM compiler, NETSim TMOS, NETSim coordinator, NETSim alarm server.

Space performance

Table 10.2 contains a space comparison of the private vs. the shared heap architecture on all non-synthetic benchmarks. For each program, the maximum sizes of heap allocated and used are shown in thousands of words. Recall that in both systems garbage collection is triggered whenever the heap is full; after GC, the heap is not expanded if the heap space which is recovered satisfies the need. This explains why the maxima of allocated and used heap sizes are often identical for the shared heap system.

From these figures, it is clear that space-wise the shared heap system is a winner. By sharing messages, it usually allocates less heap space; the space performance on the NETSim programs is especially striking. Moreover, by avoiding fragmentation, the shared heap system has better memory utilization.

Summary

Neither of the two heap architectures is a clear winner; the behavior of the application should ideally affect the choice of architecture. If a choice between those architectures has to be made a priori, it appears that the shared heap architecture is preferable to the private heap one: it results in better space utilization and is often faster, except in cases with many processes with high amounts of live data. Also, the shared heap architecture opens up for other optimizations such as those described in Chapter 6 and should perhaps be used for these reasons.
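The expansion policy stated at the start of this chapter, together with the rule that the heap is left unchanged when the space recovered by GC satisfies the need, can be sketched as below. This is an illustration under assumptions, not the Erlang/OTP code: the bound on the pre-calculated table, the function names, and the reading of "bigger than" as strictly greater are ours.

```python
from bisect import bisect_right


def fibonacci_sizes(limit):
    """Pre-calculate the Fibonacci numbers 1, 2, 3, 5, ... up to `limit`."""
    sizes, a, b = [], 1, 2
    while a <= limit:
        sizes.append(a)
        a, b = b, a + b
    return sizes


# Assumed table bound; requests beyond it would raise IndexError below.
HEAP_SIZES = fibonacci_sizes(10**9)


def heap_size_after_gc(current_size, live_words, needed_words):
    """Return the heap size (in words) to use once a collection has run."""
    required = live_words + needed_words
    if required <= current_size:
        return current_size          # recovered space satisfies the need
    # Closest table entry strictly bigger than live data plus the need.
    return HEAP_SIZES[bisect_right(HEAP_SIZES, required)]


print(heap_size_after_gc(233, 100, 50))        # 233
print(heap_size_after_gc(233, 200, 100))       # 377
print(heap_size_after_gc(10946, 9000, 3000))   # 17711
```

Both initial sizes used in the experiments, 233 and 10,946 words, are themselves Fibonacci numbers, so a heap that starts at one of them stays on the same size ladder as it grows.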
Part IV

Conclusion
Chapter 11

Conclusion

Most papers in computer science describe how their author learned what someone else already knew.
Peter Landin

This thesis describes generally applicable techniques needed for efficient implementations of concurrent functional programming languages. Specifically we have described techniques for register allocation, different heap architectures, and inter-process optimization. We have evaluated the performance of the linear scan register allocator and the performance of a shared heap architecture, both enabling technologies for dynamic, profile-based, inter-process optimization. These techniques have been evaluated by benchmarking real industrial programs with a research compiler system of production quality.

Apart from describing and evaluating these techniques, this thesis also gives a thorough description of the HiPE system, which we hope will be of value to other implementors of programming languages.

11.1 SUMMARY OF CONTRIBUTIONS

As a result of the research described in this thesis we provide a complete and robust native code Erlang system which makes it possible to test compiler optimizations on real world programs. We have shown the usefulness of this tool as a research vehicle by evaluating register allocation strategies and different heap architectures.

As software systems grow in size and become more dynamic in nature, the need for profile-guided optimization and just-in-time compilation also increases. Just-in-time compilers not only have to generate efficient code but the code generation itself has to be fast. Therefore, it is important to find fast solutions to important compiler problems such as register allocation. In this thesis we have made a thorough evaluation of linear scan register allocation in a concurrent functional programming language, a setting quite different from the original imperative one. In
this evaluation we compared linear scan with two other register allocators and with stack allocation, both on the register-rich UltraSPARC architecture and on the register-poor x86 architecture. We have also evaluated implementation options of the linear scan algorithm.

Our experience from the experiments with linear scan is that in a register-rich environment, such as the SPARC (or the upcoming IA-64), linear scan is a very respectable register allocator: It is significantly faster than algorithms based on graph coloring, resulting in code that is almost as efficient. When compilation time is a concern, or at low optimization levels, it should be used. Disregarding compilation-time concerns, on register-poor architectures, an optimistic iterated coalescing register allocator (which can eliminate most register-register moves) is a better approach to obtaining high performance.

We have also looked at how to manage heap memory in a concurrent programming language by describing and systematically investigating three different heap architectures. We have evaluated the performance of the two extremes: a system with only private heaps and a system where all processes share one global heap.

As our experimental evaluation with different heap architectures shows, performance does depend on program characteristics and the tradeoffs that we discussed do exhibit themselves in programs. Perhaps it is better to leave this choice to the user, which is the approach we are currently taking by providing more than one heap architecture in the Erlang/OTP release.

When the choice between these architectures has to be made a priori, it appears that the shared heap architecture is preferable to the private heap one: it results in better space utilization and is often faster, except in cases with many processes with high amounts of live data. The hybrid system might combine the advantages of the two other architectures, but it remains future work to see how well it performs and to find what precision is possible to get from the escape analysis that is required to guide the compiler in using the hybrid system.

However, perhaps there are other criteria that might also influence the decision. Architectures where messages get placed in an area which is shared between processes free the programmer from worrying about message sizes.
Moreover, they open up new opportunities for interprocess optimizations. For example, with a shared heap system one could, with lower overhead than in a private heap scheme, switch to the receiving process at a message send, achieving a form of fast remote procedure call between processes. It would even be possible to merge (and further optimize) code from two communicating processes
in a straightforward manner. These methods also enable further optimizations across process boundaries, such as constant propagation and more global register allocation. The context switch can be completely eliminated in some cases, reducing the overhead for concurrency.

These optimizations will speed up existing Erlang programs without requiring any modifications to their source code. Since the use of processes will be less expensive, the usefulness of concurrency is extended, making it possible to use processes in cases where it previously has been considered too expensive.

11.2 DISCUSSION

The work described in this thesis is hopefully just the beginning of a longer ongoing research project. Here we have laid the foundation for continued research on concurrent functional programming by building the infrastructure for an industrial strength compiler.

Encouraged, and perhaps misled, by how quickly the first prototype (JERICO) was done, we set our goals on a full industrial strength implementation of Erlang. It took us much longer than expected to weed out the bugs and to tweak and tune the implementation. Fortunately, the whole process was a learning experience and the result is a compiler that can be used both in the industry to improve execution speed, and in academia to try out new implementation techniques.

11.3 FUTURE RESEARCH

There are several areas, related to the work presented here, which we have not yet had time to investigate thoroughly, but which we feel would be interesting for future research.

One problem with the linear scan register allocator, on a register-poor machine, is the handling of precolored registers. The linear scan algorithm has problems with short but independent live ranges that use the same temporary, since it approximates the liveness of a temporary by an interval from the first define to the last use. This problem can, in most cases, easily be alleviated by renaming so that each separate live range uses a different temporary. Unfortunately, this does not work for precolored physical registers, which can not be renamed.
For example, if a physical register, %eax, is used in the beginning of a function, as an incoming argument register, and then also used at the end of the function, as an argument register in a call, the linear scan algorithm will give %eax a live interval that encompasses the whole function. Thus, %eax can not be used to hold any other temporary anywhere in the function. Now, if all physical registers are to be used as arguments one will end up in a situation where there are no free registers
anywhere in the function.

Another problem with linear scan is that the decision on where to allocate a temporary is taken locally, depending on what is free at the start of the interval. This means that the allocator can choose to allocate t1 to %eax, because %eax is free at the start of t1's interval, but later it might turn out that %eax is used as a precolored register in an instruction within the interval of t1. In this case t1 has to be stack allocated in order to free up %eax for the precolored use.

The first problem could perhaps be solved by handling precolored registers differently than other temporaries, allowing them to have several live ranges. The second problem could perhaps be solved by, during the allocation of a temporary t1, trying to find a free physical register which will be unused until the end of t1's interval. Both these solutions would require some special handling of precolored registers and some extra machinery for choosing a free register. We have not experimented with them since it probably would make the allocator slightly slower, but in order to handle up to five argument registers on the x86 a solution to these problems must be found.

The performance of the hybrid heap system and the precision of an escape analysis in Erlang are obviously two related areas that require further investigation. But there are also other parts of the memory architecture that deserve a deeper look. One of the potentially most important changes for long-lived applications is a non-moving oldest generation collector. Especially in a shared heap system it is important that short-lived memory-hungry processes do not force large chunks of long-lived data to be moved several times.

Another aspect worth investigating is the effect of garbage collection policies such as when and how much to grow or shrink the heap. In Erlang, where the number of processes running in a system can be very high, it is important, in a private heap system, to keep the amount of unused heap space down. Hence, it is important that sparse heaps are shrunk. The question is how to decide when a heap should be shrunk and by how much it should shrink. One way of finding a good policy for the garbage collector could be to let the system have a more dynamic policy that could change according to the behavior of the running application.
A third aspect of the memory architecture worth investigating is what the performance impact of a real-time garbage collector would be.

One advantage with the HiPE/OTP system is that the compiler and the loader are part of the runtime system, making it easy to implement dynamic recompilation. The problem is figuring out what kind of optimizations would benefit enough from dynamic runtime information to outweigh the compilation cost. Another question is whether it is up
to the user, the application programmer, or the compiler writer to find candidates for dynamic optimization. It is clear that if on-the-fly dynamic recompilation is to be a viable option then much engineering and research have to be put into the algorithms used in compilers in order to further reduce compilation times. We believe that the HiPE system can serve as a platform for this research.
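The precolored-register problem raised in Section 11.3 can be illustrated with a few lines of code. The sketch below is hypothetical and not part of HiPE: instruction positions are plain integers, the two live ranges for %eax are invented for the example, and `single_interval` and `is_free` are assumed helpers.

```python
def single_interval(ranges):
    """Collapse several live ranges into one first-define-to-last-use interval."""
    starts = [start for start, _ in ranges]
    ends = [end for _, end in ranges]
    return (min(starts), max(ends))


def is_free(ranges, start, end):
    """True if the closed interval [start, end] overlaps none of `ranges`."""
    return all(end < range_start or range_end < start
               for range_start, range_end in ranges)


# Invented example: %eax holds an incoming argument at positions 0..1
# and an outgoing call argument at positions 8..9.
EAX_RANGES = [(0, 1), (8, 9)]
T1 = (3, 6)   # a temporary that lives strictly between the two uses

# One interval per register: %eax looks busy for the whole function.
print(is_free([single_interval(EAX_RANGES)], *T1))  # False
# Several live ranges per register: %eax is available for t1.
print(is_free(EAX_RANGES, *T1))                     # True
```

Checking that a register stays unused until the end of the candidate's interval, as `is_free` does here, is also the kind of test the second proposed fix would need before assigning t1 to %eax.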
GFI MilAchive 6 vs H&S Exchnge@PAM GFI Softwe www.gfi.com GFI MilAchive 6 vs H&S Exchnge@PAM GFI MilAchive 6 H&S Exchnge@PAM Who we e Genel fetues Suppots Micosoft Exchnge 2000, 2003 & 2007 Suppots distibuted
tools for Web data extraction
HTML-we tools fo Web dt extction Thesis pesenttion 1 Student: Xvie Azg Supeviso: Andes Tho Tble of contents Intoduction Dt Extction Pocess Dt Extction Tools Relized tests Futue Wok 2 Intoduction We e going
GFI MilAchive 6 vs EMC EmilXtende Achive Edition GFI Softwe www.gfi.com GFI MilAchive 6 vs EMC EmilXtende Achive Edition GFI MilAchive 6 EMC EmilXtende Achive Edition Who we e Genel fetues Suppots Micosoft
GFI EventsMnge vs Netikus.net EventSenty GFI Softwe www.gfi.com GFI EventsMnge vs Netikus.net EventSenty GFI EventsMnge EventSenty Who we e Suppot fo MS SQL Seve Suppot fo MSDE / MS SQL Expess Suppot fo
Highest Pefomnce Lowest Pice PRODUCT COMPARISON GFI MilAchive vs Symntec Entepise Vult GFI Softwe www.gfi.com GFI MilAchive vs Symntec Entepise Vult GFI MilAchive 6 Symntec Entepise Vult Who we e Genel
Orbits and Kepler s Laws
Obits nd Keple s Lws This web pge intoduces some of the bsic ides of obitl dynmics. It stts by descibing the bsic foce due to gvity, then consides the ntue nd shpe of obits. The next section consides how
N V V L. R a L I. Transformer Equation Notes
Tnsfome Eqution otes This file conts moe etile eivtion of the tnsfome equtions thn the notes o the expeiment 3 wite-up. t will help you to unestn wht ssumptions wee neee while eivg the iel tnsfome equtions
Implementation and Evaluation of Transparent Fault-Tolerant Web Service with Kernel-Level Support
Poceedings of the IEEE Intentionl Confeence on Compute Communictions nd Netwoks Mimi, Floid, pp. 63-68, Octobe 2002. Implementtion nd Evlution of Tnspent Fult-Tolent Web Sevice with Kenel-Level Suppot
GFI MilAchive 6 vs Wtefod Technologies MilMete Achive GFI Softwe www.gfi.com GFI MilAchive 6 vs Wtefod Technologies MilMete Achive Genel fetues Suppots Micosoft Exchnge 2000, 2003 & 2007 Suppots distibuted
(1) continuity equation: 0. momentum equation: u v g (2) u x. 1 a
Comment on The effect of vible viscosity on mied convection het tnsfe long veticl moving sufce by M. Ali [Intentionl Jounl of Theml Sciences, 006, Vol. 45, pp. 60-69] Asteios Pntoktos Associte Pofesso
Screentrade Car Insurance Policy Summary
Sceentde C Insunce Policy Summy This is summy of the policy nd does not contin the full tems nd conditions of the cove, which cn be found in the policy booklet nd schedule. It is impotnt tht you ed the
GFI MilEssentils & GFI MilSecuity vs Tend Mico ScnMil Suite fo Micosoft Exchnge GFI Softwe www.gfi.com GFI MilEssentils & GFI MilSecuity vs Tend Mico ScnMil Suite fo Micosoft Exchnge Exchnge Seve 2000/2003
Summary: Vectors. This theorem is used to find any points (or position vectors) on a given line (direction vector). Two ways RT can be applied:
Summ: Vectos ) Rtio Theoem (RT) This theoem is used to find n points (o position vectos) on given line (diection vecto). Two ws RT cn e pplied: Cse : If the point lies BETWEEN two known position vectos
Random Variables and Distribution Functions
Topic 7 Rndom Vibles nd Distibution Functions 7.1 Intoduction Fom the univese of possible infomtion, we sk question. To ddess this question, we might collect quntittive dt nd ognize it, fo emple, using
(Ch. 22.5) 2. What is the magnitude (in pc) of a point charge whose electric field 50 cm away has a magnitude of 2V/m?
Em I Solutions PHY049 Summe 0 (Ch..5). Two smll, positively chged sphees hve combined chge of 50 μc. If ech sphee is epelled fom the othe by n electosttic foce of N when the sphees e.0 m pt, wht is the
Adaptive Control of a Production and Maintenance System with Unknown Deterioration and Obsolescence Rates
Int J of Mthemtic Sciences nd Appictions, Vo, No 3, Septembe Copyight Mind Rede Pubictions wwwjounshubcom Adptive Conto of Poduction nd Mintennce System with Unknown Deteiotion nd Obsoescence Rtes Fwzy
by K.-H. Rutsch*, P.J. Viljoen*, and H. Steyn* The need for systematic project portfolio selection
An investigtion into the cuent pctice of poject potfolio selection in esech nd development division of the South Aficn minels nd enegy industy by K.-H. Rutsch*, P.J. Viljoen*, nd H. Steyn* J o u n l Synopsis
for Student Service Members and Veterans in Indiana
Apil 2009 The Highe Eduction Lndscpe fo Student Sevice Membes nd Vetens in Indin Mtin Stenbeg, Shelley McDemid Wdswoth, Jo Vughn, nd Ryn Clson Mility Fmily Resech Institute t Pudue Univesity Suppot Len
Software Engineering and Development
I T H E A 67 Softwae Engineeing and Development SOFTWARE DEVELOPMENT PROCESS DYNAMICS MODELING AS STATE MACHINE Leonid Lyubchyk, Vasyl Soloshchuk Abstact: Softwae development pocess modeling is gaining
16. Mean Square Estimation
6 Me Sque stmto Gve some fomto tht s elted to uow qutty of teest the poblem s to obt good estmte fo the uow tems of the obseved dt Suppose epeset sequece of dom vbles bout whom oe set of obsevtos e vlble
Continuous Compounding and Annualization
Continuous Compounding and Annualization Philip A. Viton Januay 11, 2006 Contents 1 Intoduction 1 2 Continuous Compounding 2 3 Pesent Value with Continuous Compounding 4 4 Annualization 5 5 A Special Poblem
Curvature. (Com S 477/577 Notes) Yan-Bin Jia. Oct 8, 2015
Cuvtue Com S 477/577 Notes Yn-Bin Ji Oct 8, 205 We wnt to find mesue of how cuved cuve is. Since this cuvtue should depend only on the shpe of the cuve, it should not be chnged when the cuve is epmetized.
Marketing Logistics: Opportunities and Limitations
Mketig Logistics: Oppotuities d Limittios Pethip Vdhsidhu 1, Ugul Lpted 2 1 Gdute School, MBA i Itetiol Busiess, The Uivesity of the Thi Chmbe of Commece Vibhvdee-Rgsit Rod, Dideg, Bgkok, 10400, Thild
Intro to Circle Geometry By Raymond Cheong
Into to Cicle Geomety By Rymond Cheong Mny poblems involving cicles cn be solved by constucting ight tingles then using the Pythgoen Theoem. The min chllenge is identifying whee to constuct the ight tingle.
Introducing Kashef for Application Monitoring
WextWise 2010 Introducing Kshef for Appliction The Cse for Rel-time monitoring of dtcenter helth is criticl IT process serving vriety of needs. Avilbility requirements of 6 nd 7 nines of tody SOA oriented
r (1+cos(θ)) sin(θ) C θ 2 r cos θ 2
icles xmple 66: Rounding one ssume we hve cone of ngle θ, nd we ound it off with cuve of dius, how f wy fom the cone does the ound stt? nd wht is the chod length? (1+cos(θ)) sin(θ) θ 2 cos θ 2 xmple 67:
2.016 Hydrodynamics Prof. A.H. Techet
.016 Hydodynmics Reding #5.016 Hydodynmics Po. A.H. Techet Fluid Foces on Bodies 1. Stedy Flow In ode to design oshoe stuctues, suce vessels nd undewte vehicles, n undestnding o the bsic luid oces cting
In-stope bolting for a safer working environment
text:templte Jounl 2/3/10 9:34 AM Pge 47 In-stope bolting fo sfe woking envionment by P. Henning* nd P. Feei* J o u n l Synopsis Rock fll ccidents continue to be the min cuse of ftl nd seious injuies in
AntiSpyware Enterprise Module 8.5
AntiSpywre Enterprise Module 8.5 Product Guide Aout the AntiSpywre Enterprise Module The McAfee AntiSpywre Enterprise Module 8.5 is n dd-on to the VirusScn Enterprise 8.5i product tht extends its ility
How To Network A Smll Business
Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd processes. Introducing technology
Exam in physics, El-grunder (Electromagnetism), 2014-03-26, kl 9.00-15.00
Umeå Univesitet, Fysik 1 Vitly Bychkov Em in physics, El-gunde (Electomgnetism, 14--6, kl 9.-15. Hjälpmedel: Students my use ny book(s. Mino notes in the books e lso llowed. Students my not use thei lectue
HEALTHCARE INTEGRATION BASED ON CLOUD COMPUTING
U.P.B. Sci. Bull., Seies C, Vol. 77, Iss. 2, 2015 ISSN 2286-3540 HEALTHCARE INTEGRATION BASED ON CLOUD COMPUTING Roxana MARCU 1, Dan POPESCU 2, Iulian DANILĂ 3 A high numbe of infomation systems ae available
An Efficient Group Key Agreement Protocol for Ad hoc Networks
An Efficient Goup Key Ageement Potocol fo Ad hoc Netwoks Daniel Augot, Raghav haska, Valéie Issany and Daniele Sacchetti INRIA Rocquencout 78153 Le Chesnay Fance {Daniel.Augot, Raghav.haska, Valéie.Issany,
Things to Remember. r Complete all of the sections on the Retirement Benefit Options form that apply to your request.
Retiement Benefit 1 Things to Remembe Complete all of the sections on the Retiement Benefit fom that apply to you equest. If this is an initial equest, and not a change in a cuent distibution, emembe to
Small Business Cloud Services
Smll Business Cloud Services Summry. We re thick in the midst of historic se-chnge in computing. Like the emergence of personl computers, grphicl user interfces, nd mobile devices, the cloud is lredy profoundly
Small Business Networking
Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd business. Introducing technology
How To Reduce Telecommunictions Costs
Reducing your telecommunictions costs Reserch firm IDC 1 hs estimted tht VoIP system cn reduce telephony-relted expenses by 30%. Voice over Internet Protocol (VoIP) hs become vible solution for even the
INITIAL MARGIN CALCULATION ON DERIVATIVE MARKETS OPTION VALUATION FORMULAS
INITIAL MARGIN CALCULATION ON DERIVATIVE MARKETS OPTION VALUATION FORMULAS Vesion:.0 Date: June 0 Disclaime This document is solely intended as infomation fo cleaing membes and othes who ae inteested in
How To Set Up A Network For Your Business
Why Network is n Essentil Productivity Tool for Any Smll Business TechAdvisory.org SME Reports sponsored by Effective technology is essentil for smll businesses looking to increse their productivity. Computer
Department of Health & Human Services (DHHS) Centers for Medicare & Medicaid Services (CMS) Transmittal 1151 Date: November 16, 2012
nul ysem ub 100-20 One-Time Noificion Depmen of elh & umn evices (D) enes fo edice & edicid evices () Tnsmil 1151 De: Novembe 16, 2012 hnge eques 8124 UBJT: Use of Q6 odifie fo Locum Tenens by oviding
Small Business Networking
Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd processes. Introducing technology
Vendor Rating for Service Desk Selection
Vendor Presented By DATE Using the scores of 0, 1, 2, or 3, plese rte the vendor's presenttion on how well they demonstrted the functionl requirements in the res below. Also consider how efficient nd functionl
Small Business Networking
Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd business. Introducing technology
VoIP for the Small Business
Reducing your telecommunictions costs VoIP (Voice over Internet Protocol) offers low cost lterntive to expensive trditionl phone services nd is rpidly becoming the communictions system of choice for smll
ClearPeaks Customer Care Guide. Business as Usual (BaU) Services Peace of mind for your BI Investment
ClerPeks Customer Cre Guide Business s Usul (BU) Services Pece of mind for your BI Investment ClerPeks Customer Cre Business s Usul Services Tble of Contents 1. Overview...3 Benefits of Choosing ClerPeks
Modeling and Verifying a Price Model for Congestion Control in Computer Networks Using PROMELA/SPIN
Modeling and Veifying a Pice Model fo Congestion Contol in Compute Netwoks Using PROMELA/SPIN Clement Yuen and Wei Tjioe Depatment of Compute Science Univesity of Toonto 1 King s College Road, Toonto,
Over-encryption: Management of Access Control Evolution on Outsourced Data
Ove-encyption: Management of Access Contol Evolution on Outsouced Data Sabina De Capitani di Vimecati DTI - Univesità di Milano 26013 Cema - Italy [email protected] Stefano Paaboschi DIIMM - Univesità
Titanium: the innovators metal Historical case studies tracing titanium process and product innovation
Titnium: the innovtos metl Histoicl cse studies tcing titnium pocess nd poduct innovtion by S.J. Oosthuizen* J o u n l Synopsis This ppe exmines innovtion in eltion to the vilbility of new mteil: the metl
VoIP for the Small Business
Reducing your telecommunictions costs TechAdvisory.org SME Reports sponsored by Cybernut Solutions provides outsourced IT support from welth of knowledgeble technicins nd system dministrtors certified
VoIP for the Small Business
Reducing your telecommunictions costs Reserch firm IDC 1 hs estimted tht VoIP system cn reduce telephony-relted expenses by 30%. Voice over Internet Protocol (VoIP) hs become vible solution for even the
Small Business Networking
Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd processes. Introducing technology
VoIP for the Small Business
Reducing your telecommunictions costs Reserch firm IDC 1 hs estimted tht VoIP system cn reduce telephony-relted expenses by 30%. Voice over Internet Protocol (VoIP) hs become vible solution for even the
VoIP for the Small Business
Reducing your telecommunictions costs Reserch firm IDC 1 hs estimted tht VoIP system cn reduce telephony-relted expenses by 30%. Voice over Internet Protocol (VoIP) hs become vible solution for even the
Protocol Analysis. 17-654/17-764 Analysis of Software Artifacts Kevin Bierhoff
Protocol Anlysis 17-654/17-764 Anlysis of Softwre Artifcts Kevin Bierhoff Tke-Awys Protocols define temporl ordering of events Cn often be cptured with stte mchines Protocol nlysis needs to py ttention
Uncertain Version Control in Open Collaborative Editing of Tree-Structured Documents
Uncetain Vesion Contol in Open Collaboative Editing of Tee-Stuctued Documents M. Lamine Ba Institut Mines Télécom; Télécom PaisTech; LTCI Pais, Fance mouhamadou.ba@ telecom-paistech.f Talel Abdessalem
How To Get A Free Phone Line From A Cell Phone To A Landline For A Business
Reducing your telecommunictions costs Reserch firm IDC 1 hs estimted tht VoIP system cn reduce telephony-relted expenses by 30%. Voice over Internet Protocol (VoIP) hs become vible solution for even the
VoIP for the Small Business
VoIP for the Smll Business Reducing your telecommunictions costs Reserch firm IDC 1 hs estimted tht VoIP system cn reduce telephony-relted expenses by 30%. Voice over Internet Protocol (VoIP) hs become
The transport performance evaluation system building of logistics enterprises
Jounal of Industial Engineeing and Management JIEM, 213 6(4): 194-114 Online ISSN: 213-953 Pint ISSN: 213-8423 http://dx.doi.og/1.3926/jiem.784 The tanspot pefomance evaluation system building of logistics
Polynomial Functions. Polynomial functions in one variable can be written in expanded form as ( )
Polynomil Functions Polynomil functions in one vrible cn be written in expnded form s n n 1 n 2 2 f x = x + x + x + + x + x+ n n 1 n 2 2 1 0 Exmples of polynomils in expnded form re nd 3 8 7 4 = 5 4 +
Techniques for Requirements Gathering and Definition. Kristian Persson Principal Product Specialist
Techniques for Requirements Gthering nd Definition Kristin Persson Principl Product Specilist Requirements Lifecycle Mngement Elicit nd define business/user requirements Vlidte requirements Anlyze requirements
Combinatorial Testing for Tree-Structured Test Models with Constraints
Comintoil Testing fo Tee-Stutued Test Models with Constints Tkshi Kitmu, Akihis Ymd, Goo Htym, Cyille Atho, Eun-Hye Choi, Ngo Thi Bih Do, Yutk Oiw, Shiny Skugi Ntionl Institute of Advned Industil Siene
DRIVER BEHAVIOR MODELING USING HYBRID DYNAMIC SYSTEMS FOR DRIVER-AWARE ACTIVE VEHICLE SAFETY
DRIVER BEHAVIOR MODELING USING HYBRID DYNAMIC SYSTEMS FOR DRIVER-AWARE ACTIVE VEHICLE SAFETY Pin Boyz, Amdeep Sthynyn, John H.L. Hnsen Eik Jonsson School o Engineeing nd Compute Science Univesity o Texs
9:6.4 Sample Questions/Requests for Managing Underwriter Candidates
9:6.4 INITIAL PUBLIC OFFERINGS 9:6.4 Sample Questions/Requests fo Managing Undewite Candidates Recent IPO Expeience Please povide a list of all completed o withdawn IPOs in which you fim has paticipated
Network Configuration Independence Mechanism
3GPP TSG SA WG3 Security S3#19 S3-010323 3-6 July, 2001 Newbury, UK Source: Title: Document for: AT&T Wireless Network Configurtion Independence Mechnism Approvl 1 Introduction During the lst S3 meeting
The Role of Gravity in Orbital Motion
! The Role of Gavity in Obital Motion Pat of: Inquiy Science with Datmouth Developed by: Chistophe Caoll, Depatment of Physics & Astonomy, Datmouth College Adapted fom: How Gavity Affects Obits (Ohio State
VoIP for the Small Business
VoIP for the Smll Business Reducing your telecommunictions costs Reserch firm IDC 1 hs estimted tht VoIP system cn reduce telephony-relted expenses by 30%. Voice over Internet Protocol (VoIP) hs become
Health insurance marketplace What to expect in 2014
Health insurance marketplace What to expect in 2014 33096VAEENBVA 06/13 The basics of the marketplace As part of the Affordable Care Act (ACA or health care reform law), starting in 2014 ALL Americans must have minimum
Health insurance exchanges What to expect in 2014
Health insurance exchanges What to expect in 2014 33096CAEENABC 02/13 The basics of exchanges As part of the Affordable Care Act (ACA or health care reform law), starting in 2014 ALL Americans must have minimum amount
Module Availability at Regent's School of Drama, Film and Media Autumn 2016 and Spring 2017 *subject to change*
Availability at Regent's School of Drama, Film and Media Autumn 2016 and Spring 2017 *subject to change* 1. Choose your modules carefully You must discuss the module options available with your academic advisor/
Data replication in mobile computing
Technical Report, May 2010 Data replication in mobile computing Bachelor's Thesis in Electrical Engineering Rodrigo Christovam Pamplona HALMSTAD UNIVERSITY, IDE SCHOOL OF INFORMATION SCIENCE, COMPUTER AND ELECTRICAL
est using the formula I = Prt, where I is the interest earned, P is the principal, r is the interest rate, and t is the time in years.
9.2 Interest Objectives 1. Understand the simple interest formula. 2. Use the compound interest formula to find future value. 3. Solve the compound interest formula for different unknowns, such as the present value,
883 Brochure A5 GENE ss vernis.indd 1-2
EURAXESS Researchers in Motion is the gateway to attractive research careers in Europe and to a pool of world-class research talent. By supporting the mobility of researchers,
Characteristics of an effective selfdirected work team in the gold-mining industry
Characteristics of an effective selfdirected work team in the gold-mining industry by A. Nel* and J. Pienaar* Synopsis The gold mining industry in South Africa stands to benefit much from the implementation of
Sickness Absence Monitoring Report. Executive of the Council. Purpose of report
THE CITY OF EDINBURGH COUNCIL Sickness Absence Monitoring Report Executive of the Council 8th May 4 Purpose of report This report quantifies the amount of working time lost as a result of
An Approach to Optimized Resource Allocation for Cloud Simulation Platform
An Approach to Optimized Resource Allocation for Cloud Simulation Platform Haitao Yuan 1, Jing Bi 2, Bo Hu Li 1,3, Xudong Chai 3 1 School of Automation Science and Electrical Engineering, Beihang University,
Model-Driven Engineering of Adaptation Engines for Self-Adaptive Software: Executable Runtime Megamodels
Model-Driven Engineering of Adaptation Engines for Self-Adaptive Software: Executable Runtime Megamodels Thomas Vogel, Holger Giese Technische Berichte Nr. 66 des Hasso-Plattner-Instituts für Softwaresystemtechnik
Automatic Testing of Neighbor Discovery Protocol Based on FSM and TTCN*
Automatic Testing of Neighbor Discovery Protocol Based on FSM and TTCN* Zhiliang Wang, Xia Yin, Haibin Wang, and Jianping Wu Department of Computer Science, Tsinghua University Beijing, P. R. China, 100084 Email:
Give me all I pay for Execution Guarantees in Electronic Commerce Payment Processes
Give me all I pay for Execution Guarantees in Electronic Commerce Payment Processes Heiko Schuldt Andrei Popovici Hans-Jörg Schek Email: Database Research Group Institute of Information Systems ETH Zentrum, 8092
Fatigue knowledge a new lever in safety management
Fatigue knowledge a new lever in safety management by W.J. Theron* and G.M.J. van Heerden Synopsis The purpose of the paper is to give an introduction to the concept
Reasoning to Solve Equations and Inequalities
Lesson 4 Reasoning to Solve Equations and Inequalities In earlier work in this unit, you modeled situations with several variables and equations. For example, suppose you were given business plans for a concert showing
SOEPpapers on Multidisciplinary Panel Data Research
Deutsches Institut für Wirtschaftsforschung www.diw.de SOEPpapers on Multidisciplinary Panel Data Research 136 Thomas Cornelissen John S. Heywood Uwe Jirjahn Performance Pay, Risk Attitudes and Job Satisfaction Berlin, October
Database Management Systems
Contents Database Management Systems (COP 5725) Dr. Markus Schneider Department of Computer & Information Science & Engineering (CISE) Database Systems Research & Development Center Course Syllabus 1 Spring 2012
Unleashing the Power of Cloud
Unleashing the Power of Cloud A Joint White Paper by FusionLayer and NetIQ Copyright 2015 FusionLayer, Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted,
A framework for the selection of enterprise resource planning (ERP) system based on fuzzy decision making methods
A framework for the selection of enterprise resource planning (ERP) system based on fuzzy decision making methods Omid Golshan Tafti M.S. student in Industrial Management, University of Yazd [email protected]
Chapter 3 Savings, Present Value and Ricardian Equivalence
Chapter 3 Savings, Present Value and Ricardian Equivalence Chapter Overview In the previous chapter we studied the decision of households to supply hours to the labor market. This decision was a static decision,
Valuation of Floating Rate Bonds 1
Valuation of Floating Rate Bonds 1 Jorge Cruz Lopez Bus 316: Derivative Securities This note explains how to value plain vanilla floating rate bonds. The purpose of this note is to link the concepts that you learned
UNIT CIRCLE TRIGONOMETRY
UNIT CIRCLE TRIGONOMETRY The Unit Circle is the circle centered at the origin with radius 1 unit (hence, the "unit" circle). The equation of this circle is $x^2 + y^2 = 1$. A diagram of the unit circle is shown below:
DOCTORATE DEGREE PROGRAMS
DOCTORATE DEGREE PROGRAMS Application For Admission 2015-2016 5700 College Road, Lisle, Illinois 60532 Enrollment Center Phone: (630) 829-6300 Outside Illinois: (888) 829-6363 FAX: (630) 829-6301 Email: [email protected]
