Remote-scope promotion: clarified, rectified, verified. John Wickerson (Imperial), Mark Batty (Kent), Ally Donaldson (Imperial), Brad Beckmann (AMD). REMS workshop, Cambridge, 22 April 2015
In brief: RSP is a GPGPU language extension from AMD that enables efficient work-stealing. We worked with AMD to formalise their design (at language and HW level). This led to a corrected and improved implementation. Formalise early in the design process!
This talk: 1. What is RSP? 2. Adding RSP to the OpenCL memory model 3. A formalised implementation of OpenCL+RSP
C11: flat thread structure [diagram: a flat row of threads, including T4 and T5]
OpenCL: thread groupings [diagram: devices contain workgroups, workgroups contain threads such as T4 and T5]
GPUs: hierarchical memory [diagram: one L1 cache per workgroup, one L2 cache per device, and a shared global memory]
Memory scopes
Memory scopes: store(x,42) load(x) [diagram: two workgroups]
Memory scopes: store(x,42,wg) load(x,wg) [diagram: two workgroups]
Memory scopes: store(x,42,wg) load(x,wg) faulty! [diagram: two workgroups]
Memory scopes: store(x,42,dv) load(x,dv) ok! [diagram: two workgroups]
Memory scopes: store(x,42,dv) load(x,wg) faulty!* [diagram: two workgroups] *...but the OpenCL standard could easily be extended to allow this
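The verdicts on the memory-scope slides can be written down as a small check. This is a minimal sketch, not the OpenCL standard's definition: `Access`, its fields, and `scopes_suffice` are hypothetical names introduced here, and the rule for two accesses in the same workgroup is an assumption, since the slides only give verdicts for the cross-workgroup cases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    workgroup: str  # workgroup of the thread performing the access
    scope: str      # "wg" = workgroup scope, "dv" = device scope


def scopes_suffice(a: Access, b: Access) -> bool:
    """Return True if the scopes of the two accesses cover each other."""
    if a.workgroup == b.workgroup:
        # Assumed: within one workgroup, wg scope (or wider) is enough.
        return True
    # Across workgroups, plain OpenCL needs device scope on both sides.
    return a.scope == "dv" and b.scope == "dv"


print(scopes_suffice(Access("A", "wg"), Access("B", "wg")))  # False: faulty
print(scopes_suffice(Access("A", "dv"), Access("B", "dv")))  # True: ok
print(scopes_suffice(Access("A", "dv"), Access("B", "wg")))  # False: faulty
```

The last case is the one the footnote says the standard could be extended to allow; the sketch follows the standard as described on the slide.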
Work-stealing: store(head_a,_,wg) //push store(head_a,_,wg) //pop [diagram: workgroup A owns a queue (tail_a, head_a), workgroup B owns a queue (tail_b, head_b)]
Work-stealing: store(head_a,_,wg) //push store(head_a,_,wg) //pop store(head_a,_,???) //steal [diagram: workgroup A (tail_a, head_a), workgroup B (tail_b, head_b)] No way to plug this hole in OpenCL!
Remote-scope promotion: store(x,42,dv) load(x,dv)
Remote-scope promotion: store(x,42,wg) load(x,dv)
Remote-scope promotion: store(x,42,wg) load(x,dv,remote)
Remote-scope promotion: store(x,42,dv,remote) load(x,wg)
Work-stealing: store(head_a,_,wg) //push store(head_a,_,wg) //pop store(head_a,_,dv,remote) //steal [diagram: workgroup A (tail_a, head_a), workgroup B (tail_b, head_b)]
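The effect of the remote flag on the work-stealing example can be sketched as a scope check. This is an illustration under stated assumptions, not the formal model: `Access` and `scopes_suffice_rsp` are hypothetical names, the flag is assumed to be meaningful only on device-scope accesses (the only form the slides show), and same-workgroup accesses are assumed to be fine at workgroup scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    workgroup: str
    scope: str            # "wg" or "dv"
    remote: bool = False  # RSP's remote flag


def scopes_suffice_rsp(a: Access, b: Access) -> bool:
    """Scope check with remote-scope promotion (sketch)."""
    if a.workgroup == b.workgroup:
        return True  # assumed: wg scope is enough inside one workgroup
    if a.scope == "dv" and b.scope == "dv":
        return True  # plain OpenCL: device scope on both sides
    # RSP: a remote device-scope access promotes the other side's scope.
    return (a.scope == "dv" and a.remote) or (b.scope == "dv" and b.remote)


push = Access("A", "wg")                 # owner pushes at workgroup scope
pop = Access("A", "wg")                  # owner pops at workgroup scope
steal = Access("B", "dv", remote=True)   # thief in another workgroup
plain = Access("B", "dv")                # same thief without the remote flag

print(scopes_suffice_rsp(push, pop))    # True
print(scopes_suffice_rsp(push, steal))  # True
print(scopes_suffice_rsp(push, plain))  # False
```

The point of the design is visible in the first and second results: the owner keeps its cheap workgroup-scope accesses, and only the (rare) steal pays for the wider scope.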
This talk: 2. Adding RSP to the OpenCL memory model
Scope inclusion: We present in Fig. 2 the full OpenCL model (including barriers) in the .cat language. In the following, we discuss changes that must be made to the model, which are highlighted in the figure.
In C11ra: sw = rf ∩ (≠thd) ∩ (... na) [final conjunct garbled in extraction]
In OpenCLra: sw = rf ∩ (≠thd) ∩ (=incl) [the definition of =incl, which mentions na, =thd, WG, =wg and DV, is garbled in extraction]
In OpenCL+RSP: [the definition of =incl gains cases for remote accesses; garbled in extraction]
Testing OpenCL+RSP programs: We extended Herd to support the new memory model. We simulated the 12 litmus tests designed by the AMD developers to define their expectations of RSP. We found 8 were good, but: 2 had unintentional races, 1 enforced broken behaviour, and 1 forbade reasonable behaviour. We also found (and fixed) bugs in their work-stealing queue implementation.
This talk: 3. A formalised implementation of OpenCL+RSP
Implementing RSP: Model of GPU hardware. Assembly-like language, with instructions modelled as state transformers. Mapping from OpenCL+RSP operations to sequences of assembly instructions. Can then prove that all behaviours of the compiled program are allowed by the OpenCL+RSP MM.
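To make "instructions modelled as state transformers" concrete, here is a minimal sketch in which each instruction is a function from machine state to machine state. It is not the formal hardware model from the talk. The state layout, the dirty bit, and the cache policies (stores land in L1, a flush writes dirty lines to L2, an invalidate drops clean lines) are assumptions made for illustration, and each instruction takes an explicit workgroup name in place of the slides' `WG`/`DV` operand.

```python
import copy
from functools import reduce


def initial(workgroups, memory):
    """Machine state: one L1 per workgroup, a shared L2, registers."""
    return {
        "l1": {wg: {} for wg in workgroups},   # loc -> (value, dirty)
        "l2": dict(memory),
        "regs": {wg: {} for wg in workgroups},
    }


def ST(wg, loc, val):
    def step(s):
        s = copy.deepcopy(s)
        s["l1"][wg][loc] = (val, True)         # assumed: write lands in L1
        return s
    return step


def LD(wg, reg, loc):
    def step(s):
        s = copy.deepcopy(s)
        if loc not in s["l1"][wg]:             # miss: fill from L2
            s["l1"][wg][loc] = (s["l2"][loc], False)
        s["regs"][wg][reg] = s["l1"][wg][loc][0]
        return s
    return step


def FLUL1(wg):
    def step(s):
        s = copy.deepcopy(s)
        for loc, (val, dirty) in list(s["l1"][wg].items()):
            if dirty:                          # write dirty lines back to L2
                s["l2"][loc] = val
                s["l1"][wg][loc] = (val, False)
        return s
    return step


def INVL1(wg):
    def step(s):
        s = copy.deepcopy(s)
        # Assumed: only clean lines are dropped; dirty lines stay.
        s["l1"][wg] = {l: e for l, e in s["l1"][wg].items() if e[1]}
        return s
    return step


def run(state, program):
    """A program is a sequence of transformers applied left to right."""
    return reduce(lambda s, instr: instr(s), program, state)


s0 = initial(["A", "B"], {"x": 0})
flushed = run(s0, [ST("A", "x", 42), FLUL1("A"), LD("B", "r", "x")])
unflushed = run(s0, [ST("A", "x", 42), LD("B", "r", "x")])
print(flushed["regs"]["B"]["r"])    # 42
print(unflushed["regs"]["B"]["r"])  # 0
```

Because every instruction returns a fresh state, a compiled program is just a composition of functions, which is the shape a correctness proof can reason about step by step.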
New compilation scheme
                | na or WG  | DV (not remote)               | DV (remote)
r=load(x)       | LD r x    | LD r x; INVL1 WG              | LD r x, FLUL1 DV, INVL1 WG
store(x,r)      | ST r x    | FLUL1 WG; ST r x              | FLUL1 WG, INVL1 DV, ST r x (bracket: LK rmw)
r=fetch_inc(x)  | INCL1 r x | FLUL1 WG; INCL2 r x; INVL1 WG | FLUL1 WG, INVL1 DV, INCL2 r x, FLUL1 DV, INVL1 WG (bracket: LK rmw)
[DV (remote) column: instructions listed in extraction order; their sequence and the extent of each LK bracket are not recoverable from the flattened slide]
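The two non-remote columns of the new scheme can be read as a lookup from (operation, scope) to an instruction sequence. The sketch below encodes only those two columns; the remote column is left out because its instruction order cannot be read reliably from the flattened slide. `SCHEME`, `compile_op`, and `compile_program` are hypothetical names, and the instructions are kept as the literal strings from the slide.

```python
# (operation, scope) -> assembly sequence, non-remote columns only.
SCHEME = {
    ("load", "wg"): ["LD r x"],
    ("store", "wg"): ["ST r x"],
    ("fetch_inc", "wg"): ["INCL1 r x"],
    ("load", "dv"): ["LD r x", "INVL1 WG"],
    ("store", "dv"): ["FLUL1 WG", "ST r x"],
    ("fetch_inc", "dv"): ["FLUL1 WG", "INCL2 r x", "INVL1 WG"],
}


def compile_op(op, scope):
    """Map one operation to its instruction sequence."""
    # Non-atomic ("na") accesses share a column with workgroup scope.
    column = "wg" if scope in ("na", "wg") else scope
    return list(SCHEME[(op, column)])


def compile_program(ops):
    """Compile a list of (operation, scope) pairs in program order."""
    return [instr for op, scope in ops for instr in compile_op(op, scope)]


print(compile_op("store", "dv"))
# ['FLUL1 WG', 'ST r x']
print(compile_program([("store", "na"), ("store", "dv")]))
# ['ST r x', 'FLUL1 WG', 'ST r x']
```

Read this way, a device-scope store flushes before it writes, and a device-scope load invalidates after it reads.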
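Old compilation scheme
                | na or WG  | DV (not remote)               | DV (remote)
r=load(x)       | LD r x    | INVL1 WG; LD r x              | FLUL1 DV, INVL1 WG, LD r x (bracket: LK x)
store(x,r)      | ST r x    | FLUL1 WG; ST r x              | FLUL1 WG, ST r x, INVL1 DV (bracket: LK x)
r=fetch_inc(x)  | INCL1 r x | FLUL1 WG; INVL1 WG; INCL2 r x | FLUL1 DV, INVL1 WG, INCL2 r x, INVL1 DV (brackets: LK x, LK rmw)
[DV (remote) column: instructions listed in extraction order; their sequence and the extent of each LK bracket are not recoverable from the flattened slide]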
Contributions: Extended OpenCL MM to include RSP. Extended Herd with new MM, and used it to find and fix bugs in RSP litmus tests and programs. Formalised implementation of RSP (model of GPU hardware + assembly language). Found and fixed bugs in original compilation scheme. Proved new improved scheme correct.
Spare slides
A corrupt MP: *x=42; store(y,1,dv); || if(load(y,dv)==1) print(*x);
A corrupt MP (compiled): ST x 42; FLUL1 WG; ST y 1; || INVL1 WG; LD y; //1 LD x;
A corrupt MP (animation): the code ST x 42; FLUL1 WG; ST y 1; || INVL1 WG; LD y; //1 LD x; stays fixed while the cache contents in the diagram build up:
1. x=0
2. x=42 x=0
3. x=42 x=0 x=42
4. x=42 y=1 x=0 x=42
5. x=42 y=1 x=0 x=42 y=1
6. x=42 y=1 x=0 y=1 x=42 y=1
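One way such a message-passing failure can arise is replayed below. This is a sketch, not the scenario from the slides: the cache model, the `replay` function, the extra step that refills the reader's L1 with x=0 after its invalidate, and the write-back of y=1 to L2 are all assumptions. It contrasts invalidating before the load of y with invalidating after it.

```python
def replay(invalidate_first):
    """Replay one interleaving; return the value read from x and the caches."""
    l2 = {"x": 0, "y": 0}
    l1 = {"A": {}, "B": {}}
    dirty = {"A": set(), "B": set()}

    def load(wg, loc):
        if loc not in l1[wg]:          # miss: fill from L2
            l1[wg][loc] = l2[loc]
        return l1[wg][loc]

    def store(wg, loc, val):
        l1[wg][loc] = val
        dirty[wg].add(loc)

    def flush(wg):                     # write dirty lines back to L2
        for loc in dirty[wg]:
            l2[loc] = l1[wg][loc]
        dirty[wg].clear()

    def invalidate(wg):                # drop clean lines
        for loc in [l for l in l1[wg] if l not in dirty[wg]]:
            del l1[wg][loc]

    if invalidate_first:
        invalidate("B")                # INVL1 WG before LD y
    load("B", "x")                     # assumed: x=0 gets cached in B's L1
    store("A", "x", 42)                # ST x 42
    flush("A")                         # FLUL1 WG
    store("A", "y", 1)                 # ST y 1
    flush("A")                         # assumed: y=1 is written back to L2
    y = load("B", "y")                 # LD y, reads 1
    if not invalidate_first:
        invalidate("B")                # INVL1 WG after LD y
    x = load("B", "x")                 # LD x
    return x, y, l1, l2


x_old, y_old, l1_old, l2_old = replay(invalidate_first=True)
x_new, y_new, _, _ = replay(invalidate_first=False)
print(y_old, x_old)  # 1 0  (flag seen, stale data read)
print(y_new, x_new)  # 1 42
```

With the invalidate first, the run ends with x=42 y=1 in A's L1, x=0 y=1 in B's L1, and x=42 y=1 in L2, which is consistent with the values listed in the final animation step, although the slide text does not say which cache holds which values.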