Maximum Entropy, Parallel Computation and Lotteries

S.J. Cox
Department of Electronics and Computer Science, University of Southampton, UK.
G.J. Daniell
Department of Physics and Astronomy, University of Southampton, UK.
D.A. Nicole
Department of Electronics and Computer Science, University of Southampton, UK.

Abstract
By picking unpopular sets of numbers in a lottery, it is possible to increase one's expected winnings. We have used the Maximum Entropy method to estimate the probability of each of the 14 million tickets being chosen by players in the UK National Lottery. We discuss the parallel solution of the non-linear system of equations on a variety of platforms and give results which indicate the returns achieved by a syndicate buying a large number of tickets each week.

Keywords: Maximum Entropy, Lottery, Parallel Computation, Commodity Supercomputing.

1 Introduction
In many lotteries the prizes which players win depend on the number of other winners. In the example of the UK National Lottery, which we use throughout this paper, players pick 6 numbers from 1 to 49. A similar system operates in many lotteries across the world: the Florida state lottery also allows choice of 6 numbers from 49, whilst in the California State lottery (Superlotto) players pick 6 from 51. For the UK National Lottery, 6 main numbers and a bonus number are drawn every Wednesday and Saturday. Players are awarded a fixed £10 prize if they match 3 of the main numbers. The prizes in the other categories depend on the numbers of winners and are typically £62 for a 4-match, £1500 for a 5-match, £100 000 for matching 5 of the main numbers and the bonus, while a typical jackpot winner receives around £2 000 000 [1, 2, 3]. The prize fund is made up of 45 pence from every pound ticket bought.

In a previous paper [4] we applied the Maximum Entropy method to elicit structure in players' choices of numbers starting from the published numbers of prize winners. We estimated the popularity of each of the 14 million tickets, from which we computed the popularity of individual numbers and pairs of numbers. A crude calculation showed that it is possible to double one's expected winnings when purchasing a single unpopular ticket.
In this paper we focus on the parallel solution of the system of non-linear equations which results from the application of the Maximum Entropy method and show the returns to a syndicate buying a large number of tickets. The layout of the paper is as follows. In section 2 we discuss the application of the
Maximum Entropy method. We discuss the nature of the parallel solution of the resulting set of non-linear equations and give some of the first scientific results using Fortran with MPI on a commodity cluster of DEC Alpha workstations running Windows NT [5] in section 3. The new results we present in section 4 show that a syndicate which buys around 75000 tickets per week would benefit from choosing the unpopular numbers which we can identify. We draw our conclusions in section 5.

2 Application of Maximum Entropy
We wish to determine the probability of each of the possible 13 983 816 tickets being bought, subject to the constraints that the probabilities are consistent with the numbers of winners observed in the draws so far. The data is available from an independent internet source [6]. Jaynes' Maximum Entropy Principle says that if one is forced to assign probabilities, $p_i$, using limited information, one should do so by maximising the entropy of the distribution:

$$S = -\sum_i p_i \log p_i, \qquad (1)$$

subject to the constraints of known expectation values [7]. This Maximum Entropy distribution is the most conservative assignment in the sense that it does not permit one to draw any conclusions not warranted by the data [8].

We use the following notation [4]. Each ticket is denoted by a single index $t$, which is an abbreviation for six numbers, chosen without repetition, from the set of integers {1, 2,..., 49}. $P(t)$ is the probability that a player chooses the ticket labelled $t$. The winning set of numbers drawn in a particular week is denoted by $r$. Let

$$\delta_n(t, r) = \begin{cases} 1 & \text{if } t \text{ and } r \text{ have exactly } n \text{ numbers in common,} \\ 0 & \text{otherwise.} \end{cases} \qquad (2)$$

The expected fraction of players matching exactly $n$ numbers, in a week when the winning numbers are indexed by $r$, is then given by $f_n(r)$ where:

$$f_n(r) = \sum_t \delta_n(t, r) P(t). \qquad (3)$$

Suppose $W$ lottery draws have been made; then the values of $f_3(r)$, $f_4(r)$, and $f_5(r)$ are known for $W$ different values of $r$: $r_1, r_2, ..., r_W$. Equation (3) then leads to a set of $3W$ constraints that apply to the distribution $P(t)$. We assume that $P(t)$ is independent of time and, since players make independent samples from $P(t)$, the number of players buying ticket $t$ follows a Poisson distribution with parameter $\mu(t) = P(t) N$, when $N$ tickets are sold.
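As an illustration of definitions (2) and (3), the following minimal Python sketch evaluates the match indicator and the expected fraction of winners. The function names, the small 3-from-5 lottery and the uniform choice of $P(t)$ are assumptions made purely for illustration; they are not the authors' code or data.

```python
from itertools import combinations
from math import comb

def delta(n, ticket, winning):
    """Eq. (2): 1 if `ticket` and `winning` share exactly n numbers, else 0."""
    return 1 if len(set(ticket) & set(winning)) == n else 0

def expected_fraction(n, winning, P):
    """Eq. (3): f_n(r) = sum over tickets t of delta_n(t, r) * P(t).

    `P` is a dict mapping each ticket (a tuple of numbers) to its probability.
    """
    return sum(delta(n, t, winning) * p for t, p in P.items())

# Number of distinct tickets when choosing 6 numbers from 49.
n_tickets = comb(49, 6)

# Hypothetical small lottery: choose 3 numbers from 5, every ticket equally popular.
tickets = list(combinations(range(1, 6), 3))
P_uniform = {t: 1.0 / len(tickets) for t in tickets}

winning = (1, 2, 3)
f2 = expected_fraction(2, winning, P_uniform)  # fraction of players matching exactly 2
f3 = expected_fraction(3, winning, P_uniform)  # fraction of players matching exactly 3
```

With a uniform $P(t)$ the fractions are simply the combinatorial odds; any departure of the observed fractions from these odds is the information that the constraints feed into the Maximum Entropy estimate.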
We find that the maximum entropy estimate of the popularity of each ticket is $\hat{P}(t)$:
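$$\hat{P}(t) = \frac{1}{Z} \exp\Big[ \sum_{n,r} \lambda_n^r \, \delta_n(t, r) \Big], \qquad (4)$$

in which $Z$, the partition function, normalises the probability distribution:

$$Z = \sum_t \exp\Big[ \sum_{n,r} \lambda_n^r \, \delta_n(t, r) \Big]. \qquad (5)$$

To find the unknown Lagrange multipliers, $\lambda_n^r$, we substitute $\hat{P}(t)$ from (4) into the constraint equations (3). For $W$ draws, the constraint equations define a set of $3W$ non-linear equations for the Lagrange multipliers, $\lambda_n^r$:

$$f_n(r) = \frac{1}{Z} \sum_t \delta_n(t, r) \exp\Big[ \sum_{n',r'} \lambda_{n'}^{r'} \, \delta_{n'}(t, r') \Big]. \qquad (6)$$

3 Parallel Computational Method
To solve the non-linear equations defined by (6), we use an iterative technique based on Newton's method [9] in which we supply the analytic Jacobian. The procedure converges in 5-8 iterations. The equations (6) may be written as:

$$G_n^r = \sum_t \{ \delta_n(t, r) - f_n(r) \} \exp\Big[ \sum_{n',r'} \lambda_{n'}^{r'} \, \delta_{n'}(t, r') \Big] = 0, \qquad (7)$$

which yields the following explicit elements of the Jacobian, where $i$ and $s$ each label one constraint, that is one pair $(n_i, r_i)$:

$$J_{is} = \frac{\partial G_i}{\partial \lambda_s} = \sum_t \delta_{n_s}(t, r_s) \{ \delta_{n_i}(t, r_i) - f_{n_i}(r_i) \} \exp\Big[ \sum_{n,r} \lambda_n^r \, \delta_n(t, r) \Big]. \qquad (8)$$

Each iteration updates the Lagrange multipliers using

$$\lambda_i^{(new)} = \lambda_i^{(old)} - \sum_s (J^{-1})_{is} \, G_s. \qquad (9)$$

We have designed an efficient parallel algorithm to solve the system of non-linear equations (7) which consists of two parts:

1. The Jacobian for the system is filled in parallel, by dividing up the sum over $t$ (the 14 million tickets) in (8) between the processors.

2. Calculation of the Jacobian may be expressed as computation of the partition function (5). Using the aggregate memory on the multiple processors, it is possible to store a lookup table from which the partition function may be computed easily.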
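The Newton iteration of equations (7) to (9) can be sketched serially in Python on a very small problem. Everything named here is an assumption for illustration: the 3-from-5 lottery, the single draw, the target fractions `f_target`, and the small Gaussian elimination routine standing in for whatever linear solver the authors used. It is not the Fortran and MPI implementation described in the paper.

```python
from itertools import combinations
from math import exp, log

def overlap_flags(ticket, constraints):
    """delta_n(t, r) for each constraint (n, r), as a list of 0/1 flags."""
    return [1 if len(set(ticket) & set(r)) == n else 0 for n, r in constraints]

def residual_and_jacobian(lam, flags_per_ticket, f):
    """Accumulate G (eq. 7) and J (eq. 8) by summing over every ticket."""
    k = len(lam)
    G = [0.0] * k
    J = [[0.0] * k for _ in range(k)]
    for d in flags_per_ticket:
        weight = exp(sum(l * x for l, x in zip(lam, d)))  # unnormalised P(t)
        for i in range(k):
            term = (d[i] - f[i]) * weight
            G[i] += term
            for s in range(k):
                if d[s]:
                    J[i][s] += term
    return G, J

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda row: abs(M[row][c]))
        M[c], M[p] = M[p], M[c]
        for row in range(c + 1, n):
            factor = M[row][c] / M[c][c]
            for col in range(c, n + 1):
                M[row][col] -= factor * M[c][col]
    x = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(M[row][col] * x[col] for col in range(row + 1, n))
        x[row] = (M[row][n] - tail) / M[row][row]
    return x

def newton(constraints, f, tickets, tol=1e-12, max_iter=50):
    """Iterate eq. (9), starting from all multipliers equal to zero."""
    flags = [overlap_flags(t, constraints) for t in tickets]
    lam = [0.0] * len(constraints)
    for _ in range(max_iter):
        G, J = residual_and_jacobian(lam, flags, f)
        step = solve(J, G)
        lam = [l - s for l, s in zip(lam, step)]
        if max(abs(s) for s in step) < tol:
            break
    return lam

# Hypothetical data: one draw {1,2,3}; 50% of players match 2 and 20% match 3.
tickets = list(combinations(range(1, 6), 3))
constraints = [(2, (1, 2, 3)), (3, (1, 2, 3))]
f_target = [0.5, 0.2]
lam = newton(constraints, f_target, tickets)

# For a single draw the answer is known in closed form: 3 tickets match one
# number (weight 1), 6 match two and 1 matches three, so Z = 10 and the
# weights are exp(lam[0]) = 5/6 and exp(lam[1]) = 2.
closed_form = [log(5 / 6), log(2)]
```

In the parallel scheme described above, the loop over tickets in `residual_and_jacobian` is the part that would be divided between processors, with the partial sums for G and J combined afterwards.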
The lookup table yields which tickets won prizes in which weeks. For each week of data this table has respectively 246 820, 13 545, and 258 entries from tickets winning 3, 4, and 5 match prizes. To store the lookup table for a few hundred weeks of data requires several hundred Mb of memory.

To illustrate the storage scheme for the partition function, we consider a simple lottery in which three numbers are chosen out of {1, 2, 3, 4, 5}. Prizes are awarded for those tickets matching 2 or 3 numbers. Let the winning numbers drawn be {1,2,3}, {1,3,4}, and {1,4,5}. In this case ticket {1,2,3}, for example, won a 3-match in week 1 and a 2-match in week 2. Its contribution to the partition function (5) is $\exp(\lambda_3^1 + \lambda_2^2)$. For convenience, each Lagrange multiplier is labelled by a single index: $\lambda_m = \lambda_n^r$, where $m = (n-2)W + r$, with $n = 2, 3$ and the week index $r = 1, ..., W$ (here $W = 3$). The lookup table consists of a counted array of the prizes each ticket has won, labelled by $m$. The first element for each ticket is the number of prizes the ticket has won, followed by a list of the labels $m$. For our example ticket, the entries would thus be 2 (the number of prizes won), 4 (3-match in week 1), and 2 (2-match in week 2). To reduce the storage further, it is possible to combine the first two table entries for each ticket into a single number by shifting one of the numbers to the left and adding. This has the advantage that the size of the lookup table is reduced and it grows by a fixed amount as more weeks of data are used.

Each processor evaluates the partition function over its set of tickets, and the final result is obtained using a global reduction. The Jacobian matrix is filled in an analogous manner.

In Figure 1 we show the processing time for 73 weeks of data. The efficiency on the 16 node SP system is just over 90%. Use of the lookup table is memory intensive: indeed the single node performance of the code is limited by the available memory bandwidth. The SP thin2 nodes have twice the memory bandwidth of the thin1 nodes (and a slightly larger cache) and perform nearly twice as fast.
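The storage scheme described above for the toy 3-from-5 lottery can be sketched in Python as follows. The function names, the 16-bit shift used for packing and the split into four chunks (standing in for four processors) are illustrative assumptions, not details taken from the authors' implementation.

```python
from itertools import combinations
from math import comb, exp

def build_lookup(tickets, draws, prize_matches):
    """Counted array per ticket: [number of prizes, label m, label m, ...]."""
    W = len(draws)
    n_min = min(prize_matches)
    table = {}
    for t in tickets:
        labels = []
        for week, r in enumerate(draws, start=1):
            n = len(set(t) & set(r))
            if n in prize_matches:
                labels.append((n - n_min) * W + week)  # m = (n - 2) W + week here
        table[t] = [len(labels)] + labels
    return table

def partial_partition(entries, lam):
    """Sum of exp(sum of multipliers) over one processor's share of the tickets.

    `lam` is indexed by the label m, so lam[0] is unused.
    """
    return sum(exp(sum(lam[m] for m in entry[1:])) for entry in entries)

def pack(entry, shift=16):
    """Merge the prize count and the first label into one integer (assumed 16-bit shift)."""
    count = entry[0]
    first = entry[1] if count else 0
    return [(count << shift) + first] + entry[2:]

draws = [(1, 2, 3), (1, 3, 4), (1, 4, 5)]
tickets = list(combinations(range(1, 6), 3))
table = build_lookup(tickets, draws, prize_matches=(2, 3))

entry = table[(1, 2, 3)]      # 3-match in week 1 (m = 4), 2-match in week 2 (m = 2)
packed = pack(entry)

# Emulate a global reduction: each of 4 hypothetical processors sums its own
# share of the tickets and the partial results are then added together.
lam = [0.0] * 7               # all multipliers zero, so every ticket has weight 1
entries = list(table.values())
chunks = [entries[p::4] for p in range(4)]
Z = sum(partial_partition(chunk, lam) for chunk in chunks)

# Entries per week in the real 6-from-49 lottery for 3, 4 and 5 matches.
entries_per_week = [comb(6, n) * comb(43, 6 - n) for n in (3, 4, 5)]
```

Because each ticket's entry already lists the multipliers that apply to it, evaluating the partition function needs no set comparisons at all, only table reads and additions, which is consistent with the observation that performance is bound by memory bandwidth.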
The interconnection network between processors on our commodity cluster of DEC Alpha workstations is 100 Mbit switched Ethernet. At the time of writing (Feb 1998), the 0.92 Beta release MPI implementation on NT 4.0 [10] which we are using limits the efficiency for running parallel jobs: it is intended for use on shared memory systems. It achieves a bandwidth of 56 kb s⁻¹, compared with a file transfer bandwidth of 4-7 Mb s⁻¹. We supplied our own Fortran bindings for this [5]. Whilst our results should not be interpreted as a benchmark of the performance of MPI on NT, we are encouraged by the speedup (15%) obtained on two nodes.
[Figure 1: plot of time (seconds, 0 to 4000) against number of processors (1 to 16) for IBM SP (Thin1), IBM SP (Thin2) and the 500 MHz DEC Alpha cluster.]

Figure 1 Total processing time for 73 weeks of data on various machines with parallel construction of the Jacobian matrix and table lookup to calculate the partition function.

Whilst a speedup of 14.5 is derived from using a 16 node machine by a simple parallelisation of the algorithm, it is worth noting that implementation of the lookup table made the code run 17 times faster! Good distributed parallel algorithms should exploit not only the processing power, but also the aggregate memory available on several processors. We used 6 thin2 SP nodes to compute the Lagrange multipliers for W ≥ 100 and used task farming to our 8 node DEC cluster for W < 100. All other calculations, which did not require significant memory resources, were performed using the commodity cluster.

4 Results
In the UK, a number of organisations buy a large number of tickets each week and distribute them for advertising purposes. We have considered a syndicate which buys 75000 tickets each week. If such a syndicate bought tickets at random, then the expected prizes in the various categories would be the average £10 (fixed), £62, £1500, £100 000, and £2 000 000 for matching 3, 4, 5, 5 plus bonus, and 6 numbers respectively. We have considered a syndicate which buys, in the next draw, the least popular 75000 tickets which our Maximum Entropy technique can identify using the previous W weeks of data. We recompute the prizes in the next draw as if our syndicate had bought these tickets, taking into account the effect of rollovers and super draws (where the jackpot is topped up) using the published prize structure [6]. In Table 1 we compare what such a syndicate would have won using the real lottery data with the average
prizes won over the first 224 weeks of the lottery. In all cases where picking unpopular tickets can increase the prize won, the prizes won are increased by between 36% and 101%.

                                              Jackpot   5 + bonus   5 match   4 match   3 match
Maximum Entropy Syndicate Average Prize (£)  3 307 196    210 821      2690        87        10
Observed Prize (£)                           1 946 366    104 881      1574        64        10
% Increase                                          70        101        71        36   (Fixed)

Table 1 Average prizes won by syndicate compared with average prizes observed from the lottery data

In Figure 2 we show the syndicate's average return on their total investment as the draws progress. The peaks in the graph occur when the syndicate won 5+bonus prizes (draws 73, 76, 105, 109, 133, 222) or a jackpot prize (draw 133). In total the syndicate spent £15.3 million and won back £10.3 million. This compares with the theoretical £6.9 million expected from buying tickets at random. Our syndicate would have won 50% more money using our Maximum Entropy technique. It is important to note that the chances of winning are unaffected: the additional winnings are only due to picking unpopular sets of winning numbers.

[Figure 2: plot of average return per ticket (pence, 0 to 100) against draw number W (0 to 250), with a reference line marking the theoretical return from an average ticket.]

Figure 2 Average return per ticket for syndicate buying 75000 tickets per week as a function of draw number
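The percentage increases in Table 1 and the overall return quoted in this section can be checked with a few lines of arithmetic. The sketch below uses only the figures printed above; the variable names are illustrative, and the totals are the rounded values given in the text, so the derived figures are approximate.

```python
# Average prizes (pounds) from Table 1: jackpot, 5 + bonus, 5 match, 4 match.
syndicate = [3307196, 210821, 2690, 87]
observed = [1946366, 104881, 1574, 64]

# Percentage increase in each category, rounded to the nearest whole percent.
increase = [round(100 * (s / o - 1)) for s, o in zip(syndicate, observed)]

# Totals quoted in the text (pounds).
spent = 15.3e6            # total stake over the draws played
won = 10.3e6              # winnings of the Maximum Entropy syndicate
random_expected = 6.9e6   # theoretical winnings when buying at random

return_pence = round(100 * won / spent)                  # pence returned per pound staked
random_pence = round(100 * random_expected / spent)      # pence per pound for random tickets
gain_percent = round(100 * (won / random_expected - 1))  # relative gain over random buying
```

With the rounded totals the relative gain comes out at about 49%, in line with the 50% stated in the text, and the returns per pound agree with the 67 pence and 45 pence quoted in the conclusions.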
5 Conclusions
We have applied the Maximum Entropy method to estimate the probability of each ticket being bought in the UK National Lottery using the fraction of winners in the 3, 4, and 5 match categories. The resulting system of non-linear equations was solved using an efficient parallel algorithm on a distributed memory IBM SP and on a commodity cluster of DEC Alpha workstations. We consider a syndicate which buys, in the next draw, the least popular 75000 tickets that we can identify using data from the previous weeks. We find that the average prize in the 4, 5, 5 + bonus match and jackpot categories is increased by at least 36%. The overall return is increased by 50% from 45 pence in the pound (buying randomly) to 67 pence. In the future we intend to perform the same calculations for a number of other lotteries.

6 Acknowledgements
We would like to thank Ageli Thomas, Keith Lloyd and John Haigh for useful discussions. We appreciate the efforts of Richard Lloyd in carefully collating the data and placing it on the internet.

7 References
[1] HAIGH, J., 1995. Inferring Gamblers' Choice of Combinations in the National Lottery. IMA Bulletin. 31, pp. 132-136.
[2] HAIGH, J., 1997. The Statistics of the National Lottery. J. R. Statist. Soc. A 160, Part 2, pp. 187-206.
[3] MOORE, P.G., 1997. The Development of the UK National Lottery: 1992-96. J. R. Statist. Soc. A 160, Part 2, pp. 169-185.
[4] COX, S.J., DANIELL, G.J., and NICOLE, D.A., 1997. Using Maximum Entropy to Double One's Expected Winnings in the UK National Lottery. Submitted to J. R. Statist. Soc. D.
[5] COX, S.J., NICOLE, D.A., and TAKEDA, K.J, 1998. Commodity High Performance Computing at Commodity Prices. To appear in WoTUG-21, Proceedings of the 21st World occam and Transputer User Group Technical Meeting.
[6] Richard Lloyd. Currently: http://lottery.merseyworld.com/
[7] JAYNES, E.T., 1983. Papers on Probability, Statistics and Statistical Physics (ed. R.D. Rosenkrantz). Dordrecht: Reidel. ISBN 9027714487.
[8] JAYNES, E.T., 1959. Probability Theory in Engineering and Science, pp. 110-151, USA: Socony Mobil Oil Company.
[9] PRESS, W.H., TEUKOLSKY, S.A., VETTERLING, W.T. and FLANNERY, B.P., 1992. Numerical Recipes in FORTRAN 77, 2nd edition. Cambridge: Cambridge University Press. ISBN 052143064X.
[10] Anthony Skjellum, Boris Protopopov, Shane Hebert, Peter J. Bea, and Walter Seefeld. 1997. MPI on Windows NT. We used 0.92 Beta release. Currently available at: http://www.erc.msstate.edu/mpi/mpint.html