Lecture 5

The Poisson Distribution

5.1 Introduction

Example 5.1: Drownings in Malta

The book [Mou98] cites data from the St. Luke's Hospital Gazette on the monthly number of drownings in Malta, over a period of nearly 30 years (355 consecutive months). Most months there were no drownings. Some months there was one person who drowned. One month had four people drown. The data are given as counts of the number of months in which a given number of drownings occurred, and we repeat them here as Table 5.1.

Looking at the data in Table 5.1, we might suppose that one of the following hypotheses is true:

- Some months are particularly dangerous;
- Or, on the contrary, when one person has drowned, the surrounding publicity makes others more cautious for a while, preventing drownings;
- Or, drownings are simply independent events.

How can we use the data to decide which of these hypotheses is true? We might reasonably suppose that the first hypothesis would predict that there would be more months with high numbers of drownings than the independence hypothesis; the second
hypothesis would predict fewer months with high numbers of drownings. The problem is, we don't know how many we should expect, if independence is correct. What we need is a model: a sensible probability distribution, giving the probability of a month having a certain number of drownings, under the independence assumption. The standard model for this sort of situation is called the Poisson distribution.

Table 5.1: Monthly counts of drownings in Malta.

No. of drowning deaths per month    Frequency (no. of months observed)
0                                   224
1                                   102
2                                   23
3                                   5
4                                   1
5+                                  0

The Poisson distribution is used in situations when we observe the counts of events within a set unit of time, area, volume, length etc. For example,

- The number of cases of a disease in different towns;
- The number of mutations in given regions of a chromosome;
- The number of dolphin pod sightings along a flight path through a region;
- The number of particles emitted by a radioactive source in a given time;
- The number of births per hour during a given day.

In such situations we are often interested in whether the events occur randomly in time or space. Consider the Babyboom dataset (Table 1.2), that we saw in Lecture 1. The birth times of the babies throughout the day are shown in Figure 5.1(a). If we divide up the day into 24 one-hour intervals and
count the number of births in each hour, we can plot the counts as a histogram in Figure 5.1(b). How does this compare to the histogram of counts for a process that isn't random? Suppose the 44 birth times were distributed in time as shown in Figure 5.1(c). The histogram of these birth times per hour is shown in Figure 5.1(d). We see that the non-random clustering of events in time causes there to be more hours with zero births and more hours with large numbers of births than the real birth times histogram.

This example illustrates that the distribution of counts is useful in uncovering whether the events might occur randomly or non-randomly in time (or space). Simply looking at the histogram isn't sufficient if we want to ask the question whether the events occur randomly or not. To answer this question we need a probability model for the distribution of counts of random events that dictates the type of distributions we should expect to see.

5.2 The Poisson Distribution

The Poisson distribution is a discrete probability distribution for the counts of events that occur randomly in a given interval of time (or space).

If we let X = the number of events in a given interval, then, if the mean number of events per interval is λ, the probability of observing x events in a given interval is given by

$$P(X = x) = e^{-\lambda}\,\frac{\lambda^{x}}{x!}, \qquad x = 0, 1, 2, 3, 4, \ldots$$

Note: e is a mathematical constant, e ≈ 2.718282. There should be a button e^x on your calculator that calculates powers of e.

If the probabilities of X are distributed in this way, we write

$$X \sim \mathrm{Po}(\lambda)$$

λ is the parameter of the distribution. We say X follows a Poisson distribution with parameter λ.
Figure 5.1: Representing the Babyboom data set (upper two panels) and a nonrandom hypothetical collection of birth times (lower two panels). Panels: (a) Babyboom data birth times; (b) histogram of Babyboom birth times; (c) nonrandom birth times; (d) histogram of nonrandom birth times. Panels (a) and (c) show birth time in minutes since midnight (0 to 1440); panels (b) and (d) show frequency against number of births per hour.

Note: A Poisson random variable can take on any non-negative integer value (0, 1, 2, ...). In contrast, the Binomial distribution always has a finite upper limit.
Example 5.2: Hospital births

Births in a hospital occur randomly at an average rate of 1.8 births per hour. What is the probability of observing 4 births in a given hour at the hospital?

Let X = No. of births in a given hour.

(i) Events occur randomly
(ii) Mean rate λ = 1.8

so X ~ Po(1.8).

We can now use the formula to calculate the probability of observing exactly 4 births in a given hour:

$$P(X = 4) = e^{-1.8}\,\frac{1.8^{4}}{4!} = 0.0723$$

What about the probability of observing 2 or more births in a given hour at the hospital? We want

$$P(X \ge 2) = P(X = 2) + P(X = 3) + \cdots$$

i.e. an infinite number of probabilities to calculate. But the probabilities of all outcomes sum to 1, so

$$P(X \ge 2) = 1 - P(X < 2) = 1 - \big(P(X = 0) + P(X = 1)\big)$$
$$= 1 - \left(e^{-1.8}\,\frac{1.8^{0}}{0!} + e^{-1.8}\,\frac{1.8^{1}}{1!}\right)$$
$$= 1 - (0.16529 + 0.29753) = 0.537$$
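The two calculations in Example 5.2 can be checked with a few lines of Python. This is only a sketch: the helper name `poisson_pmf` is an illustrative choice, not something defined in these notes, and it uses nothing beyond the standard `math` module.

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for X ~ Po(lam), straight from the formula in Section 5.2."""
    return math.exp(-lam) * lam**x / math.factorial(x)


lam = 1.8  # mean number of births per hour (Example 5.2)

# Probability of exactly 4 births in a given hour.
p_exactly_4 = poisson_pmf(4, lam)

# Probability of 2 or more births: use the complement rather than an infinite sum.
p_at_least_2 = 1.0 - (poisson_pmf(0, lam) + poisson_pmf(1, lam))

print(f"P(X = 4)  = {p_exactly_4:.4f}")
print(f"P(X >= 2) = {p_at_least_2:.3f}")
```

The complement trick matters in code for the same reason as on paper: it replaces an infinite sum with two terms.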
Example 5.3: Disease incidence

Suppose there is a disease, whose average incidence is 2 per million people. What is the probability that a city of 1 million people has at least twice the average incidence?

Twice the average incidence would be 4 cases. We can reasonably suppose the random variable X = # cases in 1 million people has Poisson distribution with parameter 2. Then

$$P(X \ge 4) = 1 - P(X \le 3) = 1 - \left(e^{-2}\,\frac{2^{0}}{0!} + e^{-2}\,\frac{2^{1}}{1!} + e^{-2}\,\frac{2^{2}}{2!} + e^{-2}\,\frac{2^{3}}{3!}\right) = 0.143.$$

5.3 The shape of the Poisson distribution

Using the formula we can calculate the probabilities for a specific Poisson distribution and plot the probabilities to observe the shape of the distribution. For example, Figure 5.2 shows 3 different Poisson distributions. We observe that the distributions

(i). are unimodal;
(ii). exhibit positive skew (that decreases as λ increases);
(iii). are centred roughly on λ;
(iv). have variance (spread) that increases as λ increases.

5.4 Mean and Variance of the Poisson distribution

In general, there is a formula for the mean of a Poisson distribution. There is also a formula for the standard deviation, σ, and variance, σ².

If X ~ Po(λ) then

$$\mu = \lambda, \qquad \sigma = \sqrt{\lambda}, \qquad \sigma^{2} = \lambda$$
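The formulas µ = λ and σ² = λ can be checked numerically by summing over the probability table. The sketch below does this for the three distributions shown in Figure 5.2. The function names and the cut-off `x_max = 100` are illustrative assumptions; the cut-off works here because the probabilities beyond it are negligibly small for these values of λ.

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for X ~ Po(lam)."""
    return math.exp(-lam) * lam**x / math.factorial(x)


def mean_and_variance(lam: float, x_max: int = 100) -> tuple[float, float]:
    """Approximate the mean and variance of Po(lam) by summing over x = 0..x_max."""
    xs = range(x_max + 1)
    mean = sum(x * poisson_pmf(x, lam) for x in xs)
    variance = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in xs)
    return mean, variance


for lam in (3.0, 5.0, 10.0):
    mean, variance = mean_and_variance(lam)
    print(f"lambda = {lam:4.1f}: mean = {mean:.6f}, variance = {variance:.6f}, "
          f"sd = {math.sqrt(variance):.4f}")
```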
Figure 5.2: Three different Poisson distributions. Panels: Po(3), Po(5) and Po(10), each showing P(X) (0.00 to 0.25) against X (0 to 20).

5.5 Changing the size of the interval

Suppose we know that births in a hospital occur randomly at an average rate of 1.8 births per hour. What is the probability that we observe 5 births in a given 2 hour interval?

Well, if births occur randomly at a rate of 1.8 births per 1 hour interval, then births occur randomly at a rate of 3.6 births per 2 hour interval.

Let Y = No. of births in a 2 hour period. Then Y ~ Po(3.6) and

$$P(Y = 5) = e^{-3.6}\,\frac{3.6^{5}}{5!} = 0.13768$$

This example illustrates the following rule:

If X ~ Po(λ) on 1 unit interval, then Y ~ Po(kλ) on k unit intervals.
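In code, the rule amounts to multiplying the rate by the interval length before applying the Poisson formula. A minimal sketch, with the hypothetical helper names `poisson_pmf` and `prob_events_in_interval` chosen only for illustration:

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for X ~ Po(lam)."""
    return math.exp(-lam) * lam**x / math.factorial(x)


def prob_events_in_interval(x: int, rate_per_unit: float, k: float) -> float:
    """P(x events in k unit intervals), using the rule that the count is Po(k * rate)."""
    return poisson_pmf(x, k * rate_per_unit)


# 1.8 births per hour, so 3.6 births per 2 hour interval.
p_five_in_two_hours = prob_events_in_interval(5, rate_per_unit=1.8, k=2)
print(f"P(Y = 5) = {p_five_in_two_hours:.5f}")
```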
5.6 Sum of two Poisson variables

Now suppose we know that in hospital A births occur randomly at an average rate of 2.3 births per hour and in hospital B births occur randomly at an average rate of 3.1 births per hour. What is the probability that we observe 7 births in total from the two hospitals in a given 1 hour period?

To answer this question we can use the following rule:

If X ~ Po(λ₁) on 1 unit interval, and Y ~ Po(λ₂) on 1 unit interval, and X and Y are independent, then X + Y ~ Po(λ₁ + λ₂) on 1 unit interval.

So if we let X = No. of births in a given hour at hospital A and Y = No. of births in a given hour at hospital B, then X ~ Po(2.3), Y ~ Po(3.1) and X + Y ~ Po(5.4), so

$$P(X + Y = 7) = e^{-5.4}\,\frac{5.4^{7}}{7!} = 0.11999$$

Example 5.4: Disease incidence, continued

Suppose disease A occurs with incidence 1.7 per million, and disease B occurs with incidence 2.9 per million. Statistics are compiled, in which these diseases are not distinguished, but simply are all called cases of disease "AB". What is the probability that a city of 1 million people has at least 6 cases of AB?

If Z = # cases of AB, then Z ~ Po(4.6). Thus,

$$P(Z \ge 6) = 1 - P(Z \le 5) = 1 - e^{-4.6}\left(\frac{4.6^{0}}{0!} + \frac{4.6^{1}}{1!} + \frac{4.6^{2}}{2!} + \frac{4.6^{3}}{3!} + \frac{4.6^{4}}{4!} + \frac{4.6^{5}}{5!}\right) = 0.314.$$
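One way to convince yourself of the sum rule is to compute P(X + Y = 7) the long way, by adding up every way of splitting 7 births between the two hospitals, and compare it with the Po(5.4) answer. The sketch below assumes the two hospitals are independent, so the probabilities of each split can be multiplied; the function names are illustrative.

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for X ~ Po(lam)."""
    return math.exp(-lam) * lam**x / math.factorial(x)


def pmf_of_sum(total: int, lam1: float, lam2: float) -> float:
    """P(X + Y = total) for independent X ~ Po(lam1), Y ~ Po(lam2).

    Sums P(X = k) * P(Y = total - k) over every split k = 0..total.
    """
    return sum(
        poisson_pmf(k, lam1) * poisson_pmf(total - k, lam2)
        for k in range(total + 1)
    )


by_rule = poisson_pmf(7, 2.3 + 3.1)    # X + Y ~ Po(5.4)
by_splitting = pmf_of_sum(7, 2.3, 3.1)  # long-hand calculation

print(f"Using Po(5.4):       {by_rule:.5f}")
print(f"Summing over splits: {by_splitting:.5f}")
```

The two numbers agree up to floating-point rounding, which is the rule of Section 5.6 in action for this one case.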
5.7 Fitting a Poisson distribution

Consider the two sequences of birth times we saw in Section 5.1. Both of these examples consisted of a total of 44 births in 24 one-hour intervals. Therefore the mean birth rate for both sequences is

$$\frac{44}{24} = 1.8333$$

What would be the expected counts if birth times were really random, i.e. what is the expected histogram for a Poisson random variable with mean rate λ = 1.8333?

Using the Poisson formula we can calculate the probabilities of obtaining each possible value¹

x           0        1        2        3        4        5        ≥6
P(X = x)    0.15989  0.29312  0.26869  0.16419  0.07525  0.02759  0.01127

¹ In practice we group values with low probability into one category; here the last column collects all values x ≥ 6.

Then if we observe 24 one-hour intervals we can calculate the expected frequencies as 24 × P(X = x) for each value of x.

x                                   0      1      2      3      4      5      ≥6
Expected frequency 24 × P(X = x)    3.837  7.035  6.448  3.941  1.806  0.662  0.271

We say we have fitted a Poisson distribution to the data. This consisted of 3 steps:

(i). Estimating the parameters of the distribution from the data
(ii). Calculating the probability distribution
(iii). Multiplying the probability distribution by the number of observations

Once we have fitted a distribution to the data we can compare the expected frequencies to those we actually observed from the real Babyboom dataset. We see that the agreement is quite good.

x           0      1      2      3      4      5      ≥6
Expected    3.837  7.035  6.448  3.941  1.806  0.662  0.271
Observed    3      8      6      4      3      0      0
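The three fitting steps can be written out as a short Python sketch using the observed Babyboom counts from the table above. The variable names are illustrative, and the last category is treated as "6 or more", so its probability is whatever is left over after x = 0, ..., 5.

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for X ~ Po(lam)."""
    return math.exp(-lam) * lam**x / math.factorial(x)


# Observed number of hours with 0, 1, 2, 3, 4, 5 and >= 6 births (Babyboom data).
observed = [3, 8, 6, 4, 3, 0, 0]

n_intervals = sum(observed)  # 24 hours
# No hour falls in the last (>= 6) category, so counting it as 6 changes nothing.
total_births = sum(x * freq for x, freq in enumerate(observed))  # 44 births

# Step (i): estimate the parameter from the data.
lam_hat = total_births / n_intervals

# Step (ii): calculate the probability distribution, grouping the tail into >= 6.
probs = [poisson_pmf(x, lam_hat) for x in range(6)]
probs.append(1.0 - sum(probs))

# Step (iii): multiply by the number of observations.
expected = [n_intervals * p for p in probs]

labels = ["0", "1", "2", "3", "4", "5", ">=6"]
for label, exp_freq, obs_freq in zip(labels, expected, observed):
    print(f"x = {label:>3}: expected {exp_freq:6.3f}, observed {obs_freq}")
```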
When we compare the expected frequencies to those observed from the nonrandom clustered sequence in Section 5.1 we see that there is much less agreement.

x           0      1      2      3      4      5      ≥6
Expected    3.837  7.035  6.448  3.941  1.806  0.662  0.271
Observed    12     3      0      2      2      4      1

In Lecture 9 we will see how we can formally test for a difference between the expected and observed counts. For now it is enough just to know how to fit a distribution.

5.8 Using the Poisson to approximate the Binomial

The Binomial and Poisson distributions are both discrete probability distributions. In some circumstances the distributions are very similar. For example, consider the Bi(100, 0.02) and Po(2) distributions shown in Figure 5.3. Visually these distributions are almost identical.

In general,

If n is large (say > 50) and p is small (say < 0.1) then a Bi(n, p) can be approximated with a Po(λ) where λ = np.

Example 5.5: Counting lefties

Given that 5% of a population are left-handed, use the Poisson distribution to estimate the probability that a random sample of 100 people contains 2 or more left-handed people.

X = No. of left-handed people in a sample of 100

X ~ Bi(100, 0.05)

Poisson approximation: X ≈ Po(λ) with λ = 100 × 0.05 = 5.
We want P(X ≥ 2):

$$P(X \ge 2) = 1 - P(X < 2) = 1 - \big(P(X = 0) + P(X = 1)\big)$$
$$\approx 1 - \left(e^{-5}\,\frac{5^{0}}{0!} + e^{-5}\,\frac{5^{1}}{1!}\right) \approx 1 - 0.040428 \approx 0.9596$$

If we use the exact Binomial distribution we get the answer 0.9629.

Figure 5.3: A Binomial and Poisson distribution that are very similar. Panels: Bi(100, 0.02) and Po(2), each showing P(X) (0.00 to 0.20) against X (0 to 10).

The idea of using one distribution to approximate another is widespread throughout statistics and one we will meet again. Why would we use an approximate distribution when we actually know the exact distribution?

- The exact distribution may be hard to work with.
- The exact distribution may have too much detail. There may be some features of the exact distribution that are irrelevant to the questions
we want to answer. By using the approximate distribution, we focus attention on the things we're really concerned with.

For example, consider the Babyboom data, discussed in Example 5.2. We said that random birth times should yield numbers of births in each hour that are Poisson distributed. Why? Consider the births between 6 am and 7 am. When we say that the births are random, we probably mean something like this: the times are independent of each other, and have equal chances of happening at any time. Any given one of the 44 births has 24 hours when it could have happened. The probability that it happens during this hour is p = 1/24 = 0.0417. The births between 6 am and 7 am should thus have about the Bi(44, 0.0417) distribution. This distribution is about the same as Po(1.83), since 1.83 = 44 × 0.0417.

Example 5.6: Drownings in Malta, continued

We now analyse the data on the monthly numbers of drowning incidents in Malta. Under the hypothesis that drownings have nothing to do with each other, and have causes that don't change in time, we would expect the random number X of drownings that occur in a month to have a Poisson distribution. Why is that? We might imagine that there are a large number n of people in the population, each of whom has an unknown probability p of drowning in any given month. Then the number of drownings in a month has a Bi(n, p) distribution. In order to use this model, we need to know what n and p are. That is, we need to know the size of the population, which we don't really care about.

On the other hand, the expected (mean) number of monthly drownings is np, and that can be estimated from the observed mean number of drownings. If we approximate the binomial distribution by Po(λ), where λ = np, then we don't have to worry about n. We estimate λ as (total number of drownings)/(number of months). The total number of drownings is 0 × 224 + 1 × 102 + 2 × 23 + 3 × 5 + 4 × 1 = 167, so we estimate λ = 167/355 = 0.47. We show the probabilities for the different possible outcomes in the last column of Table 5.2.
In the third column we show the expected number of months with a given number of drownings, assuming that the independence assumption, and hence the Poisson model, is true. This is computed by multiplying the last column by 355. After all, if the probability of no drownings in any given month is 0.625, and we have 355 months of observations, we expect 0.625 × 355 months with 0 drownings.

Table 5.2: Monthly counts of drownings in Malta, with Poisson fit (λ = 0.47).

No. of drowning     Frequency (no. of     Expected     Probability
deaths per month    months observed)      frequency
0                   224                   221.9        0.625
1                   102                   104.3        0.294
2                   23                    24.5         0.069
3                   5                     3.8          0.011
4                   1                     0.45         0.001
5+                  0                     0.04         0.0001

We see that the observations (in the second column) are pretty close to the predictions of the Poisson model (in the third column), so the data do not give us strong evidence to reject the neutral assumption, that drownings are independent of one another, and have a constant rate in time. In Lecture 9 we will describe one way of testing this hypothesis formally.

Example 5.7: Swine flu vaccination

In 1976, fear of an impending swine flu pandemic led to a mass vaccination campaign in the US. The pandemic never materialised, but there were concerns that the vaccination may have led to an increase in a rare and serious neurological disease, Guillain-Barré Syndrome (GBS). It was difficult to determine whether the vaccine was really at fault, since GBS may arise spontaneously (about 1 person in 100,000 develops GBS in a given year) and the number of cases was small.

Consider the following data from the US state of Michigan: Out of 9 million residents, about 2.3 million were vaccinated. Of
those, 48 developed GBS between July 1976 and June 1977. We might have expected

2.3 million × 10⁻⁵ cases/person-year = 23 cases.

How likely is it that, purely by chance, this population would have experienced 48 cases in a single year? If Y is the number of cases, it would then have a Poisson distribution with parameter 23, so that

$$P(Y \ge 48) = 1 - \sum_{i=0}^{47} e^{-23}\,\frac{23^{i}}{i!} = 3.5 \times 10^{-6}.$$

So, such an extreme number of cases is likely to happen less than 1 year in 100,000.

Does this prove that the vaccine caused GBS? The people who had the vaccine are people who chose to be vaccinated. They may differ from the rest of the population in multiple ways in addition to the elementary fact of having been vaccinated, and some of those ways may have predisposed them to GBS. What can we do? The paper [BH84] takes the following approach: If the vaccine were not the cause of the GBS cases, we would expect no connection between the timing of the vaccine and the onset of GBS. In fact, though, there seemed to be a particularly large number of cases in the six weeks following vaccination. Can we say that this was more than could reasonably be expected by chance?

The data are given in Table 5.3. Each of the 40 GBS cases was assigned a time, which is the number of weeks after vaccination when the disease was diagnosed. (Thus week 1 is a different calendar week for each subject.) If the cases are evenly distributed, the number in a given week should be Poisson distributed with parameter 40/30 = 1.33. Using this parameter, we compute the probabilities of 0, 1, 2, ... cases in a week, which we give in row 3 of Table 5.3. Multiplying these numbers by 30 gives the expected frequencies in row 4 of the table. It is clear that the observed and expected frequencies are very different. One way of seeing this is to consider the standard deviation. The Poisson distribution has SD √1.33 = 1.15 (as discussed in section 5.4),
while the data have SD

$$s = \sqrt{\frac{1}{30-1}\Big[16\,(0-1.33)^{2} + 7\,(1-1.33)^{2} + 3\,(2-1.33)^{2} + 2\,(4-1.33)^{2} + 1\,(9-1.33)^{2} + 1\,(10-1.33)^{2}\Big]} = 2.48.$$

Table 5.3: Cases of GBS, by weeks after vaccination

# cases per week      0      1      2      3      4      5      6+
observed frequency    16     7      3      0      2      0      2
probability           0.264  0.352  0.234  0.104  0.034  0.009  0.003
expected frequency    7.9    10.6   7.0    3.1    1.0    0.3    0.1

5.9 Derivation of the Poisson distribution (non-examinable)

This section is not officially part of the course, but is optional, for those who are interested in more mathematical detail. Where does the formula in section 5.2 come from?

Think of the Poisson distribution as in section 5.8, as an approximation to a binomial distribution. Let X be the (random) number of successes in a collection of independent random trials, where the expected number of successes is λ. This will, of course, depend on the number of trials, but we show that when the number of trials (call it n) gets large, the exact number of trials doesn't matter. In mathematical language, we say that the probability converges to a limit as n goes to infinity. But how large is "large"? We would like to know how good the approximation is, for real values of n, of the sort that we are interested in.

Let X_n be the random number of successes in n independent trials, where the probability of each success is λ/n. Thus, the probability of success goes down as the number of trials goes up, and the expected number of successes is always the same λ. Then

$$P\{X_n = x\} = {}^{n}C_{x}\left(\frac{\lambda}{n}\right)^{x}\left(1 - \frac{\lambda}{n}\right)^{n-x}.$$
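Before working through the algebra, it can help to see the convergence numerically. The sketch below evaluates the binomial probability above for increasing n and compares it with the Poisson formula from section 5.2. The choice λ = 2, x = 3 and the function names are illustrative assumptions, not values taken from the notes.

```python
import math


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X_n = x) for X_n ~ Bi(n, p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for X ~ Po(lam)."""
    return math.exp(-lam) * lam**x / math.factorial(x)


lam, x = 2.0, 3  # illustrative values
target = poisson_pmf(x, lam)

gaps = []
for n in (10, 100, 1000, 10000):
    # Success probability lam / n keeps the expected number of successes at lam.
    approx = binomial_pmf(x, n, lam / n)
    gaps.append(abs(approx - target))
    print(f"n = {n:>5}: binomial = {approx:.6f}, Poisson = {target:.6f}, "
          f"gap = {gaps[-1]:.6f}")
```

For these values the gap shrinks by roughly a factor of ten each time n grows tenfold.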
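Now, those of you who have learned some calculus at A-levels may remember the Taylor series for e^z:

$$e^{z} = 1 + z + \frac{z^{2}}{2!} + \frac{z^{3}}{3!} + \cdots.$$

In particular, for small z we have $e^{-z} \approx 1 - z$, and the difference (or error in the approximation) is no bigger than $z^{2}/2$. The key idea is that if z is very small (as it is when z = λ/n, and n is large), then z² is a lot smaller than z.

Using a bit of algebra, we have

$$P\{X_n = x\} = {}^{n}C_{x}\left(\frac{\lambda}{n}\right)^{x}\left(1 - \frac{\lambda}{n}\right)^{n-x}$$
$$= \frac{n(n-1)\cdots(n-x+1)}{x!}\,\frac{\lambda^{x}}{n^{x}}\left(1 - \frac{\lambda}{n}\right)^{-x}\left(1 - \frac{\lambda}{n}\right)^{n}$$
$$= \frac{\lambda^{x}}{x!}\,(1)\left(1 - \frac{1}{n}\right)\cdots\left(1 - \frac{x-1}{n}\right)\left(1 - \frac{\lambda}{n}\right)^{-x}\left(1 - \frac{\lambda}{n}\right)^{n}.$$

Now, if we're not concerned about the size of the error, we can simply say that n is much bigger than λ or x (because we're thinking of a fixed λ and x, and n getting large). So we have the approximations

$$(1)\left(1 - \frac{1}{n}\right)\cdots\left(1 - \frac{x-1}{n}\right) \approx 1; \qquad \left(1 - \frac{\lambda}{n}\right)^{-x} \approx 1; \qquad \left(1 - \frac{\lambda}{n}\right)^{n} \approx \left(e^{-\lambda/n}\right)^{n} = e^{-\lambda}.$$

Thus

$$P\{X_n = x\} \approx \frac{\lambda^{x}}{x!}\,e^{-\lambda}.$$

5.9.1 Error bounds (very mathematical)

In the long run, X_n has a distribution very close to the Poisson distribution defined in section 5.2. But how long is the long run? Do we need 10 trials? 1000? A billion?

If you just want the answer, it's approximately this: The error that you'll make by taking the Poisson distribution instead of the binomial is no more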
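than about $1.6\lambda^{2}/n^{3/2}$. In Example 5.5, where n = 100 and λ = 5, this says the error won't be bigger than about 0.04, which is useful information, although in reality the maximum error is about 10 times smaller than this. On the other hand, if n = 400,000 (about the population of Malta), and λ = 0.47, then the error will be only about 10⁻⁸.

Let's assume that n is at least 4λ², so λ < √n/2. Define the approximation error to be

$$\epsilon := \max_{x}\,\big|P\{X_n = x\} - P\{X = x\}\big|.$$

(The bars mean that we're only interested in how big the difference is, not whether it's positive or negative.) Then

$$P\{X_n = x\} - P\{X = x\} = \frac{\lambda^{x}}{x!}\,(1)\left(1 - \frac{1}{n}\right)\cdots\left(1 - \frac{x-1}{n}\right)\left(1 - \frac{\lambda}{n}\right)^{-x}\left(1 - \frac{\lambda}{n}\right)^{n} - \frac{\lambda^{x}}{x!}\,e^{-\lambda}$$
$$= \frac{\lambda^{x}}{x!}\,e^{-\lambda}\left[(1)\left(1 - \frac{1}{n}\right)\cdots\left(1 - \frac{x-1}{n}\right)\left(1 - \frac{\lambda}{n}\right)^{-x}\left(\frac{1 - \lambda/n}{e^{-\lambda/n}}\right)^{n} - 1\right].$$

If x is bigger than √n, then P{X_n = x} and P{X = x} are both tiny; we won't go into the details here, but we will consider only x that are smaller than this.

Now we have to do some careful approximation. Basic algebra tells us that if a and b are positive,

$$(1 - a)(1 - b) = 1 - (a + b) + ab > 1 - (a + b).$$

We can extend this to

$$(1 - a)(1 - b)(1 - c) > (1 - (a + b))(1 - c) > 1 - (a + b + c).$$

And so, finally, if a, b, c, ..., z are all positive, then

$$1 > (1 - a)(1 - b)(1 - c)\cdots(1 - z) > 1 - (a + b + c + \cdots + z).$$

Thus

$$1 > (1)\left(1 - \frac{1}{n}\right)\cdots\left(1 - \frac{x-1}{n}\right) > 1 - \sum_{k=0}^{x-1}\frac{k}{n} > 1 - \frac{x^{2}}{2n},$$

and

$$1 > \left(1 - \frac{\lambda}{n}\right)^{x} > 1 - \frac{\lambda x}{n}.$$

Again applying some calculus, we turn this into

$$1 < \left(1 - \frac{\lambda}{n}\right)^{-x} < 1 + \frac{\lambda x}{n - \lambda x}.$$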
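We also know that

$$1 - \frac{\lambda}{n} < e^{-\lambda/n} < 1 - \frac{\lambda}{n} + \frac{\lambda^{2}}{2n^{2}},$$

which means that

$$1 - \frac{\lambda^{2}}{2(n^{2} - n\lambda)} < \frac{1 - \lambda/n}{e^{-\lambda/n}} < 1,$$

and

$$1 - \frac{\lambda^{2}}{2(n - \lambda)} < \left(1 - \frac{\lambda^{2}}{2(n^{2} - n\lambda)}\right)^{n} < \left(\frac{1 - \lambda/n}{e^{-\lambda/n}}\right)^{n} < 1.$$

Now we put together all the overestimates on one side, and all the underestimates on the other.

$$-\frac{\lambda^{x}}{x!}\,e^{-\lambda}\left(\frac{\lambda^{2}}{2(n - \lambda)} + \frac{\lambda x}{n}\right) \le P\{X_n = x\} - P\{X = x\} \le \frac{\lambda^{x}}{x!}\,e^{-\lambda}\,\frac{\lambda x}{n - \lambda x}.$$

So, finally, as long as n ≥ 4λ², we get

$$\epsilon \le \max_{x}\,\frac{\lambda^{x+1}}{n\,x!}\,e^{-\lambda}\left(\lambda + x + \frac{x}{1 - x/(2\sqrt{n})}\right).$$

We need to find the maximum over all possible x. If x < √n then this becomes

$$\epsilon \le \frac{1}{n}\,\max_{x}\,\frac{\lambda^{x+1}e^{-\lambda}}{x!}\,(\lambda + 3x) \le \frac{4\bar{\lambda}^{2}}{n\sqrt{2\pi n}},$$

(by a formula known as "Stirling's formula"), where $\bar{\lambda} = \max\{\lambda, 1\}$.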
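The numbers quoted for Example 5.5 can be checked directly. The sketch below computes the largest difference between the Bi(100, 0.05) and Po(5) probabilities and compares it with the quoted bound. It assumes the bound reads 4λ̄²/(n√(2πn)), which is about 1.6λ̄²/n^{3/2}, with λ̄ = max{λ, 1}; the function names are illustrative. It checks this one case only and says nothing about other values of n and λ.

```python
import math


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X_n = x) for X_n ~ Bi(n, p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for X ~ Po(lam)."""
    return math.exp(-lam) * lam**x / math.factorial(x)


def max_pmf_gap(n: int, lam: float) -> float:
    """Largest |P{X_n = x} - P{X = x}| over x = 0..n, with p = lam / n."""
    p = lam / n
    return max(
        abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(n + 1)
    )


def quoted_bound(n: int, lam: float) -> float:
    """The bound as read from the text (an assumption): 4 * lam_bar^2 / (n * sqrt(2 pi n))."""
    lam_bar = max(lam, 1.0)
    return 4 * lam_bar**2 / (n * math.sqrt(2 * math.pi * n))


n, lam = 100, 5.0  # Example 5.5
gap = max_pmf_gap(n, lam)
bound = quoted_bound(n, lam)
print(f"largest actual difference: {gap:.4f}")
print(f"quoted bound:              {bound:.4f}")
```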