Bayesian Updating with Continuous Priors
Class 13, 18.05, Spring 2014
Jeremy Orloff and Jonathan Bloom

1 Learning Goals

1. Understand a parameterized family of distributions as representing a continuous range of hypotheses for the observed data.

2. Be able to apply Bayes' theorem to update a prior pdf to a posterior pdf given data and a likelihood function.

3. Be able to interpret and compute probabilities using the posterior.

2 Introduction

Up to now we have only done Bayesian updating when we had a finite number of hypotheses, e.g. our dice example had five hypotheses (4, 6, 8, 12, or 20 sides). Now we will study Bayesian updating when there is a continuous range of hypotheses. The Bayesian update process will be essentially the same as in the discrete case. As usual when moving from discrete to continuous, we will need to replace the pmf by a pdf and sums by integrals.

Here are three standard examples with continuous ranges of hypotheses.

Example 1. Suppose you have a system that can succeed or fail with probability p. Then we can hypothesize that p is anywhere in the range [0, 1]. That is, we have a continuous range of hypotheses. We will often model this example with a bent coin with unknown probability p of heads.

Example 2. The lifetime of a certain isotope is modeled by an exponential distribution exp(λ). In principle, the mean lifetime 1/λ can be any real number in (0, ∞).

Example 3. We are not restricted to a single parameter. In principle, the parameters μ and σ of a normal distribution can be any real numbers in (−∞, ∞) and (0, ∞), respectively. If we model gestational length for single births by a normal distribution, then from millions of data points we know that μ is about 40 weeks and σ is about one week.

In all of these examples the hypothesis is a model for the random process giving rise to the data (successes and failures, atomic lifetimes, gestational lengths). If we specify a parameterized family of distributions, then a hypothesis may be regarded as a choice of parameter(s).

3 Notational conventions

3.1 Parametrized models

As in the examples above, our hypotheses will often take the form "a certain parameter has value θ". We will often use the letter θ to stand for an arbitrary hypothesis. This will leave
symbols like p, f, and x to take their usual meanings as pmf, pdf, and data. Rather than saying "the hypothesis that the parameter of interest has value θ" we will say simply "the hypothesis θ".

3.2 Big and little letters

We have two parallel notations for outcomes and probability.

1. Event A, probability function P(A).

2. Value x, pmf p(x) or pdf f(x).

These notations are related by P(X = x) = p(x), where x is a value of the discrete random variable X and "X = x" is the corresponding event.

We carry these notations over to the conditional probabilities used in Bayesian updating.

1. Hypotheses H and data D have associated probabilities P(H), P(D), P(H|D), P(D|H). In the coin example we might have H = "the chosen coin has probability 0.6 of heads", D = "the flip was heads", and P(D|H) = 0.6.

2. Hypotheses (values) θ and data values x have probabilities or probability densities:

p(θ)   p(x)   p(θ|x)   p(x|θ)
f(θ)   f(x)   f(θ|x)   f(x|θ)

In the coin example we might have θ = 0.6 and x = 1, so p(x|θ) = 0.6. We might also write p(x = 1 | θ = 0.6) to emphasize the values of x and θ, but we will never just write p(1|0.6) since here it is unclear which value is x and which is θ.

Although we will still use both types of notation, we will mostly use the type involving pmf's and pdf's from now on.

Hypotheses will usually be parameters represented by Greek letters (θ, λ, μ, σ, ...) while data values will usually be represented by English letters (x, x_i, y, ...).

4 Quick review of pdf and probability

Suppose X is a random variable with pdf f(x). Recall f(x) is a density; its units are probability/(units of x).

[Figure: the graph of a pdf f(x); the shaded area between c and d is P(c ≤ X ≤ d), and a thin strip of width dx around x has area f(x) dx.]

The probability that the value of X is in [c, d] is given by

∫_c^d f(x) dx.
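Before moving on to the infinitesimal interpretation of f(x) dx, here is a minimal numerical sketch of the formula above. It is not part of the original notes; it assumes Python with numpy and uses, purely as an illustration, the exponential pdf from Example 2 with λ = 1, for which the exact answer on [1, 2] is e^{-1} - e^{-2} ≈ 0.2325.

```python
import numpy as np

# Illustrative pdf: exponential with rate lam = 1 (an arbitrary choice for this sketch).
lam = 1.0
def f(x):
    return lam * np.exp(-lam * x)

c, d = 1.0, 2.0
n = 100_000
dx = (d - c) / n
x = c + dx * (np.arange(n) + 0.5)   # midpoints of small slices of [c, d]

# P(c <= X <= d) = integral of f(x) dx, approximated by summing f(x) * dx over the slices.
print(np.sum(f(x) * dx))            # about 0.2325
```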
The probability that X is in an infinitesimal range dx around x is f(x) dx. In fact, the integral formula is just the 'sum' of these infinitesimal probabilities. We can visualize these probabilities by viewing the integral as area under the graph of f(x).

In order to manipulate probabilities instead of densities, in what follows we will make frequent use of the notion that f(x) dx is the probability that X is in an infinitesimal range around x of width dx. Please make sure that you fully understand this notion.

5 Bayesian updating with continuous priors

The table for continuous priors is very simple. We cannot have a row for each of an infinite number of hypotheses, so instead we have just one row with a variable hypothesis θ. After laying out this table, we will explain how it arises naturally by refining our table for a finite number of hypotheses as the number of hypotheses grows to infinity.

In cases with a discrete set of hypotheses we had a prior probability for each hypothesis. Now suppose our hypotheses are that the value of the parameter θ lies in the range [a, b]. In this case we need a prior pdf f(θ), which gives a probability density at each hypothesis θ. In order to use probabilities we will use infinitesimal ranges and state our hypothesis as:

'The parameter lies in a range dθ around θ.'

We can then write

H: θ ± dθ/2,    P(H) = f(θ) dθ.

This is a little cumbersome and it is easy to be sloppy. The advantage of this notation is two-fold. First, using dθ will provide a clue as to when you need to do an integral. Second, when it comes to simulating with a computer it tells you exactly how to discretize your model.

For today we will assume that our data x can only take a discrete set of values. In this case, given data x and hypothesis θ, the likelihood function is p(x|θ), i.e. the probability of x given θ. Next time we will consider continuous data distributions, where our likelihood will have the form f(x|θ) dx.

Our table becomes:

hypothesis | prior               | likelihood | unnormalized posterior     | posterior
θ ± dθ/2   | f(θ) dθ             | p(x|θ)     | p(x|θ) f(θ) dθ             | p(x|θ) f(θ) dθ / T
total      | ∫_a^b f(θ) dθ = 1   |            | T = ∫_a^b p(x|θ) f(θ) dθ   | 1

Notes:

1. The sum T of the unnormalized posterior column is given by an integral. In practice, computing this integral is difficult and best left to computers to do numerically.

2. By including dθ, all the entries in the table are probabilities. The posterior pdf for θ is found by removing the dθ:

f(θ|x) = p(x|θ) f(θ) / T.

3. If the unnormalized posterior turns out to be a multiple of a familiar type of distribution, then we can often avoid computing T. We will see several such examples in 18.05.

4. T = p(x), the prior predictive probability of x, i.e. the a priori probability of observing x. By the law of total probability (see the section below), p(x) is a weighted average of the likelihoods over all hypotheses:

p(x) = ∫_a^b p(x|θ) f(θ) dθ.

5. With dθ removed, the table organizes the continuous version of Bayes' theorem. Namely, the posterior pdf is related to the prior pdf and likelihood function via:

f(θ|x) = p(x|θ) f(θ) / ∫_a^b p(x|θ) f(θ) dθ = p(x|θ) f(θ) / p(x).

Regarding both sides as functions of θ, we can again express Bayes' theorem in the form:

f(θ|x) ∝ p(x|θ) · f(θ).

6. The use of θ ± dθ/2 for hypotheses gets a little tedious. At times we may allow ourselves to be a bit sloppy and just write θ, but we will still mean θ ± dθ/2.
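The remark that the dθ notation "tells you exactly how to discretize your model" can be made concrete with a short numerical sketch. The code below is not part of the original notes; it assumes Python with numpy, and the names posterior_pdf_on_grid, prior_pdf, and likelihood are hypothetical placeholders. It builds the table's columns on a grid of slice centers and normalizes by the sum T.

```python
import numpy as np

def posterior_pdf_on_grid(prior_pdf, likelihood, a, b, n=10_000):
    """Discretized version of the continuous Bayes table on [a, b].

    prior_pdf(theta):  the prior density f(theta)
    likelihood(theta): p(x | theta) for the fixed observed data x
    Returns the slice centers and the posterior pdf f(theta | x) at them.
    """
    dtheta = (b - a) / n
    theta = a + dtheta * (np.arange(n) + 0.5)   # centers of the n slices

    prior = prior_pdf(theta) * dtheta           # prior column: f(theta) dtheta
    unnorm = likelihood(theta) * prior          # p(x|theta) f(theta) dtheta
    T = unnorm.sum()                            # approximates the normalizing integral
    posterior = unnorm / T                      # posterior probabilities (they sum to 1)

    return theta, posterior / dtheta            # drop dtheta to recover the posterior pdf

# Illustration: flat prior on [0, 1], one observed heads, likelihood p(x=1|theta) = theta.
theta, post_pdf = posterior_pdf_on_grid(lambda t: np.ones_like(t), lambda t: t, 0.0, 1.0)
print(post_pdf[:2], post_pdf[-2:])   # close to 2*theta: near 0 at the left end, near 2 at the right
```

Example 4 below works this same flat-prior case out exactly by hand.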
Example 4. (Coin with flat prior.) Suppose we have a bent coin with unknown probability θ of heads. Also suppose we flip the coin once and get heads. Starting from a flat prior pdf, compute the posterior pdf for θ.

answer: By a flat prior we mean f(θ) = 1 on [0, 1]; that is, we assume the true probability of heads is equally likely to be any probability. As we usually do with coin flips, we let x = 1 for heads. In this case the definition of θ says the likelihood is p(x = 1 | θ) = θ. We get the following table:

hypothesis | prior            | likelihood | unnormalized posterior  | posterior
θ ± dθ/2   | 1 · dθ           | θ          | θ dθ                    | 2θ dθ
total      | ∫_0^1 dθ = 1     |            | T = ∫_0^1 θ dθ = 1/2    | 1

Therefore the posterior pdf (after seeing 1 heads) is f(θ|x) = 2θ.
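Another way to see the 2θ posterior, again as a sketch that is not part of the original notes (it assumes Python with numpy), is by simulation: draw θ from the flat prior, flip a coin with probability θ of heads once, and keep only the draws of θ that produced heads. The kept draws are then distributed according to the posterior.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1_000_000

theta = rng.uniform(0.0, 1.0, N)            # theta drawn from the flat prior on [0, 1]
heads = rng.uniform(0.0, 1.0, N) < theta    # one flip of a coin with P(heads) = theta

kept = theta[heads]                         # condition on the observed data: heads
# Under f(theta | x) = 2*theta, P(0.4 < theta < 0.6 | x) = 0.6^2 - 0.4^2 = 0.20.
print(np.mean((kept > 0.4) & (kept < 0.6)))   # about 0.20
```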
5.1 From discrete to continuous Bayesian updating

To develop intuition for the transition from discrete to continuous Bayesian updating, we'll walk a familiar road from calculus. Namely we will:

i) approximate the continuous range of hypotheses by a finite number of hypotheses.
ii) create the discrete updating table for the finite number of hypotheses.
iii) consider how the table changes as the number of hypotheses goes to infinity.

In this way, we'll see the prior and posterior pmf's converge to the prior and posterior pdf's.

Example 5. To keep things concrete, we will work with the bent coin in Example 4. We start by slicing [0, 1] into 4 equal intervals: [0, 1/4], [1/4, 1/2], [1/2, 3/4], [3/4, 1]. Each slice has width Δθ = 1/4. We put our 4 hypotheses θ_i at the centers of the four slices:

θ_1: 'θ = 1/8',  θ_2: 'θ = 3/8',  θ_3: 'θ = 5/8',  θ_4: 'θ = 7/8'.

The flat prior gives each hypothesis a probability of 1/4 = Δθ. We have the table:

hypothesis | prior | likelihood | un. posterior    | posterior
θ = 1/8    | 1/4   | 1/8        | (1/4) · (1/8)    | 1/16
θ = 3/8    | 1/4   | 3/8        | (1/4) · (3/8)    | 3/16
θ = 5/8    | 1/4   | 5/8        | (1/4) · (5/8)    | 5/16
θ = 7/8    | 1/4   | 7/8        | (1/4) · (7/8)    | 7/16
Total      | 1     |            | Σ_{i=1}^n θ_i Δθ | 1

Here are the histograms of the prior and posterior pmf. The prior and posterior pdfs from Example 4 are superimposed on the histograms in red.

[Figure: prior and posterior histograms on the 4 slices centered at 1/8, 3/8, 5/8, 7/8, with the flat prior pdf and the posterior pdf 2θ superimposed in red.]

Next we slice [0, 1] into 8 intervals, each of width Δθ = 1/8, and use the center of each slice for our 8 hypotheses θ_i:

θ_1: 'θ = 1/16',  θ_2: 'θ = 3/16',  θ_3: 'θ = 5/16',  θ_4: 'θ = 7/16',
θ_5: 'θ = 9/16',  θ_6: 'θ = 11/16', θ_7: 'θ = 13/16', θ_8: 'θ = 15/16'.

The flat prior gives each hypothesis the probability 1/8 = Δθ. Here are the table and histograms.

hypothesis | prior | likelihood | un. posterior     | posterior
θ = 1/16   | 1/8   | 1/16       | (1/8) · (1/16)    | 1/64
θ = 3/16   | 1/8   | 3/16       | (1/8) · (3/16)    | 3/64
θ = 5/16   | 1/8   | 5/16       | (1/8) · (5/16)    | 5/64
θ = 7/16   | 1/8   | 7/16       | (1/8) · (7/16)    | 7/64
θ = 9/16   | 1/8   | 9/16       | (1/8) · (9/16)    | 9/64
θ = 11/16  | 1/8   | 11/16      | (1/8) · (11/16)   | 11/64
θ = 13/16  | 1/8   | 13/16      | (1/8) · (13/16)   | 13/64
θ = 15/16  | 1/8   | 15/16      | (1/8) · (15/16)   | 15/64
Total      | 1     |            | Σ_{i=1}^n θ_i Δθ  | 1

[Figure: prior and posterior histograms on the 8 slices centered at 1/16, 3/16, ..., 15/16, again with the prior and posterior pdfs superimposed in red.]
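A few lines of code can generate these tables for any number of slices and show the convergence numerically. This sketch is not part of the original notes; it uses Python's fractions module (an arbitrary choice) so that the small tables come out exactly as above.

```python
from fractions import Fraction

def discrete_table(n):
    """Posterior pmf for the bent coin after one heads, using n equal slices of [0, 1]."""
    dtheta = Fraction(1, n)
    centers = [Fraction(2*i + 1, 2*n) for i in range(n)]   # slice centers theta_i
    unnorm = [theta * dtheta for theta in centers]         # likelihood * prior = theta_i * dtheta
    T = sum(unnorm)
    return centers, [u / T for u in unnorm]                # posterior pmf

for n in (4, 8):
    # n=4 gives 1/16, 3/16, 5/16, 7/16; n=8 gives 1/64, 3/64, ..., 15/64 (printed as Fractions).
    print(n, discrete_table(n)[1])

# As n grows, posterior pmf / dtheta approaches the posterior pdf 2*theta at the slice centers.
centers, post = discrete_table(1000)
print(float(post[499] * 1000), 2 * float(centers[499]))    # both 0.999
```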
Finally we slice [0, 1] into 20 pieces. This is essentially identical to the previous two cases. Let's skip right to the histograms.

[Figure: prior and posterior histograms on 20 slices, with the prior and posterior pdfs superimposed in red.]

Looking at the sequence of plots we see how the prior and posterior histograms converge to the prior and posterior probability functions.

5.2 Using the posterior pdf

Example 6. In Example 4, after observing one heads, what is the (posterior) probability that the coin is biased towards heads?

answer: Since the parameter θ is the probability the coin lands heads, the problem asks for P(θ > 1/2 | x). This is easily computed from the posterior pdf:

P(θ > 1/2 | x) = ∫_{1/2}^{1} f(θ|x) dθ = ∫_{1/2}^{1} 2θ dθ = θ² |_{1/2}^{1} = 3/4.

This can be compared with the prior probability that the coin is biased towards heads:

P(θ > 1/2) = ∫_{1/2}^{1} f(θ) dθ = ∫_{1/2}^{1} 1 dθ = θ |_{1/2}^{1} = 1/2.

We see that observing one heads has increased the probability that the coin is biased towards heads from 1/2 to 3/4.
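The posterior probability above is also easy to check numerically. The following sketch is not part of the original notes and assumes Python with numpy; it approximates both integrals over (1/2, 1] by sums over small slices.

```python
import numpy as np

n = 100_000
dtheta = 0.5 / n
theta = 0.5 + dtheta * (np.arange(n) + 0.5)   # midpoints of small slices of (1/2, 1]

# Prior probability: integral of f(theta) = 1 over (1/2, 1].
print(np.sum(np.ones(n) * dtheta))            # about 0.5
# Posterior probability: integral of f(theta|x) = 2*theta over (1/2, 1].
print(np.sum(2.0 * theta * dtheta))           # about 0.75
```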
6 The law of total probability

Recall that for discrete hypotheses H_1, H_2, ..., H_n the law of total probability says the prior probability of data D is

P(D) = Σ_{i=1}^n P(D|H_i) P(H_i).

This is the prior probability of D because we used the prior probabilities P(H_i). If instead we use the values θ_1, θ_2, ..., θ_n for hypotheses and x_1 for data, then this is written

p(x_1) = Σ_{i=1}^n p(x_1|θ_i) p(θ_i).

We call this the prior predictive probability of x_1 to distinguish it from the prior probability of θ.

Suppose we then collect data x_1. Assuming data x_1 and x_2 are conditionally independent (i.e., they are independent if we condition on a hypothesis), then we can replace the prior probability for θ by the posterior p(θ|x_1) to get the posterior predictive probability of x_2 given x_1:

p(x_2|x_1) = Σ_{i=1}^n p(x_2|θ_i) p(θ_i|x_1).

If x_1 and x_2 are not conditionally independent then the probability of x_2 also needs to be conditioned on x_1. We get the somewhat more complicated formula for the posterior predictive probability of x_2 given x_1:

p(x_2|x_1) = Σ_{i=1}^n p(x_2|θ_i, x_1) p(θ_i|x_1).

(This is not something we will see in 18.05.)

Likewise, for continuous priors and posteriors over the range [a, b] and discrete data we have the prior predictive probability

p(x_1) = ∫_a^b p(x_1|θ) f(θ) dθ

and (assuming x_1 and x_2 are conditionally independent) the posterior predictive probability

p(x_2|x_1) = ∫_a^b p(x_2|θ) f(θ|x_1) dθ.

Example 7. In Example 4, compute the prior predictive and posterior predictive probability of heads on the second toss (i.e., prior and posterior to taking into account that the first toss was heads).

answer: The prior predictive probability of x_2 = 1 is

p(x_2 = 1) = ∫_0^1 p(x_2 = 1|θ) f(θ) dθ = ∫_0^1 θ · 1 dθ = θ²/2 |_0^1 = 1/2.

The posterior predictive probability of x_2 = 1 given x_1 = 1 is

p(x_2 = 1 | x_1 = 1) = ∫_0^1 p(x_2 = 1|θ) f(θ|x_1 = 1) dθ = ∫_0^1 θ · 2θ dθ = 2θ³/3 |_0^1 = 2/3.

We see that observing heads on the first toss has increased the probability of heads on the second toss from 1/2 to 2/3.
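Both predictive probabilities are weighted averages of the likelihood θ, weighted by the prior pdf 1 and the posterior pdf 2θ respectively, so they are easy to approximate numerically. The sketch below is not part of the original notes and assumes Python with numpy.

```python
import numpy as np

n = 100_000
dtheta = 1.0 / n
theta = dtheta * (np.arange(n) + 0.5)             # slice centers in [0, 1]

prior_pred = np.sum(theta * 1.0 * dtheta)         # integral of theta * f(theta) dtheta
post_pred = np.sum(theta * 2.0 * theta * dtheta)  # integral of theta * f(theta|x_1) dtheta

print(prior_pred, post_pred)                      # about 0.5 and 0.667
```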
MIT OpenCourseWare
http://ocw.mit.edu

18.05 Introduction to Probability and Statistics
Spring 2014

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.