Bayes Theorem
Frequentist vs. Bayesian Statistics

A common situation in science: we have some data and we want to know the true physical law describing it. We want to come up with a model that fits the data. Example: we look at n=10 random galaxies and find that m=4 are spirals. So what is the true ratio of spirals in the universe, r?

Frequentist: There are true, fixed parameters in a model (though they may be unknown at times). Data contain random errors which have a certain probability distribution (Gaussian, for example). Mathematical routines analyze the probability of getting certain data, given a particular model. (If I flip a fair coin, what's the probability of getting exactly 50% heads and 50% tails?)

Bayesian: There are no true model parameters. Instead, all parameters are treated as random variables with probability distributions. Random errors in data have no probability distribution; rather, the model parameters are random, with their own distributions. Mathematical routines analyze the probability of a model, given some data. (If I flip a coin and get X heads and Y tails, what is the probability that the coin is fair?) The statistician makes a guess (a prior distribution) and then updates that guess with the data.

Both approaches address the same fundamental problem, but attack it in reverse order: the probability of getting the data given a model, versus the probability of a model given some data. It's quite common to get the same basic result from both methods, but many will argue that the Bayesian approach more closely matches the fundamental problem in science: we have some data, and we want to infer the most likely truth.
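The two coin-flip questions above can be sketched numerically. This is an illustrative assumption, not from the slides: we fix 100 flips for the frequentist question, and for the Bayesian question we compare the fair coin against a single hypothetical alternative (bias 0.6) with equal priors after observing 60 heads.

```python
from math import comb

# Frequentist question: probability of the data, given a fixed model.
# P(exactly 50 heads in 100 flips | fair coin)
n_flips = 100  # assumed flip count, for illustration only
p_data_given_fair = comb(n_flips, 50) * 0.5**n_flips

# Bayesian question: probability of the model, given the data.
# Compare "fair" (p=0.5) against one alternative coin (p=0.6),
# with equal prior belief in each, after observing 60 heads.
heads = 60
like_fair = comb(n_flips, heads) * 0.5**heads * 0.5**(n_flips - heads)
like_biased = comb(n_flips, heads) * 0.6**heads * 0.4**(n_flips - heads)
p_fair_given_data = like_fair * 0.5 / (like_fair * 0.5 + like_biased * 0.5)

print(f"P(50 heads | fair coin) = {p_data_given_fair:.4f}")  # ~0.08
print(f"P(fair coin | 60 heads) = {p_fair_given_data:.4f}")
```

Note the reversed conditioning: the first number says nothing about whether the coin is fair; the second does, but only because we committed to a prior over models.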
Bayes Theorem

The primary tool of Bayesian statistics. It allows one to estimate the probability of measuring/observing something, given that you have already measured/observed some other relevant piece of information:

P(B|A) = P(A|B) P(B) / P(A)

P(B|A) = probability of measuring B, given A
P(A|B) = probability of measuring A, given B
P(B) = prior probability of measuring B, before any data is taken
P(A) = prior probability of measuring A, before any data is taken
A simple example

Drug Testing: Let's say 0.5% of people are drug users. Our test is 99% accurate (it correctly identifies 99% of drug users and 99% of non-drug users). What's the probability of being a drug user if you've tested positive? Our Bayes theorem reads:

P(user|pos) = P(pos|user) P(user) / P(pos) = (0.99 × 0.005) / (0.01 × 0.995 + 0.99 × 0.005) ≈ 0.33

P(pos|user) = 0.99 (99% effective at detecting users)
P(user) = 0.005 (only 0.5% of people actually are users)
P(pos) = 0.01 × 0.995 + 0.99 × 0.005 (the 1% chance for non-users, 99.5% of the population, to test positive, plus the 99% chance for users, 0.5% of the population, to test positive)

Only a 33% chance that a positive test is correct! This example assumes we know something about the general population (users vs. non-users), but we usually don't!
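The arithmetic above is easy to get wrong by hand, so here is the same calculation as a few lines of Python, using only the numbers given on the slide:

```python
# Drug-test example: P(user | positive test) via Bayes theorem.
p_user = 0.005             # 0.5% of people are users
p_pos_given_user = 0.99    # test detects 99% of users
p_pos_given_clean = 0.01   # test wrongly flags 1% of non-users

# Total probability of a positive test (the denominator, P(pos))
p_pos = p_pos_given_user * p_user + p_pos_given_clean * (1 - p_user)

p_user_given_pos = p_pos_given_user * p_user / p_pos
print(f"P(user | positive) = {p_user_given_pos:.3f}")  # ~0.332
```

The small prior (0.5% users) is what drags the answer down to ~33% despite the test's 99% accuracy.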
Example: Galaxy Populations

We looked at n=10 random galaxies and found m=4 spirals. What's the ratio of spirals in the universe, r? We are introducing an unknown model parameter. Bayes theorem reads:

P(r|data) = P(data|r) P(r) / P(data)

P(r|data) = probability of getting r, given our current data (what we want to know)
P(data|r) = probability of measuring the current data for a given r
P(r) = probability of r before any data is taken (known as a prior)
P(data) = prior probability of measuring the data. This acts as a normalizing constant, and is defined as

P(data) = ∫ P(data|r) P(r) dr

In other words, it's the probability of finding the data considering all possible values of r.
P(r|data) = P(data|r) P(r) / ∫ P(data|r) P(r) dr

Since there are only two possible measurements (spiral or not spiral), P(data|r) is adequately described by a binomial distribution:

P(data|r) = [n! / (m!(n-m)!)] r^m (1-r)^(n-m) = [10! / (4!6!)] r^4 (1-r)^6

We'll assume that before any data was taken, we figured all possible values of r were equally likely, so we'll set P(r) = 1 (our prior).

P(data) is just an integral, and we find

P(data) = ∫ P(data|r) P(r) dr = [10! / (4!6!)] ∫ r^4 (1-r)^6 dr = [10! / (4!6!)] × [4!6! / 11!] = 1/11

(where ∫ r^4 (1-r)^6 dr = 4!6!/11! = 1/2310, integrating from 0 to 1).
P(r|data) = P(data|r) P(r) / ∫ P(data|r) P(r) dr

Putting all this together and simplifying, we get:

P(r|data) = 2310 r^4 (1-r)^6

This is just a probability distribution for r, centered around 0.4 as we would expect. Also, as expected, more data makes the result more robust (red curve).
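As a sanity check, a few lines of Python (standard library only) reproduce both the normalizing constant 2310 and the peak near r = 0.4:

```python
from math import factorial

# Analytic normalization: integral of r^4 (1-r)^6 over [0,1] = 4!6!/11! = 1/2310
beta_integral = factorial(4) * factorial(6) / factorial(11)
norm = 1.0 / beta_integral   # 2310.0

# Evaluate the posterior P(r|data) = 2310 r^4 (1-r)^6 on a grid
rs = [i / 1000 for i in range(1001)]
post = [norm * r**4 * (1 - r)**6 for r in rs]

# Numerical cross-checks: the posterior integrates to ~1,
# and its peak sits at r = m/n = 0.4
area = sum(p * 0.001 for p in post)
r_peak = rs[max(range(len(rs)), key=lambda i: post[i])]
print(norm, r_peak)
```

The mode at m/n = 0.4 is exactly what a frequentist point estimate would give here; the Bayesian answer additionally carries the full distribution's width.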
The role of priors

In the previous example, we assumed that all values of r were equally likely before we took any data. Often, we'll know something else (apart from the data) which we'll want to incorporate into our prior (physics, models, a hunch, etc.). As an example, let's say we run a cosmological simulation which suggests r ~ 0.7 ± 0.05. We'll use this as our prior, P(r), and estimate it as a Gaussian distribution centered around 0.7, with σ = 0.05.

P(data|r) = [n! / (m!(n-m)!)] r^m (1-r)^(n-m)    (same as before)

P(r) = [1 / (σ√(2π))] exp( -(r - 0.7)² / (2σ²) )    (new prior)
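With the Gaussian prior the posterior no longer has a tidy closed form, so a short numerical sketch shows where the new peak lands. The grid resolution here is my own choice; the data (m=4 of n=10) and the prior (mean 0.7, σ = 0.05) follow the slide's setup:

```python
from math import exp, pi, sqrt, comb

mu, sigma = 0.7, 0.05   # Gaussian prior from the simulation: r ~ 0.7 +/- 0.05
n, m = 10, 4            # data: 4 spirals out of 10 galaxies

def likelihood(r):
    # Binomial P(data|r), same as before
    return comb(n, m) * r**m * (1 - r)**(n - m)

def prior(r):
    # Gaussian P(r)
    return exp(-(r - mu)**2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

# Unnormalized posterior on a grid (avoid the endpoints r = 0, 1)
rs = [i / 10000 for i in range(1, 10000)]
unnorm = [likelihood(r) * prior(r) for r in rs]
evidence = sum(unnorm) / 10000          # numerical P(data)
post = [u / evidence for u in unnorm]   # normalized posterior

r_peak = rs[max(range(len(rs)), key=lambda i: post[i])]
print(f"posterior peak at r = {r_peak:.3f}")
```

With only ten galaxies, the narrow prior dominates: the peak lands near 0.67, far from the data's m/n = 0.4. That is precisely the "powerful and potentially dangerous" behavior discussed on the next slide.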
The role of priors

Our new distribution: notice the profound effect the prior can have on the result. The more data one has, the more the prior is overwhelmed, but it clearly plays a powerful (and potentially dangerous) role at low sample sizes.

Priors can be very controversial, especially when you have no extra information on which to base your prior. Uniform priors, like we originally chose, are considered too agnostic, even though they may seem like the safest approach.