Lecture 1: Probability and Statistics

Introduction:
- Understanding of many physical phenomena depends on statistical and probabilistic concepts:
  - Statistical Mechanics (physics of systems composed of many parts: gases, liquids, solids)
    - 1 mole of anything contains 6 × 10^23 particles (Avogadro's number)
    - impossible to keep track of all 6 × 10^23 particles even with the fastest computer imaginable
      → resort to learning about the group properties of all the particles
      → partition function: calculate energy, entropy, pressure... of a system
  - Quantum Mechanics (physics at the atomic or smaller scale)
    - wavefunction = probability amplitude
      → probability of an electron being located at (x, y, z) at a certain time
- Understanding/interpretation of experimental data depends on statistical and probabilistic concepts:
  - how do we extract the best value of a quantity from a set of measurements?
  - how do we decide if our experiment is consistent/inconsistent with a given theory?
  - how do we decide if our experiment is internally consistent?
  - how do we decide if our experiment is consistent with other experiments?
  → In this course we will concentrate on the above experimental issues!

K.K. Gan, L1: Probability and Statistics
Definition of probability:
- Suppose we have N trials and a specified event occurs r times.
  - example: rolling a die, where the event could be rolling a 6.
  - define the probability (P) of an event (E) occurring as:
      P(E) = r/N as N → ∞
  - examples:
      six-sided die: P(6) = 1/6
      coin toss: P(heads) = 0.5
      → P(heads) should approach 0.5 the more times you toss the coin.
      → for a single coin toss we can never get P(heads) = 0.5!
  - by definition, probability is a non-negative real number bounded by 0 ≤ P ≤ 1
    - if P = 0, the event never occurs
    - if P = 1, the event always occurs
    - the sum (or integral) of all probabilities of mutually exclusive events must equal 1.
      events are independent if: P(A∩B) = P(A)P(B)   (∩ = intersection, ∪ = union)
      events are mutually exclusive (disjoint) if: P(A∩B) = 0, so P(A∪B) = P(A) + P(B)
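As a quick illustration of P(E) = r/N as N → ∞ (a sketch, not part of the lecture — the function name and seed are my own), we can simulate coin tosses and watch the relative frequency of heads approach 0.5:

```python
import random

def relative_frequency(n_tosses, seed=0):
    """Toss a fair coin n_tosses times and return r/N, the fraction of heads."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
```

For small N the ratio r/N scatters widely; the fluctuations shrink roughly like 1/√N, which is why a single toss can never yield P(heads) = 0.5.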
- Probability can be a discrete or a continuous variable.
  - Discrete probability: P can take certain values only.
    - examples:
        tossing a six-sided die: P(x_i) = P_i, where x_i = 1, 2, 3, 4, 5, 6 and P_i = 1/6 for all x_i
        tossing a coin: only 2 choices, heads or tails
        (Notation: x_i is called a random variable)
    - for both of the above discrete examples (and in general), when we sum over all mutually exclusive possibilities:
        Σ_i P(x_i) = 1
  - Continuous probability: P can be any number between 0 and 1.
    - define a probability density function (pdf), f(x):
        f(x) dx = dP(x ≤ a ≤ x + dx), with a a continuous variable
    - probability for x to be in the range a ≤ x ≤ b is:
        P(a ≤ x ≤ b) = ∫_a^b f(x) dx
    - just like the discrete case, the sum of all probabilities must equal 1:
        ∫_{-∞}^{+∞} f(x) dx = 1
        → f(x) is normalized to one.
    - the probability for x to be exactly some number is zero, since:
        ∫_{x=a}^{x=a} f(x) dx = 0
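A small numerical check of the two normalization conditions (my own sketch; the exponential pdf is chosen here only as a convenient continuous example):

```python
import math

# discrete case: a fair six-sided die, P(x_i) = 1/6
die = {x: 1 / 6 for x in range(1, 7)}
assert abs(sum(die.values()) - 1.0) < 1e-12   # sum of all P(x_i) = 1

# continuous case: exponential pdf f(x) = e^{-x} for x >= 0 (already normalized),
# whose integral has the closed form  P(a <= x <= b) = e^{-a} - e^{-b}
def prob_between(a, b):
    return math.exp(-a) - math.exp(-b)

p = prob_between(0.0, 1.0)        # ≈ 0.632
p_point = prob_between(1.0, 1.0)  # exactly 0: P(x == a) vanishes for a continuous pdf
```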
- Examples of some common P(x)'s and f(x)'s:
    Discrete P(x): binomial, Poisson
    Continuous f(x): uniform (i.e. constant), Gaussian, exponential, chi square
- How do we describe a probability distribution?
  - mean, mode, median, and variance
  - for a continuous distribution, these quantities are defined by:
      Mean (average):                    μ = ∫_{-∞}^{+∞} x f(x) dx
      Mode (most probable value):        ∂f(x)/∂x |_{x=a} = 0
      Median (50% point):                0.5 = ∫_{-∞}^{a} f(x) dx
      Variance (width of distribution):  σ² = ∫_{-∞}^{+∞} f(x) (x − μ)² dx
  - for a discrete distribution, the mean and variance are defined by:
      μ = (1/n) Σ x_i
      σ² = (1/n) Σ (x_i − μ)²
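For a discrete data set, all four descriptive quantities can be computed directly with the Python standard library (a sketch with made-up data; note `pvariance` with `mu=mean` matches the lecture's 1/n definition):

```python
import statistics

data = [1, 2, 2, 3, 4]                            # hypothetical sample
mean = statistics.fmean(data)                     # (1/n) Σ x_i = 2.4
median = statistics.median(data)                  # 50% point = 2
mode = statistics.mode(data)                      # most probable value = 2
variance = statistics.pvariance(data, mu=mean)    # (1/n) Σ (x_i - μ)² = 1.04
```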
- Some continuous pdfs:
    [Figure: a symmetric distribution (Gaussian) — for a Gaussian pdf, the mean, mode, and median all occur at the same x.]
    [Figure: an asymmetric distribution showing the mean, median, and mode — for most pdfs, the mean, mode, and median are at different locations.]
- Calculation of mean and variance:
  - example: a discrete data set consisting of three numbers: {1, 2, 3}
    - the average (μ) is just:
        μ = (1/n) Σ x_i = (1 + 2 + 3)/3 = 2
    - complication: suppose some measurements are more precise than others.
      → if each measurement x_i has a weight w_i associated with it, use the weighted average:
        μ = Σ w_i x_i / Σ w_i
    - the variance (σ²), or average squared deviation from the mean, is just:
        σ² = (1/n) Σ (x_i − μ)²
        σ is called the standard deviation; the variance describes the width of the pdf!
        note: the n in the denominator would be n − 1 if we determined the average (μ) from the data itself.
      → rewrite the above expression by expanding the summation:
        σ² = (1/n) [Σ x_i² − 2μ Σ x_i + Σ μ²]
           = (1/n) Σ x_i² − 2μ² + μ²
           = (1/n) Σ x_i² − μ²
           = ⟨x²⟩ − ⟨x⟩²
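The algebraic identity σ² = ⟨x²⟩ − ⟨x⟩² is easy to verify numerically on the {1, 2, 3} example (my own sketch):

```python
data = [1, 2, 3]
n = len(data)

mu = sum(data) / n                                   # (1 + 2 + 3)/3 = 2.0
var = sum((x - mu) ** 2 for x in data) / n           # definition: (1/n) Σ (x_i - μ)² = 2/3
var_alt = sum(x * x for x in data) / n - mu ** 2     # <x²> - <x>², same value

# with n - 1 in the denominator, since μ was determined from the data itself:
sample_var = sum((x - mu) ** 2 for x in data) / (n - 1)   # = 1.0
```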
  - using the definition of μ from above, we have for our example of {1, 2, 3}:
      σ² = (1/n) Σ x_i² − μ² = 4.67 − 2² = 0.67
  - the case where the measurements have different weights is more complicated:
      σ² = Σ w_i (x_i − μ)² / Σ w_i = Σ w_i x_i² / Σ w_i − μ²
      where μ is the weighted mean
      if we calculated μ from the data, σ² gets multiplied by a factor n/(n − 1).
- example: a continuous probability distribution, f(x) = sin²x for 0 ≤ x ≤ 2π
  - has two modes!
  - has the same mean and median, but they differ from the mode(s).
  - f(x) is not properly normalized:
      ∫_0^{2π} sin²x dx = π ≠ 1
    → normalized pdf: f(x) = sin²x / ∫_0^{2π} sin²x dx = (1/π) sin²x
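The weighted-mean and weighted-variance formulas can be checked on the same {1, 2, 3} data; the weights below are hypothetical (not from the lecture), chosen so the middle point counts as the most precise measurement:

```python
data = [1.0, 2.0, 3.0]
weights = [1.0, 4.0, 1.0]   # hypothetical weights w_i

sw = sum(weights)
mu_w = sum(w * x for w, x in zip(weights, data)) / sw        # Σ w x / Σ w = 2.0
var_w = sum(w * (x - mu_w) ** 2
            for w, x in zip(weights, data)) / sw             # Σ w (x-μ)² / Σ w = 1/3
var_w_alt = sum(w * x * x
                for w, x in zip(weights, data)) / sw - mu_w ** 2   # same value
```

With equal weights these expressions reduce to the unweighted μ = 2 and σ² = 2/3 of the previous slide.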
  - for continuous probability distributions, the mean, mode, and median are calculated using integrals or derivatives:
      mean:   μ = (1/π) ∫_0^{2π} x sin²x dx = π
      mode:   ∂/∂x sin²x = 0 ⇒ x = π/2, 3π/2
      median: (1/π) ∫_0^{α} sin²x dx = 1/2 ⇒ α = π
- example: the Gaussian distribution function, a continuous probability distribution
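These integrals can be checked numerically (a sketch using a simple midpoint rule of my own; the bisection for the median is likewise an assumption, not the lecture's method):

```python
import math

def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation to the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

pdf = lambda x: math.sin(x) ** 2 / math.pi                  # normalized on [0, 2π]
norm = integrate(pdf, 0.0, 2 * math.pi)                     # ≈ 1
mean = integrate(lambda x: x * pdf(x), 0.0, 2 * math.pi)    # ≈ π

# median: bisect for alpha with ∫_0^alpha pdf dx = 1/2
lo, hi = 0.0, 2 * math.pi
for _ in range(50):
    mid = 0.5 * (lo + hi)
    if integrate(pdf, 0.0, mid, n=10_000) < 0.5:
        lo = mid
    else:
        hi = mid
median = 0.5 * (lo + hi)                                    # ≈ π
```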
Accuracy and Precision:
- Accuracy: how close the experimental measurement is to the true value of the quantity being measured.
- Precision: how well the experimental result has been determined, without regard to the true value of the quantity being measured.
  - just because an experiment is precise does not mean it is accurate!!
  - measurements of the neutron lifetime over the years:
      [Figure: neutron lifetime measurements vs. year; the size of each error bar reflects the precision of the experiment.]
    - steady increase in precision, but are any of these measurements accurate?
Measurement Errors (Uncertainties)
- Use results from probability and statistics as a way of indicating how good a measurement is.
  - most common quality indicator:
      relative precision = [uncertainty of measurement] / measurement
    - example: we measure a table to be 10 inches with an uncertainty of 1 inch.
      relative precision = 1/10 = 0.1, or 10% (% relative precision)
  - the uncertainty in a measurement is usually the square root of the variance:
      σ = standard deviation
    - usually calculated using the technique of propagation of errors.

Statistical and Systematic Errors
- Results from experiments are often presented as: N ± XX ± YY
    N: value of the quantity measured (or determined) by the experiment.
    XX: statistical error, usually assumed to be from a Gaussian distribution.
        With the assumption of Gaussian statistics we can say (calculate) something about how well our experiment agrees with other experiments and/or theories.
        Expect a 68% chance that the true value is between N − XX and N + XX.
    YY: systematic error. Hard to estimate; the distribution of errors is usually not known.
  - examples:
      mass of proton = 0.9382769 ± 0.0000027 GeV
      mass of W boson = 80.8 ± 1.5 ± 2.4 GeV
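The 68% coverage claim for a Gaussian statistical error can be checked by simulation (my own sketch; the "true value" and σ are hypothetical numbers echoing the table example):

```python
import random

rng = random.Random(42)                 # fixed seed for reproducibility
true_value, sigma = 10.0, 1.0           # hypothetical measurement: 10 ± 1 inches
n = 100_000

# count how often a Gaussian-distributed measurement lands within ±σ of the truth
hits = sum(abs(rng.gauss(true_value, sigma) - true_value) <= sigma
           for _ in range(n))
coverage = hits / n                     # ≈ 0.68

relative_precision = sigma / true_value  # 0.1, i.e. 10%
```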
- What's the difference between statistical and systematic errors?
  - statistical errors are random in the sense that if we repeat the measurement enough times: XX → 0
  - systematic errors do not → 0 with repetition.
    - examples of sources of systematic errors:
        a voltmeter not calibrated properly
        a ruler not the length we think it is (a meter stick might really be < 1 meter!)
  - because of systematic errors, an experimental result can be precise, but not accurate!
- How do we combine systematic and statistical errors to get one estimate of precision?
  → big problem!
  - two choices:
    - σ_tot = XX + YY                (add them linearly)
    - σ_tot = (XX² + YY²)^(1/2)      (add them in quadrature)
- Some other ways of quoting experimental results:
  - lower limit: the mass of particle X is > 100 GeV
  - upper limit: the mass of particle X is < 100 GeV
  - asymmetric errors: mass of particle X = 100 +4/−3 GeV
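The two combination choices can be compared directly on the W boson numbers quoted earlier, 80.8 ± 1.5 ± 2.4 GeV (a sketch; the helper names are my own):

```python
import math

def combine_linear(stat, syst):
    """σ_tot = XX + YY: the conservative linear sum."""
    return stat + syst

def combine_quadrature(stat, syst):
    """σ_tot = sqrt(XX² + YY²): the usual choice when the errors are independent."""
    return math.hypot(stat, syst)

lin = combine_linear(1.5, 2.4)        # 3.9 GeV
quad = combine_quadrature(1.5, 2.4)   # ≈ 2.83 GeV
```

Quadrature always gives the smaller total, so the choice matters when comparing an experiment against a theory or another measurement.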