Lecture. Overview of some probability distributions.

In this lecture we will review several common distributions that will be used often throughout the class. Each distribution is usually described by its probability function (p.f.) in the case of discrete distributions or probability density function (p.d.f.) in the case of continuous distributions. Let us recall basic definitions associated with these two cases.

Discrete distributions. Suppose that a set $\mathcal{X}$ consists of a countable or finite number of points, $\mathcal{X} = \{a_1, a_2, a_3, \ldots\}$. Then a probability distribution $P$ on $\mathcal{X}$ can be defined via a function $p(x)$ on $\mathcal{X}$ with the following properties:

1. $0 \le p(a_i) \le 1$,
2. $\sum_{i=1}^{\infty} p(a_i) = 1$.

The function $p(x)$ is called the probability function. If $X$ is a random variable with distribution $P$ then $p(a_i) = P(X = a_i)$ is the probability that $X$ takes value $a_i$. Given a function $\varphi : \mathcal{X} \to \mathbb{R}$, the expectation of $\varphi(X)$ is defined by
$$E\varphi(X) = \sum_{i=1}^{\infty} \varphi(a_i)\, p(a_i).$$

(Absolutely) continuous distributions. A continuous distribution $P$ on $\mathbb{R}$ is defined via a probability density function (p.d.f.) $p(x)$ on $\mathbb{R}$ such that $p(x) \ge 0$ and
$$\int_{-\infty}^{\infty} p(x)\,dx = 1.$$
If a random variable $X$ has distribution $P$ then the probability that $X$ takes a value in the interval $[a, b]$ is given by
$$P(X \in [a, b]) = \int_a^b p(x)\,dx.$$
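The two definitions above can be checked numerically. The following is a minimal Python sketch, not part of the original notes: the fair-die probability function and the density p(x) = 2x on [0, 1] are illustrative assumptions, and the integral is approximated by a midpoint Riemann sum rather than computed exactly.

```python
# Illustrative sketch: the fair die and the density p(x) = 2x on [0, 1]
# are assumptions chosen for this example, not taken from the lecture.

def discrete_expectation(points, p, phi=lambda x: x):
    """E phi(X) = sum_i phi(a_i) p(a_i) for a finite set of points."""
    return sum(phi(a) * p(a) for a in points)

def interval_probability(pdf, a, b, steps=100_000):
    """Approximate P(X in [a, b]) = integral of pdf over [a, b] (midpoint rule)."""
    h = (b - a) / steps
    return sum(pdf(a + (i + 0.5) * h) for i in range(steps)) * h

die_points = [1, 2, 3, 4, 5, 6]

def die_pf(a):
    """Probability function of a fair die: every face has probability 1/6."""
    return 1 / 6

def density(x):
    """A p.d.f. on R: 2x on [0, 1] and 0 elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

mean_die = discrete_expectation(die_points, die_pf)                      # E X
mean_sq_die = discrete_expectation(die_points, die_pf, lambda x: x * x)  # E X^2
total_mass = interval_probability(density, 0.0, 1.0)   # total probability
prob_half = interval_probability(density, 0.0, 0.5)    # P(X in [0, 0.5])
```

The same two helper functions can be reused with any finite probability function or any density that is cheap to evaluate.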
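Clearly, for a continuous distribution we have $P(X = a) = 0$ for any $a \in \mathbb{R}$. Given a function $\varphi : \mathbb{R} \to \mathbb{R}$, the expectation of $\varphi(X)$ is defined by
$$E\varphi(X) = \int_{-\infty}^{\infty} \varphi(x)\, p(x)\,dx.$$

Notation. The fact that a random variable $X$ has distribution $P$ will be denoted by $X \sim P$.

Normal (Gaussian) Distribution $N(\mu, \sigma^2)$. Normal distribution is a continuous distribution on $\mathbb{R}$ with p.d.f.
$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \quad \text{for } x \in (-\infty, \infty).$$
Here $-\infty < \mu < \infty$, $\sigma > 0$ are the parameters of the distribution. Let us recall some properties of a normal distribution. If a random variable $X$ has a normal distribution $N(\mu, \sigma^2)$ then the r.v.
$$Y = \frac{X - \mu}{\sigma} \sim N(0, 1)$$
has a standard normal distribution $N(0, 1)$. To see this, we can write
$$P\Big(\frac{X - \mu}{\sigma} \in [a, b]\Big) = P(X \in [a\sigma + \mu,\, b\sigma + \mu]) = \int_{a\sigma+\mu}^{b\sigma+\mu} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = \int_a^b \frac{1}{\sqrt{2\pi}}\, e^{-\frac{y^2}{2}}\,dy,$$
where in the last integral we made a change of variables $y = (x - \mu)/\sigma$. This, of course, means that $Y \sim N(0, 1)$. The expectation of $Y$ is
$$EY = \int_{-\infty}^{\infty} y\, \frac{1}{\sqrt{2\pi}}\, e^{-\frac{y^2}{2}}\,dy = 0$$
since the integrand is an odd function. To compute the second moment $EY^2$, let us first note that since $\frac{1}{\sqrt{2\pi}} e^{-y^2/2}$ is a probability density function, it integrates to 1, i.e.
$$1 = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{y^2}{2}}\,dy.$$
If we integrate this by parts, we get
$$1 = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{y^2}{2}}\,dy = \frac{1}{\sqrt{2\pi}}\, y\, e^{-\frac{y^2}{2}} \Big|_{-\infty}^{\infty} - \int_{-\infty}^{\infty} y\, \frac{1}{\sqrt{2\pi}}\, (-y)\, e^{-\frac{y^2}{2}}\,dy = 0 + \int_{-\infty}^{\infty} y^2\, \frac{1}{\sqrt{2\pi}}\, e^{-\frac{y^2}{2}}\,dy = EY^2.$$
Thus, the second moment is $EY^2 = 1$. The variance of $Y$ is
$$\mathrm{Var}(Y) = EY^2 - (EY)^2 = 1 - 0 = 1.$$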
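The moments of the standard normal r.v. Y computed above can be sanity-checked by numerical integration. This is a minimal Python sketch under stated assumptions: the integration range [-10, 10] and the number of steps are choices made for this illustration, relying on the tails of the density being negligible beyond that range.

```python
import math

def std_normal_pdf(y):
    """Density of N(0, 1): exp(-y^2 / 2) / sqrt(2 pi)."""
    return math.exp(-y * y / 2) / math.sqrt(2 * math.pi)

def integrate(f, lo=-10.0, hi=10.0, steps=200_000):
    """Midpoint rule on [lo, hi]; the range is an assumption of this sketch."""
    h = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * h) for i in range(steps)) * h

total = integrate(std_normal_pdf)                                # should be close to 1
first_moment = integrate(lambda y: y * std_normal_pdf(y))        # EY, close to 0
second_moment = integrate(lambda y: y * y * std_normal_pdf(y))   # EY^2, close to 1
variance = second_moment - first_moment ** 2                     # Var(Y)
```

The numerical values only approximate the exact results 1, 0 and 1, but they make it easy to spot a mistake in the integration by parts.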
It is now easy to compute the mean and the variance of $X = \mu + \sigma Y \sim N(\mu, \sigma^2)$:
$$EX = \mu + \sigma EY = \mu, \qquad EX^2 = E(\mu^2 + 2\mu\sigma Y + \sigma^2 Y^2) = \mu^2 + \sigma^2,$$
$$\mathrm{Var}(X) = EX^2 - (EX)^2 = \mu^2 + \sigma^2 - \mu^2 = \sigma^2.$$
Thus, parameter $\mu$ is the mean and parameter $\sigma^2$ is the variance of a normal distribution. Let us recall (without giving a proof) that if we have several, say $n$, independent random variables $X_i$, $1 \le i \le n$, such that $X_i \sim N(\mu_i, \sigma_i^2)$ then their sum will also have a normal distribution
$$X_1 + \ldots + X_n \sim N(\mu_1 + \ldots + \mu_n,\ \sigma_1^2 + \ldots + \sigma_n^2).$$

Normal distribution appears in one of the most important results that one learns in a probability class, namely, the Central Limit Theorem (CLT), which states the following. If $X_1, \ldots, X_n$ is an i.i.d. sample such that $\sigma^2 = \mathrm{Var}(X_1) < \infty$, then
$$\frac{1}{\sqrt{n}} \sum_{i=1}^{n} (X_i - EX_i) = \sqrt{n}\,(\bar{X}_n - EX_1) \to^{d} N(0, \sigma^2)$$
converges in distribution to a normal distribution with zero mean and variance $\sigma^2$. Here $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ is the sample mean, and convergence in distribution means that for any interval $[a, b]$,
$$P\big(\sqrt{n}\,(\bar{X}_n - EX_1) \in [a, b]\big) \to \int_a^b \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{x^2}{2\sigma^2}}\,dx.$$
This result can be generalized to a sequence of random variables with different distributions, and it basically says that the sum of many independent random variables/factors approximately looks like a normal distribution as long as each factor has a small impact on the total sum. A consequence of this phenomenon is that a normal distribution gives a good approximation for many random objects that by nature are affected by a sum of many independent factors, for example, a person's height or weight, fluctuations of a stock's price, etc.

Bernoulli Distribution $B(p)$. This distribution describes a random variable that can take only two possible values, i.e. $\mathcal{X} = \{0, 1\}$. The distribution is described by a probability function
$$p(1) = P(X = 1) = p, \qquad p(0) = P(X = 0) = 1 - p \quad \text{for some } p \in [0, 1].$$
It is easy to check that $EX = p$, $\mathrm{Var}(X) = p(1 - p)$.

Binomial Distribution $B(n, p)$. This distribution describes a random variable $X$ that is the number of successes in $n$ trials with probability of success $p$. In other words, $X$ is a sum of $n$ independent Bernoulli r.v. Therefore, $X$ takes values in $\mathcal{X} = \{0, 1, \ldots, n\}$ and the distribution is given by a probability function
$$p(k) = P(X = k) = \binom{n}{k}\, p^k (1 - p)^{n-k}.$$
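The binomial probability function above can be tabulated directly with the standard library. This is a minimal Python sketch; the values n = 10 and p = 0.3 are illustrative assumptions, and math.comb requires Python 3.8 or later.

```python
import math

def binomial_pf(k, n, p):
    """P(X = k) = C(n, k) p^k (1 - p)^(n - k) for k = 0, 1, ..., n."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.3   # illustrative values, not from the lecture
pf = [binomial_pf(k, n, p) for k in range(n + 1)]

total = sum(pf)                                                     # total probability
mean = sum(k * pk for k, pk in enumerate(pf))                       # EX
variance = sum(k * k * pk for k, pk in enumerate(pf)) - mean ** 2   # Var(X)
```

The mean and variance computed this way can be compared with np and np(1 - p) for any choice of n and p.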
It is easy to check that $EX = np$, $\mathrm{Var}(X) = np(1 - p)$.

Exponential Distribution $E(\alpha)$. This is a continuous distribution with p.d.f.
$$p(x) = \begin{cases} \alpha e^{-\alpha x}, & x \ge 0, \\ 0, & x < 0. \end{cases}$$
Here, $\alpha > 0$ is the parameter of the distribution. Again, it is a simple calculus exercise to check that
$$EX = \frac{1}{\alpha}, \qquad \mathrm{Var}(X) = \frac{1}{\alpha^2}.$$
This distribution has the following nice property. If a random variable $X \sim E(\alpha)$ then the probability that $X$ exceeds level $t$ for some $t > 0$ is
$$P(X \ge t) = P(X \in [t, \infty)) = \int_t^{\infty} \alpha e^{-\alpha x}\,dx = e^{-\alpha t}.$$
Given another $s > 0$, the conditional probability that $X$ will exceed level $t + s$ given that it will exceed level $t$ can be computed as follows:
$$P(X \ge t + s \mid X \ge t) = \frac{P(X \ge t + s,\ X \ge t)}{P(X \ge t)} = \frac{P(X \ge t + s)}{P(X \ge t)} = \frac{e^{-\alpha(t+s)}}{e^{-\alpha t}} = e^{-\alpha s} = P(X \ge s),$$
i.e.
$$P(X \ge t + s \mid X \ge t) = P(X \ge s).$$
If $X$ represents the lifetime of some object in some random conditions, then the above property means that the chance that $X$ will live longer than $t + s$ given that it will live longer than $t$ is the same as the chance that $X$ will live longer than $s$ in the first place. Or, in other words, if $X$ is alive at time $t$ then it is like new. Therefore, some natural examples that can be described by an exponential distribution are the lifetime of high quality products (or, possibly, soldiers in combat).

Poisson Distribution $\Pi(\lambda)$. This is a discrete distribution with $\mathcal{X} = \{0, 1, 2, 3, \ldots\}$ and
$$p(k) = P(X = k) = \frac{\lambda^k}{k!}\, e^{-\lambda} \quad \text{for } k = 0, 1, 2, \ldots$$
It is an exercise to show that $EX = \lambda$, $\mathrm{Var}(X) = \lambda$.

Poisson distribution could be used to describe the following random objects: the number of stars in a random area of space; the number of misprints in a typed page; the number of wrong connections to your phone number; the distribution of bacteria on some surface or weeds in a field. All these examples share some common properties that give rise to a Poisson distribution. Suppose that we count a number of random objects in a certain region $T$ and this counting process has the following properties:
1. The average number of objects in any region $S \subseteq T$ is proportional to the size of $S$, i.e. $E\,\mathrm{Count}(S) = \lambda |S|$. Here $|S|$ denotes the size of $S$, i.e. length, area, volume, etc. Parameter $\lambda > 0$ represents the intensity of the process.
2. Counts on disjoint regions are independent.
3. The chance to observe more than one object in a small region is very small, i.e. $P(\mathrm{Count}(S) \ge 2)$ becomes small when the size $|S|$ gets small.

We will show that these assumptions imply that the number $\mathrm{Count}(T)$ of objects in the region $T$ has Poisson distribution $\Pi(\lambda |T|)$ with parameter $\lambda |T|$.

[Figure: Poisson distribution. The interval $[0, T]$ is split into small subintervals with counts $X_1, X_2, \ldots, X_n$.]

For simplicity, let us assume that the region $T$ is an interval $[0, T]$ of length $T$. Let us split this interval into a large number $n$ of small equal subintervals of length $T/n$ and denote by $X_i$ the number of random objects in the $i$th subinterval, $i = 1, \ldots, n$. By the first property above,
$$EX_i = \frac{\lambda T}{n}.$$
On the other hand, by definition of expectation,
$$EX_i = \sum_{k \ge 0} k\, P(X_i = k) = 0 + P(X_i = 1) + \varepsilon_n,$$
where $\varepsilon_n = \sum_{k \ge 2} k\, P(X_i = k)$, and by the last property above we assume that $\varepsilon_n$ becomes small with $n$, since the probability to observe two or more objects on an interval of size $T/n$ becomes small as $n$ becomes large. Combining the two equations above gives
$$P(X_i = 1) \approx \frac{\lambda T}{n}.$$
Also, by the last property the probability that any count $X_i$ is $\ge 2$ is small, i.e.
$$P(\text{at least one } X_i \ge 2) \le n\, o\Big(\frac{T}{n}\Big) \to 0 \quad \text{as } n \to \infty.$$
Hence $\mathrm{Count}(T) = X_1 + \cdots + X_n$ has approximately binomial distribution $B(n, \lambda T/n)$ and we can write
$$P(\mathrm{Count}(T) = X_1 + \cdots + X_n = k) \approx \binom{n}{k} \Big(\frac{\lambda T}{n}\Big)^k \Big(1 - \frac{\lambda T}{n}\Big)^{n-k} \to \frac{(\lambda T)^k}{k!}\, e^{-\lambda T}.$$
The last limit is a simple calculus exercise, and it is also the famous Poisson approximation of the binomial distribution taught in every probability class.
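The Poisson approximation of the binomial distribution can be observed numerically by letting the number of subintervals n grow. This is a minimal Python sketch; the value lambda * T = 2 and the count k = 3 are illustrative assumptions, and math.comb requires Python 3.8 or later.

```python
import math

def binomial_pf(k, n, p):
    """Binomial B(n, p) probability of exactly k successes."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pf(k, lam):
    """Poisson probability lam^k / k! * exp(-lam)."""
    return lam ** k / math.factorial(k) * math.exp(-lam)

lam_T = 2.0   # illustrative value of lambda * T
k = 3         # illustrative count

def approximation_error(n):
    """Gap between B(n, lam_T / n) and the Poisson limit at the point k."""
    return abs(binomial_pf(k, n, lam_T / n) - poisson_pf(k, lam_T))

# The gap should shrink as the number of subintervals n grows.
errors = [approximation_error(n) for n in (10, 100, 1000, 10000)]
```

Printing the list errors shows how quickly the binomial probabilities approach the Poisson limit for this particular choice of parameters.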
Uniform Distribution $U[0, \lambda]$. This distribution has probability density function
$$p(x) = \begin{cases} \dfrac{1}{\lambda}, & x \in [0, \lambda], \\ 0, & \text{otherwise.} \end{cases}$$

Matlab review of probability distributions. See Matlab Help / Statistics Toolbox / Probability Distributions. Each distribution in Matlab has a name; for example, the normal distribution has the name norm. Adding a suffix defines a function associated with this distribution. For example, normrnd generates random numbers from distribution norm, normpdf gives the p.d.f., normcdf gives the c.d.f., and normfit fits the normal distribution to a given dataset (we will look at this last type of function when we discuss Maximum Likelihood Estimators). Please look at each function for its syntax, input, output, etc. Type help normrnd to quickly see how the normal random number generator works. Also, there are graphical user interface tools such as disttool (to run it, just type disttool in the main Matlab window), which allows you to play with different distributions, and randtool, which generates and visualizes random samples from different distributions.
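For readers working outside Matlab, the four operations above have rough counterparts in Python's standard library. The sketch below is an illustration and not part of the course material: it assumes Python 3.8 or later, where statistics.NormalDist is available, and the correspondence with the Matlab functions named in the comments is only approximate.

```python
from statistics import NormalDist

dist = NormalDist(mu=0.0, sigma=1.0)       # standard normal N(0, 1)

sample = dist.samples(10_000, seed=0)      # random numbers, roughly like normrnd
density_at_zero = dist.pdf(0.0)            # p.d.f. value, roughly like normpdf
cdf_at_zero = dist.cdf(0.0)                # c.d.f. value, roughly like normcdf
fitted = NormalDist.from_samples(sample)   # estimated mean and st. dev., roughly like normfit
```

The attributes fitted.mean and fitted.stdev hold the estimates obtained from the sample; with a sample of this size they should land close to 0 and 1.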