10-705/36-705 Intermediate Statistics


10-705/36-705 Intermediate Statistics
Larry Wasserman
Fall 2010

Syllabus

Week of      | Class I              | Class II             | Day III     | Class IV
August 29    | Review               | Review, Inequalities |             | Inequalities
September 5  | No Class             | O_P and o_P          | HW 1 [sol]  | VC Theory
September 12 | Convergence          | Convergence          | HW 2 [sol]  | Test I
September 19 | Convergence Addendum | Sufficiency          | HW 3 [sol]  | Sufficiency
September 26 | Likelihood           | Point Estimation     | HW 4 [sol]  | Minimax Theory
October 3    | Minimax Summary      | Asymptotics          | HW 5 [sol]  | Asymptotics
October 10   | Asymptotics          | Review               |             | Test II
October 17   | Testing              | Testing              | HW 6 [sol]  | Mid-semester Break
October 24   | Testing              | Confidence Intervals | HW 7 [sol]  | Confidence Intervals
October 31   | Nonparametric        | Nonparametric        |             | Review
November 7   | Test III             | No Class             | HW 8 [sol]  | The Bootstrap
November 14  | The Bootstrap        | Bayesian Inference   | HW 9 [sol]  | Bayesian Inference
November 21  | No Class             | No Class             |             | No Class
November 28  | Prediction           | Prediction           | HW 10 [sol] | Model Selection
December 5   | Multiple Testing     | Causation            |             | Individual Sequences, Practice Final

10-705/36-705: Intermediate Statistics, Fall 2010

Professor: Larry Wasserman
Office: Baker Hall 8 A
Email: larry@stat.cmu.edu
Phone:
Office hours: Mondays, :30-:30
Class Time: Mon-Wed-Fri :30-:0
Location: GHC 4307
TAs: Wanjie Wang and Xiaolin Yang
Website: www.stat.cmu.edu/~larry/=stat705

Objective: This course will cover the fundamentals of theoretical statistics. Topics include: point and interval estimation, hypothesis testing, data reduction, convergence concepts, Bayesian inference, nonparametric statistics and bootstrap resampling. We will cover Chapters 5-10 from Casella and Berger plus some supplementary material. This course is excellent preparation for advanced work in Statistics and Machine Learning.

Textbook: Casella, G. and Berger, R. L. (2002). Statistical Inference, 2nd ed.

Background: I assume that you are familiar with the material in Chapters 1-4 of Casella and Berger.

Other Recommended Texts:
Wasserman, L. (2004). All of Statistics: A Concise Course in Statistical Inference.
Bickel, P. J. and Doksum, K. A. (1977). Mathematical Statistics.
Rice, J. A. (1977). Mathematical Statistics and Data Analysis, Second Edition.

Grading:
20%: Test I (Sept. 6) on the material of Chapters 1-4
20%: Test II (October 4)
20%: Test III (November 7)
30%: Final Exam (Date set by the University)
10%: Homework

Exams: All exams are closed book. Do NOT buy a plane ticket until the final exam has been scheduled.

Homework: Homework assignments will be posted on the web. Hand in homework to Mari Alice McShane, 8 Baker Hall, by 3 pm Thursday. No late homework.

Reading and Class Notes: Class notes will be posted on the web regularly. Bring a copy to class. The notes are not meant to be a substitute for the book and hence are generally quite terse. Read both the notes and the text before lecture. Sometimes I will cover topics from other sources.

Group Work: You are encouraged to work with others on the homework. But write up your final solutions on your own.

Course Outline:
1. Quick Review of Chapters 1-4
2. Inequalities
3. Vapnik-Chervonenkis Theory
4. Convergence
5. Sufficiency
6. Likelihood
7. Point Estimation
8. Minimax Theory
9. Asymptotics
10. Robustness
11. Hypothesis Testing
12. Confidence Intervals
13. Nonparametric Inference
14. Prediction and Classification
15. The Bootstrap
16. Bayesian Inference
17. Markov Chain Monte Carlo
18. Model Selection

Lecture Notes 1
Quick Review of Basic Probability (Casella and Berger Chapters 1-4)

Probability Review. Chapters 1-4 are a review. I will assume you have read and understood Chapters 1-4. Let us recall some of the key ideas.

1.1 Random Variables

A random variable is a map X from a probability space Ω to R. We write

  P(X ∈ A) = P({ω ∈ Ω : X(ω) ∈ A})

and we write X ∼ P to mean that X has distribution P. The cumulative distribution function (cdf) of X is

  F_X(x) = F(x) = P(X ≤ x).

If X is discrete, its probability mass function (pmf) is

  p_X(x) = p(x) = P(X = x).

If X is continuous, then its probability density function (pdf) satisfies

  P(X ∈ A) = ∫_A p_X(x) dx = ∫_A p(x) dx

and p_X(x) = p(x) = F′(x). The following are all equivalent: X ∼ P, X ∼ F, X ∼ p.

Suppose that X ∼ P and Y ∼ Q. We say that X and Y have the same distribution if P(X ∈ A) = Q(Y ∈ A) for all A. In other words, P = Q. In that case we say that X and Y are equal in distribution and we write X =_d Y. It can be shown that X =_d Y if and only if F_X(t) = F_Y(t) for all t.

1.2 Expected Values

The mean or expected value of g(X) is

  E(g(X)) = ∫ g(x) dF(x) = ∫ g(x) dP(x) = { ∫ g(x) p(x) dx  if X is continuous;  Σ_j g(x_j) p(x_j)  if X is discrete. }

Recall that:

1. E(Σ_{j=1}^k c_j g_j(X)) = Σ_{j=1}^k c_j E(g_j(X)).
2. If X_1, ..., X_n are independent then E(Π_{i=1}^n X_i) = Π_i E(X_i).
3. We often write μ = E(X).
4. σ² = Var(X) = E((X − μ)²) is the variance.
5. Var(X) = E(X²) − μ².
6. If X_1, ..., X_n are independent then Var(Σ_i a_i X_i) = Σ_i a_i² Var(X_i).
7. The covariance is Cov(X, Y) = E((X − μ_x)(Y − μ_y)) = E(XY) − μ_X μ_Y and the correlation is ρ(X, Y) = Cov(X, Y)/(σ_x σ_y). Recall that −1 ≤ ρ(X, Y) ≤ 1.

The conditional expectation of Y given X is the random variable E(Y|X) whose value, when X = x, is

  E(Y|X = x) = ∫ y p(y|x) dy

where p(y|x) = p(x, y)/p_X(x).

The Law of Total Expectation, or Law of Iterated Expectation:

  E(Y) = E[E(Y|X)] = ∫ E(Y|X = x) p_X(x) dx.

The Law of Total Variance is

  Var(Y) = Var[E(Y|X)] + E[Var(Y|X)].

The nth moment is E(X^n) and the nth central moment is E((X − μ)^n). The moment generating function (mgf) is M_X(t) = E(e^{tX}). Then M_X^{(n)}(t)|_{t=0} = E(X^n). If M_X(t) = M_Y(t) for all t in an interval around 0 then X =_d Y.
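As a quick numerical check of the two laws above, the following sketch uses a small two-stage model (X uniform on {0, 1} and Y Bernoulli given X; the particular probabilities are arbitrary illustrations, not from the notes) and verifies both identities exactly.

```python
# Check the laws of total expectation and total variance on a
# two-stage discrete model (arbitrary example probabilities).
p_x = {0: 0.5, 1: 0.5}        # marginal pmf of X
p_cond = {0: 0.2, 1: 0.7}     # P(Y = 1 | X = x)

# Direct computation: Y is Bernoulli with P(Y = 1) = sum_x P(X=x) P(Y=1|X=x).
p_y1 = sum(p_x[x] * p_cond[x] for x in p_x)
EY, VarY = p_y1, p_y1 * (1 - p_y1)

# Law of total expectation: E(Y) = E[E(Y|X)].
EY_total = sum(p_x[x] * p_cond[x] for x in p_x)

# Law of total variance: Var(Y) = Var[E(Y|X)] + E[Var(Y|X)].
var_of_cond_mean = sum(p_x[x] * (p_cond[x] - EY_total) ** 2 for x in p_x)
mean_of_cond_var = sum(p_x[x] * p_cond[x] * (1 - p_cond[x]) for x in p_x)
VarY_total = var_of_cond_mean + mean_of_cond_var
```

Both identities hold to machine precision, since everything here is a finite sum.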

1.3 Exponential Families

A family of distributions {p(x; θ) : θ ∈ Θ} is called an exponential family if

  p(x; θ) = h(x) c(θ) exp{ Σ_{i=1}^k w_i(θ) t_i(x) }.

Example 1. X ∼ Poisson(λ) is an exponential family since

  p(x) = P(X = x) = e^{−λ} λ^x / x! = (1/x!) e^{−λ} exp{x log λ}.

Example 2. X ∼ U(0, θ) is not an exponential family. The density is

  p_X(x) = (1/θ) I_{(0,θ)}(x)

where I_A(x) = 1 if x ∈ A and 0 otherwise.

We can rewrite an exponential family in terms of a natural parameterization. For k = 1 we have

  p(x; η) = h(x) exp{η t(x) − A(η)}

where

  A(η) = log ∫ h(x) exp{η t(x)} dx.

For example, a Poisson can be written as

  p(x; η) = exp{η x − e^η}/x!

where the natural parameter is η = log λ.

Let X have an exponential family distribution. Then

  E(t(X)) = A′(η),  Var(t(X)) = A″(η).

Practice Problem: Prove the above result.

1.4 Transformations

Let Y = g(X). Then

  F_Y(y) = P(Y ≤ y) = P(g(X) ≤ y) = ∫_{A_y} p_X(x) dx

where A_y = {x : g(x) ≤ y}. Then p_Y(y) = F′_Y(y). If g is monotonic, then

  p_Y(y) = p_X(h(y)) |dh(y)/dy|

where h = g^{−1}.
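The identities E(t(X)) = A′(η) and Var(t(X)) = A″(η) can be checked numerically for the Poisson family, where t(x) = x and A(η) = e^η. This is a sketch using finite differences; the truncation point, step size, and λ = 3 are arbitrary choices.

```python
# Numerical check of E(t(X)) = A'(eta), Var(t(X)) = A''(eta) for Poisson.
import math

lam = 3.0
eta = math.log(lam)              # natural parameter eta = log(lambda)
A = lambda e: math.exp(e)        # log-normalizer A(eta) = e^eta

# Build the pmf iteratively (p_{x} = p_{x-1} * lam / x) to avoid overflow.
pmf = [math.exp(-lam)]
for x in range(1, 60):
    pmf.append(pmf[-1] * lam / x)

mean_t = sum(x * p for x, p in enumerate(pmf))
var_t = sum((x - mean_t) ** 2 * p for x, p in enumerate(pmf))

# Central finite differences for A'(eta) and A''(eta).
h = 1e-5
A1 = (A(eta + h) - A(eta - h)) / (2 * h)
A2 = (A(eta + h) - 2 * A(eta) + A(eta - h)) / h ** 2
```

Both derivatives equal e^η = λ, matching the Poisson mean and variance.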

Example 3. Let p_X(x) = e^{−x} for x > 0. Hence F_X(x) = 1 − e^{−x}. Let Y = g(X) = log X. Then

  F_Y(y) = P(Y ≤ y) = P(log X ≤ y) = P(X ≤ e^y) = F_X(e^y) = 1 − e^{−e^y}

and p_Y(y) = e^y e^{−e^y} for y ∈ R.

Example 4 (Practice problem). Let X be uniform on (−1, 1) and let Y = X². Find the density of Y.

Let Z = g(X, Y). For example, Z = X + Y or Z = X/Y. Then we find the pdf of Z as follows:

1. For each z, find the set A_z = {(x, y) : g(x, y) ≤ z}.
2. Find the CDF

  F_Z(z) = P(Z ≤ z) = P(g(X, Y) ≤ z) = P({(x, y) : g(x, y) ≤ z}) = ∫∫_{A_z} p_{X,Y}(x, y) dx dy.

3. The pdf is p_Z(z) = F′_Z(z).

Example 5 (Practice problem). Let (X, Y) be uniform on the unit square. Let Z = X/Y. Find the density of Z.

1.5 Independence

Recall that X and Y are independent if and only if

  P(X ∈ A, Y ∈ B) = P(X ∈ A) P(Y ∈ B)

for all A and B.

Theorem 6. Let (X, Y) be a bivariate random vector with density p_{X,Y}(x, y). X and Y are independent iff p_{X,Y}(x, y) = p_X(x) p_Y(y).

X_1, ..., X_n are independent if and only if

  P(X_1 ∈ A_1, ..., X_n ∈ A_n) = Π_{i=1}^n P(X_i ∈ A_i).

Thus, p_{X_1,...,X_n}(x_1, ..., x_n) = Π_{i=1}^n p_{X_i}(x_i).

If X_1, ..., X_n are independent and identically distributed we say they are iid (or that they are a random sample) and we write X_1, ..., X_n ∼ P or X_1, ..., X_n ∼ F or X_1, ..., X_n ∼ p.
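Example 3 can be checked by simulation. The sketch below draws from Exp(1), transforms by log, and compares the empirical cdf at an arbitrary evaluation point y = 0 with the derived cdf 1 − e^{−e^y}.

```python
# Monte Carlo check of Example 3: if X ~ Exp(1), then Y = log X has
# cdf F_Y(y) = 1 - exp(-e^y).  (Seed and sample size are arbitrary.)
import math
import random

random.seed(0)
n = 200_000
ys = [math.log(random.expovariate(1.0)) for _ in range(n)]

y0 = 0.0
empirical = sum(1 for y in ys if y <= y0) / n
theoretical = 1 - math.exp(-math.exp(y0))   # 1 - e^{-1}
```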

1.6 Important Distributions

X ∼ N(μ, σ²) if

  p(x) = (1/(σ√(2π))) e^{−(x−μ)²/(2σ²)}.

If X ∈ R^d then X ∼ N(μ, Σ) if

  p(x) = (2π)^{−d/2} |Σ|^{−1/2} exp{ −(1/2)(x − μ)^T Σ^{−1}(x − μ) }.

X ∼ χ²_p if X = Σ_{j=1}^p Z_j² where Z_1, ..., Z_p ∼ N(0, 1) are iid.

X ∼ Bernoulli(θ) if P(X = 1) = θ and P(X = 0) = 1 − θ and hence

  p(x) = θ^x (1 − θ)^{1−x},  x = 0, 1.

X ∼ Binomial(n, θ) if

  p(x) = P(X = x) = C(n, x) θ^x (1 − θ)^{n−x},  x ∈ {0, ..., n}.

X ∼ Uniform(0, θ) if p(x) = I(0 ≤ x ≤ θ)/θ.

1.7 Sample Mean and Variance

The sample mean is

  X̄ = (1/n) Σ_i X_i,

and the sample variance is

  S² = (1/(n − 1)) Σ_i (X_i − X̄)².

Let X_1, ..., X_n be iid with μ = E(X_i) and σ² = Var(X_i). Then

  E(X̄) = μ,  Var(X̄) = σ²/n,  E(S²) = σ².

Theorem 7. If X_1, ..., X_n ∼ N(μ, σ²) then
(a) X̄ ∼ N(μ, σ²/n)
(b) (n − 1)S²/σ² ∼ χ²_{n−1}
(c) X̄ and S² are independent
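The claim E(S²) = σ² (with the 1/(n − 1) divisor) can be illustrated by simulation; the sketch below averages S² over many Normal samples. The parameters, seed, and repetition count are arbitrary choices.

```python
# Simulation check that the sample variance is unbiased: E(S^2) = sigma^2.
import random
import statistics

random.seed(1)
mu, sigma, n, reps = 5.0, 2.0, 10, 20_000
s2_values = []
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    s2_values.append(statistics.variance(xs))   # uses the n-1 divisor

avg_s2 = sum(s2_values) / reps   # should be close to sigma^2 = 4
```

Replacing `statistics.variance` with the 1/n version (`statistics.pvariance`) would show a downward bias of roughly σ²/n.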

1.8 Delta Method

If X ∼ N(μ, σ²), Y = g(X) and σ is small then

  Y ≈ N(g(μ), σ²(g′(μ))²).

To see this, note that

  Y = g(X) = g(μ) + (X − μ) g′(μ) + ((X − μ)²/2) g″(ξ)

for some ξ. Now E((X − μ)²) = σ², which we are assuming is small, and so

  Y = g(X) ≈ g(μ) + (X − μ) g′(μ).

Thus

  E(Y) ≈ g(μ),  Var(Y) ≈ (g′(μ))² σ².

Hence,

  g(X) ≈ N(g(μ), (g′(μ))² σ²).

Appendix: Useful Facts

Facts about sums:

- Σ_{i=1}^n i = n(n+1)/2.
- Σ_{i=1}^n i² = n(n+1)(2n+1)/6.
- Geometric series: a + ar + ar² + ... = a/(1 − r), for 0 < r < 1.
- Partial geometric series: a + ar + ar² + ... + ar^{n−1} = a(1 − r^n)/(1 − r).
- Binomial Theorem:
  Σ_{x=0}^n C(n, x) a^x = (1 + a)^n  and  Σ_{x=0}^n C(n, x) a^x b^{n−x} = (a + b)^n.
- Hypergeometric identity:
  Σ_x C(a, x) C(b, n − x) = C(a + b, n).
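A small simulation illustrates the delta method: with g(x) = e^x and a small σ, the variance of g(X) is close to (g′(μ))²σ². The parameter values below are arbitrary.

```python
# Delta-method sketch: X ~ N(mu, sigma^2) with small sigma, g(x) = exp(x).
import math
import random

random.seed(2)
mu, sigma, n = 1.0, 0.01, 100_000
gx = [math.exp(random.gauss(mu, sigma)) for _ in range(n)]

mean_gx = sum(gx) / n
var_gx = sum((v - mean_gx) ** 2 for v in gx) / (n - 1)

approx_mean = math.exp(mu)                      # g(mu)
approx_var = (math.exp(mu) ** 2) * sigma ** 2   # (g'(mu))^2 sigma^2
```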

Common Distributions

Discrete Uniform. X ∼ U(1, ..., N). X takes values x = 1, 2, ..., N.

  P(X = x) = 1/N.
  E(X) = Σ_x x P(X = x) = Σ_x x/N = (1/N) · N(N+1)/2 = (N+1)/2.
  E(X²) = Σ_x x² P(X = x) = Σ_x x²/N = (1/N) · N(N+1)(2N+1)/6 = (N+1)(2N+1)/6.

Binomial. X ∼ Bin(n, p). X takes values x = 0, 1, ..., n.

  P(X = x) = C(n, x) p^x (1 − p)^{n−x}.

Hypergeometric. X ∼ Hypergeometric(N, M, K).

  P(X = x) = C(M, x) C(N − M, K − x) / C(N, K).

Geometric. X ∼ Geom(p). P(X = x) = (1 − p)^{x−1} p, x = 1, 2, ....

  E(X) = Σ_x x(1 − p)^{x−1} p = −p Σ_x (d/dp)((1 − p)^x) = −p (d/dp)((1 − p)/p) = p · (1/p²) = 1/p.

Poisson. X ∼ Poisson(λ). P(X = x) = e^{−λ} λ^x / x!, x = 0, 1, 2, ....

  E(X) = Var(X) = λ.
  M_X(t) = Σ_{x=0}^∞ e^{tx} e^{−λ} λ^x / x! = e^{−λ} Σ_{x=0}^∞ (λe^t)^x / x! = e^{−λ} e^{λe^t} = e^{λ(e^t − 1)}.

  E(X) = M′_X(0) = λe^t e^{λ(e^t − 1)}|_{t=0} = λ.

Use the mgf to show: if X_1 ∼ Poisson(λ_1) and X_2 ∼ Poisson(λ_2) are independent, then Y = X_1 + X_2 ∼ Poisson(λ_1 + λ_2).

Continuous Distributions

Normal. X ∼ N(μ, σ²).

  p(x) = (1/√(2πσ²)) exp{−(x − μ)²/(2σ²)},  x ∈ R.

mgf: M_X(t) = exp{μt + σ²t²/2}. E(X) = μ, Var(X) = σ².

e.g., If Z ∼ N(0, 1) and X = μ + σZ, then X ∼ N(μ, σ²). Show this...

Proof.

  M_X(t) = E(e^{tX}) = E(e^{t(μ+σZ)}) = e^{tμ} E(e^{tσZ}) = e^{tμ} M_Z(tσ) = e^{tμ} e^{(tσ)²/2} = e^{tμ + t²σ²/2},

which is the mgf of a N(μ, σ²).

Alternative proof:

  F_X(x) = P(X ≤ x) = P(μ + σZ ≤ x) = P(Z ≤ (x − μ)/σ) = F_Z((x − μ)/σ)

  p_X(x) = F′_X(x) = (1/σ) p_Z((x − μ)/σ)
         = (1/(σ√(2π))) exp{ −(1/2)((x − μ)/σ)² }
         = (1/√(2πσ²)) exp{ −(x − μ)²/(2σ²) },

which is the pdf of a N(μ, σ²).

Gamma. X ∼ Γ(α, β).

  p_X(x) = (1/(Γ(α) β^α)) x^{α−1} e^{−x/β},  x a positive real,

where

  Γ(α) = ∫_0^∞ (x^{α−1} e^{−x/β} / β^α) dx.

Important statistical distribution: χ²_p = Γ(p/2, 2). χ²_p = Σ_{i=1}^p X_i², where X_i ∼ N(0, 1), iid.

Exponential. X ∼ exp(β).

  p_X(x) = (1/β) e^{−x/β},  x a positive real.  exp(β) = Γ(1, β).

e.g., used to model waiting time of a Poisson process. Suppose N is the number of phone calls in 1 hour and N ∼ Poisson(λ). Let T be the time between consecutive phone calls; then T ∼ exp(1/λ) and E(T) = 1/λ.

If X_1, ..., X_n are iid exp(β), then Σ_i X_i ∼ Γ(n, β).

Memoryless Property: If X ∼ exp(β), then

  P(X > t + s | X > t) = P(X > s).

Linear Regression

Model the response (Y) as a linear function of the parameters and covariates (x) plus random error (ε):

  Y_i = θ(x_i, β) + ε_i

where

  θ(x, β) = Xβ = β_0 + β_1 x_1 + β_2 x_2 + ... + β_k x_k.
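The memoryless property is an exact identity for the survival function S(x) = e^{−x/β}, so it can be checked directly; the β, t, s values below are arbitrary.

```python
# Exact check of the memoryless property of the exponential:
# P(X > t+s | X > t) = P(X > s), via the survival function.
import math

beta, t, s = 2.0, 1.3, 0.7
S = lambda x: math.exp(-x / beta)   # P(X > x) for X ~ exp(beta)

lhs = S(t + s) / S(t)   # conditional probability P(X > t+s | X > t)
rhs = S(s)              # P(X > s)
```

The two sides agree up to floating-point rounding, for any β, t, s > 0.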

Generalized Linear Model

Model the natural parameters as linear functions of the covariates.

Example: Logistic Regression.

  P(Y = 1 | X = x) = e^{β^T x} / (1 + e^{β^T x}).

In other words, Y | X = x ∼ Bin(1, p(x)) and

  η(x) = log( p(x) / (1 − p(x)) )

where η(x) = β^T x. Logistic regression consists of modelling the natural parameter, which is called the log odds ratio, as a linear function of covariates.

Location and Scale Families, CB 3.5

Let p(x) be a pdf.

  Location family: { p(x | μ) = p(x − μ) : μ ∈ R }
  Scale family: { p(x | σ) = (1/σ) f(x/σ) : σ > 0 }
  Location-Scale family: { p(x | μ, σ) = (1/σ) f((x − μ)/σ) : μ ∈ R, σ > 0 }

(1) Location family. Shifts the pdf. e.g., Uniform with p(x) = 1 on (0, 1) and p(x − θ) = 1 on (θ, θ + 1). e.g., Normal with standard pdf the density of a N(0, 1) and location family pdf N(θ, 1).
(2) Scale family. Stretches the pdf. e.g., Normal with standard pdf the density of a N(0, 1) and scale family pdf N(0, σ²).
(3) Location-Scale family. Stretches and shifts the pdf. e.g., Normal with standard pdf the density of a N(0, 1) and location-scale family pdf N(θ, σ²), i.e., (1/σ) p((x − μ)/σ).

Multinomial Distribution

The multivariate version of a Binomial is called a Multinomial. Consider drawing a ball from an urn which has balls with k different colors labeled color 1, color 2, ..., color k. Let p = (p_1, p_2, ..., p_k) where Σ_j p_j = 1 and p_j is the probability of drawing color j. Draw n balls from the urn (independently and with replacement) and let X = (X_1, X_2, ..., X_k) be the count of the number of balls of each color drawn. We say that X has a Multinomial(n, p) distribution. The pdf is

  p(x) = C(n; x_1, ..., x_k) p_1^{x_1} ... p_k^{x_k}.

Multivariate Normal Distribution

We now define the multivariate normal distribution and derive its basic properties. We want to allow the possibility of multivariate normal distributions whose covariance matrix is not necessarily positive definite. Therefore, we cannot define the distribution by its density function. Instead we define the distribution by its moment generating function. (The reader may wonder how a random vector can have a moment generating function if it has no density function. However, the moment generating function can be defined using more general types of integration. In this book, we assume that such a definition is possible but find the moment generating function by elementary means.) We find the density function for the case of positive definite covariance matrix in Theorem 12.

Lemma 8
(a) Let X = AY + b. Then M_X(t) = exp(b^T t) M_Y(A^T t).
(b) Let c be a constant. Let Z = cY. Then M_Z(t) = M_Y(ct).
(c) Let Y = (Y_1, Y_2), t = (t_1, t_2). Then M_{Y_1}(t_1) = M_Y((t_1, 0)).
(d) Y_1 and Y_2 are independent if and only if M_Y((t_1, t_2)) = M_Y((t_1, 0)) M_Y((0, t_2)).

We start with Z_1, ..., Z_n independent random variables such that Z_i ∼ N(0, 1). Let Z = (Z_1, ..., Z_n)^T. Then

  E(Z) = 0,  cov(Z) = I,  M_Z(t) = exp{ (1/2) Σ_i t_i² } = exp{ t^T t / 2 }.  (1)

Let μ be an n-vector and A an n × n matrix. Let Y = AZ + μ. Then

  E(Y) = μ,  cov(Y) = AA^T.  (2)

Let Σ = AA^T. We now show that the distribution of Y depends only on μ and Σ. The moment generating function M_Y(t) is given by

  M_Y(t) = exp(μ^T t) M_Z(A^T t) = exp( μ^T t + t^T (AA^T) t / 2 ) = exp( μ^T t + t^T Σ t / 2 ).

With this motivation in mind, let μ be an n-vector and let Σ be an n × n nonnegative definite matrix. Then we say that the n-dimensional random vector Y has an n-dimensional normal distribution with mean vector μ and covariance matrix Σ, if Y has moment generating function

  M_Y(t) = exp( μ^T t + t^T Σ t / 2 ).  (3)

We write Y ∼ N_n(μ, Σ). The following theorem summarizes some elementary facts about multivariate normal distributions.

Theorem 9
(a) If Y ∼ N_n(μ, Σ), then E(Y) = μ, cov(Y) = Σ.
(b) If Y ∼ N_n(μ, Σ) and c is a scalar, then cY ∼ N_n(cμ, c²Σ).
(c) Let Y ∼ N_n(μ, Σ). If A is p × n and b is p × 1, then AY + b ∼ N_p(Aμ + b, AΣA^T).
(d) Let μ be any n-vector and let Σ be any n × n nonnegative definite matrix. Then there exists Y such that Y ∼ N_n(μ, Σ).

Proof.
(a) This follows directly from (2) above.
(b) and (c): Homework.
(d) Let Z_1, ..., Z_n be independent, Z_i ∼ N(0, 1). Let Z = (Z_1, ..., Z_n)^T. It is easily verified that Z ∼ N_n(0, I). Let Y = Σ^{1/2} Z + μ. By part (c) above, Y ∼ N_n(Σ^{1/2} · 0 + μ, Σ).

We have now shown that the family of normal distributions is preserved under linear operations on the random vectors. We now show that it is preserved under taking marginal and conditional distributions.
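The construction Y = AZ + μ with Σ = AA^T can be illustrated by simulation: take A to be the Cholesky factor of a 2×2 Σ (an arbitrary example, with the factor computed by hand) and check that the sample moments match μ and Σ.

```python
# Simulation sketch of Y = AZ + mu with Sigma = A A^T (2x2 case).
import math
import random

random.seed(3)
mu = [1.0, -2.0]
Sigma = [[2.0, 0.6], [0.6, 1.0]]

# Lower-triangular Cholesky factor A with A A^T = Sigma.
a11 = math.sqrt(Sigma[0][0])
a21 = Sigma[1][0] / a11
a22 = math.sqrt(Sigma[1][1] - a21 ** 2)

n = 200_000
ys = []
for _ in range(n):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    ys.append((mu[0] + a11 * z1, mu[1] + a21 * z1 + a22 * z2))

m1 = sum(y[0] for y in ys) / n
m2 = sum(y[1] for y in ys) / n
cov12 = sum((y[0] - m1) * (y[1] - m2) for y in ys) / (n - 1)
```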

Theorem 10. Suppose that Y ∼ N_n(μ, Σ). Let

  Y = (Y_1, Y_2),  μ = (μ_1, μ_2),  Σ = [ Σ_11  Σ_12 ; Σ_21  Σ_22 ]

where Y_1 and μ_1 are p × 1 and Σ_11 is p × p.
(a) Y_1 ∼ N_p(μ_1, Σ_11), Y_2 ∼ N_{n−p}(μ_2, Σ_22).
(b) Y_1 and Y_2 are independent if and only if Σ_12 = 0.
(c) If Σ_22 > 0, then the conditional distribution of Y_1 given Y_2 is

  Y_1 | Y_2 ∼ N_p( μ_1 + Σ_12 Σ_22^{−1}(Y_2 − μ_2), Σ_11 − Σ_12 Σ_22^{−1} Σ_21 ).

Proof.
(a) Let t = (t_1, t_2) where t_1 is p × 1. The joint moment generating function of Y_1 and Y_2 is

  M_Y(t) = exp( μ_1^T t_1 + μ_2^T t_2 + (1/2)(t_1^T Σ_11 t_1 + t_1^T Σ_12 t_2 + t_2^T Σ_21 t_1 + t_2^T Σ_22 t_2) ).

Therefore,

  M_Y((t_1, 0)) = exp( μ_1^T t_1 + (1/2) t_1^T Σ_11 t_1 ),  M_Y((0, t_2)) = exp( μ_2^T t_2 + (1/2) t_2^T Σ_22 t_2 ).

By Lemma 8c, we see that Y_1 ∼ N_p(μ_1, Σ_11), Y_2 ∼ N_{n−p}(μ_2, Σ_22).
(b) We note that M_Y(t) = M_Y((t_1, 0)) M_Y((0, t_2)) if and only if t_1^T Σ_12 t_2 + t_2^T Σ_21 t_1 = 0. Since Σ is symmetric and t_2^T Σ_21 t_1 is a scalar, we see that t_2^T Σ_21 t_1 = t_1^T Σ_12 t_2. Finally, t_1^T Σ_12 t_2 = 0 for all t_1 ∈ R^p, t_2 ∈ R^{n−p} if and only if Σ_12 = 0, and the result follows from Lemma 8d.
(c) We first find the joint distribution of X = Y_1 − Σ_12 Σ_22^{−1} Y_2 and Y_2:

  (X, Y_2)^T = [ I  −Σ_12 Σ_22^{−1} ; 0  I ] (Y_1, Y_2)^T.

Therefore, by Theorem 9c, the joint distribution of X and Y_2 is

  (X, Y_2) ∼ N( (μ_1 − Σ_12 Σ_22^{−1} μ_2, μ_2), [ Σ_11 − Σ_12 Σ_22^{−1} Σ_21  0 ; 0  Σ_22 ] )

and hence X and Y_2 are independent. Therefore, the conditional distribution of X given Y_2 is the same as the marginal distribution of X:

  X | Y_2 ∼ N_p( μ_1 − Σ_12 Σ_22^{−1} μ_2, Σ_11 − Σ_12 Σ_22^{−1} Σ_21 ).

Since Y_2 is just a constant in the conditional distribution of X given Y_2, we have, by Theorem 9c, that the conditional distribution of Y_1 = X + Σ_12 Σ_22^{−1} Y_2 given Y_2 is

  Y_1 | Y_2 ∼ N_p( μ_1 + Σ_12 Σ_22^{−1}(Y_2 − μ_2), Σ_11 − Σ_12 Σ_22^{−1} Σ_21 ).

Note that we need Σ_22 > 0 in part (c) so that Σ_22^{−1} exists.

Lemma 11. Let Y ∼ N_n(μ, σ²I), where Y = (Y_1, ..., Y_n)^T, μ = (μ_1, ..., μ_n)^T and σ² > 0 is a scalar. Then the Y_i are independent, Y_i ∼ N_1(μ_i, σ²) and

  Y^T Y / σ² ∼ χ²_n( μ^T μ / σ² ).

Proof. Let Y_i be independent, Y_i ∼ N_1(μ_i, σ²). The joint moment generating function of the Y_i is

  M_Y(t) = Π_{i=1}^n exp( μ_i t_i + σ² t_i²/2 ) = exp( μ^T t + σ² t^T t / 2 ),

which is the moment generating function of a random vector that is normally distributed with mean vector μ and covariance matrix σ²I. Finally, Y^T Y = Σ Y_i², μ^T μ = Σ μ_i² and Y_i/σ ∼ N_1(μ_i/σ, 1). Therefore Y^T Y/σ² ∼ χ²_n(μ^T μ/σ²) by the definition of the noncentral χ² distribution.

We are now ready to derive the nonsingular normal density function.

Theorem 12. Let Y ∼ N_n(μ, Σ), with Σ > 0. Then Y has density function

  p_Y(y) = (1/((2π)^{n/2} |Σ|^{1/2})) exp( −(1/2)(y − μ)^T Σ^{−1}(y − μ) ).

Proof. We could derive this by finding the moment generating function of this density and showing that it satisfies (3). We would also have to show that this function is a density function. We can avoid all that by starting with a random vector whose distribution we know. Let Z ∼ N_n(0, I), Z = (Z_1, ..., Z_n)^T. Then the Z_i are independent and Z_i ∼ N_1(0, 1), by Lemma 11. Therefore, the joint density of the Z_i is

  p_Z(z) = (2π)^{−n/2} exp( −(1/2) Σ_{i=1}^n z_i² ) = (2π)^{−n/2} exp( −z^T z / 2 ).

Let Y = Σ^{1/2} Z + μ. By Theorem 9c, Y ∼ N_n(μ, Σ). Also Z = Σ^{−1/2}(Y − μ), and the transformation from Z to Y is therefore invertible. Furthermore, the Jacobian of this inverse transformation is just |Σ^{−1/2}| = |Σ|^{−1/2}. Hence the density of Y is

  p_Y(y) = p_Z(Σ^{−1/2}(y − μ)) |Σ|^{−1/2} = (2π)^{−n/2} |Σ|^{−1/2} exp( −(1/2)(y − μ)^T Σ^{−1}(y − μ) ).

We now prove a result that is useful later in the book and is also the basis for Pearson's χ² tests.

Theorem 13. Let Y ∼ N_n(μ, Σ), Σ > 0. Then
(a) Y^T Σ^{−1} Y ∼ χ²_n( μ^T Σ^{−1} μ ).
(b) (Y − μ)^T Σ^{−1}(Y − μ) ∼ χ²_n(0).

Proof. (a) Let Z = Σ^{−1/2} Y ∼ N_n(Σ^{−1/2} μ, I). By Lemma 11, we see that

  Z^T Z = Y^T Σ^{−1} Y ∼ χ²_n( μ^T Σ^{−1} μ ).

(b) Follows fairly directly.

The Spherical Normal

For the first part of this book, the most important class of multivariate normal distributions is the class in which Y ∼ N_n(μ, σ²I). We now show that this distribution is spherically symmetric about μ. A rotation about μ is given by X = Γ(Y − μ) + μ, where Γ is an orthogonal matrix (i.e., ΓΓ^T = I). By Theorem 9c, X ∼ N_n(μ, σ²I), so that the distribution is unchanged under rotations about μ. We therefore call this normal distribution the spherical normal distribution. If σ² = 0, then P(Y = μ) = 1. Otherwise its density function (by Theorem 12) is

  p_Y(y) = (1/((2π)^{n/2} σ^n)) exp( −‖y − μ‖² / (2σ²) ).

By Lemma 11, we note that the components of Y are independently normally distributed with common variance σ². In fact, the spherical normal distribution is the only multivariate distribution with independent components that is spherically symmetric.

Lecture Notes 2
Probability Inequalities

Inequalities are useful for bounding quantities that might otherwise be hard to compute. They will also be used in the theory of convergence.

Theorem 1 (The Gaussian Tail Inequality). Let X ∼ N(0, 1). Then

  P(|X| > ε) ≤ (2/ε) e^{−ε²/2}.

If X_1, ..., X_n ∼ N(0, 1) then

  P(|X̄_n| > ε) ≤ (2/(√n ε)) e^{−nε²/2}.

Proof. The density of X is φ(x) = (2π)^{−1/2} e^{−x²/2}. Hence,

  P(X > ε) = ∫_ε^∞ φ(s) ds ≤ (1/ε) ∫_ε^∞ s φ(s) ds = −(1/ε) ∫_ε^∞ φ′(s) ds = φ(ε)/ε ≤ (1/ε) e^{−ε²/2}.

By symmetry,

  P(|X| > ε) ≤ (2/ε) e^{−ε²/2}.

Now let X_1, ..., X_n ∼ N(0, 1). Then X̄_n = n^{−1} Σ_{i=1}^n X_i ∼ N(0, 1/n). Thus, X̄_n =_d n^{−1/2} Z where Z ∼ N(0, 1) and

  P(|X̄_n| > ε) = P(n^{−1/2}|Z| > ε) = P(|Z| > √n ε) ≤ (2/(√n ε)) e^{−nε²/2}.
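The Gaussian tail inequality can be checked against the exact tail probability, which is computable from the error function; the sketch below verifies P(|X| > ε) ≤ (2/ε)e^{−ε²/2} at a few arbitrary values of ε.

```python
# Compare the Gaussian tail bound with the exact two-sided tail
# 2(1 - Phi(eps)), computed via math.erf.
import math

Phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))  # standard normal cdf

checks = []
for eps in [0.5, 1.0, 2.0, 3.0]:
    exact = 2 * (1 - Phi(eps))
    bound = (2 / eps) * math.exp(-eps ** 2 / 2)
    checks.append(exact <= bound)
```

The bound is loose for small ε (where 2/ε blows up) and tightens in relative terms as ε grows.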

Theorem 2 (Markov's inequality). Let X be a non-negative random variable and suppose that E(X) exists. For any t > 0,

  P(X > t) ≤ E(X)/t.  (1)

Proof. Since X ≥ 0,

  E(X) = ∫_0^∞ x p(x) dx = ∫_0^t x p(x) dx + ∫_t^∞ x p(x) dx ≥ ∫_t^∞ x p(x) dx ≥ t ∫_t^∞ p(x) dx = t P(X > t).

Theorem 3 (Chebyshev's inequality). Let μ = E(X) and σ² = Var(X). Then,

  P(|X − μ| ≥ t) ≤ σ²/t²  and  P(|Z| ≥ k) ≤ 1/k²  (2)

where Z = (X − μ)/σ. In particular, P(|Z| > 2) ≤ 1/4 and P(|Z| > 3) ≤ 1/9.

Proof. We use Markov's inequality to conclude that

  P(|X − μ| ≥ t) = P(|X − μ|² ≥ t²) ≤ E(X − μ)²/t² = σ²/t².

The second part follows by setting t = kσ.

If X_1, ..., X_n ∼ Bernoulli(p) and X̄_n = n^{−1} Σ_{i=1}^n X_i, then Var(X̄_n) = Var(X_1)/n = p(1 − p)/n and, since p(1 − p) ≤ 1/4,

  P(|X̄_n − p| > ε) ≤ Var(X̄_n)/ε² = p(1 − p)/(nε²) ≤ 1/(4nε²)

for all p.

2 Hoeffding's Inequality

Hoeffding's inequality is similar in spirit to Markov's inequality but it is a sharper inequality. We begin with the following important result.

Lemma 4. Suppose that E(X) = 0 and that a ≤ X ≤ b. Then

  E(e^{tX}) ≤ e^{t²(b−a)²/8}.
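Chebyshev's inequality can be compared with exact tail probabilities. The sketch below uses X ∼ exp(1) (so μ = σ² = 1), for which the two-sided event reduces to a one-sided one when t > 1; the t values are arbitrary.

```python
# Chebyshev bound sigma^2/t^2 versus the exact tail for X ~ exp(1).
import math

mu = sigma2 = 1.0
results = []
for t in [1.5, 2.0, 3.0]:
    # For t > mu = 1, {|X - 1| >= t} = {X >= 1 + t} since X cannot be negative.
    exact = math.exp(-(mu + t))   # survival function of exp(1) at 1 + t
    cheb = sigma2 / t ** 2
    results.append(exact <= cheb)
```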

Recall that a function g is convex if for each x, y and each α ∈ [0, 1],

  g(αx + (1 − α)y) ≤ α g(x) + (1 − α) g(y).

Proof. Since a ≤ X ≤ b, we can write X as a convex combination of a and b, namely, X = αb + (1 − α)a where α = (X − a)/(b − a). By the convexity of the function y ↦ e^{ty} we have

  e^{tX} ≤ α e^{tb} + (1 − α) e^{ta} = ((X − a)/(b − a)) e^{tb} + ((b − X)/(b − a)) e^{ta}.

Take expectations of both sides and use the fact that E(X) = 0 to get

  E(e^{tX}) ≤ (−a/(b − a)) e^{tb} + (b/(b − a)) e^{ta} = e^{g(u)}  (3)

where u = t(b − a), g(u) = −γu + log(1 − γ + γe^u) and γ = −a/(b − a). Note that g(0) = g′(0) = 0. Also, g″(u) ≤ 1/4 for all u > 0. By Taylor's theorem, there is a ξ ∈ (0, u) such that

  g(u) = g(0) + u g′(0) + (u²/2) g″(ξ) = (u²/2) g″(ξ) ≤ u²/8 = t²(b − a)²/8.

Hence, E(e^{tX}) ≤ e^{g(u)} ≤ e^{t²(b−a)²/8}.

Next, we need to use Chernoff's method.

Lemma 5. Let X be a random variable. Then

  P(X > ε) ≤ inf_{t ≥ 0} e^{−tε} E(e^{tX}).

Proof. For any t > 0,

  P(X > ε) = P(e^{tX} > e^{tε}) ≤ e^{−tε} E(e^{tX}).

Since this is true for every t ≥ 0, the result follows.

Theorem 6 (Hoeffding's Inequality). Let Y_1, ..., Y_n be iid observations such that E(Y_i) = μ and a ≤ Y_i ≤ b where a < 0 < b. Then, for any ε > 0,

  P(|Ȳ_n − μ| ≥ ε) ≤ 2 e^{−2nε²/(b−a)²}.  (4)

Proof. Without loss of generality, we assume that μ = 0. First we have

  P(|Ȳ_n| ≥ ε) = P(Ȳ_n ≥ ε) + P(Ȳ_n ≤ −ε) = P(Ȳ_n ≥ ε) + P(−Ȳ_n ≥ ε).

Next we use Chernoff's method. For any t > 0, we have, from Markov's inequality, that

  P(Ȳ_n ≥ ε) = P( Σ_{i=1}^n Y_i ≥ nε ) = P( e^{t Σ_i Y_i} ≥ e^{tnε} ) ≤ e^{−tnε} E( e^{t Σ_i Y_i} ) = e^{−tnε} Π_i E(e^{tY_i}) = e^{−tnε} (E(e^{tY_1}))^n.

From Lemma 4, E(e^{tY_i}) ≤ e^{t²(b−a)²/8}. So

  P(Ȳ_n ≥ ε) ≤ e^{−tnε} e^{t²n(b−a)²/8}.

This is minimized by setting t = 4ε/(b − a)², giving

  P(Ȳ_n ≥ ε) ≤ e^{−2nε²/(b−a)²}.

Applying the same argument to P(−Ȳ_n ≥ ε) yields the result.

Example 7. Let X_1, ..., X_n ∼ Bernoulli(p). Chebyshev's inequality yields

  P(|X̄_n − p| > ε) ≤ 1/(4nε²).

According to Hoeffding's inequality,

  P(|X̄_n − p| > ε) ≤ 2 e^{−2nε²},

which decreases much faster.

Corollary 8. If X_1, X_2, ..., X_n are independent with P(a ≤ X_i ≤ b) = 1 and common mean μ, then, with probability at least 1 − δ,

  |X̄_n − μ| ≤ √( (c/(2n)) log(2/δ) )  (5)

where c = (b − a)².

3 The Bounded Difference Inequality

So far we have focused on sums of random variables. The following result extends Hoeffding's inequality to more general functions g(x_1, ..., x_n). Here we consider McDiarmid's inequality, also known as the Bounded Difference inequality.
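Example 7's comparison can be made concrete: the sketch below tabulates the two bounds at ε = 0.1 for several n (arbitrary choices). For very small n the polynomial Chebyshev bound can actually be smaller, but the exponential Hoeffding bound wins quickly as n grows.

```python
# Chebyshev bound 1/(4 n eps^2) versus Hoeffding bound 2 exp(-2 n eps^2)
# for Bernoulli sample means.
import math

eps = 0.1
rows = []
for n in [100, 500, 1000, 5000]:
    cheb = 1 / (4 * n * eps ** 2)
    hoeff = 2 * math.exp(-2 * n * eps ** 2)
    rows.append((n, cheb, hoeff))

# At n = 100 Chebyshev is (slightly) smaller; from n = 500 on, Hoeffding is.
hoeff_wins = all(h < c for (nn, c, h) in rows if nn >= 500)
```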

Theorem 9 (McDiarmid). Let X_1, ..., X_n be independent random variables. Suppose that

  sup_{x_1,...,x_n, x_i′} | g(x_1, ..., x_{i−1}, x_i, x_{i+1}, ..., x_n) − g(x_1, ..., x_{i−1}, x_i′, x_{i+1}, ..., x_n) | ≤ c_i  (6)

for i = 1, ..., n. Then

  P( g(X_1, ..., X_n) − E(g(X_1, ..., X_n)) ≥ ε ) ≤ exp{ −2ε² / Σ_{i=1}^n c_i² }.  (7)

Proof. Let V_i = E(g | X_1, ..., X_i) − E(g | X_1, ..., X_{i−1}). Then

  g(X_1, ..., X_n) − E(g(X_1, ..., X_n)) = Σ_{i=1}^n V_i

and E(V_i | X_1, ..., X_{i−1}) = 0. Using a similar argument as in Hoeffding's Lemma we have

  E(e^{tV_i} | X_1, ..., X_{i−1}) ≤ e^{t²c_i²/8}.  (8)

Now, for any t > 0,

  P( g(X_1, ..., X_n) − E(g(X_1, ..., X_n)) ≥ ε ) = P( Σ_{i=1}^n V_i ≥ ε ) = P( e^{t Σ_i V_i} ≥ e^{tε} ) ≤ e^{−tε} E( e^{t Σ_i V_i} )
  = e^{−tε} E( e^{t Σ_{i=1}^{n−1} V_i} E( e^{tV_n} | X_1, ..., X_{n−1} ) ) ≤ e^{−tε} e^{t²c_n²/8} E( e^{t Σ_{i=1}^{n−1} V_i} ) ≤ ... ≤ e^{−tε} e^{t² Σ_i c_i²/8}.

The result follows by taking t = 4ε/Σ_{i=1}^n c_i².

Example 10. If we take g(x_1, ..., x_n) = n^{−1} Σ_{i=1}^n x_i then we get back Hoeffding's inequality.

Example 11. Suppose we throw m balls into n bins. What fraction of bins are empty? Let Z be the number of empty bins and let F = Z/n be the fraction of empty bins. We can write Z = Σ_{i=1}^n Z_i where Z_i = 1 if bin i is empty and Z_i = 0 otherwise. Then

  μ = E(Z) = Σ_i E(Z_i) = n(1 − 1/n)^m = n e^{m log(1 − 1/n)} ≈ n e^{−m/n}

and θ = E(F) = μ/n ≈ e^{−m/n}. How close is Z to μ? Note that the Z_i's are not independent so we cannot just apply Hoeffding. Instead, we proceed as follows.
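Example 11 can be simulated directly; the sketch below throws m balls into n bins repeatedly and compares the average fraction of empty bins with its exact mean (1 − 1/n)^m ≈ e^{−m/n}. The stability of the average across runs is the concentration that McDiarmid's inequality quantifies. The sizes and seed are arbitrary.

```python
# Balls-in-bins simulation for the fraction of empty bins.
import math
import random

random.seed(4)
n_bins, m_balls, reps = 100, 100, 2000
fracs = []
for _ in range(reps):
    occupied = set(random.randrange(n_bins) for _ in range(m_balls))
    fracs.append((n_bins - len(occupied)) / n_bins)

avg_frac = sum(fracs) / reps
exact_mean = (1 - 1 / n_bins) ** m_balls   # E(F); approx e^{-m/n} = e^{-1}
```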

Define variables X_1, ..., X_m where X_s = i if ball s falls into bin i. Then Z = g(X_1, ..., X_m). If we move one ball into a different bin, then Z can change by at most 1. Hence, (6) holds with c_i = 1 and so

  P(|Z − μ| > t) ≤ 2 e^{−2t²/m}.

Recall that the fraction of empty bins is F = Z/n with mean θ = μ/n. We have

  P(|F − θ| > t) = P(|Z − μ| > nt) ≤ 2 e^{−2n²t²/m}.

4 Bounds on Expected Values

Theorem 12 (Cauchy-Schwartz inequality). If X and Y have finite variances then

  E|XY| ≤ √( E(X²) E(Y²) ).  (9)

The Cauchy-Schwarz inequality can be written as

  Cov²(X, Y) ≤ σ²_X σ²_Y.

Recall that a function g is convex if for each x, y and each α ∈ [0, 1], g(αx + (1 − α)y) ≤ α g(x) + (1 − α) g(y). If g is twice differentiable and g″(x) ≥ 0 for all x, then g is convex. It can be shown that if g is convex, then g lies above any line that touches g at some point, called a tangent line. A function g is concave if −g is convex. Examples of convex functions are g(x) = x² and g(x) = e^x. Examples of concave functions are g(x) = −x² and g(x) = log x.

Theorem 13 (Jensen's inequality). If g is convex, then

  E g(X) ≥ g(E X).  (10)

If g is concave, then

  E g(X) ≤ g(E X).  (11)

Proof. Let L(x) = a + bx be a line, tangent to g(x) at the point E(X). Since g is convex, it lies above the line L(x). So,

  E g(X) ≥ E L(X) = E(a + bX) = a + b E(X) = L(E(X)) = g(E X).

Example 14. From Jensen's inequality we see that E(X²) ≥ (E X)².

Example 15 (Kullback-Leibler Distance). Define the Kullback-Leibler distance between two densities p and q by

  D(p, q) = ∫ p(x) log( p(x)/q(x) ) dx.

Note that D(p, p) = 0. We will use Jensen to show that D(p, q) ≥ 0. Let X ∼ p. Then

  −D(p, q) = E log( q(X)/p(X) ) ≤ log E( q(X)/p(X) ) = log ∫ (q(x)/p(x)) p(x) dx = log ∫ q(x) dx = log(1) = 0.

So −D(p, q) ≤ 0 and hence D(p, q) ≥ 0.

Example 16. It follows from Jensen's inequality that three types of means can be ordered. Assume that a_1, ..., a_n are positive numbers and define the arithmetic, geometric and harmonic means as

  a_A = (1/n)(a_1 + ... + a_n),
  a_G = (a_1 ... a_n)^{1/n},
  a_H = 1 / ( (1/n)(1/a_1 + ... + 1/a_n) ).

Then a_H ≤ a_G ≤ a_A.

Suppose we have an exponential bound on P(X_n > ε). In that case we can bound E(X_n) as follows.

Theorem 17. Suppose that X_n ≥ 0 and that for every ε > 0,

  P(X_n > ε) ≤ c_1 e^{−c_2 n ε²}  (12)

for some c_2 > 0 and c_1 > 1/e. Then

  E(X_n) ≤ √(C/n)  (13)

where C = (1 + log(c_1))/c_2.

Proof. Recall that for any nonnegative random variable Y, E(Y) = ∫_0^∞ P(Y ≥ t) dt. Hence, for any a > 0,

  E(X_n²) = ∫_0^∞ P(X_n² ≥ t) dt = ∫_0^a P(X_n² ≥ t) dt + ∫_a^∞ P(X_n² ≥ t) dt ≤ a + ∫_a^∞ P(X_n² ≥ t) dt.

Equation (12) implies that P(X_n² > t) ≤ c_1 e^{−c_2 n t}. Hence,

  E(X_n²) ≤ a + c_1 ∫_a^∞ e^{−c_2 n t} dt = a + (c_1/(c_2 n)) e^{−c_2 n a}.

Set a = log(c_1)/(c_2 n) and conclude that

  E(X_n²) ≤ log(c_1)/(c_2 n) + 1/(c_2 n) = (1 + log(c_1))/(c_2 n).

Finally, we have

  E(X_n) ≤ √(E(X_n²)) ≤ √( (1 + log(c_1))/(c_2 n) ).

Now we consider bounding the maximum of a set of random variables.

Theorem 18. Let X_1, ..., X_n be random variables. Suppose there exists σ > 0 such that E(e^{tX_i}) ≤ e^{t²σ²/2} for all t > 0. Then

  E( max_{1≤i≤n} X_i ) ≤ σ √(2 log n).  (14)

Proof. By Jensen's inequality,

  exp{ t E(max_i X_i) } ≤ E( exp{ t max_i X_i } ) = E( max_i exp{t X_i} ) ≤ Σ_{i=1}^n E(exp{t X_i}) ≤ n e^{t²σ²/2}.

Thus,

  E( max_i X_i ) ≤ (log n)/t + tσ²/2.

The result follows by setting t = √(2 log n)/σ.

5 O_P and o_P

In statistics, probability and machine learning, we make use of o_P and O_P notation.

Recall first that a_n = o(1) means that a_n → 0 as n → ∞. a_n = o(b_n) means that a_n/b_n = o(1). a_n = O(1) means that a_n is eventually bounded, that is, for all large n, |a_n| ≤ C for some C > 0. a_n = O(b_n) means that a_n/b_n = O(1). We write a_n ≍ b_n if both a_n/b_n and b_n/a_n are eventually bounded. In computer science this is written as a_n = Θ(b_n), but we prefer using a_n ≍ b_n since, in statistics, Θ often denotes a parameter space.

Now we move on to the probabilistic versions. Say that Y_n = o_P(1) if, for every ε > 0,

  P(|Y_n| > ε) → 0.

Say that Y_n = o_P(a_n) if Y_n/a_n = o_P(1). Say that Y_n = O_P(1) if, for every ε > 0, there is a C > 0 such that P(|Y_n| > C) ≤ ε. Say that Y_n = O_P(a_n) if Y_n/a_n = O_P(1).

Let's use Hoeffding's inequality to show that sample proportions are O_P(1/√n) within the true mean. Let Y_1, ..., Y_n be coin flips, i.e. Y_i ∈ {0, 1}. Let p = P(Y_i = 1). Let

  p̂_n = (1/n) Σ_{i=1}^n Y_i.

We will show that p̂_n − p = o_P(1) and p̂_n − p = O_P(1/√n).

We have that P(|p̂_n − p| > ε) ≤ 2 e^{−2nε²} → 0 and so p̂_n − p = o_P(1). Also,

  P( √n |p̂_n − p| > C ) = P( |p̂_n − p| > C/√n ) ≤ 2 e^{−2C²} < δ

if we pick C large enough. Hence, √n(p̂_n − p) = O_P(1) and so

  p̂_n − p = O_P(1/√n).

Now consider m coins with probabilities p_1, ..., p_m. Then

  P( max_{1≤j≤m} |p̂_j − p_j| > ε ) ≤ Σ_{j=1}^m P(|p̂_j − p_j| > ε)   (union bound)
  ≤ Σ_{j=1}^m 2 e^{−2nε²}   (Hoeffding)
  = 2m e^{−2nε²} = 2 exp{ −(2nε² − log m) }.

Suppose that m ≤ e^{nγ} where 0 ≤ γ < 2ε². Then

  P( max_j |p̂_j − p_j| > ε ) ≤ 2 exp{ −n(2ε² − γ) } → 0.

Hence, max_j |p̂_j − p_j| = o_P(1).

Lecture Notes 3
Uniform Bounds

Recall that, if X_1, ..., X_n ∼ Bernoulli(p) and p̂_n = (1/n) Σ_{i=1}^n X_i then, from Hoeffding's inequality,

  P(|p̂_n − p| > ε) ≤ 2 e^{−2nε²}.

Sometimes we want to say more than this.

Example 1. Suppose that X_1, ..., X_n have cdf F. Let

  F̂_n(t) = (1/n) Σ_{i=1}^n I(X_i ≤ t).

We call F̂_n the empirical cdf. How close is F̂_n to F? That is, how big is |F̂_n(t) − F(t)|? From Hoeffding's inequality,

  P(|F̂_n(t) − F(t)| > ε) ≤ 2 e^{−2nε²}.

But that is only for one point t. How big is sup_t |F̂_n(t) − F(t)|? We would like a bound of the form

  P( sup_t |F̂_n(t) − F(t)| > ε ) ≤ something small.

Example 2. Suppose that X_1, ..., X_n ∼ P. Let

  P_n(A) = (1/n) Σ_{i=1}^n I(X_i ∈ A).

How close is P_n(A) to P(A)? That is, how big is |P_n(A) − P(A)|? From Hoeffding's inequality,

  P(|P_n(A) − P(A)| > ε) ≤ 2 e^{−2nε²}.

But that is only for one set A. How big is sup_{A∈A} |P_n(A) − P(A)| for a class of sets A? We would like a bound of the form

  P( sup_{A∈A} |P_n(A) − P(A)| > ε ) ≤ something small.

Example 3 (Classification). Suppose we observe data (X_1, Y_1), ..., (X_n, Y_n) where Y_i ∈ {0, 1}. Let (X, Y) be a new pair. Suppose we observe X. Now we want to predict Y. A classifier h is a function h(x) which takes values in {0, 1}. When we observe X we predict Y with h(X). The classification error, or risk, is the probability of an error:

  R(h) = P(Y ≠ h(X)).

The training error is the fraction of errors on the observed data (X_1, Y_1), ..., (X_n, Y_n):

  R̂_n(h) = (1/n) Σ_{i=1}^n I(Y_i ≠ h(X_i)).

By Hoeffding's inequality,

  P(|R̂_n(h) − R(h)| > ε) ≤ 2 e^{−2nε²}.

How do we choose a classifier? One way is to start with a set of classifiers H. Then we define ĥ to be the member of H that minimizes the training error. Thus

  ĥ = argmin_{h∈H} R̂_n(h).

An example is the set of linear classifiers. Suppose that x ∈ R^d. A linear classifier has the form h(x) = 1 if β^T x ≥ 0 and h(x) = 0 if β^T x < 0, where β = (β_1, ..., β_d)^T is a set of parameters. Although ĥ minimizes R̂_n(h), it does not minimize R(h). Let h* minimize the true error R(h). A fundamental question is: how close is R(ĥ) to R(h*)? We will see later that R(ĥ) is close to R(h*) if sup_h |R̂_n(h) − R(h)| is small. So we want

  P( sup_{h∈H} |R̂_n(h) − R(h)| > ε ) ≤ something small.

More generally, we can state our goal as follows. For any function f define

  P(f) = ∫ f(x) dP(x),  P_n(f) = (1/n) Σ_{i=1}^n f(X_i).

Let F be a set of functions. In our first example, each f was of the form f_t(x) = I(x ≤ t) and F = {f_t : t ∈ R}. We want to bound

  P( sup_{f∈F} |P_n(f) − P(f)| > ε ).

We will see that the bounds we obtain have the form

  P( sup_{f∈F} |P_n(f) − P(f)| > ε ) ≤ c_1 κ_n(F) e^{−c_2 nε²}

where c_1 and c_2 are positive constants and κ_n(F) is a measure of the size (or complexity) of the class F. Similarly, if A is a class of sets then we want a bound of the form

  P( sup_{A∈A} |P_n(A) − P(A)| > ε ) ≤ c_1 κ_n(A) e^{−c_2 nε²}

where P_n(A) = (1/n) Σ_{i=1}^n I(X_i ∈ A). Bounds like these are called uniform bounds since they hold uniformly over a class of functions or over a class of sets.

2 Finite Classes

Let F = {f_1, ..., f_N}. Suppose that

  max_{1≤j≤N} sup_x |f_j(x)| ≤ B.

We will make use of the union bound. Recall that

  P(A_1 ∪ ... ∪ A_N) ≤ Σ_{j=1}^N P(A_j).

Let A_j be the event that |P_n(f_j) − P(f_j)| > ε. From Hoeffding's inequality, P(A_j) ≤ 2 e^{−nε²/(2B²)}. Then

  P( sup_{f∈F} |P_n(f) − P(f)| > ε ) = P(A_1 ∪ ... ∪ A_N) ≤ Σ_{j=1}^N P(A_j) ≤ Σ_{j=1}^N 2 e^{−nε²/(2B²)} = 2N e^{−nε²/(2B²)}.

Thus we have shown that

  P( sup_{f∈F} |P_n(f) − P(f)| > ε ) ≤ 2κ e^{−nε²/(2B²)}

where κ = |F|.

The same idea applies to classes of sets. Let A = {A_1, ..., A_N} be a finite collection of sets. By the same reasoning (with B = 1, since indicator functions are bounded by 1) we have

  P( sup_{A∈A} |P_n(A) − P(A)| > ε ) ≤ 2κ e^{−2nε²}

where κ = |A| and P_n(A) = (1/n) Σ_{i=1}^n I(X_i ∈ A).

To extend these ideas to infinite classes like F = {f_t : t ∈ R} we need to introduce a few more concepts.

3 Shattering

Let A be a class of sets. Some examples are:

1. A = {(−∞, t] : t ∈ R}.
2. A = {(a, b) : a ≤ b}.
3. A = {(a, b) ∪ (c, d) : a ≤ b ≤ c ≤ d}.

4. A = all discs in R^d.
5. A = all rectangles in R^d.
6. A = all half-spaces in R^d: {x : β^T x ≥ 0}.
7. A = all convex sets in R^d.

Let F = {x_1, ..., x_n} be a finite set. Let G be a subset of F. Say that A picks out G if A ∩ F = G for some A ∈ A. For example, let A = {(a, b) : a ≤ b}. Suppose that F = {1, 2, 7, 8, 9} and G = {2, 7}. Then A picks out G since A ∩ F = G if we choose A = (1.5, 7.5), for example. Let s(A, F) be the number of these subsets picked out by A. Of course s(A, F) ≤ 2^n.

Example 4. Let A = {(a, b) : a ≤ b} and F = {1, 2, 3}. Then A can pick out:

  ∅, {1}, {2}, {3}, {1, 2}, {2, 3}, {1, 2, 3}.

So s(A, F) = 7. Note that 7 < 8 = 2³. If F = {1, 6} then A can pick out:

  ∅, {1}, {6}, {1, 6}.

In this case s(A, F) = 4 = 2².

We say that F is shattered if s(A, F) = 2^n where n is the number of points in F. Let F_n denote all finite sets with n elements. Define the shatter coefficient

  s_n(A) = sup_{F∈F_n} s(A, F).

Note that s_n(A) ≤ 2^n.

The following theorem is due to Vapnik and Chervonenkis. The proof is beyond the scope of the course. (If you take 10-702/36-702 you will learn the proof.)

Class $\mathcal A$                          | VC dimension $V_{\mathcal A}$
$\mathcal A = \{A_1,\ldots,A_N\}$           | $\leq \log_2 N$
Intervals $[a,b]$ on the real line          | 2
Discs in $\mathbb R^2$                      | 3
Closed balls in $\mathbb R^d$               | $\leq d+2$
Rectangles in $\mathbb R^d$                 | $2d$
Half-spaces in $\mathbb R^d$                | $d+1$
Convex polygons in $\mathbb R^2$            | $\infty$
Convex polygons with $d$ vertices           | $2d+1$

Table 1: The VC dimension of some classes $\mathcal A$.

Theorem 5 Let $\mathcal A$ be a class of sets. Then
$$P\Big(\sup_{A\in\mathcal A} |P_n(A)-P(A)| > \epsilon\Big) \leq 8\, s_n(\mathcal A)\, e^{-n\epsilon^2/32}. \tag{1}$$

This partly solves one of our problems. But, how big can $s_n(\mathcal A)$ be? Sometimes $s_n(\mathcal A) = 2^n$ for all $n$. For example, let $\mathcal A$ be all polygons in the plane. Then $s_n(\mathcal A) = 2^n$ for all $n$. But, in many cases, we will see that $s_n(\mathcal A) = 2^n$ for all $n$ up to some integer $d$ and then $s_n(\mathcal A) < 2^n$ for all $n > d$. The Vapnik-Chervonenkis (VC) dimension is
$$d = d(\mathcal A) = \text{largest } n \text{ such that } s_n(\mathcal A) = 2^n.$$
In other words, $d$ is the size of the largest set that can be shattered. Thus, $s_n(\mathcal A) = 2^n$ for all $n \leq d$ and $s_n(\mathcal A) < 2^n$ for all $n > d$. The VC dimensions of some common examples are summarized in Table 1.

Now here is an interesting question: for $n > d$ how does $s_n(\mathcal A)$ behave? It is less than $2^n$ but how much less?

Theorem 6 (Sauer's Theorem) Suppose that $\mathcal A$ has finite VC dimension $d$. Then, for all $n \geq d$,
$$s_n(\mathcal A) \leq (n+1)^d. \tag{2}$$

We conclude that:

Theorem 7 Let $\mathcal A$ be a class of sets with VC dimension $d < \infty$. Then
$$P\Big(\sup_{A\in\mathcal A} |P_n(A)-P(A)| > \epsilon\Big) \leq 8\,(n+1)^d\, e^{-n\epsilon^2/32}. \tag{3}$$

Example 8 Let's return to our first example. Suppose that $X_1,\ldots,X_n$ have cdf $F$. Let
$$\hat F_n(t) = \frac{1}{n}\sum_{i=1}^n I(X_i \leq t).$$
We would like to bound $P(\sup_t |\hat F_n(t) - F(t)| > \epsilon)$. Notice that $\hat F_n(t) = P_n(A)$ where $A = (-\infty, t]$. Let $\mathcal A = \{(-\infty, t] :\ t\in\mathbb R\}$. This has VC dimension $d = 1$. So
$$P(\sup_t |\hat F_n(t)-F(t)| > \epsilon) = P\Big(\sup_{A\in\mathcal A} |P_n(A)-P(A)| > \epsilon\Big) \leq 8\,(n+1)\, e^{-n\epsilon^2/32}.$$
In fact, there is a tighter bound in this case called the DKW (Dvoretsky-Kiefer-Wolfowitz) inequality:
$$P(\sup_t |\hat F_n(t)-F(t)| > \epsilon) \leq 2 e^{-2n\epsilon^2}.$$

4 Bounding Expectations

Earlier we saw that we can use exponential bounds on probabilities to get bounds on expectations. Let us recall how that works. Consider a finite collection $\mathcal A = \{A_1,\ldots,A_N\}$. Let
$$Z_n = \max_{1\leq j\leq N} |P_n(A_j) - P(A_j)|.$$
We know that
$$P(Z_n > \epsilon) \leq 2N e^{-2n\epsilon^2}. \tag{4}$$
But now we want to bound
$$E(Z_n) = E\Big(\max_{1\leq j\leq N} |P_n(A_j) - P(A_j)|\Big).$$
We can rewrite (4) as
$$P(Z_n^2 > \epsilon^2) \leq 2N e^{-2n\epsilon^2}$$
or, in other words,
$$P(Z_n^2 > t) \leq 2N e^{-2nt}.$$
Recall that, in general, if $Y \geq 0$ then
$$E(Y) = \int_0^\infty P(Y > t)\,dt.$$
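The DKW inequality is easy to probe numerically. A sketch (mine, not from the notes), for Uniform(0,1) data where $F(t) = t$; we print rather than assert the tail comparison because the DKW bound is nearly sharp:

```python
import numpy as np

rng = np.random.default_rng(1)

def sup_gap(n):
    """sup_t |F_n(t) - F(t)| for X_1, ..., X_n ~ Uniform(0,1)."""
    x = np.sort(rng.uniform(size=n))
    grid = np.arange(1, n + 1) / n        # F_n jumps to i/n at the i-th order statistic
    # the sup is attained just before or at a data point
    return max(np.max(grid - x), np.max(x - (grid - 1.0 / n)))

n, eps = 1000, 0.05
gaps = np.array([sup_gap(n) for _ in range(500)])
empirical = np.mean(gaps > eps)
dkw = 2 * np.exp(-2 * n * eps**2)         # DKW upper bound, about 0.013
print(empirical, "<=", dkw)
```

The typical value of the sup-gap also shows the $O(1/\sqrt n)$ scaling implied by the bound.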

Hence, for any $s$,
$$E(Z_n^2) = \int_0^\infty P(Z_n^2 > t)\,dt = \int_0^s P(Z_n^2 > t)\,dt + \int_s^\infty P(Z_n^2 > t)\,dt \leq s + \int_s^\infty 2N e^{-2nt}\,dt = s + \frac{N}{n}\, e^{-2ns}.$$
Let $s = \log N/(2n)$. Then
$$E(Z_n^2) \leq \frac{\log N}{2n} + \frac{1}{n}.$$
Finally, we use Cauchy-Schwarz:
$$E(Z_n) \leq \sqrt{E(Z_n^2)} \leq \sqrt{\frac{\log N}{2n} + \frac{1}{n}} = O\left(\sqrt{\frac{\log N}{n}}\right).$$
In summary:
$$E\Big(\max_{1\leq j\leq N} |P_n(A_j) - P(A_j)|\Big) = O\left(\sqrt{\frac{\log N}{n}}\right).$$
For a single set $A$ we would have $E|P_n(A) - P(A)| = O(1/\sqrt n)$. The bound only increases logarithmically with $N$.

Lecture Notes 4

1 Random Samples

Let $X_1,\ldots,X_n \sim F$. A statistic is any function $T = g(X_1,\ldots,X_n)$. Recall that the sample mean is
$$\bar X_n = \frac{1}{n}\sum_{i=1}^n X_i$$
and the sample variance is
$$S_n^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X_n)^2.$$
Let $\mu = E(X_i)$ and $\sigma^2 = \mathrm{Var}(X_i)$. Recall that
$$E(\bar X_n) = \mu,\qquad \mathrm{Var}(\bar X_n) = \frac{\sigma^2}{n},\qquad E(S_n^2) = \sigma^2.$$

Theorem 1 If $X_1,\ldots,X_n \sim N(\mu,\sigma^2)$ then $\bar X_n \sim N(\mu, \sigma^2/n)$.

Proof. We know that $M_{X_i}(s) = e^{\mu s + \sigma^2 s^2/2}$. So,
$$M_{\bar X_n}(t) = E(e^{t\bar X_n}) = E\big(e^{(t/n)\sum_{i=1}^n X_i}\big) = \prod_i E(e^{tX_i/n}) = \big(M_{X_i}(t/n)\big)^n = \left(e^{(\mu t/n) + \sigma^2 t^2/(2n^2)}\right)^n = \exp\left\{\mu t + \frac{\sigma^2 t^2}{2n}\right\}$$
which is the mgf of a $N(\mu, \sigma^2/n)$.

Example 2 (CB Ch. 5) Let $Z_1,\ldots,Z_n \sim$ Cauchy(0,1). Then $\bar Z_n \sim$ Cauchy(0,1).

Lemma 3 If $X_1,\ldots,X_n \sim N(\mu,\sigma^2)$ then
$$T = \frac{\bar X_n - \mu}{S_n/\sqrt n} \sim t_{n-1}.$$

Let $X_{(1)},\ldots,X_{(n)}$ denote the ordered values: $X_{(1)} \leq X_{(2)} \leq \cdots \leq X_{(n)}$. Then $X_{(1)},\ldots,X_{(n)}$ are called the order statistics.

2 Convergence

Let $X_1, X_2, \ldots$ be a sequence of random variables and let $X$ be another random variable. Let $F_n$ denote the cdf of $X_n$ and let $F$ denote the cdf of $X$.

1. $X_n$ converges almost surely to $X$, written $X_n \stackrel{a.s.}{\to} X$, if, for every $\epsilon > 0$,
$$P(\lim_n |X_n - X| < \epsilon) = 1. \tag{1}$$

2. $X_n$ converges to $X$ in probability, written $X_n \stackrel{P}{\to} X$, if, for every $\epsilon > 0$,
$$P(|X_n - X| > \epsilon) \to 0 \tag{2}$$
as $n\to\infty$. In other words, $X_n - X = o_P(1)$.

3. $X_n$ converges to $X$ in quadratic mean (also called convergence in $L_2$), written $X_n \stackrel{qm}{\to} X$, if
$$E(X_n - X)^2 \to 0 \tag{3}$$
as $n\to\infty$.

4. $X_n$ converges to $X$ in distribution, written $X_n \rightsquigarrow X$, if
$$\lim_n F_n(t) = F(t) \tag{4}$$
at all $t$ for which $F$ is continuous.

Convergence to a Constant. A random variable $X$ has a point mass distribution if there exists a constant $c$ such that $P(X = c) = 1$. The distribution for $X$ is denoted by $\delta_c$ and we write $X \sim \delta_c$. If $X_n \stackrel{P}{\to} \delta_c$ then we also write $X_n \stackrel{P}{\to} c$. Similarly for the other types of convergence.

Theorem 4 $X_n \stackrel{a.s.}{\to} X$ if and only if, for every $\epsilon > 0$,
$$\lim_n P\Big(\sup_{m\geq n} |X_m - X| \leq \epsilon\Big) = 1.$$

Example 5 (Example 5.5.8) This example shows that convergence in probability does not imply almost sure convergence. Let $S = [0,1]$. Let $P$ be uniform on $[0,1]$. We draw $s \sim P$. Let $X(s) = s$ and let
$$X_1 = s + I_{[0,1]}(s),\quad X_2 = s + I_{[0,1/2]}(s),\quad X_3 = s + I_{[1/2,1]}(s),$$
$$X_4 = s + I_{[0,1/3]}(s),\quad X_5 = s + I_{[1/3,2/3]}(s),\quad X_6 = s + I_{[2/3,1]}(s)$$
etc. Then $X_n \stackrel{P}{\to} X$. But, for each $s$, $X_n(s)$ does not converge to $X(s)$. Hence, $X_n$ does not converge almost surely to $X$.

Example 6 Let $X_n \sim N(0, 1/n)$. Intuitively, $X_n$ is concentrating at 0 so we would like to say that $X_n$ converges to 0. Let's see if this is true. Let $F$ be the distribution function for a point mass at 0. Note that $\sqrt n X_n \sim N(0,1)$. Let $Z$ denote a standard normal random variable. For $t < 0$,
$$F_n(t) = P(X_n < t) = P(\sqrt n X_n < \sqrt n t) = P(Z < \sqrt n t) \to 0$$
since $\sqrt n t \to -\infty$. For $t > 0$,
$$F_n(t) = P(X_n < t) = P(\sqrt n X_n < \sqrt n t) = P(Z < \sqrt n t) \to 1$$
since $\sqrt n t \to \infty$. Hence, $F_n(t) \to F(t)$ for all $t \neq 0$ and so $X_n \rightsquigarrow 0$. Notice that $F_n(0) = 1/2 \neq F(0) = 1$ so convergence fails at $t = 0$. That doesn't matter because $t = 0$ is not a continuity point of $F$ and the definition of convergence in distribution only requires convergence at continuity points.

Now consider convergence in probability. For any $\epsilon > 0$, using Markov's inequality,
$$P(|X_n| > \epsilon) = P(X_n^2 > \epsilon^2) \leq \frac{E(X_n^2)}{\epsilon^2} = \frac{1}{n\epsilon^2} \to 0$$
as $n\to\infty$. Hence, $X_n \stackrel{P}{\to} 0$.

The next theorem gives the relationship between the types of convergence.

Theorem 7 The following relationships hold:
(a) $X_n \stackrel{qm}{\to} X$ implies that $X_n \stackrel{P}{\to} X$.
(b) $X_n \stackrel{P}{\to} X$ implies that $X_n \rightsquigarrow X$.
(c) If $X_n \rightsquigarrow X$ and if $P(X = c) = 1$ for some real number $c$, then $X_n \stackrel{P}{\to} X$.
(d) $X_n \stackrel{a.s.}{\to} X$ implies $X_n \stackrel{P}{\to} X$.
In general, none of the reverse implications hold except the special case in (c).

Proof. We start by proving (a). Suppose that $X_n \stackrel{qm}{\to} X$. Fix $\epsilon > 0$. Then, using Markov's inequality,
$$P(|X_n - X| > \epsilon) = P(|X_n - X|^2 > \epsilon^2) \leq \frac{E|X_n - X|^2}{\epsilon^2} \to 0.$$

Proof of (b). Fix $\epsilon > 0$ and let $x$ be a continuity point of $F$. Then
$$F_n(x) = P(X_n \leq x) = P(X_n \leq x, X \leq x+\epsilon) + P(X_n \leq x, X > x+\epsilon) \leq P(X \leq x+\epsilon) + P(|X_n - X| > \epsilon) = F(x+\epsilon) + P(|X_n - X| > \epsilon).$$
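The Markov step in Example 6 can be checked exactly, since the tail of $N(0, 1/n)$ is available in closed form via the error function. A small sketch (mine, not in the notes):

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def tail_prob(n, eps):
    """Exact P(|X_n| > eps) for X_n ~ N(0, 1/n): sqrt(n) X_n is standard normal."""
    return 2.0 * (1.0 - normal_cdf(sqrt(n) * eps))

eps = 0.1
for n in [10, 100, 1000]:
    markov = 1.0 / (n * eps**2)        # the bound E(X_n^2)/eps^2 from the example
    assert tail_prob(n, eps) <= markov
    print(n, tail_prob(n, eps), "<=", markov)
```

The exact tail decays much faster than the $1/(n\epsilon^2)$ bound, but the bound is all that is needed to conclude $X_n \stackrel{P}{\to} 0$.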

Also,
$$F(x-\epsilon) = P(X \leq x-\epsilon) = P(X \leq x-\epsilon, X_n \leq x) + P(X \leq x-\epsilon, X_n > x) \leq F_n(x) + P(|X_n - X| > \epsilon).$$
Hence,
$$F(x-\epsilon) - P(|X_n - X| > \epsilon) \leq F_n(x) \leq F(x+\epsilon) + P(|X_n - X| > \epsilon).$$
Take the limit as $n\to\infty$ to conclude that
$$F(x-\epsilon) \leq \liminf_n F_n(x) \leq \limsup_n F_n(x) \leq F(x+\epsilon).$$
This holds for all $\epsilon > 0$. Take the limit as $\epsilon \to 0$ and use the fact that $F$ is continuous at $x$ and conclude that $\lim_n F_n(x) = F(x)$.

Proof of (c). Fix $\epsilon > 0$. Then,
$$P(|X_n - c| > \epsilon) = P(X_n < c-\epsilon) + P(X_n > c+\epsilon) \leq P(X_n \leq c-\epsilon) + P(X_n > c+\epsilon) = F_n(c-\epsilon) + 1 - F_n(c+\epsilon) \to F(c-\epsilon) + 1 - F(c+\epsilon) = 0 + 1 - 1 = 0.$$

Proof of (d). This follows from Theorem 4.

Let us now show that the reverse implications do not hold.

Convergence in probability does not imply convergence in quadratic mean. Let $U \sim$ Unif(0,1) and let $X_n = \sqrt n\, I_{(0,1/n)}(U)$. Then
$$P(|X_n| > \epsilon) = P(\sqrt n\, I_{(0,1/n)}(U) > \epsilon) = P(0 \leq U < 1/n) = 1/n \to 0.$$
Hence, $X_n \stackrel{P}{\to} 0$. But $E(X_n^2) = n\int_0^{1/n} du = 1$ for all $n$, so $X_n$ does not converge to 0 in quadratic mean.

Convergence in distribution does not imply convergence in probability. Let $X \sim N(0,1)$. Let $X_n = -X$ for $n = 1, 2, 3, \ldots$; hence $X_n \sim N(0,1)$. $X_n$ has the same distribution function as $X$ for all $n$ so, trivially, $\lim_n F_n(x) = F(x)$ for all $x$. Therefore, $X_n \rightsquigarrow X$. But
$$P(|X_n - X| > \epsilon) = P(|2X| > \epsilon) = P(|X| > \epsilon/2) \not\to 0.$$
So $X_n$ does not converge to $X$ in probability.

The relationships between the types of convergence can be summarized as follows:
$$\text{q.m.} \Longrightarrow \text{prob},\qquad \text{a.s.} \Longrightarrow \text{prob},\qquad \text{prob} \Longrightarrow \text{distribution}.$$

Example 8 One might conjecture that if $X_n \stackrel{P}{\to} b$, then $E(X_n) \to b$. This is not true. Let $X_n$ be a random variable defined by $P(X_n = n^2) = 1/n$ and $P(X_n = 0) = 1 - (1/n)$. Now, $P(|X_n| < \epsilon) = P(X_n = 0) = 1 - (1/n) \to 1$. Hence, $X_n \stackrel{P}{\to} 0$. However,
$$E(X_n) = n^2\,(1/n) + 0\,(1 - (1/n)) = n.$$
Thus, $E(X_n) \to \infty$.

Example 9 Let $X_1,\ldots,X_n \sim$ Uniform(0,1). Let $X_{(n)} = \max_i X_i$. First we claim that $X_{(n)} \stackrel{P}{\to} 1$. This follows since
$$P(|X_{(n)} - 1| > \epsilon) = P(X_{(n)} \leq 1-\epsilon) = \prod_i P(X_i \leq 1-\epsilon) = (1-\epsilon)^n \to 0.$$
Also,
$$P(n(1 - X_{(n)}) \leq t) = P(X_{(n)} \geq 1 - (t/n)) = 1 - (1 - t/n)^n \to 1 - e^{-t}.$$
So $n(1 - X_{(n)}) \rightsquigarrow$ Exp(1).

Some convergence properties are preserved under transformations.

Theorem 10 Let $X_n, X, Y_n, Y$ be random variables. Let $g$ be a continuous function.
(a) If $X_n \stackrel{P}{\to} X$ and $Y_n \stackrel{P}{\to} Y$, then $X_n + Y_n \stackrel{P}{\to} X + Y$.
(b) If $X_n \stackrel{qm}{\to} X$ and $Y_n \stackrel{qm}{\to} Y$, then $X_n + Y_n \stackrel{qm}{\to} X + Y$.
(c) If $X_n \rightsquigarrow X$ and $Y_n \rightsquigarrow c$, then $X_n + Y_n \rightsquigarrow X + c$.
(d) If $X_n \stackrel{P}{\to} X$ and $Y_n \stackrel{P}{\to} Y$, then $X_n Y_n \stackrel{P}{\to} XY$.
(e) If $X_n \rightsquigarrow X$ and $Y_n \rightsquigarrow c$, then $X_n Y_n \rightsquigarrow cX$.
(f) If $X_n \stackrel{P}{\to} X$, then $g(X_n) \stackrel{P}{\to} g(X)$.
(g) If $X_n \rightsquigarrow X$, then $g(X_n) \rightsquigarrow g(X)$.

Parts (c) and (e) are known as Slutsky's theorem. Parts (f) and (g) are known as the continuous mapping theorem. It is worth noting that $X_n \rightsquigarrow X$ and $Y_n \rightsquigarrow Y$ does not in general imply that $X_n + Y_n \rightsquigarrow X + Y$.

3 The Law of Large Numbers

The law of large numbers (LLN) says that the mean of a large sample is close to the mean of the distribution. For example, the proportion of heads in a large number of tosses of a fair coin is expected to be close to 1/2. We now make this more precise. Let $X_1, X_2, \ldots$ be an iid sample, let $\mu = E(X_1)$ and $\sigma^2 = \mathrm{Var}(X_1)$. Recall that the sample mean is defined as $\bar X_n = \frac1n\sum_{i=1}^n X_i$ and that $E(\bar X_n) = \mu$ and $\mathrm{Var}(\bar X_n) = \sigma^2/n$.
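The limit in Example 9, $n(1 - X_{(n)}) \rightsquigarrow$ Exp(1), shows up clearly in simulation. A sketch (mine, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 500, 10000
x_max = rng.uniform(size=(trials, n)).max(axis=1)   # X_(n) for each replication
t = n * (1.0 - x_max)                               # should look like Exp(1)
for s in [0.5, 1.0, 2.0]:
    emp = np.mean(t <= s)
    print(s, emp, 1 - np.exp(-s))                   # empirical cdf vs 1 - e^{-s}
    assert abs(emp - (1 - np.exp(-s))) < 0.03
```

The exact finite-$n$ cdf is $1 - (1 - s/n)^n$, which differs from $1 - e^{-s}$ only by $O(1/n)$, so the match is already good at $n = 500$.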

Theorem 11 (The Weak Law of Large Numbers (WLLN)) If $X_1,\ldots,X_n$ are iid, then $\bar X_n \stackrel{P}{\to} \mu$. Thus, $\bar X_n - \mu = o_P(1)$.

Interpretation of the WLLN: The distribution of $\bar X_n$ becomes more concentrated around $\mu$ as $n$ gets large.

Proof. Assume that $\sigma < \infty$. This is not necessary but it simplifies the proof. Using Chebyshev's inequality,
$$P\big(|\bar X_n - \mu| > \epsilon\big) \leq \frac{\mathrm{Var}(\bar X_n)}{\epsilon^2} = \frac{\sigma^2}{n\epsilon^2}$$
which tends to 0 as $n\to\infty$.

Theorem 12 (The Strong Law of Large Numbers) Let $X_1,\ldots,X_n$ be iid with mean $\mu$. Then $\bar X_n \stackrel{a.s.}{\to} \mu$. The proof is beyond the scope of this course.

4 The Central Limit Theorem

The law of large numbers says that the distribution of $\bar X_n$ piles up near $\mu$. This isn't enough to help us approximate probability statements about $\bar X_n$. For this we need the central limit theorem. Suppose that $X_1,\ldots,X_n$ are iid with mean $\mu$ and variance $\sigma^2$. The central limit theorem (CLT) says that $\bar X_n = \frac1n\sum_i X_i$ has a distribution which is approximately Normal with mean $\mu$ and variance $\sigma^2/n$. This is remarkable since nothing is assumed about the distribution of $X_i$, except the existence of the mean and variance.

Theorem 13 (The Central Limit Theorem (CLT)) Let $X_1,\ldots,X_n$ be iid with mean $\mu$ and variance $\sigma^2$. Let $\bar X_n = \frac1n\sum_{i=1}^n X_i$. Then
$$Z_n \equiv \frac{\bar X_n - \mu}{\sqrt{\mathrm{Var}(\bar X_n)}} = \frac{\sqrt n\,(\bar X_n - \mu)}{\sigma} \rightsquigarrow Z$$
where $Z \sim N(0,1)$. In other words,
$$\lim_n P(Z_n \leq z) = \Phi(z) = \int_{-\infty}^z \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\,dx.$$

Interpretation: Probability statements about $\bar X_n$ can be approximated using a Normal distribution. It's the probability statements that we are approximating, not the random variable itself.
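The CLT can be visualized numerically even for a skewed distribution. A sketch (mine, not in the notes), using Exponential(1), which has $\mu = \sigma = 1$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, trials = 200, 10000
mu, sigma = 1.0, 1.0                             # Exponential(1): mean 1, variance 1
x = rng.exponential(scale=1.0, size=(trials, n))
z = np.sqrt(n) * (x.mean(axis=1) - mu) / sigma   # Z_n from the theorem
print(np.mean(z <= 1.0))                         # should be near Phi(1) = 0.8413
```

Even though each $X_i$ is strongly skewed, the standardized mean is already close to standard normal at $n = 200$.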

A consequence of the CLT is that $\bar X_n - \mu = O_P(1/\sqrt n)$.

In addition to $Z_n \rightsquigarrow N(0,1)$, there are several forms of notation to denote the fact that the distribution of $Z_n$ is converging to a Normal. They all mean the same thing. Here they are:
$$\bar X_n \approx N\left(\mu, \frac{\sigma^2}{n}\right)$$
$$\bar X_n - \mu \approx N\left(0, \frac{\sigma^2}{n}\right)$$
$$\sqrt n\,(\bar X_n - \mu) \rightsquigarrow N(0, \sigma^2)$$
$$\frac{\sqrt n\,(\bar X_n - \mu)}{\sigma} \rightsquigarrow N(0,1).$$

Recall that if $X$ is a random variable, its moment generating function (mgf) is $\psi_X(t) = Ee^{tX}$. Assume in what follows that the mgf is finite in a neighborhood around $t = 0$.

Lemma 14 Let $Z_1, Z_2, \ldots$ be a sequence of random variables. Let $\psi_n$ be the mgf of $Z_n$. Let $Z$ be another random variable and denote its mgf by $\psi$. If $\psi_n(t) \to \psi(t)$ for all $t$ in some open interval around 0, then $Z_n \rightsquigarrow Z$.

Proof of the central limit theorem. Let $Y_i = (X_i - \mu)/\sigma$. Then, $Z_n = n^{-1/2}\sum_i Y_i$. Let $\psi(t)$ be the mgf of $Y_i$. The mgf of $\sum_i Y_i$ is $(\psi(t))^n$ and the mgf of $Z_n$ is $[\psi(t/\sqrt n)]^n \equiv \xi_n(t)$. Now $\psi'(0) = E(Y_1) = 0$, $\psi''(0) = E(Y_1^2) = \mathrm{Var}(Y_1) = 1$. So,
$$\psi(t) = \psi(0) + t\psi'(0) + \frac{t^2}{2!}\psi''(0) + \frac{t^3}{3!}\psi'''(0) + \cdots = 1 + \frac{t^2}{2} + \frac{t^3}{3!}\psi'''(0) + \cdots$$
Now,
$$\xi_n(t) = \left[\psi\!\left(\frac{t}{\sqrt n}\right)\right]^n = \left[1 + \frac{t^2}{2n} + \frac{t^3}{3!\, n^{3/2}}\psi'''(0) + \cdots\right]^n = \left[1 + \frac{\frac{t^2}{2} + \frac{t^3}{3!\sqrt n}\psi'''(0) + \cdots}{n}\right]^n \to e^{t^2/2}$$

which is the mgf of a $N(0,1)$. The result follows from Lemma 14. In the last step we used the fact that if $a_n \to a$ then $\left(1 + \frac{a_n}{n}\right)^n \to e^a$.

The central limit theorem tells us that $Z_n = \sqrt n\,(\bar X_n - \mu)/\sigma$ is approximately $N(0,1)$. However, we rarely know $\sigma$. We can estimate $\sigma^2$ from $X_1,\ldots,X_n$ by
$$S_n^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X_n)^2.$$
This raises the following question: if we replace $\sigma$ with $S_n$, is the central limit theorem still true? The answer is yes.

Theorem 15 Assume the same conditions as the CLT. Then,
$$T_n = \frac{\sqrt n\,(\bar X_n - \mu)}{S_n} \rightsquigarrow N(0,1).$$

Proof. We have that $T_n = Z_n W_n$ where
$$Z_n = \frac{\sqrt n\,(\bar X_n - \mu)}{\sigma}\qquad\text{and}\qquad W_n = \frac{\sigma}{S_n}.$$
Now $Z_n \rightsquigarrow Z \sim N(0,1)$ and $W_n \stackrel{P}{\to} 1$. The result follows from Slutsky's theorem.

There is also a multivariate version of the central limit theorem. Recall that $X = (X_1,\ldots,X_k)^T$ has a multivariate Normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$ if
$$f(x) = \frac{1}{(2\pi)^{k/2}|\Sigma|^{1/2}}\exp\left(-\frac12 (x-\mu)^T\Sigma^{-1}(x-\mu)\right).$$
In this case we write $X \sim N(\mu, \Sigma)$.

Theorem 16 (Multivariate central limit theorem) Let $X_1,\ldots,X_n$ be iid random vectors where $X_i = (X_{1i},\ldots,X_{ki})^T$ with mean $\mu = (\mu_1,\ldots,\mu_k)^T$ and covariance matrix $\Sigma$. Let $\bar X = (\bar X_1,\ldots,\bar X_k)^T$ where $\bar X_j = \frac1n\sum_{i=1}^n X_{ji}$. Then,
$$\sqrt n\,(\bar X - \mu) \rightsquigarrow N(0, \Sigma).$$
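Theorem 15 (replacing $\sigma$ by $S_n$) can likewise be checked by simulation; with Normal data $T_n$ is exactly $t_{n-1}$, which is already close to $N(0,1)$ for moderate $n$. A sketch (mine, not in the notes), with illustrative values $\mu = 5$, $\sigma = 2$:

```python
import numpy as np

rng = np.random.default_rng(4)
n, trials, mu = 100, 10000, 5.0
x = rng.normal(loc=mu, scale=2.0, size=(trials, n))
xbar = x.mean(axis=1)
s = x.std(axis=1, ddof=1)                # S_n with the 1/(n-1) convention
t = np.sqrt(n) * (xbar - mu) / s         # T_n from Theorem 15
print(np.mean(t <= 1.645))               # near Phi(1.645) = 0.95
```

Note that $\sigma$ appears nowhere in the computation of $T_n$; that is the practical point of the theorem.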

5 The Delta Method

If $Y_n$ has a limiting Normal distribution then the delta method allows us to find the limiting distribution of $g(Y_n)$ where $g$ is any smooth function.

Theorem 17 (The Delta Method) Suppose that
$$\frac{\sqrt n\,(Y_n - \mu)}{\sigma} \rightsquigarrow N(0,1)$$
and that $g$ is a differentiable function such that $g'(\mu) \neq 0$. Then
$$\frac{\sqrt n\,(g(Y_n) - g(\mu))}{|g'(\mu)|\,\sigma} \rightsquigarrow N(0,1).$$
In other words,
$$Y_n \approx N\left(\mu, \frac{\sigma^2}{n}\right)\qquad\text{implies that}\qquad g(Y_n) \approx N\left(g(\mu), \frac{(g'(\mu))^2\sigma^2}{n}\right).$$

Example 18 Let $X_1,\ldots,X_n$ be iid with finite mean $\mu$ and finite variance $\sigma^2$. By the central limit theorem, $\sqrt n\,(\bar X_n - \mu)/\sigma \rightsquigarrow N(0,1)$. Let $W_n = e^{\bar X_n}$. Thus, $W_n = g(\bar X_n)$ where $g(s) = e^s$. Since $g'(s) = e^s$, the delta method implies that
$$W_n \approx N\left(e^\mu, \frac{e^{2\mu}\sigma^2}{n}\right).$$

There is also a multivariate version of the delta method.

Theorem 19 (The Multivariate Delta Method) Suppose that $Y_n = (Y_{n1},\ldots,Y_{nk})$ is a sequence of random vectors such that $\sqrt n\,(Y_n - \mu) \rightsquigarrow N(0, \Sigma)$. Let $g : \mathbb R^k \to \mathbb R$ and let
$$\nabla g(y) = \begin{pmatrix} \partial g/\partial y_1 \\ \vdots \\ \partial g/\partial y_k \end{pmatrix}.$$
Let $\nabla_\mu$ denote $\nabla g(y)$ evaluated at $y = \mu$ and assume that the elements of $\nabla_\mu$ are nonzero. Then
$$\sqrt n\,(g(Y_n) - g(\mu)) \rightsquigarrow N\big(0, \nabla_\mu^T\,\Sigma\,\nabla_\mu\big).$$

Example 20 Let
$$\begin{pmatrix} X_{11} \\ X_{21} \end{pmatrix},\ \begin{pmatrix} X_{12} \\ X_{22} \end{pmatrix},\ \ldots,\ \begin{pmatrix} X_{1n} \\ X_{2n} \end{pmatrix}$$
be iid random vectors with mean $\mu = (\mu_1, \mu_2)^T$ and variance $\Sigma$. Let
$$\bar X_1 = \frac1n\sum_{i=1}^n X_{1i},\qquad \bar X_2 = \frac1n\sum_{i=1}^n X_{2i}$$

and define $Y_n = \bar X_1/\bar X_2$. Thus, $Y_n = g(\bar X_1, \bar X_2)$ where $g(s_1, s_2) = s_1/s_2$. By the central limit theorem,
$$\sqrt n\begin{pmatrix} \bar X_1 - \mu_1 \\ \bar X_2 - \mu_2 \end{pmatrix} \rightsquigarrow N(0, \Sigma).$$
Now
$$\nabla g(s) = \begin{pmatrix} \partial g/\partial s_1 \\ \partial g/\partial s_2 \end{pmatrix} = \begin{pmatrix} 1/s_2 \\ -s_1/s_2^2 \end{pmatrix}$$
and so
$$\nabla_\mu^T\,\Sigma\,\nabla_\mu = \begin{pmatrix} \frac{1}{\mu_2} & -\frac{\mu_1}{\mu_2^2} \end{pmatrix}\begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix}\begin{pmatrix} \frac{1}{\mu_2} \\ -\frac{\mu_1}{\mu_2^2} \end{pmatrix} = \frac{\sigma_{11}}{\mu_2^2} - \frac{2\mu_1\sigma_{12}}{\mu_2^3} + \frac{\mu_1^2\sigma_{22}}{\mu_2^4}.$$
Therefore,
$$\sqrt n\left(\frac{\bar X_1}{\bar X_2} - \frac{\mu_1}{\mu_2}\right) \rightsquigarrow N\left(0,\ \frac{\sigma_{11}}{\mu_2^2} - \frac{2\mu_1\sigma_{12}}{\mu_2^3} + \frac{\mu_1^2\sigma_{22}}{\mu_2^4}\right).$$
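The delta-method approximation of Example 18, $W_n = e^{\bar X_n}$, can be checked directly. A sketch (mine, not from the notes), with illustrative values $\mu = 0.5$, $\sigma = 1$:

```python
import numpy as np

rng = np.random.default_rng(5)
n, trials = 400, 10000
mu, sigma = 0.5, 1.0
x = rng.normal(loc=mu, scale=sigma, size=(trials, n))
w = np.exp(x.mean(axis=1))                  # W_n = g(Xbar_n) with g(s) = e^s
pred_sd = np.exp(mu) * sigma / np.sqrt(n)   # delta method: sd of N(e^mu, e^{2mu} sigma^2 / n)
print(w.std(), "vs", pred_sd)
```

The simulated standard deviation of $W_n$ matches the delta-method prediction to within about a percent at this sample size.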

Addendum to Lecture Notes 4

Here is the proof that
$$T_n = \frac{\sqrt n\,(\bar X_n - \mu)}{S_n} \rightsquigarrow N(0,1)$$
where
$$S_n^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X_n)^2.$$

Step 1. We first show that $R_n^2 \stackrel{P}{\to} \sigma^2$ where
$$R_n^2 = \frac1n\sum_{i=1}^n (X_i - \bar X_n)^2.$$
Note that
$$R_n^2 = \frac1n\sum_{i=1}^n X_i^2 - \left(\frac1n\sum_{i=1}^n X_i\right)^2.$$
Define $Y_i = X_i^2$. Then, using the LLN (law of large numbers),
$$\frac1n\sum_{i=1}^n X_i^2 = \frac1n\sum_{i=1}^n Y_i \stackrel{P}{\to} E(Y_i) = E(X_i^2) = \mu^2 + \sigma^2.$$
Next, by the LLN, $\frac1n\sum_{i=1}^n X_i \stackrel{P}{\to} \mu$. Since $g(t) = t^2$ is continuous, the continuous mapping theorem implies that
$$\left(\frac1n\sum_{i=1}^n X_i\right)^2 \stackrel{P}{\to} \mu^2.$$
Thus
$$R_n^2 \stackrel{P}{\to} (\mu^2 + \sigma^2) - \mu^2 = \sigma^2.$$

Step 2. Note that
$$S_n^2 = \left(\frac{n}{n-1}\right)R_n^2.$$
Since $R_n^2 \stackrel{P}{\to} \sigma^2$ and $n/(n-1) \to 1$, we have that $S_n^2 \stackrel{P}{\to} \sigma^2$.

Step 3. Since $g(t) = \sqrt t$ is continuous (for $t \geq 0$), the continuous mapping theorem implies that $S_n \stackrel{P}{\to} \sigma$.

Step 4. Since $g(t) = t/\sigma$ is continuous, the continuous mapping theorem implies that $S_n/\sigma \stackrel{P}{\to} 1$.

Step 5. Since $g(t) = 1/t$ is continuous (for $t > 0$), the continuous mapping theorem implies that $\sigma/S_n \stackrel{P}{\to} 1$. Since convergence in probability implies convergence in distribution, $\sigma/S_n \rightsquigarrow 1$.

Step 6. Note that
$$T_n = \left(\frac{\sqrt n\,(\bar X_n - \mu)}{\sigma}\right)\left(\frac{\sigma}{S_n}\right) \equiv V_n W_n.$$
Now $V_n \rightsquigarrow Z$ where $Z \sim N(0,1)$ by the CLT. And we showed that $W_n \stackrel{P}{\to} 1$. By Slutsky's theorem,
$$T_n = V_n W_n \rightsquigarrow 1 \cdot Z = Z.$$

Lecture Notes 5

1 Statistical Models

A statistical model $\mathcal P$ is a collection of probability distributions (or a collection of densities). An example of a nonparametric model is
$$\mathcal P = \left\{p :\ \int (p''(x))^2\,dx < \infty\right\}.$$
A parametric model has the form
$$\mathcal P = \big\{p(x;\theta) :\ \theta\in\Theta\big\}$$
where $\Theta \subset \mathbb R^d$. An example is the set of Normal densities $\{p(x;\theta) = (2\pi)^{-1/2} e^{-(x-\theta)^2/2}\}$.

For now, we focus on parametric models. The model comes from assumptions. Some examples:
- Time until something fails is often modeled by an exponential distribution.
- Number of rare events is often modeled by a Poisson distribution.
- Lengths and weights are often modeled by a Normal distribution.
These models are not correct. But they might be useful. Later we consider nonparametric methods that do not assume a parametric model.

2 Statistics

Let $X_1,\ldots,X_n \sim p(x;\theta)$. Let $X^n \equiv (X_1,\ldots,X_n)$. Any function $T = T(X_1,\ldots,X_n)$ is itself a random variable which we will call a statistic. Some examples are:

order statistics, $X_{(1)} \leq X_{(2)} \leq \cdots \leq X_{(n)}$

sample mean: $\bar X = \frac1n\sum_i X_i$,
sample variance: $S^2 = \frac{1}{n-1}\sum_i (X_i - \bar X)^2$,
sample median: middle value of the order statistics,
sample minimum: $X_{(1)}$,
sample maximum: $X_{(n)}$,
sample range: $X_{(n)} - X_{(1)}$,
sample interquartile range: $X_{(.75n)} - X_{(.25n)}$.

Example 1 If $X_1,\ldots,X_n \sim \Gamma(\alpha,\beta)$, then $\bar X \sim \Gamma(n\alpha, \beta/n)$.

Proof: This is the mgf of $\Gamma(n\alpha, \beta/n)$:
$$M_{\bar X}(t) = E[e^{t\bar X}] = E[e^{(t/n)\sum X_i}] = \prod_i E[e^{X_i(t/n)}] = [M_X(t/n)]^n = \left[\left(\frac{1}{1-\beta t/n}\right)^\alpha\right]^n = \left[\frac{1}{1-(\beta/n)t}\right]^{n\alpha}.$$

Example 2 If $X_1,\ldots,X_n \sim N(\mu,\sigma^2)$ then $\bar X \sim N(\mu, \sigma^2/n)$.

Example 3 If $X_1,\ldots,X_n$ iid Cauchy(0,1), with density
$$p(x) = \frac{1}{\pi(1+x^2)},\qquad x\in\mathbb R,$$
then $\bar X \sim$ Cauchy(0,1).

Example 4 If $X_1,\ldots,X_n \sim N(\mu,\sigma^2)$ then
$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}.$$
The proof is based on the mgf.

Example 5 Let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ be the order statistics, which means that the sample $X_1, X_2, \ldots, X_n$ has been ordered from smallest to largest: $X_{(1)} \leq X_{(2)} \leq \cdots \leq X_{(n)}$. Now,
$$F_{X_{(k)}}(x) = P(X_{(k)} \leq x) = P(\text{at least } k \text{ of the } X_1,\ldots,X_n \leq x) = \sum_{j=k}^n P(\text{exactly } j \text{ of the } X_1,\ldots,X_n \leq x) = \sum_{j=k}^n \binom{n}{j}[F_X(x)]^j[1-F_X(x)]^{n-j}.$$
Differentiate to find the pdf (see CB Sec. 5.4):
$$p_{X_{(k)}}(x) = \frac{n!}{(k-1)!(n-k)!}\,[F_X(x)]^{k-1}\,p(x)\,[1-F_X(x)]^{n-k}.$$

3 Sufficiency (Ch 6 CB)

We continue with parametric inference. In this section we discuss data reduction as a formal concept.

Sample $X^n = (X_1,\ldots,X_n) \sim F$. Assume $F$ belongs to a family of distributions (e.g. $F$ is Normal), indexed by some parameter $\theta$. We want to learn about $\theta$ and try to summarize the data without throwing any information about $\theta$ away. If a statistic $T(X_1,\ldots,X_n)$ contains all the information about $\theta$ in the sample we say $T$ is sufficient.
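The order-statistic cdf formula in Example 5 can be verified against simulation. A sketch (mine, not from the notes), for Uniform(0,1) data where $F(x) = x$, with illustrative values $n = 5$, $k = 3$:

```python
import numpy as np
from math import comb

rng = np.random.default_rng(6)
n, k, x0 = 5, 3, 0.4
# F_{X_(k)}(x0) = sum_{j=k}^{n} C(n,j) x0^j (1 - x0)^{n-j}
formula = sum(comb(n, j) * x0**j * (1 - x0)**(n - j) for j in range(k, n + 1))
sims = np.sort(rng.uniform(size=(200000, n)), axis=1)
emp = np.mean(sims[:, k - 1] <= x0)    # k-th smallest is column k-1 after sorting
print(emp, "vs", formula)
assert abs(emp - formula) < 0.005
```

The same check works for any continuous $F$ after replacing $x_0$ by $F(x_0)$ in the binomial sum.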

3.1 Sufficient Statistics

Definition: $T$ is sufficient for $\theta$ if the conditional distribution of $X^n \mid T$ does not depend on $\theta$. Thus,
$$f(x_1,\ldots,x_n \mid t; \theta) = f(x_1,\ldots,x_n \mid t).$$

Example 6 $X_1,\ldots,X_n \sim$ Poisson($\theta$). Let $T = \sum_{i=1}^n X_i$. Then,
$$p_{X\mid T}(x^n \mid t) = P(X^n = x^n \mid T(X^n) = t) = \frac{P(X^n = x^n \text{ and } T = t)}{P(T = t)}.$$
But
$$P(X^n = x^n \text{ and } T = t) = \begin{cases} 0 & \text{if } T(x^n) \neq t \\ P(X^n = x^n) & \text{if } T(x^n) = t. \end{cases}$$
Hence,
$$P(X^n = x^n) = \prod_{i=1}^n \frac{e^{-\theta}\theta^{x_i}}{x_i!} = \frac{e^{-n\theta}\,\theta^{\sum_i x_i}}{\prod_i (x_i!)} = \frac{e^{-n\theta}\,\theta^t}{\prod_i (x_i!)}.$$
Now, $T(x^n) = \sum_i x_i = t$ and so
$$P(T = t) = \frac{e^{-n\theta}(n\theta)^t}{t!}$$
since $T \sim$ Poisson($n\theta$). Thus,
$$\frac{P(X^n = x^n)}{P(T = t)} = \frac{t!}{\left(\prod_i x_i!\right) n^t}$$
which does not depend on $\theta$. So $T = \sum_i X_i$ is a sufficient statistic for $\theta$. Other sufficient statistics are: $T = 3.7\sum_i X_i$, $T = (\sum_i X_i, X_4)$, and $T(X_1,\ldots,X_n) = (X_1,\ldots,X_n)$.

3.2 Sufficient Partitions

It is better to describe sufficiency in terms of partitions of the sample space.

Example 7 Let $X_1, X_2, X_3 \sim$ Bernoulli($\theta$). Let $T = \sum X_i$.
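The conclusion of Example 6 — that the conditional distribution of the sample given $T$ is free of $\theta$ — can be seen in simulation: condition on $T = t$ and the outcome frequencies match $t!/(\prod_i x_i!\, n^t)$ for every $\theta$. A sketch (mine, not from the notes), with illustrative choices $n = 3$, $t = 4$, outcome $(2, 1, 1)$:

```python
import numpy as np

rng = np.random.default_rng(7)

def conditional_freq(theta, n=3, t=4, trials=200000):
    """Empirical P(X^n = (2,1,1) | T = t) for X_i ~ Poisson(theta)."""
    x = rng.poisson(lam=theta, size=(trials, n))
    keep = x[x.sum(axis=1) == t]
    return np.mean((keep == np.array([2, 1, 1])).all(axis=1))

exact = 24 / (2 * 3**4)                  # t!/(prod x_i! * n^t) = 4!/(2! * 3^4)
for theta in [0.5, 2.0]:
    print(theta, conditional_freq(theta), "vs", exact)
    assert abs(conditional_freq(theta) - exact) < 0.02
```

The conditional frequency is the same (up to Monte Carlo error) at both values of $\theta$, which is exactly what sufficiency of $T$ asserts.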

x          t      p(x | t)
(0, 0, 0)  t = 0  1
(0, 0, 1)  t = 1  1/3
(0, 1, 0)  t = 1  1/3
(1, 0, 0)  t = 1  1/3
(0, 1, 1)  t = 2  1/3
(1, 0, 1)  t = 2  1/3
(1, 1, 0)  t = 2  1/3
(1, 1, 1)  t = 3  1

8 elements; 4 partition elements.

1. A partition $B_1,\ldots,B_k$ is sufficient if $f(x \mid X \in B)$ does not depend on $\theta$.
2. A statistic $T$ induces a partition. For each $t$, $\{x :\ T(x) = t\}$ is one element of the partition. $T$ is sufficient if and only if the partition is sufficient.
3. Two statistics can generate the same partition: example, $\sum_i X_i$ and $3\sum_i X_i$.
4. If we split any element $B_i$ of a sufficient partition into smaller pieces, we get another sufficient partition.

Example 8 Let $X_1, X_2, X_3 \sim$ Bernoulli($\theta$). Then $T = X_1$ is not sufficient. Look at its partition:


More information

Measures of Spread and Boxplots Discrete Math, Section 9.4

Measures of Spread and Boxplots Discrete Math, Section 9.4 Measures of Spread ad Boxplots Discrete Math, Sectio 9.4 We start with a example: Example 1: Comparig Mea ad Media Compute the mea ad media of each data set: S 1 = {4, 6, 8, 10, 1, 14, 16} S = {4, 7, 9,

More information

1 Review of Probability

1 Review of Probability Copyright c 27 by Karl Sigma 1 Review of Probability Radom variables are deoted by X, Y, Z, etc. The cumulative distributio fuctio (c.d.f.) of a radom variable X is deoted by F (x) = P (X x), < x

More information

Soving Recurrence Relations

Soving Recurrence Relations Sovig Recurrece Relatios Part 1. Homogeeous liear 2d degree relatios with costat coefficiets. Cosider the recurrece relatio ( ) T () + at ( 1) + bt ( 2) = 0 This is called a homogeeous liear 2d degree

More information

3. Greatest Common Divisor - Least Common Multiple

3. Greatest Common Divisor - Least Common Multiple 3 Greatest Commo Divisor - Least Commo Multiple Defiitio 31: The greatest commo divisor of two atural umbers a ad b is the largest atural umber c which divides both a ad b We deote the greatest commo gcd

More information

CHAPTER 7: Central Limit Theorem: CLT for Averages (Means)

CHAPTER 7: Central Limit Theorem: CLT for Averages (Means) CHAPTER 7: Cetral Limit Theorem: CLT for Averages (Meas) X = the umber obtaied whe rollig oe six sided die oce. If we roll a six sided die oce, the mea of the probability distributio is X P(X = x) Simulatio:

More information

Chapter 7: Confidence Interval and Sample Size

Chapter 7: Confidence Interval and Sample Size Chapter 7: Cofidece Iterval ad Sample Size Learig Objectives Upo successful completio of Chapter 7, you will be able to: Fid the cofidece iterval for the mea, proportio, ad variace. Determie the miimum

More information

, a Wishart distribution with n -1 degrees of freedom and scale matrix.

, a Wishart distribution with n -1 degrees of freedom and scale matrix. UMEÅ UNIVERSITET Matematisk-statistiska istitutioe Multivariat dataaalys D MSTD79 PA TENTAMEN 004-0-9 LÖSNINGSFÖRSLAG TILL TENTAMEN I MATEMATISK STATISTIK Multivariat dataaalys D, 5 poäg.. Assume that

More information

4.3. The Integral and Comparison Tests

4.3. The Integral and Comparison Tests 4.3. THE INTEGRAL AND COMPARISON TESTS 9 4.3. The Itegral ad Compariso Tests 4.3.. The Itegral Test. Suppose f is a cotiuous, positive, decreasig fuctio o [, ), ad let a = f(). The the covergece or divergece

More information

1 Computing the Standard Deviation of Sample Means

1 Computing the Standard Deviation of Sample Means Computig the Stadard Deviatio of Sample Meas Quality cotrol charts are based o sample meas ot o idividual values withi a sample. A sample is a group of items, which are cosidered all together for our aalysis.

More information

Statistical inference: example 1. Inferential Statistics

Statistical inference: example 1. Inferential Statistics Statistical iferece: example 1 Iferetial Statistics POPULATION SAMPLE A clothig store chai regularly buys from a supplier large quatities of a certai piece of clothig. Each item ca be classified either

More information

Theorems About Power Series

Theorems About Power Series Physics 6A Witer 20 Theorems About Power Series Cosider a power series, f(x) = a x, () where the a are real coefficiets ad x is a real variable. There exists a real o-egative umber R, called the radius

More information

Unbiased Estimation. Topic 14. 14.1 Introduction

Unbiased Estimation. Topic 14. 14.1 Introduction Topic 4 Ubiased Estimatio 4. Itroductio I creatig a parameter estimator, a fudametal questio is whether or ot the estimator differs from the parameter i a systematic maer. Let s examie this by lookig a

More information

Class Meeting # 16: The Fourier Transform on R n

Class Meeting # 16: The Fourier Transform on R n MATH 18.152 COUSE NOTES - CLASS MEETING # 16 18.152 Itroductio to PDEs, Fall 2011 Professor: Jared Speck Class Meetig # 16: The Fourier Trasform o 1. Itroductio to the Fourier Trasform Earlier i the course,

More information

Central Limit Theorem and Its Applications to Baseball

Central Limit Theorem and Its Applications to Baseball Cetral Limit Theorem ad Its Applicatios to Baseball by Nicole Aderso A project submitted to the Departmet of Mathematical Scieces i coformity with the requiremets for Math 4301 (Hoours Semiar) Lakehead

More information

Parametric (theoretical) probability distributions. (Wilks, Ch. 4) Discrete distributions: (e.g., yes/no; above normal, normal, below normal)

Parametric (theoretical) probability distributions. (Wilks, Ch. 4) Discrete distributions: (e.g., yes/no; above normal, normal, below normal) 6 Parametric (theoretical) probability distributios. (Wilks, Ch. 4) Note: parametric: assume a theoretical distributio (e.g., Gauss) No-parametric: o assumptio made about the distributio Advatages of assumig

More information

Chapter 5: Inner Product Spaces

Chapter 5: Inner Product Spaces Chapter 5: Ier Product Spaces Chapter 5: Ier Product Spaces SECION A Itroductio to Ier Product Spaces By the ed of this sectio you will be able to uderstad what is meat by a ier product space give examples

More information

Parameter estimation for nonlinear models: Numerical approaches to solving the inverse problem. Lecture 11 04/01/2008. Sven Zenker

Parameter estimation for nonlinear models: Numerical approaches to solving the inverse problem. Lecture 11 04/01/2008. Sven Zenker Parameter estimatio for oliear models: Numerical approaches to solvig the iverse problem Lecture 11 04/01/2008 Sve Zeker Review: Trasformatio of radom variables Cosider probability distributio of a radom

More information

Lesson 17 Pearson s Correlation Coefficient

Lesson 17 Pearson s Correlation Coefficient Outlie Measures of Relatioships Pearso s Correlatio Coefficiet (r) -types of data -scatter plots -measure of directio -measure of stregth Computatio -covariatio of X ad Y -uique variatio i X ad Y -measurig

More information

Basic Elements of Arithmetic Sequences and Series

Basic Elements of Arithmetic Sequences and Series MA40S PRE-CALCULUS UNIT G GEOMETRIC SEQUENCES CLASS NOTES (COMPLETED NO NEED TO COPY NOTES FROM OVERHEAD) Basic Elemets of Arithmetic Sequeces ad Series Objective: To establish basic elemets of arithmetic

More information

Determining the sample size

Determining the sample size Determiig the sample size Oe of the most commo questios ay statisticia gets asked is How large a sample size do I eed? Researchers are ofte surprised to fid out that the aswer depeds o a umber of factors

More information

THE HEIGHT OF q-binary SEARCH TREES

THE HEIGHT OF q-binary SEARCH TREES THE HEIGHT OF q-binary SEARCH TREES MICHAEL DRMOTA AND HELMUT PRODINGER Abstract. q biary search trees are obtaied from words, equipped with the geometric distributio istead of permutatios. The average

More information

3 Basic Definitions of Probability Theory

3 Basic Definitions of Probability Theory 3 Basic Defiitios of Probability Theory 3defprob.tex: Feb 10, 2003 Classical probability Frequecy probability axiomatic probability Historical developemet: Classical Frequecy Axiomatic The Axiomatic defiitio

More information

FIBONACCI NUMBERS: AN APPLICATION OF LINEAR ALGEBRA. 1. Powers of a matrix

FIBONACCI NUMBERS: AN APPLICATION OF LINEAR ALGEBRA. 1. Powers of a matrix FIBONACCI NUMBERS: AN APPLICATION OF LINEAR ALGEBRA. Powers of a matrix We begi with a propositio which illustrates the usefuless of the diagoalizatio. Recall that a square matrix A is diogaalizable if

More information

Confidence Intervals for One Mean

Confidence Intervals for One Mean Chapter 420 Cofidece Itervals for Oe Mea Itroductio This routie calculates the sample size ecessary to achieve a specified distace from the mea to the cofidece limit(s) at a stated cofidece level for a

More information

One-sample test of proportions

One-sample test of proportions Oe-sample test of proportios The Settig: Idividuals i some populatio ca be classified ito oe of two categories. You wat to make iferece about the proportio i each category, so you draw a sample. Examples:

More information

GCSE STATISTICS. 4) How to calculate the range: The difference between the biggest number and the smallest number.

GCSE STATISTICS. 4) How to calculate the range: The difference between the biggest number and the smallest number. GCSE STATISTICS You should kow: 1) How to draw a frequecy diagram: e.g. NUMBER TALLY FREQUENCY 1 3 5 ) How to draw a bar chart, a pictogram, ad a pie chart. 3) How to use averages: a) Mea - add up all

More information

Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed.

Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed. This documet was writte ad copyrighted by Paul Dawkis. Use of this documet ad its olie versio is govered by the Terms ad Coditios of Use located at http://tutorial.math.lamar.edu/terms.asp. The olie versio

More information

NOTES ON PROBABILITY Greg Lawler Last Updated: March 21, 2016

NOTES ON PROBABILITY Greg Lawler Last Updated: March 21, 2016 NOTES ON PROBBILITY Greg Lawler Last Updated: March 21, 2016 Overview This is a itroductio to the mathematical foudatios of probability theory. It is iteded as a supplemet or follow-up to a graduate course

More information

Modified Line Search Method for Global Optimization

Modified Line Search Method for Global Optimization Modified Lie Search Method for Global Optimizatio Cria Grosa ad Ajith Abraham Ceter of Excellece for Quatifiable Quality of Service Norwegia Uiversity of Sciece ad Techology Trodheim, Norway {cria, ajith}@q2s.tu.o

More information

The Stable Marriage Problem

The Stable Marriage Problem The Stable Marriage Problem William Hut Lae Departmet of Computer Sciece ad Electrical Egieerig, West Virgiia Uiversity, Morgatow, WV William.Hut@mail.wvu.edu 1 Itroductio Imagie you are a matchmaker,

More information

Confidence Intervals. CI for a population mean (σ is known and n > 30 or the variable is normally distributed in the.

Confidence Intervals. CI for a population mean (σ is known and n > 30 or the variable is normally distributed in the. Cofidece Itervals A cofidece iterval is a iterval whose purpose is to estimate a parameter (a umber that could, i theory, be calculated from the populatio, if measuremets were available for the whole populatio).

More information

Plug-in martingales for testing exchangeability on-line

Plug-in martingales for testing exchangeability on-line Plug-i martigales for testig exchageability o-lie Valetia Fedorova, Alex Gammerma, Ilia Nouretdiov, ad Vladimir Vovk Computer Learig Research Cetre Royal Holloway, Uiversity of Lodo, UK {valetia,ilia,alex,vovk}@cs.rhul.ac.uk

More information

MARTINGALES AND A BASIC APPLICATION

MARTINGALES AND A BASIC APPLICATION MARTINGALES AND A BASIC APPLICATION TURNER SMITH Abstract. This paper will develop the measure-theoretic approach to probability i order to preset the defiitio of martigales. From there we will apply this

More information

Example 2 Find the square root of 0. The only square root of 0 is 0 (since 0 is not positive or negative, so those choices don t exist here).

Example 2 Find the square root of 0. The only square root of 0 is 0 (since 0 is not positive or negative, so those choices don t exist here). BEGINNING ALGEBRA Roots ad Radicals (revised summer, 00 Olso) Packet to Supplemet the Curret Textbook - Part Review of Square Roots & Irratioals (This portio ca be ay time before Part ad should mostly

More information

Concentration of Measure

Concentration of Measure Copyright c 2008 2010 Joh Lafferty, Ha Liu, ad Larry Wasserma Do Not Distribute Chapter 7 Cocetratio of Measure Ofte we wat to show that some radom quatity is close to its mea with high probability Results

More information

WHEN IS THE (CO)SINE OF A RATIONAL ANGLE EQUAL TO A RATIONAL NUMBER?

WHEN IS THE (CO)SINE OF A RATIONAL ANGLE EQUAL TO A RATIONAL NUMBER? WHEN IS THE (CO)SINE OF A RATIONAL ANGLE EQUAL TO A RATIONAL NUMBER? JÖRG JAHNEL 1. My Motivatio Some Sort of a Itroductio Last term I tought Topological Groups at the Göttige Georg August Uiversity. This

More information

CME 302: NUMERICAL LINEAR ALGEBRA FALL 2005/06 LECTURE 8

CME 302: NUMERICAL LINEAR ALGEBRA FALL 2005/06 LECTURE 8 CME 30: NUMERICAL LINEAR ALGEBRA FALL 005/06 LECTURE 8 GENE H GOLUB 1 Positive Defiite Matrices A matrix A is positive defiite if x Ax > 0 for all ozero x A positive defiite matrix has real ad positive

More information

Taking DCOP to the Real World: Efficient Complete Solutions for Distributed Multi-Event Scheduling

Taking DCOP to the Real World: Efficient Complete Solutions for Distributed Multi-Event Scheduling Taig DCOP to the Real World: Efficiet Complete Solutios for Distributed Multi-Evet Schedulig Rajiv T. Maheswara, Milid Tambe, Emma Bowrig, Joatha P. Pearce, ad Pradeep araatham Uiversity of Souther Califoria

More information

AP Calculus BC 2003 Scoring Guidelines Form B

AP Calculus BC 2003 Scoring Guidelines Form B AP Calculus BC Scorig Guidelies Form B The materials icluded i these files are iteded for use by AP teachers for course ad exam preparatio; permissio for ay other use must be sought from the Advaced Placemet

More information

INFINITE SERIES KEITH CONRAD

INFINITE SERIES KEITH CONRAD INFINITE SERIES KEITH CONRAD. Itroductio The two basic cocepts of calculus, differetiatio ad itegratio, are defied i terms of limits (Newto quotiets ad Riema sums). I additio to these is a third fudametal

More information

THE TWO-VARIABLE LINEAR REGRESSION MODEL

THE TWO-VARIABLE LINEAR REGRESSION MODEL THE TWO-VARIABLE LINEAR REGRESSION MODEL Herma J. Bieres Pesylvaia State Uiversity April 30, 202. Itroductio Suppose you are a ecoomics or busiess maor i a college close to the beach i the souther part

More information

Building Blocks Problem Related to Harmonic Series

Building Blocks Problem Related to Harmonic Series TMME, vol3, o, p.76 Buildig Blocks Problem Related to Harmoic Series Yutaka Nishiyama Osaka Uiversity of Ecoomics, Japa Abstract: I this discussio I give a eplaatio of the divergece ad covergece of ifiite

More information

Notes on exponential generating functions and structures.

Notes on exponential generating functions and structures. Notes o expoetial geeratig fuctios ad structures. 1. The cocept of a structure. Cosider the followig coutig problems: (1) to fid for each the umber of partitios of a -elemet set, (2) to fid for each the

More information

NATIONAL SENIOR CERTIFICATE GRADE 12

NATIONAL SENIOR CERTIFICATE GRADE 12 NATIONAL SENIOR CERTIFICATE GRADE MATHEMATICS P EXEMPLAR 04 MARKS: 50 TIME: 3 hours This questio paper cosists of 8 pages ad iformatio sheet. Please tur over Mathematics/P DBE/04 NSC Grade Eemplar INSTRUCTIONS

More information

CS103X: Discrete Structures Homework 4 Solutions

CS103X: Discrete Structures Homework 4 Solutions CS103X: Discrete Structures Homewor 4 Solutios Due February 22, 2008 Exercise 1 10 poits. Silico Valley questios: a How may possible six-figure salaries i whole dollar amouts are there that cotai at least

More information

Approximating Area under a curve with rectangles. To find the area under a curve we approximate the area using rectangles and then use limits to find

Approximating Area under a curve with rectangles. To find the area under a curve we approximate the area using rectangles and then use limits to find 1.8 Approximatig Area uder a curve with rectagles 1.6 To fid the area uder a curve we approximate the area usig rectagles ad the use limits to fid 1.4 the area. Example 1 Suppose we wat to estimate 1.

More information

Universal coding for classes of sources

Universal coding for classes of sources Coexios module: m46228 Uiversal codig for classes of sources Dever Greee This work is produced by The Coexios Project ad licesed uder the Creative Commos Attributio Licese We have discussed several parametric

More information

5 Boolean Decision Trees (February 11)

5 Boolean Decision Trees (February 11) 5 Boolea Decisio Trees (February 11) 5.1 Graph Coectivity Suppose we are give a udirected graph G, represeted as a boolea adjacecy matrix = (a ij ), where a ij = 1 if ad oly if vertices i ad j are coected

More information

THE ABRACADABRA PROBLEM

THE ABRACADABRA PROBLEM THE ABRACADABRA PROBLEM FRANCESCO CARAVENNA Abstract. We preset a detailed solutio of Exercise E0.6 i [Wil9]: i a radom sequece of letters, draw idepedetly ad uiformly from the Eglish alphabet, the expected

More information

Descriptive Statistics

Descriptive Statistics Descriptive Statistics We leared to describe data sets graphically. We ca also describe a data set umerically. Measures of Locatio Defiitio The sample mea is the arithmetic average of values. We deote

More information

Inference on Proportion. Chapter 8 Tests of Statistical Hypotheses. Sampling Distribution of Sample Proportion. Confidence Interval

Inference on Proportion. Chapter 8 Tests of Statistical Hypotheses. Sampling Distribution of Sample Proportion. Confidence Interval Chapter 8 Tests of Statistical Hypotheses 8. Tests about Proportios HT - Iferece o Proportio Parameter: Populatio Proportio p (or π) (Percetage of people has o health isurace) x Statistic: Sample Proportio

More information

Center, Spread, and Shape in Inference: Claims, Caveats, and Insights

Center, Spread, and Shape in Inference: Claims, Caveats, and Insights Ceter, Spread, ad Shape i Iferece: Claims, Caveats, ad Isights Dr. Nacy Pfeig (Uiversity of Pittsburgh) AMATYC November 2008 Prelimiary Activities 1. I would like to produce a iterval estimate for the

More information

Permutations, the Parity Theorem, and Determinants

Permutations, the Parity Theorem, and Determinants 1 Permutatios, the Parity Theorem, ad Determiats Joh A. Guber Departmet of Electrical ad Computer Egieerig Uiversity of Wiscosi Madiso Cotets 1 What is a Permutatio 1 2 Cycles 2 2.1 Traspositios 4 3 Orbits

More information

Lecture 3. denote the orthogonal complement of S k. Then. 1 x S k. n. 2 x T Ax = ( ) λ x. with x = 1, we have. i = λ k x 2 = λ k.

Lecture 3. denote the orthogonal complement of S k. Then. 1 x S k. n. 2 x T Ax = ( ) λ x. with x = 1, we have. i = λ k x 2 = λ k. 18.409 A Algorithmist s Toolkit September 17, 009 Lecture 3 Lecturer: Joatha Keler Scribe: Adre Wibisoo 1 Outlie Today s lecture covers three mai parts: Courat-Fischer formula ad Rayleigh quotiets The

More information