
M 358K Supplement to Chapter 23: CHI-SQUARED DISTRIBUTIONS, T-DISTRIBUTIONS, AND DEGREES OF FREEDOM

To understand t-distributions, we first need to look at another family of distributions, the chi-squared distributions. These will also appear in Chapter 26 in studying categorical variables.

Notation: N(µ, σ) will stand for the normal distribution with mean µ and standard deviation σ. The symbol ~ will indicate that a random variable has a certain distribution. For example, Y ~ N(4, 3) is short for "Y has a normal distribution with mean 4 and standard deviation 3."

I. Chi-squared Distributions

Definition: The chi-squared distribution with k degrees of freedom is the distribution of a random variable that is the sum of the squares of k independent standard normal random variables. We'll call this distribution χ²(k). Thus, if Z1, ..., Zk are all standard normal random variables (i.e., each Zi ~ N(0,1)), and if they are independent, then

Z1² + ... + Zk² ~ χ²(k).

For example, if we consider taking simple random samples (with replacement) y1, ..., yk from some N(µ, σ) distribution, and let Yi denote the random variable whose value is yi, then each (Yi − µ)/σ is standard normal, and (Y1 − µ)/σ, ..., (Yk − µ)/σ are independent, so

((Y1 − µ)/σ)² + ... + ((Yk − µ)/σ)² ~ χ²(k).

Notice that the phrase "degrees of freedom" refers to the number of independent standard normal variables involved. The idea is that since these k variables are independent, we can choose them freely (i.e., independently).

The following exercise should help you assimilate the definition of a chi-squared distribution, as well as get a feel for the χ²(1) distribution.

Exercise 1: Use the definition of a χ²(1) distribution and the 68-95-99.7 rule for the standard normal distribution (and/or anything else you know about the standard normal distribution) to help sketch the graph of the probability density

function of a χ²(1) distribution. (For example, what can you conclude about the χ²(1) curve from the fact that about 68% of the area under the standard normal curve lies between -1 and 1? What can you conclude about the χ²(1) curve from the fact that about 5% of the area under the standard normal lies beyond ±2?)

For k > 1, it's harder to figure out what the χ²(k) distribution looks like just using the definition, but simulations using the definition can help. The following diagram shows histograms of four random samples of size 1000 from a N(0,1) distribution. These four samples were put in columns labeled st1, st2, st3, st4. Taking the sum of the squares of the first two of these columns then gives (using the definition of a chi-squared distribution with two degrees of freedom) a random sample of size 1000 from a χ²(2) distribution. Similarly, adding the squares of the first three columns gives a random sample from a χ²(3) distribution, and forming the column (st1)² + (st2)² + (st3)² + (st4)² yields a random sample from a χ²(4) distribution.

Histograms of these three samples from chi-squared distributions are shown below, with the sample from the χ²(2) distribution in the upper left, the sample from the χ²(3) distribution in the upper right, and the sample from the χ²(4) distribution in the lower left. The histograms show the shapes of the three distributions: the χ²(2) has a sharp peak at the left; the χ²(3) distribution has a less sharp peak not quite as far left; and the χ²(4) distribution has a still lower peak still a little further to the right. All three distributions are noticeably skewed to the right.
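The simulation just described can be sketched in a few lines. This is a minimal sketch, assuming numpy is available; the column names st1 through st4 follow the text, and the seed is an arbitrary choice.

```python
import numpy as np

# Four columns of 1000 standard normal draws, as described in the text.
rng = np.random.default_rng(0)  # seed chosen arbitrarily
st1, st2, st3, st4 = rng.standard_normal((4, 1000))

# By the definition of a chi-squared distribution, these are random
# samples of size 1000 from chi^2(2), chi^2(3), and chi^2(4) respectively.
chi2_2 = st1**2 + st2**2
chi2_3 = st1**2 + st2**2 + st3**2
chi2_4 = st1**2 + st2**2 + st3**2 + st4**2

# The sample means shift to the right as the degrees of freedom grow,
# matching the description of the histograms.
print(chi2_2.mean(), chi2_3.mean(), chi2_4.mean())
```

Plotting histograms of chi2_2, chi2_3, and chi2_4 (for example with matplotlib) reproduces the right-skewed shapes described above.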

There is a picture of a typical chi-squared distribution on p. A-113 of the text.

Thought question: As k gets bigger and bigger, what type of distribution would you expect the χ²(k) distribution to look more and more like? [Hint: A chi-squared distribution is the sum of independent random variables.]

Theorem: A χ²(1) random variable has mean 1 and variance 2.

The proof of the theorem is beyond the scope of this course. It requires using a (rather messy) formula for the probability density function of a χ²(1) variable. Some courses in mathematical statistics include the proof.

Exercise 2: Use the Theorem together with the definition of a χ²(k) distribution and properties of the mean and standard deviation to find the mean and variance of a χ²(k) distribution.

II. t Distributions

Definition: The t distribution with k degrees of freedom is the distribution of a random variable which is of the form

Z/√(U/k),

where

i. Z ~ N(0,1),
ii. U ~ χ²(k), and
iii. Z and U are independent.
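After working Exercise 2, one way to check your answer numerically is to simulate χ²(k) draws directly from the definition and compare the empirical mean and variance with what you derived. This is a sketch assuming numpy; the choice k = 5 is arbitrary.

```python
import numpy as np

# Simulate 200,000 draws from chi^2(5) using the definition: each draw
# is the sum of squares of k = 5 independent standard normals.
rng = np.random.default_rng(1)
k = 5
draws = (rng.standard_normal((200_000, k)) ** 2).sum(axis=1)

# Compare these with the mean and variance you found in Exercise 2.
print(draws.mean())  # empirical mean
print(draws.var())   # empirical variance
```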

Comment: Notice that this definition says that the notion of degrees of freedom for a t-distribution comes from the notion of degrees of freedom of a chi-squared distribution: the degrees of freedom of a t-distribution are the number of squares of independent normal random variables that go into making up the chi-squared distribution occurring under the radical in the denominator of the t random variable Z/√(U/k).

To see what a t-distribution looks like, we can use the four standard normal samples of 1000 obtained above to simulate a t distribution with 3 degrees of freedom: we use column st1 as our sample from Z and (st2)² + (st3)² + (st4)² as our sample from U to calculate a sample Z/√(U/3) from the t distribution with 3 degrees of freedom. The resulting histogram shows a distribution similar to the t-model with 2 degrees of freedom shown on p. 554 of the textbook: it's narrower in the middle than a normal curve would be, but has heavier tails (note in particular the outliers that would be very unusual in a normal distribution). The following normal probability plot of the simulated data draws attention to the outliers as well as the nonnormality. (The plot is quite typical of a normal probability plot for a distribution with heavy tails on both sides.)
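The t(3) simulation just described can be reproduced as follows. This is a sketch assuming numpy; the tail count at the end is one way to see the heavy tails without drawing a histogram.

```python
import numpy as np

# Four standard normal columns of size 1000, as in the chi-squared section.
rng = np.random.default_rng(2)  # seed chosen arbitrarily
st1, st2, st3, st4 = rng.standard_normal((4, 1000))

# st1 plays the role of Z; st2^2 + st3^2 + st4^2 plays the role of
# U ~ chi^2(3), so st1 / sqrt(U/3) is a sample from the t(3) distribution.
u = st2**2 + st3**2 + st4**2
t_sample = st1 / np.sqrt(u / 3)

# Heavy tails: a t(3) variable lands beyond +/-3 far more often than a
# standard normal does (roughly 6% of the time versus about 0.3%).
frac_beyond_3 = np.mean(np.abs(t_sample) > 3)
print(frac_beyond_3)
```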

III. Why the t-statistic introduced on p. 553 of the textbook has a t-distribution:

1. General set-up and notation: Putting together the two parts of the definition of t-statistic in the box on p. 553 gives

t = (y̅ − µ)/(s/√n),

where y̅ and s are, respectively, the mean and sample standard deviation calculated from the sample y1, y2, ..., yn. To talk about the distribution of the t-statistic, we need to consider all possible random¹ samples of size n from the population for Y. We'll use the convention of using capital letters for random variables and small letters for their values for a particular sample. In this case, we have three statistics involved: Y̅, S and T. All three have the same associated random process: choose a random sample of size n from the population for Y. Their values are as follows:

The value of Y̅ is the sample mean y̅ of the sample chosen.
The value of S is the sample standard deviation s of the sample chosen.
The value of T is the t-statistic t = (y̅ − µ)/(s/√n) calculated for the sample chosen.

The distributions of Y̅, S and T are called the sampling distributions of the mean, the sample standard deviation, and the t-statistic, respectively.
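The random process above (draw a sample, then compute y̅, s, and t for it) is easy to simulate, which gives a concrete picture of the three sampling distributions. This is a sketch assuming numpy; the values µ = 4, σ = 3, and n = 10 are arbitrary choices.

```python
import numpy as np

# Repeat the random process 100,000 times: draw a sample of size n = 10
# from N(mu, sigma), then compute ybar, s, and t for each sample.
rng = np.random.default_rng(3)
mu, sigma, n = 4.0, 3.0, 10
samples = rng.normal(mu, sigma, size=(100_000, n))

ybar = samples.mean(axis=1)               # values of the statistic Ybar
s = samples.std(axis=1, ddof=1)           # values of the statistic S
t_stats = (ybar - mu) / (s / np.sqrt(n))  # values of the statistic T

# The empirical sampling distribution of Ybar centers at mu with spread
# sigma/sqrt(n); the spread of T comes out noticeably larger than 1,
# a first hint that T is not standard normal.
print(ybar.mean(), ybar.std(), t_stats.std())
```

Histograms of ybar, s, and t_stats are empirical versions of the three sampling distributions named above.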

Note that the formula for calculating t from the data gives the formula

T = (Y̅ − µ)/(S/√n),

expressing the random variable T as a function of the random variables Y̅ and S. We'll first discuss the t-statistic in the case where our underlying random variable Y is normal, then extend to the more general situation stated in Chapter 23.

2. The case of Y normal. For Y normal, we will use the following theorem:

Theorem: If Y is normal with mean µ and standard deviation σ, and if we only consider simple random samples with replacement², of fixed size n, then

a) The (sampling) distribution of Y̅ is normal with mean µ and standard deviation σ/√n,
b) Y̅ and S are independent random variables, and
c) (n−1)S²/σ² ~ χ²(n−1).

The proof of this theorem is beyond the scope of this course, but may be found in most textbooks on mathematical statistics. Note that (a) is a special case of the Central Limit Theorem. We will give some discussion of the plausibility of parts (b) and (c) in the Comments section below.

So for now suppose Y is a normal random variable with mean µ and standard deviation σ: Y ~ N(µ, σ). By (a) of the Theorem, the sampling distribution of the sample mean Y̅ (for simple random samples with replacement, of fixed size n) is normal with mean µ and standard deviation σ/√n: Y̅ ~ N(µ, σ/√n). Standardizing Y̅ then gives

(Y̅ − µ)/(σ/√n) ~ N(0,1). (*)

But we don't know σ, so we need to approximate it by the sample standard deviation s. It would be tempting to say that since s is approximately equal to σ,

this substitution (in other words, considering (Y̅ − µ)/(s/√n)) should give us something approximately normal. Unfortunately, there are two problems with this: First, using an approximation in the denominator of a fraction can sometimes make a big difference in what you're trying to approximate. (See Footnote 3 for an example.) Second, we are using a different value of s for different samples (since s is calculated from the sample, just as the value of y̅ is). This is why we need to work with the random variable S rather than the individual sample standard deviation s. In other words, we need to work with the random variable

T = (Y̅ − µ)/(S/√n).

To use the theorem, first apply a little algebra to see that

(Y̅ − µ)/(S/√n) = [(Y̅ − µ)/(σ/√n)] / √( [(n−1)S²/σ²] / (n−1) ). (**)

Since Y is normal, the numerator on the right side of (**) is standard normal, as noted in equation (*) above. Also, by (c) of the theorem, the denominator of the right side of (**) is of the form √(U/(n−1)) where U = (n−1)S²/σ² ~ χ²(n−1). Since altering random variables by subtracting constants or dividing by constants does not affect independence, (b) of the theorem implies that the numerator and denominator of the right side of (**) are independent. Thus for Y normal, our test statistic T = (Y̅ − µ)/(S/√n) satisfies the definition of a t distribution with n−1 degrees of freedom.

3. More generally: The textbook states (pp. 555-556) assumptions and conditions that are needed to use the t-model. The heading "Independence Assumption" on p. 555 includes an Independence Assumption, a Randomization Condition, and the 10% Condition. These three essentially say that the sample is close enough to a simple random sample with replacement to make the theorem close enough to true, still assuming normality of Y. The heading "Normal Population Assumption" on p. 556 consists of the Nearly Normal Condition, which essentially says that we can also weaken normality somewhat and still have the theorem close enough to true for most practical purposes. (The rough idea here is that, by the

central limit theorem, Y̅ will still be close enough to normal to make the theorem close enough to true.) The appropriateness of these conditions as good rules of thumb has been established by a combination of mathematical theorems and simulations.

4. Comments:

i. To help convince yourself of the plausibility of Part (b) of the theorem, try a simulation as follows: Take a number of simple random samples from a normal distribution and plot the resulting values of Y̅ vs. S. Here is the result from one such simulation: The left plot shows y̅ vs. s for 1000 draws of a sample of size 25 from a standard normal distribution. The right plot shows y̅ vs. s for 1000 draws of a sample of size 25 from a skewed distribution. The left plot is elliptical in nature, which is what is expected if the two variables plotted are indeed independent. On the other hand, the right plot shows a noticeable dependence between Y̅ and S: y̅ increases as s increases, and the conditional variance of Y̅ (as indicated by the scatter) also increases as S increases.

ii. To get a little insight into (c) of the Theorem, note first that

(n−1)S² = (Y1 − Y̅)² + (Y2 − Y̅)² + ... + (Yn − Y̅)²,

which is indeed a sum of squares, but of n squares, not n−1. Moreover, the random variables being squared are not independent; the dependence arises from the relationship Y1 + Y2 + ... + Yn = nY̅. Using this relationship, it is possible to show

that (n−1)S²/σ² is indeed the sum of the squares of n−1 independent, standard normal random variables. Although the general proof is somewhat involved, the idea is fairly easy to see when n = 2: First, a little algebra shows that (for n = 2)

Y1 − Y̅ = (Y1 − Y2)/2 and Y2 − Y̅ = (Y2 − Y1)/2.

Plugging these into the formula for S² (with n = 2) then gives

(n−1)S² = (Y1 − Y̅)² + (Y2 − Y̅)² = (Y1 − Y2)²/2. (***)

Since Y1 and Y2 are independent and both are normal, Y1 − Y2 is also normal (by a theorem from probability). Since Y1 and Y2 have the same distribution,

E(Y1 − Y2) = E(Y1) − E(Y2) = 0.

Using independence of Y1 and Y2, we can also calculate

Var(Y1 − Y2) = Var(Y1) + Var(Y2) = 2σ².

Standardizing Y1 − Y2 then shows that (Y1 − Y2)/(σ√2) is standard normal, so equation (***) shows that (n−1)S²/σ² = ((Y1 − Y2)/(σ√2))² ~ χ²(1) when n = 2.

Footnotes

1. "Random" is admittedly a little vague here. In section 2, interpret it to mean "simple random sample with replacement." (See also Footnote 2.) In section 3, interpret "random" to mean "fitting the conditions and assumptions for the t-model."

2. Technically, the requirements are that the random variables Y1, Y2, ..., Yn representing the first, second, etc. values in the sample are independent and identically distributed (abbreviated as iid), which means they are independent and have the same distribution (i.e., the same probability density function).

3. Consider, for example, using 0.011 as an approximation of 0.01 when estimating 1/0.01. Although 0.011 differs from 0.01 by only 0.001, when we use the approximation in the denominator, we get 1/0.011 ≈ 90.91, which differs by more than 9 from 1/0.01 = 100, a difference more than 3 orders of magnitude greater than the difference between 0.011 and 0.01.
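Returning to the n = 2 computation in Comment (ii) above: both the identity (***) and the resulting χ²(1) distribution can be checked numerically. This is a sketch assuming numpy; the values µ = 4 and σ = 3 are arbitrary choices.

```python
import numpy as np

# 200,000 independent pairs (Y1, Y2), each coordinate drawn from N(mu, sigma).
rng = np.random.default_rng(5)
mu, sigma = 4.0, 3.0
y1, y2 = rng.normal(mu, sigma, size=(2, 200_000))

# (n-1)S^2 / sigma^2 computed from the definition of S^2 with n = 2 ...
ybar = (y1 + y2) / 2
lhs = ((y1 - ybar) ** 2 + (y2 - ybar) ** 2) / sigma ** 2
# ... and the square of the standardized difference, which by (***)
# should be exactly the same quantity, and is chi^2(1) distributed.
rhs = ((y1 - y2) / (sigma * np.sqrt(2))) ** 2

print(np.max(np.abs(lhs - rhs)))  # the identity (***) holds up to rounding
print(lhs.mean())                 # near 1, the chi^2(1) mean
```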