Chapter 7 - Sampling Distributions. 1 Introduction. What is statistics? It consist of three major areas:



Similar documents
University of California, Los Angeles Department of Statistics. Distributions related to the normal distribution

Case Study. Normal and t Distributions. Density Plot. Normal Distributions

I. Chi-squared Distributions

Properties of MLE: consistency, asymptotic normality. Fisher information.

Overview of some probability distributions.

Chapter 7 Methods of Finding Estimators

Z-TEST / Z-STATISTIC: used to test hypotheses about. µ when the population standard deviation is unknown

Inference on Proportion. Chapter 8 Tests of Statistical Hypotheses. Sampling Distribution of Sample Proportion. Confidence Interval

Hypothesis testing. Null and alternative hypotheses

CHAPTER 7: Central Limit Theorem: CLT for Averages (Means)

5: Introduction to Estimation

Confidence Intervals for One Mean

1. C. The formula for the confidence interval for a population mean is: x t, which was

Chapter 6: Variance, the law of large numbers and the Monte-Carlo method

UC Berkeley Department of Electrical Engineering and Computer Science. EE 126: Probablity and Random Processes. Solutions 9 Spring 2006

The following example will help us understand The Sampling Distribution of the Mean. C1 C2 C3 C4 C5 50 miles 84 miles 38 miles 120 miles 48 miles

Measures of Spread and Boxplots Discrete Math, Section 9.4

Chapter 7: Confidence Interval and Sample Size

Sampling Distribution And Central Limit Theorem

Determining the sample size

Normal Distribution.

1 Computing the Standard Deviation of Sample Means

Statistical inference: example 1. Inferential Statistics


One-sample test of proportions

Overview. Learning Objectives. Point Estimate. Estimation. Estimating the Value of a Parameter Using Confidence Intervals

Chapter 14 Nonparametric Statistics

Center, Spread, and Shape in Inference: Claims, Caveats, and Insights

PSYCHOLOGICAL STATISTICS

Lesson 15 ANOVA (analysis of variance)

Output Analysis (2, Chapters 10 &11 Law)

A Test of Normality. 1 n S 2 3. n 1. Now introduce two new statistics. The sample skewness is defined as:

In nite Sequences. Dr. Philippe B. Laval Kennesaw State University. October 9, 2008

Central Limit Theorem and Its Applications to Baseball

Math C067 Sampling Distributions

1 Review of Probability

Sequences and Series

Maximum Likelihood Estimators.

How To Calculate A Radom Umber From A Probability Fuctio

A probabilistic proof of a binomial identity

LECTURE 13: Cross-validation

Our aim is to show that under reasonable assumptions a given 2π-periodic function f can be represented as convergent series

SAMPLE QUESTIONS FOR FINAL EXAM. (1) (2) (3) (4) Find the following using the definition of the Riemann integral: (2x + 1)dx

Asymptotic Growth of Functions

Exam 3. Instructor: Cynthia Rudin TA: Dimitrios Bisias. November 22, 2011

1 Correlation and Regression Analysis

Approximating Area under a curve with rectangles. To find the area under a curve we approximate the area using rectangles and then use limits to find

BASIC STATISTICS. f(x 1,x 2,..., x n )=f(x 1 )f(x 2 ) f(x n )= f(x i ) (1)

Descriptive Statistics

Section 11.3: The Integral Test

Lesson 17 Pearson s Correlation Coefficient

THE REGRESSION MODEL IN MATRIX FORM. For simple linear regression, meaning one predictor, the model is. for i = 1, 2, 3,, n

Week 3 Conditional probabilities, Bayes formula, WEEK 3 page 1 Expected value of a random variable

Modified Line Search Method for Global Optimization

PROCEEDINGS OF THE YEREVAN STATE UNIVERSITY AN ALTERNATIVE MODEL FOR BONUS-MALUS SYSTEM

Hypergeometric Distributions

Practice Problems for Test 3

Confidence Intervals

Definition. A variable X that takes on values X 1, X 2, X 3,...X k with respective frequencies f 1, f 2, f 3,...f k has mean

Unbiased Estimation. Topic Introduction

Multi-server Optimal Bandwidth Monitoring for QoS based Multimedia Delivery Anup Basu, Irene Cheng and Yinzhe Yu

5.4 Amortization. Question 1: How do you find the present value of an annuity? Question 2: How is a loan amortized?

Infinite Sequences and Series

Annuities Under Random Rates of Interest II By Abraham Zaks. Technion I.I.T. Haifa ISRAEL and Haifa University Haifa ISRAEL.

Basic Elements of Arithmetic Sequences and Series

Non-life insurance mathematics. Nils F. Haavardsson, University of Oslo and DNB Skadeforsikring

Research Method (I) --Knowledge on Sampling (Simple Random Sampling)

4.3. The Integral and Comparison Tests

Discrete Mathematics and Probability Theory Spring 2014 Anant Sahai Note 13

Topic 5: Confidence Intervals (Chapter 9)

Incremental calculation of weighted mean and variance

3 Basic Definitions of Probability Theory

THE TWO-VARIABLE LINEAR REGRESSION MODEL

Lecture 4: Cauchy sequences, Bolzano-Weierstrass, and the Squeeze theorem

Chapter 5: Inner Product Spaces

Soving Recurrence Relations

Biology 171L Environment and Ecology Lab Lab 2: Descriptive Statistics, Presenting Data and Graphing Relationships

Vladimir N. Burkov, Dmitri A. Novikov MODELS AND METHODS OF MULTIPROJECTS MANAGEMENT

Trigonometric Form of a Complex Number. The Complex Plane. axis. ( 2, 1) or 2 i FIGURE The absolute value of the complex number z a bi is

Convexity, Inequalities, and Norms

Quadrat Sampling in Population Ecology

The Stable Marriage Problem

A Recursive Formula for Moments of a Binomial Distribution

Confidence Intervals. CI for a population mean (σ is known and n > 30 or the variable is normally distributed in the.

MARTINGALES AND A BASIC APPLICATION

Unit 8: Inference for Proportions. Chapters 8 & 9 in IPS

TI-83, TI-83 Plus or TI-84 for Non-Business Statistics

THE ARITHMETIC OF INTEGERS. - multiplication, exponentiation, division, addition, and subtraction

CHAPTER 3 THE TIME VALUE OF MONEY

SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES

Mann-Whitney U 2 Sample Test (a.k.a. Wilcoxon Rank Sum Test)

THE HEIGHT OF q-binary SEARCH TREES

FIBONACCI NUMBERS: AN APPLICATION OF LINEAR ALGEBRA. 1. Powers of a matrix

Research Article Sign Data Derivative Recovery

Example 2 Find the square root of 0. The only square root of 0 is 0 (since 0 is not positive or negative, so those choices don t exist here).

Analyzing Longitudinal Data from Complex Surveys Using SUDAAN

Transcription:

Chapter 7 - Samplig Distributios 1 Itroductio What is statistics? It cosist of three major areas: Data Collectio: samplig plas ad experimetal desigs Descriptive Statistics: umerical ad graphical summaries of the data collected from a sample Iferetial Statistics: estimatio, cofidece itervals ad hypothesis testig of parameters of iterest Statistical procedures are part (steps -5 below) of the Scietific Method first espoused by Sir Fracis Baco (1561-166), who wrote to lear the secrets of ature ivolves collectig data ad carryig out experimets. The moder methodology: 1. Observe some pheomeo. State a hypothesis explaiig the pheomeo 3. Collect data 4. Test: Does the data support the hypothesis? 5. Coclusio. If the test fails, go back to step. If you ecouter a scietific claim that you disagree with, scrutiize the steps of the scietific method used. Statistics do t lie, but liars do statistics. - Mark Twai. What is mathematical statistics?: The study of the theoretical foudatio of statistics. What is a statistic? Let X 1, X,..., X be a set of observable radom variables (such as a radom sample of idividuals from a populatio of iterest). A statistic T is a fuctio applied to X 1, X,..., X. POPULATION vs. SAMPLE: T = T (X 1, X,..., X ) Populatio: The etire group of idividuals (subjects or uits), that ca be either existet or coceptual, that we wat iformatio about. Sample: A part of the populatio from which data is collected. 1

PARAMETER vs. STATISTIC: Parameter: A umerical value calculated from all idividuals i the populatio. { x xp (x) if x is discrete Populatio mea: µ = xf(x)dx if x is cotiuous { Populatio variace: σ x = (x µ) P (x) if x is discrete (x µ) f(x)dx if x is cotiuous Populatio proportio: p is the true proportio of 1 s i the populatio. Populatio miimum: m = mi x x. Populatio media: φ.5 is the (ot ecessarily uique) value such that P (X φ.5 ).5 ad P (X φ.5 ).5. Statistic: A umerical value calculated from a sample X 1,..., X. Sample mea: X = T (X 1, X,..., X ) = 1 i=1 X i Sample variace: S = T (X 1, X,..., X ) = 1 1 i=1 (X i X) Sample proportio: ˆp = T (X 1, X,..., X ) = umber of 1 s is the proportio of 1 s i the sample First order statistic: X (1) = T (X 1, X,..., X ) = mi(x 1, X,..., X ) Sample media: ˆφ.5 T (X 1, X,..., X ) = { the middle value if is odd the average of the two middle values if is eve. Samplig Distributios The value of a statistic varies from sample to sample. I other words, differet samples will result i differet values of a statistic. Therefore, a statistic is a radom variable with a distributio! Samplig Distributio: The distributio of statistic values from all possible samples of size. Brute force way to costruct a samplig distributio: Take all possible samples of size from the populatio. Compute the value of the statistic for each sample. Display the distributio of statistic values as a table, graph, or equatio.

.1 Samplig Distributio of X Oe commo populatio parameter of iterest is the populatio mea µ. I iferetial statistics, it is commo to use the statistic X to estimate µ. Thus, the samplig distributio of X is of iterest. Mea ad Variace For ay sample size ad a SRS X 1, X,..., X from ay populatio distributio with mea µ x ad variace σ x: E(X) = µ x = µ x ad E( i=1 X i) = µ x Var(X) = σ x = σ x/ ad Var( i=1 X i) = σ x This result was proved i Example 5.7 usig Theorem 5.1: Let a i for i = 1,,..., k be costats ad let X i for i = 1,,..., k be radom variables. The ( k ) k E a i X i = a i E(X i ) (idepedece ot required) ad Var i=1 i=1 ( k i=1 a ix i ) = k i=1 a i Var(X i ) if X 1, X,..., X k are mutually idepedet. Samplig Distributio whe the data are ormal For ay sample size ad a SRS X 1, X,..., X from a ormal populatio distributio N(µ x, σ x) (Theorem 7.1): X N(µ x, σ x/) i=1 X i N(µ x, σ x) Examples: Suppose that adult male cholesterol levels are distributed as N(10mg/dL, (37mg/dL) ). 1. Give a iterval cetered at the mea µ which captures the middle 95% of all cholesterol values.. Give the samplig distributio of X, the sample mea of cholesterol values take from SRSs of size = 10. 3. Give a iterval cetered at the mea µ which captures the middle 95% of all sample mea cholesterol values take from SRSs of size = 10. 3

Samplig Distributio for large sample sizes For a LARGE sample size ad a SRS X 1, X,..., X from ay populatio distributio with mea µ x ad variace σ x <, the approximate samplig distributios are: ( ) X N µ x, σ x ad X i N ( µ x, σx). This last result follows from the celebrated Cetral Limit Theorem, stated i your book as Theorem 7.4: Let X 1, X,..., X be a SRS from a distributio with mea µ x ad variace σ x <. The the distributio of coverges to N(0, 1) as. We will prove this theorem later. Importat Examples: i=1 U = X µ x σ x / 1. Beroulli { trials. 1 if with probability p = Let X = 0 if with probability(1 p) = The X Ber(p) = Bi( = 1, p). (a) Draw a picture of the pdf of X. (b) Fid E(X) ad V ar(x). (c) Suppose a SRS X 1, X,..., X 40 was collected. Give the approximate samplig distributio of X (ormally deoted by ˆp = X, which idicates that X is a sample proportio).. Normal approximatio to the Biomial (sectio 7.5) I the previous example we cosidered the rv X Ber(p) = Bi( = 1, p). Suppose that a SRS X 1, X,..., X has bee collected with > 1. 4

(a) Give the distributio of Y = i X i, so that Y is the umber of successes out of trials (which is a discrete distributio you leared about i chapter 3). (b) Draw a picture of the pdf of Y = i X i. (c) Give E(Y ) ad V ar(y ). (d) I the Example #1c the Cetral Limit Theorem showed that for ay sample size, whe X Ber(p), the ˆp = X N(, ). (e) I additio to meas X, the Cetral Limit Theorem also gives the approximate samplig distributio of a sum X i. Use the Cetral Limit Theorem to give the approximate samplig distributio of Y = i X i. (f) If the true proportio of supporters of healthcare reform i the Motaa populatio is p =.53, the out of a SRS of Motaas of size = 1000, whats the probability that less the 500 will pledge support?. Samplig Distributio of S Oe commo populatio parameter of iterest is the populatio variace σ. I iferetial statistics, it is commo to use the statistic S to estimate σ. Thus, the samplig distributio of S is of iterest. χ distributio: The sum of squares of idepedet stadard ormal variables is distributed as a χ radom radom variable. More formally (Theorem 7.): 5

If Z 1,..., Z ν are idepedet ad distributed as N(0, 1), the ν i=1 Z i χ (ν). χ (ν) is called the chi-square distributio with ν degrees of freedom. For ay sample size ad a SRS X 1, X,..., X from a ormal distributio N(µ x, σx), ( ) Xi µ x χ (). i=1 σ x Use Table 6 o p. 850 of the textbook for probability calculatios. Mea ad Variace If C χ (ν), the E(C) = ν Var(C) = ν. For ay sample size > 1 ad a SRS X 1, X,..., X from ay populatio distributio with mea µ x ad variace σ x, E(S ) = σ x Var(S ) = σ4 x 1. Samplig distributio whe the data are ormal For ay sample size > 1 ad a SRS X 1, X,..., X from a ormal distributio N(µ x, σx) (Theorem 7.3): ( 1)S χ ( 1).3 Samplig Distributio of X µ S/ σ x I iferetial statistics, the test statistic X µ S/ is ofte used to determie how may stadard errors (s/ ) the sample mea X is from a hypothesized value of µ. Thus, the samplig distributio of X µ S/ is of iterest. t distributio (Defiitio 7.): If Z N(0, 1), W χ (ν), ad Z ad W are idepedet, the: T = Z t(ν). W/ν t(ν) is called the t distributio with ν degrees of freedom. 6

Mea ad Variace If T t(ν), the E(T ) = 0 for ν > 1 Var(T ) = ν for ν > ν Samplig distributio whe the data are ormal For ay sample size > 1 ad a SRS X 1, X,..., X from N(µ, σ ), the X ad S are idepedet Ad ow by Theorems 7.1 ad 7.3 ad Defiitio 7. T = (X µ)/σ ( 1)S /σ 1 = X µ S/ t( 1). Samplig Distributio for large sample sizes For a LARGE sample size ad a SRS X 1, X,..., X from ay populatio distributio with mea µ x ad variace σ x < : T = X µ x S/ t( 1). Some useful facts: The pdf of T is give by f(t) = ν+1 Γ( ) ( ) (ν+1)/ 1 Γ( ν ) 1 + t νπ ν The t distributios are symmetric about 0 ad is bell-shaped like the ormal N(0, 1) distributio but with thicker tails. As ν, the t(ν) distributio approaches the stadard ormal distributio. Use Table 5 o page 849 for probability calculatios. Examples: Suppose that adult male cholesterol levels are distributed as N(10mg/dL, σ ). 1. Give the samplig distributio of X µ S/, where the statistics X ad S are calculated from a SRS of size = 10. 7

. If S = 36.5, give a iterval cetered at the mea µ which captures the middle 95% of all sample mea cholesterol values take from SRSs of size = 10..4 Samplig Distributio of S 1/σ 1 S /σ I iferetial statistics, it is ofte of iterest to compare the variaces σ1 ad σ from two populatios, ad determie if they are differet. Based o two SRSs, oe of size 1 with sample variace S1 ad the other of size with sample variace S, the statistic S 1 /σ 1 is ofte used. S /σ Thus, the samplig distributio of S 1 /σ 1 is of iterest. S /σ F distributio (Defiitio 7.3): If W 1 χ (ν 1 ) ad W χ (ν ) are idepedet, the: F = W 1/ν 1 W /ν F (ν 1, ν ). F (ν 1, ν ) is called the F distributio with ν 1 umerator degrees of freedom ad ν deomiator degrees of freedom. Mea ad Variace If F F (ν 1, ν ), the E(F ) = ν ν for ν > Var(F ) = ν (ν 1+ν ) ν 1 (ν ) (ν 4) for ν > 4 Samplig Distributio whe the data are ormal If X 1, X,..., X 1 are a SRS from N(µ 1, σ 1) ad if Y 1, X,..., X are a idepedet SRS from N(µ, σ ), the W 1 = ( 1 1)S1 σ1 are idepedet, ad so χ ( 1 1) ad W = ( 1)S σ χ ( 1) F = W 1/( 1 1) W /( 1) = S 1/σ 1 S /σ F ( 1 1, 1). Some useful facts: 8

If X F (ν 1, ν ), the the pdf of X is f(x) = Γ ( ν 1 +ν ) Γ ( ) ( ν 1 Γ ν ) ( ν1 Use Table 7 o p. 85 for probability calculatios. ν ) ν1 / ( x (ν 1/) 1 1 + ν ) (ν1 +ν )/ 1 x. ν Example: Is the variace of female reactio times differet tha the variace of male reactio times? Jaso Paulak at the Uiversity of Ciciati ra a web based reactio experimet to aswer this questio. 1 = 398 females participated, with X 1 = 517 ms ad S 1 = 899 ms. = 469 males participated, with X = 383. ms ad S = 335.7 ms. Give the samplig distributio of F = S 1 /σ 1. S /σ If the two populatio variaces are ideed the same, σ 1 = σ, the what is the probability of observig a ratio of sample variaces that we did, or larger? 9

3 Proof of the Cetral Limit Theorem Cetral Limit Theorem Let X 1, X,..., X be a SRS from a distributio with mea µ x ad variace σ x <. The lim U X µ x = lim σ x / = U N(0, 1). Proof (sectio 7.4): The proof relies o momet geeratig fuctios (mgf) from sectio 3.9 of your textbook. We saw that every rv, as well as havig a uique pmf or pdf, also has a uique mgf (Theorem 7.5), writte as M(t). This otatio for the mgf reiforces the fact that it is a fuctio of a real umber t i some eighborhood of t = 0. The mgf for a cotiuous rv Y is defied by M(t) = E(e ty ) = e ty f(y)dy, where f(y) is the pdf of Y. To prove the Cetral Limit Theorem: 1. Use Taylor series to fid the mgf of Z i = X i µ σ.. Fid the mgf of U = X µx σ x /. 3. Show that as, the mgf of U coverges to e t /. 4. Recall that the mgf of a stadard ormal rv is e t /. Thus, by Theorem 7.5, the distributio of U = lim U is a stadard ormal! Here we go: 1. Let Z i = X i µ. This is where we eed the fiite variace assumptio: if σ is ifiite, the σ Z i is ot well defied! (a) Show that EZ = 0, V ar(z) = E(Z ) = 1. Hit: This is easy sice Z is a liear fuctio of X. 10

(b) Show that M Z (0) = 1, d dt M Z(0) = 0, ad d dt M Z (0) = 1. (c) Use Taylor s Theorem (ad the results from (b)) to expad M Z (t) about t = 0 to show that M Z ( t ) = 1 + (t/ ) + R( t! ), where R( t ) is a remaider of cubic terms t of ad higher. (d) Show that the remaider lim R( t ) (t/ ) = lim R( t ) = 0. 11

. By defiitio U = X µ x σ x / = ( i X ) i µ x = 1 Z i, σ x which shows that the mgf of U whe X 1,..., X is a simple radom sample is M U (t) = Π i=1m Zi ( t ( ) = M Z ( t ) ) (by Theorem 6. ad Exercise 3.158). 3. Show that lim M U (t) = e t. Hit: Substitute i the Taylor series approximatio for M Z ( t ); use the fact that lim ( 1 + a ) = e lim a ; use the fact that lim R( t ) (t/ ) = lim R( t ) = 0. i 4. Recall that if U N(0, 1), the M U (t) = e t / from sectio 3.9 of your textbook. To prove this, multiply the two expoetials i the itegral, the complete the square to show that E(e tu ) = e t 1 π e 1 (u t) 1 du. Now observe that π e 1 (u t) is the pdf of a N(t, 1) rv. 1