Distribution of the Sample Mean

Estimation of the population mean

In many investigations the data of interest take on a wide range of possible values; examples include attachment loss (mm) and DMFS (decayed, missing, or filled surfaces). With this type of data it is often of interest to estimate the population mean, $\mu$. A common estimator for $\mu$ is the sample mean, $\bar{X}$. In this lecture we focus on the sampling distribution of $\bar{X}$.

Example: Fluoride Varnish Study*

Children in Yakima, WA were randomized to two different methods of fluoride varnish delivery and followed for about 3 years. The outcome of interest was the number of surfaces with new decay.

* Weinstein, P. et al. Caries Research 2009;43(6):484-90.

We can summarize the observed data with the sample mean and standard deviation. The sample mean is used as an estimate of the true population mean: $\bar{X} = 7.4$ for the Standard group. How good an estimate is it?

[Figure: group means ± standard deviations]

$\bar{X}$ is a random variable: its value is determined by which people are randomly chosen to be in the sample. Many possible samples give many possible values of $\bar{X}$.

[Figure: 20 possible samples, each labeled with its own sample mean; the means range from 6.6 to 8.1]

In our study we see only one occurrence of the sample mean. We will have a better idea of how good our one estimate is if we have good knowledge of how $\bar{X}$ behaves, that is, if we know the probability distribution of $\bar{X}$.
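To illustrate $\bar{X}$ as a random variable, the sketch below draws 20 samples of size 50 from one fixed population and prints each sample's mean. We do not have the study data, so the population here is hypothetical: 10,000 skewed counts formed by truncating Exponential draws with mean 7.4, an assumption chosen only to loosely resemble decay counts. The helper name `sample_means` and the seed are likewise illustrative.

```python
import random
import statistics

# Hypothetical population (NOT the study data): 10,000 skewed whole-number counts,
# made by truncating Exponential draws with mean 7.4.
rng = random.Random(2009)
population = [int(rng.expovariate(1 / 7.4)) for _ in range(10_000)]


def sample_means(pop, n, reps, rng):
    """Draw `reps` samples of size n (each without replacement) and return their means."""
    return [statistics.fmean(rng.sample(pop, n)) for _ in range(reps)]


means = sample_means(population, n=50, reps=20, rng=rng)
print("20 possible sample means:", [round(m, 1) for m in means])
print("population mean:", round(statistics.fmean(population), 2))
```

Each run of the study corresponds to just one of these draws; the spread of the printed means is what the sampling distribution of $\bar{X}$ describes.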

The Central Limit Theorem

An important result in probability theory states that the probability distribution of averages (i.e., of $\bar{X}$) is approximately the Normal distribution*, provided the sample size is reasonably large. This result often holds regardless of the distribution of the original data.

[Figure: probability distribution of $\bar{X}$, centered at $\mu$]

* Some restrictions apply (for example, the observations should be independent and come from a population with finite variance).

With only 10 observations, the Normal approximation is not as good.
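A small simulation sketch of this point, assuming a strongly right-skewed Exponential(1) population (skewness 2) rather than real data: it compares the skewness of simulated sample means for $n = 10$ and $n = 100$. A Normal distribution has skewness 0, and theory predicts skewness $2/\sqrt{n}$ for the mean of an Exponential sample, roughly 0.63 and 0.20 here, so the $n = 10$ means should look noticeably less Normal. The helpers `skewness` and `simulate_means` are hypothetical names.

```python
import random
import statistics


def skewness(xs):
    """Moment-based sample skewness: the average cubed z-score."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum(((x - m) / sd) ** 3 for x in xs) / len(xs)


def simulate_means(n, reps, rng):
    """Means of `reps` samples of size n from a right-skewed Exponential(1) population."""
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]


rng = random.Random(10)
skew_10 = skewness(simulate_means(10, 5_000, rng))
skew_100 = skewness(simulate_means(100, 5_000, rng))
print(f"skewness of sample means, n = 10:  {skew_10:.2f}")
print(f"skewness of sample means, n = 100: {skew_100:.2f}")
```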

More on the distribution of $\bar{X}$

The expected value of $\bar{X}$ is $\mu$, so $\bar{X}$ is unbiased: on average, $\bar{X}$ neither overestimates nor underestimates $\mu$.

The standard deviation of $\bar{X}$ is

$$SE(\bar{X}) = \frac{\sigma}{\sqrt{n}},$$

where $\sigma$ is the standard deviation in the population and $n$ is the number of people in the sample. It is called the standard error of the mean, or SEM.
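The two facts above (the mean of $\bar{X}$ is $\mu$, and its standard deviation is $\sigma/\sqrt{n}$) can be checked by simulation. This sketch assumes an Exponential population with mean 4, which also has standard deviation 4, and samples of size $n = 16$, so the theoretical SEM is $4/\sqrt{16} = 1$. The population is deliberately skewed to show that these two facts do not require Normal data; the seed and number of repetitions are arbitrary choices.

```python
import math
import random
import statistics

MU = 4.0            # assumed Exponential population with mean 4 ...
SIGMA = 4.0         # ... which also has standard deviation 4
N, REPS = 16, 20_000

rng = random.Random(224)
means = [statistics.fmean([rng.expovariate(1 / MU) for _ in range(N)]) for _ in range(REPS)]

center = statistics.fmean(means)   # should be close to MU (unbiasedness)
spread = statistics.stdev(means)   # should be close to SIGMA / sqrt(N), the SEM
print(f"average of the sample means: {center:.3f}   (mu = {MU})")
print(f"SD of the sample means:      {spread:.3f}   (sigma/sqrt(n) = {SIGMA / math.sqrt(N):.3f})")
```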

One can think of $SE(\bar{X}) = \sigma/\sqrt{n}$ as the typical size of the error that $\bar{X}$ makes when estimating $\mu$, that is, as a measure of the precision of the estimate. The precision of $\bar{X}$ is better (the SEM is smaller) when the sample is larger (larger $n$), and worse (the SEM is larger) when the population is more variable (larger $\sigma$).
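A minimal sketch of how the SEM responds to $n$ and $\sigma$; the values of `sigma` and `n` are illustrative, and `sem` is a hypothetical helper. Because $n$ sits under a square root, quadrupling the sample size only halves the SEM, while doubling $\sigma$ doubles it.

```python
import math


def sem(sigma, n):
    """Standard error of the mean, sigma / sqrt(n)."""
    if n < 1:
        raise ValueError("n must be a positive sample size")
    return sigma / math.sqrt(n)


for sigma in (5.0, 10.0):          # illustrative population SDs
    for n in (25, 100, 400):       # illustrative sample sizes
        print(f"sigma = {sigma:4.1f}, n = {n:3d}: SEM = {sem(sigma, n):.2f}")
```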

By the Central Limit Theorem, when $n$ is reasonably large the distribution of $\bar{X}$ is approximately Normal, with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$:

$$\bar{X} \sim \text{Normal}\left(\mu, \frac{\sigma^2}{n}\right).$$

Example: Birthweight data

The histogram shows the distribution of birthweights at a Boston hospital, with $\mu = 112$ oz and $\sigma = 20.6$ oz. Estimate the probability that the mean birthweight of the next 20 babies born will be greater than 120 oz.
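One way to work the example, assuming $n = 20$ is large enough for the Normal approximation $\bar{X} \sim \text{Normal}(\mu, \sigma^2/n)$ to be reasonable: the SE is $20.6/\sqrt{20} \approx 4.61$ oz, so 120 oz lies $z = (120 - 112)/4.61 \approx 1.74$ standard errors above $\mu$, and $P(\bar{X} > 120) \approx 0.04$. The sketch below reproduces this with Python's `statistics.NormalDist` in place of a Z table.

```python
import math
from statistics import NormalDist

mu, sigma, n = 112.0, 20.6, 20     # birthweight example (oz); n = the next 20 babies
se = sigma / math.sqrt(n)          # SE of the mean birthweight of 20 babies

# CLT approximation: Xbar ~ Normal(mu, sigma^2 / n); NormalDist takes the SD, not the variance.
xbar_dist = NormalDist(mu, se)
z = (120 - mu) / se
p_above_120 = 1 - xbar_dist.cdf(120)
print(f"SE = {se:.2f} oz, z = {z:.2f}, P(Xbar > 120 oz) = {p_above_120:.3f}")
```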

Law of Large Numbers

Recall that $\bar{X} \sim \text{Normal}(\mu, \sigma^2/n)$. As $n$ gets large, the variance $\sigma^2/n$ shrinks toward zero, so the distribution of $\bar{X}$ is forced to concentrate closer and closer to $\mu$. With larger sample sizes, $\bar{X}$ provides a better estimate of $\mu$. The same is true for the sample standard deviation $s$: as the sample size increases, $s$ should get closer to the population standard deviation $\sigma$.
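A simulation sketch of the Law of Large Numbers, assuming Normal data with $\mu = 112$ and $\sigma = 20.6$ (the birthweight values, borrowed only for concreteness): it computes $\bar{X}$ and $s$ on ever larger prefixes of one simulated data set. The printed values should settle near 112 and 20.6 as $n$ grows; the exact numbers depend on the seed.

```python
import random
import statistics

MU, SIGMA = 112.0, 20.6   # assumed Normal population, borrowing the birthweight values
rng = random.Random(1)
data = [rng.gauss(MU, SIGMA) for _ in range(50_000)]

# Xbar and s computed on ever larger prefixes of the same simulated data set.
for n in (10, 100, 1_000, 10_000, 50_000):
    head = data[:n]
    print(f"n = {n:6d}: xbar = {statistics.fmean(head):7.2f}, s = {statistics.stdev(head):6.2f}")
```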

Standard Error versus Standard Deviation

Standard deviation: describes the variability of a population or of a sample, i.e., of individual observations.
Standard error: describes the variability of an estimator, such as $\bar{X}$, which is usually a function of the whole sample.
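A short sketch of the difference, using a made-up sample of 8 measurements: the SD summarizes how much individual observations vary, while the SE, estimated as $s/\sqrt{n}$, summarizes how much $\bar{X}$ would vary from sample to sample.

```python
import math
import statistics

sample = [7, 9, 4, 12, 8, 6, 10, 8]   # made-up measurements, for illustration only

n = len(sample)
sd = statistics.stdev(sample)   # standard deviation: spread of individual observations
se = sd / math.sqrt(n)          # standard error: estimated spread of the sample mean
print(f"mean = {statistics.mean(sample):.2f}, SD = {sd:.2f}, SE = {se:.2f}")
```

For these numbers $s = \sqrt{6} \approx 2.45$, while the SE is $\sqrt{6/8} \approx 0.87$; the SE is always the smaller of the two once $n > 1$.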

Confidence intervals for the mean

If $n$ is large enough, we can use the result that

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$

to construct a confidence interval for $\mu$. However, this would result in a formula that involves $\sigma$, a value that we don't usually know. In practice we estimate $\sigma$ with the sample standard deviation, $s$. Substituting the random variable $s$ for $\sigma$ alters the distribution of the Z score slightly.

The t distribution

When the data come from a Normal population, the distribution of the statistic

$$T = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$

is called a t distribution with $n-1$ degrees of freedom and is denoted $t_{n-1}$.

The shape of the t distribution is similar to that of the Normal distribution (symmetric and bell-shaped), but it has higher variability, with heavier tails. How much higher depends on the degrees of freedom, which in turn depend on the sample size.

The larger the sample, the smaller this extra variability: t distributions with more degrees of freedom are more similar to the Normal distribution.
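A simulation sketch of the heavier tails, assuming samples from a standard Normal population: it records how often $|T| > 1.96$. If $\sigma$ were known and used in place of $s$, this would happen 5% of the time; with $s$ the frequency should be noticeably higher for small $n$ (about 12% for $n = 5$, i.e., 4 degrees of freedom) and approach 5% as $n$ grows. The helper `tail_fraction` and the seed are illustrative.

```python
import math
import random


def tail_fraction(n, reps, rng, cutoff=1.96):
    """Fraction of simulated T = (xbar - mu) / (s / sqrt(n)) with |T| > cutoff,
    using samples of size n from a standard Normal population (mu = 0, sigma = 1)."""
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = sum(xs) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))   # sample SD
        t = (xbar - 0.0) / (s / math.sqrt(n))
        if abs(t) > cutoff:
            hits += 1
    return hits / reps


rng = random.Random(7)
tail = {n: tail_fraction(n, 10_000, rng) for n in (5, 15, 100)}
for n, frac in tail.items():
    print(f"n = {n:3d} (df = {n - 1:2d}): P(|T| > 1.96) ~ {frac:.3f}")
```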

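Confidence intervals for the mean

If $X$ is Normal or $n$ is large, then $T = (\bar{X} - \mu)/(s/\sqrt{n})$ follows (at least approximately) a t distribution with $n-1$ degrees of freedom, so

$$P\left(-t_{n-1,0.975} < \frac{\bar{X} - \mu}{s/\sqrt{n}} < t_{n-1,0.975}\right) = 0.95,$$

where $t_{n-1,0.975}$ is the 97.5th percentile of a $t_{n-1}$ distribution. Some algebra gives

$$P\left(\bar{X} - t_{n-1,0.975}\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{n-1,0.975}\frac{s}{\sqrt{n}}\right) = 0.95.$$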
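The algebra step, spelled out: multiply all three parts by $s/\sqrt{n} > 0$, subtract $\bar{X}$, then multiply by $-1$, which reverses both inequalities. Each step gives an equivalent event, so the probability stays 0.95.

```latex
\begin{aligned}
-t_{n-1,0.975} < \frac{\bar{X}-\mu}{s/\sqrt{n}} < t_{n-1,0.975}
&\iff -t_{n-1,0.975}\,\frac{s}{\sqrt{n}} < \bar{X}-\mu < t_{n-1,0.975}\,\frac{s}{\sqrt{n}} \\
&\iff -\bar{X} - t_{n-1,0.975}\,\frac{s}{\sqrt{n}} < -\mu < -\bar{X} + t_{n-1,0.975}\,\frac{s}{\sqrt{n}} \\
&\iff \bar{X} - t_{n-1,0.975}\,\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{n-1,0.975}\,\frac{s}{\sqrt{n}}
\end{aligned}
```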

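The statement

$$P\left(\bar{X} - t_{n-1,0.975}\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{n-1,0.975}\frac{s}{\sqrt{n}}\right) = 0.95$$

says that there is a 95% probability that the interval

$$\left(\bar{X} - t_{n-1,0.975}\frac{s}{\sqrt{n}},\ \bar{X} + t_{n-1,0.975}\frac{s}{\sqrt{n}}\right)$$

will contain $\mu$. The randomness is in the interval (through $\bar{X}$ and $s$); $\mu$ itself is fixed.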

Example: chewing gum data

Group A comprised $n = 25$ children. The sample mean of the change in DMFS was $-0.72$, and the sample standard deviation, $s$, was 5.37. A 95% confidence interval for the true mean change in DMFS is

$$\left(-0.72 - t_{24,0.975}\frac{5.37}{\sqrt{25}},\ -0.72 + t_{24,0.975}\frac{5.37}{\sqrt{25}}\right).$$

We can look up $t_{24,0.975} = 2.06$ in Table 4 in the coursepack (or use Excel), giving

$$(-0.72 - 2.06 \times 1.07,\ -0.72 + 2.06 \times 1.07) = (-0.72 - 2.20,\ -0.72 + 2.20) = (-2.92, 1.48).$$
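The same computation as a small sketch. Python's standard library has no t-quantile function, so the critical value is passed in from the table (with SciPy installed, `scipy.stats.t.ppf(0.975, 24)` would return it). The helper `t_interval` is a hypothetical name. Without rounding the SE to 1.07 first, the endpoints come out as about $(-2.93, 1.49)$, matching the interval above to within rounding.

```python
import math


def t_interval(xbar, s, n, t_crit):
    """Two-sided CI for mu: xbar -/+ t_crit * s / sqrt(n).
    t_crit must be looked up separately (e.g. from a t table)."""
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin


# Chewing gum example, Group A; t_{24,0.975} = 2.06 taken from the t table.
lower, upper = t_interval(xbar=-0.72, s=5.37, n=25, t_crit=2.06)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```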

General formula: a $100(1-\alpha)\%$ confidence interval for $\mu$ is

$$\bar{X} \pm t_{n-1,1-\alpha/2}\frac{s}{\sqrt{n}},$$

where $t_{n-1,1-\alpha/2}$ is the $100(1-\alpha/2)$th percentile of a $t_{n-1}$ distribution.

Example: suppose $n = 30$. For a 95% confidence interval we use $\alpha = 0.05$, so we need $t_{29,0.975} = 2.05$, the 97.5th percentile of the $t_{29}$ distribution.

$(1-\alpha) \times 100\%$ confidence interval for the mean

$\alpha$ indicates the error we are willing to live with: when estimating the mean with 95% confidence, we allow an $\alpha = 5\%$ chance of missing the true mean. It is standard to use the $(1-\alpha/2)$th percentile because we want to split the error evenly between the two sides of the interval, with probability $\alpha/2$ that the interval falls entirely above $\mu$ and $\alpha/2$ that it falls entirely below.
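To see what the 5% error rate means, this sketch simulates many samples of size $n = 30$ from a Normal population with known $\mu$, builds the interval $\bar{X} \pm t_{29,0.975}\,s/\sqrt{n}$ with $t_{29,0.975} = 2.05$ from each, and counts how often $\mu$ is covered. The Normal population, the seed, and the helper `coverage` are assumptions for illustration; the coverage should come out close to 95%, so about 5% of intervals miss $\mu$.

```python
import math
import random


def coverage(n, t_crit, reps, rng, mu=0.0, sigma=1.0):
    """Fraction of simulated t-intervals xbar -/+ t_crit * s / sqrt(n) that contain mu,
    for samples of size n from a Normal(mu, sigma) population."""
    covered = 0
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))   # sample SD
        margin = t_crit * s / math.sqrt(n)
        if xbar - margin < mu < xbar + margin:
            covered += 1
    return covered / reps


rng = random.Random(30)
rate = coverage(n=30, t_crit=2.05, reps=10_000, rng=rng)
print(f"coverage ~ {rate:.3f}; missed ~ {1 - rate:.3f} (alpha = 0.05)")
```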

Example: chewing gum data

Compute a 99% confidence interval for the true mean change in DMFS for Group A. The acceptable error in this case is $\alpha = 1\%$, so we use the $100(1 - 0.01/2) = 99.5$th percentile. From Table 4, $t_{24,0.995} = 2.80$, so the 99% CI is

$$-0.72 \pm 2.80 \times \frac{5.37}{\sqrt{25}} = -0.72 \pm 3.01 = (-3.73, 2.29).$$

Note: the 99% confidence interval is wider than the 95% confidence interval. It needs to be wider to have a better chance of covering $\mu$.
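A sketch placing the two chewing gum intervals side by side, with both critical values taken from the t table. Only the critical value changes between the two levels, so the 99% interval is wider by the factor $2.80/2.06 \approx 1.36$. Unrounded, the 99% endpoints are about $(-3.73, 2.29)$, as above.

```python
import math

xbar, s, n = -0.72, 5.37, 25           # chewing gum data, Group A
t_table = {0.95: 2.06, 0.99: 2.80}     # t_{24,0.975} and t_{24,0.995} from the t table

se = s / math.sqrt(n)
intervals = {}
for level, t_crit in t_table.items():
    margin = t_crit * se
    intervals[level] = (xbar - margin, xbar + margin)
    lo, hi = intervals[level]
    print(f"{level:.0%} CI: ({lo:.2f}, {hi:.2f}), width {hi - lo:.2f}")
```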