Section 7. Sampling Distribution of the Mean and the Central Limit Theorem

Similar documents
Section 5 Part 2. Probability Distributions for Discrete Random Variables

Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013

Two-sample inference: Continuous data

Unbeknownst to us, the entire population consists of 5 cloned sheep with ages 10, 11, 12, 13, 14 months.

4. Continuous Random Variables, the Pareto and Normal Distributions

Week 4: Standard Error and Confidence Intervals

HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1. used confidence intervals to answer questions such as...

EMPIRICAL FREQUENCY DISTRIBUTION

HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1. used confidence intervals to answer questions such as...

Chapter 7 Section 7.1: Inference for the Mean of a Population

Section 13, Part 1 ANOVA. Analysis Of Variance

Study Guide for the Final Exam

5.1 Identifying the Target Parameter

Pr(X = x) = f(x) = λe λx

Binomial Distribution n = 20, p = 0.3

" Y. Notation and Equations for Regression Lecture 11/4. Notation:

Simple Linear Regression Inference

In the past, the increase in the price of gasoline could be attributed to major national or global

TImath.com. F Distributions. Statistics

Review. March 21, S7.1 2_3 Estimating a Population Proportion. Chapter 7 Estimates and Sample Sizes. Test 2 (Chapters 4, 5, & 6) Results

DATA INTERPRETATION AND STATISTICS

An Introduction to Basic Statistics and Probability

Simple Regression Theory II 2010 Samuel L. Baker

Simulation Exercises to Reinforce the Foundations of Statistical Thinking in Online Classes

The Normal distribution

One-Way Analysis of Variance

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

1.5 Oneway Analysis of Variance

1. How different is the t distribution from the normal?

Outline. Definitions Descriptive vs. Inferential Statistics The t-test - One-sample t-test

Chapter 23 Inferences About Means

Math 108 Exam 3 Solutions Spring 00

AMS 5 CHANCE VARIABILITY

9. Sampling Distributions

Experimental Design. Power and Sample Size Determination. Proportions. Proportions. Confidence Interval for p. The Binomial Test

Name: Date: Use the following to answer questions 3-4:

2 ESTIMATION. Objectives. 2.0 Introduction

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

STA-201-TE. 5. Measures of relationship: correlation (5%) Correlation coefficient; Pearson r; correlation and causation; proportion of common variance

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools

T test as a parametric statistic

Math 58. Rumbos Fall Solutions to Review Problems for Exam 2

Week 3&4: Z tables and the Sampling Distribution of X

Statistics Review PSY379

Confidence Intervals in Public Health

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

UNIVERSITY of MASSACHUSETTS DARTMOUTH Charlton College of Business Decision and Information Sciences Fall 2010

12: Analysis of Variance. Introduction

Lecture 8. Confidence intervals and the central limit theorem

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Chi Square Tests. Chapter Introduction

Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means

Quantitative Methods for Finance

Estimation and Confidence Intervals

Mean = (sum of the values / the number of the value) if probabilities are equal

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

Regression Analysis: A Complete Example

STA 130 (Winter 2016): An Introduction to Statistical Reasoning and Data Science

STAT 350 Practice Final Exam Solution (Spring 2015)

Comparing Means in Two Populations

MTH 140 Statistics Videos

Statistics 151 Practice Midterm 1 Mike Kowalski

CHI-SQUARE: TESTING FOR GOODNESS OF FIT

A POPULATION MEAN, CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

PROBABILITY AND STATISTICS. Ma To teach a knowledge of combinatorial reasoning.

Chapter 8 Hypothesis Testing Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing

Chapter 7 Notes - Inference for Single Samples. You know already for a large sample, you can invoke the CLT so:

Fairfield Public Schools

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Final Exam Practice Problem Answers

One-Way Analysis of Variance (ANOVA) Example Problem

LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers)

Comparing Two Groups. Standard Error of ȳ 1 ȳ 2. Setting. Two Independent Samples

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

Elementary Statistics Sample Exam #3

Recall this chart that showed how most of our course would be organized:

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Confidence Intervals for One Standard Deviation Using Standard Deviation

Stat 20: Intro to Probability and Statistics

Chicago Booth BUSINESS STATISTICS Final Exam Fall 2011

Statistical Functions in Excel

Descriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion

CA200 Quantitative Analysis for Business Decisions. File name: CA200_Section_04A_StatisticsIntroduction

Lecture Notes Module 1

Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences

Chapter 7. One-way ANOVA

Statistical Confidence Calculations

Chapter 7: Simple linear regression Learning Objectives

Introduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing.

Confidence intervals

Review of basic statistics and the simplest forecasting model: the sample mean

MAT 155. Key Concept. September 27, S5.5_3 Poisson Probability Distributions. Chapter 5 Probability Distributions

Lecture 2: Discrete Distributions, Normal Distributions. Chapter 1

Descriptive Statistics and Measurement Scales

STAT 145 (Notes) Al Nosedal Department of Mathematics and Statistics University of New Mexico. Fall 2013

17. SIMPLE LINEAR REGRESSION II

Confidence Intervals for Cp

Is it statistically significant? The chi-square test

Transcription:

Section 7 Sampling Distribution of the Mean and the Central Limit Theorem

Review: Probability Distributions Any characteristic that can be measured or categorized is called a variable. If the variable can assume a number of different values such that any particular outcome is determined by chance it is called a random variable. Every random variable has a corresponding probability distribution. The probability distribution applies the theory of probability to describe the behavior of the random variable. PubH 6414 Section 7 2

Review: Probability Distributions So far we ve covered the following probability distribution Probability tables to describe the distributions of Nominal variables Probability density curves for continuous variables particularly the Normal Distribution Probability distributions for discrete variables including Binomial Distribution Poisson Distribution PubH 6414 Section 7 3

Sampling Distributions Review: a statistic is a numerical value used as a summary measure for a sample Statistics are random variables that have different values from sample to sample Since statistics are random variables, they have probability distributions Probability distributions for statistics are called sampling distributions. PubH 6414 Section 7 4

Sampling Distribution of a Statistic Definition of sampling distribution: The probability distribution of a statistic that results from all possible samples of a given size is the sampling distribution of the statistic. If you know the sampling distribution of the statistic you can generalize results from samples to the population using Confidence intervals for estimating parameters Hypothesis tests Other statistical inference methods PubH 6414 Section 7 5

Preview of Sampling Distributions The sampling distributions we will cover in this course are Normal distribution for Sample mean if population standard deviation is known Sample Proportion if n*π > 5 and n*(1-π) > 5 (Section 9) t-distribution for Sample mean if standard deviation is estimated from the sample Chi-square distribution for Chi-square statistic used to test for independence between two categorical variables (Section 12) F-distribution for Ratio of two variances (Section 13) PubH 6414 Section 7 6

Section 7 Outline Sampling Distribution of the Sample Mean Central Limit Theorem SE of the mean Calculating probabilities from the sampling distribution of the mean Introduction to t-distribution PubH 6414 Section 7 7

Sampling vs. Population Distribution The distribution of the individual observations: the population distribution The distribution of the sample means derived from samples drawn from the population: the sampling distribution PubH 6414 Section 7 8

Central Limit Theorem The Central Limit Theorem (CLT) is based on the sampling distribution of sample means from all possible samples of size n drawn from the population. However, to apply the CLT it isn t necessary to generate all possible samples of size n and calculate the mean for each sample to determine the sampling distribution. If the population mean and standard deviation are known, the sampling distribution of the sample mean is also known: x ~ σ, N µ n PubH 6414 Section 7 9

PubH 6414 Section 7 10

Central Limit Theorem Illustration View the following website for an interactive illustration of the Central Limit Theorem http://davidmlane.com/hyperstat/a14043.html PubH 6414 Section 7 11

Standard Error of the Mean The Standard Error of the Mean (SEM) measures the variability in the sampling distribution of sample means SE = σ n As sample size increases, the SEM decreases because the sample size (n) is in the denominator of the calculation PubH 6414 Section 7 12

Standard Deviation vs. Standard Error Standard deviation measures the variability in the population and is based on measurements of individual observations Standard error is the standard deviation of the statistic and measures the variability of the statistic from repeated samples The sampling distribution of ANY statistic has a standard error. The SEM is the SE for the sampling distribution of the mean PubH 6414 Section 7 13

Birth weight example Birth weights over a long period of time at a certain hospital show a normal distribution with mean µ of 112 oz and a standard deviation σ of 20.6 oz. Calculate the probability that the next infant born weighs between 107 and 117 oz. This is calculating the probability for an individual use the population distribution given above Calculate the probability that the mean birth weight for the next 25 infants is between 107 and 117 oz. This is calculating the probability for a sample mean use the CLT and sampling distribution of the sample mean for samples of size n= 25 PubH 6414 Section 7 14

Distribution of Birth weights Single Observation Birth weights over a long period of time at a certain hospital show a normal distribution with mean µ of 112 oz and a standard deviation σ of 20.6 oz. Calculate the probability that the next infant born weighs between 107 and 117 oz. PubH 6414 Section 7 15

Distribution of birth weights Since birth weights are normally distributed, the pnorm function in R can be used to find the probability that the next infant born is between 107 117 oz > pnorm(117,112, 20.6) - pnorm(107, 112, 20.6) [1] 0.1917765 Interpretation: PubH 6414 Section 7 16

Sampling Distribution of Mean Birth weights Calculate the probability that the mean birth weight of the next 25 infants born will fall between 107 and 117 oz. First determine the mean and SEM for this sampling distribution. From the CLT we know: x 20.6 20.6 ~ N 112, with standard error for x = = 25 25 4.12 PubH 6414 Section 7 17

Birth weight example The sampling distribution of sample means has a normal distribution, the pnorm function in R can be used to find the probability that the mean birthweight of the next 25 infants is between 107 117 oz > pnorm(117,112, 4.12) - pnorm(107, 112, 4.12) [1] 0.7750965 Interpretation: PubH 6414 Section 7 18

Comparing the Results of the Two Probability Calculations P (next infant born weighs 107-117 oz) = 0.192 P (mean weight of next 25 infants is 107-117) = 0.774 Why are these probabilities different? PubH 6414 Section 7 19

Wing length example The distribution of wing lengths of butterflies in Baja, CA has a mean value (μ) of 4 cm and variance (σ 2 ) of 25 cm 2 We don t know if butterfly wing length is normally distributed or not. What is the probability that a sample mean wing length calculated from 64 butterflies will fall between 3.5 cm and 4.5 cm? PubH 6414 Section 7 20

Sampling distribution of mean wing lengths We know from the CLT that, regardless of the distribution of the population data, the sample means from a sample of size 64 are normally distributed with x ~ N 4, 5 64 PubH 6414 Section 7 21

Sampling distribution of mean wing length What is the probability that the mean wing length for a sample of 64 butterfly wings is between 3.5 and 4.5 cm? > pnorm(4.5, 4,.625) - pnorm(3.5, 4,.625) [1] 0.5762892 Interpretation: PubH 6414 Section 7 22

Sampling distribution of mean wing length Standard Normal Approach What is the probability that the mean wing length for a sample of 64 butterfly wings is between 3.5 and 4.5 cm? > pnorm(0.8, 0,1)-pnorm(-0.8,0,1) [1] 0.5762892 Interpretation: PubH 6414 Section 7 23

Sampling distribution when σ is unknown The sampling distribution of the sample mean is normally distributed (n>30) when the population standard deviation (σ) is known. Often σ is unknown and is estimated by the sample standard deviation (s). When σ is unknown the SE is calculated as SE = s n PubH 6414 Section 7 24

The CLT when σ is unknown The sampling distribution of the sample mean is distributed as a Student s t distribution with n-1 degrees of freedom when the population standard deviation is unknown and one of these conditions is met: N> 40 and the population is unimodal N>15 and the population is approximately normal. N any size, and the population is normal. ~ s, 1 x t n µ n PubH 6414 Section 7 25

History of Student s t-distribution William Gosset 1876-1937 Gosset worked as a chemist in the Guinness brewery in Dublin and did important work on statistics. He discovered the form of the t-distribution and invented the t-test to handle small samples for quality control in brewing. He published his findings (in July 1908!) under the name "Student". This is why the t-test is sometimes referred to as Student s t-test. PubH 6414 Section 7 26

t-distribution The t-distribution is similar to the standard normal distribution It is unimodal and symmetric about the mean The mean of the t-distribution = 0 The total area under the t-distribution curve = 1 There are some differences between the t-distribution and the standard normal The t-distribution has larger tail areas (the tail areas are the areas at either end of the curve) The t-distribution is Indexed by degrees of freedom (df) which are equal to the sample size - 1. PubH 6414 Section 7 27

t-distribution and standard normal distribution As n increases, the t-distribution is closer to the Standard Normal distribution Values of the t-distribution are called t-coefficients PubH 6414 Section 7 28

Using pt in R To find P(t n-1 < t*) >pt(t*, df) To find P(t n-1 >t*) > 1 pt(t*,df) PubH 6414 Section 7 29

t-coefficient notation t-coefficients are notated with the degrees of freedom as a subscript t-coefficient for a sample size of 10: t 9 t-coefficient for a sample size of 18: t 17 PubH 6414 Section 7 30

Wing length example with σ unknown The distribution of wing lengths of butterflies in Baja, CA has a mean value (μ) of 4 cm and unknown variance (σ 2 ). We don t know if butterfly wing length is normally distributed or not. PubH 6414 Section 7 31

Wing length example with σ unknown What is the probability that a sample mean wing length calculated from 64 butterflies will fall between 3.5 cm and 4.5 cm? Suppose the sample of 64 butterflies has a sample standard deviation of 6 cm. ~ s, 1 x t n µ n PubH 6414 Section 7 32

Wing length example with σ unknown What is the probability that a sample mean wing length calculated from 64 butterflies will fall between 3.5 cm and 4.5 cm? Suppose the sample of 64 butterflies has a sample standard deviation of 6 cm. P( 3.5 < x < 4.5) = P( < z < ) > pt(0.667,63)-pt(-0.667,63) [1] 0.4927922 PubH 6414 Section 7 33

Sampling Distribution of the Sample Mean: Overview By the Central Limit Theorem, means of observations with known standard deviation are normally distributed for large enough sample sizes regardless of the distribution of the observations. When the population standard deviation is unknown, the sampling distribution of the means is a t- distribution with n-1 df. The SEM measures the variability of the distribution of means. PubH 6414 Section 7 34