Where are we? Recap from last time. Lecture 4: Confidence Intervals & Hypothesis Testing. Statistical inference


Lecture 4: Confidence Intervals & Hypothesis Testing
Sandy Eckel
seckel@jhsph.edu
25 April 2008

Where are we? Recap from last time

Lecture 3 Summary
  The Normal Distribution
  Sampling distributions (i.e., of the sample mean)
  The Central Limit Theorem
Today, we'll discuss
  Confidence intervals for population parameters
  The t-distribution
  Hypothesis testing (p-values)

Recall: Summary of Sampling Distributions

Statistic                    Mean of sampling distribution    Variance of sampling distribution
$\bar{X}$                    $\mu$                            $\sigma^2 / n$
$\bar{X}_1 - \bar{X}_2$      $\mu_1 - \mu_2$                  $\sigma_1^2 / n_1 + \sigma_2^2 / n_2$
$\hat{p}$                    $p$                              $pq / n$
$n\hat{p}$                   $np$                             $npq$
$\hat{p}_1 - \hat{p}_2$      $p_1 - p_2$                      $p_1 q_1 / n_1 + p_2 q_2 / n_2$

Statistical inference

Two methods
  Estimation (Confidence intervals)
  Hypothesis testing (p-values)
Both make use of sampling distributions
  Remember to use CLT
Sampling distributions allow us to make statements about the unobserved true population parameter in relation to the observed sample statistic
  This is called statistical inference!
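As a quick check on the sampling-distribution summary, the mean and variance of $\hat{p}$ can be computed exactly from the binomial probabilities and compared with $p$ and $pq/n$. This is only a sketch: the function name `phat_mean_and_variance` and the values n = 20, p = 0.3 are illustrative assumptions, not taken from the lecture.

```python
from math import comb

def phat_mean_and_variance(n, p):
    """Exact mean and variance of p-hat = K/n, where K ~ Binomial(n, p)."""
    q = 1 - p
    # Binomial probabilities P(K = k) for k = 0..n
    pmf = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]
    mean = sum((k / n) * w for k, w in enumerate(pmf))
    var = sum((k / n - mean) ** 2 * w for k, w in enumerate(pmf))
    return mean, var

# Illustrative values (assumed, not from the lecture)
n, p = 20, 0.3
mean, var = phat_mean_and_variance(n, p)
print(mean, p)                # both should be p
print(var, p * (1 - p) / n)   # both should be pq/n
```

The two printed pairs should agree up to floating-point rounding, matching the $\hat{p}$ row of the table.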

What do we mean by Estimation

Point estimation
  An estimator of a population parameter: a statistic (e.g., $\bar{X}$, $\hat{p}$)
  An estimate of a population parameter: the value of the estimator for a particular sample
    From a sample of 100 infants, the sample mean birth weight was $\bar{X} = 3012$ grams
    From a sample of 100 Vitamin A treated girls, 2 died, so $\hat{p} = 2/100 = 0.02$
Interval estimation
  A point estimate plus an interval that expresses the uncertainty or variability associated with the estimate

What do we mean by Hypothesis Testing

Pre-specify a null hypothesis and an alternative hypothesis that relate to the population parameter
Given the observed data (and resulting statistic), we decide to reject or fail to reject the null hypothesis in favor of the alternative
Significance testing

Point Estimation

$\bar{X}$ is a point estimator of $\mu$
$\bar{X}_1 - \bar{X}_2$ is a point estimator of $\mu_1 - \mu_2$
$\hat{p}$ is a point estimator of $p$
$\hat{p}_1 - \hat{p}_2$ is a point estimator of $p_1 - p_2$
We know the sampling distribution of these statistics, e.g.
$$\bar{X} \sim N\left(\mu_{\bar{X}} = \mu,\ \sigma_{\bar{X}}^2 = \sigma^2 / n\right)$$
If $\sigma^2$ is not known, we can use $s^2$, the sample variance, as a point estimator of $\sigma^2$

Interval Estimation

$100(1-\alpha)\%$ Confidence interval:
  estimate ± (critical value of z or t) × (standard error)
Example: Confidence interval for the population mean
$$\bar{X} \pm z_{\alpha/2}\, \sigma_{\bar{X}}$$
Plugging in the values, we get $[L, U]$
Note: $z_{\alpha/2}$ is the value such that, under a standard normal curve, the area to the right of $z_{\alpha/2}$ is $\alpha/2$ and the area to the left of $-z_{\alpha/2}$ is $\alpha/2$
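The recipe "estimate ± critical value × standard error" can be sketched in a few lines of Python. The sample mean of 3012 grams and n = 100 come from the birth weight example; the population standard deviation of 500 grams is a made-up value used only so the arithmetic can run, and `z_interval` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, alpha=0.05):
    """Return (L, U) = xbar -/+ z_{alpha/2} * sigma / sqrt(n), sigma known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 critical value
    se = sigma / sqrt(n)                     # standard error of the mean
    return xbar - z * se, xbar + z * se

# xbar and n are from the slide; sigma = 500 g is an assumed value
L, U = z_interval(xbar=3012, sigma=500, n=100)
print(round(L, 1), round(U, 1))  # roughly 2914.0 and 3110.0
```

With these inputs the standard error is 50 grams, so the 95% interval is about 3012 ± 98 grams.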

Derivation of Confidence Interval (CI) for the mean

We get the $100(1-\alpha)\%$ confidence interval for $\mu$ by taking:
$$P(-z_{\alpha/2} \le Z \le z_{\alpha/2}) = 1 - \alpha$$
$$P\left(-z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} \le z_{\alpha/2}\right) = 1 - \alpha$$
$$P(-z_{\alpha/2}\, \sigma_{\bar{X}} \le \bar{X} - \mu \le z_{\alpha/2}\, \sigma_{\bar{X}}) = 1 - \alpha$$
After some algebra:
$$P(\bar{X} - z_{\alpha/2}\, \sigma_{\bar{X}} \le \mu \le \bar{X} + z_{\alpha/2}\, \sigma_{\bar{X}}) = 1 - \alpha$$
$$P(L \le \mu \le U) = 1 - \alpha$$

Summary: CI for mean

A $100(1-\alpha)\%$ confidence interval for $\mu$, the population mean, is given by the interval estimate
$$\bar{X} \pm z_{\alpha/2}\, \sigma_{\bar{X}}$$
when the population variance $\sigma^2$ is known
The population variance is very rarely known (!), but you'll see we can deal with this...
In this class, we'll always use $100(1-\alpha)\% = 95\%$ confidence intervals, but you might sometimes see 90% or 99% CIs in the literature.

Interpretation of the CI for $\mu$

Before the data are observed, the probability is at least $(1-\alpha)$ that $[L, U]$ will contain $\mu$, the population parameter
In repeated sampling from a normally distributed population, $100(1-\alpha)\%$ of all intervals of the form above will include the population mean $\mu$
After the data are observed, the constructed interval $[L, U]$ either contains the true mean or it does not (no probability involved anymore)

Known Variance Assumption

Sampling from a normally distributed population with known variance ($\sigma^2$ known)
Confidence interval: $\bar{X} \pm z_{\alpha/2}\, \sigma_{\bar{X}}$
What if $\sigma^2$ is unknown?
Best we can do is use the best estimate we have of the population variance: the sample variance
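The repeated-sampling interpretation can be illustrated with a small simulation: draw many samples from a normal population with known variance, build the z-interval each time, and count how often it contains the true mean. This is a sketch under assumed values (mu = 10, sigma = 2, n = 25, 2000 repetitions, a fixed seed); the function name `coverage` is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, reps, alpha=0.05, seed=0):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / sqrt(n)  # sigma is treated as known
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps

# All parameter values here are assumptions chosen for illustration
rate = coverage(mu=10.0, sigma=2.0, n=25, reps=2000)
print(rate)  # expected to be close to 0.95
```

Each individual interval either covers mu or it does not; the 95% describes the long-run fraction across repetitions.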

Using the Sample Variance

Sampling from a normally distributed population with population variance unknown
We can make use of the sample variance $s^2$
Now we construct the confidence interval as:
  $\bar{X} \pm z_{\alpha/2}\, s_{\bar{X}}$ when n is large
  $\bar{X} \pm t_{\alpha/2,\, n-1}\, s_{\bar{X}}$ when n is small
Here, $s_{\bar{X}} = s / \sqrt{n}$ and $t_{\alpha/2,\, n-1}$ has $n-1$ degrees of freedom

The t-distribution

Estimate $\sigma^2$ with $s^2$
With $s$ in place of $\sigma$, the standardized $\bar{X}$ is not quite normal, so we need the t-distribution
$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}}$$
[Figure: t density plotted against x for df = 2, df = 5, df = 20]

Properties of the t-distribution

mean = median = mode = 0
Symmetric about the mean
t ranges from $-\infty$ to $+\infty$
Family of distributions determined by $n-1$, the degrees of freedom
The t distribution approaches the standard normal distribution as $n-1$ approaches $\infty$

Comparing t with normal

[Figure: density plotted against x, standard normal vs. t with df = 2]
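A minimal sketch of the small-sample interval $\bar{X} \pm t_{\alpha/2,\, n-1}\, s/\sqrt{n}$ follows. The five data values are invented for illustration, and the critical value 2.776 (for 4 degrees of freedom at the 95% level) is passed in by hand as if read from a t-table, since the Python standard library has no t quantile function. `t_interval` is a hypothetical helper name.

```python
from math import sqrt
from statistics import fmean, stdev

def t_interval(sample, t_crit):
    """Return (L, U) = xbar -/+ t_crit * s / sqrt(n), with s the sample SD."""
    n = len(sample)
    xbar = fmean(sample)
    se = stdev(sample) / sqrt(n)  # stdev uses the n - 1 denominator
    return xbar - t_crit * se, xbar + t_crit * se

# Invented data: n = 5, mean 8, sample variance 5, so s / sqrt(n) = 1
data = [5, 7, 8, 9, 11]
# t_{0.025, 4} = 2.776, looked up in a t-table (df = n - 1 = 4)
L, U = t_interval(data, t_crit=2.776)
print(round(L, 3), round(U, 3))  # about 5.224 and 10.776
```

Using 1.96 instead of 2.776 here would give a noticeably narrower interval, which is the reason the t critical value is preferred when n is small.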

T-tables

Summary: Confidence intervals for means

Population Distribution    Sample Size    Population Variance             95% Confidence Interval
Normal                     Any            $\sigma^2$ known                $\bar{X} \pm 1.96\, \sigma / \sqrt{n}$
Normal                     Any            $\sigma^2$ unknown, use $s^2$   $\bar{X} \pm t_{0.025,\, n-1}\, s / \sqrt{n}$
Not Normal/Unknown         Large          $\sigma^2$ known                $\bar{X} \pm 1.96\, \sigma / \sqrt{n}$
Not Normal/Unknown         Large          $\sigma^2$ unknown, use $s^2$   $\bar{X} \pm 1.96\, s / \sqrt{n}$
Not Normal/Unknown         Small          Any                             Non-parametric methods
Binomial                   Large          -                               $\hat{p} \pm 1.96 \sqrt{\hat{p}(1 - \hat{p}) / n}$
Binomial                   Small          -                               Exact methods

Confidence Intervals for Differences in Means

This is a bit tricky
Recall that formulas for CIs for a single mean depend on
  whether or not $\sigma^2$ is known
  the sample size
For a difference in means, the formula for a CI depends on
  whether or not the variances are assumed to be equal when the variances are unknown
  sample sizes in each group

Equal Variances Assumption

When variances are assumed to be equal:
The standard error of the difference is estimated by:
$$\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$$
Here, $s_p^2$ is the pooled variance
$$s_p^2 = \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2}$$
where the degrees of freedom (df) $= n_1 + n_2 - 2$
Recall, $n_1$ is the size of sample 1, and $n_2$ is the size of sample 2
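The pooled-variance calculation can be sketched from summary statistics alone. All numbers below (means 12 and 10, variances 4 and 6, sample sizes 10 and 12) are invented for illustration, and the critical value 2.086 for 20 degrees of freedom is supplied by hand as if read from a t-table. `pooled_interval` is a hypothetical helper name.

```python
from math import sqrt

def pooled_interval(xbar1, s1_sq, n1, xbar2, s2_sq, n2, t_crit):
    """Pooled variance, standard error, and CI for mu1 - mu2 (equal variances)."""
    df = n1 + n2 - 2
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df  # pooled variance
    se = sqrt(sp_sq / n1 + sp_sq / n2)                   # SE of the difference
    diff = xbar1 - xbar2
    return sp_sq, se, (diff - t_crit * se, diff + t_crit * se)

# Invented summary statistics; t_{0.025, 20} = 2.086 from a t-table
sp_sq, se, (L, U) = pooled_interval(12.0, 4.0, 10, 10.0, 6.0, 12, t_crit=2.086)
print(sp_sq)   # (9*4 + 11*6) / 20 = 5.1, up to rounding
print(round(se, 4), round(L, 3), round(U, 3))
```

The pooled variance is a weighted average of the two sample variances, with weights given by each sample's degrees of freedom.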

Unequal Variances Assumption

When variances are assumed to be unequal:
The standard error of the difference is estimated by:
$$\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Here, df $= \nu$ and
$$\nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2 / n_1)^2}{n_1 - 1} + \dfrac{(s_2^2 / n_2)^2}{n_2 - 1}}$$

Summary: Confidence intervals for difference of means

Population Distribution    Sample Size    Population Variances                      95% Confidence Interval
Normal                     Any            known                                     $(\bar{X}_1 - \bar{X}_2) \pm 1.96 \sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2}$
Normal                     Any            unknown, $\sigma_1^2 = \sigma_2^2$        $(\bar{X}_1 - \bar{X}_2) \pm t_{0.025,\, n_1 + n_2 - 2} \sqrt{s_p^2 / n_1 + s_p^2 / n_2}$
Normal                     Any            unknown, $\sigma_1^2 \ne \sigma_2^2$      $(\bar{X}_1 - \bar{X}_2) \pm t_{0.025,\, \nu} \sqrt{s_1^2 / n_1 + s_2^2 / n_2}$
Not Normal/Unknown         Large          known                                     $(\bar{X}_1 - \bar{X}_2) \pm 1.96 \sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2}$
Not Normal/Unknown         Large          unknown, $\sigma_1^2 = \sigma_2^2$        $(\bar{X}_1 - \bar{X}_2) \pm 1.96 \sqrt{s_p^2 / n_1 + s_p^2 / n_2}$
Not Normal/Unknown         Large          unknown, $\sigma_1^2 \ne \sigma_2^2$      $(\bar{X}_1 - \bar{X}_2) \pm 1.96 \sqrt{s_1^2 / n_1 + s_2^2 / n_2}$
Not Normal/Unknown         Small          Any                                       Non-parametric methods

Confidence intervals for difference of proportions

Population Distribution    Sample Size    95% Confidence Interval
Binomial                   Large          $(\hat{p}_1 - \hat{p}_2) \pm 1.96 \sqrt{\hat{p}_1 (1 - \hat{p}_1) / n_1 + \hat{p}_2 (1 - \hat{p}_2) / n_2}$
Binomial                   Small          Exact methods

Recap: Statistical Inference

Estimation
  Point estimation
  Confidence intervals
Hypothesis Testing
  This is next!
  We will first discuss hypothesis testing as it applies to means of distributions for continuous variables
  We will then discuss discrete data (specifically dichotomous variables) - probably next week
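The unequal-variance degrees of freedom $\nu$ and the large-sample interval for a difference of proportions can both be sketched briefly. The summary statistics (variances 4 and 6 with sample sizes 10 and 12) and the counts (40 of 200 versus 60 of 200) are invented for illustration, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def welch_se_and_df(s1_sq, n1, s2_sq, n2):
    """Standard error and approximate df (nu) when variances are not assumed equal."""
    a, b = s1_sq / n1, s2_sq / n2
    se = sqrt(a + b)
    nu = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return se, nu

def diff_proportions_interval(x1, n1, x2, n2, alpha=0.05):
    """Large-sample CI for p1 - p2 from event counts x1, x2."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

# Invented inputs, for illustration only
se, nu = welch_se_and_df(4.0, 10, 6.0, 12)
lo, hi = diff_proportions_interval(40, 200, 60, 200)
print(round(se, 4), round(nu, 2))
print(round(lo, 4), round(hi, 4))
```

In general $\nu$ lies between the smaller of $n_1 - 1$ and $n_2 - 1$ and the pooled value $n_1 + n_2 - 2$, and it is usually not an integer, so a t-table lookup typically rounds it down.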

To be continued...

The remaining material from this lecture on Hypothesis Testing has been moved to Lecture 5.