Experimental Design: Power and Sample Size Determination



Bret Hanlon and Bret Larget
Department of Statistics, University of Wisconsin-Madison
November 3-8, 2011

The Big Picture

To this point in the semester, we have largely focused on methods to analyze the data we have, with little regard for decisions about how to gather the data. Design of Experiments is the area of statistics that examines plans for gathering data so as to achieve good (or optimal) inference. Here we focus on the question of sample size: how large does a sample need to be so that a confidence interval will be no wider than a given size? How large does a sample need to be so that a hypothesis test will have a low p-value if a certain alternative hypothesis is true? Sample size determination is just one aspect of good design of experiments; we will encounter additional aspects in future lectures.

Proportions

Recall the methods for inference about proportions: confidence intervals...

Confidence Interval for p

A P% confidence interval for p is
\[
p' - z^* \sqrt{\frac{p'(1-p')}{n'}} \;<\; p \;<\; p' + z^* \sqrt{\frac{p'(1-p')}{n'}},
\]
where n' = n + 4, p' = (X + 2)/(n + 4) = (X + 2)/n', and z* is the critical value from a standard normal distribution such that the area between -z* and z* is P/100. (For 95%, z* = 1.96.)

... and hypothesis tests.

The Binomial Test

If X ~ Binomial(n, p) with null hypothesis p = p_0 and we observe X = x, the p-value is the probability that a new random variable Y ~ Binomial(n, p_0) would be at least as extreme: either P(Y ≤ x), P(Y ≥ x), or P(|Y - n p_0| ≥ |x - n p_0|), depending on the alternative hypothesis chosen.
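As a concrete illustration of the interval and test above (not part of the original slides), here is a short R sketch; the counts x = 83 and n = 200 are made-up values for illustration.

# Plus-four 95% confidence interval for p, as defined above
x = 83
n = 200
p.prime = (x + 2) / (n + 4)
n.prime = n + 4
z = qnorm(0.975)                                  # critical value for 95%
me = z * sqrt(p.prime * (1 - p.prime) / n.prime)  # margin of error
c(lower = p.prime - me, upper = p.prime + me)

# Exact binomial test of H0: p = 0.5; binom.test() uses a slightly
# different two-sided rule than the |Y - n p0| convention above.
binom.test(x, n, p = 0.5, alternative = "two.sided")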

Sample size for proportions

Case Study: Next year it is likely that there will be a recall election for Governor Scott Walker. A news organization plans to take a poll of likely voters over the next several days to find, if the election were held today, the proportion of voters who would vote for Walker against an unnamed Democratic opponent. Assuming that the news organization can take a random sample of likely voters: how large a sample is needed for a 95% confidence interval to have a margin of error of no more than 4%?

Notice that the margin of error,
\[
1.96 \sqrt{\frac{p'(1-p')}{n+4}},
\]
depends on both n and p', but we do not know p'. However, the expression p'(1 - p') is maximized at p' = 0.5; if the value of p' from the sample turns out to be different, the margin of error will just be a bit smaller, which is even better. So it is conservative (in a statistical, not political, sense) to set p' = 0.5 and solve the inequality
\[
1.96 \sqrt{\frac{(0.5)(0.5)}{n+4}} < 0.04
\]
for n. Show on the board why this leads to
\[
n > \left(\frac{(1.96)(0.5)}{0.04}\right)^2 - 4 \approx 596.3,
\]
so n = 597 is the smallest sample size that guarantees the desired margin of error.

General Formula

Sample size for proportions:
\[
n > \left(\frac{(1.96)(0.5)}{M}\right)^2 - 4,
\]
where M is the desired margin of error.

Errors in Hypothesis Tests

When a hypothesis test is used to make a decision to reject the null hypothesis whenever the p-value is below a prespecified fixed value α, there are two possible correct decisions and two possible errors. We first saw these concepts with proportions, but review them now. The two decisions we can make are to Reject or Not Reject the null hypothesis. The two states of nature are that the null hypothesis is either True or False. These possibilities combine in four ways:

                        H0 is True          H0 is False
  Reject H0             Type I error        Correct decision
  Do not reject H0      Correct decision    Type II error
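This calculation is easy to package as a small R helper; the function name sample.size.p below is my own, not from the slides.

# Conservative sample size for the plus-four CI with margin of error M,
# taking p' = 0.5 (the worst case) and confidence level conf
sample.size.p = function(M, conf = 0.95) {
  z = qnorm(1 - (1 - conf) / 2)
  ceiling((z * 0.5 / M)^2 - 4)
}
sample.size.p(0.04)   # 597, matching the polling case study
sample.size.p(0.03)   # a 3% margin of error needs a noticeably larger sample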

Type I and Type II Errors

Definition: A Type I error is rejecting the null hypothesis when it is true. The probability of a Type I error is called the significance level of the test and is denoted α.

Definition: A Type II error is failing to reject the null hypothesis when it is false. The probability of a Type II error is called β, but the value of β typically depends on which particular alternative hypothesis is true.

Graphs

Note that, as there are many possible alternative hypotheses, for a single α there are many values of β. It is helpful to plot the probability of rejecting the null hypothesis against the parameter values.

Definition: The power of a hypothesis test for a specified alternative hypothesis is 1 - β. The power is the probability of rejecting the null hypothesis in favor of the specific alternative.

Sample size calculation

Consider a population with proportion p. Let X be the number of successes in a random sample of size 100, with model X ~ Binomial(100, p). Consider the hypotheses H0: p = 0.3 versus HA: p < 0.3. The researchers decide to reject the null hypothesis if X ≤ 22.

1. Find α.
2. Find β if p = 0.2.
3. Plot the probability of rejecting the null hypothesis versus p (an R sketch for this plot follows the solution).

Solution

\[
\alpha = P(X \le 22 \mid p = 0.3) = \sum_{k=0}^{22} \binom{100}{k} (0.3)^k (0.7)^{100-k} \approx 0.0479
\]

\[
1 - \beta(p) = P(X \le 22 \mid p) = \sum_{k=0}^{22} \binom{100}{k} p^k (1-p)^{100-k},
\qquad 1 - \beta(0.2) \approx 0.7389
\]
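These two probabilities, and the power curve requested in part 3, can be computed directly with pbinom(); this sketch is my own and is not from the slides.

# alpha and power at p = 0.2 for the rule "reject H0 if X <= 22", n = 100
alpha = pbinom(22, size = 100, prob = 0.3)      # about 0.0479
power.p2 = pbinom(22, size = 100, prob = 0.2)   # about 0.7389

# Power curve: probability of rejecting H0 as a function of the true p
p = seq(0, 1, by = 0.01)
reject.prob = pbinom(22, size = 100, prob = p)
plot(p, reject.prob, type = "l",
     xlab = "true p", ylab = "Probability of rejecting H0",
     main = "Power curve: n = 100, reject if X <= 22")
abline(v = 0.3, h = alpha, lty = 2)   # the curve passes through alpha at p = 0.3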

[Figure: probability histograms of X ~ Binomial(100, 0.3) and X ~ Binomial(100, 0.2), and the corresponding power curve 1 - β plotted against p with α marked at p = 0.3.]

Sample Size

Suppose that we wanted a sample size large enough so that we could pick a rejection rule where α is less than 0.05 and the power when p = 0.2 is greater than 0.9. How large would n need to be? Simplify by letting n be a multiple of 25.

We saw in the previous example that 100 is not large enough: if the critical rejection value were larger than 22, then α > 0.05, and with the rule X ≤ 22 the power is only 0.7389, which is less than 0.9. The calculation is tricky because as n changes we need to change the rejection rule so that α ≤ 0.05, and then find the corresponding power.

We can use the quantile function for the binomial distribution, qbinom(), to find the rejection region for a given α. For example, for X ~ Binomial(100, 0.3),

> k = qbinom(0.05, 100, 0.3)
> k
[1] 23
> pbinom(k, 100, 0.3)
[1] 0.07553077
> pbinom(k - 1, 100, 0.3)
[1] 0.04786574

we see that P(X ≤ 22) < 0.05 < P(X ≤ 23), so rejecting the null hypothesis when X ≤ 22 results in a test with α < 0.05.

In general, we can simply subtract one from the quantile function to find the rejection region for a given n.

> k = qbinom(0.05, 100, 0.3) - 1
> k
[1] 22

The power when p = 0.2 is

> pbinom(k, 100, 0.2)
[1] 0.7389328

This R code will find the rejection region, significance level, and power for n = 100, 125, 150, 175, 200.

> n = seq(100, 200, 25)
> k = qbinom(0.05, n, 0.3) - 1
> alpha = pbinom(k, n, 0.3)
> power = pbinom(k, n, 0.2)
> data.frame(n, k, alpha, power)
    n  k      alpha     power
1 100 22 0.04786574 0.7389328
2 125 28 0.03682297 0.7856383
3 150 35 0.04286089 0.8683183
4 175 42 0.04733449 0.9193571
5 200 48 0.03594782 0.9309691

We see that n = 175 is the smallest n that is a multiple of 25 for which a test with α < 0.05 has power greater than 0.9 when p = 0.2.
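If we drop the multiple-of-25 simplification, the same idea finds the exact smallest n; this extension is my own, not part of the slides, and simply prints whichever n in the search range first achieves the desired power.

# Search every n from 100 to 200 for the smallest sample size whose
# alpha < 0.05 rejection rule has power > 0.9 when p = 0.2
n = 100:200
k = qbinom(0.05, n, 0.3) - 1        # largest k with P(X <= k | p = 0.3) < 0.05
alpha = pbinom(k, n, 0.3)
power = pbinom(k, n, 0.2)
min(n[power > 0.9])                 # smallest qualifying n in this range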

Confidence Intervals We will use the t distribution, not the standard normal for the actual confidence interval, but it is easiest to use the normal distribution for planning the design. If the necessary sample size is even moderately large, the differences between z and t is tiny. Recall the confidence interval for µ. Confidence Interval for µ A P% confidence interval for µ has the form Ȳ t s < µ < Ȳ + t s n n For a 99% confidence interval, we find z = 2.58. We need n so that Work on the board to show n > 2.58 0.15 n < 0.03 ( ) 2 (2.58)(0.15). 0.03 = 167. where t is the critical value such that the area between t and t under a t-density with n 1 degrees of freedom is P/100, where n is the sample size. Power Normal Distribution Confidence Intervals 21 / 31 Power Normal Distribution Confidence Intervals 22 / 31 General Formula Sample size for a single mean ( (z )(σ) n > M where M is the desired margin of error. ) 2 Rejection region Graph the power for a sample size of n = 25 for α = 0.05 for H 0 : µ = 3.35 versus H A : µ 3.35. Note that the p-value is less than 0.05 approximately when The rejection region is X 3.35 0.15/ > 1.96 25 ( ) 0.15.= X < 3.35 1.96 3.291 25 or ( ) 0.15.= X > 3.35 + 1.96 3.409 25 Power Normal Distribution Confidence Intervals 23 / 31 Power Normal Distribution Confidence Intervals 24 / 31

Power when µ = 3.3

To find the power when µ = 3.3, we need to find the area, under the normal density centered at µ = 3.3, that lies in the rejection region. Here is an R calculation for the power.

> se = 0.15/sqrt(25)
> a = 3.35 - 1.96 * se
> b = 3.35 + 1.96 * se
> c(a, b)
[1] 3.2912 3.4088
> power = pnorm(a, 3.3, se) + (1 - pnorm(b, 3.3, se))
> power
[1] 0.3847772

Graph of Powers

The previous slide shows a numerical calculation of the power at a single alternative hypothesis, µ = 3.3. The next slide illustrates both the rejection region (top graph) and the power at µ = 3.3 (bottom graph). The following slide shows a graph of the power for many alternative hypotheses.

[Figure: for n = 25 and H0: µ = 3.35, the null density of X̄ with the rejection region shaded and the density under µ = 3.3 with the power shaded (x-axis: Percentage Butterfat), followed by the power plotted against µ from 3.20 to 3.50 with α marked at µ = 3.35.]
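The full power curve in the last figure can be reproduced with a few lines of R; this sketch is mine, not from the slides.

# Power of the two-sided test of H0: mu = 3.35 (n = 25, sigma = 0.15,
# alpha = 0.05) as a function of the true mean mu
se = 0.15 / sqrt(25)
a = 3.35 - 1.96 * se
b = 3.35 + 1.96 * se
mu = seq(3.20, 3.50, by = 0.001)
power = pnorm(a, mu, se) + (1 - pnorm(b, mu, se))
plot(mu, power, type = "l", xlab = expression(mu), ylab = "Power")
abline(v = 3.35, h = 0.05, lty = 2)   # the curve bottoms out near alpha at mu = 3.35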

Sample Size Problem

The long-run average percent butterfat in milk at a farm is 3.35 and the standard deviation is 0.15, with measurements taken by the load. How large should the sample size be to have an 80% chance of detecting a change to µ = 3.30 at a significance level of α = 0.05 with a two-sided test?

Solution

The left part of the rejection region will have the form
\[
\bar{Y} < a = 3.35 - 1.96\left(\frac{0.15}{\sqrt{n}}\right),
\]
as z = 1.96 cuts off the bottom α/2 = 0.025 of the area. The power when µ = 3.30 is essentially just the area to the left of a under the normal density centered at 3.30, as the area in the right part of the rejection region is essentially 0. For the power to be 0.80, a must be the 0.80 quantile of the Normal(3.30, 0.15/√n) density. As the 0.80 quantile of a standard normal curve is z = 0.84, this means that
\[
a = 3.30 + 0.84\left(\frac{0.15}{\sqrt{n}}\right).
\]
Set these two expressions for a equal to one another and solve for n:
\[
n \approx \left(\frac{(0.15)(1.96 + 0.84)}{3.35 - 3.30}\right)^2.
\]
(Finish on the chalkboard to verify n ≈ 71; an R check appears after the summary below.)

What you should know

You should know:
- what the definitions of power and significance level are;
- how to find sample sizes for desired sizes of confidence intervals, for both proportions and means;
- how to find sample sizes with desired power against specific alternative hypotheses, for both proportions and means;
- how to examine and interpret a power curve;
- how power curves change as n changes.
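To close, here is a short R check of the n ≈ 71 result from the sample size problem above; the code is mine, not from the slides, and the power.t.test() call is only an approximate cross-check since it uses the t rather than the normal distribution.

# Normal-approximation sample size: two-sided test, alpha = 0.05, 80% power
# to detect a shift from mu0 = 3.35 to mu = 3.30 when sigma = 0.15
sigma = 0.15
delta = 3.35 - 3.30
z.alpha = qnorm(0.975)    # 1.96
z.power = qnorm(0.80)     # 0.84
ceiling((sigma * (z.alpha + z.power) / delta)^2)   # 71

# Built-in cross-check based on the t distribution (gives a slightly larger n)
power.t.test(delta = delta, sd = sigma, sig.level = 0.05, power = 0.80,
             type = "one.sample", alternative = "two.sided")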