6: Introduction to Hypothesis Testing



Similar documents
Introduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing.

Hypothesis testing - Steps

Name: Date: Use the following to answer questions 3-4:

Introduction to Hypothesis Testing. Hypothesis Testing. Step 1: State the Hypotheses

Chapter Study Guide. Chapter 11 Confidence Intervals and Hypothesis Testing for Means

Section 7.1. Introduction to Hypothesis Testing. Schrodinger s cat quantum mechanics thought experiment (1935)

HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1. used confidence intervals to answer questions such as...

Testing a claim about a population mean

HYPOTHESIS TESTING: POWER OF THE TEST

HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1. used confidence intervals to answer questions such as...

Tutorial 5: Hypothesis Testing

Testing Hypotheses About Proportions

Chapter 8 Hypothesis Testing Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing

p-values and significance levels (false positive or false alarm rates)

DDBA 8438: Introduction to Hypothesis Testing Video Podcast Transcript

t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon

THE FIRST SET OF EXAMPLES USE SUMMARY DATA... EXAMPLE 7.2, PAGE 227 DESCRIBES A PROBLEM AND A HYPOTHESIS TEST IS PERFORMED IN EXAMPLE 7.

An Introduction to Statistics Course (ECOE 1302) Spring Semester 2011 Chapter 10- TWO-SAMPLE TESTS

Correlational Research

5/31/2013. Chapter 8 Hypothesis Testing. Hypothesis Testing. Hypothesis Testing. Outline. Objectives. Objectives

Chapter 23 Inferences About Means

Hypothesis Testing --- One Mean

Two-Sample T-Tests Assuming Equal Variance (Enter Means)

Hypothesis Test for Mean Using Given Data (Standard Deviation Known-z-test)

22. HYPOTHESIS TESTING

Two-Sample T-Tests Allowing Unequal Variance (Enter Difference)

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

HYPOTHESIS TESTING WITH SPSS:

Introduction to Hypothesis Testing OPRE 6301

Descriptive Statistics

Recall this chart that showed how most of our course would be organized:

Non-Inferiority Tests for One Mean

WISE Power Tutorial All Exercises

MONT 107N Understanding Randomness Solutions For Final Examination May 11, 2010

Two Related Samples t Test

CHANCE ENCOUNTERS. Making Sense of Hypothesis Tests. Howard Fincher. Learning Development Tutor. Upgrade Study Advice Service

Hypothesis Testing. Steps for a hypothesis test:

Online 12 - Sections 9.1 and 9.2-Doug Ensley

A POPULATION MEAN, CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

How To Check For Differences In The One Way Anova

II. DISTRIBUTIONS distribution normal distribution. standard scores

Hypothesis Testing. Reminder of Inferential Statistics. Hypothesis Testing: Introduction

Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means

Regression Analysis: A Complete Example

Psychology 60 Fall 2013 Practice Exam Actual Exam: Next Monday. Good luck!

Chapter 9. Two-Sample Tests. Effect Sizes and Power Paired t Test Calculation

Sample Size and Power in Clinical Trials

BA 275 Review Problems - Week 6 (10/30/06-11/3/06) CD Lessons: 53, 54, 55, 56 Textbook: pp , ,

STAT 145 (Notes) Al Nosedal Department of Mathematics and Statistics University of New Mexico. Fall 2013

Stats Review Chapters 9-10

Hypothesis testing. c 2014, Jeffrey S. Simonoff 1

Introduction. Hypothesis Testing. Hypothesis Testing. Significance Testing

3.4 Statistical inference for 2 populations based on two samples

Mind on Statistics. Chapter 12

Stats for Strategy Fall 2012 First-Discussion Handout: Stats Using Calculators and MINITAB

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Testing Group Differences using T-tests, ANOVA, and Nonparametric Measures

Chapter 26: Tests of Significance

COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES.

12: Analysis of Variance. Introduction

= $96 = $24. (b) The degrees of freedom are. s n For the mean monthly rent, the 95% confidence interval for µ is

C. The null hypothesis is not rejected when the alternative hypothesis is true. A. population parameters.

Introduction to Hypothesis Testing

2013 MBA Jump Start Program. Statistics Module Part 3

General Method: Difference of Means. 3. Calculate df: either Welch-Satterthwaite formula or simpler df = min(n 1, n 2 ) 1.

Chapter 2 Probability Topics SPSS T tests

Study Guide for the Final Exam

Experimental Design. Power and Sample Size Determination. Proportions. Proportions. Confidence Interval for p. The Binomial Test

research/scientific includes the following: statistical hypotheses: you have a null and alternative you accept one and reject the other

Testing for differences I exercises with SPSS

Chapter 7. Comparing Means in SPSS (t-tests) Compare Means analyses. Specifically, we demonstrate procedures for running Dependent-Sample (or

Calculating P-Values. Parkland College. Isela Guerra Parkland College. Recommended Citation

Normality Testing in Excel

Hypothesis Testing for Beginners

Pearson's Correlation Tests

Chapter 8: Hypothesis Testing for One Population Mean, Variance, and Proportion

In the past, the increase in the price of gasoline could be attributed to major national or global

Hypothesis Testing. Hypothesis Testing

Chapter 7: Simple linear regression Learning Objectives

Unit 27: Comparing Two Means

Comparing Two Groups. Standard Error of ȳ 1 ȳ 2. Setting. Two Independent Samples

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

Chi Square Tests. Chapter Introduction

2. Simple Linear Regression

Independent t- Test (Comparing Two Means)

Chapter 23. Inferences for Regression

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:

KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management

1-3 id id no. of respondents respon 1 responsible for maintenance? 1 = no, 2 = yes, 9 = blank

Mind on Statistics. Chapter 13

NCSS Statistical Software

Data Analysis Tools. Tools for Summarizing Data

Chapter 5 Analysis of variance SPSS Analysis of variance

Chapter 7 Section 7.1: Inference for the Mean of a Population

STATISTICS 8, FINAL EXAM. Last six digits of Student ID#: Circle your Discussion Section:

Opgaven Onderzoeksmethoden, Onderdeel Statistiek

Module 2 Probability and Statistics

Transcription:

6: Introduction to Hypothesis Testing Significance testing is used to help make a judgment about a claim by addressing the question, Can the observed difference be attributed to chance? We break up significance testing into three (or four) steps: Step A: Null and alternative hypotheses The first step of hypothesis testing is to convert the research question into null and alterative hypotheses. We start with the null hypothesis (H 0 ). The null hypothesis is a claim of no difference. The opposing hypothesis is the alternative hypothesis (H 1 ). The alternative hypothesis is a claim of a difference in the population, and is the hypothesis the researcher often hopes to bolster. It is important to keep in mind that the null and alternative hypotheses reference population values, and not observed statistics. Step B: Test statistic We calculate a test statistic from the data. There are different types of test statistics. This chapter introduces the one-sample z-statistics. The z statistic will compare the observed sample mean to an expected population mean μ 0. Large test statistics indicate data are far from expected, providing evidence against the null hypothesis and in favor of the alternative hypothesis. Step C: p Value and conclusion The test statistic is converted to a conditional probability called a P-value. The P- value answers the question If the null hypothesis were true, what is the probability of observing the current data or data that is more extreme? Small p values provide evidence against the null hypothesis because they say the observed data are unlikely when the null hypothesis is true. We apply the following conventions: o o o o When p value >.10 the observed difference is not significant When p value.10 the observed difference is marginally significant When p value.05 the observed difference is significant When p value.01 the observed difference is highly significant Use of significant in this context means the observed difference is not likely due to chance. It does not mean of important or meaningful. Step D: Decision (optional) Alpha (α) is a probability threshold for a decision. If P α, we will reject the null hypothesis. Otherwise it will be retained for want of evidence. Page 6.1 (C:\data\StatPrimer\hyp-test.doc Last printed 7/13/2006 10:59:00 PM)

One-Sample z Test The one-sample z test is used to compare a mean from a single sample to an expected norm. The norm for the test comes from a hypothetical value or observations in prior studies, and does not come from the current data. In addition, this test is used only when the population standard deviation σ is known from a prior source. Finally, data represent a SRS, and measurements that comprise the data are assumed to be accurate and meaningful. Example ( Lake Wobegon ). Garrison Keller claims the children of Lake Wobegon are above average. You take a simple random sample of 9 children from Lake Wobegon and measure their intelligence with a Wechsler test and find the following scores: {116, 128, 125, 119, 89, 99, 105, 116, and 118}. The mean of this sample ( x ) is 112.8. We know Wechsler scores are scaled to be Normally distributed with a mean of 100 and standard deviation of 15. Is this sample mean sufficiently different from a population mean µ of 100 to reject the null hypothesis of no difference? The null and alternative hypotheses The claim being made in the illustrative example is that the population has higher than average intelligence. The null hypothesis is the population has average intelligence. Since an average intelligence score is 100, H 0 : μ 100. The alternative hypothesis claims the population has a higher than average intelligence. Therefore, H 1 : μ > 100. (The alternative hypothesis resembles the claim the investigator wishes to bolster.) It would be incorrect to state H 0 : x 100. Inferential statements address the population, not the sample. The alternative hypothesis is one-sided. It is interested only in whether the population has a higher average core. It is not interested a lower than average score. Test statistic We use our knowledge of sampling distributions of the means (SDM) to help make judgments. Assuming the null hypothesis is true, the sampling distribution of x based on n 9 would be Normal with a mean of 100 and standard error of σ / n 15/ 9 5. Therefore, under H 0, x ~ N(100, 5). Page 6.2 (C:\data\StatPrimer\hyp-test.doc Last printed 7/13/2006 10:59:00 PM)

The observed x of 112.8 is out in the right-tail of the SDM. The test statistic compares x to the hypothesized value (µ 0 ) as follows: x μ SEM z stat 0 where SEM represents the standard error of the mean and is equal to σ / n. The population standard deviation σ must be known in order to use this test statistic. (The z stat quantifies how far x is from μ 0 in standard deviation units.) For the illustrative example, 112.8 100 z stat 2.56. 5 P-value and conclusion To convert a z stat to a P-value, find the area under the curve beyond the z stat on a Standard Normal distribution. Use the Z table or a statistical package (e.g., StaTable) for this purpose. For the current problem we have: Therefore, P 0.0052. This provides good evidence against H 0. The jargon is to say the difference is significant. You can reject H 0. Page 6.3 (C:\data\StatPrimer\hyp-test.doc Last printed 7/13/2006 10:59:00 PM)

Two-Sided Alternative The two-sided z test makes no presupposition about the direction of the difference. The null hypothesis is the same as in the one-sided test: H 0 : μ μ 0. The alternative hypothesis is H 1 : μ μ 0. The test statistic is the same as the one-sample z test. Because we must consider sample means that might be above or below μ 0, the p value is two-tailed. Illustrative example (Two-sided alternative). We return to the Lake Wobegon illustrative example, where children are assumed to be above average. The two-sided alternative allows for unanticipated findings that are either up or down from expected. Let μ 0 represent the expected value of the population mean under the null hypothesis. In the Lake Wobegon example, μ 0 100. Therefore, we test H 0 : μ 100 versus H 1 : μ 100. x μ 112.8 100 The test statistic is z 0 2. 56 5 stat SEM (same as before). With two-sided alternatives, we reject the null hypothesis in favor of the alternative if the sample mean is either significantly greater or less than μ 0. Rejection regions for the test statistic lie in both tails of SDM. The effect is to double the size of the one-sided p value. The one-sided p value for the Lake Wobegon illustration was.0052. Therefore, the two-sided p value 2.0052.0104. The P-value of 0.0104 provides good evidence against the null hypothesis. Page 6.4 (C:\data\StatPrimer\hyp-test.doc Last printed 7/13/2006 10:59:00 PM)

Fallacies of Statistical Hypothesis Testing The results of statistical tests are frequently misunderstood. Therefore, I m going to list some of the fallacies of hypothesis testing here. It will be helpful to refer back to this list as you grapple with the interpretation of hypothesis tests results. 1. Failure to reject the null hypothesis leads to its acceptance. (WRONG! Failure to reject the null hypothesis implies insufficient evidence for its rejection.) 2. The p value is the probability that the null hypothesis is incorrect. (WRONG! The p value is the probability of the current data or data that is more extreme assuming H 0 is true.) 3. α.05 is a standard with an objective basis. (WRONG! α.05 is merely a convention that has taken on unwise mechanical use. There is no sharp distinction between significant and insignificant results, only increasingly strong evidence as the p value gets smaller. Surely god loves p.06 nearly as much as p.05) 4. Small p values indicate large effects. (WRONG! p values tell you next to nothing about the size of an effect.) 5. Data show a theory to be true or false. (WRONG! Data can at best serve to bolster or refute a theory or claim.) 6. Statistical significance implies importance. (WRONG! WRONG! WRONG! Statistical significance says very little about the importance of a relation.) Page 6.5 (C:\data\StatPrimer\hyp-test.doc Last printed 7/13/2006 10:59:00 PM)

One-Sample t Test The prior section used a z stat to test a sample mean against an expectation. The z stat needed population standard deviation σ (without estimating it from the data) to determine the standard error of the mean. To conduct a one-sample test when the population standard deviation is not known, we use a variant of the z stat called the t stat. The advantage of the t stat is that it can use sample standard deviation s instead of σ to formulate the estimated standard error of the mean. Hypotheses: The null and alternative hypotheses are identical to those used by the z test. The null hypothesis is H 0 : μ μ 0. Alternatives are H 1 : μ μ 0 (two-sided) H 1 : μ > μ 0 (one-sided to right) H 1 : μ < μ 0 (one-sided to left) Test statistic: The one-sample t statistic is: x μ sem t stat 0 where x represents the sample mean, μ 0 represents the expected value under the null hypothesis, and sem s / n. This statistic has n 1 degrees of freedom. P-value and conclusion: The t stat is converted to a p value with a computer program or t table. When using the t table, you will only be able to find boundaries for the p value. Small values of P provide evidence against H 0. Illustrative Example (%ideal.sav). These data were introduced in the prior chapter. Briefly, each value represents body weight expressed as a percentage of ideal (e.g., 100 represents 100% of ideal). We want to test whether data provides sufficient evidence to support a non-ideal body weight in the population. Hypotheses. If the mean body weight in the population was ideal, then μ would equal 100. Therefore, H 0 : μ 100. The test can be one-sided or two-sided. Because we are prudent, we choose the two-sided alternative, which is H 0 : μ 100. Test statistic. Eighteen (n 18) subjects demonstrate the following data {107, 119, 99, 114, 120, 104, 88, 114, 124, 116, 101, 121, 152, 100, 125, 114, 95, 117}. The sample mean x 112.778. The sample standard deviation s 14.424. The sem 14.424 / 18 3.400. 112.778 100 The t stat 3.76 with df 18 1 17. The t stat of 3.76 shows that the sample 3.400 mean of 112.778 is 3.76 standard errors greater than the null value. P-Value and conclusion. The two-sided P-value 0.0016, indicating that values as far from 100 as x 112.228 would occur 0.16% of the time if H 0 were true. This provides good evidence against H 0. We reject H 0 : μ 100 and conclude the difference is significant. Here s output from SPSS for the problem: Page 6.6 (C:\data\StatPrimer\hyp-test.doc Last printed 7/13/2006 10:59:00 PM)

If software is unavailable, use a t table to convert the t stat to the P-value. Follow this procedure: o Draw a t curve. The t distribution curve is centered on 0 with inflection points at ±1. Label the X-axis with tick marks approximately one standard deviation from each other. By the time you get past ±3 standard deviations, the curve should almost be touching the X-axis as an asymptote. o Place t stat on the horizontal axis of the curve in its approximate location. Shade the region in the right-tail. For two-sided alternatives, place t stat on the negative side of the curve. o Use the t table to find t landmarks just to the left and just to the right of the t stat. The one-tailed P-value is between these two values. Double the one-side p value to get the two-sided p value. Here s the visualization for the illustrative example: The t stat (3.76) is between critical values of 3.65 (right-tail region of.001) and 3.97 (right-tail region of.0005). Therefore, the one-tail P-value is greater than 0.0005 and less than 0.001. The two-tail P-value is greater than 0.001 and less than 0.002. Page 6.7 (C:\data\StatPrimer\hyp-test.doc Last printed 7/13/2006 10:59:00 PM)

Vocabulary Null hypothesis (H 0 ) - A statement that declares the observed difference is due to chance. It is the hypothesis the researcher hopes to reject. Alternative hypothesis (H 1 ) - The opposite of the null hypothesis. The hypothesis the researcher hopes to bolster. Alpha (α) - The probability the researcher is willing to take in falsely rejecting a true null hypothesis. Test statistic - A statistic used to test the null hypothesis. P-value - A probability statement that answers the question If the null hypothesis were true, what is the probability of observing the current data or data that is more extreme than the current data?. It is the probability of the data conditional on the truth of H0. It is NOT the probability that the null hypothesis is true. Type I error - a rejection of a true null hypothesis; a false alarm. Type II error - a retention of an incorrect null hypothesis; failure to sound the alarm. Confidence (1 - α) - the complement of alpha. Beta (β) - the probability of a type II error; probability of a retaining a false null hypothesis. Power (1 - β) - the complement of β; the probability of avoiding a type II error; the probability of rejecting a false null hypothesis. Comment: Statistical hypothesis testing is not the same thing as scientific hypothesis testing. Joseph Goldberger (1923) studied the cause of pellagra in a three year study completed from 1914-1917. The study was carried out at four orphanages and in a state sanitarium at a time when pellagra was endemic. Goldberger modified diets at the institutions by reducing maize content and adding legumes and fresh animal protein (meat, milk, and eggs). More than four hundred pellagrins and non-pellagrins were studied. There was a single recurrence among the pellagrins and not a single case among the non-pellagrins following initiation of dietary changes. Following discontinuation of the study, diets returned to their former state, and so did the state of health of the inmates: the incidence of pellagra returned to approximately 40 percent. Resumption of the modified diet for 14 months was followed by complete disappearance of the disease. This type of challenge-dechallange-rechallenge study did warranted a statistical test; the conclusion that pellagra was prevented by a diet high in legumes and animal protein was clear. Page 6.8 (C:\data\StatPrimer\hyp-test.doc Last printed 7/13/2006 10:59:00 PM)