Chapter 8: Introduction to Hypothesis Testing

We're now at the point where we can discuss the logic of hypothesis testing. This procedure will underlie the statistical analyses that we'll use for the remainder of the semester, so it's important that you understand the logic.

A hypothesis test is a statistical method that uses sample data to evaluate a hypothesis about a population parameter. The procedure is to first state a hypothesis about a population parameter, such as µ; next, to collect sample data that address the hypothesis; and then to determine the extent to which the sample data are consistent with the hypothesized value of the parameter.

Here's an illustration of the logic of null hypothesis significance testing (NHST). Suppose that a population of 2-year-old children has a mean weight of µ = 26 pounds with σ = 4. If you were to administer a treatment (extra handling) to every member of the population, what would happen to the scores? The effect of the treatment is modeled as an additive constant (adding either a positive or a negative constant to every score). If you recall the impact of an additive constant on variability, you'll realize that the standard deviation would stay the same. If the treatment has no effect, the treatment constant would be zero and the population mean would be unchanged. Schematically, the untreated population is normal with µ = 26 and σ = 4, and the treated population is normal with the same σ = 4 and a mean of 26 plus the treatment constant.

So the question becomes, "Was the treatment effect zero, or was it greater than zero?" To test that question, we construct a testable statistical hypothesis, called the null hypothesis (H0). In this case, H0: µ = 26. But, of course, we cannot treat and measure every member of the population. Instead, we will take a sample (e.g., n = 16) and give them extra handling. If the mean weight of the sample is near 26 pounds, the handling treatment will likely be considered ineffective. To the extent that the mean weight of the sample is much larger (or smaller) than 26, we would be inclined to think that the handling treatment was effective. The crucial question is, "How much must the sample mean differ from 26 pounds to convince us that the treatment was effective?"
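To make that question concrete, here is a minimal simulation sketch (not part of the original handout; it assumes NumPy is available). It draws many samples of n = 16 from the untreated population, N(µ = 26, σ = 4), and shows how tightly sample means cluster around 26 when H0 is true.

```python
# A minimal sketch, assuming NumPy is available: what sample means look like
# when H0 is true (i.e., the handling treatment does nothing).
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 26, 4, 16

# Draw 10,000 samples of n = 16 children and record each sample mean.
sample_means = rng.normal(mu, sigma, size=(10_000, n)).mean(axis=1)

print(round(sample_means.mean(), 2))   # close to mu = 26
print(round(sample_means.std(), 2))    # close to sigma / sqrt(n) = 1
```

Most of the simulated means fall within a couple of pounds of 26; a sample mean far outside that range is exactly the kind of "weird" outcome that the alpha level, introduced next, formalizes.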

The conventional way of deciding whether the treatment was effective is to establish a level of significance, or alpha level. An alpha level represents your willingness to say that a sample mean was not drawn from the population specified by H0 when in fact it was. In other words, what level of error are you willing to tolerate? (This particular kind of error is called a Type I error, but more on that a bit later.)

Another way of thinking about the alpha level is that it defines the "weird" or unusual sample means that would occur when H0 is true. The typical alpha level is .05 (α = .05), which means that "weird" is defined as a sample mean that would occur so infrequently as to lie in the lower or upper 2.5% of the sampling distribution. Alternatively, you could think of it this way: if your sample mean falls in the lower or upper 2.5% of the distribution that holds when H0 is true, you would be better off concluding that H0 is false. As you may recall, the z-scores that cut off the lower and upper .025 of a normal distribution are −1.96 and +1.96 (the extreme 2.5% in each tail of the standard normal curve).

After you've determined your null and alternative hypotheses and your significance level (typically .05), you're ready to collect your sample and compute the sample mean. With the sample mean in hand, you can compute a z-score:

   z = (X̄ − µ) / σ_X̄ = (obtained difference) / (difference expected by chance),   where σ_X̄ = σ / √n

Finally, given your z-score, you can make a decision about the null hypothesis. If the sample mean leads to a z-score that falls in the critical region, you decide to reject H0. If the sample mean leads to a z-score that does not fall in the critical region, you fail to reject, or retain, H0.
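Here is a sketch of that decision rule (assuming SciPy is available; the sample mean of 27.5 used below is a made-up value for illustration, not a number from the handout):

```python
# Two-tailed z test at alpha = .05: compute z and compare it to the
# critical values that cut off the extreme 2.5% in each tail.
from scipy.stats import norm

alpha = 0.05
z_crit = norm.ppf(1 - alpha / 2)     # approximately 1.96

def z_statistic(xbar, mu0, sigma, n):
    """(obtained difference) / (difference expected by chance)."""
    sigma_xbar = sigma / n ** 0.5    # standard error of the mean
    return (xbar - mu0) / sigma_xbar

z = z_statistic(xbar=27.5, mu0=26, sigma=4, n=16)   # hypothetical sample mean
print("reject H0" if abs(z) >= z_crit else "fail to reject H0")   # z = 1.5: fail to reject
```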

Uncertainty and Errors in Hypothesis Testing

One of the reasons we never talk about "proving" anything in science is that we recognize the ubiquity of uncertainty. Because we can never know the truth independently of our research, we can never be certain of our conclusions. Thus, when you decide to reject H0, you need to be aware that H0 could really be false, in which case you have made a correct rejection. It's also possible, however, that you rejected H0 when it is really true. If so, you've made an error; we call that error a Type I error. You should recognize that the significance level you choose is an expression of your tolerance for a Type I error.

When you decide to retain H0, it may well be that H0 is true, which is a correct retention. It's also possible, however, that H0 is false. In that case, retaining H0 would be an error, a Type II error. You can summarize the four possibilities in a table:

                             Actual Situation
   Experimenter's Decision     H0 True              H0 False
   Reject H0                   Type I Error         Correct Rejection
   Retain H0                   Correct Retention    Type II Error

Note that if you make the decision to reject H0, you cannot make a Type II error.

Example of Hypothesis Testing with a z-score

1. Question: Does prenatal alcohol have an impact on birth weight?

2. What are the population characteristics normally? µ = 18 grams, σ = 4.

3. State the null and alternative hypotheses, as well as the α level:
   H0: µ = 18
   H1: µ ≠ 18
   α = .05, so if |z| ≥ 1.96, reject H0.

4. Collect the data and compute the test statistic. With a sample of n = 16 and a sample mean of X̄ = 15:

   z = (X̄ − µ) / σ_X̄ = (15 − 18) / 1 = −3.0,   where σ_X̄ = σ / √n = 4 / √16 = 1

5. Make a decision: Because the obtained z of −3.0 falls in the critical region (|z| ≥ 1.96), I would reject H0 and conclude that the effect of prenatal alcohol is to reduce birth weight. (Note that I could be making a Type I error.)
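Carrying that example through numerically, here is a short sketch (not part of the original handout) of the same calculation:

```python
# Worked example: H0: mu = 18, sigma = 4, n = 16, sample mean = 15.
mu0, sigma, n, xbar = 18, 4, 16, 15

sigma_xbar = sigma / n ** 0.5          # 4 / sqrt(16) = 1
z = (xbar - mu0) / sigma_xbar          # (15 - 18) / 1 = -3.0

if abs(z) >= 1.96:
    print(f"z = {z:.1f}: reject H0 (risking a Type I error)")
else:
    print(f"z = {z:.1f}: retain H0 (risking a Type II error)")
```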

Assumptions underlying hypothesis tests with z-scores

You will actually see very few hypothesis tests that use z-scores, because doing so requires that you know σ, which is a highly unusual circumstance. Beyond that major assumption, there are other assumptions of note:

1. Participants are randomly sampled. (And that almost never happens in psychological research!)
2. Observations are independent of one another. Random sampling will typically produce observations that are independent of one another.
3. The value of σ is not affected by the treatment. Even in the rare circumstances under which we actually know σ, it's crucial that the treatment have no impact on σ; otherwise the test statistic will be thrown off.
4. The sampling distribution of the mean is normal. Of course, with increasingly large sample sizes (approaching infinity), the sampling distribution of the mean becomes normal.

Effect Size

Some people have expressed reservations about null hypothesis significance testing, which your text's authors detail (and which you should read and think about). A crucial question that is not addressed by a significance test is the size of the treatment effect. That effect can be assessed by a number of measures, one of which is Cohen's d. You compute this measure of effect size as follows:

   d = (mean difference) / (standard deviation) = (X̄ − µ) / σ

A value of d between 0 and 0.2 is considered a small effect, a value between 0.2 and 0.8 a medium effect, and a value greater than 0.8 a large effect.

Statistical Power

Throughout this course you will be exposed to the notion of power. The power of a test is the probability that the test will correctly reject a false null hypothesis. It is the complement of the Type II (β) error rate, so power is defined as 1 − β.

Several factors influence power, but for now you should think about the impact of the size of the treatment effect on power. In the example of the impact of prenatal alcohol on birth weight, if the alcohol had a greater impact, the mean weight of the sample would be smaller still, leading to an even more extreme z-score. In the figures that accompany these notes, a sample drawn from a population with µ = 240 yields greater power than a sample drawn from a population with µ = 220. (Of course, in the real world, you'd never know the µ of the population from which your sample was drawn.)
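The following sketch (assuming SciPy; not part of the original handout) computes Cohen's d for the prenatal-alcohol example and shows how the power of a two-tailed z test grows as the true population mean moves farther from the value specified by H0:

```python
from scipy.stats import norm

def cohens_d(mean_diff, sigma):
    """Effect size: mean difference divided by the population standard deviation."""
    return mean_diff / sigma

def z_test_power(mu0, mu_true, sigma, n, alpha=0.05):
    """Probability of rejecting H0 when the sample really comes from a
    population with mean mu_true (two-tailed z test)."""
    sigma_xbar = sigma / n ** 0.5
    z_crit = norm.ppf(1 - alpha / 2)
    shift = (mu_true - mu0) / sigma_xbar
    # Chance that the obtained z lands beyond either critical value.
    return norm.cdf(-z_crit - shift) + (1 - norm.cdf(z_crit - shift))

# Prenatal-alcohol example: d = (15 - 18) / 4 = -0.75, a medium effect
# by the conventions above.
print(cohens_d(15 - 18, 4))

# Power increases as the true mean moves farther below the H0 value of 18.
print(round(z_test_power(mu0=18, mu_true=16, sigma=4, n=16), 2))   # about 0.52
print(round(z_test_power(mu0=18, mu_true=15, sigma=4, n=16), 2))   # about 0.85
```

Since β is simply 1 minus whatever z_test_power returns, anything that pushes power up (a larger effect, a larger n, a more lenient α) pushes the Type II error rate down.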

Problems:

1. In earlier notes, we used normally distributed gestation times (µ = 268, σ = 16) to address various questions. In a similar fashion, we can test hypotheses using that distribution. For example, suppose that you had a sample of 16 women and their mean gestation period was 274 days. How likely is it that your sample was randomly drawn from the population of gestation periods? Would your conclusion change if your sample size were n = 64? What would the effect size (d) be in both cases?

   H0:
   H1:
   α = .05

   n = 16                        n = 64
   Compute statistics:           Compute statistics:
   Decision:                     Decision:
   Conclusion:                   Conclusion:

2. Given that IQ scores are normally distributed with µ = 100 and σ = 15, how likely is it that a sample of n = 25 students with M = 109 was randomly drawn from that population? What is the effect size (d) in this case? (A small computational sketch for checking your work on Problems 1 and 2 appears after Problem 5.)

3. What is the relationship between effect size and power? If you have a great deal of power, do you need to worry about effect size? If you have a small effect size, do you need to worry about power? Provide examples to make your points.

4. Can you relate null hypothesis significance testing (NHST) to signal detection theory (SDT)? What concept in NHST might you relate to d′ in SDT?

5. What's the major limitation to conducting hypothesis tests with z-scores? How might you surmount that difficulty?
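If you want to check your hand calculations for Problems 1 and 2, a sketch like the following could help (the helper function is illustrative, not part of the original handout; drawing the decisions and conclusions is still up to you):

```python
def z_and_d(mu0, sigma, n, xbar):
    """Return the z statistic and Cohen's d for a one-sample z test."""
    sigma_xbar = sigma / n ** 0.5
    return (xbar - mu0) / sigma_xbar, (xbar - mu0) / sigma

# Problem 1: gestation times, mu = 268, sigma = 16, sample mean 274 days.
print(z_and_d(268, 16, 16, 274))    # n = 16
print(z_and_d(268, 16, 64, 274))    # n = 64

# Problem 2: IQ scores, mu = 100, sigma = 15, n = 25, M = 109.
print(z_and_d(100, 15, 25, 109))
```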