The Null Hypothesis. Geoffrey R. Loftus, University of Washington




The Null Hypothesis
Geoffrey R. Loftus, University of Washington

Send correspondence to:
Geoffrey R. Loftus
Department of Psychology, Box 351525
University of Washington
Seattle, WA 98195-1525
gloftus@u.washington.edu
(206) 543-8874

Loftus, G.R. Page 2 of 8 2/26/09

In many sciences, including, for example, ecology, medicine, and psychology, null hypothesis significance testing (NHST) is the primary means by which the numbers comprising the data from some experiment are translated into conclusions about the question(s) that the experiment was designed to address. In this entry, I make three main points. First, I provide a brief description of NHST and, within the context of NHST, define the most common incarnation of a null hypothesis. Second, I sketch other, less common forms of a null hypothesis. Third, I articulate a number of problems with using null hypothesis-based data analysis procedures.

NHST and the Null Hypothesis

Most experiments entail measuring the effect(s) of some number of independent variables on some dependent variable.

An example experiment

In the simplest sort of experimental design, one measures the effect of a single independent variable, say amount of information held in short-term memory, on a single dependent variable, say reaction time to scan through this information. To pick a somewhat arbitrary example from cognitive psychology, consider what is known as a Sternberg experiment, in which a short sequence of memory digits (e.g., "34291") is read to an observer, who must then decide whether a single, subsequently presented test digit was part of the sequence. Thus, for instance, given the memory digits above, the correct answer would be "yes" for a test digit of 2 but "no" for a test digit of 8. The independent variable of amount of information held in short-term memory can be implemented by varying set size, which is the number of memory digits presented: in different conditions, set size might be, say, 1, 3, 5 (as in the example), or 8 presented memory digits. The number of different set sizes (here 4) is more generally referred to as the number of levels of the independent variable.
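The design just described can be made concrete with a toy simulation. The code below is a sketch, not real data: the assumed population means (400 ms plus 35 ms per memory digit) and the noise level are invented purely for illustration.

```python
import numpy as np

# Toy simulation of the Sternberg design described above.  The population
# means are assumptions invented for illustration (400 ms base rate plus
# 35 ms per memory digit); nothing here comes from real data.
rng = np.random.default_rng(0)
set_sizes = [1, 3, 5, 8]     # the four levels of the independent variable
n_observers = 30             # observers contributing to each condition

sample_means = {}            # the observable sample mean M_j for each level j
for j, s in enumerate(set_sizes, start=1):
    mu_j = 400 + 35 * s                       # unobservable population mean
    rts = rng.normal(mu_j, 60, n_observers)   # simulated reaction times (ms)
    sample_means[j] = rts.mean()
    print(f"level {j} (set size {s}): M_{j} = {sample_means[j]:.1f} ms")
```

Each printed M_j will sit near, but not exactly at, the population mean it estimates; that gap between sample and population means is what the rest of the entry turns on.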
The dependent variable is the reaction time, measured from the appearance of the test digit to the observer's response. Of interest in general is the degree to which the magnitude of the dependent variable (here, reaction time) depends on the level of the independent variable (here, set size).

Sample and population means

Typically, the principal dependent variable takes the form of a mean. In this example, mean reaction time for a given set size could be computed across observers. Such a computed mean is called a sample mean, referring to its having been computed across an observed sample of numbers. A sample mean is construed as an estimate of a corresponding population mean, which is what the mean value of the dependent variable would be if all observers in the relevant population were to participate in a given condition of the experiment. Generally, conclusions from experiments are meant to apply to population means. Therefore, the measured sample means are only interesting insofar as they are estimates of the corresponding population means. Notationally, the sample means are referred to as the M_j's, while the population means are referred to as the µ_j's. For both sample and population means, the subscript j indexes the level of the independent variable; thus, in our example, M_2 would refer to the observed mean reaction time at the second set-size level, i.e., set size = 3, and likewise, µ_2 would refer to the corresponding, unobservable population mean reaction time for set size = 3.

Two competing hypotheses

NHST entails establishing and evaluating two mutually exclusive and exhaustive hypotheses about the relation between the independent variable and the dependent variable. Usually, and in its simplest form, the null hypothesis (abbreviated H0) is that the independent variable has no effect on the dependent variable, while the alternative hypothesis (abbreviated H1) is that the independent variable has some effect on the dependent variable. Note an important asymmetry between a null hypothesis and an alternative hypothesis: a null hypothesis is an exact hypothesis, while an alternative hypothesis is an inexact hypothesis. By this is meant that a null hypothesis can be correct in only one way, viz., the µ_j's are all equal to one another, while there are an infinite number of ways in which the µ_j's can differ from one another, i.e., an infinite number of ways in which an alternative hypothesis can be true.

Decisions based on data

Having established a null and an alternative hypothesis that are mutually exclusive and exhaustive, the experimental data are used to (roughly speaking; see Point 2 below) decide between them. The technical manner by which one makes such a decision is beyond the scope of this entry, but two remarks about the process are appropriate here.

1. A major ingredient in the decision is the variability of the M_j's. To the degree that the M_j's are close to one another, evidence ensues for possible equality of the µ_j's and, ipso facto, for the validity of the null hypothesis. Conversely, to the degree that the M_j's differ from one another, evidence ensues for associated differences among the µ_j's and, ipso facto, for the validity of the alternative hypothesis.

2. The asymmetry between the null hypothesis (which is exact) and the alternative hypothesis (which is inexact) sketched above implies an associated asymmetry in conclusions about their validity. If the M_j's differ sufficiently, one rejects the null hypothesis in favor of accepting the alternative hypothesis. However, if the M_j's do not differ sufficiently, one does not accept the null hypothesis; rather, one fails to reject the null hypothesis. The reason for the awkward, but logically necessary, wording of the latter conclusion is that, because the alternative hypothesis is inexact, one cannot generally distinguish a genuinely true null hypothesis on the one hand from an alternative hypothesis entailing very small differences among the µ_j's on the other.

Multifactor designs: Multiple null hypothesis-alternative hypothesis pairings

So far I have described a simple design in which the effect of a single independent variable on a single dependent variable is examined. Many, if not most, experiments utilize multiple independent variables and are known as multifactor designs ("factor" and "independent variable" are synonymous). Continuing with the example experiment, imagine that in addition to measuring effects of set size on reaction time in a Sternberg task, one also wanted to simultaneously measure effects on reaction time of the test digit's visual contrast (informally, the degree to which the test digit stands out against the background). One might then factorially combine the four levels of set size (now called Factor 1) with, say, two levels, high contrast and low contrast, of test-digit contrast (now called Factor 2). Combining the four set-size levels with the two test-digit contrast levels would yield 4 x 2 = 8 separate conditions. Typically, three independent NHST procedures would then be carried out, entailing three null hypothesis-alternative hypothesis pairings:

1. For the set-size main effect:
H0: Averaged over the two test-digit contrasts, there is no set-size effect.
H1: Averaged over the two test-digit contrasts, there is a set-size effect.

2. For the test-digit contrast main effect:
H0: Averaged over the four set sizes, there is no test-digit contrast effect.
H1: Averaged over the four set sizes, there is a test-digit contrast effect.

3. For the set-size x test-digit contrast interaction: Two independent variables are said to interact if the effect of one independent variable depends on the level of the other independent variable. As with the main effects, interaction effects are immediately identifiable with respect to the M_j's; however, again as with the main effects, the goal is to decide whether interaction effects exist with respect to the corresponding µ_j's. As with the main effects, NHST involves pitting a null hypothesis against an associated alternative hypothesis.
H0: With respect to the µ_j's, set size and test-digit contrast do not interact.
H1: With respect to the µ_j's, set size and test-digit contrast do interact.

The logic of carrying out NHST with respect to interactions is the same as the logic of carrying out NHST with respect to main effects. In particular, with interactions as with main effects, one can reject a null hypothesis of no interaction, but one cannot accept a null hypothesis of no interaction.

Non-Zero-Effect Null Hypotheses

The null hypotheses described above imply no effect of one sort or another: either no main effect of some independent variable, or no interaction between two independent variables. This kind of no-effect null hypothesis is by far the most common null hypothesis to be found in the literature. Technically, however, a null hypothesis can be any exact hypothesis; that is, the null hypothesis that all µ_j's are equal to one another is but one special case of what a null hypothesis can be. To illustrate another form, let us continue with the first, simpler Sternberg-task example (set size is the only independent variable), but imagine that prior research justifies the assumption that the relation between set size and reaction time is linear. Suppose further that research with digits has yielded the conclusion that reaction time increases by 35 ms for every additional digit held in short-term memory; i.e., that if reaction time were plotted against set size, the resulting function would be linear with a slope of 35 ms. Now let us imagine that the Sternberg experiment is done with words rather than digits. One could establish the null hypothesis that short-term memory processing proceeds at the same rate with words as it does with digits, i.e., that the slope of the reaction-time versus set-size function would be 35 ms for words, just as it is known to be for digits. The alternative hypothesis would then be that for words, the function's slope is anything other than 35 ms. Again, the fundamental distinction between a null and an alternative hypothesis is that the null hypothesis is exact (35 ms/digit), while the alternative hypothesis is inexact (anything else).
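A non-zero-effect null of this sort can be tested with ordinary regression machinery. The sketch below simulates the word version of the task under an assumed true slope of 50 ms/item (so the exact null of 35 ms/item happens to be false); all numbers are invented for illustration.

```python
import numpy as np
from scipy import stats

# Hedged sketch of testing the exact null H0: slope = 35 ms/item for the
# word version of the Sternberg task.  Data are simulated under an assumed
# true slope of 50 ms/item; every number here is illustrative only.
rng = np.random.default_rng(2)
set_size = np.repeat([1, 3, 5, 8], 30)            # 30 observers per level
rt = 420 + 50 * set_size + rng.normal(0, 60, set_size.size)

res = stats.linregress(set_size, rt)

# linregress's built-in p-value tests the no-effect null (slope = 0), so
# for the exact null "slope = 35" we form the t statistic by hand, using
# the slope's standard error with n - 2 degrees of freedom.
h0_slope = 35.0
t = (res.slope - h0_slope) / res.stderr
df = set_size.size - 2
p = 2 * stats.t.sf(abs(t), df)
print(f"slope = {res.slope:.1f} ms/item, t({df}) = {t:.2f}, p = {p:.3g}")
```

As the entry emphasizes, a small p here licenses only "reject H0"; a large p would license only "fail to reject H0", never "the slope really is 35 ms/item".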
This distinction would again drive the asymmetry between conclusions articulated above: a particular pattern of empirical results could logically allow rejection of the null hypothesis (i.e., acceptance of the alternative hypothesis) but not acceptance of the null hypothesis.

Problems with NHST

No description of NHST in general, or of a null hypothesis in particular, is complete without at least a brief account of the serious problems that accrue when NHST is the sole statistical technique used for making inferences about the µ_j's from the M_j's. Very briefly, three of the major problems involving a null hypothesis as the centerpiece of data analysis are these.

A null hypothesis cannot be literally true

In most sciences it is almost a self-evident truth that any independent variable must have some effect, even if small, on any dependent variable. This is certainly true in psychology. In the Sternberg task, to illustrate, it is simply implausible that set size would have literally zero effect on reaction time, i.e., that the µ_j's corresponding to the different set sizes would be identical to an infinite number of decimal places. Therefore, rejecting a null hypothesis (which, as noted, is the only strong conclusion possible within the context of NHST) tells the investigator nothing that the investigator should have been able to realize was true beforehand. Most investigators do not recognize this, but that does not prevent it from being so.

Human nature makes acceptance of a null hypothesis almost irresistible

Earlier I articulated why it is logically forbidden to accept a null hypothesis. However, human nature dictates that people do not like to make weak yet complicated conclusions such as "We fail to reject the null hypothesis." Scientific investigators, generally being human, are no exception. Instead, a fail-to-reject decision, dutifully made in an article's results section, almost inevitably morphs into "the null hypothesis is true" in the article's discussion and conclusions sections. This kind of sloppiness, while understandable, has led to no end of confusion and general scientific mischief within numerous disciplines.

NHST emphasizes barren, dichotomous conclusions

Earlier I described that the pattern of population means (the relations among the unobservable µ_j's) is of primary interest in most scientific experiments, and that the observable M_j's are estimates of the µ_j's. Accordingly, it should be of great interest to assess how good the M_j's are as estimates of the µ_j's. If, to use an extreme example, the M_j's were perfect estimates of the µ_j's, there would be no need for statistical analysis: the answers to any question about the µ_j's would be immediately available from the data. To the degree that the estimates are less good, one must exercise concomitant caution in using the M_j's to make inferences about the µ_j's. None of this is relevant within the process of NHST, which does not in any way emphasize the degree to which the M_j's are good estimates of the µ_j's.
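The quality of each M_j as an estimate of its µ_j can be reported directly, anticipating the confidence-interval alternative the entry turns to next. The sketch below uses simulated data under assumed population means; the numbers are illustrative only.

```python
import numpy as np
from scipy import stats

# Hedged sketch: for each condition, report M_j together with a 95%
# confidence interval for mu_j, rather than a bare reject / fail-to-reject
# verdict.  Data are simulated under assumed population means (400 ms plus
# 35 ms per digit); all numbers are invented for illustration.
rng = np.random.default_rng(3)
intervals = []
for s in [1, 3, 5, 8]:
    rts = rng.normal(400 + 35 * s, 60, 30)       # 30 simulated observers
    m = rts.mean()
    # half-width = standard error times the .975 quantile of t with n-1 df
    half = stats.sem(rts) * stats.t.ppf(0.975, df=rts.size - 1)
    intervals.append((m - half, m + half))
    print(f"set size {s}: M = {m:6.1f} ms, 95% CI [{m - half:6.1f}, {m + half:6.1f}]")
```

A glance down the printed intervals conveys both the likely pattern of the µ_j's and how much trust each M_j deserves as an estimate.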
In its typical form, NHST allows only a very limited assessment of the nature of the µ_j's: are they all equal or not? Typically, the "no" or "not necessarily no" conclusion that emerges from this process is woefully insufficient to evaluate the totality of what the data might potentially reveal about the nature of the µ_j's. An alternative that is gradually emerging within several NHST-heavy sciences (an alternative that is common in the natural sciences) is the use of confidence intervals, which assess directly how good an M_j is as an estimate of the corresponding µ_j. Very briefly, a confidence interval is an interval constructed around a sample mean that, with some pre-specified probability (typically 95%), includes the corresponding population mean. A glance at a set of plotted M_j's with associated confidence intervals provides immediate and intuitive information about (a) the most likely pattern of the µ_j's and (b) the reliability of the pattern of M_j's as an estimate of the pattern of µ_j's. This in turn provides immediate and intuitive information both about the relatively uninteresting question of whether some null hypothesis is true, and about the much more interesting questions of what the pattern of µ_j's actually is and how much belief can be placed in it based on the data at hand.

Further readings

Fidler, F., & Loftus, G.R. (in press). Why hypothesis testing is misunderstood: Hypotheses and data.
Loftus, G.R. (1996). Psychology will be a much better science when we change the way we analyze data. Current Directions in Psychological Science, 161-171.
Loftus, G.R., & Masson, M.E.J. (1994). Using confidence intervals in within-subjects designs. Psychonomic Bulletin & Review, 1, 476-490.

Other relevant entries

Alpha
Analysis of Variance (ANOVA)
Beta
Chi-squared Test
Confidence Intervals
Contrasts
Decision Rule
Directional Hypotheses
F Test
Hypothesis
Hypothesis Testing
Inference (Inductive and Deductive)
Level of Significance
Logic of Scientific Discovery, The (Popper)
Nonsignificance
Population
Power Analysis
p-value
Research Hypothesis
Significance (Statistical Significance)
Significance Level
Simple Main Effects
Statistical Power Analysis for the Behavioral Sciences (Cohen)
Two-tailed Test
Type I Error
Type II Error