
Statistical Foundations: Hypothesis Testing. Psychology 790, Lecture #10, 9/26/2006.

Today's Class. Hypothesis testing. An example. Types of errors illustrated. Misconceptions about hypothesis testing.

Upcoming Schedule. Today (9/26): Hypothesis testing. Thursday 9/28: Confidence intervals. Tuesday 10/3: t and two-sample tests. Thursday 10/5: Midterm review. Tuesday 10/10: Midterm (20-item multiple choice). Wednesday 10/11: National holiday for my wife's 30th birthday. Thursday 10/12: Fall break. Tuesday 10/17: Correlation and regression from Hays.

Hypothesis Testing Example

An Example of Hypothesis Testing. Recall that last week we talked a bit about the Wechsler Adult Intelligence Scale (WAIS). In the general population, the test has an average of 100 and a standard deviation of 15. Let's try a hypothesis test to see if KU students have a similar mean WAIS score. We will sample 100 KU students at random and administer the WAIS.

Buy the WAIS on ebay!!

Example Setup. What is the null hypothesis? H0: μKU = 100. What is the alternative hypothesis? HA: μKU ≠ 100.

Distributional Setup. The key element in our example is to work out the assumed distribution of the test statistic under H0. In our case, we will sample 100 subjects and take a sample mean. What does the sampling distribution of the mean look like for N = 100? x̄ ~ N(μ, σ/√N) = N(100, 15/√100) = N(100, 1.5).

Distribution of Test Statistic Under H0. Using R, the plot to the right is a picture of the distribution of the test statistic (x̄) under the null hypothesis. [Plot: the N(100, 1.5) null distribution of the sample mean.]

Step 1: Set the Type I Error Rate. Before we collect our sample, we must first set the Type I error rate for our experiment. Recall that the Type I error rate (or α) is the maximum probability we will allow for rejecting the null hypothesis when the null hypothesis is true. This sets up the decision rule for our test. From this, we can obtain a critical value to which we can compare our test statistic. What rate do you want to set? Let's go with α = 0.05, for tradition's sake.

Decision Rule. Using α = 0.05, we can then assign a region of our null distribution where we will reject the null hypothesis. Because we have no idea in which direction KU's sample mean will fall, we will split our region into two halves: an upper tail and a lower tail. We then want to find the point X_L such that P(x̄ ≤ X_L) = α/2 = 0.025, and the point X_U such that P(x̄ ≥ X_U) = α/2 = 0.025. Find these two points.
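In R, the two critical values come straight from the quantile function of the null distribution; a minimal sketch:

    # Null sampling distribution of the mean: N(100, 15/sqrt(100)) = N(100, 1.5)
    mu0   <- 100
    se    <- 15 / sqrt(100)
    alpha <- 0.05

    x_L <- qnorm(alpha / 2, mean = mu0, sd = se)      # lower critical value
    x_U <- qnorm(1 - alpha / 2, mean = mu0, sd = se)  # upper critical value
    c(x_L, x_U)                                       # about 97.06 and 102.94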

Decision Rule Plot. We will reject H0 if our sample mean is in either of these two regions. [Plot: null distribution with rejection regions below 97.06 and above 102.94.]

Our Sample. [Image: KU students and a laser pointer.]

Our Sample Lucky for you, I have tweaked my laser pointer to now give me the WAIS score (up to 5 digits) for individuals when hit with the laser beam.

Test Statistic. Our sample mean was 107.79. The sample SD was 15.55. Now, what do we decide about our hypothesis test? We reject H0 because our sample mean of 107.79 falls into the rejection region (it is greater than 102.94).
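Equivalently, the decision can be expressed as a z-test; a minimal R sketch using the numbers from the slide:

    # Observed results from the slide
    x_bar <- 107.79
    mu0   <- 100
    se    <- 15 / sqrt(100)   # sigma is known under H0, so this is a z-test

    z <- (x_bar - mu0) / se   # about 5.19
    p <- 2 * pnorm(-abs(z))   # two-tailed p-value, essentially 0
    abs(z) > qnorm(0.975)     # TRUE, so reject H0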

Errors in Hypothesis Testing

Inferential Errors and NHST.

                              Real world
    Conclusion of the test    Null is true        Null is false
    Null is true              Correct decision    Type II error
    Null is false             Type I error        Correct decision

Errors and Our Example. Knowing a bit about the truth (from simulated data), we can revisit our example for a better description of Type I and Type II errors with graphics. From the example, we knew that the null sampling distribution of the mean was N(100, 1.5). The KU student population sampling distribution for mean WAIS scores was N(105, 1.5). We can overlay the two distributions and draw the regions representing Type I errors, Type II errors, and power.

Type I Error. [Plot: null distribution N(100, 1.5) and alternative distribution N(105, 1.5) overlaid; the rejection-region area under the null distribution is the Type I error rate.]

Type II Error. [Plot: the same two distributions; the area of the alternative distribution falling inside the non-rejection region (97.06 to 102.94) is the Type II error rate.]

Power. [Plot: the same two distributions; the area of the alternative distribution falling inside the rejection region is the power of the test.]
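Given the slide's assumption that the true KU sampling distribution is N(105, 1.5), the Type II error rate and power of this decision rule follow directly; a sketch in R:

    # Slide's assumed truth: KU sampling distribution of the mean is N(105, 1.5)
    mu_alt <- 105
    se     <- 1.5
    x_L    <- 97.06
    x_U    <- 102.94

    # Type II error: the sample mean lands in the non-rejection region
    beta <- pnorm(x_U, mu_alt, se) - pnorm(x_L, mu_alt, se)   # about .085

    # Power: probability of correctly rejecting H0
    1 - beta                                                  # about .915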

Points of Interest. The example we explored previously was an example of what is called a z-test of a sample mean. Significance tests have been developed for a number of statistics: the difference between two group means (t-test); differences among two or more group means (ANOVA); differences between proportions (chi-square).

How do we control Type I errors? The Type I error rate is typically controlled by the researcher. It is called the alpha rate, and corresponds to the probability cut-off that one uses in a significance test. By convention, researchers often use an alpha rate of.05. In other words, they will only reject the null hypothesis when a statistic is likely to occur 5% of the time or less when the null hypothesis is true. In principle, any probability value could be chosen for making the accept/reject decision. 5% is used by convention.

Type I errors What does 5% mean in this context? It means that we will only make a decision error 5% of the time if the null hypothesis is true. If the null hypothesis is false, the Type I error rate is undefined.

How do we control Type II errors? Type II errors can also be controlled by the experimenter. The Type II error rate is sometimes called beta. How can the beta rate be controlled? The easiest way to control Type II errors is by increasing the statistical power of a test.

Statistical Power. Statistical power is defined as the probability of rejecting the null hypothesis when it is false, a correct decision (1 − beta). Power is strongly influenced by sample size. With a larger N, we are more likely to reject the null hypothesis if it is truly false. (As N increases, the standard error shrinks. Sampling error becomes less problematic, and true differences are easier to detect.)
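To see the effect of N in our WAIS example, we can compute the power of the two-tailed z-test across a range of sample sizes, still assuming the true KU mean is 105; a sketch:

    # Power of the two-tailed z-test (alpha = .05) as a function of N,
    # assuming the true KU mean is 105 and sigma = 15
    power_z <- function(n, mu_true = 105, mu0 = 100, sigma = 15, alpha = 0.05) {
      se  <- sigma / sqrt(n)
      x_L <- qnorm(alpha / 2, mu0, se)
      x_U <- qnorm(1 - alpha / 2, mu0, se)
      pnorm(x_L, mu_true, se) + (1 - pnorm(x_U, mu_true, se))
    }

    sapply(c(10, 25, 50, 100), power_z)   # climbs from about .18 toward .92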

Power and correlation. This graph shows how the power of the significance test for a correlation varies as a function of sample size. Notice that when N = 80, there is about an 80% chance of correctly rejecting the null hypothesis (beta = .20). When N = 45, we only have a 50% chance of making the correct decision, a coin toss (beta = .50). [Plot: power versus sample size (50 to 200) for a population correlation of .30.]

Power and correlation. Power also varies as a function of the size of the correlation. When the population correlation is large (e.g., .80), it requires fewer subjects to correctly reject the null hypothesis that the population correlation is 0. When the population correlation is smallish (e.g., .20), it requires a large number of subjects to correctly reject the null hypothesis. When the population correlation is 0, the probability of rejecting the null is constant at 5% (alpha); here power is technically undefined because the null hypothesis is true. [Plot: power versus sample size (50 to 200) for population correlations of .00, .20, .40, .60, and .80.]
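These curves can be reproduced in R; a sketch using the pwr package (an assumption here; any power calculator for correlation tests will do):

    library(pwr)   # install.packages("pwr") if needed

    pwr.r.test(n = 80, r = 0.30, sig.level = 0.05)   # power is roughly .78
    pwr.r.test(n = 45, r = 0.30, sig.level = 0.05)   # power is roughly .52

    # Sample size needed for 80% power to detect r = .30
    pwr.r.test(r = 0.30, sig.level = 0.05, power = 0.80)   # n in the mid-80s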

Low Power Studies. Because correlations in the .2 to .4 range are typically observed in non-experimental research, one would be wise not to trust research based on sample sizes of less than 60 or so. Why? Because such research only stands about a 50% chance of yielding the correct decision if the null is false. It would be more efficient (and, importantly, just as accurate) to flip a coin to make the decision rather than collecting data and using a significance test. [Plot: the same power curves as above.]

A Sad Fact. In 1962, Jacob Cohen surveyed all articles in the Journal of Abnormal and Social Psychology and determined that the typical power of research conducted in this area was 53%. An even sadder fact: in 1989, Sedlmeier and Gigerenzer surveyed studies in the same journal (now called the Journal of Abnormal Psychology) and found that the power had decreased slightly. Researchers, unfortunately, pay little attention to power. As a consequence, the Type II error rate of research in psychology is likely to be dangerously high, maybe as high as 50%.

Power in Research Design. Power is important to consider, and should be used to design research projects. Given an educated guess about what the population parameter might be (e.g., a correlation of .30, a mean difference of .5 SD), one can determine the number of subjects needed for a desired level of power. Cohen and others recommend that researchers try to obtain a power level of about 80%.
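For example, base R's power.t.test() solves for the per-group N given an assumed effect; a sketch for a mean difference of .5 SD:

    # Per-group N for 80% power to detect a .5 SD mean difference with a
    # two-sample t-test at alpha = .05 (power.t.test is in base R's stats)
    power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80,
                 type = "two.sample", alternative = "two.sided")
    # n comes out to roughly 64 per group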

Power in Research Design. Thus, if one used an alpha level of 5% and collected enough subjects to ensure a power of 80% for an assumed effect, one would know, before the study was done, what the theoretical error rates are for the statistical test. Although these error rates correspond to long-run outcomes, one could get a sense of whether the research design was a credible one: whether it is likely to minimize the two kinds of errors that are possible in NHST and, correspondingly, maximize the likelihood of making a correct decision.

Misconceptions About Hypothesis Testing

Three Common Misinterpretations of Significance Tests and p-values 1. The p-value indicates the probability that the results are due to sampling error or chance. 2. A statistically significant result is a reliable result. 3. A statistically significant result is a powerful, important result.

Misinterpretation # 1. The p-value is a conditional probability: the probability of observing a specific range of sample statistics GIVEN (i.e., conditional upon) that the null hypothesis is true, P(D | H0). This is not equivalent to the probability of the null hypothesis being true given the data: P(H0 | D) ≠ P(D | H0).

Misinterpretation # 2. Is a significant result a reliable, easily replicated result? Not necessarily. The p-value is a poor indicator of the replicability of a finding. Replicability (assuming a real effect exists, that is, that the null hypothesis is false) is primarily a function of statistical power.

Misinterpretation # 2. If a study had statistical power of 80%, what is the probability of obtaining a significant result twice? The probability of two independent events both occurring is the simple product of the probability of each of them occurring: .80 × .80 = .64. If power = 50%? .50 × .50 = .25. Bottom line: the likelihood of replicating a result is determined by statistical power, not the p-value derived from a significance test. When the power of the test is low, the likelihood of a long-run series of replications is even lower.

Misinterpretation # 3 Is a significant result a powerful, important result? Not necessarily. The importance of the result, of course, depends on the issue at hand, the theoretical context of the finding, etc.

Misinterpretation # 3 We can measure the practical or theoretical significance of an effect using an index of effect size. An effect size is a quantitative index of the strength of the relationship between two variables. Some common measures of effect size are correlations, regression weights, t-values, and R-squared.

Misinterpretation # 3. Importantly, the same effect size can have different p-values, depending on the sample size of the study. For example, a correlation of .30 would not be statistically significant with a sample size of 30, but would be statistically significant with a sample size of 130. Bottom line: the p-value is a poor way to evaluate the practical significance of a research result.
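This dependence on N is easy to verify: the significance test for a correlation uses t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom. A minimal R sketch:

    # p-value for an observed correlation r at sample size n, using
    # t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom
    p_for_r <- function(r, n) {
      t <- r * sqrt(n - 2) / sqrt(1 - r^2)
      2 * pt(-abs(t), df = n - 2)
    }

    p_for_r(0.30, 30)    # about .107, not significant at .05
    p_for_r(0.30, 130)   # about .0005, significant at .05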

Wrapping Up. Today was another fun lecture about the philosophy of hypothesis testing. We do hypothesis testing all the time. That doesn't make it something without error, though.

Next Time. Confidence intervals and their association with hypothesis tests. Confidence intervals (Ch. 6.8 to 6.11).