p-values and significance levels (false positive or false alarm rates)




Let's say 123 people in the class each toss a coin. Call it "Coin A." There are 65 heads. Then they toss another coin. Call it "Coin B." There are 72 heads. Unbeknownst to the class, one of these coins is biased towards heads and one is fair. We should expect around half heads with the fair coin and more with the biased coin. But of course it is possible to get 72 heads with a fair coin, and it is possible to get 65 heads with a biased coin, so the counts alone don't tell us which coin is which.

Our null hypothesis is that a coin is fair; this means that the probability of getting heads is 50%. Our alternative hypothesis is that a coin is biased towards heads; this means that the probability of getting heads is >50%. We want to see if we can reject the null hypothesis based on the data, so we generate a null distribution. We use the computer to simulate 123 people tossing fair coins, count how many heads there were, and repeat this 1000 times. This gives us the following histogram. Notice that the x-axis in this case shows counts of heads instead of the fraction of heads (p_hat). If we know how many total coin flips there are (123 here), it is easy to convert counts to fractions and back again: p_hat = count / 123, and count = p_hat * 123.
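The simulation described above can be sketched in Python. This is a minimal sketch following the text's setup (123 flips, 1000 repetitions); the variable names are my own:

```python
import random

random.seed(0)   # for reproducibility
n_flips = 123    # one flip per person in the class
n_sims = 1000    # number of simulated classes

# Simulate 1000 classes of 123 fair-coin flips each, and record the
# number of heads in each simulated class. These 1000 counts form
# the null distribution shown in the histogram.
null_counts = [sum(random.random() < 0.5 for _ in range(n_flips))
               for _ in range(n_sims)]

# Converting between a count of heads and the fraction p_hat:
count = 65
p_hat = count / n_flips               # count -> fraction
count_again = round(p_hat * n_flips)  # fraction -> count
```

Plotting a histogram of `null_counts` reproduces the figure described in the text.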

Now we point out where our data falls: should we reject our null hypothesis in each of these cases? The decision to reject the null hypothesis depends on a cutoff. We need to decide on an acceptable false positive rate, also called a significance level; this is the threshold p-value. If the probability of getting our statistic or one more extreme (the p-value) is less than this significance level, then we will reject the null hypothesis. So we need to decide on a significance level cutoff and then see whether the p-values for our actual data are more or less than this cutoff. Normally, you would decide on the significance level before seeing your data. There is a convention to use a significance level of 5%, but this is ultimately an arbitrary choice. We want to pick a threshold low enough that we would be unlikely to get our result if the null hypothesis were true.

Let's first calculate the p-values for each of our samples. First, for Coin A: if the null hypothesis were true, we would get 65 heads or more about 25% of the time.

Now for Coin B: if the null hypothesis were true, we would get 72 heads or more about 3% of the time. So what to do? We need to pick a significance level. Remember that we don't know which of our coins is fair and which is biased. In fact, the students who flipped them didn't know whether either was fair or either was biased! So we need to pick a single significance level that we'll use to test all our samples. In a real situation, the decision of what significance level to use should be made before you see your data. Remember that the significance level is the p-value cutoff you use to decide whether the data is in agreement with the null hypothesis or disagrees with it. If the p-value of your data is less than the cutoff, then your data disagrees with the null hypothesis (you reject the null hypothesis). If the p-value of your data is more than the cutoff, then your data is in agreement with the null hypothesis (you fail to reject the null hypothesis).

Agrees with = fail to reject = consistent with the null hypothesis
Disagrees with = reject = inconsistent with the null hypothesis

Let's explore the consequences of this choice.
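Since under the null hypothesis the number of heads follows a Binomial(123, 0.5) distribution, the two p-values can also be computed exactly rather than estimated by simulation. A sketch; the function name is my own, and the exact tail probabilities will differ slightly from the simulation-based 25% and 3% quoted above:

```python
from math import comb

def binom_tail(n, k):
    """Exact one-sided p-value: P(X >= k) for X ~ Binomial(n, 1/2)."""
    return sum(comb(n, j) for j in range(k, n + 1)) / 2**n

p_a = binom_tail(123, 65)  # Coin A: 65 heads or more
p_b = binom_tail(123, 72)  # Coin B: 72 heads or more
```

A simulation with only 1000 repetitions gives a somewhat noisy estimate of these tail probabilities, which is why the figures in the text are rounder.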

Before we do this, I'll tell you which coin is which: Coin A is biased and Coin B is fair! Let's say we choose a significance level of 0.3. The red region is now the region where we reject our null hypothesis. With this significance level, both p-values (0.25 and 0.03) fall below the cutoff, so we would correctly reject the null hypothesis for the biased coin but incorrectly reject it for the fair coin (a Type I error).

Let's say we choose a significance level of 0.01. Now both p-values are above the cutoff: we correctly fail to reject the null hypothesis for the fair coin, but we also fail to reject it for the biased coin (a Type II error).

How about a significance level of 0.15? Here we are wrong about both coins! We fail to reject for the biased coin (a Type II error, since its p-value of 0.25 is above the cutoff) and reject for the fair coin (a Type I error, since its p-value of 0.03 is below the cutoff).

Ultimately, when we are doing science, we need to make a statement about our data, so we have to pick a single significance level. Let's choose the standard 0.05 significance level cutoff. Then we would make a statement like this: We did two experiments with two different coins. In the first, we got 65 heads, and in the second we got 72 heads. The p-value for 65 heads is 0.25; the p-value for 72 heads is 0.03. We chose a significance level (a p-value cutoff) of 0.05. Based on this significance level, we fail to reject the null hypothesis for the coin that gave us 65 heads, and we reject the null hypothesis for the coin that gave us 72 heads. Therefore, our data is consistent with the coin that gave us 65 heads being fair and inconsistent with the coin that gave us 72 heads being fair.

When we do a real experiment, we wouldn't know whether the coins were fair or biased. So we could get unlucky (as I set us up to be here) and be completely wrong! Setting a significance level lets us control the tradeoff between false alarms (false positives, Type I errors) and missed opportunities (false negatives, Type II errors).
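The decision rule we applied at each of these significance levels can be sketched as a few lines of Python. The p-values are the ones from the text; the helper function is my own:

```python
# p-values from the text: the biased Coin A gave 65 heads,
# the fair Coin B gave 72 heads.
p_values = {"Coin A (biased)": 0.25, "Coin B (fair)": 0.03}

def decide(p_value, alpha):
    """Reject the null hypothesis when the p-value falls below the cutoff."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

# Each significance level we tried yields a different pair of verdicts.
for alpha in (0.3, 0.15, 0.05, 0.01):
    verdicts = {coin: decide(p, alpha) for coin, p in p_values.items()}
    print(f"alpha = {alpha}: {verdicts}")
```

At alpha = 0.05 this reproduces the statement in the text: fail to reject for the coin with 65 heads, reject for the coin with 72 heads.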