Comparison of frequentist and Bayesian inference. Class 20, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom




1 Learning Goals

1. Be able to explain the difference between the p-value and a posterior probability to a doctor.

2 Introduction

We have now learned about two schools of statistical inference: Bayesian and frequentist. Both approaches allow one to evaluate evidence about competing hypotheses. In these notes we will review and compare the two approaches, starting from Bayes' formula.

3 Bayes' formula as touchstone

In our first unit (probability) we learned Bayes' formula, a perfectly abstract statement about conditional probabilities of events:

    P(A|B) = P(B|A) P(A) / P(B).

We began our second unit (Bayesian inference) by reinterpreting the events in Bayes' formula:

    P(H|D) = P(D|H) P(H) / P(D).

Now H is a hypothesis and D is data which may give evidence for or against H. Each term in Bayes' formula has a name and a role.

- The prior P(H) is the probability that H is true before the data is considered.
- The posterior P(H|D) is the probability that H is true after the data is considered.
- The likelihood P(D|H) is the evidence about H provided by the data D.
- P(D) is the total probability of the data, taking into account all possible hypotheses.

If the prior and likelihood are known for all hypotheses, then Bayes' formula computes the posterior exactly. Such was the case when we rolled a die randomly selected from a cup whose contents you knew. We call this the deductive logic of probability theory, and it gives a direct way to compare hypotheses, draw conclusions, and make decisions.

In most experiments, the prior probabilities on hypotheses are not known. In this case, our recourse is the art of statistical inference: we either make up a prior (Bayesian) or do our best using only the likelihood (frequentist).
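The deductive update described above can be sketched numerically. The setup below (a cup holding one 6-sided and one 20-sided die, with a roll of 5) uses hypothetical numbers chosen for illustration, not the exact example from class:

```python
# Bayes' formula as a one-step update over two hypotheses.
# Hypothetical setup: a cup holds a 6-sided die and a 20-sided die; one is
# drawn at random and rolled, and the roll shows a 5.
prior = {"six-sided": 0.5, "twenty-sided": 0.5}
likelihood = {"six-sided": 1 / 6, "twenty-sided": 1 / 20}  # P(roll = 5 | die)

# P(D): total probability of the data over all hypotheses
p_data = sum(prior[h] * likelihood[h] for h in prior)

# P(H | D) = P(D | H) P(H) / P(D)
posterior = {h: prior[h] * likelihood[h] / p_data for h in prior}
print(posterior)  # six-sided: 10/13 ≈ 0.769, twenty-sided: 3/13 ≈ 0.231
```

Because both the prior and the likelihood are fully known here, this is pure deduction: the posterior follows with no further modeling choices.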

The Bayesian school models uncertainty by a probability distribution over hypotheses. One's ability to make inferences depends on one's degree of confidence in the chosen prior, and the robustness of the findings to alternate prior distributions may be relevant and important.

The frequentist school only uses conditional distributions of data given specific hypotheses. The presumption is that some hypothesis (parameter specifying the conditional distribution of the data) is true and that the observed data is sampled from that distribution. In particular, the frequentist approach does not depend on a subjective prior that may vary from one investigator to another.

These two schools may be further contrasted as follows:

Bayesian inference
- uses probabilities for both hypotheses and data.
- depends on the prior and likelihood of observed data.
- requires one to know or construct a subjective prior.
- dominated statistical practice before the 20th century.
- may be computationally intensive due to integration over many parameters.

Frequentist inference (NHST)
- never uses or gives the probability of a hypothesis (no prior or posterior).
- depends on the likelihood P(D|H) for both observed and unobserved data.
- does not require a prior.
- dominated statistical practice during the 20th century.
- tends to be less computationally intensive.

Frequentist measures like p-values and confidence intervals continue to dominate research, especially in the life sciences. However, in the current era of powerful computers and big data, Bayesian methods have undergone an enormous renaissance in fields like machine learning and genetics. There are now a number of large, ongoing clinical trials using Bayesian protocols, something that would have been hard to imagine a generation ago.
While professional divisions remain, the consensus forming among top statisticians is that the most effective approaches to complex problems often draw on the best insights from both schools working in concert.

4 Critiques and defenses

4.1 Critique of Bayesian inference

1. The main critique of Bayesian inference is that a subjective prior is, well, subjective. There is no single method for choosing a prior, so different people will produce different priors and may therefore arrive at different posteriors and conclusions.

2. Furthermore, there are philosophical objections to assigning probabilities to hypotheses, as hypotheses do not constitute outcomes of repeatable experiments in which one can measure long-term frequency. Rather, a hypothesis is either true or false, regardless of whether one knows which is the case. A coin is either fair or unfair; treatment 1 is either better or worse than treatment 2; the sun will or will not come up tomorrow.

4.2 Defense of Bayesian inference

1. The probability of hypotheses is exactly what we need to make decisions. When the doctor tells me a screening test came back positive, I want to know the probability that this means I'm sick. That is, I want to know the probability of the hypothesis "I'm sick."

2. Using Bayes' theorem is logically rigorous. Once we have a prior, all our calculations have the certainty of deductive logic.

3. By trying different priors we can see how sensitive our results are to the choice of prior.

4. It is easy to communicate a result framed in terms of probabilities of hypotheses.

5. Even though the prior may be subjective, one can specify the assumptions used to arrive at it, which allows other people to challenge it or try other priors.

6. The evidence derived from the data is independent of notions about "data more extreme" that depend on the exact experimental setup (see the stopping rules section below).

7. Data can be used as it comes in. There is no requirement that every contingency be planned for ahead of time.

4.3 Critique of frequentist inference

1. It is ad hoc and does not carry the force of deductive logic. Notions like "data more extreme" are not well defined. The p-value depends on the exact experimental setup (see the stopping rules section below).

2. Experiments must be fully specified ahead of time. This can lead to paradoxical-seeming results. See the voltmeter story in: http://en.wikipedia.org/wiki/likelihood_principle

3. The p-value and significance level are notoriously prone to misinterpretation. Careful statisticians know that a significance level of .05 means the probability of a type I error is 5%. That is, if the null hypothesis is true, then 5% of the time it will be rejected due to randomness. Many (most) other people erroneously think a p-value of .05 means that the probability of the null hypothesis is 5%. Strictly speaking, you could argue that this is not a critique of frequentist inference but rather a critique of popular ignorance. Still, the subtlety of the ideas certainly contributes to the problem (see "Mind your p's" below).

4.4 Defense of frequentist inference

1. It is objective: all statisticians will agree on the p-value. Any individual can then decide if the p-value warrants rejecting the null hypothesis.

2. Hypothesis testing using frequentist significance testing is applied in the statistical analysis of scientific investigations, evaluating the strength of evidence against a null hypothesis with data. The interpretation of the results is left to the user of the tests. Different users may apply different significance levels for determining statistical significance. Frequentist statistics does not pretend to provide a way to choose the significance level; rather it explicitly describes the trade-off between type I and type II errors.

3. Frequentist experimental design demands a careful description of the experiment and methods of analysis before starting. This helps control for experimenter bias.

4. The frequentist approach has been used for over 100 years and we have seen tremendous scientific progress. Although the frequentist herself would not put a probability on the belief that frequentist methods are valuable, shouldn't this history give the Bayesian a strong prior belief in the utility of frequentist methods?

5 Mind your p's

We run a two-sample t-test for equal means, with α = .05, and obtain a p-value of .04. What are the odds that the two samples are drawn from distributions with the same mean?

(a) 19/1   (b) 1/19   (c) 1/20   (d) 1/24   (e) unknown

answer: (e) unknown. Frequentist methods only give probabilities of statistics conditioned on hypotheses. They do not give probabilities of hypotheses.

6 Stopping rules

When running a series of trials we need a rule on when to stop. Two common rules are:

1. Run exactly n trials and stop.
2. Run trials until you see a certain result and then stop.

In this example we'll consider two coin-tossing experiments.

Experiment 1: Toss the coin exactly 6 times and report the number of heads.
Experiment 2: Toss the coin until the first tails and report the number of heads.

Jon is worried that his coin is biased towards heads, so before using it in class he tests it for fairness. He runs an experiment and reports to Jerry that his sequence of tosses was HHHHHT. But Jerry is only half-listening, and he forgets which experiment Jon ran to produce the data.

Frequentist approach. Since he's forgotten which experiment Jon ran, Jerry the frequentist decides to compute the p-values for both experiments given Jon's data. Let θ be the probability of heads. We have the null and one-sided alternative hypotheses

    H0: θ = .5,   HA: θ > .5.

Experiment 1: The null distribution is binomial(6, .5), so the one-sided p-value is the probability of 5 or 6 heads in 6 tosses. Using R we get

    p = 1 - pbinom(4, 6, .5) = 0.1094.

Experiment 2: The null distribution is geometric(.5), so the one-sided p-value is the probability of 5 or more heads before the first tails. Using R we get

    p = 1 - pgeom(4, .5) = 0.0313.

Using the typical significance level of .05, the same data leads to opposite conclusions! We would reject H0 in experiment 2, but not in experiment 1.

The frequentist is fine with this. The set of possible outcomes is different for the different experiments, so the notion of extreme data, and therefore the p-value, is different. For example, in experiment 1 we would consider THHHHH to be as extreme as HHHHHT. In experiment 2 we would never see THHHHH, since the experiment would end after the first tails.

Bayesian approach. Jerry the Bayesian knows it doesn't matter which of the two experiments Jon ran, since the binomial and geometric likelihood functions (columns) for the data HHHHHT are proportional. In either case, he must make up a prior, and he chooses Beta(3,3). This is a relatively flat prior concentrated over the interval .25 ≤ θ ≤ .75. See http://ocw.mit.edu/ans7870/18/18.05/s14/applets/beta-jmo.html

Since the beta and binomial (or geometric) distributions form a conjugate pair, the Bayesian update is simple. Data of 5 heads and 1 tails gives a posterior distribution Beta(8,4).

[Figure: graph of the prior Beta(3,3) and posterior Beta(8,4) densities over θ, with the 50% and 90% posterior probability intervals marked at the bottom.]

Here are the relevant computations in R:

    Posterior 50% probability interval: qbeta(c(.25, .75), 8, 4) = [0.58, 0.76]
    Posterior 90% probability interval: qbeta(c(.05, .95), 8, 4) = [0.44, 0.86]
    P(θ > .5 | data) = 1 - pbeta(.5, 8, 4) = .89

Starting from the prior Beta(3,3), the posterior probability that the coin is biased toward heads is .89.
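The R computations above can be reproduced in Python with scipy (a sketch, not part of the original notes). One caveat: scipy's geometric distribution counts tosses up to and including the first tails, while R's pgeom counts heads before it, so R's pgeom(4, .5) corresponds to geom.cdf(5, 0.5):

```python
from scipy.stats import beta, binom, geom

# Experiment 1: P(5 or 6 heads in 6 tosses) under H0: theta = 0.5
p1 = 1 - binom.cdf(4, 6, 0.5)
print(p1)  # 0.109375

# Experiment 2: P(5 or more heads before the first tails); 5 or more heads
# means the first tails arrives on toss 6 or later.
p2 = 1 - geom.cdf(5, 0.5)
print(p2)  # 0.03125

# Bayesian update: Beta(3,3) prior plus 5 heads, 1 tails -> Beta(8,4) posterior
interval_50 = beta.ppf([0.25, 0.75], 8, 4)  # 50% probability interval
interval_90 = beta.ppf([0.05, 0.95], 8, 4)  # 90% probability interval
p_biased = 1 - beta.cdf(0.5, 8, 4)          # P(theta > 0.5 | data)
print(round(p_biased, 2))  # 0.89
```

The two p-values straddle α = .05 even though the data are identical, while the Bayesian posterior is the same either way.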

7 Making decisions

Quite often the goal of statistical inference is to help with making a decision, e.g. whether or not to undergo surgery, how much to invest in a stock, whether or not to go to graduate school, etc.

In statistical decision theory, consequences of taking actions are measured by a utility function. The utility function assigns a weight to each possible outcome; in the language of probability, it is simply a random variable. For example, in my investments I could assign a utility of d to the outcome of a gain of d dollars per share of a stock (if d < 0 my utility is negative). On the other hand, if my tolerance for risk is low, I will assign a more negative utility to losses than to gains (say, 2d if d < 0 and d if d ≥ 0).

A decision rule combines the expected utility with evidence for each hypothesis given by the data (e.g., p-values or posterior distributions) into a formal statistical framework for making decisions.

In this setting, the frequentist will consider the expected utility given a hypothesis, E(U | H), where U is the random variable representing utility. There are frequentist methods of combining the expected utility with p-values of hypotheses to guide decisions. The Bayesian can combine E(U | H) with the posterior (or prior, if it's before data is collected) to create a Bayesian decision rule.

In either framework, two people considering the same investment may have different utility functions and make different decisions. For example, a riskier stock (with higher potential upside and downside) will be more appealing with respect to the first utility function above than with respect to the second (loss-averse) one.

A significant theoretical result is that for any decision rule there is a Bayesian decision rule which is, in a precise sense, a better rule.
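The two utility functions above can be compared on a toy gain distribution (the gains and probabilities below are made up for illustration; nothing here comes from the notes):

```python
# Hypothetical distribution over gains d (dollars per share) for one stock.
outcomes = {-2.0: 0.3, 0.0: 0.2, 3.0: 0.5}  # maps gain d -> P(gain = d)

def u_neutral(d):
    # risk-neutral utility: a gain of d dollars is worth d
    return d

def u_loss_averse(d):
    # loss-averse utility: losses weighted twice as heavily as gains
    return 2 * d if d < 0 else d

# expected utility E(U) = sum over outcomes of P(d) * U(d)
eu_neutral = sum(p * u_neutral(d) for d, p in outcomes.items())
eu_loss_averse = sum(p * u_loss_averse(d) for d, p in outcomes.items())
print(eu_neutral, eu_loss_averse)
```

With identical beliefs about the stock, the loss-averse investor values it less (roughly 0.3 versus 0.9 expected utility here), so the two investors may rationally make different decisions.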

MIT OpenCourseWare
http://ocw.mit.edu

18.05 Introduction to Probability and Statistics
Spring 2014

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.