Chapter 1: Approaches for Statistical Inference

Some preliminary Q&A

What is the philosophical difference between classical ("frequentist") and Bayesian statistics?

To a frequentist, model parameters are fixed but unknown, and can be estimated only through replications of data from some experiment.

A Bayesian thinks of parameters as random, and thus as having distributions (just like the data). We can therefore think about unknowns for which no reliable frequentist experiment exists, e.g. θ = proportion of US men with untreated atrial fibrillation.

Some preliminary Q&A

How does it work?

A Bayesian writes down a prior guess for θ, p(θ), then combines this with the information that the data X provide to obtain the posterior distribution of θ, p(θ|X). All statistical inferences (point and interval estimates, hypothesis tests) then follow as appropriate summaries of the posterior.

Note that posterior information ≥ prior information ≥ 0, with the second "≥" replaced by "=" only if the prior is noninformative (often uniform, or "flat").
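
As a concrete illustration of this prior-to-posterior updating (not from the slides; a minimal sketch assuming a conjugate Beta prior for a binomial success probability θ, with made-up numbers):

```python
# Minimal sketch of Bayesian updating: Beta prior + binomial data -> Beta posterior.
# The prior Beta(2, 2) and the data (13 successes in 17 trials) are illustrative.
from scipy import stats

a, b = 2, 2        # prior p(theta): Beta(2, 2), centered at 0.5
x, n = 13, 17      # data: 13 successes in n = 17 trials

# Conjugacy: the posterior p(theta | x) is Beta(a + x, b + n - x)
post = stats.beta(a + x, b + n - x)

print("posterior mean:", post.mean())                  # point estimate
print("95% credible interval:", post.interval(0.95))   # interval estimate
print("P(theta > 0.5 | x):", 1 - post.cdf(0.5))        # a hypothesis summary
```

All three outputs are just summaries of the same posterior distribution, as the slide describes.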

Some preliminary Q&A

Is the classical approach wrong?

While a hardcore Bayesian might say so, it is probably more accurate to think of classical methods as merely limited in scope!

The Bayesian approach expands the class of models we can fit to our data, enabling us to handle
- repeated measures
- unbalanced or missing data
- nonhomogeneous variances
- multivariate data
and many other settings that are awkward or infeasible from a classical point of view.

The approach also eases the interpretation of, and learning from, those models once fit.

Simple example of Bayesian thinking

From Business Week, online edition, July 31, 2001: "Economists might note, to take a simple example, that American turkey consumption tends to increase in November. A Bayesian would clarify this by observing that Thanksgiving occurs in this month."

Data: plot of turkey consumption by month

Prior: location of Thanksgiving in the calendar; knowledge of Americans' Thanksgiving eating habits

Posterior: understanding of the pattern in the data!

Bayes means revision of estimates

Humans tend to be Bayesian in the sense that most revise their opinions about uncertain quantities as more data accumulate. For example:

- Suppose you are about to make your first submission to a particular academic journal.
- You assess your chances of your paper being accepted (you have an opinion, but the true probability is unknown).
- You submit your article and it is accepted!

Question: What is your revised opinion regarding the acceptance probability for papers like yours?

If you said anything other than 1, you are a Bayesian!

Bayes can account for structure

County-level breast cancer rates per 10,000 women (one county, marked "?", has no data):

 79  87  83  80  78
 90  89  92  99  95
 96 100   ?  110 115
101 109 105 108 112
 96 104  92 101  96

With no direct data for county "?", what estimate would you use? Is 200 reasonable? Probably not: all the other rates are around 100. Perhaps use the average of the neighboring values (again, near 100).

Accounting for structure (cont'd)

Now assume that data become available for county "?": 100 women at risk, 2 cancer cases. Thus

rate = (2/100) × 10,000 = 200.

Would you use this value as the estimate? Probably not: the sample size is very small, so this estimate will be unreliable. How about a compromise between 200 and the rates in the neighboring counties?

Now repeat this thought experiment if the county data were 20/1000, 200/10000, ...

Bayes and empirical Bayes methods can incorporate the structure in the data, weight the data and prior information appropriately, and allow the data to dominate as the sample size becomes large.
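
A minimal sketch of such a compromise (not from the slides): assume county counts are Poisson and encode the neighbors' level of roughly 100 per 10,000 in a conjugate Gamma prior. The particular prior Gamma(10, 0.1) is an assumption, chosen only so that its mean is 100.

```python
# Gamma-Poisson compromise between the crude county rate and the neighbors'
# level (~100 per 10,000). The prior Gamma(a, b) with mean a/b = 100 is assumed.
a, b = 10.0, 0.1

for cases, n in [(2, 100), (20, 1_000), (200, 10_000)]:
    e = n / 10_000                       # exposure in units of 10,000 women
    crude = cases / e                    # crude rate per 10,000 (always 200 here)
    post_mean = (a + cases) / (b + e)    # mean of the Gamma(a + cases, b + e) posterior
    print(f"{cases}/{n}: crude = {crude:.0f}, posterior mean = {post_mean:.1f}")
```

The posterior mean moves from about 109 (dominated by the prior) toward the crude 200 as the sample size grows, exactly the weighting behavior the slide describes.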

Motivating Example

From Berger and Berry (1988, Amer. Scientist): Consider a clinical trial to study the effectiveness of vitamin C in treating the common cold.

Observations are matched pairs of subjects (twins?), half randomized (in double-blind fashion) to vitamin C, half to placebo. We count how many pairs had C giving superior relief after 48 hours.

Two Designs

Design #1: Sample n = 17 pairs, and test

H_0: P(C better) = 1/2 vs. H_A: P(C better) ≠ 1/2.

Suppose we observe x = 13 preferences for C. Then

p-value = P(X ≥ 13 or X ≤ 4) = .049,

so if α = .05, stop and reject H_0.

Design #2: Sample n_1 = 17 pairs. Then:
- if x_1 ≥ 13 or x_1 ≤ 4, stop;
- otherwise, sample an additional n_2 = 27 pairs, and reject H_0 if X_1 + X_2 ≥ 29 or X_1 + X_2 ≤ 15.

Two Designs (cont'd)

We choose this second stage since, under H_0,

P(X_1 + X_2 ≥ 29 or X_1 + X_2 ≤ 15) = .049,

the same as Stage 1!

Suppose we again observe X_1 = 13. Now:

p-value = P(X_1 ≥ 13 or X_1 ≤ 4)
        + P(X_1 + X_2 ≥ 29 and 4 < X_1 < 13)
        + P(X_1 + X_2 ≤ 15 and 4 < X_1 < 13)
        = .085,

no longer significant at α = .05!

Yet the observed data were exactly the same; all we did was contemplate a second stage (with no effect on the data), and it changed our answer!
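
These two tail probabilities are easy to verify numerically; a quick sketch (assuming scipy is available):

```python
# Numerical check of the two designs' p-values under H_0: P(C better) = 1/2.
from scipy.stats import binom

# Design 1: X ~ Bin(17, 1/2), observed x = 13, two-sided tail
X1 = binom(17, 0.5)
p1 = X1.cdf(4) + X1.sf(12)            # P(X <= 4) + P(X >= 13)
print(f"Design 1 p-value: {p1:.3f}")  # ~0.049

# Design 2: same first stage, plus a contemplated second stage X2 ~ Bin(27, 1/2)
X2 = binom(27, 0.5)
p2 = p1
for x1 in range(5, 13):               # continuation region 4 < X1 < 13
    p2 += X1.pmf(x1) * (X2.sf(28 - x1) + X2.cdf(15 - x1))
print(f"Design 2 p-value: {p2:.3f}")  # ~0.085
```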

Additional Q&A

Q: What if we kept adding stages?
A: The p-value → 1, even though x_1 is still 13!

Q: So are p-values really objective evidence?
A: No, since extra information (like the design) is critical!

Q: What about unforeseen events? Example: the first 5 patients develop an allergic reaction to the treatment, and the trial is stopped by the clinicians. Can a frequentist analyze these data?
A: No: this aspect of the design wasn't anticipated, so p-values are not computable!

Q: Can a Bayesian?
A: (Obviously) Yes, as we shall see...

Bayes/frequentist controversy

Frequentist outlook: Suppose we have k unknown quantities θ = (θ_1, θ_2, ..., θ_k), and data X = (X_1, ..., X_n) with a distribution that depends on θ, say p(x|θ).

The frequentist selects a loss function L(θ, a) and develops procedures δ which perform well with respect to the frequentist risk

R(θ, δ) = E_{X|θ}[L(θ, δ(X))].

A good frequentist procedure is one that enjoys low risk over repeated data sampling, regardless of the true value of θ.

Bayes/frequentist controversy

So are there any such procedures? Sure.

Example: Suppose X_i | θ ~ iid N(θ, σ²), i = 1, ..., n. The standard 95% frequentist confidence interval (CI) is δ(x) = (x̄ ± 1.96 s/√n).

If we choose the loss function

L(θ, δ(x)) = 0 if θ ∈ δ(x), and 1 if θ ∉ δ(x),

then

R[(θ, σ), δ] = E_{x|θ,σ}[L(θ, δ(x))] = P_{θ,σ}(θ ∉ δ(x)) = .05.

On average in repeated use, δ fails only 5% of the time. This has some appeal, since it works for all θ and σ!
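
A small simulation makes the "works for all θ and σ" claim concrete (an illustrative sketch, not from the slides; the true values below are arbitrary):

```python
# Monte Carlo check: the interval xbar ± 1.96 s/sqrt(n) misses theta about
# 5% of the time, whatever the true theta and sigma (n moderately large).
import numpy as np

rng = np.random.default_rng(0)
n, reps = 50, 100_000
theta, sigma = 3.7, 2.5              # arbitrary "true" values; any choice works

x = rng.normal(theta, sigma, size=(reps, n))
xbar = x.mean(axis=1)
s = x.std(axis=1, ddof=1)            # sample standard deviation
half = 1.96 * s / np.sqrt(n)

miss = (theta < xbar - half) | (theta > xbar + half)
print("empirical risk (miss rate):", miss.mean())   # close to .05
```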

Bayes/frequentist controversy

Many frequentist analyses are based on the likelihood function, which is just p(x|θ) viewed as a function of θ:

L(θ; x) ∝ p(x|θ).

p(x|θ) is larger for values of θ that are more likely given the observed data. This leads naturally to...

The Likelihood Principle: In making inferences or decisions about θ after x is observed, all relevant experimental information is contained in the likelihood function for the observed x. Furthermore, two likelihood functions contain the same information about θ if they are proportional to each other as functions of θ.

This seems completely reasonable (and is theoretically justifiable), yet frequentist analyses may violate it!...

Binomial vs. Negative Binomial

Example due to Pratt (comment on Birnbaum, 1962 JASA): Suppose 12 independent coin tosses yield 9 heads and 3 tails. Test H_0: θ = 1/2 vs. H_A: θ > 1/2.

Two possibilities for f(x|θ):

Binomial: n = 12 tosses (fixed beforehand), X = #heads ~ Bin(12, θ), so
L_1(θ) = p_1(x|θ) = C(n, x) θ^x (1-θ)^(n-x) = C(12, 9) θ^9 (1-θ)^3.

Negative Binomial: flip until we get r = 3 tails, X ~ NB(3, θ), so
L_2(θ) = p_2(x|θ) = C(r+x-1, x) θ^x (1-θ)^r = C(11, 9) θ^9 (1-θ)^3.

Adopt the rejection region: reject H_0 if X ≥ c.

Binomial vs. Negative Binomial

p-values:

α_1 = P_{θ=1/2}(X ≥ 9) = Σ_{j=9}^{12} C(12, j) θ^j (1-θ)^(12-j) = .075

α_2 = P_{θ=1/2}(X ≥ 9) = Σ_{j=9}^{∞} C(2+j, j) θ^j (1-θ)^3 = .0325

So at α = .05, two different decisions! This violates the Likelihood Principle, since L_1(θ) ∝ L_2(θ)!!

What happened? Besides the observed x = 9, we also took into account the more extreme outcomes X ≥ 10.

Jeffreys (1961): "...a hypothesis which may be true may be rejected because it has not predicted observable results which have not occurred."

In our example, the probability of the unpredicted and nonoccurring set X ≥ 10 has been used as evidence against H_0!
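
These tail areas can be checked directly (a sketch assuming scipy; note that exact computation gives about .073 and .033, close to the commonly quoted .075 and .0325 above):

```python
# Tail probabilities under the two designs for the same data (9 heads, 3 tails).
from scipy.stats import binom, nbinom

theta = 0.5

# Binomial design: X = #heads in n = 12 fixed tosses
alpha1 = binom(12, theta).sf(8)        # P(X >= 9), about .073
print(f"alpha_1 = {alpha1:.4f}")

# Negative binomial design: X = #heads before the r = 3rd tail.
# scipy's nbinom counts "failures" before r "successes"; here a success
# is a tail, which has probability 1 - theta.
alpha2 = nbinom(3, 1 - theta).sf(8)    # P(X >= 9), about .033
print(f"alpha_2 = {alpha2:.4f}")

# Same observed data and proportional likelihoods (L1/L2 = C(12,9)/C(11,9),
# a constant free of theta), yet the two p-values straddle alpha = .05.
```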

Conditional (Bayesian) Perspective

Always condition on the data which have actually occurred; the long-run performance of a procedure is of (at most) secondary interest. Fix a prior distribution p(θ), and use Bayes' Theorem (1763):

p(θ|x) ∝ p(x|θ) p(θ)    (posterior ∝ likelihood × prior).

Indeed, it often turns out that using the Bayesian formalism with relatively vague priors produces procedures which perform well under traditional frequentist criteria (e.g., low mean squared error over repeated sampling)! There are several examples in Chapter 4 of the C&L text.
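
In this conditional framework the binomial/negative binomial conflict above disappears: since the two likelihoods are proportional, any prior yields the same posterior under either design. A minimal sketch (assuming a Beta prior; not from the slides):

```python
# Because L1(theta) and L2(theta) are both proportional to theta^9 (1-theta)^3,
# a Beta(a, b) prior (an assumption) gives the SAME posterior, Beta(a+9, b+3),
# whether the data arose from the binomial or the negative binomial design.
from scipy import stats

a, b = 1, 1                       # uniform ("flat") prior
post = stats.beta(a + 9, b + 3)   # identical posterior for both sampling models

print("posterior mean:", post.mean())
print("P(theta > 0.5 | data):", 1 - post.cdf(0.5))
```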

Frequentist criticisms [from B. Efron (1986, Amer. Statist.), "Why isn't everyone a Bayesian?"]

- Shouldn't condition on x (but this renews the conflict with the Likelihood Principle)
- Not easy/automatic (but computing keeps improving: MCMC methods, WinBUGS software, ...)
- How to pick the prior p(θ)? Two experimenters could get different answers with the same data! How do we control the influence of the prior? How do we get objective results (say, for a court case or a scientific report)?

Clearly this final criticism is the most serious!

Bayesian Advantages in Inference

- Ability to formally incorporate prior information
- The reason for stopping experimentation does not affect the inference
- Answers are more easily interpretable by nonspecialists (e.g., interval estimates)
- All analyses follow directly from the posterior; no separate theories of estimation, testing, multiple comparisons, etc. are needed
- Any question can be directly answered (bioequivalence, multiple comparisons/hypotheses, ...)
- Inferences are conditional on the actual data
- Bayes procedures possess many optimality properties (e.g., they are consistent, impose parsimony in model choice, and define the class of optimal frequentist procedures, ...)