An Introduction to Bayesian Statistics


An Introduction to Bayesian Statistics
Robert Weiss
Department of Biostatistics, UCLA School of Public Health
robweiss@ucla.edu
April 2011
Robert Weiss (UCLA) An Introduction to Bayesian Statistics UCLA CHIPTS 2011 1 / 32

Bayesian Statistics: What Exactly is Bayesian Statistics?
- A philosophy of statistics.
- A generalization of classical statistics.
- An approach to statistics that explicitly incorporates expert knowledge in modeling data.
- A language for talking about statistical models and expert knowledge.
- An approach that allows complex models and explains why they are helpful.
- A different method for fitting models.
- An easier approach to statistics.

Prior to Posterior: Bayes' Theorem
- Prior: which parameter values you think are likely and unlikely.
- Collect data. The data give us the likelihood: which parameter values the data consider likely.
- Update the prior to the posterior: which values you think are likely and unlikely given the prior information and the data.
- Prior and posterior are both probability distributions.
- Bayes' theorem: Posterior = c × Prior × Likelihood (1)
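
This prior-times-likelihood update can be sketched numerically on a grid of candidate parameter values. A minimal Python illustration, not code from the talk; the prior N(123, 3²), the single data summary 119.8, and the sampling sd of 6 are assumptions chosen to echo the blood pressure example:

```python
import numpy as np

# Grid approximation of Bayes' theorem: posterior = c * prior * likelihood.
# Illustrative numbers: prior N(123, 3^2) for the mean mu, one summary
# observation 119.8 with an assumed sampling sd of 6.
mu = np.linspace(100, 140, 4001)                 # grid of candidate values
prior = np.exp(-0.5 * ((mu - 123.0) / 3.0) ** 2)
like = np.exp(-0.5 * ((119.8 - mu) / 6.0) ** 2)
post = prior * like
post /= post.sum() * (mu[1] - mu[0])             # normalize to integrate to 1

# Posterior mode sits between the data mean 119.8 and the prior mean 123.
print(mu[np.argmax(post)])
```

Normalizing by the constant c is what the grid division does; the unnormalized product already has the right shape.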

Prior Distributions
- Prior: a probability distribution summarizing beliefs about the unknown parameters prior to seeing the data.
- In practice: a combination of expert knowledge and practical choices.
- Priors can be specified by:
  - A point estimate ("My systolic blood pressure is around 123"), and
  - A measure of uncertainty ("I might be in error by 3 points"). (Is that a maximal error, an SD, or perhaps 2 SDs? Need to decide: 1 SD.) And,
  - A practical choice about the density: a normal distribution, N(123, 3²).

Bayesian Inference: From Before the Data to After the Data
- Data are random before you see them, in both classical and Bayesian statistics.
- After you see the data, they are fixed in Bayesian statistics (not in classical statistics).
- Uncertainty in parameter values is quantified by probability distributions.
- Parameters have distributions before you see the data [prior distribution].
- Parameters have updated distributions after you see the data [posterior distribution].
- The conclusion of an analysis is a posterior distribution. Point estimates, SDs, 95% CIs: summaries of the posterior distribution.

Interpreting the Prior: Five Data Points
- 123 is the value I find most likely.
- 3 is my assessment of the standard deviation.
- 123 ± 3, i.e. (120, 126), is about a 68% interval.
- (117, 129) is "I expect my true blood pressure to fall in this interval 95% of the time."
- That 68% or 95% would be over all prior intervals that I specify for various quantities.

Collecting My Data: From CVS and from Kaiser
- I went into Rite-Aid and took 5 measurements in a row. Measures were: 125, 126, 112, 120, 116.
- Compare to a Kaiser visit, a measurement of 115. I was surprised.
- At Kaiser: 115 was common 3 years ago. The last 5 Kaiser measures were 127 (2010), 115 (2010), 117 (2007), 115, 116.

Graphical Results for Rob's Blood Pressure: Prior, Data Likelihood, Posterior (CVS)
[Figure: density curves for the prior, likelihood, and posterior plotted against mu, roughly 105 to 135.]

Graphical Results for Rob's Blood Pressure 2: Prior, Data Likelihood, Posterior (Kaiser)
[Figure: density curves for the prior, likelihood, and posterior plotted against mu, roughly 105 to 135.]

Interpreting the Figure: Rob's Blood Pressure
- The red curve is the prior: what I believed before seeing the data. X-axis values where the red curve is highest are the most plausible values a priori; where the red curve is low, values are less plausible.
- The black curve is the likelihood. Where the black curve is high are the values supported by the data.
- The blue curve is the posterior, the combination of prior and likelihood. Where the blue curve is high are the most plausible values given the data and my prior beliefs.

Inferences for Rob, CVS Data

Inference    Mean    SD    95% CI
Posterior    121.2   2.0   (117.2, 125.2)
Likelihood   119.8   2.7   (114.5, 125.1)
Prior        123     3     (117.0, 129.0)

Table: The likelihood gives the classical inference. The posterior is the Bayesian solution. The prior is what you guessed prior to seeing the data. The posterior SD is smaller than the classical answer and the posterior CI is narrower, i.e. more precise.

Formula for the Posterior Mean

\[
\hat{\mu} \;=\; \frac{n/\sigma^2}{\,n/\sigma^2 + 1/\tau^2\,}\,\bar{y}
\;+\; \frac{1/\tau^2}{\,n/\sigma^2 + 1/\tau^2\,}\,\mu_0
\]

where
- \(\hat{\mu}\): posterior mean
- \(\mu_0\): prior mean
- \(\tau\): prior sd
- \(\sigma\): sampling (data) sd
- \(n\): sample size
- \(\bar{y}\): data mean

The posterior mean is a weighted average of the prior mean and the data mean.
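
A short Python sketch of the posterior-mean formula, plugged in with the CVS measurements from the earlier slide. The prior N(123, 3²) is from the slides; the sampling sd σ = 6 is an assumption inferred from the reported likelihood sd of 2.7 ≈ 6/√5:

```python
import math

# Conjugate normal update: the posterior mean is a precision-weighted
# average of the prior mean and the data mean.
y = [125, 126, 112, 120, 116]             # CVS readings from the slides
n, ybar = len(y), sum(y) / len(y)         # n = 5, ybar = 119.8
mu0, tau = 123.0, 3.0                     # prior mean and prior sd
sigma = 6.0                               # sampling sd (assumed, see lead-in)

w_data = n / sigma**2                     # precision contributed by the data
w_prior = 1 / tau**2                      # precision contributed by the prior
post_mean = (w_data * ybar + w_prior * mu0) / (w_data + w_prior)
post_sd = math.sqrt(1 / (w_data + w_prior))

print(round(post_mean, 1), round(post_sd, 1))   # 121.2 2.0
```

With these numbers the posterior mean 121.2 and SD 2.0 agree with the inference table for the CVS data.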

Other Sources of Prior Information
- In your analyses, the prior information came from you, the experts on your own blood pressure.
- There are other places to get prior information: knowledge of usual blood pressure values can be used as prior knowledge.
- For example, nurses' average SBP is 118, the nurses' population SD is 8.7, and the sampling SD of SBP is 13.

UCLA Nurses' Blood Pressure Study
- An average of 47 ambulatory blood pressure measurements on each of 203 nurses.
- Calculate the average SBP for each nurse. Pretend this is her "true" average SBP.
- Average SBP is 118; the SD of SBP is 8.7. The SD of SBP measures around a nurse's average is 13.
- Suppose we record a single SBP measurement of 140. Does the nurse likely have an SBP of 140, or is it a somewhat high reading from someone with an SBP of 125, or a somewhat low reading from someone with an SBP of 155?

Nurses' Blood Pressure Study: The Population Distribution of Blood Pressures
[Figure: density of the population distribution of blood pressures, roughly 100 to 150.]

Nurses' Blood Pressure Study: We Observe SBP = 140
[Figure: the population distribution of blood pressures, with the observation 140 in the upper tail.]

Compare Bayesian Analysis to Classical Analysis
- Sample a single observation y_1 from one nurse in the data set.
- Model the true mean µ a priori as normal with a prior mean of 118 and a standard deviation of 8.7. The sampling SD is 13.
- The classical estimate is just the value of the single measurement. The Bayes estimate is calculated as illustrated for the measurement of 140 and as you did in the homework.
- Repeat for all subjects. Which result is closer to the true value? Calculate the root mean squared error, and count which statistical approach comes closer to the true value more often.
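
The comparison described above can be mimicked in a short simulation. The actual nurse data are not available here, so as an assumption the true means are drawn from the population model N(118, 8.7²); only the qualitative conclusion (shrinkage reduces mean squared error) should be read off:

```python
import numpy as np

# Simulation in the spirit of the nurses comparison (simulated, not real data).
rng = np.random.default_rng(0)
n_nurses = 203
pop_mean, pop_sd, samp_sd = 118.0, 8.7, 13.0

true_mu = rng.normal(pop_mean, pop_sd, n_nurses)   # each nurse's "true" SBP
y = rng.normal(true_mu, samp_sd)                   # one noisy reading per nurse

# Classical estimate: the single reading. Bayes: shrink it toward 118.
w = (1 / samp_sd**2) / (1 / samp_sd**2 + 1 / pop_sd**2)
bayes = w * y + (1 - w) * pop_mean

mse_classical = np.mean((y - true_mu) ** 2)
mse_bayes = np.mean((bayes - true_mu) ** 2)
print(mse_bayes < mse_classical)                   # True: shrinkage wins
```

The theoretical mean squared errors here are about 52 (Bayes) versus 169 (classical), the same roughly 3-to-1 advantage reported on the next slide.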

Bayesian Analysis Beats Classical Analysis Badly
- Mean squared error: Bayesian method 51 units (RMSE 7.1); classical method 153 units (RMSE 12.4).
- The Bayes method is 3 times more precise; the Bayesian prior is worth roughly 3 observations.
- The Bayesian estimate was closer 140 times; the classical estimate was closer 63 times.

Nurses' Blood Pressure Study: Histogram of Bayesian and Classical Errors
[Figure: two histograms, posterior minus true (Bayesian) and sample minus true (classical), over roughly -40 to 60.]

Nurses' Blood Pressure Study: Errors from Bayesian and Classical Estimates
- The Bayesian errors are packed in more closely toward zero: almost never larger than 20 units, mostly between -10 and 10 units.
- Some classical errors are as large as -40 and +60: generally between -20 and 20, but roughly 10% fall outside that range.

More Complex Statistical Models: Hierarchical Models for Blood Pressure
- We can model all the data from the Nurses' Blood Pressure data set at one time.
- The model looks like n = 203 copies of our simpler blood pressure data model, plus a model on top that models the population of possible blood pressures.
- The SBP population mean (118) and SD (8.7) are estimated in the model. The sampling SD (sigma, 13) is estimated while fitting the model. The 203 individual nurses' average SBP values are also estimated.
- This is called a random effects model. The random effects are the individual nurses' (unknown) average SBP values.
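
A rough sketch of the hierarchical idea, using empirical-Bayes moment estimates as a stand-in for the full Bayesian fit described on this slide. All data are simulated from the stated population values (118, 8.7, 13); this is an approximation for illustration, not the model actually fit in the study:

```python
import numpy as np

# Simulate the two-level structure: nurse-level means, then repeated readings.
rng = np.random.default_rng(3)
n_nurses, n_obs = 203, 47
true_mu = rng.normal(118.0, 8.7, n_nurses)                  # nurse-level means
y = rng.normal(true_mu[:, None], 13.0, (n_nurses, n_obs))   # repeated readings

ybar = y.mean(axis=1)                      # per-nurse averages
grand = ybar.mean()                        # estimate of the population mean
within_var = y.var(axis=1, ddof=1).mean()  # estimate of sigma^2 (~13^2)
between_var = max(ybar.var(ddof=1) - within_var / n_obs, 0.0)  # ~8.7^2

# Shrink each nurse's average toward the grand mean, weighting by precision.
w = (n_obs / within_var) / (n_obs / within_var + 1 / between_var)
shrunk = w * ybar + (1 - w) * grand
print(grand, within_var ** 0.5, between_var ** 0.5)
```

In the full Bayesian version these population quantities get priors and are estimated jointly with the 203 nurse means rather than plugged in.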

Philosophy: Confidence Intervals
- Classical statistics confidence interval: 95% confidence intervals constructed from repeated data sets will contain the true parameter value 95% of the time. Like throwing horseshoes: the true parameter is fixed; the data sets are random and attempt to throw a ringer capturing the true parameter.
- Bayesian confidence interval (aka posterior interval or credible interval): the probability that this interval contains the true parameter value is 95%. The data are fixed; the parameter is unknown. Like throwing a dart.
- Both: the parameter is fixed but unknown. Confidence is a statement about the statistician. Bayesian posterior probability is a statement about the particular scientific problem being studied.
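
The frequentist reading ("95% of repeated intervals capture the fixed true parameter") can be checked by simulation. A minimal sketch with made-up numbers and a known-sigma z interval:

```python
import numpy as np

# Repeated-sampling coverage of a 95% CI for a normal mean with known sigma.
# All numbers are illustrative, not from the talk.
rng = np.random.default_rng(2)
true_mu, sigma, n = 120.0, 13.0, 25
half = 1.96 * sigma / np.sqrt(n)          # half-width of the z interval
reps, hits = 10_000, 0
for _ in range(reps):
    y = rng.normal(true_mu, sigma, n)     # a fresh data set each time
    hits += (y.mean() - half <= true_mu <= y.mean() + half)
print(hits / reps)                        # close to 0.95
```

The true parameter never moves; only the data sets (and hence the intervals) vary, which is exactly the horseshoes picture above.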

Philosophy: Hypothesis Tests
- Classical hypothesis test: the p-value is the probability of observing a test statistic as extreme as or more extreme than the one observed, assuming the null hypothesis is true.
- Bayesian hypothesis test: the probability that the null (or alternative) hypothesis is true.
- The classical statement requires one more leap of faith: if the p-value is small, either something unusual occurred or the null hypothesis must be false. The Bayesian statement is a direct statement about the probability of the null hypothesis.

Philosophy: Hypothesis Tests II
- Classical hypothesis test: H_0 is treated differently from H_A; there are only two hypotheses; H_0 must be nested in H_A.
- Bayesian hypothesis test: may have 2 or 3 or more hypotheses; the hypotheses are treated symmetrically; hypotheses need not be nested.
- Example: three hypotheses H_1: β_1 < β_2, H_2: β_1 = β_2, H_3: β_1 > β_2 can be handled in a Bayesian framework. The result is three probabilities: the probability that each hypothesis is true given the data.
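
One way such three-way probabilities might be computed is via Monte Carlo marginal likelihoods, one per hypothesis. Everything in this sketch (slope estimates, standard errors, priors, and the equal 1/3 prior weight per hypothesis) is invented for illustration and is not from the orthodontic analysis:

```python
import numpy as np

# Three hypotheses about two slopes: H1: b1 < b2, H2: b1 = b2, H3: b1 > b2.
# Data are summarized by estimates and standard errors (all values assumed).
rng = np.random.default_rng(1)
b1_hat, se1 = 60.0, 8.0
b2_hat, se2 = 75.0, 8.0

def norm_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

M = 200_000
# Unequal-slopes models: independent N(70, 30^2) priors on each slope,
# truncated to the half-plane of each hypothesis (hence the factor of 2).
b1 = rng.normal(70.0, 30.0, M)
b2 = rng.normal(70.0, 30.0, M)
like = norm_pdf(b1_hat, b1, se1) * norm_pdf(b2_hat, b2, se2)
m_less = np.mean(like * (b1 < b2)) * 2
m_greater = np.mean(like * (b1 > b2)) * 2
# Equal-slopes model: a single common slope b ~ N(70, 30^2).
b = rng.normal(70.0, 30.0, M)
m_equal = np.mean(norm_pdf(b1_hat, b, se1) * norm_pdf(b2_hat, b, se2))

# Equal prior probability 1/3 on each hypothesis.
marg = np.array([m_less, m_equal, m_greater])
probs = marg / marg.sum()
print(dict(zip(["b1<b2", "b1=b2", "b1>b2"], probs.round(2))))
```

Note that a continuous prior alone would give P(β_1 = β_2) = 0; it is the separate equal-slopes model that gives the equality hypothesis positive probability.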

Orthodontic Example: Example of Many Hypotheses
- Orthodontic fixtures (braces) gradually reposition teeth by sliding the teeth along an arch wire attached to the teeth by brackets.
- Each bracket was attached to the arch wire at one of four angles: 0, 2, 4, or 6 degrees.
- The outcome is frictional resistance, which is expected to increase linearly as the angle increases.
- Four different types of fixtures are used in this study: 4 groups. Interest is in the intercepts and slopes (on angle) in each group.
- Are the intercepts the same or different? Which slopes are equal? For any two slopes, which is higher? Lower?

ANOVA Table: Treatments, Angle, Treatments by Angle

Source              F value   Degrees of Freedom   p-value
Treatment           0.25      3                    .86
Angle               185       1                    <.0001
Treatment × Angle   7.43      3                    <.0001

Table: Results from the analysis of covariance with interaction. The p-value for treatment is for differences between treatment intercepts, and the p-value for treatment × angle is for differences between treatment-specific slopes.

Pairwise p-values: Are they significant? What about multiple comparisons?

Treatment   2     3        4
1           .03   .04      .04
2                 <.0001   .80
3                          .0003

Table: P-values for differences between treatment-specific slopes from the previous ANOVA.

Orthodontic Example: Plot of Data
- Lines are fitted values from the usual least squares regression. Points are average responses for each group, 1 through 4, at each angle.
- There are four intercepts and four slopes in the plot. Which intercepts appear to be possibly equal? Which slopes appear to be possibly equal?

Orthodontic Example: Data and Fitted Lines
[Figure: friction (roughly 200 to 800) versus angle (0 to 6) with fitted lines and points labeled by group 1-4.]

Orthodontic Example: Model Description
- Alber and Weiss (2009). A model selection approach to analysis of variance and covariance. Statistics in Medicine, 28, 1821-1840.
- Data: y_ij is the jth observation in group i and has angle x_ij. The model is y_ij = α_i + β_i x_ij + error: simple linear regression within each group, with a different intercept α_i and a different slope β_i in each group.
- The prior on the intercepts is that any two groups may or may not be equal. Same for the slopes: any two slopes may or may not be equal.
- Results are on the next page. For each pair of intercepts or slopes, the figure plots the probability that the first slope (intercept) is less than the second slope (intercept), the probability that they are equal, and the probability that the first is greater than the second.

Orthodontic Example: Results
Figure 4: Posterior probabilities of intercept differences (a) and slope differences (b) from the independent partitions model.
[Figure: for each intercept pair (µ0j, µ0k), panel (a) plots the posterior probabilities of µ0j < µ0k, µ0j = µ0k, and µ0j > µ0k; panel (b) plots the same three probabilities for each slope pair (µ1j, µ1k), on a 0-1 scale.]

Discussion of Orthodontic Results
- All pairs of intercepts have probability about .6 of being equal.
- Intercepts, groups 1, 2, 3: roughly equal probabilities of one being larger or smaller than the other, if not equal. There is a modest probability that the group 4 intercept is larger than the others, if not equal.
- Slopes: slope 1 is less than slopes 2 and 4; slope 3 is less than slopes 1, 2, and 4. Only slopes 2 and 4 may be equal.