Bayesian Analysis for the Social Sciences




Bayesian Analysis for the Social Sciences
Simon Jackman, Stanford University
http://jackman.stanford.edu/bass
November 9, 2012

Introduction to Bayesian Inference

Bayesian inference relies exclusively on Bayes Theorem:

    p(h | data) ∝ p(h) p(data | h)

h is usually a parameter (but could also be a data point, a model, or a hypothesis); the p are probability densities (or probability mass functions in the case of discrete h and/or discrete data). p(h) is a prior density; p(data | h) is the likelihood, the conditional density of the data given h; p(h | data) is the posterior density for h given the data. This gives rise to the Bayesian mantra: a posterior density is proportional to the prior times the likelihood.
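As a minimal sketch of the mantra (not from the slides; the uniform prior and the binomial data of 7 successes in 10 trials are made up for illustration), the proportionality can be evaluated on a discrete grid and then normalized:

```python
import numpy as np

# Grid of candidate values for h (here: a binomial success probability).
h = np.linspace(0.001, 0.999, 999)

# Prior: uniform over the grid; any density evaluated on the grid would do.
prior = np.ones_like(h)

# Likelihood for hypothetical data: 7 successes in 10 Bernoulli trials.
k, n = 7, 10
likelihood = h**k * (1 - h)**(n - k)

# Posterior is proportional to prior times likelihood; normalize to sum to 1.
posterior = prior * likelihood
posterior /= posterior.sum()

print("posterior mean:", np.sum(h * posterior))  # ~ (k + 1) / (n + 2) = 0.667
```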

Probability Densities as Representations of Beliefs

Definition (Probability Density Function, informal). Let h be an unknown quantity, h ∈ H ⊆ R. A function p(h) is a proper probability density function if
1. p(h) ≥ 0 for all h, and
2. ∫_H p(h) dh = 1.

[Figure: the N(0,1) and N(0,2) densities plotted over θ ∈ (−3, 3).]

Probability Densities as Representations of Beliefs (continued)

[Figure: the Unif(0,1) and Beta(2,3) densities plotted over θ ∈ (0, 1), with p(θ) on the vertical axis.]

Probability Mass Function

Definition (Probability Mass Function). If h is a discrete random variable, taking values in a countable space H ⊆ R, then a function p : H → [0, 1] is a probability mass function if
1. p(h) = 0 for all h ∈ R \ H, and
2. Σ_{h ∈ H} p(h) = 1.

[Figure: three probability mass functions plotted as bar charts: a trinomial over {red, green, blue}, a Binomial(2/3, 5) over {0, ..., 5}, and a Poisson(4) over {0, 1, ..., 12}.]
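The two defining conditions are easy to check numerically. A hedged sketch for the Binomial(2/3, 5) and Poisson(4) panels, using scipy's parameterization:

```python
from scipy import stats

# Binomial(n=5, p=2/3): finite support {0, ..., 5}, so the check is exact.
binom = stats.binom(n=5, p=2 / 3)
print(sum(binom.pmf(k) for k in range(6)))   # 1.0

# Poisson(4): countable support; truncate deep into the tail for the check.
pois = stats.poisson(mu=4)
print(sum(pois.pmf(k) for k in range(100)))  # ~ 1.0 to machine precision
```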

Introduction to Bayesian Inference

p(h | data) ∝ p(h) p(data | h)

Bayesian inference involves computing, summarizing, and communicating summaries of the posterior density p(h | data). How to do this is what this class is about. Depending on the problem, doing all this is easy or hard; we solve "hard" with computing power. We're working with densities (or sometimes, mass functions). Bayesian point estimates are a single-number summary of a posterior density. Uncertainty is assessed and communicated in various ways: e.g., the standard deviation of the posterior, the width of the interval spanning the 2.5th to 97.5th percentiles of the posterior, etc. Sometimes we can just draw a picture; details and examples are coming.
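In practice these summaries are usually computed from draws from the posterior. A minimal sketch, using fake normal draws as a stand-in for sampler output:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for draws from a posterior density (e.g., output of a sampler);
# faked here with a normal purely for illustration.
draws = rng.normal(loc=1.2, scale=0.4, size=10_000)

print("posterior mean:", draws.mean())        # one Bayesian point estimate
print("posterior sd:  ", draws.std(ddof=1))   # one summary of uncertainty
lo, hi = np.percentile(draws, [2.5, 97.5])    # central 95% interval
print(f"95% interval: ({lo:.2f}, {hi:.2f})")
```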

Introduction to Bayesian Inference

p(h | data) ∝ p(h) p(data | h)

Bayes Theorem tells us how to update beliefs about h in light of evidence ("data"): a general method for induction, or for "learning from data": prior → data → posterior. Bayes Theorem is itself uncontroversial: it follows from widely accepted axioms of probability theory (e.g., Kolmogorov's) and the definition of conditional probability.

Why Be Bayesian?

- conceptual simplicity: say what you mean and mean what you say (subjective probability)
- a foundation for inference that does not rest on the thought experiment of repeated sampling
- uniformity of application: no special tweaks for this or that data analysis; just apply Bayes Rule
- modern computing makes Bayesian inference easy and nearly universally applicable

Conceptual Simplicity

p(h | data) ∝ p(h) p(data | h)

The posterior density (or mass function) p(h | data) is a complete characterization of beliefs after looking at data; as such, it contains everything we need for making inferences.

Examples: the posterior probability that a regression coefficient is positive, negative, or lies in a particular interval; the posterior probability that a subject belongs to a particular latent class; the posterior probability that a hypothesis is true; or the posterior probabilities that a particular model is the true model among a family of statistical models.

Contrast Frequentist Inference

Model for the data: y ~ f(h). Estimate h, e.g., by least squares or maximum likelihood, to yield ĥ = ĥ(y). Null hypothesis, e.g., H₀: h = h₀ = 0. Inference via the sampling distribution of ĥ conditional on H₀: e.g., assuming H₀, over repeated applications of the sampling process, how frequently would we observe a result at least as extreme as the one we obtained?

"At least as extreme"? Assessed via a test statistic, e.g.,

    t(y) = (ĥ − h₀) / √var(ĥ | h = h₀)

"How frequently"? The p-value: the relative frequency with which we would see t > t(y) in repeated applications of the sampling process. Often t(y) →d N(0, 1).

Contrast Frequentist Inference

Null hypothesis, e.g., H₀: h = h₀ = 0. Test statistic, often with t(y) →d N(0, 1):

    t(y) = (ĥ − h₀) / √var(ĥ | h = h₀)

The p-value is a statement about the plausibility of the statistic ĥ relative to what we might have observed in random sampling assuming H₀: h = h₀ = 0. One more step: we need to reject or fail to reject H₀. Is p sufficiently small? The frequentist p-value is a summary of the distribution of ĥ under H₀.
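For concreteness, a hedged sketch of a two-sided p-value under the large-sample normal approximation; the estimate and standard error are hypothetical numbers, not from the slides:

```python
from scipy import stats

# Two-sided p-value for H0: h = 0, assuming t(y) ~ N(0, 1) under H0.
h_hat, se = 0.50, 0.22           # hypothetical estimate and standard error
t = (h_hat - 0.0) / se           # test statistic
p_value = 2 * stats.norm.sf(abs(t))
print(f"t = {t:.2f}, p = {p_value:.3f}")  # t = 2.27, p ~ 0.023
```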

Contrast Frequentist Inference

N.b., frequentist inference treats ĥ as a random variable; h is a fixed but unknown feature of the population from which the data are being (randomly) sampled. Bayesian inference: ĥ is fixed, a function of the data available for analysis; h is a random variable, subject to (subjective) uncertainty.

                              Bayesian        Frequentist
    h                         random          fixed but unknown
    ĥ                         fixed           random
    randomness                subjective      sampling
    distribution of interest  posterior       sampling distribution
                              p(h | y)        p(ĥ(y) | h = h₀)

Subjective Uncertainty

How do we do statistical inference in situations where repeated sampling is infeasible? Inference when we have the entire population, and hence no uncertainty due to sampling: e.g., parts of comparative political economy. Bayesians rely on a notion of subjective uncertainty: e.g., h is a random variable because we don't know its value. Bayes Theorem tells us how to manage that uncertainty, how to update beliefs about h in light of data. Contrast the objectivist notion of probability: probability as a property of the object under study (e.g., coins, decks of cards, roulette wheels, people, groups, societies).

Subjective Uncertainty

Many Bayesians regard objectivist probability as metaphysical nonsense. De Finetti: "PROBABILITY DOES NOT EXIST. The abandonment of superstitious beliefs about... Fairies and Witches was an essential step along the road to scientific thinking. Probability, too, if regarded as something endowed with some kind of objective existence, is no less a misleading misconception, an illusory attempt to exteriorize or materialize our true probabilistic beliefs. In investigating the reasonableness of our own modes of thought and behaviour under uncertainty, all we require, and all that we are reasonably entitled to, is consistency among these beliefs, and their reasonable relation to any kind of relevant objective data ('relevant' in as much as subjectively deemed to be so). This is Probability Theory."

Subjective Uncertainty

Bayesian probability statements are thus about states of mind over states of the world, not about states of the world per se. Borel: one can guess the outcome of a coin toss while the coin is still in the air and its movement is perfectly determined, or even after the coin has landed but before one views the result. I.e., subjective uncertainty obtains irrespective of objective uncertainty (however conceived). But not just any subjective uncertainty: beliefs must conform to the rules of probability. E.g., p(h) should be proper: ∫_H p(h) dh = 1 and p(h) ≥ 0 for all h ∈ H.

Bayes Theorem

Conditional probability: Let A and B be events with P(B) > 0. Then the conditional probability of A given B is

    P(A | B) = P(A ∩ B) / P(B) = P(A, B) / P(B).

Multiplication rule: P(A ∩ B) = P(A, B) = P(A | B) P(B) = P(B | A) P(A).

Law of Total Probability: P(B) = P(A ∩ B) + P(¬A ∩ B) = P(B | A) P(A) + P(B | ¬A) P(¬A).

Bayes Theorem: If A and B are events with P(B) > 0, then

    P(A | B) = P(B | A) P(A) / P(B).

Bayes Theorem, Example

A drug-testing case: prior work suggests that about 3% of the subject pool (elite athletes) uses a particular prohibited drug. Let H_U be the event that the test subject uses the prohibited substance, so p(H_U) = .03. The evidence E is a positive test result. The test has a false negative rate of .05, i.e., P(¬E | H_U) = .05, so P(E | H_U) = .95. The test has a false positive rate of .10, i.e., P(E | ¬H_U) = .10. Bayes Theorem:

    P(H_U | E) = P(H_U) P(E | H_U) / Σ_{i ∈ {U, ¬U}} P(H_i) P(E | H_i)
               = (.03 × .95) / ((.03 × .95) + (.97 × .10))
               = .0285 / (.0285 + .097)
               = .23
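A quick numeric check of the slide's arithmetic, as a minimal sketch:

```python
# Drug-testing example: P(user | positive test) via Bayes Theorem.
p_use = 0.03            # prior: P(H_U), proportion of users in the subject pool
p_pos_given_use = 0.95  # sensitivity: 1 minus the false negative rate of 0.05
p_pos_given_not = 0.10  # false positive rate

p_pos = p_use * p_pos_given_use + (1 - p_use) * p_pos_given_not  # total prob.
p_use_given_pos = p_use * p_pos_given_use / p_pos
print(round(p_use_given_pos, 2))  # 0.23
```

Despite the positive test, the posterior probability of use is only .23: the low base rate (.03) dominates the fairly accurate test.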

Bayes Theorem, Continuous Parameter

Bayes Theorem:

    p(h | y) = p(y | h) p(h) / ∫ p(y | h) p(h) dh

Proof: by the definition of conditional probability,

    p(h, y) = p(h | y) p(y) = p(y | h) p(h),    (1)

where all these densities are assumed to exist and to have the properties p(z) ≥ 0 and ∫ p(z) dz = 1 (i.e., to be proper probability densities). The result follows by rearranging the quantities in equation (1) and noting that p(y) = ∫ p(y, h) dh = ∫ p(y | h) p(h) dh.
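The denominator p(y) can be approximated by numerical integration when no closed form is available. A hedged sketch on a grid, for a hypothetical normal model with known standard deviation and a N(0, 2²) prior on the mean (all numbers invented for illustration):

```python
import numpy as np
from scipy import stats

y = np.array([1.1, 0.4, 1.9, 0.7])        # made-up data
h = np.linspace(-5, 5, 2001)              # grid over the parameter (a mean)
dh = h[1] - h[0]

prior = stats.norm.pdf(h, loc=0, scale=2)  # N(0, 2^2) prior density
# Likelihood p(y | h): product of normal densities, one row per data point.
like = np.prod(stats.norm.pdf(y[:, None], loc=h, scale=1.0), axis=0)

p_y = np.sum(prior * like) * dh            # p(y) = integral of p(y|h) p(h) dh
posterior = prior * like / p_y             # a proper density on the grid
print(np.sum(posterior) * dh)              # ~ 1.0, as required
```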

[...] and Densities, Continuous Parameter

[Figure: nine panels of densities plotted over the unit interval; only the 0.0 to 1.0 axis ticks survived extraction.]

[...]: less standard cases

[Figure: six panels of less standard densities plotted over θ; only the axis labels survived extraction.]

Cromwell's Rule: the dangers of dogmatism

p(h | data) ∝ p(h) p(data | h), and so p(h | data) = 0 for all h such that p(h) = 0.

Cromwell's Rule: After the English deposed, tried, and executed Charles I in 1649, the Scots invited Charles' son, Charles II, to become king. The English regarded this as a hostile act, and Oliver Cromwell led an army north. Prior to the outbreak of hostilities, Cromwell wrote to the synod of the Church of Scotland: "I beseech you, in the bowels of Christ, consider it possible that you are mistaken."

A dogmatic prior that assigns zero probability to a hypothesis can never be revised; likewise, a hypothesis with prior weight of 1.0 can never be refuted.
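A hedged three-hypothesis illustration (the numbers are invented): any hypothesis given prior probability zero stays at zero, no matter how strongly the data favor it.

```python
import numpy as np

prior = np.array([0.0, 0.5, 0.5])       # dogmatic: rules out hypothesis 0
likelihood = np.array([0.9, 0.2, 0.1])  # data strongly favor hypothesis 0

# Posterior is proportional to prior times likelihood, then normalized.
posterior = prior * likelihood
posterior /= posterior.sum()
print(posterior)  # [0.    0.667 0.333]; hypothesis 0 can never be revived
```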

Cromwell's Rule

[Figure-only slide; the graphic did not survive extraction.]

Bayesian Point Estimates

Bayes estimates: a single-number summary of a posterior density. But which one? E.g., the mode, median, mean, some quantile(s)? Different loss functions rationalize different point estimates.

Loss: Let H be a set of possible states of nature h, and let a ∈ A be the actions available to the researcher. Then define l(h, a) as the loss to the researcher from taking action a when the state of nature is h.

Expected loss: Given a posterior distribution for h, p(h | y), the posterior expected loss of an action a is

    m(p(h | y), a) = ∫_H l(h, a) p(h | y) dh.

Mean as Bayes Estimator Under Quadratic Loss

Quadratic loss: If h ∈ H is a parameter of interest, and h̃ is an estimate of h, then l(h, h̃) = (h − h̃)² is the quadratic loss arising from the use of the estimate h̃ instead of h.

Mean as Bayes Estimate Under Quadratic Loss:

    h̃ = E(h | y) = ∫_H h p(h | y) dh.

Proof: Quadratic loss implies that the posterior expected loss is m(h, h̃) = ∫_H (h − h̃)² p(h | y) dh. Expanding the quadratic yields

    m(h, h̃) = ∫_H h² p(h | y) dh + h̃² − 2 h̃ E(h | y).

Differentiate with respect to h̃, noting that the first term does not involve h̃. Solving the first-order condition 2h̃ − 2E(h | y) = 0 for h̃ gives the result.

Bayes Estimates

- Quadratic loss: the mean of the posterior density, E(h | y) = ∫ h p(h | y) dh.
- Symmetric linear loss: the median of the posterior density; n.b., only well-defined for h ∈ H ⊆ R, in which case h̃ is defined such that ∫_{−∞}^{h̃} p(h | y) dh = .5.
- All-or-nothing loss: the mode of the posterior density, h̃ = argmax_{h ∈ H} p(h | y).
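These correspondences can be verified numerically. A hedged sketch using a skewed gamma "posterior" (chosen arbitrarily so that mean, median, and mode differ) and a grid search over candidate estimates:

```python
import numpy as np

rng = np.random.default_rng(1)
draws = rng.gamma(shape=2.0, scale=1.0, size=20_000)  # skewed "posterior"

# Candidate point estimates on a grid; pick the minimizer of each posterior
# expected loss, approximated by averaging the loss over the draws.
grid = np.linspace(0.01, 8, 800)
quad = [np.mean((draws - g) ** 2) for g in grid]   # quadratic loss
lin = [np.mean(np.abs(draws - g)) for g in grid]   # symmetric linear loss

print("argmin quadratic vs mean:  ", grid[np.argmin(quad)], draws.mean())
print("argmin linear vs median:   ", grid[np.argmin(lin)], np.median(draws))
```

The quadratic-loss minimizer lands at the sample mean (about 2.0 here) and the linear-loss minimizer at the median (about 1.68), as the theory says.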

Credible Region; HPD Region

Definition (Credible Region). A region C ⊆ H such that ∫_C p(h) dh = 1 − α, 0 ≤ α ≤ 1, is a 100(1 − α)% credible region for h. For single-parameter problems (i.e., H ⊆ R), if C is not a set of disjoint intervals, then C is a credible interval. If p(h) is a prior (posterior) density, then C is a prior (posterior) credible region.

Definition (Highest Probability Density Region). A region C ⊆ H is a 100(1 − α)% highest probability density region for h under p(h) if
1. P(h ∈ C) = 1 − α, and
2. p(h₁) ≥ p(h₂) for all h₁ ∈ C and h₂ ∉ C.

HPD Intervals

A 100(1 − α)% HPD region for a symmetric, unimodal density is unique and symmetric around the mode, e.g., a normal density. Contrast skewed distributions: there an HPD interval differs from the interval obtained by simply reading off the quantiles.

[Figure: the N(0,1) density, with a symmetric interval about zero, and a χ² density with 4 df, with the 25% and 75% quantiles marked; for the skewed χ², the quantile-based interval differs from the HPD interval.]
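From posterior draws, an HPD interval for a unimodal density can be approximated as the shortest interval containing 100(1 − α)% of the draws. A hedged sample-based sketch, not the textbook's algorithm, contrasted with the equal-tailed interval for the skewed χ²(4) case:

```python
import numpy as np

def hpd_interval(draws, alpha=0.05):
    """Shortest interval containing (1 - alpha) of the draws.

    For a unimodal density this approximates the HPD interval.
    """
    x = np.sort(draws)
    n = len(x)
    m = int(np.floor((1 - alpha) * n))
    widths = x[m:] - x[: n - m]   # widths of all candidate intervals
    i = np.argmin(widths)         # the shortest one approximates the HPD
    return x[i], x[i + m]

rng = np.random.default_rng(2)
draws = rng.chisquare(df=4, size=100_000)
print("HPD interval:  ", hpd_interval(draws))
print("equal-tailed:  ", tuple(np.percentile(draws, [2.5, 97.5])))
```

For the right-skewed χ², the HPD interval sits closer to zero and is shorter than the equal-tailed interval.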

HPD Intervals

HPDs can be a series of disjoint intervals, e.g., for a bimodal density. These are uncommon, but in such a circumstance, presenting a picture of the density might be the reasonable thing to do. See Example 1.7, p. 28: y_i ~ N(0, R), subject to extreme missingness. The posterior density of q(R) = r₁₂ / √(r₁₁ r₂₂) is bimodal.

[Figure: the posterior density of the correlation coefficient, plotted over (−1, 1).]

Bayesian Consistency

For anything other than a dogmatic/degenerate prior (see the earlier discussion of Cromwell's Rule), more and more data will overwhelm the prior. Bayesian asymptotics: with an arbitrarily large amount of sample information relative to prior information, the posterior density tends to the likelihood (normalized to be a density over h). Central limit arguments: since likelihoods are usually approximately normal in large samples, so too are posterior densities.
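A hedged simulation sketch of this, using a conjugate beta-binomial setup (the Beta(2, 2) prior and true value h* = 0.3 are my choices, not the slides'), with the sample sizes from the figures that follow:

```python
import numpy as np
from scipy import stats

# A fixed Beta(2, 2) prior is overwhelmed as the binomial sample size grows.
h_star, a, b = 0.3, 2, 2
rng = np.random.default_rng(3)

for n in (6, 30, 150, 1500):
    k = rng.binomial(n, h_star)                # simulated data at true h*
    post = stats.beta(a + k, b + n - k)        # conjugate posterior
    lo, hi = post.ppf([0.025, 0.975])
    print(f"n={n:5d}  mean={post.mean():.3f}  95% interval=({lo:.3f}, {hi:.3f})")
```

As n grows, the posterior mean converges to h* and the interval shrinks: the prior's influence vanishes.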

Bayesian Consistency

The prior remains fixed across the sequence, as the sample size increases and h* is held constant. In this example, n = 6, 30, 90, 450 across the four columns.

Bayesian Consistency

The prior remains fixed across the sequence, as the sample size increases and h* is held constant. In this example, n = 6, 30, 150, 1500 across the four columns.

Other Topics from Chapter One

1.8. Bayesian hypothesis testing.
1.9. Exchangeability; de Finetti's Representation Theorem.