Lecture 11: Further Topics in Bayesian Statistical Modeling: Graphical Modelling and Model Selection with DIC


Graphical Models

Statistical modeling of complex systems usually involves many interconnected random variables. Question: how do we build these connections? Answer: think locally, act globally!

Directed Acyclic Graphs (DAGs):
- All quantities (random variables) in a model are represented by nodes, and relationships between nodes by arrows.
- The graph is used to represent a set of conditional independence statements.
- It expresses the joint relationship between all known quantities (data) and unknown quantities (parameters, predictions, missing data, etc.) in a model through a series of simple local relationships.
- It provides the basis for computation; the joint factorization shown below makes this concrete.
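As a worked equation (standard DAG theory; the slide states it only in words), "think locally, act globally" means that each node contributes one local term, its distribution given its parents, and the joint distribution of all nodes v_1, ..., v_n is the product of these local terms:

    p(v_1, ..., v_n) = p(v_1 | pa(v_1)) × ... × p(v_n | pa(v_n)),

where pa(v) denotes the (possibly empty) set of parents of v.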

Conditional independence

Two variables X and Y are statistically independent if

    p(X, Y) = p(X) p(Y).

Equivalently, X and Y are statistically independent if

    p(Y | X) = p(Y).

Conditional independence: given three variables X, Y and Z, we say that X and Y are conditionally independent given Z, denoted X ⊥ Y | Z, if

    p(X, Y | Z) = p(X | Z) p(Y | Z).
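A quick numerical illustration of the definition (the joint table below is invented for the demonstration, not taken from the lecture): for discrete variables we can verify X ⊥ Y | Z cell by cell, while checking that X and Y are not marginally independent.

    import numpy as np

    # Hypothetical example: X, Y, Z each binary. Build a joint in which
    # X and Y are conditionally independent given Z, from local terms.
    p_z = np.array([0.3, 0.7])                    # p(Z)
    p_x_given_z = np.array([[0.9, 0.1],           # p(X | Z=0)
                            [0.2, 0.8]])          # p(X | Z=1)
    p_y_given_z = np.array([[0.6, 0.4],           # p(Y | Z=0)
                            [0.1, 0.9]])          # p(Y | Z=1)

    # Joint p(x, y, z) = p(z) p(x|z) p(y|z)
    joint = np.einsum('z,zx,zy->xyz', p_z, p_x_given_z, p_y_given_z)

    # Check X ⊥ Y | Z: p(x, y | z) factorizes for every z
    p_xy_given_z = joint / joint.sum(axis=(0, 1))        # divide by p(z)
    p_x_z = joint.sum(axis=1) / joint.sum(axis=(0, 1))   # p(x | z)
    p_y_z = joint.sum(axis=0) / joint.sum(axis=(0, 1))   # p(y | z)
    assert np.allclose(p_xy_given_z, np.einsum('xz,yz->xyz', p_x_z, p_y_z))

    # But X and Y are NOT marginally independent here:
    p_xy = joint.sum(axis=2)
    print(np.allclose(p_xy, np.outer(p_xy.sum(axis=1), p_xy.sum(axis=0))))  # False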

Example: A Toy Model (Spiegelhalter, 1998)

From a DAG we can read off some conditional independence statements (the local Markov property) that use the natural ordering of the graph, e.g.

    B ⊥ (C, E, F) | A.
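The slide's figure is not reproduced in this transcript. Assuming the edge set A → B, A → C, B → D, C → D, A → E, F → E (the structure consistent with the conditional independence statements quoted on these slides), the joint distribution of the toy model factorizes as

    p(A, B, C, D, E, F) = p(A) p(F) p(B | A) p(C | A) p(D | B, C) p(E | A, F).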

How do we read further conditional independence statements from a DAG? We define the moral graph by
- marrying the parents (joining every pair of nodes that share a common child), and
- dropping the arrows.

From this undirected graph different properties can be deduced, in particular the global Markov property: any two subsets of nodes separated by a third subset are conditionally independent given that third subset. By separated we mean that there is no path between the two subsets that does not go through the third one. In particular,

    p(v | rest) = p(v | neighbours of v),

where by neighbours of v we mean its parents, children and spouses (co-parents of its children).
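A minimal sketch of the "marry the parents, drop the arrows" recipe (a hypothetical helper, not code from the lecture), applied to the assumed toy-model edge set from above:

    from itertools import combinations

    def moralize(parents):
        """Return the moral graph (an undirected adjacency dict)
        of a DAG given as {node: set of parents}."""
        nodes = set(parents) | set().union(*parents.values())
        adj = {v: set() for v in nodes}
        for child, pa in parents.items():
            for p in pa:                      # drop arrows: keep parent-child links
                adj[child].add(p); adj[p].add(child)
            for u, w in combinations(pa, 2):  # marry the parents
                adj[u].add(w); adj[w].add(u)
        return adj

    # Toy model DAG (assumed edge set, see above)
    dag = {'A': set(), 'F': set(), 'B': {'A'}, 'C': {'A'},
           'D': {'B', 'C'}, 'E': {'A', 'F'}}
    print(moralize(dag)['D'])  # neighbours of D: {'B', 'C'}, so p(D | rest) = p(D | B, C)
    print(moralize(dag)['A'])  # neighbours of A: {'B', 'C', 'E', 'F'}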

Moral graph

    D ⊥ (A, E, F) | (B, C),  i.e.  p(D | rest) = p(D | B, C).

Link between Gibbs sampling and DAGs

If we want to sample from p(A, B, C, D, E, F) with a Gibbs sampler, we define each full conditional distribution using the conditional independence pattern of the DAG (each node conditions only on its neighbours in the moral graph). Then we sample by iterating over the full conditionals (a numerical sketch follows the list):

    A ~ p(A | rest) = p(A | B, C, E, F)
    B ~ p(B | rest) = p(B | A, C, D)
    C ~ p(C | rest) = p(C | A, B, D)
    D ~ p(D | rest) = p(D | B, C)
    E ~ p(E | rest) = p(E | A, F)
    F ~ p(F | rest) = p(F | A, E).
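A minimal sketch of how the DAG drives a Gibbs sampler (binary nodes and invented conditional probability tables; the lecture gives no numbers): each full conditional p(v | rest) is proportional to the product of the local terms that mention v, so it can be computed by enumerating the two values of v and renormalizing.

    import numpy as np

    rng = np.random.default_rng(0)

    # Local terms of the toy DAG (invented numbers). Each entry maps
    # a node to (its parents, a table of p(node = 1 | parent values)).
    model = {
        'A': ((), 0.4),
        'F': ((), 0.3),
        'B': (('A',), {0: 0.2, 1: 0.7}),
        'C': (('A',), {0: 0.5, 1: 0.9}),
        'D': (('B', 'C'), {(0,0): 0.1, (0,1): 0.4, (1,0): 0.6, (1,1): 0.95}),
        'E': (('A', 'F'), {(0,0): 0.2, (0,1): 0.5, (1,0): 0.6, (1,1): 0.9}),
    }

    def p_local(v, state):
        """p(v = state[v] | parents of v), read from the local tables."""
        pa, tab = model[v]
        if not pa:
            p1 = tab
        elif len(pa) == 1:
            p1 = tab[state[pa[0]]]
        else:
            p1 = tab[tuple(state[u] for u in pa)]
        return p1 if state[v] == 1 else 1 - p1

    def full_conditional(v, state):
        """p(v = 1 | rest): enumerate v's two values and renormalize
        the product of all local terms that mention v."""
        involved = [w for w in model if w == v or v in model[w][0]]
        probs = []
        for val in (0, 1):
            s = dict(state, **{v: val})
            probs.append(np.prod([p_local(w, s) for w in involved]))
        return probs[1] / (probs[0] + probs[1])

    # Gibbs sampler: sweep the nodes, sampling each from p(v | rest)
    state = {v: 0 for v in model}
    samples = []
    for it in range(5000):
        for v in model:
            state[v] = int(rng.random() < full_conditional(v, state))
        samples.append(dict(state))

    print('P(D = 1) ~', np.mean([s['D'] for s in samples]))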

Summary
- A DAG gives a non-algebraic description of the model.
- Using a DAG is an interpretable way of specifying joint distributions through simple local terms.
- It can be used to build hierarchical models.
- It is used to find, locally, all the full conditional distributions in a Bayesian model.
- The DAG is used to program the kernel of the Gibbs sampler.

WinBUGS and Graphical Models

The WinBUGS User Manual recommends that the first step in any analysis should be the construction of a directed graphical model. In Bayesian analysis both observable variables (data) and parameters are random variables, so a Bayesian graphical model consists of nodes representing both data and parameters. These graphical representations can add clarity to complex patterns of dependency.

WinBUGS implementation

DoodleBUGS is a tool for drawing graphical models; BUGS code for a model can be generated from the graph. Types of nodes:
- Constants: fixed values, assigned in the data; they cannot have parent nodes.
- Stochastic nodes: random variables assigned a probability distribution in the model; they can be observed (data) or unobserved (parameters).
- Deterministic nodes: derived from other nodes as mathematical or logical functions of them.

Arrays of nodes, e.g. data values y[i], are represented compactly by a plate, indexed by i = 1, ..., N. Types of links between nodes:
- Single arrows represent stochastic dependence.
- Double arrows represent logical (mathematical) dependence.

Example: regression model

A DAG representation for a linear regression model:

    y_i ~ N(μ_i, τ),  i = 1, ..., N,

with

    μ_i = θ_1 x_{1,i} + θ_2 x_{2,i}  and  τ = 1/σ².
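Spelled out (a standard expansion; the priors are left generic because the slide's figure is not reproduced here), the regression DAG encodes the joint distribution

    p(y, θ_1, θ_2, τ) = p(θ_1) p(θ_2) p(τ) × Π_{i=1}^N p(y_i | μ_i, τ),

where each μ_i is a deterministic (double-arrow) node computed from θ_1, θ_2 and the constant covariates x_{1,i}, x_{2,i}.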

Multiple indexing

Very useful for representing complex model structures:
- Each level of indexing of a variable requires its own plate in a graphical model. An array variable like y_ij therefore requires two plates, one for each index, with the y_ij node in the intersection of the two plates. See the example Dyes from WinBUGS Examples Vol. I (complete nesting), sketched below.
- Any variable indexed by only j, for example, is in the j plate but not in the i plate. See the example Rats from WinBUGS Examples Vol. I (repeated measures): x_j (time) is the same for each i (rats), and so is in the j plate only.
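As a sketch of what complete nesting looks like algebraically (this is the standard two-level normal model usually used for Dyes; check the WinBUGS manual for the exact specification):

    y_ij ~ N(μ_i, τ_within),   j = 1, ..., n  (replicates within batch i)
    μ_i  ~ N(θ, τ_between),    i = 1, ..., I  (batches)

Here y_ij sits in the intersection of the i and j plates, while μ_i sits in the i plate only.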

[Figure: Dyes from WinBUGS Examples Vol. I - complete nesting]

[Figure: Rats from WinBUGS Examples Vol. I - repeated measures]

More about model building: model criticism and sensitivity analysis

Standard checks based on the fitted model also apply to Bayesian modeling:
- Residuals: plot them against covariates, check for autocorrelation, and so on.
- Prediction: check accuracy on an external validation set, or use cross-validation.

In addition we should:
- check for conflict between the prior and the data;
- check for unintended sensitivity to the prior;
- exploit MCMC, which lets us replicate parameters and data (see the sketch below).
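A minimal sketch of the "replicate data" idea (an invented normal example, not from the lecture): draw replicated datasets from the posterior predictive distribution and compare a test statistic with its observed value.

    import numpy as np

    rng = np.random.default_rng(1)

    # Observed data and posterior draws of mu (conjugate normal model
    # with known variance 1 and a flat prior, so mu | y ~ N(ybar, 1/n))
    y = np.array([1.2, 0.4, 2.3, 1.8, 0.9])
    n = len(y)
    mu_draws = rng.normal(y.mean(), 1 / np.sqrt(n), size=4000)

    # Posterior predictive replication: one replicated dataset per draw
    y_rep = rng.normal(mu_draws[:, None], 1.0, size=(4000, n))

    # Compare an observed test statistic with its replicated distribution
    T_obs = y.max()
    T_rep = y_rep.max(axis=1)
    print('posterior predictive p-value:', np.mean(T_rep >= T_obs))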

Bayesian Model Selection

Classical model selection criteria like C_p, AIC and BIC assume that the number of parameters in a model is a well-defined concept, taken to be equivalent to the degrees of freedom or the number of free parameters. In Bayesian analysis the prior effectively acts to restrict the freedom of these parameters to some extent, so the appropriate model degrees of freedom is less clear. Another issue in complex models (e.g. hierarchical models) is that the likelihood itself is not a well-defined concept. Moreover, the models to be compared are often not nested.

Using DIC for model selection

Spiegelhalter et al. (2002) proposed a Bayesian model comparison criterion based on trading off goodness of fit against model complexity, the Deviance Information Criterion:

    DIC = goodness of fit + complexity.

Goodness of fit is measured via the deviance

    D(θ) = -2 log L(data | θ),

and model complexity via the effective number of parameters

    p_D = E_{θ|y}[D(θ)] - D(E_{θ|y}[θ]) = Dbar - Dhat,

i.e. the posterior mean deviance (Dbar) minus the deviance evaluated at the posterior mean of the parameters (Dhat). The DIC is then defined, similarly to AIC, as

    DIC = Dhat + 2 p_D = Dbar + p_D.

Models with smaller DIC are better supported by the data. DIC can be monitored in WinBUGS from the Inference/DIC menu.

Example (Gelman et al., p. 182): suppose the data model is y | μ ~ N(μ, 1) with prior μ ~ Unif(0, 1000), and that we observe y1 = 0.5 and y2 = 100. What is the effective number of parameters p_D in each case?

    model {
        y1 ~ dnorm(mu1, 1)
        y2 ~ dnorm(mu2, 1)
        mu1 ~ dunif(0, 1000)
        mu2 ~ dunif(0, 1000)
    }
    # data
    list(y1 = 0.5, y2 = 100)

Then we have:

             Dbar    Dhat    pD      DIC
    y1       2.585   2.094   0.490   3.075
    y2       2.858   1.838   1.020   3.877

If we observe y1 = 0.5, the effective number of parameters p_D is approximately 0.5, since roughly half the information in the posterior distribution comes from the data and half from the prior constraint of positivity. If we observe y2 = 100, the constraint is essentially irrelevant and the effective number of parameters is approximately 1.
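A minimal numpy sketch (a direct Monte Carlo approximation, not the WinBUGS computation) that roughly reproduces these numbers: the posterior for μ is a N(y, 1) density truncated to (0, 1000), which we sample by rejection.

    import numpy as np

    rng = np.random.default_rng(2)

    def deviance(y, mu):
        # D(mu) = -2 log L(y | mu) for y ~ N(mu, 1)
        return -2 * (-0.5 * np.log(2 * np.pi) - 0.5 * (y - mu) ** 2)

    for y in (0.5, 100.0):
        # Posterior: N(y, 1) truncated to (0, 1000), sampled by rejection
        draws = rng.normal(y, 1.0, size=200_000)
        mu = draws[(draws > 0) & (draws < 1000)]

        dbar = deviance(y, mu).mean()    # posterior mean deviance
        dhat = deviance(y, mu.mean())    # deviance at the posterior mean
        pd = dbar - dhat
        print(f'y={y}: Dbar={dbar:.3f} Dhat={dhat:.3f} '
              f'pD={pd:.3f} DIC={dbar + pd:.3f}')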

Some comments
- p_D is not invariant to reparametrization, i.e. it depends on which estimate is used in the plug-in deviance Dhat.
- p_D can be negative if there is a strong prior-data conflict.
- DIC and p_D are particularly useful in hierarchical models.
- p_D depends on both the model and the data. This is fundamentally different from AIC or BIC.