Approximating Cross-validatory Predictive Evaluation in Bayesian Latent Variables Models with Integrated IS and WAIC




Longhai Li, Department of Mathematics and Statistics, University of Saskatchewan, Saskatoon, SK, Canada. 3 April 2014

Acknowledgements
Joint work with Shi Qiu, Bei Zhang and Cindy X. Feng. The work was supported by grants from the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Canada Foundation for Innovation (CFI). Thanks to Dr. Yao and Dr. Du for their warm hosting of my visit to Kansas State University.

Outline
1 Introduction
2 Bayesian Models with Unit-specific Latent Variables
3 Cross-validatory Predictive Evaluation
4 Importance Sampling (IS) Approximations: Non-integrated Importance Sampling (nIS); Integrated Importance Sampling (iIS)
5 WAIC Approximations: Non-integrated WAIC (nWAIC); Integrated WAIC (iWAIC)
6 Real Data Examples: Mixture Models; Correlated Random Spatial Effect Models; CV Posterior p-values in Logistic Regression
7 Conclusions and Future Work
8 References

Approximations for Out-of-Sample Predictive Evaluation
Predictive evaluation is often used in practice for model comparison, diagnostics, and detecting outliers. There are three ways to do this, each with its own advantages and limitations: out-of-sample validation, leave-one-out cross-validation, and training validation plus a correction for optimistic bias.
[Diagram comparing the three schemes on the data $y_1^{obs}, \ldots, y_n^{obs}$ and a new observation $y_{n+1}^{obs}$; training validation adds a correction for optimistic bias.]
Cross-validation with Integrated IS and WAIC / 1. Introduction / 4/46

Reviews of Bias-corrected Training Validation
1 AIC, DIC and others (e.g., Spiegelhalter et al. (2002), Celeux et al. (2006), Plummer (2008), and Ando (2007)). In particular,
$$\mathrm{DIC} = -2\left(\log P(y^{obs} \mid \hat\theta) - p_{DIC}\right), \qquad (1)$$
where
$$p_{DIC} = 2\left[\log P(y^{obs} \mid \hat\theta) - E_{post}\left(\log P(y^{obs} \mid \theta)\right)\right]. \qquad (2)$$
DIC is good for models with identifiable parameters.
2 Importance sampling (e.g., Gelfand et al. (1992)). For each unit:
$$P(y_i^{obs} \mid y_{-i}^{obs}) = 1 \big/ E_{post}\left(1/P(y_i^{obs} \mid \theta)\right). \qquad (3)$$
3 Widely Applicable Information Criterion (WAIC, proposed by Watanabe (2009)). For each unit:
$$P(y_i^{obs} \mid y_{-i}^{obs}) \approx E_{post}\left(P(y_i^{obs} \mid \theta)\right) \big/ \exp\left[V_{post}\left(\log P(y_i^{obs} \mid \theta)\right)\right]. \qquad (4)$$
WAIC is applicable to models with non-identifiable parameters, but not to models with correlated units.
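A minimal Python sketch of (1)-(2), assuming the total log-likelihood has already been evaluated at a point estimate $\hat\theta$ (for example the posterior mean) and at each posterior draw. The function name `dic` and its arguments are hypothetical; this is not the authors' code.

```python
from statistics import fmean


def dic(loglik_at_hat, loglik_draws):
    """Deviance information criterion, equations (1)-(2).

    loglik_at_hat: log P(y_obs | theta_hat) at a point estimate theta_hat.
    loglik_draws:  log P(y_obs | theta_s) for posterior draws s = 1..S.
    """
    # Equation (2): effective number of parameters.
    p_dic = 2.0 * (loglik_at_hat - fmean(loglik_draws))
    # Equation (1): DIC = -2 * (log P(y_obs | theta_hat) - p_DIC).
    return -2.0 * (loglik_at_hat - p_dic)


print(dic(-10.0, [-11.5, -10.5, -11.0]))  # prints 24.0
```

Here the posterior mean log-likelihood is -11, so $p_{DIC} = 2$ and DIC $= -2(-10 - 2) = 24$.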

What Will We Propose?
Two improved methods (namely iIS and iWAIC), inspired by importance sampling formulae, for Bayesian models with unit-specific latent variables that may be correlated.

Bayesian Models with Unit-specific Latent Variables
The two proposed methods aim at improving IS and WAIC evaluation for models of the following form.
[Figure 1: Graphical representation, with nodes for covariate variables $x_i$, observable variables $y_i$ and latent variables $b_i$ (for $i = 1, \ldots, n$), and model parameters $\theta$.]
In Figure 1, the double arrows in the box for $b_{1:n}$ indicate possible dependence among $b_{1:n}$. Note that the covariate $x_i$ will be omitted from the conditions of the densities for $b_i$ and $y_i$ throughout this paper for simplicity.

Posterior Distribution Given Full Data
Suppose that, conditional on $\theta$, we have specified a density for $y_i$ given $b_i$, $P(y_i \mid b_i, \theta)$; a joint prior density for the latent variables $b_{1:n}$, $P(b_{1:n} \mid \theta)$; and a prior density for $\theta$, $P(\theta)$. The posterior of $(b_{1:n}, \theta)$ given the observations $y_{1:n}^{obs}$ is proportional to the joint density of $y_{1:n}^{obs}$, $b_{1:n}$, and $\theta$:
$$P_{post}(\theta, b_{1:n} \mid y_{1:n}^{obs}) = \prod_{j=1}^{n} P(y_j^{obs} \mid b_j, \theta)\, P(b_{1:n} \mid \theta)\, P(\theta) \big/ C_1, \qquad (5)$$
where $C_1$ is the normalizing constant, which involves only $y_{1:n}^{obs}$.

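CV Posterior Distributions
To do cross-validation, for each $i = 1, \ldots, n$, we omit observation $y_i^{obs}$ and then draw MCMC samples from the CV posterior distribution:
$$P_{post(-i)}(\theta, b_{1:n} \mid y_{-i}^{obs}) = \prod_{j \neq i} P(y_j^{obs} \mid b_j, \theta)\, P(b_{1:n} \mid \theta)\, P(\theta) \big/ C_2. \qquad (6)$$
If we drop $b_i$ from samples of $(\theta, b_{1:n})$ drawn from (6), we obtain samples of $(\theta, b_{-i})$ from the marginalized CV posterior:
$$P_{post(-i),M}(\theta, b_{-i} \mid y_{-i}^{obs}) = \prod_{j \neq i} P(y_j^{obs} \mid b_j, \theta)\, P(b_{-i} \mid \theta)\, P(\theta) \big/ C_2, \qquad (7)$$
where $P(b_{-i} \mid \theta) = \int P(b_{1:n} \mid \theta)\, db_i$. It is useful to note that
$$P_{post(-i)}(\theta, b_{1:n} \mid y_{-i}^{obs}) = P_{post(-i),M}(\theta, b_{-i} \mid y_{-i}^{obs})\, P(b_i \mid b_{-i}, \theta). \qquad (8)$$
That is, sampling from $P_{post(-i)}$ is equivalent to sampling from $P_{post(-i),M}$ and then drawing $b_i \sim P(b_i \mid b_{-i}, \theta)$.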

CV Posterior Predictive Evaluation: General
Suppose we specify an evaluation function $a(y_i^{obs}, \theta, b_i)$ that measures a certain goodness-of-fit (or discrepancy) of the distribution $P(y_i \mid \theta, b_i)$ to the actual observation $y_i^{obs}$. The CV posterior predictive evaluation is defined as the expectation of $a(y_i^{obs}, \cdot, \cdot)$ with respect to $P_{post(-i)}(\theta, b_{1:n} \mid y_{-i}^{obs})$ given in equations (8) or (6):
$$E_{post(-i)}\left(a(y_i^{obs}, \theta, b_i)\right) = \int a(y_i^{obs}, \theta, b_i)\, P_{post(-i)}(\theta, b_{1:n} \mid y_{-i}^{obs})\, d\theta\, db_{1:n}. \qquad (9)$$

CV Posterior Predictive Evaluation: Two Specific Cases
1 Let $a$ be the value of the predictive density:
$$a(y_i^{obs}, \theta, b_i) = P(y_i^{obs} \mid \theta, b_i). \qquad (10)$$
Then
$$E_{post(-i)}\left(a(y_i^{obs}, \theta, b_i)\right) = P(y_i^{obs} \mid y_{-i}^{obs}). \qquad (11)$$
We call it the CV posterior predictive density for the held-out unit $y_i^{obs}$. The CV information criterion (CVIC) for evaluating a Bayesian model is:
$$\mathrm{CVIC} = -2 \sum_{i=1}^{n} \log P(y_i^{obs} \mid y_{-i}^{obs}). \qquad (12)$$
2 Let $a$ be a tail probability:
$$a(y_i^{obs}, \theta, b_i) = \Pr(y_i > y_i^{obs} \mid \theta, b_i) + 0.5 \Pr(y_i = y_i^{obs} \mid \theta, b_i). \qquad (13)$$
Then $E_{post(-i)}\left(a(y_i^{obs}, \theta, b_i)\right) = \Pr(y_i > y_i^{obs} \mid y_{-i}^{obs}) + 0.5 \Pr(y_i = y_i^{obs} \mid y_{-i}^{obs})$. We call it the CV posterior p-value.

Non-integrated IS (nIS) Approximation: General
If our samples are from $P_{post}(\theta, b_{1:n} \mid y_{1:n}^{obs})$, but we are interested in estimating the mean of $a$ with respect to $P_{post(-i)}(\theta, b_{1:n} \mid y_{-i}^{obs})$ as in (9), the importance weighting method is based on the following equality for the CV expected evaluation:
$$E_{post(-i)}\left(a(y_i^{obs}, \theta, b_i)\right) = \frac{E_{post}\left[a(y_i^{obs}, \theta, b_i)\, W_i^{nIS}(\theta, b_{1:n})\right]}{E_{post}\left[W_i^{nIS}(\theta, b_{1:n})\right]}, \qquad (14)$$
where $E_{post}[\cdot]$ is the expectation with respect to $P_{post}(\theta, b_{1:n} \mid y_{1:n}^{obs})$, and
$$W_i^{nIS}(\theta, b_{1:n}) = \frac{P_{post(-i)}(\theta, b_{1:n} \mid y_{-i}^{obs})}{P_{post}(\theta, b_{1:n} \mid y_{1:n}^{obs})} = \frac{C_1}{C_2} \cdot \frac{1}{P(y_i^{obs} \mid \theta, b_i)}. \qquad (15)$$
Gelfand et al. (1992) may have been the first to propose this method.
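The ratio estimator (14) can be sketched as follows, assuming that for one held-out unit $i$ we have the values $a(y_i^{obs}, \theta_s, b_{i,s})$ and the log densities $\log P(y_i^{obs} \mid \theta_s, b_{i,s})$ for MCMC draws $s$ from the full-data posterior. The constant $C_1/C_2$ in (15) cancels in the ratio. The name `nis_estimate` is hypothetical, and the weights are handled on the log scale only to avoid overflow.

```python
import math


def nis_estimate(a_values, loglik_values):
    """Self-normalized nIS estimate of E_post(-i)[a], equations (14)-(15).

    a_values[s]:      a(y_i_obs, theta_s, b_i_s) for full-data posterior draw s.
    loglik_values[s]: log P(y_i_obs | theta_s, b_i_s) for the same draw.
    Weights are proportional to 1 / P(y_i_obs | theta_s, b_i_s); the constant
    C1/C2 cancels between numerator and denominator.
    """
    log_w = [-ll for ll in loglik_values]
    shift = max(log_w)  # subtracting the max avoids overflow in exp
    w = [math.exp(lw - shift) for lw in log_w]
    return sum(a * wi for a, wi in zip(a_values, w)) / sum(w)


dens = [0.5, 0.25]
# With a equal to the predictive density, (14) reduces to a harmonic mean.
print(nis_estimate(dens, [math.log(d) for d in dens]))  # close to 1/3
```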

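nIS Estimate of CVIC
To estimate CVIC, we set $a(y_i^{obs}, \theta, b_i) = P(y_i^{obs} \mid \theta, b_i)$. Then (14) shows that the CV posterior predictive density $P(y_i^{obs} \mid y_{-i}^{obs})$ is equal to the harmonic mean of the non-integrated predictive density $P(y_i^{obs} \mid \theta, b_i)$ with respect to $P_{post}(\theta, b_{1:n} \mid y_{1:n}^{obs})$:
$$P(y_i^{obs} \mid y_{-i}^{obs}) = \frac{1}{E_{post}\left[1/P(y_i^{obs} \mid \theta, b_i)\right]}. \qquad (16)$$
Based on (16), nIS estimates the CV posterior predictive density by
$$\hat P_{nIS}(y_i^{obs} \mid y_{-i}^{obs}) = \frac{1}{\hat E_{post}\left[1/P(y_i^{obs} \mid \theta, b_i)\right]}, \qquad (17)$$
where $\hat E_{post}$ denotes the average over MCMC samples from the full-data posterior. The corresponding nIS estimate of CVIC using (17) is
$$\widehat{\mathrm{CVIC}}_{nIS} = -2 \sum_{i=1}^{n} \log \hat P_{nIS}(y_i^{obs} \mid y_{-i}^{obs}). \qquad (18)$$
However, nIS often does not work well because the posterior draws of $b_i$ fit $y_i^{obs}$ too well.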
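A sketch of (17)-(18) on the log scale, assuming a hypothetical layout `loglik_matrix[i][s]` of non-integrated log predictive densities (one row per unit, one column per MCMC draw). The log-sum-exp shift keeps $1/P$ from overflowing when some densities are tiny; the function names are illustrative only.

```python
import math


def log_harmonic_mean(loglik_values):
    """log of 1 / mean_s(1 / P_s), computed from log P_s with log-sum-exp."""
    neg = [-ll for ll in loglik_values]
    m = max(neg)
    log_mean_inv = m + math.log(sum(math.exp(v - m) for v in neg) / len(neg))
    return -log_mean_inv


def cvic_nis(loglik_matrix):
    """nIS estimate of CVIC, equations (17)-(18).

    loglik_matrix[i][s] = log P(y_i_obs | theta_s, b_i_s): one row per unit,
    one column per MCMC draw from the full-data posterior.
    """
    return -2.0 * sum(log_harmonic_mean(row) for row in loglik_matrix)
```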

Theory for Integrated Importance Sampling (iIS): I
1 Integrated evaluation function. Rewrite the expectation in (9) as
$$E_{post(-i)}\left(a(y_i^{obs}, \theta, b_i)\right) = E_{post(-i),M}\left(A(y_i^{obs}, \theta, b_{-i})\right) \qquad (19)$$
$$= \int A(y_i^{obs}, \theta, b_{-i})\, P_{post(-i),M}(\theta, b_{-i} \mid y_{-i}^{obs})\, d\theta\, db_{-i}, \qquad (20)$$
where
$$A(y_i^{obs}, \theta, b_{-i}) = \int a(y_i^{obs}, \theta, b_i)\, P(b_i \mid b_{-i}, \theta)\, db_i. \qquad (21)$$
Note: in (21), we integrate $a(y_i^{obs}, \theta, b_i)$ with respect to $P(b_i \mid b_{-i}, \theta)$, which is unconditional on $y_i^{obs}$.

Theory for Integrated Importance Sampling (iIS): II
2 Integrated predictive density. The full-data posterior of $(\theta, b_{-i})$ is
$$P_{post,M}(\theta, b_{-i} \mid y_{1:n}^{obs}) = \left[\prod_{j \neq i} P(y_j^{obs} \mid b_j, \theta)\, P(b_{-i} \mid \theta)\, P(\theta)\right] P(y_i^{obs} \mid \theta, b_{-i}) \big/ C_1, \qquad (22)$$
where
$$P(y_i^{obs} \mid \theta, b_{-i}) = \int P(y_i^{obs} \mid b_i, \theta)\, P(b_i \mid b_{-i}, \theta)\, db_i. \qquad (23)$$
We will call (23) the integrated predictive density, because it integrates away $b_i$ without reference to $y_i^{obs}$.
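The integral (23) can be estimated by Monte Carlo: draw $b_i$ repeatedly from $P(b_i \mid b_{-i}, \theta)$ and average $P(y_i^{obs} \mid b_i, \theta)$ over the draws. The sketch below assumes a hypothetical toy model, $b_i \mid b_{-i}, \theta \sim N(0.3, 0.5^2)$ and $y_i \mid b_i \sim N(b_i, 1)$, chosen only because its integrated density is available in closed form, $N(y_i^{obs}; 0.3, 1 + 0.5^2)$, for checking the estimate.

```python
import math
import random


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def integrated_pred_density(y_obs, draw_b, density, m, rng):
    """Monte Carlo estimate of (23): average P(y_i_obs | b_i, theta) over m
    draws of b_i from P(b_i | b_{-i}, theta), which does not use y_i_obs."""
    return sum(density(y_obs, draw_b(rng)) for _ in range(m)) / m


# Hypothetical toy model: b_i | b_{-i}, theta ~ N(0.3, 0.5^2), y_i | b_i ~ N(b_i, 1).
rng = random.Random(1)
est = integrated_pred_density(
    y_obs=1.0,
    draw_b=lambda r: r.gauss(0.3, 0.5),
    density=lambda y, b: normal_pdf(y, b, 1.0),
    m=100_000,
    rng=rng,
)
exact = normal_pdf(1.0, 0.3, math.sqrt(1.0 + 0.5 ** 2))  # closed form for this toy model
print(est, exact)
```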

Theory for Integrated Importance Sampling (iIS): III
3 Integrated importance sampling formula. Using the standard importance weighting method, we estimate (20) based on
$$E_{post(-i),M}\left(A(y_i^{obs}, \theta, b_{-i})\right) = \frac{E_{post,M}\left[A(y_i^{obs}, \theta, b_{-i})\, W_i^{iIS}(\theta, b_{-i})\right]}{E_{post,M}\left[W_i^{iIS}(\theta, b_{-i})\right]}, \qquad (24)$$
where $W_i^{iIS}$ is the integrated importance weight:
$$W_i^{iIS}(\theta, b_{-i}) = \frac{P_{post(-i),M}(\theta, b_{-i} \mid y_{-i}^{obs})}{P_{post,M}(\theta, b_{-i} \mid y_{1:n}^{obs})} = \frac{C_1}{C_2} \cdot \frac{1}{P(y_i^{obs} \mid \theta, b_{-i})}. \qquad (25)$$
In summary, in iIS we integrate the evaluation function $a(y_i^{obs}, \theta, b_i)$ and the predictive density $P(y_i^{obs} \mid \theta, b_i)$ over $b_i$ drawn from $P(b_i \mid b_{-i}, \theta)$, which is unconditional on $y_i^{obs}$, to obtain the integrated evaluation function and the integrated predictive density.
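Putting (21), (23), (24) and (25) together gives the sketch below for one unit. It assumes that, for each MCMC draw $s$ of $(\theta, b_{-i})$ from the full-data posterior, $a$ and $P(y_i^{obs} \mid \theta_s, b_i)$ have already been evaluated at $m$ fresh draws $b_i \sim P(b_i \mid b_{-i,s}, \theta_s)$. The function and argument names are hypothetical.

```python
def iis_estimate(a_draws, dens_draws):
    """iIS estimate of E_post(-i)[a], equations (21) and (23)-(25).

    For MCMC draw s of (theta, b_{-i}), a_draws[s][k] and dens_draws[s][k]
    hold a(y_i_obs, theta_s, b_i) and P(y_i_obs | theta_s, b_i) at the k-th
    fresh draw b_i ~ P(b_i | b_{-i,s}, theta_s).
    """
    num = 0.0
    den = 0.0
    for a_row, d_row in zip(a_draws, dens_draws):
        A_s = sum(a_row) / len(a_row)  # integrated evaluation, (21)
        P_s = sum(d_row) / len(d_row)  # integrated predictive density, (23)
        w_s = 1.0 / P_s  # integrated weight, (25), up to the constant C1/C2
        num += A_s * w_s
        den += w_s
    return num / den
```

When `a_draws` equals `dens_draws`, the result is the harmonic mean of the integrated predictive densities, i.e. the iIS estimate of $P(y_i^{obs} \mid y_{-i}^{obs})$. For very small densities, the weights would be better computed on the log scale.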

iIS Estimate for CVIC
In the special case of estimating CVIC, the evaluation function $a$ is just the predictive density $P(y_i^{obs} \mid \theta, b_i)$; therefore, $A$ is the integrated predictive density (23), which is the reciprocal of $W_i^{iIS}$ up to the constant $C_1/C_2$. Hence the iIS estimate of $P(y_i^{obs} \mid y_{-i}^{obs})$ is
$$\hat P_{iIS}(y_i^{obs} \mid y_{-i}^{obs}) = \frac{1}{\hat E_{post,M}\left[1/P(y_i^{obs} \mid \theta, b_{-i})\right]}. \qquad (26)$$
Accordingly, the iIS estimate of CVIC is
$$\widehat{\mathrm{CVIC}}_{iIS} = -2 \sum_{i=1}^{n} \log \hat P_{iIS}(y_i^{obs} \mid y_{-i}^{obs}). \qquad (27)$$

WAIC for Models without Latent Variables
Watanabe (2009) defines a version of WAIC for models without latent variables as follows:
$$\mathrm{WAIC} = -2 \sum_{i=1}^{n} \left[\log E_{post}\left(P(y_i^{obs} \mid \theta)\right) - V_{post}\left(\log P(y_i^{obs} \mid \theta)\right)\right], \qquad (28)$$
where $E_{post}$ and $V_{post}$ stand for the mean and variance over $\theta$ with respect to $P(\theta \mid y_1^{obs}, \ldots, y_n^{obs})$. By comparing the forms of WAIC and CVIC, we can think of WAIC as estimating the CV posterior predictive density by
$$\hat P_{WAIC}(y_i^{obs} \mid y_{-i}^{obs}) = \exp\left\{\log E_{post}\left(P(y_i^{obs} \mid \theta)\right) - V_{post}\left(\log P(y_i^{obs} \mid \theta)\right)\right\}. \qquad (29)$$
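A sketch of (28)-(29) from MCMC output, assuming a hypothetical layout `loglik_matrix[i][s]` of log predictive densities (one row per unit, one column per draw). The posterior variance is computed with the sample variance (denominator $S-1$); that is an assumption of this sketch, and it makes little difference when the number of draws is large.

```python
import math
from statistics import variance


def log_pred_waic(loglik_values):
    """Log of (29): log E_post[P] - V_post[log P], from draws of log P.

    Needs at least two draws; V_post uses the sample variance (S - 1).
    """
    m = max(loglik_values)
    log_mean = m + math.log(
        sum(math.exp(v - m) for v in loglik_values) / len(loglik_values)
    )
    return log_mean - variance(loglik_values)


def waic(loglik_matrix):
    """Equation (28); loglik_matrix[i][s] = log P(y_i_obs | theta_s)."""
    return -2.0 * sum(log_pred_waic(row) for row in loglik_matrix)
```

In a latent variable model, feeding the non-integrated log densities $\log P(y_i^{obs} \mid \theta, b_i)$ into the same function gives nWAIC, while feeding the integrated log densities $\log P(y_i^{obs} \mid \theta, b_{-i})$ gives iWAIC.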

nWAIC for Latent Variable Models
For models with possibly correlated latent variables, a naive way to approximate CVIC is to apply WAIC directly to the non-integrated predictive density of $y_i^{obs}$ conditional on $\theta$ and $b_i$:
$$\hat P_{nWAIC}(y_i^{obs} \mid y_{-i}^{obs}) = \exp\left\{\log E_{post}\left(P(y_i^{obs} \mid \theta, b_i)\right) - V_{post}\left(\log P(y_i^{obs} \mid \theta, b_i)\right)\right\}. \qquad (30)$$
We will refer to (30) as the non-integrated WAIC (nWAIC for short) method for approximating the CV posterior predictive density. The corresponding information criterion based on (30) is
$$\mathrm{nWAIC} = -2 \sum_{i=1}^{n} \log \hat P_{nWAIC}(y_i^{obs} \mid y_{-i}^{obs}). \qquad (31)$$

iWAIC for Latent Variable Models
Using heuristics, we propose to apply the WAIC approximation to the integrated predictive density (23) to estimate the CV posterior predictive density:
$$\hat P_{iWAIC}(y_i^{obs} \mid y_{-i}^{obs}) = \exp\left\{\log E_{post}\left(P(y_i^{obs} \mid \theta, b_{-i})\right) - V_{post}\left(\log P(y_i^{obs} \mid \theta, b_{-i})\right)\right\}. \qquad (32)$$
Accordingly, iWAIC for approximating CVIC is given by
$$\mathrm{iWAIC} = -2 \sum_{i=1}^{n} \log \hat P_{iWAIC}(y_i^{obs} \mid y_{-i}^{obs}). \qquad (33)$$

Galaxy Data
We obtained the data set from the R package MASS. The data set is a numeric vector of velocities (km/sec) of 82 galaxies from 6 well-separated conic sections of an unfilled survey of the Corona Borealis region.
[Figure 2, panels (a) K = 4, (b) K = 5, (c) K = 6; vertical axis: density.]
Figure 2: Histograms of the Galaxy data and three estimated density curves using MCMC samples from fitting finite mixture models with different numbers of components, K = 4, 5, 6, to the full data set.

Mixture Models with a Fixed Number, K, of Components
We applied mixture models to fit the 82 numbers. The finite mixture model that we used to fit the Galaxy data is as follows:
$$y_i \mid z_i = k, \mu_{1:K}, \sigma_{1:K} \sim N(\mu_k, \sigma_k^2), \quad i = 1, \ldots, n \qquad (34)$$
$$z_i \mid p_{1:K} \sim \mathrm{Category}(p_1, \ldots, p_K), \quad i = 1, \ldots, n \qquad (35)$$
$$\mu_k \sim N(20, 10^4), \quad k = 1, \ldots, K \qquad (36)$$
$$\sigma_k^2 \sim \text{Inverse-Gamma}(0.01, 0.01 \times 20), \quad k = 1, \ldots, K \qquad (37)$$
$$(p_1, \ldots, p_K) \sim \mathrm{Dirichlet}(1, \ldots, 1) \qquad (38)$$
Here we set the prior mean of $\mu_k$ to 20, which is the mean of the 82 numbers, and set the scale of the Inverse-Gamma prior for $\sigma_k^2$ to $0.01 \times 20$, where 20 is the variance of the 82 numbers. Our purpose in computing CVIC for finite mixture models is to determine the number of mixture components, K, that adequately captures the heterogeneity in the data without overfitting it.

How Did We Run MCMC?
We used JAGS to run MCMC simulations for fitting the above model to the Galaxy data with various choices of K. To avoid the problem that MCMC may get stuck in a model with only one component, we followed the JAGS eyes example to restrict the MCMC to have at least one data point in each component. All MCMC simulations started with a randomly generated $z_{1:n}$ and ran 5 parallel chains, each doing 2000, 2000, and 100,000 iterations for adapting, burn-in, and sampling, respectively.

The Mixture Model is a Latent Variable Model
The finite mixture model (equations (34)-(38)) falls in the class of models depicted by Figure 1: the observed variable is $y_i$, the latent variable $b_i$ is the mixture component indicator $z_i$, and the model parameters $\theta$ are $(\mu_{1:K}, \sigma_{1:K}^2, p_{1:K})$. In this model, the latent variables $z_1, \ldots, z_n$ are independent given the model parameters $\theta$. It follows that $y_1, \ldots, y_n$ are independent given $\theta$.

Computing nIS, iIS, nWAIC, iWAIC in Mixture Models
For each MCMC sample of $(\theta, z_1, \ldots, z_n)$ and each unit $i$, we compute
the non-integrated predictive density: $P(y_i^{obs} \mid z_i, \theta) = \phi(y_i^{obs} \mid \mu_{z_i}, \sigma_{z_i})$;
the integrated predictive density:
$$P(y_i^{obs} \mid \theta, z_{-i}) = P(y_i^{obs} \mid \theta) = \sum_{k=1}^{K} p_k\, \phi(y_i^{obs} \mid \mu_k, \sigma_k). \qquad (39)$$
Notes: 1) $z_{-i}$ and $y_i$ are independent given $\theta$; 2) the large component, not the component close to $y_i^{obs}$, dominates (39).
Then we can compute nIS, iIS, nWAIC and iWAIC. We see that, to compute iIS and iWAIC, we simply apply IS and WAIC to the marginalized model with $z_{1:n}$ integrated out, although $z_{1:n}$ are included in the MCMC simulations.
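For a single MCMC draw of $\theta$, the two densities on this slide can be sketched as follows. Components are indexed from 0, and the parameter values in the example are made up purely for illustration; they are not estimates from the Galaxy data.

```python
import math


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def nonintegrated_density(y, z, mu, sigma):
    """P(y_i_obs | z_i, theta) = phi(y_i_obs | mu_{z_i}, sigma_{z_i})."""
    return normal_pdf(y, mu[z], sigma[z])


def integrated_density(y, p, mu, sigma):
    """Equation (39): sum_k p_k phi(y_i_obs | mu_k, sigma_k)."""
    return sum(pk * normal_pdf(y, mk, sk) for pk, mk, sk in zip(p, mu, sigma))


# One hypothetical draw of theta with K = 2 components.
p, mu, sigma = [0.7, 0.3], [20.0, 10.0], [2.0, 1.0]
print(nonintegrated_density(9.5, 1, mu, sigma), integrated_density(9.5, p, mu, sigma))
```

For an observation near the small component, conditioning on a $z_i$ that points at that component gives a much larger density than (39), which also weighs in the mixing proportions; this illustrates how a $z_i$ drawn with knowledge of $y_i^{obs}$ can inflate the non-integrated density.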

Comparison of 5 Information Criteria
Table 1: Comparison of 5 information criteria for mixture models. The numbers are the averages of ICs from 100 independent MCMC simulations; the numbers in brackets indicate standard deviations.
K  DIC              nWAIC           nIS             iWAIC           iIS             CVIC
2  445.38 (1.64)    420.27 (0.39)   425.63 (3.45)   449.56 (0.14)   449.62 (0.17)   450.55
3  528.78 (45.12)   384.94 (9.94)   391.29 (6.17)   437.23 (4.70)   436.43 (3.79)   427.46
4  774.85 (31.58)   339.91 (1.87)   363.55 (5.32)   422.43 (0.53)   422.76 (0.54)   423.16
5  710.88 (25.34)   328.19 (0.29)   362.30 (3.70)   421.02 (0.09)   421.41 (0.10)   421.10
6  679.95 (17.48)   323.62 (1.33)   355.49 (5.72)   420.97 (0.27)   421.35 (0.31)   421.34
7  675.27 (18.57)   321.61 (0.30)   364.41 (4.49)   421.25 (0.07)   421.64 (0.12)   421.53

Comparison of Statistical Significance
CVIC is the sum of minus twice the log CV posterior predictive densities. Therefore, the statistical significance of the difference between two CVICs (or their estimates) can be assessed by looking at the population mean difference of the two groups of log CV posterior predictive densities (or their estimates).
Table 2: One-sided paired t-test p-values for comparing the means of the 82 log posterior predictive densities for the Galaxy data given by mixture models with different numbers of mixture components, K.
pair of models  nWAIC  nIS    iWAIC  iIS    CVIC
K = 3 vs K = 2  0.000  0.000  0.016  0.013  0.010
K = 4 vs K = 3  0.000  0.019  0.030  0.032  0.190
K = 5 vs K = 4  0.000  0.249  0.070  0.066  0.027
K = 6 vs K = 5  0.002  0.203  0.489  0.476  0.674
K = 7 vs K = 6  0.110  0.840  0.716  0.711  0.700
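A sketch of the one-sided paired comparison, assuming two lists of per-unit log CV predictive densities (or their estimates) from models A and B. Python's standard library has no t distribution, so this sketch uses a normal approximation for the p-value; it is close for n = 82 but is not identical to the t-test used for Table 2. With SciPy available, `scipy.stats.t.sf(t, n - 1)` would give the exact one-sided t p-value. The function name is hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev


def paired_one_sided(log_dens_a, log_dens_b):
    """One-sided paired comparison of per-unit log CV predictive densities.

    H1: model A has a larger mean log predictive density than model B.
    Returns the paired t statistic and a normal-approximation p-value.
    """
    diffs = [a - b for a, b in zip(log_dens_a, log_dens_b)]
    n = len(diffs)
    t = fmean(diffs) / (stdev(diffs) / math.sqrt(n))
    # Upper-tail normal probability; an exact paired t-test would use the
    # t distribution with n - 1 degrees of freedom instead.
    p_approx = 1.0 - NormalDist().cdf(t)
    return t, p_approx
```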

Visualizing the Need to Integrate $z_i$
Figure 3: Scatter-plot of non-integrated log predictive densities, $\log P(y_i \mid \mu_{z_i}, \sigma_{z_i})$, against $\mu_{z_i}$, given MCMC samples from (a) the full-data posterior and (b) the actual CV posterior with the 3rd number removed, when K = 5 components are used. Panel (a) is annotated with the log harmonic mean of these densities and panel (b) with their log mean.

Scottish Lip Cancer Data I
The data represent male lip cancer counts (over the period 1975-1980) in the n = 56 districts of Scotland. The data include these columns: the number of observed cases of lip cancer, $y_i$; the number of expected cases, $E_i$, which are based on age effects and are proportional to the population at risk after such effects have been taken into account; the percentage of the population employed in agriculture, fishing and forestry, $x_i$, used as a covariate; and a list of the neighbouring regions.


Four Models Considered: I
$y_i$ is modelled as a Poisson random variable:
$$y_i \mid E_i, \lambda_i \sim \mathrm{Poisson}(\lambda_i E_i), \qquad (40)$$
where $\lambda_i$ denotes the underlying relative risk for district $i$. Let $s_i = \log(\lambda_i)$. We consider four different models for the vector $s = (s_1, \ldots, s_n)'$:
model 1 (spatial + linear, full): $s \sim N_n(\alpha + X\beta, \Phi\tau^2)$, (41)
model 2 (spatial): $s \sim N_n(\alpha, \Phi\tau^2)$, (42)
model 3 (linear): $s \sim N_n(\alpha + X\beta, I_n\tau^2)$, (43)
model 4 (exchangeable): $s \sim N_n(\alpha, I_n\tau^2)$, (44)
where $\Phi$ specifies the spatial association between districts, with details given next.

Four Models Considered: II
$\Phi = (I_n - \phi C)^{-1} M$ is a matrix modelling spatial dependence, in which $c_{ij} = (E_j/E_i)^{1/2}$ if areas $i$ and $j$ are neighbours and $c_{ij} = 0$ otherwise, $m_{ii} = E_i^{-1}$, and $m_{ij} = 0$ if $i \neq j$. This model is called the proper conditional autoregression (CAR) model. Looking at the conditional distribution of $s_i \mid s_{-i}, \alpha, \beta, \phi$ in (48) may help in understanding this distribution. At a higher level, we assign $\beta$, $\tau$, and $\phi$ very diffuse priors:
$$\tau^2 \sim \text{Inv-Gamma}(0.5, 0.0005), \qquad (45)$$
$$\beta \sim N(0, 1000^2), \qquad (46)$$
$$\phi \sim \mathrm{Unif}(\phi_0, \phi_1), \qquad (47)$$
where $(\phi_0, \phi_1)$ is the interval for $\phi$ such that $\Phi$ is positive-definite.

How Did We Run MCMC?
We used OpenBUGS through the R package R2OpenBUGS to run MCMC simulations for fitting the above four models to the lip cancer data. For each simulation, we ran two parallel chains, each for 15000 iterations, and the first 5000 were discarded as burn-in. To replicate the computation of each information criterion (with each method), we ran 100 independent simulations as above, randomizing the initial $\theta$ and the BUGS random seed for OpenBUGS.

The Poisson Model is a Latent Variable Model
The observed variable is $y_i$, the latent variable $b_i$ is $s_i$ (or $\lambda_i$), and the model parameters $\theta$ are $(\tau, \beta, \phi)$. In models 1 and 2, the latent variables $s_1, \ldots, s_n$ are dependent given the model parameters $\theta$.

Computing iIS and iWAIC in Model 1
For each unit $i$, and for each MCMC sample of $(s_{-i}, \theta)$:
Conditional distribution (proper autoregression):
$$s_i \mid s_{-i}, \theta \sim N\left(\alpha + x_i\beta + \phi \sum_{j \in N_i} c_{ij}\,(s_j - \alpha - x_j\beta),\; \tau^2 m_{ii}\right), \qquad (48)$$
where $N_i$ is the set of neighbours of district $i$.
Integrated predictive density:
$$P(y_i^{obs} \mid \theta, s_{-i}) = \int \mathrm{dpoisson}(y_i^{obs} \mid \lambda_i E_i)\, P(s_i \mid \theta, s_{-i})\, ds_i, \quad \lambda_i = \exp(s_i). \qquad (49)$$
We generate 200 random draws of $s_i$ from distribution (48) and then estimate the integral in (49) by averaging the Poisson probabilities over these draws. Then we can compute iIS and iWAIC with the integrated predictive density.
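A sketch of (48)-(49) for one district, assuming a scalar covariate, neighbour lists `nbrs`, and the weights `c` and expected counts `E` stored as plain Python lists. This data layout and all function names are hypothetical; the sketch only draws $s_i$ from the proper CAR conditional and averages the Poisson probabilities, using 200 draws by default as on this slide.

```python
import math
import random


def car_conditional(i, s, x, alpha, beta, phi, tau2, c, E, nbrs):
    """Mean and variance of s_i | s_{-i}, theta under (48), with m_ii = 1 / E_i.

    c[i][j] = sqrt(E_j / E_i) for neighbours j of i; nbrs[i] lists them.
    """
    mean = alpha + x[i] * beta + phi * sum(
        c[i][j] * (s[j] - alpha - x[j] * beta) for j in nbrs[i]
    )
    return mean, tau2 / E[i]


def poisson_pmf(y, mu):
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))


def integrated_poisson_density(i, y_obs, s, x, alpha, beta, phi, tau2, c, E, nbrs,
                               m=200, rng=None):
    """Monte Carlo estimate of (49) from m draws of s_i from (48)."""
    if rng is None:
        rng = random.Random()
    mean, var = car_conditional(i, s, x, alpha, beta, phi, tau2, c, E, nbrs)
    sd = math.sqrt(var)
    total = 0.0
    for _ in range(m):
        s_i = rng.gauss(mean, sd)
        total += poisson_pmf(y_obs, math.exp(s_i) * E[i])  # lambda_i = exp(s_i)
    return total / m
```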

Comparison of 5 Information Criteria
Table 3: Comparisons of information criteria for the lip cancer data. Each table entry shows the average of 100 information criteria computed from 100 independent MCMC simulations, with the standard deviation in brackets.
Model    CVIC    DIC              iWAIC           iIS             nWAIC           nIS
full     343.88  269.43 (12.30)   344.47 (0.12)   345.21 (0.19)   306.82 (0.21)   335.54 (1.27)
spatial  352.54  266.79 (10.15)   354.11 (0.06)   356.06 (0.37)   304.61 (0.18)   338.77 (1.85)
linear   349.48  310.42 (0.11)    350.48 (0.05)   350.54 (0.05)   306.94 (0.21)   338.81 (3.02)
exch.    366.61  312.57 (0.12)    368.01 (0.03)   368.08 (0.03)   306.74 (0.17)   346.55 (3.46)

Seeds Data
The example is taken from Table 3 of Crowder (1978). The study concerns the proportion of seeds that germinated on each of 21 plates arranged according to a 2 by 2 factorial layout by seed and type of root extract. For $i = 1, \ldots, 21$, let $r_i$ be the number of germinated seeds in the $i$th plate, $n_i$ the total number of seeds in the $i$th plate, $x_{1i}$ the seed type (0/1), and $x_{2i}$ the root extract (0/1).

Logistic Regression Models with Random Effects
The conditional distribution of $r_i$ given $n_i$, $x_{1i}$ and $x_{2i}$ is specified as follows:
$$r_i \mid n_i, p_i \sim \mathrm{Binomial}(n_i, p_i), \qquad (50)$$
$$\mathrm{logit}(p_i) = \alpha_0 + \alpha_1 x_{1i} + \alpha_2 x_{2i} + \alpha_{12} x_{1i} x_{2i} + b_i, \qquad (51)$$
$$b_i \sim N(0, \sigma^2), \qquad (52)$$
and the parameters $\alpha_0, \alpha_1, \alpha_2, \alpha_{12}$ are assigned $N(0, 10^6)$ priors, and $\sigma^2$ is assigned an Inverse-Gamma(0.001, 0.001) prior.

CV Posterior p-value for Outlier Detection
The p-value (given the parameters and latent variable) defined by (13) for this example is the right-tail probability of the Binomial distribution with $n_i$ trials and success rate $p_i$:
$$\text{p-value}(r_i^{obs}, \theta, b_i) = 1 - \mathrm{pbinom}(r_i^{obs}; n_i, p_i) + 0.5\, \mathrm{dbinom}(r_i^{obs}; n_i, p_i), \qquad (53)$$
where $r_i^{obs}$ is the actual observation of $r_i$, and pbinom and dbinom denote the CDF and PMF of the Binomial distribution. The CV posterior p-value for observation $r_i^{obs}$ is the mean of p-value$(r_i^{obs}, \theta, b_i)$ with respect to the CV posterior distribution $P(\theta, b_i \mid r_{-i}^{obs})$.
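Equation (53) is a mid-p right tail of the Binomial distribution. A direct sketch using Python's `math.comb`; `dbinom` and `pbinom` are named after the R functions mentioned above but are written from scratch here.

```python
import math


def dbinom(r, n, p):
    """Binomial PMF."""
    return math.comb(n, r) * p ** r * (1.0 - p) ** (n - r)


def pbinom(r, n, p):
    """Binomial CDF, Pr(R <= r)."""
    return sum(dbinom(k, n, p) for k in range(r + 1))


def mid_p_value(r_obs, n, p):
    """Equation (53): Pr(R > r_obs) + 0.5 Pr(R = r_obs) for R ~ Binomial(n, p)."""
    return 1.0 - pbinom(r_obs, n, p) + 0.5 * dbinom(r_obs, n, p)


print(mid_p_value(1, 2, 0.5))  # prints 0.5
```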

Four Methods for Approximating the CV Posterior p-value: I
Posterior check (Gelman et al. 1996): average each p-value$(r_i^{obs}, \theta, b_i)$ with respect to the posterior of $(\theta, b_i)$ given the full data set $r_{1:21}^{obs}$.
Ghosting method (Marshall and Spiegelhalter, 2003): for each MCMC sample, one averages p-value$(r_i^{obs}, \theta, b_i)$ with respect to the conditional distribution of $b_i$ given $\theta$ (but without $r_i^{obs}$) to obtain a ghosting p-value, then averages the ghosting p-values over all MCMC samples.
nIS: average p-value$(r_i^{obs}, \theta, b_i)$ after weighting it with the inverse of the probability mass of $r_i^{obs}$, $1/\mathrm{dbinom}(r_i^{obs}; n_i, p_i)$.

Four Methods for Approximating the CV Posterior p-value: II
iIS: for each MCMC sample, we first average each of p-value$(r_i^{obs}, \theta, b_i)$ and $\mathrm{dbinom}(r_i^{obs}; n_i, p_i)$ over 30 values of $b_i$ randomly generated from $b_i \mid \theta$, to find the integrated p-value and the integrated predictive density, respectively. We then compute the weighted average of the integrated p-values over all MCMC samples, with the reciprocals of the integrated predictive densities as weights, using formula (24).
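A sketch of the iIS p-value for one plate, assuming the MCMC draws of the fixed-effect part of the linear predictor in (51) and of $\sigma$ are available as lists (hypothetical inputs; this is not the authors' code). Because the $b_i$ in (52) are independent, $P(b_i \mid b_{-i}, \theta)$ is simply $N(0, \sigma^2)$, and 30 draws per MCMC sample are used by default as on this slide.

```python
import math
import random


def dbinom(r, n, p):
    return math.comb(n, r) * p ** r * (1.0 - p) ** (n - r)


def mid_p_value(r_obs, n, p):
    # Equation (53): Pr(R > r_obs) + 0.5 Pr(R = r_obs).
    upper = sum(dbinom(k, n, p) for k in range(r_obs + 1, n + 1))
    return upper + 0.5 * dbinom(r_obs, n, p)


def iis_cv_p_value(r_obs, n, eta_fixed, sigma_draws, m=30, rng=None):
    """iIS estimate of the CV posterior p-value for one plate.

    eta_fixed[s]:   alpha_0 + alpha_1 x_1i + alpha_2 x_2i + alpha_12 x_1i x_2i
                    for MCMC draw s (the fixed-effect part of (51)).
    sigma_draws[s]: sigma for MCMC draw s.
    """
    if rng is None:
        rng = random.Random()
    num = den = 0.0
    for eta, sigma in zip(eta_fixed, sigma_draws):
        pv_sum = dens_sum = 0.0
        for _ in range(m):
            # Fresh b_i ~ N(0, sigma^2), drawn without using r_obs.
            p = 1.0 / (1.0 + math.exp(-(eta + rng.gauss(0.0, sigma))))
            pv_sum += mid_p_value(r_obs, n, p)
            dens_sum += dbinom(r_obs, n, p)
        w = m / dens_sum  # reciprocal of the integrated predictive probability
        num += (pv_sum / m) * w  # integrated p-value times its weight
        den += w
    return num / den
```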

Comparison of Estimated CV Posterior p-values I
Figure 4: Scatterplots of estimated posterior p-values from one MCMC simulation against actual CV posterior p-values. The numbers on the points show the indices of the plates. Panels: (a) posterior checking, (b) ghosting method; x-axis: CV posterior p-value; y-axis: p-value estimated by posterior check (a) or by the ghosting method (b).

Comparison of Estimated CV Posterior p-values II
Figure 4 (continued): panels (c) non-integrated IS (nIS) and (d) integrated IS (iIS); x-axis: CV posterior p-value; y-axis: p-value estimated by nIS (c) or by iIS (d).

Replication Study
To measure more precisely the accuracy of the estimated p-values relative to the actual CV p-values, we use the absolute relative error in percentage scale, defined as
$$\mathrm{RE} = \frac{1}{n} \sum_{i=1}^{n} \frac{|\hat p_i - p_i|}{\min(p_i, 1 - p_i)} \times 100, \qquad (54)$$
where $\hat p_{1:n}$ are estimates of $p_{1:n}$. This measure puts heavy emphasis on the error between $\hat p_i$ and $p_i$ when $p_i$ is very small or very large, for which we demand a smaller absolute error than when $p_i$ is close to 0.5.
Table 4: Comparisons of the averages of 100 absolute relative errors (in percentage) of estimated CV p-values from 100 independent MCMC simulations, for the logistic regression example. The numbers in brackets indicate standard deviations.
iIS            nIS            Ghosting        Posterior checking
2.319 (0.399)  5.234 (1.083)  35.610 (1.267)  93.887 (3.854)
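Equation (54) in code, as a small sketch (the name `relative_error_pct` is hypothetical; it assumes every $p_i$ lies strictly between 0 and 1):

```python
def relative_error_pct(p_hat, p):
    """Equation (54): mean of |p_hat_i - p_i| / min(p_i, 1 - p_i), times 100."""
    n = len(p)
    return 100.0 * sum(abs(ph - pi) / min(pi, 1.0 - pi) for ph, pi in zip(p_hat, p)) / n
```

For example, an absolute error of 0.02 contributes 20% for a unit with $p_i = 0.1$ but only 4% for a unit with $p_i = 0.5$, before averaging over units.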

Conclusions and Future Work
The newly proposed iIS and iWAIC substantially reduce the bias of nIS and nWAIC in evaluating Bayesian models with unit-specific latent variables. In our studies, they gave results very close to those given by actual cross-validation.
iWAIC works very well in the spatial random effect models. The result is surprising and encouraging. One may consider investigating the validity of iWAIC theoretically.
iIS and iWAIC are limited to Bayesian models with unit-specific latent variables. In many models, a latent variable is shared by multiple units. How can iIS and iWAIC be improved for such models?
Advancing applications to other models: an interesting model is the auto-logistic regression model for spatial data, which is defined through conditional distributions only. I will investigate the applicability of iIS, iWAIC, or cross-validation itself for comparison and diagnostics of such models.

References
To read more on this topic, the following is a short list of references:
Vehtari, A. and Ojanen, J. (2012), A survey of Bayesian predictive methods for model assessment, selection and comparison, Statistics Surveys, 6, 142-228.
Spiegelhalter, D. J., Best, N. G., Carlin, B. P., and van der Linde, A. (2002), Bayesian measures of model complexity and fit, JRSSB, 64, 583-639.
Watanabe, S. (2009), Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory, Journal of Machine Learning Research, 11, 3571-3594.
Gelman, A., Hwang, J., and Vehtari, A. (2013), Understanding predictive information criteria for Bayesian models, unpublished online manuscript, available from Gelman's website.
The paper with more details about this talk can be found at: http://math.usask.ca/~longha/doc.