BayesX - Software for Bayesian Inference in Structured Additive Regression



Similar documents
Semiparametric Multinomial Logit Models for the Analysis of Brand Choice Behaviour

Statistics Graduate Courses

Flexible geoadditive survival analysis of non-hodgkin lymphoma in Peru

CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Adequacy of Biomath. Models. Empirical Modeling Tools. Bayesian Modeling. Model Uncertainty / Selection

Handling attrition and non-response in longitudinal data

SAS Software to Fit the Generalized Linear Model

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling

STA 4273H: Statistical Machine Learning

Modeling and Analysis of Call Center Arrival Data: A Bayesian Approach

PS 271B: Quantitative Methods II. Lecture Notes

Predictive Modeling Techniques in Insurance

Statistics in Retail Finance. Chapter 6: Behavioural models

How To Understand The Theory Of Probability

Bayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University

Multilevel Modelling of medical data

Statistics & Probability PhD Research. 15th November 2014

CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA

STATISTICA Formula Guide: Logistic Regression. Table of Contents

Master programme in Statistics

SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg

Parallelization Strategies for Multicore Data Analysis

Regression Modeling Strategies

Lecture 8: Gamma regression

SUMAN DUVVURU STAT 567 PROJECT REPORT

11. Time series and dynamic linear models

CHAPTER 8 EXAMPLES: MIXTURE MODELING WITH LONGITUDINAL DATA

Publication List. Chen Zehua Department of Statistics & Applied Probability National University of Singapore

Christfried Webers. Canberra February June 2015

ANNUITY LAPSE RATE MODELING: TOBIT OR NOT TOBIT? 1. INTRODUCTION

Linear Threshold Units

LÉVY-DRIVEN PROCESSES IN BAYESIAN NONPARAMETRIC INFERENCE

Dirichlet Processes A gentle tutorial

Gaussian Processes in Machine Learning

Missing data and net survival analysis Bernard Rachet

Overview. Longitudinal Data Variation and Correlation Different Approaches. Linear Mixed Models Generalized Linear Mixed Models

**BEGINNING OF EXAMINATION** The annual number of claims for an insured has probability function: , 0 < q < 1.

Lecture 3: Linear methods for classification

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Probabilistic Methods for Time-Series Analysis

CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES

Smoothing and Non-Parametric Regression

Bayesian Statistics: Indian Buffet Process

The zero-adjusted Inverse Gaussian distribution as a model for insurance claims

Bayesian Estimation of Joint Survival Functions in Life Insurance

Linda K. Muthén Bengt Muthén. Copyright 2008 Muthén & Muthén Table Of Contents

EMPIRICAL RISK MINIMIZATION FOR CAR INSURANCE DATA

Doctor of Philosophy Program in Statistics International Program Faculty of Science and Technology Thammasat University New Curriculum Year 2004

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

GLM I An Introduction to Generalized Linear Models

Probabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014

APPLIED MISSING DATA ANALYSIS

Imputing Values to Missing Data

State Space Time Series Analysis

DURATION ANALYSIS OF FLEET DYNAMICS


Combining Linear and Non-Linear Modeling Techniques: EMB America. Getting the Best of Two Worlds

Statistical machine learning, high dimension and big data

Geostatistics Exploratory Analysis

A revisit of the hierarchical insurance claims modeling

MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

Exam C, Fall 2006 PRELIMINARY ANSWER KEY

Multivariate Statistical Inference and Applications

Bayesian Hidden Markov Models for Alcoholism Treatment Tria

Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Dynamic Linear Models with R

Regression III: Advanced Methods

Nonnested model comparison of GLM and GAM count regression models for life insurance data

Chapter 6: Multivariate Cointegration Analysis

Penalized Splines - A statistical Idea with numerous Applications...

Modeling the Claim Duration of Income Protection Insurance Policyholders Using Parametric Mixture Models

An explicit link between Gaussian fields and Gaussian Markov random fields; the stochastic partial differential equation approach

Statistics Graduate Programs

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

Web-based Supplementary Materials for Bayesian Effect Estimation. Accounting for Adjustment Uncertainty by Chi Wang, Giovanni

Least Squares Estimation

Model-based Synthesis. Tony O Hagan

Gaussian Processes to Speed up Hamiltonian Monte Carlo

Linear regression methods for large n and streaming data

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Gamma Distribution Fitting

Exact Inference for Gaussian Process Regression in case of Big Data with the Cartesian Product Structure

Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, and Discrete Changes

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

South Carolina College- and Career-Ready (SCCCR) Probability and Statistics

Probabilistic concepts of risk classification in insurance

NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS

Transcription:

BayesX - Software for Bayesian Inference in Structured Additive Regression Thomas Kneib Faculty of Mathematics and Economics, University of Ulm Department of Statistics, Ludwig-Maximilians-University Munich joint work with Stefan Lang (University of Innsbruck) & Andreas Brezger (University of Munich) 2.7.2007

Structured Additive Regression Structured Additive Regression Regression in a general sense: Generalised linear models, Multivariate (categorical) generalised linear models, Regression models for duration times (Cox-type models, multi-state models). Common structure: Model a quantity of interest in terms of categorical and continuous covariates, e.g. E(y u) = h(u γ) (GLM) or λ(t u) = λ 0 (t) exp(u γ) (Cox model) BayesX - Software for Bayesian Inference in Structured Additive Regression 1

Structured Additive Regression General idea of structured additive regression: Replace usual parametric predictor with a flexible semiparametric predictor containing Nonparametric effects of time scales and continuous covariates, Spatial effects, Interaction surfaces, Varying coefficient terms (continuous and spatial effect modifiers), Random intercepts and random slopes. BayesX - Software for Bayesian Inference in Structured Additive Regression 2

Example: Car insurance data from two insurance companies in Belgium. Structured Additive Regression Sample of approximately 160.000 policyholders. Aims: Separate risk analyses for claim size and claim frequency to predict risk premium from covariates. Variables of primary interest: Claim size y i or claim frequency h i of policyholders. Covariates: vage vehicles age page policyholders age hp vehicles horsepower bm bonus-malus score s district in Belgium v Vector of categorical covariates BayesX - Software for Bayesian Inference in Structured Additive Regression 3

Geoadditive models: Structured Additive Regression Gaussian model for log-costs log(y): log(y) N(η, σ 2 ) with η = f 1 (vage) + f 2 (page) + f 3 (bm) + f 4 (hp) + f spat (s) + v ζ. Poisson model for frequencies h i : h P o(exp(η)) with η = f 1 (vage) + f 2 (page) + f 3 (page)sex + f 3 (bm) + f 4 (hp) + f spat (s) + v ζ. BayesX - Software for Bayesian Inference in Structured Additive Regression 4

Results for claim frequency: Structured Additive Regression.6.3 0.3.6.2 0.2.4.6.8 17 30 43 56 69 82 95 age of the policyholder 17 30 43 56 69 82 95 age of the policyholder (deviation for males).7.35 0.35.7 0 5.5 11 16.5 22 bonus malus score 1.1.6.1.4.9 1.4 10 39 68 97 126 155 184 213 242 horse power of the car BayesX - Software for Bayesian Inference in Structured Additive Regression 5

Structured Additive Regression -0.470814 0 0.642283 BayesX - Software for Bayesian Inference in Structured Additive Regression 6

Spatial effect for claim size: Structured Additive Regression -0.181311 0 0.26348 BayesX - Software for Bayesian Inference in Structured Additive Regression 7

Model Components and Priors Model Components and Priors Penalised splines. Approximate f(x) = ξ j B j (x) by a weighted sum of B-spline basis functions. Employ a large number of basis functions to enable flexibility. Penalise differences between parameters of adjacent basis functions to ensure smoothness. 2 1 0 1 2 3 1.5 0 1.5 3 2 1 0 1 2 3 1.5 0 1.5 3 2 1 0 1 2 3 1.5 0 1.5 3 BayesX - Software for Bayesian Inference in Structured Additive Regression 8

Bivariate penalised splines. Model Components and Priors Varying coefficient models. Effect of covariate x varies smoothly over the domain of a second covariate z: f(x, z) = x g(z) Spatial effect modifier Geographically weighted regression. BayesX - Software for Bayesian Inference in Structured Additive Regression 9

Spatial effect for regional data: Markov random fields. Model Components and Priors Bivariate extension of a first order random walk on the real line. Define appropriate neighbourhoods for the regions. Assume that the expected value of f spat (s) is the average of the function evaluations of adjacent sites. f(t+1) E[f(t) f(t 1),f(t+1)] f(t 1) τ 2 2 t 1 t t+1 BayesX - Software for Bayesian Inference in Structured Additive Regression 10

Model Components and Priors Spatial effect for point-referenced data: Stationary Gaussian random fields. Well-known as Kriging in the geostatistics literature. Spatial effect follows a zero mean stationary Gaussian stochastic process. Correlation of two arbitrary sites is defined by an intrinsic correlation function. Can be interpreted as a basis function approach with radial basis functions. BayesX - Software for Bayesian Inference in Structured Additive Regression 11

All effects can be cast into one general framework. Model Components and Priors All vectors of function evaluations f j can be expressed as f j = Z j ξ j with design matrix Z j and regression coefficients ξ j. Generic form of the prior for ξ j : p(ξ j τ 2 j ) (τ 2 j ) k j 2 exp ( 1 2τ 2 j ξ jk j ξ j ). K j 0 acts as a penalty matrix, rank(k j ) = k j d j = dim(ξ j ). τ 2 j 0 can be interpreted as a variance or (inverse) smoothness parameter. BayesX - Software for Bayesian Inference in Structured Additive Regression 12

Bayesian Inference Bayesian Inference Fully Bayesian inference: All parameters (including the variance parameters τ 2 ) are assigned suitable prior distributions. Typically, estimation is based on MCMC simulation techniques. Usual estimates: Posterior expectation, posterior median (easily obtained from the samples). Empirical Bayes inference: Differentiate between parameters of primary interest (regression coefficients) and hyperparameters (variances). Assign priors only to the former. Estimate the hyperparameters by maximising their marginal posterior. Plugging these estimates into the joint posterior and maximising with respect to the parameters of primary interest yields posterior mode estimates. BayesX - Software for Bayesian Inference in Structured Additive Regression 13

Bayesian Inference MCMC-based inference: Assign inverse gamma prior to τ 2 j : p(τ 2 j ) ( ) 1 (τj 2)a j+1 exp b j τj 2. Proper for a j > 0, b j > 0 Common choice: a j = b j = ε small. Improper for b j = 0, a j = 1 Flat prior for variance τj 2, b j = 0, a j = 1 2 Flat prior for standard deviation τ j. Conditions for proper posteriors in structured additive regression are available. Gibbs sampler for τj 2 : Sample from an inverse Gamma distribution with parameters a j = a j + 1 2 rank(k j) and b j = b j + 1 2 ξ jk j ξ j. BayesX - Software for Bayesian Inference in Structured Additive Regression 14

Bayesian Inference Metropolis-Hastings update for ξ j : Propose new state from a multivariate Gaussian distribution with precision matrix and mean P j = Z jw Z j + 1 τ 2 j K j and m j = P 1 j Z jw (ỹ η j ). IWLS-Proposal with appropriately defined working weights W observations ỹ. and working Efficient algorithms make use of the sparse matrix structure of P j and K j. For binary or categorical regression models: Efficient implementation based on latent variable representation. BayesX - Software for Bayesian Inference in Structured Additive Regression 15

Bayesian Inference Empirical Bayes inference. Consider the variances τ 2 j as unknown constants to be estimated from their marginal posterior. Consider the regression coefficients ξ j as correlated random effects with multivariate Gaussian distribution Use mixed model methodology for estimation. Problem: In most cases partially improper random effects distribution. Mixed model representation: Decompose where ξ j = X j β j + V j b j, p(β j ) const and b j N(0, τ 2 j I kj ). β j is a fixed effect and b j is an i.i.d. random effect. BayesX - Software for Bayesian Inference in Structured Additive Regression 16

This yields a variance components model with pedictor Bayesian Inference η = Xβ + V b where in turn p(β) const and b N(0, Q). Obtain empirical Bayes estimates / penalized likelihood estimates via iterating Penalized maximum likelihood for the regression coefficients β and b. Restricted Maximum / Marginal likelihood for the variance parameters in Q: L(Q) = L(β, b, Q)p(b)dβdb max Q. Involves a Laplace approximation to the marginal likelihood (corresponding to REML estimation of variances in Gaussian mixed models). BayesX - Software for Bayesian Inference in Structured Additive Regression 17

BayesX BayesX BayesX is a software tool for estimating structured additive regression models. BayesX - Software for Bayesian Inference in Structured Additive Regression 18

Stand-alone software with Stata-like syntax. BayesX Developed by Andreas Brezger, Thomas Kneib and Stefan Lang with contributions of seven colleagues. Computationally demanding parts are implemented in C++. Graphical user interface and visualisation tools are implemented in Java. Currently, BayesX only runs under Windows, a Linux version as well as a connection to R are work in progress. BayesX - Software for Bayesian Inference in Structured Additive Regression 19

Resources: BayesX http://www.stat.uni-muenchen.de/~bayesx. Reference Manual, Methodology Manual, Tutorial Manual. Some publications: Brezger, Kneib & Lang (2005). BayesX: Analyzing Bayesian structured additive regression models. Journal of Statistical Software, 14 (11). Brezger, A. & Lang, S. (2006). Generalized additive regression based on Bayesian P-splines. Computational Statistics and Data Analysis 50, 967 991. Fahrmeir, L., Kneib, T. & Lang, S. (2004). Penalized structured additive regression for space-time data: a Bayesian perspective. Statistica Sinica 14, 731 761. BayesX - Software for Bayesian Inference in Structured Additive Regression 20

R provides functionality for generalised additive models and a number of extensions such as varying coefficient models, interaction surfaces, spatio-temporal effects, etc. What is the gain in using BayesX? BayesX BayesX provides functionality for more general response types: Ordered and unordered categorical responses, Continuous survival time models extending the Cox model, Multi-state models. Inference can be based on mixed model methodology or MCMC. Not all possibilities and combinations are supported by both inferential concepts. BayesX - Software for Bayesian Inference in Structured Additive Regression 21

Categorical responses with unordered categories: BayesX Multinomial logit and multinomial probit models, Category-specific and globally-defined covariates, Non-availability indicators can be defined to account for varying choice sets. Response families in BayesX: multinomial, multinomialprobit. Categorical responses with ordered categories: Ordinal as well as sequential models, Logit and probit models, Effects can be category-specific or constant over the categories. Response families in BayesX: cumlogit, cumprobit, seqlogit, seqprobit. BayesX - Software for Bayesian Inference in Structured Additive Regression 22

Continuous survival times: BayesX Cox-type hazard regression models, Joint estimation of the baseline hazard rate and the covariate effects, Time-varying effects and time-varying covariates, Arbitrary combinations of right, left and interval censoring as well as left truncation. Response family: cox. Multi-state models: Describe the evolution of discrete phenomena in continuous time, Model in terms of transition intensities, similar as in the Cox model. Response family: multistate. BayesX - Software for Bayesian Inference in Structured Additive Regression 23

Summary & Future Plans Summary & Future Plans Take home message: BayesX is a user-friendly software that allows for the routine estimation of a broad class of semiparametric regression models even in non-standard situations. Future plans: Linux version and connection to R. Interval censoring for multi-state models. Bayesian regularisation-priors and model choice (with LASSO-type priors). Bayesian treatment of measurement error in structured additive regressions models. BayesX - Software for Bayesian Inference in Structured Additive Regression 24