Multivariate Statistical Modelling Based on Generalized Linear Models

Similar documents
Regression Modeling Strategies

BayesX - Software for Bayesian Inference in Structured Additive Regression

Statistics Graduate Courses

CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS

SAS Software to Fit the Generalized Linear Model

STATISTICA Formula Guide: Logistic Regression. Table of Contents

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

CHAPTER 8 EXAMPLES: MIXTURE MODELING WITH LONGITUDINAL DATA

Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg

STA 4273H: Statistical Machine Learning

Adequacy of Biomath. Models. Empirical Modeling Tools. Bayesian Modeling. Model Uncertainty / Selection

APPLIED MISSING DATA ANALYSIS

Nominal and ordinal logistic regression

Master programme in Statistics

How To Understand Multivariate Models

Semiparametric Multinomial Logit Models for the Analysis of Brand Choice Behaviour

CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES

Advanced Signal Processing and Digital Noise Reduction

Univariate and Multivariate Methods PEARSON. Addison Wesley

Statistics in Retail Finance. Chapter 6: Behavioural models

CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA

Handling attrition and non-response in longitudinal data

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling

MATHEMATICAL METHODS OF STATISTICS

Multivariate Statistical Inference and Applications

Statistical Tools for Nonlinear Regression

Linda K. Muthén Bengt Muthén. Copyright 2008 Muthén & Muthén Table Of Contents

7 Generalized Estimating Equations

Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne

Generalized Linear Models

SAS Certificate Applied Statistics and SAS Programming

Interpretation of Somers D under four simple models

Publication List. Chen Zehua Department of Statistics & Applied Probability National University of Singapore

11. Time series and dynamic linear models

ATV - Lifetime Data Analysis

UNDERGRADUATE DEGREE DETAILS : BACHELOR OF SCIENCE WITH

SUMAN DUVVURU STAT 567 PROJECT REPORT

Computer-Aided Multivariate Analysis

Contents. List of Figures. List of Tables. List of Examples. Preface to Volume IV

Applied Regression Analysis and Other Multivariable Methods

How To Understand The Theory Of Probability

GLM I An Introduction to Generalized Linear Models

Department of Epidemiology and Public Health Miller School of Medicine University of Miami

Modeling and Analysis of Call Center Arrival Data: A Bayesian Approach

Overview. Longitudinal Data Variation and Correlation Different Approaches. Linear Mixed Models Generalized Linear Mixed Models

Probabilistic Methods for Time-Series Analysis

PS 271B: Quantitative Methods II. Lecture Notes

Efficient and Practical Econometric Methods for the SLID, NLSCY, NPHS

Master s thesis tutorial: part III

Exploratory Data Analysis with MATLAB

15 Ordinal longitudinal data analysis

Logistic Regression (a type of Generalized Linear Model)

Assumptions. Assumptions of linear models. Boxplot. Data exploration. Apply to response variable. Apply to error terms from linear model

Statistical Rules of Thumb

Methods for Meta-analysis in Medical Research

Logistic Regression (1/24/13)

HLM software has been one of the leading statistical packages for hierarchical

Alessandro Birolini. ineerin. Theory and Practice. Fifth edition. With 140 Figures, 60 Tables, 120 Examples, and 50 Problems.

Regression III: Advanced Methods

MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

STATISTICS COURSES UNDERGRADUATE CERTIFICATE FACULTY. Explanation of Course Numbers. Bachelor's program. Master's programs.

Bayesian Statistics: Indian Buffet Process

Longitudinal Data Analysis. Wiley Series in Probability and Statistics

Linear Threshold Units

THE MULTIVARIATE ANALYSIS RESEARCH GROUP. Carles M Cuadras Departament d Estadística Facultat de Biologia Universitat de Barcelona

MODELLING AND ANALYSIS OF

Probability and Statistics

Imputing Missing Data using SAS

Practical. I conometrics. data collection, analysis, and application. Christiana E. Hilmer. Michael J. Hilmer San Diego State University

Multivariate Logistic Regression

Stephen du Toit Mathilda du Toit Gerhard Mels Yan Cheng. LISREL for Windows: PRELIS User s Guide

A random point process model for the score in sport matches

Linear Classification. Volker Tresp Summer 2015

Statistics Graduate Programs

A Bayesian hierarchical surrogate outcome model for multiple sclerosis

Logistic regression modeling the probability of success

A Basic Introduction to Missing Data

Missing data and net survival analysis Bernard Rachet

PREDICTIVE DISTRIBUTIONS OF OUTSTANDING LIABILITIES IN GENERAL INSURANCE

The Probit Link Function in Generalized Linear Models for Data Mining Applications

Detection. Perspective. Network Anomaly. Bhattacharyya. Jugal. A Machine Learning »C) Dhruba Kumar. Kumar KaKta. CRC Press J Taylor & Francis Croup

Christfried Webers. Canberra February June 2015

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Joint models for classification and comparison of mortality in different countries.

Bayesian networks - Time-series models - Apache Spark & Scala

Local classification and local likelihoods

Multilevel Modelling of medical data

New Work Item for ISO Predictive Analytics (Initial Notes and Thoughts) Introduction

ECON 523 Applied Econometrics I /Masters Level American University, Spring Description of the course

Order Statistics: Theory & Methods. N. Balakrishnan Department of Mathematics and Statistics McMaster University Hamilton, Ontario, Canada. C. R.

Survival analysis methods in Insurance Applications in car insurance contracts

Problem of Missing Data

(and sex and drugs and rock 'n' roll) ANDY FIELD

Lecture 3: Linear methods for classification

Transcription:

Ludwig Fahrmeir Gerhard Tutz Multivariate Statistical Modelling Based on Generalized Linear Models Second Edition With contributions from Wolfgang Hennevogl With 51 Figures Springer

Contents Preface to the Second Edition v Preface to the First Edition, vii List of Examples xvii List of Figures xxi List of Tables xxv 1. Introduction 1 1.1 Outline and Examples 2 1.2 Remarks on Notation 13 1.3 Notes and Further Reading 14 2. Modelling and Analysis of Cross-Sectional Data: A Review of Univariate Generalized Linear Models 15 2.1 Univariate Generalized Linear Models 16 2.1.1 Data 16 Coding of Covariates 16 Grouped and Ungrouped Data 17 2.1.2 Definition of Univariate Generalized Linear Models.. 18 2.1.3 Models for Continuous Responses 22 Normal Distribution 22 Gamma Distribution 23 Inverse Gaussian Distribution 24 2.1.4 Models for Binary and Binomial Responses 24 Linear Probability Model 25 Probit Model 26 Logit Model 26 Complementary Log-Log Model 26 Complementary Log-Model 26 Binary Models as Threshold Models of Latent Linear Models 29 Parameter Interpretation 29 Overdispersion 35 2.1.5 Models for Count Data 36 Log-linear Poisson Model 36

x Contents Linear Poisson Model 36 2.2 Likelihood Inference 38 2.2.1 Maximum Likelihood Estimation 38 Log-likelihood, Score Function and Information Matrix 39 Numerical Computation of the MLE by Iterative Methods 41 Uniqueness and Existence of MLEs* 43 Asymptotic Properties 44 Discussion of Regularity Assumptions* 46. Additional Scale or Overdispersion Parameter 47 2.2.2 Hypothesis Testing and Goodness-of-Fit Statistics... 47 Goodness-of-Fit Statistics 50 2.3 Some Extensions 55 2.3.1 Quasi-likelihood Models 55 Basic Models 55 Variance Functions with Unknown Parameters 58 Nonconstant Dispersion Parameter 59 2.3.2 Bayesian Models 60 2.3.3 Nonlinear and Nonexponential Family Regression p Models* 65 2.4 Notes and Further Reading 67 3. Models for Multicategorical Responses: Multivariate Extensions of Generalized Linear Models 69 3-1 Multicategorical Response Models 70 3.1.1 Multinomial Distribution 70 3.1.2 Data 71 3.1.3 The Multivariate Model 72 3.1.4 Multivariate Generalized Linear Models 75 3.2 Models for Nominal Responses 77 3.2.1 The Principle of Maximum Random Utility..'. 77 3.2.2 Modelling of Explanatory Variables: Choice of Design Matrix 79 3.3 Models for Ordinal Responses 81 3.3.1 Cumulative Models: The Threshold Approach 83 Cumulative Logistic Model or Proportional Odds Model 83 Grouped Cox Model or Proportional Hazards Model. 86 Extreme Maximal-value Distribution Model 86 3.3.2 Extended Versions of Cumulative Models 87 3.3.3 Link Functions and Design Matrices for Cumulative Models 88 3.3.4 Sequential Models 92 Generalized Sequential Models 95 Link Functions of Sequential Models 98

Contents xi 3.3.5 Strict Stochastic Ordering* 99 3.3.6 Two-Step Models 100 Link Function and Design Matrix for Two-Step Models 102 3.3.7 Alternative Approaches 103 3.4 Statistical Inference 105 3.4.1 Maximum Likelihood Estimation 105 Numerical Computation 107 3.4.2 Testing and Goodness-of-Fit 107 Testing of Linear Hypotheses 107 Goodness-of-Fit Statistics 107 3.4.3 Power-Divergence Family* 109 Asymptotic Properties under Classical "Fixed Cells" Assumptions Ill Sparseness and "Increasing-Cells" Asymptotics 112 3.5 Multivariate Models for Correlated Responses 112 ^ 3.5.1 Conditional Models 114 Asymmetric Models 114 Symmetric Models 116 3.5.2 Marginal Models 119 Marginal Models for Correlated Univariate Responses 120 The Generalized Estimating Approach for Statistical Inference 123 Marginal Models for Correlated Categorical Responses 129 Likelihood-based Inference for Marginal Models 135 3.6 Notes and Further Reading 136 Bayesian Inference 136 4. Selecting and Checking Models 139 4.1 Variable Selection : 139 4.1.1 Selection Criteria 140 4.1.2 Selection Procedures 142 All-Subsets Selection 142 Stepwise Backward and Forward Selection 143 4.2 Diagnostics 145 4.2.1 Diagnostic Tools for the Classical Linear Model... 146 4.2.2 Generalized Hat Matrix 147 4.2.3 Residuals and Goodness-of-Fit Statistics 151 4.2.4 Case Deletion 156 4.3 General Tests for Misspecification* 161 4.3.1 Estimation under Model Misspecification 162 4.3.2 Hausman-type Tests 165 Hausman Tests 165 Information Matrix Test 166

xii Contents 4.3.3 Tests for Nonnested Hypotheses 167 Tests Based on Artificial Nesting 168 Generalized Wald and Score Tests 168 4.4 Notes and Further Reading 170 Bayesian Model Determination 170 Robust Estimates 172 Model Tests Against Smooth Alternatives 172 5. Semi- and Nonparametric Approaches to Regression Analysis 173 5.1 Smoothing Techniques for Continuous Responses 174 5.1.1 Regression Splines and Other Basis Functions 174 Regression Splines 176 Other Basis Functions 178 Regularization 179 5.1.2*" Smoothing Splines 181 5.1.3 Local Estimators 183 Simple Neighborhood Smoothers 183 Local Regression 184 Bias-Variance Trade-off 187 Relation to Other Smoothers 189 5.1.4 Selection of Smoothing Parameters 190 5.2 Smoothing for Non-Gaussian Data 193 5.2.1 Basis Function Approach 193 Fisher Scoring for Penalized Likelihood* 194 5.2.2 Penalization and Spline Smoothing 195 Fisher Scoring for Generalized Spline Smoothing*... 196 Choice of Smoothing Parameter 197 5.2.3 Localizing Generalized Linear Models 198 Local Fitting by Weighted Scoring 201 5.3 Modelling with Multiple Covariates 202 5.3.1 Modelling Approaches 207 Generalized Additive Models 207 Partially Linear Models 208 Varying-Coefficient Models 208 Projection Pursuit Regression 209 Basis Function Approach 210 5.3.2 Estimation Concepts 213 Backfitting Algorithm for Generalized Additive Models 213 Backfitting with Spline Functions 217 Choice of Smoothing Parameter 220 Partial Linear Models 220 5.4 Semiparametric Bayesian Inference for Generalized Regression 221

Contents 5.4.1 Gaussian Responses 221 Smoothness Priors Approaches 221 Basis Function Approaches 227 Models with Multiple Covariates 228 5.4.2 Non-Gaussian Responses 231 Latent Variable Models for Categorical Responses... 234 5.5 Notes and Further Reading 239 6. Fixed Parameter Models for Time Series and Longitudinal Data 241 6.1 Time Series 242 6.1.1 Conditional Models 242 Generalized Autoregressive Models 242 Quasi-Likelihood Models and Generalized Autoregression Moving Average Models 246 6.1.2 Statistical Inference for Conditional Models 249 6.1.3 Marginal Models 255 Estimation of Marginal Models 258 6.2 Longitudinal Data 260 6.2.1 Conditional Models 261 Generalized Autoregressive Models, Quasi-Likelihood Models 261 Statistical Inference 262 Transition Models 264 Subject-specific Approaches and Conditional Likelihood 264 6.2.2 Marginal Models 267 Statistical Inference 268 6.2.3 Generalized Additive Models for Longitudinal Data.. 274 6.3 Notes and Further Reading 278 7. Random Effects Models ;... 283 7.1 Linear Random Effects Models for Normal Data 285 7.1.1 Two-stage Random Effects Models 285 Random Intercepts 286 Random Slopes 287 Multilevel Models 288 7.1.2 Statistical Inference 289 Known Variance-Covariance Components 289 Unknown Variance-Covariance Components 289 Derivation of the EM algorithm* 291 7.2 Random Effects in Generalized Linear Models 292 Generalized Linear Models with Random Effects... 293 Examples 294 7.3 Estimation Based on Posterior Modes 298 xiii

xiv Contents 7.3.1 Known Variance-Covariance" Components 298 7.3.2 Unknown Variance-Covariance Components 299 7.3.3 Algorithmic Details*.300 Fisher Scoring for Given Variance-Covariance Components 300 EM Type Algorithm 302 7.4 Estimation by Integration Techniques 303 7.4.1 Maximum Likelihood Estimation of Fixed Parameters 303 Direct Maximization Using Fitting Techniques for GLMs 305 Nonparametric Maximum Likelihood for Finite Mixtures 308 7.4.2 Posterior Mean Estimation of Random Effects 310 > 7.4.3 Indirect Maximization Based on the EM Algorithm*. 311 7.4.4 Algorithmic Details for Posterior Mean Estimation*. 315 " 7.5 Examples 318 7.6 Bayesian Mixed Models 321 Bayesian Generalized Mixed Models 321 Generalized Additive Mixed Models 322 7.7 Marginal Estimation Approach to Random Effects Models.. 325 7.8 Notes and Further Reading 328 8. State Space and Hidden Markov Models 331 8.1 Linear State Space Models and the Kalman Filter 332 8.1.1 Linear State Space Models 332 8.1.2 Statistical Inference 337 Linear Kalman Filtering and Smoothing 338 Kalman Filtering and Smoothing as Posterior Mode Estimation* 340 Unknown Hyperparameters 342 EM Algorithm for Estimating Hyperparameters*... 343 8.2 Non-Normal and Nonlinear State Space Models...' 345 8.2.1 Dynamic Generalized Linear Models 345 Categorical Time Series 347 8.2.2 Nonlinear and Nonexponential Family Models* 349 8.3 Non-Normal Filtering and Smoothing 350 8.3.1 Posterior Mode Estimation 351 Generalized Extended Kalman Filter and Smoother*. 352 Gauss-Newton and Fisher-Scoring Filtering and Smoothing*, 354 Estimation of Hyperparameters* 356 Some Applications 356 8.3.2 Markov Chain Monte Carlo and Integration-based Approaches 361 MCMC Inference 362

Contents Integration-based Approaches 365 8.4 Longitudinal Data 369 8.4.1 State Space Modelling of Longitudinal Data 369 8.4.2 Inference For Dynamic Generalized Linear Mixed Models 372 8.5 Spatial and Spatio-temporal Data 376 8.6 Notes and Further Reading 383 9. Survival Models 385 9.1 Models for Continuous Time 385 9.1.1 Basic Models 385 Exponential Distribution 386 Weibull Distribution 387 * Piecewise Exponential Model 388 9.1.2 Parametric Regression Models 388 Location-Scale Models for log T 388 Proportional Hazards Models 389 Linear Transformation Models and Binary Regression Models 390 9.1.3 Censoring 391 Random Censoring 391 Type I Censoring 392 9.1.4 Estimation 393 Exponential Model 394 Weibull Model 394 Piecewise Exponential Model 395 9.2 Models for Discrete Time 396 9.2.1 Life Table Estimates 397 9.2.2 Parametric Regression Models 400 The Grouped Proportional Hazards Model 400 A Generalized Version: The Model of Aranda-Qrdaz. 402 The Logistic Model ; 403 Sequential Model and Parameterization of the Baseline Hazard 403 9.2.3 Maximum Likelihood Estimation 404 9.2.4 Time-varying Covariates 408 Internal Covariates* 411 Maximum Likelihood Estimation* 412 9.3 Discrete Models for Multiple Modes of Failure 414 9.3.1 Basic Models 414 9.3.2 Maximum Likelihood Estimation 417 9.4 Smoothing in Discrete Survival Analysis 420 9.4.1 Smoothing Life Table Estimates 420 9.4.2 Smoothing with Covariates 422 9.4.3 Dynamic Discrete-Time Survival Models 423 xv

xvi Contents Posterior Mode Smoothing 423 Fully Bayesian Inference via MCMC 425 9.5 Remarks and Further Reading 429 A 433 A.I Exponential Families and Generalized Linear Models 433 A.2 Basic Ideas for Asymptotics 437 A.3 EM Algorithm 442 A.4 Numerical Integration 443 A.5 Monte Carlo Methods 449 B. Software for Fitting Generalized Linear Models and Extensions 455 Bibliography 467 Author Index 505 Subject Index 512