Model-Based Recursive Partitioning for Detecting Interaction Effects in Subgroups




Achim Zeileis, Torsten Hothorn, Kurt Hornik
http://eeecon.uibk.ac.at/~zeileis/

Overview
- Motivation: trees, leaves, and branches
- Model-based recursive partitioning
  - Model estimation
  - Tests for parameter instability
  - Segmentation
  - Pruning
- Application: treatment effect for chronic disease
- Summary

Motivation: Trees
Breiman (2001, Statistical Science) distinguishes two cultures of statistical modeling:
- Data models: stochastic models, typically parametric.
- Algorithmic models: flexible models, data-generating process unknown.
Example: Recursive partitioning models a dependent variable Y by learning a partition with respect to explanatory variables Z_1, ..., Z_l.
Key features:
- Predictive power in nonlinear regression relationships, with automatic interaction detection.
- Interpretability (enhanced by visualization), i.e., no black-box methods.

Motivation: Leaves
Typically: simple models for univariate Y, e.g., a mean or a proportion.
Examples: CART and C4.5 in statistical and machine learning, respectively.
Problems of classical tree algorithms:
- No concept of significance; possibly biased variable selection.
- No complex (parametric) models in the leaves.
- Many different tree algorithms for different types of data.
Here: synthesis of parametric data models and algorithmic tree models.
- Fitting local models by partitioning of the sample space.
- Based on statistical hypothesis tests for parameter instability.

Motivation: Branches
Base algorithm: growth of branches from the root to the leaves of the tree follows a generic recursive partitioning algorithm:
1. Fit a (possibly very simple) model for the response Y.
2. Assess the association of Y and each Z_j.
3. Split the sample along the Z_j with the strongest association: choose the breakpoint with the highest improvement of the model fit.
4. Repeat steps 1-3 recursively in the subsamples until some stopping criterion is met.
Here: segmentation (3) of parametric models (1) with additive objective function, using parameter instability tests (2) and associated statistical significance (4).
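
These four steps can be sketched as a short recursive function. This is an illustrative outline only, not the authors' implementation; the helpers `fit` and `instability_test` are hypothetical stand-ins for the model fit and the instability test discussed on the following slides.

```python
import numpy as np

def mob_sketch(y, X, Z, fit, instability_test, alpha=0.05, min_size=20):
    """Generic recursive partitioning loop (steps 1-4), as a sketch.

    fit(y, X) returns fitted parameters; instability_test(y, X, z) returns
    a (p_value, breakpoint) pair for one partitioning variable z.
    """
    theta = fit(y, X)                                        # step 1: fit model in this node
    tests = [instability_test(y, X, Z[:, j]) for j in range(Z.shape[1])]
    j = min(range(Z.shape[1]), key=lambda jj: tests[jj][0])  # step 2: strongest association
    p_value, cut = tests[j]
    left = Z[:, j] <= cut                                    # step 3: candidate split
    if p_value >= alpha or left.sum() < min_size or (~left).sum() < min_size:
        return {"theta": theta, "n": len(y)}                 # step 4: stopping criterion met
    return {"split": (j, cut),                               # step 4: recurse into subsamples
            "left": mob_sketch(y[left], X[left], Z[left], fit, instability_test, alpha, min_size),
            "right": mob_sketch(y[~left], X[~left], Z[~left], fit, instability_test, alpha, min_size)}
```

Any model fitter and any per-variable test can be plugged in, which is exactly the modularity the base algorithm is meant to convey.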

Model-based recursive partitioning: Estimation
Models: M(Y, θ) with (potentially) multivariate observations Y ∈ 𝒴 and a k-dimensional parameter vector θ ∈ Θ.
Parameter estimation: θ̂ by optimization of an objective function Ψ(Y, θ) for n observations Y_i (i = 1, ..., n):

  θ̂ = argmin_{θ ∈ Θ} Σ_{i=1}^{n} Ψ(Y_i, θ).

Special cases: maximum likelihood (ML), ordinary and weighted least squares (OLS and WLS), quasi-ML, and other M-estimators.
Central limit theorem: if there is a true parameter θ_0 and given certain weak regularity conditions, θ̂ is asymptotically normal with mean θ_0 and a sandwich-type covariance.
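
As a concrete special case, Ψ for OLS is the squared residual, and θ̂ has a closed form. A small numpy check on simulated data (invented for illustration, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # regressors incl. intercept
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)       # true theta_0 = (1, 2)

def Psi_sum(theta):
    """Summed OLS objective: sum_i (y_i - x_i' theta)^2."""
    r = y - X @ theta
    return r @ r

theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]        # argmin of the objective

# theta_hat minimizes the summed objective: perturbations only increase it
for delta in rng.normal(size=(5, 2)):
    assert Psi_sum(theta_hat) <= Psi_sum(theta_hat + 0.1 * delta)
```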

Model-based recursive partitioning: Estimation
Estimating function: θ̂ can also be defined in terms of

  Σ_{i=1}^{n} ψ(Y_i, θ̂) = 0,

where ψ(Y, θ) = ∂Ψ(Y, θ)/∂θ.
Idea: in many situations, a single global model M(Y, θ) that fits all n observations cannot be found. But it might be possible to find a partition with respect to the variables Z = (Z_1, ..., Z_l) so that a well-fitting model can be found locally in each cell of the partition.
Tool: assess parameter instability with respect to the partitioning variables Z_j ∈ 𝒵_j (j = 1, ..., l).
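
In the OLS case, ψ is (up to a constant) the regressor vector times the residual, and it indeed sums to zero at θ̂. A minimal check on simulated data (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 150
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.5, -1.0]) + rng.normal(size=n)

theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# per-observation estimating function psi(Y_i, theta) for OLS:
# derivative of (y_i - x_i' theta)^2 w.r.t. theta is -2 * x_i * residual_i
psi = -2.0 * X * (y - X @ theta_hat)[:, None]

# first-order condition: the estimating functions sum to zero at theta_hat
assert np.allclose(psi.sum(axis=0), 0.0, atol=1e-8)
```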

Model-based recursive partitioning: Tests
Generalized M-fluctuation tests capture instabilities in θ̂ for an ordering with respect to Z_j.
Basis: empirical fluctuation process of cumulative deviations with respect to an ordering σ(Z_ij):

  W_j(t, θ̂) = V̂^{−1/2} n^{−1/2} Σ_{i=1}^{⌊n·t⌋} ψ(Y_{σ(Z_ij)}, θ̂)   (0 ≤ t ≤ 1).

Functional central limit theorem: under parameter stability, W_j(·, θ̂) →_d W⁰(·), where W⁰ is a k-dimensional Brownian bridge.
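
The process can be computed directly from the re-ordered, decorrelated estimating functions. A sketch for the OLS case on simulated, stable data; estimating V̂ by the outer product of the scores is this sketch's assumption:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
x = rng.normal(size=n)
z = rng.uniform(size=n)                      # candidate partitioning variable
y = 1.0 + 2.0 * x + rng.normal(size=n)       # parameters stable along z

X = np.column_stack([np.ones(n), x])
theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
psi = X * (y - X @ theta_hat)[:, None]       # OLS estimating functions (up to a constant)

order = np.argsort(z)                        # ordering sigma(.) induced by z
V_hat = psi.T @ psi / n                      # outer-product covariance estimate
L_inv = np.linalg.inv(np.linalg.cholesky(V_hat))

# empirical fluctuation process W(t) at t = i/n: scaled cumulative sum of
# the decorrelated, re-ordered estimating functions
W = np.cumsum(psi[order] @ L_inv.T, axis=0) / np.sqrt(n)

# like a Brownian bridge, the process is tied down: W(1) = 0 up to rounding
assert np.allclose(W[-1], 0.0, atol=1e-8)
```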

Model-based recursive partitioning: Tests
Test statistics: scalar functional λ(W_j) that captures deviations from zero.
Null distribution: asymptotic distribution of λ(W⁰).
Special cases: the class of tests encompasses many well-known tests for different classes of models. Certain functionals λ are particularly intuitive for numeric and categorical Z_j, respectively.
Advantage: the model M(Y, θ̂) has to be estimated only once. The empirical estimating functions ψ(Y_i, θ̂) only have to be re-ordered and aggregated for each Z_j.

Model-based recursive partitioning: Tests
Splitting numeric variables: assess instability using supLM statistics:

  λ_supLM(W_j) = max_{i = i_min, ..., i_max} ( i/n · (n−i)/n )^{−1} ‖ W_j(i/n) ‖²₂ .

Interpretation: maximization of single-shift LM statistics over all conceivable breakpoints in [i_min, i_max].
Limiting distribution: supremum of a squared, k-dimensional tied-down Bessel process.
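
A sketch of the supLM functional on simulated stable data, reusing the OLS fluctuation process from before; the 10% trimming is this sketch's assumption, not a value from the slides:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
x = rng.normal(size=n)
z = rng.uniform(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)       # stable model: no true breakpoint

X = np.column_stack([np.ones(n), x])
theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
psi = X * (y - X @ theta_hat)[:, None]
order = np.argsort(z)
L_inv = np.linalg.inv(np.linalg.cholesky(psi.T @ psi / n))
W = np.cumsum(psi[order] @ L_inv.T, axis=0) / np.sqrt(n)   # fluctuation process

# supLM: max over trimmed breakpoints of the scaled squared process norm
t = np.arange(1, n + 1) / n
i_lo, i_hi = int(0.1 * n), int(0.9 * n)      # 10% trimming on both sides
lm = np.sum(W ** 2, axis=1)[i_lo - 1:i_hi] / (t * (1 - t))[i_lo - 1:i_hi]
sup_lm = lm.max()
```

All candidate breakpoints reuse the same estimating functions, which is the "estimate once, re-order and aggregate" advantage noted above.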

Model-based recursive partitioning: Tests
Splitting categorical variables: assess instability using χ² statistics:

  λ_χ²(W_j) = Σ_{c=1}^{C} ( |I_c|/n )^{−1} ‖ Δ_{I_c} W_j(i/n) ‖²₂ ,

where Δ_{I_c} W_j is the increment of the fluctuation process over the observations I_c in category c.
Feature: invariant to re-orderings of the C categories and of the observations within each category.
Interpretation: captures instability for a split-up into C categories.
Limiting distribution: χ² with k·(C − 1) degrees of freedom.
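
A sketch of this functional: sum, over categories, the squared increment of the decorrelated score process, each scaled by the inverse relative category size. The invariance property holds because only per-category sums enter the statistic:

```python
import numpy as np

def lambda_chi2(psi_decorr, z_cat):
    """Sum over categories c of (|I_c|/n)^(-1) times the squared increment
    of the fluctuation process over the observations I_c in category c."""
    n = len(z_cat)
    stat = 0.0
    for c in np.unique(z_cat):
        idx = z_cat == c
        increment = psi_decorr[idx].sum(axis=0) / np.sqrt(n)  # Delta_{I_c} W
        stat += (idx.sum() / n) ** -1.0 * (increment @ increment)
    return stat

rng = np.random.default_rng(4)
n = 300
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
z_cat = rng.integers(0, 3, size=n)           # categorical Z_j with C = 3 levels

X = np.column_stack([np.ones(n), x])
theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
psi = X * (y - X @ theta_hat)[:, None]
L_inv = np.linalg.inv(np.linalg.cholesky(psi.T @ psi / n))
psi_decorr = psi @ L_inv.T                   # decorrelated estimating functions

stat = lambda_chi2(psi_decorr, z_cat)

# invariant to a re-labelling of the categories
assert np.isclose(stat, lambda_chi2(psi_decorr, (z_cat + 1) % 3))
```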

Model-based recursive partitioning: Segmentation
Goal: split the model into b = 1, ..., B segments along the partitioning variable Z_j associated with the highest parameter instability, by local optimization of

  Σ_b Σ_{i ∈ I_b} Ψ(Y_i, θ_b).

B = 2: exhaustive search of order O(n).
B > 2: exhaustive search is of order O(n^(B−1)), but can be replaced by dynamic programming of order O(n²). Different methods (e.g., information criteria) can choose B adaptively.
Here: binary partitioning.
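
For B = 2 with a simple mean model in each segment, the O(n) exhaustive search can be made concrete with running sums: every breakpoint's summed objective is obtained from prefix sums in a single pass. A sketch (the slides' node models are more general than a mean):

```python
import numpy as np

def best_binary_split(y, z, min_size=10):
    """O(n) exhaustive search for the breakpoint in z minimizing the summed
    squared-error objective of a two-segment mean model."""
    order = np.argsort(z)
    ys = y[order]
    n = len(ys)
    csum, csq = np.cumsum(ys), np.cumsum(ys ** 2)
    best_obj, best_cut = np.inf, None
    for i in range(min_size, n - min_size + 1):          # i observations go left
        sse_left = csq[i - 1] - csum[i - 1] ** 2 / i
        sse_right = (csq[-1] - csq[i - 1]) - (csum[-1] - csum[i - 1]) ** 2 / (n - i)
        if sse_left + sse_right < best_obj:
            best_obj, best_cut = sse_left + sse_right, z[order[i - 1]]
    return best_obj, best_cut

# a step in the mean at z = 0.5 is recovered as the best breakpoint
z = np.linspace(0.0, 1.0, 100)
y = np.where(z > 0.5, 5.0, 0.0)
obj, cut = best_binary_split(y, z)
assert abs(cut - 0.5) < 0.02 and obj < 1e-9
```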

Model-based recursive partitioning: Pruning
Pruning: avoid overfitting.
Pre-pruning: internal stopping criterion; stop splitting when there is no significant parameter instability.
Post-pruning: grow a large tree and prune splits that do not improve the model fit (e.g., via cross-validation or information criteria).
Here: pre-pruning based on Bonferroni-corrected p values of the fluctuation tests.
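
The pre-pruning rule fits in a few lines: Bonferroni-adjust the l per-variable p values, pick the variable with the smallest adjusted value, and split only if it is significant. A sketch of the decision rule only:

```python
def pre_prune(p_values, alpha=0.05):
    """Bonferroni-corrected pre-pruning: adjust the l per-variable p values
    and split only if the smallest adjusted p value is below alpha.
    Returns (split?, index of selected variable, its adjusted p value)."""
    l = len(p_values)
    p_adj = [min(1.0, l * p) for p in p_values]
    j = min(range(l), key=lambda i: p_adj[i])
    return p_adj[j] < alpha, j, p_adj[j]

# one clearly unstable variable among three: split on variable 0
split, j, p = pre_prune([0.001, 0.2, 0.9])
assert split and j == 0 and abs(p - 0.003) < 1e-12

# two mildly unstable variables: after correction, neither is significant
split, _, _ = pre_prune([0.03, 0.04])
assert not split
```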

Application: Treatment effect for chronic disease
Task: identify groups of chronic disease patients with different treatment effects.
Source: anonymized data from a consulting project.
Model: logistic regression estimated by maximum likelihood.
Response: improvement (yes/no) of the chronic disease for 1354 patients after treatment over several weeks.
Regressor: treatment (active drug/placebo).
Partitioning variables: 11 variables that describe the disease status of the patients; lower values indicate more severe forms of the disease.
Result: treatment is most effective for certain intermediate forms of the disease.
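
The node models here are logistic regressions of improvement on treatment. As a rough, self-contained illustration on invented data (not the study's), such a model can be fit by iteratively reweighted least squares:

```python
import numpy as np

def logit_fit(X, y, iters=25):
    """Logistic regression via Newton/IRLS iterations (sketch)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))       # fitted probabilities
        w = p * (1.0 - p)                          # IRLS weights
        # Newton step: beta += (X' W X)^{-1} X' (y - p)
        beta = beta + np.linalg.solve((X * w[:, None]).T @ X, X.T @ (y - p))
    return beta

# invented subgroup: the active drug raises the log-odds of improvement by 1
rng = np.random.default_rng(5)
n = 1000
treatment = rng.integers(0, 2, size=n).astype(float)
eta = -0.5 + 1.0 * treatment
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)

X = np.column_stack([np.ones(n), treatment])       # intercept + treatment dummy
beta_hat = logit_fit(X, y)                         # beta_hat[1] estimates the effect
```

In the tree below, one such model is refit in each node, so the treatment coefficient can differ between subgroups.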

Application: Treatment effect for chronic disease
[Tree plot: the root (node 1) splits on risk7 (p = 0.004) at ≤ 0 vs. > 0; node 3 splits further on risk5 (p = 0.042) at ≤ 3.765 vs. > 3.765. Terminal nodes 2 (n = 689), 4 (n = 527), and 5 (n = 138) each show the proportion of improvement (yes/no) under placebo vs. drug.]

Application: Treatment effect for chronic disease
Model-based recursive partitioning: coefficient estimates for the regressors, and parameter instability test statistics for the partitioning variables (on the original slide, bold marks significance at the adjusted 5% level and underlining marks the smallest p value per node).

Node | (const.) | treatment | risk4 | risk5 | risk7 | risk9 | risk10 | ...
  1  |   0.99   |   0.77    | 13.05 | 16.01 | 22.02 |  1.48 | 13.03  | ...
  2  |   1.15   |   0.62    | 12.57 |  9.64 |  3.85 |  0.06 |  8.94  | ...
  3  |   0.82   |   0.90    |  6.96 | 17.01 |  2.53 |  2.14 | 13.84  | ...
  4  |   0.72   |   1.00    |  2.84 |  6.46 |  2.51 |  0.50 | 12.78  | ...
  5  |   1.13   |   0.21    |  3.96 |  0.98 |  3.54 |  6.10 |  3.14  | ...

Summary
- Synthesis of parametric data models and algorithmic tree models.
- Based on a modern class of parameter instability tests.
- Aims to minimize a clearly defined objective function by greedy forward search.
- Can be applied to a general class of parametric models: generalized linear models, psychometric models (e.g., Rasch, Bradley-Terry), models for location and scale, etc.
- Automatic detection of interaction effects in subgroups.
- An alternative to traditional means of model specification, especially for variables with unknown association.
- Object-oriented implementation freely available; extension to new models requires some coding, though.

References
Zeileis A, Hothorn T, Hornik K (2008). "Model-Based Recursive Partitioning." Journal of Computational and Graphical Statistics, 17(2), 492–514. doi:10.1198/106186008X319331
Zeileis A, Hornik K (2007). "Generalized M-Fluctuation Tests for Parameter Instability." Statistica Neerlandica, 61(4), 488–508. doi:10.1111/j.1467-9574.2007.00371.x
Hothorn T, Hornik K, Strobl C, Zeileis A (2013). party: A Laboratory for Recursive Partytioning. R package version 1.0-7. http://cran.r-project.org/package=party