COMMON METHODOLOGICAL ISSUES FOR CER IN BIG DATA
Harvard Medical School and Harvard School of Public Health
sharon@hcp.med.harvard.edu
December 2013

OUTLINE
1 Uncertainty and Selective Inference
2 Methodological
3 Concluding

TRANSRADIAL VS TRANSFEMORAL PCI
Context:
- Radial artery access permits easier access and easier closure
- Large number of patients undergoing both procedures
- Not particularly well studied and of growing importance in the US
- Marked heterogeneity in predisposition to bleeding
- Significant treatment selection (healthier patients undergo transradial procedures)

[Figure: Massachusetts. Radial artery access and complications (%) by quarter, quarters 1 to 12 (10/2008 to 4/2011). Treatment = radial artery access (vs femoral); outcome = bleeding/vascular complication. 130,000 PCIs in MA adults.]

TRANSRADIAL VS TRANSFEMORAL PCI
Does radial artery access cause fewer complications compared to femoral artery access?
- If so, length of stay (LOS) is shorter, money is saved, and patients are ambulatory sooner
- Large data registry containing detailed clinical information on patients undergoing PCI
- More than 300 variables measured on each person
  - Gets larger when considering treatment-specific information (multiple lesions)
- Introduces selective inference issues
  - Selective inference: drawing inference on a selected subset of the parameters, a subset that is selected because the parameters within it seem interesting after viewing the data

SELECTIVE INFERENCE
An old issue becoming a big problem because of:
- Better data acquisition technologies
- More interconnectivity
- Increasing focus on use of observational databases for comparing the effectiveness of treatments
- More perspectives:
  - Payer: Coverage with Evidence Development
  - Patient: services that are high value for some may be low value for others (e.g., STEMI versus NSTEMI patients)
  - Health care provider: adoption of value-enhancing technologies
Two issues:
- Uncertainty: which is the correct model?
- Bias: causal parameters

SELECTIVE INFERENCE
Many decisions required:
- Select outcome(s)
- Define treatments
- Identify confounders
- Set inclusion/exclusion criteria for subjects
- Choose a causal framework
I will focus on confounders and the causal framework.

MOST COMMONLY EMPLOYED APPROACH
1: Methods that limit the number of confounders based on perceived clinical relevance and estimate a single model
- Identify confounders based on statistical testing and conduct inferences using the identified confounders
- More than one model may fit the data well

All Subjects, by intervention (values are % unless noted)

                                          Radial     Femoral
No. of Procedures                          5192       35022
Mean Age [SD]                             63 [12]    65 [12]
Female                                     25.3       29.8
Race
  White                                    89.6       89.4
  Black                                     3.3        3.2
  Hispanic                                  4.3        3.5
  Asian                                     1.8        1.7
  Native American                           0.02       0.07
  Other                                     1.0        2.2
Health Insurance
  Government                               46.0       50.3
  Commercial                                4.8       13.4
  Other                                    49.2       36.3
Comorbidities
  Diabetes                                 33.1       32.7
  Prior CHF                                 9.4       12.7
  Prior PCI                                32.0       34.3
  Prior myocardial infarction (MI)         28.7       30.1
  Prior bypass surgery                      8.4       15.7
  Hypertension                             79.6       80.7
  Peripheral vascular disease              12.1       12.8
  Smoker                                   24.8       23.1
  Lung disease                             13.7       14.4
Cardiac Presentation
  Multi-vessel disease                     10.3       10.9
  Number of vessels > 70% stenosis          1.49       1.58
  Left main disease                         3.7        7.2
  ST-elevated MI                           38.9       42.6
  Shock                                     0.44       1.8
Drugs Prior to Procedure
  Heparin (unfractionated)                 87.3       61.7
  Heparin (low molecular weight)            3.83       4.27
  Thrombin                                 25.5       54.9
  G2B3A inhibitors                         26.7       26.8
  Platelet aggregate inhibitors            85.8       86.6
Intra-Aortic Balloon Pump                   0.10       0.55
In-Hospital Complication, %                 0.69       2.73
Mean Difference, % (95% CI)               -2.04 (-2.30, -1.80)
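
As a rough check on the last two rows of the table, the unadjusted mean difference can be sketched as a difference of two proportions with a Wald interval. This is only an illustration: the function name is hypothetical, and the slides do not say how their interval was computed, so the Wald limits need not match the reported ones exactly.

```python
import math

def risk_difference(p_treat, n_treat, p_ctrl, n_ctrl, z=1.96):
    """Risk difference (treated minus control) with a Wald confidence interval.

    The Wald form is an assumption made for this sketch; the slides do not
    state which interval method was used.
    """
    diff = p_treat - p_ctrl
    se = math.sqrt(p_treat * (1 - p_treat) / n_treat
                   + p_ctrl * (1 - p_ctrl) / n_ctrl)
    return diff, diff - z * se, diff + z * se

# In-hospital complication rates and group sizes from the table:
# radial 0.69% of 5192 procedures, femoral 2.73% of 35022 procedures.
diff, lower, upper = risk_difference(0.0069, 5192, 0.0273, 35022)
print(f"difference = {100 * diff:.2f}%, "
      f"Wald 95% CI = ({100 * lower:.2f}%, {100 * upper:.2f}%)")
```

The point estimate reproduces the -2.04 percentage points in the table. This crude comparison ignores the treatment selection visible in the baseline characteristics.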

BIG DATA SETTING
Methods that limit the number of confounders based on perceived clinical relevance and estimate a single model
Main problems:
- The exact confounders required to satisfy the no-unmeasured-confounding assumption are rarely known
- Subgroups exhibiting heterogeneous treatment effects are rarely known
- Increasing uncertainty amid the availability of high-dimensional covariate information
How to reduce the dimension of the problem?

DIMENSION REDUCTION TECHNIQUES
2a: Methods relying upon sparseness
- Only a small number of variables are required to parsimoniously represent the underlying data structure:
    $Y_i = X_i \beta + \epsilon_i$, where $\beta$ is of low dimension
- Main idea: assume many model parameters are 0 by imposing a penalty on including too many variables
- Tools: penalized least squares; least absolute shrinkage and selection operator (LASSO) methods; sparse additive models
- No special attention to causality
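
To make the sparseness idea concrete, here is a minimal sketch of the LASSO solved by coordinate descent in plain Python. The objective scaling (1/(2n) times the residual sum of squares plus lam times the sum of absolute coefficients), the function names, and the toy data are all assumptions made for illustration; nothing here comes from the registry analysis.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return exactly 0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/(2n)) * sum((y - X beta)^2) + lam * sum(|beta_j|)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of column j with the partial residual
            z = 0.0    # mean squared value of column j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            beta[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return beta

# Hypothetical toy data: three orthogonal covariates; the outcome depends
# strongly on the first, weakly on the third, and not at all on the second.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [2.05, 1.95, -2.05, -1.95]
beta_hat = lasso_coordinate_descent(X, y, lam=0.1)
print(beta_hat)
```

With this penalty the weak and null coefficients are set exactly to zero and the strong one is shrunk, which is the behavior the slide describes. Note that the penalty selects variables for prediction only and does not distinguish confounders from other predictors.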

DIMENSION REDUCTION TECHNIQUES
2b: Methods relying upon denseness
- Shrink estimates to a common mean and permit a small number of variables to have distinct coefficients
- Tool: kernel regularized least squares
2c: Methods relying upon both denseness and sparseness
- Shrink estimates to a common mean and to zero, so that there are two penalty terms
- Tool: elastic net
No special attention to causality
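
For reference, the two-penalty idea behind the elastic net is usually written as the criterion below. The tuning parameters $\lambda_1$ and $\lambda_2$ are notation introduced here, not symbols from the slides, and this is the textbook form rather than a formula taken from the talk.

```latex
\hat{\beta} = \arg\min_{\beta} \;
  \sum_{i=1}^{n} \left( Y_i - X_i \beta \right)^2
  + \lambda_1 \sum_{j=1}^{p} \lvert \beta_j \rvert
  + \lambda_2 \sum_{j=1}^{p} \beta_j^2
```

Setting $\lambda_2 = 0$ recovers the LASSO and setting $\lambda_1 = 0$ recovers ridge regression, so the two penalties trade off sparseness against dense shrinkage.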

DIMENSION REDUCTION TECHNIQUES
3: Model averaging approaches
    $p(\Delta) = \sum_{k=1}^{M} p(\Delta \mid M_k)\, p(M_k)$
- $M_k$ indexes the model, $k = 1, \dots, M$, and $\Delta$ is a parameter of interest
- $\Delta$ = bleeding risk in radial artery access patients minus bleeding risk in femoral artery access patients
- $M_1$ may be a polynomial regression model; $M_2$ a logistic model with many interactions; etc.
- Difficult to define the space of models over which to average
- No real link to causality in development
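
The mixture formula implies a simple recipe for the model-averaged mean and variance of the parameter of interest (written Δ here): weight each model's estimate by its model probability, and add the between-model spread to the within-model variance. The sketch below assumes the model probabilities are already available; every number in it is hypothetical and chosen only to show the arithmetic.

```python
def model_average(means, variances, model_probs):
    """Mean and variance of a mixture over models.

    means[k], variances[k]: mean and variance of the parameter under model k.
    model_probs[k]: probability attached to model k (must sum to 1).
    """
    if abs(sum(model_probs) - 1.0) > 1e-9:
        raise ValueError("model probabilities must sum to 1")
    avg = sum(w * m for w, m in zip(model_probs, means))
    # Total variance = within-model variance + between-model variance.
    var = sum(w * (v + (m - avg) ** 2)
              for w, m, v in zip(model_probs, means, variances))
    return avg, var

# Hypothetical inputs for three candidate models (not results from the talk).
means = [-0.016, -0.014, -0.020]          # estimated risk difference per model
variances = [0.002 ** 2, 0.003 ** 2, 0.004 ** 2]
model_probs = [0.5, 0.3, 0.2]

avg, var = model_average(means, variances, model_probs)
within_only = sum(w * v for w, v in zip(model_probs, variances))
print(f"averaged estimate = {avg:.4f}, sd = {var ** 0.5:.4f}")
```

The averaged variance is never smaller than the weighted within-model variance, which is how model averaging carries model uncertainty into the final interval.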

DIMENSION REDUCTION TECHNIQUES
Estimate the treatment assignment model (propensity score) and the outcome model simultaneously, then average
- More in line with causal thinking
- $Y$ = observed outcome; $X$ = observed confounders; $T$ = binary treatment (1 = new; 0 = standard)
- Assume you have all the measured confounders

    $\operatorname{logit}\, p(T_i = 1) = \gamma_0 + \sum_{j=1}^{p} \alpha^X_j \gamma_j X_{ij}$

    $Y_i = \beta_0^{\alpha^Y} + \beta_T^{\alpha^Y} T_i + \sum_{j=1}^{p} \alpha^Y_j \beta_j^{\alpha^Y} X_{ij} + \epsilon^Y_i$

- $\alpha^Y_j$ and $\alpha^X_j$ = inclusion probabilities
- Confounders: those with large values of both $\alpha^Y_j$ and $\alpha^X_j$
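
The last point, that confounders are the covariates with large inclusion probabilities in both the treatment model and the outcome model, can be sketched as a simple filter. The covariate names echo the baseline table, but the probabilities and the 0.5 cutoff are hypothetical values invented for this illustration, not output from any fitted model.

```python
def flag_confounders(alpha_x, alpha_y, threshold=0.5):
    """Return covariates whose inclusion probability meets the threshold in
    BOTH the treatment (alpha_x) and outcome (alpha_y) models.

    The threshold is an assumption; the slides only say "large values".
    """
    return [name for name in alpha_x
            if alpha_x[name] >= threshold
            and alpha_y.get(name, 0.0) >= threshold]

# Hypothetical inclusion probabilities (made up for illustration only).
alpha_x = {"heparin_unfractionated": 0.98, "shock": 0.91,
           "commercial_insurance": 0.95, "smoker": 0.12}
alpha_y = {"heparin_unfractionated": 0.85, "shock": 0.97,
           "commercial_insurance": 0.08, "smoker": 0.40}

confounders = flag_confounders(alpha_x, alpha_y)
print(confounders)
```

In this made-up example a covariate that predicts treatment but not outcome is not flagged, which reflects the slide's definition of a confounder as a variable related to both.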

GENERAL IDEA: MODEL AVERAGING
However:
- Little evidence of use in the clinical and policy literature since its introduction in the late 1990s
- Major paradigm shift if adopted for causal inference
But:
- Meta-analysis is acknowledged as providing valid evidence of treatment effectiveness
- The approach is transparent
- A solution in the presence of high-dimensional data

OBSERVATIONS
1 Plenty of methodology is being developed for BIG DATA
  - Need a focus on causal rather than predictive inference
2 Causal inference for CER has constraints different from predictive inference
  - No unmeasured confounder assumption
  - Subjects have a chance of getting the treatment
  - Treatment groups are balanced in terms of observables
  - Constant or non-constant treatment effect
3 A non-parametric approach for the outcome equation may be more robust
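
The balance constraint can be checked with standardized differences. The sketch below applies the usual formula for binary covariates to a few radial versus femoral proportions from the baseline table in this deck. The 0.1 cutoff is a commonly quoted rule of thumb adopted here as an assumption, and the function name is illustrative.

```python
import math

def standardized_difference(p_treated, p_control):
    """Standardized difference for a binary covariate given group proportions."""
    pooled_var = (p_treated * (1 - p_treated)
                  + p_control * (1 - p_control)) / 2
    return (p_treated - p_control) / math.sqrt(pooled_var)

# Proportions (radial, femoral) taken from the baseline table in this deck.
covariates = {
    "Diabetes": (0.331, 0.327),
    "Prior bypass surgery": (0.084, 0.157),
    "Heparin (unfractionated)": (0.873, 0.617),
}

# Rule-of-thumb cutoff; an assumption of this sketch, not from the slides.
IMBALANCE_THRESHOLD = 0.1

std_diffs = {name: standardized_difference(p1, p0)
             for name, (p1, p0) in covariates.items()}
for name, d in std_diffs.items():
    flag = "imbalanced" if abs(d) > IMBALANCE_THRESHOLD else "balanced"
    print(f"{name}: {d:+.3f} ({flag})")
```

Before any adjustment, diabetes is nearly balanced across groups while prior bypass surgery and unfractionated heparin use are not, consistent with the treatment selection noted earlier.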

OBSERVATIONS
Compared to transfemoral artery access, transradial access causes an absolute reduction in complications of:
- 1.58% (1.12, 2.05): regression adjusted, using perceived clinical importance
- 1.40% (0.90, 1.80): propensity score matched
- 2.56% (0.35, 4.75): two-stage least squares (2SLS) approach

SOME RECENT RECOMMENDATIONS