Double Robustness in Estimation of Causal Treatment Effects

Double Robustness in Estimation of Causal Treatment Effects
Marie Davidian, Department of Statistics, North Carolina State University
http://www.stat.ncsu.edu/~davidian
Based in part on Lunceford, J.K. and Davidian, M. (2004). Stratification and weighting via the propensity score in estimation of causal treatment effects: A comparative study. Statistics in Medicine 23:2937-2960.
Double Robustness, EPID 369, Spring 2007 1

Outline
1. Introduction
2. Model based on potential outcomes (counterfactuals)
3. Adjustment by regression modeling
4. Adjustment by inverse weighting
5. Doubly robust estimator
6. Some interesting issues
7. Discussion
Double Robustness, EPID 369, Spring 2007 2

1. Introduction
Simplest situation: Observational study; point exposure study; continuous or discrete (e.g., binary) outcome Y; Treatment (exposure) or Control (no exposure).
Objective: Make causal inference on the effect of treatment. Would like to be able to say that such an effect is attributable to, or caused by, treatment.
Average causal effect based on population mean outcome: {mean if entire population exposed} - {mean if entire population not exposed}
Double Robustness, EPID 369, Spring 2007 3

1. Introduction
Complication: Confounding. The fact that a subject was exposed or not is associated with subject characteristics that may also be associated with his/her potential outcomes under treatment and control. Estimating the average causal effect from observational data requires taking appropriate account of this confounding.
Challenge: Estimate the average causal treatment effect from observational data, adjusting appropriately for confounding. Different methods of adjustment are available. Any method requires assumptions; what if some of them are wrong?
The property of double robustness offers protection against some particular incorrect assumptions... and can lead to more precise inferences.
Double Robustness, EPID 369, Spring 2007 4

2. Counterfactual Model
Define:
Z = 1 if treatment, = 0 if control
X = vector of pre-exposure covariates
Y = observed outcome
Z is observed (not assigned). Observed data are i.i.d. copies $(Y_i, Z_i, X_i)$ for each subject $i = 1, \ldots, n$.
Based on these data: Estimate the average causal treatment effect. To formalize what we mean by this and what the issues are, appeal to the notion of potential outcomes or counterfactuals.
Double Robustness, EPID 369, Spring 2007 5

2. Counterfactual Model
Counterfactuals: Each subject has potential outcomes $(Y_0, Y_1)$:
$Y_0$ = outcome the subject would have if s/he received control
$Y_1$ = outcome the subject would have if s/he received treatment
Average causal treatment effect: The probability distribution of $Y_0$ represents how outcomes in the population would turn out if everyone received control, with mean $E(Y_0)$ ($= P(Y_0 = 1)$ for binary outcome). The probability distribution of $Y_1$ represents this if everyone received treatment, with mean $E(Y_1)$ ($= P(Y_1 = 1)$ for binary outcome).
Thus, the average causal treatment effect is
$$\Delta = \mu_1 - \mu_0 = E(Y_1) - E(Y_0)$$
Double Robustness, EPID 369, Spring 2007 6

2. Counterfactual Model
Problem: We do not see $(Y_0, Y_1)$ for all $n$ subjects; instead we only observe
$$Y = Y_1 Z + Y_0 (1 - Z)$$
If $i$ was exposed to treatment, $Z_i = 1$ and $Y_i = Y_{1i}$. If $i$ was not exposed (control), $Z_i = 0$ and $Y_i = Y_{0i}$.
Challenge: Would like to estimate $\Delta$ based on the observed data. First, a quick review of some statistical concepts...
Double Robustness, EPID 369, Spring 2007 7

2. Counterfactual Model
Unconditional (marginal) expectation: Conceptually, the average across all possible values a random variable can take on in the population.
Statistical independence: For two random variables Y and Z, Y and Z are independent if the probabilities with which Y takes on its values are the same regardless of the value Z takes on. Notation: $Y \perp Z$.
Double Robustness, EPID 369, Spring 2007 8

2. Counterfactual Model
Conditional expectation: For two random variables Y and Z, $E(Y \mid Z = z)$ is the average across all values of Y for which the corresponding value of Z is equal to z. $E(Y \mid Z)$ is a function of Z taking on values $E(Y \mid Z = z)$ as Z takes on values z.
$E\{E(Y \mid Z)\} = E(Y)$ (average the averages across all possible values of Z)
$E\{Y f(Z) \mid Z\} = f(Z) E(Y \mid Z)$ ($f(Z)$ is constant for any value of Z, so it factors out of the average)
If $Y \perp Z$, then $E(Y \mid Z = z) = E(Y)$ for any z.
Double Robustness, EPID 369, Spring 2007 9

2. Counterfactual Model
Challenge, again: Based on observed data $(Y_i, Z_i, X_i)$, $i = 1, \ldots, n$, estimate
$$\Delta = \mu_1 - \mu_0 = E(Y_1) - E(Y_0)$$
Observed sample means: Averages among those observed to receive treatment or control:
$$\bar{Y}^{(1)} = n_1^{-1} \sum_{i=1}^n Z_i Y_i, \qquad \bar{Y}^{(0)} = n_0^{-1} \sum_{i=1}^n (1 - Z_i) Y_i$$
$$n_1 = \sum_{i=1}^n Z_i = \text{\# subjects observed to receive treatment}, \qquad n_0 = \sum_{i=1}^n (1 - Z_i) = \text{\# observed to receive control}$$
Double Robustness, EPID 369, Spring 2007 10

2. Counterfactual Model
What is being estimated? If we estimate $\Delta$ by $\bar{Y}^{(1)} - \bar{Y}^{(0)}$:
$\bar{Y}^{(1)}$ estimates $E(Y \mid Z = 1)$ = population mean outcome among those observed to receive treatment:
$$E(Y \mid Z = 1) = E\{Y_1 Z + Y_0 (1 - Z) \mid Z = 1\} = E(Y_1 \mid Z = 1)$$
... which is not equal to $E(Y_1)$ = mean outcome if the entire population received treatment.
Similarly, $\bar{Y}^{(0)}$ estimates $E(Y \mid Z = 0) = E(Y_0 \mid Z = 0) \neq E(Y_0)$.
Thus, what is being estimated in general is
$$E(Y \mid Z = 1) - E(Y \mid Z = 0) \neq E(Y_1) - E(Y_0) = \Delta$$
Double Robustness, EPID 369, Spring 2007 11

2. Counterfactual Model
Exception: Randomized study. Treatment is assigned with no regard to how a subject might respond to either treatment or control. Formally, treatment received is independent of potential outcomes: $(Y_0, Y_1) \perp Z$.
This means that
$$E(Y \mid Z = 1) = E\{Y_1 Z + Y_0 (1 - Z) \mid Z = 1\} = E(Y_1 \mid Z = 1) = E(Y_1)$$
and similarly $E(Y \mid Z = 0) = E(Y_0)$, so $\bar{Y}^{(1)} - \bar{Y}^{(0)}$ is an unbiased estimator for $\Delta$.
Double Robustness, EPID 369, Spring 2007 12

2. Counterfactual Model
In contrast: Observational study. Exposure to treatment is not controlled, so exposure may be related to the way a subject might potentially respond: $(Y_0, Y_1) \not\perp Z$.
And, indeed, $E(Y \mid Z = 1) \neq E(Y_1)$ and $E(Y \mid Z = 0) \neq E(Y_0)$, so $\bar{Y}^{(1)} - \bar{Y}^{(0)}$ is not an unbiased estimator for $\Delta$.
Double Robustness, EPID 369, Spring 2007 13

2. Counterfactual Model
Confounders: It may be possible to identify covariates related to both potential outcome and treatment exposure. If X contains all confounders, then among subjects sharing the same X there will be no association between exposure Z and potential outcomes $(Y_0, Y_1)$; i.e., $(Y_0, Y_1)$ and Z are independent conditional on X:
$$(Y_0, Y_1) \perp Z \mid X$$
"No unmeasured confounders" is an unverifiable assumption. If we believe there are no unmeasured confounders, we can estimate $\Delta$ by appropriate adjustment...
Double Robustness, EPID 369, Spring 2007 14

3. Adjustment by Regression Modeling
Regression of Y on Z and X: We can identify the regression $E(Y \mid Z, X)$, as this depends on the observed data. E.g., for continuous outcome,
$$E(Y \mid Z, X) = \alpha_0 + \alpha_Z Z + X^T \alpha_X$$
In general, $E(Y \mid Z = 1, X)$ is the regression among treated, $E(Y \mid Z = 0, X)$ among controls.
Usefulness: Averaging over all possible values of X (both treatments),
$$E\{E(Y \mid Z = 1, X)\} = E\{E(Y_1 \mid Z = 1, X)\} = E\{E(Y_1 \mid X)\} = E(Y_1)$$
and similarly $E\{E(Y \mid Z = 0, X)\} = E(Y_0)$.
Double Robustness, EPID 369, Spring 2007 15

3. Adjustment by Regression Modeling
Thus: Under no unmeasured confounders,
$$\Delta = E(Y_1) - E(Y_0) = E\{E(Y \mid Z = 1, X)\} - E\{E(Y \mid Z = 0, X)\} = E\{E(Y \mid Z = 1, X) - E(Y \mid Z = 0, X)\}$$
This suggests postulating a model for the outcome regression $E(Y \mid Z, X)$, fitting the model, and then averaging the resulting estimates of $E(Y \mid Z = 1, X) - E(Y \mid Z = 0, X)$ over all observed X (both groups) to estimate $\Delta$.
Double Robustness, EPID 369, Spring 2007 16

3. Adjustment by Regression Modeling
Example, continuous outcome: Suppose the true regression is
$$E(Y \mid Z, X) = \alpha_0 + \alpha_Z Z + X^T \alpha_X$$
If this really is the true outcome regression model, then
$$E(Y \mid Z = 1, X) - E(Y \mid Z = 0, X) = \alpha_0 + \alpha_Z (1) + X^T \alpha_X - \alpha_0 - \alpha_Z (0) - X^T \alpha_X = \alpha_Z$$
So $\Delta = E\{E(Y \mid Z = 1, X) - E(Y \mid Z = 0, X)\} = \alpha_Z$. We can thus estimate $\Delta$ directly from fitting this model (e.g., by least squares); we don't even need to average: $\hat{\Delta} = \hat{\alpha}_Z$.
Double Robustness, EPID 369, Spring 2007 17

3. Adjustment by Regression Modeling
Example, binary outcome: Suppose the true regression is
$$E(Y \mid Z, X) = \frac{\exp(\alpha_0 + \alpha_Z Z + X^T \alpha_X)}{1 + \exp(\alpha_0 + \alpha_Z Z + X^T \alpha_X)}$$
If this really is the true outcome regression model, then
$$E(Y \mid Z = 1, X) - E(Y \mid Z = 0, X) = \frac{\exp(\alpha_0 + \alpha_Z + X^T \alpha_X)}{1 + \exp(\alpha_0 + \alpha_Z + X^T \alpha_X)} - \frac{\exp(\alpha_0 + X^T \alpha_X)}{1 + \exp(\alpha_0 + X^T \alpha_X)}$$
Logistic regression yields $(\hat{\alpha}_0, \hat{\alpha}_Z, \hat{\alpha}_X)$. Estimate $\Delta$ by averaging over all observed $X_i$:
$$\hat{\Delta} = n^{-1} \sum_{i=1}^n \left\{ \frac{\exp(\hat{\alpha}_0 + \hat{\alpha}_Z + X_i^T \hat{\alpha}_X)}{1 + \exp(\hat{\alpha}_0 + \hat{\alpha}_Z + X_i^T \hat{\alpha}_X)} - \frac{\exp(\hat{\alpha}_0 + X_i^T \hat{\alpha}_X)}{1 + \exp(\hat{\alpha}_0 + X_i^T \hat{\alpha}_X)} \right\}$$
Double Robustness, EPID 369, Spring 2007 18
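To make the fit-then-average step concrete, here is a minimal Python sketch of this regression-adjustment estimator for a binary outcome. It is not part of the original slides: the numpy arrays y (binary outcome), z (binary treatment), and x (n x p covariate matrix), and the use of statsmodels, are illustrative assumptions.

```python
# Regression-adjustment estimate of Delta for a binary outcome (sketch).
# Assumes numpy arrays y (0/1 outcome), z (0/1 treatment), x (n x p covariates).
import numpy as np
import statsmodels.api as sm

def _design(z, x):
    # Explicit intercept column keeps the design consistent across predictions.
    return np.column_stack([np.ones(len(z)), z, x])

def regression_adjustment_delta(y, z, x):
    # Fit the postulated logistic outcome regression E(Y | Z, X).
    fit = sm.Logit(y, _design(z, x)).fit(disp=0)

    # Predicted outcome probabilities with Z set to 1 and to 0 for everyone.
    m1 = fit.predict(_design(np.ones_like(z), x))    # estimated E(Y | Z = 1, X_i)
    m0 = fit.predict(_design(np.zeros_like(z), x))   # estimated E(Y | Z = 0, X_i)

    # Average the estimated differences over all observed X (both groups).
    return np.mean(m1 - m0)
```

In a simulation where the true outcome regression really is logistic in (Z, X) and there are no unmeasured confounders, this average approximates $\Delta = E(Y_1) - E(Y_0)$.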

3. Adjustment by Regression Modeling
Critical: For the argument on slide 16 to go through, $E(Y \mid Z, X)$ must be the true regression of Y on Z and X. Thus, if we substitute estimates for $E(Y \mid Z = 1, X)$ and $E(Y \mid Z = 0, X)$ based on a postulated outcome regression model, this postulated model must be identical to the true regression. If not, the average of the differences will not necessarily estimate $\Delta$.
Result: The estimator for $\Delta$ obtained from regression adjustment will be biased (inconsistent) if the regression model used is incorrectly specified!
Moral: Estimation of $\Delta$ via regression modeling requires that the postulated regression model is correct.
Double Robustness, EPID 369, Spring 2007 19

4. Adjustment by Inverse Weighting
Propensity score: Probability of treatment given covariates,
$$e(X) = P(Z = 1 \mid X) = E\{I(Z = 1) \mid X\} = E(Z \mid X)$$
$X \perp Z \mid e(X)$; and under no unmeasured confounders, $(Y_0, Y_1) \perp Z \mid e(X)$.
Customary to estimate by postulating and fitting a logistic regression model, e.g.,
$$P(Z = 1 \mid X) = e(X, \beta) = \frac{\exp(\beta_0 + X^T \beta_1)}{1 + \exp(\beta_0 + X^T \beta_1)}, \qquad \hat{e}(X) = e(X, \hat{\beta})$$
Double Robustness, EPID 369, Spring 2007 20

4. Adjustment by Inverse Weighting
One idea: Rather than use the difference of simple averages $\bar{Y}^{(1)} - \bar{Y}^{(0)}$, estimate $\Delta$ by the difference of inverse propensity score weighted averages, e.g.,
$$\hat{\Delta}_{IPW,1} = n^{-1} \sum_{i=1}^n \frac{Z_i Y_i}{e(X_i, \hat{\beta})} - n^{-1} \sum_{i=1}^n \frac{(1 - Z_i) Y_i}{1 - e(X_i, \hat{\beta})}$$
Interpretation: Inverse weighting creates a pseudo-population in which there is no confounding, so that the weighted averages reflect averages in the true population.
Why does this work? Consider $n^{-1} \sum_{i=1}^n Z_i Y_i / e(X_i, \hat{\beta})$. By the law of large numbers, this should estimate the mean of a term in the sum with $\hat{\beta}$ replaced by the quantity it estimates.
Double Robustness, EPID 369, Spring 2007 21
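A minimal Python sketch of this inverse-weighted estimator, under the same illustrative assumptions as before (numpy arrays y, z, x; a logistic working propensity model fit with statsmodels); none of this appears in the original slides.

```python
# Inverse propensity score weighted estimate of Delta (sketch).
# Assumes numpy arrays y (outcome), z (0/1 treatment), x (n x p covariates).
import numpy as np
import statsmodels.api as sm

def ipw_delta(y, z, x):
    n = len(y)
    xc = sm.add_constant(x)

    # Postulated propensity model e(X, beta): logistic regression of Z on X.
    e_hat = sm.Logit(z, xc).fit(disp=0).predict(xc)   # e(X_i, beta_hat)

    # Difference of inverse-weighted averages (slide 21).
    mu1 = np.sum(z * y / e_hat) / n
    mu0 = np.sum((1 - z) * y / (1 - e_hat)) / n
    return mu1 - mu0
```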

4. Adjustment by Inverse Weighting
If $e(X, \beta) = e(X)$, the true propensity score:
$$E\left\{\frac{ZY}{e(X)}\right\} = E\left\{\frac{ZY_1}{e(X)}\right\} = E\left[E\left\{\frac{ZY_1}{e(X)} \,\Big|\, Y_1, X\right\}\right] \quad (1)$$
$$= E\left\{\frac{Y_1}{e(X)} E(Z \mid Y_1, X)\right\} = E\left\{\frac{Y_1}{e(X)} E(Z \mid X)\right\} \quad (2)$$
$$= E\left\{\frac{Y_1}{e(X)} e(X)\right\} = E(Y_1) \quad (3)$$
(1) follows because $ZY = Z\{Y_1 Z + Y_0 (1 - Z)\} = Z^2 Y_1 + Z(1 - Z) Y_0$, with $Z^2 = Z$ and $Z(1 - Z) = 0$ (Z binary).
(2) follows because $(Y_0, Y_1) \perp Z \mid X$ (no unmeasured confounders).
(3) follows because $e(X) = E(Z \mid X)$.
Similarly, $E\left\{\dfrac{(1 - Z) Y}{1 - e(X)}\right\} = E(Y_0)$.
Double Robustness, EPID 369, Spring 2007 22

4. Adjustment by Inverse Weighting
Critical: For the argument on slide 22 to go through, $e(X)$ must be the true propensity score. Thus, if we substitute estimates for $e(X)$ based on a postulated propensity score model, this postulated model must be identical to the true propensity score. If not, $\hat{\Delta}_{IPW,1}$ will not necessarily estimate $\Delta$.
Moral: Estimation of $\Delta$ via inverse weighting requires that the postulated propensity score model is correct.
Double Robustness, EPID 369, Spring 2007 23

5. Doubly Robust Estimator
Recap: $\Delta = E(Y_1) - E(Y_0)$.
The estimator for $\Delta$ based on regression modeling requires a correct postulated regression model. The estimator for $\Delta$ based on inverse propensity score weighting requires a correct postulated propensity model.
Modified estimator: Combine both approaches in a fortuitous way.
Double Robustness, EPID 369, Spring 2007 24

5. Doubly Robust Estimator
Modified estimator:
$$\hat{\Delta}_{DR} = n^{-1} \sum_{i=1}^n \left[ \frac{Z_i Y_i}{e(X_i, \hat{\beta})} - \frac{\{Z_i - e(X_i, \hat{\beta})\}}{e(X_i, \hat{\beta})} m_1(X_i, \hat{\alpha}_1) \right] - n^{-1} \sum_{i=1}^n \left[ \frac{(1 - Z_i) Y_i}{1 - e(X_i, \hat{\beta})} + \frac{\{Z_i - e(X_i, \hat{\beta})\}}{1 - e(X_i, \hat{\beta})} m_0(X_i, \hat{\alpha}_0) \right] = \hat{\mu}_{1,DR} - \hat{\mu}_{0,DR}$$
$e(X, \hat{\beta})$ is a postulated model for the true propensity score $e(X) = E(Z \mid X)$ (fitted by logistic regression). $m_0(X, \hat{\alpha}_0)$ and $m_1(X, \hat{\alpha}_1)$ are postulated models for the true regressions $E(Y \mid Z = 0, X)$ and $E(Y \mid Z = 1, X)$ (fitted by least squares).
Double Robustness, EPID 369, Spring 2007 25

5. Doubly Robust Estimator
Modified estimator:
$$\hat{\Delta}_{DR} = n^{-1} \sum_{i=1}^n \left[ \frac{Z_i Y_i}{e(X_i, \hat{\beta})} - \frac{\{Z_i - e(X_i, \hat{\beta})\}}{e(X_i, \hat{\beta})} m_1(X_i, \hat{\alpha}_1) \right] - n^{-1} \sum_{i=1}^n \left[ \frac{(1 - Z_i) Y_i}{1 - e(X_i, \hat{\beta})} + \frac{\{Z_i - e(X_i, \hat{\beta})\}}{1 - e(X_i, \hat{\beta})} m_0(X_i, \hat{\alpha}_0) \right] = \hat{\mu}_{1,DR} - \hat{\mu}_{0,DR}$$
$\hat{\mu}_{1,DR}$ (and $\hat{\mu}_{0,DR}$, and hence $\hat{\Delta}_{DR}$) may be viewed as taking the inverse weighted estimator and augmenting it with a second term.
What does this estimate? Consider $\hat{\mu}_{1,DR}$ ($\hat{\mu}_{0,DR}$ is similar).
Double Robustness, EPID 369, Spring 2007 26
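The following Python sketch computes this doubly robust estimator as displayed above, again under illustrative assumptions that are not in the slides: numpy arrays y, z, x, a logistic working propensity model, and linear working outcome regressions $m_1$ and $m_0$ fit by least squares within each treatment group.

```python
# Doubly robust (augmented inverse-weighted) estimate of Delta (sketch).
# Assumes numpy arrays y (outcome), z (0/1 treatment), x (n x p covariates).
import numpy as np
import statsmodels.api as sm

def dr_delta(y, z, x):
    xc = sm.add_constant(x)

    # Postulated propensity model e(X, beta): logistic regression of Z on X.
    e_hat = sm.Logit(z, xc).fit(disp=0).predict(xc)

    # Postulated outcome regressions m_1 and m_0, fit by least squares in the
    # treated and control groups, then predicted for every subject.
    m1 = sm.OLS(y[z == 1], xc[z == 1]).fit().predict(xc)   # m_1(X_i, alpha_1_hat)
    m0 = sm.OLS(y[z == 0], xc[z == 0]).fit().predict(xc)   # m_0(X_i, alpha_0_hat)

    # Augmented inverse-weighted means (slide 25).
    mu1_dr = np.mean(z * y / e_hat - (z - e_hat) / e_hat * m1)
    mu0_dr = np.mean((1 - z) * y / (1 - e_hat) + (z - e_hat) / (1 - e_hat) * m0)
    return mu1_dr - mu0_dr
```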

5. Doubly Robust Estimator
$$\hat{\mu}_{1,DR} = n^{-1} \sum_{i=1}^n \left[ \frac{Z_i Y_i}{e(X_i, \hat{\beta})} - \frac{\{Z_i - e(X_i, \hat{\beta})\}}{e(X_i, \hat{\beta})} m_1(X_i, \hat{\alpha}_1) \right]$$
By the law of large numbers, $\hat{\mu}_{1,DR}$ estimates the mean of a term in the sum with $\hat{\beta}$ and $\hat{\alpha}_1$ replaced by the quantities they estimate. That is, $\hat{\mu}_{1,DR}$ estimates
$$E\left[ \frac{ZY}{e(X, \beta)} - \frac{\{Z - e(X, \beta)\}}{e(X, \beta)} m_1(X, \alpha_1) \right] = E\left[ \frac{ZY_1}{e(X, \beta)} - \frac{\{Z - e(X, \beta)\}}{e(X, \beta)} m_1(X, \alpha_1) \right]$$
$$= E\left[ Y_1 + \frac{\{Z - e(X, \beta)\}}{e(X, \beta)} \{Y_1 - m_1(X, \alpha_1)\} \right] \quad \text{(by algebra)}$$
$$= E(Y_1) + E\left[ \frac{\{Z - e(X, \beta)\}}{e(X, \beta)} \{Y_1 - m_1(X, \alpha_1)\} \right]$$
Double Robustness, EPID 369, Spring 2007 27

5. Doubly Robust Estimator
Thus: $\hat{\mu}_{1,DR}$ estimates
$$E(Y_1) + E\left[ \frac{\{Z - e(X, \beta)\}}{e(X, \beta)} \{Y_1 - m_1(X, \alpha_1)\} \right] \quad (1)$$
for any general functions of X, $e(X, \beta)$ and $m_1(X, \alpha_1)$ (that may or may not be equal to the true propensity score or true regression). Thus, for $\hat{\mu}_{1,DR}$ to estimate $E(Y_1)$, the second term in (1) must equal 0! When does the second term equal 0?
Double Robustness, EPID 369, Spring 2007 28

5. Doubly Robust Estimator
Scenario 1: The postulated propensity score model $e(X, \beta)$ is correct, but the postulated regression model $m_1(X, \alpha_1)$ is not, i.e.,
$$e(X, \beta) = e(X) = E(Z \mid X) \;\; (= E(Z \mid Y_1, X) \text{ by no unmeasured confounders}), \qquad m_1(X, \alpha_1) \neq E(Y \mid Z = 1, X)$$
Is the second term equal to 0 under these conditions?
Double Robustness, EPID 369, Spring 2007 29

5. Doubly Robust Estimator
$$E\left[ \frac{\{Z - e(X)\}}{e(X)} \{Y_1 - m_1(X, \alpha_1)\} \right] = E\left( E\left[ \frac{\{Z - e(X)\}}{e(X)} \{Y_1 - m_1(X, \alpha_1)\} \,\Big|\, Y_1, X \right] \right)$$
$$= E\left( \{Y_1 - m_1(X, \alpha_1)\} \, E\left[ \frac{\{Z - e(X)\}}{e(X)} \,\Big|\, Y_1, X \right] \right) = E\left( \{Y_1 - m_1(X, \alpha_1)\} \, \frac{\{E(Z \mid Y_1, X) - e(X)\}}{e(X)} \right)$$
$$= E\left( \{Y_1 - m_1(X, \alpha_1)\} \, \frac{\{E(Z \mid X) - e(X)\}}{e(X)} \right) \quad (1)$$
$$= E\left( \{Y_1 - m_1(X, \alpha_1)\} \, \frac{\{e(X) - e(X)\}}{e(X)} \right) = 0$$
(1) uses no unmeasured confounders.
Double Robustness, EPID 369, Spring 2007 30

5. Doubly Robust Estimator
Result: As long as the propensity score model is correct, even if the postulated regression model is incorrect, $\hat{\mu}_{1,DR}$ estimates $E(Y_1)$. Similarly, $\hat{\mu}_{0,DR}$ estimates $E(Y_0)$. And hence $\hat{\Delta}_{DR}$ estimates $\Delta$!
Double Robustness, EPID 369, Spring 2007 31

5. Doubly Robust Estimator
Scenario 2: The postulated regression model $m_1(X, \alpha_1)$ is correct, but the postulated propensity score model $e(X, \beta)$ is not:
$$e(X, \beta) \neq e(X) = E(Z \mid X), \qquad m_1(X, \alpha_1) = E(Y \mid Z = 1, X) \;\; (= E(Y_1 \mid X) \text{ by no unmeasured confounders})$$
Is the second term equal to 0 under these conditions?
Double Robustness, EPID 369, Spring 2007 32

5. Doubly Robust Estimator
$$E\left[ \frac{\{Z - e(X, \beta)\}}{e(X, \beta)} \{Y_1 - E(Y \mid Z = 1, X)\} \right] = E\left( E\left[ \frac{\{Z - e(X, \beta)\}}{e(X, \beta)} \{Y_1 - E(Y \mid Z = 1, X)\} \,\Big|\, Z, X \right] \right)$$
$$= E\left( \frac{\{Z - e(X, \beta)\}}{e(X, \beta)} \, E\left[ \{Y_1 - E(Y \mid Z = 1, X)\} \mid Z, X \right] \right) = E\left( \frac{\{Z - e(X, \beta)\}}{e(X, \beta)} \{E(Y_1 \mid Z, X) - E(Y \mid Z = 1, X)\} \right)$$
$$= E\left( \frac{\{Z - e(X, \beta)\}}{e(X, \beta)} \{E(Y_1 \mid X) - E(Y_1 \mid X)\} \right) = 0 \quad (1)$$
(1) uses no unmeasured confounders, which says $E(Y \mid Z = 1, X) = E(Y_1 \mid Z = 1, X) = E(Y_1 \mid X) = E(Y_1 \mid Z, X)$.
Double Robustness, EPID 369, Spring 2007 33

5. Doubly Robust Estimator
Result: As long as the regression model is correct, even if the postulated propensity model is incorrect, $\hat{\mu}_{1,DR}$ estimates $E(Y_1)$. Similarly, $\hat{\mu}_{0,DR}$ estimates $E(Y_0)$. And hence $\hat{\Delta}_{DR}$ estimates $\Delta$!
Obviously: From these calculations, if both models are correct, $\hat{\Delta}_{DR}$ estimates $\Delta$! Of course, if both are incorrect, $\hat{\Delta}_{DR}$ does not estimate $\Delta$ (it is not consistent).
Double Robustness, EPID 369, Spring 2007 34

5. Doubly Robust Estimator
Summary: If the regression model is incorrect but the propensity model is correct, OR the propensity model is incorrect but the regression model is correct, then $\hat{\Delta}_{DR}$ is a (consistent) estimator for $\Delta$!
Definition: This property is referred to as double robustness.
Double Robustness, EPID 369, Spring 2007 35

5. Doubly Robust Estimator
Remarks: The doubly robust estimator offers protection against mismodeling. If $e(X)$ is modeled correctly, it will have smaller variance than the simple inverse weighted estimator (in large samples). If $E(Y \mid Z, X)$ is modeled correctly, it may have larger variance (in large samples) than the regression estimator... but it gives protection in the event the regression model is not correct.
Double Robustness, EPID 369, Spring 2007 36

6. Some Interesting Issues
Issue 1: How do we get standard errors for $\hat{\Delta}_{DR}$?
One way: use standard large sample theory, which leads to the so-called sandwich estimator $n^{-2} \sum_{i=1}^n \hat{I}_i^2$, where
$$\hat{I}_i = \frac{Z_i Y_i}{e(X_i, \hat{\beta})} - \frac{\{Z_i - e(X_i, \hat{\beta})\}}{e(X_i, \hat{\beta})} m_1(X_i, \hat{\alpha}_1) - \left[ \frac{(1 - Z_i) Y_i}{1 - e(X_i, \hat{\beta})} + \frac{\{Z_i - e(X_i, \hat{\beta})\}}{1 - e(X_i, \hat{\beta})} m_0(X_i, \hat{\alpha}_0) \right] - \hat{\Delta}_{DR}$$
Another way: use the bootstrap (i.e., resample B data sets of size n).
Double Robustness, EPID 369, Spring 2007 37
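As one illustration of the bootstrap option, here is a minimal Python sketch of a nonparametric bootstrap standard error for $\hat{\Delta}_{DR}$. It reuses the illustrative dr_delta function sketched earlier; the choice of B, the resampling of whole subjects, and the random generator are all assumptions, not prescriptions from the slides.

```python
# Bootstrap standard error for the doubly robust estimate (sketch).
# Resamples subjects (rows) with replacement B times and recomputes dr_delta.
import numpy as np

def bootstrap_se_dr(y, z, x, B=1000, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    estimates = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)                 # resample n subjects
        estimates.append(dr_delta(y[idx], z[idx], x[idx]))
    return np.std(estimates, ddof=1)                     # bootstrap standard error
```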

6. Some Interesting Issues
Issue 2: Computation? Is there software? A SAS procedure is coming soon... In many situations, it is possible to find $\hat{\Delta}_{DR}$ by fitting a single regression model for $E(Y \mid Z, X)$ that includes as covariates $Z / e(X, \hat{\beta})$ and $(1 - Z) / \{1 - e(X, \hat{\beta})\}$.
See Bang, H. and Robins, J. M. (2005). Doubly robust estimation in missing data and causal inference models. Biometrics 61, 962-972.
Double Robustness, EPID 369, Spring 2007 38
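A rough Python sketch of this single-regression idea, for a continuous outcome with a linear working model: add the two inverse-weight covariates, fit one regression, and average predictions under Z = 1 and Z = 0 (with those covariates evaluated accordingly). This is one common way to operationalize the idea; the exact construction in Bang and Robins (2005) has further details, and all names here are illustrative.

```python
# Single augmented outcome regression (sketch of the idea on this slide),
# for a continuous outcome with a linear working model fit by OLS.
import numpy as np
import statsmodels.api as sm

def dr_delta_single_regression(y, z, x):
    n = len(y)
    xc = sm.add_constant(x)

    # Propensity model and the two inverse-weight covariates.
    e_hat = sm.Logit(z, xc).fit(disp=0).predict(xc)
    h1 = z / e_hat                  # Z / e(X, beta_hat)
    h0 = (1 - z) / (1 - e_hat)      # (1 - Z) / {1 - e(X, beta_hat)}

    # One regression of Y on (1, Z, X, h1, h0).
    design = np.column_stack([np.ones(n), z, x, h1, h0])
    fit = sm.OLS(y, design).fit()

    # Predict with Z = 1 (h1 -> 1/e, h0 -> 0) and Z = 0 (h1 -> 0, h0 -> 1/(1-e)),
    # then average the difference over all subjects.
    d1 = np.column_stack([np.ones(n), np.ones(n), x, 1 / e_hat, np.zeros(n)])
    d0 = np.column_stack([np.ones(n), np.zeros(n), x, np.zeros(n), 1 / (1 - e_hat)])
    return np.mean(fit.predict(d1) - fit.predict(d0))
```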

6. Some Interesting Issues
Issue 3: How to select elements of X to include in the models? For the inverse weighted estimators:
Variables unrelated to exposure but related to outcome should always be included in the propensity score model ⇒ increased precision.
Variables related to exposure but unrelated to outcome can be omitted; including them ⇒ decreased precision.
See Brookhart, M. A. et al. (2006). Variable selection for propensity score models. American Journal of Epidemiology 163, 1149-1156.
The best way to select variables for DR estimation is an open problem. See Brookhart, M. A. and van der Laan, M. J. (2006). A semiparametric model selection criterion with applications to the marginal structural model. Computational Statistics and Data Analysis 50, 475-498.
Double Robustness, EPID 369, Spring 2007 39

6. Some Interesting Issues
Issue 4: Variants on doubly robust estimators? See Tan, Z. (2006). A distributional approach for causal inference using propensity scores. Journal of the American Statistical Association 101, 1619-1637.
Double Robustness, EPID 369, Spring 2007 40

6. Some Interesting Issues
Issue 5: Connection to missing data problems. $(Y_0, Y_1, Z, X)$ are the full data we wish we could see, but we only observe $Y = Y_1 Z + Y_0 (1 - Z)$. $Z = 1$ means $Y_1$ is observed but $Y_0$ is missing, which happens with probability $e(X)$; vice versa for $Z = 0$.
The missing data theory of Robins, Rotnitzky, and Zhao (1994) applies and leads to the doubly robust estimator. The theory shows that the doubly robust estimator with all of $e(X)$, $m_0(X, \alpha_0)$, $m_1(X, \alpha_1)$ correctly specified has the smallest variance among all estimators that require one to model the propensity score correctly (but make no further assumptions about anything else).
Double Robustness, EPID 369, Spring 2007 41

7. Discussion
Regression modeling and inverse propensity score weighting are two popular approaches when one is willing to assume no unmeasured confounders. The doubly robust estimator combines both and offers protection against mismodeling. It offers gains in precision of estimation over simple inverse weighting. It may not be as precise as regression modeling when the regression is correctly modeled, but it adds protection, and modifications are available. Doubly robust estimators are also available for more complicated problems.
Double Robustness, EPID 369, Spring 2007 42