The three statistics of interest comparing the exposed to the not exposed are:

Size: px

Start display at page:

Download "The three statistics of interest comparing the exposed to the not exposed are:"

Virgil Davidson
7 years ago
Views:

1 Review: Common notation for a x table: Not Exposed Exposed Disease a b a+b No Disease c d c+d a+c b+d N Warning: Rosner presents the transposed table so b and c have their meanings reversed in the text. where OR= /(- )/ /(- )=ad/bc Here =P(disease exposed) and =P(disease not exposed) or =P(exposed disease) and =P(exposed not diseased) The three statistics of interest comparing the exposed to the not exposed are: Risk Difference = RD = a/(a+c) b/(b+d) Risk Ratio = RR = (a/(a+c)) / (b/(b+d)) Odds Ratio = OR = ad/bc Author: Blume, Greevy BIOS 3 Page of 4

where OR= /(- )/ /(- )=ad/bc Here =P(disease exposed) and =P(disease not exposed) or =P(exposed disease) and =P(exposed not diseased) The

2 Confidence Intervals for RD, RR, OR The interval for RD we ve studied extensively coming up with an exact interval, an asymptotically normal (Wald) interval, and a modified Wald (Wilson) interval. For the RR, we need some calculus. Delta Method: Var[f(X)] (f (X)) * Var[X]. Take f = log(), and let p =a/(a+c) and p =b/(b+d) For simplicity write p =a/n and p =b/n Var[ log(p /p ) ] = Var[ log(p ) - log(p ) ] = Var[ log(p ) ] + Var[ log(p ) ] by independence /p Var(p ) + /p Var(p ) by Delta Method = p *(- p )/(n *p ) + p *(- p )/(n *p ) = (- p )/(n *p ) + (- p )/(n *p ) (c/n )/(n *a/n ) + (d/n )/(n *b/n ) by substituting our point estimates for p and p = c/(a*n ) + d/(b*n ) (-α)00% CI give n and n are sufficiently large: exp{ log(p /p ) + Z -α/ * sqrt(c/(a*n ) + d/(b*n )) } Author: Blume, Greevy BIOS 3 Page of 4

Take f = log(), and let p =a/(a+c) and p =b/(b+d) For simplicity write p =a/n and p =b/n Var[ log(p /p ) ] = Var[ log(p ) - log(p ) ] = Var[ log(p ) ] + Var[ log(p ) ] by independence /p Var(p ) + /p

3 Author: Blume, Greevy BIOS 3 Page 3 of 4 Typically we test H 0 : = (or H 0 : OR=) with a Chisquare statistic, given by: ˆ ˆ d b c a d c d b c a b a bc ad N E E O p p i i i i Additionally, we can estimate the Log odds ratio with a (-α)00% CI of the form: d c b a or Var or Var Z or ˆ log where ˆ log ˆ log / To get the confidence interval for the Odds Ratio, you need to exponentiate the ends of the CI for the Log Odds ratio: Use (e l, e u ) where d c b a Z or u d c b a Z or l ˆ log ˆ log / / Remember that this exponentiated confidence interval is not symmetric about the Odds ratio!

ˆ log ˆ log / To get the confidence interval for the Odds Ratio, you need to exponentiate the ends of the CI for the Log Odds ratio: Use (e l, e u

4 Combining x tables The Mantel-Haenszel method is an alternative to pooling data from multiple x tables. The primary motivation is to avoid Simpson s Paradox problems. The basic idea is this: Instead of pooling the data and then computing the pooled OR, we will compute a weighted average of the odds ratios from each x table. Suppose we have i=,,g x tables (or strata): Exposed Not Exposed Disease a i b i a i +b i No Disease c i d i c i +d i a i +c i b i +d i N i The odds ratio for the i th strata is OR i = a i d i /b i c i and the simple summary OR is OR pooled = a i d i / b i c i. Let w i be the inverse of the variance of the strata specific log-odds ratio: w i = /Var[log(OR i)] or w i = [/a i +/b i +/c i +/d i ] - Author: Blume, Greevy BIOS 3 Page 4 of 4

Suppose we have i=,,g x tables (or strata): Exposed Not Exposed Disease a i b i a i +b i No Disease c i d i c i +d i a i +c i b i +d i N i The odds ratio for the i th strata is OR i = a i d

5 The basic idea of is to get a weighted average of the strata-specific odds ratios. This can be done on the log scale using weights that are inversely proportional to the variance of the strata specific estimate or it can be done on the correct scale using other weights. Important side note: Statisticians love to take weighted averages! But even more than that, they love to set the weights to be inversely proportional to the variance. Why? Because then estimates with a larger variance (less precise, less information) gets less weight and estimates with a smaller variance (more precise, more information) get more weight. Clever statisticians have proven that using weights that are inversely proportional to the stratumspecific variances will always yield the weighted average with the smallest variance. (WOW!) Author: Blume, Greevy BIOS 3 Page 5 of 4

Important side note: Statisticians love to take weighted averages! But even more than that, they love to set the weights to be inversely proportional to the variance. Why?

6 There are many different ways to estimate the common odds ratio underlying different x tables. (Notice the implicit assumption here: There must be a common OR or else there is no reason to combine the data in the first place.) One natural estimate of a common Odds Ratio is OR exp = exp{ w i log(or i )/ w i } where w i = [/a i +/b i +/c i +/d i ] - The Mantel-Haenszel estimate of the common OR is OR mh = w i OR i / w i where w i = b i c i /N i Important side note (re-scaling the weights): Notice that we divide by w i. We do this because the weights to not sum to one, i.e. w + + w g. We have that w + + w g = w i and (w + + w g )/ w i = So therefore w / w i + + w g / w i = which we re-write as w * + + w * g = where w * = w / w i Author: Blume, Greevy BIOS 3 Page 6 of 4

) One natural estimate of a common Odds Ratio is OR exp = exp{ w i log(or i )/ w i } where w i = [/a i +/b i +/c i +/d i ] - The Mantel-Haenszel estimate of the common OR is OR mh = w i OR i /

7 Alternate forms of the above ORs are OR exp = exp{ w * i log(or i ) } where w * i = w i / w i w i = [/a i +/b i +/c i +/d i ] - OR mh = w * i OR i where w * i = w i / w i w i = b i c i /N i The point of re-writing these estimates using weights that add to one, is to show that these estimates are subject to the same advantages and disadvantages of weighted averages just like those we explored in the context of crude and adjusted rates. But here, instead of choosing weights according to the population distribution of interest, we choose weights according to some statistical criteria. The advantage is often statistical. For example, OR exp yields the combined estimate with the smallest possible variance and OR mh works even when there are zero cells. The disadvantage is that we can never compare two weighted odds ratios because their weights will be different (e.g., you can not compare the common OR over age between males and females because the weights are different for each gender). Author: Blume, Greevy BIOS 3 Page 7 of 4

the context of crude and adjusted rates. But here, instead of choosing weights according to the population distribution of interest, we choose weights according to some statistical criteria.

8 Recall our Simpson s Paradox Scenario Males Smoking cancer Yes No Yes No OR(males) = (9x43)/(6x5) = Females Smoking cancer Yes No Yes 4 7 No OR(females) = (4x)/(9x7) =.6358 combined Smoking cancer Yes No Yes No OR(combined) = (3x55)/(58x5) = 0.87 Author: Blume, Greevy BIOS 3 Page 8 of 4

64706 Females Smoking cancer Yes No Yes 4 7 No 9 3 33 9 5 OR(females) = (4x)/(9x7) =.

9 The first proposed estimate for the common OR is OR exp = exp{ w i log(or i )/ w i } where w i = [/a i +/b i +/c i +/d i ] - = exp{ [ /(/9+/5+/6+/43) * log[(9*43)/(6*5)] + /(/4+/7+/9+/) * log[(4*)/(9*7)] ] / [ /(/9+/5+/6+/43) + /(/4+/7+/9+/) ] } = exp{ } = The Mantel-Haenszel estimate of the common OR is OR mh = w i OR i / w i where w i = b i c i /N i = [ (5*6/09) * (9*43)/(6*5) + (7*9/5) * (4*)/(9*7) ] / [ (5*6/09) + (7*9/5) ] = Notice that both estimators prevent Simpson s paradox. Author: Blume, Greevy BIOS 3 Page 9 of 4

63966 The Mantel-Haenszel estimate of the common OR is OR mh = w i OR i / w i where w i = b i c i /N i = [ (5*6/09) * (9*43)/(6*5) +

10 Keep in mind that OR exp and OR mh will yield different estimates for the same (unknown) common OR. Generally these differences are exacerbated when some cells of the x table are very small or zero. One work around is to add ½ to each cell (so-called `continuity correction ). Generally we use OR exp to verify that the there is indeed one common underlying OR. This is called the Test of Homogeneity. If indeed the strata are judged to be homogeneous by the above test, then it is sometimes common to estimate the common OR using Mantel-Haenszel s OR mh and to check if OR mh = statistically. This is done using a Test of Association. These two steps combined is called the Mantel- Haenszel method. Author: Blume, Greevy BIOS 3 Page 0 of 4

Generally we use OR exp to verify that the there is indeed one common underlying OR. This is called the Test of Homogeneity.

11 Test of Homogeneity The test of Homogeneity is based on OR exp where OR exp = exp{ w i log(or i )/ w i } and w i = [/a i +/b i +/c i +/d i ] - A (- α)00% CI for OR exp is given by (e l, e u ) where l log OR exp Z / w i u log OR exp Z / w i The corresponding Hypothesis Test has the null hypothesis that H 0 : OR =... =OR g and the test statistic is χ = w i [ log(or i ) - log(or exp ) ] which is approximately distributed as Chi-square with g- df. Author: Blume, Greevy BIOS 3 Page of 4

The corresponding Hypothesis Test has the null hypothesis that H 0 : OR =.

12 Test of Association The test of association is based on OR mh where OR mh = w i OR i / w i where w i = b i c i /N i The null hypothesis that H 0 : OR mh = and the test statistic is χ = { a i - E[a i ] } / Var i ~ Chi-square, df. Where E[a i ] =(a i +b i )(a i +c i )/N i and Var i =(a i +b i )(a i +c i )(b i +d i )(d i +c i )/N i (N i -) The variance of OR mh is too ugly to be put to paper. We get a (- α)00% CI for OR mh by using the set of null hypothesis that do not reject at the α-level. This makes use of the duality of hypothesis testing and confidence intervals (sometimes it is called inverting the test), and this process can be expressed analytically: Z OR mh, OR / Z / mh Author: Blume, Greevy BIOS 3 Page of 4

Where E[a i ] =(a i +b i )(a i +c i )/N i and Var i =(a i +b i )(a i +c i )(b i +d i )(d i +c i )/N i (N i -) The variance of OR mh is too ugly to be put to paper.

13 Final thoughts Tests of Homogeneity and Association are only used when the strata specific ORs all indicate associations in the same direction (i.e., all greater than one or less than one). If this not the case, the contribution of some tables can cancel contributions from others and make it appear there is no association when there is. Males Exposure Disease Yes No Yes No OR(males)=(5x95)/(5x85)=3.353 Females Exposure Disease Yes No Yes No OR(Females)=(5x85)/(5x95)=0.98 Author: Blume, Greevy BIOS 3 Page 3 of 4

If this not the case, the contribution of some tables can cancel contributions from others and make it appear there is no association when

14 Both Exposure Disease Yes No Yes No OR pooled =(80x0)/(80x0)= OR mh = w i OR i / w i where w i = b i c i /N i OR mh = {85(5)/00 (3.353) + (5)95/00 (0.98)} / { 85(5)/00+(5)95/00 } = In this case the test statistic is zero (p-value=). χ = { a i - E[a i ] } / Var i =0 But a i =0 and E[a i ]=00(0)/00+(0)00/00=0 So a i - E[a i ] = 0 and χ =0 in this case. Thus, we would fail to reject that males and females have the same OR; we fail to reject that the association between disease and exposure does not exist. Author: Blume, Greevy BIOS 3 Page 4 of 4

χ = { a i - E[a i ] } / Var i =0 But a i =0 and E[a i ]=00(0)/00+(0)00/00=0 So a i - E[a i ] = 0 and χ =0 in this case.

Lecture 19: Conditional Logistic Regression

Lecture 19: Conditional Logistic Regression Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina