# Chapter 18. Effect modification and interactions Modeling effect modification

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Chapter 18 Effect modification and interactions 18.1 Modeling effect modification weight male female male female dose dose Figure 18.1: Data from an experimental study investigating the effectofthevitaminbintakeonthe weight of mice. Figure 18.2: The data of Figure 18.1 with regression lines for male and female mice. In some applications one is interested in the question how one covariate may affect the effect of another covariate. Figure 18.1 illustrates a typical situation of this type, in which the effect ofthevitaminbintakeontheweightofmiceseemstodifferbetweenmaleandfemalemice.if wefitregressionlinesseparatelyformaleandfemalemice(figure18.2),itlookslikethatthe vitaminbhasonlyasmalleffectinfemalemice,butalargeeffectinmalemice. Fitting two regression lines, however, does not directly allow to compare the two lines in the sense of assessing the statistical significance of the difference between the two lines. This can 247

2 248 CHAPTER 18. EFFECT MODIFICATION AND INTERACTIONS be approached by considering a joint model across the two groups. Introducing the random variables X 1 =VitaminBintake(inmg) and X 2 = { 1 ifmouseismale 0 ifmouseisfemale we can describe the situation by two separate regression models: µ(x 1,x 2 )= { β A 0 +βa 1 x 1 ifx 2 =1 β B 0 +βb 1 x 1 ifx 2 =0 ( ) withβ A 1 denotingtheslopeinmalemiceandβb 1 denotingtheslopeinfemalemiceasthemain parameter of interest. However, this description does not have the usual form of a regression modelstill. ThiscanbeapproachedbyregardingX 1 asactuallytwodifferentcovariates,one definedinthemaleandonedefinedinthefemalemice: { X A X1 ifx 1 = 2 =1 0 ifx 2 =0 { 0 X B ifx2 =1 1 = X 1 ifx 2 =0 WecannowconsideraregressionmodelwiththecovariatesX A 1,XB 1 andx 2whichreads µ(x A 1,xB 1,x 2)= β 0 + β A 1 xa 1 + β B 1 xb 1 + β 2 x 2. Here β A 1 denotestheeffectofchanging X 1 inthemalemice, and β B 1 describestheeffectof changingx 1 inthefemalemice,andhenceitisnotsurprisingthat β A 1 =βa 1 and β B 1 =βb 1.Indeed, this model is equivalent to the double model(*) with the relations becauseinthecasex 2 =0wehave andinthecasex 2 =1wehave β 0 =β A 0, β A 1 =βa 1, β B 1 =βb 1, and β 2 =β B 0 βa 0 µ(x 1,0)=β A 0 +βa 1 x 1 and µ(x A 1,0,0)= β 0 + β A 1 xa 1 = β 0 + β A 1 x 1 µ(x 1,1)=β B 0 +βb 1 x 1 and µ(0,x B 1,1)= β 0 + β B 1 xb 1 + β 2 = β 0 + β 2 + β B 1 x 1 Consequently,iffittingaregressionmodelwiththethreecovariatesX A 1,XB 1 andx 2tothedata offigure18.1,weobtainanoutputlike variable beta SE 95%CI p-value intercept [25.933,69.495] dosemale [0.151,1.026] dosefemale [-0.295,0.581] sex [ ,33.631] 0.811

3 18.1. MODELING EFFECT MODIFICATION 249 andweobtainˆβ A 1 =0.589andˆβ B 1 =0.143astheslopeparameterestimatesinmalesandfemales, respectively. To assess the degree of effect modification, we will look at the difference between the slopes, i.e. γ=β B 1 βa 1 and we obtain ˆγ = ˆβ B 1 ˆβ A 1 = = By further steps we can obtain a confidenceintervalof[ 0.173,1.065]andap-valueof0.116.Sotheevidencewehaveinfavor foratruedifferencebetweenmaleandfemalemicewithrespecttotheeffectofvitaminb intake is limited. However, the large confidence interval suggests that the difference may be substantial.notethatitdoesnotmakegreatsensetolookatthep-valuesofˆβ B 1 andˆβ A 1,sinceit isnottheaimofouranalysistoassessthequestion,whetherthereisaneffectineachsubgroup, buttoassesswhetherthereisadifferenceintheeffect. logit (rel. frequency) female placebo treatment male female male placebo treatment placebo treatment number of patients patients with decrease in BP fraction of patients with decrease in BP 32.8% 37.1% 38% 67.2% odds = fraction/(100-fraction) logit=log(odds) Figure 18.3: Data from a clinical trial on an antihypertensive treatment and a visualisation of this data Figure 18.3 illustrates a second example in which the effect of an antihypertensive treatment isconsideredtobedependentonsex.wecanseethatinfemalesthetreatmentdifferencecanbe expressedasanempiricaloddsratioof1.207= 0.590,whereasinthemalesweobtainanodds ratioof3.331= 2.045,i.e.thetreatmenteffectismuchbiggerinmales,aswecanalsoseeinthe graphoffigure18.3.wecannowuselogisticregressiontoobtainaconfidenceintervalanda p-value for the ratio between these two odds ratios, which describes to which degree sex alters the treatment effect. Introducing the two covariates { 1 iftreatmentisgiventothepatient X 1 = 0 ifplaceboisgiventothepatient

4 250 CHAPTER 18. EFFECT MODIFICATION AND INTERACTIONS and X 2 = { 1 ifpatientismale 0 ifpatientisfemale we can describe the situation again by a double regression model logitπ(x 1,x 2 )= { β A 0 +βa 1 ifx 2 =1 β B 0 +βb 1 ifx 2 =0 ande βa 1 isidenticalwiththeoddsratioinmales, ande βb 1 isidenticalwiththeoddsratioin females.consequently,e γ forγ=β B 1 βa 1 satisfies e γ =e βb 1 βa 1 = e βb 1 i.e.itistheratiooftheoddsratioinmalesandfemales. AsaboveitisequivalenttoconsideraregressionmodelinthethreecovariatesX A 1,XB 1 andx 2 andinfittingthismodelweobtainanoutputlike e βa 1 variable beta SE 95%CI p-value intercept [-1.226,-0.206] treatmale [0.504,1.904] treatfemale [-0.537,0.912] sex [-0.472,0.927] Thedifference ˆγ= ˆβ B 1 ˆβ A 1 = =1.016correspondstothedifferenceweseein Figure18.3,andbyfurtherstepswecanobtainaconfidenceintervalof[0.009,2.024].Byexponentiating the estimate and the confidence interval we obtain with a confidence interval of [1.009, 7.568]. Note that is exactly the ratio between the two empirical odds ratios and we have computed above. The considerations above can be easily extended to categorical covariates. If, for example, a categorical covariate X 2 with three categories A, B, and C modifies the effect of another covariatex 1,wehavejusttoconstructforeachcategoryacovariate: X A 1 = X B 1 = X C 1 = { X1 ifx 2 =A 0 ifx 2 A { X1 ifx 2 =B 0 ifx 2 B { X1 ifx 2 =C 0 ifx 2 C andweproceedasabove.

5 18.1. MODELING EFFECT MODIFICATION 251 relative frequency non smokers smokers age classes Figure 18.4: The relative frequency of hypertension in dependence on the smoking status and age in an observational study with 1332 subjects. Age is available in years and only grouped into categories for this graph. Itremainstoconsiderthecase,inwhichweareinterestedinmodelingthemodifyingeffectof a continuous covariate on the effect of another covariate. Figure 18.4 illustrates such a situation with the two covariates { 1 subjectisasmoker X 1 = 0 subjectisnosmoker and X 2 = ageofsubject(inyears), andweareinterestedinhowagechangestheeffectofsmoking. SinceX 2 isacontinuouscovariate,wecannotapplytheapproachdefiningaversionofx 1 foranyvalueofx 2.Instead,wehavetoconsiderdirectlythemodelweareinterestedin,i.e.a modelallowingtheeffectofx 1 todependonx 2.Wecanwritesuchamodelas logitπ(x 1,x 2 )=β 0 +β 1 (x 2 )x 1 +β 2 x 2, i.e.weallowtheregressioncoefficientofx 1 todependonthevalueofx 2. Themostsimple type of dependence is a linear relation, i.e. we specify the dependence as β 1 (x 2 )=α 0 +α 1 x 2 suchthatα 0 istheeffectofx 1 ifx 2 =0andα 1 describestheincreaseoftheeffectofx 1 ifwe increasex 2 by1.0.ifwenowinsertthisinthemodelabove,weobtain whichwecanrewriteas logitπ(x 1,x 2 )=β 0 +(α 0 +α 1 x 2 )x 1 +β 2 x 2, logitπ(x 1,x 2 )=β 0 +α 0 x 1 +α 1 x 1 x 2 +β 2 x 2. ( ) Thisisamodelwecanfitdirectlytoourdata.Wehavejusttointroduceanewcovariate X 3 =X 1 X 2

6 252 CHAPTER 18. EFFECT MODIFICATION AND INTERACTIONS andwefitalinearmodelinthethreecovariatesx 1,X 2 andx 3. Ifwedothisforthedataof Figure 18.4, we obtain an output like variable beta SE 95%CI p-value intercept [-4.660,-3.152] <0.001 smoking [1.963,3.947] <0.001 age [0.039,0.064] <0.001 agesmoking [-0.050,-0.016] <0.001 Soweobtaintheestimateˆα 1 = 0.033,suggestingthethedifferencebetweensmokersandnon smokers with respect to the probability of hypertension(on the logit scale) decreases by witheachyearofage,orby0.33with10yearsofage.or,ifonepreferstouseoddsratios,that theoddsratiocomparingsmokersandnonsmokersdecreasesbyafactorofe 0.33 =0.72with tenyearsofage. However, just presenting this result would be a rather incomplete analysis, as we only know, thatthedifferencebecomessmaller,butwedonotknow,howbigtheeffectofsmokingisat differentages. Henceitiswisetoaddestimatesoftheeffectofsmokingatdifferentages, e.g.attheagesof35and75.theeffectofsmokingatacertainagex 2 is,asmentionedabove, α 0 +α 1 x 2. Accordingto(**),α 0 isjustthecoefficientofx 1 inthefittedmodel,i.e.fromthe outputaboveweobtain ˆα 0 =2.955.Soweobtainestimatesfortheeffectofsmokingattheage of35as ˆα 0 +ˆα 1 35=2.955+( 0.033) 35=1.812 oranorofe =6.12 andattheageof75weobtain ˆα 0 +ˆα 1 75=2.955+( 0.033) 75=0.506 oranorofe =1.66. Sotheoddsratiofortheeffectofsmokingonhypertensionisattheageof75onlyabouta fourthoftheoddsratioattheageof35. Remark: Note that the computation of the effect of smoking at selected ages contributes also to assessing the relevance of the effect modification. In our example, we observe a reduction of theeffectofsmoking(onthelogitscale)from1.812atage35downto0.506attheageof75. Sooverthewholerangewehaveasubstantialeffectofsmoking. However,thesameestimate of ˆα 1 willalsoappear,iftheeffectofsmokingdecreasesfrom0.612downto Inthis casetheeffectofsmokingvanisheswithincreasingageandhasattheendtheoppositesign. Such a qualitative change is much more curios and typically much more relevant than the purely quantitative change seen above Adjusted effect modifications The adequate assessment of effect modifications may require to take other covariates into account. Asanexample,letusconsiderthedatashowninFigure18.5. Theeffectofalcohol

7 18.2. ADJUSTED EFFECT MODIFICATIONS 253 females males logit (rel. freq.) no smoking yes no alcohol consumption low high yes smoking Figure 18.5: The relative frequency of hypertension(on the logit scale) in dependence on smoking status, alcohol consumption and sex. logit (rel. freq.) alcohol consumption low high no yes smoking Figure 18.6: The relative frequency of hypertension(on the logit scale) in dependence on smoking status and alcohol consumption. consumption seems to be roughly identical in all four subgroups defined by smoking status and sex. However,ifweignoreinouranalysisthevariablesex(Figure18.6),itlookslikethatthe effect of alcohol consumption is bigger in smoking subjects. Indeed, if we fit a logistic model tothedataoffigure18.6allowingdifferenteffectsofalcoholforsmokersandnonsmokers,we obtain an output like variable beta SE 95%CI p-value alcoholsmoker [0.374,1.229] <0.001 alcoholnosmoker [-0.067,0.836] smoking [0.052,0.965] suggesting different effects of alcohol consumption for smokers and non smokers. However, if weadjusttheanalysisforsex,weobtainanoutputlike

8 254 CHAPTER 18. EFFECT MODIFICATION AND INTERACTIONS variable beta SE 95%CI p-value alcoholsmoker [-0.086,0.833] alcoholnosmoker [-0.061,0.896] smoking [0.100,1.067] sex [1.053,1.726] <0.001 with nearly identical effects of alcohol consumption for smokers and non smokers, as suggested by Figure The explanation for this striking difference can be found in Figure 18.7, in whichwehaveaddedthenumberofsubjectsineachsubgroup.wecanseethatthesubgroupof smokers with high alcohol consumption is dominated by males, whereas in all other subgroups themajorityofsubjectsarefemales.andasmaleshaveinthisstudyahigherriskforhypertension than females, this contributes to the high frequency of hypertension in smokers with high alcohol consumption resulting in the pattern we have seen in Figure Hence this examples illustrates that it is important to adjust for potential confounders in the analysis of effect modifications also. logit (rel. freq.) females no yes smoking no males yes smoking alcohol consumption low high Figure 18.7: The relative frequency of hypertension(on the logit scale) in dependence on smoking status, alcohol consumption and sex. The numbers indicate the number of subjects behind the relative frequency Interactions Wheneveronestartstomodelaneffectmodificationonehastobeawareofaslightarbitrariness, becausewedecideinadvancethatx 1 modifiestheeffectofx 2,andnotviceversa. However, these two perspectives are equivalent and in principle indistinguishable. Let us consider forexamplefigure18.4.wehaveusedittoillustratehowagemodifiestheeffectofsmoking, butwecanalsoregarditasanillustrationthattheeffectofageisdifferentforsmokersand non smokers: The frequency of hypertension increases with age slower for smokers than for nonsmokers. SimilarinFigure18.3wemaysaythateachtreatmentisassociatedwithasex

9 18.3. INTERACTIONS 255 effectofitsowninsteadofthatsexaltersthetreatmenteffect.itmayhappenthatoneofthetwo perspectives sounds more logically but in principle both perspectives are correct. The symmetry of both perspectives can be also nicely illustrated by our considerations above inthecaseofcontinuouscovariates. WehaveshownthatmodelingthemodifyingeffectofX 2 ontheeffectofx 1 leadstotheexpression β 0 +(α 0 +α 1 x 2 )x 1 +β 2 x 2 = β 0 +α 0 x 1 +β 2 x 2 +α 1 x 1 x 2. NowmodelingthemodifyingeffectofX 1 ontheeffectofx 2 leadsto β 0 + β 1 x 1 + β 2 (x 1 )x 2 andassumingthe β 2 (x 1 )islinearinx 1 weobtain β 0 + βx 1 +( α 0 + α 1 x 1 )x 2 = β 0 + βx 1 + α 0 x 2 + α 1 x 1 x 2 where α 1 nowdescribeshowmuchtheeffectβ 2 (x 1 )ofx 2 increasesifweincreasex 1 by1.0. However,inbothcasesweobtainalinearmodelinthecovariatesX 1,X 2 andx 1 X 2,hencethe valuesofα 1 and α 1 areidentical.hencethissinglenumberdescribesboththemodifyingeffect ofx 1 ontheeffectofx 2 aswellasthemodifyingeffectofx 2 ontheeffectofx 1.Hencechoosing only one of these two interpretations is arbitrary. The statistical literature avoids this problem by introducing the term interaction, i.e. instead oftalkingaboutwhetherx 1 modifiestheeffectofx 2 orviceversa,wejustsaythatthetwo covariates interact. And instead of modeling the modifying effect explicitly as we have done in theprevious sections statisticians justadd the product X 1 X 2 asan additional term tothe regression model. This term is called an interaction term, and the corresponding parameter is called an interaction parameter. And we have just seen that this makes perfect sense as this parameter describes the modifying effect from both perspectives. Adding the product of covariates makes perfect sense also in the case of effect modification byabinarycovariate.letusassumethatx 2 isabinarycovariateandwestartwithamodelwith an interaction: µ(x 1,x 2 )= β 0 + β 1 x 1 + β 2 x 2 + β 12 x 1 x 2. ( ) Then µ(x 1,0) = β 0 + β 1 x 1 and µ(x 1,1) = β 0 + β 1 x 1 + β 2 + β 12 x 1 = β 0 + β 2 +( β 1 + β 12 )x 1. If we compare this with our explicit modeling of the effect modification µ(x 1,0) = β A 0 + β A 1 x 1 and µ(x 1,1) = β B 0 + β B 1 x 1

10 256 CHAPTER 18. EFFECT MODIFICATION AND INTERACTIONS we can observe that which implies β 1 =β A 1 and β 1 + β 12 =β B 1 β 12 =β B 1 βa 1. Hencetheinteractionparameter β 12 isnothingelsebutthedifferencesoftheslopesinthetwo subgroupsdefinedby X 2, whichwehavealreadypreviouslyusedtodescribethedifference betweenthetwosubgroupswithrespecttotheeffectofx 1. Andofcoursethisfitsalsotothe generalinterpretationoftheinteractionparameter: ItdescribeshowtheeffectofX 1 increases ifweincreasex 2 by1.0,andifthelatterisabinarycovariate,adifferenceof1.0meanstogo fromx 2 =0toX 2 =1. IfwereanalyzethedataofFigure18.1byamodellike(*)withaninteractionterm,weobtain an output like variable beta SE 95%CI p-value intercept [25.933,69.495] dose [-0.295,0.581] sex [ ,33.631] dosesex [-0.173,1.065] Wecanseethattheestimate ˆ β 12 = 0.446coincideswiththevaluesof ˆγwehavecomputed previously.ifwewanttoobtainestimatesoftheeffectofvitaminbintakeinthetwosubgroups, wehavejusttorememberthatforfemales(sex=0)ourmodelreducesto µ(x 1,0)= β 0 + β 1 x 1 soˆ β 1 =0.143istheestimatedeffectofvitaminBintakeinfemales,andformalesitreducesto µ(x 1,1)= β 0 + β 2 +( β 1 + β 12 )x 1 soˆ β 1 +ˆ β 12 = =0.589istheestimatedeffectofvitaminBintakeinmales. So using interaction terms in regression models is a general framework which is equivalent to an explicit modeling of effect modifications. Inclusion of interaction terms is directly supported by most statistical packages, whereas the explicit modeling used in the previous sections has tobedoneusuallybyhand. Hencetheinteractionapproachistheonetypicallyseeninthe literature. However, it is a major drawback of the interaction approach that is does not directly provide effect estimates within subgroups or at selected covariate values, which is essential for the interpretation of interactions. One has to construct such estimates usually by hand, and not all statistical packages support to compute confidence intervals for the constructed estimates. And in my experience this construction is cumbersome and error-prone, especially in models with several interactions or categorical covariates(see the next two sections). Hence I recommend, especially if a researcher starts to investigate effect modifications for the first time, to use an explicit modeling, as this approach is more transparent. However, this is restricted to the case where at least one of the interacting covariates is binary or categorical.

11 18.3. INTERACTIONS 257 Onemayarguethattheinteractionapproachisalwaysthepreferableone,asitavoidstoselect (arbitrarily) a perspective with respect to the effect modification. However, this is in my opinion not a valid argument. As shown above an interpretation of an interaction parameter without taking its actual impact on subgroup specific effects into account will be always incomplete, hence it is dangerous to look only at the value of the interaction parameter. Hence the translation of an interaction parameter into an effect modification is an important step in the analysis. One may now argue that one should always look at both perspectives of effect modification. I agree with this, however, this should not imply that we have to present the results of both perspectives inanypaper. Itisallowedtoselectonlyoneofthetwoperspectives, andtypicallyweare especiallyinterestedinoneofourcovariates,soitmakessensetofocusonhowtheeffectofthis covariatemaydependonothercovariates.itisonlynecessarytokeepinmindthattherearetwo perspectives, and to ensure that the perspective neglected is not an important one. As a simple example, let us consider the results of a randomised trial randomising general practitioners (GPs)toaninterventiongroupandacontrolgroup. IntheinterventiongrouptheGPswere informed about recent developments in antidepressant drug treatment and called to use new, effective drugs. To evaluate the study, for each GP the rate of hospitalisation for depression within one year among all patients with a incident diagnosis of depression was computed and averaged over GPs. The following table summarizes the results stratified by practice size: practice size small large control group 10.7% 33.4% intervention group 27.8% 30.4% OR The above representation suggests that the practice size modifies the intervention effect: A clear deterioration, i.e. increase of hospitalisations in the intervention group in small practices andaslightimprovementinlargepractices. However,acloserlookatthetablerevealsthatin the control group reflecting the current situation GPs from small practices are much better in handling patients with depressions than GPs in large practices. If we speculate that the original differenceamongsmallandlargepracticesisduetothatgpsinsmallpracticeshavemoretime to talk with their patients and hence can manage patients suffering from depressions without drug treatment, then we have to conclude that here the intervention modifies the differences betweengpsinsmallandlargepractices:thefocusondrugtherapypreventsgpsinthesmall practices to remain superior to GPs in large practices. This example illustrates also another problem of using the term effect modification. It sounds like an action, which really takes place, and that we have a responsible, active actor. However,theonlythingwedoistodescribeadifferenceofeffect(estimates),andwehavein any case to think carefully about possible explanations for this. The situation is similar to that of using the term effect, which we discussed in Section 11.1.

12 258 CHAPTER 18. EFFECT MODIFICATION AND INTERACTIONS 18.4 Modeling effect modifications in several covariates In some applications one might be interested in investigating the modifying effect of several covariatesontheeffectofothercovariates. Aslongaswecanstructuretheprobleminaway that the effect of a single covariate depends only on one other binary or categorical covariate, one can try the explicit modeling approach of the first section, even if the binary/categorical covariate modifies the effect of several covariates. However, as soon as the effect of one covariate depends on several covariates, this approach becomes cumbersome, and if at least two continuous covariates interact, one is forced to use the interaction term approach by adding an interaction term for each pair of interacting covariates. The interpretation of the interaction termsremain: Theydescribehowtheeffectofonecovariatechangesifthevalueoftheother increasesby1.0. Inanycaseoneneedtocomputesubgroupspecificeffectstocompletethe analysis. Wehaveseenabove,thatitcanberathertrickytoobtainsubgroupspecificeffects,butthere isasimplegeneralstrategytodothis.ifweareinterestedintheeffectsoftheothercovariates foraspecificvaluexofthecovariatex j,wejusthavetoinsertthisvalueintotheregression model formula and we obtain a new regression model describing exactly the effect of the other covariates if X j = x. For example, let us assume we are interested in a model with three covariatesallowingthatx 3 modifiestheeffectofx 1 andx 2. Hencewehaveamodelwiththe interactiontermsx 1 X 3 andx 2 X 3 : β 0 +β 1 x 1 +β 2 x 2 +β 3 x 3 +β 13 x 1 x 3 +β 23 x 2 x 3 IfwenowwanttoknowtheeffectofX 1 andx 2 ifx 3 =7.3,wejusthavetoimpute7.3forx 3 in the above expression and obtain β 0 +β 1 x 1 +β 2 x 2 +β β 13 x β 23 x = (β β 3 )+(β β 13 )x 1 +(β β 23 )x 2 SotheeffectofX 1 isβ β 13 andtheeffectofx 2 isβ β 23 ifx 3 =7.3,andwhat remains is only to insert the estimates. Remark: IfoneisinterestedinanalysingthemodifyingeffectofX 3 andx 2 togetheronthe effectofx 1,theinclusionoftheinteractiontermsX 1 X 2 andx 1 X 3 isequivalenttomodellingthe effectβ 1 ofx 1 asalinearfunctionofx 2 andx 3 : β 0 +β 1 (x 2,x 3 )x = β 0 +(α 0 +α 2 x 2 +α 3 x 3 )x = β 0 +α 0 x 1 +α 2 x 1 x 2 +α 3 x 1 x One can now of course ask, whether there is again an interaction between X 2 and X 3 with respecttothemodifyingeffect.forexampleifx 2 isageandx 3 issex,onemightbeinterested in evaluating, whether the effect is especially large in old females. Then one can extend the

13 18.5. THE EFFECT OF A COVARIATE IN THE PRESENCE OF INTERACTIONS 259 above approach to β 0 +β 1 (x 2,x 3 )x = β 0 +(α 0 +α 2 x 2 +α 3 x 3 +α 23 x 2 x 3 )x = β 0 +α 0 x 1 +α 2 x 1 x 2 +α 3 x 1 x 3 +α 23 x 1 x 2 x i.e. we have to include the product of three covariates. Such interaction terms are called higher order interactions The effect of a covariate in the presence of interactions Wheneverwestarttoincludeaninteractionterm,e.g.X 1 X 2 inaregressionmodel,itisouraim toinvestigatehowx 1 modifiestheeffectofx 2 andviceversa. Thisimpliesthatweassumea priorithattheeffectofx 1 variesindependenceonx 2 andviceversa.soitdoesnotmakesense totalkabout theeffectofx 1 or theeffectofx 2. Although a sole effect of a covariate involved in an interaction is logically not existing, one can find in the medical literature very frequently that the effect of covariates are reported although they are involved in interactions. There are two reason for this: First many statistical packages report in the analysis of a regression model with interaction terms some estimates whichlooklikeeffectestimates. Letuslookforexampleagainattheoutputinanalysingthe data of Figure 18.1 using the interaction term approach. variable beta SE 95%CI p-value intercept [25.933,69.495] dose [-0.295,0.581] sex [ ,33.631] dosesex [-0.173,1.065] Tounderstandthemeaningofˆβ 1 =0.143andˆβ 2 =2.829,letuslookagainatourmodel: µ(x 1,x 2 )=β 0 +β 1 x 1 +β 2 x 2 +β 12 x 1 x 2. Ifwesetx 2 =0,weobtain µ(x 1,0)=β 0 +β 1 x 1, henceˆβ 1 isjusttheestimatefortheeffectofvitaminbinfemales.anditiscompletelyunjustifiedtocallthistheeffectofvitaminbintakeingeneral.ifwesetx 1 =0,weobtain µ(0,x 2 )=β 0 +β 2 x 2, henceˆβ 2 isjusttheestimatefortheeffectofsexifwehaveavitaminbintakeof0.itisagain unjustified to call this the general sex effect and it is moreover an unreliable extrapolation, as therangeofvitaminbinthedataisfrom30to70mg. Soitmakesnosensetolookatthese

14 260 CHAPTER 18. EFFECT MODIFICATION AND INTERACTIONS estimates. We have to regard them usually as nuisance parameters like the intercept parameter. Wecanonlyusethemasatechnicalaidinsomecomputations,e.g.forsubgroupspecificeffects. Thesecondreasonisthatonecandefineoveralleffectsofacovariatebytakingtheaverage of subgroup specific effects. For example one can in the example above define the overall dose effectastheaverageofthedoseeffectinmaleandfemalemice,respectively,i.e.as 1 2 βa βb 1, which we can simply estimate by 2ˆβ 1 A 1 + 2ˆβ 1 B 1, and we may obtain also confidence intervals and p-values. Some statistical packages provide such estimates and inference in their standard output as main effects, especially if they are designed to analyse experimental data. There are also subtle differences between different programs with respect to choosing the weights. Some packages give equal weights to the subgroups, some weight the subgroups by their size. Whether such overall effects are meaningful has to be discussed in each single application. It happens that in spite of a main interest in assessing effect modifications, one still is interested in confirming that a covariate has an effect at all. Then the overall effect provides one reasonable andsimplewaytoanswerthequestionofinterest.onehasonlytobeawareofthatinthecase of a qualitative interaction, i.e. both positive and negative subgroup specific effects, the overall effectmaybecloseto0andhenceonemayfailtodemonstrateanoveralleffect.analternative isthentoperformatestofthenullhypothesis H 0 :β A 1 =0 and βb 1 =0, i.e.totestwhetherbothsubgroupspecificeffectsare0. Ifonecanrejectthisnullhypothesis, one has also shown that the covariate has some effect. In the case of no qualitative interactions, thistestismuchlesspowerfulthantestingtheaverageeffect,soitcannotberecommendedin general. Hence we have again here the situation that one typically starts with testing the overall effect, but that in presence of a qualitative interaction one applies post hoc the second test, and hastoconvincethereaderofapaperthatthisdecisionisjustified. Remark:TheinterpretationoftheregressionparameterofacovariateX j,whichisnotinvolved in any interaction term, is not affected by the presence of interactions terms for other covariates. ThisparametersdescribestilltheeffectofadifferenceinX j of1.0,ifwecomparetwosubjects agreeing in all other covariates. This interpretation is obviously independent of how the other covariates may interact together Interactions as deviations from additivity In the medical literature one can find frequently that interactions are interpreted as deviations from additivity of effects, and one can find the terms synergistic and antagonistic effects for positive or negative interaction terms. The rationale behind this is illustrated at hand of the example in Figure In subjects continuing smoking who received only standard antihypertensive treatment we can observe an average decrease of blood pressure by 2.1. In subjects continuing smoking with additional physical exercises we observe an average decrease of 9.5, i.e. physical exercises increase the effect of the standard treatment by 7.4. In subjects quitting

15 18.6. INTERACTIONS AS DEVIATIONS FROM ADDITIVITY 261 decrease in blood pressure smoking continue quit decrease in blood pressure β 2 } β 12 } β 1 +β 2 β 1 standard + exercises treatment standard + exercises treatment Figure 18.8: Results of a randomised clinical trial in smokers on adding physical exercises to standard antihypertensive treatment, stratified by continuing or quitting smoking Figure 18.9: The concept of additivity and its relation to the interactiontermillustratedathandofthe data of Figure smoking who received only standard antihypertensive treatment we can observe an average decreaseofbloodpressureby13.3,soquittingsmokingseemtoincreasetheeffectofthestandard treatmentby11.2. Fromthesenumberswemaynowexpect,thattheaveragedecreaseinsubjects quitting smoking and performing additional physical exercises is = However, we observe an average decrease in blood pressure of 27.9 in these subjects, i.e. a higher value than we have to expect from the effects of adding additional physical exercises or quitting smoking alone. This synergistic effect corresponds to a positive interaction term as illustrated in Figure Let { 1 physicalexercisesaddedtostandardtreatment X 1 = 0 standard treatment only and X 2 = { 1 quittingsmoking 0 continuing smoking and we consider a regression model with an interaction: µ(x 1,x 2 )=β 0 +β 1 x 1 +β 2 x 2 +β 12 x 1 x 2. Then the effect of adding physical exercises to the standard treatment is µ(1,0) µ(0,0)=β 1 and the effect of quitting smoking under the standard treatment is µ(0,1) µ(0,0)=β 2

16 262 CHAPTER 18. EFFECT MODIFICATION AND INTERACTIONS sothatinthecaseofadditivityofthesetwoeffectsweexpect butinourmodelwehave µ(1,1) µ(0,0)=β 1 +β 2 µ(1,1) µ(0,0)=β 1 +β 2 +β 12 suchthatβ 12 equalstheaccesscomparedtoadditivity. Similar,anegativevalueofβ 12 would correspond to an antagonistic effect, i.e. µ(1, 1) is smaller than expected. y X 2 =0 X 2 =1 y X 2 =0 X 2 =1 0 1 X X 1 Figure18.10: Illustrationsofqualitativeinteractions. Leftside: TheeffectofX 1 ispositiveif X 2 =1,butnegativeifX 2 =0andtheeffectofX 2 ispositiveforbothvaluesofx 1.Rightside: TheeffectofX 1 ispositive,ifx 2 =0,butnegativeifX 2 =1andtheeffectofX 2 ispositive,if X 1 =0,butnegativeifX 1 =1. It is, however, misleading to interpret any positive interaction parameter as synergy and any negativeparameterasantagonism. If,forexample,weexchangethevalues1and0inoneof the covariates in the example above, the interaction parameter will become negative. So it is essential that both covariates have a positive effect in order to associate a positive interaction withsynergy. Wehavealreadylearnedthatwehavetobecarefulwithusingtheterm effect ofacovariate inthepresenceofinteractions,anditisindeednotenoughtolookatsingle effect estimates. Whenever we want to talk about synergistic or antagonistic effects, we have to sort out the situations illustrated in Figure 18.10, where in at least one of the two perspectives of effect modification the modified effect changes its sign. Such situations are often referred to as qualitative interactions, whereas situation as in Figure 18.8, where we have a positive treatment effect in subjects continuing smoking as well as quitting smoking and a positive effect of quitting smoking in both treatment arms are often referred to as quantitative interactions. So only if we have the situation of a quantitative interaction and all subgroup specific effects are positive, it makes sense to interpret a positive interaction parameter as synergy and a negative interaction parameter as antagonism. To check the presence of a quantitative interaction, in the case of two binary covariates we just havetolookatthetwosubgroupspecificeffectsinbothperspectives.inthecaseofcontinuous

17 18.7. SCALES AND INTERACTIONS 263 covariateswehavetodefinea natural rangeofthecovariateandtocheck,whethertheeffects have identical signs at both boundaries of the range. However, a small deviation, e.g. an effect of1.0atoneboundaryandaneffectof-0.15attheotherboundarymaybestillregardedasa quantitative interaction, as we make the rather strong assumption that the change of the effect is linearoverthewholerange.sosucharesultmaybemoreanindication,thattheeffectvanishes ifweapproachtheboundary,ratherthanthatthesignoftheeffectreallychanges. number of mutated cells number of mutated cells x1=0 x1=1 x2=0 x2=1 x1=0 x1=1 x2=0 x2=1 Figure 18.11: An artificial dataset Figure 18.12: The same dataset as infigure18.11,nowwiththeoutcomeonalogarithmicscaleonthe y-axis Scales and interactions Figure shows the results of an experimental study investigating the effect of two binary covariatesx 1 andx 2 onthenumberofmutationsinacellculture.themeanvaluessuggestthat goingfromx 1 =0tox 1 =1doublestheamountofmutatedcellsinbothgroupsdefinedbyX 2 (5to10ifx 2 =0and10to20ifx 2 =1),sothereisnoeffectmodification.However,analysing this data by linear regression, we obtain a highly significant interaction, as we have different differences in the two subgroups: 10-5=5 and 20-10=10. So the absence and presence of an interaction of effect modification depends on whether we consider additive or multiplicative effects. If one views the data on a logarithmic scale(figure 18.12) for the outcome, the apparent interaction vanishes. A similar situation occur in the following example, again with two binary covariates, but now with a binary outcome. The(true) model can be described by the following probabilities:

18 264 CHAPTER 18. EFFECT MODIFICATION AND INTERACTIONS x 1 x 2 π(x 1,x 2 ) logitπ(x 1,x 2 ) Ifwelookattheprobabilityvalues,wehaveexactadditivity:π(1,0)=π(0,0)+0.11,π(0,1)= π(0,0)+0.17,π(1,1)=π(0,0) However,onthelogitscalewehaveadistinct interaction,cf.figure Similar,inaCoxmodelwemayhaveaninteractiononthelog hazard scale, which correspond to additivity on the hazard scale, or vice versa. π(x1, x2) x 2 =0 x 2 =1 logit π(x1, x2) x 2 =0 x 2 =1 0 1 x x 1 Figure 18.13: Illustrations of a simple model with additivity on the probability scale, but an interaction on the logit scale. Hence in any interpretation of(quantitative) interactions or effect modification, one has to consider always the choice of the scale as one possible explanation. However, a change in the sign of the effect of a covariate, i.e. a qualitative interaction, is robust against monotone transformations of the scale, so here this explanation cannot be valid. Remark: In the case of a continuous outcome, one may decide to transform the outcome variable if one feels that the standard scale is inadequate. However, here the same considerations as with transforming the outcome to obtain a better linear fit have to be applied(cf. Section 17.1) Ceiling effects and interactions Many outcome variables cannot be decreased or increased to any degree. There may be biological bounds, e.g. a certain fat intake is necessary for human beings to survive. Many outcome variables are also bounded by the measurement instrument: For example if assessing the quality oflifeofapatientbyscorescomputedfromthemarksonaquestionnaire,apatientcannotscore

19 18.9. HUNTING FOR INTERACTIONS 265 morethanthemaximalnumberofpoints.iftherearenowsubjectsinastudywithvaluesclose tosuchabound,theymaybelimitedinshowingarelationtocovariates,astheycannotfurther improve or deteriorate. We then say that a regression analysis may suffer from ceiling(or floor) effects. Ceiling effects are often a source for interactions, as they may prevent one subgroups of subjects to be sensitive to the covariate of interest. For example we may consider another study ingpslookingattheeffectofcallinggpstousemoretimeoncounselinginpatientssuffering from depressions. In such an intervention study we may observe the following result with respect to rates of hospitalisation: practice size small large control group 10.7% 33.4% intervention group 10.1% 12.4% OR OnemayconcludethattheinterventionisworkingfineinGPsinlargepracticeswhereasin smallpracticesthereisnoeffect. However,itisnotallowedtointerpretthisinthewaythat GPs in the small practices did not follow the suggestions of the intervention study or lacked compliance in another manner. As visible from the results in the control group GPs in small practice are already today good in handling patients with depressions and the intervention brings thegpsfromlargepracticesjustuptothelevelofgpsfromsmallpractices.ifweassumethat 10%ofallpatientswithdepressionareingeneralrobustagainsttreatmentbyGPsandcanonly be managed by hospitalisation, than the GPs in small practices had no chance to improve, and hencenochancetoshowaneffectoftheinterventionincontrasttothegpsfromlargepractices. SothisinteractionmaybeduetoaflooreffectofGPsinsmallpractices. Sowheneveroneobservesaninteraction,oneshouldtakeacloserlookatthedataandask oneselfwhetheritmaybeduetoaceilingeffect. Ifitcanbeexplainedthisway,itdoesnot meanthattheinteractionisinvalidoranartefact. Itjustmeansthatonehastotaketheceiling effect into account in the subject matter interpretation of the interaction Hunting for interactions From a general research methodological point of view it is highly desirable to detect existing interactions. If we use regression models to describe and understand the influence of several covariatesx 1,X 2,...,X p,interactionswillmakeamajorcontributiontotheunderstanding,as thedependenceoftheeffectofx 1 onx 2 mayallowsomeinsightsintothemechanismhow X 1 influencestheoutcome.qualitativeinteractionsareofspecialimportance,astheytypically challenge the traditional interpretation of covariates as universal risk factors in the particular setting. So this suggests to look systematically for interactions, whenever we fit a regression model to asses the effect of covariates.

20 266 CHAPTER 18. EFFECT MODIFICATION AND INTERACTIONS Unfortunately an investigation of interactions or effect modification suffers usually from a poorpower.letuscomparethesituationofestimatingtheoveralleffectβ 1 ofacovariatex 1 in a model without interactions with estimating the difference between the two subgroup specific effectsβ A 1 andβb 1,i.e.aninteractionwithabinarycovariateX 2. If50%ofoursampleshows X 2 =1andtheotherhalfX 2 =0,theneachofthetwoestimatesˆβ A 1 andˆβ B 1 arebasedonhalfof thedata.thisimplies,thattheirstandarderrorsareinflatedbyafactorof 2incomparisonto thestandarderrorof ˆβ 1. Andtakingthedifference ˆβ B 1 ˆβ A 1 increasesthestandarderrorfurther byafactorof 2, suchthatthestandarderrorofthedifference ˆβ B 1 ˆβ A 1 ishencetwicethe standarderrorofˆβ 1.Inthesituationofaquantitativeinteraction,wemayhaveβ A 1 =0.5β 1and β A 1 =1.5β 1,suchthatβ B 1 βa 1 =β 1,i.e.theparameterofinterestisofthesamemagnitudeinboth situations.however,ifourstudyhasapowerof80%torejectthenullhypothesish 0 :β 1 =0, itwillhaveonlyapowerof29%torejectthenullhypothesish 0 :β A 1 =βb 1 ofnointeraction duetothelargerstandarderrorofˆβ B 1 ˆβ A 1.Ifwewanttohaveasimilarpowerforthedetection ofadifferenceinthesubgroupspecificeffects,wehavetoassumethatthedifferencebetween β B 1 andβa 1 istwiceasβ 1,i.e.aratherextremeeffectmodification. Andinmostapplicationsof regression models we are typically glad to have a sample size allowing to estimate the effect ofthecovariateswithsufficientprecisiontoprovethattheyhaveaneffect,suchthatwecannot expect a large power to detect interactions. The second problem with a systematic investigation of interactions is multiplicity. If we have 5covariates,wehavealready10pairsofcovariates,i.e.10interactiontermswecanadd(oreven more,ifsomeofthecovariatesarecategorical).soifwejustlookattheresultingp-values,we cannot regard the significance of an interaction as a reliable indicator for its existence. One can find nevertheless sometimes the results of such a search in medical publications without any attempt to take the multiplicity problem into account, just reporting typically one significant interactionoutof10or20tested. Thisisofnogreatvalue,anditisoneexampleofthebad tradition of hunting for p-values. So whenever one starts to think about investigating interactions systematically, one should be awareofthatonetriessomethingwhichhasonlyaverylimitedchancetoproduceanyreliable result. We have a rather low power to detect true interactions and simultaneously a high chance toproducefalsesignals.toreducethemultiplicityproblemthefirststepshouldbetodefinea (small) subset of all possible interactions or effect modifications, which are a priori of special interest or likely to indicate effect modifications of interest. There are three major sources to identify such interactions: a) Wemayfindhintsoninteractionsintheliterature. Thesemaybeduethatauthorshave already investigated effect modification or that studies varying in important population characteristics vary with respect to their effect estimates of a covariate. Personal communications with experienced researchers in the field of interest may be a similar source, as they may have experienced that some traditional factors are of limited value in some subjects. b) Certain effect modifications may be more plausible than others. Subject matter reasons may suggest that a factor has different meanings in different subgroups. For example in looking at the association between physical occupation and back pain as considered in

### How Do We Test Multiple Regression Coefficients?

How Do We Test Multiple Regression Coefficients? Suppose you have constructed a multiple linear regression model and you have a specific hypothesis to test which involves more than one regression coefficient.

### 11. Analysis of Case-control Studies Logistic Regression

Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

### Marginal Effects for Continuous Variables Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 21, 2015

Marginal Effects for Continuous Variables Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 21, 2015 References: Long 1997, Long and Freese 2003 & 2006 & 2014,

### Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY

Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY ABSTRACT: This project attempted to determine the relationship

### Handling missing data in Stata a whirlwind tour

Handling missing data in Stata a whirlwind tour 2012 Italian Stata Users Group Meeting Jonathan Bartlett www.missingdata.org.uk 20th September 2012 1/55 Outline The problem of missing data and a principled

### Discussion Section 4 ECON 139/239 2010 Summer Term II

Discussion Section 4 ECON 139/239 2010 Summer Term II 1. Let s use the CollegeDistance.csv data again. (a) An education advocacy group argues that, on average, a person s educational attainment would increase

### Econ 371 Problem Set #3 Answer Sheet

Econ 371 Problem Set #3 Answer Sheet 4.1 In this question, you are told that a OLS regression analysis of third grade test scores as a function of class size yields the following estimated model. T estscore

### REGRESSION LINES IN STATA

REGRESSION LINES IN STATA THOMAS ELLIOTT 1. Introduction to Regression Regression analysis is about eploring linear relationships between a dependent variable and one or more independent variables. Regression

### MULTIPLE REGRESSION EXAMPLE

MULTIPLE REGRESSION EXAMPLE For a sample of n = 166 college students, the following variables were measured: Y = height X 1 = mother s height ( momheight ) X 2 = father s height ( dadheight ) X 3 = 1 if

### Multinomial and Ordinal Logistic Regression

Multinomial and Ordinal Logistic Regression ME104: Linear Regression Analysis Kenneth Benoit August 22, 2012 Regression with categorical dependent variables When the dependent variable is categorical,

### Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software

STATA Tutorial Professor Erdinç Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software 1.Wald Test Wald Test is used

### Introduction to Stata

Introduction to Stata September 23, 2014 Stata is one of a few statistical analysis programs that social scientists use. Stata is in the mid-range of how easy it is to use. Other options include SPSS,

### How to set the main menu of STATA to default factory settings standards

University of Pretoria Data analysis for evaluation studies Examples in STATA version 11 List of data sets b1.dta (To be created by students in class) fp1.xls (To be provided to students) fp1.txt (To be

### Generalized Linear Models

Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the

### Econ 371 Problem Set #3 Answer Sheet

Econ 371 Problem Set #3 Answer Sheet 4.3 In this question, you are told that a OLS regression analysis of average weekly earnings yields the following estimated model. AW E = 696.7 + 9.6 Age, R 2 = 0.023,

### Lecture 16: Logistic regression diagnostics, splines and interactions. Sandy Eckel 19 May 2007

Lecture 16: Logistic regression diagnostics, splines and interactions Sandy Eckel seckel@jhsph.edu 19 May 2007 1 Logistic Regression Diagnostics Graphs to check assumptions Recall: Graphing was used to

### Multivariate Logistic Regression

1 Multivariate Logistic Regression As in univariate logistic regression, let π(x) represent the probability of an event that depends on p covariates or independent variables. Then, using an inv.logit formulation

### Linear Regression with One Regressor

Linear Regression with One Regressor Michael Ash Lecture 10 Analogy to the Mean True parameter µ Y β 0 and β 1 Meaning Central tendency Intercept and slope E(Y ) E(Y X ) = β 0 + β 1 X Data Y i (X i, Y

### In Chapter 2, we used linear regression to describe linear relationships. The setting for this is a

Math 143 Inference on Regression 1 Review of Linear Regression In Chapter 2, we used linear regression to describe linear relationships. The setting for this is a bivariate data set (i.e., a list of cases/subjects

### Statistical Modelling in Stata 5: Linear Models

Statistical Modelling in Stata 5: Linear Models Mark Lunt Arthritis Research UK Centre for Excellence in Epidemiology University of Manchester 08/11/2016 Structure This Week What is a linear model? How

### 25 Working with categorical data and factor variables

25 Working with categorical data and factor variables Contents 25.1 Continuous, categorical, and indicator variables 25.1.1 Converting continuous variables to indicator variables 25.1.2 Converting continuous

### Failure to take the sampling scheme into account can lead to inaccurate point estimates and/or flawed estimates of the standard errors.

Analyzing Complex Survey Data: Some key issues to be aware of Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 24, 2015 Rather than repeat material that is

### Interaction effects and group comparisons Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 20, 2015

Interaction effects and group comparisons Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 20, 2015 Note: This handout assumes you understand factor variables,

### Correlation and Regression

Correlation and Regression Scatterplots Correlation Explanatory and response variables Simple linear regression General Principles of Data Analysis First plot the data, then add numerical summaries Look

### August 2012 EXAMINATIONS Solution Part I

August 01 EXAMINATIONS Solution Part I (1) In a random sample of 600 eligible voters, the probability that less than 38% will be in favour of this policy is closest to (B) () In a large random sample,

### Using Stata 11 & higher for Logistic Regression Richard Williams, University of Notre Dame, Last revised March 28, 2015

Using Stata 11 & higher for Logistic Regression Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised March 28, 2015 NOTE: The routines spost13, lrdrop1, and extremes are

### Interaction effects between continuous variables (Optional)

Interaction effects between continuous variables (Optional) Richard Williams, University of Notre Dame, http://www.nd.edu/~rwilliam/ Last revised February 0, 05 This is a very brief overview of this somewhat

### When Does it Make Sense to Perform a Meta-Analysis?

CHAPTER 40 When Does it Make Sense to Perform a Meta-Analysis? Introduction Are the studies similar enough to combine? Can I combine studies with different designs? How many studies are enough to carry

### Regression Analysis. Data Calculations Output

Regression Analysis In an attempt to find answers to questions such as those posed above, empirical labour economists use a useful tool called regression analysis. Regression analysis is essentially a

### VI. Introduction to Logistic Regression

VI. Introduction to Logistic Regression We turn our attention now to the topic of modeling a categorical outcome as a function of (possibly) several factors. The framework of generalized linear models

### Using Stata for Categorical Data Analysis

Using Stata for Categorical Data Analysis NOTE: These problems make extensive use of Nick Cox s tab_chi, which is actually a collection of routines, and Adrian Mander s ipf command. From within Stata,

### Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association

### SUMAN DUVVURU STAT 567 PROJECT REPORT

SUMAN DUVVURU STAT 567 PROJECT REPORT SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.

### 1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

### Biostatistics: Types of Data Analysis

Biostatistics: Types of Data Analysis Theresa A Scott, MS Vanderbilt University Department of Biostatistics theresa.scott@vanderbilt.edu http://biostat.mc.vanderbilt.edu/theresascott Theresa A Scott, MS

### Statistics 305: Introduction to Biostatistical Methods for Health Sciences

Statistics 305: Introduction to Biostatistical Methods for Health Sciences Modelling the Log Odds Logistic Regression (Chap 20) Instructor: Liangliang Wang Statistics and Actuarial Science, Simon Fraser

### is paramount in advancing any economy. For developed countries such as

Introduction The provision of appropriate incentives to attract workers to the health industry is paramount in advancing any economy. For developed countries such as Australia, the increasing demand for

### especially with continuous

Handling interactions in Stata, especially with continuous predictors Patrick Royston & Willi Sauerbrei German Stata Users meeting, Berlin, 1 June 2012 Interactions general concepts General idea of a (two-way)

### BRIEF OVERVIEW ON INTERPRETING COUNT MODEL RISK RATIOS

BRIEF OVERVIEW ON INTERPRETING COUNT MODEL RISK RATIOS An Addendum to Negative Binomial Regression Cambridge University Press (2007) Joseph M. Hilbe 2008, All Rights Reserved This short monograph is intended

### Basic Statistical and Modeling Procedures Using SAS

Basic Statistical and Modeling Procedures Using SAS One-Sample Tests The statistical procedures illustrated in this handout use two datasets. The first, Pulse, has information collected in a classroom

### If several different trials are mentioned in one publication, the data of each should be extracted in a separate data extraction form.

General Remarks This template of a data extraction form is intended to help you to start developing your own data extraction form, it certainly has to be adapted to your specific question. Delete unnecessary

### IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results

IAPRI Quantitative Analysis Capacity Building Series Multiple regression analysis & interpreting results How important is R-squared? R-squared Published in Agricultural Economics 0.45 Best article of the

### Nonlinear Regression Functions. SW Ch 8 1/54/

Nonlinear Regression Functions SW Ch 8 1/54/ The TestScore STR relation looks linear (maybe) SW Ch 8 2/54/ But the TestScore Income relation looks nonlinear... SW Ch 8 3/54/ Nonlinear Regression General

### Econ 371 Problem Set #4 Answer Sheet. P rice = (0.485)BDR + (23.4)Bath + (0.156)Hsize + (0.002)LSize + (0.090)Age (48.

Econ 371 Problem Set #4 Answer Sheet 6.5 This question focuses on what s called a hedonic regression model; i.e., where the sales price of the home is regressed on the various attributes of the home. The

### Title. Syntax. stata.com. fp Fractional polynomial regression. Estimation

Title stata.com fp Fractional polynomial regression Syntax Menu Description Options for fp Options for fp generate Remarks and examples Stored results Methods and formulas Acknowledgment References Also

### 10. Analysis of Longitudinal Studies Repeat-measures analysis

Research Methods II 99 10. Analysis of Longitudinal Studies Repeat-measures analysis This chapter builds on the concepts and methods described in Chapters 7 and 8 of Mother and Child Health: Research methods.

### Introduction. Survival Analysis. Censoring. Plan of Talk

Survival Analysis Mark Lunt Arthritis Research UK Centre for Excellence in Epidemiology University of Manchester 01/12/2015 Survival Analysis is concerned with the length of time before an event occurs.

### ESTIMATING AVERAGE TREATMENT EFFECTS: IV AND CONTROL FUNCTIONS, II Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics

ESTIMATING AVERAGE TREATMENT EFFECTS: IV AND CONTROL FUNCTIONS, II Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009 1. Quantile Treatment Effects 2. Control Functions

### Simple Linear Regression Inference

Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

### MODEL I: DRINK REGRESSED ON GPA & MALE, WITHOUT CENTERING

Interpreting Interaction Effects; Interaction Effects and Centering Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 20, 2015 Models with interaction effects

### Department of Economics Session 2012/2013. EC352 Econometric Methods. Solutions to Exercises from Week 10 + 0.0077 (0.052)

Department of Economics Session 2012/2013 University of Essex Spring Term Dr Gordon Kemp EC352 Econometric Methods Solutions to Exercises from Week 10 1 Problem 13.7 This exercise refers back to Equation

### 2. Simple Linear Regression

Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

### ECON Introductory Econometrics. Lecture 17: Experiments

ECON4150 - Introductory Econometrics Lecture 17: Experiments Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 13 Lecture outline 2 Why study experiments? The potential outcome framework.

### 0.1 Multiple Regression Models

0.1 Multiple Regression Models We will introduce the multiple Regression model as a mean of relating one numerical response variable y to two or more independent (or predictor variables. We will see different

### Categorical Data Analysis

Richard L. Scheaffer University of Florida The reference material and many examples for this section are based on Chapter 8, Analyzing Association Between Categorical Variables, from Statistical Methods

### STATISTICS 8, FINAL EXAM. Last six digits of Student ID#: Circle your Discussion Section: 1 2 3 4

STATISTICS 8, FINAL EXAM NAME: KEY Seat Number: Last six digits of Student ID#: Circle your Discussion Section: 1 2 3 4 Make sure you have 8 pages. You will be provided with a table as well, as a separate

### I. Basics of Hypothesis Testing

Introduction to Hypothesis Testing This deals with an issue highly similar to what we did in the previous chapter. In that chapter we used sample information to make inferences about the range of possibilities

### NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

### Lab 5 Linear Regression with Within-subject Correlation. Goals: Data: Use the pig data which is in wide format:

Lab 5 Linear Regression with Within-subject Correlation Goals: Data: Fit linear regression models that account for within-subject correlation using Stata. Compare weighted least square, GEE, and random

### Organizing Your Approach to a Data Analysis

Biost/Stat 578 B: Data Analysis Emerson, September 29, 2003 Handout #1 Organizing Your Approach to a Data Analysis The general theme should be to maximize thinking about the data analysis and to minimize

### Lecture 16. Endogeneity & Instrumental Variable Estimation (continued)

Lecture 16. Endogeneity & Instrumental Variable Estimation (continued) Seen how endogeneity, Cov(x,u) 0, can be caused by Omitting (relevant) variables from the model Measurement Error in a right hand

### ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2

University of California, Berkeley Prof. Ken Chay Department of Economics Fall Semester, 005 ECON 14 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE # Question 1: a. Below are the scatter plots of hourly wages

### CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE

1 2 CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample 3 Connection with independent random variables 3 Problems with small populations 8 II. Why Random Sampling is Important 9 A myth,

### III. INTRODUCTION TO LOGISTIC REGRESSION. a) Example: APACHE II Score and Mortality in Sepsis

III. INTRODUCTION TO LOGISTIC REGRESSION 1. Simple Logistic Regression a) Example: APACHE II Score and Mortality in Sepsis The following figure shows 30 day mortality in a sample of septic patients as

### General Method: Difference of Means. 3. Calculate df: either Welch-Satterthwaite formula or simpler df = min(n 1, n 2 ) 1.

General Method: Difference of Means 1. Calculate x 1, x 2, SE 1, SE 2. 2. Combined SE = SE1 2 + SE2 2. ASSUMES INDEPENDENT SAMPLES. 3. Calculate df: either Welch-Satterthwaite formula or simpler df = min(n

### Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)

Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and

### Unit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.)

Unit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.) Logistic regression generalizes methods for 2-way tables Adds capability studying several predictors, but Limited to

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

### International Statistical Institute, 56th Session, 2007: Phil Everson

Teaching Regression using American Football Scores Everson, Phil Swarthmore College Department of Mathematics and Statistics 5 College Avenue Swarthmore, PA198, USA E-mail: peverso1@swarthmore.edu 1. Introduction

### Statistics in Medicine Research Lecture Series CSMC Fall 2014

Catherine Bresee, MS Senior Biostatistician Biostatistics & Bioinformatics Research Institute Statistics in Medicine Research Lecture Series CSMC Fall 2014 Overview Review concept of statistical power

### Analyzing Research Data Using Excel

Analyzing Research Data Using Excel Fraser Health Authority, 2012 The Fraser Health Authority ( FH ) authorizes the use, reproduction and/or modification of this publication for purposes other than commercial

### From this it is not clear what sort of variable that insure is so list the first 10 observations.

MNL in Stata We have data on the type of health insurance available to 616 psychologically depressed subjects in the United States (Tarlov et al. 1989, JAMA; Wells et al. 1989, JAMA). The insurance is

### Lecture 13. Use and Interpretation of Dummy Variables. Stop worrying for 1 lecture and learn to appreciate the uses that dummy variables can be put to

Lecture 13. Use and Interpretation of Dummy Variables Stop worrying for 1 lecture and learn to appreciate the uses that dummy variables can be put to Using dummy variables to measure average differences

### Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives

### Regression in Stata. Alicia Doyle Lynch Harvard-MIT Data Center (HMDC)

Regression in Stata Alicia Doyle Lynch Harvard-MIT Data Center (HMDC) Documents for Today Find class materials at: http://libraries.mit.edu/guides/subjects/data/ training/workshops.html Several formats

### Sample Size Planning, Calculation, and Justification

Sample Size Planning, Calculation, and Justification Theresa A Scott, MS Vanderbilt University Department of Biostatistics theresa.scott@vanderbilt.edu http://biostat.mc.vanderbilt.edu/theresascott Theresa

### Guide to Biostatistics

MedPage Tools Guide to Biostatistics Study Designs Here is a compilation of important epidemiologic and common biostatistical terms used in medical research. You can use it as a reference guide when reading

### Module 14: Missing Data Stata Practical

Module 14: Missing Data Stata Practical Jonathan Bartlett & James Carpenter London School of Hygiene & Tropical Medicine www.missingdata.org.uk Supported by ESRC grant RES 189-25-0103 and MRC grant G0900724

### Binary Diagnostic Tests Two Independent Samples

Chapter 537 Binary Diagnostic Tests Two Independent Samples Introduction An important task in diagnostic medicine is to measure the accuracy of two diagnostic tests. This can be done by comparing summary

### GETTING STARTED: STATA & R BASIC COMMANDS ECONOMETRICS II. Stata Output Regression of wages on education

GETTING STARTED: STATA & R BASIC COMMANDS ECONOMETRICS II Stata Output Regression of wages on education. sum wage educ Variable Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------

### Testing for serial correlation in linear panel-data models

The Stata Journal (2003) 3, Number 2, pp. 168 177 Testing for serial correlation in linear panel-data models David M. Drukker Stata Corporation Abstract. Because serial correlation in linear panel-data

### This section focuses on Chow Test and leaves general discussion on dummy variable models to other section.

Jeeshim and KUCC65 (3//008) Statistical Inferences in Linear Regression: 7 4. Tests of Structural Changes This section focuses on Chow Test and leaves general discussion on dummy variable models to other

### Logistic Regression. Introduction CHAPTER The Logistic Regression Model 14.2 Inference for Logistic Regression

Logistic Regression Introduction The simple and multiple linear regression methods we studied in Chapters 10 and 11 are used to model the relationship between a quantitative response variable and one or

### Basic Biostatistics for Clinical Research. Ramses F Sadek, PhD GRU Cancer Center

Basic Biostatistics for Clinical Research Ramses F Sadek, PhD GRU Cancer Center 1 1. Basic Concepts 2. Data & Their Presentation Part One 2 1. Basic Concepts Statistics Biostatistics Populations and samples

### SECOND M.B. AND SECOND VETERINARY M.B. EXAMINATIONS INTRODUCTION TO THE SCIENTIFIC BASIS OF MEDICINE EXAMINATION. Friday 14 March 2008 9.00-9.

SECOND M.B. AND SECOND VETERINARY M.B. EXAMINATIONS INTRODUCTION TO THE SCIENTIFIC BASIS OF MEDICINE EXAMINATION Friday 14 March 2008 9.00-9.45 am Attempt all ten questions. For each question, choose the

### Quantitative Methods for Economics Tutorial 9. Katherine Eyal

Quantitative Methods for Economics Tutorial 9 Katherine Eyal TUTORIAL 9 4 October 2010 ECO3021S Part A: Problems 1. In Problem 2 of Tutorial 7, we estimated the equation ŝleep = 3, 638.25 0.148 totwrk

### Inference for Regression

Simple Linear Regression Inference for Regression The simple linear regression model Estimating regression parameters; Confidence intervals and significance tests for regression parameters Inference about

### Regression Analysis: A Complete Example

Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

### An Article Critique - Helmet Use and Associated Spinal Fractures in Motorcycle Crash Victims. Ashley Roberts. University of Cincinnati

Epidemiology Article Critique 1 Running head: Epidemiology Article Critique An Article Critique - Helmet Use and Associated Spinal Fractures in Motorcycle Crash Victims Ashley Roberts University of Cincinnati

### Competing-risks regression

Competing-risks regression Roberto G. Gutierrez Director of Statistics StataCorp LP Stata Conference Boston 2010 R. Gutierrez (StataCorp) Competing-risks regression July 15-16, 2010 1 / 26 Outline 1. Overview

### Fairfield Public Schools

Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

### An analysis method for a quantitative outcome and two categorical explanatory variables.

Chapter 11 Two-Way ANOVA An analysis method for a quantitative outcome and two categorical explanatory variables. If an experiment has a quantitative outcome and two categorical explanatory variables that

### DATA INTERPRETATION AND STATISTICS

PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

### MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could

### Randomization in Clinical Trials

in Clinical Trials Versio.0 May 2011 1. Simple 2. Block randomization 3. Minimization method Stratification RELATED ISSUES 1. Accidental Bias 2. Selection Bias 3. Prognostic Factors 4. Random selection

### Fixed-Effect Versus Random-Effects Models

CHAPTER 13 Fixed-Effect Versus Random-Effects Models Introduction Definition of a summary effect Estimating the summary effect Extreme effect size in a large study or a small study Confidence interval

### Statistics II Final Exam - January Use the University stationery to give your answers to the following questions.

Statistics II Final Exam - January 2012 Use the University stationery to give your answers to the following questions. Do not forget to write down your name and class group in each page. Indicate clearly