1.2 DISTRIBUTIONS FOR CATEGORICAL DATA
- Veronica Banks
- 9 years ago
present models for a categorical response with matched pairs; these apply, for instance, with a categorical response measured for the same subjects at two times. Chapter 11 covers models for more general types of repeated categorical data, such as longitudinal data from several times with explanatory variables. In Chapter 12 we present a broad class of models, generalized linear mixed models, that use random effects to account for dependence with such data. In Chapter 13 further extensions and applications of the models from Chapters 10 through 12 are described.

The fourth and final unit is more theoretical. In Chapter 14 we develop asymptotic theory for categorical data models. This theory is the basis for large-sample behavior of model parameter estimators and goodness-of-fit statistics. Maximum likelihood estimation receives primary attention here and throughout the book, but Chapter 15 covers alternative methods of estimation, such as the Bayesian paradigm. Chapter 16 stands alone from the others, being a historical overview of the development of categorical data methods.

Most categorical data methods require extensive computations, and statistical software is necessary for their effective use. In Appendix A we discuss software that can perform the analyses in this book and show the use of SAS for text examples. See the Web site aarcdarcda.html to download sample programs and data sets and find information about other software.

Chapter 1 provides background material. In Section 1.2 we review the key distributions for categorical data: the binomial, multinomial, and Poisson. In Section 1.3 we review the primary mechanisms for statistical inference, using maximum likelihood. In Sections 1.4 and 1.5 we illustrate these by presenting significance tests and confidence intervals for binomial and multinomial parameters.

1.2 DISTRIBUTIONS FOR CATEGORICAL DATA

Inferential data analyses require assumptions about the random mechanism that generated the data.
For regression models with continuous responses, the normal distribution plays the central role. In this section we review the three key distributions for categorical responses: binomial, multinomial, and Poisson.

1.2.1 Binomial Distribution

Many applications refer to a fixed number $n$ of binary observations. Let $y_1, y_2, \ldots, y_n$ denote responses for $n$ independent and identical trials such that $P(Y_i = 1) = \pi$ and $P(Y_i = 0) = 1 - \pi$. We use the generic labels "success" and "failure" for outcomes 1 and 0. Identical trials means that the probability of success $\pi$ is the same for each trial. Independent trials means
that the $\{Y_i\}$ are independent random variables. These are often called Bernoulli trials. The total number of successes, $Y = \sum_{i=1}^{n} Y_i$, has the binomial distribution with index $n$ and parameter $\pi$, denoted by bin$(n, \pi)$. The probability mass function for the possible outcomes $y$ for $Y$ is

$$p(y) = \binom{n}{y} \pi^{y} (1 - \pi)^{n-y}, \quad y = 0, 1, 2, \ldots, n, \tag{1.1}$$

where the binomial coefficient $\binom{n}{y} = n!/[y!(n-y)!]$. Since $E(Y_i) = E(Y_i^2) = 1 \times \pi + 0 \times (1 - \pi) = \pi$,

$$E(Y_i) = \pi \quad \text{and} \quad \mathrm{var}(Y_i) = \pi(1 - \pi).$$

The binomial distribution for $Y = \sum_i Y_i$ has mean and variance

$$\mu = E(Y) = n\pi \quad \text{and} \quad \sigma^2 = \mathrm{var}(Y) = n\pi(1 - \pi).$$

The skewness is described by $E(Y - \mu)^3/\sigma^3 = (1 - 2\pi)/\sqrt{n\pi(1 - \pi)}$. The distribution converges to normality as $n$ increases, for fixed $\pi$.

There is no guarantee that successive binary observations are independent or identical. Thus, occasionally, we will utilize other distributions. One such case is sampling binary outcomes without replacement from a finite population, such as observations on gender for 10 students sampled from a class of size 20. The hypergeometric distribution, studied in Section 3.5.1, is then relevant. In Section 1.2.4 we mention another case that violates these binomial assumptions.

1.2.2 Multinomial Distribution

Some trials have more than two possible outcomes. Suppose that each of $n$ independent, identical trials can have outcome in any of $c$ categories. Let $y_{ij} = 1$ if trial $i$ has outcome in category $j$ and $y_{ij} = 0$ otherwise. Then $\mathbf{y}_i = (y_{i1}, y_{i2}, \ldots, y_{ic})$ represents a multinomial trial, with $\sum_j y_{ij} = 1$; for instance, $(0, 0, 1, 0)$ denotes outcome in category 3 of four possible categories. Note that $y_{ic}$ is redundant, being linearly dependent on the others. Let $n_j = \sum_i y_{ij}$ denote the number of trials having outcome in category $j$. The counts $(n_1, n_2, \ldots, n_c)$ have the multinomial distribution.

Let $\pi_j = P(Y_{ij} = 1)$ denote the probability of outcome in category $j$ for each trial. The multinomial probability mass function is

$$p(n_1, n_2, \ldots, n_{c-1}) = \left( \frac{n!}{n_1!\, n_2! \cdots n_c!} \right) \pi_1^{n_1} \pi_2^{n_2} \cdots \pi_c^{n_c}. \tag{1.2}$$
Since $\sum_j n_j = n$, this is $(c-1)$-dimensional, with $n_c = n - (n_1 + \cdots + n_{c-1})$. The binomial distribution is the special case with $c = 2$.

For the multinomial distribution,

$$E(n_j) = n\pi_j, \quad \mathrm{var}(n_j) = n\pi_j(1 - \pi_j), \quad \mathrm{cov}(n_j, n_k) = -n\pi_j\pi_k. \tag{1.3}$$

We derive the covariance later in the book. The marginal distribution of each $n_j$ is binomial.

1.2.3 Poisson Distribution

Sometimes, count data do not result from a fixed number of trials. For instance, if $y$ = number of deaths due to automobile accidents on motorways in Italy during this coming week, there is no fixed upper limit $n$ for $y$ (as you are aware if you have driven in Italy). Since $y$ must be a nonnegative integer, its distribution should place its mass on that range. The simplest such distribution is the Poisson. Its probabilities depend on a single parameter, the mean $\mu$. The Poisson probability mass function (Poisson 1837, p. 206) is

$$p(y) = \frac{e^{-\mu} \mu^{y}}{y!}, \quad y = 0, 1, 2, \ldots. \tag{1.4}$$

It satisfies $E(Y) = \mathrm{var}(Y) = \mu$. It is unimodal with mode equal to the integer part of $\mu$. Its skewness is described by $E(Y - \mu)^3/\sigma^3 = 1/\sqrt{\mu}$. The distribution approaches normality as $\mu$ increases.

The Poisson distribution is used for counts of events that occur randomly over time or space, when outcomes in disjoint periods or regions are independent. It also applies as an approximation for the binomial when $n$ is large and $\pi$ is small, with $\mu = n\pi$. So if each of the 50 million people driving in Italy next week is an independent trial with probability 0.000002 of dying in a fatal accident that week, the number of deaths $Y$ is a bin(50000000, 0.000002) variate, or approximately Poisson with $\mu = n\pi = 50{,}000{,}000(0.000002) = 100$.

A key feature of the Poisson distribution is that its variance equals its mean. Sample counts vary more when their mean is higher. When the mean number of weekly fatal accidents equals 100, greater variability occurs in the weekly counts than when the mean is smaller.

1.2.4 Overdispersion

In practice, count observations often exhibit variability exceeding that predicted by the binomial or Poisson. This phenomenon is called overdispersion. We assumed above that each person has the same probability of dying in a fatal accident in the next week.
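The Poisson approximation to the binomial in the driving example can be checked numerically. The sketch below is only an illustration: the helper names `binom_pmf` and `poisson_pmf` are ours, both mass functions are evaluated on the log scale to cope with the very large $n$, and the values $n = 50{,}000{,}000$ and $\pi = 0.000002$ are those implied by the example's mean of 100.

```python
import math

def binom_pmf(y, n, pi):
    """Binomial probability, form (1.1), evaluated on the log scale."""
    log_coef = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    return math.exp(log_coef + y * math.log(pi) + (n - y) * math.log1p(-pi))

def poisson_pmf(y, mu):
    """Poisson probability, form (1.4), evaluated on the log scale."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

n, pi = 50_000_000, 0.000002   # values from the driving example
mu = n * pi                    # Poisson mean n * pi

# Compare the two mass functions at a few counts around the mean
for y in (80, 100, 120):
    print(y, binom_pmf(y, n, pi), poisson_pmf(y, mu))
```

With $n$ this large and $\pi$ this small, the two columns should agree to many decimal places, which is the sense in which the Poisson serves as an approximation here.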
More realistically, these probabilities vary,
due to factors such as amount of time spent driving, whether the person wears a seat belt, and geographical location. Such variation causes fatality counts to display more variation than predicted by the Poisson model.

Suppose that $Y$ is a random variable with variance $\mathrm{var}(Y \mid \mu)$ for given $\mu$, but $\mu$ itself varies because of unmeasured factors such as those just described. Let $\theta = E(\mu)$. Then unconditionally,

$$E(Y) = E[E(Y \mid \mu)], \quad \mathrm{var}(Y) = E[\mathrm{var}(Y \mid \mu)] + \mathrm{var}[E(Y \mid \mu)].$$

When $Y$ is conditionally Poisson (given $\mu$), for instance, then $E(Y) = E(\mu) = \theta$ and $\mathrm{var}(Y) = E(\mu) + \mathrm{var}(\mu) = \theta + \mathrm{var}(\mu)$.

Assuming a Poisson distribution for a count variable is often too simplistic, because of factors that cause overdispersion. The negative binomial is a related distribution for count data that permits the variance to exceed the mean. We introduce it later in the book.

Analyses assuming binomial (or multinomial) distributions are also sometimes invalid because of overdispersion. This might happen because the true distribution is a mixture of different binomial distributions, with the parameter varying because of unmeasured variables. To illustrate, suppose that an experiment exposes pregnant mice to a toxin and then after a week observes the number of fetuses in each mouse's litter that show signs of malformation. Let $n_i$ denote the number of fetuses in the litter for mouse $i$. The mice also vary according to other factors that may not be measured, such as their weight, overall health, and genetic makeup. Extra variation then occurs because of the variability from litter to litter in the probability $\pi$ of malformation. The distribution of the number of fetuses per litter showing malformations might cluster near 0 and near $n_i$, showing more dispersion than expected for binomial sampling with a single value of $\pi$.
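The variance decomposition for a conditionally Poisson count can be illustrated with a small exact calculation. The following is a minimal sketch under an assumed, purely hypothetical mixing distribution: $\mu$ equals 50 or 150 with probability 0.5 each, so that $\theta = 100$ and $\mathrm{var}(\mu) = 2500$. The names `mix`, `mean_y`, and `var_y` are ours.

```python
import math

def poisson_pmf(y, mu):
    """Poisson probability, form (1.4), evaluated on the log scale."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

# Hypothetical mixing distribution for the Poisson mean (an assumption
# made only for illustration): mu is 50 or 150, each with probability 0.5.
mix = {50.0: 0.5, 150.0: 0.5}

theta = sum(w * mu for mu, w in mix.items())                  # E(mu)
var_mu = sum(w * (mu - theta) ** 2 for mu, w in mix.items())  # var(mu)

# Marginal mass function of Y, truncated at 600 where the remaining
# probability is negligible for these two means.
ys = range(0, 601)
pmf = [sum(w * poisson_pmf(y, mu) for mu, w in mix.items()) for y in ys]

mean_y = sum(y * p for y, p in zip(ys, pmf))
var_y = sum((y - mean_y) ** 2 * p for y, p in zip(ys, pmf))

print(mean_y, var_y, theta + var_mu)
```

The marginal variance matches $\theta + \mathrm{var}(\mu)$ and far exceeds the marginal mean, which is the overdispersion relative to a single Poisson distribution with mean $\theta$.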
Overdispersion could also occur when $\pi$ varies among fetuses in a litter according to some distribution (see the problems for this chapter). In Chapters 4, 12, and 13 we introduce methods for data that are overdispersed relative to binomial and Poisson assumptions.

1.2.5 Connection between Poisson and Multinomial Distributions

In Italy this next week, let $y_1$ = number of people who die in automobile accidents, $y_2$ = number who die in airplane accidents, and $y_3$ = number who die in railway accidents. A Poisson model for $(Y_1, Y_2, Y_3)$ treats these as independent Poisson random variables, with parameters $(\mu_1, \mu_2, \mu_3)$. The joint probability mass function for $\{Y_i\}$ is the product of the three mass functions of form (1.4). The total $n = \sum Y_i$ also has a Poisson distribution, with parameter $\sum \mu_i$.

With Poisson sampling the total count $n$ is random rather than fixed. If we assume a Poisson model but condition on $n$, $\{Y_i\}$ no longer have Poisson distributions, since each $Y_i$ cannot exceed $n$. Given $n$, $\{Y_i\}$ are also no longer independent, since the value of one affects the possible range for the others.
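The effect of conditioning on the total can be checked numerically: for independent Poisson counts, the conditional probability of a set of counts given their sum should coincide with a multinomial probability with $\pi_i = \mu_i / \sum_j \mu_j$. The sketch below uses hypothetical means $(2, 3, 5)$ and counts $(1, 2, 3)$, chosen only for illustration; the function names are ours.

```python
import math

def poisson_pmf(y, mu):
    """Poisson probability, form (1.4)."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

def multinomial_pmf(counts, probs):
    """Multinomial probability, form (1.2), evaluated on the log scale."""
    log_p = math.lgamma(sum(counts) + 1)
    for n_i, pi_i in zip(counts, probs):
        log_p += n_i * math.log(pi_i) - math.lgamma(n_i + 1)
    return math.exp(log_p)

mus = (2.0, 3.0, 5.0)   # hypothetical Poisson means
counts = (1, 2, 3)      # hypothetical observed counts
n = sum(counts)

# Joint probability of the counts under independence, divided by the
# Poisson probability that the total equals n.
joint = math.prod(poisson_pmf(y, mu) for y, mu in zip(counts, mus))
conditional = joint / poisson_pmf(n, sum(mus))

probs = [mu / sum(mus) for mu in mus]
print(conditional, multinomial_pmf(counts, probs))
```

Both printed values agree, so conditioning removes the dependence on the overall rate $\sum_j \mu_j$ and leaves only the proportions $\{\pi_i\}$.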
For $c$ independent Poisson variates, with $E(Y_i) = \mu_i$, let's derive their conditional distribution given that $\sum Y_i = n$. The conditional probability of a set of counts $\{n_i\}$ satisfying this condition is

$$
\begin{aligned}
P\Big(Y_1 = n_1, Y_2 = n_2, \ldots, Y_c = n_c \,\Big|\, \sum_j Y_j = n\Big)
&= \frac{P(Y_1 = n_1, Y_2 = n_2, \ldots, Y_c = n_c)}{P\big(\sum_j Y_j = n\big)} \\
&= \frac{\prod_i \big[\exp(-\mu_i)\, \mu_i^{n_i} / n_i!\big]}{\exp\big(-\sum_j \mu_j\big) \big(\sum_j \mu_j\big)^{n} / n!}
= \frac{n!}{\prod_i n_i!} \prod_i \pi_i^{n_i},
\end{aligned} \tag{1.5}
$$

where $\{\pi_i = \mu_i / (\sum_j \mu_j)\}$. This is the multinomial $(n, \{\pi_i\})$ distribution, characterized by the sample size $n$ and the probabilities $\{\pi_i\}$.

Many categorical data analyses assume a multinomial distribution. Such analyses usually have the same parameter estimates as those of analyses assuming a Poisson distribution, because of the similarity in the likelihood functions.

1.3 STATISTICAL INFERENCE FOR CATEGORICAL DATA

The choice of distribution for the response variable is but one step of data analysis. In practice, that distribution has unknown parameter values. In this section we review methods of using sample data to make inferences about the parameters. Sections 1.4 and 1.5 cover binomial and multinomial parameters.

1.3.1 Likelihood Functions and Maximum Likelihood Estimation

In this book we use maximum likelihood for parameter estimation. Under weak regularity conditions, such as the parameter space having fixed dimension with true value falling in its interior, maximum likelihood estimators have desirable properties: They have large-sample normal distributions; they are asymptotically consistent, converging to the parameter as $n$ increases; and they are asymptotically efficient, producing large-sample standard errors no greater than those from other estimation methods.

Given the data, for a chosen probability distribution the likelihood function is the probability of those data, treated as a function of the unknown parameter. The maximum likelihood (ML) estimate is the parameter value that maximizes this function. This is the parameter value under which the data observed have the highest probability of occurrence. The parameter value that maximizes the likelihood function also maximizes the log of that function.
It is simpler to maximize the log likelihood since it is a sum rather than a product of terms.
We denote a parameter for a generic problem by $\beta$ and its ML estimate by $\hat{\beta}$. The likelihood function is $\ell(\beta)$ and the log-likelihood function is $L(\beta) = \log[\ell(\beta)]$. For many models, $L(\beta)$ has concave shape and $\hat{\beta}$ is the point at which the derivative equals 0. The ML estimate is then the solution of the likelihood equation, $\partial L(\beta)/\partial \beta = 0$. Often, $\beta$ is multidimensional, denoted by $\boldsymbol{\beta}$, and $\hat{\boldsymbol{\beta}}$ is the solution of a set of likelihood equations.

Let SE denote the standard error of $\hat{\beta}$, and let $\mathrm{cov}(\hat{\boldsymbol{\beta}})$ denote the asymptotic covariance matrix of $\hat{\boldsymbol{\beta}}$. Under regularity conditions (Rao 1973, p. 364), $\mathrm{cov}(\hat{\boldsymbol{\beta}})$ is the inverse of the information matrix. The $(j, k)$ element of the information matrix is

$$-E\left( \frac{\partial^2 L(\boldsymbol{\beta})}{\partial \beta_j\, \partial \beta_k} \right). \tag{1.6}$$

The standard errors are the square roots of the diagonal elements for the inverse information matrix. The greater the curvature of the log likelihood, the smaller the standard errors. This is reasonable, since large curvature implies that the log likelihood drops quickly as $\beta$ moves away from $\hat{\beta}$; hence, the data would have been much more likely to occur if $\beta$ took a value near $\hat{\beta}$ rather than a value far from $\hat{\beta}$.

1.3.2 Likelihood Function and ML Estimate for Binomial Parameter

The part of a likelihood function involving the parameters is called the kernel. Since the maximization of the likelihood is with respect to the parameters, the rest is irrelevant.

To illustrate, consider the binomial distribution (1.1). The binomial coefficient $\binom{n}{y}$ has no influence on where the maximum occurs with respect to $\pi$. Thus, we ignore it and treat the kernel as the likelihood function. The binomial log likelihood is then

$$L(\pi) = \log\big[\pi^{y}(1 - \pi)^{n-y}\big] = y \log(\pi) + (n - y) \log(1 - \pi). \tag{1.7}$$

Differentiating with respect to $\pi$ yields

$$\partial L(\pi)/\partial \pi = y/\pi - (n - y)/(1 - \pi) = (y - n\pi)/[\pi(1 - \pi)]. \tag{1.8}$$

Equating this to 0 gives the likelihood equation, which has solution $\hat{\pi} = y/n$, the sample proportion of successes for the $n$ trials.

Calculating $\partial^2 L(\pi)/\partial \pi^2$, taking the expectation, and combining terms, we get

$$-E\big[\partial^2 L(\pi)/\partial \pi^2\big] = E\big[y/\pi^2 + (n - y)/(1 - \pi)^2\big] = n/[\pi(1 - \pi)]. \tag{1.9}$$
Thus, the asymptotic variance of $\hat{\pi}$ is $\pi(1 - \pi)/n$. This is no surprise. Since $E(Y) = n\pi$ and $\mathrm{var}(Y) = n\pi(1 - \pi)$, the distribution of $\hat{\pi} = Y/n$ has mean and standard error

$$E(\hat{\pi}) = \pi, \quad \sigma(\hat{\pi}) = \sqrt{\frac{\pi(1 - \pi)}{n}}.$$

1.3.3 Wald-Likelihood Ratio-Score Test Triad

Three standard ways exist to use the likelihood function to perform large-sample inference. We introduce these for a significance test of a null hypothesis $H_0$: $\beta = \beta_0$ and then discuss their relation to interval estimation. They all exploit the large-sample normality of ML estimators.

With nonnull standard error SE of $\hat{\beta}$, the test statistic

$$z = (\hat{\beta} - \beta_0)/\mathrm{SE}$$

has an approximate standard normal distribution when $\beta = \beta_0$. One refers $z$ to the standard normal table to obtain one- or two-sided P-values. Equivalently, for the two-sided alternative, $z^2$ has a chi-squared null distribution with 1 degree of freedom (df); the P-value is then the right-tailed chi-squared probability above the observed value. This type of statistic, using the nonnull standard error, is called a Wald statistic (Wald 1943).

The multivariate extension for the Wald test of $H_0$: $\boldsymbol{\beta} = \boldsymbol{\beta}_0$ has test statistic

$$W = (\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}_0)' \big[\mathrm{cov}(\hat{\boldsymbol{\beta}})\big]^{-1} (\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}_0).$$

(The prime on a vector or matrix denotes the transpose.) The nonnull covariance is based on the curvature (1.6) of the log likelihood at $\hat{\boldsymbol{\beta}}$. The asymptotic multivariate normal distribution for $\hat{\boldsymbol{\beta}}$ implies an asymptotic chi-squared distribution for $W$. The df equal the rank of $\mathrm{cov}(\hat{\boldsymbol{\beta}})$, which is the number of nonredundant parameters in $\boldsymbol{\beta}$.

A second general-purpose method uses the likelihood function through the ratio of two maximizations: (1) the maximum over the possible parameter values under $H_0$, and (2) the maximum over the larger set of parameter values permitting $H_0$ or an alternative $H_a$ to be true. Let $\ell_0$ denote the maximized value of the likelihood function under $H_0$, and let $\ell_1$ denote the maximized value generally (i.e., under $H_0 \cup H_a$). For instance, for parameter vector $\boldsymbol{\beta} = (\beta_0, \beta_1)'$
and $H_0$: $\beta_0 = 0$, $\ell_1$ is the likelihood function calculated at the $\boldsymbol{\beta}$ value for which the data would have been most likely; $\ell_0$ is the likelihood function calculated at the $\beta_1$ value for which the data would have been most likely, when $\beta_0 = 0$. Then $\ell_1$ is always at least as large as $\ell_0$, since $\ell_0$ results from maximizing over a restricted set of the parameter values.
The ratio $\Lambda = \ell_0/\ell_1$ of the maximized likelihoods cannot exceed 1. Wilks (1935, 1938) showed that $-2 \log \Lambda$ has a limiting null chi-squared distribution, as $n \to \infty$. The df equal the difference in the dimensions of the parameter spaces under $H_0 \cup H_a$ and under $H_0$. The likelihood-ratio test statistic equals

$$-2 \log \Lambda = -2 \log(\ell_0/\ell_1) = -2(L_0 - L_1),$$

where $L_0$ and $L_1$ denote the maximized log-likelihood functions.

The third method uses the score statistic, due to R. A. Fisher and C. R. Rao. The score test is based on the slope and expected curvature of the log-likelihood function $L(\beta)$ at the null value $\beta_0$. It utilizes the size of the score function

$$u(\beta) = \partial L(\beta)/\partial \beta,$$

evaluated at $\beta_0$. The value $u(\beta_0)$ tends to be larger in absolute value when $\hat{\beta}$ is farther from $\beta_0$. Denote $-E[\partial^2 L(\beta)/\partial \beta^2]$ (i.e., the information) evaluated at $\beta_0$ by $\iota(\beta_0)$. The score statistic is the ratio of $u(\beta_0)$ to its null SE, which is $[\iota(\beta_0)]^{1/2}$. This has an approximate standard normal null distribution. The chi-squared form of the score statistic is

$$\frac{[u(\beta_0)]^2}{\iota(\beta_0)} = \frac{[\partial L(\beta)/\partial \beta_0]^2}{-E[\partial^2 L(\beta)/\partial \beta_0^2]},$$

where the partial derivative notation reflects derivatives with respect to $\beta$ that are evaluated at $\beta_0$. In the multiparameter case, the score statistic is a quadratic form based on the vector of partial derivatives of the log likelihood with respect to $\boldsymbol{\beta}$ and the inverse information matrix, both evaluated at the $H_0$ estimates (i.e., assuming that $\boldsymbol{\beta} = \boldsymbol{\beta}_0$).

Figure 1.1 is a generic plot of a log-likelihood $L(\beta)$ for the univariate case. It illustrates the three tests of $H_0$: $\beta = 0$. The Wald test uses the behavior of $L(\beta)$ at the ML estimate $\hat{\beta}$, having chi-squared form $(\hat{\beta}/\mathrm{SE})^2$. The SE of $\hat{\beta}$ depends on the curvature of $L(\beta)$ at $\hat{\beta}$. The score test is based on the slope and curvature of $L(\beta)$ at $\beta = 0$. The likelihood-ratio test combines information about $L(\beta)$ at both $\hat{\beta}$ and $\beta_0 = 0$. It compares the log-likelihood values $L_1$ at $\hat{\beta}$ and $L_0$ at $\beta_0 = 0$ using the chi-squared statistic $-2(L_0 - L_1)$. In Figure 1.1, this statistic is twice the vertical distance between values of $L(\beta)$ at $\hat{\beta}$ and at 0.
In a sense, this statistic uses the most information of the three types of test statistic and is the most versatile.

As $n \to \infty$, the Wald, likelihood-ratio, and score tests have certain asymptotic equivalences (Cox and Hinkley 1974). For small to moderate sample sizes, the likelihood-ratio test is usually more reliable than the Wald test.
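To make the three statistics concrete, the sketch below evaluates their chi-squared forms for a binomial parameter, using the log likelihood, score, and information of Section 1.3.2. The data ($y = 7$ successes in $n = 20$ trials) and the null value $\pi_0 = 0.5$ are hypothetical, the function names are ours, and the code assumes $0 < y < n$ so that all logarithms are finite.

```python
import math

def loglik(pi, y, n):
    """Binomial log-likelihood kernel (1.7); assumes 0 < pi < 1."""
    return y * math.log(pi) + (n - y) * math.log(1 - pi)

def three_tests(y, n, pi0):
    """Chi-squared forms of the Wald, score, and likelihood-ratio statistics."""
    pi_hat = y / n
    # Wald: squared distance over the variance estimated at pi_hat
    wald = (pi_hat - pi0) ** 2 / (pi_hat * (1 - pi_hat) / n)
    # Score: squared score (1.8) over the information (1.9), both at pi0
    u = y / pi0 - (n - y) / (1 - pi0)
    info = n / (pi0 * (1 - pi0))
    score = u ** 2 / info
    # Likelihood ratio: -2(L0 - L1)
    lr = -2 * (loglik(pi0, y, n) - loglik(pi_hat, y, n))
    return wald, score, lr

def chi2_df1_pvalue(x):
    """Right-tail chi-squared probability, valid for df = 1 only."""
    return math.erfc(math.sqrt(x / 2))

wald, score, lr = three_tests(y=7, n=20, pi0=0.5)   # hypothetical data
for name, stat in (("Wald", wald), ("score", score), ("LR", lr)):
    print(name, round(stat, 3), round(chi2_df1_pvalue(stat), 3))
```

The three statistics differ for a sample this small, although each is referred to the same chi-squared distribution with df = 1.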
FIGURE 1.1 Log-likelihood function and information used in three tests of $H_0$: $\beta = 0$.

1.3.4 Constructing Confidence Intervals

In practice, it is more informative to construct confidence intervals for parameters than to test hypotheses about their values. For any of the three test methods, a confidence interval results from inverting the test. For instance, a 95% confidence interval for $\beta$ is the set of $\beta_0$ for which the test of $H_0$: $\beta = \beta_0$ has a P-value exceeding 0.05.

Let $z_a$ denote the z-score from the standard normal distribution having right-tailed probability $a$; this is the $100(1 - a)$ percentile of that distribution. Let $\chi^2_{df}(a)$ denote the $100(1 - a)$ percentile of the chi-squared distribution with degrees of freedom df. $100(1 - \alpha)\%$ confidence intervals based on asymptotic normality use $z_{\alpha/2}$, for instance $z_{0.025} = 1.96$ for 95% confidence. The Wald confidence interval is the set of $\beta_0$ for which $|\hat{\beta} - \beta_0|/\mathrm{SE} < z_{\alpha/2}$. This gives the interval $\hat{\beta} \pm z_{\alpha/2}(\mathrm{SE})$. The likelihood-ratio-based confidence interval is the set of $\beta_0$ for which $-2[L(\beta_0) - L(\hat{\beta})] < \chi^2_1(\alpha)$. [Recall that $\chi^2_1(\alpha) = z^2_{\alpha/2}$.]

When $\hat{\beta}$ has a normal distribution, the log-likelihood function has a parabolic shape (i.e., a second-degree polynomial). For small samples with categorical data, $\hat{\beta}$ may be far from normality and the log-likelihood function can be far from a symmetric, parabolic-shaped curve. This can also happen with moderate to large samples when a model contains many parameters. In such cases, inference based on asymptotic normality of $\hat{\beta}$ may have inadequate performance. A marked divergence in results of Wald and likelihood-ratio inference indicates that the distribution of $\hat{\beta}$ may not be close to normality. The example in Section 1.4.3 illustrates this with quite different confidence intervals for different methods. In many such cases, inference can
instead utilize an exact small-sample distribution or higher-order asymptotic methods that improve on simple normality (e.g., Pierce and Peters 1992).

The Wald confidence interval is most common in practice because it is simple to construct using ML estimates and standard errors reported by statistical software. The likelihood-ratio-based interval is becoming more widely available in software and is preferable for categorical data with small to moderate $n$. For the best known statistical model, regression for a normal response, the three types of inference necessarily provide identical results.

1.4 STATISTICAL INFERENCE FOR BINOMIAL PARAMETERS

In this section we illustrate inference methods for categorical data by presenting tests and confidence intervals for the binomial parameter $\pi$, based on $y$ successes in $n$ independent trials. In Section 1.3.2 we obtained the likelihood function and ML estimator $\hat{\pi} = y/n$ of $\pi$.

1.4.1 Tests about a Binomial Parameter

Consider $H_0$: $\pi = \pi_0$. Since $H_0$ has a single parameter, we use the normal rather than chi-squared forms of Wald and score test statistics. They permit tests against one-sided as well as two-sided alternatives. The Wald statistic is

$$z_W = \frac{\hat{\pi} - \pi_0}{\mathrm{SE}} = \frac{\hat{\pi} - \pi_0}{\sqrt{\hat{\pi}(1 - \hat{\pi})/n}}. \tag{1.10}$$

Evaluating the binomial score (1.8) and information (1.9) at $\pi_0$ yields

$$u(\pi_0) = \frac{y}{\pi_0} - \frac{n - y}{1 - \pi_0}, \quad \iota(\pi_0) = \frac{n}{\pi_0(1 - \pi_0)}.$$

The normal form of the score statistic simplifies to

$$z_S = \frac{u(\pi_0)}{[\iota(\pi_0)]^{1/2}} = \frac{y - n\pi_0}{\sqrt{n\pi_0(1 - \pi_0)}} = \frac{\hat{\pi} - \pi_0}{\sqrt{\pi_0(1 - \pi_0)/n}}. \tag{1.11}$$

Whereas the Wald statistic $z_W$ uses the standard error evaluated at $\hat{\pi}$, the score statistic $z_S$ uses it evaluated at $\pi_0$. The score statistic is preferable, as it uses the actual null SE rather than an estimate. Its null sampling distribution is closer to standard normal than that of the Wald statistic.

The binomial log-likelihood function (1.7) equals $L_0 = y \log \pi_0 + (n - y) \log(1 - \pi_0)$ under $H_0$ and $L_1 = y \log \hat{\pi} + (n - y) \log(1 - \hat{\pi})$ more
generally. The likelihood-ratio test statistic simplifies to

$$-2(L_0 - L_1) = 2\left( y \log\frac{\hat{\pi}}{\pi_0} + (n - y) \log\frac{1 - \hat{\pi}}{1 - \pi_0} \right).$$

Expressed as

$$-2(L_0 - L_1) = 2\left( y \log\frac{y}{n\pi_0} + (n - y) \log\frac{n - y}{n - n\pi_0} \right),$$

it compares observed success and failure counts to fitted (i.e., null) counts by

$$2 \sum \text{observed} \log\frac{\text{observed}}{\text{fitted}}. \tag{1.12}$$

We'll see that this formula also holds for tests about Poisson and multinomial parameters. Since no unknown parameters occur under $H_0$ and one occurs under $H_a$, (1.12) has an asymptotic chi-squared distribution with df = 1.

1.4.2 Confidence Intervals for a Binomial Parameter

A significance test merely indicates whether a particular $\pi$ value (such as $\pi = 0.5$) is plausible. We learn more by using a confidence interval to determine the range of plausible values.

Inverting the Wald test statistic gives the interval of $\pi_0$ values for which $|z_W| < z_{\alpha/2}$, or

$$\hat{\pi} \pm z_{\alpha/2} \sqrt{\frac{\hat{\pi}(1 - \hat{\pi})}{n}}. \tag{1.13}$$

Historically, this was one of the first confidence intervals used for any parameter (Laplace 1812). Unfortunately, it performs poorly unless $n$ is very large (e.g., Brown et al. 2001). The actual coverage probability usually falls below the nominal confidence coefficient, much below when $\pi$ is near 0 or 1. A simple adjustment that adds $\frac{1}{2} z^2_{\alpha/2}$ observations of each type to the sample before using this formula performs much better (see the problems for this chapter).

The score confidence interval contains $\pi_0$ values for which $|z_S| < z_{\alpha/2}$. Its endpoints are the $\pi_0$ solutions to the equations

$$(\hat{\pi} - \pi_0)\big/\sqrt{\pi_0(1 - \pi_0)/n} = \pm z_{\alpha/2}.$$
These are quadratic in $\pi_0$. First discussed by E. B. Wilson (1927), this interval is

$$\hat{\pi}\left(\frac{n}{n + z^2_{\alpha/2}}\right) + \frac{1}{2}\left(\frac{z^2_{\alpha/2}}{n + z^2_{\alpha/2}}\right) \pm z_{\alpha/2} \sqrt{\frac{1}{n + z^2_{\alpha/2}} \left[ \hat{\pi}(1 - \hat{\pi})\left(\frac{n}{n + z^2_{\alpha/2}}\right) + \left(\frac{1}{2}\right)\left(\frac{1}{2}\right)\left(\frac{z^2_{\alpha/2}}{n + z^2_{\alpha/2}}\right) \right]}.$$

The midpoint $\tilde{\pi}$ of the interval is a weighted average of $\hat{\pi}$ and $\frac{1}{2}$, where the weight $n/(n + z^2_{\alpha/2})$ given $\hat{\pi}$ increases as $n$ increases. Combining terms, this midpoint equals $\tilde{\pi} = (y + z^2_{\alpha/2}/2)/(n + z^2_{\alpha/2})$. This is the sample proportion for an adjusted sample that adds $z^2_{\alpha/2}$ observations, half of each type. The square of the coefficient of $z_{\alpha/2}$ in this formula is a weighted average of the variance of a sample proportion when $\pi = \hat{\pi}$ and the variance of a sample proportion when $\pi = \frac{1}{2}$, using the adjusted sample size $n + z^2_{\alpha/2}$ in place of $n$. This interval has much better performance than the Wald interval.

The likelihood-ratio-based confidence interval is more complex computationally, but simple in principle. It is the set of $\pi_0$ for which the likelihood-ratio test has a P-value exceeding $\alpha$. Equivalently, it is the set of $\pi_0$ for which double the log likelihood drops by less than $\chi^2_1(\alpha)$ from its value at the ML estimate $\hat{\pi} = y/n$.

1.4.3 Proportion of Vegetarians Example

To collect data in an introductory statistics course, recently I gave the students a questionnaire. One question asked each student whether he or she was a vegetarian. Of $n = 25$ students, $y = 0$ answered "yes." They were not a random sample of a particular population, but we use these data to illustrate 95% confidence intervals for a binomial parameter $\pi$.

Since $y = 0$, $\hat{\pi} = 0/25 = 0$. Using the Wald approach, the 95% confidence interval for $\pi$ is

$$0 \pm 1.96\sqrt{(0.0 \times 1.0)/25}, \quad \text{or } (0, 0).$$

When the observation falls at the boundary of the sample space, often Wald methods do not provide sensible answers.

By contrast, the 95% score interval equals (0.0, 0.133). This is a more believable inference. For $H_0$: $\pi = 0.5$, for instance, the score test statistic is $z_S = (0 - 0.5)/\sqrt{(0.5 \times 0.5)/25} = -5.0$, so 0.5 does not fall in the interval. By contrast, for $H_0$: $\pi = 0.10$, $z_S = (0 - 0.10)/\sqrt{(0.10 \times 0.90)/25} = -1.67$, so 0.10 falls in the interval.
When $y = 0$ and $n = 25$, the kernel of the likelihood function is $\ell(\pi) = \pi^0(1 - \pi)^{25} = (1 - \pi)^{25}$. The log likelihood (1.7) is $L(\pi) = 25 \log(1 - \pi)$. Note that $L(\hat{\pi}) = L(0) = 0$. The 95% likelihood-ratio confidence interval is the set of $\pi_0$ for which the likelihood-ratio statistic

$$-2(L_0 - L_1) = -2[L(\pi_0) - L(\hat{\pi})] = -50 \log(1 - \pi_0) \le \chi^2_1(0.05) = 3.84.$$

The upper bound is $1 - \exp(-3.84/50) = 0.074$, and the confidence interval equals (0.0, 0.074). [In this book, we use the natural logarithm throughout, so its inverse is the exponential function $\exp(x) = e^x$.] Figure 1.2 shows the likelihood and log-likelihood functions and the corresponding confidence region for $\pi$.

The three large-sample methods yield quite different results. When $\pi$ is near 0, the sampling distribution of $\hat{\pi}$ is highly skewed to the right for small $n$. It is worth considering alternative methods not requiring asymptotic approximations.

FIGURE 1.2 Binomial likelihood and log likelihood when $y = 0$ in $n = 25$ trials, and confidence interval for $\pi$.
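The three large-sample intervals for the vegetarian data ($y = 0$, $n = 25$) can be reproduced with a few lines of code. This is a sketch rather than a reference implementation: the function names are ours, the rounded critical values 1.96 and 3.84 are used as in the text, and the likelihood-ratio bounds are found by simple bisection, with the convention that a term $0 \cdot \log 0$ contributes 0.

```python
import math

Z = 1.96      # z_{0.025}
CHI2 = 3.84   # chi-squared(1) percentile for 95% confidence

def wald_interval(y, n, z=Z):
    p = y / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

def score_interval(y, n, z=Z):
    """Wilson score interval: midpoint plus or minus the half-width."""
    p = y / n
    mid = (y + z ** 2 / 2) / (n + z ** 2)
    w = n / (n + z ** 2)                      # weight given to p
    half = z * math.sqrt((p * (1 - p) * w + 0.25 * (1 - w)) / (n + z ** 2))
    return mid - half, mid + half

def loglik(pi, y, n):
    """Binomial log-likelihood kernel, skipping terms with a zero count."""
    out = 0.0
    if y > 0:
        out += y * math.log(pi)
    if n - y > 0:
        out += (n - y) * math.log(1 - pi)
    return out

def lr_interval(y, n, crit=CHI2, tol=1e-10):
    p = y / n

    def excess(pi0):
        return -2 * (loglik(pi0, y, n) - loglik(p, y, n)) - crit

    def bisect(inside, outside):
        # excess < 0 at `inside` (the ML estimate), > 0 at `outside`
        while abs(outside - inside) > tol:
            mid = (inside + outside) / 2
            if excess(mid) < 0:
                inside = mid
            else:
                outside = mid
        return (inside + outside) / 2

    lower = 0.0 if y == 0 else bisect(p, 1e-12)
    upper = 1.0 if y == n else bisect(p, 1 - 1e-12)
    return lower, upper

for name, f in (("Wald", wald_interval), ("score", score_interval),
                ("LR", lr_interval)):
    print(name, tuple(round(b, 3) for b in f(0, 25)))
```

The Wald interval collapses to the single point 0, while the score and likelihood-ratio intervals have upper bounds of about 0.133 and 0.074.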
1.4.4 Exact Small-Sample Inference*

With modern computational power, it is not necessary to rely on large-sample approximations for the distribution of statistics such as $\hat{\pi}$. Tests and confidence intervals can use the binomial distribution directly rather than its normal approximation. Such inferences occur naturally for small samples, but apply for any $n$.

We illustrate by testing $H_0$: $\pi = 0.5$ against $H_a$: $\pi \ne 0.5$ for the survey results on vegetarianism, $y = 0$ with $n = 25$. We noted that the score statistic equals $z = -5.0$. The exact P-value for this statistic, based on the null bin(25, 0.5) distribution, is

$$P(|z| \ge 5.0) = P(Y = 0 \text{ or } Y = 25) = 0.5^{25} + 0.5^{25} = 0.00000006.$$

$100(1 - \alpha)\%$ confidence intervals consist of all $\pi_0$ for which P-values exceed $\alpha$ in exact binomial tests. The best known interval (Clopper and Pearson 1934) uses the tail method for forming confidence intervals. It requires each one-sided P-value to exceed $\alpha/2$. The lower and upper endpoints are the solutions in $\pi_0$ to the equations

$$\sum_{k=y}^{n} \binom{n}{k} \pi_0^{k} (1 - \pi_0)^{n-k} = \alpha/2 \quad \text{and} \quad \sum_{k=0}^{y} \binom{n}{k} \pi_0^{k} (1 - \pi_0)^{n-k} = \alpha/2,$$

except that the lower bound is 0 when $y = 0$ and the upper bound is 1 when $y = n$. When $y = 1, 2, \ldots, n - 1$, from connections between binomial sums and the incomplete beta function and related cumulative distribution functions (cdf's) of beta and F distributions, the confidence interval equals

$$\left[ 1 + \frac{n - y + 1}{y\, F_{2y,\, 2(n-y+1)}(1 - \alpha/2)} \right]^{-1} < \pi < \left[ 1 + \frac{n - y}{(y + 1)\, F_{2(y+1),\, 2(n-y)}(\alpha/2)} \right]^{-1},$$

where $F_{a,b}(c)$ denotes the $1 - c$ quantile from the F distribution with degrees of freedom $a$ and $b$. When $y = 0$ with $n = 25$, the Clopper-Pearson 95% confidence interval for $\pi$ is (0.0, 0.137).

In principle this approach seems ideal. However, there is a serious complication. Because of discreteness, the actual coverage probability for any $\pi$ is at least as large as the nominal confidence level (Casella and Berger 2001, p. 434; Neyman 1935) and it can be much greater. Similarly, for a test of $H_0$: $\pi = \pi_0$ at a fixed desired size $\alpha$ such as 0.05, it is not usually possible to achieve that size.
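The exact P-value and the Clopper-Pearson endpoints for the vegetarian data ($y = 0$, $n = 25$) can be computed directly from binomial tail sums. The sketch below solves the two tail equations by bisection instead of using F quantiles; the function names and the tolerance are our own choices, not part of the original method description.

```python
import math

def binom_pmf(k, n, pi):
    return math.comb(n, k) * pi ** k * (1 - pi) ** (n - k)

def lower_tail(y, n, pi):
    """P(Y <= y)."""
    return sum(binom_pmf(k, n, pi) for k in range(0, y + 1))

def upper_tail(y, n, pi):
    """P(Y >= y)."""
    return sum(binom_pmf(k, n, pi) for k in range(y, n + 1))

def solve(f, target, lo=0.0, hi=1.0, tol=1e-10):
    """Bisection for f(pi) = target, with f monotone on [lo, hi]."""
    increasing = f(hi) > f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(y, n, alpha=0.05):
    lower = 0.0 if y == 0 else solve(lambda p: upper_tail(y, n, p), alpha / 2)
    upper = 1.0 if y == n else solve(lambda p: lower_tail(y, n, p), alpha / 2)
    return lower, upper

# Exact two-sided P-value for H0: pi = 0.5 with y = 0, n = 25
p_exact = binom_pmf(0, 25, 0.5) + binom_pmf(25, 25, 0.5)
cp_lower, cp_upper = clopper_pearson(0, 25)
print(p_exact, cp_lower, cp_upper)
```

For $y = 0$ the upper endpoint also has the closed form $1 - (\alpha/2)^{1/n}$, which gives a quick check on the bisection.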
There is a finite number of possible samples, and hence a finite number of possible P-values, of which 0.05 may not be one. In testing $H_0$ with fixed $\pi_0$, one can pick a particular $\alpha$ that can occur as a P-value.

* Sections marked with an asterisk are less important for an overview.
FIGURE 1.3 Plot of coverage probabilities for nominal 95% confidence intervals for binomial parameter $\pi$ when $n = 25$.

For interval estimation, however, this is not an option. This is because constructing the interval corresponds to inverting an entire range of $\pi_0$ values in $H_0$: $\pi = \pi_0$, and each distinct $\pi_0$ value can have its own set of possible P-values; that is, there is not a single null parameter value $\pi_0$ as in one test. For any fixed parameter value, the actual coverage probability can be much larger than the nominal confidence level.

When $n = 25$, Figure 1.3 plots the coverage probabilities as a function of $\pi$ for the Clopper-Pearson method, the score method, and the Wald method. At a fixed $\pi$ value with a given method, the coverage probability is the sum of the binomial probabilities of all those samples for which the resulting interval contains that $\pi$. There are 26 possible samples and 26 corresponding confidence intervals, so the coverage probability is a sum of somewhere between 0 and 26 binomial probabilities. As $\pi$ moves from 0 to 1, this coverage probability jumps up or down whenever $\pi$ moves into or out of one of these intervals. Figure 1.3 shows that coverage probabilities are too low for the Wald method, whereas the Clopper-Pearson method errs in the opposite direction. The score method behaves well, except for some $\pi$ values close to 0 or 1. Its coverage probabilities tend to be near the nominal level, not being consistently conservative or liberal. This is a good method unless $\pi$ is very close to 0 or 1 (see the problems for this chapter).

In discrete problems using small-sample distributions, shorter confidence intervals usually result from inverting a single two-sided test rather than two
one-sided tests. The interval is then the set of parameter values for which the P-value of a two-sided test exceeds $\alpha$. For the binomial parameter, see Blaker (2000), Blyth and Still (1983), and Sterne (1954) for methods. For observed outcome $y_o$, with Blaker's approach the P-value is the minimum of the two one-tailed binomial probabilities $P(Y \ge y_o)$ and $P(Y \le y_o)$ plus an attainable probability in the other tail that is as close as possible to, but not greater than, that one-tailed probability. The interval is computationally more complex, although available in software (Blaker gave S-Plus functions). The result is still conservative, but less so than the Clopper-Pearson interval. For the vegetarianism example, the 95% confidence interval using the Blaker exact method is (0.0, 0.128) compared to the Clopper-Pearson interval of (0.0, 0.137).

1.4.5 Inference Based on the Mid-P-Value*

To adjust for discreteness in small-sample distributions, one can base inference on the mid-P-value (Lancaster 1961). For a test statistic $T$ with observed value $t_o$ and one-sided $H_a$ such that large $T$ contradicts $H_0$,

$$\text{mid-P-value} = \tfrac{1}{2} P(T = t_o) + P(T > t_o),$$

with probabilities calculated from the null distribution. Thus, the mid-P-value is less than the ordinary P-value by half the probability of the observed result. Compared to the ordinary P-value, the mid-P-value behaves more like the P-value for a test statistic having a continuous distribution. The sum of its two one-sided P-values equals 1.0. Although discrete, under $H_0$ its null distribution is more like the uniform distribution that occurs in the continuous case. For instance, it has a null expected value of 0.5, whereas this expected value exceeds 0.5 for the ordinary P-value for a discrete test statistic.

Unlike an exact test with ordinary P-value, a test using the mid-P-value does not guarantee that the probability of type I error is no greater than a nominal value (see the problems for this chapter). However, it usually performs well, typically being a bit conservative. It is less conservative than the ordinary exact test.
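Two of the properties just stated for the mid-P-value, that the two one-sided values sum to 1 and that the null expected value is 0.5, can be verified by direct enumeration. The sketch below does so for a binomial count used as the test statistic; the setting ($n = 10$, $\pi_0 = 0.3$) is hypothetical and the function names are ours.

```python
import math

def binom_pmf(k, n, pi):
    return math.comb(n, k) * pi ** k * (1 - pi) ** (n - k)

def mid_p_upper(t, n, pi0):
    """Mid-P-value when large values of T contradict H0."""
    tail = sum(binom_pmf(k, n, pi0) for k in range(t + 1, n + 1))
    return 0.5 * binom_pmf(t, n, pi0) + tail

def mid_p_lower(t, n, pi0):
    """Mid-P-value when small values of T contradict H0."""
    tail = sum(binom_pmf(k, n, pi0) for k in range(0, t))
    return 0.5 * binom_pmf(t, n, pi0) + tail

def ordinary_p_upper(t, n, pi0):
    return sum(binom_pmf(k, n, pi0) for k in range(t, n + 1))

n, pi0 = 10, 0.3   # hypothetical null setting

# The two one-sided mid-P-values sum to 1 at every possible outcome
sums = [mid_p_upper(t, n, pi0) + mid_p_lower(t, n, pi0) for t in range(n + 1)]

# Null expected values of the mid-P-value and of the ordinary P-value
e_mid = sum(binom_pmf(t, n, pi0) * mid_p_upper(t, n, pi0) for t in range(n + 1))
e_ord = sum(binom_pmf(t, n, pi0) * ordinary_p_upper(t, n, pi0)
            for t in range(n + 1))
print(sums[0], e_mid, e_ord)
```

The ordinary P-value has null expectation above 0.5 because it counts the full probability of the observed outcome, whereas the mid-P-value counts only half of it.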
Similarly, one can form less conservative confidence intervals by inverting tests using the exact distribution with the mid-P-value (e.g., the 95% confidence interval is the set of parameter values for which the mid-P-value exceeds 0.05).

For testing $H_0$: $\pi = 0.5$ against $H_a$: $\pi \ne 0.5$ in the example about the proportion of vegetarians, with $y = 0$ for $n = 25$, the result observed is the most extreme possible. Thus the mid-P-value is half the ordinary P-value, or 0.00000003. Using the Clopper-Pearson inversion of the exact binomial test but with the mid-P-value yields a 95% confidence interval of (0.000, 0.113) for $\pi$, compared to (0.000, 0.137) for the ordinary Clopper-Pearson interval.

The mid-P-value seems a sensible compromise between having overly conservative inference and using irrelevant randomization to eliminate problems from discreteness. We recommend it both for tests and confidence intervals with highly discrete distributions.

1.5 STATISTICAL INFERENCE FOR MULTINOMIAL PARAMETERS

We now present inference for multinomial parameters $\{\pi_j\}$. Of $n$ observations, $n_j$ occur in category $j$, $j = 1, \ldots, c$.

1.5.1 Estimation of Multinomial Parameters

First, we obtain ML estimates of $\{\pi_j\}$. As a function of $\{\pi_j\}$, the multinomial probability mass function (1.2) is proportional to the kernel

$$\prod_j \pi_j^{n_j} \quad \text{where all } \pi_j \ge 0 \text{ and } \sum_j \pi_j = 1. \tag{1.14}$$

The ML estimates are the $\{\pi_j\}$ that maximize (1.14).

The multinomial log-likelihood function is

$$L(\boldsymbol{\pi}) = \sum_j n_j \log \pi_j.$$

To eliminate redundancies, we treat $L$ as a function of $(\pi_1, \ldots, \pi_{c-1})$, since $\pi_c = 1 - (\pi_1 + \cdots + \pi_{c-1})$. Thus, $\partial \pi_c/\partial \pi_j = -1$, $j = 1, \ldots, c - 1$. Since

$$\frac{\partial \log \pi_c}{\partial \pi_j} = \frac{1}{\pi_c} \frac{\partial \pi_c}{\partial \pi_j} = -\frac{1}{\pi_c},$$

differentiating $L(\boldsymbol{\pi})$ with respect to $\pi_j$ gives the likelihood equation

$$\frac{\partial L(\boldsymbol{\pi})}{\partial \pi_j} = \frac{n_j}{\pi_j} - \frac{n_c}{\pi_c} = 0.$$

The ML solution satisfies $\hat{\pi}_j/\hat{\pi}_c = n_j/n_c$. Now

$$\sum_j \hat{\pi}_j = 1 = \frac{\hat{\pi}_c \big(\sum_j n_j\big)}{n_c} = \frac{\hat{\pi}_c\, n}{n_c},$$

so $\hat{\pi}_c = n_c/n$ and then $\hat{\pi}_j = n_j/n$. From general results presented later in the book (Section 8.6), this solution does maximize the likelihood. Thus, the ML estimates of $\{\pi_j\}$ are the sample proportions.
1.5.2 Pearson Statistic for Testing a Specified Multinomial

In 1900 the eminent British statistician Karl Pearson introduced a hypothesis test that was one of the first inferential methods. It had a revolutionary impact on categorical data analysis, which had focused on describing associations. Pearson's test evaluates whether multinomial parameters equal certain specified values. His original motivation in developing this test was to analyze whether possible outcomes on a particular Monte Carlo roulette wheel were equally likely (Stigler).

Consider $H_0$: $\pi_j = \pi_{j0}$, $j = 1, \ldots, c$, where $\sum_j \pi_{j0} = 1$. When $H_0$ is true, the expected values of $\{n_j\}$, called expected frequencies, are $\mu_j = n \pi_{j0}$, $j = 1, \ldots, c$. Pearson proposed the test statistic

$$X^2 = \sum_j \frac{(n_j - \mu_j)^2}{\mu_j}. \tag{1.15}$$

Greater differences $\{n_j - \mu_j\}$ produce greater $X^2$ values, for fixed $n$. Let $X_o^2$ denote the observed value of $X^2$. The P-value is the null value of $P(X^2 \ge X_o^2)$. This equals the sum of the null multinomial probabilities of all count arrays having a sum of $n$ with $X^2 \ge X_o^2$.

For large samples, $X^2$ has approximately a chi-squared distribution with df $= c - 1$. The P-value is approximated by $P(\chi^2_{c-1} \ge X_o^2)$, where $\chi^2_{c-1}$ denotes a chi-squared random variable with df $= c - 1$. Statistic (1.15) is called the Pearson chi-squared statistic.

1.5.3 Example: Testing Mendel's Theories

Among its many applications, Pearson's test was used in genetics to test Mendel's theories of natural inheritance. Mendel crossed pea plants of pure yellow strain with plants of pure green strain. He predicted that second-generation hybrid seeds would be 75% yellow and 25% green, yellow being the dominant strain. One experiment produced $n = 8023$ seeds, of which $n_1 = 6022$ were yellow and $n_2 = 2001$ were green. The expected frequencies for $H_0$: $\pi_{10} = 0.75$, $\pi_{20} = 0.25$ are $\mu_1 = 8023(0.75) = 6017.25$ and $\mu_2 = 2005.75$. The Pearson statistic $X^2 = 0.015$ (df $= 1$) has a P-value of $P = 0.90$. This does not contradict Mendel's hypothesis.

Mendel performed several experiments of this type. In 1936, R. A. Fisher summarized Mendel's results.
He used the reproductive property of chi-squared: If $X_1^2, \ldots, X_k^2$ are independent chi-squared statistics with degrees of freedom $\nu_1, \ldots, \nu_k$, then $\sum_i X_i^2$ has a chi-squared distribution with df $= \sum_i \nu_i$. Fisher obtained a summary chi-squared statistic equal to 42, with df $= 84$. A chi-squared distribution with df $= 84$ has mean 84 and standard deviation $(2 \times 84)^{1/2} = 13.0$, and the right-tailed probability above 42 is $P = 0.99996$. In other words, the chi-squared statistic was so small that the fit seemed too good.
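The Pearson statistic for the single Mendel experiment described above (6022 yellow and 2001 green seeds among 8023, with null probabilities 0.75 and 0.25) can be reproduced with a short Python sketch. The function names are illustrative only. Because df = 1 here, the chi-squared tail probability is obtained from the standard normal tail via the complementary error function, so no external library is assumed.

```python
from math import erfc, sqrt


def pearson_chi_squared(counts, null_probs):
    """Return Pearson's X^2 and the expected frequencies mu_j = n * pi_j0."""
    n = sum(counts)
    expected = [n * p for p in null_probs]
    x2 = sum((obs - mu) ** 2 / mu for obs, mu in zip(counts, expected))
    return x2, expected


def chi2_df1_upper_tail(x):
    """P(chi-squared with df = 1 >= x), using chi2_1 = Z^2 for standard normal Z."""
    return erfc(sqrt(x / 2.0))


# Mendel's experiment: yellow and green seed counts, H0: (0.75, 0.25).
x2, expected = pearson_chi_squared([6022, 2001], [0.75, 0.25])
p_value = chi2_df1_upper_tail(x2)
print(expected, round(x2, 3), round(p_value, 2))
```

For tables with more than two categories the df = 1 shortcut no longer applies, and a general chi-squared distribution function from a statistics library would be needed instead.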
Fisher commented: "The general level of agreement between Mendel's expectations and his reported results shows that it is closer than would be expected in the best of several thousand repetitions.... I have no doubt that Mendel was deceived by a gardening assistant, who knew only too well what his principal expected from each trial made." In a letter written at the time (see Box 1978, p. 297), he stated: "Now, when data have been faked, I know very well how generally people underestimate the frequency of wide chance deviations, so that the tendency is always to make them agree too well with expectations."

In summary, goodness-of-fit tests can reveal not only when a fit is inadequate, but also when it is better than random fluctuations would have us expect. [R. A. Fisher's daughter, Joan Fisher Box (1978), and Freedman et al. (1978, p. 478) discussed Fisher's analysis of Mendel's data and the accompanying controversy. Despite possible difficulties with Mendel's data, subsequent work led to general acceptance of his theories.]

1.5.4 Chi-Squared Theoretical Justification*

We now outline why Pearson's statistic has a limiting chi-squared distribution. For a multinomial sample $(n_1, \ldots, n_c)$ of size $n$, the marginal distribution of $n_j$ is the bin$(n, \pi_j)$ distribution. For large $n$, by the normal approximation to the binomial, $n_j$ (and $\hat{\pi}_j = n_j / n$) have approximate normal distributions. More generally, by the central limit theorem, the sample proportions $\hat{\pi} = (n_1/n, \ldots, n_{c-1}/n)'$ have an approximate multivariate normal distribution (see Chapter 14). Let $\Sigma_0$ denote the null covariance matrix of $\sqrt{n}\,\hat{\pi}$, and let $\pi_0 = (\pi_{10}, \ldots, \pi_{c-1,0})'$. Under $H_0$, since $\sqrt{n}(\hat{\pi} - \pi_0)$ converges to a $N(0, \Sigma_0)$ distribution, the quadratic form

$$n(\hat{\pi} - \pi_0)' \Sigma_0^{-1} (\hat{\pi} - \pi_0) \tag{1.16}$$

has distribution converging to chi-squared with df $= c - 1$.

In Chapter 14 we show that the covariance matrix of $\sqrt{n}\,\hat{\pi}$ has elements

$$\sigma_{jk} = \begin{cases} -\pi_j \pi_k & \text{if } j \ne k \\ \pi_j (1 - \pi_j) & \text{if } j = k. \end{cases}$$

The matrix $\Sigma_0^{-1}$ has $(j, k)$th element $1/\pi_{c0}$ when $j \ne k$ and $(1/\pi_{j0} + 1/\pi_{c0})$ when $j = k$.
(You can verify this by showing that $\Sigma_0 \Sigma_0^{-1}$ equals the identity matrix.) With this substitution, direct calculation (with appropriate combining of terms) shows that (1.16) simplifies to $X^2$. In Section 14.3 we provide a formal proof in a more general setting.

This argument is similar to Pearson's in 1900. R. A. Fisher (1922) gave a simpler justification, the gist of which follows: Suppose that $(n_1, \ldots, n_c)$ are independent Poisson random variables with means $(\mu_1, \ldots, \mu_c)$. For large $\{\mu_j\}$, the standardized values $\{z_j = (n_j - \mu_j)/\sqrt{\mu_j}\}$ have approximate standard normal distributions. Thus, $\sum_j z_j^2 = X^2$ has an approximate chi-squared distribution with $c$ degrees of freedom. Adding the single linear constraint $\sum_j (n_j - \mu_j) = 0$, thus converting the Poisson distributions to a multinomial, we lose a degree of freedom.

When $c = 2$, Pearson's $X^2$ simplifies to the square of the normal score statistic (1.11). For Mendel's data, $\hat{\pi}_1 = 6022/8023$, $\pi_{10} = 0.75$, $n = 8023$, and $z_S = 0.123$, for which $X^2 = (0.123)^2 = 0.015$. In fact, for general $c$ the Pearson test is the score test about multinomial parameters.

1.5.5 Likelihood-Ratio Chi-Squared

An alternative test for multinomial parameters uses the likelihood-ratio test. The kernel of the multinomial likelihood is (1.14). Under $H_0$ the likelihood is maximized when $\hat{\pi}_j = \pi_{j0}$. In the general case, it is maximized when $\hat{\pi}_j = n_j / n$. The ratio of the likelihoods equals

$$\Lambda = \frac{\prod_j (\pi_{j0})^{n_j}}{\prod_j (n_j / n)^{n_j}}.$$

Thus, the likelihood-ratio statistic, denoted by $G^2$, is

$$G^2 = -2 \log \Lambda = 2 \sum_j n_j \log \frac{n_j}{n \pi_{j0}}. \tag{1.17}$$

This statistic, which has form (1.12), is called the likelihood-ratio chi-squared statistic. The larger the value of $G^2$, the greater the evidence against $H_0$.

In the general case, the parameter space consists of $\{\pi_j\}$ subject to $\sum_j \pi_j = 1$, so the dimensionality is $c - 1$. Under $H_0$, the $\{\pi_j\}$ are specified completely, so the dimension is 0. The difference in these dimensions equals $c - 1$. For large $n$, $G^2$ has a chi-squared null distribution with df $= c - 1$.

When $H_0$ holds, the Pearson $X^2$ and the likelihood ratio $G^2$ both have asymptotic chi-squared distributions with df $= c - 1$. In fact, they are asymptotically equivalent in that case; specifically, $X^2 - G^2$ converges in probability to zero (see Chapter 14). When $H_0$ is false, they tend to grow proportionally to $n$; they need not take similar values, however, even for very large $n$.

For fixed $c$, as $n$ increases the distribution of $X^2$ usually converges to chi-squared more quickly than that of $G^2$. The chi-squared approximation is usually poor for $G^2$ when $n/c < 5$.
When $c$ is large, it can be decent for $X^2$ for $n/c$ as small as 1 if the table does not contain both very small and moderately large expected frequencies. We provide further guidelines later in the book. Alternatively, one can use the multinomial probabilities to generate exact distributions of these test statistics (Good et al.).
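To see how $X^2$ and $G^2$ compare in practice, the following Python sketch evaluates both statistics for Mendel's large sample (6022 yellow, 2001 green; null probabilities 0.75 and 0.25) and for a small sample. The small-sample counts (1 and 9 with equal null probabilities) are hypothetical, chosen only to show that the two statistics can differ noticeably when $n$ is small. The function names are illustrative.

```python
from math import log


def pearson_x2(counts, null_probs):
    """Pearson statistic: sum over cells of (n_j - mu_j)^2 / mu_j."""
    n = sum(counts)
    return sum((obs - n * p) ** 2 / (n * p) for obs, p in zip(counts, null_probs))


def likelihood_ratio_g2(counts, null_probs):
    """Likelihood-ratio statistic: 2 * sum over cells of n_j * log(n_j / mu_j).

    Cells with n_j = 0 contribute 0, the limit of n_j * log(n_j) as n_j -> 0.
    """
    n = sum(counts)
    return 2.0 * sum(
        obs * log(obs / (n * p)) for obs, p in zip(counts, null_probs) if obs > 0
    )


# Large sample (Mendel's data): the two statistics nearly coincide.
x2_mendel = pearson_x2([6022, 2001], [0.75, 0.25])
g2_mendel = likelihood_ratio_g2([6022, 2001], [0.75, 0.25])

# Hypothetical small sample: the two statistics differ noticeably.
x2_small = pearson_x2([1, 9], [0.5, 0.5])
g2_small = likelihood_ratio_g2([1, 9], [0.5, 0.5])

print(round(x2_mendel, 4), round(g2_mendel, 4))
print(round(x2_small, 3), round(g2_small, 3))
```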
1.5.6 Testing with Estimated Expected Frequencies

Pearson's $X^2$ (1.15) compares a sample distribution to a hypothetical one $\{\pi_{j0}\}$. In some applications, $\{\pi_{j0} = \pi_{j0}(\theta)\}$ are functions of a smaller set of unknown parameters $\theta$. ML estimates $\hat{\theta}$ of $\theta$ determine ML estimates $\{\pi_{j0}(\hat{\theta})\}$ of $\{\pi_{j0}\}$ and hence ML estimates $\{\hat{\mu}_j = n \pi_{j0}(\hat{\theta})\}$ of expected frequencies in $X^2$. Replacing $\{\mu_j\}$ by estimates $\{\hat{\mu}_j\}$ affects the distribution of $X^2$. When dim$(\theta) = p$, the true df $= (c - 1) - p$ (see Chapter 14). Pearson failed to realize this (see Chapter 16).

We now show a goodness-of-fit test with estimated expected frequencies. A sample of 156 dairy calves born in Okeechobee County, Florida, were classified according to whether they caught pneumonia within 60 days of birth. Calves that got a pneumonia infection were also classified according to whether they got a secondary infection within 2 weeks after the first infection cleared up. Table 1.1 shows the data. Calves that did not get a primary infection could not get a secondary infection, so no observations can fall in the category for "no" primary infection and "yes" secondary infection. That combination is called a structural zero.

A goal of this study was to test whether the probability of primary infection was the same as the conditional probability of secondary infection, given that the calf got the primary infection. In other words, if $\pi_{ab}$ denotes the probability that a calf is classified in row $a$ and column $b$ of this table, the null hypothesis is

$$H_0: \pi_{11} + \pi_{12} = \pi_{11} / (\pi_{11} + \pi_{12})$$

or $\pi_{11} = (\pi_{11} + \pi_{12})^2$. Let $\pi = \pi_{11} + \pi_{12}$ denote the probability of primary infection. The null hypothesis states that the probabilities satisfy the structure that Table 1.2 shows; that is, probabilities in a trinomial for the categories (yes-yes, yes-no, no-no) for primary-secondary infection equal $(\pi^2, \pi(1 - \pi), 1 - \pi)$.

Let $n_{ab}$ denote the number of observations in category $(a, b)$. The ML estimate of $\pi$ is the value maximizing the kernel of the multinomial likelihood

$$(\pi^2)^{n_{11}} (\pi - \pi^2)^{n_{12}} (1 - \pi)^{n_{22}}.$$

TABLE 1.1 Primary and Secondary Pneumonia Infections in Calves

                      Secondary Infection (a)
Primary Infection     Yes           No
Yes                   30 (38.1)     63 (39.0)
No                    0 (-)         63 (78.9)

Source: Data courtesy of Thang Tran and G. A. Donovan, College of Veterinary Medicine, University of Florida.
(a) Values in parentheses are estimated expected frequencies.
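The fit for Table 1.1 can be reproduced with a short Python sketch. It uses the closed-form ML estimate $\hat{\pi} = (2n_{11} + n_{12})/(2n_{11} + 2n_{12} + n_{22})$, which the text derives just below from the likelihood equation; the variable and function names are illustrative only. As a sanity check, the sketch also confirms that the log likelihood at $\hat{\pi}$ is at least as large as at nearby values. Since df = 1, the chi-squared tail probability is computed from the normal tail via `erfc`.

```python
from math import erfc, log, sqrt

# Observed counts from Table 1.1: (yes, yes), (yes, no), (no, no).
n11, n12, n22 = 30, 63, 63
n = n11 + n12 + n22


def log_likelihood(pi):
    """Trinomial log-likelihood kernel under H0 with cell probabilities
    (pi^2, pi - pi^2, 1 - pi)."""
    return n11 * log(pi**2) + n12 * log(pi - pi**2) + n22 * log(1 - pi)


# Closed-form ML estimate of pi under H0.
pi_hat = (2 * n11 + n12) / (2 * n11 + 2 * n12 + n22)

# Estimated expected frequencies and Pearson's statistic.
observed = [n11, n12, n22]
expected = [n * pi_hat**2, n * (pi_hat - pi_hat**2), n * (1 - pi_hat)]
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# c = 3 categories, p = 1 estimated parameter.
df = (3 - 1) - 1
p_value = erfc(sqrt(x2 / 2.0))  # valid because df == 1

print(round(pi_hat, 3), [round(e, 1) for e in expected], round(x2, 1), df)
```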
TABLE 1.2 Probability Structure for Hypothesis

                      Secondary Infection
Primary Infection     Yes        No              Total
Yes                   $\pi^2$    $\pi(1 - \pi)$  $\pi$
No                    -          $1 - \pi$       $1 - \pi$

The log likelihood is

$$L(\pi) = n_{11} \log \pi^2 + n_{12} \log(\pi - \pi^2) + n_{22} \log(1 - \pi).$$

Differentiation with respect to $\pi$ gives the likelihood equation

$$\frac{2 n_{11}}{\pi} + \frac{n_{12}}{\pi} - \frac{n_{12}}{1 - \pi} - \frac{n_{22}}{1 - \pi} = 0.$$

The solution is

$$\hat{\pi} = (2 n_{11} + n_{12}) / (2 n_{11} + 2 n_{12} + n_{22}).$$

For Table 1.1, $\hat{\pi} = 0.494$. Since $n = 156$, the estimated expected frequencies are $\hat{\mu}_{11} = n \hat{\pi}^2 = 38.1$, $\hat{\mu}_{12} = n(\hat{\pi} - \hat{\pi}^2) = 39.0$, and $\hat{\mu}_{22} = n(1 - \hat{\pi}) = 78.9$. Table 1.1 shows them. Pearson's statistic is $X^2 = 19.7$. Since the $c = 3$ possible responses have $p = 1$ parameter ($\pi$) determining the expected frequencies, df $= (3 - 1) - 1 = 1$. There is strong evidence against $H_0$ ($P = 0.00001$). Inspection of Table 1.1 reveals that many more calves got a primary infection but not a secondary infection than $H_0$ predicts. The researchers concluded that the primary infection had an immunizing effect that reduced the likelihood of a secondary infection.

NOTES

Section 1.1: Categorical Response Data

1.1. Stevens defined (nominal, ordinal, interval) scales of measurement. Other scales result from mixtures of these types. For instance, partially ordered scales occur when subjects respond to questions having categories ordered except for "don't know" or "undecided" categories.

Section 1.3: Statistical Inference for Categorical Data

1.2. The score method does not use $\hat{\beta}$. Thus, when $\beta$ is a model parameter, one can usually compute the score statistic for testing $H_0$: $\beta = \beta_0$ without fitting the model. This is advantageous when fitting several models in an exploratory analysis and model fitting is computationally intensive. An advantage of the score and likelihood-ratio methods is that
they apply even when $|\hat{\beta}| = \infty$. In that case, one cannot compute the Wald statistic. Another disadvantage of the Wald method is that its results depend on the parameterization; inference based on $\hat{\beta}$ and its SE is not equivalent to inference based on a nonlinear function of it, such as $\log \hat{\beta}$ and its SE.

Section 1.4: Statistical Inference for Binomial Parameters

1.3. Among others, Agresti and Coull (1998), Blyth and Still (1983), Brown et al. (2001), Ghosh (1979), and Newcombe (1998a) showed the superiority of the score interval to the Wald interval for $\pi$. Of the exact methods, Blaker's (2000) has particularly good properties. It is contained in the Clopper-Pearson interval and has a nestedness property whereby an interval of higher nominal confidence level necessarily contains one of lower level.

1.4. Using continuity corrections with large-sample methods provides approximations to exact small-sample methods. Thus, they tend to behave conservatively. We do not present them, since if one prefers an exact method, with modern computational power it can be used directly rather than approximated.

1.5. In theory, one can eliminate problems with discreteness in tests by performing a supplementary randomization on the boundary of a critical region (see Problem 1.19). In rejecting the null at the boundary with a certain probability, one can obtain a fixed overall type I error probability $\alpha$ even when it is not an achievable P-value. For such randomization, the one-sided P-value is

$$\text{randomized P-value} = U \times P(T = t_o) + P(T > t_o),$$

where $U$ denotes a uniform(0, 1) random variable (Stevens). In practice, this is not used, as it is absurd to let this random number influence a decision. The mid-P-value replaces the arbitrary uniform multiple $U \times P(T = t_o)$ by its expected value.

Section 1.5: Statistical Inference for Multinomial Parameters

1.6. The chi-squared distribution has mean df, variance 2df, and skewness $(8/\text{df})^{1/2}$. It is approximately normal when df is large. Greenwood and Nikulin (1996), Kendall and Stuart (1979), and Lancaster presented other properties. Cochran (1952)
presented a historical survey of chi-squared tests of fit. See also Cressie and Read (1989), Koch and Bhapkar (1982), Koehler (1998), and Moore (1986b).

PROBLEMS

Applications

1.1 Identify each variable as nominal, ordinal, or interval.
a. UK political party preference (Labour, Conservative, Social Democrat)
b. Anxiety rating (none, mild, moderate, severe, very severe)
c. Patient survival (in number of months)
d. Clinic location (London, Boston, Madison, Rochester, Montreal)
e. Response of tumor to chemotherapy (complete elimination, partial reduction, stable, growth progression)
f. Favorite beverage (water, juice, milk, soft drink, beer, wine)
g. Appraisal of company's inventory level (too low, about right, too high)

1.2 Each of 100 multiple-choice questions on an exam has four possible answers, one of which is correct. For each question, a student guesses by selecting an answer randomly.
a. Specify the distribution of the student's number of correct answers.
b. Find the mean and standard deviation of that distribution. Would it be surprising if the student made at least 50 correct responses? Why?
c. Specify the distribution of $(n_1, n_2, n_3, n_4)$, where $n_j$ is the number of times the student picked choice $j$.
d. Find $E(n_j)$, var$(n_j)$, cov$(n_j, n_k)$, and corr$(n_j, n_k)$.

1.3 An experiment studies the number of insects that survive a certain dose of an insecticide, using several batches of insects of size $n$ each. The insects are sensitive to factors that vary among batches during the experiment but were not measured, such as temperature level. Explain why the distribution of the number of insects per batch surviving the experiment might show overdispersion relative to a bin$(n, \pi)$ distribution.

1.4 In his autobiography A Sort of Life, British author Graham Greene described a period of severe mental depression during which he played Russian Roulette. This "game" consists of putting a bullet in one of the six chambers of a pistol, spinning the chambers to select one at random, and then firing the pistol once at one's head.
a. Greene played this game six times and was lucky that none of them resulted in a bullet firing. Find the probability of this outcome.
b. Suppose that he had kept playing this game until the bullet fired. Let $Y$ denote the number of the game on which it fires. Show the probability mass function for $Y$, and justify.
1.5 Consider the statement, "Please tell me whether or not you think it should be possible for a pregnant woman to obtain a legal abortion if she is married and does not want any more children." For the 1996 General Social Survey, conducted by the National Opinion Research Center (NORC), 842 replied "yes" and 982 replied "no." Let $\pi$ denote
the population proportion who would reply "yes." Find the P-value for testing $H_0$: $\pi = 0.5$ using the score test, and construct a 95% confidence interval for $\pi$. Interpret the results.

1.6 Refer to the vegetarianism example in Section 1.4. For testing $H_0$: $\pi = 0.5$ against $H_a$: $\pi \ne 0.5$, show that:
a. The likelihood-ratio statistic equals $2[25 \log(25/12.5)] = 34.7$.
b. The chi-squared form of the score statistic equals 25.0.
c. The Wald $z$ or chi-squared statistic is infinite.

1.7 In a crossover trial comparing a new drug to a standard, $\pi$ denotes the probability that the new one is judged better. It is desired to estimate $\pi$ and test $H_0$: $\pi = 0.5$ against $H_a$: $\pi \ne 0.5$. In 20 independent observations, the new drug is better each time.
a. Find and sketch the likelihood function. Give the ML estimate of $\pi$.
b. Conduct a Wald test and construct a 95% Wald confidence interval for $\pi$. Are these sensible?
c. Conduct a score test, reporting the P-value. Construct a 95% score confidence interval. Interpret.
d. Conduct a likelihood-ratio test and construct a likelihood-based 95% confidence interval. Interpret.
e. Construct an exact binomial test and 95% confidence interval. Interpret.
f. Suppose that researchers wanted a sufficiently large sample to estimate the probability of preferring the new drug to within 0.05, with confidence 0.95. If the true probability is 0.90, about how large a sample is needed?

1.8 In an experiment on chlorophyll inheritance in maize, for 1103 seedlings of self-fertilized heterozygous green plants, 854 seedlings were green and 249 were yellow. Theory predicts the ratio of green to yellow is 3:1. Test the hypothesis that 3:1 is the true ratio. Report the P-value, and interpret.

1.9 Table 1.3 contains Ladislaus von Bortkiewicz's data on deaths of soldiers in the Prussian army from kicks by army mules (Fisher 1934; Quine and Seneta). The data refer to 10 army corps, each observed for 20 years. In 109 corps-years of exposure, there were no deaths, in 65 corps-years there was one death, and so on.
Estimate the mean and test whether probabilities of occurrences in these five categories follow a Poisson distribution (truncated for 4 and above).
TABLE 1.3 Data for Problem 1.9

Number of Deaths     Number of Corps-Years
0                    109
1                    65
2                    22
3                    3
>= 4                 1

1.10 A sample of 100 women suffer from dysmenorrhea. A new analgesic is claimed to provide greater relief than a standard one. After using each analgesic in a crossover experiment, 40 reported greater relief with the standard analgesic and 60 reported greater relief with the new one. Analyze these data.

Theory and Methods

1.11 Why is it easier to get a precise estimate of the binomial parameter $\pi$ when it is near 0 or 1 than when it is near $\tfrac{1}{2}$?

1.12 Suppose that $P(Y_i = 1) = 1 - P(Y_i = 0) = \pi$, $i = 1, \ldots, n$, where $\{Y_i\}$ are independent. Let $Y = \sum_i Y_i$.
a. What are var$(Y)$ and the distribution of $Y$?
b. When $\{Y_i\}$ instead have pairwise correlation $\rho > 0$, show that var$(Y) > n\pi(1 - \pi)$, overdispersion relative to the binomial. [Altham discussed generalizations of the binomial that allow correlated trials.]
c. Suppose that heterogeneity exists: $P(Y_i = 1 \mid \pi) = \pi$ for all $i$, but $\pi$ is a random variable with density function $g(\cdot)$ on [0, 1] having mean $\rho$ and positive variance. Show that var$(Y) > n\rho(1 - \rho)$. (When $\pi$ has a beta distribution, $Y$ has the beta-binomial distribution discussed later in the book.)
d. Suppose that $P(Y_i = 1 \mid \pi_i) = \pi_i$, $i = 1, \ldots, n$, where $\{\pi_i\}$ are independent from $g(\cdot)$. Explain why $Y$ has a bin$(n, \rho)$ distribution unconditionally but not conditionally on $\{\pi_i\}$. (Hint: In each case, is $Y$ a sum of independent, identical Bernoulli trials?)

1.13 For a sequence of independent Bernoulli trials, $Y$ is the number of successes before the $k$th failure. Explain why its probability mass
function is the negative binomial,

$$p(y) = \frac{(y + k - 1)!}{y! \, (k - 1)!} \pi^y (1 - \pi)^k, \quad y = 0, 1, 2, \ldots.$$

[For it, $E(Y) = k\pi/(1 - \pi)$ and var$(Y) = k\pi/(1 - \pi)^2$, so var$(Y) > E(Y)$; the Poisson is the limit as $k \to \infty$ and $\pi \to 0$ with $k\pi = \mu$ fixed.]

1.14 For the multinomial distribution, show that

$$\text{corr}(n_j, n_k) = -\pi_j \pi_k / \sqrt{\pi_j (1 - \pi_j) \pi_k (1 - \pi_k)}.$$

Show that corr$(n_1, n_2) = -1$ when $c = 2$.

1.15 Show that the moment generating function (mgf) for the binomial distribution is $m(t) = (1 - \pi + \pi e^t)^n$, and use it to obtain the first two moments. Show that the mgf for the Poisson distribution is $m(t) = \exp\{\mu[\exp(t) - 1]\}$, and use it to obtain the first two moments.

1.16 A likelihood-ratio statistic equals $t_o$. At the ML estimates, show that the data are $\exp(t_o/2)$ times more likely under $H_a$ than under $H_0$.

1.17 Assume that $y_1, y_2, \ldots, y_n$ are independent from a Poisson distribution.
a. Obtain the likelihood function. Show that the ML estimator $\hat{\mu} = \bar{y}$.
b. Construct a large-sample test statistic for $H_0$: $\mu = \mu_0$ using (i) the Wald method, (ii) the score method, and (iii) the likelihood-ratio method.
c. Construct a large-sample confidence interval for $\mu$ using (i) the Wald method, (ii) the score method, and (iii) the likelihood-ratio method.

1.18 Inference for Poisson parameters can often be based on connections with binomial and multinomial distributions. Show how to test $H_0$: $\mu_1 = \mu_2$ for two populations based on independent Poisson counts $(y_1, y_2)$, using a corresponding test about a binomial parameter $\pi$. [Hint: Condition on $n = y_1 + y_2$ and identify $\pi = \mu_1/(\mu_1 + \mu_2)$.] How can one construct a confidence interval for $\mu_1/\mu_2$ based on one for $\pi$?

1.19 A researcher routinely tests using a nominal $P$(type I error) $= 0.05$, rejecting $H_0$ if the P-value $\le 0.05$. An exact test using test statistic $T$
has null distribution $P(T = 0) = 0.30$, $P(T = 1) = 0.62$, and $P(T = 2) = 0.08$, where a higher $T$ provides more evidence against the null.
a. With the usual P-value, show that the actual $P$(type I error) $= 0$.
b. With the mid-P-value, show that the actual $P$(type I error) $= 0.08$.
c. Find $P$(type I error) in parts (a) and (b) when $P(T = 0) = 0.30$, $P(T = 1) = 0.66$, $P(T = 2) = 0.04$. Note that the test with mid-P-value can be conservative or liberal. The exact test with ordinary P-value cannot be liberal.
d. In part (a), a randomized-decision test generates a uniform random variable $U$ from [0, 1] and rejects $H_0$ when $T = 2$ and $U \le \tfrac{5}{8}$. Show the actual $P$(type I error) $= 0.05$. Is this a sensible test?

1.20 For a binomial parameter $\pi$, show how the inversion process for constructing a confidence interval works with (a) the Wald test, and (b) the score test.

1.21 For a flip of a coin, let $\pi$ denote the probability of a head. An experiment tests $H_0$: $\pi = 0.5$ against $H_a$: $\pi \ne 0.5$, using $n = 5$ independent flips.
a. Show that the true null probability of rejecting $H_0$ at the 0.05 significance level is 0.0 for the exact binomial test and $\tfrac{1}{16}$ using the large-sample score test.
b. Suppose that truly $\pi = 0.5$. Explain why the probability that the 95% Clopper-Pearson confidence interval contains $\pi$ equals 1.0. (Hint: Is there any possible $y$ for which both one-sided tests of $H_0$: $\pi = 0.5$ have P-value $\le 0.05$?)

1.22 Consider the Wald confidence interval for a binomial parameter $\pi$. Since it is degenerate when $\hat{\pi} = 0$ or 1, argue that for $0 < \pi < 1$ the probability the interval covers $\pi$ cannot exceed $[1 - \pi^n - (1 - \pi)^n]$; hence, the infimum of the coverage probability over $0 < \pi < 1$ equals 0, regardless of $n$.

1.23 Consider the 95% binomial score confidence interval for $\pi$. When $y = 1$, show that the lower limit is approximately $0.18/n$; in fact, $0 < \pi < 0.18/n$ then falls in an interval only when $y = 0$.
Argue that for large $n$ and $\pi$ just barely below $0.18/n$ or just barely above $1 - 0.18/n$, the actual coverage probability is about $e^{-0.18} \approx 0.84$. Hence, even as $n \to \infty$, this method is not guaranteed to have coverage probability $\ge 0.95$ (Agresti and Coull 1998; Blyth and Still 1983).

1.24 From Section 1.4.2 the midpoint $\tilde{\pi}$ of the score confidence interval for $\pi$ is the sample proportion for an adjusted data set that adds $z_{\alpha/2}^2/2$
observations of each type to the sample. This motivates an adjusted Wald interval,

$$\tilde{\pi} \pm z_{\alpha/2} \sqrt{\tilde{\pi}(1 - \tilde{\pi})/n^*}, \quad \text{where } n^* = n + z_{\alpha/2}^2.$$

Show that the variance $\tilde{\pi}(1 - \tilde{\pi})/n^*$ at the weighted average is at least as large as the weighted average of the variances that appears under the square root sign in the score interval. (Hint: Use Jensen's inequality.) Thus, this interval contains the score interval. [Agresti and Coull (1998) and Brown et al. (2001) showed that it performs much better than the Wald interval. It does not have the score interval's disadvantage (Problem 1.23) of poor coverage near 0 and 1.]

1.25 A binomial sample of size $n$ has $y = 0$ successes.
a. Show that the confidence interval for $\pi$ based on the likelihood function is $[0.0, 1 - \exp(-z_{\alpha/2}^2/2n)]$. For $\alpha = 0.05$, use the expansion of an exponential function to show that this is approximately $[0, 2/n]$.
b. For the score method, show that the confidence interval is $[0, z_{\alpha/2}^2/(n + z_{\alpha/2}^2)]$, or approximately $[0, 4/(n + 4)]$ when $\alpha = 0.05$.
c. For the Clopper-Pearson approach, show that the upper bound is $1 - (\alpha/2)^{1/n}$, or approximately $-\log(0.025)/n = 3.69/n$ when $\alpha = 0.05$.
d. For the adaptation of the Clopper-Pearson approach using the mid-P-value, show that the upper bound is $1 - \alpha^{1/n}$, or approximately $-\log(0.05)/n = 3/n$ when $\alpha = 0.05$.

1.26 For the geometric distribution $p(y) = \pi^y (1 - \pi)$, $y = 0, 1, 2, \ldots$, show that the tail method for constructing a confidence interval [i.e., equating $P(Y \ge y)$ and $P(Y \le y)$ to $\alpha/2$] yields $[(\alpha/2)^{1/y}, (1 - \alpha/2)^{1/(y+1)}]$. Show that all $\pi$ between 0 and $1 - \alpha/2$ never fall above a confidence interval, and hence the actual coverage probability exceeds $1 - \alpha/2$ over this region.

1.27 A statistic $T$ has discrete distribution with cdf $F(t)$. Show that $F(T)$ is stochastically larger than uniform over [0, 1]; that is, its cdf is everywhere no greater than that of the uniform (Casella and Berger 2001, p. 77). Explain why an implication is that a P-value based on $T$ has null distribution that is stochastically larger than uniform.

1.28 Suppose that $P(T = t_j) = \pi_j$, $j = 1, \ldots$. Show that $E$(mid-P-value) $= 0.5$. [Hint: Show that $\sum_j \pi_j(\pi_j/2 + \pi_{j+1} + \cdots) = (\sum_j \pi_j)^2/2$.]
1.29 For a statistic $T$ with cdf $F(t)$ and $p(t) = P(T = t)$, the mid-distribution function is $F_{\text{mid}}(t) = F(t) - 0.5\,p(t)$ (Parzen). Given $T = t_o$, show that the mid-P-value equals $1 - F_{\text{mid}}(t_o)$. (It also satisfies $E[F_{\text{mid}}(T)] = 0.5$ and var$[F_{\text{mid}}(T)] = (1/12)\{1 - E[p^2(T)]\}$.)

1.30 Genotypes AA, Aa, and aa occur with probabilities $[\theta^2, 2\theta(1 - \theta), (1 - \theta)^2]$. A multinomial sample of size $n$ has frequencies $(n_1, n_2, n_3)$ of these three genotypes.
a. Form the log likelihood. Show that $\hat{\theta} = (2n_1 + n_2)/(2n_1 + 2n_2 + 2n_3)$.
b. Show that $-\partial^2 L(\theta)/\partial \theta^2 = [(2n_1 + n_2)/\theta^2] + [(n_2 + 2n_3)/(1 - \theta)^2]$ and that its expectation is $2n/\theta(1 - \theta)$. Use this to obtain an asymptotic standard error of $\hat{\theta}$.
c. Explain how to test whether the probabilities truly have this pattern.

1.31 Refer to Section 1.5.6. Using the likelihood function to obtain the information, find the approximate standard error of $\hat{\pi}$.

1.32 Refer to Section 1.5.6. Let $a$ denote the number of calves that got a primary, secondary, and tertiary infection, $b$ the number that received a primary and secondary but not a tertiary infection, $c$ the number that received a primary but not a secondary infection, and $d$ the number that did not receive a primary infection. Let $\pi$ be the probability of a primary infection. Consider the hypothesis that the probability of infection at time $t$, given infection at times $1, \ldots, t - 1$, is also $\pi$, for $t = 2, 3$. Show that $\hat{\pi} = (3a + 2b + c)/(3a + 3b + 2c + d)$.

1.33 Refer to quadratic form (1.16).
a. Verify that the matrix quoted in the text for $\Sigma_0^{-1}$ is the inverse of $\Sigma_0$.
b. Show that (1.16) simplifies to Pearson's statistic (1.15).
c. For the $z_S$ statistic (1.11), show that $z_S^2 = X^2$ for $c = 2$.

1.34 For testing $H_0$: $\pi_j = \pi_{j0}$, $j = 1, \ldots, c$, using sample multinomial proportions $\{\hat{\pi}_j\}$, the likelihood-ratio statistic (1.17) is

$$G^2 = -2n \sum_j \hat{\pi}_j \log(\pi_{j0}/\hat{\pi}_j).$$

Show that $G^2 \ge 0$, with equality if and only if $\hat{\pi}_j = \pi_{j0}$ for all $j$. (Hint: Apply Jensen's inequality to $E(-2n \log X)$, where $X$ equals $\pi_{j0}/\hat{\pi}_j$ with probability $\hat{\pi}_j$.)
31 PROBLEMS 35 Ž. Use t to prove the reproductve property of the ch-squared dstrbuton. y r The ch-squared mgf wth df s s mt s 1 y t, for t For the multnomal Žn, 4. j dstrbuton wth c, confdence lmts for are the solutons of j ˆj j Ž r c. jž j. y s z 1 y rn, j s 1,...,c. a. Usng the Bonferron nequalty, argue that these c ntervals smultaneously contan all 4 Ž for large samples. j wth probablty at least 1 y. b. Show that the standard devaton of ˆ y s w q y Ž j ˆk j k jy. x k rn. For large n, explan why the probablty s at least 1 y that the Wald confdence ntervals 1r j k r a½ j k j k 5 ˆ y ˆ z ˆ q ˆ y ˆ y ˆ rn smultaneously contan the a s ccy Ž 1. r dfferences y 4 j k Ž see Ftzpatrck and Scott 1987; Goodman
Categorical Data Analysis, Second Edition. Alan Agresti. Copyright 2002 John Wiley & Sons, Inc.

CHAPTER 4

Introduction to Generalized Linear Models

In Chapters 2 and 3 we focused on methods for two-way contingency tables. Most studies, however, have several explanatory variables, and they may be continuous as well as categorical. The goal is usually to describe their effects on response variables. Modeling the effects helps us do this efficiently. A good-fitting model evaluates effects, includes relevant interactions, and provides smoothed estimates of response probabilities. The rest of the book focuses on model building for categorical response variables.

In this chapter we introduce a family of generalized linear models that contains the most important models for categorical responses as well as standard models for continuous responses. Section 4.1 covers three components common to all generalized linear models. Section 4.2 illustrates with models for binary responses. The most important case is logistic regression, a linear model for the logit transformation of a binomial parameter. In Chapters 5 through 7 we study these models in detail. In Section 4.3 we present generalized linear models for counts. A Poisson regression model called a loglinear model is a linear model for the log of a Poisson mean. In Chapters 8 and 9 we study them for modeling counts in contingency tables.

Sections 4.4 through 4.8 are more technical. Readers wanting mainly an overview of methods can skip them or read them lightly. For generalized linear models, Section 4.4 covers likelihood equations and the asymptotic covariance matrix of ML model parameter estimates, and Section 4.5 summarizes inferential methods. Methods of solving the likelihood equations are presented in Section 4.6. In the final two sections we introduce generalizations, quasi-likelihood and generalized additive models, that further extend the scope of models.
4.1 GENERALIZED LINEAR MODEL

Generalized linear models (GLMs) extend ordinary regression models to encompass nonnormal response distributions and modeling functions of the mean. Three components specify a generalized linear model: A random component identifies the response variable $Y$ and its probability distribution; a systematic component specifies explanatory variables used in a linear predictor function; and a link function specifies the function of $E(Y)$ that the model equates to the systematic component. Nelder and Wedderburn (1972) introduced the class of GLMs, although many models in the class were well established by then.

4.1.1 Components of Generalized Linear Models

The random component of a GLM consists of a response variable $Y$ with independent observations $(y_1, \ldots, y_N)$ from a distribution in the natural exponential family. This family has probability density function or mass function of form

$$f(y_i; \theta_i) = a(\theta_i) \, b(y_i) \exp[y_i Q(\theta_i)]. \tag{4.1}$$

Several important distributions are special cases, including the Poisson and binomial. The value of the parameter $\theta_i$ may vary for $i = 1, \ldots, N$, depending on values of explanatory variables. The term $Q(\theta)$ is called the natural parameter. In Section 4.4 we present a more general formula that also has a dispersion parameter, but (4.1) is sufficient for basic discrete data models.

The systematic component of a GLM relates a vector $(\eta_1, \ldots, \eta_N)$ to the explanatory variables through a linear model. Let $x_{ij}$ denote the value of predictor $j$ ($j = 1, 2, \ldots, p$) for subject $i$. Then

$$\eta_i = \sum_j \beta_j x_{ij}, \quad i = 1, \ldots, N.$$

This linear combination of explanatory variables is called the linear predictor. Usually, one $x_{ij} = 1$ for all $i$, for the coefficient of an intercept (often denoted by $\alpha$) in the model.

The third component of a GLM is a link function that connects the random and systematic components. Let $\mu_i = E(Y_i)$, $i = 1, \ldots, N$. The model links $\mu_i$ to $\eta_i$ by $\eta_i = g(\mu_i)$, where the link function $g$ is a monotonic, differentiable function. Thus, $g$ links $E(Y_i)$ to explanatory variables through the formula

$$g(\mu_i) = \sum_j \beta_j x_{ij}, \quad i = 1, \ldots, N. \tag{4.2}$$
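The relation between the linear predictor and the mean in (4.2) can be made concrete with a small Python sketch. It is not a model-fitting routine: the coefficients and predictor values are hypothetical, and the names `LINKS`, `linear_predictor`, and `fitted_mean` are introduced here only for illustration. Each link $g$ is paired with its inverse, so that $\mu_i = g^{-1}(\eta_i)$; the three links shown (identity, log, logit) are the ones named in the chapter overview.

```python
from math import exp, log

# Each entry maps a link name to (g, g_inverse).
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (lambda mu: log(mu), lambda eta: exp(eta)),
    "logit": (lambda mu: log(mu / (1.0 - mu)), lambda eta: 1.0 / (1.0 + exp(-eta))),
}


def linear_predictor(beta, x_row):
    """Systematic component: eta_i = sum_j beta_j * x_ij."""
    return sum(b * x for b, x in zip(beta, x_row))


def fitted_mean(beta, x_row, link_name):
    """Invert the link: mu_i = g^{-1}(eta_i)."""
    _, inverse = LINKS[link_name]
    return inverse(linear_predictor(beta, x_row))


# Hypothetical coefficients (intercept, slope) and one subject's predictor row;
# the leading 1.0 is the constant predictor that carries the intercept.
beta = [-1.0, 0.5]
x_row = [1.0, 2.0]

eta = linear_predictor(beta, x_row)
means = {name: fitted_mean(beta, x_row, name) for name in LINKS}
print(eta, means)
```

The same linear predictor thus yields different means depending on the link, which is why the choice of link is part of the model specification rather than a property of the data.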
The link function $g(\mu) = \mu$, called the identity link, has $\eta_i = \mu_i$. It specifies a linear model for the mean itself. This is the link function for ordinary regression with normally distributed $Y$.

The link function that transforms the mean to the natural parameter is called the canonical link. For it, $g(\mu_i) = Q(\theta_i)$, and $Q(\theta_i) = \sum_j \beta_j x_{ij}$. The following subsections show examples.

In summary, a GLM is a linear model for a transformed mean of a response variable that has distribution in the natural exponential family. We now illustrate the three components by introducing the key GLMs for discrete response variables.

4.1.2 Binomial Logit Models for Binary Data

Many response variables are binary. Represent the success and failure outcomes by 1 and 0. The Bernoulli distribution for this Bernoulli trial specifies probabilities $P(Y = 1) = \pi$ and $P(Y = 0) = 1 - \pi$, for which $E(Y) = \pi$. This is the special case of the binomial (1.1) with $n = 1$. The probability mass function is

\[ f(y; \pi) = \pi^y (1 - \pi)^{1-y} = (1 - \pi)\left[\frac{\pi}{1 - \pi}\right]^y = (1 - \pi) \exp\left[ y \log\left(\frac{\pi}{1 - \pi}\right) \right] \tag{4.3} \]

for $y = 0$ and 1. This is in the natural exponential family (4.1), identifying $\theta$ with $\pi$, $a(\pi) = 1 - \pi$, $b(y) = 1$, and $Q(\pi) = \log[\pi/(1 - \pi)]$. The natural parameter $\log[\pi/(1 - \pi)]$ is the log odds of response 1, the logit of $\pi$. This is the canonical link. GLMs using the logit link are often called logit models.

4.1.3 Poisson Loglinear Models for Count Data

Some response variables have counts as their possible outcomes. For a sample of silicon wafers used in manufacturing computer chips, each observation might be the number of imperfections on a wafer. Counts also occur as entries in contingency tables.

The simplest distribution for count data is the Poisson. Like counts, Poisson variates can take any nonnegative integer value. Let $Y$ denote a count and let $\mu = E(Y)$. The Poisson probability mass function (1.4) for $Y$ is

\[ f(y; \mu) = \frac{e^{-\mu} \mu^y}{y!} = \exp(-\mu) \left(\frac{1}{y!}\right) \exp(y \log \mu), \quad y = 0, 1, 2, \ldots. \]

This has natural exponential form (4.1) with $\theta = \mu$, $a(\mu) = \exp(-\mu)$, $b(y) = 1/y!$, and $Q(\mu) = \log \mu$. The natural parameter is $\log \mu$, so the canonical
link function is the log link, $\eta = \log \mu$. The model using this link is

\[ \log \mu_i = \sum_j \beta_j x_{ij}, \quad i = 1, \ldots, N. \tag{4.4} \]

This model is called a Poisson loglinear model.

TABLE 4.1 Types of Generalized Linear Models for Statistical Analysis

Random                             Systematic
Component     Link                 Component     Model                    Chapters
Normal        Identity             Continuous    Regression
Normal        Identity             Categorical   Analysis of variance
Normal        Identity             Mixed         Analysis of covariance
Binomial      Logit                Mixed         Logistic regression      5 and 6
Poisson       Log                  Mixed         Loglinear                8 and 9
Multinomial   Generalized logit    Mixed         Multinomial response     7

4.1.4 Generalized Linear Models for Continuous Responses

The class of GLMs also includes models for continuous responses. The normal distribution is in a natural exponential family that includes dispersion parameters. Its natural parameter is the mean. Therefore, an ordinary regression model for $E(Y)$ is a GLM using the identity link. Table 4.1 lists this and other standard models for a normal random component. The table also lists GLMs for discrete responses that are presented in the next six chapters.

A traditional way to analyze data transforms $Y$ so that it has approximately a normal distribution with constant variance; then, ordinary least-squares regression is applicable. With GLMs, by contrast, the choice of link function is separate from the choice of random component. If a link is useful in the sense that a linear model for the predictors is plausible for that link, it is not necessary that it also stabilizes variance or produces normality. This is because the fitting process maximizes the likelihood for the choice of distribution for $Y$, and that choice is not restricted to normality.

4.1.5 Deviance

For a particular GLM for observations $y = (y_1, \ldots, y_N)$, let $L(\mu; y)$ denote the log-likelihood function expressed in terms of the means $\mu = (\mu_1, \ldots, \mu_N)$. Let $L(\hat{\mu}; y)$ denote the maximum of the log likelihood for the model. Considered for all possible models, the maximum achievable log likelihood is
$L(y; y)$. This occurs for the most general model, having a separate parameter for each observation and the perfect fit $\hat{\mu} = y$. Such a model is called the saturated model. This model is not useful, since it does not provide data reduction. However, it serves as a baseline for comparison with other model fits.

The deviance of a Poisson or binomial GLM is defined to be

\[ -2[L(\hat{\mu}; y) - L(y; y)]. \]

This is the likelihood-ratio statistic for testing the null hypothesis that the model holds against the general alternative (i.e., the saturated model). For some Poisson and binomial GLMs, the number of observations $N$ stays fixed as the individual counts increase in size. Then the deviance has a chi-squared asymptotic null distribution. The df $= N - p$, where $p$ is the number of model parameters; that is, df equals the difference between the numbers of parameters in the saturated and unsaturated models. The deviance then provides a test of model fit.

An example is binomial counts at $N$ fixed settings of predictors when the number of trials at each setting increases. Let $Y_i$ be bin$(n_i, \pi_i)$, $i = 1, \ldots, N$. Consider the simple model of homogeneity, $\pi_i = \alpha$ for all $i$. It has $p = 1$ parameter. The saturated model makes no assumption about $\{\pi_i\}$, letting them be any $N$ values between 0 and 1.0. It has $N$ parameters. The deviance for the homogeneity model has df $= N - 1$. In fact, it equals the $G^2$ likelihood-ratio statistic for testing independence in the $N \times 2$ table that these samples form. Under independence, it has approximately a chi-squared distribution as the $\{n_i\}$ increase, for fixed $N$.

We use the deviance throughout the book for model checking and for inferential comparisons of models. Components of the deviance are residual measures of lack of fit. Methods for analyzing the deviance generalize analysis of variance methods for normal linear models.

4.1.6 Advantages of the GLM Formulation

GLMs provide a unified theory of modeling that encompasses the most important models for continuous and discrete variables.
Models studied in this text are GLMs with binomial or Poisson random component, or multivariate extensions of GLMs. The ML parameter estimates are computed with an algorithm, presented in Section 4.6, that iteratively uses a weighted version of least squares. The reason for restricting GLMs to the exponential family of distributions for Y is that the same algorithm applies to this entire family, for any choice of link function. Most statistical software has the facility to fit GLMs. Appendix A gives details.
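The deviance defined earlier in this section can be checked numerically for the binomial homogeneity model. The following is a minimal sketch, assuming hypothetical counts (the `successes` and `trials` lists are invented for illustration). It computes the deviance as minus twice the difference between the log likelihood of the one-parameter model and that of the saturated model, and compares it with the likelihood-ratio statistic for independence in the corresponding table with N rows and 2 columns.

```python
import math

def binomial_loglik(successes, trials, probs):
    """Binomial log likelihood, omitting the binomial coefficients
    (they are the same for every model and cancel in the deviance)."""
    total = 0.0
    for y, n, p in zip(successes, trials, probs):
        if y > 0:
            total += y * math.log(p)
        if n - y > 0:
            total += (n - y) * math.log(1.0 - p)
    return total

def g_squared(successes, trials):
    """Likelihood-ratio statistic 2 * sum(observed * log(observed / expected))
    for independence in the N x 2 table of successes and failures."""
    pooled = sum(successes) / sum(trials)
    g2 = 0.0
    for y, n in zip(successes, trials):
        for observed, expected in ((y, n * pooled), (n - y, n * (1.0 - pooled))):
            if observed > 0:
                g2 += 2.0 * observed * math.log(observed / expected)
    return g2

# Hypothetical data: successes out of n_i trials at N = 4 predictor settings.
successes = [12, 18, 25, 30]
trials = [50, 50, 50, 50]

# Homogeneity model: one common probability, so p = 1 parameter.
pooled = sum(successes) / sum(trials)
loglik_model = binomial_loglik(successes, trials, [pooled] * len(trials))

# Saturated model: a separate probability per setting, fitted by the sample proportions.
sample_props = [y / n for y, n in zip(successes, trials)]
loglik_saturated = binomial_loglik(successes, trials, sample_props)

deviance = -2.0 * (loglik_model - loglik_saturated)
df = len(trials) - 1          # N - p
g2 = g_squared(successes, trials)
```

Comparing `deviance` with a chi-squared distribution on `df` degrees of freedom would then give the test of fit described in the text; that step is left out to keep the sketch free of external libraries.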
4.2 GENERALIZED LINEAR MODELS FOR BINARY DATA

Let $Y$ denote a binary response variable. For instance, $Y$ might indicate vote in a British election (Labour, Conservative), choice of automobile (domestic, import), or diagnosis of breast cancer (present, absent). Each observation has one of two outcomes, denoted by 0 and 1, binomial for a single trial. The mean $E(Y) = P(Y = 1)$. We denote $P(Y = 1)$ by $\pi(x)$, reflecting its dependence on values $x = (x_1, \ldots, x_p)$ of predictors. The variance of $Y$ is

\[ \mathrm{var}(Y) = \pi(x)[1 - \pi(x)], \]

the binomial variance for one trial. In introducing GLMs for binary data, for simplicity we use a single explanatory variable.

4.2.1 Linear Probability Model

For a binary response, the regression model

\[ \pi(x) = \alpha + \beta x \tag{4.5} \]

is called a linear probability model. With independent observations it is a GLM with binomial random component and identity link function.

The linear probability model has a major structural defect. Probabilities fall between 0 and 1, but linear functions take values over the entire real line. Model (4.5) has $\pi(x) < 0$ and $\pi(x) > 1$ for sufficiently large or small $x$ values. For its extension with multiple predictors, difficulties often occur fitting this model because during the fitting process, $\hat{\pi}(x)$ falls outside the $[0, 1]$ range for some subjects' $x$ values. The model can be valid over a restricted range of $x$ values. When it is plausible, an advantage is its simple interpretation: $\beta$ is the change in $\pi(x)$ for a one-unit increase in $x$.

We defer to Section 4.6 the technical details of fitting this and other GLMs. One should assume a binomial distribution for $Y$ and use maximum likelihood (ML) rather than ordinary least squares. Least squares is ML for a normal distribution with constant variance. For binary responses, the constant variance condition that makes least squares estimators optimal (i.e., minimum variance in the class of linear unbiased estimators) is not satisfied. Since $\mathrm{var}(Y) = \pi(x)[1 - \pi(x)]$, the variance depends on $x$ through its influence on $\pi(x)$. As $\pi(x)$
moves toward 0 or 1, the distribution of $Y$ is more nearly concentrated at a single point, and the variance moves toward 0. Because of the nonconstant variance, the binomial ML estimator is more efficient than least squares. Also $Y$, being binary, is very far from normally distributed. Thus, the usual sampling distributions for the least squares estimators do not apply. The estimates and standard errors for ML and least squares are usually similar, however, when $\hat{\pi}(x)$ for the sample $x$ values falls in the range within which the variance is relatively stable (about 0.3 to 0.7).
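Two points from this discussion are easy to verify numerically: the Bernoulli variance changes little for probabilities between about 0.3 and 0.7, and a linear probability model can produce values outside the unit interval. The sketch below is illustrative only; the coefficients `alpha` and `beta` are hypothetical, chosen to make the second point visible.

```python
def bernoulli_variance(p):
    """Variance of a single binary trial with success probability p."""
    return p * (1.0 - p)

# The variance is nearly flat between 0.3 and 0.7 and shrinks toward 0 near the ends.
variances = {p: bernoulli_variance(p) for p in (0.05, 0.1, 0.3, 0.5, 0.7, 0.9, 0.95)}
stability_ratio = bernoulli_variance(0.3) / bernoulli_variance(0.5)

def linear_probability(alpha, beta, x):
    """Linear probability model: pi(x) = alpha + beta * x, with no range restriction."""
    return alpha + beta * x

# Hypothetical coefficients (not from the text).
alpha, beta = 0.1, 0.2
inside_range = linear_probability(alpha, beta, 2.0)   # a valid probability
outside_range = linear_probability(alpha, beta, 5.0)  # exceeds 1
```

The ratio of the variance at 0.3 to the maximum variance at 0.5 is 0.84, which is what "relatively stable" amounts to over that range.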
TABLE 4.2 Relationship between Snoring and Heart Disease
Columns: Snoring; Heart Disease (Yes, No); Proportion Yes; Linear Fit(a); Logit Fit(a)
Rows: Never; Occasionally; Nearly every night; Every night
(a) Model fits refer to proportion of yes responses.
Source: P. G. Norton and E. V. Dunn, British Med. J. 291 (1985), BMJ Publishing Group.

4.2.2 Snoring and Heart Disease Example

We illustrate the linear probability model with Table 4.2, from an epidemiological survey of 2484 subjects to investigate snoring as a risk factor for heart disease. Those surveyed were classified according to their spouses' report of how much they snored. The model states that the probability of heart disease is linearly related to the level of snoring $x$. We treat the rows of the table as independent binomial samples. No obvious choice of scores exists for categories of $x$. We used (0, 2, 4, 5), treating the last two levels as closer than the other adjacent pairs (Problem 4.4 uses equally spaced scores). ML estimates and standard errors are the same if we use a data file of 2484 binary observations or if we enter the four binomial totals of yes and no responses listed in Table 4.2.

Software (see, e.g., Table A.3 for SAS) reports the ML fit $\hat{\pi}(x) = \hat{\alpha} + \hat{\beta} x$, together with a standard error SE for $\hat{\beta}$. For nonsnorers ($x = 0$), the estimated proportion of subjects having heart disease is $\hat{\alpha}$. We refer to the estimated values of $E(Y)$ for a GLM as fitted values. Table 4.2 shows the sample proportions and the fitted values for this model. Figure 4.1 graphs the sample and fitted values. The table and graph suggest that the model fits well. (In Section 5.2.3 we discuss formal goodness-of-fit analyses for binary-response GLMs.)

The model interpretation is simple. The estimated probability of heart disease is about 0.02 for nonsnorers; it increases $2\hat{\beta} = 0.04$ for occasional snorers, another 0.04 for those who snore nearly every night, and another 0.02 for those who always snore.

4.2.3 Logistic Regression Model

Usually, binary data result from a nonlinear relationship between $\pi(x)$ and $x$.
A fixed change in $x$ often has less impact when $\pi(x)$ is near 0 or 1 than when $\pi(x)$ is near 0.5. In the purchase of an automobile, consider the choice between buying new or used. Let $\pi(x)$ denote the probability of selecting new when annual family income is $x$. An increase of \$50,000 in annual
income would have less effect when $x$ = \$1,000,000 [for which $\pi(x)$ is near 1] than when $x$ = \$50,000.

FIGURE 4.1 Predicted probabilities for linear probability and logistic regression models.

In practice, nonlinear relationships between $\pi(x)$ and $x$ are often monotonic, with $\pi(x)$ increasing continuously or $\pi(x)$ decreasing continuously as $x$ increases. The S-shaped curves in Figure 4.2 are typical. The most important curve with this shape has the model formula

\[ \pi(x) = \frac{\exp(\alpha + \beta x)}{1 + \exp(\alpha + \beta x)}. \tag{4.6} \]

This is the logistic regression model. As $x \to \infty$, $\pi(x) \downarrow 0$ when $\beta < 0$ and $\pi(x) \uparrow 1$ when $\beta > 0$.

Let's find the link function for which logistic regression is a GLM. For (4.6) the odds are

\[ \frac{\pi(x)}{1 - \pi(x)} = \exp(\alpha + \beta x). \]

The log odds has the linear relationship

\[ \log \frac{\pi(x)}{1 - \pi(x)} = \alpha + \beta x. \tag{4.7} \]
FIGURE 4.2 Logistic regression functions.

Thus, the appropriate link is the log odds transformation, the logit. Logistic regression models are GLMs with binomial random component and logit link function. Logistic regression models are also called logit models. The logit is the natural parameter of the binomial distribution, so the logit link is its canonical link.

Whereas $\pi(x)$ must fall in the $(0, 1)$ range, the logit can be any real number. The real numbers are also the range for linear predictors (such as $\alpha + \beta x$) that form the systematic component of a GLM. So this model does not have the structural problem that afflicts the linear probability model.

For the snoring data in Table 4.2, software reports the logistic regression ML fit

\[ \mathrm{logit}[\hat{\pi}(x)] = -3.87 + 0.40x. \]

The positive $\hat{\beta} = 0.40$ reflects the increased incidence of heart disease at higher snoring levels. In Chapters 5 and 6 we study logistic regression in detail and interpret such equations. Estimated probabilities result from substituting $x$ values into the estimate of probability formula (4.6). Table 4.2 also reports these fitted values. Figure 4.1 displays the fit. The fit is close to linear over this narrow range of estimated probabilities, and results are similar to those for the linear probability model.
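The substitution of snoring scores into the fitted logistic equation can be sketched as follows. The coefficients -3.87 and 0.40 are the rounded estimates quoted in the text; the scores 0, 2, 4, 5 are the ones the example describes; the function and variable names are illustrative. Because the coefficients are rounded, the results may differ slightly in the third decimal from values a table based on unrounded estimates would show.

```python
import math

def logistic(eta):
    """Inverse of the logit link: exp(eta) / (1 + exp(eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

def logit(p):
    """Log odds of a probability p."""
    return math.log(p / (1.0 - p))

# Rounded ML estimates quoted in the text for the snoring data.
ALPHA_HAT, BETA_HAT = -3.87, 0.40

# Snoring levels with the scores used in the example.
scores = {"never": 0, "occasionally": 2, "nearly every night": 4, "every night": 5}

# Fitted probability of heart disease at each snoring level.
fitted = {level: logistic(ALPHA_HAT + BETA_HAT * x) for level, x in scores.items()}
```

With these rounded coefficients the fitted probabilities run from roughly 0.02 for nonsnorers to roughly 0.13 for those who snore every night, which is the narrow range over which the text notes the curve is close to linear.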
4.2.4 Binomial GLM for 2 x 2 Contingency Tables

Among the simplest GLMs for a binary response is the one having a single explanatory variable $X$ that is also binary. Label its values by 0 and 1. For a given link function, the GLM

\[ \mathrm{link}[\pi(x)] = \alpha + \beta x \]

has the effect of $X$ described by

\[ \beta = \mathrm{link}[\pi(1)] - \mathrm{link}[\pi(0)]. \]

For the identity link, $\beta = \pi(1) - \pi(0)$ is the difference between proportions. For the log link, $\beta = \log[\pi(1)] - \log[\pi(0)] = \log[\pi(1)/\pi(0)]$ is the log relative risk. For the logit link,

\[ \beta = \mathrm{logit}[\pi(1)] - \mathrm{logit}[\pi(0)] = \log \frac{\pi(1)}{1 - \pi(1)} - \log \frac{\pi(0)}{1 - \pi(0)} = \log \frac{\pi(1)/[1 - \pi(1)]}{\pi(0)/[1 - \pi(0)]} \]

is the log odds ratio. Measures of association for 2 x 2 tables are effect parameters in GLMs for binary data.

4.2.5 Probit and Inverse CDF Link Functions*

A monotone regression curve such as the first one in Figure 4.2 has the shape of a cumulative distribution function (cdf) for a continuous random variable. This suggests a model for a binary response having form $\pi(x) = F(x)$ for some cdf $F$.

Using an entire class of location-scale cdf's, such as normal cdf's with their variety of means and variances, permits the curve $\pi(x) = F(x)$ to have flexibility in the rate of increase and in the location where most of that increase occurs. Let $\Phi(\cdot)$ denote the standard cdf of the class, such as the $N(0, 1)$ cdf. Using $\Phi$ but writing the model as

\[ \pi(x) = \Phi(\alpha + \beta x) \tag{4.8} \]

provides the same flexibility. Shapes of different cdf's in the class occur as $\alpha$ and $\beta$ vary. Replacing $x$ by $\beta x$ permits the curve to increase at a different rate than the standard cdf (or even to decrease if $\beta < 0$); varying $\alpha$ moves the curve to the left or right.

When $\Phi$ is strictly increasing over the entire real line, its inverse function $\Phi^{-1}(\cdot)$ exists and (4.8) is, equivalently,

\[ \Phi^{-1}[\pi(x)] = \alpha + \beta x. \tag{4.9} \]
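The equivalence of (4.8) and (4.9) can be illustrated with the standard normal cdf from the Python standard library. This is a sketch under assumptions: the values of `alpha` and `beta` and the grid of `x` values are hypothetical. It evaluates the probability curve from the linear predictor and then applies the inverse cdf to recover the linear predictor.

```python
from statistics import NormalDist

# Standard normal distribution, whose cdf plays the role of the standard cdf.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def cdf_probability(alpha, beta, x):
    """Equation (4.8) with the normal cdf: pi(x) = Phi(alpha + beta * x)."""
    return STANDARD_NORMAL.cdf(alpha + beta * x)

def inverse_cdf_link(p):
    """Left side of (4.9): the inverse cdf applied to a probability."""
    return STANDARD_NORMAL.inv_cdf(p)

# Hypothetical parameter values and predictor grid (not from the text).
alpha, beta = -1.0, 0.5
xs = [0.0, 1.0, 2.0, 3.0, 4.0]

linear_predictors = [alpha + beta * x for x in xs]
probabilities = [cdf_probability(alpha, beta, x) for x in xs]
recovered = [inverse_cdf_link(p) for p in probabilities]
```

Changing `beta` alters how quickly the probabilities rise across the grid, and changing `alpha` shifts where the rise occurs, matching the roles the text assigns to the two parameters.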
For this class of cdf shapes, the link function for the GLM is $\Phi^{-1}$. The link function maps the $(0, 1)$ range of probabilities onto $(-\infty, \infty)$, the range of linear predictors. The curve has the shape of a normal cdf when $\Phi$ is the standard normal cdf. Model (4.9) is then called the probit model. This curve has similar appearance to the logistic regression curve. Probit models are discussed in Section 6.6.

When $\beta > 0$, the logistic regression curve (4.6) is a cdf for the logistic distribution. When $\beta < 0$, the curve for $1 - \pi(x)$, the probability $Y$ is 0, has that appearance. The cdf of the logistic distribution with mean $\mu$ and dispersion parameter $\tau > 0$ is

\[ F(x) = \frac{\exp[(x - \mu)/\tau]}{1 + \exp[(x - \mu)/\tau]}, \quad -\infty < x < \infty. \]

The corresponding probability density function is symmetric and bell-shaped, with standard deviation $\tau\pi/\sqrt{3}$ (here, $\pi$ is the mathematical constant 3.14...). It looks much like the normal density with the same mean and standard deviation but with slightly thicker tails. (Its kurtosis equals that of a $t$ distribution with df = 9.)

The standardized form of the logistic cdf has $\mu = 0$ and $\tau = 1$, so $\Phi(x) = e^x/(1 + e^x)$. For that function, the logistic regression curve (4.6) has form $\pi(x) = \Phi(\alpha + \beta x)$. By (4.9) the logit transformation is simply the inverse function for the standard logistic cdf; that is, when $\pi(x) = \Phi(x) = e^x/(1 + e^x)$, then $x = \Phi^{-1}[\pi(x)] = \log[\pi(x)/(1 - \pi(x))]$.

4.3 GENERALIZED LINEAR MODELS FOR COUNTS

The best known GLMs for count data assume a Poisson distribution for $Y$. We introduced this distribution in Section 1.2. In Chapters 8 and 9 we present Poisson GLMs for counts in contingency tables with categorical response variables. In this section we introduce Poisson GLMs using an alternative application: modeling count or rate data for a single discrete response variable.

4.3.1 Poisson Loglinear Models

The Poisson distribution has a positive mean. Although a GLM can model a positive mean using the identity link, it is more common to model the log of the mean. Like the linear predictor $\alpha + \beta x$, the log mean can take any real value.
The log mean is the natural parameter for the Poisson distribution, and the log link is the canonical link for a Poisson GLM. A Poisson loglinear GLM assumes a Poisson distribution for $Y$ and uses the log link. The Poisson loglinear model with explanatory variable $X$ is

\[ \log \mu = \alpha + \beta x. \tag{4.10} \]
For this model, the mean satisfies the exponential relationship

\[ \mu = \exp(\alpha + \beta x) = e^{\alpha} (e^{\beta})^x. \tag{4.11} \]

A 1-unit increase in $x$ has a multiplicative impact of $e^{\beta}$ on $\mu$: The mean at $x + 1$ equals the mean at $x$ multiplied by $e^{\beta}$.

4.3.2 Horseshoe Crab Mating Example

We illustrate Poisson GLMs for Table 4.3 from a study of nesting horseshoe crabs. Each female horseshoe crab had a male crab resident in her nest. The study investigated factors affecting whether the female crab had any other males, called satellites, residing nearby. Explanatory variables are the female crab's color, spine condition, weight, and carapace width. The response outcome for each female crab is her number of satellites. For now, we use width alone as a predictor. Table 4.3 lists width in centimeters. The sample mean width equals 26.3 and the standard deviation equals 2.1.

Figure 4.3 plots the response counts of satellites against width, with numbered symbols indicating the number of observations at each point. The substantial variability makes it difficult to discern a clear trend. To get a clearer picture, we grouped the female crabs into width categories (up to 23.25, 23.25-24.25, 24.25-25.25, 25.25-26.25, 26.25-27.25, 27.25-28.25, 28.25-29.25, above 29.25) and calculated the sample mean number of satellites for female crabs in each category. Figure 4.4 plots these sample means against the sample mean width for crabs in each category.

More sophisticated ways of portraying the trend smooth the data without grouping the width values or assuming a particular functional relationship. Figure 4.4 also shows a smoothed curve based on an extension of the GLM introduced in Section 4.8. The sample means and the smoothed curve both show a strong increasing trend. (The means tend to fall above the curve, since the response counts in a category tend to be skewed to the right; the smoothed curve is less susceptible to outlying observations.) The trend seems approximately linear, and we discuss next models for the ungrouped data for which the mean or the log of the mean is linear in width.

For a female crab, let $\mu$ be the expected number of satellites and $x$ = width.
From GLM software (e.g., for SAS, see Table A.4), the ML fit of the Poisson loglinear model (4.10) is

\[ \log \hat{\mu} = \hat{\alpha} + \hat{\beta} x = -3.305 + 0.164x. \]

The effect $\hat{\beta} = 0.164$ of width is positive. The model fitted value at any width level is an estimated mean number of satellites $\hat{\mu}$. For instance, the fitted value at the mean width of $x = 26.3$ is

\[ \hat{\mu} = \exp(\hat{\alpha} + \hat{\beta} x) = \exp[-3.305 + 0.164(26.3)] = 2.74. \]
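The fitted-value calculation can be reproduced directly from the two estimates quoted above. In this sketch the coefficients and the mean width 26.3 come from the text, while the other widths in `example_widths` are arbitrary values chosen for illustration.

```python
import math

# ML estimates quoted in the text for the Poisson loglinear model.
ALPHA_HAT, BETA_HAT = -3.305, 0.164

def fitted_mean(width_cm):
    """Estimated mean number of satellites: exp(alpha_hat + beta_hat * width)."""
    return math.exp(ALPHA_HAT + BETA_HAT * width_cm)

# Fitted value at the sample mean width reported in the text.
mean_width = 26.3
mu_at_mean_width = fitted_mean(mean_width)

# Arbitrary illustrative widths (not taken from the data table).
example_widths = [22.0, 24.0, 26.3, 28.0, 30.0]
fitted_values = {w: round(fitted_mean(w), 2) for w in example_widths}
```

Because the link is the log, every fitted mean is positive regardless of the width supplied.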
TABLE 4.3 Number of Crab Satellites by Female's Characteristics(a)
Columns: C, S, W, Wt, Sa
(a) C, color (1, light medium; 2, medium; 3, dark medium; 4, dark); S, spine condition (1, both good; 2, one worn or broken; 3, both worn or broken); W, carapace width (cm); Wt, weight (kg); Sa, number of satellites.
Source: Data courtesy of Jane Brockmann, Zoology Department, University of Florida; study described in Ethology 102.
FIGURE 4.3 Number of satellites by width of female crab.

For this model, $\exp(\hat{\beta}) = \exp(0.164) = 1.18$ is the multiplicative effect on $\hat{\mu}$ for a 1-cm increase in $x$. For instance, the fitted value at $x = 27.3 = 26.3 + 1$ is $\exp[-3.305 + 0.164(27.3)] = 3.23$, which equals $(1.18)(2.74)$. A 1-cm increase in width yields an 18% increase in the estimated mean.

Figure 4.4 shows that $E(Y)$ may grow approximately linearly with width. This suggests the Poisson GLM with identity link. It has ML fit

\[ \hat{\mu} = \hat{\alpha} + \hat{\beta} x = -11.53 + 0.55x. \]

This model has an additive rather than a multiplicative effect of $X$ on $\mu$. A 1-cm increase in $x$ has an estimated increase of $\hat{\beta} = 0.55$ in $\hat{\mu}$. The fitted values are positive at all sampled $x$, and the model describes simply the effect: On the average, about a 2-cm increase in width is associated with an extra satellite.

Figure 4.5 plots $\hat{\mu}$ against width for the models with log link and identity link. Although they diverge somewhat for relatively small and large widths, they provide similar predictions over the width range in which most observations occur. We now study whether either model fits adequately.
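The contrast between the multiplicative effect under the log link and the additive effect under the identity link can be seen by evaluating both fitted equations side by side. The coefficients below are the ones quoted in the text; the grid of widths is an arbitrary illustration rather than the sampled widths.

```python
import math

def mean_log_link(width_cm):
    """Fitted mean from the Poisson GLM with log link quoted in the text."""
    return math.exp(-3.305 + 0.164 * width_cm)

def mean_identity_link(width_cm):
    """Fitted mean from the Poisson GLM with identity link quoted in the text."""
    return -11.53 + 0.55 * width_cm

# Arbitrary grid of widths in cm, for illustration only.
widths = [22.0, 24.0, 26.3, 28.0, 30.0, 32.0]
comparison = [(w, mean_log_link(w), mean_identity_link(w)) for w in widths]

# Effect of one extra centimeter under each link, starting from the mean width.
ratio_log = mean_log_link(27.3) / mean_log_link(26.3)                 # multiplicative
difference_identity = mean_identity_link(27.3) - mean_identity_link(26.3)  # additive

# Width increase associated with one extra satellite under the identity link.
cm_per_satellite = 1.0 / 0.55
```

Near the mean width the two fitted means differ by only a fraction of a satellite, while at the ends of the grid the gap widens, in line with the description of Figure 4.5.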
FIGURE 4.4 Smoothings of horseshoe crab counts.

TABLE 4.4 Sample Mean and Variance of Number of Satellites
Columns: Width (cm); Number of Cases; Number of Satellites; Sample Mean; Sample Variance
FIGURE 4.5 Estimated mean number of satellites for log and identity links.

4.3.3 Overdispersion for Poisson GLMs

In Section 1.2.4 we noted that count data often show greater variability than the Poisson allows. For the grouped horseshoe crab data, Table 4.4 shows the sample mean and variance for the counts of number of satellites for the female crabs in each width category. The variances are much larger than the means, whereas Poisson distributions have identical mean and variance. The greater variability than predicted by the GLM random component reflects overdispersion.

A common cause of overdispersion is subject heterogeneity. For instance, suppose that width, weight, color, and spine condition are the four predictors that affect a female crab's number of satellites. Suppose that $Y$ has a Poisson distribution at each fixed combination of those predictors. Our model uses width alone as a predictor. Crabs having a certain width are then a mixture of crabs of various weights, colors, and spine conditions. Thus, the population of crabs having that width is a mixture of several Poisson populations, each having its own mean for the response. This heterogeneity results in an overall response distribution at that width having greater variation than the Poisson predicts. If the variance equals the mean when all relevant variables are controlled, it exceeds the mean when only one is controlled.

Overdispersion is not an issue in ordinary regression with normally distributed $Y$, because that distribution has a separate parameter (the variance)
to describe variability. For binomial and Poisson distributions, however, the variance is a function of the mean. Overdispersion is common in the modeling of counts. When the model for the mean is correct but the true distribution is not Poisson, the ML estimates of model parameters are still consistent but standard errors are incorrect. We next introduce an extension of the Poisson GLM that has an extra parameter and accounts better for overdispersion. In Section 4.7 we present another approach for this, quasi-likelihood inference.

4.3.4 Negative Binomial GLMs

The negative binomial distribution has probability mass function

\[ f(y; k, \mu) = \frac{\Gamma(y + k)}{\Gamma(k)\Gamma(y + 1)} \left(\frac{k}{\mu + k}\right)^k \left(1 - \frac{k}{\mu + k}\right)^y, \quad y = 0, 1, 2, \ldots, \tag{4.12} \]

where $k$ and $\mu$ are parameters. This distribution has

\[ E(Y) = \mu, \quad \mathrm{var}(Y) = \mu + \mu^2/k. \]

The index $k^{-1}$ is called a dispersion parameter. As $k^{-1} \to 0$, $\mathrm{var}(Y) \to \mu$ and the negative binomial distribution converges to the Poisson (Cameron and Trivedi 1998, p. 75). Usually, $k^{-1}$ is unknown. Estimating it helps summarize the extent of overdispersion.

For $k$ fixed, one can express (4.12) in natural exponential family form (4.1). Then, a model with negative binomial random component is a GLM. For simplicity, such models let $k$ be the same constant for all observations but treat it as unknown. As in GLMs for binary data, a variety of link functions are possible. Most common is the log link, as in Poisson loglinear models, but sometimes the identity link is adequate.

In Section 13.4 we discuss negative binomial GLMs. We illustrate them here for the crab data analyzed above with Poisson GLMs. With the identity link and width as predictor, the Poisson GLM has $\hat{\mu} = -11.53 + 0.55x$ (SE = 0.06 for $\hat{\beta}$). For the negative binomial GLM, $\hat{\mu} = -11.15 + 0.53x$, with a larger SE for $\hat{\beta}$. Moreover, $\hat{k}^{-1} = 0.98$, so at a predicted $\hat{\mu}$, the estimated variance is roughly $\hat{\mu} + \hat{\mu}^2$, compared to $\hat{\mu}$ for the Poisson GLM.
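The mean and variance stated for the negative binomial distribution can be checked numerically by summing its probability mass function. In this sketch the dispersion value 0.98 is the estimate quoted in the text, while the mean of 3.0 and the truncation point of the sum are arbitrary assumptions made for illustration.

```python
import math

def neg_binomial_pmf(y, k, mu):
    """Negative binomial probability mass function in the (k, mu) parameterization,
    evaluated on the log scale with lgamma for numerical stability."""
    log_coef = math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
    p = k / (mu + k)
    return math.exp(log_coef + k * math.log(p) + y * math.log(1.0 - p))

def numeric_moments(k, mu, upper=500):
    """Total probability, mean, and variance from a truncated sum over y = 0..upper-1.
    The truncation point is an assumption; the tail beyond it is negligible here."""
    probs = [neg_binomial_pmf(y, k, mu) for y in range(upper)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    variance = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return total, mean, variance

k = 1.0 / 0.98   # dispersion estimate 1/k = 0.98 quoted in the text
mu = 3.0         # hypothetical mean, for illustration only

total, mean, variance = numeric_moments(k, mu)
poisson_variance = mu                 # what a Poisson model would imply
nb_variance = mu + mu ** 2 / k        # formula stated in the text
```

For a mean of 3 the negative binomial variance is about four times the Poisson variance, which gives a sense of how much extra variability a dispersion estimate near 1 implies.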
Although fitted values are similar, the greater SE for $\hat{\beta}$ and the greater estimated variance in the negative binomial model reflect the overdispersion uncaptured with the Poisson GLM.

4.3.5 Poisson Regression for Rates

When events of a certain type occur over time, space, or some other index of size, it is usually more relevant to model the rate at which they occur than the number of them. For instance, a study of homicides in a given year for a
sample of cities might model the homicide rate, defined for a city as its number of homicides that year divided by its population size. The model might describe how the rate depends on the city's unemployment rate, its residents' median income, and the percentage of residents having completed high school. In Section 9.7 we discuss Poisson regression for modeling rates.

4.3.6 Poisson GLM of Independence in I x J Contingency Tables

One use of Poisson loglinear models is in modeling counts in contingency tables. We illustrate for two-way tables with independent counts $\{Y_{ij}\}$ having Poisson distributions with means $\{\mu_{ij}\}$. Suppose that $\{\mu_{ij}\}$ satisfy

\[ \mu_{ij} = \mu \alpha_i \beta_j, \]

where $\{\alpha_i\}$ and $\{\beta_j\}$ are positive constants satisfying $\sum_i \alpha_i = \sum_j \beta_j = 1$. This is a multiplicative model, but a linear predictor for a GLM results using the log link,

\[ \log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y, \tag{4.13} \]

where $\lambda = \log \mu$, $\lambda_i^X = \log \alpha_i$, $\lambda_j^Y = \log \beta_j$. This Poisson loglinear model has additive main effects of the two classifications but no interaction.

Since the $\{Y_{ij}\}$ are independent, the total sample size $\sum_i \sum_j Y_{ij}$ has a Poisson distribution with mean $\sum_i \sum_j \mu_{ij} = \mu$. Conditional on $\sum_i \sum_j Y_{ij} = n$, the cell counts have a multinomial distribution with probabilities $\{\pi_{ij} = \mu_{ij}/\mu = \alpha_i \beta_j\}$. Similarly, you can check that conditional on $n$, the row totals $\{Y_{i+}\}$ have a multinomial distribution with probabilities $\{\pi_{i+} = \alpha_i\}$ and the column totals $\{Y_{+j}\}$ have a multinomial distribution with probabilities $\{\pi_{+j} = \beta_j\}$. Conditional on $n$, the model is a multinomial one that satisfies $\pi_{ij} = \alpha_i \beta_j = \pi_{i+} \pi_{+j}$. This is independence of the two classifications. In fact, in Poisson form independence is the loglinear model (4.13).

The inferences conducted in Chapter 3 about independence in two-way contingency tables relate to GLMs, either Poisson loglinear models or corresponding multinomial models that fix $n$ or the row or column totals. In Chapters 8 and 9 we present more complex loglinear models for contingency tables.

4.4 MOMENTS AND LIKELIHOOD FOR GENERALIZED LINEAR MODELS*

Having introduced GLMs for binary and count data, we now turn our attention to details such as likelihood equations and methods for fitting them.
The remainder of this chapter is somewhat technical, providing general results applying to most modeling methods presented in subsequent chapters. See McCullagh and Nelder (1989) for further details.
It is helpful to extend the notation for a GLM so that it can handle many distributions that have a second parameter. The random component of the GLM specifies that the $N$ observations $(y_1, \ldots, y_N)$ on $Y$ are independent, with probability mass or density function for $y_i$ of form

\[ f(y_i; \theta_i, \phi) = \exp\{[y_i \theta_i - b(\theta_i)]/a(\phi) + c(y_i, \phi)\}. \tag{4.14} \]

This is called the exponential dispersion family and $\phi$ is called the dispersion parameter (Jorgensen 1987). The parameter $\theta_i$ is the natural parameter. When $\phi$ is known, (4.14) simplifies to the form (4.1) for the natural exponential family, which is

\[ f(y_i; \theta_i) = a(\theta_i)\, b(y_i) \exp[y_i Q(\theta_i)]. \]

We identify $Q(\theta)$ here with $\theta/a(\phi)$ in (4.14), $a(\theta)$ with $\exp[-b(\theta)/a(\phi)]$ in (4.14), and $b(y)$ with $\exp[c(y, \phi)]$ in (4.14). The more general formula (4.14) is not needed for one-parameter families such as the binomial and Poisson.

Usually, $a(\phi)$ has form $a(\phi) = \phi/\omega_i$ for a known weight $\omega_i$. For instance, when $y_i$ is a mean of $n_i$ independent readings, such as a sample proportion for $n_i$ Bernoulli trials, $\omega_i = n_i$ (Section 4.4.2).

4.4.1 Mean and Variance Functions for the Random Component

General expressions for $E(Y_i)$ and $\mathrm{var}(Y_i)$ use terms in (4.14). Let $L_i = \log f(y_i; \theta_i, \phi)$ denote the contribution of $y_i$ to the log likelihood; that is, the log-likelihood function is $L = \sum_i L_i$. Then, from (4.14),

\[ L_i = [y_i \theta_i - b(\theta_i)]/a(\phi) + c(y_i, \phi). \tag{4.15} \]

Therefore,

\[ \partial L_i/\partial \theta_i = [y_i - b'(\theta_i)]/a(\phi), \quad \partial^2 L_i/\partial \theta_i^2 = -b''(\theta_i)/a(\phi), \]

where $b'(\theta_i)$ and $b''(\theta_i)$ denote the first two derivatives of $b(\cdot)$ evaluated at $\theta_i$. We now apply the general likelihood results

\[ E\left(\frac{\partial L}{\partial \theta}\right) = 0 \quad \text{and} \quad -E\left(\frac{\partial^2 L}{\partial \theta^2}\right) = E\left(\frac{\partial L}{\partial \theta}\right)^2, \]

which hold under regularity conditions satisfied by the exponential family (Cox and Hinkley 1974). From the first formula applied with a single observation, $E[Y_i - b'(\theta_i)]/a(\phi) = 0$, or

\[ \mu_i = E(Y_i) = b'(\theta_i). \tag{4.16} \]
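From the second formula,

\[ b''(\theta_i)/a(\phi) = E\{[Y_i - b'(\theta_i)]/a(\phi)\}^2 = \mathrm{var}(Y_i)/[a(\phi)]^2, \]

so that

\[ \mathrm{var}(Y_i) = b''(\theta_i)\, a(\phi). \tag{4.17} \]

In summary, the function $b(\cdot)$ in (4.14) determines moments of $Y_i$.

4.4.2 Mean and Variance Functions for Poisson and Binomial

We illustrate the mean and variance expressions for Poisson and binomial distributions. When $Y_i$ is Poisson,

\[ f(y_i; \mu_i) = \frac{e^{-\mu_i} \mu_i^{y_i}}{y_i!} = \exp(y_i \log \mu_i - \mu_i - \log y_i!) = \exp[y_i \theta_i - \exp(\theta_i) - \log y_i!], \]

where $\theta_i = \log \mu_i$. This has exponential dispersion form (4.14) with $b(\theta_i) = \exp(\theta_i)$, $a(\phi) = 1$, and $c(y_i, \phi) = -\log y_i!$. The natural parameter is $\theta_i = \log \mu_i$. From (4.16) and (4.17),

\[ E(Y_i) = b'(\theta_i) = \exp(\theta_i) = \mu_i, \quad \mathrm{var}(Y_i) = b''(\theta_i) = \exp(\theta_i) = \mu_i. \]

Next, suppose that $n_i Y_i$ has a bin$(n_i, \pi_i)$ distribution; that is, here $y_i$ is the sample proportion (rather than number) of successes, so $E(Y_i)$ is independent of $n_i$. Let $\theta_i = \log[\pi_i/(1 - \pi_i)]$. Then, $\pi_i = \exp(\theta_i)/[1 + \exp(\theta_i)]$ and $\log(1 - \pi_i) = -\log[1 + \exp(\theta_i)]$. Extending (4.3), one can show that

\[ f(y_i; \pi_i, n_i) = \binom{n_i}{n_i y_i} \pi_i^{n_i y_i} (1 - \pi_i)^{n_i - n_i y_i} = \exp\left\{ \frac{y_i \theta_i - \log[1 + \exp(\theta_i)]}{1/n_i} + \log \binom{n_i}{n_i y_i} \right\}. \tag{4.18} \]

This has exponential dispersion form (4.14) with $b(\theta_i) = \log[1 + \exp(\theta_i)]$, $a(\phi) = 1/n_i$, and $c(y_i, \phi) = \log \binom{n_i}{n_i y_i}$. The natural parameter is the logit, $\theta_i = \log[\pi_i/(1 - \pi_i)]$. From (4.16) and (4.17),

\[ E(Y_i) = b'(\theta_i) = \exp(\theta_i)/[1 + \exp(\theta_i)] = \pi_i, \]

\[ \mathrm{var}(Y_i) = b''(\theta_i)\, a(\phi) = \exp(\theta_i)/\{[1 + \exp(\theta_i)]^2 n_i\} = \pi_i (1 - \pi_i)/n_i. \]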
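The claim that the function b determines the moments can be checked with finite differences. The sketch below is an illustration, not part of the original derivation: the helper names, step sizes, and the parameter values (a Poisson mean of 2.5, a binomial probability of 0.3 with 20 trials) are assumptions. It differentiates b numerically and compares the results with the closed-form mean and variance.

```python
import math

def first_derivative(f, x, h=1e-5):
    """Central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def second_derivative(f, x, h=1e-4):
    """Central-difference approximation to f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

def b_poisson(theta):
    """b(theta) = exp(theta) for the Poisson distribution."""
    return math.exp(theta)

def b_binomial(theta):
    """b(theta) = log(1 + exp(theta)) for the binomial sample proportion."""
    return math.log(1.0 + math.exp(theta))

# Poisson with hypothetical mean 2.5: natural parameter is log(mu), a(phi) = 1.
mu = 2.5
theta_poisson = math.log(mu)
poisson_mean = first_derivative(b_poisson, theta_poisson)
poisson_variance = second_derivative(b_poisson, theta_poisson) * 1.0

# Binomial proportion with hypothetical pi = 0.3 and n = 20: a(phi) = 1/n.
pi, n = 0.3, 20
theta_binomial = math.log(pi / (1.0 - pi))
binomial_mean = first_derivative(b_binomial, theta_binomial)
binomial_variance = second_derivative(b_binomial, theta_binomial) * (1.0 / n)
```

The same two helper functions would work for any other member of the family once its b function and a(phi) are supplied.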
4.4.3 Systematic Component and Link Function

Let $(x_{i1}, \ldots, x_{ip})$ denote values of explanatory variables for observation $i$. The systematic component of a GLM relates parameters $\{\eta_i\}$ to these variables using a linear predictor

\[ \eta_i = \sum_j \beta_j x_{ij}, \quad i = 1, \ldots, N. \]

In matrix form, $\eta = X\beta$, where $\eta = (\eta_1, \ldots, \eta_N)'$, $\beta = (\beta_1, \ldots, \beta_p)'$ are column vectors of model parameters, and $X$ is the $N \times p$ matrix of values of the explanatory variables for the $N$ subjects. In ordinary linear models, $X$ is called the design matrix. It need not refer to an experimental design, however, and the GLM literature calls it the model matrix.

The GLM links $\eta_i$ to $\mu_i = E(Y_i)$ by a link function $g$. Thus, $\mu_i$ relates to the explanatory variables by

\[ \eta_i = g(\mu_i) = \sum_j \beta_j x_{ij}, \quad i = 1, \ldots, N. \]

The link function $g$ for which $g(\mu_i) = \theta_i$ in (4.14) is the canonical link. For it, the direct relationship

\[ \theta_i = \sum_j \beta_j x_{ij} \]

occurs between the natural parameter and the linear predictor. Since $\mu_i = b'(\theta_i)$, the natural parameter is the function of the mean, $\theta_i = (b')^{-1}(\mu_i)$, where $(b')^{-1}(\cdot)$ denotes the inverse function to $b'$. Thus, the canonical link is the inverse of $b'$. In the Poisson case, for instance, $b(\theta_i) = \exp(\theta_i)$, so $b'(\theta_i) = \exp(\theta_i) = \mu_i$. Thus, $(b')^{-1}(\cdot)$ is the inverse of the exponential function, which is the log function (i.e., $\theta_i = \log \mu_i$). The canonical link is the log link.

4.4.4 Likelihood Equations for a GLM

For $N$ independent observations, from (4.15) the log likelihood is

\[ L(\beta) = \sum_i L_i = \sum_i \log f(y_i; \theta_i, \phi) = \sum_i \frac{y_i \theta_i - b(\theta_i)}{a(\phi)} + \sum_i c(y_i, \phi). \tag{4.19} \]

The notation $L(\beta)$ reflects the dependence of $\theta$ on the model parameters $\beta$.
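The likelihood equations are

\[ \partial L(\beta)/\partial \beta_j = \sum_i \partial L_i/\partial \beta_j = 0 \quad \text{for all } j. \]

To differentiate the log likelihood (4.19), we use the chain rule,

\[ \frac{\partial L_i}{\partial \beta_j} = \frac{\partial L_i}{\partial \theta_i} \frac{\partial \theta_i}{\partial \mu_i} \frac{\partial \mu_i}{\partial \eta_i} \frac{\partial \eta_i}{\partial \beta_j}. \tag{4.20} \]

Since $\partial L_i/\partial \theta_i = [y_i - b'(\theta_i)]/a(\phi)$, and since $\mu_i = b'(\theta_i)$ and $\mathrm{var}(Y_i) = b''(\theta_i)\, a(\phi)$ from (4.16) and (4.17),

\[ \partial L_i/\partial \theta_i = (y_i - \mu_i)/a(\phi), \quad \partial \mu_i/\partial \theta_i = b''(\theta_i) = \mathrm{var}(Y_i)/a(\phi). \]

Also, since $\eta_i = \sum_j \beta_j x_{ij}$,

\[ \partial \eta_i/\partial \beta_j = x_{ij}. \]

Finally, since $\eta_i = g(\mu_i)$, $\partial \mu_i/\partial \eta_i$ depends on the link function for the model. In summary, substituting into (4.20) gives us

\[ \frac{\partial L_i}{\partial \beta_j} = \frac{y_i - \mu_i}{a(\phi)} \frac{a(\phi)}{\mathrm{var}(Y_i)} \frac{\partial \mu_i}{\partial \eta_i} x_{ij} = \frac{(y_i - \mu_i) x_{ij}}{\mathrm{var}(Y_i)} \frac{\partial \mu_i}{\partial \eta_i}. \tag{4.21} \]

The likelihood equations are

\[ \sum_{i=1}^{N} \frac{(y_i - \mu_i) x_{ij}}{\mathrm{var}(Y_i)} \frac{\partial \mu_i}{\partial \eta_i} = 0, \quad j = 1, \ldots, p. \tag{4.22} \]

Although $\beta$ does not appear in these equations, it is there implicitly through $\mu_i$, since $\mu_i = g^{-1}(\sum_j \beta_j x_{ij})$. Different link functions yield different sets of equations.

Interestingly, the likelihood equations (4.22) depend on the distribution of $Y_i$ only through $\mu_i$ and $\mathrm{var}(Y_i)$. The variance itself depends on the mean through a particular functional form

\[ \mathrm{var}(Y_i) = v(\mu_i) \]

for some function $v$, such as $v(\mu_i) = \mu_i$ for the Poisson, $v(\mu_i) = \mu_i(1 - \mu_i)$ for the Bernoulli, and $v(\mu_i) = \sigma^2$ (i.e., constant) for the normal. When $Y_i$ has distribution in the natural exponential family, the relationship between the mean and the variance characterizes the distribution (Jorgensen 1987). For instance, if $Y_i$ has distribution in the natural exponential family and if $v(\mu_i) = \mu_i$, then necessarily $Y_i$ has the Poisson distribution.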
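The left side of the likelihood equations (4.22) needs only three ingredients besides the data: the inverse link, the derivative of the mean with respect to the linear predictor, and the variance function. The following sketch makes that explicit. The function name `score`, the model matrix, the responses, and the trial coefficient values are all hypothetical. It evaluates the score vector for a Poisson variance function under two different links.

```python
import math

def score(beta, X, y, inverse_link, dmu_deta, variance):
    """Score vector u_j = sum_i (y_i - mu_i) x_ij / var(Y_i) * dmu_i/deta_i,
    the left side of the likelihood equations, at a trial value of beta."""
    u = [0.0] * len(beta)
    for x_row, y_i in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, x_row))
        mu = inverse_link(eta)
        weight = (y_i - mu) / variance(mu) * dmu_deta(eta)
        for j, x_ij in enumerate(x_row):
            u[j] += weight * x_ij
    return u

# Hypothetical model matrix (intercept column plus one predictor) and counts.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1, 3, 2, 6]

def poisson_variance(mu):
    """Poisson variance function v(mu) = mu."""
    return mu

# Log link: mu = exp(eta), so dmu/deta = exp(eta) as well.
score_log = score([0.2, 0.4], X, y, math.exp, math.exp, poisson_variance)

# Identity link: mu = eta, so dmu/deta = 1. The trial beta keeps every mu positive.
score_identity = score([1.0, 1.0], X, y, lambda eta: eta, lambda eta: 1.0,
                       poisson_variance)
```

A root-finding routine applied to `score` would produce the ML estimates; the point of the sketch is only that switching the link changes the equations while the variance function stays the same.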
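Likelihood Equations for Binomial GLMs

Using notation from Section 4.4.2, suppose that $n_iY_i$ has a $\mathrm{bin}(n_i, \pi_i)$ distribution. Then $y_i$ is a sample proportion of successes for $n_i$ trials. The binomial GLM (4.8) for a single predictor extends with several predictors to

\[ \pi_i = \Phi\Big(\sum_j \beta_j x_{ij}\Big), \tag{4.23} \]

where $\Phi$ is the standard cdf of some class of continuous distributions. Since $\mu_i = \pi_i = \Phi(\eta_i)$ with $\eta_i = \sum_j \beta_j x_{ij}$,

\[ \frac{\partial \mu_i}{\partial \eta_i} = \phi(\eta_i) = \phi\Big(\sum_j \beta_j x_{ij}\Big), \]

where $\phi(u) = \partial \Phi(u)/\partial u$ (i.e., the probability density function corresponding to the cdf $\Phi$; here $\phi$ denotes a density, not the dispersion parameter). Since $\mathrm{var}(Y_i) = \pi_i(1 - \pi_i)/n_i$, the likelihood equations (4.22) simplify to

\[ \sum_i \frac{n_i(y_i - \pi_i)x_{ij}}{\pi_i(1 - \pi_i)}\,\phi\Big(\sum_j \beta_j x_{ij}\Big) = 0, \tag{4.24} \]

where $\pi_i = \Phi(\sum_j \beta_j x_{ij})$. These depend on the link function $\Phi^{-1}$ through the derivative of its inverse.

For the logit link, $\eta_i = \log[\pi_i/(1 - \pi_i)]$, so $\partial \eta_i/\partial \pi_i = 1/[\pi_i(1 - \pi_i)]$ and $\partial \mu_i/\partial \eta_i = \partial \pi_i/\partial \eta_i = \pi_i(1 - \pi_i)$. Then the likelihood equations (4.22) and (4.24) simplify to

\[ \sum_i n_i(y_i - \pi_i)x_{ij} = 0, \tag{4.25} \]

where $\pi_i$ satisfies (4.23) with $\Phi$ the standard logistic cdf.

Asymptotic Covariance Matrix of Model Parameter Estimators

The likelihood function for the GLM also determines the asymptotic covariance matrix of the ML estimator $\hat\beta$. This matrix is the inverse of the information matrix $\mathcal{I}$, which has elements $E[-\partial^2 L(\beta)/\partial \beta_h\,\partial \beta_j]$. To find this, for the contribution $L_i$ to the log likelihood we use the helpful result

\[ E\left(\frac{\partial^2 L_i}{\partial \beta_h\,\partial \beta_j}\right) = -E\left[\left(\frac{\partial L_i}{\partial \beta_h}\right)\left(\frac{\partial L_i}{\partial \beta_j}\right)\right], \]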
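which holds for exponential families (Cox and Hinkley 1974). Thus, from (4.21),

\[ E\left(\frac{\partial^2 L_i}{\partial \beta_h\,\partial \beta_j}\right) = -E\left[\frac{(Y_i - \mu_i)x_{ih}}{\mathrm{var}(Y_i)}\frac{\partial \mu_i}{\partial \eta_i}\cdot\frac{(Y_i - \mu_i)x_{ij}}{\mathrm{var}(Y_i)}\frac{\partial \mu_i}{\partial \eta_i}\right] = -\frac{x_{ih}x_{ij}}{\mathrm{var}(Y_i)}\left(\frac{\partial \mu_i}{\partial \eta_i}\right)^2. \]

Since $L(\beta) = \sum_i L_i$,

\[ E\left(-\frac{\partial^2 L(\beta)}{\partial \beta_h\,\partial \beta_j}\right) = \sum_{i=1}^{N} \frac{x_{ih}x_{ij}}{\mathrm{var}(Y_i)}\left(\frac{\partial \mu_i}{\partial \eta_i}\right)^2. \]

Generalizing from this typical element to the entire matrix, the information matrix has the form

\[ \mathcal{I} = X'WX, \tag{4.26} \]

where $W$ is the diagonal matrix with main-diagonal elements

\[ w_i = \frac{(\partial \mu_i/\partial \eta_i)^2}{\mathrm{var}(Y_i)}. \tag{4.27} \]

The asymptotic covariance matrix of $\hat\beta$ is estimated by

\[ \widehat{\mathrm{cov}}(\hat\beta) = \hat{\mathcal{I}}^{-1} = (X'\hat{W}X)^{-1}, \tag{4.28} \]

where $\hat{W}$ is $W$ evaluated at $\hat\beta$. From (4.27), the form of $W$ also depends on the link function. We'll see an example for Poisson GLMs next and for binomial GLMs in a later section.

Likelihood Equations and Covariance Matrix for Poisson Loglinear Model

The general Poisson loglinear model (4.4) has the matrix form

\[ \log \mu = X\beta. \]

For the log link, $\eta_i = \log \mu_i$, so $\mu_i = \exp(\eta_i)$ and $\partial \mu_i/\partial \eta_i = \exp(\eta_i) = \mu_i$. Since $\mathrm{var}(Y_i) = \mu_i$, the likelihood equations (4.22) simplify to

\[ \sum_i (y_i - \mu_i)x_{ij} = 0. \tag{4.29} \]

These equate the sufficient statistics $\sum_i y_ix_{ij}$ for $\beta$ to their expected values.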
Also, since

\[ w_i = \frac{(\partial \mu_i/\partial \eta_i)^2}{\mathrm{var}(Y_i)} = \mu_i, \]

the estimated covariance matrix (4.28) of $\hat\beta$ is $(X'\hat{W}X)^{-1}$, where $\hat{W}$ is the diagonal matrix with elements of $\hat\mu$ on the main diagonal.

4.5 INFERENCE FOR GENERALIZED LINEAR MODELS

For most GLMs the likelihood equations (4.22) are nonlinear functions of $\beta$. For now, we put off details about solving them for the ML estimator $\hat\beta$ and focus instead on using the fit for statistical inference. The Wald, score, and likelihood-ratio methods introduced earlier for significance testing and interval estimation apply to any GLM. In this section we concentrate on likelihood-ratio inference, through the deviance of the GLM.

Deviance and Goodness of Fit

From Section 4.1.5, the saturated GLM has a separate parameter for each observation. It gives a perfect fit. This sounds good, but it is not a helpful model. It does not smooth the data or have the advantages that a simpler model has, such as parsimony. Nonetheless, it serves as a baseline for other models, such as for checking model fit.

A saturated model explains all variation by the systematic component of the model. Let $\tilde\theta$ denote the estimate of $\theta$ for the saturated model, corresponding to estimated means $\tilde\mu_i = y_i$ for all $i$. For a particular unsaturated model, denote the corresponding ML estimates by $\hat\theta$ and $\hat\mu$. For maximized log likelihood $L(\hat\mu; y)$ for that model and maximized log likelihood $L(y; y)$ in the saturated case,

\[ -2\log\left[\frac{\text{maximum likelihood for model}}{\text{maximum likelihood for saturated model}}\right] = -2[L(\hat\mu; y) - L(y; y)] \]

describes lack of fit. It is the likelihood-ratio statistic for testing the null hypothesis that the model holds against the alternative that a more general model holds. From (4.19),

\[ -2[L(\hat\mu; y) - L(y; y)] = 2\sum_i \frac{y_i\tilde\theta_i - b(\tilde\theta_i)}{a(\phi)} - 2\sum_i \frac{y_i\hat\theta_i - b(\hat\theta_i)}{a(\phi)}. \]
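Usually, $a(\phi)$ in the exponential dispersion family has the form $a(\phi) = \phi/\omega_i$, and this statistic equals

\[ 2\sum_i \omega_i\left[y_i(\tilde\theta_i - \hat\theta_i) - b(\tilde\theta_i) + b(\hat\theta_i)\right]/\phi = D(y; \hat\mu)/\phi. \tag{4.30} \]

This is called the scaled deviance and $D(y; \hat\mu)$ is called the deviance. The greater the scaled deviance, the poorer the fit. For some GLMs the scaled deviance has an approximate chi-squared distribution.

Deviance for Poisson Models

For Poisson GLMs, by Section 4.4.2, $\hat\theta_i = \log \hat\mu_i$ and $b(\hat\theta_i) = \exp(\hat\theta_i) = \hat\mu_i$. Similarly, $\tilde\theta_i = \log y_i$ and $b(\tilde\theta_i) = y_i$ for the saturated model. Also $a(\phi) = 1$, so the deviance and scaled deviance (4.30) equal

\[ D(y; \hat\mu) = 2\sum_i \left[y_i\log(y_i/\hat\mu_i) - y_i + \hat\mu_i\right]. \]

When a model with log link contains an intercept term, the likelihood equation (4.29) implied by that parameter is $\sum_i y_i = \sum_i \hat\mu_i$. Then the deviance simplifies to

\[ D(y; \hat\mu) = 2\sum_i y_i\log(y_i/\hat\mu_i). \tag{4.32} \]

For two-way contingency tables, this reduces to the $G^2$ statistic in Section 3.2.1, substituting cell count $n_{ij}$ for $y_i$ and the independence fitted value $\hat\mu_{ij}$ for $\hat\mu_i$. For a Poisson or multinomial model applied to a contingency table with a fixed number of cells $N$, we will see in Section 14.3 that the deviance has an approximate chi-squared distribution for large $\{\mu_i\}$.

Deviance for Binomial Models: Grouped and Ungrouped Data

Now consider binomial GLMs with sample proportions $\{y_i\}$ based on $\{n_i\}$ trials. By Section 4.4.2, $\hat\theta_i = \log[\hat\pi_i/(1 - \hat\pi_i)]$ and $b(\hat\theta_i) = \log[1 + \exp(\hat\theta_i)] = -\log(1 - \hat\pi_i)$. Similarly, $\tilde\theta_i = \log[y_i/(1 - y_i)]$ and $b(\tilde\theta_i) = -\log(1 - y_i)$ for the saturated model. Also, $a(\phi) = 1/n_i$, so $\phi = 1$ and $\omega_i = n_i$. The deviance (4.30) equals

\[ 2\sum_i n_i\left\{y_i\left[\log\frac{y_i}{1 - y_i} - \log\frac{\hat\pi_i}{1 - \hat\pi_i}\right] + \log(1 - y_i) - \log(1 - \hat\pi_i)\right\} \]
\[ = 2\sum_i n_iy_i\log\frac{n_iy_i}{n_i\hat\pi_i} - 2\sum_i n_iy_i\log\frac{n_i - n_iy_i}{n_i - n_i\hat\pi_i} + 2\sum_i n_i\log\frac{1 - y_i}{1 - \hat\pi_i} \]
\[ = 2\sum_i n_iy_i\log\frac{n_iy_i}{n_i\hat\pi_i} + 2\sum_i (n_i - n_iy_i)\log\frac{n_i - n_iy_i}{n_i - n_i\hat\pi_i}. \]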
At setting $i$, $n_iy_i$ is the number of successes and $(n_i - n_iy_i)$ is the number of failures, $i = 1, \ldots, N$. Thus, the deviance is a sum over the $2N$ cells of successes and failures and has the same form,

\[ D(y; \hat\mu) = 2\sum \text{observed} \times \log(\text{observed}/\text{fitted}), \]

as the deviance (4.32) for Poisson loglinear models with intercept term.

With binomial responses, it is possible to construct the data file as expressed here with the counts of successes and failures at each setting for the predictors, or with the individual Bernoulli 0-1 observations at the subject level. The deviance differs in the two cases. In the first case the saturated model has a parameter at each setting for the predictors, whereas in the second case it has a parameter for each subject. We refer to these as grouped data and ungrouped data cases. The approximate chi-squared distribution for the deviance occurs for grouped data but not for ungrouped data (see the Problems). With grouped data, the sample size increases for a fixed number of settings of the predictors and hence a fixed number of parameters for the saturated model.

Likelihood-Ratio Model Comparison Using the Deviance

For a Poisson or binomial model $M$, $\phi = 1$, so the deviance (4.30) equals

\[ D(y; \hat\mu) = -2[L(\hat\mu; y) - L(y; y)]. \]

Consider two models, $M_0$ with fitted values $\hat\mu_0$ and $M_1$ with fitted values $\hat\mu_1$, with $M_0$ a special case of $M_1$. Model $M_0$ is said to be nested within $M_1$. Since $M_0$ is simpler than $M_1$, a smaller set of parameter values satisfies $M_0$ than satisfies $M_1$. Maximizing the log likelihood over a smaller space cannot yield a larger maximum. Thus, $L(\hat\mu_0; y) \le L(\hat\mu_1; y)$, and since $L(y; y)$ is the same for each model, it follows from the preceding expression that

\[ D(y; \hat\mu_1) \le D(y; \hat\mu_0). \]

Simpler models have larger deviances.

Assuming that model $M_1$ holds, the likelihood-ratio test of the hypothesis that $M_0$ holds uses the test statistic

\[ -2[L(\hat\mu_0; y) - L(\hat\mu_1; y)] = -2[L(\hat\mu_0; y) - L(y; y)] - \{-2[L(\hat\mu_1; y) - L(y; y)]\} = D(y; \hat\mu_0) - D(y; \hat\mu_1). \]

The likelihood-ratio statistic comparing the two models is simply the difference between the deviances.
This statistic is large when $M_0$ fits poorly compared to $M_1$.
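As a rough numerical illustration of this comparison (a sketch under stated assumptions, not taken from the text), consider hypothetical Poisson counts in two groups. Model $M_0$ has an intercept only, so its ML fitted values all equal the overall sample mean; model $M_1$ adds a group indicator, so its fitted values are the group sample means. Both fits follow from the Poisson likelihood equations with log link. The data and function name below are invented for the example.

```python
import math

def poisson_deviance(y, mu):
    """Poisson deviance 2 * sum[y log(y/mu) - y + mu], taking y log(y/mu) = 0 when y = 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - yi + mi
    return 2.0 * total

# Hypothetical counts: first four observations in group 0, last four in group 1.
group = [0, 0, 0, 0, 1, 1, 1, 1]
y = [2, 0, 3, 1, 5, 7, 4, 6]

# M0 (intercept only): every fitted value is the overall mean.
overall_mean = sum(y) / len(y)
mu0 = [overall_mean] * len(y)

# M1 (intercept + group indicator): fitted values are the group means.
group_means = {
    g: sum(yi for yi, gi in zip(y, group) if gi == g) / group.count(g)
    for g in (0, 1)
}
mu1 = [group_means[g] for g in group]

d0 = poisson_deviance(y, mu0)
d1 = poisson_deviance(y, mu1)
lr = d0 - d1  # likelihood-ratio statistic, df = 2 - 1 = 1

# Equivalent form: 2 * sum observed * log(fitted_1 / fitted_0).
lr_direct = 2.0 * sum(yi * math.log(m1 / m0) for yi, m0, m1 in zip(y, mu0, mu1))
print(d0, d1, lr, lr_direct)
```

The two computations of the statistic agree because both models contain an intercept, so the $-y_i + \hat\mu_i$ terms sum to zero in each deviance.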
In fact, since the part in (4.30) involving the saturated model cancels, the difference between deviances,

\[ D(y; \hat\mu_0) - D(y; \hat\mu_1) = 2\sum_i \omega_i\left[y_i(\hat\theta_{1i} - \hat\theta_{0i}) - b(\hat\theta_{1i}) + b(\hat\theta_{0i})\right], \]

also has the form of the deviance. Under regularity conditions, this difference has approximately a chi-squared null distribution with df equal to the difference between the numbers of parameters in the two models.

For binomial GLMs and Poisson loglinear GLMs with intercept, from the observed $\times$ log(observed/fitted) expression for the deviance, the difference in deviances uses the observed counts and the two sets of fitted values in the form

\[ D(y; \hat\mu_0) - D(y; \hat\mu_1) = 2\sum \text{observed} \times \log(\text{fitted}_1/\text{fitted}_0). \]

With binomial responses, the test comparing models does not depend on whether the data file has grouped or ungrouped form. The saturated model differs in the two cases, but its log likelihood cancels when one forms the difference between the deviances.

Residuals for GLMs

When a GLM fits poorly according to an overall goodness-of-fit test, examination of residuals highlights where the fit is poor. One type of residual uses components of the deviance. In (4.30) let $D(y; \hat\mu) = \sum_i d_i$, where

\[ d_i = 2\omega_i\left[y_i(\tilde\theta_i - \hat\theta_i) - b(\tilde\theta_i) + b(\hat\theta_i)\right]. \]

The deviance residual for observation $i$ is

\[ \sqrt{d_i} \times \mathrm{sign}(y_i - \hat\mu_i). \]

An alternative is the Pearson residual,

\[ e_i = \frac{y_i - \hat\mu_i}{[\widehat{\mathrm{var}}(Y_i)]^{1/2}}. \]

For instance, for a Poisson GLM, $\mathrm{var}(Y_i) = \mu_i$ and the Pearson residual is

\[ e_i = (y_i - \hat\mu_i)/\sqrt{\hat\mu_i}. \]

For two-way contingency tables, identifying $y_i$ with cell count $n_{ij}$ and $\hat\mu_i$ with the independence fitted value $\hat\mu_{ij}$, this has the form given for contingency tables in Chapter 3; then $\sum_i e_i^2 = X^2$, the Pearson $X^2$ statistic. Similarly, the sum of squared deviance residuals $\sum_i d_i = G^2$, the likelihood-ratio statistic for testing independence.
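The two residual types can be sketched for a Poisson fit as follows. This is an illustrative sketch only: the counts are hypothetical, and the fitted values come from an intercept-only Poisson model (every fitted value equal to the sample mean), chosen so that no iterative fitting is needed. The squared Pearson residuals sum to the Pearson statistic, and the squared deviance residuals sum to the deviance.

```python
import math

def pearson_residuals(y, mu):
    """Poisson Pearson residuals (y_i - mu_i) / sqrt(mu_i)."""
    return [(yi - mi) / math.sqrt(mi) for yi, mi in zip(y, mu)]

def deviance_residuals(y, mu):
    """Poisson deviance residuals sign(y_i - mu_i) * sqrt(d_i),
    with d_i = 2[y_i log(y_i/mu_i) - y_i + mu_i] and y log(y/mu) = 0 when y = 0."""
    out = []
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        d = max(2.0 * (log_term - yi + mi), 0.0)  # guard against tiny negative rounding
        sign = 1.0 if yi >= mi else -1.0
        out.append(sign * math.sqrt(d))
    return out

# Hypothetical counts with an intercept-only fit (fitted value = sample mean = 3.5).
y = [2, 0, 3, 1, 5, 7, 4, 6]
mu = [sum(y) / len(y)] * len(y)

e = pearson_residuals(y, mu)
d = deviance_residuals(y, mu)

x2 = sum(v * v for v in e)   # Pearson statistic
g2 = sum(v * v for v in d)   # deviance
print(x2, g2)
```

Observations whose residuals are large in absolute value under both definitions are the natural places to look for lack of fit.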
When the model holds, Pearson and deviance residuals are less variable than standard normal because they compare $y_i$ to the fitted mean $\hat\mu_i$ rather than the true mean $\mu_i$ (e.g., the denominator of the Pearson residual estimates $[\mathrm{var}(Y_i)]^{1/2} = [\mathrm{var}(Y_i - \mu_i)]^{1/2}$ rather than $[\mathrm{var}(Y_i - \hat\mu_i)]^{1/2}$). Standardized residuals divide the ordinary residuals by their asymptotic standard errors. For GLMs the asymptotic covariance matrix of the vector of the raw residuals $\{y_i - \hat\mu_i\}$ is

\[ \mathrm{cov}(Y - \hat\mu) = \mathrm{cov}(Y)[I - \mathrm{Hat}]. \]

Here, $I$ is the identity matrix and Hat is the hat matrix,

\[ \mathrm{Hat} = W^{1/2}X(X'WX)^{-1}X'W^{1/2}, \]

where $W$ is the diagonal matrix with elements (4.27) (Pregibon). Let $\hat{h}_i$ denote the estimated diagonal element of Hat for observation $i$, called its leverage. Then, standardizing by dividing $y_i - \hat\mu_i$ by its estimated SE yields the standardized Pearson residual

\[ r_i = \frac{y_i - \hat\mu_i}{\left[\widehat{\mathrm{var}}(Y_i)(1 - \hat{h}_i)\right]^{1/2}} = \frac{e_i}{\sqrt{1 - \hat{h}_i}}. \]

For Poisson GLMs, for instance, $r_i = (y_i - \hat\mu_i)/[\hat\mu_i(1 - \hat{h}_i)]^{1/2}$. Pierce and Schafer presented standardized deviance residuals.

In linear models the hat matrix is so named because $\mathrm{Hat}\,y$ projects the data to the fitted values, $\hat\mu$ = "mu-hat." For GLMs, applying the estimated hat matrix to a linearized approximation for $g(y)$ yields $\hat\eta = g(\hat\mu)$, the model's estimated linear predictor values. The greater an observation's leverage, the greater its potential influence on the fit. As in ordinary regression, the leverages fall between 0 and 1 and sum to the number of model parameters. Unlike ordinary regression, the hat values depend on the fit as well as the model matrix, and points that have extreme predictor values need not have high leverage.

4.6 FITTING GENERALIZED LINEAR MODELS

Finally, we study how to find the ML estimators $\hat\beta$ of GLM parameters. The likelihood equations (4.22) are usually nonlinear in $\hat\beta$.
We describe a general-purpose iterative method for solving nonlinear equations and apply it two ways to determine the maximum of a likelihood function.

Newton-Raphson Method

The Newton-Raphson method is an iterative method for solving nonlinear equations, such as equations whose solution determines the point at which a function takes its maximum. It begins with an initial guess for the solution. It
obtains a second guess by approximating the function to be maximized in a neighborhood of the initial guess by a second-degree polynomial and then finding the location of that polynomial's maximum value. It then approximates the function in a neighborhood of the second guess by another second-degree polynomial, and the third guess is the location of its maximum. In this manner, the method generates a sequence of guesses. These converge to the location of the maximum when the function is suitable and/or the initial guess is good.

In more detail, here is how Newton-Raphson determines the value $\hat\beta$ at which a function $L(\beta)$ is maximized. Let $u' = (\partial L(\beta)/\partial \beta_1, \partial L(\beta)/\partial \beta_2, \ldots)$. Let $H$ denote the matrix having entries $h_{ab} = \partial^2 L(\beta)/\partial \beta_a\,\partial \beta_b$, called the Hessian matrix. Let $u^{(t)}$ and $H^{(t)}$ be $u$ and $H$ evaluated at $\beta^{(t)}$, the guess $t$ for $\hat\beta$. Step $t$ in the iterative process ($t = 0, 1, 2, \ldots$) approximates $L(\beta)$ near $\beta^{(t)}$ by the terms up to second order in its Taylor series expansion,

\[ L(\beta) \approx L(\beta^{(t)}) + u^{(t)\prime}(\beta - \beta^{(t)}) + \tfrac{1}{2}(\beta - \beta^{(t)})'H^{(t)}(\beta - \beta^{(t)}). \]

Solving $\partial L(\beta)/\partial \beta \approx u^{(t)} + H^{(t)}(\beta - \beta^{(t)}) = 0$ for $\beta$ yields the next guess. That guess can be expressed as

\[ \beta^{(t+1)} = \beta^{(t)} - (H^{(t)})^{-1}u^{(t)}, \]

assuming that $H^{(t)}$ is nonsingular. (However, computing routines use standard methods for solving the linear equations rather than explicitly calculating the inverse.)

Iterations proceed until changes in $L(\beta^{(t)})$ between successive cycles are sufficiently small. The ML estimator is the limit of $\beta^{(t)}$ as $t \to \infty$; however, this need not happen if $L(\beta)$ has other local maxima at which the derivative of $L(\beta)$ equals 0. In that case, a good initial estimate is crucial.

To help understand the Newton-Raphson process, work through these steps when $\beta$ has a single element (see the Problems). Then, Figure 4.6 illustrates a cycle of the method, showing the parabolic (second-order) approximation at a given step.

In the next chapter we use Newton-Raphson for logistic regression models. For now, we illustrate it with a simpler problem for which we know the answer, maximizing the log likelihood based on an observation $y$ from a $\mathrm{bin}(n, \pi)$ distribution.
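From Section 1.3.2, the first two derivatives of $L(\pi) = y\log\pi + (n - y)\log(1 - \pi)$ are

\[ u = \frac{y - n\pi}{\pi(1 - \pi)}, \qquad H = -\left[\frac{y}{\pi^2} + \frac{n - y}{(1 - \pi)^2}\right]. \]

Each Newton-Raphson step has the form

\[ \pi^{(t+1)} = \pi^{(t)} + \left[\frac{y}{(\pi^{(t)})^2} + \frac{n - y}{(1 - \pi^{(t)})^2}\right]^{-1}\frac{y - n\pi^{(t)}}{\pi^{(t)}(1 - \pi^{(t)})}. \]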
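The binomial iteration is simple enough to trace in code. The sketch below is illustrative only: the function name, the stopping tolerance, and the values $y = 7$, $n = 20$ are assumptions chosen for the example. It applies the update $\pi^{(t+1)} = \pi^{(t)} - u/H$ and records every guess.

```python
def newton_raphson_binomial(y, n, pi0, tol=1e-10, max_iter=50):
    """Maximize L(pi) = y log(pi) + (n - y) log(1 - pi) by Newton-Raphson.
    Returns the list of successive guesses, starting with pi0."""
    pi = pi0
    path = [pi]
    for _ in range(max_iter):
        u = (y - n * pi) / (pi * (1.0 - pi))                # first derivative
        h = -(y / pi ** 2 + (n - y) / (1.0 - pi) ** 2)      # second derivative (negative)
        new_pi = pi - u / h                                 # Newton-Raphson update
        path.append(new_pi)
        if abs(new_pi - pi) < tol:
            break
        pi = new_pi
    return path

# Hypothetical data: y = 7 successes in n = 20 trials, so the ML estimate is y/n.
path_half = newton_raphson_binomial(7, 20, 0.5)
path_low = newton_raphson_binomial(7, 20, 0.1)
print(path_half)
print(path_low)
```

Printing the two paths shows the guesses moving toward $y/n$ from either starting value; this sketch does not guard against a step leaving the interval $(0, 1)$, which a more careful routine would need for extreme starting values.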
FIGURE 4.6 Cycle of Newton-Raphson method.

This adjusts $\pi^{(t)}$ up if $y/n > \pi^{(t)}$ and down if $y/n < \pi^{(t)}$. For instance, with $\pi^{(0)} = \tfrac{1}{2}$, you can check that $\pi^{(1)} = y/n$. When $\pi^{(t)} = y/n$, no adjustment occurs and $\pi^{(t+1)} = y/n$, which is the correct answer for $\hat\pi$. For starting values other than $\tfrac{1}{2}$, adequate convergence usually takes four or five iterations.

The convergence of $\beta^{(t)}$ to $\hat\beta$ for the Newton-Raphson method is usually fast. For large $t$, the convergence satisfies, for each $j$,

\[ |\beta_j^{(t+1)} - \hat\beta_j| \le c\,|\beta_j^{(t)} - \hat\beta_j|^2 \qquad \text{for some } c > 0 \]

and is referred to as second-order. This implies that the number of correct decimals in the approximation roughly doubles after sufficiently many iterations. In practice, it often takes relatively few iterations for satisfactory convergence.

Fisher Scoring Method

Fisher scoring is an alternative iterative method for solving likelihood equations. It resembles the Newton-Raphson method, the distinction being with the Hessian matrix. Fisher scoring uses the expected value of this matrix, called the expected information, whereas Newton-Raphson uses the matrix itself, called the observed information.

Let $\mathcal{I}^{(t)}$ denote the approximation $t$ for the ML estimate of the expected information matrix; that is, $\mathcal{I}^{(t)}$ has elements $-E(\partial^2 L(\beta)/\partial \beta_a\,\partial \beta_b)$, evaluated at $\beta^{(t)}$. The formula for Fisher scoring is

\[ \beta^{(t+1)} = \beta^{(t)} + (\mathcal{I}^{(t)})^{-1}u^{(t)} \]

or

\[ \mathcal{I}^{(t)}\beta^{(t+1)} = \mathcal{I}^{(t)}\beta^{(t)} + u^{(t)}. \tag{4.40} \]

For estimating a binomial parameter, from Section 1.3.2 the information is $n/[\pi(1 - \pi)]$. A step of Fisher scoring gives

\[ \pi^{(t+1)} = \pi^{(t)} + \left[\frac{n}{\pi^{(t)}(1 - \pi^{(t)})}\right]^{-1}\frac{y - n\pi^{(t)}}{\pi^{(t)}(1 - \pi^{(t)})} = \pi^{(t)} + \frac{y - n\pi^{(t)}}{n} = \frac{y}{n}. \]

This gives the answer for $\hat\pi$ after a single iteration and stays at that value for successive iterations.

Formula (4.26) showed that $\mathcal{I} = X'WX$. Similarly, $\mathcal{I}^{(t)} = X'W^{(t)}X$, where $W^{(t)}$ is $W$ [see (4.27)] evaluated at $\beta^{(t)}$. The estimated asymptotic covariance matrix $\hat{\mathcal{I}}^{-1}$ of $\hat\beta$ [see (4.28)] occurs as a by-product of this algorithm as $(\mathcal{I}^{(t)})^{-1}$ for $t$ at which convergence is adequate. From (4.22), for both Fisher scoring and Newton-Raphson, $u$ has elements

\[ u_j = \frac{\partial L(\beta)}{\partial \beta_j} = \sum_{i=1}^{N} \frac{(y_i - \mu_i)x_{ij}}{\mathrm{var}(Y_i)}\,\frac{\partial \mu_i}{\partial \eta_i}. \]

For GLMs with a canonical link, we'll see in a later section that the observed and expected information are the same. For noncanonical link models, Fisher scoring has the advantages that it produces the asymptotic covariance matrix as a by-product, the expected information is necessarily nonnegative definite, and as seen next, it is closely related to weighted least squares methods for ordinary linear models. However, it need not have second-order convergence, and for complex models the observed information is often easier to calculate.

Efron and Hinkley (1978), developing arguments of R. A. Fisher, gave reasons for preferring observed information. They argued that its variance estimates better approximate a relevant conditional variance (conditional on statistics not relevant to the parameter being estimated), it is "closer to the data," and it tends to agree more closely with Bayesian analyses.

ML as Iterative Reweighted Least Squares*

A relation exists between weighted least squares estimation and using Fisher scoring to find ML estimates. We refer here to the general linear model of
form

\[ z = X\beta + \epsilon. \]

When the covariance matrix of $\epsilon$ is $V$, the weighted least squares (WLS) estimator of $\beta$ is

\[ (X'V^{-1}X)^{-1}X'V^{-1}z. \]

From $\mathcal{I} = X'WX$, the expression for elements of $u$, and since diagonal elements of $W$ are $w_i = (\partial \mu_i/\partial \eta_i)^2/\mathrm{var}(Y_i)$, it follows that in (4.40),

\[ \mathcal{I}^{(t)}\beta^{(t)} + u^{(t)} = X'W^{(t)}z^{(t)}, \]

where $z^{(t)}$ has elements

\[ z_i^{(t)} = \sum_j x_{ij}\beta_j^{(t)} + (y_i - \mu_i^{(t)})\left(\frac{\partial \eta_i^{(t)}}{\partial \mu_i^{(t)}}\right) = \eta_i^{(t)} + (y_i - \mu_i^{(t)})\left(\frac{\partial \eta_i^{(t)}}{\partial \mu_i^{(t)}}\right). \]

Equations (4.40) for Fisher scoring then have the form

\[ (X'W^{(t)}X)\beta^{(t+1)} = X'W^{(t)}z^{(t)}. \]

These are the normal equations for using weighted least squares to fit a linear model for a response variable $z^{(t)}$, when the model matrix is $X$ and the inverse of the covariance matrix is $W^{(t)}$. The equations have solution

\[ \beta^{(t+1)} = (X'W^{(t)}X)^{-1}X'W^{(t)}z^{(t)}. \]

The vector $z$ in this formulation is a linearized form of the link function $g$, evaluated at $y$,

\[ g(y_i) \approx g(\mu_i) + (y_i - \mu_i)g'(\mu_i) = \eta_i + (y_i - \mu_i)(\partial \eta_i/\partial \mu_i) = z_i. \tag{4.42} \]

This adjusted (or working) response variable $z$ has element $i$ approximated by $z_i^{(t)}$ for cycle $t$ of the iterative scheme. That cycle regresses $z^{(t)}$ on $X$ with weight (i.e., inverse covariance) $W^{(t)}$ to obtain a new estimate $\beta^{(t+1)}$. This estimate yields a new linear predictor value $\eta^{(t+1)} = X\beta^{(t+1)}$ and a new adjusted response value $z^{(t+1)}$ for the next cycle. The ML estimator results from iterative use of weighted least squares, in which the weight matrix changes at each cycle. The process is called iterative reweighted least squares.

A simple way to begin the iterative process uses the data $y$ as the initial estimate of $\mu$. This determines the first estimate of the weight matrix $W$ and
hence the initial estimate of $\beta$. It may be necessary to alter some observations slightly for this first cycle only so that $g(y)$, the initial value of $z$, is finite. For instance, when $g$ is the log link applied to counts, a count of $y_i = 0$ is problematic, so one could set $y_i = \tfrac{1}{2}$. This is not a problem with the model itself, since the log applies to the mean, and fitted means are usually strictly positive in successive iterations.

Simplifications for Canonical Links*

Certain simplifications result with GLMs using the canonical link. For that link,

\[ \eta_i = \theta_i = \sum_j \beta_j x_{ij}. \]

Often, $a(\phi)$ in the density or mass function is identical for all observations, such as for Poisson GLMs [$a(\phi) = 1$] and binomial GLMs with each $n_i = 1$ [for which $a(\phi) = 1/n_i = 1$]. Then the part of the log likelihood (4.19) involving both parameters and data is $\sum_i y_i\theta_i$, which simplifies to

\[ \sum_i y_i\Big(\sum_j \beta_j x_{ij}\Big) = \sum_j \beta_j\Big(\sum_i y_ix_{ij}\Big). \]

Sufficient statistics for estimating $\beta$ in the GLM are then

\[ \sum_i y_ix_{ij}, \qquad j = 1, \ldots, p. \]

For the canonical link,

\[ \partial \mu_i/\partial \eta_i = \partial \mu_i/\partial \theta_i = \partial b'(\theta_i)/\partial \theta_i = b''(\theta_i). \]

Thus, the contribution (4.21) to the likelihood equation for $\beta_j$ simplifies to

\[ \frac{\partial L_i}{\partial \beta_j} = \frac{y_i - \mu_i}{\mathrm{var}(Y_i)}\,b''(\theta_i)\,x_{ij} = \frac{(y_i - \mu_i)x_{ij}}{a(\phi)}. \]

When $a(\phi)$ is identical for all observations, the likelihood equations are

\[ \sum_i x_{ij}y_i = \sum_i x_{ij}\mu_i, \qquad j = 1, \ldots, p. \]

These equations equate the sufficient statistics for the model parameters to their expected values (Nelder and Wedderburn). For a normal distribution with identity link, these are the normal equations. We obtained these for Poisson loglinear models in (4.29) and for binomial logistic regression models (when each $n_i = 1$) in (4.25).
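The iterative reweighted least squares cycle, and the fact that the canonical-link likelihood equations equate sufficient statistics to their expected values, can both be checked numerically. The following is a minimal sketch under stated assumptions: a Poisson model with log link and a single predictor plus intercept, hypothetical data, a fixed number of cycles instead of a convergence test, and a 2 x 2 weighted least squares solve written out by hand. For this model the weights are $w_i = \mu_i$ and the working response is $z_i = \eta_i + (y_i - \mu_i)/\mu_i$.

```python
import math

def irls_poisson(x, y, n_iter=25):
    """Fit log(mu_i) = b0 + b1 * x_i by iterative reweighted least squares.
    Returns (b0, b1, fitted means)."""
    # Start from the data as the initial estimate of mu, replacing zero
    # counts by 1/2 so that the initial linear predictor log(mu) is finite.
    mu = [yi if yi > 0 else 0.5 for yi in y]
    eta = [math.log(m) for m in mu]
    b0 = b1 = 0.0
    for _ in range(n_iter):
        # Working response and weights for the log link.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        w = list(mu)
        # Weighted least squares normal equations (X'WX) b = X'Wz with X = [1, x].
        s_w = sum(w)
        s_wx = sum(wi * xi for wi, xi in zip(w, x))
        s_wxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        s_wz = sum(wi * zi for wi, zi in zip(w, z))
        s_wxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = s_w * s_wxx - s_wx ** 2
        b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        b1 = (s_w * s_wxz - s_wx * s_wz) / det
        # Update the linear predictor and fitted means for the next cycle.
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
    return b0, b1, mu

# Hypothetical counts.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1, 0, 3, 2, 6, 9]
b0, b1, mu_hat = irls_poisson(x, y)

# Likelihood equations for the canonical link: sum x_ij y_i = sum x_ij mu_i.
gap_intercept = sum(y) - sum(mu_hat)
gap_slope = sum(xi * yi for xi, yi in zip(x, y)) - sum(xi * mi for xi, mi in zip(x, mu_hat))
print(b0, b1, gap_intercept, gap_slope)
```

Both gaps should be numerically negligible after the iterations settle. A production routine would solve the weighted least squares problem with a numerically stable method and stop on a convergence criterion; the explicit 2 x 2 solve here is only for transparency.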
From the preceding expression for $\partial L_i/\partial \beta_j$, with the canonical link the second derivatives of the log likelihood have components

\[ \frac{\partial^2 L_i}{\partial \beta_h\,\partial \beta_j} = -\frac{x_{ij}}{a(\phi)}\left(\frac{\partial \mu_i}{\partial \beta_h}\right). \]

This does not depend on the observation $y_i$, so

\[ \partial^2 L(\beta)/\partial \beta_h\,\partial \beta_j = E[\partial^2 L(\beta)/\partial \beta_h\,\partial \beta_j]. \]

That is, $H = -\mathcal{I}$, and the Newton-Raphson and Fisher scoring algorithms are identical for canonical link models (Nelder and Wedderburn).

4.7 QUASI-LIKELIHOOD AND GENERALIZED LINEAR MODELS*

A GLM $g(\mu_i) = \sum_j \beta_j x_{ij}$ specifies $\mu_i$ using a link function $g$ and linear predictor. From (4.22) and (4.41), the ML estimates $\hat\beta$ are the solutions of the likelihood equations

\[ u_j(\beta) = \sum_{i=1}^{N} \frac{(y_i - \mu_i)x_{ij}}{v(\mu_i)}\left(\frac{\partial \mu_i}{\partial \eta_i}\right) = 0, \qquad j = 1, \ldots, p, \tag{4.45} \]

where $\mu_i = g^{-1}(\sum_j \beta_j x_{ij})$ and $v(\mu_i) = \mathrm{var}(Y_i)$. These equations set the score functions $\{u_j(\beta)\}$, which are derivatives of the log likelihood with respect to $\{\beta_j\}$, equal to 0. As we noted in Section 4.4.4, the likelihood equations depend on the assumed distribution for $Y_i$ only through $\mu_i$ and $v(\mu_i)$. The choice of distribution determines the mean-variance relationship $v(\mu_i)$.

Mean-Variance Relationship Determines Quasi-likelihood Estimates

Wedderburn proposed an alternative approach, quasi-likelihood estimation, which assumes only a mean-variance relationship rather than a specific distribution for $Y_i$. It has a link function and linear predictor of the usual GLM form, but instead of assuming a distributional type for $Y_i$ it assumes only

\[ \mathrm{var}(Y_i) = v(\mu_i) \]

for some chosen variance function $v$. The equations that determine quasi-likelihood estimates are the same as the likelihood equations (4.45) for GLMs. They are not likelihood equations, however, without the additional assumption that $\{Y_i\}$ has distribution in the natural exponential family.

To illustrate, suppose we assume that the $\{Y_i\}$ are independent with

\[ v(\mu_i) = \mu_i. \]
The quasi-likelihood (QL) estimates are the solution of (4.45) with $v(\mu_i)$ replaced by $\mu_i$. Under the additional assumption that $\{Y_i\}$ have distribution in the exponential dispersion family (4.14), these estimates are also ML estimates. That case is simply the Poisson distribution. Thus, for $v(\mu_i) = \mu_i$, quasi-likelihood estimates are also ML estimates when the random component has a Poisson distribution.

Wedderburn suggested using the estimating equations (4.45) for any variance function, even if it does not occur for a member of the natural exponential family. In fact, the purpose of the quasi-likelihood method was to encompass a greater variety of cases, such as those discussed in the following subsections. The QL estimates have asymptotic covariance matrix of the same form (4.28) as in GLMs, namely $(X'\hat{W}X)^{-1}$ with $w_i = (\partial \mu_i/\partial \eta_i)^2/\mathrm{var}(Y_i)$.

Overdispersion for Poisson GLMs and Quasi-likelihood

For count data, we have seen that the Poisson assumption is often unrealistic because of overdispersion: the variance exceeds the mean. One cause for this is heterogeneity among subjects. This suggests an alternative to a Poisson GLM in which the mean-variance relationship has the form

\[ v(\mu_i) = \phi\mu_i \]

for some constant $\phi$. The case $\phi > 1$ represents overdispersion for the Poisson model.

In the estimating equations (4.45) with $v(\mu_i) = \phi\mu_i$, $\phi$ drops out. Thus, the equations are identical to likelihood equations for Poisson models, and model parameter estimates are also identical. Also,

\[ w_i = (\partial \mu_i/\partial \eta_i)^2/\mathrm{var}(Y_i) = (\partial \mu_i/\partial \eta_i)^2/\phi\mu_i, \]

so the estimated $\mathrm{cov}(\hat\beta) = (X'\hat{W}X)^{-1}$ is $\phi$ times that for the Poisson model.

When a variance function has the form $v(\mu_i) = \phi v^*(\mu_i)$, usually $\phi$ is also unknown. However, $\phi$ is not in the estimating equations. Let

\[ X^2 = \sum_i \frac{(y_i - \hat\mu_i)^2}{v^*(\hat\mu_i)}, \]

a Pearson-type statistic for the simpler model with $\phi = 1$. Then $X^2/\phi$ is a sum of squares of $N$ standardized terms. When $X^2/\phi$ is approximately chi-squared or when $\hat\mu$ is approximately linear in $\beta$ (with $v^*(\hat\mu)$ close to $v^*(\mu)$), then $E(X^2/\phi) \approx N - p$, the number of observations minus the number of model parameters $p$. Hence, $E[X^2/(N - p)] \approx \phi$. Using the motivation of moment estimation, Wedderburn suggested taking $\hat\phi = X^2/(N - p)$
as the estimated multiple of the covariance matrix.

In summary, this quasi-likelihood approach for count data is simple: Fit the ordinary Poisson model and use its $p$ parameter estimates. Multiply the ordinary standard error estimates by $\sqrt{X^2/(N - p)}$.

We illustrate for the horseshoe crab data analyzed earlier with Poisson GLMs. With the log link, the fit using width to predict number of
satellites was $\log\hat\mu = -3.305 + \hat\beta x$, with an ordinary standard error SE for the width effect $\hat\beta$. To improve the adequacy of using a chi-squared statistic to summarize fit, we use the satellite totals and fit for all female crabs at a given width, to increase the counts and fitted values relative to those for individual female crabs. The $N = 66$ distinct width levels each have a total count $y_i$ for the number of satellites and a fitted total $\hat\mu_i$. The Pearson statistic comparing these is $X^2 = 174.3$. The quasi-likelihood adjustment for standard errors equals $\sqrt{174.3/(66 - 2)} = 1.65$. Thus, 1.65 times the ordinary SE is a more plausible standard error for $\hat\beta$ in this prediction equation.

Alternative ways of handling overdispersion include mixture models that allow heterogeneity in the mean at fixed settings of predictors. For count data these include Poisson GLMs having random effects and negative binomial GLMs that result when a Poisson parameter itself has a gamma distribution; both are discussed in later chapters.

Overdispersion for Binomial GLMs and Quasi-likelihood

The quasi-likelihood approach can also handle overdispersion for counts based on binary data. When $y_i$ is the sample mean of $n_i$ independent binary observations with parameter $\pi_i$, $i = 1, \ldots, N$, then binomial sampling has $E(Y_i) = \pi_i$ and $\mathrm{var}(Y_i) = \pi_i(1 - \pi_i)/n_i$. A simple quasi-likelihood approach uses the alternative variance function

\[ v(\pi_i) = \phi\pi_i(1 - \pi_i)/n_i. \]

Overdispersion occurs when $\phi > 1$. The quasi-likelihood estimates are the same as ML estimates for the binomial model, since $\phi$ drops out of the estimating equations (4.45). As in the overdispersed Poisson case, $\phi$ enters the denominator of $w_i$. Thus, the asymptotic covariance matrix multiplies by $\phi$, and standard errors multiply by $\sqrt{\phi}$. An estimate of $\phi$ using the $X^2$ fit statistic for the ordinary binomial model is $X^2/(N - p)$ (Finney).

Methods like these that use estimates from ordinary models but inflate their standard errors are appropriate only if the model chosen describes well the structural relationship between the mean of $Y_i$ and the predictors.
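The standard-error adjustment itself is a one-line calculation. The sketch below uses hypothetical helper names; the only figures taken from the text are the horseshoe crab values $X^2 = 174.3$, $N = 66$, and $p = 2$ (intercept and width).

```python
import math

def pearson_statistic(y, mu, v_star):
    """Pearson-type statistic sum (y_i - mu_i)^2 / v*(mu_i) for the model with phi = 1."""
    return sum((yi - mi) ** 2 / v_star(mi) for yi, mi in zip(y, mu))

def dispersion_estimate(x2, n_obs, n_params):
    """Moment estimate of the dispersion parameter: X^2 / (N - p)."""
    return x2 / (n_obs - n_params)

def se_inflation_factor(x2, n_obs, n_params):
    """Factor by which ordinary standard errors are multiplied: sqrt(X^2 / (N - p))."""
    return math.sqrt(dispersion_estimate(x2, n_obs, n_params))

# Horseshoe crab figures quoted in the text.
phi_hat = dispersion_estimate(174.3, 66, 2)
factor = se_inflation_factor(174.3, 66, 2)
print(round(phi_hat, 2), round(factor, 2))
```

For a Poisson fit one would pass `lambda m: m` as `v_star` to `pearson_statistic`; for grouped binomial proportions, `v_star` would need the trial counts as well, which this sketch leaves to the caller.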
If a large goodness-of-fit statistic is due to some other type of lack of fit, such as failing to include a relevant interaction term, making an adjustment for overdispersion will not address the inadequacy.

For counts with binary data, alternative mechanisms for handling overdispersion include mixture models such as binomial GLMs with random effects and models for which a binomial parameter itself has a beta distribution; both are discussed in later chapters.

Teratology Overdispersion Example

Table 4.5 shows results of a teratology experiment in which female rats on iron-deficient diets were assigned to four groups. Rats in group 1 were given placebo injections, and rats in other groups were given injections of an iron
supplement; this was done weekly in group 4, only on days 7 and 10 in group 2, and only on days 0 and 7 in group 3. The 58 rats were made pregnant, sacrificed after three weeks, and then the total number of dead fetuses was counted in each litter. In teratology experiments, due to unmeasured covariates and genetic variability the probability of death may vary from litter to litter within a particular treatment group.

TABLE 4.5 Response Counts of (Litter Size, Number Dead) for 58 Litters of Rats in Low-Iron Teratology Study

Group 1: Untreated (low iron)
(10, 1) (11, 4) (1, 9) (4, 4) (10, 10) (11, 9) (9, 9) (11, 11) (10, 10) (10, 7) (1, 1) (10, 9) (8, 8) (11, 9) (6, 4) (9, 7) (14, 14) (1, 7) (11, 9) (13, 8) (14, 5) (10, 10) (1, 10) (13, 8) (10, 10) (14, 3) (13, 13) (4, 3) (8, 8) (13, 5) (1, 1)

Group 2: Injections days 7 and 10
(10, 1) (3, 1) (13, 1) (1, 0) (14, 4) (9, ) (13, ) (16, 1) (11, 0) (4, 0) (1, 0) (1, 0)

Group 3: Injections days 0 and 7
(8, 0) (11, 1) (14, 0) (14, 1) (11, 0)

Group 4: Injections weekly
(3, 0) (13, 0) (9, ) (17, ) (15, 0) ( , 0) (14, 1) (8, 0) (6, 0) (17, 0)

Source: Moore and Tsiatis.

Let $y_{i(g)}$ denote the proportion of dead fetuses out of the $n_{i(g)}$ in litter $i$ in treatment group $g$. Let $\pi_{i(g)}$ denote the probability of death for a fetus in that litter. Consider the model with $n_{i(g)}y_{i(g)}$ a $\mathrm{bin}(n_{i(g)}, \pi_{i(g)})$ variate, where

\[ \pi_{i(g)} = \pi_g, \qquad g = 1, 2, 3, 4. \]

That is, the model treats all litters in a particular group $g$ as having the same probability of death $\pi_g$. The ML fit has estimate $\hat\pi_g$ equal to the sample proportion of deaths for all fetuses from litters in that group. These equal $\hat\pi_1$ = (SE = 0.04), $\hat\pi_2$ = 0.10 (SE = ), $\hat\pi_3$ = (SE = 0.04), and $\hat\pi_4$ = (SE = ), where for group $g$, $\mathrm{SE} = \sqrt{\hat\pi_g(1 - \hat\pi_g)/(\sum_i n_{i(g)})}$. The estimated probability of death is considerably higher for the placebo group.

For litter $i$ in group $g$, $n_{i(g)}\hat\pi_g$ is a fitted number of deaths and $n_{i(g)}(1 - \hat\pi_g)$ is a fitted number of nondeaths.
Comparing these fitted values to the observed counts of deaths and nondeaths in the $N = 58$ litters using the Pearson statistic gives $X^2 = 154.7$ with df $= 58 - 4 = 54$. There is considerable evidence of overdispersion. With the quasi-likelihood approach, $\{\hat\pi_g\}$ are the same as the binomial ML estimates; however, $\hat\phi = X^2/(N - p) = 154.7/(58 - 4) = 2.86$, so standard errors multiply by $\hat\phi^{1/2} = 1.69$.

Even with this adjustment for overdispersion, strong evidence remains that the probability of death is substantially higher for the placebo group. For
instance, a 95% confidence interval for $\pi_1 - \pi_2$ is $(\hat\pi_1 - \hat\pi_2) \pm 1.96(1.69)[\mathrm{SE}_1^2 + \mathrm{SE}_2^2]^{1/2}$, which has lower limit 0.54. This is wider, however, than the Wald interval of (0.59, 0.73) for comparing independent proportions, which ignores the overdispersion.

4.8 GENERALIZED ADDITIVE MODELS*

The GLM generalizes the ordinary linear model to permit nonnormal distributions and modeling functions of the mean. Quasi-likelihood provides a further generalization, specifying how the variance depends on the mean without assuming a given distribution. Another generalization replaces the linear predictor by smooth functions of the predictors.

Smoothing Data

The GLM structure $g(\mu) = \sum_j \beta_j x_j$ generalizes to

\[ g(\mu) = \sum_j s_j(x_j), \]

where $s_j(\cdot)$ is an unspecified smooth function of predictor $j$. A useful smooth function is the cubic spline. It has separate cubic polynomials over sets of disjoint intervals, joined together smoothly at boundaries of those intervals. Like GLMs, this model specifies a distribution for the random component and a link function $g$. The resulting model is called a generalized additive model, symbolized by GAM (Hastie and Tibshirani). The GLM is the special case in which each $s_j$ is a linear function. Also possible is taking some $s_j$ as smooth functions and others as linear functions or as dummy variables for qualitative predictors.

The details for fitting GAMs are beyond our scope. The fitting algorithm employs a generalization of the Newton-Raphson method that utilizes local smoothing. This corresponds to subtracting from the log-likelihood function a penalty function that increases as the smooth function gets more wiggly. The model fit assigns a deviance and an approximate df value to each $s_j$ in the additive predictor, enabling inference about those terms. For instance, a smooth function having df = 5 is similar in overall complexity to a fourth-degree polynomial, which has five parameters. One's choice of a df value (or smoothing parameter) determines how smooth the resulting GAM fit looks.
It is usually worth trying a variety of degrees of smoothing to find one that smooths the data sufficiently so that the trend is not too irregular but does
not smooth so much that it suppresses interesting patterns. This approach may suggest that a linear model is adequate with a particular link or suggest ways to improve on linearity.

Some software packages that do not have GAMs can smooth the data by employing a type of regression that gives greater weight to nearby observations in predicting the value at a given point; such locally weighted least squares regression is often referred to as lowess. We prefer GAMs because they recognize explicitly the form of the response. For instance, with a binary response, lowess can give predicted values below 0 or above 1, which cannot happen with a GAM.

Even when one plans to use GLMs, a GAM can be helpful for exploratory analysis. For instance, for continuous $X$ with continuous responses, scatter diagrams provide visual information about the dependence of $Y$ on $X$. For binary responses, the following example shows that such diagrams are not very informative. Plotting the fitted smooth function for a predictor may reveal a general trend without assuming a particular functional relationship.

FIGURE 4.7 Whether satellites are present (1, yes; 0, no), by width of female crab, with smoothing fit of generalized additive model.
GAMs for Horseshoe Crab Example

In Section 4.3.2, Figure 4.4 showed the trend relating number of satellites for horseshoe crabs to their width. This smooth curve is the fit of a generalized additive model, assuming a Poisson distribution and using the log link.

In the next chapter we'll use logistic regression to model the probability that a crab has at least one satellite. For crab $i$, let $y_i = 1$ if she has at least one satellite and $y_i = 0$ otherwise. Figure 4.7 plots these data against $x$ = crab width. It consists of a set of points with $y = 1$ and a second set of points with $y = 0$. The numbered symbols indicate the number of observations at each point. It appears that $y = 1$ tends to occur relatively more often at higher $x$ values.

Figure 4.7 also shows a curve based on smoothing the data using a GAM, assuming a binomial response and logit link. This curve shows a roughly increasing trend and is more informative than viewing the binary data alone. It suggests that an S-shaped regression function may describe this relationship relatively well.

NOTES

Section 4.1: Generalized Linear Model

4.1. Distribution (4.1) is called a natural (or linear) exponential family to distinguish it from a more general exponential family that replaces $y$ by $r(y)$ in the exponential term. For other generalizations, see Jorgensen. Books on GLMs and related models, in approximate order of technical level from highest to lowest, are McCullagh and Nelder (1989), Fahrmeir and Tutz (2001), Aitkin et al. (1989), Dobson (2002), and Gill. See also Firth.

Section 4.3: Generalized Linear Models for Counts

4.2. For further discussion of Poisson regression and related models for count data, see Breslow (1984), Cameron and Trivedi (1998), Frome (1983), Hinde (1982), Lawless (1987), and Seeber and references therein.

Section 4.4: Moments and Likelihood for Generalized Linear Models

4.3. The function $b(\theta)$ in the exponential dispersion family is called the cumulant function, since when $a(\phi)$
$= 1$ its derivatives yield the cumulants of the distribution (Jorgensen). For many GLMs, including Poisson models with log link and binary models with logit link, with full-rank model matrix the Hessian is negative definite and the log likelihood is a strictly concave function. Then ML estimates of model parameters exist and are unique under quite general conditions (Wedderburn).

Section 4.5: Inference for Generalized Linear Models

4.4. The matrix $W$ used in $\text{cov}(\hat{\beta})$ [see (4.8)], in the hat matrix for standardized Pearson residuals [see (4.38)], and in Fisher scoring [see (4.40)] is the inverse of the covariance matrix of the linearized form of $g(y)$.
McCullagh and Nelder (1989, Chap. 12) discussed model checking for GLMs. For discussions about residuals, see also Green (1984), Pierce and Schafer (1986), Pregibon (1980, 1981), and Williams. Pregibon (1982) showed that the squared standardized Pearson residual is the score statistic for testing whether the observation is an outlier. Davison and Hinkley (1997, Sec. 7.2) discussed bootstrapping in GLMs.

Section 4.6: Fitting Generalized Linear Models

4.5. Fisher (1935b) introduced the Fisher scoring method to calculate ML estimates for probit models. For further discussion of GLM model fitting and the relationship between iterative reweighted least squares and ML estimation, see Green (1984), Jorgensen (1983), McCullagh and Nelder (1989), and Nelder and Wedderburn. Green (1984), Jorgensen (1983), and Palmgren and Ekholm also discussed this relation for exponential family nonlinear models.

Section 4.7: Quasi-likelihood and Generalized Linear Models

4.6. For more on quasi-likelihood, see Sections 11.4, 12.6.4, and 13.3, Breslow (1984), Cox (1983), Firth (1987), Hinde and Demetrio (1998), McCullagh (1983), McCullagh and Nelder (1989), Nelder and Pregibon (1987), and Wedderburn (1974). See Heyde for a theoretical perspective.

Section 4.8: Generalized Additive Models

4.7. Besides GAMs, other nonparametric smoothing methods can describe the dependence of a binary response on a predictor. For instance, see Copas (1983) and Lloyd (1999, Chap. 5) for kernel smoothing and Kauermann and Tutz (2001) for models with random effects.

PROBLEMS

Applications

4.1 In the 2000 U.S. presidential election, Palm Beach County in Florida was the focus of unusual voting patterns (including a large number of illegal double votes) apparently caused by a confusing "butterfly ballot." Many voters claimed that they voted mistakenly for the Reform Party candidate, Pat Buchanan, when they intended to vote for Al Gore.
Figure 4.8 shows the total number of votes for Buchanan plotted against the number of votes for the Reform Party candidate in 1996 (Ross Perot), by county in Florida. (For details, see A. Agresti and B. Presnell, J. Law Public Policy, Volume 13, Fall 2001.)

FIGURE 4.8 Total vote, by county in Florida, for Reform Party candidates Buchanan in 2000 and Perot in 1996.

a. In county $i$, let $\pi_i$ denote the proportion of the vote for Buchanan and let $x_i$ denote the proportion of the vote for Perot in 1996. For the linear probability model fitted to all counties except Palm Beach County, $\hat{\pi}_i = -\ldots + \ldots\,x_i$. Give the value of $P$ in the interpretation: The estimated proportion vote for Buchanan in 2000 was roughly $P$% of that for Perot in 1996.
b. For Palm Beach County, $\pi_i = \ldots$ and $x_i = \ldots$. Does this result appear to be an outlier? Explain.
c. For logistic regression, $\log[\hat{\pi}_i/(1 - \hat{\pi}_i)] = -7.164 + 1.19x_i$. Find $\hat{\pi}_i$ in Palm Beach County. Is that county an outlier for this model?

4.2 For games in baseball's National League during nine decades, Table 4.6 shows the percentage of times that the starting pitcher pitched a complete game.

TABLE 4.6 Data for Problem 4.2 (columns: Decade, Percent Complete)
Source: Data from George Will, Newsweek, Apr. 10, 1989.
a. Treating the number of games as the same in each decade, the ML fit of the linear probability model is $\hat{\pi} = \ldots - \ldots\,x$, where $x$ = decade ($x = 1, 2, \ldots, 9$). Interpret the two parameter estimates.
b. Substituting $x = 10, 11, 12$, predict the percentages of complete games for the next three decades. Are these predictions plausible? Why?
c. The ML fit with logistic regression is $\hat{\pi} = \exp(\ldots - 0.315x)/[1 + \exp(\ldots - 0.315x)]$. Obtain $\hat{\pi}$ for $x = 10, 11, 12$. Are these more plausible?

4.3 For Table 3.7 with scores (0, 0.5, 1.5, 4.0, 7.0) for alcohol consumption, ML fitting of the linear probability model for malformation has output

Parameter    Estimate    Std Error    Wald 95% Conf Limits
Intercept    …
Alcohol      …

Interpret the model fit. Use it to estimate the relative risk of malformation for alcohol consumption levels 0 and ….

4.4 For Table 4.2, refit the linear probability model or the logistic regression model using the scores (a) (0, 2, 4, 6), (b) (0, 1, 2, 3), and (c) (1, 2, 3, 4). Compare $\hat{\beta}$ for the three choices. Compare fitted values. Summarize the effect of linear transformations of scores, which preserve relative sizes of spacings between scores.

4.5 For Table 4.3, let $Y = 1$ if a crab has at least one satellite, and $Y = 0$ otherwise. Using $x$ = weight, fit the linear probability model.
a. Use ordinary least squares. Interpret the parameter estimates. Find the estimated probability at the highest observed weight (5.0 kg). Comment.
b. Try to fit the model using ML, treating $Y$ as binomial. [The failure is due to a fitted probability falling outside the (0, 1) range. The fit in part (a) is ML for a normal random component, for which fitted values outside this range are permissible.]
c. Fit the logistic regression model. Show that the fitted probability at a weight of 5.0 kg equals ….
d. Fit the probit model. Find the fitted probability at 5.0 kg.

4.6 An experiment analyzes imperfection rates for two processes used to fabricate silicon wafers for computer chips. For treatment A applied to 10 wafers, the numbers of imperfections are 8, 7, 6, 6, 3, 4, 7, 2, 3, 4.
Treatment B applied to 10 other wafers has 9, 9, 8, 14, 8, 13, 11, 5, 7, 6 imperfections. Treat the counts as independent Poisson variates having means $\mu_A$ and $\mu_B$.
a. Fit the model $\log \mu = \alpha + \beta x$, where $x = 1$ for treatment B and $x = 0$ for treatment A. Show that $\exp(\beta) = \mu_B/\mu_A$, and interpret its estimate.
b. Test $H_0$: $\mu_A = \mu_B$ with the Wald or likelihood ratio test of $H_0$: $\beta = 0$. Interpret.
c. Construct a 95% confidence interval for $\mu_B/\mu_A$. (Hint: First construct one for $\beta$.)
d. Test $H_0$: $\mu_A = \mu_B$ based on this result: If $Y_1$ and $Y_2$ are independent Poisson with means $\mu_1$ and $\mu_2$, then $(Y_1 \mid Y_1 + Y_2)$ is binomial with $n = Y_1 + Y_2$ and $\pi = \mu_1/(\mu_1 + \mu_2)$.

4.7 For Table 4.3, Table 4.7 shows SAS output for a Poisson loglinear model fit using $X$ = weight and $Y$ = number of satellites.
a. Estimate $E(Y)$ for female crabs of average weight, 2.44 kg.
b. Use $\hat{\beta}$ to describe the weight effect. Show how to construct the reported confidence interval.
c. Construct a Wald test that $Y$ is independent of $X$. Interpret.
d. Can you conduct a likelihood-ratio test of this hypothesis? If not, what else do you need?
e. Is there evidence of overdispersion? If necessary, adjust standard errors and interpret.

TABLE 4.7 SAS Output for Problem 4.7

Criterion             DF    Value
Deviance              …
Pearson Chi-Square    …
Log Likelihood        …

Parameter    Estimate    Std Error    Wald 95% Conf Limits    Chi-Sq    Pr > ChiSq
Intercept    …
weight       …

4.8 Refer to Problem 4.7. Using the identity link with $x$ = weight, $\hat{\mu} = -.60 + .64x$, where $\hat{\beta} = .64$ has SE = 0.8. Repeat parts (a) through (c).

4.9 Refer to Table 4.3.
a. Fit a Poisson loglinear model using both $W$ = weight and $C$ = color to predict $Y$ = number of satellites. Assigning dummy variables, treat $C$ as a nominal factor. Interpret parameter estimates.
b. Estimate $E(Y)$ for female crabs of average weight (2.44 kg) that are (i) medium light, and (ii) dark.
c. Test whether color is needed in the model. (Hint: From Section 4.5.4, the likelihood-ratio statistic comparing models is the difference in deviances.)
d. The estimated color effects are monotone across the four categories. Fit a simpler model that treats $C$ as quantitative and assumes a linear effect. Interpret its color effect and repeat the analyses of parts (b) and (c). Compare the fit to the model in part (a). Interpret.
e. Add width to the model. What effect does the strong positive correlation between width and weight have? Are both needed in the model?

4.10 In Section 4.3.2, refer to the Poisson model with identity link. The fit using least squares is $\hat{\mu} = -10.4 + 0.51x$ (SE = …). Explain why the parameter estimates differ and why the SE values are so different.

4.11 For the negative binomial model fitted to the crab satellite counts with log link and width predictor, $\hat{\alpha} = -4.05$, $\hat{\beta} = 0.19$ (SE = …), $\hat{k}^{-1} = \ldots$ (SE = …). Interpret. Why is SE for $\hat{\beta}$ so different from SE = 0.020 for the corresponding Poisson GLM in Section 4.3.2? Which is more appropriate? Why?

4.12 Refer to Problem 4.6. The sample mean and variance are 5.0 and 4.2 for treatment A and 9.0 and 8.4 for treatment B.
a. Is there evidence of overdispersion for the Poisson model having a dummy variable for treatment? Explain.
b. Fit the negative binomial loglinear model. Note that the estimated dispersion parameter is 0 and that estimates of treatment means and standard errors are the same as with the Poisson loglinear GLM.
c. For the overall sample of 20 observations, the sample mean and variance are 7.0 and 10.2. Fit the loglinear model having only an intercept term under Poisson and negative binomial assumptions. Compare results, and compare confidence intervals for the overall mean response. Why do they differ?
(Note: This shows how the Poisson model can deteriorate when an important covariate is unmeasured.)

4.13 Table 4.8 shows the free-throw shooting, by game, of Shaq O'Neal of the Los Angeles Lakers during the 2000 NBA (basketball) playoffs. Commentators remarked that his shooting varied dramatically from game to game. In game $i$, suppose that $Y_i$ = number of free throws made out of $n_i$ attempts is a bin($n_i$, $\pi_i$) variate and the $\{Y_i\}$ are independent.
a. Fit the model $\pi_i = \alpha$, and find and interpret $\hat{\alpha}$ and its standard error. Does the model appear to fit adequately? (Note: You could check this with a small-sample test of independence of the 23 × 2 table of game and the binary outcome.)
b. Adjust the standard error for overdispersion. Using the original SE and its correction, find and compare 95% confidence intervals for $\alpha$. Interpret.

TABLE 4.8 Data for Problem 4.13 (columns: Game, Number Made, Number of Attempts)

4.14 Refer to Table …. Fit a loglinear model with a dummy variable for race, (a) assuming a Poisson distribution, and (b) allowing overdispersion with a quasi-likelihood approach. Compare results.

4.15 Refer to Problem 4.6. The wafers are also classified by thickness of silicon coating ($z = 0$, low; $z = 1$, high). The first five imperfection counts reported for each treatment refer to $z = 0$ and the last five refer to $z = 1$. Analyze these data.

4.16 Refer to Table 13.9 on frequency of sexual intercourse. Analyze these data.

Theory and Methods

4.17 Describe the purpose of the link function of a GLM. What is the identity link? Explain why it is not often used with binomial or Poisson responses.

4.18 For known $k$, show that the negative binomial distribution has exponential family form (4.1) with natural parameter $\log[\mu/(\mu + k)]$.
4.19 For binary data, define a GLM using the log link. Show that effects refer to the relative risk. Why do you think this link is not often used? (Hint: What happens if the linear predictor takes a positive value?)

4.20 For the logistic regression model (4.6) with $\beta > 0$, show that (a) as $x \to \infty$, $\pi(x)$ is monotone increasing, and (b) the curve for $\pi(x)$ is the cdf of a logistic distribution having mean $-\alpha/\beta$ and standard deviation $\pi/(\beta\sqrt{3})$.

4.21 Show representation (4.18) for the binomial distribution.

4.22 Let $Y_i$ be a bin($n_i$, $\pi_i$) variate for group $i$, $i = 1, \ldots, N$, with $\{Y_i\}$ independent. Consider the model that $\pi_1 = \cdots = \pi_N$. Denote that common value by $\pi$. For observations $\{y_i\}$, show that $\hat{\pi} = (\sum_i y_i)/(\sum_i n_i)$. When all $n_i = 1$, for testing this model's fit in the $N \times 2$ table, show that $X^2 = n$. Thus, goodness-of-fit statistics can be completely uninformative for ungrouped data. (See also Problem ….)

4.23 Suppose that $Y_i$ is Poisson with $g(\mu_i) = \alpha + \beta x_i$, where $x_i = 1$ for $i = 1, \ldots, n_A$ from group A and $x_i = 0$ for $i = n_A + 1, \ldots, n_A + n_B$ from group B. Show that for any link function $g$, the likelihood equations (4.22) imply that fitted means $\hat{\mu}_A$ and $\hat{\mu}_B$ equal the sample means.

4.24 For binary data with sample proportion $y_i$ based on $n_i$ trials, we use quasi-likelihood to fit a model using variance function (…). Show that parameter estimates are the same as for the binomial GLM but that the covariance matrix multiplies by $\phi$.

4.25 A binomial GLM $\pi_i = \Phi(\sum_j \beta_j x_{ij})$ with arbitrary inverse link function $\Phi$ assumes that $n_i Y_i$ has a bin($n_i$, $\pi_i$) distribution. Find $w_i$ in (4.7) and hence $\widehat{\text{cov}}(\hat{\beta})$. For logistic regression, show that $w_i = n_i \pi_i (1 - \pi_i)$.

4.26 A GLM has parameter $\beta$ with sufficient statistic $S$. A goodness-of-fit test statistic $T$ has observed value $t_o$. If $\beta$ were known, a P-value is $P = P(T \ge t_o; \beta)$. Explain why $P(T \ge t_o \mid S)$ is the uniform minimum variance unbiased estimator of $P$.

4.27 Let $y_{ij}$ be observation $j$ of a count variable for group $i$, $i = 1, \ldots, I$, $j = 1, \ldots, n_i$. Suppose that $\{Y_{ij}\}$ are independent Poisson with $E(Y_{ij}) = \mu_i$.
a. Show that the ML estimate of $\mu_i$ is $\hat{\mu}_i = \bar{y}_i = \sum_j y_{ij}/n_i$.
b. Simplify the expression for the deviance for this model.
[For testing this model, it follows from Fisher (1970, p. 58, originally published in 1925) that the deviance and the Pearson statistic $\sum_i \sum_j (y_{ij} - \bar{y}_i)^2/\bar{y}_i$ have approximate chi-squared distributions with df $= \sum_i (n_i - 1)$. For a single group, Cochran referred to $\sum_j (y_{1j} - \bar{y}_1)^2/\bar{y}_1$ as the variance test for the fit of a Poisson distribution, since it compares the sample variance to the estimated Poisson variance $\bar{y}_1$.]

4.28 Conditional on $\lambda$, $Y$ has a Poisson distribution with mean $\lambda$. Values of $\lambda$ vary according to gamma density (13.1), which has $E(\lambda) = \mu$, $\text{var}(\lambda) = \mu^2/k$. Show that marginally $Y$ has the negative binomial distribution. Explain why the negative binomial model is a way to handle overdispersion for the Poisson.

4.29 Consider the class of binary models (4.8) and (…). Suppose that the standard cdf $\Phi$ corresponds to a probability density function $\phi$ that is symmetric around 0.
a. Show that $x$ at which $\pi(x) = 0.5$ is $x = -\alpha/\beta$.
b. Show that the rate of change in $\pi(x)$ when $\pi(x) = 0.5$ is $\beta\phi(0)$. Show this is $0.25\beta$ for the logit link and $\beta/\sqrt{2\pi}$ (where $\pi = 3.14\ldots$) for the probit link.
c. Show that the probit regression curve has the shape of a normal cdf with mean $-\alpha/\beta$ and standard deviation $1/|\beta|$.

4.30 Show the normal distribution $N(\mu, \sigma^2)$ with fixed $\sigma$ satisfies family (4.1), and identify the components. Formulate the ordinary regression model as a GLM.

4.31 In Problem 4.30, when $\sigma$ is also a parameter, show that it satisfies the exponential dispersion family.

4.32 For binary observations, consider the model $\pi(x) = \tfrac{1}{2} + (1/\pi)\tan^{-1}(\alpha + \beta x)$. Which distribution has cdf of this form? Explain when a GLM using this curve might be more appropriate than logistic regression.

4.33 Find the form of the deviance residual for an observation in a (a) binomial GLM, and (b) Poisson GLM. Illustrate part (b) for a cell count in a two-way contingency table for the model of independence.

4.34 Consider the value $\hat{\beta}$ that maximizes a function $L(\beta)$. Let $\beta^{(0)}$ denote an initial guess.
a. Using $L'(\hat{\beta}) = L'(\beta^{(0)}) + (\hat{\beta} - \beta^{(0)})L''(\beta^{(0)}) + \cdots$, argue that for $\beta^{(0)}$ close to $\hat{\beta}$, approximately $0 = L'(\beta^{(0)}) + (\hat{\beta} - \beta^{(0)})L''(\beta^{(0)})$. Solve this equation to obtain an approximation $\beta^{(1)}$ for $\hat{\beta}$.
b. Let $\beta^{(t)}$ denote approximation $t$ for $\hat{\beta}$, $t = 0, 1, 2, \ldots$. Justify that the next approximation is $\beta^{(t+1)} = \beta^{(t)} - L'(\beta^{(t)})/L''(\beta^{(t)})$.

4.35 For $n$ independent observations from a Poisson distribution, show that Fisher scoring gives $\mu^{(t+1)} = \bar{y}$ for all $t > 0$. By contrast, what happens with Newton-Raphson?

4.36 Write a computer program using the Newton-Raphson algorithm to maximize the likelihood for a binomial sample. For $\hat{\pi} = 0.3$ based on $n = 10$, print out results of the first six iterations when the starting value $\pi^{(0)}$ is (a) 0.1, (b) 0.2, ..., 0.9. Summarize the effects of the starting value on speed of convergence. What happens if it is 0 or 1?

4.37 In a GLM, suppose that $\text{var}(Y) = v(\mu)$ for $\mu = E(Y)$. Show that the link $g$ satisfying $g'(\mu) = [v(\mu)]^{-1/2}$ has the same weight matrix $W^{(t)}$ at each cycle. Show this link for a Poisson random component is $g(\mu) = 2\sqrt{\mu}$.

4.38 For noncanonical links in a GLM, show that the observed information matrix may depend on the data and hence differs from the expected information. Illustrate using the probit model.
Categorical Data Analysis, Second Edition. Alan Agresti. Copyright 2002 John Wiley & Sons, Inc.

CHAPTER 5

Logistic Regression

In introducing generalized linear models for binary data in Chapter 4 we highlighted logistic regression. This is the most important model for categorical response data. It is used increasingly in a wide variety of applications. Early uses were in biomedical studies but the past 20 years have also seen much use in social science research and marketing.

Recently, logistic regression has become a popular tool in business applications. Some credit-scoring applications use logistic regression to model the probability that a subject is credit worthy. For instance, the probability that a subject pays a bill on time may use predictors such as the size of the bill, annual income, occupation, mortgage and debt obligations, percentage of bills paid on time in the past, and other aspects of an applicant's credit history. A company that relies on catalog sales may determine whether to send a catalog to a potential customer by modeling the probability of a sale as a function of indices of past buying behavior.

Another area of increasing application is genetics. For instance, one recent article (J. M. Henshall and M. E. Goddard, Genetics 151) used logistic regression to estimate quantitative trait loci effects, modeling the probability that an offspring inherits an allele of one type instead of another type as a function of phenotypic values on various traits for that offspring. Another recent article (D. F. Levinson et al., Amer. J. Hum. Genet., 67: 652-663, 2000) used logistic regression for analysis of the genotype data of affected sibling pairs (ASPs) and their parents from several research centers. The model studied the probability that ASPs have identity-by-descent allele sharing and tested its heterogeneity among the centers.

In this chapter we study logistic regression more closely. Section 5.1 covers parameter interpretation. In Section 5.2 we present inferential methods for those parameters. Sections 5.3 and 5.4 generalize to multiple predictors, some of which may be qualitative.
Finally, in Section 5.5 we apply GLM model-fitting methods to determine and solve likelihood equations for logistic regression.
5.1 INTERPRETING PARAMETERS IN LOGISTIC REGRESSION

For a binary response variable $Y$ and an explanatory variable $X$, let $\pi(x) = P(Y = 1 \mid X = x) = 1 - P(Y = 0 \mid X = x)$. The logistic regression model is

$$\pi(x) = \frac{\exp(\alpha + \beta x)}{1 + \exp(\alpha + \beta x)}. \tag{5.1}$$

Equivalently, the log odds, called the logit, has the linear relationship

$$\text{logit}[\pi(x)] = \log\frac{\pi(x)}{1 - \pi(x)} = \alpha + \beta x. \tag{5.2}$$

This equates the logit link function to the linear predictor.

5.1.1 Interpreting $\beta$: Odds, Probabilities, and Linear Approximations

How can we interpret $\beta$ in (5.2)? Its sign determines whether $\pi(x)$ is increasing or decreasing as $x$ increases. The rate of climb or descent increases as $|\beta|$ increases; as $\beta \to 0$ the curve flattens to a horizontal straight line. When $\beta = 0$, $Y$ is independent of $X$. For quantitative $x$ with $\beta > 0$, the curve for $\pi(x)$ has the shape of the cdf of the logistic distribution. Since the logistic density is symmetric, $\pi(x)$ approaches 1 at the same rate that it approaches 0.

Exponentiating both sides of (5.2) shows that the odds are an exponential function of $x$. This provides a basic interpretation for the magnitude of $\beta$: The odds increase multiplicatively by $e^{\beta}$ for every 1-unit increase in $x$. In other words, $e^{\beta}$ is an odds ratio, the odds at $X = x + 1$ divided by the odds at $X = x$.

Most scientists are not familiar with odds or logits, so the interpretation of a multiplicative effect of $e^{\beta}$ on the odds scale or an additive effect of $\beta$ on the logit scale is not helpful to them. A simpler, although approximate slope interpretation uses a linearization argument (Berkson). Since it has a curved rather than a linear appearance, the logistic regression function (5.1) implies that the rate of change in $\pi(x)$ per unit change in $x$ varies. A straight line drawn tangent to the curve at a particular $x$ value, shown in Figure 5.1, describes the rate of change at that point. Calculating $\partial\pi(x)/\partial x$ using (5.1) yields a fairly complex function of the parameters and $x$, but it simplifies to the form $\beta\pi(x)[1 - \pi(x)]$. For instance, the line tangent to the curve at $x$ for which $\pi(x) = \tfrac{1}{2}$ has slope $\beta(\tfrac{1}{2})(\tfrac{1}{2}) = \beta/4$; when $\pi(x)$
$= 0.9$ or 0.1, it has slope $0.09\beta$. The slope approaches 0 as $\pi(x)$ approaches 1.0 or 0. The steepest slope occurs at $x$ for which $\pi(x) = \tfrac{1}{2}$; that $x$ value is $x = -\alpha/\beta$. [To check that $\pi(x) = \tfrac{1}{2}$ at this
point, substitute $-\alpha/\beta$ for $x$ in (5.1), or substitute $\pi(x) = \tfrac{1}{2}$ in (5.2) and solve for $x$.] This $x$ value is sometimes called the median effective level and denoted $EL_{50}$. In toxicology studies it is called $LD_{50}$ (LD = lethal dose), the dose with a 50% chance of a lethal result.

FIGURE 5.1 Linear approximation to logistic regression curve.

From this linear approximation, near $x$ where $\pi(x) = \tfrac{1}{2}$, a change in $x$ of $1/\beta$ corresponds to a change in $\pi(x)$ of roughly $(1/\beta)(\beta/4) = \tfrac{1}{4}$; that is, $1/\beta$ approximates the distance between $x$ values where $\pi(x) = 0.25$ or 0.75 (in reality, 0.27 and 0.73) and where $\pi(x) = 0.50$. The linear approximation works better for smaller changes in $x$, however.

An alternative way to interpret the effect reports the values of $\pi(x)$ at certain $x$ values, such as their quartiles. This entails substituting those quartiles for $x$ into formula (5.1) for $\pi(x)$. The change in $\pi(x)$ over the middle half of $x$ values, from the lower quartile to the upper quartile of $x$, then describes the effect. It can be compared to the corresponding change over the middle half of values of other predictors.

The intercept parameter $\alpha$ is not usually of particular interest. However, by centering the predictor about 0 [i.e., replacing $x$ by $(x - \bar{x})$], $\alpha$ becomes the logit at that mean, and thus $e^{\alpha}/(1 + e^{\alpha}) = \pi(\bar{x})$. (As in ordinary regression, centering is also helpful in complex models containing quadratic or interaction terms to reduce correlations among model parameter estimates.)
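The interpretations in Section 5.1.1 can be checked numerically. The following is a minimal Python sketch, not code from the text: the function names and the parameter values (`alpha = -5.0`, `beta = 0.2`) are hypothetical and chosen only for illustration. It evaluates the curve (5.1), the tangent slope $\beta\pi(x)[1 - \pi(x)]$, the median effective level $-\alpha/\beta$, and the quality of the linear approximation at a distance $1/\beta$ from that level.

```python
import math


def logistic_prob(alpha, beta, x):
    """Logistic regression curve (5.1): exp(a + b*x) / (1 + exp(a + b*x))."""
    eta = alpha + beta * x
    return 1.0 / (1.0 + math.exp(-eta))


def tangent_slope(alpha, beta, x):
    """Rate of change of pi(x) at x: beta * pi(x) * (1 - pi(x))."""
    p = logistic_prob(alpha, beta, x)
    return beta * p * (1.0 - p)


def median_effective_level(alpha, beta):
    """The x value at which pi(x) = 1/2, namely -alpha / beta."""
    return -alpha / beta


# Hypothetical parameter values, chosen only to illustrate the formulas.
alpha, beta = -5.0, 0.2

el50 = median_effective_level(alpha, beta)
slope_at_el50 = tangent_slope(alpha, beta, el50)   # should equal beta / 4
odds_ratio = math.exp(beta)                        # per 1-unit increase in x

# Moving 1/beta away from EL50: the tangent line predicts a change of 1/4
# in pi(x); the curve itself changes by somewhat less.
p_above = logistic_prob(alpha, beta, el50 + 1.0 / beta)
p_below = logistic_prob(alpha, beta, el50 - 1.0 / beta)

print(f"EL50 = {el50:.2f}, slope at EL50 = {slope_at_el50:.4f}")
print(f"odds ratio per unit increase in x = {odds_ratio:.3f}")
print(f"pi(x) at EL50 -/+ 1/beta: {p_below:.2f}, {p_above:.2f}")
```

Because the logistic density is symmetric, the two probabilities at equal distances on either side of the median effective level sum to 1, which gives a quick sanity check on any such calculation.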
5.1.2 Looking at the Data

In practice, these interpretations use formula (5.1) with ML estimates substituted for parameters. Before fitting the model and making such interpretations, look at the data to check that the logistic regression model is appropriate. Since $Y$ takes only values 0 and 1, it is difficult to check this by plotting $Y$ against $x$.

It can be helpful to plot sample proportions or logits against $x$. Let $n_i$ denote the number of observations at setting $i$ of $x$. Of them, let $y_i$ denote the number of "1" outcomes, with $p_i = y_i/n_i$. Sample logit $i$ is $\log[p_i/(1 - p_i)] = \log[y_i/(n_i - y_i)]$. This is not finite when $y_i = 0$ or $n_i$. An ad hoc adjustment adds a positive constant to the number of outcomes of the two types. The adjustment

$$\log\frac{y_i + \tfrac{1}{2}}{n_i - y_i + \tfrac{1}{2}}$$

is the least-biased estimator of this form of the true logit (Note 5.2). The plot of sample logits should be roughly linear.

When $X$ is continuous and all $n_i = 1$, or when it is essentially continuous and all $n_i$ are small, this is unsatisfactory. One could group the data with nearby $x$ values into categories before calculating sample proportions and sample logits. A better approach that does not require choosing arbitrary categories uses a smoothing mechanism to reveal trends. One such smoothing approach fits a generalized additive model (Section 4.8), which replaces the linear predictor of a GLM by a smooth function. Inspect a plot of the fit to see if severe discrepancies occur from the S-shaped trend predicted by logistic regression.

5.1.3 Horseshoe Crabs Revisited

To illustrate logistic regression, we reanalyze the horseshoe crab data introduced in Section 4.3.2. The binary response is whether a female crab has any male crabs residing nearby (satellites): $Y = 1$ if she has at least one satellite, and $Y = 0$ if she has none. We first use as a predictor the female crab's width.

Figure 4.7 plotted the data and showed the smoothed prediction of the mean provided by a generalized additive model (GAM), assuming a binomial response and logit link. The logistic regression model appears to be adequate.
This is also suggested by the grouping of the data used to investigate the adequacy of Poisson regression models in Section 4.3.2. In each of the eight width categories, we computed the sample proportion of crabs having satellites and the mean width for the crabs in that category. Figure 5.2 shows eight dots representing the sample proportions of female crabs having satellites plotted against the mean widths for the eight categories. The eight plotted sample proportions and the GAM smoothing curve both show a roughly increasing trend, so we proceed with fitting the logistic regression model with linear width predictor.

FIGURE 5.2 Observed and fitted proportions of satellites by width of female crab.

We defer to Section 5.5 details about ML fitting. Software (e.g., for SAS see Table A.8) reports output such as Table 5.1 exhibits. For the ungrouped data from Table 4.3, let $\pi(x)$ denote the probability that a female horseshoe crab of width $x$ has a satellite. The ML fit is

$$\hat{\pi}(x) = \frac{\exp(-12.351 + 0.497x)}{1 + \exp(-12.351 + 0.497x)}.$$

TABLE 5.1 Computer Output for Logistic Regression Model with Horseshoe Crab Data

Criteria For Assessing Goodness Of Fit
Criterion             DF    Value
Deviance              …
Pearson Chi-Square    …
Log Likelihood              -97.2263

Parameter    Estimate    Std Error    Likelihood-Ratio 95% Conf Limits    Wald Chi-Sq    P > ChiSq
Intercept    …                                                                           <.0001
width        …                                                                           <.0001
Substituting $x = 26.3$ cm, the mean width level in this sample, $\hat{\pi}(x) = 0.674$. The estimated probability equals $\tfrac{1}{2}$ when $x = -\hat{\alpha}/\hat{\beta} = 12.351/0.497 = 24.8$. Figure 5.2 plots $\hat{\pi}(x)$ against width.

The estimated odds of a satellite multiply by $\exp(\hat{\beta}) = \exp(0.497) = 1.64$ for each 1-cm increase in width; that is, there is a 64% increase. To convey the effect less technically, we could report the incremental rate of change in the probability of a satellite. At the mean width, $\hat{\pi}(x) = 0.674$, and $\hat{\pi}(x)$ increases by about $\hat{\beta}[\hat{\pi}(x)(1 - \hat{\pi}(x))] = 0.497(0.674)(0.326) = 0.11$ for a 1-cm increase in width. Or, we could report $\hat{\pi}(x)$ at the quartiles of $x$. The lower quartile, median, and upper quartile for width are 24.9, 26.1, and 27.7; $\hat{\pi}(x)$ at those values equals 0.51, 0.65, and 0.81, increasing by 0.30 over the $x$ values for the middle half of the sample.

The latter summary is useful for comparing the effects of predictors having different units. For instance, with crab weight as the predictor, $\text{logit}[\hat{\pi}(x)] = -3.695 + \ldots\,x$. A 1-kg increase in weight is not comparable to a 1-cm increase in width, so $\hat{\beta} = 0.497$ for $x$ = width is not comparable to $\hat{\beta} = \ldots$ for $x$ = weight. The quartiles for weight are 2.00, 2.35, and 2.85; $\hat{\pi}(x)$ at those values are 0.48, 0.64, and 0.81, increasing by 0.33 over the middle half of the sampled weights. The effect is similar to that of width.

5.1.4 Logistic Regression with Retrospective Studies

Another property of logistic regression relates to situations in which the explanatory variable $X$ rather than the response variable $Y$ is random. This occurs with retrospective sampling designs, such as case-control biomedical studies. For samples of subjects having $Y = 1$ (cases) and having $Y = 0$ (controls), the value of $X$ is observed. Evidence exists of an association if the distribution of $X$ values differs between cases and controls.

In retrospective studies, one can estimate odds ratios (Section 2.2.4). Effects in the logistic regression model refer to odds ratios. Thus, one can fit such models and estimate effects in case-control studies. Here is a justification for this.
Let $Z$ indicate whether a subject is sampled (1 = yes, 0 = no). Let $\rho_1 = P(Z = 1 \mid y = 1)$ denote the probability of sampling a case, and let $\rho_0 = P(Z = 1 \mid y = 0)$ denote the probability of sampling a control. Even though the conditional distribution of $Y$ given $X = x$ is not sampled, we need a model for $P(Y = 1 \mid z = 1, x)$, assuming that $P(Y = 1 \mid x)$ follows the logistic model. By Bayes' theorem,

$$P(Y = 1 \mid z = 1, x) = \frac{P(Z = 1 \mid y = 1, x)\,P(Y = 1 \mid x)}{\sum_{j=0}^{1} P(Z = 1 \mid y = j, x)\,P(Y = j \mid x)}. \tag{5.3}$$

Now, suppose that $P(Z = 1 \mid y, x) = P(Z = 1 \mid y)$ for $y = 0$ and 1; that is, for each $y$, the sampling probabilities do not depend on $x$. For instance, often $x$
refers to exposure of some type, such as whether someone has been a smoker. Then, for cases and for controls, the probability of being sampled is the same for smokers and nonsmokers. Under this assumption, substituting $\rho_1$ and $\rho_0$ in (5.3) and dividing numerator and denominator by $P(Y = 0 \mid x)$, (5.3) simplifies to

$$P(Y = 1 \mid z = 1, x) = \frac{\rho_1 \exp(\alpha + \beta x)}{\rho_0 + \rho_1 \exp(\alpha + \beta x)}.$$

Then, dividing numerator and denominator by $\rho_0$ and using $\rho_1/\rho_0 = \exp[\log(\rho_1/\rho_0)]$ yields

$$\text{logit}[P(Y = 1 \mid z = 1, x)] = \alpha^* + \beta x$$

with $\alpha^* = \alpha + \log(\rho_1/\rho_0)$. Thus, the logistic regression model holds with the same effect parameter $\beta$ as in the model for $P(Y = 1 \mid x)$. If the sampling rate for cases is 10 times that for controls, the intercept estimated is $\log(10) = 2.3$ larger than the one estimated with a prospective study. For related comments, see Anderson (1972), Breslow and Day (1980, p. 203), Breslow and Powers (1978), Carroll et al. (1995), Farewell (1979), Mantel (1973), Prentice (1976a), and Prentice and Pyke.

With case-control studies, one cannot estimate $\beta$ in other binary-response models. Unlike the odds ratio, the effect for the conditional distribution of $X$ given $Y$ does not then equal that for $Y$ given $X$. This is an important advantage of the logit link and is a major reason why logit models have surpassed other models in popularity in biomedical studies.

Many case-control studies employ matching. Each case is matched with one or more control subjects. The controls are like the case on key characteristics such as age. The model and subsequent analysis should take the matching into account. In Section … we discuss logistic regression for matched case-control studies.

Regardless of the sampling mechanism, logistic regression may or may not describe a relationship well. In one special case, it necessarily holds. Given that $Y = i$, suppose that $X$ has $N(\mu_i, \sigma^2)$ distribution, $i = 0, 1$. Then, by
Bayes' theorem, $P(Y = 1 \mid X = x)$ equals (5.1) with $\beta = (\mu_1 - \mu_0)/\sigma^2$ (Cornfield). When a population is a mixture of two types of subjects, one type with $Y = 1$ that is approximately normally distributed on $X$ and the other type with $Y = 0$ that is approximately normal on $X$ with similar variance, the logistic regression function (5.1) approximates well the curve for $\pi(x)$. If the distributions are normal but with different variances, the model applies also having a quadratic term (Anderson). In that case, the relationship is nonmonotone, with $\pi(x)$ increasing and then decreasing, or the reverse.
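The retrospective-sampling argument of Section 5.1.4 can be verified numerically. The sketch below is an illustration under stated assumptions rather than anything from the text: the function names and all numeric values (`alpha`, `beta`, `rho1`, `rho0`) are hypothetical. It applies Bayes' theorem (5.3) directly and checks that, on the logit scale, sampling cases and controls at different rates shifts only the intercept, by $\log(\rho_1/\rho_0)$.

```python
import math


def expit(t):
    """Inverse logit: exp(t) / (1 + exp(t))."""
    return 1.0 / (1.0 + math.exp(-t))


def logit(p):
    """Log odds of a probability p."""
    return math.log(p / (1.0 - p))


def prob_case_given_sampled(alpha, beta, x, rho1, rho0):
    """P(Y=1 | Z=1, x) from Bayes' theorem (5.3), assuming the sampling
    probabilities rho1 (cases) and rho0 (controls) do not depend on x."""
    p1 = expit(alpha + beta * x)      # P(Y=1 | x) under the logistic model
    numerator = rho1 * p1
    return numerator / (numerator + rho0 * (1.0 - p1))


# Hypothetical values: cases are sampled at 10 times the rate of controls.
alpha, beta = -3.0, 0.5
rho1, rho0 = 0.50, 0.05

xs = [0.0, 2.0, 4.0, 6.0, 8.0]
shifts = [
    logit(prob_case_given_sampled(alpha, beta, x, rho1, rho0)) - (alpha + beta * x)
    for x in xs
]

# Each shift should equal log(rho1 / rho0): the slope beta is unchanged and
# only the intercept moves.
for x, s in zip(xs, shifts):
    print(f"x = {x:4.1f}   logit shift = {s:.4f}")
print(f"log(rho1 / rho0) = {math.log(rho1 / rho0):.4f}")
```

The shift is the same at every `x`, which is the numerical counterpart of the statement that $\alpha^* = \alpha + \log(\rho_1/\rho_0)$ while $\beta$ is unaffected.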
5.2 INFERENCE FOR LOGISTIC REGRESSION

By Wald's asymptotic results for ML estimators, parameter estimators in logistic regression models have large-sample normal distributions. Thus, inference can use the (Wald, likelihood-ratio, score) triad of methods.

5.2.1 Types of Inference

For the model with a single predictor, $\text{logit}[\pi(x)] = \alpha + \beta x$, significance tests focus on $H_0$: $\beta = 0$, the hypothesis of independence. The Wald test uses the log likelihood at $\hat{\beta}$, with test statistic $z = \hat{\beta}/SE$ or its square; under $H_0$, $z^2$ is asymptotically $\chi^2_1$. The likelihood-ratio test uses twice the difference between the maximized log likelihood at $\hat{\beta}$ and at $\beta = 0$ and also has an asymptotic $\chi^2_1$ null distribution. The score test uses the log likelihood at $\beta = 0$ through the derivative of the log likelihood (i.e., the score function) at that point. The test statistic compares the sufficient statistic for $\beta$ to its null expected value, suitably standardized [$N(0, 1)$ or $\chi^2_1$]. In Section … we present this test of $H_0$: $\beta = 0$.

For large samples, the three tests usually give similar results. The likelihood-ratio test is preferred over the Wald. It uses more information, since it incorporates the log likelihood at $H_0$ as well as at $\hat{\beta}$. When $|\beta|$ is relatively large, the Wald test is not as powerful as the likelihood-ratio test and can even show aberrant behavior [see Hauck and Donner and Problem 5.38].

Confidence intervals are more informative than tests. An interval for $\beta$ results from inverting a test of $H_0$: $\beta = \beta_0$. The interval is the set of $\beta_0$ for which the chi-squared test statistic is no greater than $\chi^2_1(\alpha) = z^2_{\alpha/2}$. For the Wald approach, this means $[(\hat{\beta} - \beta_0)/SE]^2 \le z^2_{\alpha/2}$; the interval is $\hat{\beta} \pm z_{\alpha/2}(SE)$.

For summarizing the relationship, other characteristics may have greater importance than $\beta$, such as $\pi(x)$ at various $x$ values. For fixed $x = x_0$, $\text{logit}[\hat{\pi}(x_0)] = \hat{\alpha} + \hat{\beta}x_0$ has a large-sample SE given by the estimated square root of

$$\text{var}(\hat{\alpha} + \hat{\beta}x_0) = \text{var}(\hat{\alpha}) + x_0^2\,\text{var}(\hat{\beta}) + 2x_0\,\text{cov}(\hat{\alpha}, \hat{\beta}).$$

A 95% confidence interval for $\text{logit}[\pi(x_0)]$ is $(\hat{\alpha} + \hat{\beta}x_0) \pm 1.96\,SE$. Substituting each endpoint into the inverse transformation $\pi(x_0) = \exp(\text{logit})/[1 + \exp(\text{logit})]$ gives a corresponding interval for $\pi(x_0)$.

Each method of inference can also produce small-sample confidence intervals and tests. We defer discussion of this until Section 6.7.
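The Wald interval for $\beta$ and the interval for $\pi(x_0)$ described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and the numeric inputs are the horseshoe crab estimates quoted in Section 5.2.2 (rounded as reported there), so the results agree with the text only up to rounding.

```python
import math


def expit(t):
    """Inverse logit: exp(t) / (1 + exp(t))."""
    return 1.0 / (1.0 + math.exp(-t))


def wald_interval(estimate, se, z=1.96):
    """Wald interval: estimate -/+ z * SE."""
    return estimate - z * se, estimate + z * se


def prob_interval(alpha_hat, beta_hat, var_alpha, var_beta, cov_ab, x0, z=1.96):
    """Point estimate, SE of the logit, and interval for pi(x0).

    The variance of alpha_hat + beta_hat * x0 is
    var(alpha_hat) + x0^2 var(beta_hat) + 2 x0 cov(alpha_hat, beta_hat);
    the interval endpoints on the logit scale are back-transformed.
    """
    eta = alpha_hat + beta_hat * x0
    var_eta = var_alpha + x0 ** 2 * var_beta + 2.0 * x0 * cov_ab
    se_eta = math.sqrt(var_eta)
    lower, upper = wald_interval(eta, se_eta, z)
    return expit(eta), se_eta, (expit(lower), expit(upper))


# Inputs taken as given from the crab fit (assumed values, rounded).
alpha_hat, beta_hat, se_beta = -12.351, 0.497, 0.102
var_alpha, var_beta, cov_ab = 6.910, 0.01035, -0.2668

beta_ci = wald_interval(beta_hat, se_beta)
odds_ci = tuple(math.exp(b) for b in beta_ci)   # interval for the odds ratio
p_hat, se_logit, p_ci = prob_interval(
    alpha_hat, beta_hat, var_alpha, var_beta, cov_ab, x0=26.5
)

print(f"Wald 95% CI for beta: ({beta_ci[0]:.3f}, {beta_ci[1]:.3f})")
print(f"Implied CI for exp(beta): ({odds_ci[0]:.2f}, {odds_ci[1]:.2f})")
print(f"pi_hat(26.5) = {p_hat:.3f}, SE of logit = {se_logit:.3f}")
print(f"95% CI for pi(26.5): ({p_ci[0]:.2f}, {p_ci[1]:.2f})")
```

Building the interval on the logit scale and then back-transforming keeps both endpoints inside (0, 1), which a Wald interval built directly on the probability scale would not guarantee.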
5.2.2 Inference for Horseshoe Crab Data

We illustrate logistic regression inferences with the model for the probability a horseshoe crab has a satellite, with width as the predictor. Table 5.1 showed the fit and standard errors. The statistic $z = \hat{\beta}/SE = 0.497/0.102 = 4.9$ provides strong evidence of a positive width effect ($P < 0.0001$). The equivalent Wald chi-squared statistic, $z^2 = 23.9$, has df = 1. The maximized log likelihoods equal $-112.88$ under $H_0$: $\beta = 0$ and $-97.23$ for the full model. The likelihood-ratio statistic equals $-2[-112.88 - (-97.23)] = 31.3$, with df = 1. This provides even stronger evidence than the Wald test.

The Wald 95% confidence interval for $\beta$ is $0.497 \pm 1.96(0.102)$, or (0.298, 0.697). Table 5.1 reports a likelihood-ratio confidence interval of (0.308, 0.709), based on the profile likelihood function. The confidence interval for the effect on the odds per 1-cm increase in width equals $(e^{0.308}, e^{0.709}) = (1.36, 2.03)$. We infer that a 1-cm increase in width has at least a 36% increase and at most a doubling in the odds of a satellite.

Most software for logistic regression also reports estimates and confidence intervals for $\pi(x)$ (e.g., PROC GENMOD in SAS with the OBSTATS option). Consider this for crabs of width $x = 26.5$, near the mean width. The estimated logit is $-12.351 + 0.497(26.5) = 0.825$, and $\hat{\pi}(x) = 0.695$. Software reports

$$\widehat{\text{var}}(\hat{\alpha}) = 6.910, \quad \widehat{\text{var}}(\hat{\beta}) = 0.01035, \quad \widehat{\text{cov}}(\hat{\alpha}, \hat{\beta}) = -0.2668,$$

from which

$$\widehat{\text{var}}\{\text{logit}[\hat{\pi}(x)]\} = 6.910 + x^2(0.01035) + 2x(-0.2668).$$

At $x = 26.5$ this is 0.038, so the 95% confidence interval for $\text{logit}[\pi(26.5)]$ equals $0.825 \pm (1.96)\sqrt{0.038}$, or (0.44, 1.21). This translates to the interval (0.61, 0.77) for the probability of satellites (e.g., $\exp(0.44)/[1 + \exp(0.44)] = 0.61$). (Alternatively, for the model fit using predictor $x^* = x - 26.5$, $\hat{\alpha}$ and its SE are the estimated logit and its SE.) Figure 5.3 plots the confidence bands around the prediction equation for $\pi(x)$ as a function of $x$. Hauck gave alternative bands for which the confidence coefficient applies simultaneously to all possible predictor values.

One could ignore the model fit and simply use sample proportions (i.e., the saturated model) to estimate such probabilities.
Six female crabs in the sample had $x = 26.5$, and four of them had satellites. The sample proportion estimate at $x = 26.5$ is $\hat\pi = 4/6 = 0.67$, similar to the model-based estimate. The 95% score confidence interval (Section 1.4) based on these six observations alone equals (0.30, 0.90). When the logistic regression model truly holds, the model-based estimator of a probability is considerably better than the sample proportion. The model has only two parameters to estimate, whereas the saturated model has a
separate parameter for every distinct value of $x$. For instance, at $x = 26.5$, software reports SE = 0.04 for the model-based estimate 0.695, whereas the SE is $\sqrt{\hat\pi(1 - \hat\pi)/n} = \sqrt{(0.67)(0.33)/6} = 0.19$ for the sample proportion of 0.67 with only 6 observations. The 95% confidence intervals are (0.61, 0.77) using the model versus (0.30, 0.90) using the sample proportion. Instead of using only 6 observations, the model uses the information that all 173 observations provide in estimating the two model parameters. The result is a much more precise estimate.

FIGURE 5.3 Prediction equation and 95% confidence bands for probability of satellite as a function of width.

Reality is a bit more complicated. In practice, the model is not exactly the true relationship between $\pi(x)$ and $x$. However, if it approximates the true probabilities decently, its estimator still tends to be closer than the sample proportion to the true value. The model smooths the sample data, somewhat dampening the observed variability. The resulting estimators tend to be better unless each sample proportion is based on an extremely large sample. A later section discusses this advantage of using models.

Checking Goodness of Fit: Ungrouped and Grouped Data

In practice, there is no guarantee that a certain logistic regression model fits the data well. For any type of binary data, one way to detect lack of fit uses a likelihood-ratio test to compare the model to more complex ones. A more complex model might contain a nonlinear effect, such as a quadratic term. Models with multiple predictors would consider interaction. If more complex models do not fit better, this provides some assurance that the model chosen is reasonable.
Other approaches to detecting lack of fit search for any way that the model fails. This is simplest when the explanatory variables are solely categorical, as we'll illustrate later. At each setting of $x$, one can multiply the estimated probabilities of the two outcomes by the number of subjects at that setting to obtain estimated expected frequencies for $y = 0$ and $y = 1$. These are fitted values. The test of the model compares the observed counts and fitted values using a Pearson $X^2$ or likelihood-ratio $G^2$ statistic. For a fixed number of settings, as the fitted counts increase, $X^2$ and $G^2$ have limiting chi-squared null distributions. The degrees of freedom, called the residual df for the model, subtract the number of parameters in the model from the number of parameters in the saturated model (i.e., the number of settings of $x$).

The reason for the restriction to categorical predictors for a global test of fit relates to the distinction that we mentioned earlier between grouped and ungrouped data for binomial models. The saturated model differs in the two cases. An asymptotic chi-squared distribution for the deviance results as $n \to \infty$ with a fixed number of parameters in that model and hence a fixed number of settings of predictor values.

Goodness of Fit of Model for Horseshoe Crabs

We illustrate with a goodness-of-fit analysis for the model using $x$ = width to predict the probability that a female crab has a satellite. One way to check it compares it to a more complex model, such as the model containing a quadratic term. With width centered at 0 by subtracting its mean of 26.3, that model has a fit of the form $\mathrm{logit}[\hat\pi(x)] = \hat\alpha + \hat\beta_1 x + \hat\beta_2 x^2$. There is not much evidence to support adding the quadratic term: the likelihood-ratio statistic for testing that the true coefficient of $x^2$ is 0 equals 0.83 (df = 1).

We next consider overall goodness of fit. Width takes 66 distinct values for the 173 crabs, with few observations at most widths. One can view the data as a $66 \times 2$ contingency table.
The two cells in each row count the number of crabs with satellites and the number of crabs without satellites, at that width. The chi-squared theory for $X^2$ and $G^2$ applies when the number of levels of $x$ is fixed, and the number of observations at each level grows. Although we grouped the data using the distinct width values rather than using 173 separate binary responses, this theory is violated here in two ways. First, most fitted counts are very small. Second, when more data are collected, additional width values would occur, so the contingency table would contain more cells rather than a fixed number. Because of this, $X^2$ and $G^2$ for logistic regression models with continuous or nearly continuous predictors do not have approximate chi-squared distributions. (Normal approximations can be
more appropriate, but no single method has received much attention.)

One could use $X^2$ and $G^2$ to compare the observed and fitted values in grouped form. Table 5.2 uses the groupings of Table 4.4, giving an $8 \times 2$ table. In each width category, the fitted value for a yes response is the sum of the estimated probabilities $\hat\pi(x)$ for all crabs having width in that category; the fitted value for a no response is the sum of $1 - \hat\pi(x)$ for those crabs. The fitted values are then much larger. Then, $X^2$ and $G^2$ have better validity, although the chi-squared theory still is not perfect since $\pi(x)$ is not constant in each category. Their values are $X^2 = 5.3$ and $G^2 = 6.2$. Table 5.2 has eight binomial samples, one for each width setting; the model has two parameters, so df = 8 − 2 = 6. Neither $X^2$ nor $G^2$ shows evidence of lack of fit. Thus, we can feel more comfortable about using the model for the original ungrouped data.

TABLE 5.2 Grouping of Observed and Fitted Values for Fit of Logistic Regression Model to Horseshoe Crab Data (columns: Width (cm); Number Yes; Number No; Fitted Yes; Fitted No)

Checking Goodness of Fit with Ungrouped Data by Grouping

As just noted, with ungrouped data or with continuous or nearly continuous predictors, $X^2$ and $G^2$ do not have limiting chi-squared distributions. They are still useful for comparing models, as done above for checking a quadratic term and as we will discuss in later sections. Also, as just noted, one can apply them in an approximate manner to grouped observed and fitted values for a partition of the space of $x$ values. As the number of explanatory variables increases, however, simultaneous grouping of values for each variable can produce a contingency table with a large number of cells, most of which have small counts.

Regardless of the number of predictors, one can partition observed and fitted values according to the estimated probabilities of success using the original ungrouped data. One common approach forms the groups in the partition so they have approximately equal size. With 10 groups, the first pair
of observed counts and corresponding fitted counts refers to the $n/10$ observations having the highest estimated probabilities, the next pair refers to the $n/10$ observations having the second decile of estimated probabilities, and so on. Each group has an observed count of subjects with each outcome and a fitted value for each outcome. The fitted value for an outcome is the sum of the estimated probabilities for that outcome for all observations in that group.

This construction is the basis of a test due to Hosmer and Lemeshow. They proposed a Pearson statistic comparing the observed and fitted counts for this partition. Let $y_{ij}$ denote the binary outcome for observation $j$ in group $i$ of the partition, $i = 1, \ldots, g$, $j = 1, \ldots, n_i$. Let $\hat\pi_{ij}$ denote the corresponding fitted probability for the model fitted to the ungrouped data. Their statistic equals

$\sum_{i=1}^{g} \frac{\left(\sum_j y_{ij} - \sum_j \hat\pi_{ij}\right)^2}{\left(\sum_j \hat\pi_{ij}\right)\left[1 - \left(\sum_j \hat\pi_{ij}\right)/n_i\right]}.$

When many observations have the same estimated probability, there is some arbitrariness in forming the groups, and different software may report somewhat different values. This statistic does not have a limiting chi-squared distribution, because the observations in a group are not identical trials, since they do not share a common success probability. However, Hosmer and Lemeshow noted that when the number of distinct patterns of covariate values equals the sample size, the null distribution is approximated by chi-squared with df = $g - 2$.

For the logistic regression fit to the horseshoe crab data with continuous width predictor, the Hosmer-Lemeshow statistic with $g = 10$ groups equals 3.5, with df = 8. It also indicates a decent fit. Unfortunately, like other proposed global fit statistics, the Hosmer-Lemeshow statistic does not have good power for detecting particular types of lack of fit (Hosmer et al.). In any case, a large value of a global fit statistic merely indicates some lack of fit but provides no insight about its nature.
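The grouping construction just described can be sketched in a few lines. This is a minimal illustration rather than any package's implementation: the function name `hosmer_lemeshow` is ours, groups are formed by simple equal-size slicing of the observations sorted by fitted probability, and ties are broken by input order, which is exactly the kind of arbitrariness that makes software results differ.

```python
def hosmer_lemeshow(y, p_hat, g=10):
    """Pearson-type statistic over g groups ordered by fitted probability.

    y      binary outcomes (0 or 1), one per observation
    p_hat  fitted probabilities from the model for the ungrouped data
    Returns (statistic, df) with df = g - 2.
    Assumes every group has a fitted total strictly between 0 and its size.
    """
    # Sort from highest to lowest fitted probability (stable for ties).
    pairs = sorted(zip(p_hat, y), key=lambda t: t[0], reverse=True)
    n = len(pairs)
    stat = 0.0
    for i in range(g):
        group = pairs[i * n // g:(i + 1) * n // g]  # roughly n/g observations
        n_i = len(group)
        if n_i == 0:
            continue
        observed = sum(yy for _, yy in group)   # sum_j y_ij
        fitted = sum(pp for pp, _ in group)     # sum_j pi_hat_ij
        stat += (observed - fitted) ** 2 / (fitted * (1.0 - fitted / n_i))
    return stat, g - 2

# Tiny hypothetical data set with two groups of two observations each.
stat, df = hosmer_lemeshow([1, 1, 0, 0], [0.8, 0.6, 0.4, 0.2], g=2)
```

For real use one would pass the fitted probabilities from the ungrouped fit and compare the statistic with a chi-squared distribution on $g - 2$ degrees of freedom, keeping in mind that this reference distribution is only an approximation.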
The approach of comparing the working model to a more complex one is more useful from a scientific perspective, since it searches for lack of fit of a particular type. For either approach, when the fit is poor, diagnostic measures describe the influence of individual observations on the model fit and highlight reasons for the inadequacy. We discuss these in a later section.

5.3 LOGIT MODELS WITH CATEGORICAL PREDICTORS

Like ordinary regression, logistic regression extends to include qualitative explanatory variables, often called factors. In this section we use dummy variables to do this.
ANOVA-Type Representation of Factors

For simplicity, we first consider a single factor $X$, with $I$ categories. In row $i$ of the $I \times 2$ table, $y_i$ is the number of outcomes in the first column (successes) out of $n_i$ trials. We treat $y_i$ as binomial with parameter $\pi_i$. The logit model with a factor is

$\log\frac{\pi_i}{1 - \pi_i} = \alpha + \beta_i. \quad (5.4)$

The higher $\beta_i$ is, the higher the value of $\pi_i$. The right-hand side of (5.4) resembles the model formula for cell means in one-way ANOVA. As in ANOVA, the factor has as many parameters $\{\beta_i\}$ as categories, but one is redundant. With $I$ categories, $X$ has $I - 1$ nonredundant parameters.

One parameter can be set to 0, say $\beta_I = 0$. If the values do not satisfy this, we can recode so that it is true. For instance, set $\tilde\beta_i = \beta_i - \beta_I$ and $\tilde\alpha = \alpha + \beta_I$, which satisfy $\tilde\beta_I = 0$. Then

$\mathrm{logit}(\pi_i) = \alpha + \beta_i = (\beta_i - \beta_I) + (\alpha + \beta_I) = \tilde\alpha + \tilde\beta_i,$

where the newly defined parameters satisfy the constraint. When $\beta_I = 0$, $\alpha$ equals the logit in row $I$, and $\beta_i$ is the difference between the logits in rows $i$ and $I$. Thus, $\beta_i$ equals the log odds ratio for that pair of rows.

For any $\{\pi_i > 0\}$, $\{\beta_i\}$ exist such that model (5.4) holds. The model has as many parameters ($I$) as binomial observations and is saturated. When a factor has no effect, $\beta_1 = \beta_2 = \cdots = \beta_I$. Since this is equivalent to $\pi_1 = \cdots = \pi_I$, this model with only an intercept term specifies statistical independence of $X$ and $Y$.

Dummy Variables in Logit Models

An equivalent expression of model (5.4) uses dummy variables. Let $x_i = 1$ for observations in row $i$ and $x_i = 0$ otherwise, $i = 1, \ldots, I - 1$. The model is

$\mathrm{logit}(\pi_i) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_{I-1} x_{I-1}.$

This accounts for parameter redundancy by not forming a dummy variable for category $I$. The constraint $\beta_I = 0$ in (5.4) corresponds to this form of dummy variable. The choice of category to exclude for the dummy variable is arbitrary. Some software sets $\beta_1 = 0$; this corresponds to a model with dummy variables for categories 2 through $I$, but not category 1.

Another way to impose constraints sets $\sum_i \beta_i = 0$. Suppose that $X$ has $I = 2$ categories, so $\beta_1 = -\beta_2$. This results from effect coding for a dummy variable, $x = 1$ in category 1 and $x = -1$ in category 2.
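A small numerical sketch may help to see that the three constraint systems are reparameterizations of the same saturated model. The row logits below are hypothetical values chosen only for illustration, and `recode` is our own helper name.

```python
# Hypothetical sample logits for a factor with I = 3 rows (illustration only).
logits = [-1.2, -0.4, 0.3]

def recode(logits, scheme):
    """Return (alpha, betas) reproducing the row logits under a constraint.

    "last_zero"  sets beta_I = 0, so alpha is the logit in row I
    "first_zero" sets beta_1 = 0, so alpha is the logit in row 1
    "sum_zero"   sets sum(beta_i) = 0, so alpha is the mean logit
    """
    if scheme == "last_zero":
        alpha = logits[-1]
    elif scheme == "first_zero":
        alpha = logits[0]
    elif scheme == "sum_zero":
        alpha = sum(logits) / len(logits)
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return alpha, [value - alpha for value in logits]

results = {s: recode(logits, s) for s in ("last_zero", "first_zero", "sum_zero")}
for scheme, (alpha, betas) in results.items():
    # alpha + beta_i and beta_1 - beta_2 are the same for every scheme.
    print(scheme, round(alpha, 3), [round(b, 3) for b in betas],
          round(betas[0] - betas[1], 3))
```

The individual $\beta_i$ values differ by scheme, while the fitted logits $\alpha + \beta_i$ and every difference $\beta_a - \beta_b$ (a log odds ratio) do not.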
The same substantive results occur for any coding scheme. For model (5.4), regardless of the constraint for $\{\beta_i\}$, $\{\hat\alpha + \hat\beta_i\}$ and hence $\{\hat\pi_i\}$ are the same. The differences $\hat\beta_a - \hat\beta_b$ for pairs $(a, b)$ of categories of $X$ are identical and represent estimated log odds ratios. Thus, $\exp(\hat\beta_a - \hat\beta_b)$ is the estimated odds of success in category $a$ of $X$ divided by the estimated odds of success in category $b$ of $X$. Reparameterizing a model may change parameter estimates but does not change the model fit or the effects of interest.

The value $\beta_i$ or $\hat\beta_i$ for a single category is irrelevant. Different constraint systems result in different values. For a binary predictor, for instance, using dummy variables with reference value $\beta_2 = 0$, the log odds ratio equals $\beta_1 - \beta_2 = \beta_1$; by contrast, for effect coding with $\pm 1$ dummy variable and hence $\beta_1 + \beta_2 = 0$, the log odds ratio equals $\beta_1 - \beta_2 = \beta_1 - (-\beta_1) = 2\beta_1$. A parameter or its estimate makes sense only by comparison with one for another category.

Alcohol and Infant Malformation Example Revisited

We return now to Table 3.7 from the study of maternal alcohol consumption and child's congenital malformations, shown again in Table 5.3. For model (5.4), we treat malformations as the response and alcohol consumption as an explanatory factor. Regardless of the constraint for $\{\beta_i\}$, $\{\hat\alpha + \hat\beta_i\}$ are the sample logits, reported in Table 5.3. For instance,

$\mathrm{logit}(\hat\pi_1) = \hat\alpha + \hat\beta_1 = \log(48/17{,}066) = -5.87.$

For the coding that constrains $\beta_5 = 0$, $\hat\alpha = -3.61$ and $\hat\beta_1 = -2.26$. For the coding $\beta_1 = 0$, $\hat\alpha = -5.87$. Table 5.3 shows that except for the slight reversal between the first and second categories of alcohol consumption, the logits and hence the sample proportions of malformation cases increase as alcohol consumption increases.

TABLE 5.3 Logits and Proportion of Malformation for Table 3.7 (columns: Alcohol Consumption; Malformation Present; Malformation Absent; Logit; Proportion Malformed, Observed and Fitted)

The simpler model with all $\beta_i = 0$ specifies independence. For it, $\hat\alpha$ equals the logit for the overall sample proportion of malformations, or $\log(93/32{,}481) = -5.86$. To test $H_0$: independence (df = 4), the Pearson
statistic (3.10) is $X^2 = 12.1$ ($P = 0.02$), and the likelihood-ratio statistic (3.11) is $G^2 = 6.2$ ($P = 0.19$). These provide mixed signals. Table 5.3 has a mixture of very small, moderate, and extremely large counts. Even though $n = 32{,}574$, the null sampling distributions of $X^2$ or $G^2$ may not be close to chi-squared. The $P$-value using the exact conditional distribution is 0.03 for $X^2$, with a larger value for $G^2$. These are closer, but still give differing evidence.

In any case, these statistics ignore the ordinality of alcohol consumption. The sample suggests that malformations may tend to be more likely with higher alcohol consumption. The first two percentages are similar and the next two are also similar, however, and any of the last three percentages changes substantially with the addition or deletion of one malformation case.

Linear Logit Model for $I \times 2$ Tables

Model (5.4) treats the explanatory factor as nominal, since it is invariant to the ordering of categories. For ordered factor categories, other models are more parsimonious than this, yet more complex than the independence model. For instance, let scores $\{x_1, x_2, \ldots, x_I\}$ describe distances between categories of $X$. When one expects a monotone effect of $X$ on $Y$, it is natural to fit the linear logit model

$\mathrm{logit}(\pi_i) = \alpha + \beta x_i. \quad (5.5)$

The independence model is the special case $\beta = 0$.

The near-monotone increase in sample logits in Table 5.3 indicates that the linear logit model (5.5) may fit better than the independence model. As measured, alcohol consumption groups a naturally continuous variable. With scores $\{x_1 = 0, x_2 = 0.5, x_3 = 1.5, x_4 = 4.0, x_5 = 7.0\}$, the last score being somewhat arbitrary, Table 5.4 shows results.

TABLE 5.4 Computer Output for Logistic Regression Model with Infant Malformation Data (panels: Criteria for Assessing Goodness of Fit, with Deviance, Pearson Chi-Square, and Log Likelihood; parameter estimates for Intercept and alcohol, with Std Error, Likelihood-Ratio 95% Conf Limits, Wald Chi-Sq, and Pr > ChiSq)

The estimated multiplicative
effect of a unit increase in daily alcohol consumption on the odds of malformation is $\exp(0.317) = 1.37$. Table 5.3 shows the observed and fitted proportions of malformation. The model seems to fit well, as statistics comparing observed and fitted counts are $G^2 = 1.95$ and $X^2 = 2.05$, with df = 3.

Cochran-Armitage Trend Test

Armitage and Cochran were among the first to emphasize the importance of utilizing ordered categories in a contingency table. For $I \times 2$ tables with ordered rows and $I$ independent $\mathrm{bin}(n_i, \pi_i)$ variates $\{y_i\}$, they proposed a trend statistic for testing independence by partitioning the Pearson statistic for that hypothesis. They used a linear probability model,

$\pi_i = \alpha + \beta x_i, \quad (5.6)$

fitted by ordinary least squares. For this model, the null hypothesis of independence is $H_0\colon \beta = 0$. Let $\bar{x} = \sum_i n_i x_i / n$. Let $p_i = y_i/n_i$, and let $p = (\sum_i y_i)/n$ denote the overall proportion of successes. The prediction equation is

$\hat\pi_i = p + b(x_i - \bar{x}),$

where

$b = \frac{\sum_i n_i (p_i - p)(x_i - \bar{x})}{\sum_i n_i (x_i - \bar{x})^2}.$

Denote the Pearson statistic for testing independence by $X^2(I)$. For $I \times 2$ tables with ordered rows, it satisfies

$X^2(I) = \frac{1}{p(1 - p)} \sum_i n_i (p_i - p)^2 = z^2 + X^2(L),$

where

$X^2(L) = \frac{1}{p(1 - p)} \sum_i n_i (p_i - \hat\pi_i)^2$

and

$z^2 = \frac{b^2}{p(1 - p)} \sum_i n_i (x_i - \bar{x})^2 = \left[\frac{\sum_i (x_i - \bar{x})\, y_i}{\sqrt{p(1 - p) \sum_i n_i (x_i - \bar{x})^2}}\right]^2. \quad (5.7)$

When the linear probability model holds, $X^2(L)$ is asymptotically chi-squared with df = $I - 2$. It tests the fit of the model. The statistic $z^2$, with df = 1,
tests $H_0\colon \beta = 0$ for the linear trend in the proportions (5.6). The test of independence using this statistic is called the Cochran-Armitage trend test.

This analysis seems unrelated to the linear logit model. However, the Cochran-Armitage statistic is equivalent to the score statistic for testing $H_0\colon \beta = 0$ in that model. Moreover, this statistic relates to the statistic $M^2$ in (3.15) used to test for a linear trend in an $I \times J$ table; namely, it equals $M^2$ applied when $J = 2$, except with $(n - 1)$ replaced by $n$. When $I = 2$, $X^2(L) = 0$ and $z^2 = X^2(I)$.

For Table 5.3 on alcohol consumption and malformation, $X^2(I) = 12.1$. Using the same scores as in the linear logit model, the Cochran-Armitage trend test has $z^2 = 6.6$ ($P$-value = 0.01). The test suggests strong evidence of a positive slope. In addition, $X^2(I) = 12.1 = 6.6 + 5.5$, where $X^2(L) = 5.5$ (df = 3) shows only slight evidence of departure of the proportions from linearity. The trend test agrees with $M^2$ for the sample correlation of $r = 0.014$ with $n - 1 = 32{,}573$. For the chosen scores, the correlation seems weak. However, $r$ has limited use as a descriptive measure for tables that are highly discrete and unbalanced.

The Cochran-Armitage trend test (i.e., the score test) usually gives results similar to the Wald or likelihood-ratio test of $H_0\colon \beta = 0$ in the linear logit model. The asymptotics work well even for quite small $n$ when $\{n_i\}$ are equal and $\{x_i\}$ are equally spaced. With Table 5.3, the Wald statistic equals $(\hat\beta/SE)^2 = (0.317/0.125)^2 = 6.4$ ($P = 0.01$) and the likelihood-ratio statistic equals 4.25 ($P = 0.04$). The highly unbalanced counts suggest that it is safest to use the likelihood function through the likelihood-ratio approach. This is also true for estimation. The profile likelihood 95% confidence interval of (0.02, 0.52) for $\beta$ reported in Table 5.4 is preferable to the Wald interval of $0.317 \pm 1.96(0.125) = (0.07, 0.56)$. Even though $n$ is very large, exact inference based on small-sample methods presented in Section 6.7 is relevant here.

5.4 MULTIPLE LOGISTIC REGRESSION

Like ordinary regression, logistic regression extends to models with multiple explanatory variables.
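The partition $X^2(I) = z^2 + X^2(L)$ can be checked numerically. The sketch below is an illustration under stated assumptions: most cell counts of Table 5.3 did not survive in this transcription, so the counts used here are the values we believe the published table contains (they reproduce the totals 93 and 32,481 quoted in the text), and the final score 7.0 is likewise assumed. The function name `trend_partition` is ours.

```python
# (malformation present, absent) by alcohol level; assumed counts, see lead-in.
counts = [(48, 17066), (38, 14464), (5, 788), (1, 126), (1, 37)]
scores = [0.0, 0.5, 1.5, 4.0, 7.0]   # last score assumed

def trend_partition(counts, scores):
    """Return (z2, x2_linear, x2_indep) for an I x 2 table with row scores."""
    y = [present for present, _ in counts]
    n_i = [present + absent for present, absent in counts]
    n = sum(n_i)
    p = sum(y) / n                                   # overall success proportion
    xbar = sum(ni * x for ni, x in zip(n_i, scores)) / n
    sxx = sum(ni * (x - xbar) ** 2 for ni, x in zip(n_i, scores))
    # sum_i (x_i - xbar) y_i equals sum_i n_i (p_i - p)(x_i - xbar)
    sxy = sum((x - xbar) * yi for x, yi in zip(scores, y))
    b = sxy / sxx                                    # least squares slope
    z2 = b ** 2 * sxx / (p * (1 - p))                # trend statistic, df = 1
    x2_indep = sum(ni * (yi / ni - p) ** 2
                   for ni, yi in zip(n_i, y)) / (p * (1 - p))
    return z2, x2_indep - z2, x2_indep               # X2(L) has df = I - 2

z2, x2_linear, x2_indep = trend_partition(counts, scores)
print(round(z2, 1), round(x2_linear, 1), round(x2_indep, 1))
```

Under these assumed counts the three statistics round to the values quoted in the text, which is some reassurance that the assumed table is the right one.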
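For instance, the model for $\pi(\mathbf{x}) = P(Y = 1)$ at values $\mathbf{x} = (x_1, \ldots, x_p)$ of $p$ predictors is

$\mathrm{logit}[\pi(\mathbf{x})] = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p. \quad (5.8)$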
The alternative formula, directly specifying $\pi(\mathbf{x})$, is

$\pi(\mathbf{x}) = \frac{\exp(\alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p)}{1 + \exp(\alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p)}. \quad (5.9)$

The parameter $\beta_i$ refers to the effect of $x_i$ on the log odds that $Y = 1$, controlling the other $x_j$. For instance, $\exp(\beta_i)$ is the multiplicative effect on the odds of a 1-unit increase in $x_i$, at fixed levels of other $x_j$. An explanatory variable can be qualitative, using dummy variables for categories.

Logit Models for Multiway Contingency Tables

When all variables are categorical, a multiway contingency table displays the data. We illustrate ideas with binary predictors $X$ and $Z$. We treat the sample size at given combinations $(i, k)$ of $X$ and $Z$ as fixed and regard the two counts on $Y$ at each setting as binomial, with different binomials treated as independent. Denote the two categories for each variable by (0, 1), and let dummy variables for $X$ and $Z$ have $x_1 = z_1 = 1$ and $x_2 = z_2 = 0$. The model

$\mathrm{logit}\, P(Y = 1) = \alpha + \beta_1 x_i + \beta_2 z_k \quad (5.10)$

has main effects for $X$ and $Z$ but assumes an absence of interaction. The effect of one factor is the same at each level of the other.

At a fixed level $z_k$ of $Z$, the effect on the logit of changing categories of $X$ is

$[\alpha + \beta_1(1) + \beta_2 z_k] - [\alpha + \beta_1(0) + \beta_2 z_k] = \beta_1.$

This logit difference equals the difference of log odds, which is the log odds ratio between $X$ and $Y$, fixing $Z$. Thus, $\exp(\beta_1)$ is the conditional odds ratio between $X$ and $Y$. Controlling for $Z$, the odds of success when $X = 1$ equal $\exp(\beta_1)$ times the odds when $X = 0$. This conditional odds ratio is the same at each level of $Z$; that is, there is homogeneous $XY$ association. The lack of an interaction term in (5.10) implies a common odds ratio for the partial tables. When $\beta_1 = 0$, that common odds ratio equals 1. Then $X$ and $Y$ are independent in each partial table, or conditionally independent, given $Z$.

Additivity on the logit scale is the generally accepted definition of no interaction for categorical variables. However, one could, instead, define it as additivity on some other scale, such as with probit or identity link.
Significant interaction can occur on one scale when there is none on another scale. In some applications, a particular definition may be natural. For instance, theory might assume an underlying normal distribution and predict that the probit is an additive function of predictor effects.
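To make the scale dependence concrete, the following sketch evaluates a no-interaction logit model at hypothetical parameter values (the numbers are ours, chosen only for illustration). The conditional odds ratio for $X$ is the same at both levels of $Z$, yet the difference of probabilities is not, so the same model shows interaction on the identity scale.

```python
import math

# Hypothetical parameters for logit P(Y = 1) = alpha + beta1*x + beta2*z.
alpha, beta1, beta2 = -1.0, 0.8, 1.5

def prob(x, z):
    """P(Y = 1) under the additive logit model."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta1 * x + beta2 * z)))

def odds(p):
    return p / (1.0 - p)

odds_ratios = {}
differences = {}
for z in (0, 1):
    p1, p0 = prob(1, z), prob(0, z)
    odds_ratios[z] = odds(p1) / odds(p0)   # conditional odds ratio for X
    differences[z] = p1 - p0               # effect of X on the probability scale

print({z: round(v, 3) for z, v in odds_ratios.items()})
print({z: round(v, 3) for z, v in differences.items()})
```

Both odds ratios equal $\exp(\beta_1)$, whereas the two probability differences disagree.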
A factor with $I$ categories needs $I - 1$ dummy variables, as we showed earlier for dummy variables in logit models. An alternative representation of such factors resembles the way that ANOVA models often express them. The model formula

$\mathrm{logit}\, P(Y = 1) = \alpha + \beta_i^X + \beta_k^Z \quad (5.11)$

represents effects of $X$ with parameters $\{\beta_i^X\}$ and effects of $Z$ with parameters $\{\beta_k^Z\}$. (The $X$ and $Z$ superscripts are merely labels and do not represent powers.) Model form (5.11) applies for any number of categories for $X$ and $Z$. The parameter $\beta_i^X$ denotes the effect on the logit of classification in category $i$ of $X$. Conditional independence between $X$ and $Y$, given $Z$, corresponds to $\beta_1^X = \beta_2^X = \cdots = \beta_I^X$, whereby $P(Y = 1)$ does not change as $i$ changes.

For each factor, one parameter in (5.11) is redundant. Fixing one at 0, such as $\beta_I^X = \beta_K^Z = 0$, represents the category not having its own dummy variable. When $X$ and $Z$ have two categories, the parameterization in model (5.11) then corresponds to that in model (5.10) with $\beta_1^X = \beta_1$ and $\beta_2^X = 0$, and with $\beta_1^Z = \beta_2$ and $\beta_2^Z = 0$.

AIDS and AZT Example

Table 5.5 is from a study on the effects of AZT in slowing the development of AIDS symptoms. In the study, 338 veterans whose immune systems were beginning to falter after infection with the AIDS virus were randomly assigned either to receive AZT immediately or to wait until their T cells showed severe immune weakness. Table 5.5 cross-classifies the veterans' race, whether they received AZT immediately, and whether they developed AIDS symptoms during the 3-year study.

TABLE 5.5 Development of AIDS Symptoms by AZT Use and Race
                      Symptoms
Race     AZT Use     Yes     No
White    Yes          14     93
         No           32     81
Black    Yes          11     52
         No           12     43
Source: New York Times, Feb. 15, 1991.

In model (5.10), we identify $X$ with AZT treatment ($x_1 = 1$ for immediate AZT use, $x_2 = 0$ otherwise) and $Z$ with race ($z_1 = 1$ for whites, $z_2 = 0$ for blacks), for predicting the probability that AIDS symptoms developed. Thus, $\alpha$ is the log odds of developing AIDS symptoms for black subjects without immediate AZT use, $\beta_1$ is the increment to the log odds for those with immediate AZT use, and $\beta_2$ is the increment to the log odds for white
subjects. Table 5.6 shows output. The estimated odds ratio between immediate AZT use and development of AIDS symptoms equals $\exp(-0.720) = 0.49$. For each race, the estimated odds of symptoms are half as high for those who took AZT immediately. The Wald confidence interval for this effect is $\exp[-0.720 \pm 1.96(0.279)] = (0.28, 0.84)$. Similar results occur for the likelihood-based interval.

TABLE 5.6 Computer Output for Logit Model with AIDS Symptoms Data (panels: Goodness-of-Fit Statistics, with Deviance and Pearson; Analysis of Maximum Likelihood Estimates for Intercept, azt, and race; Odds Ratio Estimates with 95% Wald Confidence Limits; Profile Likelihood Confidence Interval for Odds Ratios; observation-level listing of race, azt, y, n, pi_hat, lower, upper)

The hypothesis of conditional independence of AZT treatment and development of AIDS symptoms, controlling for race, is $H_0\colon \beta_1 = 0$ in (5.10). The likelihood-ratio statistic comparing model (5.10) with the simpler model having $\beta_1 = 0$ equals 6.9 (df = 1), showing evidence of association ($P = 0.01$). The Wald statistic $(\hat\beta_1/SE)^2 = (-0.720/0.279)^2 = 6.65$ provides similar results.

Table 5.7 shows parameter estimates for three ways of defining factor parameters in (5.11): (1) setting the last parameter equal to 0, (2) setting the first parameter equal to 0, and (3) having parameters sum to zero. For each coding scheme, at a given combination of AZT use and race, the estimated probability of developing AIDS symptoms is the same. For instance, the intercept estimate plus the estimate for immediate AZT use plus the estimate for being white is $-1.738$ for each scheme, so the estimated probability
that white veterans with immediate AZT use develop AIDS symptoms equals $\exp(-1.738)/[1 + \exp(-1.738)] = 0.150$. The bottom of Table 5.6 shows point and interval estimates of the probabilities. Figure 5.4 shows a graphical representation of the sample proportions (the four dots) and the point estimates enclosed in 95% confidence intervals.

TABLE 5.7 Parameter Estimates for Logit Model Fitted to Table 5.5
                        Definition of Parameters
Parameter        Last = Zero    First = Zero    Sum = Zero
Intercept          -1.074         -1.738          -1.406
AZT    Yes         -0.720          0.000          -0.360
       No           0.000          0.720           0.360
Race   White        0.055          0.000           0.028
       Black        0.000         -0.055          -0.028

FIGURE 5.4 Estimated effects of AZT use and race on probability of developing AIDS symptoms (dots are sample proportions).

Similarly, for each coding scheme, $\hat\beta_1^X - \hat\beta_2^X$ is identical and represents the conditional log odds ratio of $X$ with the response, given $Z$. Here, $\exp(\hat\beta_1^X - \hat\beta_2^X) = \exp(-0.720) = 0.49$ estimates the common odds ratio between immediate AZT use and AIDS symptoms, for each race.

Goodness of Fit as a Likelihood-Ratio Test

The likelihood-ratio statistic $-2(L_0 - L_1)$ tests whether certain model parameters are zero by comparing the log likelihood $L_1$ for the fitted model $M_1$ with $L_0$ for a simpler model $M_0$. Denote this statistic for testing $M_0$, given
that $M_1$ holds, by $G^2(M_0 \mid M_1)$. The goodness-of-fit statistic $G^2(M)$ is a special case in which $M_0 = M$ and $M_1$ is the saturated model. In testing whether $M$ fits, we test whether all parameters in the saturated model but not in $M$ equal zero. The asymptotic df is the difference in the number of parameters in the two models, which is the number of binomials modeled minus the number of parameters in $M$.

We illustrate by checking the fit of model (5.10) for the AIDS data. For its fit, white veterans with immediate AZT use had estimated probability 0.150 of developing AIDS symptoms during the study. Since 107 white veterans took AZT, the fitted value is $107(0.150) = 16.0$ for developing symptoms and $107(0.850) = 91.0$ for not developing them. Similarly, one can obtain fitted values for all eight cells in Table 5.5. The goodness-of-fit statistics comparing these with the cell counts are $G^2 = 1.38$ and $X^2 = 1.39$. The model has four binomials, one at each combination of AZT use and race. Since it has three parameters, residual df = 4 − 3 = 1. The small $G^2$ and $X^2$ values suggest that the model fits decently ($P > 0.2$).

For model (5.10), the odds ratio between $X$ and $Y$ is the same at each level of $Z$. The goodness-of-fit test checks this structure. That is, the test also provides a test of homogeneous odds ratios. For Table 5.5, homogeneity is plausible. Since residual df = 1, the more complex model that adds an interaction term and permits the two odds ratios to differ is saturated.

Let $L_S$ denote the maximized log likelihood for the saturated model. As discussed in Section 4.5.4, the likelihood-ratio statistic for comparing models $M_1$ and $M_0$ is

$G^2(M_0 \mid M_1) = -2(L_0 - L_1) = -2(L_0 - L_S) - [-2(L_1 - L_S)] = G^2(M_0) - G^2(M_1).$

The test statistic comparing two models is identical to the difference in $G^2$ goodness-of-fit statistics (deviances) for the two models. To illustrate, consider $H_0\colon \beta_2 = 0$ for the race effect with the AIDS data. The likelihood-ratio statistic equals 0.04, suggesting that the simpler model is adequate. But this equals $G^2(M_0) - G^2(M_1) = 1.42 - 1.38$, where $M_0$ is the simpler model with $\beta_2 = 0$.
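The fit and the goodness-of-fit statistics for the AIDS data can be reproduced without statistical software. The sketch below is one possible way to do it, using plain Newton-Raphson on the binomial log likelihood; it is not the procedure that produced Table 5.6. The cell counts are those of Table 5.5 (the white, immediate-AZT row is garbled in the transcription; 14 and 93 are assumed, consistent with the total of 107 quoted above), and all function names are ours.

```python
import math

# (azt, white, symptoms yes, group size) for the four binomials of Table 5.5.
rows = [(1, 1, 14, 107), (0, 1, 32, 113), (1, 0, 11, 63), (0, 0, 12, 55)]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    k = len(b)
    m = [a[i][:] + [b[i]] for i in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, k):
            f = m[r][c] / m[c][c]
            for j in range(c, k + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * k
    for i in reversed(range(k)):
        x[i] = (m[i][k] - sum(m[i][j] * x[j] for j in range(i + 1, k))) / m[i][i]
    return x

def fit_logit(rows, iterations=25):
    """ML estimates (alpha, beta1, beta2) for logit P = alpha + beta1*azt + beta2*white."""
    beta = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        score = [0.0] * 3
        info = [[0.0] * 3 for _ in range(3)]
        for azt, white, y, n in rows:
            x = (1.0, azt, white)
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
            for r in range(3):
                score[r] += x[r] * (y - n * p)          # gradient of log likelihood
                for c in range(3):
                    info[r][c] += n * p * (1 - p) * x[r] * x[c]  # information matrix
        beta = [b + s for b, s in zip(beta, solve(info, score))]
    return beta

alpha, beta1, beta2 = fit_logit(rows)

g2 = x2 = 0.0
for azt, white, y, n in rows:
    p = 1.0 / (1.0 + math.exp(-(alpha + beta1 * azt + beta2 * white)))
    for observed, fitted in ((y, n * p), (n - y, n * (1 - p))):
        g2 += 2.0 * observed * math.log(observed / fitted)
        x2 += (observed - fitted) ** 2 / fitted

print(round(alpha, 3), round(beta1, 3), round(beta2, 3), round(g2, 2), round(x2, 2))
```

Fitting the same routine with the race column dropped would give $G^2(M_0)$, and the difference from $G^2(M_1)$ is the model-comparison statistic discussed above.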
The model comparison statistic often has an approximate chi-squared null distribution even when separate $G^2(M_i)$ do not. For instance, when a predictor is continuous or a contingency table has very small fitted values, the sampling distribution of $G^2(M_i)$ may be far from chi-squared. Nonetheless, if df for the comparison statistic is modest (as in comparing two models that differ by a few parameters), the null distribution of $G^2(M_0 \mid M_1)$ is approximately chi-squared.
Horseshoe Crab Example Revisited

Like ordinary regression, logistic regression can have a mixture of quantitative and qualitative predictors. We illustrate with the horseshoe crab data, using the female crab's width and color as predictors. Color has five categories: light, medium light, medium, medium dark, dark. It is a surrogate for age, older crabs tending to be darker. The sample contained no light crabs, so our models use only the other four categories.

We first treat color as qualitative. The four categories use three dummy variables. The model is

$\mathrm{logit}(\pi) = \alpha + \beta_1 c_1 + \beta_2 c_2 + \beta_3 c_3 + \beta_4 x, \quad (5.12)$

where $\pi = P(Y = 1)$, $x$ = width in centimeters, and

$c_1 = 1$ for medium-light color, and 0 otherwise,
$c_2 = 1$ for medium color, and 0 otherwise,
$c_3 = 1$ for medium-dark color, and 0 otherwise.

The crab color is dark (category 4) when $c_1 = c_2 = c_3 = 0$. Table 5.8 shows the ML parameter estimates. For instance, for dark crabs, $\mathrm{logit}(\hat\pi) = -12.715 + 0.468x$; by contrast, for medium-light crabs, $c_1 = 1$, and $\mathrm{logit}(\hat\pi) = (-12.715 + 1.330) + 0.468x = -11.385 + 0.468x$. At the average width of 26.3 cm, $\hat\pi = 0.399$ for dark crabs and 0.715 for medium-light crabs.

The model assumes a lack of interaction between color and width in their effects. Width has the same coefficient (0.468) for all colors, so the shapes of the curves relating width to $\pi$ are identical. For each color, a 1-cm increase in width has a multiplicative effect of $\exp(0.468) = 1.60$ on the odds that $Y = 1$.

TABLE 5.8 Computer Output for Model with Width and Color Predictors (panels: Criteria for Assessing Goodness of Fit, with Deviance, Pearson Chi-Square, and Log Likelihood; parameter estimates for intercept, c1, c2, c3, and width, with Standard Error, Likelihood-Ratio 95% Confidence Limits, Chi-Square, and Pr > ChiSq)

Figure 5.5 displays the fitted model. Any one curve equals any other
curve shifted to the right or left. The parallelism of curves in the horizontal dimension implies that any two curves never cross. At all width values, color 4 (dark) has a lower estimated probability of a satellite than the other colors. There is a noticeable positive effect of width.

FIGURE 5.5 Logistic regression model using width and color predictors of satellite presence for horseshoe crabs.

The exponentiated difference between two color parameter estimates is an odds ratio comparing those colors. For instance, the difference for medium-light crabs and dark crabs equals 1.330. At any given width, the estimated odds that a medium-light crab has a satellite are $\exp(1.330) = 3.8$ times the estimated odds for a dark crab. At width $x = 26.3$, the odds equal $0.715/0.285 = 2.51$ for a medium-light crab and $0.399/0.601 = 0.66$ for a dark crab, for which $2.51/0.66 = 3.8$.

Model Comparison

To test whether color contributes significantly to model (5.12), we test $H_0\colon \beta_1 = \beta_2 = \beta_3 = 0$. This states that controlling for width, the probability of a satellite is independent of color. We compare the maximized log-likelihood $L_1$ for the full model (5.12) to $L_0$ for the simpler model. The test statistic $-2(L_0 - L_1) = 7.0$ has df = 3, the difference between the numbers of parameters in the two models. The chi-squared $P$-value of 0.07 provides slight evidence of a color effect.

The more complex model allowing color × width interaction has three additional terms, the cross-products of width with the color dummy variables.
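The probabilities and the odds ratio quoted for the color model follow directly from the fitted equation. The sketch below plugs in the estimates as read from the text (intercept −12.715, medium-light effect 1.330, width effect 0.468; digits restored from the garbled transcription, so treat them as assumptions) and confirms that the ratio of odds at a fixed width equals the exponentiated color estimate.

```python
import math

# Estimates for model (5.12) as quoted in the text (restored digits assumed).
alpha_hat = -12.715     # intercept; dark is the reference color
b_medium_light = 1.330  # coefficient of c1
b_width = 0.468         # coefficient of x
width = 26.3            # average width in cm

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

p_dark = expit(alpha_hat + b_width * width)
p_medium_light = expit(alpha_hat + b_medium_light + b_width * width)

odds_dark = p_dark / (1.0 - p_dark)
odds_medium_light = p_medium_light / (1.0 - p_medium_light)
odds_ratio = odds_medium_light / odds_dark   # does not depend on the width used

print(round(p_dark, 3), round(p_medium_light, 3), round(odds_ratio, 2))
```

Because the model has no color-by-width interaction, repeating the calculation at any other width changes the two probabilities but leaves the odds ratio unchanged.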
Fitting this model is equivalent to fitting logistic regression with width predictor separately for crabs of each color. Each color then has a different-shaped curve relating width to $P(Y = 1)$, so a comparison of two colors varies according to the width value. The likelihood-ratio statistic comparing the models with and without the interaction terms equals 4.4, with df = 3. The evidence of interaction is weak ($P = 0.22$).

Quantitative Treatment of Ordinal Predictor

Color has ordered categories, from lightest to darkest. A simpler model yet treats this predictor as quantitative. Color may have a linear effect, for a set of monotone scores. To illustrate, for scores $c = \{1, 2, 3, 4\}$ for the color categories, the model

$\mathrm{logit}(\pi) = \alpha + \beta_1 c + \beta_2 x \quad (5.13)$

has $\hat\beta_1 = -0.509$ (SE = 0.224) for color, with a corresponding estimate $\hat\beta_2$ and SE for width. This shows strong evidence of an effect for each. At a given width, for every one-category increase in color darkness, the estimated odds of a satellite multiply by $\exp(-0.509) = 0.60$.

The likelihood-ratio statistic comparing this fit to the more complex model (5.12) having a separate parameter for each color equals 1.7 (df = 2). This statistic tests that the simpler model (5.13) is adequate, given that model (5.12) holds. It tests that when plotted against the color scores, the color parameters in (5.12) follow a linear trend. The simplification seems permissible ($P \approx 0.4$).

The color parameter estimates in the qualitative-color model (5.12) are (1.33, 1.40, 1.11, 0), the 0 value for the dark category reflecting its lack of a dummy variable. Although these values do not depart significantly from a linear trend, the first three are quite similar compared to the last one. Thus, another potential color scoring for model (5.13) is $\{1, 1, 1, 0\}$; that is, score = 0 for dark-colored crabs, and score = 1 otherwise. The likelihood-ratio statistic comparing model (5.13) with these binary scores to model (5.12) equals 0.5 (df = 2), showing that this simpler model is also adequate. Its fit is $\mathrm{logit}(\hat\pi)$
$= -12.980 + 1.300c + 0.478x$, (5.14) with standard errors 0.526 and 0.104. At a given width, the estimated odds that a lighter-colored crab has a satellite are $\exp(1.300) = 3.7$ times the estimated odds for a dark crab.

In summary, the qualitative-color model, the quantitative-color model with scores $\{1, 2, 3, 4\}$, and the model with binary color scores $\{1, 1, 1, 0\}$ all suggest that dark crabs are least likely to have satellites. A much larger sample is
needed to determine which color scoring is most appropriate. It is advantageous to treat ordinal predictors in a quantitative manner when such models fit well. The model is simpler and easier to interpret, and tests of the predictor effect are more powerful when it has a single parameter rather than several parameters. In Section 6.4 we discuss this issue further.

Standardized and Probability-Based Interpretations

To compare effects of quantitative predictors having different units, it can be helpful to report standardized coefficients. One approach fits the model to standardized predictors, replacing each $x_j$ by $(x_j - \bar{x}_j)/s_{x_j}$. Then, each regression coefficient represents the effect of a standard deviation change in a predictor, controlling for the other variables. Equivalently, for each $j$ one can multiply the unstandardized estimate $\hat\beta_j$ by $s_{x_j}$ (see also Note 5.9).

Regardless of the units, many find it difficult to understand odds or odds ratio effects. The simpler interpretation of the approximate change in the probability based on a linearization of the model (Section …) applies also to multiple predictors. Consider a setting of predictors at which $\hat{P}(Y = 1) = \hat\pi$. Then, controlling for the other predictors, a 1-unit increase in $x_j$ corresponds approximately to a $\hat\beta_j \hat\pi(1 - \hat\pi)$ change in $\hat\pi$. For instance, at predictor settings at which $\hat\pi = 0.5$ for fit (5.14), the approximate effect of a 1-cm increase in width is $(0.478)(0.5)(0.5) = 0.12$. This is considerable, since a 1-cm change in width is less than half a standard deviation. This linear approximation deteriorates as the change in the predictor increases.

More precise interpretations use the probability formula directly. To describe the effect of $x_j$, one could set the other predictors at their sample means and compute the estimated probabilities at the smallest and largest $x_j$ values. These are sensitive to outliers, however. It is often more sensible to use the quartiles. For fit (5.14), the sample means are 26.3 for $x$ and 0.873 for $c$. The lower and upper quartiles of $x$ are 24.9 and 27.7.
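To see how the linear approximation compares with the exact change in probability, and how a standardized coefficient is formed, consider the following sketch. The width coefficient 0.478 and the width standard deviation 2.11 are assumptions recovered from the rounded figures in this chapter (the 0.12 approximation above and the statement that 1 cm is less than half a standard deviation), not values read from software output; `expit` is a helper defined here.

```python
import math

def expit(eta):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

beta_width = 0.478   # ASSUMPTION: width coefficient of the binary-color fit
s_width = 2.11       # ASSUMPTION: sample standard deviation of width (cm)
pi_hat = 0.5         # predictor setting at which the fitted probability is 0.5

# Linear approximation: a 1-unit increase changes pi by about beta * pi * (1 - pi).
approx_change = beta_width * pi_hat * (1.0 - pi_hat)

# Exact change: move the linear predictor by beta and transform back.
eta0 = math.log(pi_hat / (1.0 - pi_hat))
exact_change = expit(eta0 + beta_width) - pi_hat

# Standardized coefficient: effect of a one-standard-deviation increase in width.
standardized_beta = beta_width * s_width

print(f"approximate change: {approx_change:.4f}")
print(f"exact change:       {exact_change:.4f}")
print(f"standardized coefficient: {standardized_beta:.3f}")
```

For a 1-cm step the two changes differ only in the third decimal place; the gap widens for larger steps, which is the deterioration the text mentions.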
At $x = 24.9$ and $c = \bar{c}$, $\hat\pi = 0.51$. At $x = 27.7$ and $c = \bar{c}$, $\hat\pi = 0.80$. The change in $\hat\pi$ from 0.51 to 0.80 over the middle 50% of the range of width values reflects a strong width effect. Since $c$ takes only values 0 and 1, one could instead report this effect separately for each. Also, when an explanatory variable is a dummy, it makes sense to report the estimated probabilities at its two values rather than at quartiles, which could be identical. At $x = 26.3$, $\hat\pi = 0.40$ when $c = 0$ and $\hat\pi = 0.71$ when $c = 1$. This color effect, differentiating dark crabs from others, is also substantial.

Table 5.9 shows a way to present effects that can be understandable to those not familiar with odds ratios. It also shows results of the extension of model (5.14), permitting interaction. The estimated width effect is then greater for the lighter-colored crabs. However, the interaction is not significant.
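The probability-based summaries above come straight from the fitted equation. A minimal sketch follows, assuming the fit $\operatorname{logit}(\hat\pi) = -12.980 + 1.300c + 0.478x$; these coefficient values are reconstructed from the rounded figures in the text and should be treated as assumptions, and `prob_satellite` is a hypothetical helper name.

```python
import math

# ASSUMPTION: coefficients of the binary-color fit, reconstructed from the text.
ALPHA, BETA_COLOR, BETA_WIDTH = -12.980, 1.300, 0.478

def prob_satellite(width, c):
    """Estimated probability of a satellite at a given width (cm) and color score c (0 = dark, 1 = other)."""
    eta = ALPHA + BETA_COLOR * c + BETA_WIDTH * width
    return 1.0 / (1.0 + math.exp(-eta))

# Color effect at the mean width.
p_dark = prob_satellite(26.3, 0)
p_other = prob_satellite(26.3, 1)
print(f"at width 26.3: dark {p_dark:.2f}, other {p_other:.2f}")

# Width effect over the interquartile range, reported separately for each color score.
for c in (0, 1):
    lower, upper = prob_satellite(24.9, c), prob_satellite(27.7, c)
    print(f"c = {c}: {lower:.2f} at LQ, {upper:.2f} at UQ, change {upper - lower:.2f}")
```

Reporting the quartile comparison separately for each color score avoids evaluating the model at a mean color score that no crab actually has.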
TABLE 5.9 Summary of Effects in Model (5.14) with Crab Width and Color as Predictors of Presence of Satellites

Variable                       Estimate   SE      Comparison              Change in Probability
No interaction model
  Intercept                    -12.980    …
  Color (0 = dark, 1 = other)    1.300    0.526   (1, 0) at x-bar         0.31 = 0.71 - 0.40
  Width, x (cm)                  0.478    0.104   (UQ, LQ) at c-bar       0.29 = 0.80 - 0.51
Interaction model
  Intercept                    -…         …
  Color (0 = dark, 1 = other)  -…         …
  Width, x (cm)                 …         …       (UQ, LQ) at c = 0       0.13 = 0.43 - 0.30
  Width × color                 …         …       (UQ, LQ) at c = 1       … = 0.84 - …

5.5 FITTING LOGISTIC REGRESSION MODELS

The mechanics of ML estimation and model fitting for logistic regression are special cases of the GLM fitting results of Section 4.6. With $n$ subjects, we treat the $n$ binary responses as independent. Let $\mathbf{x}_i = (x_{i1}, \ldots, x_{ip})$ denote setting $i$ of values of $p$ explanatory variables, $i = 1, \ldots, N$. When explanatory variables are continuous, a different setting may occur for each subject, in which case $N = n$. The logistic regression model (5.8), regarding $\alpha$ as a regression parameter with unit coefficient, is

$$\pi(\mathbf{x}_i) = \frac{\exp\left(\sum_{j=1}^{p} \beta_j x_{ij}\right)}{1 + \exp\left(\sum_{j=1}^{p} \beta_j x_{ij}\right)}. \tag{5.15}$$

Likelihood Equations

When more than one observation occurs at a fixed $\mathbf{x}_i$ value, it is sufficient to record the number of observations $n_i$ and the number of successes. We then let $y_i$ refer to this success count rather than to an individual binary response. Then $\{Y_1, \ldots, Y_N\}$ are independent binomials with $E(Y_i) = n_i \pi(\mathbf{x}_i)$, where $n_1 + \cdots + n_N = n$. Their joint probability mass function is proportional to the product of $N$ binomial functions,

$$\prod_{i=1}^{N} \pi(\mathbf{x}_i)^{y_i}\,[1 - \pi(\mathbf{x}_i)]^{n_i - y_i}
= \left\{\prod_{i=1}^{N} \exp\left[\log\left(\frac{\pi(\mathbf{x}_i)}{1 - \pi(\mathbf{x}_i)}\right)^{y_i}\right]\right\}\left\{\prod_{i=1}^{N} [1 - \pi(\mathbf{x}_i)]^{n_i}\right\}
= \exp\left[\sum_{i} y_i \log\frac{\pi(\mathbf{x}_i)}{1 - \pi(\mathbf{x}_i)}\right]\left\{\prod_{i=1}^{N} [1 - \pi(\mathbf{x}_i)]^{n_i}\right\}.$$
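The factorization of the binomial kernel into a logit term and a term in $1 - \pi(\mathbf{x}_i)$ can be checked numerically. The sketch below uses made-up grouped data and arbitrary parameter values, all purely illustrative, with a single predictor plus intercept.

```python
import math

def expit(eta):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-eta))

# HYPOTHETICAL grouped data: predictor value, number of trials, number of successes.
x = [0.0, 1.0, 2.0]
n = [5, 6, 4]
y = [1, 3, 3]

# ARBITRARY parameter values for logit(pi_i) = alpha + beta * x_i.
alpha, beta = -1.0, 0.8
pis = [expit(alpha + beta * xi) for xi in x]

# Log of the kernel written as a product of binomial terms.
log_kernel_direct = sum(
    yi * math.log(p) + (ni - yi) * math.log(1.0 - p)
    for yi, ni, p in zip(y, n, pis)
)

# Log of the kernel in the factored form: sum_i y_i * logit_i + sum_i n_i * log(1 - pi_i).
log_kernel_factored = (
    sum(yi * math.log(p / (1.0 - p)) for yi, p in zip(y, pis))
    + sum(ni * math.log(1.0 - p) for ni, p in zip(n, pis))
)

print(log_kernel_direct, log_kernel_factored)
```

The two values agree up to floating-point error, which is the identity used to isolate the sufficient statistics.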
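For model (5.15), the $i$th logit is $\sum_j \beta_j x_{ij}$, so the exponential term in the last expression equals $\exp\left[\sum_i y_i \left(\sum_j \beta_j x_{ij}\right)\right] = \exp\left[\sum_j \left(\sum_i y_i x_{ij}\right) \beta_j\right]$. Also, since $[1 - \pi(\mathbf{x}_i)] = \left[1 + \exp\left(\sum_j \beta_j x_{ij}\right)\right]^{-1}$, the log likelihood equals

$$L(\boldsymbol\beta) = \sum_j \left(\sum_i y_i x_{ij}\right) \beta_j - \sum_i n_i \log\left[1 + \exp\left(\sum_j \beta_j x_{ij}\right)\right]. \tag{5.16}$$

This depends on the binomial counts only through the sufficient statistics $\{\sum_i y_i x_{ij},\ j = 1, \ldots, p\}$.

The likelihood equations result from setting $\partial L(\boldsymbol\beta)/\partial \boldsymbol\beta = \mathbf{0}$. Since

$$\frac{\partial L(\boldsymbol\beta)}{\partial \beta_j} = \sum_i y_i x_{ij} - \sum_i n_i x_{ij} \frac{\exp\left(\sum_k \beta_k x_{ik}\right)}{1 + \exp\left(\sum_k \beta_k x_{ik}\right)},$$

the likelihood equations are

$$\sum_i y_i x_{ij} - \sum_i n_i \hat\pi_i x_{ij} = 0, \quad j = 1, \ldots, p, \tag{5.17}$$

where $\hat\pi_i = \exp\left(\sum_k \hat\beta_k x_{ik}\right) / \left[1 + \exp\left(\sum_k \hat\beta_k x_{ik}\right)\right]$ is the ML estimate of $\pi(\mathbf{x}_i)$. We observed these equations as a special case of those for binomial GLMs in (4.5) (but there $y_i$ is the proportion of successes). The equations are nonlinear and require iterative solution.

Let $\mathbf{X}$ denote the $N \times p$ matrix of values of $\{x_{ij}\}$. The likelihood equations (5.17) have form

$$\mathbf{X}'\mathbf{y} = \mathbf{X}'\hat{\boldsymbol\mu}, \tag{5.18}$$

where $\hat\mu_i = n_i \hat\pi_i$. This equation illustrates a fundamental result: For GLMs with canonical link, the likelihood equations equate the sufficient statistics to the estimates of their expected values. Equation (…) showed this result in the GLM context, and (…) are the normal equations in ordinary regression.

Asymptotic Covariance Matrix of Parameter Estimators

The ML estimators $\hat{\boldsymbol\beta}$ have a large-sample normal distribution with covariance matrix equal to the inverse of the information matrix. The observed information matrix has elements

$$-\frac{\partial^2 L(\boldsymbol\beta)}{\partial \beta_a\, \partial \beta_b} = \sum_i \frac{x_{ia} x_{ib}\, n_i \exp\left(\sum_j \beta_j x_{ij}\right)}{\left[1 + \exp\left(\sum_j \beta_j x_{ij}\right)\right]^2} = \sum_i x_{ia} x_{ib}\, n_i \pi_i (1 - \pi_i). \tag{5.19}$$

This is not a function of $\{y_i\}$, so the observed and expected information are identical. This happens for all GLMs that use canonical links (Section …).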
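The estimated covariance matrix is the inverse of the matrix having elements (5.19), substituting $\hat{\boldsymbol\beta}$. This has form

$$\widehat{\operatorname{cov}}(\hat{\boldsymbol\beta}) = \left\{\mathbf{X}' \operatorname{diag}[n_i \hat\pi_i (1 - \hat\pi_i)]\, \mathbf{X}\right\}^{-1}, \tag{5.20}$$

where $\operatorname{diag}[n_i \hat\pi_i (1 - \hat\pi_i)]$ denotes the $N \times N$ diagonal matrix having $\{n_i \hat\pi_i (1 - \hat\pi_i)\}$ on the main diagonal. This is the special case of the GLM covariance matrix (4.8) with estimated diagonal weight matrix $\hat{\mathbf{W}}$ having elements $\hat{w}_i = n_i \hat\pi_i (1 - \hat\pi_i)$. The square roots of the main diagonal elements of (5.20) are estimated standard errors of $\hat{\boldsymbol\beta}$.

Distribution of Probability Estimators

Using $\widehat{\operatorname{cov}}(\hat{\boldsymbol\beta})$, one can conduct inference about $\boldsymbol\beta$ and related effects such as odds ratios. One can also construct confidence intervals for response probabilities $\pi(\mathbf{x})$ at particular settings $\mathbf{x}$.

The estimated variance of $\operatorname{logit}[\hat\pi(\mathbf{x})] = \mathbf{x}\hat{\boldsymbol\beta}$ is $\mathbf{x}\, \widehat{\operatorname{cov}}(\hat{\boldsymbol\beta})\, \mathbf{x}'$. For large samples, $\operatorname{logit}[\hat\pi(\mathbf{x})] \pm z_{\alpha/2} \sqrt{\mathbf{x}\, \widehat{\operatorname{cov}}(\hat{\boldsymbol\beta})\, \mathbf{x}'}$ is a confidence interval for the true logit. The endpoints invert to a corresponding interval for $\pi(\mathbf{x})$ using the transform $\pi = \exp(\text{logit}) / [1 + \exp(\text{logit})]$.

Newton-Raphson Method Applied to Logistic Regression

We refer back to Section … for the Newton-Raphson iterative method. Let

$$u_j^{(t)} = \left.\frac{\partial L(\boldsymbol\beta)}{\partial \beta_j}\right|_{\boldsymbol\beta^{(t)}} = \sum_i \left(y_i - n_i \pi_i^{(t)}\right) x_{ij},$$

$$h_{ab}^{(t)} = \left.\frac{\partial^2 L(\boldsymbol\beta)}{\partial \beta_a\, \partial \beta_b}\right|_{\boldsymbol\beta^{(t)}} = -\sum_i x_{ia} x_{ib}\, n_i \pi_i^{(t)} \left(1 - \pi_i^{(t)}\right).$$

Here, $\boldsymbol\pi^{(t)}$, approximation $t$ for $\hat{\boldsymbol\pi}$, is obtained from $\boldsymbol\beta^{(t)}$ through

$$\pi_i^{(t)} = \frac{\exp\left(\sum_{j=1}^{p} \beta_j^{(t)} x_{ij}\right)}{1 + \exp\left(\sum_{j=1}^{p} \beta_j^{(t)} x_{ij}\right)}. \tag{5.21}$$

We use $\mathbf{u}^{(t)}$ and $\mathbf{H}^{(t)}$ with formula (4.39) to obtain the next value $\boldsymbol\beta^{(t+1)}$, which in this context is

$$\boldsymbol\beta^{(t+1)} = \boldsymbol\beta^{(t)} + \left\{\mathbf{X}' \operatorname{diag}\left[n_i \pi_i^{(t)} \left(1 - \pi_i^{(t)}\right)\right] \mathbf{X}\right\}^{-1} \mathbf{X}'\left(\mathbf{y} - \boldsymbol\mu^{(t)}\right), \tag{5.22}$$

where $\mu_i^{(t)} = n_i \pi_i^{(t)}$. This is used to obtain $\boldsymbol\pi^{(t+1)}$, and so forth.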
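The Newton-Raphson update just described can be sketched in plain Python for the two-parameter case $\operatorname{logit}(\pi_i) = \beta_0 + \beta_1 x_i$ with grouped binomial counts. This is an illustrative sketch under stated assumptions, not the algorithm as implemented in any particular package: the data are made up, the function names are hypothetical, and the $2 \times 2$ information matrix is inverted by hand to avoid external libraries.

```python
import math

def _score_and_info(b0, b1, x, y, n):
    """Score vector X'(y - mu) and information matrix X' diag[n pi (1 - pi)] X."""
    pi = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
    resid = [yi - ni * p for yi, ni, p in zip(y, n, pi)]
    w = [ni * p * (1.0 - p) for ni, p in zip(n, pi)]
    u = (sum(resid), sum(r * xi for r, xi in zip(resid, x)))
    # Symmetric 2x2 information matrix stored as (a, b, c) = [[a, b], [b, c]].
    info = (sum(w),
            sum(wi * xi for wi, xi in zip(w, x)),
            sum(wi * xi * xi for wi, xi in zip(w, x)))
    return u, info

def newton_raphson_logistic(x, y, n, max_iter=25, tol=1e-10):
    """Iterate beta <- beta + info^{-1} * score until the step is negligible."""
    b0 = b1 = 0.0  # initial guess
    for _ in range(max_iter):
        (u0, u1), (a, b, c) = _score_and_info(b0, b1, x, y, n)
        det = a * c - b * b
        step0 = (c * u0 - b * u1) / det
        step1 = (a * u1 - b * u0) / det
        b0, b1 = b0 + step0, b1 + step1
        if max(abs(step0), abs(step1)) < tol:
            break
    # Estimated covariance matrix: inverse information evaluated at the final estimate.
    (u0, u1), (a, b, c) = _score_and_info(b0, b1, x, y, n)
    det = a * c - b * b
    cov = ((c / det, -b / det), (-b / det, a / det))
    return (b0, b1), cov, (u0, u1)

# HYPOTHETICAL grouped data with overlap between successes and failures.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
n = [10, 10, 10, 10, 10]
y = [1, 3, 5, 8, 9]

(beta0, beta1), cov, score = newton_raphson_logistic(x, y, n)
se0, se1 = math.sqrt(cov[0][0]), math.sqrt(cov[1][1])
print(f"beta0 = {beta0:.3f} (SE {se0:.3f}), beta1 = {beta1:.3f} (SE {se1:.3f})")
```

At convergence the score vector is numerically zero, so the likelihood equations hold, and the standard errors come for free from the inverse information matrix used in the last step.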
With an initial guess $\boldsymbol\beta^{(0)}$, (5.21) yields $\boldsymbol\pi^{(0)}$, and for $t > 0$ the iterations proceed as just described using (5.22) and (5.21). In the limit, $\boldsymbol\pi^{(t)}$ and $\boldsymbol\beta^{(t)}$ converge to the ML estimates $\hat{\boldsymbol\pi}$ and $\hat{\boldsymbol\beta}$ (Walker and Duncan 1967). The $\mathbf{H}^{(t)}$ matrices converge to $\hat{\mathbf{H}} = -\mathbf{X}' \operatorname{diag}[n_i \hat\pi_i (1 - \hat\pi_i)]\, \mathbf{X}$. By (5.20) the estimated asymptotic covariance matrix of $\hat{\boldsymbol\beta}$ is a by-product of the Newton-Raphson method, namely $-\hat{\mathbf{H}}^{-1}$.

From the argument in Section 4.6.3, $\boldsymbol\beta^{(t+1)}$ has the iterative reweighted least squares form $\left(\mathbf{X}' \mathbf{V}_t^{-1} \mathbf{X}\right)^{-1} \mathbf{X}' \mathbf{V}_t^{-1} \mathbf{z}^{(t)}$, where $\mathbf{z}^{(t)}$ has elements

$$z_i^{(t)} = \log \frac{\pi_i^{(t)}}{1 - \pi_i^{(t)}} + \frac{y_i - n_i \pi_i^{(t)}}{n_i \pi_i^{(t)} \left(1 - \pi_i^{(t)}\right)}, \tag{5.23}$$

and where $\mathbf{V}_t$ is a diagonal matrix with elements $\left\{1 / \left[n_i \pi_i^{(t)} \left(1 - \pi_i^{(t)}\right)\right]\right\}$. In this expression, $\mathbf{z}^{(t)}$ is the linearized form of the logit link function for the sample data, evaluated at $\boldsymbol\pi^{(t)}$ [see (4.4)]. From Section … the elements of $\mathbf{V}_t$ are estimated asymptotic variances of the sample logits. The ML estimate is the limit of a sequence of weighted least squares estimates, where the weight matrix changes at each cycle.

Convergence and Existence of Finite Estimates

The log-likelihood function for logistic regression models is strictly concave. ML estimates exist and are unique except in certain boundary cases (Haberman 1974a; Wedderburn 1976; Albert and Anderson 1984). Estimates do not exist or may be infinite when there is no overlap in the sets of explanatory variable values having $y = 0$ and having $y = 1$; that is, when a hyperplane can pass through the space of predictor values such that on one side of that hyperplane $y = 0$ for all observations, whereas on the other side, $y = 1$ always. There is then perfect discrimination, as one can predict the sample outcomes perfectly by knowing the predictor values (except possibly at a boundary point). When there is overlap, ML estimates exist and are unique. Similar results occur for the probit and some other links (Silvapulle 1981).

Figure 5.6 illustrates for a single explanatory variable. Here, $y = 0$ at $x = 10, 20, 30, 40$, and $y = 1$ at $x = 60, 70, 80, 90$. An ideal fit has $\hat\pi = 0$ for $x \le 40$ and $\hat\pi = 1$ for $x \ge 60$.
By letting $\hat\beta \to \infty$ and, for fixed $\hat\beta$, letting $\hat\alpha = -\hat\beta(50)$ so that $\hat\pi = 0.5$ at $x = 50$, one generates a sequence with ever-increasing value of the likelihood that comes successively closer to a perfect fit.

In practice, most software fails to recognize that $\hat\beta = \infty$. After a few cycles of iterative fitting, the log likelihood looks flat at the working estimate, and convergence criteria are satisfied. Because the log likelihood is so flat and because variances come from the inverse of the matrix of negative second derivatives, software typically reports huge standard errors. For these data, for instance, PROC GENMOD in SAS reports $\operatorname{logit}(\hat\pi) = -192.2 + 3.8x$ with standard errors of … and ….
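The ever-increasing likelihood under perfect discrimination is easy to reproduce. The sketch below uses the eight observations described for Figure 5.6 ($y = 0$ at $x = 10, 20, 30, 40$ and $y = 1$ at $x = 60, 70, 80, 90$) and evaluates the log likelihood along the path $\alpha = -50\beta$; the grid of slope values and the function name `log_lik` are arbitrary choices made here.

```python
import math

x_failures = [10, 20, 30, 40]   # observations with y = 0
x_successes = [60, 70, 80, 90]  # observations with y = 1

def log_lik(beta):
    """Bernoulli log likelihood with alpha = -50 * beta, so that pi = 0.5 at x = 50."""
    alpha = -50.0 * beta
    ll = 0.0
    for xi in x_failures:
        # y = 0 contributes log(1 - pi) = -log(1 + exp(eta))
        ll -= math.log1p(math.exp(alpha + beta * xi))
    for xi in x_successes:
        # y = 1 contributes log(pi) = -log(1 + exp(-eta))
        ll -= math.log1p(math.exp(-(alpha + beta * xi)))
    return ll

betas = [0.05, 0.1, 0.5, 1.0, 2.0]
values = [log_lik(b) for b in betas]
for b, v in zip(betas, values):
    print(f"beta = {b:4.2f}  log likelihood = {v:.3e}")
```

The log likelihood rises toward its supremum of 0 as the slope grows but never attains it, so no finite maximizer exists; an iterative routine simply stops once successive increases fall below its tolerance.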
FIGURE 5.6 Perfect discrimination resulting in infinite logistic regression parameter estimate.

NOTES

Section 5.1: Interpreting Parameters in Logistic Regression

5.1. Books focusing on applied logistic regression include Collett (…) and Hosmer and Lemeshow (2000). Books having major components on logistic regression include Christensen (1997), Cox and Snell (1989), and Morgan (…). Prentice (1976b) and Stukel (…) extended the scope by introducing shape parameters that modify the behavior of the curve in extreme probability regions and allow for asymmetric treatment of the two tails.

5.2. Haldane (…) recommended adding 1/2 to the numerator and denominator of the sample logit. With this modification, the bias is on the order of only $1/n^2$, for large $n$ (see Firth 1993a and Problem …).

5.3. The Cornfield (1962) result about normal distributions for $(X \mid Y = i)$ implying the logistic curve for $P(Y = 1 \mid x)$ suggests that logistic regression is useful in discrimination and classification problems. These use a subject's $x$ value to predict to which of two populations they belong. Anderson (1975), Bull and Donner (1987), Efron (1975), and Press and Wilson (…) compared logistic regression favorably to discriminant analysis, which assumes that explanatory variables have a normal distribution at each level of $Y$.

5.4. Rosenbaum and Rubin (…) used logistic regression to adjust for bias in comparing two groups in observational studies. They defined the propensity as the probability of being in one group, for a given setting of the explanatory variables $\mathbf{x}$, and they used logistic regression to estimate how propensity depends on $\mathbf{x}$. In comparing the groups on the response variable, they showed that one can control for differing distributions of the groups on $\mathbf{x}$ by adjusting for the estimated propensity. This is done by using the propensity to match samples from the groups or to subclassify subjects into several strata consisting of intervals of propensity scores or to adjust directly by entering the propensity in the model.
See D'Agostino (…) for a tutorial.

5.5. Abdelbasit and Plackett (1983), Chaloner and Larntz (1988), Minkin (1987), and Wu (…) discussed design problems for binary response experiments, such as choosing settings for a predictor to optimize a criterion for estimating parameter values or estimating the setting at which the response probability equals some fixed value. The nonconstant variance makes this challenging.
Section 5.2: Inference for Logistic Regression

5.6. Albert and Anderson (1984), Berkson (1951, 1953, 1955), Cox (1958a), Hodges (1958), and Walker and Duncan (1967) discussed ML estimation for logistic regression. For adjustments with complex sample surveys, see Hosmer and Lemeshow (2000, Sec. …) and LaVange et al. (…). Scott and Wild (2001) discussed the analyses of case-control studies with complex sampling designs.

5.7. Tsiatis (…) suggested an alternative goodness-of-fit test that partitions values for the explanatory variables into a set of regions and adds a dummy variable to the model for each region. The test statistic compares the fit of this model to the simpler one, testing that the extra parameters are not needed. The idea of grouping values to check model fit by comparing observed and fitted counts extends to any GLM (Pregibon …). Hosmer et al. (…) compared various ways of doing this.

Section 5.3: Logit Models with Categorical Predictors

5.8. The Cochran-Armitage trend test is locally asymptotically efficient for both linear and logistic alternatives for $P(Y = 1)$. Its efficiency against linear alternatives follows from the approximate normality of the sample proportions, with constant Bernoulli variance when $\beta = 0$. For the linear logit model (5.5), its efficiency follows from its equivalence with the score test. See Problem 9.35 and Cox (1958a) for related remarks. Tarone and Gart (…) showed that the score test for a binary linear trend model does not depend on the link function. Gross (…) noted that for the linear logit model, the local asymptotic relative efficiency for testing independence using the statistic with an incorrect set of scores equals the square of the Pearson correlation between the true and incorrect scores. Simon (…) gave related asymptotic results. Corcoran et al. (2001), Mantel (1963), and Podgor et al. (…) extended the trend test.

Section 5.4: Multiple Logistic Regression

5.9. Since the standardized logistic cdf has standard deviation $\pi/\sqrt{3}$, some software (e.g., PROC LOGISTIC in SAS)
defines a standardized estimate by multiplying the unstandardized estimate by $s_{x_j}\sqrt{3}/\pi$.

PROBLEMS

Applications

5.1 For a study using logistic regression to determine characteristics associated with remission in cancer patients, Table 5.10 shows the most important explanatory variable, a labeling index (LI). This index measures proliferative activity of cells after a patient receives an injection of tritiated thymidine, representing the percentage of cells that are labeled. The response $Y$ measured whether the patient achieved remission (1 = yes). Software reports Table 5.11 for a logistic regression model using LI to predict the probability of remission.
TABLE 5.10 Data for Problem 5.1

LI    Number of Cases    Number of Remissions
…

Source: Data reprinted with permission from E. T. Lee, Comput. Prog. Biomed. 4: ….

TABLE 5.11 Computer Output for Problem 5.1

Criterion     Intercept Only    Intercept and Covariates
-2 Log L      …                 …

Testing Global Null Hypothesis: BETA = 0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio    …
Score               …
Wald                …

Parameter    Estimate    Standard Error    Chi-Square    Pr > ChiSq
Intercept    -…
li           …

Odds Ratio Estimates
Effect    Point Estimate    95% Wald Confidence Limits
li        …

Estimated Covariance Matrix
Variable     Intercept    li
Intercept    …            -…
li           -…           …

Obs    li    remiss    n    pi_hat    lower    upper
…

a. Show how software obtained $\hat\pi = {}$… when LI = 8.
b. Show that $\hat\pi = 0.5$ when LI = 26.0.
c. Show that the rate of change in $\hat\pi$ is … when LI = 8 and … when LI = 26.
d. The lower quartile and upper quartile for LI are 14 and 28. Show that $\hat\pi$ increases by 0.42, from 0.15 to 0.57, between those values.
e. For a unit change in LI, show that the estimated odds of remission multiply by 1.16.
f. Explain how to obtain the confidence interval reported for the odds ratio. Interpret.
g. Construct a Wald test for the effect. Interpret.
h. Conduct a likelihood-ratio test for the effect, showing how to construct the test statistic using the $-2 \log L$ values reported.
i. Show how software obtained the confidence interval for $\pi$ reported at LI = 8. (Hint: Use the reported covariance matrix.)

TABLE 5.12 Data for Problem 5.2^a

Ft    Temp    TD
…

^a Ft, flight number; Temp, temperature (°F); TD, thermal distress (1, yes; 0, no).
Source: Data based on Table 1 in J. Amer. Statist. Assoc., 84: …, (1989), by S. R. Dalal, E. B. Fowlkes, and B. Hoadley. Reprinted with permission from the Journal of the American Statistical Association.

5.2 For the 23 space shuttle flights before the Challenger mission disaster in 1986, Table 5.12 shows the temperature at the time of the flight and whether at least one primary O-ring suffered thermal distress.
a. Use logistic regression to model the effect of temperature on the probability of thermal distress. Plot a figure of the fitted model, and interpret.
b. Estimate the probability of thermal distress at 31°F, the temperature at the place and time of the Challenger flight.
c. Construct a confidence interval for the effect of temperature on the odds of thermal distress, and test the statistical significance of the effect.
d. Check the model fit by comparing it to a more complex model.

5.3 Refer to Table 4.2. Using scores {0, 2, 4, 5} for snoring, fit the logistic regression model. Interpret using fitted probabilities, linear approximations, and effects on the odds. Analyze the goodness of fit.

5.4 Hastie and Tibshirani (1990, p. 282) described a study to determine risk factors for kyphosis, severe forward flexion of the spine following corrective spinal surgery. The age in months at the time of the operation for the 18 subjects for whom kyphosis was present were 12, 15, 42, 52, 59, 73, 82, 91, 96, 105, 114, 120, 121, 128, 130, 139, 139, 157
and for 22 of the subjects for whom kyphosis was absent were 1, 1, 2, 8, 11, 18, 22, 31, 37, 61, 72, 81, 97, 112, 118, 127, 131, 140, 151, 159, 177, 206.
a. Fit a logistic regression model using age as a predictor of whether kyphosis is present. Test whether age has a significant effect.
b. Plot the data. Note the difference in dispersion on age at the two levels of kyphosis. Fit the model $\operatorname{logit}[\pi(x)] = \alpha + \beta_1 x + \beta_2 x^2$. Test the significance of the squared age term, plot the fit, and interpret. (Note also Problem 5.33.)

5.5 Refer to Table …. The Pearson test of independence has $X^2(I) = 6.88$ (P = …). For equally spaced scores, the Cochran-Armitage trend test has $z^2 = 6.67$ (P = …). Interpret, and explain why results differ so. Analyze the data using a linear logit model. Test independence using the Wald and likelihood-ratio tests, and compare results to the Cochran-Armitage test. Check the fit of the model, and interpret.

5.6 For Table 5.3, conduct the trend test using alcohol consumption scores (1, 2, 3, 4, 5) instead of (0.0, 0.5, 1.5, 4.0, …). Compare results, noting the sensitivity to the choice of scores for highly unbalanced data.

5.7 Refer to Table 2.11. Using scores {0, 3, 9.5, 19.5, 37, 55} for cigarette smoking, analyze these data using a logit model. Is the intercept estimate meaningful? Explain.

5.8 A study used the 1998 Behavioral Risk Factors Social Survey to consider factors associated with women's use of oral contraceptives in the United States. Table 5.13 summarizes effects for a logistic regression model for the probability of using oral contraceptives. Each predictor uses a dummy variable, and the table lists the category having dummy outcome 1. Interpret effects. Construct and interpret a confidence interval for the conditional odds ratio between contraceptive use and education.

TABLE 5.13 Data for Problem 5.8

Variable          Coding = 1 if:        Estimate    SE
Age               35 or younger         -…          …
Race              White                 …           …
Education         ≥ 1 year college      …           …
Marital status    Married               -…          …

Source: Data courtesy of Debbie Wilson, College of Pharmacy, University of Florida.
TABLE 5.14 Computer Output for Problem 5.9

Criteria For Assessing Goodness Of Fit
Criterion             DF    Value
Deviance              …     …
Pearson Chi-Square    …     …
Log Likelihood              -…

Parameter    Estimate    Standard Error    Likelihood Ratio 95% Conf Limits    Chi-Square
Intercept    -…          …                 -…    -…                            …
def          -…          …                 -…    …                             …
vic          …           …                 …     …                             …

LR Statistics
Source    DF    Chi-Square    Pr > ChiSq
def       …     …             …
vic       …     …             < …

5.9 Refer to Table 2.6. Table 5.14 shows the results of fitting a logit model, treating death penalty as the response (1 = yes) and defendant's race (1 = white) and victims' race (1 = white) as dummy predictors.
a. Interpret parameter estimates. Which group is most likely to have the yes response? Find the estimated probability in that case.
b. Interpret 95% confidence intervals for conditional odds ratios.
c. Test the effect of defendant's race, controlling for victims' race, using a (i) Wald test, and (ii) likelihood-ratio test. Interpret.
d. Test the goodness of fit. Interpret.

5.10 Model the effects of victim's race and defendant's race for Table 2.13. Interpret.

5.11 Table 5.15 appeared in a national study of 15- and 16-year-old adolescents. The event of interest is ever having sexual intercourse. Analyze,

TABLE 5.15 Data for Problem 5.11

                    Intercourse
Race     Gender     Yes    No
White    Male       …      …
         Female     …      …
Black    Male       29     23
         Female     22     36

Source: S. P. Morgan and J. D. Teachman, J. Marriage Fam. 50: … (…). Reprinted with permission from the National Council on Family Relations.
including description and inference about the effects of gender and race, goodness of fit, and summary interpretations.

5.12 According to the Independent newspaper (London, Mar. 8, 1994), the Metropolitan Police in London reported 30,475 people as missing in the year ending March …. For those of age 13 or less, 33 of 3271 missing males and 38 of 2486 missing females were still missing a year later. For ages 14 to 18, the values were 63 of 7256 males and 108 of 8877 females; for ages 19 and above, the values were 157 of 5065 males and 159 of 3520 females. Analyze and interpret. (Thanks to Pat Altham for showing me these data.)

5.13 The National Collegiate Athletic Association studied graduation rates for freshman student athletes during the … academic year. The (sample size, number graduated) totals were (796, 498) for white females, (1625, 878) for white males, (143, 54) for black females, and (660, 197) for black males (J. J. McArdle and F. Hamagami, J. Amer. Statist. Assoc. 89: …, 1994). Analyze and interpret.

5.14 In a study designed to evaluate whether an educational program makes sexually active adolescents more likely to obtain condoms, adolescents were randomly assigned to two experimental groups. The educational program, involving a lecture and videotape about transmission of the HIV virus, was provided to one group but not the other. Table 5.16 summarizes results of a logistic regression model for factors observed to influence teenagers to obtain condoms.
a. Find the parameter estimates for the fitted model, using (1, 0) dummy variables for the first three predictors. Based on the corresponding confidence interval for the log odds ratio, determine the standard error for the group effect.
b. Explain why either the estimate of 1.38 for the odds ratio for gender or the corresponding confidence interval is incorrect. Show that if the reported interval is correct, 1.38 is actually the log odds ratio, and the estimated odds ratio equals ….

TABLE 5.16 Data for Problem 5.14

Variable                        Odds Ratio    95% Confidence Interval
Group (education vs. none)      …             (1.17, …)
Gender (males vs. females)      1.38          (…, …)
SES (high vs. low)              5.82          (1.87, …)
Lifetime number of partners     3.22          (1.08, …)

Source: V. I. Rickert et al., Clin. Pediatr. 31: ….
TABLE 5.17 Data for Problem 5.15

Variable          Effect    P-value
Intercept         -…        …
Alcohol use       …         …
Smoking           …         …
Race              …         …
Race × smoking    …         …

5.15 Table 5.17 shows estimated effects for a logistic regression model with squamous cell esophageal cancer ($Y = 1$, yes; $Y = 0$, no) as the response. Smoking status ($S$) equals 1 for at least one pack per day and 0 otherwise, alcohol consumption ($A$) equals the average number of alcoholic drinks consumed per day, and race ($R$) equals 1 for blacks and 0 for whites. To describe the race × smoking interaction, construct the prediction equation when $R = 1$ and again when $R = 0$. Find the fitted $YS$ conditional odds ratio for each case. Similarly, construct the prediction equation when $S = 1$ and again when $S = 0$. Find the fitted $YR$ conditional odds ratios. Note that for each association, the coefficient of the cross-product term is the difference between the log odds ratios at the two fixed levels for the other variable. Explain why the coefficient of $S$ represents the log odds ratio between $Y$ and $S$ for whites. To what hypotheses do the P-values for $R$ and $S$ refer?

5.16 A survey of high school students on $Y$ = whether the subject has driven a motor vehicle after consuming a substantial amount of alcohol (1 = yes), $s$ = gender (1 = female), $r$ = race (1 = black; 0 = white), and $g$ = grade ($g_1 = 1$, grade 9; $g_2 = 1$, grade 10; $g_3 = 1$, grade 11; $g_1 = g_2 = g_3 = 0$, grade 12) has prediction equation

$$\operatorname{logit} \hat{P}(Y = 1) = -0.88 - 0.40s - 0.7r - \ldots g_1 - 1.43g_2 - 0.58g_3 + 0.74rg_1 + 0.38rg_2 + 0.01rg_3.$$

a. Carefully interpret effects. Explain the interaction by describing the race effect at each grade and the grade effect for each race.
b. Replace $r$ above by $r_1$ (1 = black, 0 = other). The study also measured $r_2$ (1 = Hispanic, 0 = other), with $r_1 = r_2 = 0$ for white. Suppose that the prediction equation is as above but with additional terms $-0.9r_2 + 0.53r_2g_1 + 0.5r_2g_2 - 0.06r_2g_3$. Interpret the effects.
TABLE 5.18 Data for Problem 5.17

Patient    D    T    Y
…

Source: Data from D. Collett, in Encyclopedia of Biostatistics (New York: Wiley, 1998), pp. ….

5.17 Table 5.18 shows the results of a study about $Y$ = whether a patient having surgery with general anesthesia experienced a sore throat on waking (0 = no, 1 = yes) as a function of $D$ = duration of the surgery (in minutes) and $T$ = type of device used to secure the airway (0 = laryngeal mask airway, 1 = tracheal tube). Fit a logit model using these predictors, interpret parameter estimates, and conduct inference about the effects.

5.18 Refer to model (5.2) for the horseshoe crabs using $x$ = width.
a. Show that (i) at the mean width (26.3), the estimated odds of a satellite equal 2.07; (ii) at $x = 27.3$, the estimated odds equal 3.40; and (iii) since $\exp(\hat\beta) = 1.64$, $3.40 = (1.64)(2.07)$, and the odds increase by 64%.
b. Based on the 95% confidence interval for $\beta$, show that for $x$ near where $\pi = 0.5$, the rate of increase in the probability of a satellite per 1-cm increase in $x$ falls between about 0.07 and ….

5.19 For Table 4.3, fit a logistic regression model for the probability of a satellite, using color alone as the predictor.
a. Treat color as nominal. Explain why this model is saturated. Express its parameter estimates in terms of the sample logits for each color.
b. Conduct a likelihood-ratio test that color has no effect.
c. Fit a model that treats color as quantitative. Interpret the fit, and test that color has no effect.
d. Test the goodness of fit of the model in part (c). Interpret.
5.20 Refer to model (5.14). Describe the effect of width by finding the estimated probabilities of a satellite at its lower and upper quartiles, separately for $c = 1$ and $c = 0$.

5.21 Refer to the prediction equation $\operatorname{logit}(\hat\pi) = -10.071 - 0.509c + 0.458x$ for model (5.13). The means and standard deviations are $\bar{c} = 2.44$ and $s = 0.80$ for color, and $\bar{x} = 26.30$ and $s = 2.11$ for width. For standardized predictors [e.g., $x = (\text{width} - 26.3)/2.11$], explain why the estimated coefficients of $c$ and $x$ equal $-0.41$ and 0.97. Interpret these by comparing the partial effects of a 1 standard deviation increase in each predictor on the odds. Describe the color effect by estimating the change in $\hat\pi$ between the first and last color categories at the mean score for width.

5.22 Refer to model (5.12).
a. Fit the model using $x$ = weight. Interpret effects of weight and color.
b. Does the model permitting interaction provide an improved fit? Interpret.
c. For part (b), construct a confidence interval for a difference between the slope parameters for medium-light and dark crabs. Interpret.
d. Using models that treat color as quantitative, repeat the analyses in parts (a) to (c).

5.23 Fowlkes et al. (…) reported results of a survey of employees of a large national corporation to determine how satisfaction depends on race, gender, age, and regional location. The data are at the book's Web site. Fit a logit model to these data and carefully interpret the parameter estimates. Fowlkes et al. (…) reported "The least-satisfied employees are less than 35 years of age, female, other (race), and work in the Northeast; .... The most satisfied group is greater than 44 years of age, male, other, and working in the Pacific or Mid-Atlantic regions; the odds of such employees being satisfied are about 3.5 to 1." Show how these interpretations result from the fit of this model.
5.24 Let $Y$ denote a subject's opinion about current laws legalizing abortion (1 = support), for gender $h$ ($h = 1$, female; $h = 2$, male), religious affiliation $i$ ($i = 1$, Protestant; $i = 2$, Catholic; $i = 3$, Jewish), and political party affiliation $j$ ($j = 1$, Democrat; $j = 2$, Republican; $j = 3$, Independent). For survey data, software for fitting the model

$$\operatorname{logit} P(Y = 1) = \alpha + \beta_h^G + \beta_i^R + \beta_j^P$$
reports $\hat\alpha = 0.62$, $\hat\beta_1^G = 0.08$, $\hat\beta_2^G = -0.08$, $\hat\beta_1^R = -0.16$, $\hat\beta_2^R = -0.25$, $\hat\beta_3^R = 0.41$, $\hat\beta_1^P = 0.87$, $\hat\beta_2^P = -1.27$, $\hat\beta_3^P = 0.40$.
a. Interpret how the odds of support depends on religion.
b. Estimate the probability of support for the group most (least) likely to support current laws.
c. If, instead, parameters used constraints $\beta_1^G = \beta_1^R = \beta_1^P = 0$, report the estimates.

5.25 Table 5.19 refers to a sample of subjects randomly selected for an Italian study on the relation between income and whether one possesses a travel credit card. At each level of annual income in millions of lira, the table indicates the number of subjects sampled and the number possessing at least one travel credit card. Analyze these data.

TABLE 5.19 Data for Problem 5.25

Income (millions of lira)    Number of Cases    Credit Cards
…

Source: Categorical Data Analysis, Quaderni del Corso Estivo di Statistica e Calcolo delle Probabilità, n. 4., Istituto di Metodi Quantitativi, Università Luigi Bocconi, by R. Piccarreta.

5.26 Refer to Table 9.1, treating marijuana use as the response variable. Analyze these data.

5.27 The book's Web site contains a five-way table relating occupational aspirations (high, low) to gender, residence, IQ, and socioeconomic status. Analyze these data.

Theory and Methods

5.28 For model (5.1), show that $\partial \pi(x)/\partial x = \beta \pi(x)[1 - \pi(x)]$.
5.29 For model (5.1), when $\pi(x)$ is small, explain why you can interpret $\exp(\beta)$ approximately as $\pi(x + 1)/\pi(x)$.

5.30 Prove that the logistic regression curve (5.1) has the steepest slope where $\pi(x) = \tfrac{1}{2}$. Generalize to model (…).

5.31 The calibration problem is that of estimating $x$ at which $\pi(x) = \pi_0$. For the linear logit model, argue that a confidence interval is the set of $x$ values for which

$$\frac{\left|\hat\alpha + \hat\beta x - \operatorname{logit}(\pi_0)\right|}{\left[\operatorname{var}(\hat\alpha) + x^2 \operatorname{var}(\hat\beta) + 2x \operatorname{cov}(\hat\alpha, \hat\beta)\right]^{1/2}} < z_{\alpha/2}.$$

[Morgan (1992, Sec. 2.7) surveyed other approaches.]

5.32 A study for several professional sports of the effect of a player's draft position $d$ ($d = 1, 2, 3, \ldots$) of selection from the pool of potential players in a given year on the probability $\pi$ of eventually being named an all star used the model $\operatorname{logit}(\pi) = \alpha + \beta \log d$ (S. M. Berry, Chance, 14: 53-57, 2001).
a. Show that $\pi/(1 - \pi) = e^{\alpha} d^{\beta}$. Show that $e^{\alpha}$ = odds for the first draft pick.
b. In the United States, Berry reported $\hat\alpha = 2.3$ and $\hat\beta = -1.1$ for pro basketball and $\hat\alpha = 0.7$ and $\hat\beta = -0.6$ for pro baseball. This suggests that in basketball a first draft pick is more crucial and picks with high $d$ are relatively less likely to be all-stars. Explain why.

5.33 For the population of subjects having $Y = j$, $X$ has a $N(\mu_j, \sigma^2)$ distribution, $j = 0, 1$.
a. Using Bayes' theorem, show that $P(Y = 1 \mid x)$ satisfies the logistic regression model with $\beta = (\mu_1 - \mu_0)/\sigma^2$.
b. Suppose that $(X \mid Y = j)$ is $N(\mu_j, \sigma_j^2)$ with $\sigma_0 \ne \sigma_1$. Show that the logistic model holds with a quadratic term (Anderson 1975). [Problem 5.4 showed that a quadratic term is helpful when $x$ values have quite different dispersion at $y = 0$ and $y = 1$. This result also suggests that to test equality of means of normal distributions when the variances differ, one can fit a quadratic logistic regression with the two groups as the response and test the quadratic term; see O'Brien (…).]
c. Suppose that $(X \mid Y = j)$ has exponential dispersion family density $f(x; \theta_j) = \exp\{[x\theta_j - b(\theta_j)]/a(\phi) + c(x, \phi)\}$. Find the relevant logistic model.
d. For multiple predictors, suppose that $(\mathbf{X} \mid Y = j)$ has a multivariate $N(\boldsymbol\mu_j, \boldsymbol\Sigma)$ distribution, $j = 0, 1$. Show that $P(Y = 1 \mid \mathbf{x})$ satisfies logistic regression with effect parameters $\boldsymbol\Sigma^{-1}(\boldsymbol\mu_1 - \boldsymbol\mu_0)$ (Cornfield 1962).

5.34 Suppose that $\pi(x) = F(x)$ for some strictly increasing cdf $F$. Explain why a monotone transformation of $x$ exists such that the logistic regression model holds. Generalize to alternative link functions.

5.35 For an $I \times 2$ contingency table, consider logit model (…).
a. Given $\{\pi_i > 0\}$, show how to find $\{\beta_i\}$ satisfying $\beta_I = 0$.
b. Prove that $\beta_1 = \beta_2 = \cdots = \beta_I$ is the independence model. Find its likelihood equation, and show that $\hat\alpha = \text{logit}[(\sum_i y_i)/(\sum_i n_i)]$.

5.36 Construct the log-likelihood function for the model $\text{logit}[\pi(x)] = \alpha + \beta x$ with independent binomial outcomes of $y_0$ successes in $n_0$ trials at $x = 0$ and $y_1$ successes in $n_1$ trials at $x = 1$. Derive the likelihood equations, and show that $\hat\beta$ is the sample log odds ratio.

5.37 A study has $n_i$ independent binary observations $\{y_{i1}, \ldots, y_{in_i}\}$ when $X = x_i$, $i = 1, \ldots, N$, with $n = \sum_i n_i$. Consider the model $\text{logit}(\pi_i) = \alpha + \beta x_i$, where $\pi_i = P(Y_{ij} = 1)$.
a. Show that the kernel of the likelihood function is the same treating the data as $n$ Bernoulli observations or $N$ binomial observations.
b. For the saturated model, explain why the likelihood function is different for these two data forms. (Hint: The number of parameters differs.) Hence, the deviance reported by software depends on the form of data entry.
c. Explain why the difference between deviances for two unsaturated models does not depend on the form of data entry.
d. Suppose that each $n_i = 1$. Show that the deviance depends on $\hat\pi_i$ but not $y_i$. Hence, it is not useful for checking model fit.

5.38 Suppose that $Y$ has a bin$(n, \pi)$ distribution. For the model $\text{logit}(\pi) = \alpha$, consider testing $H_0: \alpha = 0$ (i.e., $\pi = 0.5$). Let $\hat\pi = y/n$.
a. From Section 3.1.6, the asymptotic variance of $\hat\alpha = \text{logit}(\hat\pi)$ is $[n\pi(1 - \pi)]^{-1}$. Compare the estimated SE for the Wald test and the SE using the null value of $\pi$, using test statistic $[\text{logit}(\hat\pi)/\text{SE}]^2$. Show that the ratio of the Wald statistic to the statistic with null SE equals $4\hat\pi(1 - \hat\pi)$.
What is the implication about performance of the Wald test if $|\alpha|$ is large and $\hat\pi$ tends to be near 0 or 1?
b. Wald inference depends on the parameterization. How does the comparison of tests change with the scale $[(\hat\pi - 0.5)/\text{SE}]^2$, where SE is now the estimated or null SE of $\hat\pi$?
c. Suppose that $y = 0$ or $y = n$. Show that the Wald test in part (a) cannot reject $H_0: \pi = \pi_0$ for any $0 < \pi_0 < 1$, whereas the Wald test in part (b) rejects every such $\pi_0$. [Note: Analogous results apply for inference about the Poisson mean versus the log mean; see Mantel (1987a).]

5.39 Find the likelihood equations for model (…). Show that they imply that the fitted values and the sample values are identical in the marginal two-way tables.

5.40 Consider the linear logit model (5.5) for an $I \times 2$ table, with $y_i$ a bin$(n_i, \pi_i)$ variate.
a. Show that the log likelihood is
$$L(\boldsymbol\beta) = \sum_{i=1}^{I} y_i(\alpha + \beta x_i) - \sum_{i=1}^{I} n_i \log[1 + \exp(\alpha + \beta x_i)].$$
b. Show that the sufficient statistic for $\beta$ is $\sum_i y_i x_i$, and explain why this is essentially the variable utilized in the Cochran-Armitage test. (Hence that test is a score test of $H_0: \beta = 0$.)
c. Letting $S = \sum_i y_i$, show that the likelihood equations are
$$S = \sum_i n_i \frac{\exp(\alpha + \beta x_i)}{1 + \exp(\alpha + \beta x_i)}, \qquad \sum_i y_i x_i = \sum_i n_i x_i \frac{\exp(\alpha + \beta x_i)}{1 + \exp(\alpha + \beta x_i)}.$$
d. Let $\{\hat\mu_i = n_i \hat\pi_i\}$. Explain why $\sum_i \hat\mu_i = \sum_i y_i$ and
$$\sum_i x_i \frac{y_i}{S} = \sum_i x_i \frac{\hat\mu_i}{\sum_a \hat\mu_a}.$$
Explain why this implies that the mean score on $x$ across the rows in the first column is the same for the model fit as for the observed data. They are also identical for the second column.
5.41 Let $Y_i$ be bin$(n_i, \pi_i)$ at $x_i$, and let $p_i = y_i/n_i$. For binomial GLMs with logit link:
a. For $p_i$ near $\pi_i$, show that
$$\log\frac{p_i}{1 - p_i} \approx \log\frac{\pi_i}{1 - \pi_i} + \frac{p_i - \pi_i}{\pi_i(1 - \pi_i)}.$$
b. Show that $z_i^{(t)}$ in (5.3) is a linearized version of the $i$th sample logit, evaluated at approximation $\pi_i^{(t)}$ for $\hat\pi_i$.
c. Verify the formula (5.20) for $\widehat{\text{cov}}(\hat{\boldsymbol\beta})$.

5.42 Using graphs or tables, explain what is meant by no interaction in modeling response $Y$ and explanatory $X$ and $Z$ when:
a. All variables are continuous (multiple regression).
b. $Y$ and $X$ are continuous, $Z$ is categorical (analysis of covariance).
c. $Y$ is continuous, $X$ and $Z$ are categorical (two-way ANOVA).
d. $Y$ is binary, $X$ and $Z$ are categorical (logit model).
Categorical Data Analysis, Second Edition. Alan Agresti. Copyright 2002 John Wiley & Sons, Inc.

CHAPTER 6
Building and Applying Logistic Regression Models

Having studied the basics of fitting and interpreting logistic regression models, we now turn our attention to building and applying them. With several explanatory variables, there are many potential models. In Section 6.1 we discuss strategies for model selection. After choosing a preliminary model, model checking addresses whether systematic lack of fit exists. Section 6.2 covers diagnostics, such as residuals, for model checking. In practice, a common application compares two groups on a binary response, with data stratified by control variables. In Section 6.3 we present logit-related analyses of such data. In Section 6.4 we show the advantages of a well-chosen model in enhancing inferential power for detecting and estimating associations. Section 6.5 covers power and sample size determination for logistic regression.

Although the logit is the most popular link function for probabilities, other links are sometimes more appropriate. In Section 6.6 we present models using the probit link and links making a double log transform. For small samples or models with many parameters, ordinary large-sample ML inference may perform poorly. In Section 6.7 we discuss conditional logistic regression. Like small-sample methods for tables, this uses conditioning arguments to eliminate nuisance parameters.

6.1 STRATEGIES IN MODEL SELECTION

Model selection for logistic regression faces the same issues as for ordinary regression. The selection process becomes harder as the number of explanatory variables increases, because of the rapid increase in possible effects and interactions. There are two competing goals: The model should be complex enough to fit the data well. On the other hand, it should be simple to interpret, smoothing rather than overfitting the data.
Most studies are designed to answer certain questions. Those questions guide the choice of model terms. Confirmatory analyses then use a restricted set of models. For instance, a study hypothesis about an effect may be tested by comparing models with and without that effect. For studies that are exploratory rather than confirmatory, a search among possible models may provide clues about the dependence structure and raise questions for future research. In either case, it is helpful first to study the effect on Y of each predictor by itself using graphics (incorporating smoothing) for a continuous predictor or a contingency table for a discrete predictor. This gives a feel for the marginal effects.

Unbalanced data, with relatively few responses of one type, limit the number of predictors for the model. One guideline suggests at least 10 outcomes of each type should occur for every predictor (Peduzzi et al.). If y = 1 only 30 times out of n = 1000, for instance, the model should contain no more than about three x terms. Such guidelines are approximate, and this does not mean that if you have 500 outcomes of each type you are well served by a model with 50 predictors.

Many model selection procedures exist, no one of which is always best. Cautions that apply to ordinary regression hold for any generalized linear model. For instance, a model with several predictors may suffer from multicollinearity (correlations among predictors), making it seem that no one variable is important when all the others are in the model. A variable may seem to have little effect because it overlaps considerably with other predictors in the model, itself being predicted well by the other predictors. Deleting such a redundant predictor can be helpful, for instance to reduce standard errors of other estimated effects.

6.1.1 Horseshoe Crab Example Revisited

The horseshoe crab data set in Table 4.3 has four predictors: color (four categories), spine condition (three categories), weight, and width of the carapace shell.
We now fit a logistic regression model using all these to predict whether the female crab has satellites (y = 1). We start by fitting a model containing main effects,

$$\text{logit}[P(Y = 1)] = \alpha + \beta_1\,\text{weight} + \beta_2\,\text{width} + \beta_3 c_1 + \beta_4 c_2 + \beta_5 c_3 + \beta_6 s_1 + \beta_7 s_2,$$

treating color ($c_i$) and spine condition ($s_j$) as qualitative (factors), with dummy variables for the first three colors and the first two spine conditions. Table 6.1 shows results. A likelihood-ratio test that Y is jointly independent of these predictors simultaneously tests $H_0: \beta_1 = \cdots = \beta_7 = 0$. The test statistic equals 40.6 with df = 7 (P < 0.0001). This shows extremely strong evidence that at least one predictor has an effect.
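As a rough check on the P-value quoted for the global test, the sketch below computes a chi-squared upper-tail probability using only the Python standard library. The function names `chi2_sf` and `likelihood_ratio_pvalue` are illustrative and do not come from the text; the calculation assumes an integer df and uses the standard recurrence for the regularized upper incomplete gamma function.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-squared variable with integer df >= 1."""
    if x < 0 or df < 1:
        raise ValueError("need x >= 0 and an integer df >= 1")
    z = x / 2.0
    if df % 2 == 0:
        a, tail = 1.0, math.exp(-z)             # df = 2: P(X > x) = exp(-x/2)
    else:
        a, tail = 0.5, math.erfc(math.sqrt(z))  # df = 1: P(X > x) = erfc(sqrt(x/2))
    # Raise df by 2 per step: Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while 2 * a < df:
        if z > 0:
            tail += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return tail


def likelihood_ratio_pvalue(statistic: float, df: int) -> float:
    """P-value of a likelihood-ratio statistic referred to a chi-squared distribution."""
    return chi2_sf(statistic, df)


# Global test quoted in the text: statistic 40.6 on df = 7.
p_global = likelihood_ratio_pvalue(40.6, 7)
print(f"P-value for 40.6 on df = 7: {p_global:.2e}")
```

The same helper applies to any difference of deviances between nested models, with df equal to the difference in the number of parameters.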
TABLE 6.1 Computer Output from Fitting Model with All Main Effects to Horseshoe Crab Data
Testing Global Null Hypothesis: BETA = 0
Columns: Test; Chi-Square; DF; Pr > ChiSq. Row: Likelihood Ratio (Pr > ChiSq <.0001).
Analysis of Maximum Likelihood Estimates
Columns: Parameter; Estimate; Std Error; Chi-Square; Pr > ChiSq. Rows: Intercept; weight; width; color 1; color 2; color 3; spine 1; spine 2.

Although the overall test is highly significant, the Table 6.1 results are discouraging. The estimates for weight and width are only slightly larger than their SE values. The estimates for the factors compare each category to the final one as a baseline. For color, the largest difference is less than two standard errors; for spine condition, the largest difference is less than a standard error.

The small P-value for the overall test, yet the lack of significance for individual effects, is a warning sign of multicollinearity. In Section 5.2.2 we showed strong evidence of a width effect. Controlling for weight, color, and spine condition, little evidence remains of a partial width effect. However, weight and width have a strong correlation. For practical purposes they are equally good predictors, but it is nearly redundant to use them both. Our further analysis uses width (W) with color (C) and spine condition (S) as predictors.

For simplicity, we symbolize models by their highest-order terms, regarding C and S as factors. For instance, (C + S + W) denotes a model with main effects, whereas (C + S*W) denotes a model that has those main effects plus an S × W interaction. It is not usually sensible to consider a model with interaction but not the main effects that make up that interaction.

6.1.2 Stepwise Procedures

In exploratory studies, an algorithmic method for searching among models can be informative if we use results cautiously. Goodman (1971a) proposed methods analogous to forward selection and backward elimination in ordinary regression.

Forward selection adds terms sequentially until further additions do not improve the fit. At each stage it selects the term giving the greatest improvement in fit. The minimum P-value for testing the term in the model is a sensible criterion, since reductions in deviance for different terms may have different df values. A stepwise variation of this procedure retests, at each stage, terms added at previous stages to see if they are still significant.

Backward elimination begins with a complex model and sequentially removes terms. At each stage, it selects the term for which its removal has the least damaging effect on the model (e.g., largest P-value). The process stops when any further deletion leads to a significantly poorer fit.

With either approach, for qualitative predictors with more than two categories, the process should consider the entire variable at any stage rather than just individual dummy variables. Add or drop the entire variable rather than just one of its dummies. Otherwise, the result depends on the coding. The same remark applies to interactions containing that variable.

Many statisticians prefer backward elimination over forward selection, feeling it safer to delete terms from an overly complex model than to add terms to an overly simple one. Forward selection can stop prematurely because a particular test in the sequence has low power. Neither strategy necessarily yields a meaningful model. Use variable selection procedures with caution! When you evaluate many terms, one or two that are not important may look impressive simply due to chance. For instance, when all the true effects are weak, the largest sample effect may substantially overestimate its true effect. See Westfall and Wolfinger and Westfall and Young for ways to adjust P-values to take multiple tests into account.

Some software has additional options for selecting a model. One approach attempts to determine the best model with some fixed number of terms, according to some criterion. If such a method and backward and forward selection procedures yield quite different models, this is an indication that such results are of dubious use.
Another such indication would be when a quite different model results from applying a given procedure to a bootstrap sample of the same size from the sample distribution.

Finally, statistical significance should not be the sole criterion for inclusion of a term in a model. It is sensible to include a variable that is central to the purposes of the study and report its estimated effect even if it is not statistically significant. Keeping it in the model may help reduce bias in estimated effects of other predictors and may make it possible to compare results with other studies where the effect is significant (perhaps because of a larger sample size). Algorithmic selection procedures are no substitute for careful thought in guiding the formulation of models.

6.1.3 Backward Elimination for Horseshoe Crab Example

Table 6.2 summarizes results of fitting and comparing several logit models to the horseshoe crab data with predictors width, color, and spine condition. The deviance ($G^2$) test of fit compares the model to the saturated model. As noted in Sections 5.2.4 and 5.2.5, this is not approximately chi-squared when a predictor is continuous, as width is. However, the difference of deviances
between two models that differ by a modest number of parameters is relevant. That difference is the likelihood-ratio statistic $-2(L_0 - L_1)$ comparing the models, and it has an approximate null chi-squared distribution.

TABLE 6.2 Results of Fitting Several Logistic Regression Models to Horseshoe Crab Data
Columns: Model; Predictors(a); Deviance $G^2$; df; AIC; Models Compared; Deviance Difference; Corr. $r(y, \hat\mu)$
1 | (C*S*W)
2 | (C*S + C*W + S*W) | (2) - (1) | (df = 3)
3a | (C*S + S*W) | (3a) - (2) | 3.7 (df = 3)
3b | (C*W + S*W) | (3b) - (2) | 7.9 (df = 6)
3c | (C*S + C*W) | (3c) - (2) | 0.0 (df = 2)
4a | (S + C*W) | (4a) - (3c) | 8.0 (df = 6)
4b | (W + C*S) | (4b) - (3c) | 3.9 (df = 3)
5 | (C + S + W) | (5) - (4b) | 9.0 (df = 6)
6a | (C + S) | (6a) - (5) | (df = 1)
6b | (S + W) | (6b) - (5) | (df = 3)
6c | (C + W) | (6c) - (5)
7a | (C) | (7a) - (6c)
7b | (W) | (7b) - (6c) | 7.0 (df = 3)
8 | (C = dark + W) | (8) - (6c) | 0.5 (df = 2)
9 | None
(a) C, color; S, spine condition; W, width.

To select a model, we use backward elimination. We test only the highest-order terms for each variable. It is inappropriate, for instance, to remove a main effect term if the model has interactions involving that term. We begin with the most complex model, symbolized by (C*S*W), model 1 in Table 6.2. This model uses main effects for each term as well as the three two-factor interactions and the three-factor interaction. It allows a separate width effect at each CS combination. (In fact, at some of those combinations y outcomes of only one type occur, so effects are not estimable.) The likelihood-ratio statistic comparing this model to the simpler model (C*S + C*W + S*W) removing the three-factor interaction term equals 3.2 (df = 3). This suggests that the three-factor term is not needed (P = 0.36), thank goodness, so we continue the simplification process.

In the next stage we consider the three models that remove a two-factor interaction. Of these, (C*S + C*W) gives essentially the same fit as the more complex model, so we drop the S × W interaction. Next, we consider dropping one of the other two-factor interactions.
The model (S + C*W), dropping the C × S interaction, has an increased deviance of 8.0 on df = 6 (P = 0.24); the model (W + C*S), dropping the C × W interaction, has an increased deviance of 3.9 on df = 3 (P = 0.27). Neither increase is important, suggesting that we can drop either and proceed. In either case, dropping next the remaining interaction also seems permissible. For instance,
dropping the C × S interaction from model (W + C*S), leaving model (C + S + W), increases the deviance by 9.0 on df = 6 (P = 0.17).

The working model now has the main effects alone. In the next stage we consider dropping one of them. Table 6.2 shows little consequence of removing S. Both remaining variables (C and W) then have nonnegligible effects. For instance, removing C increases the deviance (comparing models 7b and 6c) by 7.0 on df = 3 (P = 0.07). An earlier analysis revealed a noticeable difference between dark crabs (category 4) and the others. The simpler model that has a single dummy variable for color, equaling 0 for dark crabs and 1 otherwise, fits essentially as well. (The deviance difference between models 8 and 6c equals 0.5, with df = 2.) Further simplification results in large increases in deviance and is unjustified.

6.1.4 AIC, Model Selection, and the Correct Model

In selecting a model, we are mistaken if we think that we have found the true one. Any model is a simplification of reality. For instance, width does not exactly have a linear effect on the probability of satellites, whether we use the logit link or the identity link. What is the logic of testing the fit of a model when we know that it does not truly hold? A simple model that fits adequately has the advantages of model parsimony. If a model has relatively little bias, describing reality well, it tends to provide more accurate estimates of the quantities of interest. This was discussed earlier (see Section 5.2.2) and is examined further later in the book.

Other criteria besides significance tests can help select a good model in terms of estimating quantities of interest. The best known is the Akaike information criterion (AIC). It judges a model by how close its fitted values tend to be to the true values, in terms of a certain expected value. Even though a simple model is farther from the true model than is a more complex model, it may be preferred because it tends to provide better estimates of certain characteristics of the true model, such as cell probabilities.
Thus, the optimal model is the one that tends to have fit closest to reality. Given a sample, Akaike showed that this criterion selects the model that minimizes

AIC = $-2$(maximized log likelihood $-$ number of parameters in model).

This penalizes a model for having many parameters. With models for categorical Y, this ordering is equivalent to one based on an adjustment of the deviance, $[G^2 - 2(\text{df})]$, by twice its residual df. For cogent arguments supporting this criterion, see Burnham and Anderson.

We illustrate AIC for model selection using the models Table 6.2 lists. That table also shows the AIC values. Of models using the three basic variables, AIC is smallest for C + W, having main effects of color and width. The simpler model having a dummy variable for whether a crab is dark fares better yet. Either model seems reasonable.
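The equivalence between ranking by AIC and ranking by the adjusted deviance $G^2 - 2(\text{df})$ can be seen in a few lines. The sketch below is illustrative only: the log-likelihood values, parameter counts, model names, and the saturated-model settings are hypothetical numbers chosen for the demonstration, not the horseshoe crab results.

```python
def aic(max_log_lik: float, n_params: int) -> float:
    """AIC = -2 (maximized log likelihood - number of parameters)."""
    return -2.0 * (max_log_lik - n_params)


def adjusted_deviance(max_log_lik: float, n_params: int,
                      sat_log_lik: float, n_sat_params: int) -> float:
    """Deviance G^2 minus twice its residual df."""
    g2 = -2.0 * (max_log_lik - sat_log_lik)
    residual_df = n_sat_params - n_params
    return g2 - 2.0 * residual_df


# Hypothetical saturated model and candidate models: (log likelihood, parameters).
SAT_LOG_LIK, N_SAT = -80.0, 12
candidates = {
    "model_a": (-84.1, 7),
    "model_b": (-85.0, 4),
    "model_c": (-91.3, 2),
}

by_aic = sorted(candidates, key=lambda m: aic(*candidates[m]))
by_dev = sorted(
    candidates,
    key=lambda m: adjusted_deviance(*candidates[m], SAT_LOG_LIK, N_SAT),
)

# The two criteria differ by a constant that does not depend on the model,
# so they order the candidates identically.
offsets = [
    aic(*candidates[m]) - adjusted_deviance(*candidates[m], SAT_LOG_LIK, N_SAT)
    for m in candidates
]
print(by_aic, by_dev, offsets)
```

The constant offset equals $-2 L_{\text{sat}} + 2 p_{\text{sat}}$, which is why the saturated model drops out of any comparison between candidates.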
We should balance the lower AIC for the simpler model against its having been suggested by the fit of C + W.

6.1.5 Using Causal Hypotheses to Guide Model Building

Although selection procedures are helpful exploratory tools, the model-building process should utilize theory and common sense. Often, a time ordering among the variables suggests possible causal relationships. Analyzing a certain sequence of models helps to investigate those relationships (Goodman).

We illustrate with Table 6.3, from a British study. A sample of men and women who had petitioned for divorce and a similar number of married people were asked: (a) "Before you married your (former) husband/wife, had you ever made love with anyone else?"; (b) "During your (former) marriage, (did you have) have you had any affairs or brief sexual encounters with another man/woman?" The table has variables G = gender, E = whether reported extramarital sex, P = whether reported premarital sex, and M = marital status.

The time points at which responses on the four variables occur suggest the following ordering of the variables:

$G \rightarrow P \rightarrow E \rightarrow M$ (gender, premarital sex, extramarital sex, marital status).

Any of these is an explanatory variable when a variable listed to its right is the response. Figure 6.1 shows one possible causal structure. In this figure, a variable at the tip of an arrow is a response for a model at some stage. The explanatory variables have arrows pointing to the response, directly or indirectly.

We first treat P as a response. Figure 6.1 predicts that G has a direct effect on P, so the model of independence of these variables is inadequate.

TABLE 6.3 Marital Status by Report of Pre- and Extramarital Sex (PMS and EMS)
Layout: columns are Gender (Women, Men), within gender PMS (Yes, No), within PMS EMS (Yes, No); rows are Marital Status (Divorced, Still married).
Source: G. N. Gilbert, Modelling Society (London: George Allen & Unwin). Reprinted with permission from Unwin Hyman Ltd.
FIGURE 6.1 Causal diagram for Table 6.3.

At the second stage, E is the response. Figure 6.1 predicts that P and G have direct effects on E. It also suggests that G has an indirect effect on E, through its effect on P. These effects on E can be analyzed using the logit model for E with additive G and P effects. If G has only an indirect effect on E, the model with P alone as a predictor is adequate; that is, controlling for P, E and G are conditionally independent.

At the third stage, M is the response. Figure 6.1 predicts that E has a direct effect on M, P has direct effects and indirect effects through its effects on E, and G has indirect effects through its effects on P and E. This suggests the logit model for M having additive E and P effects. For this model, G and M are independent, given P and E.

Table 6.4 shows results. The first stage, having P as the response, shows strong evidence of a GP association. The sample odds ratio for their marginal table is 0.27; the estimated odds of premarital sex for females are 0.27 times that for males.

The second stage has E as the response. Only weak evidence occurs that G had a direct as well as an indirect effect on E, as $G^2$ drops by 2.9 (df = 1) after adding G to a model already containing P as a predictor. For this model, the estimated EP conditional odds ratio is 4.0.

TABLE 6.4 Goodness of Fit of Various Models for Table 6.3(a)
Columns: Stage; Response Variable; Potential Explanatory; Actual Explanatory; $G^2$; df
Stage 1 | P | G | None; (G)
Stage 2 | E | G, P | None; (P); (G + P)
Stage 3 | M | G, P, E | (E + P); (E*P); (E*P + G)
(a) P, premarital sex; E, extramarital sex; M, marital status; G, gender.

The third stage has M as the response. Figure 6.1 specifies the logit model with main effects of E and P, but it fits poorly. The model that allows an
E × P interaction in their effects on M but assumes conditional independence of G and M fits much better ($G^2$ decrease of 13.0, df = 1). The model that also has a main effect for G fits slightly better yet. Either model is more complicated than Figure 6.1 predicted, since the effects of E on M vary according to the level of P. However, some preliminary thought about causal relationships suggested a model similar to one giving a good fit. We leave it to the reader to estimate and interpret effects for the third stage.

6.1.6 New Model-Building Strategies for Data Mining

As computing power continues to explode, enormous data sets are more common. A financial institution that markets credit cards may have observations for millions of subjects to whom they sent advertising, on whether they applied for a card. For their customers, they have monthly data on whether they paid their bill on time plus information on many variables measured on the credit card application. The analysis of huge data sets is called data mining.

Model building for huge data sets is challenging. There is currently considerable study of alternatives to traditional statistical methods, including automated algorithms that ignore concepts such as sampling error or modeling. Significance tests are usually irrelevant, as nearly any variable has a significant effect if n is sufficiently large. Model-building strategies view some models as useful for prediction even if they have complex structure. Nonetheless, a point of diminishing returns still occurs in adding predictors to models. After a point, new predictors tend to be so correlated with a linear combination of ones already in the model that they do not improve predictive power. For large n, inference is less relevant than summary measures of predictive power. This is a topic of the next section.

6.2 LOGISTIC REGRESSION DIAGNOSTICS

In Section 5.2.3 we introduced statistics for checking model fit in a global sense. After selecting a preliminary model, we obtain further insight by switching to a microscopic mode of analysis.
In contingency tables, for instance, the pattern of lack of fit revealed in cell-by-cell comparisons of observed and fitted counts may suggest a better model. For continuous predictors, graphical displays are also helpful. Such diagnostic analyses may suggest a reason for the lack of fit, such as nonlinearity in the effect of an explanatory variable.

6.2.1 Pearson, Deviance, and Standardized Residuals

With categorical predictors, it is useful to form residuals to compare observed and fitted counts. Let $y_i$ denote the binomial variate for $n_i$ trials at
setting $i$ of the explanatory variables, $i = 1, \ldots, N$. Let $\hat\pi_i$ denote the model estimate of $P(Y = 1)$. Then $n_i\hat\pi_i$ is the fitted number of successes. For a GLM with binomial random component, the Pearson residual for this fit is

$$e_i = \frac{y_i - n_i\hat\pi_i}{[\widehat{\text{var}}(Y_i)]^{1/2}} = \frac{y_i - n_i\hat\pi_i}{\sqrt{n_i\hat\pi_i(1 - \hat\pi_i)}}. \qquad (6.1)$$

This divides the raw residual $(y_i - \hat\mu_i)$ by the estimated binomial standard deviation of $y_i$. The Pearson statistic for testing the model fit satisfies

$$X^2 = \sum_{i=1}^{N} e_i^2.$$

Each squared Pearson residual is a component of $X^2$.

With $\hat\pi_i$ replaced by $\pi_i$ in the numerator of (6.1), $e_i$ is the difference between a binomial random variable and its expectation, divided by its estimated standard deviation. For large $n_i$, $e_i$ then has an approximate $N(0, 1)$ distribution, when the model holds. Since $\pi_i$ is estimated by $\hat\pi_i$ and the $\{\hat\pi_i\}$ depend on $\{y_i\}$, however, $\{y_i - n_i\hat\pi_i\}$ tend to be smaller than $\{y_i - n_i\pi_i\}$ and the $\{e_i\}$ are less variable than $N(0, 1)$. If $X^2$ has df = $\nu$, $X^2 = \sum_i e_i^2$ is asymptotically comparable to the sum of squares of $\nu$ (rather than $N$) independent standard normal random variables. Thus, when the model holds, $E(\sum_i e_i^2)/N \approx \nu/N < 1$.

The standardized Pearson residual is slightly larger in absolute value and is approximately $N(0, 1)$ when the model holds. We showed earlier that the adjustment uses the leverage from an estimated hat matrix. For observation $i$ with leverage $\hat h_i$, the standardized residual is

$$r_i = \frac{e_i}{\sqrt{1 - \hat h_i}} = \frac{y_i - n_i\hat\pi_i}{\sqrt{n_i\hat\pi_i(1 - \hat\pi_i)(1 - \hat h_i)}}.$$

Absolute values larger than roughly 2 or 3 provide evidence of lack of fit.

An alternative residual uses components of the $G^2$ fit statistic. These are the deviance residuals, introduced earlier for GLMs. The deviance residual for observation $i$ is

$$\sqrt{d_i} \times \text{sign}(y_i - n_i\hat\pi_i), \qquad (6.2)$$

where

$$d_i = 2\left[ y_i \log\frac{y_i}{n_i\hat\pi_i} + (n_i - y_i)\log\frac{n_i - y_i}{n_i - n_i\hat\pi_i} \right].$$

This also tends to be less variable than $N(0, 1)$ and can be standardized.
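The three residuals just defined can be computed directly from grouped binomial data once fitted probabilities and leverages are available. The sketch below is a minimal illustration under stated assumptions: the counts `y`, trial numbers `n`, fitted probabilities `p_hat`, and leverages `h_hat` are hypothetical values made up for the example, and the helper names are not from the text.

```python
import math


def xlogy(x: float, y: float) -> float:
    """x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def pearson_residual(y: float, n: float, p: float) -> float:
    """Raw residual divided by the estimated binomial standard deviation, as in (6.1)."""
    return (y - n * p) / math.sqrt(n * p * (1.0 - p))


def standardized_residual(y: float, n: float, p: float, h: float) -> float:
    """Pearson residual inflated by the leverage adjustment 1 / sqrt(1 - h)."""
    return pearson_residual(y, n, p) / math.sqrt(1.0 - h)


def deviance_residual(y: float, n: float, p: float) -> float:
    """Signed square root of the deviance component d_i, as in (6.2)."""
    fit = n * p
    d = 2.0 * (xlogy(y, y / fit) + xlogy(n - y, (n - y) / (n - fit)))
    return math.copysign(math.sqrt(max(d, 0.0)), y - fit)


# Hypothetical grouped data: successes, trials, fitted probabilities, leverages.
y = [3, 8, 12]
n = [20, 20, 20]
p_hat = [0.20, 0.40, 0.55]
h_hat = [0.50, 0.40, 0.60]

e = [pearson_residual(*args) for args in zip(y, n, p_hat)]
r = [standardized_residual(*args) for args in zip(y, n, p_hat, h_hat)]
d = [deviance_residual(*args) for args in zip(y, n, p_hat)]

x2 = sum(ei ** 2 for ei in e)  # Pearson statistic as a sum of squared residuals
g2 = sum(di ** 2 for di in d)  # deviance as a sum of squared deviance residuals
print(e, r, d, x2, g2)
```

Summing the squared Pearson residuals reproduces the usual cell-by-cell form of $X^2$ over success and failure counts, which gives a quick consistency check on an implementation.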
Plots of residuals against explanatory variables or linear predictor values may detect a type of lack of fit. When fitted values are very small, however, just as $X^2$ and $G^2$ lose relevance, so do residuals. When explanatory variables are continuous, often $n_i = 1$ at each setting. Then $y_i$ can equal only 0 or 1, and $e_i$ can assume only two values. One must then be cautious about regarding either outcome as extreme, and a single residual is usually uninformative. Plots of residuals also then have limited use, consisting simply of two parallel lines of dots. The deviance itself is then completely uninformative (Problem 5.37). When data can be grouped into sets of observations having common predictor values, it is better to compute residuals for the grouped data than for individual subjects.

6.2.2 Heart Disease Example

A sample of male residents of Framingham, Massachusetts, aged 40 through 59, were classified on several factors, including blood pressure (Table 6.5). The response variable is whether they developed coronary heart disease during a six-year follow-up period. Let $\pi_i$ be the probability of heart disease for blood pressure category $i$. The table shows the fit and the standardized Pearson residuals for two logistic regression models. The first model, $\text{logit}(\pi_i) = \alpha$, treats the response as independent of blood pressure. Some residuals for that model are large. This is not surprising, since the model fits poorly ($G^2 = 30.0$, $X^2 = 33.4$, df = 7).

TABLE 6.5 Standardized Pearson Residuals for Logit Models Fitted to Data on Blood Pressure and Heart Disease
Columns: Blood Pressure; Sample Size; Observed Heart Disease; Fitted (Indep. Model, Linear Logit); Residual (Indep. Model, Linear Logit).
Source: Data from Cornfield (1962).
A plot of the residuals shows an increasing trend. This suggests the linear logit model, $\text{logit}(\pi_i) = \alpha + \beta x_i$, with scores $\{x_i\}$ for blood pressure level. We used scores (111.5, 121.5, 131.5, 141.5, 151.5, 161.5, 176.5, …). The nonextreme scores are midpoints for the intervals of blood pressure. The trend in residuals disappears for this model, and only the second category shows some evidence of lack of fit.

TABLE 6.6 Residuals Reported in SAS for Heart Disease Data of Table 6.5(a)
Observation Statistics
Columns: Observ; disease; n; blood; Reschi; Resdev; StReschi.
(a) Reschi, Pearson residual; StReschi, adjusted residual.

FIGURE 6.2 Observed and predicted proportions of heart disease for linear logit model.

Table 6.6 reports residuals for the linear logit model, as reported by SAS. The Pearson residuals (Reschi), deviance residuals (Resdev), and standardized Pearson residuals (StReschi) show similar results. Each is somewhat large in the second category. One relatively large residual is not surprising, however. With many residuals, some may be large purely by chance. Here the
overall fit statistics ($G^2 = 5.9$, $X^2 = 6.3$ with df = 6) do not indicate problems. In analyzing residual patterns, we should be cautious about attributing patterns to what might be chance variation from a model.

Another useful graphical display for showing lack of fit compares observed and fitted proportions by plotting them against each other or by plotting both of them against explanatory variables. For the linear logit model, Figure 6.2 plots both the observed proportions and the estimated probabilities of heart disease against blood pressure. The fit seems decent.

Studying residuals helps us understand either why a model fits poorly or where there is lack of fit in a generally good-fitting model. The next example illustrates the second case.

6.2.3 Graduate Admissions Example

Table 6.7 refers to graduate school applications to the 23 departments in the College of Liberal Arts and Sciences at the University of Florida during one academic year. It cross-classifies applicant's gender (G), whether admitted (A), and department (D) to which the prospective students applied. We consider logit models with A as the response variable. Let $y_{ik}$ denote the number admitted and let $\pi_{ik}$ denote the probability of admission for gender $i$ in department $k$. We treat $\{Y_{ik}\}$ as independent bin$(n_{ik}, \pi_{ik})$. Other things being equal, one would hope the admissions decision is independent of gender. However, the model with no gender effect, given the department,

$$\text{logit}(\pi_{ik}) = \alpha + \beta_k^D,$$

fits rather poorly ($G^2 = 44.7$, $X^2 = 40.9$, df = 23).

TABLE 6.7 Data Relating Admission to Gender and Department for Model with No Gender Effect
Columns (repeated in two side-by-side blocks): Dept; Females (Yes, No); Males (Yes, No); Std. Res (Fem, Yes).
Departments: anth, astr, chem, clas, comm, comp, engl, geog, geol, germ, hist, lati, ling, math, phil, phys, poli, psyc, reli, roma, soci, stat, zool.
Source: Data courtesy of James Booth.
Table 6.7 also reports standardized Pearson residuals for the number of females who were admitted for this model. For instance, the astronomy department admitted 6 females, which was 2.87 standard deviations higher than the model predicted. Each department has only a single nonredundant standardized residual, because of marginal constraints for the model. The model has fit $\hat\pi_{ik} = (y_{1k} + y_{2k})/n_{+k}$, corresponding to an independence fit ($\hat\pi_{1k} = \hat\pi_{2k}$) in each partial table. Now,

$$y_{1k} - n_{1k}\hat\pi_{1k} = y_{1k} - n_{1k}(y_{1k} + y_{2k})/n_{+k} = (n_{2k}/n_{+k})\,y_{1k} - (n_{1k}/n_{+k})\,y_{2k} = -(y_{2k} - n_{2k}\hat\pi_{2k}).$$

Thus, standard errors of $(y_{1k} - n_{1k}\hat\pi_{1k})$ and $(y_{2k} - n_{2k}\hat\pi_{2k})$ are identical. The standardized residuals are identical in absolute value for males and females but of different sign. Astronomy admitted 3 males, and their standardized residual was $-2.87$; the number admitted was 2.87 standard deviations fewer than predicted. This is another advantage of standardized over ordinary Pearson residuals. The model of independence in a partial table has df = 1. Only one bit of information exists about how the data depart from independence, yet the ordinary Pearson residual for males need not equal the ordinary Pearson residual for females.

Departments with large standardized Pearson residuals reveal the reason for the lack of fit. Significantly more females were admitted than the model predicts in the astronomy and geography departments, and fewer in the psychology department. Without these three departments, the model fits reasonably well ($G^2 = 24.4$, $X^2 = 22.8$, df = 20).

For the complete data, adding a gender effect to the model does not provide an improved fit ($G^2 = 42.4$, $X^2 = 39.0$, df = 22), because the departments just described have associations in different directions and of greater magnitude than other departments. This model has an ML estimate of 1.19 for the GA conditional odds ratio, the odds of admission being 19% higher for females than males, given department.
By contrast, the marginal table collapsed over department has a GA sample odds ratio of 0.94, the overall odds of admission being 6% lower for females. This illustrates Simpson's paradox (Section 2.3.2), the conditional association having different direction than the marginal association.

Influence Diagnostics for Logistic Regression

Other regression diagnostic tools are also helpful in assessing fit. These include plots of ordered residuals against normal percentiles (Haberman 1973a) and analyses that describe an observation's influence on parameter estimates and fit statistics. Whenever a residual indicates that a model fits an observation poorly, it can be informative to delete the observation and refit the model to remaining ones. This is equivalent to adding a parameter to the model for that observation, forcing a perfect fit for it. As in ordinary regression, an observation may be relatively influential in determining parameter estimates. The greater an observation's leverage, the greater its potential influence. The fit could be quite different if an
observation that appears to be an outlier on y and has large leverage is deleted. However, a single observation can have a more exorbitant influence in ordinary regression than a single binary observation in logistic regression, since there is no bound on the distance of $y_i$ from its expected value. Also, we observed earlier that the GLM estimated hat matrix

$$\widehat{\mathrm{Hat}} = \hat{W}^{1/2} X (X^{T} \hat{W} X)^{-1} X^{T} \hat{W}^{1/2}$$

depends on the fit as well as the model matrix $X$. For logistic regression, in Section 5.5 we showed that the weight matrix $\hat{W}$ is diagonal with element $\hat{w}_i = n_i \hat{\pi}_i (1 - \hat{\pi}_i)$ for the $n_i$ observations at setting $i$ of predictors. Points that have extreme predictor values need not have high leverage. In fact, the leverage can be small if $\hat{\pi}_i$ is close to 0 or 1.

Several measures that describe the effect on parameter estimates and fit statistics of removing an observation from the data set are related algebraically to the observation's leverage (Pregibon 1981; Williams). In logistic regression, the observation could be a single binary response or a binomial response for a set of subjects all having the same predictor values. Influence measures for each observation include:

1. For each model parameter, the change in the parameter estimate when the observation is deleted. This change, divided by its standard error, is called Dfbeta.
2. A measure of the change in a joint confidence interval for the parameters produced by deleting the observation. This confidence interval displacement diagnostic is denoted by c.
3. The change in $X^2$ or $G^2$ goodness-of-fit statistics when the observation is deleted.

For each measure, the larger the value, the greater the influence. We illustrate them using the linear logit model with blood pressure as a predictor for heart disease in Table 6.5. Table 6.8 contains simple approximations (due to Pregibon) for the Dfbeta measure for the coefficient of blood pressure, the confidence interval diagnostic c, the change in $G^2$, and the change in $X^2$. (This is the square of the standardized Pearson residual, $r_i^2$.)
All their values show that deleting the second observation has the greatest effect. This is not surprising, as that observation has the only relatively large residual. By contrast, Table 6.8 also contains the changes in $X^2$ and $G^2$ for deleting observations in fitting the independence model. At the low and high ends of the blood pressure values, several changes are very large. However, these all relate to removing an entire binomial sample at a blood pressure level instead of removing a single subject's binary observation. Such subject-level deletions have little effect even for this model. With continuous or multiple predictors, it can be informative to plot these diagnostics, for instance against the estimated probabilities. See Cook and
143 6 BUILDING AND APPLYING LOGISTIC REGRESSION MODELS TABLE 6.8 Dagnostc Measures for Logstc Regresson Models Ftted to Heart Dsease Data Blood Pearson Lkelhood-Rato Pearson Lkelhood-Rato a a Pressure Dfbeta c X Dff. G Dff. X Dff. G Dff y y y a Independence model; other values refer to model wth blood pressure predctor. Source: Data from Cornfeld 196. Wesberg 1999, Chap., Fowlkes 1987, and Landwehr et al for examples of useful dagnostc plots Summarzng Predctve Power: R and R-Squared Measures In ordnary regresson, R descrbes the proportonal reducton n varaton n comparng the condtonal varaton of the response to the margnal varaton. It and the multple correlaton R descrbe the power of the explanatory varables to predct the response, wth R s 1 for perfect predcton. Despte varous attempts to defne analogs for categorcal response models, no proposed measure s as wdely useful as R and R.Wepresent a few proposed measures n ths secton. For any GLM, the correlaton rž y, ˆ. between the observed responses y 4 and the model s ftted values 4 ˆ measures predctve power. For least squares regresson, ths s the multple correlaton between Y and the predctors. An advantage of the correlaton relatve to ts square s the appeal of workng on the orgnal scale and ts approxmate proportonalty to effect sze: For a small effect wth a sngle predctor, doublng the slope corresponds roughly to doublng the correlaton. Ths measure can be useful for comparng fts of dfferent models to the same data set. In logstc regresson, ˆ for a partcular model s the estmated probablty ˆ for bnary observaton. Table 6. shows rž y,. ˆ for a few models ftted to the horseshoe crab data. 
Wdth alone has r s 0.40, and addng color to the model ncreases r to The smpler model that uses color merely to ndcate whether a crab s dark does essentally as well, wth r s The complex model contanng color, spne condton, wdth, and all ther twoand three-way nteractons has r s Ths seems consderably hgher, but wth multple predctors the r estmates become more hghly based n estmatng the true correlaton. It can be msleadng to compare r values for models wth greatly dfferent df values. After a jackknfe adjustment desgned
to reduce bias, there is little difference between r for this overly complex model and the simpler model (Zheng and Agresti). Little is lost and much is gained by using the simpler model.

Another way to measure the association between the binary responses $\{y_i\}$ and their fitted values $\{\hat{\pi}_i\}$ uses the proportional reduction in squared error,

$$1 - \frac{\sum_i (y_i - \hat{\pi}_i)^2}{\sum_i (y_i - \bar{y})^2},$$

obtained by using $\hat{\pi}_i$ instead of $\bar{y} = \sum_i y_i / n$ as a predictor of $y_i$ (Efron). Amemiya suggested a related measure that weights squared deviations by inverse predicted variances. For logistic regression, unlike normal GLMs, these and $r(y, \hat{\mu})$ need not be nondecreasing as the model gets more complex. Like any correlation-type measure, they can depend strongly on the range of observed values of explanatory variables.

Other measures directly use the likelihood function. Denote the maximized log likelihood by $L_M$ for a given model, $L_S$ for the saturated model, and $L_0$ for the null model containing only an intercept term. Probabilities are no greater than 1.0, so log likelihoods are nonpositive. As the model complexity increases, the parameter space expands, so the maximized log likelihood increases. Thus, $L_0 \le L_M \le L_S \le 0$. The measure

$$\frac{L_M - L_0}{L_S - L_0} \qquad (6.3)$$

falls between 0 and 1. It equals 0 when the model provides no improvement in fit over the null model, and it equals 1 when the model fits as well as the saturated model. A weakness is that the log likelihood is not an easily interpretable scale. Interpreting the numerical value is difficult, other than in a comparative sense for different models.

For n independent Bernoulli observations, the maximized log likelihood is

$$\log \prod_{i=1}^{n} \hat{\pi}_i^{\,y_i} (1 - \hat{\pi}_i)^{1 - y_i} = \sum_{i=1}^{n} \left[ y_i \log \hat{\pi}_i + (1 - y_i) \log (1 - \hat{\pi}_i) \right].$$

The null model gives $\hat{\pi}_i = (\sum_i y_i)/n = \bar{y}$, so that

$$L_0 = n\left[\bar{y} \log \bar{y} + (1 - \bar{y}) \log (1 - \bar{y})\right].$$

The saturated model has a parameter for each subject and implies that
$\hat{\pi}_i = y_i$ for all i. Thus, $L_S = 0$ and (6.3) simplifies to

$$D = \frac{L_0 - L_M}{L_0}.$$

McFadden proposed this measure.

With multiple observations at each setting of explanatory variables, the data file can take the grouped-data form of N binomial counts rather than n Bernoulli indicators. The saturated model then has a parameter for each count. It gives N fitted proportions equal to the N sample proportions of success. Then $L_S$ is nonzero and (6.3) takes a different value than when calculated using individual subjects. For N binomial counts, the maximized likelihoods are related to the $G^2$ goodness-of-fit statistic by $G^2(M) = -2(L_M - L_S)$, so (6.3) becomes

$$D^* = \frac{G^2(0) - G^2(M)}{G^2(0)}.$$

Goodman (1971a) and Theil discussed this and related partial association measures.

With grouped data $D^*$ can be large even when predictive power is weak at the subject level. For instance, a model can fit much better than the null model even though fitted probabilities are close to 0.5 for the entire sample. In particular, $D^* = 1$ when it fits perfectly, regardless of how well one can predict individual subjects' responses on Y with that model. Also, suppose that the population satisfies the given model, but not the null model. As the sample size n increases with number of settings N fixed, $G^2(M)$ behaves like a chi-squared random variable but $G^2(0)$ grows unboundedly. Thus, $D^* \to 1$ as $n \to \infty$, and its magnitude tends to depend on n. This measure confounds model goodness of fit with predictive power. Similar behavior occurs for $R^2$ in regression analyses when calculated using means of Y values (rather than individual subjects) at N different x settings. It is more sensible to use D for binary, ungrouped data.

Summarizing Predictive Power: Classification Tables and ROC Curves

A classification table cross-classifies the binary response with a prediction of whether y is 0 or 1. The prediction is $\hat{y} = 1$ when $\hat{\pi}_i > \pi_0$ and $\hat{y} = 0$ when $\hat{\pi}_i \le \pi_0$, for some cutoff $\pi_0$. Most classification tables use $\pi_0 = 0.5$ and summarize predictive power by sensitivity $= P(\hat{y} = 1 \mid y = 1)$
and specificity $= P(\hat{y} = 0 \mid y = 0)$.
146 LOGISTIC REGRESSION DIAGNOSTICS 9 FIGURE 6.3 ROC curve for logstc regresson model wth horseshoe crab data. Ž Recall Sectons.1... Lmtatons of ths table are that t collapses contnuous predctve values ˆ nto bnary ones, the choce of 0 s arbtrary, and t s hghly senstve to the relatve numbers of tmes y s 1 and y s 0. A rece er operatng characterstc Ž ROC. curve s a plot of senstvty as a functon of Ž 1 y specfcty. for the possble cutoffs 0. Ths curve usually has a concave shape connectng the ponts Ž 0, 0. and Ž 1, 1.. The hgher the area under the curve, the better the predctons. The ROC curve s more nformatve than the classfcaton table, snce t summarzes predctve power for all possble 0. Fgure 6.3 shows how PROC LOGISTIC n SAS reports the ROC curve for the model for the horseshoe crabs usng wdth and color as predctors. The area under a ROC curve s dentcal to the value of another measure of predctve power, the concordance ndex. Consder all pars of observatons Ž, j. such that ys 1 and yjs 0. The concordance ndex c estmates the probablty that the predctons and the outcomes are concordant, the observaton wth the larger y also havng the larger ˆ Ž Harrell et al A value c s 0.5 means predctons were no better than random guessng. Ths corresponds to a model havng only an ntercept term and an ROC curve that s a straght lne connectng ponts Ž 0, 0. and Ž 1, 1.. For the horseshoe crab data, c s wth color alone as a predctor, 0.74 wth wdth alone, wth
147 30 BUILDING AND APPLYING LOGISTIC REGRESSION MODELS wdth and color, and 0.77 wth wdth and a dummy for whether a crab has dark color. ROC curves are a popular way of evaluatng dagnostc tests. Sometmes such tests have J ordered response categores rather than Žpostve, negatve.. The ROC curve then refers to the varous possble cutoffs for defnng a result to be postve. It plots senstvty aganst 1 y specfcty for the possble collapsngs of the J categores to a Ž postve, negatve. scale wsee Toledano and Gatsons Ž 1996.x. 6.3 INFERENCE ABOUT CONDITIONAL ASSOCIATIONS IN K TABLES The analyss of the graduate admssons data n Sectons 6..3 used the model of condtonal ndependence. Ths model s an mportant one n bomedcal studes that nvestgate whether an assocaton exsts between a treatment varable and a dsease outcome after controllng for a possbly confoundng varable that mght nfluence that assocaton. In ths secton we revew the test of condtonal ndependence as a logt model analyss for a K contngency table. We also present a test Ž Mantel and Haenszel that seems non-model-based but relates to the logt model. We llustrate usng Table 6.9, showng results of a clncal tral wth eght centers. The study compared two cream preparatons, an actve drug and a TABLE 6.9 Clncal Tral Relatng Treatment to Response for Eght Centers Response Center Treatment Success Falure Odds Rato varž n. 1 Drug Control 10 7 Drug Control 10 3 Drug Control Drug Control Drug Control Drug Control Drug Control Drug Control 6 1 Source: Betler and Lands k 11 k
control, on their success in curing an infection. This table illustrates a common pharmaceutical application, comparing two treatments on a binary response with observations from several strata. The strata are often medical centers or clinics; or they may be levels of age or severity of the condition being treated or combinations of levels of several control variables; or they may be different studies of the same sort evaluated in a meta analysis.

Using Logit Models to Test Conditional Independence

For a binary response Y, we study the effect of a binary predictor X, controlling for a qualitative covariate Z. Let $\pi_{ik} = P(Y = 1 \mid X = i, Z = k)$. Consider the model

$$\operatorname{logit}(\pi_{ik}) = \alpha + \beta x_i + \beta_k^Z, \quad i = 1, 2, \quad k = 1, \ldots, K, \qquad (6.4)$$

where $x_1 = 1$ and $x_2 = 0$. This model assumes that the XY conditional odds ratio is the same at each category of Z, namely $\exp(\beta)$. The null hypothesis of XY conditional independence is $H_0$: $\beta = 0$. The Wald statistic is $(\hat{\beta}/SE)^2$. The likelihood-ratio statistic is the difference between $G^2$ statistics for the reduced model

$$\operatorname{logit}(\pi_{ik}) = \alpha + \beta_k^Z \qquad (6.5)$$

and the full model. These tests are sensible when X has a similar effect at each category of Z. They have df = 1.

Alternatively, since the reduced model (6.5) is equivalent to conditional independence of X and Y, one could test conditional independence using a goodness-of-fit test of that model. That test has df = K when X is binary. This corresponds to comparing model (6.5) and the saturated model, which permits $\beta \ne 0$ and contains XZ interaction parameters. When no interaction exists or when interaction exists but it has minor substantive importance, it follows from results to be presented in Section 6.4 that this approach is less powerful, especially when K is large. However, when the direction of the XY association varies among categories of Z, it can be more powerful.

6.3.2 Cochran-Mantel-Haenszel Test of Conditional Independence

Mantel and Haenszel proposed a non-model-based test of $H_0$: conditional independence in $2 \times 2 \times K$ tables.
Focusing on retrospective studies of disease, they treated response (column) marginal totals as fixed. Thus, in each partial table k of cell counts $\{n_{ijk}\}$, their analysis conditions on both the predictor totals $(n_{1+k}, n_{2+k})$ and the response outcome totals $(n_{+1k}, n_{+2k})$. The usual sampling schemes then yield a hypergeometric distribution for the first cell count $n_{11k}$ in each partial table. That count determines $\{n_{12k}, n_{21k}, n_{22k}\}$, given the marginal totals.
Under $H_0$, the hypergeometric mean and variance of $n_{11k}$ are

$$\mu_{11k} = E(n_{11k}) = \frac{n_{1+k}\, n_{+1k}}{n_{++k}}, \qquad \operatorname{var}(n_{11k}) = \frac{n_{1+k}\, n_{2+k}\, n_{+1k}\, n_{+2k}}{n_{++k}^2 (n_{++k} - 1)}.$$

Cell counts from different partial tables are independent. The test statistic combines information from the K tables by comparing $\sum_k n_{11k}$ to its null expected value. It equals

$$\mathrm{CMH} = \frac{\left[\sum_k (n_{11k} - \mu_{11k})\right]^2}{\sum_k \operatorname{var}(n_{11k})}. \qquad (6.6)$$

This statistic has a large-sample chi-squared null distribution with df = 1.

When the odds ratio $\theta_{XY(k)} > 1$ in partial table k, we expect that $(n_{11k} - \mu_{11k}) > 0$. When $\theta_{XY(k)} > 1$ in every partial table or $\theta_{XY(k)} < 1$ in each table, $\sum_k (n_{11k} - \mu_{11k})$ tends to be relatively large in absolute value. This test works best when the XY association is similar in each partial table. In this sense it is similar to the tests of $H_0$: $\beta = 0$ in logit model (6.4). When the sample sizes in the strata are moderately large, this test usually gives similar results. In fact, it is a score test of $H_0$: $\beta = 0$ in that model (Day and Byar).

Cochran proposed a similar statistic. He treated the rows in each $2 \times 2$ table as two independent binomials rather than a hypergeometric. Cochran's statistic is (6.6) with $\operatorname{var}(n_{11k})$ replaced by

$$\operatorname{var}(n_{11k}) = \frac{n_{1+k}\, n_{2+k}\, n_{+1k}\, n_{+2k}}{n_{++k}^3}.$$

Because of the similarity in their approaches, we call (6.6) the Cochran-Mantel-Haenszel (CMH) statistic. The Mantel and Haenszel approach using the hypergeometric is more general in that it also applies to some cases in which the rows are not independent binomial samples from two populations. Examples are retrospective studies and randomized clinical trials with the available subjects randomly allocated to two treatments. In the first case the column totals are naturally fixed. In the second, under the null hypothesis the column margins are the same regardless of how subjects were assigned to treatments, and randomization arguments lead to the hypergeometric in each table.

Mantel and Haenszel proposed (6.6) with a continuity correction.
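As a rough illustration of (6.6), the following Python sketch computes the CMH statistic from a list of 2 x 2 tables, with an option for Cochran's variance. The function names and the example counts are hypothetical (they are not the Table 6.9 data), and no continuity correction is applied.

```python
import math

def cmh_statistic(tables, cochran=False):
    """CMH statistic (6.6) for K 2x2 tables, each given as (n11, n12, n21, n22).

    With cochran=True the hypergeometric variance is replaced by Cochran's
    binomial version (denominator n^3 instead of n^2 (n - 1)).
    """
    diff_sum = 0.0  # running sum of n11k - mu11k
    var_sum = 0.0   # running sum of var(n11k)
    for n11, n12, n21, n22 in tables:
        row1, row2 = n11 + n12, n21 + n22
        col1, col2 = n11 + n21, n12 + n22
        n = row1 + row2
        if n < 2:
            continue  # skipped: the hypergeometric variance is undefined here
        mu = row1 * col1 / n
        denom = n**3 if cochran else n**2 * (n - 1)
        diff_sum += n11 - mu
        var_sum += row1 * row2 * col1 * col2 / denom
    if var_sum == 0:
        raise ValueError("no stratum contributes to the statistic")
    return diff_sum**2 / var_sum

def chi2_df1_pvalue(x):
    """Upper-tail probability of a chi-squared variate with df = 1."""
    return math.erfc(math.sqrt(x / 2.0))

# Hypothetical counts for three strata (illustration only).
strata = [(8, 4, 5, 7), (6, 6, 3, 9), (10, 2, 7, 5)]
stat = cmh_statistic(strata)
print(f"CMH = {stat:.3f}, P = {chi2_df1_pvalue(stat):.4f}")
```

For strata that each hold one matched pair, concordant pairs add nothing to either sum, so the statistic depends only on the discordant pairs.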
The P-value from the test then better approximates an exact conditional test, but it tends to be conservative. The CMH statistic generalizes for $I \times J \times K$ tables.

Multicenter Clinical Trial Example

For the multicenter clinical trial, Table 6.9 reports the sample odds ratio for each table and the expected value and variance of the number of successes
for the drug treatment ($n_{11k}$) under $H_0$: conditional independence. In each table except the last, the sample odds ratio shows a positive association. Thus, it makes sense to combine results with CMH = 6.38, with df = 1. There is considerable evidence against $H_0$ (P = 0.012). Similar results occur in testing $H_0$: $\beta = 0$ in logit model (6.4). The model fit has $\hat{\beta} = 0.777$ with SE = 0.307. The Wald statistic is $(0.777/0.307)^2 = 6.4$ (P = 0.011). The likelihood-ratio statistic equals 6.67 (P = 0.010).

CMH Test and Sparse Data*

In summary, for logit model (6.4), CMH is the score statistic alternative to the likelihood-ratio or Wald test of $H_0$: $\beta = 0$. As $n \to \infty$ with fixed K, the tests have the same asymptotic chi-squared behavior under $H_0$. An advantage of CMH is that its chi-squared limit also applies with an alternative asymptotic scheme in which $K \to \infty$ as $n \to \infty$. The asymptotic theory for likelihood-ratio and Wald tests requires the number of parameters (and hence K) to be fixed, so it does not apply to this scheme.

An application of this type is when each stratum has a single matched pair of subjects, one in each group. With strata of matched pairs, $n_{1+k} = n_{2+k} = 1$ for each k. Then $n = 2K$, so $K \to \infty$ as $n \to \infty$. Table 6.10 shows the data layout for this situation. When both subjects in stratum k make the same response (as in the first case in Table 6.10), $n_{+1k} = 0$ or $n_{+2k} = 0$. Given the marginal counts, the internal counts are then completely determined, and $\mu_{11k} = n_{11k}$ and $\operatorname{var}(n_{11k}) = 0$. When the subjects make differing responses (as in the second case), $n_{+1k} = n_{+2k} = 1$, so that $\mu_{11k} = 0.5$ and $\operatorname{var}(n_{11k}) = 0.25$. Thus, a matched pair contributes to the CMH statistic only when the two subjects' responses differ. Let $K^*$ denote the number of the K tables that satisfy this. Although each $n_{11k}$ can take only two values, the central limit theorem implies that $\sum_k n_{11k}$ is approximately normal for large $K^*$. Thus, the distribution of CMH is approximately chi-squared.

Usually, when K grows with n, each stratum has few observations.
There may be more than two observations, such as case-control studies that match several controls with each case. Contingency tables with relatively few observations are referred to as sparse. The nonstandard setting in which $K \to \infty$ as $n \to \infty$ is called sparse-data asymptotics. Ordinary ML estimation then breaks down because the number of parameters is not fixed, instead having the same order as the sample size.

TABLE 6.10 Stratum Containing a Matched Pair (rows: first and second element of the pair; columns: response, success or failure, shown for two cases)

In particular, an approximate chi-squared distribution holds for the likelihood-ratio and Wald statistics for testing conditional
independence only when the strata marginal totals generally exceed about 5 to 10 and K is fixed and small relative to n.

Estimation of Common Odds Ratio

It is more informative to estimate the strength of association than to test hypotheses about it. When the association seems stable among partial tables, it is helpful to combine the K sample odds ratios into a summary measure of conditional association. The logit model (6.4) implies homogeneous association, $\theta_{XY(1)} = \cdots = \theta_{XY(K)} = \exp(\beta)$. The ML estimate of the common odds ratio is $\exp(\hat{\beta})$.

Other estimators of a common odds ratio are not model-based. Woolf proposed an exponentiated weighted average of the K sample log odds ratios. Mantel and Haenszel proposed that

$$\hat{\theta}_{MH} = \frac{\sum_k (n_{11k}\, n_{22k} / n_{++k})}{\sum_k (n_{12k}\, n_{21k} / n_{++k})} = \frac{\sum_k p_{11k}\, p_{22k}\, n_{++k}}{\sum_k p_{12k}\, p_{21k}\, n_{++k}}, \qquad (6.7)$$

where $p_{ijk} = n_{ijk}/n_{++k}$. This gives more weight to strata with larger sample sizes. It is preferred over the ML estimator when K is large and the data are sparse. The ML estimator $\hat{\beta}$ of the log odds ratio then tends to be too large in absolute value. For sparse-data asymptotics with only a single matched pair in each stratum, for instance, $\hat{\beta} \xrightarrow{p} 2\beta$. [This convergence in probability means that for any $\epsilon > 0$, $P(|\hat{\beta} - 2\beta| < \epsilon) \to 1$ as $n \to \infty$; see Problem 10.4.]

Hauck gave an asymptotic variance for $\log(\hat{\theta}_{MH})$ that applies for a fixed number of strata. In that case $\log(\hat{\theta}_{MH})$ is slightly less efficient than the ML estimator $\hat{\beta}$ unless $\beta = 0$ (Tarone et al.). Robins et al. derived an estimated variance that applies both for these standard asymptotics with large n and fixed K and for sparse asymptotics in which K is also large. Expressing $\hat{\theta}_{MH} = R/S = (\sum_k R_k)/(\sum_k S_k)$ with $R_k = n_{11k}\, n_{22k}/n_{++k}$ and $S_k = n_{12k}\, n_{21k}/n_{++k}$, their derivation showed that $(\log \hat{\theta}_{MH} - \log \theta)$ is approximately proportional to $(R - \theta S)$. They also showed that $E(R - \theta S) = 0$ and derived the variance of $(R - \theta S)$. Their result is

$$\hat{\sigma}^2[\log \hat{\theta}_{MH}] = \frac{1}{2R^2} \sum_k n_{++k}^{-1} (n_{11k} + n_{22k}) R_k + \frac{1}{2S^2} \sum_k n_{++k}^{-1} (n_{12k} + n_{21k}) S_k + \frac{1}{2RS} \sum_k n_{++k}^{-1} \left[ (n_{11k} + n_{22k}) S_k + (n_{12k} + n_{21k}) R_k \right].$$
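The estimator (6.7) and the Robins et al. variance can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name and the example counts are hypothetical (not the Table 6.9 data), and the interval is the usual Wald-type interval on the log scale.

```python
import math
from statistics import NormalDist

def mantel_haenszel(tables, level=0.95):
    """Mantel-Haenszel common odds ratio (6.7) with the Robins et al. variance.

    tables: iterable of 2x2 tables (n11, n12, n21, n22), one per stratum.
    Returns (theta_MH, estimated variance of log theta_MH, confidence interval).
    """
    R = S = 0.0
    sum_r = sum_s = sum_rs = 0.0  # the three sums in the variance formula
    for n11, n12, n21, n22 in tables:
        n = n11 + n12 + n21 + n22
        Rk = n11 * n22 / n
        Sk = n12 * n21 / n
        diag = (n11 + n22) / n      # n_{++k}^{-1} (n11k + n22k)
        offdiag = (n12 + n21) / n   # n_{++k}^{-1} (n12k + n21k)
        R += Rk
        S += Sk
        sum_r += diag * Rk
        sum_s += offdiag * Sk
        sum_rs += diag * Sk + offdiag * Rk
    if R == 0 or S == 0:
        raise ValueError("R and S must both be positive")
    theta = R / S
    var_log = sum_r / (2 * R**2) + sum_s / (2 * S**2) + sum_rs / (2 * R * S)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(var_log)
    log_theta = math.log(theta)
    return theta, var_log, (math.exp(log_theta - half_width),
                            math.exp(log_theta + half_width))

# Hypothetical counts for three strata (illustration only).
strata = [(8, 4, 5, 7), (6, 6, 3, 9), (10, 2, 7, 5)]
theta, var_log, ci = mantel_haenszel(strata)
print(f"theta_MH = {theta:.3f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

With a single stratum the variance formula reduces algebraically to $1/n_{11} + 1/n_{12} + 1/n_{21} + 1/n_{22}$, the familiar variance of a sample log odds ratio, which gives a convenient check on an implementation.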
For the eight-center clinical trial summarized by Table 6.9, $\hat{\theta}_{MH} = 2.13$. With $\log \hat{\theta}_{MH} = 0.758$ and the estimated standard error $\hat{\sigma}[\log \hat{\theta}_{MH}]$, a 95% confidence interval for the common odds ratio is $\exp\{0.758 \pm 1.96\, \hat{\sigma}[\log \hat{\theta}_{MH}]\}$, which has lower limit 1.18. Similar results occur using model (6.4). The 95% confidence interval for $\exp(\beta)$ is $\exp(0.777 \pm 1.96 \times 0.307)$, or (1.19, 3.97), using the Wald interval, and (1.2, 4.0) using the likelihood-ratio interval. Although the evidence of an effect is considerable, inference about its size is rather imprecise. The odds of success may be as little as about 20% higher with the drug, or they may be as much as four times as high.

If the true odds ratios are not identical but do not vary drastically, $\hat{\theta}_{MH}$ still is a useful summary of the conditional associations. Similarly, the CMH test is a powerful summary of evidence against $H_0$: conditional independence, as long as the sample associations fall primarily in a single direction. It is not necessary to assume equality of odds ratios to use the CMH test.

Testing Homogeneity of Odds Ratios

The homogeneous association condition $\theta_{XY(1)} = \cdots = \theta_{XY(K)}$ for $2 \times 2 \times K$ tables is equivalent to logit model (6.4). A test of homogeneous association is implicitly a goodness-of-fit test of this model. The usual $G^2$ and $X^2$ test statistics provide this, with df = K − 1. They test that the K − 1 parameters in the saturated model that are the coefficients of interaction terms [cross products of the dummy variable for x with (K − 1) dummy variables for categories of Z] all equal 0. Breslow and Day (1980, p. 14) proposed an alternative large-sample test.

For the eight-center clinical trial data in Table 6.9, $G^2 = 9.7$ and $X^2 = 8.0$ (df = 7) do not contradict the hypothesis of equal odds ratios. It is reasonable to summarize the conditional association by a single odds ratio (e.g., $\hat{\theta}_{MH} = 2.1$) for all eight partial tables. In fact, even with a small P-value in a test of homogeneous association, if the variability in the sample odds ratios is not substantial, a summary measure such as $\hat{\theta}_{MH}$ is useful.
A test of homogeneity is not a prerequisite for this measure or for testing conditional independence.

Summarizing Heterogeneity in Odds Ratios

In practice, a predictor effect is often similar from stratum to stratum. In multicenter clinical trials comparing a new drug to a standard, for example, if the new drug is truly more beneficial, the true effect is usually positive in each stratum.
In strict terms, however, a model with homogeneous effects is unrealistic. First, we rarely expect the true odds ratio to be exactly the same in each stratum, because of unmeasured covariates that affect it. Breslow discussed modeling of the log odds ratio using a set of explanatory variables. Second, the model regards the strata effects $\{\beta_k^Z\}$ as fixed effects, treating them as the only strata of interest. Often the strata are merely a sampling of the possible ones. Multicenter clinical trials have data for certain centers but many other centers could have been used. Scientists would like their conclusions to apply to all such centers, not only those in the study.

A somewhat different logit model treats the true log odds ratios in partial tables as a random sample from a $N(\mu, \sigma^2)$ distribution. Fitting the model yields an estimated mean log odds ratio and an estimated variability about that mean. The inference applies to the population of strata rather than only those sampled. This type of model uses random effects in the linear predictor to induce this extra type of variability. In Chapter 12 we discuss GLMs with random effects, and there we fit such a model to the multicenter trial data of Table 6.9.

6.4 USING MODELS TO IMPROVE INFERENTIAL POWER

When contingency tables have ordered categories, in Section 3.4 we showed that tests that utilize the ordering can have improved power. Testing independence against a linear trend alternative in a linear logit model (e.g., Section 5.3.4) is a way to do this. In this section we present the reason for these power improvements.

Directed Alternatives

Consider an $I \times 2$ contingency table for I binomial variates with parameters $\{\pi_i\}$. $H_0$: independence states

$$\operatorname{logit}(\pi_i) = \alpha.$$

The ordinary $X^2$ and $G^2$ statistics of Section 3.2.1 refer to the general alternative,

$$\operatorname{logit}(\pi_i) = \alpha + \beta_i,$$

which is saturated. They test $H_0$: $\beta_1 = \beta_2 = \cdots = \beta_I = 0$ in that model, with df = (I − 1). Their general alternative treats both classifications as nominal. Denote these test statistics as $G^2(I)$ and $X^2(I)$. Recall that $G^2(I)$ is the likelihood-ratio statistic $G^2(M_0 \mid M_1)$
$= -2(L_0 - L_1)$ for comparing the saturated model $M_1$ with the independence (I) model $M_0$. Ordinal test statistics refer to narrower, usually more relevant, alternatives. With ordered rows, an example is a test of $H_0$: $\beta = 0$ in the linear logit
model, $\operatorname{logit}(\pi_i) = \alpha + \beta x_i$. The likelihood-ratio statistic $G^2(I \mid L) = G^2(I) - G^2(L)$ compares the linear logit model and the independence model. When a test statistic focuses on a single parameter, such as $\beta$ in that model, it has df = 1. Now, df equals the mean of the chi-squared distribution. A large test statistic with df = 1 falls farther out in its right-hand tail than a comparable value of $X^2(I)$ or $G^2(I)$ with df = (I − 1). Thus, it has a smaller P-value.

Noncentral Chi-Squared Distribution

To compare power of $G^2(I \mid L)$ and $G^2(I)$, it is necessary to compare their nonnull sampling distributions. When $H_0$ is false, their distributions are approximately noncentral chi-squared. This distribution, introduced by R. A. Fisher in 1928, arises from the following construction: If $Z_i \sim N(\mu_i, 1)$, $i = 1, \ldots, \nu$, and if $Z_1, \ldots, Z_\nu$ are independent, $\sum_i Z_i^2$ has the noncentral chi-squared distribution with df = $\nu$ and noncentrality parameter $\lambda = \sum_i \mu_i^2$. Its mean is $\nu + \lambda$ and its variance is $2(\nu + 2\lambda)$. The ordinary (central) chi-squared distribution, which occurs when $H_0$ is true, has $\lambda = 0$.

Let $X^2_{\nu,\lambda}$ denote a noncentral chi-squared random variable with df = $\nu$ and noncentrality $\lambda$. A fundamental result for chi-squared analyses is that, for fixed $\lambda$, $P[X^2_{\nu,\lambda} > \chi^2_\nu(\alpha)]$ increases as $\nu$ decreases. That is, the power for rejecting $H_0$ at a fixed $\alpha$-level increases as the df of the test decreases (e.g., Das Gupta and Perlman). For fixed $\nu$, the power equals $\alpha$ when $\lambda = 0$, and it increases as $\lambda$ increases. The inverse relation between power and df suggests that focusing the noncentrality on a statistic having a small df value can improve power.

Increased Power for Narrower Alternatives

Suppose that X has, at least approximately, a linear effect on $\operatorname{logit}[P(Y = 1)]$. To test independence, it is then sensible to use a statistic having strong power for that effect. This is the purpose of the tests based on the linear logit model, using the likelihood-ratio statistic $G^2(I \mid L)$, the Wald statistic $z = \hat{\beta}/SE$, and the Cochran-Armitage (score) statistic.

When is $G^2(I \mid L)$ more powerful than $G^2(I)$? The statistics satisfy

$$G^2(I) = G^2(I \mid L) + G^2(L),$$

where $G^2(L)$
tests goodness of fit of the linear logit model. When the linear logit model holds, $G^2(L)$ has an asymptotic chi-squared distribution with
df = I − 2; then if $\beta \ne 0$, $G^2(I)$ and $G^2(I \mid L)$ both have approximate noncentral chi-squared distributions with the same noncentrality. Whereas df = I − 1 for $G^2(I)$, df = 1 for $G^2(I \mid L)$. Thus, $G^2(I \mid L)$ is more powerful, since it uses fewer degrees of freedom.

When the linear logit model does not hold, $G^2(I)$ has greater noncentrality than $G^2(I \mid L)$, the discrepancy increasing as the model fits more poorly. However, when the model approximates reality fairly well, usually $G^2(I \mid L)$ is still more powerful. That test's df value of 1 more than compensates for its loss in noncentrality. The closer the true relationship is to the linear logit, the more nearly $G^2(I \mid L)$ captures the same noncentrality as $G^2(I)$, and the more powerful it is compared to $G^2(I)$. To illustrate, Figure 6.4 plots power as a function of noncentrality when df = 1 and 7. When the noncentrality of a test having df = 1 is at least about half that of a test having df = 7, the test with df = 1 is more powerful. The linear logit model then helps detect a key component of an association. As Mantel argued in a similar context, "that a linear regression is being tested does not mean that an assumption of linearity is being made. Rather it is that test of a linear component of regression provides power for detecting any progressive association which may exist."

The improved power results from sacrificing power in other cases. The $G^2(I)$ test can have greater power than $G^2(I \mid L)$ when the linear logit model describes reality very poorly.

The remark about the desirability of focusing noncentrality holds for nominal variables also. For instance, consider testing conditional independence in $2 \times 2 \times K$ tables. One approach tests $\beta = 0$ in model (6.4), using df = 1. Another approach tests goodness of fit of model (6.5), using df = K

FIGURE 6.4 Power and noncentrality, for df = 1 and df = 7, when $\alpha = 0.05$.
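The kind of comparison shown in Figure 6.4 can be approximated numerically. The sketch below is one possible pure-Python approach, not the method used to draw the figure: it writes the noncentral chi-squared tail probability as a Poisson($\lambda/2$) mixture of central chi-squared tail probabilities. All function names are hypothetical, and the series and bisection bounds assume moderate df and noncentrality values.

```python
import math

def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma function P(a, x), by power series."""
    if x <= 0.0:
        return 0.0
    term = total = 1.0
    n = 0
    while term > 1e-16 * total and n < 100000:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))

def chi2_sf(x, df):
    """Upper-tail probability of a central chi-squared variate."""
    return 1.0 - reg_lower_gamma(df / 2.0, x / 2.0)

def chi2_critical(df, alpha):
    """Critical value with upper-tail area alpha, found by bisection on [0, 100]."""
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def noncentral_chi2_power(df, lam, alpha=0.05):
    """P[X^2_{df,lam} > chi^2_df(alpha)] via the Poisson mixture representation."""
    crit = chi2_critical(df, alpha)
    weight = math.exp(-lam / 2.0)  # Poisson(lam/2) probability of j = 0
    total_weight = 0.0
    power = 0.0
    j = 0
    while total_weight < 1.0 - 1e-12 and j < 10000:
        power += weight * chi2_sf(crit, df + 2 * j)
        total_weight += weight
        j += 1
        weight *= (lam / 2.0) / j
    return power

# Power at a few noncentrality values for the two df values in Figure 6.4.
for lam in (0.0, 5.0, 10.0, 15.0, 20.0):
    p1 = noncentral_chi2_power(1, lam)
    p7 = noncentral_chi2_power(7, lam)
    print(f"lambda = {lam:4.1f}: df = 1 power {p1:.3f}, df = 7 power {p7:.3f}")
```

At $\lambda = 0$ both powers equal $\alpha$, and at any common positive $\lambda$ the df = 1 test has the higher power, in line with the monotonicity result stated earlier.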
(Section 6.3). When model (6.4) holds, both tests have the same noncentrality. Thus, the test of $\beta = 0$ is more powerful, since it has fewer degrees of freedom.

Treatment of Leprosy Example

Table 6.11 refers to an experiment on the use of sulfones and streptomycin drugs in the treatment of leprosy. The degree of infiltration at the start of the experiment measures a type of skin damage. The response is the change in the overall clinical condition of the patient after 48 weeks of treatment. We use response scores {−1, 0, 1, 2, 3}. The question of interest is whether subjects with high infiltration changed differently from those with low infiltration.

TABLE 6.11 Change in Clinical Condition by Degree of Infiltration (rows: Worse, Stationary, Slight improvement, Moderate improvement, Marked improvement; columns: Degree of Infiltration, High and Low, and Proportion High)
Source: Reprinted with permission from the Biometric Society (Cochran).

Here, the clinical change response variable is ordinal. It seems natural to compare the mean change for the two infiltration levels. Cochran and Yates noted that this analysis is identical to a trend test treating the binary variable as the response. That test is sensitive to linearity between clinical change and the proportion of cases with high infiltration.

The test $G^2(I) = 7.28$ (df = 4) does not show much evidence of association (P = 0.12), but it ignores the row ordering. The sample proportion of high infiltration increases monotonically as the clinical change improves. The test of $H_0$: $\beta = 0$ in the linear logit model has $G^2(I \mid L) = 6.65$, with df = 1 (P = 0.01). It gives strong evidence of more positive clinical change at the higher level of infiltration. Using the ordering by decreasing df from 4 to 1 pays a strong dividend. In addition, $G^2(L) = 0.63$ with df = 3 suggests that the linear trend model fits well.

Model Smoothing Improves Precision of Estimation

Using directed alternatives can improve not only test power, but also estimation of cell probabilities and summary measures.
In generc form, let be true cell probabltes n a contngency table, let p denote sample proportons, and let ˆ denote model-based ML estmates of.
When $\pi$ satisfy a certain model, both $\hat{\pi}$ for that model and $p$ are consistent estimators of $\pi$. The model-based estimator $\hat{\pi}$ is better, as its true asymptotic standard error cannot exceed that of $p$. This happens because of model parsimony: The unsaturated model, on which $\hat{\pi}$ is based, has fewer parameters than the saturated model, on which $p$ is based. In fact, model-based estimators are also more efficient in estimating functions $g(\pi)$ of cell probabilities. For any differentiable function $g$,

$$\text{asymp. var}\left[\sqrt{n}\, g(\hat{\pi})\right] \le \text{asymp. var}\left[\sqrt{n}\, g(p)\right].$$

In Chapter 14 we prove this result. It holds more generally than for categorical data models (Altham). This is one reason that statisticians prefer parsimonious models.

In reality, of course, a chosen model is unlikely to hold exactly. However, when the model approximates $\pi$ well, unless $n$ is extremely large, $\hat{\pi}$ is still better than $p$. Although $\hat{\pi}$ is biased, it has smaller variance than $p$, and $\text{MSE}(\hat{\pi}) < \text{MSE}(p)$ when its variance plus squared bias is smaller than $\text{var}(p)$. Earlier we showed that in two-way tables, independence-model estimates of cell probabilities can be better than sample proportions even when that model does not hold.

6.5 SAMPLE SIZE AND POWER CONSIDERATIONS*

In any statistical procedure, the sample size $n$ influences the results. Strong effects are likely to be detected even when $n$ is small. By contrast, detection of weak effects requires large $n$. A study design should reflect the sample size needed to provide good power for detecting the effect.

Sample Size and Power for Comparing Two Proportions

For test statistics having large-sample normal distributions, power calculations can use ordinary methods. To illustrate, consider a test comparing binomial parameters $\pi_1$ and $\pi_2$ for two medical treatments. An experiment plans independent samples of size $n_i = n/2$ receiving each treatment. The researchers expect $\pi_i \approx 0.6$ for each, and a difference of at least 0.10 is important. In testing $H_0$: $\pi_1 = \pi_2$, the variance of the difference $\hat{\pi}_1 - \hat{\pi}_2$ in sample proportions is

$$\frac{\pi_1(1 - \pi_1)}{n/2} + \frac{\pi_2(1 - \pi_2)}{n/2} \approx 0.6 \times 0.4 \times (4/n) = 0.96/n.$$

In particular,

$$z = \frac{(\hat{\pi}_1 - \hat{\pi}_2) - (\pi_1 - \pi_2)}{(0.96/n)^{1/2}}$$

has approximately a standard normal distribution for $\pi_1$ and $\pi_2$ near 0.6.
The power of an $\alpha$-level test of $H_0$ is approximately

$$P\left[\frac{|\hat{\pi}_1 - \hat{\pi}_2|}{(0.96/n)^{1/2}} \ge z_{\alpha/2}\right].$$

When $\pi_1 - \pi_2 = 0.10$, for $\alpha = 0.05$, this equals

$$P\left[\frac{\hat{\pi}_1 - \hat{\pi}_2 - 0.10}{(0.96/n)^{1/2}} > 1.96 - 0.10(n/0.96)^{1/2}\right] + P\left[\frac{\hat{\pi}_1 - \hat{\pi}_2 - 0.10}{(0.96/n)^{1/2}} < -1.96 - 0.10(n/0.96)^{1/2}\right]$$

$$= P\left[z > 1.96 - 0.10(n/0.96)^{1/2}\right] + P\left[z < -1.96 - 0.10(n/0.96)^{1/2}\right]$$

$$= 1 - \Phi\left[1.96 - 0.10(n/0.96)^{1/2}\right] + \Phi\left[-1.96 - 0.10(n/0.96)^{1/2}\right],$$

where $\Phi$ is the standard normal cdf. The power is approximately 0.11 when $n = 50$ and 0.30 when $n = 200$. It is not easy to attain significance when effects are small and the sample is not very large. Figure 6.5 shows how the power increases in $n$ when $\pi_1 - \pi_2 = 0.1$. By contrast, it shows how the power improves when $\pi_1 - \pi_2 = 0.2$.

FIGURE 6.5 Approximate power for testing equality of proportions, with true values near middle of range and $\alpha = 0.05$.

For a given $P(\text{type I error}) = \alpha$ and $P(\text{type II error}) = \beta$ (and hence power $= 1 - \beta$), one can determine the sample size needed to attain those
values. A study using $n_1 = n_2$ requires approximately

$$n_1 = n_2 = \frac{(z_{\alpha/2} + z_{\beta})^2\left[\pi_1(1 - \pi_1) + \pi_2(1 - \pi_2)\right]}{(\pi_1 - \pi_2)^2}.$$

For a test with $\alpha = 0.05$ and $\beta = 0.10$ when $\pi_1$ and $\pi_2$ are truly about 0.60 and 0.70, $n_1 = n_2 = 473$. This formula also provides the sample sizes needed for a comparable confidence interval for $\pi_1 - \pi_2$. With about 473 subjects in each group, a 95% confidence interval has only a 0.10 chance of containing 0 when actually $\pi_1 = 0.60$ and $\pi_2 = 0.70$.

This sample-size formula is approximate and may underestimate slightly the actual values required. It is adequate for most practical work, though, in which only rough conjectures are available for $\pi_1$ and $\pi_2$. Fleiss (1981) showed more precise formulas.

Sample Size Determination in Logistic Regression

Consider now the model

$$\text{logit}[\pi(x_i)] = \alpha + \beta^{*} x_i, \quad i = 1, \ldots, n,$$

in which $x$ is quantitative. [We use $\beta^{*}$ so as not to confuse with $\beta = P(\text{type II error})$.] The sample size needed to achieve a certain power for testing $H_0$: $\beta^{*} = 0$ depends on the variance of $\hat{\beta}^{*}$. This depends on $\{\pi(x_i)\}$, and formulas for $n$ use a guess for $\hat{\pi} = \pi(\bar{x})$ and the distribution of $X$. The effect size is the log odds ratio $\lambda$ comparing $\pi(\bar{x})$ to $\pi(\bar{x} + s_x)$, the probability for a standard deviation above the mean of $x$. For a one-sided test when $X$ is approximately normal, Hsieh derived

$$n = \frac{\left[z_{\alpha} + z_{\beta}\exp(-\lambda^2/4)\right]^2 (1 + 2\hat{\pi}\delta)}{\hat{\pi}\lambda^2},$$

where

$$\delta = \frac{1 + (1 + \lambda^2)\exp(5\lambda^2/4)}{1 + \exp(-\lambda^2/4)}.$$

The value $n$ decreases as $\hat{\pi} \to 0.5$ and as $|\lambda|$ increases.

We illustrate for modeling the effect of $x$ = cholesterol level on the probability of severe heart disease for a population for which that probability at an average level of cholesterol is about 0.08. Researchers want the test to be sensitive to a 50% increase in this probability, for a standard deviation increase in cholesterol. The odds of severe heart disease at the mean cholesterol level equal $0.08/0.92 = 0.087$, and the odds one standard deviation above the mean equal $0.12/0.88 = 0.136$. The odds ratio equals $0.136/0.087 = 1.57$, and $\lambda = \log(1.57) = 0.45$. For $\alpha = 0.05$ and $\beta = 0.10$, $\delta = 1.31$ and $n = 612$.

Sample Size in Multiple Logistic Regression

A multiple logistic regression model requires larger $n$ to detect effects.
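The power expression for two proportions and the two sample-size formulas above are easy to evaluate directly. The sketch below uses only the Python standard library; the function names are hypothetical, and the default arguments simply restate the numerical setting of the examples (variance $0.96/n$, a difference of 0.10, baseline probability 0.08 rising to 0.12).

```python
from math import exp, log, sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution

def power_two_proportions(n, diff=0.10, var_times_n=0.96, alpha=0.05):
    """Approximate power of the two-sided test; n is the total sample size."""
    z = STD.inv_cdf(1 - alpha / 2)
    shift = diff / sqrt(var_times_n / n)
    return 1 - STD.cdf(z - shift) + STD.cdf(-z - shift)

def n_per_group(pi1, pi2, alpha=0.05, beta=0.10):
    """Approximate per-group sample size for comparing two proportions."""
    z_a = STD.inv_cdf(1 - alpha / 2)
    z_b = STD.inv_cdf(1 - beta)
    return (z_a + z_b) ** 2 * (pi1 * (1 - pi1) + pi2 * (1 - pi2)) / (pi1 - pi2) ** 2

def n_logistic(pi_bar, lam, alpha=0.05, beta=0.10):
    """Sample size from the one-sided formula quoted above for a normal predictor."""
    z_a = STD.inv_cdf(1 - alpha)
    z_b = STD.inv_cdf(1 - beta)
    delta = (1 + (1 + lam ** 2) * exp(5 * lam ** 2 / 4)) / (1 + exp(-lam ** 2 / 4))
    return (z_a + z_b * exp(-lam ** 2 / 4)) ** 2 * (1 + 2 * pi_bar * delta) / (pi_bar * lam ** 2)

print(power_two_proportions(50), power_two_proportions(200))  # about 0.11 and 0.30
print(n_per_group(0.60, 0.70))                                # about 473 per group

# Log odds ratio for a rise from 0.08 to 0.12 in the cholesterol example.
lam = log((0.12 / 0.88) / (0.08 / 0.92))
print(lam, n_logistic(0.08, lam))                             # about 0.45 and 612
```

Because every input is a rough conjecture, it is usually more informative to tabulate these functions over a range of plausible values than to rely on a single answer.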
Let $R$ denote the multiple correlation between the predictor $X$ of interest and the others in the model. The formula for $n$ above divides by $(1 - R^2)$. In that formula, $\hat{\pi}$ is evaluated at the mean of all the explanatory variables, and the odds ratio refers to the effect of $X$ at the mean level of the other predictors.

Consider the example in Section 6.5.2 when blood pressure is also a predictor. If the correlation between cholesterol and blood pressure is 0.40, we need $n \approx 612/[1 - (0.40)^2] = 729$.

These formulas provide, at best, rough indications of sample size. Most applications have only a crude guess for $\hat{\pi}$ and $R$, and $X$ may be far from normally distributed. For other work on this problem, see Hsieh et al. and Whittemore.

Power for Chi-Squared Tests in Contingency Tables

When hypotheses are false, squared normal and $X^2$ and $G^2$ statistics have large-sample noncentral chi-squared distributions. Suppose that $H_0$ is equivalent to model $M$ for a contingency table. Let $\pi_i$ denote the true probability in cell $i$, and let $\pi_i(M)$ denote the value to which the ML estimate $\hat{\pi}_i$ for model $M$ converges, where $\sum_i \pi_i = \sum_i \pi_i(M) = 1$. For a multinomial sample of size $n$, the noncentrality parameter for $X^2$ equals

$$\lambda = n \sum_i \frac{[\pi_i - \pi_i(M)]^2}{\pi_i(M)}. \qquad (6.8)$$

This has the same form as $X^2$, with $\pi_i$ in place of the sample proportion $p_i$ and $\pi_i(M)$ in place of $\hat{\pi}_i$. The noncentrality parameter for $G^2$ equals

$$\lambda = 2n \sum_i \pi_i \log\frac{\pi_i}{\pi_i(M)}. \qquad (6.9)$$

TABLE 6.12 Power of Chi-Squared Test for $\alpha = 0.05$
Columns: df; Noncentrality
Source: Reprinted with permission from G. E. Haynam, Z. Govindarajulu, and F. C. Leone, in Selected Tables in Mathematical Statistics, eds. H. L. Harter and D. B. Owen (Chicago: Markham).
When $H_0$ is true, all $\pi_i = \pi_i(M)$. Then, for either statistic, $\lambda = 0$ and the central chi-squared distribution applies. To determine the approximate power for a chi-squared test with df $= \nu$, (1) choose a hypothetical set of true values $\{\pi_i\}$, (2) calculate $\{\pi_i(M)\}$ by fitting to $\{\pi_i\}$ the model $M$ for $H_0$, (3) calculate the noncentrality parameter $\lambda$, and (4) calculate $P[X^2_{\nu,\lambda} > \chi^2_{\nu}(\alpha)]$. Table 6.12 shows an excerpt from a table of noncentral chi-squared probabilities for step 4 with $\alpha = 0.05$.

Power for Testing Conditional Independence

We use an example based on one in O'Brien (1986). A standard fetal heart rate monitoring test predicts whether a fetus will require nonroutine care following delivery. The standard test has categories (worrisome, reassuring). The response $Y$ is whether the newborn required some nonroutine medical care during the first week after birth (1 = yes, 0 = no). A new fetal heart rate monitoring test is developed, having categories (very worrisome, somewhat worrisome, reassuring). A physician plans to study whether this new test can help make predictions about the outcome; that is, given the result of the standard test, is there an association between the response and the result of the new test? A relevant statistic tests the effect of the new monitoring test in the logit model having the new test ($N$) and standard test ($S$) as qualitative predictors.

To help select $n$, a statistician asks the physician to conjecture about the joint distribution of the explanatory variables, with questions such as "What proportion of the cases do you think will be scored 'reassuring' by both tests?" For each $NS$ combination, the physician also guessed $P(Y = 1)$. Table 6.13 shows one scenario for marginal and conditional probabilities. These yield a joint distribution $\{\pi_{ijk}\}$ from their product, for instance the proportion of cases judged worrisome by the standard test and very worrisome by the new test and requiring nonroutine medical care.

These joint probabilities yield fitted probabilities $\{\pi_{ijk}(M_0)\}$ and $\{\pi_{ijk}(M_1)\}$ for the null and alternative logit models.
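The four steps listed above can be sketched for the simplest case, a test of independence in a $2 \times 2$ table (df $= 1$). This is an illustration under stated assumptions rather than the calculation for the fetal monitoring scenario: the cell probabilities are hypothetical, the function names are hypothetical, and step 4 uses the fact that a noncentral chi-squared variate with df $= 1$ is the square of a normal variate with mean $\sqrt{\lambda}$ and unit variance.

```python
from math import log, sqrt
from statistics import NormalDist

def independence_fit(pi):
    """Step 2: fit independence to true probabilities [[p11, p12], [p21, p22]]."""
    rows = [sum(r) for r in pi]
    cols = [pi[0][j] + pi[1][j] for j in range(2)]
    return [[rows[i] * cols[j] for j in range(2)] for i in range(2)]

def noncentrality(pi, pi_m, n):
    """Step 3: noncentrality for X^2, as in (6.8), and for G^2, as in (6.9)."""
    cells = [(pi[i][j], pi_m[i][j]) for i in range(2) for j in range(2)]
    lam_x2 = n * sum((p - m) ** 2 / m for p, m in cells)
    lam_g2 = 2 * n * sum(p * log(p / m) for p, m in cells if p > 0)
    return lam_x2, lam_g2

def power_df1(lam, alpha=0.05):
    """Step 4 for df = 1: exact noncentral chi-squared tail via the normal cdf."""
    std = NormalDist()
    z = std.inv_cdf(1 - alpha / 2)
    return 1 - std.cdf(z - sqrt(lam)) + std.cdf(-z - sqrt(lam))

# Step 1: hypothetical true cell probabilities (an assumption for illustration).
pi = [[0.30, 0.20], [0.20, 0.30]]
pi_m = independence_fit(pi)
lam_x2, lam_g2 = noncentrality(pi, pi_m, n=100)
print(lam_x2, lam_g2, power_df1(lam_x2), power_df1(lam_g2))
```

For df greater than 1, step 4 needs a noncentral chi-squared routine or a table such as Table 6.12; the first three steps are unchanged.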
TABLE 6.13 Scenario for Power Computation
Columns: Standard; New; Joint Probability; $P(\text{nonroutine care})$
Rows: Worrisome (Very worrisome, Somewhat worrisome, Reassuring); Reassuring (Very worrisome, Somewhat worrisome, Reassuring)
Source: Reprinted with permission from O'Brien (1986).

(One can get these by entering $\{\pi_{ijk}\}$ in
percentage form as counts in software for logistic regression, fit the relevant model, and divide the fitted counts by 100 to get the fitted joint probabilities.) The likelihood-ratio test comparing these models has noncentrality (6.9) with $\{\pi_{ijk}(M_1)\}$ playing the role of $\{\pi_i\}$ and $\{\pi_{ijk}(M_0)\}$ playing the role of $\{\pi_i(M)\}$. For the scenario in Table 6.13, the noncentrality is proportional to $n$, with df $= 2$. For $n = 400$ and 600, the approximate powers when $\alpha = 0.05$ are 0.35 and 0.49, and the power is higher still for $n = 1000$. This scenario predicts 64% of the observations to occur at only one combination of the factors. The lack of dispersion for the factors weakens the power.

Effects of Sample Size on Model Selection and Inference

The effects of sample size suggest some cautions for model selection. For small $n$, the most parsimonious model accepted in a goodness-of-fit test may be quite simple. By contrast, larger samples usually require more complex models to pass goodness-of-fit tests. Then, some effects that are statistically significant may be weak and substantively unimportant. With large $n$ it may be adequate to use a model that is simpler than models that pass goodness-of-fit tests. An analysis that focuses solely on goodness-of-fit tests is incomplete. It is also necessary to estimate model parameters and describe strengths of effects.

These remarks merely reflect limitations of significance testing. Null hypotheses are rarely true. With large enough $n$, they will be rejected. A more relevant concern is whether the difference between true parameter values and null hypothesis values is sufficient to be important. Many methodologists overemphasize testing and underutilize estimation methods such as confidence intervals. When the $P$-value is small, a confidence interval specifies the extent to which $H_0$ may be false, thus helping us determine whether rejecting it has practical importance.
When the $P$-value is not small, the confidence interval indicates whether some plausible parameter values are far from $H_0$. A wide confidence interval containing the $H_0$ value indicates that the test had weak power at important alternatives.

6.6 PROBIT AND COMPLEMENTARY LOG-LOG MODELS*

For binary responses, in this section we discuss two alternatives to logit models. Like the logit model, these models have form (4.8),

$$\pi(x) = \Phi(\alpha + \beta x)$$

for a continuous cdf $\Phi$. The following argument motivates this class.

Tolerance Motivation for Binary Response Models

In toxicology, binary response models describe the effect of dosage of a toxin on whether a subject dies. The tolerance distribution provides justification for this model. Let $x$ denote the dosage level. For a randomly selected subject, let $Y = 1$ if the subject dies. Suppose that the subject has tolerance $T$ for the dosage, with $(Y = 1)$ equivalent to $(T \le x)$. For instance, an insect survives if the dosage $x$ is less than $T$ and dies if the dosage is at least $T$. Tolerances vary among subjects, and let $F(t) = P(T \le t)$. For fixed dosage $x$, the probability a randomly selected subject dies is

$$\pi(x) = P(Y = 1 \mid X = x) = P(T \le x) = F(x).$$

That is, the appropriate binary model is the one having the shape of the cdf $F$ of the tolerance distribution. Let $\Phi$ denote the standard cdf for the family to which $F$ belongs. A common standardization uses the mean $\mu$ and standard deviation $\sigma$ of $T$, so that

$$\pi(x) = F(x) = \Phi[(x - \mu)/\sigma].$$

Then, the model has form $\pi(x) = \Phi(\alpha + \beta x)$.

Probit Models

Toxicological experiments often measure dosage as the log concentration (Bliss 1935). Often, the tolerance distribution for the dosage is approximately $N(\mu, \sigma^2)$ for unknown $\mu$ and $\sigma$. If $F$ is the $N(\mu, \sigma^2)$ cdf, then $\pi(x)$ has the form $\pi(x) = \Phi(\alpha + \beta x)$, where $\Phi$ is the standard normal cdf, $\alpha = -\mu/\sigma$ and $\beta = 1/\sigma$. In GLM form,

$$\Phi^{-1}[\pi(x)] = \alpha + \beta x$$

is the probit model. The probit link function is $\Phi^{-1}$. Whereas the cdf maps the real line onto the $(0, 1)$ probability scale, the inverse cdf maps the $(0, 1)$ scale for $\pi(x)$ onto the real line values for linear predictors in binary response models.

The response curve for $\pi(x)$ [or for $1 - \pi(x)$, when $\beta < 0$] has the appearance of the normal cdf with mean $\mu = -\alpha/\beta$ and standard deviation $\sigma = 1/|\beta|$. Since 68% of the normal density falls within a standard deviation of the mean, $1/|\beta|$ is the distance between $x$ values where $\pi(x) = 0.16$ or 0.84 and where $\pi(x) = 0.50$. The rate of change in $\pi(x)$ is $\partial\pi(x)/\partial x = \beta\phi(\alpha + \beta x)$, where $\phi(\cdot)$ is the standard normal density function. The rate is highest when $\alpha + \beta x = 0$ (i.e., at $x = -\alpha/\beta$), where it equals $\beta/(2\pi)^{1/2} = 0.40\beta$. At that point, $\pi(x) = \frac{1}{2}$. By comparison, in logistic regression with parameter $\beta$, the curve for $\pi(x)$ is a logistic cdf with standard deviation $\pi/(|\beta|\sqrt{3})$. Its rate of change in $\pi(x)$ at $x = -\alpha/\beta$ is $0.25\beta$. The rates of change where $\pi(x) = \frac{1}{2}$
are the same for the cdf's corresponding to the probit and logistic curves when the logistic $\beta$ is $0.40/0.25 = 1.6$ times the probit $\beta$. The standard deviations are the same when the logistic $\beta$ is $\pi/\sqrt{3} = 1.8$ times the probit $\beta$. When both models fit well, parameter estimates in logistic regression are about 1.6 to 1.8 times those in probit models.

The likelihood equations shown in Chapter 4 for binomial regression models apply to probit models. One can solve them using the Fisher scoring algorithm for GLMs (Bliss 1935, Fisher 1935b). Newton–Raphson yields the same ML estimates but slightly different standard errors. For the information matrix inverted to obtain the asymptotic covariance matrix, Newton–Raphson uses observed information, whereas Fisher scoring uses expected information. These differ for binary links other than the logit.

Beetle Mortality Example

Table 6.14 reports the number of beetles killed after 5 hours of exposure to gaseous carbon disulfide at various concentrations. Figure 6.6 plots (as dots) the proportion killed against the log concentration. The proportion jumps up at about $x = 1.8$, and it is close to 1 above there.

The ML fit of the probit model is

$$\Phi^{-1}[\hat{\pi}(x)] = -34.96 + 19.74x.$$

For this fit, $\hat{\pi}(x) = 0.5$ at $x = 34.96/19.74 = 1.77$. The fit corresponds to a normal tolerance distribution with $\mu = 1.77$ and $\sigma = 1/19.74 = 0.05$. The curve for $\hat{\pi}(x)$ is that of a $N(1.77, 0.05^2)$ cdf.

At dosage $x_i$ with $n_i$ beetles, $n_i\hat{\pi}(x_i)$ is the fitted count for death, $i = 1, \ldots, 8$. Table 6.14 reports the fitted values and Figure 6.6 shows the fit. The table also shows fitted values for the linear logit model. These models fit similarly and rather poorly. The $G^2$ goodness-of-fit statistic equals 11.1 for the logit model and 10.0 for the probit model, with df $= 6$.

TABLE 6.14 Beetles Killed after Exposure to Carbon Disulfide
Columns: Log Dose; Number of Beetles; Number Killed; Fitted Values (Comp. Log-Log, Probit, Logit)
Source: Data reprinted with permission from Bliss (1935).
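The link between the probit coefficients and the tolerance distribution can be verified with a few lines of standard-library Python. The sketch takes the two probit estimates quoted in the beetle example as given; the log-dose values in the loop are hypothetical evaluation points, not the doses of Table 6.14.

```python
from math import pi, sqrt
from statistics import NormalDist

alpha_hat, beta_hat = -34.96, 19.74   # probit ML estimates quoted in the text

# Tolerance distribution implied by the fit: mu = -alpha/beta, sigma = 1/|beta|.
mu = -alpha_hat / beta_hat
sigma = 1 / abs(beta_hat)
tolerance = NormalDist(mu, sigma)

def probit_fit(x):
    """Fitted probability of death at log dose x."""
    return NormalDist().cdf(alpha_hat + beta_hat * x)

# The probit curve and the tolerance cdf are the same function of x.
for x in (1.70, 1.77, 1.80, 1.85):    # hypothetical log-dose values
    print(x, round(probit_fit(x), 3), round(tolerance.cdf(x), 3))

# Scale factors relating logistic and probit coefficients.
slope_ratio = (1 / sqrt(2 * pi)) / 0.25   # matches rates of change at pi(x) = 1/2
sd_ratio = pi / sqrt(3)                   # matches standard deviations
print(round(slope_ratio, 1), round(sd_ratio, 1))
```

The two printed ratios are the 1.6 and 1.8 factors cited above for converting between logistic and probit estimates.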
FIGURE 6.6 Proportion of beetles killed versus log dosage, with fits of probit and complementary log-log models.

Complementary Log-Log Link Models

The logit and probit links are symmetric about 0.5, in the sense that

$$\text{link}[\pi(x)] = -\text{link}[1 - \pi(x)].$$

To illustrate,

$$\text{logit}[\pi(x)] = \log\frac{\pi(x)}{1 - \pi(x)} = -\log\frac{1 - \pi(x)}{\pi(x)} = -\text{logit}[1 - \pi(x)].$$

This means that the response curve for $\pi(x)$ has a symmetric appearance about the point where $\pi(x) = 0.5$, so $\pi(x)$ approaches 0 at the same rate it approaches 1. Logit and probit models are inappropriate when this is badly violated.

The response curve

$$\pi(x) = 1 - \exp[-\exp(\alpha + \beta x)] \qquad (6.12)$$

has the shape shown in Figure 6.7. It is asymmetric, $\pi(x)$ approaching 0 fairly slowly but approaching 1 quite sharply. For this model,

$$\log\{-\log[1 - \pi(x)]\} = \alpha + \beta x.$$

The link for this GLM is called the complementary log-log link, since the log-log link applies to the complement of $\pi(x)$.
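The symmetry property, and its failure for the complementary log-log link, can be seen by evaluating each link at $\pi$ and at $1 - \pi$. The following sketch uses only the standard library; the function names and the probabilities chosen are illustrative assumptions.

```python
from math import exp, log
from statistics import NormalDist

def logit(p):
    return log(p / (1 - p))

def probit(p):
    return NormalDist().inv_cdf(p)

def cloglog(p):
    return log(-log(1 - p))

def cloglog_inverse(eta):
    """Response curve (6.12) as a function of the linear predictor eta."""
    return 1 - exp(-exp(eta))

# A symmetric link satisfies link(p) + link(1 - p) = 0.
for p in (0.1, 0.2, 0.4):
    for name, link in (("logit", logit), ("probit", probit), ("cloglog", cloglog)):
        print(f"p={p}: {name:8s} link(p) + link(1-p) = {link(p) + link(1 - p):+.4f}")

# Asymmetry of the response curve: compare equal steps below and above eta = 0.
print(cloglog_inverse(-2.0), cloglog_inverse(0.0), cloglog_inverse(2.0))
```

The sums are zero (up to rounding) for the logit and probit links and clearly nonzero for the complementary log-log link.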
FIGURE 6.7 Model with complementary log-log link.

To interpret model (6.12), we note that at $x_1$ and $x_2$,

$$\log\{-\log[1 - \pi(x_2)]\} - \log\{-\log[1 - \pi(x_1)]\} = \beta(x_2 - x_1),$$

so that

$$\frac{\log[1 - \pi(x_2)]}{\log[1 - \pi(x_1)]} = \exp[\beta(x_2 - x_1)]$$

and

$$1 - \pi(x_2) = [1 - \pi(x_1)]^{\exp[\beta(x_2 - x_1)]}.$$

For $x_2 - x_1 = 1$, the complement probability at $x_2$ equals the complement probability at $x_1$ raised to the power $\exp(\beta)$.

A related model to (6.12) is

$$\pi(x) = \exp[-\exp(\alpha + \beta x)]. \qquad (6.13)$$

For it, $\pi(x)$ approaches 0 sharply but approaches 1 slowly. As $x$ increases, the curve is monotone decreasing when $\beta > 0$, and monotone increasing when $\beta < 0$. In GLM form it uses the log-log link

$$\log\{-\log[\pi(x)]\} = \alpha + \beta x.$$

When the complementary log-log model holds for the probability of a success, the log-log model holds for the probability of a failure.

Model (6.13) with log-log link is the special case of the general form $\pi(x) = \Phi(\alpha + \beta x)$ with the cdf of the extreme value (or Gumbel) distribution. The cdf equals

$$F(x) = \exp\{-\exp[-(x - a)/b]\}$$

for parameters $b > 0$ and $-\infty < a < \infty$. It has mean $a + 0.577b$ and standard deviation $\pi b/\sqrt{6}$. Models with log-log links can be fitted using the Fisher scoring algorithm for GLMs.

Beetle Mortality Example Revisited

For the beetle mortality data (Table 6.14), the complementary log-log model has ML estimates $\hat{\alpha} = -39.52$ and $\hat{\beta} = 22.01$. At dosage $x = 1.7$, the fitted probability of survival is $1 - \hat{\pi}(x) = \exp\{-\exp[-39.52 + 22.01(1.7)]\} = 0.885$, whereas at $x = 1.8$ it is 0.332 and at $x = 1.9$ it is $5 \times 10^{-5}$. The probability of survival at dosage $x + 0.1$ equals the probability at dosage $x$ raised to the power $\exp(0.1 \times 22.01) = 9.03$. For instance, $0.332 = (0.885)^{9.03}$.

Table 6.14 shows the fitted values and Figure 6.6 shows the fit. They are close to the observed death counts ($G^2 = 3.5$, df $= 6$). The fit seems adequate. Aranda-Ordaz and Stukel discussed these data further.

6.7 CONDITIONAL LOGISTIC REGRESSION AND EXACT DISTRIBUTIONS*

ML estimators of logistic model parameters work best when the sample size $n$ is large compared to the number of parameters. When $n$ is small or when the number of parameters grows as $n$ does, improved inference results using conditional maximum likelihood. In this section we present this approach and in Chapter 10 apply it with matched case–control studies.

Conditional Likelihood

The conditional likelihood approach eliminates nuisance parameters by conditioning on their sufficient statistics. This generalizes Fisher's method for $2 \times 2$ tables. The conditional likelihood refers to a conditional distribution defined for potential samples that provide the same information about the nuisance parameters that occurs in the observed sample. We begin with a general exposition and then discuss special cases.

Let $y_i$ denote the binary response for subject $i$, $i = 1, \ldots, N$. (For now, each $y_i$ refers to a single trial, so $n_i = 1$.) Let $x_{ij}$ be the value of predictor $j$ for that subject, $j = 1, \ldots, p$. The model is

$$P(Y_i = y_i) = \frac{\exp\left[y_i\left(\alpha + \sum_{j=1}^{p}\beta_j x_{ij}\right)\right]}{1 + \exp\left(\alpha + \sum_{j=1}^{p}\beta_j x_{ij}\right)}, \qquad (6.14)$$

where substituting $y_i = 1$ gives the usual expression. Here, we explicitly separate the intercept from the coefficients of the $p$ predictors. For $N$ independent observations,

$$P(Y_1 = y_1, \ldots, Y_N = y_N) = \frac{\exp\left[\left(\sum_i y_i\right)\alpha + \sum_{j=1}^{p}\left(\sum_i y_i x_{ij}\right)\beta_j\right]}{\prod_i\left[1 + \exp\left(\alpha + \sum_{j=1}^{p}\beta_j x_{ij}\right)\right]}. \qquad (6.15)$$
From this likelihood function, the sufficient statistic for $\beta_j$ is $\sum_i y_i x_{ij}$, $j = 1, \ldots, p$. The sufficient statistic for $\alpha$ is $\sum_i y_i$, the total number of successes.

Usually, some parameters refer to effects of primary interest. Others may be there to adjust for relevant effects, but their values are not of special interest. We can eliminate the latter parameters from the likelihood by conditioning on their sufficient statistics. We illustrate by eliminating $\alpha$. (In Chapter 10 we show that for models for matched case–control studies, intercept terms cause difficulties with inference about the primary parameters, so it can be helpful to eliminate them.) Since the sufficient statistic for $\alpha$ is $\sum_i y_i$, we condition on $\sum_i y_i$. Suppose that $\sum_i y_i = t$. Denote the conditional reference set of samples having the same value of $\sum_i y_i$ as observed by

$$S(t) = \left\{(y_1^*, \ldots, y_N^*): \sum_i y_i^* = t\right\}.$$

With $\{y_i\}$ such that $\sum_i y_i = t$, the conditional likelihood function equals

$$P\left(Y_1 = y_1, \ldots, Y_N = y_N \,\Big|\, \sum_i y_i = t\right) = \frac{P(Y_1 = y_1, \ldots, Y_N = y_N)}{\sum_{S(t)} P(Y_1 = y_1^*, \ldots, Y_N = y_N^*)}$$

$$= \frac{\exp\left[t\alpha + \sum_{j=1}^{p}\left(\sum_i y_i x_{ij}\right)\beta_j\right] \Big/ \prod_i\left[1 + \exp\left(\alpha + \sum_{j=1}^{p}\beta_j x_{ij}\right)\right]}{\sum_{S(t)}\exp\left[t\alpha + \sum_{j=1}^{p}\left(\sum_i y_i^* x_{ij}\right)\beta_j\right] \Big/ \prod_i\left[1 + \exp\left(\alpha + \sum_{j=1}^{p}\beta_j x_{ij}\right)\right]}$$

$$= \frac{\exp\left[\sum_{j=1}^{p}\left(\sum_i y_i x_{ij}\right)\beta_j\right]}{\sum_{S(t)}\exp\left[\sum_{j=1}^{p}\left(\sum_i y_i^* x_{ij}\right)\beta_j\right]}.$$

This does not depend on $\alpha$.

A conditional likelihood is used just like an ordinary likelihood. For the parameters in it, their conditional ML estimates are the values maximizing it. Calculated using iterative methods, the estimators are asymptotically normal with covariance matrix equal to the negative inverse of the matrix of second partial derivatives of the conditional log likelihood.

Small-Sample Conditional Inference for Logistic Regression

For small samples, inference for a parameter uses the conditional distribution after eliminating all other parameters. With it, one can calculate probabilities such as $P$-values exactly rather than with crude approximations (Cox). For instance, suppose that inference focuses on $\beta_p$ in the logistic model above. To eliminate other parameters, we condition on their sufficient statistics $T_j = \sum_i y_i x_{ij}$, $j = 0, \ldots, p - 1$ (where $x_{i0} = 1$). With an argument like that just shown, one obtains the conditional distribution

$$P(Y_1 = y_1, \ldots, Y_N = y_N \mid T_j = t_j,\ j = 0, \ldots, p - 1) = \frac{\exp\left[\left(\sum_i y_i x_{ip}\right)\beta_p\right]}{\sum_{S(t_0, \ldots, t_{p-1})}\exp\left[\left(\sum_i y_i^* x_{ip}\right)\beta_p\right]} = \frac{\exp(t_p\beta_p)}{\sum_{S(t_0, \ldots, t_{p-1})}\exp(t_p^*\beta_p)},$$

where

$$S(t_0, \ldots, t_{p-1}) = \left\{(y_1^*, \ldots, y_N^*): \sum_i y_i^* x_{ij} = t_j,\ j = 0, \ldots, p - 1\right\}.$$

This depends only on $\beta_p$.

Inference for $\beta_p$ uses the conditional distribution of its sufficient statistic, $T_p = \sum_i y_i x_{ip}$, given the others. Let $c(t_0, \ldots, t_{p-1}, t)$ denote the number of data vectors in $S(t_0, \ldots, t_{p-1})$ for which $T_p = t$. The conditional distribution of $T_p$ is

$$P(T_p = t \mid T_j = t_j,\ j = 0, \ldots, p - 1) = \frac{c(t_0, \ldots, t_{p-1}, t)\exp(t\beta_p)}{\sum_u c(t_0, \ldots, t_{p-1}, u)\exp(u\beta_p)},$$

where the denominator summation refers to the possible values $u$ of $T_p$.

For testing $H_0$: $\beta_p = 0$, the conditional distribution simplifies. For $H_a$: $\beta_p > 0$ and observed $T_p = t_{\text{obs}}$, the exact conditional $P$-value is

$$P(T_p \ge t_{\text{obs}} \mid T_j = t_j,\ j = 0, \ldots, p - 1) = \frac{\sum_{t \ge t_{\text{obs}}} c(t_0, \ldots, t_{p-1}, t)}{\sum_u c(t_0, \ldots, t_{p-1}, u)},$$

the proportion of data configurations in the conditional set that have the sufficient statistic for $\beta_p$ at least as large as observed.

Implementing this inference requires calculating $\{c(t_0, \ldots, t_{p-1}, u)\}$. For all but the simplest problems, computations are intensive and require specialized software (e.g., LogXact of Cytel Software or PROC LOGISTIC in SAS). In the remainder of this section we consider special cases for small-sample inference.

Small-Sample Conditional Inference for $2 \times 2$ Contingency Tables

First, consider logistic regression with a single predictor $x$,

$$\text{logit}[P(Y_i = 1)] = \alpha + \beta x_i, \quad i = 1, \ldots, N,$$

when $x_i$ takes only two values. The model applies to $2 \times 2$ tables, where $x_i = 1$ denotes row 1 and $x_i = 0$ denotes row 2. The sufficient statistic for
$\alpha$ is $\sum_i y_i$, which is the first column total. The sufficient statistic for $\beta$ is $T = \sum_i y_i x_i$, which simplifies to the number of successes in the first row. Equivalently, the sufficient statistics for the model are the numbers of successes in the two rows. Let $s_1$ and $s_2$ denote these binomial variates. The row totals $n_1$ and $n_2$ are their indices.

To eliminate $\alpha$, we condition on $s = s_1 + s_2$, the first column total. Since $N = n_1 + n_2$ is fixed, so then is the other column marginal total. Fixing both sets of marginal totals yields hypergeometric probabilities for $s_1$ that depend only on $\beta$ [see (3.20), identifying $\theta = \exp(\beta)$]. In that case the conditional distribution of the sufficient statistic given above applies with

$$c(t_0, t) = \binom{n_1}{t}\binom{N - n_1}{t_0 - t}$$

and with $t_0 = s$ and $t = s_1$. The resulting exact conditional test that $\beta = 0$ is Fisher's exact test for $2 \times 2$ tables.

Small-Sample Conditional Inference for Linear Logit Model

The linear logit model, $\text{logit}(\pi_i) = \alpha + \beta x_i$, applies to $I \times 2$ tables with ordered rows. We discussed this model earlier. For it, the data $\{y_i\}$ are $I$ independent $\text{bin}(n_i, \pi_i)$ counts, with fixed row totals $\{n_i\}$. Conditioning on $s = \sum_i y_i$ and hence the column totals yields a conditional likelihood free of $\alpha$. Exact inference about $\beta$ uses its sufficient statistic, $T = \sum_i x_i y_i$. From the general conditional distribution above, its distribution has the form

$$P\left(T = t \,\Big|\, \sum_i y_i = s;\ \beta\right) = \frac{c(s, t)e^{\beta t}}{\sum_u c(s, u)e^{\beta u}}.$$

Here, $c(s, u)$ equals the sum of $\prod_i \binom{n_i}{y_i}$ for all tables with the given marginal totals that have $T = u$.

When $\beta = 0$, the cell counts have the multiple hypergeometric distribution. To test this, ordering the tables with the given margins by $T$ is equivalent to ordering them by the Cochran–Armitage statistic. Thus, this test for the linear logit model is an exact trend test.

Earlier we applied the Cochran–Armitage test to Table 5.3 on maternal alcohol consumption and infant malformation. Even though $n$ is large, the table is highly unbalanced, with both very small and very large counts. It is safer to use small-sample methods. For the exact conditional trend test with the same scores and $H_a$: $\beta > 0$, the two-sided $P$-value is 0.017, reflecting asymmetry of the conditional distribution, given the marginal counts. This is not much different from the two-sided $P$-value obtained with the large-sample Cochran–Armitage test.
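For small tables the exact conditional trend test can be carried out by brute-force enumeration of the reference set. The sketch below illustrates the definition only; it is not the algorithm used by specialized software such as LogXact, the function name is hypothetical, and the data in the usage lines are made up for illustration. It computes the one-sided $P$-value for $H_a$: $\beta > 0$, weighting each table of success counts by $\prod_i \binom{n_i}{y_i}$.

```python
from itertools import product
from math import comb

def exact_trend_pvalue(scores, successes, totals):
    """One-sided exact conditional P-value, P(T >= t_obs | sum of successes).

    scores    -- row scores x_i
    successes -- observed success counts y_i
    totals    -- fixed row totals n_i
    """
    s = sum(successes)
    t_obs = sum(x * y for x, y in zip(scores, successes))
    numerator = denominator = 0
    # Enumerate every vector of success counts with the observed column total.
    for y in product(*(range(n + 1) for n in totals)):
        if sum(y) != s:
            continue
        weight = 1
        for n_i, y_i in zip(totals, y):
            weight *= comb(n_i, y_i)   # number of binary data vectors giving y
        denominator += weight
        if sum(x * v for x, v in zip(scores, y)) >= t_obs:
            numerator += weight
    return numerator / denominator

# Hypothetical 3 x 2 table with scores 0, 1, 2 and five subjects per row.
print(exact_trend_pvalue([0, 1, 2], [0, 2, 4], [5, 5, 5]))

# With two rows and scores (1, 0) this reduces to Fisher's exact test.
print(exact_trend_pvalue([1, 0], [3, 0], [3, 3]))
```

Enumeration grows quickly with the row totals, which is why practical implementations rely on network algorithms or other shortcuts instead.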
Small-Sample Tests of Conditional Independence in $2 \times 2 \times K$ Tables

For $2 \times 2 \times K$ tables $\{n_{ijk}\}$, the Cochran–Mantel–Haenszel test uses $\sum_k n_{11k}$. For logit model (6.4), this is the sufficient statistic for $\beta$, the effect of $X$. To conduct a small-sample test of $\beta = 0$, one needs to eliminate the other model parameters. Constructing the likelihood reveals that the sufficient statistics for $\{\beta_k^Z\}$ are the column marginal totals $\{n_{+jk}\}$ in each partial table. When $X$ and $Z$ are predictors, it is natural to treat the numbers of trials $\{n_{i+k}\}$ at each combination of $XZ$ values as fixed. Thus, exact inference about $\beta$ conditions on the row and column totals in each stratum.

Conditional on the strata margins, an exact test uses $\sum_k n_{11k}$. Hypergeometric probabilities occur in each partial table for the independent null distributions of $\{n_{11k},\ k = 1, \ldots, K\}$. The product of the $K$ mass functions gives the null joint distribution of $\{n_{11k},\ k = 1, \ldots, K\}$ [this is the product of noncentral hypergeometric functions given below, setting $\theta = 1$]. This determines the null distribution of $\sum_k n_{11k}$. For $H_a$: $\beta > 0$, the $P$-value is the null probability that $\sum_k n_{11k}$ is at least as large as observed, for the fixed strata marginal totals. Mehta et al. presented a fast algorithm. The test simplifies to Fisher's exact test when $K = 1$.

Promotion Discrimination Example

Table 6.15 refers to U.S. government computer specialists of similar seniority considered for promotion. The table cross-classifies promotion decision by employee's race, considered for three separate months. We test conditional independence of promotion decision and race, or $H_0$: $\beta = 0$, in model (6.4). The table contains several small counts. The overall sample size is not small ($n = 74$), but one marginal count (collapsing over month of decision) equals zero, so we might be wary of using the CMH test.

For $H_a$: $\beta < 0$ (i.e., odds ratio $\theta < 1$), the probability of promotion was lower for black employees than for white employees. For the margins of the partial tables in Table 6.15, $n_{111}$ can range between 0 and 4, $n_{112}$ can range between 0 and 4, and $n_{113}$ can range between 0 and 2.
TABLE 6.15 Promotion Decisions by Race and by Month
Columns: Race; July Promotions (Yes, No); August Promotions (Yes, No); September Promotions (Yes, No)
Rows: Black; White
Source: J. Gastwirth, Statistical Reasoning in Law and Public Policy (San Diego, CA: Academic Press, 1988), p. 66.

The total $\sum_k n_{11k}$ can range between 0 and 10. The sample data are the most extreme possible
result in each case. The observed $\sum_k n_{11k} = 0$, and the $P$-value is the null probability of this outcome, which software provides. A two-sided $P$-value can be based on summing the probabilities of all tables no more likely than the observed table.

Exact Conditional Estimation and Comparison of Odds Ratios

For model (6.4) of homogeneous association in $2 \times 2 \times K$ tables, the ordinary ML estimator of the odds ratio $\theta = \exp(\beta)$ behaves poorly for sparse-data asymptotics. The conditional ML estimator maximizes the conditional likelihood function after reducing the parameter space by conditioning on sufficient statistics for the other parameters (Andersen 1970; Birch 1964b).

For cell counts $\{n_{ijk}\}$, given $\{n_{i+k}, n_{+jk}\}$ for all $k$, the conditional probability mass function that $(n_{111} = t_1, \ldots, n_{11K} = t_K)$ is the product of the functions (3.20) from the separate strata, or

$$\prod_k P(n_{11k} = t_k \mid n_{1+k}, n_{+1k}, n_{++k};\ \theta) = \prod_k \frac{\binom{n_{1+k}}{t_k}\binom{n_{++k} - n_{1+k}}{n_{+1k} - t_k}\theta^{t_k}}{\sum_u \binom{n_{1+k}}{u}\binom{n_{++k} - n_{1+k}}{n_{+1k} - u}\theta^{u}}.$$

The conditional ML estimator $\hat{\theta}$ maximizes this function. Like the Mantel–Haenszel estimator $\hat{\theta}_{MH}$, it has good properties for both standard and sparse-data asymptotic cases (Andersen 1970; Breslow 1981), since the number of parameters does not change as $K$ does. It can be slightly more efficient than $\hat{\theta}_{MH}$, except when $\theta = 1.0$, where they are equally efficient, or for matched pairs, where they are identical (Breslow).

The conditional distribution propagates one for $\sum_k n_{11k}$, which is used to test $H_0$: $\theta = \theta_0$ for an arbitrary value. Then, a 95% confidence interval for $\theta$ consists of all $\theta_0$ for which the $P$-value exceeds 0.05. Such an interval is guaranteed to have at least the nominal coverage probability (Gart 1970; Kim and Agresti 1995; Mehta et al.). This extends the interval for a single $2 \times 2$ table.

For the promotion discrimination case (Table 6.15), $\sum_k n_{11k} = 0$, so the lower bound of any confidence interval for $\theta$ should be 0. For the generalization to several strata of Cornfield's tail-method interval, StatXact reports a 95% confidence interval with lower bound 0.

Zelen presented a small-sample test of homogeneity of the odds ratios. See Agresti (1992) for discussion of this and other small-sample methods for contingency tables.
TABLE 6.16 Example for Exact Conditional Logistic Regression
Columns: Cephalexin (a); Age (a); Length of Stay (a); Cases of Diarrhea; Sample Size
(a) See the text for an explanation of 0 and 1.
Source: Based on study by E. Jaffe and V. Chang, Cornell Medical Center, reported in the Manual for LogXact (Cambridge, MA: CYTEL Software, 1999).

Diarrhea Example

The final example deals with a larger number of variables. Table 6.16 refers to patients having stays in a hospital. The response is whether they suffered an acute form of diarrhea during their stay. The three predictors are age (1 for over 50 years old, 0 for under 50), length of stay in hospital (1 for more than 1 week, 0 for less than 1 week), and exposure to an antibiotic called Cephalexin (1 for yes, 0 for no). We discuss estimation of the effect of Cephalexin, controlling for age and length of stay, using a model containing only main-effect terms.

The sample size is large, yet relatively few cases of acute diarrhea occurred. Moreover, all subjects having exposure to Cephalexin were also diarrhea cases. Such boundary outcomes in which none or all responses fall in one category cause infinite ML estimates of some model parameters. An ML estimate of $\infty$ for the Cephalexin effect means that the likelihood function increases continually as the parameter estimate for Cephalexin increases indefinitely.

To study the Cephalexin effect, we use an exact distribution, conditioning on sufficient statistics for the other predictors. Although the estimate of the log-odds-ratio parameter for the effect of Cephalexin is infinite, it is possible to construct a confidence interval by inverting the family of tests for the parameter, using the conditional distribution. Doing this, a 95% confidence interval is $(19, \infty)$ for the odds ratio. Assuming that the main-effects model is valid, Cephalexin appears to have a strong effect. Similarly, the exact test rejects the hypothesis that the log odds ratio equals zero.

Results must be qualified somewhat because no Cephalexin cases occurred at the first three combinations of levels of age and length of stay. In fact, the first three rows of Table 6.16 make no contribution to the analysis. The data actually provide evidence about the effect of Cephalexin only for older subjects having a long stay.
174 NOTES Complcatons from Dscreteness Lke Fsher s exact test, exact condtonal nference for contngency tables s conservatve because of dscreteness. Ths s especally true when n s small or the data are unbalanced, wth most observatons fallng n a sngle column or row. Usng md-p-values or P-values based on a fner parttonng of the sample space Ž Note 3.9. n tests and related confdence ntervals reduces conservatveness. For the promoton dscrmnaton data Ž Table 6.15., we reported a 95% confdence nterval for the common odds rato of Ž 0, Invertng exact tests of H 0: s 0 wth the md-p-value yelds the nterval Ž 0, However, ths approach cannot guarantee that the actual coverage probablty s bounded below by A partcular problem occurs when no other set of y* 4 values has the same value of a gven suffcent statstc Ýyx j as the observed data. In that case the condtonal dstrbuton of the suffcent statstc for the parameter of nterest s degenerate. The P-value for the exact test then equals 1.0. Ths commonly happens when at least one explanatory varable x j whose effect s condtoned out for the nference s contnuous, wth unequally spaced observed values. Fnally, a lmtaton of the condtonal approach s requrng suffcent statstcs for the nusance parameters. Ths happens only wth GLMs that use the canoncal lnk. Thus, for nstance, the condtonal approach works for logt models but not probt models. NOTES Secton 6.1: Strateges n Model Selecton 6.1. A Bayesan argument motvates the Bayesan nformaton crteron BIC s G y Ž log n.ž df.x, an alternatve to AIC. It takes sample sze nto account. Compared to AIC, BIC gravtates less quckly toward more complex models as n ncreases. For detals and crtques, see Raftery Ž and the February 1999 ssue of Socologcal Methods and Research. 6.. Tree-structured methods such as CART are alternatves to logstc regresson that formalze a decson process usng a sequental set of questons that branch n dfferent drectons dependng on a subject s responses. 
An example is deciding whether a subject with chest pains may be suffering a heart attack. Zhang et al. surveyed such methods.

Section 6.2: Logistic Regression Diagnostics

6.3. For logistic regression diagnostics, see Copas (1988), Fowlkes (1987), Hosmer and Lemeshow (2000, Chap. 5), Johnson (1985), Landwehr et al. (1984), and Pregibon. Separate diagnostics are useful for checking the adequacy of each component of a GLM (McCullagh and Nelder 1989, Chap. 12). For a family $g(\cdot\,;\cdot)$ of link functions indexed
by a parameter, Pregibon showed how to estimate the parameter value giving the link with best fit and how to check the adequacy of a given link.

6.4. Amemiya (1981), Efron (1978), Maddala (1983), and Zheng and Agresti (2000) and references therein reviewed R measures for binary regression. Hosmer and Lemeshow (2000) discussed classification tables and their limitations. Pepe (2000) and references therein surveyed ROC methodology.

Section 6.3: Inference about Conditional Associations in 2 × 2 × K Tables

6.5. Analogs of $\hat\theta_{MH}$ summarize differences of proportions or relative risks from several strata (Greenland and Robins). Breslow and Day (1980, p. 14) proposed an alternative large-sample test of homogeneity of odds ratios. In each partial table let $\{\hat\mu_{ijk}\}$ have the same marginals as the data observed, yet have odds ratio equal to $\hat\theta_{MH}$. Their test statistic has the Pearson form comparing $\{n_{ijk}\}$ to $\{\hat\mu_{ijk}\}$. Tarone showed that because of the inefficiency of $\hat\theta_{MH}$ one must adjust the Breslow-Day statistic for it to have a limiting chi-squared null distribution with df = K − 1. This adjustment is usually minor. Jones et al. reviewed and compared several tests of homogeneity in sparse and nonsparse settings. Other work on comparing odds ratios and estimating a common value includes Breslow and Day (1980, Sec. 4.4), Donner and Hauck (1986), Gart (1970), and Liang and Self. For modeling the odds ratio, see Breslow (1976), Breslow and Day (1980, Sec. 7.5), and Prentice (1976a). Breslow emphasized retrospective studies, in which the conditional approach is natural since the outcome totals are fixed.

Section 6.5: Sample Size and Power Considerations

6.6. For sample-size determination for comparing proportions, Fleiss (1981, Sec. 3.2) provided tables. See Lachin for the I × J case. Chapman and Meng (1966), Drost et al.
(1989), Haberman (1974a), Harkness and Katz (1964), Mitra (1958), and Patnaik derived theory for asymptotic nonnull behavior of chi-squared statistics. O'Brien's simulation results suggested that the noncentral chi-squared approximation for $G^2$ holds well for a wide range of powers. Read and Cressie (1988) listed other articles that studied the nonnull behavior of $X^2$ and $G^2$.

Section 6.6: Probit and Complementary Log-Log Models

6.7. Finney is the standard reference on probit modeling. Chambers and Cox showed that it is difficult to distinguish between probit and logit models unless n is extremely large. Ashford and Sowden generalized the probit model for multivariate binary responses; see also Lesaffre and Molenberghs and Ochi and Prentice. Wedderburn showed that the log likelihood is concave for probit and complementary log-log links.

Section 6.7: Conditional Logistic Regression

6.8. For details about conditional logistic regression, see Section 10.2, Breslow and Day (1980, Chap. 7), Cox (1970), and Hosmer and Lemeshow (2000, Chap. 5). Liang showed that conditional ML estimators and conditional score tests are asymptotically equivalent to their unconditional counterparts under sampling from exponential families. For exact inference using the conditional likelihood, see Hirji et al. (1987), Mehta and Patel (1995), and the LogXact manual (Cytel Software). Mehta et al. (2000) discussed Monte Carlo approximations.
PROBLEMS

Applications

6.1 For the horseshoe crab data, fit a model using weight and width as predictors. Conduct (a) a likelihood-ratio test of $H_0\colon \beta_1 = \beta_2 = 0$, and (b) separate tests for the partial effects. Why does neither test in part (b) show evidence of an effect when the test in part (a) shows strong evidence?

6.2 Refer to the data for Problem. Treating opinion about premarital sex as the response variable, use backward elimination to select a model. Interpret.

6.3 Refer to Table 6.4. Fit the stage 3 model denoted there by E*P + G. Use parameter estimates to interpret the G effect and the dependence of the E effect on P.

6.4 Discern the reasons that Simpson's paradox occurs for Table.

6.5 Refer to Problem .1.
a. Fit the model with G and D main effects. Using it, estimate the AG conditional odds ratio. Compare to the marginal odds ratio, and explain why they are so different. Test its goodness of fit.
b. Fit the model of no G effect, given the department. Use $X^2$ to test fit. Obtain residuals, and interpret the lack of fit. (Each department has a single nonredundant standardized Pearson residual. They satisfy $\sum_{i=1}^{6} r_i^2 = X^2$, their squares giving six df = 1 components.)
c. Fit the two models excluding department A. Again consider lack of fit, and interpret.

6.6 Conduct a residual analysis for the independence model with Table. What type of lack of fit is indicated?

6.7 Table 6.17 refers to the effectiveness of immediately injected or 1½-hour-delayed penicillin in protecting rabbits against lethal injection with β-hemolytic streptococci.
a. Let X = delay, Y = whether cured, and Z = penicillin level. Fit the logit model. Argue that the pattern of 0 cell counts suggests that (with no intercept) $\hat\beta_1^Z = -\infty$ and $\hat\beta_5^Z = \infty$. What does your software report?
b. Using the logit model, conduct the likelihood-ratio test of XY conditional independence. Interpret.
TABLE 6.17 Data for Problem 6.7
Columns: Penicillin Level; Delay (None, 1½ h); Response (Cured, Died).
Source: Reprinted with permission from Mantel.

c. Test XY conditional independence using the Cochran-Mantel-Haenszel test. Interpret.
d. Estimate the XY conditional odds ratio using (i) ML with the logit model, and (ii) the Mantel-Haenszel estimate. Interpret.
e. The small cell counts make large-sample analyses questionable. Conduct small-sample inference, and interpret.

6.8 Refer to Table 2.6. Use the CMH statistic to test independence of death penalty verdict and victim's race, controlling for defendant's race. Show another test of this hypothesis, and compare results.

6.9 Treatments A and B were compared on a binary response for 40 pairs of subjects matched on relevant covariates. For each pair, treatments were assigned to the subjects randomly. Twenty pairs of subjects made the same response for each treatment. Six pairs had a success for the subject receiving A and a failure for the subject receiving B, whereas the other 14 pairs had a success for B and a failure for A. Use the Cochran-Mantel-Haenszel procedure to test independence of response and treatment. (In Section 10.1 we present an equivalent test, McNemar's test.)

6.10 Refer to Section. Suppose that $\pi_1 = 0.7$ and $\pi_2 = 0.6$. What sample size is needed for the test to have approximate power 0.80, when $\alpha = 0.05$, for (a) $H_a\colon \pi_1 \ne \pi_2$, and (b) $H_a\colon \pi_1 > \pi_2$?
6.11 Refer to Section. Suppose that $\pi_1 = 0.63$ and $\pi_2 = 0.57$. When treatment sample sizes are equal, explain why the joint probabilities in the table are 0.315 and 0.185 in the row for treatment A and 0.285 and 0.215 in the row for treatment B. For the model of independence, explain why the fitted joint probabilities are 0.30 for success and 0.20 for failure, in each row. Show that $X^2$ has a noncentrality parameter proportional to n and df = 1. For n = 200 and $\alpha = 0.05$, find the power.

6.12 In an experiment designed to compare two treatments on a three-category response, a researcher expects the conditional distributions to be approximately (0.2, 0.2, 0.6) and (0.3, 0.3, 0.4).
a. With $\alpha = 0.05$, find the approximate power using (i) $X^2$, and (ii) $G^2$ to compare the distributions with 100 observations for each treatment. Compare results.
b. What sample size is needed for each treatment for the tests in part (a) to have approximate power 0.90?

6.13 The horseshoe crab width values in Table 4.3 have $\bar{x} = 26.3$ and $s_x = 2.1$. If the true relationship were similar to the fitted equation in Section 5.1.3, about how large a sample yields P(type II error) = 0.10, with $\alpha = 0.05$, for testing $H_0\colon \beta = 0$ against $H_a\colon \beta > 0$?

6.14 Refer to Problem 5.1. Table 6.18 shows output for fitting a probit model. Interpret the parameter estimates (a) using characteristics of the normal cdf response curve, (b) finding the estimated rate of change in the probability of remission where it equals 0.5, and (c) finding the difference between the estimated probabilities of remission at the upper and lower quartiles of the labeling index, 14 and 28.

TABLE 6.18 Data for Problem 6.14
Columns: Parameter; Estimate; Standard Error; Likelihood Ratio 95% Confidence Limits; Chi-Square; Pr > ChiSq. Rows: Intercept; LI.

6.15 Use probit models to describe the effects of width and color on the probability of a satellite for Table 4.3. Interpret.

6.16 Refer to Table. Fit the model having log-log link rather than complementary log-log. Test the fit. Why does it fit so poorly?
6.17 For the linear logit model with Table 3.9 and scores (0, 15, 30), conduct the exact test of $H_0\colon \beta = 0$ and find a point and interval estimate of $\beta$ using the conditional likelihood. Interpret.

6.18 Refer to Table. Apply conditional logistic regression to the model discussed in Section.
a. Obtain an exact P-value for testing no C effect against the alternative of a positive effect. Construct a 95% confidence interval for the conditional CD odds ratio.
b. Construct the partial tables relating C to D for the combinations of levels of (A, L). Note that three tables have no data when C = 1. For the sole partial table having data at both C levels, find a 95% exact confidence interval for the odds ratio and find an exact one-sided P-value. Compare to results using the entire data set. Comment about the contribution to inference of tables having only a single positive row total or a single positive column total.
c. Obtain the ordinary ML fit of the logistic regression model. To investigate the sensitivity of the estimated C effect, find the change in the estimate and SE after adding one observation to the data set, a case with no diarrhea when (C, A, L) = (1, 1, ).

6.19 Consider Table 6.19, from a study of nonmetastatic osteosarcoma (A. M. Goorin, J. Clin. Oncol. 5, 1987, and the manual for LogXact). The response is whether the subject achieved a three-year disease-free interval.
a. Show that each predictor has a significant effect when used individually without the others.
b. Try to fit a main-effects logistic regression model containing all three predictors. Explain why the ML estimate for the effect of lymphocytic infiltration is infinite.

TABLE 6.19 Data for Problem 6.19

Lymphocytic              Osteoblastic    Disease-Free
Infiltration   Gender    Pathology       Yes    No
High           Female    No               3      0
High           Female    Yes              2      0
High           Male      No               4      0
High           Male      Yes              1      0
Low            Female    No               5      0
Low            Female    Yes              3      2
Low            Male      No               5      4
Low            Male      Yes              6     11

Source: LogXact 4 for Windows, Cambridge, MA: CYTEL Software, 1999.
c. Using conditional logistic regression, (i) conduct an exact test for the effect of lymphocytic infiltration, controlling for the other variables; and (ii) find a 95% confidence interval for the effect. Interpret results.

6.20 Use the methods discussed in this chapter to select a model for Table.

6.21 Logistic regression is applied increasingly to large financial databases, such as for credit scoring to model the influence of predictors on whether a consumer is creditworthy. The data archive found under the index at www.stat.uni-muenchen.de contains such a data set that includes 20 covariates for 1000 observations. Build a model for creditworthiness using the predictors running account, duration of credit, payment of previous credits, intended use, gender, and marital status.

Theory and Methods

6.22 For a sequence of s nested models $M_1, \ldots, M_s$, model $M_s$ is the most complex. Let $\nu$ denote the difference in residual df between $M_1$ and $M_s$.
a. Explain why for $j < k$, $G^2(M_j \mid M_k) \le G^2(M_j \mid M_s)$.
b. Assume model $M_j$, so that $M_k$ also holds when $k > j$. For all $k > j$, as $n \to \infty$, $P[G^2(M_j \mid M_k) > \chi^2_\nu(\alpha)] \le \alpha$. Explain why.
c. Gabriel suggested a simultaneous testing procedure in which, for each pair of models, the critical value for differences between $G^2$ values is $\chi^2_\nu(\alpha)$. The final model accepted must be more complex than any model rejected in a pairwise comparison. Since part (b) is true for all $j < k$, argue that Gabriel's procedure has type I error probability no greater than $\alpha$.

6.23 Prove that the Pearson residuals for the linear logit model applied to an I × 2 contingency table satisfy $X^2 = \sum_{i=1}^{I} e_i^2$. Note that this holds for a binomial GLM with any link.

6.24 Refer to logit model (6.4) for a 2 × 2 × K contingency table $\{n_{ijk}\}$.
a. Using dummy variables, write the log-likelihood function. Identify the sufficient statistics for the various parameters. Explain how to conduct exact conditional inference about the effect of X, controlling for Z.
b.
Using a basic result for testing in exponential families, explain why uniformly most powerful unbiased tests of conditional XY independence are based on $\sum_k n_{11k}$ (Birch 1964b; Lehmann 1986).
6.25 Suppose that $\{\pi_{ijk}\}$ in a 2 × 2 × 2 table are, by row, (0.15, 0.10 / 0.10, 0.15) when Z = 1 and (0.10, 0.15 / 0.15, 0.10) when Z = 2. For testing conditional XY independence with logit models having Y as a response, explain why the likelihood-ratio test comparing models X + Z and Z is not consistent but the likelihood-ratio test of fit of the XY conditional independence model is.

6.26 Refer to Section. When $Y$ is $N(\mu_i, \sigma^2)$, consider the comparison of $(\mu_1, \ldots, \mu_I)$ based on independent samples at the I categories of X. When approximately $\mu_i = \alpha + \beta x_i$, explain why the t or F test of $H_0\colon \beta = 0$ is more powerful than the one-way ANOVA F test. Describe a pattern for $\{\mu_i\}$ for which the ANOVA test would be more powerful.

6.27 For a multinomial distribution, let $\gamma = \sum_i b_i \pi_i$, and suppose that $\pi_i = f_i(\theta) > 0$, $i = 1, \ldots, I$. For sample proportions $\{p_i\}$, let $S = \sum_i b_i p_i$. Let $T = \sum_i b_i \hat\pi_i$, where $\hat\pi_i = f_i(\hat\theta)$, for the ML estimator $\hat\theta$ of $\theta$.
a. Show that $\mathrm{var}(S) = [\sum_i b_i^2 \pi_i - (\sum_i b_i \pi_i)^2]/n$.
b. Using the delta method, show $\mathrm{var}(T) \approx [\mathrm{var}(\hat\theta)][\sum_i b_i f_i'(\theta)]^2$.
c. By computing the information for $L(\theta) = \sum_i n_i \log[f_i(\theta)]$, show that $\mathrm{var}(\hat\theta)$ is approximately $[n \sum_i (f_i'(\theta))^2 / f_i(\theta)]^{-1}$.
d. Asymptotically, show that $\mathrm{var}[\sqrt{n}(T - \gamma)] \le \mathrm{var}[\sqrt{n}(S - \gamma)]$. [Hint: Show that $\mathrm{var}(T)/\mathrm{var}(S)$ is a squared correlation between two random variables, where with probability $\pi_i$ the first equals $b_i$ and the second equals $f_i'(\theta)/f_i(\theta)$.]

6.28 A threshold model can also motivate the probit model. For it, there is an unobserved continuous response $Y^*$ such that the observed $y_i = 0$ if $y_i^* \le \tau$ and $y_i = 1$ if $y_i^* > \tau$. Suppose that $y_i^* = \mu_i + \epsilon_i$, where $\mu_i = \alpha + \beta x_i$ and where $\{\epsilon_i\}$ are independent from a $N(0, \sigma^2)$ distribution. For identifiability one can set $\sigma = 1$ and the threshold $\tau = 0$. Show that the probit model holds and explain why $\beta$ represents the expected number of standard deviation change in $Y^*$ for a 1-unit increase in x.

6.29 Consider the choice between two options, such as two product brands. Let $U_0$ denote the utility of outcome y = 0 and $U_1$ the utility of y = 1. For y = 0 and 1, suppose that $U_y = \alpha_y + \beta_y x + \epsilon_y$, using a scale such that $\epsilon_y$ has some standardized distribution.
A subject selects y = 1 if $U_1 > U_0$ for that subject.
a. If $\epsilon_0$ and $\epsilon_1$ are independent N(0, 1) random variables, show that $P(Y = 1)$ satisfies the probit model.
b. If $\epsilon_y$ are independent extreme-value random variables, with cdf $F(\epsilon) = \exp[-\exp(-\epsilon)]$, show that $P(Y = 1)$ satisfies the logistic regression model (Maddala 1983, p. 60; McFadden).
6.30 Consider model (6.12) with complementary log-log link.
a. Find x at which $\pi(x) = \tfrac{1}{2}$.
b. Show the greatest rate of change of $\pi(x)$ occurs at $x = -\alpha/\beta$. What does $\pi(x)$ equal at that point? Give the corresponding result for the model with log-log link, and compare to the logit and probit models.

6.31 Suppose that log-log model (6.13) holds. Explain how to interpret $\beta$.

6.32 Let $y_i$, $i = 1, \ldots, n$, denote n independent binary random variables.
a. Derive the log likelihood for the probit model $\Phi^{-1}[\pi(x_i)] = \sum_j \beta_j x_{ij}$.
b. Show that the likelihood equations for the logistic and probit regression models are $\sum_i (y_i - \hat\pi_i) z_i x_{ij} = 0$, $j = 0, \ldots, p$, where $z_i = 1$ for the logistic case and $z_i = \phi(\sum_j \hat\beta_j x_{ij}) / [\hat\pi_i (1 - \hat\pi_i)]$ for the probit case. (When the link is not canonical, there is no reduction of the data in sufficient statistics.)

6.33 Sometimes, sample proportions are continuous rather than of the binomial form (number of successes)/(number of trials). Each observation is any real number between 0 and 1, such as the proportion of a tooth surface that is covered with plaque. For independent responses $\{y_i\}$, Aitchison and Shen and Bartlett modeled $\mathrm{logit}(Y_i) \sim N(\nu_i, \sigma^2)$. Then $Y_i$ itself is said to have a logistic-normal distribution.
a. Expressing a $N(\nu, \sigma^2)$ variate as $\nu + \sigma Z$, where Z is standard normal, show that $Y = \exp(\nu + \sigma Z)/[1 + \exp(\nu + \sigma Z)]$.
b. Show that for small $\sigma$,
$$Y = \frac{e^{\nu}}{1 + e^{\nu}} + \frac{e^{\nu}}{1 + e^{\nu}} \cdot \frac{1}{1 + e^{\nu}}\, \sigma Z + \frac{e^{\nu}(1 - e^{\nu})}{2(1 + e^{\nu})^{3}}\, \sigma^2 Z^2 + \cdots.$$
c. Letting $\mu = e^{\nu}/(1 + e^{\nu})$, when $\sigma$ is close to 0 show that $E(Y) \approx \mu$, $\mathrm{var}(Y) \approx [\mu(1 - \mu)]^2 \sigma^2$.
d. For independent continuous proportions $\{y_i\}$, let $\mu_i = E(Y_i)$. For a GLM, it is sensible to use an inverse cdf link for $\mu_i$, but it is unclear how to choose a distribution for $Y_i$. The approximate moments for the logistic-normal motivate a quasi-likelihood approach (Wedderburn) with variance function $v(\mu_i) = \phi[\mu_i(1 - \mu_i)]^2$ for unknown $\phi$. Explain why this provides similar results as fitting a normal regression model to the sample logits assuming constant variance. (The QL approach has the advantage of not requiring adjustment of 0 or 1 observations, for which sample logits don't exist.)
e. Wedderburn gave an example with response the proportion of a leaf showing a type of blotch. Envision an approximation of binomial form based on cutting each leaf into a large number of small regions of the same size and observing for each region whether it is mostly covered with blotch. Explain why this suggests that $v(\mu) = \phi\mu(1 - \mu)$. What violation of the binomial assumptions might make this questionable? [The parametric family of beta distributions has variance function of this form. Barndorff-Nielsen and Jorgensen proposed a distribution having $v(\mu) = \phi[\mu(1 - \mu)]^3$; see also Cox.]

6.34 For independent binomial sampling, construct the log likelihood and identify the sufficient statistics to be conditioned out to perform exact inference about $\beta$ in the model.

6.35 Let $\hat{\boldsymbol\pi}^{(-i)} = (\hat\pi_1^{(-1)}, \ldots, \hat\pi_n^{(-n)})$, where $\hat\pi_i^{(-i)}$ denotes the estimate of $E(Y_i)$ for binary observation i after fitting the model without that observation. Cross-validation declares a model to have good predictive power if $\mathrm{corr}(\hat{\boldsymbol\pi}^{(-i)}, \mathbf{y})$ is high. Consider the model $\mathrm{logit}(\pi_i) = \alpha$ for all i. Show that $\hat\pi_i = \bar{y}$ and hence $\hat\pi_i^{(-i)} = [n/(n - 1)][\bar{y} - (1/n) y_i]$, and hence $\mathrm{corr}(\hat{\boldsymbol\pi}^{(-i)}, \mathbf{y}) = -1$ regardless of how well the model fits. Thus, cross-validation can be misleading with binary data (Zheng and Agresti 2000).
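The claim in Problem 6.35 can be checked numerically. The following is a minimal sketch, assuming a small hypothetical binary sample `y` (not data from the text) and two illustrative helper functions, `loo_intercept_only` and `corr`. For the intercept-only logit model the ML fitted probability is the sample proportion, so the leave-one-out fit is the proportion among the other n − 1 observations.

```python
from math import sqrt

def loo_intercept_only(y):
    """Leave-one-out fitted probabilities for logit(pi_i) = alpha.

    The ML fit of the intercept-only model is the sample proportion, so
    dropping observation i gives (sum(y) - y_i) / (n - 1).
    """
    n, total = len(y), sum(y)
    return [(total - yi) / (n - 1) for yi in y]

def corr(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    return s_uv / sqrt(s_uu * s_vv)

# Hypothetical binary responses (must contain both 0s and 1s).
y = [1, 0, 0, 1, 1, 0, 1, 1]
pi_loo = loo_intercept_only(y)
r = corr(pi_loo, y)
```

Because each leave-one-out estimate is a decreasing linear function of the omitted response, the correlation is −1 up to floating-point error for any sample containing both outcomes.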
Categorical Data Analysis, Second Edition. Alan Agresti. Copyright 2002 John Wiley & Sons, Inc.

CHAPTER 8

Loglinear Models for Contingency Tables

In Section 4.3 we introduced loglinear models as generalized linear models (GLMs) using the log link function with a Poisson response. A common use is modeling cell counts in contingency tables. The models specify how the expected count depends on levels of the categorical variables for that cell as well as associations and interactions among those variables. The purpose of loglinear modeling is the analysis of association and interaction patterns.

In Section 8.1 we introduce loglinear models for two-way contingency tables. In Sections 8.2 and 8.3 we extend them to three-way tables, and in Section 8.4 discuss models for multiway tables. Loglinear models are of use primarily when at least two variables are response variables. With a single categorical response, it is simpler and more natural to use logit models. When one variable is treated as a response and the others as explanatory variables, logit models for that response variable are equivalent to certain loglinear models. Section 8.5 covers this connection. In Sections 8.6 and 8.7 we discuss ML loglinear model fitting.

8.1 LOGLINEAR MODELS FOR TWO-WAY TABLES

Consider an I × J contingency table that cross-classifies a multinomial sample of n subjects on two categorical responses. The cell probabilities are $\{\pi_{ij}\}$ and the expected frequencies are $\{\mu_{ij} = n\pi_{ij}\}$. Loglinear model formulas use $\{\mu_{ij}\}$ rather than $\{\pi_{ij}\}$, so they also apply with Poisson sampling for N = IJ independent cell counts $\{Y_{ij}\}$ having $\{\mu_{ij} = E(Y_{ij})\}$. In either case we denote the observed cell counts by $\{n_{ij}\}$.

Independence Model

Under statistical independence, we noted earlier that the $\{\mu_{ij}\}$ have the structure $\mu_{ij} = \mu\,\alpha_i \beta_j$.
For multinomial sampling, for instance, $\mu_{ij} = n\pi_{i+}\pi_{+j}$. Denote the row variable by X and the column variable by Y. The formula expressing independence is multiplicative. Thus, $\log \mu_{ij}$ has additive form

$$\log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y \tag{8.1}$$

for a row effect $\lambda_i^X$ and a column effect $\lambda_j^Y$. This is the loglinear model of independence. As usual, identifiability requires constraints such as $\lambda_I^X = \lambda_J^Y = 0$. The ML fitted values are $\{\hat\mu_{ij} = n_{i+}n_{+j}/n\}$, the estimated expected frequencies for chi-squared tests of independence. The tests using $X^2$ and $G^2$ are also goodness-of-fit tests of this loglinear model.

Interpretation of Parameters

Loglinear models for contingency tables are GLMs that treat the N cell counts as independent observations of a Poisson random component. Loglinear GLMs identify the data as the N cell counts rather than the individual classifications of the n subjects. The expected cell counts link to the explanatory terms using the log link. As (8.1) illustrates, of the cross-classified variables, the model does not distinguish between response and explanatory variables. It treats both jointly as responses, modeling $\{\mu_{ij}\}$ for combinations of their levels.

To interpret parameters, however, it is helpful to treat the variables asymmetrically. We illustrate with the independence model for I × 2 tables. In row i, the logit equals

$$\begin{aligned}
\mathrm{logit}[P(Y = 1 \mid X = i)] &= \log\frac{P(Y = 1 \mid X = i)}{P(Y = 2 \mid X = i)} = \log\frac{\mu_{i1}}{\mu_{i2}} = \log\mu_{i1} - \log\mu_{i2} \\
&= (\lambda + \lambda_i^X + \lambda_1^Y) - (\lambda + \lambda_i^X + \lambda_2^Y) = \lambda_1^Y - \lambda_2^Y.
\end{aligned}$$

The final term does not depend on i; that is, $\mathrm{logit}[P(Y = 1 \mid X = i)]$ is identical at each level of X. Thus, independence implies a model of form $\mathrm{logit}[P(Y = 1 \mid X = i)] = \alpha$. In each row, the odds of response in column 1 equal $\exp(\alpha) = \exp(\lambda_1^Y - \lambda_2^Y)$. An analogous property holds when J > 2. Differences between two parameters for a given variable relate to the log odds of making one response, relative to the other, on that variable. Of course, with a single response variable, logit models apply directly and loglinear models are unneeded.
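As a rough numerical illustration of the independence model, the sketch below computes the ML fitted values, the Pearson and likelihood-ratio statistics, and the row logits of the fitted table. The 3 × 2 counts and the function names (`independence_fit`, `pearson_x2`, `deviance_g2`) are hypothetical and chosen only for illustration; they are not from the text.

```python
from math import log

def independence_fit(counts):
    """ML fitted values under independence: mu_hat[i][j] = n_i+ * n_+j / n."""
    n = sum(sum(row) for row in counts)
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    return [[r * c / n for c in col_totals] for r in row_totals]

def pearson_x2(counts, fitted):
    """Pearson statistic: sum of (observed - fitted)^2 / fitted over cells."""
    return sum((o - e) ** 2 / e
               for obs_row, fit_row in zip(counts, fitted)
               for o, e in zip(obs_row, fit_row))

def deviance_g2(counts, fitted):
    """Likelihood-ratio statistic: 2 * sum of observed * log(observed / fitted)."""
    return 2 * sum(o * log(o / e)
                   for obs_row, fit_row in zip(counts, fitted)
                   for o, e in zip(obs_row, fit_row) if o > 0)

# Hypothetical 3 x 2 table of counts (rows = levels of X, columns = levels of Y).
counts = [[30, 20], [10, 40], [25, 25]]
fitted = independence_fit(counts)

# Under independence the fitted logit log(mu_i1 / mu_i2) is the same in every row.
row_logits = [log(row[0] / row[1]) for row in fitted]

x2 = pearson_x2(counts, fitted)
g2 = deviance_g2(counts, fitted)
df = (len(counts) - 1) * (len(counts[0]) - 1)
```

Each entry of `row_logits` equals the log of the ratio of the column totals, which is the constant logit implied by the independence model.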
Saturated Model

Statistically dependent variables satisfy a more complex loglinear model,

$$\log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_{ij}^{XY}. \tag{8.2}$$

The $\{\lambda_{ij}^{XY}\}$ are association terms that reflect deviations from independence. The right-hand side of (8.2) resembles the formula for cell means in two-way ANOVA, allowing interaction. The $\{\lambda_{ij}^{XY}\}$ represent interactions between X and Y, whereby the effect of one variable on $\mu_{ij}$ depends on the level of the other. The independence model (8.1) results when all $\lambda_{ij}^{XY} = 0$.

With constraints $\lambda_I^X = \lambda_J^Y = 0$ in (8.1) and (8.2), $\{\lambda_i^X\}$ and $\{\lambda_j^Y\}$ are, equivalently, coefficients of dummy variables for the first (I − 1) categories of X and the first (J − 1) categories of Y. Thus, $\lambda_{ij}^{XY}$ is the coefficient of the product of dummy variables for $\lambda_i^X$ and $\lambda_j^Y$. Since there are (I − 1)(J − 1) such cross products, $\lambda_{Ij}^{XY} = \lambda_{iJ}^{XY} = 0$, and only (I − 1)(J − 1) of these parameters are nonredundant. Tests of independence analyze whether these (I − 1)(J − 1) parameters equal zero, so they have residual df = (I − 1)(J − 1).

The number of parameters in model (8.2) equals 1 + (I − 1) + (J − 1) + (I − 1)(J − 1) = IJ, the number of cells. Hence, this model describes perfectly any $\{\mu_{ij} > 0\}$ (see the Problems). It is the most general model for two-way contingency tables, the saturated model. For it, direct relationships exist between log odds ratios and $\{\lambda_{ij}^{XY}\}$. For instance, for 2 × 2 tables,

$$\begin{aligned}
\log\theta &= \log\frac{\mu_{11}\mu_{22}}{\mu_{12}\mu_{21}} = \log\mu_{11} + \log\mu_{22} - \log\mu_{12} - \log\mu_{21} \\
&= (\lambda + \lambda_1^X + \lambda_1^Y + \lambda_{11}^{XY}) + (\lambda + \lambda_2^X + \lambda_2^Y + \lambda_{22}^{XY}) \\
&\quad - (\lambda + \lambda_1^X + \lambda_2^Y + \lambda_{12}^{XY}) - (\lambda + \lambda_2^X + \lambda_1^Y + \lambda_{21}^{XY}) \\
&= \lambda_{11}^{XY} + \lambda_{22}^{XY} - \lambda_{12}^{XY} - \lambda_{21}^{XY}.
\end{aligned} \tag{8.3}$$

Thus, $\{\lambda_{ij}^{XY}\}$ determine the association.

In practice, unsaturated models are preferable, since their fit smooths the sample data and has simpler interpretations. For tables with at least three variables, unsaturated models can include association terms. Then, loglinear models are more commonly used to describe associations (through two-factor terms) than to describe odds (through single-factor terms).

Like others in this book, model (8.2) is hierarchical.
This means that the model includes all lower-order terms composed from variables contained in a higher-order model term. When the model contains $\lambda_{ij}^{XY}$, it also contains $\lambda_i^X$ and $\lambda_j^Y$. A reason for including lower-order terms is that, otherwise, the statistical significance and the interpretation of a higher-order term depend on how variables are coded. This is undesirable, and with hierarchical models the same results occur no matter how variables are coded.
An example of a nonhierarchical model is

$$\log \mu_{ij} = \lambda + \lambda_i^X + \lambda_{ij}^{XY}.$$

This model permits association but forces unnatural behavior of expected frequencies, with the pattern depending on constraints used for parameters. For instance, with constraints whereby parameters are zero at the last level, $\log \mu_{Ij} = \lambda$ in every column. Nonhierarchical models are rarely sensible in practice. Using them is analogous to using ANOVA or regression models with interaction terms but without the corresponding main effects.

When a model has two-factor terms, interpretations focus on them rather than on the single-factor terms. By analogy with two-way ANOVA with two-factor interaction, it can be misleading to report main effects. The estimates of the main-effect terms depend on the coding scheme used for the higher-order effects, and the interpretation also depends on that scheme (see the Problems). Normally, we restrict our attention to the highest-order terms for a variable, as we illustrate later.

Alternative Parameter Constraints

As with the independence model, the parameter constraints for the saturated model are arbitrary. Instead of setting all $\lambda_{Ij}^{XY} = \lambda_{iJ}^{XY} = 0$, one could set $\sum_i \lambda_{ij}^{XY} = \sum_j \lambda_{ij}^{XY} = 0$ for all i and j. Different software uses different constraints. What is unique are contrasts such as $\lambda_{11}^{XY} + \lambda_{22}^{XY} - \lambda_{12}^{XY} - \lambda_{21}^{XY}$ in (8.3) that determine odds ratios. For instance, suppose that a log odds ratio equals 2.0 in a 2 × 2 table. With the first set of constraints, 2.0 is the coefficient of a product of a dummy variable indicating the first category of X and a dummy variable indicating the first category of Y. With it, $\lambda_{11}^{XY} = 2.0$ and $\lambda_{12}^{XY} = \lambda_{21}^{XY} = \lambda_{22}^{XY} = 0$. For sum-to-zero constraints, $\lambda_{11}^{XY} = \lambda_{22}^{XY} = 0.5$, $\lambda_{12}^{XY} = \lambda_{21}^{XY} = -0.5$. For either set, the log odds ratio (8.3) equals 2.0.
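The invariance of the contrast in (8.3) to the choice of constraints can be verified directly. The sketch below is an illustration under stated assumptions: the table of expected counts `mu` is hypothetical (built so that its log odds ratio is 2.0), and the helper names `baseline_xy`, `sum_to_zero_xy`, and `contrast` are not from the text. It solves for the saturated-model association terms under each constraint set and compares the resulting contrasts.

```python
from math import exp, log

def baseline_xy(mu):
    """Association terms with parameters set to 0 at the last row and column."""
    I, J = len(mu), len(mu[0])
    L = [[log(v) for v in row] for row in mu]
    return [[L[i][j] - L[i][J - 1] - L[I - 1][j] + L[I - 1][J - 1]
             for j in range(J)] for i in range(I)]

def sum_to_zero_xy(mu):
    """Association terms constrained to sum to 0 over each row and column."""
    I, J = len(mu), len(mu[0])
    L = [[log(v) for v in row] for row in mu]
    row_mean = [sum(L[i]) / J for i in range(I)]
    col_mean = [sum(L[i][j] for i in range(I)) / I for j in range(J)]
    grand = sum(row_mean) / I
    return [[L[i][j] - row_mean[i] - col_mean[j] + grand
             for j in range(J)] for i in range(I)]

def contrast(lam):
    """lambda_11 + lambda_22 - lambda_12 - lambda_21 for a 2 x 2 table."""
    return lam[0][0] + lam[1][1] - lam[0][1] - lam[1][0]

# Hypothetical 2 x 2 expected counts with log odds ratio exactly 2.0.
mu = [[exp(2.0), 1.0], [1.0, 1.0]]
lam_base = baseline_xy(mu)
lam_sum = sum_to_zero_xy(mu)
log_odds_ratio = log(mu[0][0] * mu[1][1] / (mu[0][1] * mu[1][0]))
```

The individual parameter values differ between the two codings (2.0 and zeros versus ±0.5), but `contrast(lam_base)` and `contrast(lam_sum)` both reproduce the log odds ratio.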
For a set of parameters, an advantage of setting a baseline parameter equal to 0 instead of the sum equal to 0 is that some parameters in a set can have infinite estimates.

Multinomial Models for Cell Probabilities

Conditional on the sum n of the cell counts, Poisson loglinear models for $\{\mu_{ij}\}$ become multinomial models for cell probabilities $\{\pi_{ij} = \mu_{ij}/(\sum_a \sum_b \mu_{ab})\}$. To illustrate, for the saturated model,

$$\pi_{ij} = \frac{\exp(\lambda + \lambda_i^X + \lambda_j^Y + \lambda_{ij}^{XY})}{\sum_a \sum_b \exp(\lambda + \lambda_a^X + \lambda_b^Y + \lambda_{ab}^{XY})}. \tag{8.4}$$
This representation implies the usual constraints for probabilities, $\{\pi_{ij} \ge 0\}$ and $\sum_i \sum_j \pi_{ij} = 1$. The intercept parameter $\lambda$ cancels in the multinomial model (8.4). This parameter relates purely to the total sample size, which is random in the Poisson model but not in the multinomial model.

8.2 LOGLINEAR MODELS FOR INDEPENDENCE AND INTERACTION IN THREE-WAY TABLES

In Section 2.3 we introduced three-way contingency tables and related structure such as conditional independence and homogeneous association. Loglinear models for three-way tables describe their independence and association patterns.

Types of Independence

A three-way I × J × K cross-classification of response variables X, Y, and Z has several potential types of independence. We assume a multinomial distribution with cell probabilities $\{\pi_{ijk}\}$, and $\sum_i \sum_j \sum_k \pi_{ijk} = 1.0$. The models also apply to Poisson sampling with means $\{\mu_{ijk}\}$. The three variables are mutually independent when

$$\pi_{ijk} = \pi_{i++}\pi_{+j+}\pi_{++k} \quad \text{for all } i, j, \text{ and } k. \tag{8.5}$$

For expected frequencies $\{\mu_{ijk}\}$, mutual independence has loglinear form

$$\log \mu_{ijk} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_k^Z. \tag{8.6}$$

Variable Y is jointly independent of X and Z when

$$\pi_{ijk} = \pi_{i+k}\pi_{+j+} \quad \text{for all } i, j, \text{ and } k. \tag{8.7}$$

This is ordinary two-way independence between Y and a variable composed of the IK combinations of levels of X and Z. The loglinear model is

$$\log \mu_{ijk} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{ik}^{XZ}. \tag{8.8}$$

Similarly, X could be jointly independent of Y and Z, or Z could be jointly independent of X and Y. Mutual independence (8.5) implies joint independence of any one variable from the others.

From Section 2.3, X and Y are conditionally independent, given Z, when independence holds for each partial table within which Z is fixed. That is, if $\pi_{ij|k} = P(X = i, Y = j \mid Z = k)$, then

$$\pi_{ij|k} = \pi_{i+|k}\pi_{+j|k} \quad \text{for all } i, j, \text{ and } k.$$
For joint probabilities over the entire table, equivalently

$$\pi_{ijk} = \pi_{i+k}\pi_{+jk}/\pi_{++k} \quad \text{for all } i, j, \text{ and } k. \tag{8.9}$$

Conditional independence of X and Y, given Z, is the loglinear model

$$\log \mu_{ijk} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ}. \tag{8.10}$$

This is a weaker condition than mutual or joint independence. Mutual independence implies that Y is jointly independent of X and Z, which itself implies that X and Y are conditionally independent. Table 8.1 summarizes these three types of independence.

In Section 2.3 we showed that partial associations can be quite different from marginal associations. For instance, conditional independence does not imply marginal independence. Conditional independence and marginal independence both hold when one of the stronger types of independence studied above applies. Figure 8.1 summarizes relationships among the four types of independence.

TABLE 8.1 Summary of Loglinear Independence Models

Model | Probabilistic Form for $\pi_{ijk}$ | Association Terms in Loglinear Model | Interpretation
(8.6) | $\pi_{i++}\pi_{+j+}\pi_{++k}$ | None | Variables mutually independent
(8.8) | $\pi_{i+k}\pi_{+j+}$ | $\lambda_{ik}^{XZ}$ | Y independent of X and Z
(8.10) | $\pi_{i+k}\pi_{+jk}/\pi_{++k}$ | $\lambda_{ik}^{XZ} + \lambda_{jk}^{YZ}$ | X and Y independent, given Z

FIGURE 8.1 Relationships among types of XY independence.

8.2.2 Homogeneous Association and Three-Factor Interaction

Loglinear models (8.6), (8.8), and (8.10) have three, two, and one pair of conditionally independent variables, respectively. In the latter two models,
the doubly subscripted terms (such as $\lambda_{ij}^{XY}$) pertain to conditionally dependent variables. A model that permits all three pairs to be conditionally dependent is

$$\log \mu_{ijk} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{ij}^{XY} + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ}. \tag{8.11}$$

From exponentiating both sides, the cell probabilities have form $\pi_{ijk} = \psi_{ij}\phi_{jk}\omega_{ik}$. No closed-form expression exists for the three components in terms of margins of $\{\pi_{ijk}\}$ except in certain special cases (see Note 9.2). For this model, in the next section we show that conditional odds ratios between any two variables are identical at each category of the third variable. That is, each pair has homogeneous association. Model (8.11) is called the loglinear model of homogeneous association or of no three-factor interaction.

The general loglinear model for a three-way table is

$$\log \mu_{ijk} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{ij}^{XY} + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ} + \lambda_{ijk}^{XYZ}. \tag{8.12}$$

With dummy variables, $\lambda_{ijk}^{XYZ}$ is the coefficient of the product of the ith dummy variable for X, jth dummy variable for Y, and kth dummy variable for Z. The total number of nonredundant parameters is

$$1 + (I - 1) + (J - 1) + (K - 1) + (I - 1)(J - 1) + (I - 1)(K - 1) + (J - 1)(K - 1) + (I - 1)(J - 1)(K - 1) = IJK,$$

the total number of cell counts. This model has as many parameters as observations and is saturated. It describes all possible positive $\{\mu_{ijk}\}$. Each pair of variables may be conditionally dependent, and an odds ratio for any pair may vary across categories of the third variable.

Setting certain parameters equal to zero in (8.12) yields the models introduced previously. Table 8.2 lists some of these models.

TABLE 8.2 Loglinear Models for Three-Dimensional Tables

Loglinear Model | Symbol
$\log \mu_{ijk} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_k^Z$ | (X, Y, Z)
$\log \mu_{ijk} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{ij}^{XY}$ | (XY, Z)
$\log \mu_{ijk} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{ij}^{XY} + \lambda_{jk}^{YZ}$ | (XY, YZ)
$\log \mu_{ijk} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{ij}^{XY} + \lambda_{jk}^{YZ} + \lambda_{ik}^{XZ}$ | (XY, YZ, XZ)
$\log \mu_{ijk} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{ij}^{XY} + \lambda_{jk}^{YZ} + \lambda_{ik}^{XZ} + \lambda_{ijk}^{XYZ}$ | (XYZ)

To ease referring to models, Table 8.2 assigns to each model a symbol that lists the highest-order
term(s) for each variable. For instance, the model (8.10) of conditional independence between X and Y has symbol (XZ, YZ), since its highest-order terms are $\lambda_{ik}^{XZ}$ and $\lambda_{jk}^{YZ}$. In the notation we used for logit models in Sections 6.1 and 7.1 this stands for (X*Z + Y*Z), which is itself shorthand for notation (X + Y + Z + X × Z + Y × Z) that has the main effects as well as interactions.

Interpreting Model Parameters

Interpretations of loglinear model parameters use their highest-order terms. For instance, interpretations for model (8.11) use the two-factor terms to describe conditional odds ratios. At a fixed level k of Z, the conditional association between X and Y uses (I − 1)(J − 1) odds ratios, such as the local odds ratios

$$\theta_{ij(k)} = \frac{\pi_{ijk}\,\pi_{i+1,j+1,k}}{\pi_{i,j+1,k}\,\pi_{i+1,j,k}}, \quad 1 \le i \le I - 1, \; 1 \le j \le J - 1. \tag{8.13}$$

Similarly, (I − 1)(K − 1) odds ratios $\{\theta_{i(j)k}\}$ describe XZ conditional association, and (J − 1)(K − 1) odds ratios $\{\theta_{(i)jk}\}$ describe YZ conditional association.

Loglinear models have characterizations using constraints on conditional odds ratios. For instance, conditional independence of X and Y is equivalent to $\{\theta_{ij(k)} = 1,\; i = 1, \ldots, I - 1,\; j = 1, \ldots, J - 1,\; k = 1, \ldots, K\}$.

The two-factor parameters relate directly to the conditional odds ratios. To illustrate, substituting (8.11) for model (XY, XZ, YZ) into $\log \theta_{ij(k)}$ yields

$$\log \theta_{ij(k)} = \log\frac{\mu_{ijk}\,\mu_{i+1,j+1,k}}{\mu_{i+1,j,k}\,\mu_{i,j+1,k}} = \lambda_{ij}^{XY} + \lambda_{i+1,j+1}^{XY} - \lambda_{i,j+1}^{XY} - \lambda_{i+1,j}^{XY}. \tag{8.14}$$

Since the right-hand side is the same for all k, an absence of three-factor interaction is equivalent to

$$\theta_{ij(1)} = \theta_{ij(2)} = \cdots = \theta_{ij(K)} \quad \text{for all } i \text{ and } j.$$

The same argument for the other conditional odds ratios shows that model (XY, XZ, YZ) is also equivalent to

$$\theta_{i(1)k} = \theta_{i(2)k} = \cdots = \theta_{i(J)k} \quad \text{for all } i \text{ and } k,$$

and to

$$\theta_{(1)jk} = \theta_{(2)jk} = \cdots = \theta_{(I)jk} \quad \text{for all } j \text{ and } k.$$

Any model not having the three-factor interaction term has a homogeneous association for each pair of variables.
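A small numerical check of homogeneous association may help. The sketch below is a hedged illustration: the parameter values are hypothetical, the function names `mu_2x2x2` and `conditional_xy_log_or` are invented for this example, and dummy coding with the second category of each variable as baseline is assumed. It builds expected counts for a 2 × 2 × 2 table from loglinear parameters and computes the conditional XY log odds ratio at each level of Z, first without and then with a three-factor term.

```python
from math import exp, log

def mu_2x2x2(lam, x, y, z, xy, xz, yz, xyz=0.0):
    """Expected counts for a 2 x 2 x 2 loglinear model with dummy coding.

    Index 0 is the first category (dummy = 1) and index 1 the second
    (dummy = 0), so every parameter at a second level is constrained to 0.
    """
    mu = {}
    for i in (0, 1):
        for j in (0, 1):
            for k in (0, 1):
                a, b, c = 1 - i, 1 - j, 1 - k  # dummy variables for X, Y, Z
                eta = (lam + x * a + y * b + z * c
                       + xy * a * b + xz * a * c + yz * b * c
                       + xyz * a * b * c)
                mu[i, j, k] = exp(eta)
    return mu

def conditional_xy_log_or(mu, k):
    """Log of the XY odds ratio in the partial table at level k of Z."""
    return log(mu[0, 0, k] * mu[1, 1, k] / (mu[0, 1, k] * mu[1, 0, k]))

# Hypothetical parameter values, chosen only for illustration.
params = dict(lam=3.0, x=0.4, y=-0.2, z=0.1, xy=1.2, xz=0.5, yz=-0.8)

# Model (XY, XZ, YZ): no three-factor term.
mu_hom = mu_2x2x2(**params)
hom = [conditional_xy_log_or(mu_hom, k) for k in (0, 1)]

# Saturated model: add a nonzero three-factor term.
mu_sat = mu_2x2x2(**params, xyz=0.7)
sat = [conditional_xy_log_or(mu_sat, k) for k in (0, 1)]
```

Without the three-factor term both conditional log odds ratios equal the XY parameter (1.2 here); with it, they differ by the value of the three-factor parameter (0.7 here).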
When X and Y have two categories, only one nonredundant $\lambda_{ij}^{XY}$ parameter occurs. Thus, expression (8.14) is simplified depending on the constraints. By the same argument as given earlier for 2 × 2 tables, the conditional log odds ratio simplifies to $\lambda_{11}^{XY}$ with dummy-variable constraints setting parameters at the second level of X or Y equal to 0.

The $\lambda_{ijk}^{XYZ}$ term in the general model (8.12) refers to three-factor interaction. It describes how the odds ratio between two variables changes across categories of the third. We illustrate for 2 × 2 × 2 tables. By direct substitution of the general model formula,

$$\log \frac{\theta_{11(1)}}{\theta_{11(2)}} = \log \frac{(\mu_{111}\mu_{221})/(\mu_{121}\mu_{211})}{(\mu_{112}\mu_{222})/(\mu_{122}\mu_{212})} = \left(\lambda_{111}^{XYZ} + \lambda_{221}^{XYZ} - \lambda_{121}^{XYZ} - \lambda_{211}^{XYZ}\right) - \left(\lambda_{112}^{XYZ} + \lambda_{222}^{XYZ} - \lambda_{122}^{XYZ} - \lambda_{212}^{XYZ}\right).$$

Only one parameter is nonredundant. For constraints setting the second-category parameters equal to 0, this log ratio of odds ratios equals $\lambda_{111}^{XYZ}$. When $\lambda_{111}^{XYZ} = 0$, $\theta_{11(1)} = \theta_{11(2)}$, giving homogeneous XY association.

Alcohol, Cigarette, and Marijuana Use Example

Table 8.3 refers to a 1992 survey by the Wright State University School of Medicine and the United Health Services in Dayton, Ohio. The survey asked 2276 students in their final year of high school in a nonurban area near Dayton, Ohio whether they had ever used alcohol, cigarettes, or marijuana. Denote the variables in this table by A for alcohol use, C for cigarette use, and M for marijuana use.

TABLE 8.3 Alcohol, Cigarette, and Marijuana Use for High School Seniors

Alcohol Use    Cigarette Use    Marijuana Use: Yes    Marijuana Use: No
Yes            Yes              911                   538
Yes            No               44                    456
No             Yes              3                     43
No             No               2                     279

Source: Data courtesy of Harry Khamis, Wright State University.

Section 8.7 covers the fitting of loglinear models. For now, we emphasize interpretation. Table 8.4 shows fitted values for several loglinear models. The
fit for model (AC, AM, CM) is close to the observed data, which are the fitted values for the saturated model (ACM). The other models fit poorly.

TABLE 8.4 Fitted Values for Loglinear Models Applied to Table 8.3
Rows: the eight combinations of alcohol use, cigarette use, and marijuana use (each Yes or No).
Columns: loglinear models^a (A, C, M), (AC, M), (AM, CM), (AC, AM, CM), (ACM).
a A, alcohol use; C, cigarette use; M, marijuana use.

Table 8.5 illustrates model association patterns by presenting estimated conditional and marginal odds ratios. For example, the entry 1.0 for the AC conditional association for the model (AM, CM) of AC conditional independence is the common value of the AC fitted odds ratios at the two levels of M. The entry 2.7 for the AC marginal association for this model is the odds ratio for the marginal AC fitted table. The odds ratios for the observed data are those reported for the saturated model (ACM).

TABLE 8.5 Estimated Odds Ratios for Loglinear Models in Table 8.4
Rows: models (A, C, M), (AC, M), (AM, CM), (AC, AM, CM), (ACM) level 1, (ACM) level 2.
Columns: conditional association (AC, AM, CM) and marginal association (AC, AM, CM).

Table 8.5 shows that estimated conditional odds ratios equal 1.0 for each pairwise term not appearing in a model, such as the AC association in model (AM, CM). For that model, the estimated marginal AC odds ratio differs from 1.0, since conditional independence does not imply marginal independence. Some models have conditional associations that are necessarily the
same as the corresponding marginal associations. In Section 9.1.2 we present a condition guaranteeing this. Model (AC, AM, CM) permits all pairwise associations but maintains homogeneous odds ratios between two variables at each level of the third. The AC fitted conditional odds ratios for this model equal 7.8. One can calculate this odds ratio using the model's fitted values at either level of M, or [from (8.14)] using $\exp(\hat\lambda_{11}^{AC} + \hat\lambda_{22}^{AC} - \hat\lambda_{12}^{AC} - \hat\lambda_{21}^{AC})$.

Table 8.5 shows that estimated odds ratios are very dependent on the model. This highlights the importance of good model selection. An estimate from this table is informative only to the extent that its model fits well. In the next section we discuss goodness of fit.

8.3 INFERENCE FOR LOGLINEAR MODELS

A good-fitting loglinear model provides a basis for describing and making inferences about associations among categorical responses. Standard methods apply for checking fit and making inference about model parameters.

Chi-Squared Goodness-of-Fit Tests

As usual, $X^2$ and $G^2$ test whether a model holds by comparing cell fitted values to observed counts. Here df equals the number of cell counts minus the number of model parameters.

For the student survey (Table 8.3), Table 8.6 shows results of testing fit for several loglinear models. Models that lack any association term fit poorly. The model (AC, AM, CM) that has all pairwise associations fits well (its P-value is listed in Table 8.6). It is suggested by other criteria also, such as minimizing

AIC = −2(maximized log likelihood − number of parameters in model)

[or equivalently, minimizing $G^2 - 2\,\mathrm{df}$].

TABLE 8.6 Goodness-of-Fit Tests for Loglinear Models in Table 8.4
Rows: models (A, C, M), (A, CM), (C, AM), (M, AC), (AC, AM), (AC, CM), (AM, CM), (AC, AM, CM), (ACM).
Columns: $G^2$, $X^2$, df, P-value^a.
a P-value for $G^2$ statistic.
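The fit of model (AC, AM, CM) can be reproduced with a short script. The sketch below is illustrative only: it assumes the Table 8.3 counts are 911, 538, 44, 456, 3, 43, 2, 279 (they sum to the 2276 respondents), and it uses iterative proportional fitting as one convenient way to obtain the fitted values, which is not necessarily how the text's software does it. The function names are hypothetical.

```python
import numpy as np

# Assumed Table 8.3 counts, indexed n[a, c, m] with 0 = yes and 1 = no.
n = np.array([[[911.0, 538.0], [44.0, 456.0]],
              [[3.0, 43.0], [2.0, 279.0]]])

def fit_all_pairwise(counts, max_iter=500, tol=1e-8):
    """Fitted values for the model with all two-factor terms, by
    cycling through the three two-way margins and rescaling to match each."""
    mu = np.ones_like(counts)
    for _ in range(max_iter):
        previous = mu.copy()
        for axis in (2, 1, 0):  # match the AC, then AM, then CM margin
            mu = mu * counts.sum(axis=axis, keepdims=True) / mu.sum(axis=axis, keepdims=True)
        if np.max(np.abs(mu - previous)) < tol:
            break
    return mu

def fit_statistics(counts, mu):
    """Likelihood-ratio (G^2) and Pearson (X^2) statistics; assumes no zero counts."""
    g2 = 2.0 * np.sum(counts * np.log(counts / mu))
    x2 = np.sum((counts - mu) ** 2 / mu)
    return g2, x2

mu_hat = fit_all_pairwise(n)
g2, x2 = fit_statistics(n, mu_hat)
df = n.size - 7  # 8 cells minus (1 intercept + 3 main effects + 3 two-factor terms)

# Fitted conditional AC odds ratio at each level of M.
ac_or = mu_hat[0, 0, :] * mu_hat[1, 1, :] / (mu_hat[0, 1, :] * mu_hat[1, 0, :])

print(round(g2, 2), round(x2, 2), df)
print(np.round(ac_or, 2))
```

Under these assumed counts, the two fitted AC odds ratios coincide and are close to the value 7.8 quoted above, and $G^2$ is small relative to its single degree of freedom.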
Inference about Conditional Associations

Tests about conditional associations compare loglinear models. The likelihood-ratio statistic $-2(L_0 - L_1)$ is identical to the difference $G^2(M_0 \mid M_1) = G^2(M_0) - G^2(M_1)$ between deviances for models without that term and with it.

For model (XY, XZ, YZ), consider the hypothesis of XY conditional independence. This is $H_0$: $\lambda_{ij}^{XY} = 0$ for the $(I-1)(J-1)$ XY association parameters. The test statistic is $G^2[(XZ, YZ) \mid (XY, XZ, YZ)] = G^2(XZ, YZ) - G^2(XY, XZ, YZ)$, with df = $(I-1)(J-1)$. This has the same purpose as the generalized CMH and model-based tests for nominal variables presented in Section 7.5.

For instance, the test of conditional independence between alcohol use and cigarette smoking compares model (AM, CM) with the alternative (AC, AM, CM). The test statistic is

$$G^2[(AM, CM) \mid (AC, AM, CM)] = 187.8 - 0.4 = 187.4,$$

with df = 2 − 1 = 1 (P < 0.0001). The statistics comparing (AC, CM) and (AC, AM) with (AC, AM, CM) also provide strong evidence of AM and CM conditional associations. Further analyses of Table 8.3 use model (AC, AM, CM).

With large sample sizes, statistically significant effects can be weak and unimportant. A more relevant concern is whether the associations are strong enough to be important. Confidence intervals are more useful than tests for assessing this.

TABLE 8.7 Output for Fitting Loglinear Model to Table 8.3
Criteria for Assessing Goodness of Fit: Deviance and Pearson Chi-Square, each with DF, Value, and Value/DF.
Parameter estimates (columns: DF, Estimate, Standard Error, Wald Chi-Square, Pr > ChiSq): Intercept, a, c, m, a*m, a*c, c*m; each has Pr > ChiSq <.0001.
LR Statistics (columns: Source, DF, Chi-Square, Pr > ChiSq): a*m, a*c, c*m; each has Pr > ChiSq <.0001.

Table 8.7 shows output from fitting model (AC, AM, CM) with
parameters in the last row and in the last column equal to zero, such as by using (1, 0) dummy variables for each classification. Consider the conditional AC odds ratio, assuming model (AC, AM, CM). Table 8.7 reports $\hat\lambda_{11}^{AC} = 2.054$ together with its standard error (SE). For these constraints, this is the estimated conditional log odds ratio. A 95% Wald confidence interval for the true conditional AC odds ratio is $\exp[2.054 \pm 1.96(\mathrm{SE})]$, which has lower limit 5.5. Strong positive association exists between cigarette use and alcohol use, both for users and nonusers of marijuana.

For model (AC, AM, CM), the 95% Wald confidence intervals are (8.0, 49.2) for the AM conditional odds ratio and (12.5, 23.8) for the CM conditional odds ratio. The intervals are wide, but these associations also are strong. Table 8.5 shows that estimated marginal associations are even stronger. Controlling for outcome on one response moderates the association somewhat between the other two.

The analyses in this section pertain to associations. A different analysis pertains to comparing single-variable marginal distributions, for instance to determine if students used cigarettes more than alcohol or marijuana. That type of analysis is presented later in the text.

8.4 LOGLINEAR MODELS FOR HIGHER DIMENSIONS

Loglinear models for three-way tables are more complex than for two-way tables, because of the variety of potential association terms. Loglinear models for three-way tables extend readily, however, to multiway tables. As the number of dimensions increases, some complications arise. One is the increase in the number of possible association and interaction terms, making model selection more difficult. Another is the increase in number of cells. In Section 9.8 we show that this can cause difficulties with existence of estimates and appropriateness of asymptotic theory.

Four-Way Contingency Tables

We illustrate models for higher dimensions using a four-way table with variables W, X, Y, and Z. Interpretations are simplest when the model has no three-factor interaction terms.
Such models are special cases of

$$\log \mu_{hijk} = \lambda + \lambda_h^W + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{hi}^{WX} + \lambda_{hj}^{WY} + \lambda_{hk}^{WZ} + \lambda_{ij}^{XY} + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ},$$

denoted by (WX, WY, WZ, XY, XZ, YZ). Each pair of variables is conditionally dependent, with the same odds ratios at each combination of categories of the other two variables. An absence of a two-factor term implies conditional independence, given the other two variables.
A variety of models exhibit three-factor interaction. A model could contain any of WXY, WXZ, WYZ, or XYZ terms. For model (WXY, WZ, XZ, YZ), each pair of variables is conditionally dependent, but at each level of Z the WX association, the WY association, and the XY association may vary across categories of the remaining variable. The conditional association between Z and another variable is homogeneous. The saturated model contains all the three-factor terms plus a four-factor interaction term.

Automobile Accident Example

Table 8.8 summarizes observations of 68,694 passengers in autos and light trucks involved in accidents in the state of Maine. The table classifies passengers by gender (G), location of accident (L), seat-belt use (S), and injury (I). Table 8.8 reports the sample proportion of passengers who were injured. For each GL combination, the proportion of injuries was about halved for passengers wearing seat belts.

TABLE 8.8 Loglinear Models for Injury, Seat-Belt Use, Gender, and Location^a
Rows: gender (Female, Male) by location (Urban, Rural) by seat belt (No, Yes).
Columns: observed injury counts (No, Yes); fitted injury counts (No, Yes) for model (GI, GL, GS, IL, IS, LS); fitted injury counts (No, Yes) for model (GLS, GI, IL, IS); sample proportion injured (Yes).
a G, gender; I, injury; L, location; S, seat-belt use.
Source: Data courtesy of Cristanna Cook, Medical Care Development, Augusta, Maine.

TABLE 8.9 Goodness-of-Fit Tests for Loglinear Models in Table 8.8
Rows: models (G, I, L, S), (GI, GL, GS, IL, IS, LS), (GIL, GIS, GLS, ILS), (GIL, GS, IS, LS), (GIS, GL, IL, LS), (GLS, GI, IL, IS), (ILS, GI, GL, GS).
Columns: $G^2$, df, P-value.

Table 8.9 displays tests of fit for several loglinear models. To investigate the complexity of model needed, we consider models (G, I, L, S),
(GI, GL, GS, IL, IS, LS), and (GIL, GIS, GLS, ILS), having all terms of varying complexity. Model (G, I, L, S) of mutual independence fits very poorly. Model (GI, GL, GS, IL, IS, LS) fits much better but still has a lack of fit (P < 0.001). Model (GIL, GIS, GLS, ILS) fits well ($G^2 = 1.3$, df = 1) but is complex and difficult to interpret. This suggests studying models more complex than (GI, GL, GS, IL, IS, LS) but simpler than (GIL, GIS, GLS, ILS). First, however, we analyze model (GI, GL, GS, IL, IS, LS), which focuses on pairwise associations.

TABLE 8.10 Estimated Conditional Odds Ratios for Models of Table 8.8
Columns: loglinear models (GI, GL, GS, IL, IS, LS) and (GLS, GI, IL, IS).
Rows: GI; IL; IS; GL (S = no), GL (S = yes); GS (L = urban), GS (L = rural); LS (G = female), LS (G = male).

Table 8.8 displays its fitted values. Table 8.10 reports the model-based estimated conditional odds ratios. One can obtain them directly using the fitted values for partial tables relating two variables at any combination of levels of the other two. They also follow directly from parameter estimates; for instance, $0.44 = \exp(\hat\lambda_{11}^{IS} + \hat\lambda_{22}^{IS} - \hat\lambda_{12}^{IS} - \hat\lambda_{21}^{IS})$.

Since the sample size is large, the estimates of odds ratios are quite precise. For instance, the standard error of the estimated IS conditional log odds ratio of −0.814 is 0.028. A 95% Wald confidence interval for the true odds ratio is $\exp[-0.814 \pm 1.96(0.028)]$, or (0.42, 0.47). This model estimates that the odds of injury for passengers wearing seat belts were less than half the odds for passengers not wearing them, at each gender–location combination. The fitted odds ratios in Table 8.10 also suggest that, other factors being fixed, injury was more likely in rural than urban accidents and more likely for females than for males. The estimated odds that males used seat belts were only 0.63 times the estimated odds for females.

Interpretations are more complex for models containing three-factor interaction terms. Table 8.9 shows results of adding a single three-factor term to model (GI, GL, GS, IL, IS, LS). Of the four possible models, (GLS, GI, IL, IS)
appears to fit best. Table 8.8 also displays its fit. Given the large sample size, its $G^2$ value suggests that it fits quite well. For model (GLS, GI, IL, IS), each pair of variables is conditionally dependent, and at each category of I the association between any two of the others
varies across categories of the remaining variable. For this model, it is inappropriate to interpret the GL, GS, and LS two-factor terms on their own. Since I does not occur in a three-factor interaction, the conditional odds ratio between I and each variable (see the top portion of Table 8.10) is the same at each combination of categories of the other two variables.

When a model has a three-factor interaction term but no term of higher order than that, one can study the interaction by calculating fitted odds ratios between two variables at each level of the third. One can do this at any levels of remaining variables not involved in the interaction. The bottom portion of Table 8.10 illustrates this for model (GLS, GI, IL, IS). For instance, the fitted GS odds ratio of 0.66 for (L = urban) refers to four fitted values for urban accidents, both the four with (injury = no) and the four with (injury = yes); for example, 0.66 is the cross-product ratio of the four fitted values with (injury = no).

Large Samples and Statistical versus Practical Significance

Model (GLS, GI, IL, IS) seems to fit much better than (GI, GL, GS, IL, IS, LS). The difference in $G^2$ values of 23.4 − 7.5 = 15.9 has df = 5 − 4 = 1 (P = 0.0001). Table 8.10 indicates, however, that the degree of three-factor interaction is weak. The fitted odds ratio between any two of G, L, and S is similar at both levels of the third variable. The significantly better fit of model (GLS, GI, IL, IS) reflects mainly the enormous sample size. As in any test, a statistically significant effect need not be practically important. With huge samples, it is crucial to focus on estimation rather than hypothesis testing. For instance, a comparison of fitted odds ratios for the two models in Table 8.10 suggests that the simpler model (GI, GL, GS, IL, IS, LS) is adequate for most purposes.

Dissimilarity Index

For a table of arbitrary dimension with cell counts $\{n_i = n p_i\}$ and fitted values $\{\hat\mu_i = n \hat\pi_i\}$, one can summarize the closeness of a model fit to the data by the dissimilarity index (Gini 1914),

$$\hat\Delta = \frac{\sum_i |n_i - \hat\mu_i|}{2n} = \frac{\sum_i |p_i - \hat\pi_i|}{2}.$$
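As a small worked illustration of the dissimilarity index, the sketch below applies the formula to a hypothetical 2 × 2 table (the counts are invented for this example and do not come from the text), with fitted values taken from the independence model. The function name is an assumption for illustration.

```python
import numpy as np

def dissimilarity_index(observed, fitted):
    """Sum of absolute differences between counts and fitted values, divided by 2n."""
    observed = np.asarray(observed, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    return np.abs(observed - fitted).sum() / (2.0 * observed.sum())

# Hypothetical counts (not from the text): rows sum to 50 and 50, columns to 40 and 60.
n = np.array([[30.0, 20.0],
              [10.0, 40.0]])

# Independence fit: row total times column total over the grand total.
mu_hat = np.outer(n.sum(axis=1), n.sum(axis=0)) / n.sum()

delta_hat = dissimilarity_index(n, mu_hat)
print(mu_hat)      # [[20. 30.] [20. 30.]]
print(delta_hat)   # 0.2
```

Here every cell differs from its fitted value by 10, so the index is 40/200 = 0.2: moving 10 observations within each row would make the independence model fit these hypothetical data exactly.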
This index falls between 0 and 1, with smaller values representing a better fit. It represents the proportion of sample cases that must move to different cells for the model to fit perfectly. The dissimilarity index $\hat\Delta$ estimates a corresponding population index $\Delta$ describing model lack of fit. The value $\Delta = 0$ occurs when the model holds perfectly. In practice, this is unrealistic for unsaturated models, and $\Delta > 0$. The estimator $\hat\Delta$ helps study whether the lack of fit is important in a practical sense. When $\hat\Delta < 0.02$ or 0.03, the sample data follow the model pattern
quite closely, even though the model is not perfect. When $\Delta$ is near 0, $\hat\Delta$ tends to overestimate $\Delta$, substantially so for small $n$. Firth and Kuha (2000) provided an approximate variance for $\hat\Delta$ and studied ways to reduce its estimation bias.

For Table 8.8, model (GI, GL, GS, IL, IS, LS) has $\hat\Delta = 0.008$, and the value for model (GLS, GI, IL, IS) is also below 0.01. For either model, moving less than 1% of the data yields a perfect fit. The relatively large $G^2$ value for (GI, GL, GS, IL, IS, LS) indicated that it does not truly hold. Nevertheless, the small $\hat\Delta$ value suggests that, in practical terms, it fits decently.

8.5 LOGLINEAR–LOGIT MODEL CONNECTION

Loglinear models treat categorical response variables symmetrically, focusing on associations and interactions in their joint distribution. Logit models, by contrast, describe how a single categorical response depends on explanatory variables. The model types seem distinct, but connections exist between them. For a loglinear model, forming logits on one response helps to interpret the model. Moreover, logit models with categorical explanatory variables have equivalent loglinear models.

Using Logit Models to Interpret Loglinear Models

To understand implications of a loglinear model formula, it can help to form a logit on one variable. We illustrate with the loglinear model (XY, XZ, YZ). When Y is binary, its logit is

$$\log \frac{P(Y = 1 \mid X = i, Z = k)}{P(Y = 2 \mid X = i, Z = k)} = \log \frac{\mu_{i1k}}{\mu_{i2k}} = \log \mu_{i1k} - \log \mu_{i2k}$$
$$= \left(\lambda + \lambda_i^X + \lambda_1^Y + \lambda_k^Z + \lambda_{i1}^{XY} + \lambda_{ik}^{XZ} + \lambda_{1k}^{YZ}\right) - \left(\lambda + \lambda_i^X + \lambda_2^Y + \lambda_k^Z + \lambda_{i2}^{XY} + \lambda_{ik}^{XZ} + \lambda_{2k}^{YZ}\right)$$
$$= \left(\lambda_1^Y - \lambda_2^Y\right) + \left(\lambda_{i1}^{XY} - \lambda_{i2}^{XY}\right) + \left(\lambda_{1k}^{YZ} - \lambda_{2k}^{YZ}\right).$$

The first parenthetical term is a constant, not depending on $i$ or $k$. The second parenthetical term depends on the category $i$ of X. The third parenthetical term depends on the category $k$ of Z. This logit has the additive form

$$\mathrm{logit}[P(Y = 1 \mid X = i, Z = k)] = \alpha + \beta_i^X + \beta_k^Z. \tag{8.15}$$

Using the notation summarizing logit models by their predictors, we denote it by (X + Z).
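The cancellation in this derivation is easy to confirm numerically. The sketch below is an illustration under assumed inputs: the loglinear parameters are random draws (hypothetical, not estimates), Y is binary, and the names `alpha`, `beta_x`, and `beta_z` are simply labels for the three parenthetical terms above.

```python
import numpy as np

# Hypothetical parameter values for a loglinear model with all two-factor terms.
rng = np.random.default_rng(1)
I, J, K = 3, 2, 4          # Y is binary, so J = 2
lam = rng.normal()
lam_x, lam_y, lam_z = rng.normal(size=I), rng.normal(size=J), rng.normal(size=K)
lam_xy = rng.normal(size=(I, J))
lam_xz = rng.normal(size=(I, K))
lam_yz = rng.normal(size=(J, K))

log_mu = (lam
          + lam_x[:, None, None] + lam_y[None, :, None] + lam_z[None, None, :]
          + lam_xy[:, :, None] + lam_xz[:, None, :] + lam_yz[None, :, :])

# Logit of Y at each (i, k): log mu_{i1k} - log mu_{i2k}.
logit = log_mu[:, 0, :] - log_mu[:, 1, :]          # shape (I, K)

# The three parenthetical terms of the derivation.
alpha = lam_y[0] - lam_y[1]
beta_x = lam_xy[:, 0] - lam_xy[:, 1]               # depends only on i
beta_z = lam_yz[0, :] - lam_yz[1, :]               # depends only on k

additive = alpha + beta_x[:, None] + beta_z[None, :]
print(np.allclose(logit, additive))
```

The XZ parameters never enter `additive`, which mirrors the point that the logit model says nothing about the association between the explanatory variables.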
We discussed this logit model in an earlier section. When Y is binary, the loglinear model (XY, XZ, YZ) is equivalent to it. The $\lambda_{ik}^{XZ}$ terms for association among explanatory variables cancel in the difference in logarithms the logit defines. The logit model does not study this association.

Auto Accident Example Revisited

For the Maine auto accidents (Table 8.8), in Section 8.4.2 we showed that the loglinear model (GLS, GI, LI, IS),

$$\log \mu_{gils} = \lambda + \lambda_g^G + \lambda_i^I + \lambda_l^L + \lambda_s^S + \lambda_{gi}^{GI} + \lambda_{gl}^{GL} + \lambda_{gs}^{GS} + \lambda_{il}^{IL} + \lambda_{is}^{IS} + \lambda_{ls}^{LS} + \lambda_{gls}^{GLS},$$

fits well. It is natural to treat injury (I) as a response variable and gender (G), location (L), and seat-belt use (S) as explanatory variables, or perhaps S as a response with G and L as explanatory. One can show that this loglinear model is equivalent to logit model (G + L + S),

$$\mathrm{logit}[P(I = 1 \mid G = g, L = l, S = s)] = \alpha + \beta_g^G + \beta_l^L + \beta_s^S. \tag{8.16}$$

For instance, the seat-belt effects in the two models satisfy $\beta_s^S = \lambda_{1s}^{IS} - \lambda_{2s}^{IS}$. In the logit calculation, all terms in the loglinear model not having the injury index $i$ cancel.

Fitted values, goodness-of-fit statistics, residual df, and standardized Pearson residuals for the logit model are identical to those for the loglinear model. Odds ratios describing effects on I relate to two-factor loglinear parameters and main-effect logit parameters. In the logit model, the log odds ratio for the effect of S on I equals $\beta_1^S - \beta_2^S$. This equals $\lambda_{11}^{IS} + \lambda_{22}^{IS} - \lambda_{12}^{IS} - \lambda_{21}^{IS}$ in the loglinear model. Their estimates are the same no matter how software sets up constraints. For Table 8.8, $\hat\beta_1^S - \hat\beta_2^S = -0.817$ for the logit model, and $\hat\lambda_{11}^{IS} + \hat\lambda_{22}^{IS} - \hat\lambda_{12}^{IS} - \hat\lambda_{21}^{IS} = -0.817$ for the loglinear model.

Loglinear models are GLMs that treat the 16 cell counts in Table 8.8 as 16 independent Poisson variates. Logit models are GLMs that treat the table as binomial counts. Logit models with I as the response treat the marginal GLS table $\{n_{g+ls}\}$ as fixed and regard $\{n_{g1ls}\}$ as eight independent binomial variates on that response.
Although the sampling models differ, the results from fits of corresponding models are identical.

Correspondence between Loglinear and Logit Models

In the derivation of the logit model (X + Z) [see (8.15)] from loglinear model (XY, XZ, YZ), the $\lambda_{ik}^{XZ}$ term cancels. It might seem as if the model (XY, YZ) omitting this term is also equivalent to that logit model. Indeed, forming the logit on Y for (XY, YZ) results in the same logit formula. The loglinear
model that has the same fit as the logit model, however, contains a general interaction term for relationships among the explanatory variables. The logit model does not assume anything about relationships among explanatory variables, so it allows an arbitrary interaction pattern for them.

TABLE 8.11 Equivalent Loglinear and Logit Models for a Three-Way Table with Binary Response Variable Y

Loglinear Symbol    Logit Model    Logit Symbol
(Y, XZ)             $\alpha$    (—)
(XY, XZ)            $\alpha + \beta_i^X$    (X)
(YZ, XZ)            $\alpha + \beta_k^Z$    (Z)
(XY, YZ, XZ)        $\alpha + \beta_i^X + \beta_k^Z$    (X + Z)
(XYZ)               $\alpha + \beta_i^X + \beta_k^Z + \beta_{ik}^{XZ}$    (X*Z)

Table 8.11 summarizes equivalent logit and loglinear models for three-way tables when Y is a binary response. Each loglinear model contains the XZ association term relating the explanatory variables in the logit models. The simple loglinear model (Y, XZ) states that Y is jointly independent of both X and Z, and is equivalent to the logit model having only an intercept.

The saturated loglinear model (XYZ) contains the three-factor interaction term. When Y is a binary response, this model is equivalent to a logit model with an interaction between the predictors X and Z. For instance, the effect of X on Y depends on Z, meaning that the XY odds ratio varies across its categories. That logit model is also saturated.

Analogous correspondences hold when Y has several categories, using baseline-category logit models. An advantage of the loglinear approach is its generality. It applies when more than one response variable exists. The alcohol–cigarette–marijuana example in Section 8.2.4, for instance, used loglinear models to study association patterns among three response variables. Loglinear models are most natural when at least two variables are response variables. When only one is a response, it is more sensible to use logit models directly.

Generalized Loglinear Model*

Let $\mathbf{n} = (n_1, \ldots, n_N)'$ and $\boldsymbol\mu = (\mu_1, \ldots, \mu_N)'$ denote column vectors of observed and expected counts for the N cells of a contingency table, with $n = \sum_i n_i$. For simplicity we use a single index, but the table may be multidimensional.
Loglinear models for positive Poisson means have the form

$$\log \boldsymbol\mu = \mathbf{X}\boldsymbol\beta \tag{8.17}$$

for model matrix $\mathbf{X}$ and column vector $\boldsymbol\beta$ of model parameters.
We illustrate with the independence model, $\log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y$, for a 2 × 2 table. With constraints $\lambda_2^X = \lambda_2^Y = 0$, it is

$$\begin{pmatrix} \log \mu_{11} \\ \log \mu_{12} \\ \log \mu_{21} \\ \log \mu_{22} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} \lambda \\ \lambda_1^X \\ \lambda_1^Y \end{pmatrix}.$$

A generalization of (8.17) allows many additional models. This generalized loglinear model is

$$\mathbf{C} \log(\mathbf{A}\boldsymbol\mu) = \mathbf{X}\boldsymbol\beta \tag{8.18}$$

for matrices $\mathbf{C}$ and $\mathbf{A}$. The ordinary loglinear model (8.17) results when $\mathbf{C}$ and $\mathbf{A}$ are identity matrices. Other special cases include logit models for binary or multicategory responses. For instance, the loglinear model of independence for a 2 × 2 table is equivalent to a model by which the logit for Y is the same in each row of X. That logit model has form (8.18): $\mathbf{A}$ is a 4 × 4 identity matrix, so $\mathbf{A}\boldsymbol\mu$ is the 4 × 1 vector $\boldsymbol\mu = (\mu_{11}, \mu_{12}, \mu_{21}, \mu_{22})'$; the product $\mathbf{C} \log(\mathbf{A}\boldsymbol\mu)$ forms the logit in row 1 and the logit in row 2 using

$$\mathbf{C} = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & -1 \end{pmatrix};$$

then $\mathbf{X} = (1, 1)'$ is a 2 × 1 matrix, and $\boldsymbol\beta$ is a single constant $\alpha$, so $\mathbf{X}\boldsymbol\beta$ forms a common value for those two logits.

In Chapters 10 and 11 we use the generalized loglinear model for models outside the classes of GLMs studied thus far. An example is modeling marginal distributions of multivariate responses.

8.6 LOGLINEAR MODEL FITTING: LIKELIHOOD EQUATIONS AND ASYMPTOTIC DISTRIBUTIONS*

In discussing the fitting of loglinear models, we first derive sufficient statistics and likelihood equations. We then present large-sample normal distributions for ML estimators of model parameters and cell probabilities. We illustrate results with models for three-way tables. For simplicity, derivations use the Poisson sampling model, which does not require a constraint on parameters such as the multinomial does.
Minimal Sufficient Statistics

For three-way tables, the joint Poisson probability that cell counts $\{Y_{ijk} = n_{ijk}\}$ is

$$\prod_i \prod_j \prod_k \frac{e^{-\mu_{ijk}} \mu_{ijk}^{n_{ijk}}}{n_{ijk}!},$$

where the product refers to all cells of the table. The kernel of the log likelihood is

$$L(\boldsymbol\mu) = \sum_i \sum_j \sum_k n_{ijk} \log \mu_{ijk} - \sum_i \sum_j \sum_k \mu_{ijk}. \tag{8.19}$$

For the general loglinear model (8.12), this simplifies to

$$L(\boldsymbol\mu) = n\lambda + \sum_i n_{i++}\lambda_i^X + \sum_j n_{+j+}\lambda_j^Y + \sum_k n_{++k}\lambda_k^Z + \sum_i \sum_j n_{ij+}\lambda_{ij}^{XY} + \sum_i \sum_k n_{i+k}\lambda_{ik}^{XZ} + \sum_j \sum_k n_{+jk}\lambda_{jk}^{YZ} + \sum_i \sum_j \sum_k n_{ijk}\lambda_{ijk}^{XYZ} - \sum_i \sum_j \sum_k \exp\left(\lambda + \cdots + \lambda_{ijk}^{XYZ}\right). \tag{8.20}$$

Since the Poisson distribution is in the exponential family, coefficients of the parameters are sufficient statistics. For this saturated model, $\{n_{ijk}\}$ are coefficients of $\{\lambda_{ijk}^{XYZ}\}$, so there is no reduction of the data. For simpler models, certain parameters are zero and (8.20) simplifies. For instance, for the model (X, Y, Z) of mutual independence, sufficient statistics are the coefficients in (8.20) of $\{\lambda_i^X\}$, $\{\lambda_j^Y\}$, and $\{\lambda_k^Z\}$. These are $\{n_{i++}\}$, $\{n_{+j+}\}$, and $\{n_{++k}\}$.

Table 8.12 lists minimal sufficient statistics for several loglinear models. Each one is the coefficient of the highest-order term(s) in which a variable appears. In fact, they are the marginal distributions for terms in the model symbol. Simpler models use more condensed sample information. For instance, whereas (X, Y, Z) uses only the single-factor marginal distributions, (XY, XZ, YZ) uses the two-way marginal tables.

TABLE 8.12 Minimal Sufficient Statistics for Fitting Loglinear Models

Model             Minimal Sufficient Statistics
(X, Y, Z)         $\{n_{i++}\}$, $\{n_{+j+}\}$, $\{n_{++k}\}$
(XY, Z)           $\{n_{ij+}\}$, $\{n_{++k}\}$
(XY, YZ)          $\{n_{ij+}\}$, $\{n_{+jk}\}$
(XY, XZ, YZ)      $\{n_{ij+}\}$, $\{n_{i+k}\}$, $\{n_{+jk}\}$
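Likelihood Equations for Loglinear Models

The fitted values for a model are solutions to the likelihood equations. We derive likelihood equations using general representation (8.17) for a loglinear model. For a vector of counts $\mathbf{n}$ with $\boldsymbol\mu = E(\mathbf{n})$, the model is $\log \boldsymbol\mu = \mathbf{X}\boldsymbol\beta$, for which $\log(\mu_i) = \sum_j x_{ij}\beta_j$ for all $i$. Extending (8.19), for Poisson sampling the log likelihood is

$$L(\boldsymbol\mu) = \sum_i n_i \log \mu_i - \sum_i \mu_i = \sum_i n_i \left(\sum_j x_{ij}\beta_j\right) - \sum_i \exp\left(\sum_j x_{ij}\beta_j\right). \tag{8.21}$$

The sufficient statistic for $\beta_j$ is its coefficient, $\sum_i n_i x_{ij}$. Since

$$\frac{\partial}{\partial \beta_j} \exp\left(\sum_j x_{ij}\beta_j\right) = x_{ij} \exp\left(\sum_j x_{ij}\beta_j\right) = x_{ij}\mu_i,$$

$$\frac{\partial L(\boldsymbol\mu)}{\partial \beta_j} = \sum_i n_i x_{ij} - \sum_i \mu_i x_{ij}, \qquad j = 1, 2, \ldots, p.$$

The likelihood equations equate these derivatives to zero. They have the form

$$\mathbf{X}'\mathbf{n} = \mathbf{X}'\hat{\boldsymbol\mu}. \tag{8.22}$$

These equations equate the sufficient statistics to their expected values, a result obtained earlier with GLM theory. For models considered so far, these sufficient statistics are the marginal tables in the model symbol.

To illustrate, consider model (XZ, YZ). Its log likelihood is (8.20) with $\lambda^{XY} = \lambda^{XYZ} = 0$. The log-likelihood derivatives

$$\frac{\partial L}{\partial \lambda_{ik}^{XZ}} = n_{i+k} - \mu_{i+k} \quad \text{and} \quad \frac{\partial L}{\partial \lambda_{jk}^{YZ}} = n_{+jk} - \mu_{+jk}$$

yield the likelihood equations

$$\hat\mu_{i+k} = n_{i+k} \quad \text{for all } i \text{ and } k, \tag{8.23}$$
$$\hat\mu_{+jk} = n_{+jk} \quad \text{for all } j \text{ and } k. \tag{8.24}$$

Derivatives with respect to lower-order terms yield equations implied by these (see the chapter problems). For model (XZ, YZ), the fitted values have the same XZ and YZ marginal totals as the observed data.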
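The likelihood equations can be seen at work in a few lines of code. The sketch below is one possible implementation, not the method of any particular software: it fits a Poisson loglinear model by Newton–Raphson using the score vector and information matrix implied by the derivatives above, then checks that $\mathbf{X}'\mathbf{n} = \mathbf{X}'\hat{\boldsymbol\mu}$. The counts are hypothetical, the model is independence for a 2 × 2 table with second-level parameters set to zero, and the function name is invented for the example.

```python
import numpy as np

def fit_poisson_loglinear(X, n, max_iter=50, tol=1e-10):
    """Newton-Raphson for log(mu) = X beta under Poisson sampling.
    Assumes the first column of X is the intercept."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(n.mean())                 # start from a constant fit
    for _ in range(max_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (n - mu)                 # derivatives of the log likelihood
        info = X.T @ (mu[:, None] * X)         # X' diag(mu) X
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta, np.exp(X @ beta)

# Hypothetical 2 x 2 counts in the cell order (11, 12, 21, 22).
n = np.array([30.0, 20.0, 10.0, 40.0])

# Independence model: intercept, row-1 indicator, column-1 indicator.
X = np.array([[1.0, 1.0, 1.0],
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])

beta_hat, mu_hat = fit_poisson_loglinear(X, n)
print(np.round(mu_hat, 4))                      # matches row total * column total / n
print(np.allclose(X.T @ n, X.T @ mu_hat))       # sufficient statistics are reproduced
```

For these counts the fitted values are 20, 30, 20, 30, the same as the closed-form independence fit, and the three components of $\mathbf{X}'\mathbf{n}$ are the grand total, the first row total, and the first column total.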
Birch's Results for Loglinear Models

For model (XZ, YZ), from (8.23), (8.24), and Table 8.12, the minimal sufficient statistics are the ML estimates of the corresponding marginal distributions of expected frequencies. Equation (8.22) gives the corresponding result for any loglinear model. Birch (1963) showed that likelihood equations for loglinear models match minimal sufficient statistics to their expected values. Poisson GLM theory implied this result earlier. Thus, fitted values for loglinear models are smoothed versions of the cell counts that match them in certain marginal distributions but have associations and interactions satisfying the model-implied patterns.

Birch showed that a unique set of fitted values both satisfy the model and match the data in the minimal sufficient statistics. Hence, if we find such a solution, it must be the ML solution. To illustrate, the independence model for a two-way table

$$\log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y$$

has minimal sufficient statistics $\{n_{i+}\}$ and $\{n_{+j}\}$. The likelihood equations are

$$\hat\mu_{i+} = n_{i+}, \qquad \hat\mu_{+j} = n_{+j}, \qquad \text{for all } i \text{ and } j.$$

The fitted values $\{\hat\mu_{ij} = n_{i+}n_{+j}/n\}$ satisfy these equations and also satisfy the model. Birch's result implies that they are the ML estimates.

Direct versus Iterative Calculation of Fitted Values

To illustrate how to solve likelihood equations, we continue the analysis of model (XZ, YZ). From (8.9), the model satisfies

$$\pi_{ijk} = \frac{\pi_{i+k}\,\pi_{+jk}}{\pi_{++k}} \quad \text{for all } i, j, \text{ and } k.$$

For Poisson sampling, the related formula uses expected frequencies. Setting $\pi_{ijk} = \mu_{ijk}/n$, this is $\{\mu_{ijk} = \mu_{i+k}\mu_{+jk}/\mu_{++k}\}$. The likelihood equations (8.23) and (8.24) specify that ML estimates satisfy $\hat\mu_{i+k} = n_{i+k}$ and $\hat\mu_{+jk} = n_{+jk}$ and thus also $\hat\mu_{++k} = n_{++k}$. Since ML estimates of functions of parameters are the same functions of the ML estimates of those parameters,

$$\hat\mu_{ijk} = \frac{\hat\mu_{i+k}\,\hat\mu_{+jk}}{\hat\mu_{++k}} = \frac{n_{i+k}\,n_{+jk}}{n_{++k}}.$$

This solution satisfies the model and matches the data in the sufficient statistics. Thus, it is the unique ML solution.
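The direct formula is a one-line array computation. As a sketch, the code below applies it to the student survey with A in the role of X, C in the role of Y, and M in the role of Z, that is, model (AM, CM). It assumes the Table 8.3 counts are 911, 538, 44, 456, 3, 43, 2, 279; treat those values and the variable names as assumptions of this example.

```python
import numpy as np

# Assumed Table 8.3 counts, indexed n[a, c, m] with 0 = yes and 1 = no.
n = np.array([[[911.0, 538.0], [44.0, 456.0]],
              [[3.0, 43.0], [2.0, 279.0]]])

# Direct ML fitted values for model (AM, CM): n_{a+m} * n_{+cm} / n_{++m}.
n_am = n.sum(axis=1, keepdims=True)          # shape (2, 1, 2)
n_cm = n.sum(axis=0, keepdims=True)          # shape (1, 2, 2)
n_m = n.sum(axis=(0, 1), keepdims=True)      # shape (1, 1, 2)
mu_hat = n_am * n_cm / n_m

# The fit reproduces the AM and CM margins (the sufficient statistics).
print(np.allclose(mu_hat.sum(axis=1), n.sum(axis=1)))
print(np.allclose(mu_hat.sum(axis=0), n.sum(axis=0)))

# Conditional AC odds ratios equal 1 at each level of M ...
conditional_or = mu_hat[0, 0, :] * mu_hat[1, 1, :] / (mu_hat[0, 1, :] * mu_hat[1, 0, :])
# ... but the odds ratio in the fitted marginal AC table does not.
marginal = mu_hat.sum(axis=2)
marginal_or = marginal[0, 0] * marginal[1, 1] / (marginal[0, 1] * marginal[1, 0])

print(np.round(conditional_or, 6), round(marginal_or, 2))
```

With these assumed counts the marginal AC odds ratio comes out near the 2.7 quoted for this model in the discussion of Table 8.5, illustrating that conditional independence does not imply marginal independence.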
TABLE 8.13 Fitted Values for Loglinear Models in Three-Way Tables

Model^a           Probabilistic Form    Fitted Value
(X, Y, Z)         $\pi_{ijk} = \pi_{i++}\pi_{+j+}\pi_{++k}$    $\hat\mu_{ijk} = n_{i++}n_{+j+}n_{++k}/n^2$
(XY, Z)           $\pi_{ijk} = \pi_{ij+}\pi_{++k}$    $\hat\mu_{ijk} = n_{ij+}n_{++k}/n$
(XY, XZ)          $\pi_{ijk} = \pi_{ij+}\pi_{i+k}/\pi_{i++}$    $\hat\mu_{ijk} = n_{ij+}n_{i+k}/n_{i++}$
(XY, XZ, YZ)      $\pi_{ijk} = \psi_{ij}\phi_{jk}\omega_{ik}$    Iterative methods (Section 8.7)
(XYZ)             No restriction    $\hat\mu_{ijk} = n_{ijk}$

a Formulas for models not listed are obtained by symmetry; for example, for (XZ, Y), $\hat\mu_{ijk} = n_{i+k}n_{+j+}/n$.

Similar reasoning produces $\{\hat\mu_{ijk}\}$ for all except one model in Table 8.12. Table 8.13 shows formulas. That table also expresses $\{\pi_{ijk}\}$ in terms of marginal probabilities. These expressions and the likelihood equations determine the ML formulas, using the approach just described.

For models having explicit formulas for $\hat\mu_{ijk}$, the estimates are said to be direct. Many loglinear models do not have direct estimates. ML estimation then requires iterative methods. Of models in Tables 8.12 and 8.13, the only one not having direct estimates is (XY, XZ, YZ). Although the two-way marginal tables are its minimal sufficient statistics, it is not possible to express $\{\hat\mu_{ijk}\}$ directly in terms of $\{n_{ij+}\}$, $\{n_{i+k}\}$, and $\{n_{+jk}\}$. Direct estimates do not exist for unsaturated models containing all two-factor associations. In practice, it is not essential to know which models have direct estimates. Iterative methods for models not having direct estimates also apply with models that have direct estimates. Statistical software for loglinear models uses such iterative methods for all cases.

Chi-Squared Goodness-of-Fit Tests

Model goodness-of-fit statistics compare fitted cell counts to sample counts. For Poisson GLMs, in Section 4.5.2 we showed that for models with an intercept term, the deviance equals the $G^2$ statistic. With a fixed number of cells, $G^2$ and $X^2$ have approximate chi-squared null distributions when expected frequencies are large. The df equal the difference in dimension between the alternative and null hypotheses. This equals the difference between the number of parameters in the general case and when the model holds.
We illustrate with model (X, Y, Z), for multinomial sampling with probabilities $\{\pi_{ijk}\}$. In the general case, the only constraint is $\sum_i \sum_j \sum_k \pi_{ijk} = 1$, so there are $IJK - 1$ parameters. For model (X, Y, Z), $\{\pi_{ijk} = \pi_{i++}\pi_{+j+}\pi_{++k}\}$ are determined by $I - 1$ of $\{\pi_{i++}\}$ (since $\sum_i \pi_{i++} = 1$), $J - 1$ of $\{\pi_{+j+}\}$, and $K - 1$ of $\{\pi_{++k}\}$. Thus,

$$\mathrm{df} = (IJK - 1) - [(I - 1) + (J - 1) + (K - 1)] = IJK - I - J - K + 2.$$
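The parameter counting generalizes to any hierarchical model symbol. The helper below is a hypothetical sketch (its name, arguments, and the single-letter variable convention are assumptions of this example): it expands each generator such as "XY" into all the lower-order terms it implies, counts $(\text{levels} - 1)$ products for each term plus the intercept, and subtracts the total from the number of cells.

```python
from itertools import combinations
from math import prod

def residual_df(levels, generators):
    """Residual df for a hierarchical loglinear model.

    levels: dict mapping a single-letter variable name to its number of categories.
    generators: highest-order terms of the model symbol, e.g. ["XY", "Z"].
    """
    terms = set()
    for generator in generators:
        # Hierarchy: every nonempty subset of a generator is a term in the model.
        for size in range(1, len(generator) + 1):
            for subset in combinations(sorted(generator), size):
                terms.add(subset)
    n_params = 1 + sum(prod(levels[v] - 1 for v in term) for term in terms)
    n_cells = prod(levels.values())
    return n_cells - n_params

levels = {"X": 3, "Y": 4, "Z": 2}                    # I = 3, J = 4, K = 2
print(residual_df(levels, ["X", "Y", "Z"]))          # IJK - I - J - K + 2 = 17
print(residual_df(levels, ["XY", "XZ", "YZ"]))       # (I-1)(J-1)(K-1) = 6
print(residual_df(levels, ["XYZ"]))                  # saturated model: 0
```

The same function reproduces the four-way values used in the accident example, where all variables are binary: `residual_df({"G": 2, "I": 2, "L": 2, "S": 2}, ["GI", "GL", "GS", "IL", "IS", "LS"])` gives 5 and adding the GLS term gives 4.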
TABLE 8.14 Residual Degrees of Freedom for Loglinear Models for Three-Way Tables

Model             Degrees of Freedom
(X, Y, Z)         $IJK - I - J - K + 2$
(XY, Z)           $(K - 1)(IJ - 1)$
(XZ, Y)           $(J - 1)(IK - 1)$
(YZ, X)           $(I - 1)(JK - 1)$
(XY, YZ)          $J(I - 1)(K - 1)$
(XZ, YZ)          $K(I - 1)(J - 1)$
(XY, XZ)          $I(J - 1)(K - 1)$
(XY, XZ, YZ)      $(I - 1)(J - 1)(K - 1)$
(XYZ)             0

The same df formula applies for Poisson sampling. Then, the general case has $IJK$ $\{\mu_{ijk}\}$ parameters. The residual df equal the number of cells in the table minus the number of parameters in the Poisson loglinear model for $\{\mu_{ijk}\}$. For instance, model (X, Y, Z) has residual df = $IJK - [1 + (I - 1) + (J - 1) + (K - 1)]$, reflecting the single intercept parameter and constraints such as $\lambda_I^X = \lambda_J^Y = \lambda_K^Z = 0$. This equals the number of linearly independent parameters equated to zero in the saturated model to obtain the given model. Table 8.14 shows df formulas for testing three-way loglinear models.

Covariance Matrix of ML Parameter Estimators

To present large-sample distributions of ML parameter estimators, we return to general expression $\log(\mu_i) = \sum_j x_{ij}\beta_j$, from which we obtained the log-likelihood derivatives

$$\frac{\partial L(\boldsymbol\mu)}{\partial \beta_j} = \sum_i n_i x_{ij} - \sum_i \mu_i x_{ij}, \qquad j = 1, 2, \ldots, p.$$

The Hessian matrix of second partial derivatives has elements

$$\frac{\partial^2 L(\boldsymbol\mu)}{\partial \beta_j\,\partial \beta_k} = -\sum_i x_{ij} \frac{\partial \mu_i}{\partial \beta_k} = -\sum_i x_{ij} \left\{ \frac{\partial}{\partial \beta_k} \left[ \exp\left(\sum_h x_{ih}\beta_h\right) \right] \right\} = -\sum_i x_{ij} x_{ik} \mu_i.$$

Like logistic regression models, loglinear models are GLMs using the canonical link; thus this matrix does not depend on the observed data. The
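information matrix, the negative of this matrix, is

$$\mathcal{I} = \mathbf{X}'\,\mathrm{diag}(\boldsymbol\mu)\,\mathbf{X},$$

where $\mathrm{diag}(\boldsymbol\mu)$ has the elements of $\boldsymbol\mu$ on the main diagonal. For a fixed number of cells, as $n \to \infty$, the ML estimator $\hat{\boldsymbol\beta}$ is asymptotically normal with mean $\boldsymbol\beta$ and covariance matrix $\mathcal{I}^{-1}$. Thus, for Poisson sampling, the asymptotic covariance matrix

$$\mathrm{cov}(\hat{\boldsymbol\beta}) = [\mathbf{X}'\,\mathrm{diag}(\boldsymbol\mu)\,\mathbf{X}]^{-1}. \tag{8.25}$$

Substituting ML fitted values and then taking square roots of diagonal elements yields standard errors for $\hat{\boldsymbol\beta}$. This also follows from the general expression (4.8) for GLMs, as noted earlier.

Connection between Multinomial and Poisson Loglinear Models

Similar asymptotic results hold with multinomial sampling. When $\{Y_i, i = 1, \ldots, N\}$ are independent Poisson random variables, the conditional distribution of $\{Y_i\}$ given $n = \sum_i Y_i$ is multinomial with parameters $\{\pi_i = \mu_i/(\sum_a \mu_a)\}$. Birch (1963) showed that ML estimates of loglinear model parameters are the same for multinomial sampling as for independent Poisson sampling. He showed that estimates are also the same for independent multinomial sampling, as long as the model contains a term for the marginal distribution fixed by the sampling design. To illustrate, suppose that at each combination of categories of X and Z, an independent multinomial sample occurs on Y. Then, $\{n_{i+k}\}$ are fixed. The model must contain $\lambda_{ik}^{XZ}$, so the fitted values satisfy $\{\hat\mu_{i+k} = n_{i+k}\}$.

That separate inferential theory is unnecessary for multinomial loglinear models follows from the following argument. Express the Poisson loglinear model for $\{\mu_i\}$ as

$$\log \mu_i = \alpha + \mathbf{x}_i\boldsymbol\beta,$$

where $(1, \mathbf{x}_i)$ is row $i$ of the model matrix $\mathbf{X}$ and $(\alpha, \boldsymbol\beta')'$ is the model parameter vector. The Poisson log likelihood is

$$L = L(\alpha, \boldsymbol\beta) = \sum_i n_i \log \mu_i - \sum_i \mu_i = \sum_i n_i(\alpha + \mathbf{x}_i\boldsymbol\beta) - \sum_i \exp(\alpha + \mathbf{x}_i\boldsymbol\beta) = n\alpha + \sum_i n_i \mathbf{x}_i\boldsymbol\beta - \tau,$$

where $\tau = \sum_i \mu_i = \sum_i \exp(\alpha + \mathbf{x}_i\boldsymbol\beta)$. Since $\log \tau = \alpha + \log[\sum_i \exp(\mathbf{x}_i\boldsymbol\beta)]$, this log likelihood has the form

$$L = L(\tau, \boldsymbol\beta) = \left\{ \sum_i n_i \mathbf{x}_i\boldsymbol\beta - n \log\left[\sum_i \exp(\mathbf{x}_i\boldsymbol\beta)\right] \right\} + (n \log \tau - \tau). \tag{8.26}$$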
Now $\pi_i = \mu_i/(\sum_a \mu_a) = \exp(\alpha + \mathbf{x}_i\boldsymbol{\beta})/[\sum_a \exp(\alpha + \mathbf{x}_a\boldsymbol{\beta})]$, and $\exp(\alpha)$ cancels in the numerator and denominator. Thus, the first term (in braces) on the right-hand side in (8.6) is $\sum_i n_i \log \pi_i$, which is the multinomial log likelihood, conditional on the total cell count $n$. Unconditionally, $n = \sum_i n_i$ has a Poisson distribution with expectation $\sum_i \mu_i = \tau$, so the second term in (8.6) is the Poisson log likelihood for $n$. Since $\boldsymbol{\beta}$ enters only in the first term, the ML estimator $\hat{\boldsymbol{\beta}}$ and its covariance matrix for the Poisson log likelihood $L(\tau, \boldsymbol{\beta})$ are identical to those for the multinomial log likelihood. The Poisson loglinear model has one more parameter (i.e., $\tau$) than the multinomial loglinear model because of the random sample size. See Birch (1963), Lang (1996c), McCullagh and Nelder (1989, p. 11), and Palmgren (1981) for details.

For a multinomial sample, we show later that the estimated covariance matrix of loglinear parameter estimators is

$$\widehat{\mathrm{cov}}(\hat{\boldsymbol{\beta}}) = \{X'[\mathrm{diag}(\hat{\boldsymbol{\mu}}) - \hat{\boldsymbol{\mu}}\hat{\boldsymbol{\mu}}'/n]X\}^{-1}. \qquad (8.7)$$

The intercept from the Poisson model is not relevant, and $X$ for the multinomial model deletes the column of $X$ pertaining to it in the Poisson model.

A similar argument applies with several independent multinomial samples. Each log-likelihood term is a sum of components from different samples, but the Poisson log likelihood again decomposes into two parts. One part is a Poisson log likelihood for the independent sample sizes, and the other part is the sum of the independent multinomial log likelihoods. Palmgren (1981) showed that conditional on observed marginal totals for explanatory variables, the asymptotic covariances for estimators of parameters involving the response are the same as for Poisson sampling. For a single multinomial sample, Palmgren's result implies that (8.7) is identical to (8.5) with the row and column referring to $\alpha$ deleted. Birch and Goodman gave related results. Lang (1996c) gave an elegant discussion of connections between multinomial and Poisson models.
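His results imply that the asymptotic variance of any linear contrast of estimated log means within a covariate level is identical for the two models.

Distribution of Probability Estimators

For multinomial sampling, the ML estimates of cell probabilities are $\hat{\boldsymbol{\pi}} = \hat{\boldsymbol{\mu}}/n$. We next give the asymptotic $\mathrm{cov}(\hat{\boldsymbol{\pi}})$. Lang (1996c) showed the asymptotic covariance matrix for $\hat{\boldsymbol{\mu}}$ for Poisson sampling and its connection with $\mathrm{cov}(\hat{\boldsymbol{\pi}})$.

The saturated model has $\hat{\boldsymbol{\pi}} = \mathbf{p}$, the sample proportions. Under multinomial sampling, from (3.7) and (3.8), their covariance matrix is

$$\mathrm{cov}(\mathbf{p}) = [\mathrm{diag}(\boldsymbol{\pi}) - \boldsymbol{\pi}\boldsymbol{\pi}']/n. \qquad (8.8)$$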
With $I$ independent multinomial samples on a response variable with $J$ categories, $\boldsymbol{\pi}$ and $\mathbf{p}$ consist of $I$ sets of proportions, each having $J - 1$ nonredundant elements. Then, $\mathrm{cov}(\mathbf{p})$ is a block diagonal matrix. Each of the independent samples has a $(J - 1) \times (J - 1)$ block of form (8.8), and the matrix contains zeros off the main diagonal of blocks.

Now assume an unsaturated model. Using the delta method we show in Chapter 14 that $\hat{\boldsymbol{\pi}}$ has an asymptotic normal distribution about $\boldsymbol{\pi}$. The estimated covariance matrix equals

$$\widehat{\mathrm{cov}}(\hat{\boldsymbol{\pi}}) = \{\widehat{\mathrm{cov}}(\mathbf{p})\,X\,[X'\,\widehat{\mathrm{cov}}(\mathbf{p})\,X]^{-1}X'\,\widehat{\mathrm{cov}}(\mathbf{p})\}/n.$$

For a single multinomial sample, this expression equals

$$\widehat{\mathrm{cov}}(\hat{\boldsymbol{\pi}}) = \{[\mathrm{diag}(\hat{\boldsymbol{\pi}}) - \hat{\boldsymbol{\pi}}\hat{\boldsymbol{\pi}}']\,X\,[X'(\mathrm{diag}(\hat{\boldsymbol{\pi}}) - \hat{\boldsymbol{\pi}}\hat{\boldsymbol{\pi}}')X]^{-1}X'\,[\mathrm{diag}(\hat{\boldsymbol{\pi}}) - \hat{\boldsymbol{\pi}}\hat{\boldsymbol{\pi}}']\}/n.$$

For tables with many cells, it is not unusual to have a sample proportion of 0 in a cell. In this case the ordinary standard error is 0, which is unappealing. An advantage of fitting a model is that it typically has a positive fitted probability and standard error.

Uniqueness of ML Estimates

When all $n_i > 0$, the ML estimates $\{\hat{\mu}_i\}$ exist and are unique. To show this, for simplicity we use Poisson sampling. Suppose that the model is parameterized so that $X$ has full rank. Birch (1963) showed that the likelihood equations are soluble, by noting that the kernel of the Poisson log likelihood

$$L(\boldsymbol{\mu}) = \sum_i (n_i \log \mu_i - \mu_i)$$

has individual terms converging to $-\infty$ as $\log(\mu_i) \to \pm\infty$; thus, the log likelihood is bounded above and attains its maximum at finite values of the model parameters. It is stationary at this maximum, since it has continuous first partial derivatives.

Birch showed that the likelihood equations have a unique solution, and the likelihood is maximized at that point. He proved this by showing that the matrix of values $\{-\partial^2 L/\partial\beta_h\,\partial\beta_j\}$ [i.e., the information matrix $X'\,\mathrm{diag}(\boldsymbol{\mu})\,X$] is nonsingular and nonnegative definite, and hence positive definite. Nonsingularity follows from $X$ having full rank and the diagonal matrix having positive elements $\{\mu_i\}$. Any quadratic form $\mathbf{c}'X'\,\mathrm{diag}(\boldsymbol{\mu})\,X\mathbf{c}$ equals $\sum_i [\sqrt{\mu_i}\,(\sum_j x_{ij}c_j)]^2 \ge 0$, so the matrix is also nonnegative definite.
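The covariance results (8.5) and (8.7) and the positive-definiteness argument can be checked numerically. The following is a minimal sketch, not part of the original text: the fitted values `mu_hat`, the model matrix `X`, and the function names are hypothetical choices for a 2 x 2 independence model. It compares the Poisson covariance matrix, with the intercept row and column deleted, to the multinomial form, and inspects the eigenvalues of the information matrix.

```python
import numpy as np

def poisson_cov(X, mu):
    """Inverse information [X' diag(mu) X]^{-1}, as in (8.5)."""
    info = X.T @ (mu[:, None] * X)
    return np.linalg.inv(info)

def multinomial_cov(X_reduced, mu):
    """{X' [diag(mu) - mu mu'/n] X}^{-1}, as in (8.7).

    X_reduced is the model matrix with the intercept column deleted.
    """
    n = mu.sum()
    middle = np.diag(mu) - np.outer(mu, mu) / n
    return np.linalg.inv(X_reduced.T @ middle @ X_reduced)

# Hypothetical fitted values for cells 11, 12, 21, 22 (they satisfy independence).
mu_hat = np.array([30.0, 20.0, 60.0, 40.0])

# Columns: intercept, row-1 indicator, column-1 indicator
# (parameters for the last level of each variable set to zero).
X = np.array([[1, 1, 1],
              [1, 1, 0],
              [1, 0, 1],
              [1, 0, 0]], dtype=float)

cov_poisson = poisson_cov(X, mu_hat)
cov_multinomial = multinomial_cov(X[:, 1:], mu_hat)

# Eigenvalues of the information matrix; all should be positive.
info_eigenvalues = np.linalg.eigvalsh(X.T @ (mu_hat[:, None] * X))

# Standard errors: square roots of the diagonal of the covariance matrix.
standard_errors = np.sqrt(np.diag(cov_poisson))
```

Under these assumptions, the block of `cov_poisson` that excludes the intercept should agree with `cov_multinomial`, which is the statement that (8.7) equals (8.5) with the intercept row and column deleted.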
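8.7 LOGLINEAR MODEL FITTING: ITERATIVE METHODS AND THEIR APPLICATION*

When a loglinear model does not have direct estimates, iterative algorithms such as Newton–Raphson can solve the likelihood equations. In this section we also present a simpler but more limited method, iterative proportional fitting.

Newton–Raphson Method

We introduced the Newton–Raphson method earlier. Referring to the notation there, we identify $L(\boldsymbol{\beta})$ as the log likelihood for Poisson loglinear models. From (8.1), let

$$L(\boldsymbol{\beta}) = \sum_i n_i\left(\sum_h x_{ih}\beta_h\right) - \sum_i \exp\left(\sum_h x_{ih}\beta_h\right).$$

Then

$$u_j = \frac{\partial L(\boldsymbol{\beta})}{\partial \beta_j} = \sum_i n_i x_{ij} - \sum_i \mu_i x_{ij}, \qquad h_{jk} = \frac{\partial^2 L(\boldsymbol{\beta})}{\partial \beta_j\,\partial \beta_k} = -\sum_i \mu_i x_{ij}x_{ik},$$

so that

$$u_j^{(t)} = \sum_i (n_i - \mu_i^{(t)})x_{ij} \quad\text{and}\quad h_{jk}^{(t)} = -\sum_i \mu_i^{(t)}x_{ij}x_{ik}.$$

The $t$th approximation $\boldsymbol{\mu}^{(t)}$ for $\hat{\boldsymbol{\mu}}$ derives from $\boldsymbol{\beta}^{(t)}$ through $\boldsymbol{\mu}^{(t)} = \exp(X\boldsymbol{\beta}^{(t)})$. It generates the next value $\boldsymbol{\beta}^{(t+1)}$ using (4.39), which in this context is

$$\boldsymbol{\beta}^{(t+1)} = \boldsymbol{\beta}^{(t)} + [X'\,\mathrm{diag}(\boldsymbol{\mu}^{(t)})\,X]^{-1}X'(\mathbf{n} - \boldsymbol{\mu}^{(t)}).$$

This in turn produces $\boldsymbol{\mu}^{(t+1)}$, and so on.

Alternatively, $\boldsymbol{\beta}^{(t+1)}$ can be expressed as

$$\boldsymbol{\beta}^{(t+1)} = -(H^{(t)})^{-1}\mathbf{r}^{(t)}, \qquad (8.9)$$

where $r_j^{(t)} = \sum_i \mu_i^{(t)}x_{ij}\{\log \mu_i^{(t)} + [(n_i - \mu_i^{(t)})/\mu_i^{(t)}]\}$. The expression in braces is the first-order Taylor series expansion of $\log n_i$ about $n_i = \mu_i^{(t)}$.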
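The update formula above can be sketched in a few lines. This is an illustrative sketch rather than the text's own program: the function name, the starting value (a least squares fit to log counts with 1/2 added), the stopping rule, and the example 2 x 3 table are all assumptions made here. The independence model is used because its ML fitted values have the closed form $n_{i+}n_{+j}/n$, which gives a direct check.

```python
import numpy as np

def newton_raphson_loglinear(X, n, tol=1e-10, max_iter=50):
    """Fit a Poisson loglinear model log(mu) = X beta by Newton-Raphson.

    Returns (beta_hat, mu_hat, cov_hat), where cov_hat is the inverse
    of X' diag(mu_hat) X.
    """
    n = np.asarray(n, dtype=float)
    # Starting value (an assumption of this sketch): least squares on log(n + 1/2).
    beta = np.linalg.lstsq(X, np.log(n + 0.5), rcond=None)[0]
    for _ in range(max_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (n - mu)              # u_j = sum_i (n_i - mu_i) x_ij
        info = X.T @ (mu[:, None] * X)      # -H = X' diag(mu) X
        step = np.linalg.solve(info, score)
        beta = beta + step                  # beta(t+1) = beta(t) + info^{-1} u
        if np.max(np.abs(step)) < tol:
            break
    mu = np.exp(X @ beta)
    cov = np.linalg.inv(X.T @ (mu[:, None] * X))
    return beta, mu, cov

# Hypothetical 2 x 3 table, listed row by row.
counts = np.array([10, 20, 30, 20, 25, 15], dtype=float)

# Independence model: intercept, row-1 indicator, column-1 and column-2 indicators.
X = np.array([[1, 1, 1, 0],
              [1, 1, 0, 1],
              [1, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 0]], dtype=float)

beta_hat, mu_hat, cov_hat = newton_raphson_loglinear(X, counts)

# Closed-form fitted values n_{i+} n_{+j} / n for comparison.
table = counts.reshape(2, 3)
direct_fit = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
```

Solving the linear system with `np.linalg.solve` at each step avoids forming the inverse until the final covariance matrix is needed.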
The iterative process begins with all $\mu_i^{(0)} = n_i$, or with an adjustment such as $\mu_i^{(0)} = n_i + \tfrac{1}{2}$ if any $n_i = 0$. Then (8.9) produces $\boldsymbol{\beta}^{(1)}$, and for $t > 0$ the iterations proceed as just described. For loglinear models $L(\boldsymbol{\beta})$ is concave, and $\boldsymbol{\mu}^{(t)}$ and $\boldsymbol{\beta}^{(t)}$ usually converge rapidly to the ML estimates $\hat{\boldsymbol{\mu}}$ and $\hat{\boldsymbol{\beta}}$ as $t$ increases. The $H^{(t)}$ matrix converges to $\hat{H} = -X'\,\mathrm{diag}(\hat{\boldsymbol{\mu}})\,X$. By (8.5), the estimated large-sample covariance matrix of $\hat{\boldsymbol{\beta}}$ is $-\hat{H}^{-1}$, a by-product of the method.

As we discussed earlier for GLMs, (8.9) has the iterative reweighted least squares form

$$\boldsymbol{\beta}^{(t+1)} = (X'\hat{V}_t^{-1}X)^{-1}X'\hat{V}_t^{-1}\mathbf{z}^{(t)}.$$

Here, $\mathbf{z}^{(t)}$ has elements $z_i^{(t)} = \log \mu_i^{(t)} + (n_i - \mu_i^{(t)})/\mu_i^{(t)}$ and $\hat{V}_t = [\mathrm{diag}(\boldsymbol{\mu}^{(t)})]^{-1}$. Thus, $\boldsymbol{\beta}^{(t+1)}$ is the weighted least squares solution for a model

$$\mathbf{z}^{(t)} = X\boldsymbol{\beta} + \boldsymbol{\epsilon},$$

where $\{\epsilon_i\}$ are uncorrelated with variances $\{1/\mu_i^{(t)}\}$. With $\boldsymbol{\mu}^{(0)} = \mathbf{n}$, $\boldsymbol{\beta}^{(1)}$ is the weighted least squares estimate for model $\log(\mathbf{n}) = X\boldsymbol{\beta} + \boldsymbol{\epsilon}$.

Iterative Proportional Fitting

The iterative proportional fitting (IPF) algorithm is a simple method for calculating $\{\hat{\mu}_i\}$ for loglinear models. Introduced by Deming and Stephan (1940), it has the following steps:

1. Start with $\{\mu_i^{(0)}\}$ satisfying a model no more complex than the one being fitted. For instance, $\{\mu_i^{(0)} \equiv 1.0\}$ are trivially adequate.
2. By multiplying by appropriate factors, adjust $\{\mu_i^{(0)}\}$ successively to match each marginal table in the set of minimal sufficient statistics.
3. Continue until the maximum difference between the sufficient statistics and their fitted values is sufficiently close to zero.

We illustrate using model (XY, XZ, YZ). Its minimal sufficient statistics are $\{n_{ij+}\}$, $\{n_{i+k}\}$, and $\{n_{+jk}\}$. Initial estimates must satisfy the model. The first cycle of the IPF algorithm has three steps:

$$\mu_{ijk}^{(1)} = \mu_{ijk}^{(0)}\frac{n_{ij+}}{\mu_{ij+}^{(0)}}, \qquad \mu_{ijk}^{(2)} = \mu_{ijk}^{(1)}\frac{n_{i+k}}{\mu_{i+k}^{(1)}}, \qquad \mu_{ijk}^{(3)} = \mu_{ijk}^{(2)}\frac{n_{+jk}}{\mu_{+jk}^{(2)}}.$$

Summing both sides of the first expression over $k$ shows that $\mu_{ij+}^{(1)} = n_{ij+}$ for all $i$ and $j$. After step 1, observed and fitted values match in the XY marginal table. After step 2, all $\mu_{i+k}^{(2)} = n_{i+k}$, but the XY marginal tables no longer match. After step 3, all $\mu_{+jk}^{(3)} = n_{+jk}$, but the XY and XZ marginal tables no
longer match. A new cycle begins by again matching the XY marginal tables, using $\mu_{ijk}^{(4)} = \mu_{ijk}^{(3)}(n_{ij+}/\mu_{ij+}^{(3)})$, and so on.

At each step, the updated estimates continue to satisfy the model. For instance, step 1 uses the same adjustment factor $(n_{ij+}/\mu_{ij+}^{(0)})$ at different levels $k$ of Z. Thus, XY odds ratios from different levels of Z have ratio equal to 1, and the homogeneous association pattern continues at each step.

As the cycles progress, the $G^2$ statistic comparing cell counts to the updated fit is monotone decreasing, and the process must converge (Fienberg 1970a; Haberman 1974a). The IPF algorithm produces ML estimates because it generates a sequence of fitted values converging to a solution that both satisfies the model and matches the sufficient statistics. By Birch's results, only one such solution exists, and it is ML.

The IPF method works even for models having direct estimates. Then, IPF normally yields ML estimates within one cycle (Haberman 1974a). We illustrate with the model of independence. The minimal sufficient statistics are $\{n_{i+}\}$ and $\{n_{+j}\}$. With $\{\mu_{ij}^{(0)} \equiv 1.0\}$, the first cycle gives

$$\mu_{ij}^{(1)} = \mu_{ij}^{(0)}\frac{n_{i+}}{\mu_{i+}^{(0)}} = \frac{n_{i+}}{J}, \qquad \mu_{ij}^{(2)} = \mu_{ij}^{(1)}\frac{n_{+j}}{\mu_{+j}^{(1)}} = \frac{n_{i+}n_{+j}}{n}.$$

The IPF algorithm then gives $\mu_{ij}^{(t)} = n_{i+}n_{+j}/n$ for all subsequent $t$.

Comparison of Iterative Methods

The IPF algorithm is simple and easy to implement. It converges to the ML fit even when the likelihood is poorly behaved, for instance with zero fitted counts and estimates on the boundary of the parameter space. The Newton–Raphson method is more complex, requiring solving a system of equations at each step. Newton–Raphson is sometimes not feasible when the model is of high dimensionality, for instance, when the contingency table and parameter vector are huge.

However, IPF has disadvantages. It is applicable primarily to models for which likelihood equations equate observed and fitted counts in marginal tables. By contrast, Newton–Raphson is a general-purpose method that can solve more complex likelihood equations.
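For comparison with Newton–Raphson, the IPF cycle for model (XY, XZ, YZ) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the stopping tolerance, and the 2 x 2 x 2 counts are hypothetical, and the sketch assumes all marginal totals are positive so that no division by zero occurs.

```python
import numpy as np

def ipf_homogeneous_association(n, tol=1e-8, max_cycles=1000):
    """IPF fitted values for model (XY, XZ, YZ) on a three-way table n[i, j, k]."""
    n = np.asarray(n, dtype=float)
    mu = np.ones_like(n)  # initial values of 1.0 satisfy the model trivially
    for _ in range(max_cycles):
        # Step 1: match the XY margin (sum over k).
        mu *= (n.sum(axis=2) / mu.sum(axis=2))[:, :, None]
        # Step 2: match the XZ margin (sum over j).
        mu *= (n.sum(axis=1) / mu.sum(axis=1))[:, None, :]
        # Step 3: match the YZ margin (sum over i).
        mu *= (n.sum(axis=0) / mu.sum(axis=0))[None, :, :]
        # Largest remaining discrepancy between observed and fitted margins.
        gap = max(np.abs(n.sum(axis=2) - mu.sum(axis=2)).max(),
                  np.abs(n.sum(axis=1) - mu.sum(axis=1)).max(),
                  np.abs(n.sum(axis=0) - mu.sum(axis=0)).max())
        if gap < tol:
            break
    return mu

# Hypothetical 2 x 2 x 2 counts, indexed as counts[i, j, k].
counts = np.array([[[10, 20], [30, 40]],
                   [[15, 5], [25, 35]]], dtype=float)

fitted = ipf_homogeneous_association(counts)

def xy_odds_ratio(mu, k):
    """XY conditional odds ratio at level k of Z for a 2 x 2 x K array."""
    return mu[0, 0, k] * mu[1, 1, k] / (mu[0, 1, k] * mu[1, 0, k])

odds_ratio_z1 = xy_odds_ratio(fitted, 0)
odds_ratio_z2 = xy_odds_ratio(fitted, 1)
```

Because every adjustment multiplies cells by a factor that depends on only two of the three indices, the fitted XY odds ratios at the two levels of Z stay equal throughout, while the three sets of two-way margins converge to the observed ones.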
IPF sometimes converges slowly compared to Newton–Raphson. Unlike Newton–Raphson, IPF does not produce the model parameter estimates and their estimated covariance matrix as a by-product. Fitted values that IPF produces can generate this information. Model parameter estimates are contrasts of $\{\log \hat{\mu}_i\}$ (see Problems 8.16 and 8.17), and substituting fitted values into (8.5) yields $\widehat{\mathrm{cov}}(\hat{\boldsymbol{\beta}})$.

Because Newton–Raphson applies to a wide variety of models and also yields standard errors, it is the fitting routine used by most software for
loglinear models. IPF is increasingly viewed as primarily of historical interest. However, for some applications the analysis is more transparent using IPF, as the next example illustrates.

Contingency Table Standardization

Table 8.15 relates education and attitudes toward legalized abortion using a General Social Survey, conducted by the National Opinion Research Center. To make patterns of association clearer, Smith (1976) standardized the table so that all row and column marginal totals equal 100 while maintaining the sample odds ratio structure. The IPF routine to standardize with margins of 100 is

$$\mu_{ij}^{(0)} = n_{ij}$$

and then for $t = 1, 3, 5, \ldots$,

$$\mu_{ij}^{(t)} = \mu_{ij}^{(t-1)}\frac{100}{\mu_{i+}^{(t-1)}}, \qquad \mu_{ij}^{(t+1)} = \mu_{ij}^{(t)}\frac{100}{\mu_{+j}^{(t)}}.$$

At the end of each odd-numbered step, all row totals equal 100. At the end of each even-numbered step, all column totals equal 100. Odds ratios do not change at each odd (even) step, since all counts in a given row (column) multiply by the same constant.

The IPF algorithm converges to the entries in parentheses in Table 8.15. The association is clearer in this standardized table. A ridge appears down the main diagonal, with higher levels of education having more favorable attitudes about abortion. The other counts fall away smoothly on both sides.

TABLE 8.15 Marginal Standardization of Attitudes toward Abortion by Years of Schooling
Rows (Schooling): Less than high school; High school; More than high school; Total.
Columns (Attitude toward Legalized Abortion): Generally Disapprove; Middle Position; Generally Approve; Total.
Standardized entries appear in parentheses; each standardized row and column total is (100).
Source: Smith 1976.

Table standardization is useful for comparing tables having different marginal structures. Mosteller (1968) compared intergenerational occupational mobility tables from Britain and Denmark. Yule (1912) compared three hospitals on vaccination and recovery for smallpox patients. A modern application is adjusting sample data to match marginal distributions specified by census results. The process of table standardization is called raking the table. Imrey et al. and Little and Wu (1991) derived the asymptotic covariance matrix for raked sample proportions.

For sample counts $\{n_{ij}\}$ with $\{\mu_{ij} = E(n_{ij})\}$, let $\{E_{ij}\}$ denote expected frequencies for the standardized table and $\{\hat{E}_{ij}\}$ fitted values in the standardized table. The standardization process corresponds to fitting the model

$$\log(E_{ij}/\mu_{ij}) = \lambda + \lambda_i^E + \lambda_j^A.$$

That is, maintaining the odds ratios means that the two-way tables of $\{E_{ij}/\mu_{ij}\}$ and of $\{\hat{E}_{ij}/n_{ij}\}$ satisfy independence. The fitted values $\{\hat{E}_{ij}\}$ in the standardized table satisfy

$$\log \hat{E}_{ij} - \log n_{ij} = \hat{\lambda} + \hat{\lambda}_i^E + \hat{\lambda}_j^A.$$

The adjustment term, $-\log n_{ij}$, to the log link of the fit is called an offset. The fit corresponds to using $\log n_{ij}$ as a predictor on the right-hand side and forcing its coefficient to equal 1.0. Standard GLM software can fit models having offsets. To rake a table, one enters as sample data pseudo-values that satisfy independence and have the desired margins, taking $\log n_{ij}$ as an offset. (For SAS, see Table A.14.) We discuss the use of model offsets further in a later section.

NOTES

Section 8.2: Loglinear Models for Independence and Interaction in Three-Way Tables

8.1. Roy and Mitra (1956) discussed types of independence for three-way tables and their large-sample tests. Birch's (1963) article on ML estimation for loglinear models was part of substantial research on loglinear models in the 1960s, much due to L. A. Goodman. Haberman (1974a) presented an influential theoretical study of loglinear models.

Section 8.3: Inference for Loglinear Models

8.2. Goodman (1970, 1971b), Haberman (1974a, Chap.
5), Lauritzen (1996), Sundberg (1975), and Whittaker (1990) discussed families of loglinear models that have direct ML estimates and interpretations in terms of independence, conditional independence, or equiprobability. Such models are called decomposable, since expected frequencies decompose into products and ratios of expected marginal sufficient statistics. Haberman proved conditions under which loglinear models have direct estimates. Baglivo et al. (1992), Forster et al. (1996), and Morgan and Blumenstein (1991) discussed exact inference.
8.3. For methods that allow for misclassification error, see Kuha and Skinner and Kuha et al. and references therein. For treatment of missing data, see Little (1998), Schafer (1997, Chap. 8), and their references.

Section 8.7: Loglinear Model Fitting: Iterative Methods and Their Application

8.4. Deming (1964, Chap. VII) described early work on IPF by Deming and Stephan. Darroch (1962) used IPF to obtain ML estimates in contingency tables. Bishop et al. (1975), Fienberg (1970a), and Speed presented other applications of IPF. Darroch and Ratcliff (1972) generalized IPF for models in which sufficient statistics are more complex than marginal tables.

8.5. For further discussion of table raking, see Bishop et al. (1975), Fleiss (1981, Chap. 14), Haberman (1979, Chap. 9), Hoem (1987), and Little and Wu (1991).

PROBLEMS

Applications

8.1 The 1988 General Social Survey compiled by the National Opinion Research Center asked: "Do you support or oppose the following measures to deal with AIDS? (1) Have the government pay all of the health care costs of AIDS patients; (2) Develop a government information program to promote safe sex practices, such as the use of condoms." Table 8.16 summarizes opinions about health care costs (H) and the information program (I), classified also by the respondent's gender (G).
a. Fit loglinear models (GH, GI), (GH, HI), (GI, HI), and (GH, GI, HI). Show that models that lack the HI term fit poorly.
b. For model (GH, GI, HI), show that 95% Wald confidence intervals equal (0.55, ) for the GH conditional odds ratio and (0.99, .55) for the GI conditional odds ratio. Interpret. Is it plausible that gender has no effect on opinion for these issues?

TABLE 8.16 Data for Problem 8.1
Variables: Gender (Male, Female); Information Opinion (Support, Oppose); Health Opinion (Support, Oppose).
Source: 1988 General Social Survey, National Opinion Research Center.
TABLE 8.17 Data for Problem 8.2
Variables: President, Busing, and Home, each coded 1, yes; 2, no; 3, don't know.
Source: 1991 General Social Survey, National Opinion Research Center.

8.2 Refer to Table 8.17 from the 1991 General Social Survey. White subjects were asked: (B) "Do you favor busing of (Negro/Black) and white school children from one school district to another?", (P) "If your party nominated a (Negro/Black) for President, would you vote for him if he were qualified for the job?", (D) "During the last few years, has anyone in your family brought a friend who was a (Negro/Black) home for dinner?" The response scale for each item was (yes, no, don't know). Fit model (BD, BP, DP).
a. Using the yes and no categories, estimate the conditional odds ratio for each pair of variables. Interpret.
b. Analyze the model's goodness of fit. Interpret.
c. Conduct inference for the BP conditional association using a Wald or likelihood-ratio confidence interval and test. Interpret.

8.3 Explain why software for which parameters sum to zero across levels of each index reports $\hat{\lambda}_{11}^{AC} = \hat{\lambda}_{22}^{AC} = 0.514$ and $\hat{\lambda}_{12}^{AC} = \hat{\lambda}_{21}^{AC} = -0.514$, with the same SE for each term.

8.4 Refer to Table 2.6. Let D = defendant's race, V = victims' race, and P = death penalty verdict. Fit the loglinear model (DV, DP, PV).
a. Using the fitted values, estimate and interpret the odds ratio between D and P at each level of V. Note the common odds ratio property.
b. Calculate the marginal odds ratio between D and P, (i) using the fitted values, and (ii) using the sample data. Why are they equal? Contrast the odds ratio with part (a). Explain why Simpson's paradox occurs.
c. Fit the corresponding logit model, treating P as the response. Show the correspondence between parameter estimates and fit statistics.
d. Is there a simpler model that fits well? Interpret, and show the logit–loglinear connection.

8.5 Table 8.18 refers to automobile accident records in Florida.
a. Find a loglinear model that describes the data well. Interpret associations.
b. Treating whether killed as the response, fit an equivalent logit model. Interpret the effects.
c. Since n is large, goodness-of-fit statistics are large unless the model fits very well. Calculate the dissimilarity index for the model in part (a), and interpret.

TABLE 8.18 Data for Problem 8.5
Variables: Safety Equipment in Use (Seat belt, None); Whether Ejected (Yes, No); Injury (Nonfatal, Fatal).
Source: Florida Department of Highway Safety and Motor Vehicles.

8.6 Refer to Table 8.19. Subjects were asked their opinions about government spending on the environment (E), health (H), assistance to big cities (C), and law enforcement (L).

TABLE 8.19 Data for Problem 8.6
Variables: Environment, Health, Cities, and Law Enforcement, each coded 1, too little; 2, about right; 3, too much.
Source: 1989 General Social Survey, National Opinion Research Center.
TABLE 8.20 Output for Fitting Model to Table 8.19
Criteria for Assessing Goodness of Fit (columns: Criterion, DF, Value, Value/DF): Deviance; Pearson Chi-Square; Log Likelihood.
Parameter estimates (columns: Parameter, DF, Estimate, Standard Error, Wald 95% Confidence Limits, Chi-Square) for the e*h, e*l, e*c, h*c, h*l, and c*l terms.

a. Table 8.20 shows some results, including the two-factor estimates, for the homogeneous association model. Check the fit, and interpret.
b. All estimates at category 3 of each variable equal 0. Report the estimated conditional odds ratios using the too much and too little categories for each pair of variables. Summarize the associations. Based on these results, which term(s) might you consider dropping from the model? Why?
c. Table 8.21 reports $\{\hat{\lambda}_{eh}^{EH}\}$ when parameters sum to zero within rows and within columns, and when parameters are zero in the first row and first column. Show how these yield the estimated EH conditional odds ratio for the too much and too little categories. Compare to part (b). Construct a confidence interval for that odds ratio. Interpret.
TABLE 8.21 Parameter Estimates for Problem 8.6
Estimates of $\{\lambda_{eh}^{EH}\}$ by level of E (rows) and H (columns), under sum-to-zero constraints and under zero-for-first-level constraints.

8.7 Refer to the loglinear models for Table 8.8.
a. Explain why the fitted odds ratios in Table 8.10 for model (GI, GL, GS, IL, IS, LS) suggest that the most likely accident case for injury is females not wearing seat belts in rural locations.
b. Fit model (GLS, GI, IL, IS). Using model parameter estimates, show how to obtain the fitted IS conditional odds ratio. Show that for each injury level, the estimated conditional LS odds ratio is 1.17 for (G = female) and 1.03 for (G = male). How can you get these using the model parameter estimates?

8.8 Consider the following two-stage model for Table 8.8. The first stage is a logit model with S as the response for the three-way GLS table. The second stage is a logit model with these three variables as predictors for I in the four-way table. Explain why this composite model is sensible, fit the models, and interpret results.

8.9 Refer to the logit model in Problem 5.4. Let A = opinion on abortion.
a. Give the symbol for the loglinear model that is equivalent to this logit model.
b. Which logit model corresponds to loglinear model (AR, AP, GRP)?
c. State the equivalent loglinear and logit models for which (i) A is jointly independent of G, R, and P; (ii) there are main effects of R on A, but A is conditionally independent of G and P, given R; (iii) there is interaction between P and R in their effects on A, and G has main effects.

8.10 For a multiway contingency table, when is a logit model more appropriate than a loglinear model? When is a loglinear model more appropriate?

8.11 Using software, conduct the analyses described in this chapter for the student survey data (Table 8.3).
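8.12 Standardize the migration table. Describe the migration patterns.

8.13 The book's Web site (aa/cda/cda.html) has a table relating responses on frequency of attending religious services, political views, opinion on making birth control available to teenagers, and opinion about a man and woman having sexual relations before marriage. Analyze these data using loglinear models.

Theory and Methods

8.14 Suppose that $\{\mu_{ij} = n\pi_{ij}\}$ satisfy the independence model (8.1).
a. Show that $\lambda_a^Y - \lambda_b^Y = \log(\pi_{+a}/\pi_{+b})$.
b. Show that $\{\text{all } \lambda_j^Y = 0\}$ is equivalent to $\pi_{+j} = 1/J$ for all $j$.

8.15 Refer to the independence model, $\pi_{ij} = \pi_{i+}\pi_{+j}$. For the corresponding loglinear model (8.1):
a. Show that one can constrain $\sum_i \lambda_i^X = \sum_j \lambda_j^Y = 0$ by setting

$$\lambda_i^X = \log \pi_{i+} - \Big(\sum_h \log \pi_{h+}\Big)\Big/I, \qquad \lambda_j^Y = \log \pi_{+j} - \Big(\sum_h \log \pi_{+h}\Big)\Big/J,$$

$$\lambda = \log n + \Big(\sum_h \log \pi_{h+}\Big)\Big/I + \Big(\sum_h \log \pi_{+h}\Big)\Big/J.$$

b. Show that one can constrain $\lambda_1^X = \lambda_1^Y = 0$ by defining $\lambda_i^X = \log \pi_{i+} - \log \pi_{1+}$ and $\lambda_j^Y = \log \pi_{+j} - \log \pi_{+1}$. Then, what does $\lambda$ equal?

8.16 For an $I \times J$ table, let $\eta_{ij} = \log \mu_{ij}$, and let a dot subscript denote the mean for that index (e.g., $\eta_{i.} = \sum_j \eta_{ij}/J$). Then, let $\lambda = \eta_{..}$, $\lambda_i^X = \eta_{i.} - \eta_{..}$, $\lambda_j^Y = \eta_{.j} - \eta_{..}$, and $\lambda_{ij}^{XY} = \eta_{ij} - \eta_{i.} - \eta_{.j} + \eta_{..}$.
a. Show that $\log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_{ij}^{XY}$. Hence, any set of positive $\{\mu_{ij}\}$ satisfies the saturated model.
b. Show that $\sum_i \lambda_i^X = \sum_j \lambda_j^Y = \sum_i \lambda_{ij}^{XY} = \sum_j \lambda_{ij}^{XY} = 0$.
c. For $2 \times 2$ tables, show that $\log \theta = 4\lambda_{11}^{XY}$.
d. For $2 \times J$ tables, show that $\lambda_{11}^{XY} = \sum_j \log \theta_j/2J$, where $\theta_j = \mu_{11}\mu_{2j}/\mu_{21}\mu_{1j}$, $j = 2, \ldots, J$.
e. Alternative constraints have other odds ratio formulas. Let $\lambda = \eta_{11}$, $\lambda_i^X = \eta_{i1} - \eta_{11}$, $\lambda_j^Y = \eta_{1j} - \eta_{11}$, and $\lambda_{ij}^{XY} = \eta_{ij} - \eta_{i1} - \eta_{1j} + \eta_{11}$. Then, show that the saturated model holds with $\lambda_1^X = \lambda_1^Y = \lambda_{1j}^{XY} = \lambda_{i1}^{XY} = 0$ for all $i$ and $j$, and $\lambda_{ij}^{XY} = \log(\mu_{11}\mu_{ij}/\mu_{1j}\mu_{i1})$.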
8.17 Suppose that all $\mu_{ijk} > 0$. Let $\eta_{ijk} = \log \mu_{ijk}$, and consider model parameters with zero-sum constraints.
a. For the general loglinear model (8.1), define parameters in the fashion of Problem 8.16 (e.g., $\lambda_{ij}^{XY} = \eta_{ij.} - \eta_{i..} - \eta_{.j.} + \eta_{...}$).
b. For model (XY, XZ, YZ) with a $2 \times 2 \times 2$ table, show that $\lambda_{11}^{XY} = \tfrac{1}{4}\log \theta_{11(k)}$.
c. For (XYZ) with a $2 \times 2 \times 2$ table, show that $\lambda_{111}^{XYZ} = \tfrac{1}{8}\log[\theta_{11(1)}/\theta_{11(2)}]$. Thus, $\lambda_{ijk}^{XYZ} = 0$ is equivalent to $\theta_{11(1)} = \theta_{11(2)}$.

8.18 Two balanced coins are flipped, independently. Let X = whether the first flip resulted in a head (yes, no), Y = whether the second flip resulted in a head, and Z = whether both flips had the same result. Using this example, show that marginal independence for each pair of three variables does not imply that the variables are mutually independent.

8.19 For three categorical variables X, Y, and Z:
a. When Y is jointly independent of X and Z, show that X and Y are conditionally independent, given Z.
b. Prove that mutual independence of X, Y, and Z implies that X and Y are both marginally and conditionally independent.
c. When X is independent of Y and Y is independent of Z, does it follow that X is independent of Z? Explain.
d. When any pair of variables is conditionally independent, explain why there is no three-factor interaction.

8.20 Suppose that X and Y are conditionally independent, given Z, and X and Z are marginally independent.
a. Show that X is jointly independent of Y and Z.
b. Show X and Y are marginally independent.
c. Show that if X and Z are conditionally (rather than marginally) independent, then X and Y are still marginally independent.

8.21 A $2 \times 2 \times 2$ table satisfies $\pi_{i++} = \pi_{+j+} = \pi_{++k} = \tfrac{1}{2}$, all $i$, $j$, $k$. Give an example of $\{\pi_{ijk}\}$ that satisfies model (a) (X, Y, Z), (b) (XY, Z), (c) (XY, YZ), (d) (XY, XZ, YZ), and (e) (XYZ), but in each case not a simpler model.

8.22 Suppose that model (XY, XZ, YZ) holds in a $2 \times 2 \times 2$ table, and the common XY conditional log odds ratio at the two levels of Z is
positive. If the XZ and YZ conditional log odds ratios are both positive or both negative, show that the XY marginal odds ratio is larger than the XY conditional odds ratio. Hence, Simpson's paradox cannot occur for the XY association.

8.23 Show that the general loglinear model in $T$ dimensions has $2^T$ terms. [Hint: It has an intercept, $\binom{T}{1}$ single-factor terms, $\binom{T}{2}$ two-factor terms, ....]

8.24 Each of $T$ responses is binary. For dummy variables $\{z_1, \ldots, z_T\}$, the loglinear model of mutual independence has the form

$$\log \mu_{z_1, \ldots, z_T} = \lambda_1 z_1 + \cdots + \lambda_T z_T.$$

Show how to express the general loglinear model (Cox 1972).

8.25 Consider a cross-classification of W, X, Y, Z.
a. Explain why (WXZ, WYZ) is the most general loglinear model for which X and Y are conditionally independent.
b. State the model symbol for which X and Y are conditionally independent and there is no three-factor interaction.

8.26 For a four-way table with binary response Y, give the equivalent loglinear and logit models that have:
a. Main effects of A, B, and C on Y.
b. Interaction between A and B in their effects on Y, and C has main effects.
c. Repeat part (a) for a nominal response Y with a baseline-category logit model.

8.27 For a $3 \times 3$ table with ordered rows having scores $\{x_i\}$, identify all terms in the generalized loglinear model for models (a) $\mathrm{logit}[P(Y \le j)] = \alpha_j + \beta x_i$, and (b) $\log[P(Y = j)/P(Y = 3)] = \alpha_j + \beta_j x_i$.

8.28 For the independence model for a two-way table, derive minimal sufficient statistics, likelihood equations, fitted values, and residual df.

8.29 For the loglinear model for an $I \times J$ table, $\log \mu_{ij} = \lambda + \lambda_i^X$, show that $\hat{\mu}_{ij} = n_{i+}/J$ and residual df $= I(J - 1)$.

8.30 Write the log likelihood $L$ for model (XZ, YZ). Calculate $\partial L/\partial \lambda$ and show that it implies $\hat{\mu}_{+++} = n$. Show that $\partial L/\partial \lambda_i^X = n_{i++} - \mu_{i++}$.
Similarly, differentiate with respect to each parameter to obtain likelihood equations. Show (8.3) and (8.4) imply the other equations, so those equations determine the ML estimates.

8.31 For model (XY, Z), derive (a) minimal sufficient statistics, (b) likelihood equations, (c) fitted values, and (d) residual df for tests of fit.

8.32 Consider the loglinear model with symbol (XZ, YZ).
a. For fixed $k$, show that $\{\hat{\mu}_{ijk}\}$ equal the fitted values for testing independence between X and Y within level $k$ of Z.
b. Show that the Pearson and likelihood-ratio statistics for testing this model's fit have form $X^2 = \sum_k X_k^2$, where $X_k^2$ tests independence between X and Y at level $k$ of Z.

8.33 Verify the df values shown in Table 8.14 for models (XY, Z), (XY, YZ), and (XY, XZ, YZ).

8.34 Verify that loglinear model (GLS, GI, LI, IS) implies the corresponding logit model. Show that the conditional log odds ratio for the effect of S on I equals $\beta_1^S - \beta_2^S$ in the logit model and $\lambda_{11}^{IS} + \lambda_{22}^{IS} - \lambda_{12}^{IS} - \lambda_{21}^{IS}$ in the loglinear model.

8.35 Table 8.22 shows fitted values for models for four-way tables that have direct estimates.
a. Use Birch's results to verify that the entry is correct for (W, X, Y, Z). Verify its residual df.
b. Motivate the estimate and df formulas for (WX, YZ), (WXY, Z), (WXY, WZ), and (WXY, WXZ) using composite variables and the corresponding results for two-way tables [e.g., for (WXY, WZ), given W, Z is independent of the composite XY variable].

TABLE 8.22 Data for Problem 8.35 (a)

Model           Expected Frequency Estimate                                   Residual df
(W, X, Y, Z)    $n_{h+++}n_{+i++}n_{++j+}n_{+++k}/n^3$                        HIJK - H - I - J - K + 3
(WX, Y, Z)      $n_{hi++}n_{++j+}n_{+++k}/n^2$                                HIJK - HI - J - K + 2
(WX, WY, Z)     $n_{hi++}n_{h+j+}n_{+++k}/(n_{h+++}\,n)$                      HIJK - HI - HJ - K + H + 1
(WX, YZ)        $n_{hi++}n_{++jk}/n$                                          (HI - 1)(JK - 1)
(WX, WY, XZ)    $n_{hi++}n_{h+j+}n_{+i+k}/(n_{h+++}n_{+i++})$                 HIJK - HI - HJ - IK + H + I
(WX, WY, WZ)    $n_{hi++}n_{h+j+}n_{h++k}/(n_{h+++})^2$                       HIJK - HI - HJ - HK + 2H
(WXY, Z)        $n_{hij+}n_{+++k}/n$                                          (HIJ - 1)(K - 1)
(WXY, WZ)       $n_{hij+}n_{h++k}/n_{h+++}$                                   H(IJ - 1)(K - 1)
(WXY, WXZ)      $n_{hij+}n_{hi+k}/n_{hi++}$                                   HI(J - 1)(K - 1)

(a) Number of levels of W, X, Y, Z, denoted by H, I, J, K.
Estimates for other models of each type are obtained by symmetry.
8.36 A $T$-dimensional table $\{n_{ab\ldots t}\}$ has $I_i$ categories in dimension $i$.
a. Find minimal sufficient statistics, ML estimates of cell probabilities, and residual df for the mutual independence model.
b. Find the minimal sufficient statistics and residual df for the hierarchical model having all two-factor associations but no three-factor interactions.

8.37 Consider loglinear model (X, Y, Z) for a three-way table.
a. Express the model in the form $\log \boldsymbol{\mu} = X\boldsymbol{\beta}$.
b. Show that the likelihood equations $X'\mathbf{n} = X'\hat{\boldsymbol{\mu}}$ equate $\{n_{ijk}\}$ and $\{\hat{\mu}_{ijk}\}$ in the one-dimensional margins.

8.38 Apply IPF to model (a) (X, YZ), and (b) (XZ, YZ). Show that the ML estimates result within one cycle.

8.39 Given target row totals $\{r_i > 0\}$ and column totals $\{c_j > 0\}$:
a. Explain how to use IPF to adjust sample proportions $\{p_{ij}\}$ to have these totals but maintain the sample odds ratios.
b. Show how to find cell proportions that have these totals and for which all local odds ratios equal $\theta > 0$. (Hint: Take initial values of 1.0 in all cells in the first row and in the first column. This determines all other initial cell entries such that all local odds ratios equal $\theta$.)
c. Explain how cell proportions are determined by the marginal proportions and the local odds ratios.

8.40 Refer to Birch's results on the uniqueness of ML estimates. Show that $L$ has individual terms converging to $-\infty$ as $\log \mu_i \to \pm\infty$. Explain why positive definiteness of the information matrix implies that the solution of the likelihood equations is unique, with likelihood maximized at that point.
Categorical Data Analysis, Second Edition. Alan Agresti. Copyright 2002 John Wiley & Sons, Inc.

CHAPTER 9
Building and Extending Loglinear/Logit Models

In Chapters 5 through 7 we presented logistic regression models, which use the logit link for binomial or multinomial responses. In Chapter 8 we presented loglinear models for contingency tables, which use the log link for Poisson cell counts. Equivalences between them were discussed earlier. In this chapter we discuss building and extending these models with contingency tables.

In Section 9.1 we present graphs that show a model's association and conditional independence patterns. In Section 9.2 we discuss selection and comparison of loglinear models. Diagnostics for checking models, such as residuals, are presented in Section 9.3.

The loglinear models of Chapter 8 treat all variables as nominal. In Section 9.4 we present loglinear models of association between ordinal variables. In Sections 9.5 and 9.6 we present generalizations that replace fixed scores by parameters. In the final section we discuss complications that occur with sparse contingency tables.

9.1 ASSOCIATION GRAPHS AND COLLAPSIBILITY

A graphical representation for associations in loglinear models indicates the pairs of conditionally independent variables. This representation helps reveal implications of models. Our presentation derives partly from Darroch et al. (1980), who used mathematical graph theory to represent certain loglinear models (called graphical models) having a conditional independence structure.

Association Graphs

An association graph has a set of vertices, each vertex representing a variable. An edge connecting two variables represents a conditional association between them. For instance, loglinear model (WX, WY, WZ, YZ) lacks XY and XZ terms. It assumes independence between X and Y and between X and Z, conditional on the remaining two variables. Figure 9.1 portrays this model's association graph. The four variables form the vertices. The four edges represent pairwise conditional associations. Edges do not connect X and Y or X and Z, the conditionally independent pairs.

FIGURE 9.1 Association graph for model (WX, WY, WZ, YZ).

Two loglinear models with the same pairwise associations have the same association graph. For instance, this association graph is also the one for model (WX, WYZ), which adds a three-factor WYZ interaction.

A path in an association graph is a sequence of edges leading from one variable to another. Two variables X and Y are said to be separated by a subset of variables if all paths connecting X and Y intersect that subset. For instance, in Figure 9.1, W separates X and Y, since any path connecting X and Y goes through W. The subset {W, Z} also separates X and Y. A fundamental result states that two variables are conditionally independent given any subset of variables that separates them (Kreiner 1987; Whittaker 1990). Thus, not only are X and Y conditionally independent given W and Z, but also given W alone. Similarly, X and Z are conditionally independent given W alone.

Collapsibility in Three-Way Contingency Tables

In Section 2.3.3 we showed that conditional associations in partial tables usually differ from marginal associations. Under certain collapsibility conditions, however, they are the same.

For three-way tables, XY marginal and conditional odds ratios are identical if either Z and X are conditionally independent or if Z and Y are conditionally independent.

The conditions state that the variable treated as the control (Z) is conditionally independent of X or Y, or both. These conditions occur for loglinear models (XY, YZ) and (XY, XZ). Thus, the fitted XY odds ratio is identical in the partial tables and the marginal table for models with association graphs X --- Y --- Z and Y --- X --- Z
or even simpler models, but not for the model with graph X -- Z -- Y in which an edge connects Z to both X and Y. The proof follows directly from the formulas for models (XY, YZ) and (XY, XZ) (Problem …).

We illustrate for the student survey (Table 8.3) from Section 8.2.4, with A = alcohol use, C = cigarette use, and M = marijuana use. Model (AM, CM) specifies AC conditional independence, given M. It has association graph A -- M -- C. Consider the AM association. Since C is conditionally independent of A, the AM fitted conditional odds ratios are the same as the AM fitted marginal odds ratio collapsed over C. From Table 8.5, both equal …. Similarly, the CM association is collapsible. The AC association is not, because M is conditionally dependent with both A and C in model (AM, CM). Thus, A and C may be marginally dependent, even though they are conditionally independent. In fact, from Table 8.5, the fitted AC marginal odds ratio for this model is 2.7.

For model (AC, AM, CM), no pair is conditionally independent. No collapsibility conditions are fulfilled. Table 8.5 showed that each pair has quite different fitted marginal and conditional associations for this model. When a model contains all two-factor effects, effects may change after collapsing over any variable.

9.1.3 Collapsibility and Logit Models

The collapsibility conditions apply also to logit models. For instance, suppose that a clinical trial studies the association between a binary treatment variable X ($x_1 = 1$, $x_2 = 0$) and a binary response Y, using data from K centers (Z). The logit model

$$\operatorname{logit} P(Y = 1 \mid X = i, Z = k) = \alpha + \beta x_i + \beta_k^Z$$

has the same treatment effect $\beta$ for each center. Since this model corresponds to loglinear model (XY, XZ, YZ), this effect may differ after collapsing the $2 \times 2 \times K$ table over centers. The estimated XY conditional odds ratio, $\exp(\hat\beta)$, typically differs from the sample odds ratio in the marginal $2 \times 2$ table.

Next, consider the simpler model that lacks center effects,

$$\operatorname{logit} P(Y = 1 \mid X = i, Z = k) = \alpha + \beta x_i.$$

For a given treatment, the success probability is identical for each center.
The model satisfies a collapsibility condition, because it states that Z is
conditionally independent of Y, given X. This logit model is equivalent to loglinear model (XY, XZ), for which the XY association is collapsible. So, when center effects are negligible and the simpler model fits nearly as well, the estimated treatment effect is approximately the marginal XY odds ratio.

9.1.4 Collapsibility and Association Graphs for Multiway Tables

Bishop et al. (1975, p. 47) provided a parametric collapsibility condition with multiway tables:

Suppose that a model for a multiway table partitions variables into three mutually exclusive subsets, A, B, C, such that B separates A and C. After collapsing the table over the variables in C, parameters relating variables in A and parameters relating variables in A to variables in B are unchanged.

We illustrate using model (WX, WY, WZ, YZ) (Figure 9.1). Let A = {X}, B = {W}, and C = {Y, Z}. Since the XY and XZ terms do not appear, all parameters linking set A with set C equal zero, and B separates A and C. If we collapse over Y and Z, the WX association is unchanged. Next, identify A = {Y, Z}, B = {W}, C = {X}. Then, conditional associations among W, Y, and Z remain the same after collapsing over X.

This result also implies that when any variable is independent of all other variables, collapsing over it does not affect any other model terms. For instance, associations among W, X, and Y in model (WX, WY, XY, Z) are the same as in (WX, WY, XY).

When set B contains more than one variable, although parameter values are unchanged in collapsing over set C, the ML estimates of those parameters may differ slightly. A stronger collapsibility definition also requires that the estimates be identical. This condition of commutativity of fitting and collapsing holds if the model contains the highest-order term relating variables in B to each other. Asmussen and Edwards (…) discussed this property, which relates to decomposability of tables (Note …).

9.2 MODEL SELECTION AND COMPARISON

Strategies for selecting and comparing loglinear models are similar to those for logistic regression discussed in Section 6.1.
A model should be complex enough to fit well but also relatively simple to interpret, smoothing rather than overfitting the data.

9.2.1 Considerations in Model Selection

The potentially useful models are usually a small subset of the possible models. A study designed to answer certain questions through confirmatory analyses may plan to compare models that differ only by the inclusion of certain terms. Also, models should recognize distinctions between response
and explanatory variables. The modeling process should concentrate on terms linking responses and terms linking explanatory variables to responses. The model should contain the most general interaction term relating the explanatory variables. From the likelihood equations, this has the effect of equating the fitted totals to the sample totals at combinations of their levels. This is natural, since one normally treats such totals as fixed. Related to this, certain marginal totals are often fixed by the sampling design. Any potential model should include those totals as sufficient statistics, so likelihood equations equate them to the fitted totals.

Consider Table 8.8 with I = automobile injury and S = seat-belt use as responses and G = gender and L = location as explanatory variables. Then we treat $\{n_{g+\ell+}\}$ as fixed at each combination for G and L. For example, 20,629 women had accidents in urban locations, so the fitted counts should have 20,629 women in urban locations. To ensure this, a loglinear model should contain the GL term, which implies from its likelihood equations that $\{\hat\mu_{g+\ell+} = n_{g+\ell+}\}$. Thus, the model should be at least as complex as (GL, S, I) and focus on the effects of G and L on S and I as well as the SI association. If S is also explanatory and only I is a response, $\{n_{g+\ell s}\}$ should be fixed. With a single categorical response, relevant loglinear models correspond to logit models for that response. One should then use logit rather than loglinear models, when the main focus is describing effects on that response.

For exploratory studies, a search among potential models may provide clues about associations and interactions. One approach first fits the model having single-factor terms, then the model having two-factor and single-factor terms, then the model having three-factor and lower terms, and so on. Fitting such models often reveals a restricted range of good-fitting models. In Section 8.4.2 we used this strategy with the automobile injury data set.
Automatic search mechanisms among possible models, such as backward elimination, may also be useful but should be used with care and skepticism. Such a strategy need not yield a meaningful model.

9.2.2 Model Building for the Dayton Student Survey

In Sections 8.2.4 and 8.3.2 we analyzed the use of alcohol (A), cigarettes (C), and marijuana (M) by a sample of high school seniors. The study also classified students by gender (G) and race (R). Table 9.1 shows the five-dimensional contingency table. In selecting a model, we treat A, C, and M as responses and G and R as explanatory. Thus, a model should contain the GR term, which forces the GR fitted marginal totals to equal the sample marginal totals.

Table 9.2 displays goodness-of-fit tests for several models. Because many cell counts are small, the chi-squared approximation for $G^2$ may be poor, but this index is useful for comparing models. The first model listed contains only the GR association and assumes conditional independence for the other nine pairs of associations. It fits horribly, which is no surprise. Model 2, with all two-factor terms, on the other hand, seems to fit well. Model 3, containing all
the three-factor interaction terms, also fits well, but the improvement in fit is not great (difference in $G^2$ of 15.3 − 5.3 = 10.0 based on df = 16 − 6 = 10). Thus, we consider models without three-factor terms. Beginning with model 2, we eliminate two-factor terms. We use backward elimination, sequentially taking out terms for which the resulting increase in $G^2$ is smallest, when refitting the model.

TABLE 9.1 Alcohol, Cigarette, and Marijuana Use for High School Seniors
[Rows: alcohol use (yes, no) by cigarette use (yes, no). Columns: marijuana use (yes, no) within gender (female, male) within race (white, other).]
Source: Harry Khamis, Wright State University.

TABLE 9.2 Goodness-of-Fit Tests for Loglinear Models for Table 9.1 (a)

Model                                        G^2     df
1.  Mutual independence + GR                 …       …
2.  Homogeneous association                  15.3    16
3.  All three-factor terms                   5.3     6
4a. (2) − AC                                 …       …
4b. (2) − AM                                 …       …
4c. (2) − CM                                 …       …
4d. (2) − AG                                 …       …
4e. (2) − AR                                 …       …
4f. (2) − CG                                 …       …
4g. (2) − CR                                 15.8    17
4h. (2) − GM                                 …       …
4i. (2) − MR                                 …       …
5.  (AC, AM, CM, AG, AR, GM, GR, MR)         16.7    18
6.  (AC, AM, CM, AG, AR, GM, GR)             19.9    19
7.  (AC, AM, CM, AG, AR, GR)                 28.8    20

(a) G, gender; R, race; A, alcohol use; C, cigarette use; M, marijuana use.

Table 9.2 shows the start of this process. Nine pairwise associations are candidates for removal from model 2 (all except GR), shown in models 4a through 4i. The smallest increase in $G^2$, compared to model 2, occurs in removing the CR term (i.e., model 4g). The increase is 15.8 − 15.3 = 0.5, with df = 17 − 16 = 1, so this elimination seems sensible. After removing it,
the smallest additional increase results from removing the CG term (model 5), resulting in $G^2 = 16.7$ with df = 18, and a change in $G^2$ of 0.9 based on df = 1. Removing next the MR term (model 6) yields $G^2 = 19.9$ with df = 19, a change in $G^2$ of 3.2 based on df = 1.

Further removals have a more severe effect. For instance, removing the AG term increases $G^2$ by 5.3, with df = 1, for a P-value of 0.02. One cannot take such P-values literally, since the data suggested these tests, but it seems safest not to drop additional terms. [See Westfall and Wolfinger (…) and Westfall and Young (…) for methods of adjusting P-values to account for multiple tests.] Model 6, denoted by (AC, AM, CM, AG, AR, GM, GR), has association graph

[Association graph with edges A -- C, A -- M, C -- M, A -- G, A -- R, G -- M, G -- R.]

Every path between C and {G, R} involves a variable in {A, M}. Given the outcome on alcohol use and marijuana use, the model states that cigarette use is independent of both gender and race. Collapsing over the explanatory variables race and gender, the conditional associations between C and A and between C and M are the same as with the model (AC, AM, CM) fitted in Section ….

Removing the GM term from this model yields model 7 in Table 9.2. Its association graph reveals that A separates {G, R} from {C, M}. Thus, all pairwise conditional associations among A, C, and M in model 7 are identical to those in model (AC, AM, CM), collapsing over G and R. In fact, model 7 does not fit poorly ($G^2 = 28.8$ with df = 20) considering the large sample size. (Its sample dissimilarity index is $\hat\Delta = $ ….) Hence, one might collapse over gender and race in studying associations among the primary variables. An advantage of the full five-variable model is that it estimates effects of gender and race on these responses, in particular the effects of race and gender on alcohol use and the effect of gender on marijuana use.

9.2.3 Loglinear Model Comparison Statistics

Consider two loglinear models, $M_1$ and $M_0$, with $M_0$ a special case of $M_1$. By Sections … and 5.4.3, the likelihood-ratio statistic for testing $M_0$ against $M_1$ is

$$G^2(M_0 \mid M_1) = G^2(M_0) - G^2(M_1).$$

We used this statistic above in comparing pairs of models.

Let $\mathbf{n}$ denote a column vector of the observed cell counts $\{n_i\}$. Let $\hat{\boldsymbol\mu}_0$ and $\hat{\boldsymbol\mu}_1$ denote vectors of the fitted values $\{\hat\mu_{0i}\}$ and $\{\hat\mu_{1i}\}$ for $M_0$ and $M_1$. The deviance $G^2(M_0)$ for the simpler model partitions into

$$G^2(M_0) = G^2(M_1) + G^2(M_0 \mid M_1). \tag{9.1}$$

Just as $G^2(M)$ measures the distance of fitted values for $M$ from $\mathbf{n}$, $G^2(M_0 \mid M_1)$ measures the distance of fit $\hat{\boldsymbol\mu}_0$ from fit $\hat{\boldsymbol\mu}_1$. In this sense,
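decomposition (9.1) expresses a certain orthogonality: The distance of $\mathbf{n}$ from $\hat{\boldsymbol\mu}_0$ equals the distance of $\mathbf{n}$ from $\hat{\boldsymbol\mu}_1$ plus the distance of $\hat{\boldsymbol\mu}_1$ from $\hat{\boldsymbol\mu}_0$.

The model comparison statistic equals

$$G^2(M_0 \mid M_1) = 2\sum_i n_i \log(n_i/\hat\mu_{0i}) - 2\sum_i n_i \log(n_i/\hat\mu_{1i}) = 2\sum_i n_i \log(\hat\mu_{1i}/\hat\mu_{0i}). \tag{9.2}$$

The two loglinear models have the matrix form (8.17), or $\log \boldsymbol\mu_0 = \mathbf{X}_0\boldsymbol\beta_0$ and $\log \boldsymbol\mu_1 = \mathbf{X}_1\boldsymbol\beta_1$. Since $M_0$ is simpler than $M_1$, one can express $\log \boldsymbol\mu_0 = \mathbf{X}_0\boldsymbol\beta_0 = \mathbf{X}_1\boldsymbol\beta_0^*$, where $\boldsymbol\beta_0^*$ equals $\boldsymbol\beta_0$ with 0 elements appended corresponding to the extra parameters in $\boldsymbol\beta_1$ but not in $\boldsymbol\beta_0$. Then, from (9.2),

$$
\begin{aligned}
G^2(M_0 \mid M_1) &= 2\mathbf{n}'(\log \hat{\boldsymbol\mu}_1 - \log \hat{\boldsymbol\mu}_0) = 2\mathbf{n}'(\mathbf{X}_1\hat{\boldsymbol\beta}_1 - \mathbf{X}_1\hat{\boldsymbol\beta}_0^*) \\
&= 2\hat{\boldsymbol\mu}_1'(\mathbf{X}_1\hat{\boldsymbol\beta}_1 - \mathbf{X}_1\hat{\boldsymbol\beta}_0^*) = 2\hat{\boldsymbol\mu}_1'(\log \hat{\boldsymbol\mu}_1 - \log \hat{\boldsymbol\mu}_0) \\
&= 2\sum_i \hat\mu_{1i} \log(\hat\mu_{1i}/\hat\mu_{0i}),
\end{aligned}
\tag{9.3}
$$

where the replacement of $\mathbf{n}$ by $\hat{\boldsymbol\mu}_1$ follows from the likelihood equations $\mathbf{n}'\mathbf{X}_1 = \hat{\boldsymbol\mu}_1'\mathbf{X}_1$ for $M_1$ [recall (8.22)]. Statistic (9.3) has the same form as $G^2(M_0)$, but with $\{\hat\mu_{1i}\}$ playing the role of the observed data. Note that $G^2(M_0)$ is the special case of $G^2(M_0 \mid M_1)$ with $M_1$ saturated.

The Pearson difference $X^2(M_0) - X^2(M_1)$ does not have Pearson form. It is not even necessarily nonnegative. A more appropriate Pearson statistic for comparing models is

$$X^2(M_0 \mid M_1) = \sum_i (\hat\mu_{1i} - \hat\mu_{0i})^2/\hat\mu_{0i}. \tag{9.4}$$

This has the usual form with $\{\hat\mu_{1i}\}$ in place of $\{n_i\}$. Statistics (9.3) and (9.4) depend on the data only through the fitted values and thus only through sufficient statistics for $M_1$.

When $M_0$ holds, $G^2(M_0)$ and $G^2(M_1)$ have asymptotic chi-squared distributions, and $G^2(M_0 \mid M_1)$ is asymptotically chi-squared with df equal to the difference between df for $M_0$ and $M_1$. Haberman (1977a) showed that $G^2(M_0 \mid M_1)$ and $X^2(M_0 \mid M_1)$ have the same null large-sample behavior, even for fairly sparse tables. (Under certain conditions, their difference converges in probability to 0 as $n$ increases.) When $M_1$ holds but $M_0$ does not, $G^2(M_1)$ still has its asymptotic chi-squared distribution, but the other two statistics tend to grow unboundedly as $n$ increases.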
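The identities in (9.1) through (9.4) can be checked numerically. The following is a minimal sketch, assuming a hypothetical 2 × 2 × 2 table whose counts are made up for illustration, with $M_0$ = (X, Y, Z) and $M_1$ = (XY, Z) chosen because both have closed-form fitted values. The helper names `g2` and `x2` are illustrative and not from the text.

```python
import math

# Hypothetical 2x2x2 counts n[i][j][k]; the values are made up for illustration only.
n = [[[20, 15], [10, 25]],
     [[12, 18], [30, 40]]]
I = J = K = 2
cells = [(i, j, k) for i in range(I) for j in range(J) for k in range(K)]
N = sum(n[i][j][k] for (i, j, k) in cells)

# Margins needed for the direct (closed-form) fitted values.
n_i = [sum(n[i][j][k] for j in range(J) for k in range(K)) for i in range(I)]
n_j = [sum(n[i][j][k] for i in range(I) for k in range(K)) for j in range(J)]
n_k = [sum(n[i][j][k] for i in range(I) for j in range(J)) for k in range(K)]
n_ij = [[sum(n[i][j][k] for k in range(K)) for j in range(J)] for i in range(I)]

obs = [n[i][j][k] for (i, j, k) in cells]
# M0 = (X, Y, Z): mutual independence.
mu0 = [n_i[i] * n_j[j] * n_k[k] / N ** 2 for (i, j, k) in cells]
# M1 = (XY, Z): X and Y jointly independent of Z.
mu1 = [n_ij[i][j] * n_k[k] / N for (i, j, k) in cells]


def g2(a, b):
    """Likelihood-ratio distance 2 * sum a*log(a/b) of values b from values a."""
    return 2.0 * sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)


def x2(a, b):
    """Pearson distance sum (a - b)^2 / b of values b from values a."""
    return sum((x - y) ** 2 / y for x, y in zip(a, b))


g2_m0 = g2(obs, mu0)         # deviance of the simpler model
g2_m1 = g2(obs, mu1)         # deviance of the more complex model
g2_diff = g2_m0 - g2_m1      # difference of deviances, as in (9.1)
g2_cmp = g2(mu1, mu0)        # form (9.3): fitted values of M1 play the role of data
x2_cmp = x2(mu1, mu0)        # Pearson comparison statistic (9.4)

print(f"G2(M0) - G2(M1) = {g2_diff:.6f}")
print(f"G2(M0 | M1)     = {g2_cmp:.6f}")
print(f"X2(M0 | M1)     = {x2_cmp:.6f}")
```

In this sketch the two versions of the likelihood-ratio comparison agree up to floating-point error, while the Pearson version is a separate, nonnegative statistic.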
9.2.4 Partitioning Chi-Squared with Model Comparisons

Equation (9.1) utilizes the property by which a chi-squared statistic with df > 1 partitions into components. We used such partitionings in tests for trend with ordinal predictors in linear logit or linear probability models (Section …) and with ordinal responses in cumulative logit models (Section 7.2). More generally, this property applies with a set of nested models to test a sequence of hypotheses. The separate tests for comparing pairs of models are asymptotically independent.

For example, a chi-squared decomposition with $J - 1$ models justifies the partitioning of $G^2$ stated in Section … for $2 \times J$ tables. For $j = 2, \ldots, J$, let $M_j$ denote the model that satisfies

$$\theta_i = \frac{\pi_{1i}\,\pi_{2,i+1}}{\pi_{1,i+1}\,\pi_{2i}} = 1, \qquad i = 1, \ldots, j - 1.$$

For $M_j$, the $2 \times j$ table consisting of columns 1 through $j$ satisfies independence. Model $M_J$ is independence in the complete $2 \times J$ table. Model $M_h$ is a special case of $M_j$ whenever $h > j$. By (9.2),

$$
\begin{aligned}
G^2(M_J) &= G^2(M_J \mid M_{J-1}) + G^2(M_{J-1}) \\
&= G^2(M_J \mid M_{J-1}) + G^2(M_{J-1} \mid M_{J-2}) + G^2(M_{J-2}) \\
&= \cdots = G^2(M_J \mid M_{J-1}) + \cdots + G^2(M_3 \mid M_2) + G^2(M_2).
\end{aligned}
$$

From (9.3), $G^2(M_j \mid M_{j-1})$ has the $G^2$ form with the fitted values for model $M_{j-1}$ playing the role of the observed data. Substitution of fitted values for the two models into (9.3) shows that $G^2(M_j \mid M_{j-1})$ is identical to $G^2$ for testing independence in a $2 \times 2$ table; the first column combines columns 1 through $j - 1$ of the original table, and the second column is column $j$ of the original table.

With several preplanned comparisons, simultaneous test procedures lessen the probability of attributing importance to sample effects that simply reflect chance variation. These procedures use adjusted significance levels. For a set of $s$ tests for nested models, when each test has level $1 - (1 - \alpha)^{1/s}$, the overall asymptotic P(type I error) $\le \alpha$ (Goodman 1969a). For instance, suppose that we test the fit of (WXZ, WY, XY, ZY), compare that model to (WX, WZ, XZ, WY, XY, ZY), and compare that model to (WX, WZ, XZ, WY, ZY).
To ensure overall $\alpha = 0.05$ for the $s = 3$ tests, use level $1 - (0.95)^{1/3} = 0.017$ for each.

9.2.5 Identical Marginal and Conditional Tests of Independence

A test using $G^2(M_0 \mid M_1)$ simplifies dramatically when both models have direct estimates. In that case, the models have independence linkages necessary to ensure collapsibility. A test of conditional independence has the same result as the test of independence applied to the marginal table.

Sundberg (…) proved the following: When two direct models $M_0$ and $M_1$ are identical except for a pairwise association term, $G^2(M_0 \mid M_1)$ is identical to $G^2$ for testing independence in the marginal table for that pair of variables. Bishop (…) and Goodman (1970, 1971b) have related discussion.

For instance, $G^2[(X, Y, Z) \mid (XY, Z)]$ tests $\lambda^{XY} = 0$ in model (XY, Z). Thus, it tests XY conditional independence under the assumption that X and Y are jointly independent of Z. Using the two sets of fitted values, from (9.3), it equals

$$2\sum_i\sum_j\sum_k \frac{n_{ij+}n_{++k}}{n} \log\!\left[\frac{n_{ij+}n_{++k}/n}{n_{i++}n_{+j+}n_{++k}/n^2}\right] = 2\sum_i\sum_j n_{ij+} \log\!\left[\frac{n_{ij+}}{n_{i++}n_{+j+}/n}\right],$$

which equals $G^2[(X, Y)]$ for testing independence in the marginal XY table. This is not surprising. The collapsibility conditions imply that for model (XY, Z), the marginal XY association is the same as the conditional XY association.

9.3 DIAGNOSTICS FOR CHECKING MODELS

The model comparison test using $G^2(M_0 \mid M_1)$ is useful for detecting whether an extra term improves a model fit. Cell residuals provide a cell-specific indication of model lack of fit.

9.3.1 Residuals for Loglinear Models

In Section … we noted that residuals for the independence model (Section …) extend to any Poisson GLM. For cell $i$ in a contingency table with observed count $n_i$ and fitted value $\hat\mu_i$, the Pearson residual is

$$e_i = \frac{n_i - \hat\mu_i}{\sqrt{\hat\mu_i}}. \tag{9.5}$$

These relate to the Pearson statistic by $\sum_i e_i^2 = X^2$.

Like the Pearson residual (6.1) for binomial models, the asymptotic variances of $\{e_i\}$ are less than 1.0. They average (residual df)/(number of
cells). Haberman (1973a) defined the standardized Pearson residual, $r_i = e_i/\sqrt{1 - \hat h_i}$, where the leverage $\hat h_i$ is a diagonal element of the estimated hat matrix (Section …). This has an asymptotic standard normal distribution and is preferable to the Pearson residual. A closed-form expression applies for loglinear models having direct estimates (Haberman 1978, p. …). Alternative residuals use components of the deviance (Section …).

9.3.2 Student Survey Example Revisited

For Table 9.1 cross-classifying alcohol, cigarette, and marijuana use by gender and race, we suggested in Section 9.2.2 that the model with all two-factor associations is plausible. For it, the only large standardized Pearson residual equals 3.2, resulting from a fitted value of 3.1 in the cell having a count of 8. Further comparisons suggested that the simpler model (AC, AM, CM, AG, AR, GM, GR) is adequate. Its only large standardized residual equals 3.3, referring to a fitted value of 2.9 in that cell. The number of nonwhite males who did not use alcohol or marijuana but who smoked cigarettes is somewhat greater than either model predicts. The standardized Pearson residuals do not suggest problems with either model, considering the large sample size and many cells studied.

9.3.3 Correspondence between Loglinear and Logit Residuals

In Section 8.5 we showed that logit models in contingency tables are equivalent to certain loglinear models. However, a Pearson residual for a logit model differs from a Pearson residual for a loglinear model. The numerators comparing the $i$th observed and fitted binomial or Poisson count are the same, since the model fitted values are the same. However, the logit model uses a fitted binomial standard deviation in the denominator [see (6.1)], whereas the loglinear model uses a fitted Poisson standard deviation [see (9.5)]. Thus, the logit Pearson residual exceeds the loglinear Pearson residual (…). Once standardized by dividing by estimated standard errors, the standardized Pearson residuals are identical for the two models.
This is another reason for preferring standardized residuals over ordinary Pearson residuals.

9.4 MODELING ORDINAL ASSOCIATIONS

The loglinear models presented so far have a serious limitation: they treat all classifications as nominal. If the order of a variable's categories changes in
any way, the fit is the same. For ordinal classifications, these models ignore important information.

Refer to Table 9.3. Subjects were asked their opinion about a man and woman having sexual relations before marriage (always wrong, almost always wrong, wrong only sometimes, not wrong at all). They were also asked whether methods of birth control should be available to teenagers between the ages of 14 and 16 (strongly disagree, disagree, agree, strongly agree).

TABLE 9.3 Opinions about Premarital Sex and Availability of Teenage Birth Control
[Rows: premarital sex (always wrong, almost always wrong, wrong only sometimes, not wrong at all). Columns: teenage birth control (strongly disagree, disagree, agree, strongly agree). Each cell reports the observed count followed by three parenthesized values (a).]
(a) 1, independence model fit; 2, standardized Pearson residuals for the independence model fit; 3, linear-by-linear association model fit.
Source: 1991 General Social Survey, National Opinion Research Center.

For the loglinear model of independence, denoted by I, $G^2(I) = 127.6$ with df = 9. The model fits poorly. Yet, adding the ordinary association term makes it saturated and unhelpful.

Table 9.3 also contains fitted values and standardized residuals for independence. The residuals in the corners stand out. Sample counts are much larger than independence predicts where both responses are the most negative possible or the most positive possible. By contrast, the counts are much smaller than fitted values where one response is the most positive and the other is the most negative. Cross-classifications of ordinal variables often exhibit their greatest deviations from independence in the corner cells. This pattern for Table 9.3 indicates lack of fit in the form of a positive trend.
Subjects who are more willing to make birth control available to teenagers also tend to feel more tolerant about premarital sex.

Models for ordinal variables use association terms that permit trends. The models are more complex than the independence model, yet unsaturated. Models with association and interaction terms exist in situations in which nominal models are saturated. Tests with ordinal models have improved power for detecting trends.

9.4.1 Linear-by-Linear Association in Two-Way Tables

For two-way tables, a simple model for two ordinal variables assigns ordered row scores $u_1 \le u_2 \le \cdots \le u_I$ and column scores $v_1 \le v_2 \le \cdots \le v_J$. The model is

$$\log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y + \beta u_i v_j, \tag{9.6}$$

with constraints such as $\lambda_I^X = \lambda_J^Y = 0$. This is the special case of the saturated model (8.2) in which $\lambda_{ij}^{XY} = \beta u_i v_j$. It requires only one parameter to describe association, whereas the saturated model requires $(I - 1)(J - 1)$. Independence occurs when $\beta = 0$.

The term $\beta u_i v_j$ represents the deviation of $\log \mu_{ij}$ from independence. The deviation is linear in the Y scores at a fixed level of X and linear in the X scores at a fixed level of Y. In column $j$, for instance, the deviation is a linear function of X, having form (slope) × (score for X), with slope $\beta v_j$. Because of this property, (9.6) is called the linear-by-linear association model (abbreviated, L × L). The model has its greatest departures from independence in the corners of the table. Birch (1965), Goodman (1979a), and Haberman (1974b) introduced special cases.

The direction and strength of the association depend on $\beta$. When $\beta > 0$, Y tends to increase as X increases. Expected frequencies are larger than expected (under independence) in cells where X and Y are both high or both low. When $\beta < 0$, Y tends to decrease as X increases. When the data display a positive or negative trend, the L × L model usually fits much better than the independence model.

For the $2 \times 2$ table using the cells intersecting rows $a$ and $c$ with columns $b$ and $d$, direct substitution shows that the model has

$$\log \frac{\mu_{ab}\,\mu_{cd}}{\mu_{ad}\,\mu_{cb}} = \beta (u_c - u_a)(v_d - v_b). \tag{9.7}$$

This log odds ratio is stronger as $|\beta|$ increases and for pairs of categories that are farther apart. Simple interpretations result when $u_2 - u_1 = \cdots = u_I - u_{I-1}$ and $v_2 - v_1 = \cdots = v_J - v_{J-1}$. When $\{u_i = i\}$ and $\{v_j = j\}$, for instance, the local odds ratios (2.10) for adjacent rows and adjacent columns have common value $e^\beta$. Goodman (1979a) called this case uniform association. Figure 9.2 portrays local odds ratios having uniform value.
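Identity (9.7) and the uniform association property can be verified by direct substitution. The sketch below assumes hypothetical parameter values for model (9.6) on a 4 × 4 table; none of them are estimates from Table 9.3, and the function name `log_odds_ratio` is illustrative.

```python
import math

# Hypothetical parameter values for model (9.6); chosen only for illustration.
lam = 1.0
lam_x = [0.3, -0.1, 0.2, 0.0]     # row main effects, last one constrained to 0
lam_y = [-0.2, 0.4, 0.1, 0.0]     # column main effects, last one constrained to 0
beta = 0.3                        # association parameter
u = [1, 2, 3, 4]                  # unit-spaced row scores
v = [1, 2, 3, 4]                  # unit-spaced column scores

# Expected frequencies implied by the linear-by-linear association model.
mu = [[math.exp(lam + lam_x[i] + lam_y[j] + beta * u[i] * v[j])
       for j in range(4)] for i in range(4)]


def log_odds_ratio(a, c, b, d):
    """Log odds ratio for rows a, c and columns b, d (0-based indices)."""
    return math.log(mu[a][b] * mu[c][d] / (mu[a][d] * mu[c][b]))


# Local log odds ratios for adjacent rows and adjacent columns.
local = [log_odds_ratio(i, i + 1, j, j + 1) for i in range(3) for j in range(3)]
# Log odds ratio for the four corner cells.
corner = log_odds_ratio(0, 3, 0, 3)

print("local log odds ratios:", [round(x, 6) for x in local])
print("corner log odds ratio:", round(corner, 6))
```

With unit-spaced scores every local log odds ratio equals `beta`, and the corner value equals `beta * (4 - 1) * (4 - 1)`, which is why a weak local association can still imply a strong association between extreme categories.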
FIGURE 9.2 Constant odds ratio implied by uniform association model. (Note: $\beta$ is the constant log odds ratio for adjacent rows and adjacent columns.)

The choice of scores affects the interpretation of $\beta$. Often, the response scale discretizes an inherently continuous scale. It is sensible to choose scores that approximate distances between midpoints of categories for the underlying scale, such as we did in measuring alcohol consumption for a linear logit model in Section ….

It is sometimes useful to standardize the scores, subtracting the mean and dividing by the standard deviation, so

$$\sum_i u_i \pi_{i+} = \sum_j v_j \pi_{+j} = 0, \qquad \sum_i u_i^2 \pi_{i+} = \sum_j v_j^2 \pi_{+j} = 1.$$

Then, $\beta$ represents the log odds ratios for standard deviation distances in the X and Y directions. The L × L model tends to fit well when an underlying continuous distribution is approximately bivariate normal. For standardized scores, $\beta$ is then comparable to $\rho/(1 - \rho^2)$, where $\rho$ is the underlying correlation. For weak associations, $\beta \approx \rho$ [see Becker (1989b); Goodman (1981a, b, …)].

9.4.2 Corresponding Logit Model for Adjacent Responses

A logit formulation of the L × L model treats Y as a response and X as explanatory. Let $\pi_{j|i} = P(Y = j \mid X = i)$. Using logits for adjacent response categories (Section …),

$$\log \frac{\pi_{j+1|i}}{\pi_{j|i}} = \log \frac{\mu_{i,j+1}}{\mu_{ij}} = (\lambda_{j+1}^Y - \lambda_j^Y) + \beta (v_{j+1} - v_j) u_i.$$

For unit-spaced $\{v_j\}$, this simplifies to

$$\log \frac{\pi_{j+1|i}}{\pi_{j|i}} = \alpha_j + \beta u_i,$$
where $\alpha_j = \lambda_{j+1}^Y - \lambda_j^Y$. The same linear logit effect $\beta$ applies simultaneously for all $J - 1$ pairs of adjacent response categories: The odds that $Y = j + 1$ instead of $Y = j$ multiply by $e^\beta$ for each unit change in X. In using equal-interval response scores, we implicitly assume that the effect of X is the same on each of the $J - 1$ adjacent-categories logits for Y.

9.4.3 Likelihood Equations and Model Fitting

The Poisson log-likelihood $L(\boldsymbol\mu) = \sum_i\sum_j n_{ij} \log \mu_{ij} - \sum_i\sum_j \mu_{ij}$ simplifies for the L × L model (9.6) to

$$L(\boldsymbol\mu) = n\lambda + \sum_i n_{i+}\lambda_i^X + \sum_j n_{+j}\lambda_j^Y + \beta\sum_i\sum_j u_i v_j n_{ij} - \sum_i\sum_j \exp(\lambda + \lambda_i^X + \lambda_j^Y + \beta u_i v_j).$$

Differentiating $L$ with respect to $(\lambda_i^X, \lambda_j^Y, \beta)$ and setting the three partial derivatives equal to zero yields likelihood equations

$$\hat\mu_{i+} = n_{i+}, \quad i = 1, \ldots, I, \qquad \hat\mu_{+j} = n_{+j}, \quad j = 1, \ldots, J,$$

$$\sum_i\sum_j u_i v_j \hat\mu_{ij} = \sum_i\sum_j u_i v_j n_{ij}.$$

Iterative methods such as Newton-Raphson yield the ML fit.

Let $p_{ij} = n_{ij}/n$ and $\hat\pi_{ij} = \hat\mu_{ij}/n$. The third likelihood equation implies that

$$\sum_i\sum_j u_i v_j \hat\pi_{ij} = \sum_i\sum_j u_i v_j p_{ij}.$$

Since marginal distributions and hence marginal means and variances are identical for fitted and observed distributions, the third equation implies the correlation between the scores for X and Y is the same for both distributions. The fitted counts display the same positive or negative trend as the data.

Since $\{u_i\}$ and $\{v_j\}$ are fixed, the L × L model (9.6) has only one more parameter ($\beta$) than the independence model. Its residual df = $IJ - [1 + (I - 1) + (J - 1) + 1] = IJ - I - J$, unsaturated for all but $2 \times 2$ tables.

9.4.4 Sex Opinions Example

Table 9.3 also reports fitted values for the linear-by-linear association model, using scores {1, 2, 3, 4} for rows and columns. Table 9.4
shows software output. To get this, we added a variable (denoted linlin) to the independence model having values equal to the product of row and column number.

TABLE 9.4 Output for Fitting Linear-by-Linear Association Model to Table 9.3
[Criteria for assessing goodness of fit: deviance and Pearson chi-square, with DF and value. Parameter table: estimate, standard error, Wald 95% confidence limits, chi-square, and Pr > ChiSq for Intercept, premar (1 to 4), birth (1 to 4), and linlin. LR statistics: DF, chi-square, and Pr > ChiSq for linlin.]

Compared to the independence model, for which $G^2(I) = 127.6$ with df = 9, the L × L model fits dramatically better [$G^2(L \times L) = 11.5$, df = 8]. This is especially noticeable in the corners, where it predicts the greatest departures from independence.

The ML estimate $\hat\beta = 0.286$ (SE = 0.028) indicates that subjects having more favorable attitudes about teen birth control also tend to have more tolerant attitudes about premarital sex. The estimated local odds ratio is $\exp(\hat\beta) = \exp(0.286) = 1.33$. A 95% Wald confidence interval is $\exp(0.286 \pm 1.96 \times 0.028)$, or (1.26, 1.41). The strength of association seems weak. From (9.7), however, nonlocal odds ratios are stronger. The estimated odds ratio for the four corner cells equals

$$\exp[\hat\beta (u_4 - u_1)(v_4 - v_1)] = \exp[0.286(4 - 1)(4 - 1)] = 13.1.$$

This also results from the corner fitted values in Table 9.3.

Two sets of scores having the same spacings yield the same $\hat\beta$ and the same fit. Any other sets of equally spaced scores yield the same fit but an appropriately rescaled $\hat\beta$. For instance, using row scores {2, 4, 6, 8} with $\{v_j = j\}$ also yields $G^2 = 11.5$, but $\hat\beta = 0.143$ with SE = 0.014 (both half as
large). For Table 9.3, one might regard categories 2 and 3 as farther apart than categories 1 and 2, or categories 3 and 4. Scores such as {1, 2, 4, 5} for rows and columns recognize this. The L × L model then has $G^2 = 8.8$ (df = 8) and $\hat\beta = $ … (SE = …).

One need not regard the scores as approximations for distances between categories or as reasonable scalings of ordinal variables in order for the models to be valid. They simply imply a certain pattern for the odds ratios. If the L × L model fits well with equally spaced row and column scores, the uniform local odds ratio describes the association regardless of whether the scores are sensible indexes of true distances between categories.

For scores $\{u_i = i\}$ with Table 9.3, the marginal mean and standard deviation for premarital sex are 2.81 and 1.26. The standardized scores are $\{(i - 2.81)/1.26\}$, or (−1.44, −0.65, 0.15, …). The standardized equal-interval scores for birth control are (−1.65, −0.69, 0.27, …). For these scores, $\hat\beta = $ …. By solving $\hat\beta = \hat\rho/(1 - \hat\rho^2)$ for $\hat\rho$, $\hat\rho = $ …. If there is an underlying bivariate normal distribution, we estimate the correlation to be ….

9.4.5 Directed Ordinal Test of Independence

For the linear-by-linear association model, $H_0$: independence is $H_0$: $\beta = 0$. The likelihood-ratio test statistic equals

$$G^2(I \mid L \times L) = G^2(I) - G^2(L \times L).$$

Designed to detect positive or negative trends, it has df = 1. For Table 9.3, $G^2(I \mid L \times L) = 127.6 - 11.5 = 116.1$. This has P < 0.0001, extremely strong evidence of an association. The Wald statistic $z^2 = (\hat\beta/\mathrm{SE})^2 = (0.286/0.028)^2 = 102.5$ (df = 1) also shows strong evidence. The correlation statistic (presented in Section …) for testing independence is the score statistic for $H_0$: $\beta = 0$ in this model. It equals 112.6 (df = 1).

When the L × L model holds, the ordinal test using $G^2(I \mid L \times L)$ is asymptotically more powerful than the test using $G^2(I)$. This is true for the same reason given in Section 6.4.2 for the linear logit model. The power of a chi-squared test increases when df decrease, for fixed noncentrality. When the L × L model holds, the noncentrality is the same for $G^2(I \mid L \times L)$ and $G^2(I)$; thus $G^2(I \mid L \times L)$
is more powerful, since its df = 1 compared to $(I - 1)(J - 1)$ for $G^2(I)$. The power advantage increases as $I$ and $J$ increase, since the noncentrality remains focused on df = 1 for $G^2(I \mid L \times L)$ but df also increases for $G^2(I)$.

9.5 ASSOCIATION MODELS*

Generalizations of the linear-by-linear association model apply to multiway tables or treat scores as parameters rather than fixed. The models are called association models, because they focus on the association structure.
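The power argument behind the directed ordinal test of Section 9.4.5 (same noncentrality, fewer df) can be illustrated by simulation. This is a rough Monte Carlo sketch rather than an exact calculation: the noncentrality value is hypothetical, the function name `simulated_power` is illustrative, and 3.841 and 16.919 are the 0.05 critical values of the chi-squared distribution with df = 1 and df = 9 (the df for a 4 × 4 table).

```python
import math
import random


def simulated_power(df, noncentrality, critical_value, reps=20000, seed=1):
    """Monte Carlo estimate of P(noncentral chi-squared(df) > critical_value)."""
    rng = random.Random(seed)
    shift = math.sqrt(noncentrality)
    rejections = 0
    for _ in range(reps):
        # One shifted normal component carries all of the noncentrality;
        # the remaining df - 1 components are central.
        stat = (rng.gauss(0.0, 1.0) + shift) ** 2
        for _component in range(df - 1):
            stat += rng.gauss(0.0, 1.0) ** 2
        if stat > critical_value:
            rejections += 1
    return rejections / reps


NONCENTRALITY = 10.0   # hypothetical value, used only for illustration

power_df1 = simulated_power(1, NONCENTRALITY, 3.841)    # directed test, df = 1
power_df9 = simulated_power(9, NONCENTRALITY, 16.919)   # general test, df = 9

print(f"estimated power with df = 1: {power_df1:.3f}")
print(f"estimated power with df = 9: {power_df9:.3f}")
```

Under these assumptions the df = 1 test rejects noticeably more often than the df = 9 test, which is the sense in which the noncentrality "remains focused" on a single degree of freedom.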
9.5.1 Row and Column Effects Models

We first present a model that treats X as nominal and Y as ordinal. It is appropriate for two-way tables with ordered columns, using scores $v_1 \le v_2 \le \cdots \le v_J$. Since the rows are unordered, they do not have scores. Replacing the ordered values $\{\beta u_i\}$ in the linear-by-linear term $\beta u_i v_j$ in model (9.6) by unordered parameters $\{\mu_i\}$ gives

$$\log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y + \mu_i v_j. \tag{9.8}$$

Constraints are needed such as $\lambda_I^X = \lambda_J^Y = \mu_I = 0$. The $\{\mu_i\}$ are called row effects. The model is called the row effects model.

Model (9.8) has $I - 1$ more parameters (the $\{\mu_i\}$) than the independence model. Independence is the special case $\mu_1 = \cdots = \mu_I$. A corresponding column effects model has association term $u_i \nu_j$. It treats X as ordinal with scores $\{u_i\}$ and Y as nominal with parameters $\{\nu_j\}$. The row effects and column effects models were developed by Goodman (1979a), Haberman (1974b), and Simon (…).

9.5.2 Logit Model for Adjacent Responses

With $\{v_{j+1} - v_j = 1\}$, the row effects model has adjacent-categories logit form

$$\log \frac{P(Y = j + 1 \mid X = i)}{P(Y = j \mid X = i)} = \alpha_j + \mu_i. \tag{9.9}$$

The effect in row $i$ is identical for each pair of adjacent responses. Plots of these logits against $i$ ($i = 1, \ldots, I$) for different $j$ are parallel. Goodman (…) referred to model (9.9) as the parallel odds model.

Differences among $\{\mu_i\}$ compare rows with respect to their conditional distributions on Y. When $\mu_h = \mu_i$, rows $h$ and $i$ have identical conditional distributions. If $\mu_i > \mu_h$, Y is stochastically higher in row $i$ than row $h$.

The likelihood equations for the row effects model (9.8) are $\{\hat\mu_{i+} = n_{i+}\}$, $\{\hat\mu_{+j} = n_{+j}\}$, and

$$\sum_j v_j \hat\mu_{ij} = \sum_j v_j n_{ij}, \qquad i = 1, \ldots, I.$$

Let $\hat\pi_{j|i} = \hat\mu_{ij}/\hat\mu_{i+}$ and $p_{j|i} = n_{ij}/n_{i+}$. Since $\hat\mu_{i+} = n_{i+}$, the third likelihood equation is $\sum_j v_j \hat\pi_{j|i} = \sum_j v_j p_{j|i}$. For the conditional distribution within each row, the mean column score is the same for the fitted and sample distributions. The likelihood equations are solved using iterative methods.
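The parallel structure of the adjacent-categories logits in (9.9) can be seen by building expected frequencies from model (9.8). The following sketch assumes hypothetical parameter values for a 3 × 4 table; they are not estimates from any data set, and `row_effect` and `adjacent_logit` are illustrative names.

```python
import math

# Hypothetical parameters for model (9.8) on a 3 x 4 table; illustration only.
lam = 2.0
lam_x = [0.2, -0.3, 0.0]          # row main effects, last one constrained to 0
lam_y = [0.5, 0.1, -0.2, 0.0]     # column main effects, last one constrained to 0
row_effect = [-0.8, -0.4, 0.0]    # row effects, last one constrained to 0
v = [1, 2, 3, 4]                  # unit-spaced column scores

# Expected frequencies implied by the row effects model.
mu = [[math.exp(lam + lam_x[i] + lam_y[j] + row_effect[i] * v[j])
       for j in range(4)] for i in range(3)]


def adjacent_logit(i, j):
    """log[P(Y = j+1 | X = i) / P(Y = j | X = i)], using 0-based indices."""
    return math.log(mu[i][j + 1] / mu[i][j])


# Row 3 versus row 1: the difference in logits at each adjacent pair of columns.
gaps = [adjacent_logit(2, j) - adjacent_logit(0, j) for j in range(3)]
# The implied odds ratio for adjacent columns, comparing row 3 with row 1.
odds_ratio = math.exp(row_effect[2] - row_effect[0])

print("logit differences, row 3 minus row 1:", [round(g, 6) for g in gaps])
print("common odds ratio:", round(odds_ratio, 4))
```

Each difference equals `row_effect[2] - row_effect[0]` regardless of the column pair, so the logit profiles of any two rows are vertical shifts of one another.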
9.5.3 Political Ideology Example

Table 9.5 displays the relationship between political ideology and political party affiliation for a sample of voters in a presidential primary in Wisconsin. The table shows fitted values for the independence (I) model and the row effects (R) model with $\{v_j = j\}$. Table 9.6 shows output. Goodness-of-fit tests show that independence is inadequate. Adding the row effects parameters much improves the fit [$G^2(I) = 105.7$, df = 4; $G^2(R) = 2.8$, df = 2]. Also, testing $H_0$: $\mu_1 = \mu_2 = \mu_3$ using $G^2(I \mid R) = 102.9$ (df = 2) shows very strong evidence of an association. In Table 9.5, the improved fit is especially noticeable at the ends of the ordinal scale, where the model has greatest deviation from independence.

TABLE 9.5 Observed Frequencies and Fitted Values for Political Ideology Data
[Rows: party affiliation (Democrat, Independent, Republican). Columns: political ideology (liberal, moderate, conservative) and total. Each cell reports the observed count followed by two parenthesized fitted values (a).]
(a) 1, independence model; 2, row effects model.
Source: Based on data in R. D. Hedlund, Public Opinion Quart. 41: … (…).

The output uses dummy variables for the first two categories of each classification. The interaction term equals the product of the score for ideology and a parameter for party. Thus, the row effect estimates satisfy $\hat\mu_3 = 0$, and the other two estimates contrast the first two parties with Republicans. The estimates are $\hat\mu_1 = -1.213$ and $\hat\mu_2 = $ …. The further $\hat\mu_i$ falls in the negative direction, the greater the tendency for the party to locate at the liberal end of the ideology scale, relative to Republicans. In this sample the Republicans are much more conservative than the other two groups, and the Democrats (row 1) are the most liberal.

From (9.9) the model predicts constant odds ratios for adjacent columns of political ideology. For instance, since $\hat\mu_3 - \hat\mu_1 = 1.213$, the estimated odds that Republicans were conservative instead of moderate, or moderate instead of liberal, were $\exp(1.213) = 3.36$ times the corresponding estimated odds for Democrats.
Figure 9.3 shows the parallelism of the estimated logits for the row effects model.

The loglinear model does not distinguish between response and explanatory variables. Instead, one could use a cumulative logit model to describe the effects of party affiliation on ideology, or a baseline-category logit model to describe linear effects of ideology on party affiliation.

TABLE 9.6 Output for Fitting Row Effects Model to Table 9.5. Criteria for assessing goodness of fit: Deviance 2.8149; Pearson Chi-Square 2.8039. The parameter table (columns Estimate, Std Error, Wald 95% Conf. Limits, Chi-Square, Pr > ChiSq) has rows Intercept; party Democ, Indep, Repub; ideology 1, 2, 3; score*party Democ, Indep, Repub. It is followed by an LR statistic for score*party. [Numeric entries not recoverable.]

FIGURE 9.3 Observed and predicted logits for adjacent response categories.

Ordinal Variables in Models for Multiway Tables

Multidimensional tables with ordinal responses can use generalizations of association models. In three dimensions, the rich collection of models includes (1) association models that are more parsimonious than the nominal model (XY, XZ, YZ), and (2) models permitting heterogeneous association that, unlike model (XYZ), are unsaturated.

Models for association that are special cases of (XY, XZ, YZ) replace association terms by structured terms that account for ordinality. For instance, when both X and Y are ordinal, alternatives to $\lambda_{ij}^{XY}$ are a linear-by-linear term $\beta u_i v_j$, a row effects term $\mu_i v_j$, or a column effects term $u_i \nu_j$; these provide a stochastic ordering of conditional distributions within rows and within columns, or just within rows, or just within columns. With a linear-by-linear term, the model is

$$\log \mu_{ijk} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \beta u_i v_j + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ}. \tag{9.10}$$

The conditional local odds ratios (8.13) then satisfy

$$\log \theta_{ij(k)} = \beta (u_{i+1} - u_i)(v_{j+1} - v_j) \quad \text{for all } k.$$

The association is the same in different partial tables, with homogeneous linear-by-linear XY association.

When the association is heterogeneous, structured terms for ordinal variables make effects simpler to interpret than in the saturated model. For instance, the heterogeneous linear-by-linear XY association model

$$\log \mu_{ijk} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \beta_k u_i v_j + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ} \tag{9.11}$$

allows the XY association to change across levels of Z. With unit-spaced scores, $\log \theta_{ij(k)} = \beta_k$ for all $i$ and $j$. It has uniform association within each level of Z, but heterogeneity among levels of Z in the strength of association. Fitting it corresponds to fitting the L × L model (9.6) separately at each level of Z.

Air Pollution and Breathing Examples

Table 9.7 displays associations among smoking status (S), breathing test results (B), and age (A) for workers in certain industrial plants in Houston,
Texas. The loglinear model (SA, SB, BA) fits poorly ($G^2 = 25.9$, df = 4). Thus, simpler models such as homogeneous linear-by-linear SB association are not plausible ($G^2 = 29.1$, df = 7, using equally spaced scores). The heterogeneous linear-by-linear SB association model fits much better with only one additional parameter ($G^2 = 10.8$, df = 6). With integer scores for S and B, $\hat{\beta}_1 = [\ldots]$ for the younger group and $\hat{\beta}_2 = [\ldots]$ for the older group, with SE = $[\ldots]$ for the difference. The effect of smoking seems much stronger for the older group, with estimated local odds ratio of $\exp(\hat{\beta}_2) = 2.18$ compared to $\exp(\hat{\beta}_1) = 1.1$ for the younger group. Here, it may be more natural to use logit models with B as the response variable (Problem $[\ldots]$).

TABLE 9.7 Cross-Classification of Industrial Workers by Breathing Test Results. Rows: two age groups (the younger group is under 40), each split by smoking status (Never smoked, Former smoker, Current smoker). Columns: breathing test results (Normal, Borderline, Abnormal). [Cell counts not recoverable.] Source: From p. 1 of Public Program Analysis by R. N. Forthofer and R. G. Lehnen. Copyright 1981 by Lifetime Learning Publications, Belmont, CA 94002, a division of Wadsworth, Inc. Reprinted by permission of Van Nostrand Reinhold. All rights reserved.

When strata are ordered, roughly a linear trend may exist across strata in certain log odds ratios, as Table 9.8 illustrates. The data refer to a sample of coal miners, measured on B = breathlessness, W = wheeze, and A = age, where B and W are response variables. One could use a separate logit model to describe effects of age on each response. To study whether the BW association varies by age, we fit model (BW, AB, AW). It has residual $G^2 = 26.7$, with df = 8. Table 9.8 reports the standardized Pearson residuals. They show a decreasing tendency as age increases. This suggests the model

$$\log \mu_{ijk} = (BW, AB, AW) + k\delta\, I(i = j = 1), \tag{9.12}$$

where $I$ is the indicator function. It amends the homogeneous association model by adding $\delta$ in the cell for $\mu_{111}$, ..., $9\delta$ in the cell for $\mu_{119}$. Then, the BW log odds ratio changes linearly in the age category.
The model fit has $\hat{\delta} = -0.131$ (SE = $[\ldots]$). The estimated BW log odds ratio at level $k$ of age is $[\ldots] - 0.131k$, decreasing from 3.55 to 2.50. The model has residual $G^2 = 6.80$ (df = 7). McCullagh and Nelder (1989, Sec. $[\ldots]$) showed other analyses.
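The linear trend in the BW log odds ratio can be traced across the nine age categories. In this sketch the slope $-0.131$ and the value 3.55 at $k = 1$ are the reported figures; the intercept is back-calculated from them, which is an assumption of this example and carries the rounding error of the reported values.

```python
import math

delta_hat = -0.131     # reported slope per age category
lor_at_1 = 3.55        # reported log odds ratio at age level k = 1

# Back-calculated intercept (an assumption: derived from rounded values).
intercept = lor_at_1 - delta_hat

# Estimated BW log odds ratio and odds ratio at each age level k = 1..9.
log_odds_ratio = [intercept + delta_hat * k for k in range(1, 10)]
odds_ratio = [math.exp(x) for x in log_odds_ratio]

for k, (lor, oratio) in enumerate(zip(log_odds_ratio, odds_ratio), start=1):
    print(k, round(lor, 2), round(oratio, 1))
```

The value at $k = 9$ reproduces the reported endpoint of 2.50, so the association between breathlessness and wheeze weakens steadily with age while remaining strongly positive.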
TABLE 9.8 Coal Miners Classified by Breathlessness, Wheeze, and Age. Rows: nine age groups. Columns: counts for Breathlessness = Yes (Wheeze Yes, Wheeze No) and Breathlessness = No (Wheeze Yes, Wheeze No), followed by the standardized Pearson residual. The residual refers to the yes-yes and no-no cells; reverse sign for the yes-no and no-yes cells. [Cell entries not recoverable.] Source: Reprinted with permission from Ashford and Sowden.

Other Ordinal Tests of Conditional Independence

Tests of conditional independence of ordinal classifications can generalize $G^2(I \mid L \times L)$. For instance, one can compare the XY conditional independence model (XZ, YZ) to the homogeneous linear-by-linear XY association model (9.10). It tests $\beta = 0$ in that model, with df = 1. This is an alternative to the ordinal test of conditional independence in Section $[\ldots]$. Like Mantel's score statistic (7.1$[\ldots]$), this statistic uses correlation information, since $\sum_k \left(\sum_i \sum_j u_i v_j n_{ijk}\right)$ is the sufficient statistic for $\beta$ in model (9.10). In fact, the Mantel statistic provides the score test of $H_0\colon \beta = 0$ in that model.

Exact, small-sample tests can use likelihood-ratio, score, or Wald statistics for such models. Computations require special algorithms (Agresti et al. 1990; Kim and Agresti).

ASSOCIATION MODELS, CORRELATION MODELS, AND CORRESPONDENCE ANALYSIS*

The linear-by-linear association (L × L) model is a special case of the row effects (R) model, which has parameter row scores, and the column effects (C) model, which has parameter column scores. These models are special cases of a more general model with row and column parameter scores.

Multiplicative Row and Column Effects Model

Replacing $\{u_i\}$ and $\{v_j\}$ in the L × L model (9.6) by parameters yields the row and column effects (RC) model (Goodman 1979a)

$$\log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y + \beta \mu_i \nu_j. \tag{9.13}$$
Identifiability requires location and scale constraints on $\{\mu_i\}$ and $\{\nu_j\}$. The residual df = $(I - 2)(J - 2)$. This model is not loglinear, because the predictor is a multiplicative (rather than linear) function of parameters $\mu_i$ and $\nu_j$. It treats classifications as nominal; the same fit results from a permutation of rows or columns. Parameter interpretation is simplest when at least one variable is ordinal, through the local log odds ratios

$$\log \theta_{ij} = \beta (\mu_{i+1} - \mu_i)(\nu_{j+1} - \nu_j).$$

Although it may seem appealing to use parameters instead of arbitrary scores, the RC model presents complications that do not occur with loglinear models. The likelihood may not be concave and may have local maxima. Independence is a special case, but it is awkward to test independence using the RC model. Haberman showed that the null distribution of $G^2(I) - G^2(RC)$ is not chi-squared but rather that of the maximum eigenvalue from a Wishart matrix.

When one set of parameter scores is fixed, the RC model simplifies to the R or C model. Goodman (1979a) suggested an iterative model-fitting algorithm that exploits this. A cycle of the algorithm has two steps. First, for some initial guess of $\{\nu_j\}$, it estimates the row scores as in the R model. Then, treating the estimated row scores from the first step as fixed, it estimates the column scores as in the C model. Those estimates serve as fixed column scores in the first step of the next cycle, for reestimating the row scores in the R model. There is no guarantee of convergence to ML estimates, but this seems to happen when the model fits well. Haberman provided more sophisticated fitting methods for association models.

Goodman expressed the association term in the saturated model in a form that generalizes the $\beta \mu_i \nu_j$ term in the RC model, namely,

$$\lambda_{ij}^{XY} = \sum_{k=1}^{M} \beta_k \mu_{ik} \nu_{jk}, \tag{9.14}$$

where $M = \min(I - 1, J - 1)$. The parameters satisfy constraints such as

$$\sum_i \mu_{ik} \pi_{i+} = \sum_j \nu_{jk} \pi_{+j} = 0 \quad \text{for all } k,$$

$$\sum_i \mu_{ik}^2 \pi_{i+} = \sum_j \nu_{jk}^2 \pi_{+j} = 1 \quad \text{for all } k, \tag{9.15}$$

$$\sum_i \mu_{ik} \mu_{ih} \pi_{i+} = \sum_j \nu_{jk} \nu_{jh} \pi_{+j} = 0 \quad \text{for all } k \ne h.$$

When $\beta_k = 0$ for $k > M^*$, model (9.14) is called the RC($M^*$) model. See Becker for ML model fitting.
The RC model (9.13) is the case $M^* = 1$.
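Location and scale constraints of the kind in (9.15) amount to standardizing a set of scores with respect to marginal probabilities. The following sketch shows that step for one set of row scores; the marginal probabilities `row_probs` and the raw scores are hypothetical, and the helper name `standardize` is introduced here for illustration.

```python
import math


def standardize(scores, probs):
    """Center and scale scores so that, weighting by probs,
    the weighted mean is 0 and the weighted mean square is 1."""
    mean = sum(s * p for s, p in zip(scores, probs))
    var = sum((s - mean) ** 2 * p for s, p in zip(scores, probs))
    sd = math.sqrt(var)
    return [(s - mean) / sd for s in scores]


# Hypothetical row marginal probabilities (they sum to 1) and raw scores.
row_probs = [0.10, 0.15, 0.20, 0.25, 0.20, 0.10]
raw_scores = [1, 2, 3, 4, 5, 6]

mu_scores = standardize(raw_scores, row_probs)

# The two constraints for a single dimension k.
weighted_mean = sum(m * p for m, p in zip(mu_scores, row_probs))
weighted_mean_square = sum(m * m * p for m, p in zip(mu_scores, row_probs))
print(weighted_mean, weighted_mean_square)
```

Rescaling the scores in this way changes the value of the association parameter by the same factor, so the product of the association parameter and the two scores, and hence every fitted local odds ratio, is unaffected.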
TABLE 9.9 Cross-Classification of Mental Health Status and Socioeconomic Status. Rows: parents' socioeconomic status, A (high), B, C, D, E, F (low). Columns: mental health status (Well, Mild Symptom Formation, Moderate Symptom Formation, Impaired). [Cell counts not recoverable.] Source: Reprinted with permission from L. Srole et al., Mental Health in the Metropolis: The Midtown Manhattan Study, New York: NYU Press, 1978.

Mental Health Status Example

Table 9.9 describes the relationship between child's mental impairment and parents' socioeconomic status for a sample of residents of Manhattan (Goodman 1979a). The RC model fits well ($G^2 = 3.6$, df = 8). For scaling (9.15), the ML estimates are $(-1.11, -1.1, -0.37, 0.03, 1.01, 1.8)$ for the row scores, $(-1.68, -0.14, 0.14, [\ldots])$ for the column scores, and $\hat{\beta} = [\ldots]$. Nearly all estimated local log odds ratios are positive, indicating a tendency for mental health to be better at higher levels of parents' SES.

Ordinal loglinear models also fit well. For equal-interval scores, $G^2(L \times L) = 9.9$ (df = 14). The statistic $G^2(L \times L \mid RC) = 6.3$ (df = 6) tests that row and column scores in the RC model are equal-interval. The parameter scores do not provide a significantly better fit. It is sufficient to use a uniform local odds ratio to describe the table. For unit-spaced scores, $\hat{\beta} = [\ldots]$ (SE = $[\ldots]$), so the fitted local odds ratio is $\exp(\hat{\beta}) = [\ldots]$. There is strong evidence of positive association, but the degree of association is rather weak, at least locally.

Correlation Models

A correlation model for two-way tables has many features in common with the RC model (Goodman). In its simplest form, it is

$$\pi_{ij} = \pi_{i+} \pi_{+j} (1 + \lambda \mu_i \nu_j), \tag{9.16}$$

where $\{\mu_i\}$ and $\{\nu_j\}$ are score parameters satisfying

$$\sum_i \mu_i \pi_{i+} = \sum_j \nu_j \pi_{+j} = 0 \quad \text{and} \quad \sum_i \mu_i^2 \pi_{i+} = \sum_j \nu_j^2 \pi_{+j} = 1.$$
The parameter $\lambda$ is the correlation between the scores for joint distribution (9.16). The correlation model is also called the canonical correlation model, because ML estimates of the scores maximize the correlation for (9.16). The general canonical correlation model is

$$\pi_{ij} = \pi_{i+} \pi_{+j} \left(1 + \sum_{k=1}^{M} \lambda_k \mu_{ik} \nu_{jk}\right),$$

where $0 \le \lambda_M \le \cdots \le \lambda_1 \le 1$ and with constraints such as in (9.15). The parameter $\lambda_k$ is the correlation between $\{\mu_{ik},\ i = 1, \ldots, I\}$ and $\{\nu_{jk},\ j = 1, \ldots, J\}$. The $\{\mu_{i1}\}$ and $\{\nu_{j1}\}$ are standardized scores that maximize the correlation $\lambda_1$ for the joint distribution; $\{\mu_{i2}\}$ and $\{\nu_{j2}\}$ are standardized scores that maximize the correlation $\lambda_2$, subject to $\{\mu_{i1}\}$ and $\{\mu_{i2}\}$ being uncorrelated and $\{\nu_{j1}\}$ and $\{\nu_{j2}\}$ being uncorrelated, and so on.

Unsaturated models result from replacing $M$ by $M^* < \min(I - 1, J - 1)$. Gilula and Haberman and Goodman discussed ML fitting. When $\lambda$ is close to zero in (9.16), Goodman (1981a, 1985) noted that ML estimates of $\lambda$ and the score parameters are similar to those of $\beta$ and the score parameters in the RC model. Correlation models can also use fixed scores instead of parameter scores.

Goodman discussed advantages of association models over correlation models. The correlation model is not defined for all possible combinations of score values because of the constraint $0 \le \pi_{ij} \le 1$, ML fitted values do not have the same marginal totals as the observed data, and the model is not simply generalizable to multiway tables. Gilula and Haberman analyzed multiway tables with correlation models by treating explanatory variables as a single variable and response variables as a second variable.

Correspondence Analysis

Correspondence analysis is a graphical way to represent associations in two-way contingency tables. The rows and columns are represented by points on a graph, the positions of which indicate associations. Goodman (1985) noted that coordinates of the points are reparameterizations of $\{\mu_{ik}\}$ and $\{\nu_{jk}\}$ in the general canonical correlation model. Correspondence analysis uses adjusted scores

$$x_{ik} = \lambda_k \mu_{ik}, \qquad y_{jk} = \lambda_k \nu_{jk}.$$

These are close to zero for dimensions $k$ in which the correlation $\lambda_k$ is close to zero. A correspondence analysis graph uses the first two dimensions, plotting $(x_{i1}, x_{i2})$ for each row and $(y_{j1}, y_{j2})$ for each column.
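A small numerical check can make the role of $\lambda$ in the simplest correlation model (9.16) concrete. The sketch below builds a joint distribution from hypothetical marginal probabilities and standardized scores, then confirms that the margins are preserved and that the correlation between row and column scores equals $\lambda$. All numeric inputs and the helper `standardize` are assumptions made for illustration.

```python
import math


def standardize(scores, probs):
    """Return scores with weighted mean 0 and weighted mean square 1."""
    mean = sum(s * p for s, p in zip(scores, probs))
    sd = math.sqrt(sum((s - mean) ** 2 * p for s, p in zip(scores, probs)))
    return [(s - mean) / sd for s in scores]


# Hypothetical margins, scores, and correlation parameter.
row_p = [0.3, 0.5, 0.2]
col_p = [0.4, 0.6]
mu = standardize([1, 2, 3], row_p)
nu = standardize([1, 2], col_p)
lam = 0.3

# Joint probabilities pi_ij = pi_i+ * pi_+j * (1 + lam * mu_i * nu_j).
pi = [[row_p[i] * col_p[j] * (1 + lam * mu[i] * nu[j])
       for j in range(len(col_p))] for i in range(len(row_p))]

row_margins = [sum(row) for row in pi]
col_margins = [sum(pi[i][j] for i in range(len(row_p)))
               for j in range(len(col_p))]

# Because the scores are standardized, E(mu * nu) is their correlation.
correlation = sum(pi[i][j] * mu[i] * nu[j]
                  for i in range(len(row_p)) for j in range(len(col_p)))

# Adjusted scores of the kind plotted in correspondence analysis.
x = [lam * m for m in mu]
y = [lam * n for n in nu]
print(correlation, x, y)
```

The adjusted scores shrink toward zero as the correlation parameter shrinks, which is why dimensions with small correlations contribute little to a correspondence analysis plot.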
TABLE 9.10 Scores from Correspondence Analysis Applied to Table 9.9. The table lists row scores and column scores for dimensions 1, 2, and 3. [Most entries not recoverable.] Source: Reprinted with permission from the Institute of Mathematical Statistics, based on Goodman.

Goodman (1985) used Table 9.9 to illustrate the similarities of correspondence analysis to analyses using correlation models and association models. For the general canonical correlation model, $M = \min(I - 1, J - 1) = 3$. Its estimated squared correlations are $(0.060, [\ldots], [\ldots])$. The association is rather weak. Table 9.10 contains estimated row and column scores for the correspondence analysis of these three dimensions. Both sets of scores in the first dimension fall in a monotone increasing pattern, except for a slight discrepancy between the first two row scores. This indicates an overall positive association. The scores for the second and third dimensions are close to zero, reflecting the relatively small $\hat{\lambda}_2$ and $\hat{\lambda}_3$.

Figure 9.4 exhibits the results of the correspondence analysis. The horizontal axis has estimates for the first dimension, and the vertical axis has estimates for the second dimension. Six points (circles) represent the six rows, with point $i$ giving $(\hat{x}_{i1}, \hat{x}_{i2})$. Similarly, four points (squares) display the estimates $(\hat{y}_{j1}, \hat{y}_{j2})$. Both sets of points lie close to the horizontal axis, since the first dimension is more important than the second.

FIGURE 9.4 Graphical display of scores from first two dimensions of correspondence analysis. [Based on Escoufier (1982); reprinted with permission.]
Row points that are close together represent rows with similar conditional distributions across the columns. Close column points represent columns with similar conditional distributions across rows. Row points close to column points represent combinations that are more likely than expected under independence. Figure 9.4 shows a tendency for subjects at the high end of one scale to be at the high end of the other and for subjects at the low end of one to be at the low end of the other.

Correspondence analysis is used mainly as a descriptive tool. Goodman developed inferential methods for it. For Table 9.9, inferential analysis reveals that the first dimension, accounting for 94% of the total squared correlation, is adequate for describing the association. Goodman argued for choosing the unsaturated model employing only one dimension and having graphics display fitted scores for that dimension alone. Then, correspondence analysis is equivalent to a ML analysis using correlation model (9.16). The estimated scores for that model are $(-1.09, -1.17, -0.37, 0.05, 1.01, [\ldots])$ for the rows and $(-1.60, -0.19, 0.09, [\ldots])$ for the columns. The model fits well ($G^2 = 2.75$, df = 8). The quality of fit and the estimated scores are similar to those we saw in Section 9.6.2 for the RC model. More parsimonious correlation models also fit these data well, such as ones using equally spaced scores.

All analyses of Table 9.9 have yielded similar conclusions about the association. They all neglect, however, that mental health is a natural response variable. It may make more sense to use an ordinal logit model.

Like correlation models, a severe limitation of correspondence analysis is nontrivial generalization to multiway tables. Greenacre showed displays of several pairwise associations in a single plot.

Model Selection and Score Choice for Ordinal Variables

The past three sections showed several ways to use category orderings in model building. With allowance for ordinal effects, the variety of potential models is much greater than with standard loglinear models.
To choose among models, one approach uses the standard models for guidance. If a standard model fits well, simplify by replacing some parameters with structured terms for ordinal classifications.

Association, correlation, and correspondence analysis models have scores for categories of ordinal variables. Parameter interpretations are simplest for equally spaced scores. With parameter scores, the resulting ML estimates of scores need not be monotone. Constrained versions of the models force monotonicity by maximizing the likelihood subject to order restrictions (e.g., Agresti et al. 1987; Ritov and Gilula). Disadvantages exist, however, of treating scores as parameters. The model becomes less parsimonious, and tests of effects may be less powerful because of a greater df value (recall Section $[\ldots]$). When one variable alone is a response, cumulative link models (Sections 7.2 and 7.3) for that response do not require preassigned or parameter scores.

9.7 POISSON REGRESSION FOR RATES

Loglinear models need not refer to contingency tables. In Section 4.3 we introduced Poisson regression for modeling counts. When outcomes occur over time, space, or some other index of size, it is more relevant to model their rate of occurrence than their raw number.

Analyzing Rates Using Loglinear Models with Offsets

When a response count $n_i$ has index equal to $t_i$, the sample rate is $n_i/t_i$. Its expected value is $\mu_i/t_i$. With an explanatory variable $x$, a loglinear model for the expected rate has form

$$\log(\mu_i/t_i) = \alpha + \beta x_i. \tag{9.17}$$

This model has equivalent representation

$$\log \mu_i - \log t_i = \alpha + \beta x_i.$$

As noted in Section 8.7.4, the adjustment term, $-\log t_i$, to the log link of the mean is called an offset. The fit corresponds to using $\log t_i$ as a predictor on the right-hand side and forcing its coefficient to equal 1.0.

For model (9.17), the expected response count satisfies

$$\mu_i = t_i \exp(\alpha + \beta x_i).$$

The mean is proportional to the index, with proportionality constant depending on the value of $x$.

The identity link is also sometimes useful. The model is then

$$\mu_i/t_i = \alpha + \beta x_i, \quad \text{or} \quad \mu_i = \alpha t_i + \beta x_i t_i.$$

This does not require an offset. It corresponds to an ordinary Poisson GLM using identity link with $t_i$ and $x_i t_i$ as explanatory variables and no intercept. It provides additive, rather than multiplicative, predictor effects. It is less useful with many predictors, as the fitting process may fail because of negative fitted counts at some iteration.

Modeling Death Rates for Heart Valve Operations

Laird and Olivier (1981) analyzed patient survival after heart valve replacement operations. A sample of 109 patients were classified by type of heart
valve (aortic, mitral) and by age (under 55, 55 or older). Follow-up observations occurred until the patient died or the study ended. Operations occurred throughout the study period, and follow-up observations covered lengths of time varying from 3 to 97 months. The response was whether the subject died and the follow-up time. For subjects who died, this is the time after the operation until death; for the others, it is the time until the study ended or the subject withdrew from it.

TABLE 9.11 Data on Heart Valve Replacement Operations

Age          Quantity        Aortic    Mitral
Under 55     Deaths          4         1
             Time at risk    [...]     [...]
             Death rate      [...]     [...]
55 or older  Deaths          7         9
             Time at risk    [...]     [...]
             Death rate      [...]     [...]

Source: Reprinted with permission, based on data in Laird and Olivier (1981).

Table 9.11 lists the numbers of deaths during the follow-up period, by valve type and age. These counts are the first layer of a three-way contingency table that classifies valve type, age, and whether died (yes, no). The subjects not tabulated in Table 9.11 were not observed to die. They are censored, since we know only a lower bound for how long they lived after the operation.

It is inappropriate to analyze that table using binary GLMs for the probability of death, since subjects had differing times at risk; it is not sensible to treat a subject who could be observed for 3 months and a subject who could be observed for 97 months as identical trials with the same probability. To use age and valve type as predictors in a model for frequency of death, the proper baseline is not the number of subjects but rather the total time that subjects were at risk. Thus, we model the rate of death.

The time at risk for a subject is their follow-up time of observation. For a given age and valve type, the total time at risk is the sum of the times at risk for all subjects in that cell (those who died and those censored). Table 9.11 lists those total times in months. The sample rate, also shown in that table, divides the number of deaths by total time at risk.
For instance, 4 deaths in 1259 months of observation occurred for younger subjects with aortic valve replacement, so their sample rate is $4/1259 = 0.0032$.

We now model effects of age and valve type on the rate. Let $a$ be a dummy variable for age, with $a_1 = 0$ for the younger age group and $a_2 = 1$ for the older group. Let $v$ be a dummy variable for valve type, with $v_1 = 0$ for aortic and $v_2 = 1$ for mitral. Let $n_{ij}$ denote the number of deaths for age $a_i$ and valve type $v_j$, with expected value $\mu_{ij}$ for total time at risk $t_{ij}$. Given $t_{ij}$,
the expected rate is $\mu_{ij}/t_{ij}$. The model

$$\log(\mu_{ij}/t_{ij}) = \alpha + \beta_1 a_i + \beta_2 v_j \tag{9.18}$$

assumes a lack of interaction in the effects.

Model fitting uses standard iterative methods, treating $\{n_{ij}\}$ as independent Poisson variates with means $\{\mu_{ij}\}$. This is done conditional on $\{t_{ij}\}$. Table 9.12 presents the fitted death counts and estimated rates. The estimated effects are $\hat{\beta}_1 = 1.221$ (SE = $[\ldots]$) and $\hat{\beta}_2 = -0.330$ (SE = $[\ldots]$). There is evidence of an age effect. Given valve type, the estimated rate for the older age group is $\exp(1.221) = 3.4$ times that for the younger age group. The 95% Wald confidence interval for $\beta_1$ translates to $(1.2, 9.3)$ for the true multiplicative effect $\exp(\beta_1)$. The likelihood-ratio confidence interval is $(1.3, [\ldots])$.

TABLE 9.12 Fit to Table 9.11 for Poisson Regression Models. For the log link and the identity link, the table gives the fitted number of deaths and the estimated death rate for each age group (under 55, 55 or older) and valve type (aortic, mitral). [Entries not recoverable.]

The study contains much censored data. Of the 109 patients, only 21 died during the study period. Both effect estimates are imprecise. Note, though, that the analysis uses all 109 patients through their contributions to the times at risk.

Goodness-of-fit statistics comparing $\{n_{ij}\}$ to fitted values $\{\hat{\mu}_{ij}\}$ are $G^2 = 3.2$ and $X^2 = 3.1$. The residual df = 1, since the four response counts have three parameters. The mild evidence of lack of fit corresponds to evidence of interaction between valve type and age. However, the model without valve-type effects [i.e., $\beta_2 = 0$ in (9.18)] fits nearly as well, with $G^2 = 3.8$ and $X^2 = 3.8$ (df = 2). Models omitting age effects fit poorly.

The corresponding model with identity link is

$$\mu_{ij} = \alpha t_{ij} + \beta_1 a_i t_{ij} + \beta_2 v_j t_{ij}.$$

It shows a good fit, with $G^2 = 1.1$ and $X^2 = 1.1$ (df = 1). Table 9.12 shows the fit. Substantive conclusions are similar. The estimate $\hat{\beta}_1 = [\ldots]$ (SE = $[\ldots]$) then represents an estimated difference in death rates between the older and younger age groups for each valve type.
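The fitting of a Poisson loglinear model with an offset can be sketched with a short Newton-Raphson routine. This is a minimal illustration, not the software used for the analysis above: the death counts follow Table 9.11, but the four times at risk in `cells` are assumed values supplied for illustration, and the helper names `solve` and `fit` are hypothetical.

```python
import math

# (age dummy a, valve dummy v, deaths n, months at risk t).
# Deaths follow Table 9.11; the times at risk are ASSUMED for illustration.
cells = [(0, 0, 4, 1259.0), (0, 1, 1, 2082.0),
         (1, 0, 7, 1417.0), (1, 1, 9, 1647.0)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit(cells, iterations=25):
    """Newton-Raphson for log(mu/t) = alpha + beta1*a + beta2*v."""
    total_n = sum(c[2] for c in cells)
    total_t = sum(c[3] for c in cells)
    beta = [math.log(total_n / total_t), 0.0, 0.0]  # start at the overall rate
    for _ in range(iterations):
        score = [0.0] * 3
        info = [[0.0] * 3 for _ in range(3)]
        for a, v, n, t in cells:
            x = (1.0, float(a), float(v))
            # log t enters as an offset: its coefficient is fixed at 1.
            m = t * math.exp(sum(b * xi for b, xi in zip(beta, x)))
            for r in range(3):
                score[r] += x[r] * (n - m)
                for c in range(3):
                    info[r][c] += x[r] * x[c] * m
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


alpha, beta_age, beta_valve = fit(cells)
fitted = [t * math.exp(alpha + beta_age * a + beta_valve * v)
          for a, v, n, t in cells]
rate_ratio_age = math.exp(beta_age)
print(beta_age, beta_valve, rate_ratio_age, fitted)
```

At convergence the fitted counts reproduce the observed totals for the intercept, the older age group, and the mitral valve group, which are the likelihood equations for this model. Exponentiating `beta_age` gives the estimated ratio of death rates between the age groups at a fixed valve type.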
Modeling Survival Times*

A method for modeling survival times relates to the Poisson loglinear model for rates. This method focuses on times until death rather than on numbers of deaths. Let $T$ denote the time to some event, such as death or product failure in a reliability study. Let $f(t)$ denote the probability density function (pdf) and $F(t)$ the cdf of $T$.

A connection exists between ML estimation using a Poisson likelihood for numbers of events and a negative exponential likelihood for $T$ (Aitkin and Clayton). A subject having $T = t$ contributes $f(t)$ to the likelihood. For a subject whose censoring time equals $t$, we know only that $T > t$. Thus, this subject contributes $P(T > t) = 1 - F(t)$. Using the indicator $w_i = 1$ for death and 0 for censoring for subject $i$, the survival-time likelihood for $n$ independent observations is

$$\prod_{i=1}^{n} f(t_i)^{w_i} \left[1 - F(t_i)\right]^{1 - w_i}.$$

The log likelihood equals

$$\sum_i w_i \log f(t_i) + \sum_i (1 - w_i) \log\left[1 - F(t_i)\right]. \tag{9.19}$$

Further analysis requires a parametric form for $f$ and a model for the dependence of its parameters on explanatory variables. Most survival models focus on the rate at which death occurs rather than on $E(T)$. The hazard function

$$h(t) = \frac{f(t)}{1 - F(t)} = \lim_{\epsilon \downarrow 0} \frac{P(t < T < t + \epsilon \mid T > t)}{\epsilon}$$

represents the instantaneous rate of death for subjects who have survived to time $t$.

A simple density for survival modeling is the negative exponential. The pdf is

$$f(t) = \lambda e^{-\lambda t}, \quad t > 0.$$

The cdf is $F(t) = 1 - e^{-\lambda t}$ for $t > 0$, and $E(T) = \lambda^{-1}$. The hazard function is $h(t) = \lambda$, $t > 0$, constant for all $t$.

Now we include explanatory variables $\mathbf{x}$. Suppose that the hazard function for a negative exponential survival distribution is

$$h(t; \mathbf{x}) = \lambda \exp(\boldsymbol{\beta}' \mathbf{x}). \tag{9.20}$$
That is, the distribution for $T$ has parameter depending on $\mathbf{x}$ through (9.20). The choice of functional form (9.20) for explanatory variable effects ensures that the hazard is nonnegative at all $\mathbf{x}$. For instance, the loglinear model for the expected rate corresponds to a multiplicative model of type (9.20) for the rate itself.

Now, consider the log likelihood (9.19) with $f(t)$ equal to the negative exponential density with parameter $\lambda \exp(\boldsymbol{\beta}' \mathbf{x})$. For subject $i$, let $\mu_i = t_i \lambda \exp(\boldsymbol{\beta}' \mathbf{x}_i)$. With this substitution, the log likelihood simplifies to

$$\sum_i w_i \log \mu_i - \sum_i \mu_i - \sum_i w_i \log t_i.$$

The first two terms involve $\boldsymbol{\beta}$. This part is identical to the log likelihood for independent Poisson variates $\{w_i\}$ with expected values $\{\mu_i\}$. In this application $\{w_i\}$ are binary rather than Poisson, but that is irrelevant to the process of maximizing with respect to $\boldsymbol{\beta}$. This process is equivalent to maximizing the likelihood for the Poisson loglinear model

$$\log \mu_i - \log t_i = \log \lambda + \boldsymbol{\beta}' \mathbf{x}_i$$

with offset $\log(t_i)$, using observations $\{w_i\}$. When we sum terms in the log likelihood for subjects having a common value of $\mathbf{x}$, the observed data are the numbers of deaths ($\sum w_i$) at each setting of $\mathbf{x}$, and the offset is the log of ($\sum t_i$) at each setting.

The assumption of constant hazard over time is often not sensible. As products wear out, their failure rate increases. A generalization divides the time scale into disjoint time intervals and assumes constant hazard in each, namely,

$$h(t; \mathbf{x}) = \lambda_k \exp(\boldsymbol{\beta}' \mathbf{x}) \quad \text{for } t \text{ in interval } k,\ k = 1, \ldots.$$

A separate hazard rate applies to each piece of the time scale. Consider the contingency table for numbers of deaths, in which one dimension is a discrete time scale and other dimensions represent categorical explanatory variables. Holford and Laird and Olivier (1981) showed that Poisson loglinear models and likelihoods for this table are equivalent to loglinear hazard models and likelihoods that assume piecewise exponential hazards for the survival times.

For short time intervals, the piecewise exponential approach is essentially nonparametric, making no assumption about the dependence of the hazard on time. This suggests the generalization of model (9.20) that replaces $\lambda$ by an unspecified function $\lambda(t)$, so that

$$h(t; \mathbf{x}) = \lambda(t) \exp(\boldsymbol{\beta}' \mathbf{x}).$$

This is the Cox proportional hazards model. Its ratio of hazards

$$h(t; \mathbf{x}_1)/h(t; \mathbf{x}_2) = \exp\left[\boldsymbol{\beta}'(\mathbf{x}_1 - \mathbf{x}_2)\right]$$

is the same for all $t$.
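The equivalence between the censored negative exponential log likelihood and the Poisson log likelihood with an offset can be verified numerically. The sketch below uses hypothetical follow-up times, death indicators, and a single binary covariate; it checks that the two functions differ only by the constant $\sum_i w_i \log t_i$, whatever parameter values are used.

```python
import math

# Hypothetical data: follow-up time t_i, death indicator w_i
# (1 = died, 0 = censored), and a binary covariate x_i.
times = [3.0, 12.0, 20.0, 45.0, 60.0, 97.0]
died = [1, 0, 1, 0, 1, 0]
x = [1, 0, 1, 1, 0, 0]


def exponential_loglik(lam, beta):
    """Censored-data log likelihood with constant hazard lam*exp(beta*x_i)."""
    total = 0.0
    for t, w, xi in zip(times, died, x):
        h = lam * math.exp(beta * xi)
        # A death contributes log f(t) = log h - h*t;
        # a censored time contributes log[1 - F(t)] = -h*t.
        total += w * math.log(h) - h * t
    return total


def poisson_kernel(lam, beta):
    """Poisson log-likelihood kernel sum(w_i log mu_i - mu_i),
    with mu_i = t_i * lam * exp(beta * x_i)."""
    total = 0.0
    for t, w, xi in zip(times, died, x):
        m = t * lam * math.exp(beta * xi)
        total += w * math.log(m) - m
    return total


# The difference should not depend on (lam, beta).
constant = sum(w * math.log(t) for t, w in zip(times, died))
diff_a = poisson_kernel(0.01, 0.5) - exponential_loglik(0.01, 0.5)
diff_b = poisson_kernel(0.03, -1.2) - exponential_loglik(0.03, -1.2)
print(constant, diff_a, diff_b)
```

Since the two functions differ by a quantity free of the parameters, they are maximized at the same values, which is why standard Poisson regression software with an offset can fit the exponential survival model.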
TABLE 9.13 Number of Deaths from Lung Cancer. Rows: follow-up time intervals (months). Columns: histology (I, II, III) by disease stage. Values in parentheses represent total follow-up. [Cell entries not recoverable.] Source: Reprinted with permission from the Biometric Society, based on Holford.

Lung Cancer Survival Example*

Table 9.13 describes survival for 539 males diagnosed with lung cancer. The prognostic factors are histology (H) and stage (S) of disease. For a piecewise exponential hazard approach, the time scale for follow-up (T) was divided into two-month intervals. Let $\mu_{ijk}$ denote the expected number of deaths and $t_{ijk}$ the total time at risk for histology $i$ and stage of disease $j$, in follow-up time interval $k$. The model

$$\log(\mu_{ijk}/t_{ijk}) = \lambda + \lambda_i^H + \lambda_j^S + \lambda_k^T \tag{9.21}$$

has residual $G^2 = 43.9$ (df = 52). All models assuming no interaction between follow-up time interval and either prognostic factor are proportional hazards models, since they have the same effects of histology and stage of disease for each time interval.

Table 9.14 summarizes results of fitting several such models. Although stage of disease is an important prognostic factor, histology did not contribute significant additional information. For model (9.21), the effects of stage of disease satisfy

$$\hat{\lambda}_2^S - \hat{\lambda}_1^S = [\ldots] \ (\text{SE} = [\ldots]), \qquad \hat{\lambda}_3^S - \hat{\lambda}_1^S = 1.34 \ (\text{SE} = [\ldots]).$$
TABLE 9.14 Results for Poisson Regression Models of Proportional Hazards Form with Table 9.13. The table reports $G^2$ and df for models with effects T; T + H; T + S; T + S + H; and T + S + H + S × H, where T is the time scale for follow-up, H is histology, and S is disease stage. [Numeric entries not recoverable.]

For instance, at a fixed follow-up time for a given histology, the estimated death rate at the third stage of disease is $\exp(1.34) = 3.8$ times that at the first stage. Adding interaction terms between stage and time does not significantly improve the fit (change in $G^2 = 14.9$, change in df = 12). The $\{\hat{\lambda}_j^S\}$ are very similar for the simpler model without the histology effects.

Analyzing Weighted Data*

The process of fitting a loglinear model with an offset is also useful in other applications. For expected frequencies $\{\mu_i\}$ and fixed constants $\{t_i\}$, consider a model

$$\log(\mu_i/t_i) = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots.$$

Standard loglinear models have $\{t_i = 1\}$. The general form is useful for the analysis of categorical data with sampling designs more complex than simple random sampling. Many surveys have sampling designs employing stratification and/or clustering. Case weights inflate or deflate the influence of each observation according to features of that design. Adding the case weights for subjects in a particular cell provides a total weighted frequency for that cell. The average cell weight $z_i$ is defined to be the total weighted frequency divided by the cell count. Conditional on $\{z_i\}$, loglinear models for the weighted expected frequencies $\{z_i \mu_i = \mu_i/t_i\}$, with $t_i = z_i^{-1}$, express the model as a standard loglinear model for $\{\log \mu_i\}$ with offset $\{\log t_i = -\log z_i\}$. Fitting this model provides appropriate parameter estimates and standard errors (Clogg and Eliason).

EMPTY CELLS AND SPARSENESS IN MODELING CONTINGENCY TABLES

Contingency tables having small cell counts are said to be sparse. We end this chapter by discussing effects of sparse tables on model fitting. Sparse
tables occur when the sample size $n$ is small. They also occur when $n$ is large but so is the number of cells. Sparseness is common in tables with many variables. The following discussion refers to a generic contingency table and model, with cell counts $\{n_i\}$ and expected frequencies $\{\mu_i\}$ for $n$ observations in $N$ cells.

Empty Cells: Sampling versus Structural Zeros

Sparse tables usually contain cells with $n_i = 0$. These empty cells are of two types: sampling zeros and structural zeros. In most cases, even though $n_i = 0$, $\mu_i > 0$. It is possible to have observations in the cell, and $n_i > 0$ with sufficiently large $n$. This empty cell is called a sampling zero. The empty cells in Table 9.1 for the student survey are sampling zeros.

An empty cell in which observations are impossible is called a structural zero. For such cells $\mu_i = 0$ and necessarily $\hat{\mu}_i = 0$ and $n_i = 0$ regardless of $n$. For a table that cross-classifies cancer patients on their gender, race, and type of cancer, some cancers (e.g., prostate cancer, ovarian cancer) are gender specific. Thus, certain cells have structural zeros. Contingency tables with structural zeros are called incomplete tables.

Sampling zeros are part of the data set. A count of 0 is a permissible outcome for a Poisson or multinomial variate. It contributes to the likelihood function and model fitting. A structural zero, on the other hand, is not an observation and is not part of the data. Sampling zeros are much more common than structural zeros, and the remaining discussion refers to them.

Existence of Estimates in Loglinear/Logit Models

Sampling zeros can affect the existence of finite ML estimates of loglinear and logit model parameters. Haberman (1973b, 1974a), generalizing work by Birch and Fienberg (1970b), studied this. Let $\mathbf{n}$ denote the vector of cell counts and $\boldsymbol{\mu}$ their expected values. Haberman showed results 1 through 5 for Poisson sampling, but by result 6 they apply also to multinomial sampling.

1. The log-likelihood function is a strictly concave function of $\log \boldsymbol{\mu}$.
2. If a ML estimate of $\boldsymbol{\mu}$ exists, it is unique and satisfies the likelihood equations $\mathbf{X}'\mathbf{n} = \mathbf{X}'\hat{\boldsymbol{\mu}}$. Conversely, if $\hat{\boldsymbol{\mu}}$ satisfies the model and also the likelihood equations, it is the ML estimate of $\boldsymbol{\mu}$.
3. If all $n_i > 0$, ML estimates of loglinear model parameters exist.
4. Suppose that ML parameter estimates exist for a loglinear model that equates observed and fitted counts in certain marginal tables. Then those marginal tables have uniformly positive counts.
5. If ML estimates exist for a model M, they also exist for any special case of M.
6. For any loglinear model, the ML estimates $\hat{\boldsymbol{\mu}}$ are identical for multinomial and independent Poisson sampling, and those estimates exist in the same situations.

To illustrate, consider the saturated model. By results 2 and 3, when all $n_i > 0$, the ML estimate of $\boldsymbol{\mu}$ is $\mathbf{n}$. By result 4, parameter estimates do not exist when any $n_i = 0$. Model parameter estimates are contrasts of $\{\log \hat{\mu}_i\}$, and since $\hat{\boldsymbol{\mu}} = \mathbf{n}$ for the saturated model, the estimates are finite only when all $n_i > 0$.

For unsaturated models, by results 3 and 4 ML estimates exist when all $n_i > 0$ and do not exist when any count is zero in the set of sufficient marginal tables. Suppose that at least one $n_i = 0$ but the sufficient marginal counts are all positive. For hierarchical loglinear models, Glonek et al. showed that the positivity of the sufficient counts implies the existence of ML estimates if and only if the model is decomposable (Note 8.2), which includes the conditional independence models. Models having all pairs of variables associated, however, are more complex. For model (XY, XZ, YZ), for instance, ML estimates exist when only one $n_i = 0$ but may not exist when at least two cells are empty. For instance, ML estimates do not exist for Table 9.15, even though all sufficient statistics (the two-way marginal totals) are positive (Problem $[\ldots]$).

Haberman showed that the supremum of the likelihood function is finite. This motivated him to define extended ML estimators of $\boldsymbol{\mu}$. These always exist but may equal 0 and, falling on the boundary, need not have the same properties as regular ML estimators [see also Baker et al. (1985)]. A sequence of estimates satisfying the model that converges to the extended estimate has log likelihood approaching its supremum. In this extended sense, $\hat{\mu}_i = 0$ is the ML estimate of $\mu_i$ for the saturated model when $n_i = 0$, and one can have infinite loglinear parameter estimates.

When a sufficient marginal count for a factor equals zero, infinite estimates occur for that term.
For instance, when an XY marginal total equals zero, infinite estimates occur among $\{\hat{\lambda}_{ij}^{XY}\}$ for loglinear models such as (XY, XZ, YZ), and infinite estimates occur among $\{\hat{\beta}_i^{X}\}$ for the effect of X on Y in logit models. Sometimes, however, not even infinite estimates exist. An example is estimating the log odds ratio when both entries in a row or column of a $2 \times 2$ table equal 0.

TABLE 9.15 Data for Which ML Estimates Do Not Exist for Model (XY, XZ, YZ)^a
(Layout: X by Y within each level of Z.)
a Cells containing * may contain any positive numbers.
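The point that positive sufficient margins do not guarantee existence can be sketched with a small check. This is only an illustration under stated assumptions: the table is a hypothetical 2 x 2 x 2 array in which cells (1,1,1) and (2,2,2) are empty (an assumed pattern, since the body of Table 9.15 does not appear here), and the value 5 in the other cells is an arbitrary placeholder. The code verifies only that every two-way margin is positive; it does not fit the model.

```python
from itertools import product

# Hypothetical 2x2x2 counts keyed by (i, j, k) for X=i, Y=j, Z=k.
# Assumption: cells (1,1,1) and (2,2,2) are empty; the 5s are arbitrary
# placeholders for cells that may hold any positive number.
counts = {cell: 5 for cell in product((1, 2), repeat=3)}
counts[(1, 1, 1)] = 0
counts[(2, 2, 2)] = 0


def two_way_margin(n, keep):
    """Sum a three-way table over the single axis not listed in `keep`."""
    margin = {}
    for cell, value in n.items():
        key = tuple(cell[axis] for axis in keep)
        margin[key] = margin.get(key, 0) + value
    return margin


# Sufficient statistics for model (XY, XZ, YZ) are the three two-way margins.
margins = {
    "XY": two_way_margin(counts, (0, 1)),
    "XZ": two_way_margin(counts, (0, 2)),
    "YZ": two_way_margin(counts, (1, 2)),
}
all_positive = all(v > 0 for m in margins.values() for v in m.values())
empty_cells = [cell for cell, value in counts.items() if value == 0]

print("empty cells:", empty_cells)
print("all two-way margins positive:", all_positive)
```

A check like this can flag zero sufficient margins, but, as the text notes for model (XY, XZ, YZ), passing it is not enough to conclude that finite ML estimates exist.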
BUILDING AND EXTENDING LOGLINEAR/LOGIT MODELS

A value of $\infty$ (or $-\infty$) for a ML parameter estimate implies that ML fitted values equal 0 in some cells, and some odds ratio estimates equal $\infty$ or 0. One potential indicator is when the iterative fitting process does not converge, typically because an estimate keeps increasing from cycle to cycle. Most software, however, is fooled after a certain point in the iterative process by the nearly flat likelihood. It reports convergence, but because of the very slight curvature of the log likelihood, the estimated standard errors (based on inverting the information matrix of second partial derivatives) are extremely large and numerically unstable. Slight changes in the data then often cause dramatic changes in the estimates and their standard errors. A danger with sparse data is that one might not realize that a true estimated effect is infinite and, as a consequence, report estimated effects and results of statistical inferences that are invalid and highly unstable.

Many ML analyses are unharmed by empty cells. Even when a parameter estimate is infinite, this is not fatal to data analysis. The likelihood-ratio confidence interval for the true log odds ratio has one endpoint that is finite. For instance, when $n_{11} = 0$ but other $n_{ij} > 0$ in a $2 \times 2$ table, $\log \hat{\theta} = -\infty$ and a confidence interval has form $(-\infty, U)$ for some finite upper bound $U$. When the pattern of empty cells forces certain fitted values for a model to equal 0, this affects the df for testing model fit (Haslett).

Clinical Trials Example

Table 9.16 shows results of a clinical trial conducted at five centers. The purpose was to compare an active drug to placebo for treating fungal infections, with a binary (success, failure) response. For these data, let Y = response, X = treatment ($x_1 = 1$ for active drug and $x_2 = 0$ for placebo), and Z = center. Centers 1 and 3 had no successes. Thus, the $5 \times 2$ marginal table relating response to center, collapsed over treatment, contains zero counts. The last two columns of Table 9.16 show this marginal table.
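The collapsing described above can be sketched in a few lines. The counts below are an assumed reading of Table 9.16, entered as (successes, failures) per center and treatment, and should be checked against the table itself; the function names are hypothetical. The sketch only tabulates the two marginal tables and flags centers whose response margin contains a zero.

```python
# (successes, failures) by center and treatment.
# Assumption: these counts are a reading of Table 9.16; verify before reuse.
table_9_16 = {
    1: {"active": (0, 5), "placebo": (0, 9)},
    2: {"active": (1, 12), "placebo": (0, 10)},
    3: {"active": (0, 7), "placebo": (0, 5)},
    4: {"active": (6, 3), "placebo": (2, 6)},
    5: {"active": (5, 9), "placebo": (2, 12)},
}


def yz_marginal(table):
    """Collapse over treatment: (successes, failures) for each center."""
    return {
        center: tuple(sum(pair) for pair in zip(*arms.values()))
        for center, arms in table.items()
    }


def xy_marginal(table):
    """Collapse over center: (successes, failures) for each treatment."""
    totals = {"active": [0, 0], "placebo": [0, 0]}
    for arms in table.values():
        for treatment, (success, failure) in arms.items():
            totals[treatment][0] += success
            totals[treatment][1] += failure
    return {treatment: tuple(pair) for treatment, pair in totals.items()}


yz = yz_marginal(table_9_16)
xy = xy_marginal(table_9_16)
# Centers whose response-by-center margin contains a zero count.
centers_with_zero = [center for center, pair in yz.items() if 0 in pair]

print("YZ marginal:", yz)
print("XY marginal:", xy)
print("centers with a zero marginal count:", centers_with_zero)
```

Under these assumed counts the response-by-treatment margin has no zeros, while the response-by-center margin has zeros for the centers with no successes.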
Infinite ML estimates occur for terms in loglinear or logit models containing the YZ association. An example is the logit model

$$\text{logit}[P(Y = 1 \mid X = i, Z = k)] = \beta x_i + \beta_k^Z.$$

(We omit the intercept, so the $\{\beta_k^Z\}$ need no constraint; then, these refer to center effects rather than contrasts between centers and a baseline center.) The likelihood function increases continually as $\beta_1^Z$ and $\beta_3^Z$ decrease toward $-\infty$; that is, as the logit decreases toward $-\infty$, so the fitted probability of success decreases toward the ML estimate of 0 for those centers.

The counts in the $2 \times 2$ marginal table relating response to treatment, shown in the bottom panel of Table 9.16, are all positive. The empty cells in Table 9.16 affect the center estimates, but not the treatment estimate, for this logit model. In the limit as the log likelihood increases, the fitted values have a log odds ratio $\hat{\beta} = 1.55$. Most software reports this, but
instead of $\hat{\beta}_1^Z = \hat{\beta}_3^Z = -\infty$ reports large numbers with extremely large standard errors. For instance, PROC GENMOD in SAS reports values of about $-26$ for $\hat{\beta}_1^Z$ and $\hat{\beta}_3^Z$, with standard errors of about 200,000.

The treatment estimate $\hat{\beta} = 1.55$ also results from deleting centers 1 and 3 from the analysis. When a center contains responses of only one type, it provides no information about this odds ratio. (It does provide information about the size of some other measures, such as the difference of proportions.) In fact, such tables also make no contribution to standard tests of conditional independence, such as the Cochran-Mantel-Haenszel test and the exact test.

An alternative strategy in multicenter analyses combines centers of a similar type. Then, if each resulting partial table has responses with both outcomes, the inferences use all data. For Table 9.16, perhaps centers 1 and 3 are similar to center 2, since the success rate is very low for that center. Combining these three centers and refitting the model to this table and the tables for the other two centers yields $\hat{\beta} = 1.56$. Usually, this strategy produces results similar to deleting the table with no outcomes of a particular type.

TABLE 9.16 Clinical Trial Relating Treatment to Response with XY and YZ Marginal Tables^a

                            Response             YZ Marginal
Center    Treatment      Success  Failure     Success  Failure
1         Active drug       0        5           0       14
          Placebo           0        9
2         Active drug       1       12           1       22
          Placebo           0       10
3         Active drug       0        7           0       12
          Placebo           0        5
4         Active drug       6        3           8        9
          Placebo           2        6
5         Active drug       5        9           7       21
          Placebo           2       12
XY        Active drug      12       36
marginal  Placebo           4       42

a X, treatment; Y, response; Z, center.
Source: Data courtesy of Diane Connell, Sandoz Pharmaceuticals Corporation.

Effect of Small Samples on $X^2$ and $G^2$

Although empty cells and sparse tables need not affect parameter estimates of interest, they can cause sampling distributions of goodness-of-fit statistics to be far from chi-squared. The true sampling distributions converge to
chi-squared as $n \to \infty$, for a fixed number of cells $N$. The adequacy of the chi-squared approximation depends both on $n$ and $N$.

Cochran studied the chi-squared approximation for $X^2$ in several articles. In 1954, he suggested that to test independence with df > 1, a minimum expected value $\mu_i \approx 1$ is permissible as long as no more than about 20% of $\mu_i < 5$. Koehler (1986), Koehler and Larntz (1980), and Larntz showed that $X^2$ applies with smaller $n$ and more sparse tables than $G^2$. The distribution of $G^2$ is usually poorly approximated by chi-squared when $n/N$ is less than 5. Depending on the sparseness, P-values based on referring $G^2$ to a chi-squared distribution can be too large or too small. When most $\mu_i$ are smaller than 0.5, treating $G^2$ as chi-squared gives a highly conservative test; when $H_0$ is true, reported P-values tend to be much larger than true ones. When most $\mu_i$ are between 0.5 and 4, $G^2$ tends to be too liberal; the reported P-value tends to be too small.

The size of $n/N$ that produces adequate approximations for $X^2$ tends to decrease as $N$ increases (Koehler and Larntz). However, the approximation tends to be poor for sparse tables containing both small and moderately large $\mu_i$ (Haberman). It is difficult to give a guideline that covers all cases. For other discussion, see Cressie and Read and Lawal.

For fixed $n$ and $N$, the chi-squared approximation is better for tests with smaller df. For instance, in testing conditional independence in $I \times J \times K$ tables, $G^2[(XZ, YZ) \mid (XY, XZ, YZ)]$ [with df = $(I - 1)(J - 1)$] is closer to chi-squared than $G^2(XZ, YZ)$ [with df = $K(I - 1)(J - 1)$]. The ordinal test of $H_0$: $\beta = 0$ with the homogeneous linear-by-linear XY association model has df = 1, and behaves even better.

Model-Based Tests and Sparseness

From (9.3) and (9.4), the model-based statistics $G^2(M_0 \mid M_1)$ and $X^2(M_0 \mid M_1)$ depend on the data only through the fitted values, and hence only through minimal sufficient statistics for the more complex model.
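To make the two statistics and the sparseness index $n/N$ concrete, here is a minimal sketch for testing a fully specified multinomial. The observed counts are hypothetical, the null hypothesis is taken to be equal cell probabilities, and the function names are illustrative only. Empty cells contribute 0 to $G^2$, using the convention $0 \log 0 = 0$.

```python
import math


def pearson_x2(observed, expected):
    """Pearson statistic: sum of (n - mu)^2 / mu over cells."""
    return sum((n - mu) ** 2 / mu for n, mu in zip(observed, expected))


def deviance_g2(observed, expected):
    """Likelihood-ratio statistic: 2 * sum of n * log(n / mu).

    Cells with n = 0 are skipped because n * log(n / mu) -> 0 as n -> 0.
    """
    return 2.0 * sum(
        n * math.log(n / mu) for n, mu in zip(observed, expected) if n > 0
    )


observed = [0, 2, 3, 5]          # hypothetical sparse counts
n = sum(observed)                # total sample size
N = len(observed)                # number of cells
expected = [n / N] * N           # equal-probability null, same total as observed

x2 = pearson_x2(observed, expected)
g2 = deviance_g2(observed, expected)
mean_per_cell = n / N            # the n/N index discussed in the text

print(f"X2 = {x2:.3f}, G2 = {g2:.3f}, n/N = {mean_per_cell:.2f}")
```

With $n/N$ well below 5, as here, the text's guidance is that referring $G^2$ to a chi-squared distribution is questionable, and that $X^2$ tolerates sparseness somewhat better.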
These statistics have null distributions converging to chi-squared as the expected values of the minimal sufficient statistics grow. For most loglinear models, these sufficient statistics refer to marginal tables. Marginal totals are more nearly normally distributed than are single cell counts. Thus, $G^2(M_0 \mid M_1)$ and $X^2(M_0 \mid M_1)$ converge to their limiting chi-squared distribution more quickly than do $G^2(M_0)$ and $X^2(M_0)$, which depend also on individual cell counts. When $\{\hat{\mu}_i\}$ are small but the sufficient marginal totals for $M_1$ are mostly in at least the range 5 to 10, the chi-squared approximation is usually adequate for model comparison statistics. Haberman (1977a) provided theoretical justification.

Alternative Asymptotics and Alternative Statistics

When large-sample approximations are inadequate, exact small-sample methods are an alternative. When they are infeasible, it is often possible to
approximate exact distributions precisely using Monte Carlo methods (e.g., Booth and Butler 1999; Forster et al. 1996; Kim and Agresti 1997; Mehta et al.).

An alternative approach uses sparse asymptotic approximations that apply when the number of cells $N$ increases as $n$ increases. For this approach, $\{\mu_i\}$ need not increase, as they must do in the usual (fixed $N$, $n \to \infty$) large-sample theory. For goodness-of-fit testing of a specified multinomial, Koehler and Larntz (1980) showed that a standardized version of $G^2$ has an approximate normal distribution for very sparse tables. Koehler presented limiting normal distributions for $G^2$ for use in testing models having direct ML estimates. McCullagh reviewed ways of handling sparse tables and presented an alternative approximation for $G^2$. Zelterman gave normal approximations for $X^2$ and proposed an alternative statistic.

Adding Constants to Cells of a Contingency Table

Empty cells and sparse tables can cause problems with existence of estimates for loglinear model parameters, estimation of odds ratios, performance of computational algorithms, and asymptotic approximations of chi-squared statistics. However, they need not be problematic. The likelihood can still be maximized, a point estimate of $\infty$ for an effect still usually has a finite lower bound for a likelihood-based confidence interval, and one can use small-sample inferential methods rather than asymptotic ones.

One way to obtain finite estimates of all effects and ensure convergence of fitting algorithms is to add a small constant to cell counts. Some algorithms add $\tfrac{1}{2}$ to each cell, as Goodman (1964b, 1970, 1971a) recommended for saturated models. An example of the beneficial effect of this for a saturated model is bias reduction for estimating an odds ratio in a $2 \times 2$ table (Gart; Gart and Zweifel). Adding $\tfrac{1}{2}$ to each cell before fitting an unsaturated model smooths the data too much, however, causing havoc with sampling distributions. This operation has too conservative an influence on estimated effects and test statistics.
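For the saturated $2 \times 2$ case, the effect of adding a constant can be sketched directly. The counts are hypothetical and the function name is illustrative; the sketch compares the ordinary sample log odds ratio, which is infinite when a cell is empty and undefined when a whole row or column is empty, with the version obtained after adding $\tfrac{1}{2}$ to every cell.

```python
import math


def log_odds_ratio(n11, n12, n21, n22, constant=0.0):
    """Sample log odds ratio after adding `constant` to each cell.

    Returns +/- infinity when exactly one diagonal product is zero, and NaN
    when both are zero (for example, when a whole row or column is empty).
    """
    a, b, c, d = (x + constant for x in (n11, n12, n21, n22))
    main_zero = a == 0 or d == 0
    off_zero = b == 0 or c == 0
    if main_zero and off_zero:
        return math.nan
    if off_zero:
        return math.inf
    if main_zero:
        return -math.inf
    return math.log((a * d) / (b * c))


# Hypothetical table with one empty cell (n11 = 0).
cells = (0, 10, 5, 5)
raw = log_odds_ratio(*cells)                    # ordinary ML estimate
amended = log_odds_ratio(*cells, constant=0.5)  # after adding 1/2 to each cell

print("ordinary estimate:", raw)
print("after adding 1/2:", round(amended, 4))
```

The amended value is finite by construction, which is convenient computationally but says nothing about how far the true log odds ratio could extend in the infinite direction.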
The effect is very severe with a large number of cells.

Even for a saturated model, adding $\tfrac{1}{2}$ to each cell is not a panacea for all purposes. When the ordinary ML estimate of an odds ratio is infinite, the estimate after adding $\tfrac{1}{2}$ to each cell is finite, as are the endpoints of any confidence interval. However, it is more sensible to use an upper bound of $\infty$ for the odds ratio, since no sample evidence suggests that the odds ratio falls below any given value.

When in doubt about the effect of sparse data, one should perform a sensitivity analysis. For example, for each possibly influential observation, delete it or move it to another cell to see how results vary with small perturbations to the data. Influence diagnostics for GLMs (Williams) are also useful for this purpose. Often, some associations are not affected by empty cells and give stable results for the various analyses, whereas others
that are affected are highly unstable. Use caution in making conclusions about an association if small changes in the data are influential.

Later chapters show ways to smooth data in a less ad hoc manner than adding arbitrary constants to cells. These include random effects models (Section 12.3) and Bayesian methods.

NOTES

Section 9.1: Association Graphs and Collapsibility

9.1. Darroch et al. defined a class of graphical models that contains the family of decomposable models (see Note 8.2). For expositions on graphical models and their relevant independence graphs, which show the conditional independence structure, see also Anderson and Böckenholt (2000), Edwards (2000), Edwards and Kreiner (1983), Kreiner (1998), Lauritzen (1996), and Whittaker. Whittaker (1990) summarized connections with various definitions of collapsibility.

9.2. For $I \times J \times 2$ tables, the collapsibility conditions are necessary as well as sufficient (Simpson 1951; Whittemore). For $I \times J \times K$ tables, Ducharme and Lepage showed the conditions are necessary and sufficient for the odds ratios to remain the same no matter how the levels of Z are pooled (i.e., no matter how Z is partially collapsed). Darroch (1962) defined a perfect table as one for which for all i, j, k,

$$\sum_i \frac{\pi_{ij+}\pi_{i+k}}{\pi_{i++}} = \pi_{+j+}\pi_{++k}, \qquad \sum_j \frac{\pi_{ij+}\pi_{+jk}}{\pi_{+j+}} = \pi_{i++}\pi_{++k}, \qquad \sum_k \frac{\pi_{i+k}\pi_{+jk}}{\pi_{++k}} = \pi_{i++}\pi_{+j+}.$$

For perfect tables, homogeneous association implies that $\{\pi_{ijk} = \pi_{ij+}\pi_{i+k}\pi_{+jk} / (\pi_{i++}\pi_{+j+}\pi_{++k})\}$ and conditional odds ratios are identical to marginal odds ratios. Whittemore used perfect tables to illustrate that for $I \times J \times K$ tables with $K > 2$, conditional and marginal odds ratios can be identical even when no pair of variables is conditionally independent. See also Davis (1986b).

Suppose that the difference of proportions or relative risk, computed for a binary response Y and predictor X, is the same at every level of Z.
If Z is independent of X in the marginal XZ table or if Z is conditionally independent of Y given X, the measure has the same value in the marginal XY table (Shapiro). Thus, for factorial designs with the same number of observations at each combination of levels, the difference of proportions and relative risk are collapsible. See also Wermuth.

Section 9.2: Model Selection and Comparison

9.3. Articles on loglinear model selection include Aitkin (1979, 1980), Benedetti and Brown (1978), Brown (1976), Goodman (1970, 1971a), Wermuth (1976), and Whittaker and Aitkin. When a certain model holds, $G^2/\text{df}$ has an asymptotic mean of 1. Goodman (1971a) recommended this index for comparing fits. Smaller values represent better fits.
9.4. Kullback et al. (1962) and Lancaster were among the first to partition chi-squared statistics in multiway tables. Goodman and Plackett (1962) noted difficulties with their approaches. When observations have distribution in the natural exponential family, Simon showed that

$$G^2(M_0 \mid M_1) = 2 \sum \hat{\mu}_1 \log(\hat{\mu}_1 / \hat{\mu}_0)$$

whenever models are linear in the natural parameters. See Lang (1996b) for partitionings for more complex models.

Section 9.4: Modeling Ordinal Associations

9.5. Goodman (1979a) stimulated research on loglinear models for ordinal data. His work extended Haberman (1974b), who expressed the association term $\lambda_{ij}^{XY}$ with an expansion in orthogonal polynomials. For more general ordinal models for multiway tables, see Agresti (1984), Becker (1989a), Becker and Clogg (1989), and Goodman.

Section 9.6: Association Models, Correlation Models, and Correspondence Analysis

9.6. Early articles on the RC model include Goodman (1979a, 1981a, b) and Andersen (1980), apparently partly motivated by earlier work of G. Rasch (see Andersen). Anderson and Böckenholt (2000), Becker (1989a, b, 1990), Becker and Clogg (1989), Chuang et al. (1985), and Goodman (1985, 1986) discussed generalizations for multiway tables. Anderson discussed a related model. Anderson and Vermunt (2000) showed that RC and related association models arise when observed variables are conditionally independent given a latent variable that is conditionally normal, given the observed variables. Their work generalizes results in Lauritzen and Wermuth and discussion by Whittaker of van der Heijden et al. See also de Falguerolles et al. Clogg and Shihadeh surveyed association models and related correlation models.

9.7. Kendall and Stuart (1979, Chap. 33) surveyed basic canonical correlation methods for contingency tables. See also Williams (1952), who discussed earlier work by R. A. Fisher and others.
Karl Pearson often analyzed tables by assuming an underlying bivariate normal distribution. For estimating that distribution's correlation, see Becker (1989b), Goodman (1981b), Kendall and Stuart (1979, Chaps. 26 and 33), Lancaster (1969, Chap. X), the Pearson tetrachoric correlation for $2 \times 2$ tables, and the Lancaster and Hamdan polychoric correlation for $I \times J$ tables.

9.8. Correspondence analysis gained popularity in France under the influence of Benzécri. Goodman attributed its origins to H. O. Hartley, publishing under his original German name (Hirschfeld). Greenacre related it to the singular value decomposition of a matrix. For other discussion, see Escoufier (1982), Friendly (2000, Chap. 5), Goodman (1986, 1996, 2000), Michailidis and de Leeuw (1998), van der Heijden and de Leeuw (1985), and van der Heijden et al. Gabriel discussed related work on biplots.

Section 9.7: Poisson Regression for Rates

9.9. Another application using offsets is table standardization. For analyses of rate data, see Breslow and Day (1987, Sec. 4.5), Freeman and Holford (1980), Frome (1983), and Hoem.

9.10. Articles dealing with grouped survival data, particularly loglinear and logit models for survival probabilities, include Aranda-Ordaz (1983), Larson (1984), Prentice and Gloeckler (1978), Schluchter and Jackson (1989), Stokes et al. (2000, Chap. 17), and Thompson. Aitkin and Clayton discussed exponential survival models and also presented similar models having hazard functions
for Weibull or extreme-value survival distributions. Log likelihood (9.19) actually applies only for noninformative censoring mechanisms. It does not make sense if subjects tend to withdraw from the study because of factors related to it, perhaps because of health effects related to one of the treatments.

9.11. Lindsey and Mersch (1992) showed a clever way to use loglinear models to fit exponential family distributions $f(y; \theta, \phi)$ with $\phi$ known. One breaks the response scale into intervals $\{(y_k - \Delta_k/2,\; y_k + \Delta_k/2)\}$. Counts in those intervals follow a multinomial with probabilities approximated by $\{f(y_k; \theta, \phi)\,\Delta_k\}$. The log expected count approximations are linear in $\theta$ with an offset.

PROBLEMS

Applications

9.1 Use odds ratios in Table 8.3 to illustrate the collapsibility conditions.
a. For (A, C, M), all conditional odds ratios equal 1.0. Explain why all reported marginal odds ratios equal 1.0.
b. For (AC, M), explain why all conditional odds ratios are the same as the marginal odds ratios, and all $\hat{\mu}_{ac+} = n_{ac+}$.
c. For (AM, CM), explain why the AC conditional odds ratios of 1.0 need not be the same as the AC marginal odds ratio, the AM and CM conditional odds ratios are the same as the marginal odds ratios, and all $\hat{\mu}_{a+m} = n_{a+m}$ and $\hat{\mu}_{+cm} = n_{+cm}$.
d. For (AC, AM, CM), explain why no conditional odds ratios need be the same as the related marginal odds ratios, and the fitted marginal odds ratios must equal the sample marginal odds ratios.

9.2 Table 9.17 summarizes a study with variables age of mother (A), length of gestation (G) in days, infant survival (I), and number of cigarettes smoked per day during the prenatal period (S). Treat G and I as response variables and A and S as explanatory.
a. Explain why a loglinear model should include the AS term.
b. Fit the models (AGIS), (AGI, AIS, AGS, GIS), (AG, AI, AS, GI, GS, IS), and (AS, G, I). Identify a subset of models nested between two of these that may fit well. Select one such model.
c. Use (i) forward selection and (ii) backward elimination to build a model.
Compare the results of the strategies, and interpret the models chosen.

9.3 Refer to Table 2.13. Consider the nested set {(DVP), (DP, VP, DV), (VP, DV), (P, DV), (D, V, P)}. Partition chi-squared to compare the four pairs, ensuring that the overall type I error probability for the four comparisons does not exceed the chosen $\alpha$. Which model would you select, using a backward comparison starting with (DVP)? Show that the final
model selected depends on the choice of nested set, by repeating the analysis with (DP, VP, DV), (DP, DV), (P, DV), (D, V, P).

TABLE 9.17 Data for Problem 9.2
(Columns: Age, Smoking, Gestation, Infant Survival: No, Yes.)
Source: N. Wermuth, in Proc. 9th International Biometrics Conference. Reprinted with permission from the Biometric Society.

9.4 Consider the loglinear model selection for Table 6.3.
a. Why is it not sensible to consider models omitting the GM term?
b. Using forward selection starting with (GM, E, P), show that model (GM, GP, EG, EMP) seems reasonable.
c. Using backward elimination, show that (GM, GP, EMP) or (GM, GP, EG, EMP) seems reasonable.
d. The EMP interaction seems vital. To describe it, show that the effect of extramarital sex on divorce is greater for subjects who had no premarital sex.
e. Use residuals to describe the lack of fit of model (GM, EMP).

9.5 For model (AC, AM, CM) with Table 8.3, the standardized Pearson residuals share a common absolute value across cells. Interpret, and explain why each one has the same absolute value. By contrast, model (AM, CM) has standardized Pearson residual $\pm 3.70$ in each cell where M = yes (e.g., $+3.70$ when A = C = yes) and $\pm 12.80$ in each cell where M = no (e.g., $+12.80$ when A = C = yes). Interpret.

9.6 Refer to Table 8.8. Conduct a residual analysis with the model of no three-factor interaction to describe the nature of the interaction.

9.7 Perform a residual analysis for the independence model with Table 3.2. Explain why it suggests that the linear-by-linear association model may fit better. Fit it, compare to the independence model, and interpret.
9.8 Refer to Problem 9.7.
a. Using standardized scores, find $\hat{\beta}$. Comment on the strength of association.
b. Fit a model in which job satisfaction scores are parameters. Interpret the estimated scores, and compare the fit to the L × L model.

9.9 Refer to Table 9.3.
a. For the linear-by-linear association model, construct a 95% confidence interval for the odds ratio using the four corner cells. Interpret.
b. Fit the column effects model. Compare estimated column scores to the equal-interval scores in part (a). Test that the true column scores are equal-interval, given that the model holds. Interpret. Construct a 95% confidence interval for the odds ratio using the four corner cells. Compare to part (a).

9.10 A weak local association may be substantively important for nonlocal categories. Illustrate with the L × L model for Table 9.9, showing how the estimated odds ratio for the four corner cells compares to the estimated local odds ratio.

9.11 Refer to Table 7.8. Fit the homogeneous linear-by-linear association model, and interpret. Test conditional independence between income (I) and job satisfaction (S), controlling for gender (G), using (a) that model, and (b) model (IS, IG, SG). Explain why the results are so different.

9.12 Fit the RC model to Table 9.3. Interpret the estimated scores. Does it fit better than the uniform association model?

9.13 Replicate the results in Section 9.6 for the correlation and correspondence models with Table (…).

9.14 One hundred leukemia patients were randomly assigned to two treatments. During the study, 10 subjects on treatment A died and 18 subjects on treatment B died. The total time at risk was (…) years for treatment A and (…) years for treatment B. Test whether the two treatments have the same death rates. Compare the rates with a confidence interval.

9.15 For Table 9.11, fit a model in which death rate depends only on age. Interpret the age effect.

9.16 Consider model (…). What is the effect on the model parameter estimates, their standard errors, and the goodness-of-fit statistics when (a)
the times at risk are doubled, but the numbers of deaths stay the
same; (b) the times at risk stay the same, but the numbers of deaths double; and (c) the times at risk and the numbers of deaths both double.

9.17 Consider Table (…). Explain how one could analyze whether the hazard depends on time.

9.18 An article by W. A. Ray et al. (Amer. J. Epidemiol., 1992) dealt with motor vehicle accident rates for 16,262 subjects aged 65 to 84 years, with data on each for up to 4 years. In 17.3 thousand years of observation, the women had 175 accidents in which an injury occurred. In 21.4 thousand years, men had 320 injurious accidents.
a. Find a 95% confidence interval for the true overall rate of injurious accidents.
b. Using a model, compare the rates for men and women.

9.19 A table at the text's Web site (aarcdarcda.html) shows the number of train miles (in millions) and the number of collisions involving British Rail passenger trains between 1970 and 1983. A Poisson model assuming a constant log rate $\alpha$ over the 14-year period has $\hat{\alpha} = -4.177$ and $X^2 = 14.8$ (df = 13). Interpret.

TABLE 9.18 Data for Problem 9.20
(Columns: Team, Attendance (thousands), Arrests. Teams listed: Aston Villa, Bradford City, Leeds United, Bournemouth, West Brom, Huddersfield, Middlesbro, Birmingham, Ipswich Town, Leicester City, Blackburn, Crystal Palace, Shrewsbury, Swindon Town, Sheffield Utd, Stoke City, Barnsley, Millwall, Hull City, Manchester City, Plymouth, Reading, Oldham.)
Source: The Independent (London). Thanks to P. M. E. Altham for showing me these data.

9.20 Table 9.18 lists total attendance (in thousands) and the total number of arrests in the season for soccer teams in the Second Division of the British football league. Let Y = number of arrests for a team, and let t = total attendance. Explain why the model $E(Y) = \mu t$
might be plausible. Assuming Poisson sampling, fit it and interpret. Plot arrests against attendance, and overlay the prediction equation. Use residuals to identify teams that had arrest counts much different than expected.

TABLE 9.19 Data for Problem 9.21
(Columns: Age; Person-Years for Nonsmokers and Smokers; Coronary Deaths for Nonsmokers and Smokers.)
Source: R. Doll and A. B. Hill, Natl. Cancer Inst. Monogr. 19. See also N. R. Breslow in A Celebration of Statistics, ed. A. C. Atkinson and S. E. Fienberg (New York: Springer-Verlag).

9.21 Table 9.19 is based on a study with British doctors.
a. For each age, find the sample coronary death rates per 1000 person-years for nonsmokers and smokers. To compare them, take their ratio and describe its dependence on age.
b. Fit a main-effects model for the log rates having four parameters for age and one for smoking. In discussing lack of fit, show that this model assumes a constant ratio of nonsmokers' to smokers' coronary death rates over age.
c. From part (a), explain why it is sensible to add a quantitative interaction of age and smoking. For this model, show that the log ratio of coronary death rates changes linearly with age. Assign scores to age, fit the model, and interpret.

9.22 Analyze Table 9.9 using ordinal logit models. Interpret, and discuss advantages/disadvantages compared to loglinear analyses.

9.23 Refer to Problem 8.6. Analyze these data, using methods of this chapter.

Theory and Methods

9.24 In a $2 \times 2 \times K$ table, the true XY conditional odds ratios are identical, but different from the XY marginal odds ratio. Is there three-factor interaction? Is Z conditionally independent of X or Y? Explain.
9.25 Consider loglinear model (WX, XY, YZ). Explain why W and Z are independent given X alone or given Y alone or given both X and Y. When are W and Y conditionally independent? When are X and Z conditionally independent?

9.26 Suppose that loglinear model (XY, XZ) holds.
a. Find $\mu_{ij+}$ and $\log \mu_{ij+}$. Show the loglinear model for the XY marginal table has the same association parameters as $\{\lambda_{ij}^{XY}\}$ in (XY, XZ). Deduce that odds ratios are the same in the XY marginal table as in the partial tables. Using an analogous result for model (XY, YZ), deduce the collapsibility conditions.
b. Calculate $\log \mu_{ij+}$ for model (XY, XZ, YZ), and explain why marginal associations need not equal conditional associations.

9.27 For a four-way table, is the WX conditional association the same as the WX marginal association for the loglinear model (a) (WX, XYZ)? and (b) (WX, WZ, XY, YZ)? Why?

9.28 Loglinear model $M_0$ is a special case of loglinear model $M_1$.
a. Explain why the fitted values for the two models are identical in the sufficient marginal distributions for $M_0$.
b. Haberman (1974a) showed that when $\{\hat{\mu}_i\}$ satisfy any model that is a special case of $M_0$, $\sum_i \hat{\mu}_{1i} \log \hat{\mu}_i = \sum_i \hat{\mu}_{0i} \log \hat{\mu}_i$. Thus, $\hat{\mu}_0$ is the orthogonal projection of $\hat{\mu}_1$ onto the linear manifold of $\{\log \mu\}$ satisfying $M_0$. Using this, show that $G^2(M_0) - G^2(M_1) = 2 \sum_i \hat{\mu}_{1i} \log(\hat{\mu}_{1i} / \hat{\mu}_{0i})$.

9.29 Refer to Section (…). Show that $G^2(M_j \mid M_{j-1})$ equals $G^2$ for independence in the $2 \times 2$ table comparing columns 1 through $j - 1$ with column $j$.

9.30 For T categorical variables $X_1, \ldots, X_T$, explain why:
a. $G^2(X_1, X_2, \ldots, X_T) = G^2(X_1, X_2) + G^2(X_1 X_2, X_3) + \cdots + G^2(X_1 X_2 \cdots X_{T-1}, X_T)$.
b. $G^2(X_1 \cdots X_{T-1}, X_T) = G^2(X_1, X_T) + G^2(X_1 X_T, X_1 X_2) + \cdots + G^2(X_1 X_2 \cdots X_{T-1},\; X_1 X_2 \cdots X_{T-2} X_T)$.

9.31 For $I \times 2$ contingency tables, explain why the linear-by-linear association model is equivalent to the linear logit model.

9.32 Consider the L × L model (9.6) with $\{v_j = j\}$ replaced by $\{v_j = 2j\}$. Explain why $\hat{\beta}$ is halved but $\{\hat{\mu}_{ij}\}$, $\{\hat{\theta}_{ij}\}$, and $G^2$ are unchanged.
9.33 Lehmann defined (X, Y) to be positively likelihood-ratio dependent if their joint density satisfies

$$f(x_1, y_1)\, f(x_2, y_2) \ge f(x_1, y_2)\, f(x_2, y_1)$$

whenever $x_1 < x_2$ and $y_1 < y_2$. Then, the conditional distribution of Y (X) stochastically increases as X (Y) increases (Goodman 1981a).
a. For the L × L model, show that the conditional distributions of Y and of X are stochastically ordered. What is its nature if $\beta > 0$?
b. In row effects model (9.8), if $\mu_i > \mu_h$, show that the conditional distribution of Y is stochastically higher in row $i$ than in row $h$. Explain why $\mu_1 = \cdots = \mu_I$ is equivalent to the equality of the I conditional distributions within rows.

9.34 Yule defined a table to be isotropic if an ordering of rows and of columns exists such that the local log odds ratios are all nonnegative [see also Goodman (1981a)].
a. Show that a table is isotropic if it satisfies (i) the linear-by-linear association model, (ii) the row effects model, and (iii) the RC model.
b. Explain why a table that is isotropic for a certain ordering is still isotropic when adjacent rows or columns are combined.

9.35 Consider the log likelihood for the linear-by-linear association model.
a. Differentiating with respect to $\beta$ and evaluating at $\beta = 0$ and null estimates of parameters, show that the score function is proportional to $\sum_i \sum_j u_i v_j (p_{ij} - p_{i+} p_{+j})$.
b. Use the delta method to show that its null SE is

$$\left\{ \left[ \sum_i u_i^2 p_{i+} - \Big(\sum_i u_i p_{i+}\Big)^2 \right] \left[ \sum_j v_j^2 p_{+j} - \Big(\sum_j v_j p_{+j}\Big)^2 \right] \Big/ n \right\}^{1/2}.$$

c. Construct a score statistic for testing independence. Show that it is essentially the correlation test. [Hirotsu (1982) discussed a family of score tests for ordered alternatives.]

9.36 Given the parenthetical result in Problem 7.33, show that if cumulative logit model (7.4) holds and $\beta$ is small, the linear-by-linear association model should fit well with row scores $\{x_i\}$ and ridit column scores $\{v_j = [P(Y \le j - 1) + P(Y \le j)]/2\}$, with its $\beta$ parameter about twice $\beta$ for model (7.4).
9.37 Consider the row effects model (9.8).
a. Show that no loss of generality occurs in letting $\lambda_I^X = \lambda_J^Y = \mu_I = 0$.
b. Show that minimal sufficient statistics are $\{n_{i+}\}$, $\{n_{+j}\}$, and $\{\sum_j v_j n_{ij},\; i = 1, \ldots, I\}$, and derive the likelihood equations.

9.38 Show that the column effects model corresponds to a baseline-category logit model for Y that is linear in scores for X, with slope depending on the paired response categories.

9.39 Refer to the homogeneous linear-by-linear association model.
a. Show that the likelihood equations are, for all i, j, and k,

$$\hat{\mu}_{i+k} = n_{i+k}, \qquad \hat{\mu}_{+jk} = n_{+jk}, \qquad \sum_i \sum_j u_i v_j \hat{\mu}_{ij+} = \sum_i \sum_j u_i v_j n_{ij+}.$$

b. Show that residual df = $K(I - 1)(J - 1) - 1$.
c. When I = J = 2, explain why it is equivalent to (XY, XZ, YZ).
d. Show how the last likelihood equation above changes for heterogeneous linear-by-linear XY association. Explain why, in each stratum, the fitted XY correlation equals the sample correlation.

9.40 When model (XY, XZ, YZ) is inadequate and variables are ordinal, useful models are nested between it and (XYZ). For ordered scores $\{u_i\}$, $\{v_j\}$, and $\{w_k\}$, consider

$$\log \mu_{ijk} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{ij}^{XY} + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ} + \beta u_i v_j w_k. \qquad (9.22)$$

a. Define $\theta_{ijk} = \theta_{ij(k+1)} / \theta_{ij(k)} = \theta_{i(j+1)k} / \theta_{i(j)k} = \theta_{(i+1)jk} / \theta_{(i)jk}$. For unit-spaced scores, show that $\log \theta_{ijk} = \beta$. Goodman (1979a) called this the uniform interaction model.
b. Show that log odds ratios for any two variables change linearly across levels of the third variable.
c. Show that the likelihood equations are those for model (XY, XZ, YZ) plus

$$\sum_i \sum_j \sum_k u_i v_j w_k \hat{\mu}_{ijk} = \sum_i \sum_j \sum_k u_i v_j w_k n_{ijk}.$$

d. Explain why model (9.1…) is a special case of model (9.22).

9.41 Construct a model having general XZ and YZ associations, but row effects for the XY association that are (a) homogeneous, and (b) heterogeneous across levels of Z. Interpret.
9.42 Explain why the RC model requires scale constraints for the scores. Show the residual df = $(I - 2)(J - 2)$. Find and interpret the likelihood equations. Explain why the fit is invariant to category orderings.

9.43 Refer to the correlation model (Goodman 1985).
a. Show that $\lambda$ is the correlation between the scores.
b. If this model holds, show that $\sum_j (\pi_{ij} / \pi_{i+})\, \nu_j = \lambda \mu_i$ and $\sum_i (\pi_{ij} / \pi_{+j})\, \mu_i = \lambda \nu_j$. Interpret.
c. With $\lambda$ close to zero, show that $\log(\pi_{ij})$ has form $\alpha_i + \beta_j + \lambda \mu_i \nu_j + o(\lambda)$, where $o(\lambda)/\lambda \to 0$ as $\lambda \to 0$. Thus, when the association is weak, the correlation model is similar to the linear-by-linear association model with $\beta = \lambda$ and scores $\{u_i = \mu_i\}$ and $\{v_j = \nu_j\}$.

9.44 For the general canonical correlation model, show that

$$\sum_k \lambda_k^2 = \sum_i \sum_j \frac{(\pi_{ij} - \pi_{i+}\pi_{+j})^2}{\pi_{i+}\pi_{+j}}.$$

Thus, the squared correlations partition a dependence measure that is the noncentrality (6.8) of $X^2$ for the independence model with n = 1. [Goodman stated other partitionings.]

9.45 Refer to model (…). Given the times at risk $\{t_{ij}\}$, show that sufficient statistics are $\{n_{i+}\}$ and $\{n_{+j}\}$.

9.46 Refer to Section (…). Let $T = \sum t_i$ and $W = \sum w_i$. Suppose that survival times have a negative exponential distribution with parameter $\lambda$.
a. Using log likelihood (9.19), show that $\hat{\lambda} = W/T$.
b. Conditional on T, show that W has a Poisson distribution with mean $T\lambda$. Using the Poisson likelihood, show that $\hat{\lambda} = W/T$.

9.47 Show that ML estimates do not exist for Table 9.15. [Hint: Haberman (1973b, 1974a, p. 398): If $\hat{\mu}_{111} = c > 0$, then marginal constraints the model satisfies imply that $\hat{\mu}_{222} = -c$.]

9.48 For a loglinear model, explain heuristically why the ML estimate of a parameter is infinite when its sufficient statistic takes its maximum or minimum possible value, for given values of other sufficient statistics.
