Section 3: Logistic Regression

As our motivation for logistic regression, we will consider the Challenger disaster, the sex of turtles, college math placement, credit card scoring, and market segmentation.

The Challenger Disaster

On January 28, 1986, the space shuttle Challenger had a catastrophic failure due to burn-through of an O-ring seal at a joint in one of the solid-fuel rocket boosters. This was the 25th shuttle flight. Of the 24 previous shuttle flights, 7 had incidents of damage to joints, 16 had no incidents of damage, and 1 was unknown. (The data come from recovered solid rocket boosters; the one that was unknown was not recovered.) The question we wish to examine is: Could damage to solid-rocket booster field joints be related to cold weather at the time of launch?

Damage to Booster Rocket Field Joints

Below are data from the Presidential Commission on the Space Shuttle Challenger Accident (1986). The data consist of the flight, the temperature at the time of launch (°F), and whether or not there was damage to the booster rocket field joints (No = 0, Yes = 1).

Flight     Temp  Damage    Flight     Temp  Damage    Flight     Temp  Damage
STS-1       66   NO        STS-9       70   NO        STS 51-B    75   NO*
STS-2       70   YES       STS 41-B    57   YES       STS 51-G    70   NO
STS-3       69   NO        STS 41-C    63   YES       STS 51-F    81   NO
STS-4       80   ???       STS 41-D    70   YES       STS 51-I    76   NO
STS-5       68   NO        STS 41-G    78   NO        STS 51-J    79   NO
STS-6       67   NO        STS 51-A    67   NO        STS 61-A    75   YES
STS-7       72   NO        STS 51-C    53   YES       STS 61-B    76   NO
STS-8       73   NO        STS 51-D    67   NO        STS 61-C    58   YES

The temperature when STS 51-L (Challenger) was launched was 31 °F.

Figure 18: Plot of Incidence of Booster Field Joint Damage vs. Temperature
Overall, there were 7 incidences of joint damage out of 23 flights: 7/23 ≈ 30%. When the temperature was below 65 °F, all 4 shuttles had joint damage, 4/4 = 100%, and when the temperature was above 65 °F, only 3 out of 19 had joint damage, 3/19 ≈ 16%. Is there some way to predict the chance of booster field joint damage given the temperature at launch? The response variable is the probability of failure (damage), not necessarily catastrophe. Recall that the temperature was 31 °F on the day of the Challenger launch.

Sex of Turtles as it Relates to Incubation Temperature

These data are courtesy of Prof. Ken Koehler, Iowa State University. What determines the sex (male or female) of turtles? Genetics or environment? For a particular species of turtles, temperature seems to have a great effect on sex. Turtle eggs (all one species) were collected from Illinois and put into boxes, with several eggs in each box. These boxes were incubated at different temperatures, with three boxes at each test temperature and temperatures ranging from 27.2 °C to 29.9 °C. When the eggs hatched, the sex of each turtle was determined.

Temp (°C)  Male  Female    Temp (°C)  Male  Female    Temp (°C)  Male  Female
  27.2       1      9        27.2       0      8        27.2       1      8
  27.7       7      3        27.7       4      2        27.7       6      2
  28.3      13      0        28.3       6      3        28.3       7      1
  28.4       7      3        28.4       5      3        28.4       7      2
  29.9      10      1        29.9       8      0        29.9       9      0

Temperature and Gender of Hatched Turtles

The overall proportion of male turtles was 91/136 ≈ 0.67. When the temperature was below 27.5 °C, the proportion of the turtles that were male was 2/27 ≈ 0.07. When the temperature was below 28 °C, the proportion of the turtles that were male was 19/51 ≈ 0.37. When the temperature was below 28.5 °C, 64/108 ≈ 0.59 were male, and for temperatures below 30.0 °C, 91/136 ≈ 0.67 were male.
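The proportions quoted above are simple counts over the two data tables. As a cross-check, here is a minimal Python sketch that recomputes them from hand-transcribed copies of the tables. STS-4 is left out because its outcome is unknown. The helper names `damage_count` and `males_below` are hypothetical, chosen for this illustration only.

```python
# Flights with a known outcome: (launch temperature in °F, damage: 1 = yes, 0 = no).
# STS-4 (80 °F, outcome unknown) is omitted, which leaves 23 flights.
CHALLENGER = [
    (66, 0), (70, 1), (69, 0), (68, 0), (67, 0), (72, 0), (73, 0), (70, 0),
    (57, 1), (63, 1), (70, 1), (78, 0), (67, 0), (53, 1), (67, 0),
    (75, 0), (70, 0), (81, 0), (76, 0), (79, 0), (75, 1), (76, 0), (58, 1),
]

# Turtle boxes: (incubation temperature in °C, males, females), three boxes per temperature.
TURTLE_BOXES = [
    (27.2, 1, 9), (27.2, 0, 8), (27.2, 1, 8),
    (27.7, 7, 3), (27.7, 4, 2), (27.7, 6, 2),
    (28.3, 13, 0), (28.3, 6, 3), (28.3, 7, 1),
    (28.4, 7, 3), (28.4, 5, 3), (28.4, 7, 2),
    (29.9, 10, 1), (29.9, 8, 0), (29.9, 9, 0),
]


def damage_count(flights):
    """Return (flights with damage, total flights)."""
    return sum(damage for _, damage in flights), len(flights)


def males_below(boxes, cutoff):
    """Pool every box incubated below `cutoff` and return (males, turtles hatched)."""
    pooled = [(m, f) for temp, m, f in boxes if temp < cutoff]
    return sum(m for m, _ in pooled), sum(m + f for m, f in pooled)


overall = damage_count(CHALLENGER)
cold = damage_count([flight for flight in CHALLENGER if flight[0] < 65])
warm = damage_count([flight for flight in CHALLENGER if flight[0] > 65])
for label, (k, n) in [("all flights", overall), ("below 65 F", cold), ("above 65 F", warm)]:
    print(f"{label}: {k}/{n} = {k / n:.2f}")

for cutoff in (27.5, 28.0, 28.5, 30.0):
    males, total = males_below(TURTLE_BOXES, cutoff)
    print(f"below {cutoff} C: {males}/{total} = {males / total:.2f}")
```

No flight was launched at exactly 65 °F, so the strict inequalities split the 23 flights into 4 and 19.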
Figure 19: Plot of Proportion of Male Turtles vs. Incubation Temperature

Note: We really cannot be sure we have a random sample; we may simply have turtle eggs that were easy to find. Not having a random sample might change the error structure.

Is there some way to predict the proportion of male turtles given the incubation temperature? What the scientist wanted to know was the temperature at which there would be a 50/50 split between males and females.

Other situations naturally lead to analysis through logistic regression. A few examples are given below:

College Math Placement: Use ACT or SAT scores to predict whether individuals would receive a grade of B or better in an entry-level math course and so should be placed in a higher-level math course.

Credit Card Scoring: Use various demographic and credit history variables to predict if individuals will be good or bad credit risks.

Market Segmentation: Use various demographic and purchasing information to predict if individuals will purchase from a catalog sent to their homes.

All of these situations involve the idea of prediction, and all have a binary response, for instance, damage/no damage or male/female. One is interested in predicting a chance, probability, proportion, or percentage. Unlike other prediction situations, the response is bounded, with $0 \le p \le 1$.
Logistic Regression

Logistic regression is a statistical technique that can be used in binary response problems. We will need to transform the response to use this technique; what else will we need to change? We define our binary responses as:

Y = 1: damage to field joint; Y = 0: no damage.
Y = 1: male turtle hatched; Y = 0: female turtle hatched.
Y = 1: receive a B or better; Y = 0: don't receive a B or better.
Y = 1: good credit risk; Y = 0: not a good credit risk.
Y = 1: will purchase from catalog; Y = 0: will not purchase from catalog.

In each situation, we are interested in predicting the probability that Y = 1 from the predictor variable. Here we are only interested in finding a prediction model; inference is not an issue. The binary form of the response necessarily violates the normality and equal-variance assumptions on the errors, so if we were to do inference we would need different methods from those used in ordinary least squares regression.

We denote $P(Y = 1) = \pi$ and $P(Y = 0) = 1 - \pi$, so that $E(Y) = 0(1 - \pi) + 1(\pi) = \pi$. We want to predict $P(Y = 1) = \pi$ from a given x-value, x. Can we fit this with a linear model of the form $E(Y \mid X) = \beta_0 + \beta_1 X = \pi$? There are a few problems that distinguish this from more typical regression problems.

1. There is a constraint on the response, which is bounded between 0 and 1; that is, $0 \le E(Y \mid X) = \pi \le 1$.
2. There is a non-constant variance on the response. We know, since this is a binomial situation, that $\mathrm{Var}(\varepsilon) = \mathrm{Var}(Y) = \pi(1 - \pi)$. Consequently, the variance depends on the value of X.
3. Non-normal error terms: $Y = (\beta_0 + \beta_1 X) + \varepsilon$. When Y = 1, we have $\varepsilon = 1 - (\beta_0 + \beta_1 X)$; when Y = 0, we have $\varepsilon = -(\beta_0 + \beta_1 X)$. The error can take only these two values, so it cannot be normally distributed.

When the response variable is binary, or a binomial proportion, the shape of the expected response is often a curve. The S-shaped curve shown below is known as the logistic curve.
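The three problems with a straight-line model can be checked numerically. The Python sketch below enumerates the two outcomes of Y to get $E(Y) = \pi$ and $\mathrm{Var}(Y) = \pi(1 - \pi)$. It then uses a HYPOTHETICAL line, with coefficients invented for illustration and not fitted to any data, to show a prediction above 1 and the only two error values that are possible at one x. The function names are also hypothetical.

```python
def bernoulli_moments(pi):
    """Mean and variance of a 0/1 response Y with P(Y = 1) = pi, by enumerating both outcomes."""
    outcomes = [(0, 1 - pi), (1, pi)]
    mean = sum(y * prob for y, prob in outcomes)
    var = sum((y - mean) ** 2 * prob for y, prob in outcomes)
    return mean, var


def possible_errors(b0, b1, x):
    """Under Y = b0 + b1*x + e, return the error when Y = 1 and when Y = 0."""
    fitted = b0 + b1 * x
    return 1 - fitted, -fitted


# The variance pi(1 - pi) changes with pi, and therefore with X.
for pi in (0.1, 0.3, 0.5, 0.9):
    mean, var = bernoulli_moments(pi)
    print(f"pi = {pi}: E(Y) = {mean:.2f}, Var(Y) = {var:.2f}")

# HYPOTHETICAL straight line, for illustration only (not fitted to any data).
B0, B1 = 2.0, -0.025
print("fitted value at x = 31:", B0 + B1 * 31)  # exceeds 1, so the bound is violated
print("only possible errors at x = 31:", possible_errors(B0, B1, 31))
```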
Logistic Curvilinear Model

Figure 20: Increasing and Decreasing Logistic Plots

The model used in logistic regression has the form below:

$$E(Y \mid X) = \pi = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}$$

The parameters to be estimated show up in the exponent in both numerator and denominator. As before, we will use a transformation to linearize the data, fit a linear model to the transformed data, and re-express the result to return to the original scale. What transformation will linearize something as complicated as the equation above?

The Logit Transformation

The logit transformation is developed by considering the equation above. If

$$\pi = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}},$$

then

$$1 - \pi = \frac{1 + e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}} - \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}} = \frac{1}{1 + e^{\beta_0 + \beta_1 X}}$$

and

$$\frac{\pi}{1 - \pi} = e^{\beta_0 + \beta_1 X}.$$
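The algebra above says two things that are easy to check numerically: the odds $\pi/(1 - \pi)$ equal $e^{z}$ when $\pi$ is the logistic function of $z = \beta_0 + \beta_1 X$, and taking the log of the odds (the logit) recovers $z$. A minimal Python sketch, with hypothetical function names:

```python
import math


def logistic(z):
    """pi = e^z / (1 + e^z): the logistic curve at z = b0 + b1*X."""
    return math.exp(z) / (1 + math.exp(z))


def logit(p):
    """ln(p / (1 - p)): the log-odds, defined for 0 < p < 1."""
    return math.log(p / (1 - p))


for z in (-2.0, 0.0, 1.5):
    pi = logistic(z)
    odds = pi / (1 - pi)
    # The odds should equal e^z, and the logit should give back z.
    print(f"z = {z:5.2f}  pi = {pi:.4f}  odds = {odds:.4f}  "
          f"e^z = {math.exp(z):.4f}  logit = {logit(pi):.4f}")
```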
The expression $\frac{\pi}{1 - \pi}$ gives the odds of getting a 1. So,

$$\ln\left(\frac{\pi}{1 - \pi}\right) = \beta_0 + \beta_1 X$$

is a linear function of X.

We estimate π by p, the observed proportion, and apply the logit transformation, $\ln\left(\frac{p}{1 - p}\right)$. Then we find a linear model to fit, of the form $\ln\left(\frac{p}{1 - p}\right) = b_0 + b_1 x$. By back-transforming, we find the logistic model will be

$$\hat{\pi} = \frac{e^{b_0 + b_1 x}}{1 + e^{b_0 + b_1 x}}.$$

The table below was created from the turtle data by combining all 3 groups at each temperature setting and using the combined proportion for the probability of a male, denoted Pmale.

Temp (°C)  Male  Female  Total  Pmale
  27.2       2     25      27   0.0741
  27.7      17      7      24   0.7083
  28.3      26      4      30   0.8667
  28.4      19      8      27   0.7037
  29.9      27      1      28   0.9643

Figure 21: Logit Transformation
Temp (°C)  Pmale, p  ln(p/(1 - p))
  27.2      0.0741     -2.5257
  27.7      0.7083      0.8873
  28.3      0.8667      1.8718
  28.4      0.7037      0.8650
  29.9      0.9643      3.2958

The logit transformation and simple linear regression give the model for the re-expressed data as

$$\hat{\pi}' = -51.1122 + 1.8371X,$$

where $\hat{\pi}'$ represents the fitted values for $\ln\left(\frac{\pi}{1 - \pi}\right)$. We now take the predicted values of $\ln\left(\frac{\pi}{1 - \pi}\right)$ given by $\hat{\pi}'$ and back-transform to find the predicted value of π. Note that the values of $\hat{\pi}$ are obtained by applying

$$\hat{\pi} = \frac{e^{\hat{\pi}'}}{1 + e^{\hat{\pi}'}}$$

to each $\hat{\pi}'$ value.

Sex of Turtles
Temp (°C)  Predicted logit  Predicted probability
  27.2        -1.1420            0.242
  27.7        -0.2234            0.444
  28.3         0.8789            0.707
  28.4         1.0626            0.743
  29.9         3.8183            0.979
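The least squares fit to the logits and the back-transformation can be reproduced in a few lines. This is a minimal Python sketch using the pooled counts from the table above; `least_squares` is a hypothetical helper that implements ordinary least squares for one predictor.

```python
import math

# Turtle counts pooled over the three boxes: (temperature in °C, males, total hatched).
COMBINED = [(27.2, 2, 27), (27.7, 17, 24), (28.3, 26, 30), (28.4, 19, 27), (29.9, 27, 28)]


def least_squares(xs, ys):
    """Ordinary least squares intercept and slope for y = b0 + b1*x."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


temps = [t for t, _, _ in COMBINED]
props = [males / total for _, males, total in COMBINED]
logits = [math.log(p / (1 - p)) for p in props]

b0, b1 = least_squares(temps, logits)
print(f"fitted logit = {b0:.4f} + {b1:.4f} X")

for t, p in zip(temps, props):
    fitted_logit = b0 + b1 * t
    pi_hat = math.exp(fitted_logit) / (1 + math.exp(fitted_logit))  # back-transform
    print(f"{t}: p = {p:.4f}  fitted logit = {fitted_logit:.4f}  pi-hat = {pi_hat:.3f}")
```

Because the temperatures sit far from zero, the intercept is sensitive to rounding in the slope, so small differences in its fourth decimal place are expected.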
The graph of the logistic model

$$\hat{\pi} = \frac{e^{-51.1122 + 1.8371x}}{1 + e^{-51.1122 + 1.8371x}}$$

against the data is given below.

Figure 22: Logistic model $\hat{\pi} = \frac{e^{-51.1122 + 1.8371x}}{1 + e^{-51.1122 + 1.8371x}}$ graphed against the data

As we can see, the logit transformation has adjusted for the curved nature of the response. It has not, however, helped with the problem of violating the assumptions on the errors in simple linear regression. Consequently, we cannot use standard inference methods with this model.

Maximum Likelihood Approach

To improve the quality of the fit and allow for the use of inference procedures, we can use maximum likelihood techniques rather than least squares. First, define the likelihood function

$$L(\beta_0, \beta_1; \text{Data}) = \prod_{i=1}^{n} \pi_i^{Y_i} (1 - \pi_i)^{1 - Y_i}, \quad \text{with } \pi_i = \frac{e^{\beta_0 + \beta_1 X_i}}{1 + e^{\beta_0 + \beta_1 X_i}}.$$

Note that when $Y_i = 1$, the ith factor is $\pi_i$; when $Y_i = 0$, it is $1 - \pi_i$. Now choose $\beta_0$ and $\beta_1$ so as to maximize the likelihood for the given data. (For simple linear regression, minimizing the sum of squared residuals is equivalent to maximizing a normal-distribution likelihood.) To find the values of $\beta_0$ and $\beta_1$ that maximize this likelihood function for the present data, use the Binary Logistic Regression command in Minitab. You will find it under Stat > Regression > Binary Logistic Regression.
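The likelihood can be evaluated directly for any candidate pair of coefficients. The Python sketch below works with the log of $L$, which has the same maximizer, and uses the pooled turtle counts: every male contributes $\ln \pi$ and every female $\ln(1 - \pi)$ at its temperature. It compares the least squares coefficients with the maximum likelihood coefficients $-61.32 + 2.2110X$ that these notes report from Minitab. The function name `log_likelihood` is hypothetical.

```python
import math

COMBINED = [(27.2, 2, 27), (27.7, 17, 24), (28.3, 26, 30), (28.4, 19, 27), (29.9, 27, 28)]


def log_likelihood(b0, b1, data):
    """ln L(b0, b1; Data) = sum of Y ln(pi) + (1 - Y) ln(1 - pi) over individual turtles.

    With grouped counts, each male adds ln(pi) and each female adds ln(1 - pi)
    at that temperature, where pi = e^(b0 + b1 x) / (1 + e^(b0 + b1 x)).
    """
    total = 0.0
    for x, males, n in data:
        pi = 1 / (1 + math.exp(-(b0 + b1 * x)))  # same value as e^z / (1 + e^z)
        total += males * math.log(pi) + (n - males) * math.log(1 - pi)
    return total


least_squares_fit = log_likelihood(-51.1122, 1.8371, COMBINED)
max_likelihood_fit = log_likelihood(-61.32, 2.2110, COMBINED)
print(f"log-likelihood, least squares on logits: {least_squares_fit:.3f}")
print(f"log-likelihood, maximum likelihood:      {max_likelihood_fit:.3f}")
```

The maximum likelihood coefficients should give the larger log-likelihood, since that is the criterion they maximize.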
The process used to calculate the values of $\beta_0$ and $\beta_1$ is an iterative process that is beyond the scope of this course. The result of the calculation is similar to that from regression:

$$\hat{\pi}' = -61.32 + 2.2110X$$

Sex of Turtles
Temp (°C)  Predicted logit  Predicted probability
  27.2        -1.1791            0.235
  27.7        -0.0736            0.482
  28.3         1.2530            0.778
  28.4         1.4741            0.814
  29.9         4.7906            0.992

Figure 23: Binary Logistic Regression graph

The coefficients in a logistic regression are often difficult to interpret because the effect on π of increasing X by one unit varies depending on the size of X. This is the essence of a nonlinear model. Consider first the interpretation of $\frac{\pi}{1 - \pi}$. This quantity gives the odds. If π = 0.75, then the odds are 0.75/0.25 = 3, or 3 to 1: success is three times as likely as failure. In logistic regression we model the log-odds. The predicted log-odds, $\hat{\pi}'$, is given in the turtle example by the linear equation

$$\hat{\pi}' = -61.32 + 2.2110X.$$

The predicted odds for that value of X are $e^{\hat{\pi}'} = \frac{\hat{\pi}}{1 - \hat{\pi}}$. So if we increase X by one unit, we multiply the predicted odds by $e^{\hat{\beta}_1}$, or $e^{2.2110} = 9.13$ in the turtle example. At 27 degrees the predicted odds for a male turtle are approximately 0.20, about 1 to 5. That is, a female is 5 times more likely. At 28 degrees the predicted odds for a male are 9.13 times bigger than at 27 degrees, about 1.80. Now males are almost twice as likely
as females.

The intercept can be thought of as the log-odds when X is zero. The antilog of the intercept may have some meaning as a baseline odds, especially if zero is within the range of the original data. Since the temperatures considered run from about 27 to 30 degrees, the value of zero is well outside the range of the data. The intercept, and its antilog, have no practical interpretation in this example.

One of the questions we wanted to answer was, "At what temperature are males and females equally likely?" In this case π = 0.5, so the odds are equal to 1 and the log-odds are equal to 0. So we can solve the equation

$$-61.32 + 2.211X = 0, \quad \text{so} \quad X = \frac{61.32}{2.211} \approx 27.7 \text{ degrees}.$$

This agrees with the fitted values, where the predicted probability of a male is 0.482 at 27.7 °C.

Assessing the Fit

So far we have only looked at estimating parameters and predicting values. The estimates and predictions are subject to variation. We must be able to quantify this variation in order to make inferences. Just as in ordinary regression, we need some means of assessing the fit of a logistic regression model and determining the significance of coefficients in that model.

For logistic regression, the deviance (also known as residual deviance) is used to assess the fit of the overall model. The deviance for a logistic model can be likened to the residual sum of squares in ordinary regression: the smaller the deviance, the better the fit of the model. The deviance can be compared to a chi-square distribution, which approximates the distribution of the deviance. This is an asymptotic result that requires large sample sizes.

The deviance for the combined turtle data is 14.863 on 3 degrees of freedom. The chance that a $\chi^2$ with 3 degrees of freedom exceeds 14.863 is 0.0019. Essentially we are using the deviance to test $H_0$: fit is good versus $H_a$: fit is not good. The p-value of 0.0019 indicates that the deviance left after the fit is too large to conclude that the fit is good. Thus, there is room for improvement in the model.

Although there is some lack of fit, does temperature give us statistically significant information about the sex of turtles via the logistic regression? Look at the change in deviance when temperature is added to the model. That is, compare the deviance of the model with no predictor, in which $\hat{\pi}$ is the same constant (the overall proportion of males) at every temperature, to the deviance from the logistic model using temperature as the explanatory variable. In Minitab this is summarized by G, the statistic for testing that all slopes are zero.
G = 49.556 on 1 degree of freedom; p-value = 0.000

Reject the hypothesis that the slope in the logistic regression is zero. Thus we can conclude that temperature does give us statistically significant information about the sex of turtles.
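Minitab's iterative fitting is outside the scope of these notes, but a short sketch shows the idea. The Python code below is an assumption, not Minitab's documented algorithm: it uses plain Newton-Raphson on the pooled turtle counts. Pooled counts give the same likelihood as the 136 individual 0/1 records, so the estimates should match Minitab's up to rounding. It then recomputes the odds ratio $e^{b_1}$, the 50/50 temperature $-b_0/b_1$, the grouped-data deviance on 5 - 2 = 3 degrees of freedom, G, and the standard error of $b_1$ used for the z-statistic. To stay within the standard library, the 3-df chi-square tail uses the closed form $P(\chi^2_3 > x) = \operatorname{erfc}(\sqrt{x/2}) + \sqrt{2x/\pi}\, e^{-x/2}$. All function names are hypothetical.

```python
import math

COMBINED = [(27.2, 2, 27), (27.7, 17, 24), (28.3, 26, 30), (28.4, 19, 27), (29.9, 27, 28)]


def fit_logistic(data, iterations=25):
    """Maximum likelihood (b0, b1) for grouped 0/1 data by Newton-Raphson.

    `data` holds (x, successes, trials). Also returns the standard error of b1,
    taken from the inverse of the 2x2 information matrix.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, n in data:
            pi = 1 / (1 + math.exp(-(b0 + b1 * x)))
            resid = y - n * pi              # observed minus expected successes
            w = n * pi * (1 - pi)           # binomial variance, the Newton weight
            g0, g1 = g0 + resid, g1 + resid * x
            h00, h01, h11 = h00 + w, h01 + w * x, h11 + w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det   # solve (information) * step = score
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1, math.sqrt(h00 / det)


def deviance(b0, b1, data):
    """2 * sum[y ln(y / mu) + (n - y) ln((n - y) / (n - mu))], with 0 ln 0 taken as 0."""
    def term(obs, fit):
        return obs * math.log(obs / fit) if obs > 0 else 0.0
    total = 0.0
    for x, y, n in data:
        mu = n / (1 + math.exp(-(b0 + b1 * x)))
        total += term(y, mu) + term(n - y, n - mu)
    return 2 * total


def chi2_sf_3df(x):
    """P(chi-square with 3 df > x), using the closed form for 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


b0, b1, se_b1 = fit_logistic(COMBINED)
resid_dev = deviance(b0, b1, COMBINED)
p_all = sum(y for _, y, _ in COMBINED) / sum(n for _, _, n in COMBINED)
null_dev = deviance(math.log(p_all / (1 - p_all)), 0.0, COMBINED)  # no-slope model
G = null_dev - resid_dev

print(f"b0 = {b0:.3f}, b1 = {b1:.4f}, odds ratio e^b1 = {math.exp(b1):.2f}")
print(f"50/50 temperature: {-b0 / b1:.2f} C")
print(f"deviance = {resid_dev:.3f} on 3 df, p = {chi2_sf_3df(resid_dev):.4f}")
print(f"G = {G:.3f} on 1 df;  z = {b1 / se_b1:.2f} (se = {se_b1:.4f})")
```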
Alternative Test

The ratio of the estimated coefficient to its standard error, an approximate z-statistic, can be used to assess significance. In this situation, z = 2.211/0.4306 = 5.13 with p = 0.000. Both the z- and the G-statistic indicate that temperature is statistically significant. Since sample sizes are moderate, between 24 and 30 turtles at each temperature, the p-values derived from either test will be approximate at best.

In conclusion, temperature is statistically significant in the logistic regression for the sex of turtles. The logistic regression may not provide the best fit; other models may fit better.

The Challenger Disaster Revisited

Using the techniques of this section, we can fit a linear model to logits from the Challenger data. We will regroup the data by temperature into intervals of 5 degrees, using the midpoint of each interval for the independent variable. We also adjust the probabilities a bit, replacing 0 with 0.01 and 1 with 0.99, so that we can take logarithms for the logit fit. Thus, we have the following data:

Interval (°F)  Temp (midpoint)  Prob   Logit
  51-55             53          0.99    4.595
  56-60             58          0.99    4.595
  61-65             63          0.99    4.595
  66-70             68          0.20   -1.386
  71-75             73          0.25   -1.099
  76-80             78          0.01   -4.595
  81-85             83          0.01   -4.595

The graph of the transformed data with the linear fit is shown below. The linear model is

$$\widehat{\ln\left(\frac{p}{1 - p}\right)} = 25.386 - 0.369\,\text{Temp}.$$

Figure 24: Logit re-expression and linear model
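The grouping, the 0.01/0.99 adjustment, and the straight-line fit to the logits can be reproduced with a short Python sketch. It assumes that each interval such as (51, 55) means 51 to 55 °F inclusive, which matches the proportions in the table. STS-4 is excluded because its outcome is unknown. The helper names are hypothetical.

```python
import math

# (launch temperature in °F, damage 1/0) for the 23 flights with a known outcome.
FLIGHTS = [
    (66, 0), (70, 1), (69, 0), (68, 0), (67, 0), (72, 0), (73, 0), (70, 0),
    (57, 1), (63, 1), (70, 1), (78, 0), (67, 0), (53, 1), (67, 0),
    (75, 0), (70, 0), (81, 0), (76, 0), (79, 0), (75, 1), (76, 0), (58, 1),
]


def grouped_logits(flights, lows, width=5, floor=0.01, ceil=0.99):
    """Bin flights into [low, low + width - 1], clip 0 and 1 proportions, and take logits.

    Returns (interval midpoint, adjusted proportion, logit) for each interval.
    """
    rows = []
    for low in lows:
        damage = [d for t, d in flights if low <= t <= low + width - 1]
        p = min(max(sum(damage) / len(damage), floor), ceil)
        rows.append((low + (width - 1) / 2, p, math.log(p / (1 - p))))
    return rows


def least_squares(xs, ys):
    """Ordinary least squares intercept and slope."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sum((x - xbar) ** 2 for x in xs)
    return ybar - b1 * xbar, b1


rows = grouped_logits(FLIGHTS, lows=range(51, 86, 5))
for mid, p, lg in rows:
    print(f"{mid:.0f}: p = {p:.2f}, logit = {lg:.3f}")

b0, b1 = least_squares([r[0] for r in rows], [r[2] for r in rows])
print(f"fitted logit = {b0:.3f} + ({b1:.3f}) Temp; P = 0.5 at {-b0 / b1:.1f} F")
z31 = b0 + b1 * 31
print(f"P(damage) at 31 F: {math.exp(z31) / (1 + math.exp(z31)):.6f}")
```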
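Transforming this model to a probability-of-failure scale is done by setting

$$\hat{P} = \frac{e^{25.386 - 0.369\,\text{Temp}}}{1 + e^{25.386 - 0.369\,\text{Temp}}}.$$

This graph is shown below.

Figure 25: $\hat{P} = \frac{e^{25.386 - 0.369\,\text{Temp}}}{1 + e^{25.386 - 0.369\,\text{Temp}}}$ graphed against the temperature

From the model we can see that the predicted probability of failure is at least one half whenever the temperature is below 68.8 degrees, since $25.386/0.369 \approx 68.8$. At 31 degrees, the predicted probability of an O-ring failure is essentially 1.

You can also use the ungrouped data with Minitab's Binary Logistic Regression. In that analysis, the logistic regression model is

$$\widehat{\ln\left(\frac{p}{1 - p}\right)} = 15.043 - 0.2322\,\text{Temp}.$$

Reversing the logit transformation on this model gives

$$\hat{P} = \frac{e^{15.043 - 0.2322\,\text{Temp}}}{1 + e^{15.043 - 0.2322\,\text{Temp}}}.$$

From this model, the predicted probability of failure is at least one half whenever the temperature is below 64.8 degrees.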
Figure 26: $\hat{P} = \frac{e^{15.043 - 0.2322\,\text{Temp}}}{1 + e^{15.043 - 0.2322\,\text{Temp}}}$ graphed against the temperature
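The ungrouped coefficients come from maximizing the same likelihood over the 23 individual flights. The Python sketch below does this with plain Newton-Raphson. That choice is an assumption: Minitab's internal algorithm may differ, but it should reach the same maximum. The sketch also reports the 50% temperature and the predicted probability at 31 °F. The function name is hypothetical.

```python
import math

# (launch temperature in °F, damage 1/0) for the 23 flights with a known outcome.
FLIGHTS = [
    (66, 0), (70, 1), (69, 0), (68, 0), (67, 0), (72, 0), (73, 0), (70, 0),
    (57, 1), (63, 1), (70, 1), (78, 0), (67, 0), (53, 1), (67, 0),
    (75, 0), (70, 0), (81, 0), (76, 0), (79, 0), (75, 1), (76, 0), (58, 1),
]


def fit_logistic_01(points, iterations=50):
    """Maximum likelihood (b0, b1) for individual 0/1 outcomes by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in points:
            pi = 1 / (1 + math.exp(-(b0 + b1 * x)))
            w = pi * (1 - pi)                        # Bernoulli variance
            g0, g1 = g0 + (y - pi), g1 + (y - pi) * x  # score vector
            h00, h01, h11 = h00 + w, h01 + w * x, h11 + w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


b0, b1 = fit_logistic_01(FLIGHTS)
print(f"fitted logit = {b0:.3f} + ({b1:.4f}) Temp")
print(f"P(damage) = 0.5 at {-b0 / b1:.1f} F")
z31 = b0 + b1 * 31
print(f"P(damage) at 31 F: {1 / (1 + math.exp(-z31)):.4f}")
```

Note that 31 °F lies far below the coldest previous launch (53 °F), so any prediction there is an extrapolation from the model.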