Section 3: Logistic Regression




As our motivation for logistic regression, we will consider the Challenger disaster, the sex of turtles, college math placement, credit card scoring, and market segmentation.

The Challenger Disaster

On January 28, 1986 the space shuttle, Challenger, had a catastrophic failure due to burnthrough of an O-ring seal at a joint in one of the solid-fuel rocket boosters. This was the 25th shuttle flight. Of the 24 previous shuttle flights, 7 had incidents of damage to joints, 16 had no incidents of damage, and 1 was unknown. (The data come from recovered solid rocket boosters; the one that was unknown was not recovered.) The question we wish to examine is: Could damage to solid-rocket booster field joints be related to cold weather at the time of launch?

Damage to Booster Rocket Field Joints

Below are data from the Presidential Commission on the Space Shuttle Challenger Accident (1986). The data consist of the flight, temperature at the time of launch (°F) and whether or not there was damage to the booster rocket field joints (No = 0, Yes = 1).

Flight     Temp  Damage    Flight     Temp  Damage    Flight     Temp  Damage
STS-1       66   NO        STS-9       70   NO        STS 51-B    75   NO*
STS-2       70   YES       STS 41-B    57   YES       STS 51-G    70   NO
STS-3       69   NO        STS 41-C    63   YES       STS 51-F    81   NO
STS-4       80   ???       STS 41-D    70   YES       STS 51-I    76   NO
STS-5       68   NO        STS 41-G    78   NO        STS 51-J    79   NO
STS-6       67   NO        STS 51-A    67   NO        STS 61-A    75   YES
STS-7       72   NO        STS 51-C    53   YES       STS 61-B    76   NO
STS-8       73   NO        STS 51-D    67   NO        STS 61-C    58   YES

The temperature when STS 51-L (Challenger) was launched was 31 °F.

Figure 18: Plot of Incidence of Booster Field Joint Damage vs. Temperature

Overall, there were 7 incidences of joint damage out of 23 flights: 7/23 ≈ 30%. When the temperature was below 65 °F, all 4 shuttles had joint damage, 4/4 = 100%, and when the temperature was above 65 °F, only 3 out of 19 had joint damage, 3/19 ≈ 16%. Is there some way to predict the chance of booster field joint damage given the temperature at launch? The response variable is the probability of failure (damage), not necessarily catastrophe. Recall that the temperature was 31 °F on the day of the Challenger launch.

Sex of Turtles as it Relates to Incubation Temperature

These data are courtesy of Prof. Ken Koehler, Iowa State University.

What determines the sex (male or female) of turtles? Genetics or environment? For a particular species of turtles, temperature seems to have a great effect on sex. Turtle eggs (all one species) were collected from Illinois and put into boxes, with several eggs in each box. These boxes were incubated at different temperatures, with three boxes at each test temperature and temperatures ranging from 27.2 °C to 29.9 °C. When the eggs hatched, the sex of each turtle was determined.

Temp (°C)  Male  Female    Temp (°C)  Male  Female    Temp (°C)  Male  Female
27.2         1     9       27.2         0     8       27.2         1     8
27.7         7     3       27.7         4     2       27.7         6     2
28.3        13     0       28.3         6     3       28.3         7     1
28.4         7     3       28.4         5     3       28.4         7     2
29.9        10     1       29.9         8     0       29.9         9     0

Temperature and Gender of Hatched Turtles

The overall proportion of male turtles was 91/136 ≈ 0.67. When the temperature was below 27.5, the proportion of the turtles that were male was 2/27 ≈ 0.07. When the temperature was below 28, the proportion of the turtles that were male was 19/51 ≈ 0.37. When the temperature was below 28.5, 64/108 ≈ 0.59 were male, and for temperatures below 30.0, 91/136 ≈ 0.67 were male.
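The cumulative proportions quoted above can be checked directly from the box counts. The following Python sketch is only an illustration: the `boxes` list is transcribed from the table, and the helper name `proportion_male_below` is our own.

```python
# Box-level counts transcribed from the table: (temperature in C, males, females).
boxes = [
    (27.2, 1, 9), (27.2, 0, 8), (27.2, 1, 8),
    (27.7, 7, 3), (27.7, 4, 2), (27.7, 6, 2),
    (28.3, 13, 0), (28.3, 6, 3), (28.3, 7, 1),
    (28.4, 7, 3), (28.4, 5, 3), (28.4, 7, 2),
    (29.9, 10, 1), (29.9, 8, 0), (29.9, 9, 0),
]

def proportion_male_below(cutoff):
    """Return (males, total, proportion male) over all boxes incubated below cutoff."""
    males = sum(m for t, m, f in boxes if t < cutoff)
    total = sum(m + f for t, m, f in boxes if t < cutoff)
    return males, total, males / total

for cutoff in (27.5, 28.0, 28.5, 30.0):
    males, total, prop = proportion_male_below(cutoff)
    print(f"below {cutoff}: {males}/{total} = {prop:.2f}")
```

The proportion male rises steadily as warmer boxes are included, which is the pattern the logistic model is meant to capture.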

Figure 19: Plot of Proportion of Male Turtles vs. Incubation Temperature

Note: We really cannot be sure we have a random sample; we may simply have turtle eggs that were easy to find. Not having a random sample might change the error structure.

Is there some way to predict the proportion of male turtles given the incubation temperature? What the scientist wanted to know was at what temperature there would be a 50/50 split in male/female.

Other situations naturally lead to analysis through logistic regression. A few examples are given below:

College Math Placement
Use ACT or SAT scores to predict whether individuals would receive a grade of B or better in an entry level math course and so should be placed in a higher level math course.

Credit Card Scoring
Use various demographic and credit history variables to predict if individuals will be good or bad credit risks.

Market Segmentation
Use various demographic and purchasing information to predict if individuals will purchase from a catalog sent to their homes.

All of these situations involve the idea of prediction, and all have a binary response, for instance, damage/no damage, or male/female. One is interested in predicting a chance, probability, proportion, or percentage. Unlike other prediction situations, the response is bounded with $0 \le p \le 1$.

Logistic Regression

Logistic regression is a statistical technique that can be used in binary response problems. We will need to transform the response to use this technique; what else will we need to change? We define our binary responses as:

Y = 1 damage to field joint and Y = 0 no damage.
Y = 1 male turtle hatched and Y = 0 female turtle hatched.
Y = 1 receive a B or better and Y = 0 don't receive a B or better.
Y = 1 good credit risk and Y = 0 not good credit risk.
Y = 1 will purchase from catalog and Y = 0 will not purchase from catalog.

In each situation, we are interested in predicting the probability that Y = 1 from the predictor variable. Here we are only interested in finding a prediction model. Inference is not an issue. The binary form of the response necessarily violates the normality and equal variance assumptions on the errors, so if we were to do inference we would need different methods from those used in ordinary least squares regression.

We denote $\text{Prob}(Y = 1) = \pi$ and $\text{Prob}(Y = 0) = 1 - \pi$, so that $E(Y) = 0(1 - \pi) + 1(\pi) = \pi$.

We want to predict $P(Y = 1) = \pi$ from a given x-value, x. Can we fit this with a linear model of the form $E(Y \mid X) = \beta_0 + \beta_1 X = \pi$? There are a few problems that distinguish this from more typical regression problems.

1. There is a constraint on the response, which is bounded between 0 and 1, that is, $0 \le E(Y \mid X) = \pi \le 1$.
2. There is a non-constant variance on the response. We know, since this is a binomial situation, that $\text{Var}(\varepsilon) = \text{Var}(Y) = \pi(1 - \pi)$. Consequently, the variance depends on the value of X.
3. Non-normal error terms: $\varepsilon = Y - (\beta_0 + \beta_1 X)$. When Y = 1, we have $\varepsilon = 1 - (\beta_0 + \beta_1 X)$, and when Y = 0, we have $\varepsilon = -(\beta_0 + \beta_1 X)$, so at any given X the error can take only two values.

When the response variable is binary, or a binomial proportion, the shape of the expected response is often a curve. The S-shaped curve shown below is known as the logistic curve.
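The first two problems in the list above can be seen numerically. The sketch below is an illustration only, not part of the original analysis: it fits an ordinary straight line to the pooled turtle proportions (males and totals summed over the three boxes at each temperature) and evaluates the binomial variance at two probabilities. The function names are our own.

```python
# Pooled turtle counts at each temperature: (temperature in C, males, total hatched).
pooled = [(27.2, 2, 27), (27.7, 17, 24), (28.3, 26, 30), (28.4, 19, 27), (29.9, 27, 28)]

temps = [t for t, _, _ in pooled]
props = [m / n for _, m, n in pooled]

def least_squares_line(xs, ys):
    """Ordinary least squares intercept and slope for one predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

b0, b1 = least_squares_line(temps, props)

def linear_prediction(x):
    """Predicted 'probability' from the straight-line model (not bounded)."""
    return b0 + b1 * x

def bernoulli_variance(pi):
    """Variance of a 0/1 response with success probability pi."""
    return pi * (1.0 - pi)

print(linear_prediction(29.9))   # exceeds 1 at the warmest test temperature
print(linear_prediction(25.0))   # negative when extrapolated to a cooler temperature
print(bernoulli_variance(0.5), bernoulli_variance(0.9))   # variance changes with pi
```

A straight line has nothing to keep its predictions inside the interval from 0 to 1, which is one reason to model a transformed response instead.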

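Logistic Curvilinear Model

Figure 20: Increasing and Decreasing Logistic Plots

The model used in logistic regression has the form below:

$$E(Y \mid X) = \pi = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}$$

The parameters to be estimated show up in the exponent in both numerator and denominator. As before, we will use a transformation to linearize the data, fit a linear model to the transformed data, and re-write to return to the original scale. What transformation will linearize something as complicated as the equation above?

The Logit Transformation

The logit transformation is developed by considering the equation $\pi = \dfrac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}$. (We suppress the subscripts to keep the algebra clean.)

If

$$\pi = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}},$$

then

$$1 - \pi = \frac{1 + e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}} - \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}} = \frac{1}{1 + e^{\beta_0 + \beta_1 X}}$$

and

$$\frac{\pi}{1 - \pi} = e^{\beta_0 + \beta_1 X}.$$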
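The algebra above says that the logistic function and the logit are inverses of each other. A minimal Python sketch of the pair follows; the function names `logistic` and `logit` are our own labels, and the logistic is written as 1/(1 + e^(-eta)), which is algebraically the same as e^eta/(1 + e^eta).

```python
import math

def logistic(eta):
    """Map a linear predictor eta = b0 + b1*x to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def logit(p):
    """Map a probability p in (0, 1) to its log-odds, ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

for eta in (-3.0, -1.0, 0.0, 1.0, 3.0):
    p = logistic(eta)
    print(f"eta = {eta:+.1f}  ->  pi = {p:.4f}  ->  logit(pi) = {logit(p):+.4f}")
```

Applying `logit` to the output of `logistic` returns the original linear predictor, which is why a straight line can be fitted on the logit scale and then converted back to probabilities.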

The expression $\dfrac{\pi}{1 - \pi}$ gives the odds of getting a 1. So,

$$\ln\left(\frac{\pi}{1 - \pi}\right) = \beta_0 + \beta_1 X$$

is a linear function of X.

We estimate $\pi$ by p, the observed proportion, and apply the logit transformation, $\ln\left(\dfrac{p}{1 - p}\right)$. Then we find a linear model to fit of the form $\ln\left(\dfrac{p}{1 - p}\right) = b_0 + b_1 X$. By back transforming, we find the logistic model will be

$$\hat{\pi} = \frac{e^{b_0 + b_1 X}}{1 + e^{b_0 + b_1 X}}.$$

The table below was created from the turtle data by combining all 3 groups at each temperature setting, and using the combined proportion for the probability of a male, denoted Pmale.

Temp   Male  Female  Total  Pmale
27.2     2     25     27    0.0741
27.7    17      7     24    0.7083
28.3    26      4     30    0.8667
28.4    19      8     27    0.7037
29.9    27      1     28    0.9643

Figure 21: Logit Transformation
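The pooled table and the logits of the observed proportions can be reproduced from the box-level counts. This is a sketch for checking the arithmetic; the variable names (`boxes`, `pooled`, `rows`) are our own.

```python
import math

# Box-level counts: (temperature in C, males, females), three boxes per temperature.
boxes = [
    (27.2, 1, 9), (27.2, 0, 8), (27.2, 1, 8),
    (27.7, 7, 3), (27.7, 4, 2), (27.7, 6, 2),
    (28.3, 13, 0), (28.3, 6, 3), (28.3, 7, 1),
    (28.4, 7, 3), (28.4, 5, 3), (28.4, 7, 2),
    (29.9, 10, 1), (29.9, 8, 0), (29.9, 9, 0),
]

# Combine the three boxes at each temperature setting.
pooled = {}
for temp, males, females in boxes:
    m, f = pooled.get(temp, (0, 0))
    pooled[temp] = (m + males, f + females)

# One row per temperature: (temp, male, female, total, Pmale, logit of Pmale).
rows = []
for temp in sorted(pooled):
    m, f = pooled[temp]
    p = m / (m + f)
    rows.append((temp, m, f, m + f, p, math.log(p / (1.0 - p))))

for temp, m, f, total, p, lg in rows:
    print(f"{temp:5.1f} {m:4d} {f:4d} {total:4d} {p:8.4f} {lg:9.4f}")
```

Note that the logit of Pmale is simply ln(males/females), so no pooled group may have zero males or zero females if the logarithm is to be defined.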

Temp   Pmale, p   ln(p/(1 - p))
27.2   0.0741       -2.5257
27.7   0.7083        0.8873
28.3   0.8667        1.8718
28.4   0.7037        0.8650
29.9   0.9643        3.2958

Applying the logit transformation and then simple linear regression gives the model for the re-expressed data as

$$\ln\left(\frac{\hat{\pi}}{1 - \hat{\pi}}\right) = -51.1122 + 1.8371X,$$

where the left-hand side represents the fitted values of $\ln\left(\dfrac{\pi}{1 - \pi}\right)$, the predicted logit.

We now take the predicted logits and back-transform to find the predicted values $\hat{\pi}$. Writing $\hat{L}$ for a predicted logit, the values of $\hat{\pi}$ are obtained by applying

$$\hat{\pi} = \frac{e^{\hat{L}}}{1 + e^{\hat{L}}}$$

to each predicted logit.

Sex of Turtles
Temp   Predicted Logit   Predicted $\hat{\pi}$
27.2      -1.1420            0.242
27.7      -0.2234            0.444
28.3       0.8789            0.707
28.4       1.0626            0.743
29.9       3.8183            0.979
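The least squares fit to the logits and the back-transformation can be sketched in a few lines of Python. This is an illustration under the assumption that the fit is an ordinary unweighted regression of the five logits on temperature; the helper names are our own.

```python
import math

# Pooled turtle counts: (temperature in C, males, females).
pooled = [(27.2, 2, 25), (27.7, 17, 7), (28.3, 26, 4), (28.4, 19, 8), (29.9, 27, 1)]

temps = [t for t, _, _ in pooled]
# ln(p / (1 - p)) equals ln(males / females) for the observed proportion p.
logits = [math.log(m / f) for _, m, f in pooled]

def least_squares_line(xs, ys):
    """Ordinary least squares intercept and slope for one predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

b0, b1 = least_squares_line(temps, logits)

def predicted_logit(x):
    return b0 + b1 * x

def predicted_probability(x):
    """Back-transform the predicted logit to the probability scale."""
    eta = predicted_logit(x)
    return math.exp(eta) / (1.0 + math.exp(eta))

print(f"intercept = {b0:.4f}, slope = {b1:.4f}")
for t in temps:
    print(f"{t:5.1f} {predicted_logit(t):9.4f} {predicted_probability(t):7.3f}")
```

Because the regression is unweighted, each temperature counts equally even though the pooled groups contain between 24 and 30 turtles.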

The graph of the logistic model $\hat{\pi} = \dfrac{e^{-51.1122 + 1.8371x}}{1 + e^{-51.1122 + 1.8371x}}$ against the data is given below.

Figure 22: Logistic model $\hat{\pi} = \dfrac{e^{-51.1122 + 1.8371x}}{1 + e^{-51.1122 + 1.8371x}}$ graphed against the data

As we can see, the logit transformation has adjusted for the curved nature of the response. It has not, however, helped with the problem of violating assumptions on the errors in Simple Linear Regression. Consequently, we cannot use standard inference methods with this model.

Maximum Likelihood Approach

To improve the quality of the fit and allow for the use of inference procedures, we can use maximum likelihood techniques rather than the least squares methods. First, define the likelihood function

$$L\big((\beta_0, \beta_1); \text{Data}\big) = \prod_{i=1}^{n} \pi_i^{Y_i} (1 - \pi_i)^{1 - Y_i}, \quad \text{with } \pi_i = \frac{e^{\beta_0 + \beta_1 X_i}}{1 + e^{\beta_0 + \beta_1 X_i}}.$$

Note that when $Y_i = 1$, this factor is $\pi_i$; when $Y_i = 0$, this factor is $1 - \pi_i$. Now, choose $\beta_0$ and $\beta_1$ so as to maximize the likelihood for the given data. For Simple Linear Regression, minimizing the sum of squared residuals is equivalent to maximizing a normal distribution likelihood.

To find the values of $\beta_0$ and $\beta_1$ that maximize this likelihood function given the present data, use the Binary Logistic Regression command in Minitab. You will find it under Stat > Regression > Binary Logistic Regression.
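For readers without Minitab, the maximization can be sketched with a few Newton-Raphson steps on the log-likelihood. This is a minimal illustration and not a description of Minitab's algorithm: it assumes the pooled turtle counts are treated as binomial groups, centres the temperatures for numerical stability, and uses a fixed number of iterations. The function name `fit_logistic` is our own.

```python
import math

# Pooled turtle counts: (temperature in C, males, total hatched).
data = [(27.2, 2, 27), (27.7, 17, 24), (28.3, 26, 30), (28.4, 19, 27), (29.9, 27, 28)]

def fit_logistic(data, iterations=25):
    """Newton-Raphson for a binomial logistic model with one predictor.

    Returns (b0, b1) on the original (uncentred) temperature scale.
    """
    x_bar = sum(x for x, _, _ in data) / len(data)   # centre x for stability
    a, b = 0.0, 0.0                                  # centred intercept and slope
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, y, n in data:
            z = x - x_bar
            p = 1.0 / (1.0 + math.exp(-(a + b * z)))
            w = n * p * (1.0 - p)        # binomial variance weight
            g_a += y - n * p             # score for the intercept
            g_b += z * (y - n * p)       # score for the slope
            h_aa += w                    # information matrix entries
            h_ab += w * z
            h_bb += w * z * z
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 system (information) * step = score.
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a - b * x_bar, b

b0, b1 = fit_logistic(data)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
```

Each pass reweights the observations by the current binomial variances, which is how this approach accounts for the non-constant variance that ordinary least squares ignores.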

The process used to calculate the values of $\beta_0$ and $\beta_1$ is an iterative process that is beyond the scope of this course. The result of the calculation is similar to that from regression:

$$\ln\left(\frac{\hat{\pi}}{1 - \hat{\pi}}\right) = -61.32 + 2.2110X$$

Sex of Turtles
Temp   Predicted Logit   Predicted $\hat{\pi}$
27.2      -1.1791            0.235
27.7      -0.0736            0.482
28.3       1.2530            0.778
28.4       1.4741            0.814
29.9       4.7906            0.992

Figure 23: Binary Logistic Regression graph

The coefficients in a logistic regression are often difficult to interpret because the effect of increasing X by one unit varies depending on the size of X. This is the essence of a nonlinear model. Consider first the interpretation of $\dfrac{\pi}{1 - \pi}$. This quantity gives the odds. If $\pi = 0.75$, then the odds are 3 to 1. Success is three times as likely as failure. In logistic regression we model the log-odds. The predicted log-odds is given in the turtle example by the linear equation:

$$\ln\left(\frac{\hat{\pi}}{1 - \hat{\pi}}\right) = -61.32 + 2.2110X$$

The predicted odds for that value of X is $\dfrac{\hat{\pi}}{1 - \hat{\pi}} = e^{-61.32 + 2.2110X}$. So if we increase X by one unit, we multiply the predicted odds by $e^{\hat{\beta}_1}$, or $e^{2.2110} = 9.13$ in the turtle example. At 27 degrees the predicted odds for a male turtle are approximately 0.20, about 1 to 5. That is, it is 5 times more likely to be a female. At 28 degrees the predicted odds for a male are 9.13 times bigger than at 27 degrees, 1.80. Now males are almost twice as likely as females.

The intercept can be thought of as the log-odds when X is zero. The antilog of the intercept may have some meaning as the baseline odds, especially if zero is within the range of the original data. Since the temperatures considered run from about 27 to 30 degrees, the value of zero is well outside the range of the data. The intercept, and its antilog, have no practical interpretation in this example.

One of the questions we wanted to answer was, "At what temperature are males and females equally likely?" In this case, the odds are equal to 1, so the log-odds are equal to 0. So, we can solve the equation $-61.32 + 2.211X = 0$, so $X = \dfrac{61.32}{2.211} \approx 27.7$ degrees. This agrees with the table above, where the predicted $\hat{\pi}$ at 27.7 degrees is 0.482.

Assessing the Fit

So far we have only looked at estimating parameters and predicting values. The estimates and predictions are subject to variation. We must be able to quantify this variation in order to make inferences. Just as in ordinary regression, we need some means of assessing the fit of a logistic regression model and determining the significance of coefficients in that model.

For logistic regression, the deviance (also known as residual deviance) is used to assess the fit of the overall model. The deviance for a logistic model can be likened to the residual sum of squares in ordinary regression. The smaller the deviance, the better the fit of the model. The deviance can be compared to a chi-square distribution, which approximates the distribution of the deviance. This is an asymptotic result that requires large sample sizes.

The deviance for the combined turtle data is 14.863 on 3 degrees of freedom. The chance that a $\chi^2$ with 3 degrees of freedom exceeds 14.863 is 0.0019. Essentially we are using the deviance to test $H_0$: fit is good versus $H_a$: fit is not good. The p-value of 0.0019 indicates that the deviance left after the fit is too large to conclude that the fit is good. Thus, there is room for improvement in the model.

Although there is some lack of fit, does temperature give us statistically significant information about the sex of turtles via the logistic regression? Look at the change in deviance when temperature is added to the model. That is, compare the deviance when the model is simply a constant, $\hat{\pi} = \bar{\pi}$, to the deviance from the logistic model using temperature as the explanatory variable. In Minitab this is summarized by G, the test that all slopes are zero.

G = 49.556 on 1 degree of freedom, p-value = 0.000
Reject the hypothesis that the slope in the logistic regression is zero.

Thus we can conclude that temperature does give us statistically significant information about the sex of turtles.
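The odds calculations and the deviance comparison can be checked with a short Python sketch. It is an illustration only: it plugs in the reported estimates with the intercept carried to more places (-61.3183, recovered from the predicted logits in the table), treats the pooled counts as binomial groups, and uses closed-form chi-square tail probabilities that hold specifically for 1 and 3 degrees of freedom. All function names are our own, and the printed values should be close to, though not necessarily identical in the last digit with, the Minitab figures quoted above.

```python
import math

b0, b1 = -61.3183, 2.2110   # reported maximum likelihood estimates (assumed precision)
data = [(27.2, 2, 27), (27.7, 17, 24), (28.3, 26, 30), (28.4, 19, 27), (29.9, 27, 28)]

def odds(x):
    """Predicted odds of a male at temperature x."""
    return math.exp(b0 + b1 * x)

odds_ratio = math.exp(b1)   # multiplier on the odds per extra degree
even_temp = -b0 / b1        # temperature where the log-odds are 0 (odds of 1)

def fitted(x):
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def binomial_deviance(data, prob):
    """2 * sum of y*ln(y/mu) + (n-y)*ln((n-y)/(n-mu)), with mu = n*prob(x)."""
    total = 0.0
    for x, y, n in data:
        mu = n * prob(x)
        if y > 0:
            total += y * math.log(y / mu)
        if y < n:
            total += (n - y) * math.log((n - y) / (n - mu))
    return 2.0 * total

overall = sum(y for _, y, _ in data) / sum(n for _, _, n in data)
residual_dev = binomial_deviance(data, fitted)
null_dev = binomial_deviance(data, lambda x: overall)   # constant-only model
G = null_dev - residual_dev

def chi2_tail_3df(x):
    """P(chi-square with 3 df > x), closed form for 3 df."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

def chi2_tail_1df(x):
    """P(chi-square with 1 df > x), closed form for 1 df."""
    return math.erfc(math.sqrt(x / 2.0))

print(f"odds at 27: {odds(27):.2f}, odds at 28: {odds(28):.2f}, ratio: {odds_ratio:.2f}")
print(f"50/50 temperature: {even_temp:.2f}")
print(f"residual deviance: {residual_dev:.3f}, p = {chi2_tail_3df(residual_dev):.4f}")
print(f"G: {G:.3f}, p = {chi2_tail_1df(G):.2e}")
```

The lack-of-fit test and the slope test answer different questions: the first asks whether the logistic curve leaves too much unexplained, the second whether temperature explains anything at all.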

Alternative Test

The ratio of the estimated coefficient to its standard error, an approximate z-statistic, can be used to assess significance. In this situation, $z = \dfrac{2.211}{0.4306} = 5.13$ with p = 0.000. Both the z- and the G-statistic indicate that temperature is statistically significant. Since sample sizes are moderate, between 25 and 30, the p-values derived from either test will be approximate, at best.

In conclusion, temperature is statistically significant in the logistic regression for the sex of turtles. The logistic regression may not provide the best fit; other models may fit better.

The Challenger Disaster Revisited

Using the techniques of this section, we can fit a linear model to logits from the Challenger data. We will regroup the data by temperature into intervals of 5 degrees, using the midpoint of each interval for the independent variable. We also adjust the probabilities a bit, replacing 0 with 0.01 and 1 with 0.99, so we can take logarithms for the logit fit. Thus, we have the following data:

Interval   (51, 55)  (56, 60)  (61, 65)  (66, 70)  (71, 75)  (76, 80)  (81, 85)
Temp          53        58        63        68        73        78        83
Prob         0.99      0.99      0.99      0.20      0.25      0.01      0.01
Logit        4.595     4.595     4.595    -1.386    -1.099    -4.595    -4.595

The graph of the transformed data with the linear fit is shown below. The linear model fitted is

$$\ln\left(\frac{p}{1 - p}\right) = 25.386 - 0.369\,\text{Temp}.$$

Figure 24: Logit re-expression and linear model
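The regrouping and the straight-line fit to the logits can be reproduced from the flight table. The sketch below is an illustration under stated assumptions: STS-4 is omitted because its damage status is unknown, the interval for a temperature is found by integer division starting at 51 °F, and the line is an ordinary unweighted least squares fit. Variable and function names are our own.

```python
import math

# (launch temperature in F, damage indicator) for the 23 flights with known outcome.
flights = [
    (66, 0), (70, 1), (69, 0), (68, 0), (67, 0), (72, 0), (73, 0),
    (70, 0), (57, 1), (63, 1), (70, 1), (78, 0), (67, 0), (53, 1), (67, 0),
    (75, 0), (70, 0), (81, 0), (76, 0), (79, 0), (75, 1), (76, 0), (58, 1),
]

# Group into 5-degree intervals (51-55, 56-60, ...) keyed by interval midpoint.
groups = {}
for temp, damage in flights:
    midpoint = 53 + 5 * ((temp - 51) // 5)
    damaged, total = groups.get(midpoint, (0, 0))
    groups[midpoint] = (damaged + damage, total + 1)

midpoints = sorted(groups)
# Replace proportions of 0 and 1 by 0.01 and 0.99 so the logit is defined.
probs = [min(max(groups[m][0] / groups[m][1], 0.01), 0.99) for m in midpoints]
logits = [math.log(p / (1.0 - p)) for p in probs]

def least_squares_line(xs, ys):
    """Ordinary least squares intercept and slope for one predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

intercept, slope = least_squares_line(midpoints, logits)
print(f"logit = {intercept:.3f} + ({slope:.3f}) * Temp")
```

The choice of 0.01 and 0.99 is arbitrary and affects the fitted line, since five of the seven logits come from adjusted proportions.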

Transforming this model to a probability of failure scale is done by setting

$$\hat{P} = \frac{e^{25.386 - 0.369\,\text{Temp}}}{1 + e^{25.386 - 0.369\,\text{Temp}}}.$$

This graph is shown below.

Figure 25: $\hat{P} = \dfrac{e^{25.386 - 0.369\,\text{Temp}}}{1 + e^{25.386 - 0.369\,\text{Temp}}}$ graphed against the temperature

From the model we can see that failures will occur at least half of the time if the temperature is below 68.8 degrees. At 31 degrees, the probability is essentially 1 for an O-ring failure.

You can also use the ungrouped data with Minitab's Binary Logistic Regression. In that analysis, the Logistic Regression model fitted is

$$\ln\left(\frac{p}{1 - p}\right) = 15.043 - 0.2322\,\text{Temp}.$$

Reversing the logit transformation one has

$$\hat{P} = \frac{e^{15.043 - 0.2322\,\text{Temp}}}{1 + e^{15.043 - 0.2322\,\text{Temp}}}.$$

From this model, failures will occur at least half of the time if the temperature is below 64.8 degrees.
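Both Challenger models can be evaluated at any launch temperature with a few lines of Python. The sketch below simply plugs in the two sets of coefficients quoted above; the function name `failure_probability` and the dictionary of models are our own.

```python
import math

def failure_probability(temp_f, intercept, slope):
    """Back-transform the fitted logit, intercept + slope * temp_f, to a probability."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * temp_f)))

# Coefficients quoted in the text: (intercept, slope per degree F).
models = {
    "grouped logit fit": (25.386, -0.369),
    "ungrouped maximum likelihood fit": (15.043, -0.2322),
}

results = {}
for name, (a, b) in models.items():
    half_temp = -a / b                       # logit = 0, so probability = 0.5
    p31 = failure_probability(31, a, b)      # Challenger launch temperature
    results[name] = (half_temp, p31)
    print(f"{name}: 50% point at {half_temp:.1f} F, P(damage at 31 F) = {p31:.4f}")
```

Note that 31 °F lies far below the coldest previous launch (53 °F), so both predictions are extrapolations well outside the observed data.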

Figure 26: $\hat{P} = \dfrac{e^{15.043 - 0.2322\,\text{Temp}}}{1 + e^{15.043 - 0.2322\,\text{Temp}}}$ graphed against the temperature