METHODS FOR HANDLING TIED EVENTS IN THE COX PROPORTIONAL HAZARD MODEL


STUDIA OECONOMICA POSNANIENSIA 2014, vol. 2, no. 2 (263)

Jadwiga Borucka
Warsaw School of Economics, Institute of Statistics and Demography, Event History and Multilevel Analysis Unit
jadwiga.borucka@gmail.com

Abstract: The Cox proportional hazard model is one of the most common methods used in time to event data analysis. The model is based on several restrictive assumptions, one of which concerns tied events, i.e. events with exactly the same survival time. If time were measured on a perfectly continuous scale, such cases would never occur. In real applications time is usually measured in a discrete manner, which results in the existence of ties in most survival data. However, if this assumption is violated, it should not necessarily prevent analysis by using the Cox model. The current paper presents and compares five ways proposed for handling tied events. On the basis of the calculations performed it can be stated that the exact expression and the discrete model give the best results in terms of fit statistics; however, they are the most time-consuming. The Efron and Breslow approximations are much faster but result in a worse model fit. In the case analysed the Efron method seems to be the best choice, taking into account differences in parameter estimates, fit statistics and calculation time. What is more, a simple method based on subtracting a tiny random value from each tied survival time performed surprisingly well, both in terms of parameter estimates compared with the exact expression and in terms of fit statistics. In general, in the case of large datasets and/or a large number of ties, if estimation precision is not as important as estimation time, the Breslow or, more preferably, the Efron approximation might be used. However, if time is not limited, one should consider choosing the exact method or the discrete model, which can provide better fit statistics and more efficient parameter estimates.

Keywords: Cox model, tied events, applied survival analysis, partial likelihood function.

JEL Classification: C3, C4.

METODY ESTYMACJI MODELU PROPORCJONALNYCH HAZARDÓW COXA W WYPADKU WYSTĘPOWANIA ZDARZEŃ POWIĄZANYCH
(Estimation methods for the Cox proportional hazards model in the presence of tied events)

Summary: The Cox proportional hazards model is one of the most frequently used methods in survival time analysis. It is based on several restrictive assumptions; one of them concerns the occurrence of so-called tied events, i.e. events observed at exactly the same moment. If time were measured in a continuous manner, such a situation would not occur. In practical applications, however, time is usually measured in a discrete manner, which results in tied events in most datasets used in survival time analysis. Nevertheless, a departure from this assumption need not be an obstacle to using the Cox model. This article presents five approaches proposed in the literature for datasets containing tied events. On the basis of the calculations performed it can be stated that the exact method and the discrete model give the best results in terms of fit statistics; however, they are the most time-consuming. The Breslow and Efron approximations yield estimates much faster but result in a worse model fit. In the example considered the Efron approximation is the best choice, taking into account differences in parameter estimates, model fit statistics and computation time. Moreover, a simple method based on subtracting from each tied value of the time variable a small value drawn from a uniform distribution gives surprisingly good results, both in terms of parameter estimates and in terms of model fit statistics. Generally speaking, in the case of large datasets and/or a large number of tied events, if estimation precision is not a priority and computation time matters more, the Breslow or Efron approximations can be used. However, if time is not limited, it is worth considering the exact method or the discrete model, which yield better model fit statistics and more efficient parameter estimates.
Keywords: Cox model, tied events, survival time analysis, partial likelihood function.

Introduction

The Cox proportional hazard model is one of the most common methods used in time to event data analysis. The idea of the model is to define the hazard level as a dependent variable which is explained by a time-related component (the so-called baseline hazard) and a covariate-related component. The model is defined as follows:

$$h(t, x, \beta) = h_0(t)\exp(\beta x),$$

where:
- $h(t, x, \beta)$: hazard function that depends on timepoint $t$ and the vector of covariates $x$,
- $h_0(t)$: baseline hazard function that depends on time only,
- $\exp(\beta x)$: covariate-related component,
- $\beta$: vector of parameters.

The model is based on several restrictive assumptions, one of which concerns tied events, i.e. events with exactly the same survival time. This assumption is directly related to the method of Cox model estimation. The parameter estimates are obtained through the use of the maximum likelihood method, as suggested by Cox [1972]. Assuming a simple model for one event with right censoring, including a single covariate, and using the following relationship between the probability density function of the time variable, the hazard function and the survival function:

$$f(t, x, \beta) = h(t, x, \beta)\,S(t, x, \beta),$$

the likelihood function for the Cox model can be derived as follows [Hosmer & Lemeshow 1999]:

$$l(\beta) = \prod_{i=1}^{n}\left\{[f(t_i, x_i, \beta)]^{c_i}\,[S(t_i, x_i, \beta)]^{1-c_i}\right\} = \prod_{i=1}^{n}\left\{[h(t_i, x_i, \beta)]^{c_i}\,S(t_i, x_i, \beta)\right\},$$

where:
- $l(\beta)$: likelihood function depending on parameter $\beta$,
- $i = 1, 2, \dots, n$: observations ordered by time (actual or censored),
- $h(t_i, x_i, \beta)$: hazard function at timepoint $t_i$ for the subject with covariate value $x_i$,
- $S(t_i, x_i, \beta)$: survival function at timepoint $t_i$ for the subject with covariate value $x_i$,
- $c_i$: censoring indicator (equal to 1 if the subject experienced the event and 0 if the subject is censored).

After taking the logarithm and substituting $h(t_i, x_i, \beta) = h_0(t_i)\exp(x_i\beta)$ and $S(t_i, x_i, \beta) = [S_0(t_i)]^{\exp(x_i\beta)}$, where $S_0$ is the baseline survival function, the following expression is obtained:

$$L(\beta) = \sum_{i=1}^{n}\left\{c_i \ln[h_0(t_i)] + c_i x_i \beta + \exp(x_i\beta)\ln[S_0(t_i)]\right\}.$$

The full likelihood function as defined above requires maximization with respect to the parameter $\beta$ as well as to the baseline hazard function, which is unspecified. Cox [1972] suggests using an alternative expression, which he calls the partial likelihood function and which depends on the parameter $\beta$ only. He argues that the estimate of $\beta$ obtained through the use of this function should have the same properties as one that would result from the full likelihood function. This thesis is proved in Andersen et al. [1993] as well as Fleming and Harrington [1991]. The approach presented by Cox leads to the following formula for the partial likelihood function:

$$l_p(\beta) = \prod_{i=1}^{n}\left[\frac{\exp(x_i\beta)}{\sum_{j \in R(t_i)}\exp(x_j\beta)}\right]^{c_i},$$

where $R(t_i)$ is the risk set at timepoint $t_i$. Taking into account only non-censored observations, it can also be rewritten as follows:

$$l_p(\beta) = \prod_{i=1}^{m}\frac{\exp(x_{i*}\beta)}{\sum_{j \in R(t_{i*})}\exp(x_j\beta)},$$

where:
- $l_p(\beta)$: partial likelihood function depending on parameter $\beta$,
- $i = 1, 2, \dots, m$: non-censored observations ordered by (actual) time,
- $t_{i*}$: survival time of the $i$-th subject who experienced the event, with $x_{i*}$ denoting that subject's covariate value.

The above formula results in the following form for the logarithm of the partial likelihood function:

$$L_p(\beta) = \sum_{i=1}^{m}\left\{x_{i*}\beta - \ln\left[\sum_{j \in R(t_{i*})}\exp(x_j\beta)\right]\right\}$$

and, after differentiating with respect to $\beta$ [Hosmer & Lemeshow 1999]:

$$\frac{\partial L_p(\beta)}{\partial \beta} = \sum_{i=1}^{m}\left\{x_{i*} - \frac{\sum_{j \in R(t_{i*})} x_j \exp(x_j\beta)}{\sum_{j \in R(t_{i*})}\exp(x_j\beta)}\right\} = \sum_{i=1}^{m}\left\{x_{i*} - \sum_{j \in R(t_{i*})} w_{ij}(\beta)\,x_j\right\} = \sum_{i=1}^{m}\left\{x_{i*} - \bar{x}_{w_i}\right\},$$

where:

$$w_{ij}(\beta) = \frac{\exp(x_j\beta)}{\sum_{l \in R(t_{i*})}\exp(x_l\beta)} \quad \text{and} \quad \bar{x}_{w_i} = \sum_{j \in R(t_{i*})} w_{ij}(\beta)\,x_j.$$

On the basis of the equation above it can be seen that the order of events does matter: each subject who experiences the event has their own contribution to the partial likelihood function, which includes a sum over all the subjects who are at risk at the moment at which the event for this particular subject is observed. Suppose there are two subjects, A and B, experiencing an event at exactly the same time. In such a situation it is not clear whether subject A should be considered as being at risk while subject B is experiencing the event, and vice versa. Thus, the partial likelihood function defined as above assumes that there are no tied events among the observations. If time were measured on a perfectly continuous scale, such cases would never occur and a proper ordering of events would not be a problem. In fact, in most applications time is measured in a discrete manner, which results in the existence of ties in most survival data. However, even if the assumption of a lack of tied events is violated, this does not exclude the Cox model as a potential tool in an analysis of time to event data, although the deviation from the assumption should not be neglected.

So far several methods have been developed to handle tied events in proportional hazard models. The next section of the current paper presents a theoretical background for four of them: the Breslow approximation, the Efron approximation, the Kalbfleisch and Prentice exact expression, and the discrete model. Additionally, a simple alternative method based on subtracting a tiny random value from each tied survival time is described. In the subsequent part of the article, these methods are implemented in practice and compared in terms of differences in parameter estimates, efficiency of parameter estimates, fit statistics and computational time.

1. Handling tied events: theoretical background

In order to take tied events into account in Cox model estimation, it is necessary to adjust the partial likelihood function appropriately.
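
To make concrete what has to be adjusted, the logarithm of the partial likelihood and its derivative from the Introduction can be evaluated directly for a single covariate when no event times are tied. The following Python sketch is only an illustration: the function names and the toy data are hypothetical and are not taken from the paper, whose own calculations were performed in SAS.

```python
import math

def log_partial_likelihood(times, events, x, beta):
    """L_p(beta) for one covariate, assuming no tied event times."""
    total = 0.0
    for t_i, c_i, x_i in zip(times, events, x):
        if c_i == 1:  # only non-censored observations contribute
            # risk set R(t_i): subjects whose observed time is at least t_i
            risk = sum(math.exp(x_j * beta) for t_j, x_j in zip(times, x) if t_j >= t_i)
            total += x_i * beta - math.log(risk)
    return total

def score(times, events, x, beta):
    """Derivative of L_p: sum over events of x_i minus the weighted risk-set mean."""
    total = 0.0
    for t_i, c_i, x_i in zip(times, events, x):
        if c_i == 1:
            at_risk = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
            weights = [math.exp(x_j * beta) for x_j in at_risk]
            x_bar = sum(w * x_j for w, x_j in zip(weights, at_risk)) / sum(weights)
            total += x_i - x_bar
    return total

# Hypothetical toy data: five subjects, the third one censored, no ties
times = [2.0, 3.0, 5.0, 7.0, 11.0]
events = [1, 1, 0, 1, 1]
x = [1.0, 0.0, 1.0, 0.0, 1.0]

print(log_partial_likelihood(times, events, x, 0.0))
print(score(times, events, x, 0.0))
```

Because every event has its own risk set here, the result does not depend on any convention for ordering simultaneous events; this is exactly what breaks down once two events share a survival time.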

As Kalbfleisch and Prentice [2002] argue, the most natural way to do this is to consider all possible orders of event occurrences for subjects having the same survival time. Their idea can be described as follows. Assume there are $m$ distinct observed survival times ordered as $t_1 < \dots < t_m$ and that $d_i$ events happen at $t_i$, where $i = 1, 2, \dots, m$. Furthermore, Kalbfleisch and Prentice define $D(t_i) = \{i_1, \dots, i_{d_i}\}$ as the set of labels of failing observations at $t_i$. Taking $Q_i$ as the set of $d_i!$ permutations of the $d_i$ events observed at $t_i$, $P = (p_1, \dots, p_{d_i})$ is one element of $Q_i$, and finally $R(t_i, P, k) = R(t_i) \setminus \{p_1, \dots, p_{k-1}\}$. Then the subjects with an observed survival time equal to $t_i$ jointly have a contribution to the partial likelihood function equal to:

$$\frac{1}{d_i!}\exp(x_{i+}\beta)\sum_{P \in Q_i}\prod_{k=1}^{d_i}\left[\sum_{l \in R(t_i, P, k)}\exp(x_l\beta)\right]^{-1},$$

where $x_{i+} = \sum_{j=1}^{d_i} x_{i_j}$, which, omitting the constant factors $1/d_i!$, results in the following formula for the partial likelihood function:

$$l(\beta) = \prod_{i=1}^{m}\exp(x_{i+}\beta)\sum_{P \in Q_i}\prod_{k=1}^{d_i}\left[\sum_{j \in R(t_i, P, k)}\exp(x_j\beta)\right]^{-1}.$$

Maximization of the function defined above might be time-consuming, especially in the case of a large number of ties. This fact led to the derivation of some approximate expressions for the partial likelihood function. One of them is presented by Breslow [1975], who suggests treating all events tied at a given timepoint $t_i$ as sharing the same risk set: the sum of the covariate-related components over all subjects at risk at $t_i$ is raised to a power equal to the number of events tied at $t_i$. The partial likelihood function that uses this approach is defined as follows:

$$l(\beta) = \prod_{i=1}^{m}\frac{\exp(x_{i+}\beta)}{\left[\sum_{j \in R(t_i)}\exp(x_j\beta)\right]^{d_i}}.$$

However, if the number of tied events at any timepoint is relatively large, this method might not give a good approximation of the partial likelihood function as defined by Kalbfleisch and Prentice. An alternative suggestion comes from Efron [1977]. According to his idea, the partial likelihood function can be approximated as follows:

$$l(\beta) = \prod_{i=1}^{m}\frac{\exp(x_{i+}\beta)}{\prod_{k=1}^{d_i}\left[\sum_{j \in R(t_i)}\exp(x_j\beta) - \frac{k-1}{d_i}\sum_{j \in D(t_i)}\exp(x_j\beta)\right]}.$$

As Kalbfleisch and Prentice point out, obtaining parameter estimates on the basis of the above approximation is not particularly complicated; however, the estimators resulting from the Efron and Breslow approximations might be biased. What is more, the estimator of the variance of the parameter estimate $\hat{\beta}$ is not consistent [Kalbfleisch & Prentice 2002]. On the basis of numerous calculations it has been shown that the Breslow method causes severe bias for datasets with a large fraction $d_i/n_i$, where $n_i$ is the number of subjects at risk at $t_i$, while the Efron approximation and the exact expression still perform well even if the number of ties is high. If the number of tied events is very small, all three methods give very similar results. For datasets with no ties, all methods lead to exactly the same results.

As an alternative to approximations of the partial likelihood function, applicable especially to datasets with a large number of ties (which might suggest that tied events do not result from insufficient precision in time measurement but rather from the discrete character of the time variable), one can consider the approach suggested by Cox [1972], i.e. using a discrete logistic model defined as follows:

$$\frac{h(t, x, \beta)}{1 - h(t, x, \beta)} = \frac{h_0(t)}{1 - h_0(t)}\exp(x\beta).$$

Kalbfleisch and Prentice provide a generalization of the partial likelihood function applicable to the model defined above. It enables a calculation of the probability that, given the risk set at timepoint $t_i$ and the number of events $d_i$ occurring at this timepoint, the events will be experienced by exactly those subjects who have an observed survival time equal to $t_i$. Taken over all event times, these conditional probabilities give [Kalbfleisch & Prentice 2002]:

$$\prod_{i=1}^{m}\frac{\exp(x_{i+}\beta)}{\sum_{j \in R_{d_i}(t_i)}\exp(x_{j+}\beta)},$$

where $R_{d_i}(t_i)$ constitutes the set of all possible subsets including exactly $d_i$ distinct subjects selected out of the units who are at risk at $t_i$, denoted as $R(t_i)$, and $x_{j+}$ is the sum of the covariate values of the subjects in subset $j$.

The methods described above are the ones usually mentioned in the literature as far as the handling of tied events in the Cox model is concerned. As a simple alternative, another way is proposed in the current paper. The method itself is suggested by Hosmer and Lemeshow [1999] as a way to break ties for the purpose of non-parametric survival function estimation. The idea is to subtract a tiny random value from each tied survival time. In this way, tied events will be uniquely ordered with respect to each other but their position in relation to all remaining observations remains unchanged. Hosmer and Lemeshow argue that this solution has no effect when estimating the survival function, as the estimate for the last tied event at $t_i$ is exactly the same as if it were calculated for all ties at $t_i$ considered simultaneously. They do not, however, analyse the utility of this method in terms of breaking ties in the Cox model. In the current paper, the idea of this method is applied to a Cox model, where tied events cannot be neglected due to partial likelihood function requirements. In order to break ties, a random value from the uniform distribution defined on the interval [0, 0.00] is subtracted from each tied observed survival time. In this case, it does not matter which version of the partial likelihood function is used (Efron, Breslow or exact expression), as all of them lead to the same results if there are no ties among the observations.

2. Methods of tied events handling: application and results

As has been mentioned, the main aim of the present paper is to compare the five methods described above in terms of parameter estimates, efficiency of estimators, fit statistics, as well as computational time. In order to do this, the same Cox proportional hazard model was estimated through the use of each of these methods. Additionally, for each method the estimation was repeated 0 times and the average computational time per method was calculated.
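
The tie-breaking step described at the end of the previous section can be sketched in a few lines. This is a hypothetical Python illustration, not the SAS code from the Appendix: the function name `break_ties`, the `width` parameter (the upper bound of the uniform interval), the seed and the toy data are all assumptions, and the sketch modifies only event times shared by at least two events.

```python
import random
from collections import Counter

def break_ties(times, events, width, rng):
    """Subtract a uniform(0, width) draw from each event time shared by 2+ events."""
    # count how many events (not censored observations) fall on each time
    tied = Counter(t for t, c in zip(times, events) if c == 1)
    return [t - rng.uniform(0.0, width) if c == 1 and tied[t] > 1 else t
            for t, c in zip(times, events)]

# Hypothetical toy data: three events tied at t=5; at t=9 one event and one censoring
times = [5, 5, 5, 8, 9, 9, 12]
events = [1, 1, 1, 1, 1, 0, 1]

rng = random.Random(2014)  # fixed seed, so the sketch is reproducible
jittered = break_ties(times, events, 0.001, rng)
print(jittered)
```

For the position of the tied events relative to all remaining observations to stay unchanged, `width` has to be smaller than the smallest gap between two distinct observed times; with integer-valued times, as in the toy data, any value below 1 satisfies this.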

The dataset used for analysis contained artificially generated data including the following: time variable, censoring indicator and 7 covariates that are considered as potential explanatory variables in time to event analysis. The dataset contains 6500 observations, out of which 700 are censored. The number of ties is relatively high: there are 3 distinct observed survival times, their counts ranging from 00 to 500. Due to the fact that the present paper is focused on the methodology rather than on empirical results, using artificial data is justified. Parameter estimates are not supposed to be interpreted, as the main aim is to compare the five methods of handling tied events with regard to estimator efficiency and fit statistics; thus, the variables in the dataset are not named directly but are referred to as time variable, censoring indicator and covariates (or explanatory variables) numbered from 1 to 7.

All calculations were performed using the SAS Base 9.3 module. The SAS code that was used to estimate the model with all of these methods and to modify the time variable by subtracting a random value from the uniform distribution is included in the Appendix. As far as the method based on subtracting a tiny random value is concerned, the Breslow version of the partial likelihood function was used: in the case of a lack of tied events all approximations lead to the same results, and the algorithm for the Breslow approximation embedded in the PHREG procedure is the fastest out of all four methods.

Table 1 presents parameter estimates with p-values and Table 2 presents standard errors of parameter estimates obtained through the use of each of the five methods of tied events handling.

Table 1. Parameter estimates for the Cox model

| Covariate | Exact | Efron | Breslow | Discrete | Random value |
|---|---|---|---|---|---|
| Parameter estimate | | | | | |
| Covariate 1 | 0.0250 | 0.0250 | 0.0230 | 0.0248 | 0.0250 |
| Covariate 2 | 0.07 | 0.07 | 0.098 | 0.0234 | 0.070 |
| Covariate 3 | 0.438 | 0.438 | 0.394 | 0.573 | 0.437 |
| Covariate 4 | 0.08 | 0.082 | 0.095 | 0.095 | 0.082 |
| Covariate 5 | 0.50 | 0.50 | 0.479 | 0.707 | 0.52 |
| Covariate 6 | 0.2642 | 0.2650 | 0.2636 | 0.2700 | 0.2648 |
| Covariate 7 | 3.8732 | 3.8692 | 3.6705 | 4.020 | 3.8672 |
| p-value | | | | | |
| Covariate 1 | < 0.0001 | < 0.0001 | < 0.0001 | < 0.0001 | < 0.0001 |
| Covariate 2 | < 0.0001 | < 0.0001 | < 0.0001 | < 0.0001 | < 0.0001 |
| Covariate 3 | < 0.0001 | < 0.0001 | < 0.0001 | < 0.0001 | < 0.0001 |
| Covariate 4 | 0.0 | 0.004 | 0.067 | 0.079 | 0.0804 |
| Covariate 5 | < 0.0001 | < 0.0001 | < 0.0001 | < 0.0001 | < 0.0001 |
| Covariate 6 | < 0.0001 | < 0.0001 | < 0.0001 | < 0.0001 | < 0.0001 |
| Covariate 7 | < 0.0001 | < 0.0001 | < 0.0001 | < 0.0001 | < 0.0001 |

Source: Own calculation.

Table 2. Standard errors of parameter estimates for the Cox model

| Covariate | Exact | Efron | Breslow | Discrete | Random value |
|---|---|---|---|---|---|
| Covariate 1 | 0.006 | 0.006 | 0.006 | 0.007 | 0.006 |
| Covariate 2 | 0.0042 | 0.004 | 0.004 | 0.0044 | 0.004 |
| Covariate 3 | 0.0075 | 0.0075 | 0.0074 | 0.0078 | 0.0075 |
| Covariate 4 | 0.0495 | 0.0495 | 0.0490 | 0.052 | 0.0495 |
| Covariate 5 | 0.0094 | 0.0094 | 0.0092 | 0.0098 | 0.0094 |
| Covariate 6 | 0.0552 | 0.0552 | 0.0547 | 0.0570 | 0.0552 |
| Covariate 7 | 0.40 | 0.409 | 0.389 | 0.475 | 0.408 |

Source: Own calculation.

There are some visible differences between parameter estimates obtained through the use of different methods. The Efron approximation gives results that are very close to those generated by the exact expression. Estimates from the Breslow approximation seem to be underestimated for five out of seven variables, as compared with the exact expression or the Efron method. This observation is in accordance with the conclusions drawn by Hertz-Picciotto and Rockhill [1997], who also indicate a tendency for the Breslow approximation to underestimate parameters in the Cox model. The discrete model tends to give higher absolute values for parameter estimates (with the exception of covariate 1). In this case, however, it should be noted that the discrete model is based on a different likelihood function than the Cox proportional hazard model, which uses a partial likelihood function (with a potential modification such as the exact expression, the Breslow or the Efron approximation); thus the parameters obtained from these models do not have exactly the same interpretation [Kalbfleisch & Prentice 2002]. The results obtained through the use of the random value method do not show any systematic pattern, as some estimates are higher than for the exact expression and some are lower; however, the parameter estimates are very close to those coming from the Efron or exact method. Interestingly, the differences between the parameters obtained through the use of the exact expression and the random value method are in most cases smaller than the differences between the exact method and either the Breslow approximation or the discrete model, which might indicate that simply subtracting a tiny random value might even give better results than some more formal methods of handling tied events.
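
How the partial likelihood variants behind Tables 1 and 2 differ can be seen on a toy example. The following is a minimal Python sketch under stated assumptions, not the PHREG implementation: it handles a single covariate, evaluates the Breslow and Efron approximations and the exact expression (keeping the constant factor $1/d_i!$), leaves out the discrete model, and enumerates all $d_i!$ orderings, so it is only usable for very small tie groups. The function name and the data are hypothetical.

```python
import math
from itertools import permutations

def log_partial_likelihood(times, events, x, beta, method):
    """Log partial likelihood for one covariate; method: 'breslow', 'efron' or 'exact'."""
    event_times = sorted({t for t, c in zip(times, events) if c == 1})
    total = 0.0
    for t in event_times:
        # sum of exp(x_j * beta) over the risk set R(t): everyone still observed at t
        risk_sum = sum(math.exp(xj * beta) for tj, xj in zip(times, x) if tj >= t)
        # covariates of the d subjects whose events are tied at t, i.e. D(t)
        tied_x = [xj for tj, cj, xj in zip(times, events, x) if tj == t and cj == 1]
        tied_w = [math.exp(xj * beta) for xj in tied_x]
        d = len(tied_x)
        total += beta * sum(tied_x)  # log of exp(x_{i+} * beta)
        if method == "breslow":
            total -= d * math.log(risk_sum)
        elif method == "efron":
            # range(d) supplies k - 1 = 0, ..., d - 1
            total -= sum(math.log(risk_sum - k / d * sum(tied_w)) for k in range(d))
        elif method == "exact":
            acc = 0.0
            for order in permutations(tied_w):  # every possible ordering of the ties
                removed, inv_prod = 0.0, 1.0
                for w in order:
                    inv_prod /= risk_sum - removed
                    removed += w  # this subject leaves the risk set
                acc += inv_prod
            total += math.log(acc / math.factorial(d))
        else:
            raise ValueError("unknown method: " + method)
    return total

# Hypothetical toy data: two events tied at t=1 and two events tied at t=3
times = [1, 1, 2, 3, 3, 4]
events = [1, 1, 0, 1, 1, 1]
x = [0.5, -1.0, 0.3, 1.2, 0.0, -0.4]

for method in ("breslow", "efron", "exact"):
    print(method, log_partial_likelihood(times, events, x, 0.7, method))
```

On these toy data, at the fixed value of beta used, the Breslow value is the lowest and the exact value the highest, with Efron in between; when no event times are tied, the three variants coincide.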

As far as standard errors are concerned, the results obtained do not differ between the five methods to a great extent; only the discrete method shows slightly higher errors, but these differences are not very large. It is worth mentioning that the p-value for covariate 4 differs between the five methods to an extent that could even lead to different conclusions concerning the statistical significance of the covariate, depending on the significance level. Assuming alpha = 0.05, all methods indicate a lack of significance; however, taking alpha = 0.1 we would have to change this decision on the basis of the Breslow, discrete and random value methods. The results obtained through the use of the Breslow approximation gave the strongest reason for a rejection of the null hypothesis, while those from the discrete model and the random value method are not that obvious.

Additionally, standardized measures of variability, as suggested by Nardi and Schemper [2003] and defined as:

$$SV = \frac{\sigma_{\hat{\beta}}}{\hat{\beta}},$$

i.e. the standard error of a parameter estimate divided by the estimate itself, were obtained. They can be used to assess the efficiency of parameter estimators. Their values are presented in Table 3.

Table 3. Standardized measure of variability

| Covariate | Exact | Efron | Breslow | Discrete | Random value |
|---|---|---|---|---|---|
| Covariate 1 | 0.0657 | 0.0657 | 0.0705 | 0.0685 | 0.0657 |
| Covariate 2 | 0.2427 | 0.2420 | 0.2056 | 0.88 | 0.2432 |
| Covariate 3 | 0.0522 | 0.0522 | 0.053 | 0.0496 | 0.0522 |
| Covariate 4 | 0.6098 | 0.6088 | 0.5352 | 0.5694 | 0.6090 |
| Covariate 5 | 0.0622 | 0.062 | 0.0623 | 0.0576 | 0.062 |
| Covariate 6 | 0.2089 | 0.208 | 0.2074 | 0.22 | 0.2082 |
| Covariate 7 | 0.0364 | 0.0364 | 0.0378 | 0.0367 | 0.0364 |

Source: Own calculation.

On the basis of the standardized measure of variability, it is hard to indicate the best method unambiguously. It is worth mentioning that the Breslow method gave the highest value of the coefficient most often: for four out of seven variables; however, variability is also the lowest for two variables when the Breslow approximation is applied. The lowest SV values are displayed by the discrete model for three variables, and by the random value method, the exact expression and the Efron approximation for two out of seven covariates. Additionally, fit statistics including the doubled negative logarithm of the likelihood function as well as the information criteria AIC and SBC were calculated (see Table 4).
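
Both kinds of summary statistics are simple functions of quantities reported in the tables, which the short Python sketch below checks on two entries. The standardized measure of variability follows the definition above; the AIC formula (-2 LOG L plus twice the number of estimated parameters) is the standard definition and is assumed here, as the paper does not state it. The function names are hypothetical.

```python
def standardized_variability(std_error, estimate):
    """SV: standard error of a parameter estimate divided by the estimate."""
    return std_error / estimate

def aic(minus_two_log_l, n_params):
    """Akaike information criterion from -2 LOG L and the number of parameters."""
    return minus_two_log_l + 2 * n_params

# Covariate 6, exact expression: standard error from Table 2, estimate from Table 1
sv = standardized_variability(0.0552, 0.2642)
print(round(sv, 4))

# Breslow column of Table 4, with the 7 covariates of the model
aic_breslow = aic(73424.65, 7)
print(round(aic_breslow, 2))
```

Both results agree with the published entries (0.2089 in Table 3 and 73 438.65 in Table 4). Small discrepancies for other cells would not be surprising, since the tabulated estimates and standard errors are rounded.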

Table 4. Fit statistics

| Statistics | Exact | Efron | Breslow | Discrete | Random value |
|---|---|---|---|---|---|
| -2 LOG L | 32 87.43 | 72 707.39 | 73 424.65 | 32 098.36 | 72 708. |
| AIC | 32 20.43 | 72 72.39 | 73 438.65 | 32 2.36 | 72 722. |
| SBC | 32 246.77 | 72 766.72 | 73 483.99 | 32 57.70 | 72 767.45 |

Source: Own calculations.

The discrete model guarantees the best values for fit statistics; however, the difference between the discrete model and the exact expression is very slight. The Breslow and Efron approximations, as well as the random value method, result in much higher values for fit statistics, even more than twice as high as those coming from the exact or discrete models. The worst results come from the Breslow method. It is worth noting that the random value method not only gives better results than the Breslow method, but also values that are very close to those coming from the Efron approximation.

Note: comparisons of statistical fit between the discrete model and the other methods, which use a partial likelihood function, should be performed with caution, as the other models are based on different likelihood functions.

Finally, the computational time that was needed to obtain the parameter estimates through the use of each of the five methods was compared. Estimation was performed by using the SAS PHREG procedure (Table 5); details concerning the SAS code that was used are included in the Appendix.

Table 5. Computational time (SAS PHREG procedure)

| Time | Exact | Efron | Breslow | Discrete | Random value |
|---|---|---|---|---|---|
| Real Time (sec.) | 2.984 | 0.099 | 0.075 | 4.9 | 0.0 |
| CPU Time (sec.) | 2.980 | 0.092 | 0.068 | 4.8 | 0.085 |

Source: Own calculations.

As was expected, more complicated methods such as the exact expression or the discrete model were the most time-consuming. It took the most time to estimate the discrete model. The exact expression was faster than the discrete model but was still considerably slower than the Breslow and Efron approximations. Estimation of the model with a partial likelihood based on the Breslow approximation was the fastest; the method subtracting a random value from tied event times was slightly slower, as was the Efron approximation; however, these differences were very slight. In general, the discrete model and the exact expression provide the best fit statistics, but the estimation time needed to use these methods is visibly higher as compared with the other methods. The exact values of estimation time obviously depend on the computer parameters; however, using the same equipment to estimate all the models enabled such comparisons to be performed. It should also be taken into account that the dataset used in the calculations had only 6500 records and the model contains 7 covariates, thus even the slowest method allows results to be obtained in a very short period of time. For more complex analysis designs and/or larger datasets and/or larger numbers of tied events, these differences might be even more visible and as such might constitute a more important factor when choosing the best method for a given analysis. In the scientific field, precision of estimation is usually more important than the time that is needed to get the results; however, as far as practical applications are concerned, computational time might also need to be taken into account.

Conclusions

This simple empirical study showed that in the case of a relatively high number of ties, different methods lead to visibly different results. The Efron approximation seems to give results that are the closest to the ones obtained through the use of the exact expression as derived by Kalbfleisch and Prentice. On the other hand, a simple method based on subtracting a tiny random value from each tied survival time provides results that do not differ from the exact results to a great extent. A comparison of standard errors of parameter estimates indicates that the discrete model might be performing a little worse as compared with the other methods. It would be hard, however, to choose the best method on the basis of standard errors as well as standardized measures of variability. In terms of fit statistics, the exact expression and the discrete model are superior when compared with the others.
Surprisingly, the random value method leads to a better fit than Breslow and is comparable to the Efron approximation; however, these differences are not very great. When it comes to computational time, more sophisticated methods such as exact expression and the discrete model require a substantially longer period of time to obtain estimates. The Breslow and Efron approximations as well as the random value method took more or less the same time to perform calculations, which was visibly shorter when compared to exact expression.

Considering all the statistics analysed, in this instance the Efron method seems to be the best choice: fit statistics are visibly worse than for the exact or discrete methods; however, efficiency and the values of parameter estimates are nearly equal to the results obtained through the use of the exact method, while the estimation time for the Efron method is remarkably shorter than for the model with partial likelihood based on the exact expression. It is worth mentioning that the simple random value method seems to be an attractive alternative here, given that parameter estimates were very close to those obtained through the use of exact expression, fit statistics were better than in the Breslow approximation and almost equal to the ones obtained through the use of the Efron method, and the time needed to perform calculations was very short.

In general, in the case of large datasets and/or a large number of ties, if estimation precision is not extremely important but the estimation time is, the Breslow method might be used as it guarantees a relatively short calculation time. The Efron approximation, which requires only a little more time to obtain results and provides a visibly better fit as well as results that are much closer to the ones coming from exact expression, might be an even better choice. If time is not that limited and estimation precision is a more important factor, which should usually be the case in scientific research, the exact method would be the most desirable one as it provides much better fit statistics and higher efficiency of parameter estimates. Exact expression could also be considered the best choice due to a construction that is based on the assumption that every ordering of tied events can happen with exactly the same probability, which is reasonable and the safest approach in the case of ties.

Apart from the four methods that are usually described in the literature in the context of ties in the Cox model, a simple method based on subtracting a tiny random value from each tied survival time seems to be an attractive alternative.
This method performs surprisingly well when compared to more formal ways of handling tied events, taking far less time to perform calculations; thus it might be considered as well, especially for certain preliminary analyses that do not require a strong theoretical foundation but where results are expected quite quickly.
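The difference between the Breslow and Efron approximations can be made concrete with the standard textbook forms of their contributions to the partial likelihood at a single tied event time. The notation is an assumption introduced for this illustration and may differ from the symbols used elsewhere in the paper: $R_j$ is the risk set at time $t_j$, $D_j$ is the set of $d_j$ subjects with an event at $t_j$, $x_l$ is the covariate vector of subject $l$, and $s_j = \sum_{l \in D_j} x_l$.

```latex
L_j^{\mathrm{Breslow}}(\beta)
  = \frac{\exp(\beta' s_j)}
         {\left[\sum_{l \in R_j} \exp(\beta' x_l)\right]^{d_j}},
\qquad
L_j^{\mathrm{Efron}}(\beta)
  = \frac{\exp(\beta' s_j)}
         {\prod_{k=1}^{d_j}\left[\sum_{l \in R_j} \exp(\beta' x_l)
            - \frac{k-1}{d_j}\sum_{l \in D_j} \exp(\beta' x_l)\right]}
```

As a small worked case, take three subjects at risk, two of whom have tied events, and set $\beta = 0$ so that every risk score equals 1. The Breslow contribution is $1/3^2 = 1/9$, while the Efron contribution is $1/\bigl[(3-0)(3-\tfrac{1}{2}\cdot 2)\bigr] = 1/6$. If the two events had in fact occurred one after the other, any specific ordering would have probability $\tfrac{1}{3}\cdot\tfrac{1}{2} = 1/6$, which the Efron form reproduces here. The Breslow form keeps both tied subjects in the risk set for both events, which is one way to see why it tends to drift further from the exact result as the number of ties grows.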

References

Andersen, P.K., Borgan, Ø., Gill, R.D., Keiding, N., 1993, Statistical Models Based on Counting Processes, Springer-Verlag, New York.
Breslow, N., 1974, Covariance Analysis of Censored Survival Data, Biometrics, vol. 30, no. 1, pp. 89-99.
Cox, D.R., 1972, Regression Models and Life-Tables, Journal of the Royal Statistical Society. Series B (Methodological), vol. 34, no. 2, pp. 187-220.
Efron, B., 1977, The Efficiency of Cox's Likelihood Function for Censored Data, Journal of the American Statistical Association, vol. 72, no. 359, pp. 557-565.
Fleming, T.R., Harrington, D.P., 1991, Counting Processes and Survival Analysis, John Wiley & Sons, New York.
Hertz-Picciotto, I., Rockhill, B., 1997, Validity and Efficiency of Approximation Methods for Tied Survival Times in Cox Regression, Biometrics, vol. 53, no. 3, pp. 1151-1156.
Hosmer, D.W., Lemeshow, S., 1999, Applied Survival Analysis. Regression Modelling of Time to Event Data, John Wiley & Sons, New York.
Kalbfleisch, J.D., Prentice, R.L., 2002, The Statistical Analysis of Failure Time Data, John Wiley & Sons, New York.
Nardi, A., Schemper, M., 2003, Comparing Cox and Parametric Models in Clinical Studies, Statistics in Medicine, vol. 22, pp. 3597-3610, DOI: 10.1002/sim.1592.

Appendix

/*STEP 1 Modify time variable for tied events by subtracting a small value: uniform(0, 1)/1000*/
proc sql;
   /* number of events (censor = 1) at each distinct time */
   create table dist as
   select time, count(*) as count
   from Tied_events
   where censor = 1
   group by time;
   /* attach the tie count to every record */
   create table Tied_events as
   select a.*, b.count
   from Tied_events as a left join dist as b
   on a.time = b.time;
quit;
data Tied_events;
   set Tied_events;
   /* only event times shared by more than one subject are modified */
   if count > 1 and censor = 1 then time_mod = time - ranuni(1234)/1000;
   else time_mod = time;
run;
/*STEP 2 Estimation*/
/*Exact expression: option ties = exact*/
ods output ParameterEstimates = est_ex FitStatistics = fit_ex;

proc phreg data = Tied_events;
   model Time*Censor(0) = Age Ind1 Ind2 Ind3 Ind4 Ind5 Ind6 / ties = exact;
run;
ods output close;
/*Breslow approximation: option ties = Breslow*/
ods output ParameterEstimates = est_br FitStatistics = fit_br;
proc phreg data = Tied_events;
   model Time*Censor(0) = Age Ind1 Ind2 Ind3 Ind4 Ind5 Ind6 / ties = Breslow;
run;
ods output close;
/*Efron approximation: option ties = Efron*/
ods output ParameterEstimates = est_ef FitStatistics = fit_ef;
proc phreg data = Tied_events;
   model Time*Censor(0) = Age Ind1 Ind2 Ind3 Ind4 Ind5 Ind6 / ties = Efron;
run;
ods output close;
/*Discrete model: option ties = discrete*/
ods output ParameterEstimates = est_dis FitStatistics = fit_dis;
proc phreg data = Tied_events;
   model Time*Censor(0) = Age Ind1 Ind2 Ind3 Ind4 Ind5 Ind6 / ties = discrete;
run;
ods output close;
/*Random value method: modified time variable*/
ods output ParameterEstimates = est_ran FitStatistics = fit_ran;
proc phreg data = Tied_events;
   model Time_mod*Censor(0) = Age Ind1 Ind2 Ind3 Ind4 Ind5 Ind6 / ties = Breslow;
run;
ods output close;
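
To view the fit statistics of the five models side by side, the ODS output datasets created above could be stacked into a single table. The following is only a sketch, not part of the original program: it assumes the FitStatistics datasets are named fit_ex, fit_br, fit_ef, fit_dis and fit_ran, while the dataset name fit_all and the variable method are hypothetical names introduced here.

```sas
/* Sketch: stack the FitStatistics tables and label each row with its method. */
/* fit_all and method are hypothetical names, not taken from the paper.       */
data fit_all;
   length method $ 12;
   set fit_ex  (in = in_ex)
       fit_br  (in = in_br)
       fit_ef  (in = in_ef)
       fit_dis (in = in_dis)
       fit_ran (in = in_ran);
   if in_ex then method = 'Exact';
   else if in_br then method = 'Breslow';
   else if in_ef then method = 'Efron';
   else if in_dis then method = 'Discrete';
   else method = 'Random value';
run;

proc print data = fit_all noobs;
run;
```

The same pattern should work for the ParameterEstimates datasets if the parameter estimates and standard errors are to be compared in one listing.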