A piecewise Markov process for analysing survival from breast cancer in different risk groups




STATISTICS IN MEDICINE
Statist. Med. 2001; 20:109–122

Rafael Pérez-Ocón*, Juan Eloy Ruiz-Castro and M. Luz Gámiz-Pérez
Departamento de Estadística e Investigación Operativa, Universidad de Granada, 18071 Granada, Spain

* Correspondence to: Rafael Pérez-Ocón, Departamento de Estadística e Investigación Operativa, Universidad de Granada, 18071 Granada, Spain. E-mail: rperezo@ugr.es
Contract/grant sponsor: DGES; contract/grant number: PB97-0827
Received March 1998; accepted January 2000
Copyright © 2001 John Wiley & Sons, Ltd.

SUMMARY

A study of the relapse and survival times of 300 breast cancer patients who underwent post-surgical treatment is presented. After surgery, these patients were given three treatments, chemotherapy, radiotherapy and hormonal therapy, alone or in combination. From the data set, a non-homogeneous Markov model is selected as suitable for the evolution of the disease. The model is applied considering two time periods during the observation of the cohort in which the disease is well differentiated with respect to death and relapse. The effect of the treatments on the patients is introduced into the model via the transition intensity functions. A piecewise Markov process is applied, the likelihood function is built and the parameters are estimated, following a parametric procedure. As a result, a survival table for different treatments is given, and survival functions for different treatments are plotted and compared with the corresponding empirical survival functions. The fit of the different curves is good, and predictions can be made of the survival probabilities after post-surgical treatment for different risk groups.

1. INTRODUCTION

The introduction of dynamic models in survival studies presents certain advantages in this field of application, since the disease evolves over time and the subjects must therefore be followed up over a certain period. It is characteristic of survival studies that data are frequently censored; moreover, factors that affect the lifetime evolution of the subjects must sometimes be introduced into the model when they are known.

Stochastic processes are helpful when subjects in a population can spend time in different states and the sojourn times in those states are of interest. This is the case for cancer patients under treatment: they can be in different states, and both the sojourn times in these states and the changes among states are of interest. When the present state of the disease summarizes all the previous information, the Markov model is appropriate. Moreover, this model has the advantage of flexibility in the incorporation of censored data and in the introduction of risk factors.

In survival studies, homogeneous Markov models in continuous time have been used by various authors to analyse different diseases [1–4]. These works and the references therein show that such models are a good approach in this field and provide useful insights. The hypothesis of homogeneity is unrealistic in some cases, since the disease evolves as time goes on. Several approaches to statistical inference in non-homogeneous Markov processes have been followed using non-parametric methods [5–7].

In this paper we propose a continuous-time non-homogeneous Markov process to study the evolution of breast cancer in a cohort of 300 patients. The non-homogeneous approach considered is a piecewise Markov process, with the transition intensity functions being step functions. We are interested in the survival evolution of different groups of patients determined by the treatment. These treatments are incorporated as covariates via the transition intensity functions of the Markov process. There are three treatments, chemotherapy, radiotherapy and hormonal therapy, and combinations of these three. Several researchers have studied Markov processes with covariates, and a procedure to obtain the parameters in a model with covariates has been reported [1, 2]. The methodology we follow in this work is a parametric procedure [1], different from other non- and semi-parametric methods [7]. In our model, we consider a three-dimensional covariate vector. Parametric methods are applied, the likelihood function is constructed and the transition intensity functions are estimated. These functions depend on time because of the non-homogeneity, and the effects of the risk factors considered are included in these functions in a similar way to the Cox model [8].
The risk groups considered are the different classes of treatments that the patients received after surgery. We examine the influence of post-surgical treatments in these groups, particularly on their lifetimes and relapse times.

The paper is organized as follows. In Section 2 we describe the data and the model. In Section 3 the likelihood function is built, parameters are estimated, and survival functions for different treatments are given and compared with the empirical survival curves. A goodness-of-fit test between theoretical and empirical curves is applied, showing that the model fits well. In Section 4 we draw several conclusions from the work.

2. SUBJECTS AND METHODS

Data on a cohort of subjects who underwent surgical treatment of breast cancer were obtained from the Department of Radiology of the Hospital Clínico of Granada. The follow-up dates from 1973, and all non-censored subjects were seen longitudinally until December 1995. A total of 518 subjects who underwent surgical treatment were monitored. Entry into the study began at the time of surgery. Different treatments were then given to the patients. Three states were considered in the evolution of the disease: no relapse (state 1), relapse (state 2) and death (state 3), which is an absorbing state. A test of the Markov assumption, using a competing-risks Cox model, was performed [9, 10]. The p-value calculated from the data was 0.1119, so there is no empirical evidence for rejecting the null hypothesis of the Markov assumption.

From this cohort, a subcohort of 300 patients with breast cancer was considered. These 300 patients were selected because they received the same chemotherapy treatment, when applied, and had at least one axillary node affected. The observation period for subjects in this cohort was at least five years. This cohort was studied

because it was deemed informative by the group of oncologists with whom we are working. Breast cancer is the most common cancer in the female population, not only in Spain but worldwide. The usual treatments include the ones considered here, and relapse and mortality occur more frequently in patients with some axillary nodes affected. Transition rates to relapse and death for different risk groups are estimated, and these are quantities with which epidemiologists are familiar [1, 2, 18]. With the multi-state model we use, it is possible to learn more about how the different treatments affect the evolution of the disease over time.

The mean age in our cohort was 52.5 years, and the total number of censored patients was 122 (40.7 per cent of the total). In the first four years, 195 survived (170 in state 1 and 25 in state 2) and 105 died (67 in state 1 and 38 in state 2), with no censored data in this period. From four to ten years there were 70 deaths (45 in state 1 and 25 in state 2), 25 censored (24 in state 1 and 1 in state 2), 12 relapses (4 per cent of total data), and 100 patients still alive after ten years. After ten years, there were 3 deaths (2 in state 1 and 1 in state 2) and 97 censored (86 in state 1 and 11 in state 2), and only one woman relapsed in this period. It is thus evident that the deaths and relapses occurred essentially in the first ten years. Most censored data are in state 1 (25 in the first ten years and 97 thereafter).

After surgery, three specific treatments were applied to these patients, chemotherapy (CT), radiotherapy (RT) and hormonal therapy (HT), as well as combinations of these three. No treatment was given to patients transferred to the hospital from other places. The treatment RT-HT-CT was applied to 39 patients, RT-CT to 110 and RT alone to 50, while 47 had no treatment; the remaining possible treatments were each applied to fewer than 22 patients.

2.1. The Markov model

Let $\{X(t);\ t \ge 0\}$ be a Markov process with state space $E$, where $X(t)$ denotes the state of the disease at time $t \ge 0$. Homogeneous Markov processes have frequently been used in the study of the evolution of diseases. For this model, the transition probabilities $p_{ij}(t)$ are defined by

$$p_{ij}(t) = P\{X(t+s) = j \mid X(s) = i\}.$$

This expression does not depend on $s$, which means that the transition $i \to j$ over an interval of length $t$ has the same probability at any time. This model has been studied previously by the authors [3]. Under this model, in our case, the relapse probability (transition $1 \to 2$) in one year is the same for a patient who has just entered state 1 as for a patient who has been in state 1 for some time (for example, two years).

We also followed another approach to the evolution of breast cancer, introducing non-homogeneous Markov processes. A non-homogeneous Markov process with state space $E$ was considered. The transition probability functions in this model are denoted by $p_{ij}(s, t)$, where $0 \le s \le t$:

$$p_{ij}(s, t) = \mathrm{Prob}\{X(t) = j \mid X(s) = i\}, \quad i, j \in E,$$

where $p_{ij}(s, s) = \delta_{ij}$ (Kronecker delta). The transition matrix function is $P(s, t) = (p_{ij}(s, t))$. The transition intensity functions are defined by

$$q_{ij}(t) = \lim_{h \to 0} \frac{p_{ij}(t, t+h)}{h}, \quad i, j \in E,\ i \ne j,$$

$$q_i(t) = \lim_{h \to 0} \frac{1 - p_{ii}(t, t+h)}{h}, \quad i \in E.$$

Figure 1. Transitions among states in the Markov model.

The transition intensity function $q_{ij}(t)$ can be interpreted as the rate of change from state $i$ to state $j$ at time $t$, and $q_i(t) = -q_{ii}(t)$ as the exit rate from state $i$ at time $t$. The q-matrix of the process is $Q(t) = (q_{ij}(t))$, which we assume to be conservative, that is, each of its rows sums to zero, so $q_{ii}(t) = -q_i(t) = -\sum_{j \ne i} q_{ij}(t)$ for all $t$. The sojourn time $T_i$ in state $i$ for this Markov process, if the initial state is $i$, satisfies [11]

$$P\{T_i > t\} = \exp\left(-\int_0^t q_i(u)\,du\right).$$

The transitions among states when the model is applied to our data set are shown in Figure 1.

2.2. Model selection

The effect of the treatments is incorporated into the model via the transition intensity functions [12]. A column vector $z$ of observed covariates is considered ($z'$ is the transpose of $z$). Each component of this vector indicates whether or not a treatment was applied. In this way, the data set is partitioned into different subcohorts, with particular attention paid to the evolution of these subcohorts. For a covariate vector $z$, the transition intensity function between states $i$ and $j$ in the non-homogeneous case is

$$q_{ij}(t; z) = \lambda_{ij}(t) \exp\big(z'\beta_{ij}(t)\big), \quad \lambda_{ij}(t) \ge 0.$$

In this expression, $\beta_{ij}(t)$ is the column vector of parameters associated with the covariate vector $z$ in the transition between states $i$ and $j$ at time $t$, and $q_{ij}(t; z)$ is the transition rate at $t$ for subjects characterized by the vector $z$. The scalar product of the vectors is denoted by $z'\beta_{ij}(t)$. These quantities are interpreted as in the Cox model [8]: $\beta_{ij}(t)$ contains the regression coefficients of the covariates on the transition rates, and $\lambda_{ij}(t)$ is the baseline transition intensity function between states $i$ and $j$ at time $t$, that is, the transition rate when $z$ is the null vector. The transition probability function from state $i$ at time $s$ to state $j$ at time $t$ is denoted by $p_{ij}(s, t; z)$.
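The sojourn-time formula of Section 2.1 has a simple closed form when the exit rate is a step function, because the integral becomes a sum of rate times time spent in each interval. The following is a minimal sketch of that calculation, not the authors' code; the function names are hypothetical, and the rates in the usage note are illustrative values only.

```python
import math

def cumulative_exit_rate(t, cutpoints, rates):
    """Integrate a step exit rate q_i(u) over [0, t].

    cutpoints: increasing interior cutpoints (hypothetical argument name)
    rates:     one constant rate per interval, len(cutpoints) + 1 values
    """
    total, left = 0.0, 0.0
    # Pair each interval's right end with its rate; the last interval is unbounded.
    for right, rate in zip(list(cutpoints) + [math.inf], rates):
        if t <= left:
            break
        total += rate * (min(t, right) - left)
        left = right
    return total

def sojourn_survival(t, cutpoints, rates):
    """P{T_i > t} = exp(-integral from 0 to t of q_i(u) du)."""
    return math.exp(-cumulative_exit_rate(t, cutpoints, rates))
```

For instance, `sojourn_survival(96, [48], [0.006, 0.0051])` evaluates the probability of still being in the initial state after 96 months when the exit rate is 0.006 per month up to month 48 and 0.0051 per month afterwards (illustrative numbers).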
This is the non-homogeneous model with covariates. Our aim is to calculate the transition probability functions and the survival functions for groups of patients given the same treatment, each group characterized by a certain value of the covariate vector. This was done by estimating the parameters by maximum likelihood.

One approach to non-homogeneity in a Markov process is the stepwise method. The model we present assumes that transition rates are constant within different time intervals, so that covariate effects are included and parameters are allowed to change over fixed intervals of the time domain of the study. If we consider a piecewise q-matrix on the disjoint intervals determined by the cutpoints $a_1, a_2, \ldots, a_k$ that approximate the non-homogeneity, with $a_0 = 0$ and $a_k = \infty$, the q-matrix of the model can be expressed in general form as

$$Q(t; z) = \begin{cases} Q_1(z), & 0 = a_0 \le t \le a_1, \\ Q_l(z), & a_{l-1} < t \le a_l,\quad l = 2, 3, \ldots, k. \end{cases}$$

3. RESULTS

The general model was introduced in Section 2, where we applied the homogeneous Markov process to our data set [3]. Our aim is to construct a more accurate model for representing the survival probabilities of the different groups of patients. A test for homogeneity in Markov processes is applied, considering all the transitions among states, thus extending a previous work [13]. The observed value of the test statistic, based on the asymptotic distribution of the score vector of the likelihood function, was 18.16, with a p-value < 0.0001, so homogeneity is rejected and the hypothesis of non-homogeneity is retained.

Sojourn times in states play an important role in Markov processes, and thus the behaviour of these quantities is relevant in these studies. In our case, these times are the time before relapse or death and the time from relapse until death, and our interest was to establish the influence of treatments on these times. We studied the cumulative hazard functions of the sojourn times and observed the relapse times from state 1 and the death times from states 1 and 2. From these empirical functions, two periods can be established in the evolution of breast cancer: up to 48 months and after 48 months. After 120 months there was no information because there were few transitions.

Table I. Deaths and censored patients for each period.

                        0 ≤ t ≤ 48 months   t > 48 months
Survivors in state 1          170                 89
Survivors in state 2           25                 11
Deaths from state 1            67                 47
Deaths from state 2            38                 26
Censored in state 1             0                110
Censored in state 2             0                 12
Relapse                        63                 13
Consequently, two periods were considered in the evolution of each patient: from entry into observation ($t = 0$) up to $t = 48$ months, and after 48 months. Table I presents a summary of the data by period.

3.1. Likelihood function and parameter estimation

The likelihood function we calculate is an adaptation of previous work to the piecewise model [6, 14]. We assume that $n$ items are observed, all beginning in state 1, and that item $i$ has $m_i$ changes of state, the last time being death or censoring, so that we have a sequence of times for each item $i$: $0 = t_{i,1} < t_{i,2} < \cdots < t_{i,m_i}$. At these times, the item successively occupied the states $1 = x^i_1, \ldots, x^i_{m_i}$. The value of the covariate vector for item $i$ is known and denoted by $z_i$. Each patient contributes different factors to the likelihood function according to her sample path.

Therefore, if the observed transition interval for a patient lies between two cutpoints, the contribution to the likelihood is the transition probability with the corresponding q-matrix. If the observed transition interval contains one cutpoint, the contribution is the product of two factors: the transition probability over the interval between the instant of the jump and the cutpoint, and that from the cutpoint to the next jump or censoring, with the corresponding q-matrix in each period. If the observed transition interval contains $k$ cutpoints, the contribution is the product of $k + 1$ transition probabilities with the corresponding q-matrices. The last observed time may be a death time or a censoring time; in the first case, the last factor is a transition probability to the absorbing state and, in the second case, a survival probability in the last state visited. The likelihood function for this data set is

$$L = \prod_{i=1}^{n} \prod_{r=2}^{m_i} p_{x^i_{r-1},\,x^i_r}(t_{i,r-1}, t_{i,r}; z_i).$$

This model can now be applied to our data set. We assume that the q-matrix changes at time $t = 48$ months (four years) and that it is constant on each of the intervals $[0, 48]$ and $]48, \infty[$ determined by this partition. The evolution of the patients is progressive: subjects go from state 1 to states 2 or 3, and from state 2 to state 3, but do not return to a previous state, so the only observed transitions are $1 \to 2$, $1 \to 3$ and $2 \to 3$. The covariate vector is a three-vector $z' = (z_1, z_2, z_3) = (\mathrm{RT}, \mathrm{HT}, \mathrm{CT})$, with $z_h$ a dichotomous variable for $h = 1, 2, 3$, taking the value 1 if the corresponding treatment was not applied and 0 if it was. The effect of treatment $h$ on the transition $i \to j$ is measured by the coefficient $\beta^h_{ij}$, $h = 1, 2, 3$, so we have $\beta_{ij}(t)' = (\beta^1_{ij}, \beta^2_{ij}, \beta^3_{ij})$ on each of the intervals $[0, 48]$ and $]48, \infty[$. The interpretation of the parameters $\lambda$ and $\beta$ in each period is as in the homogeneous case [3].
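The rule for likelihood contributions described above can be sketched for a single patient under the two-period model. This is an illustrative sketch under stated assumptions, not the authors' MATLAB implementation: the function names are hypothetical, the closed-form transition probabilities are the standard solution for a progressive three-state model with constant rates (they are not quoted from the paper), and censoring is encoded as a final "transition" from the last state to itself.

```python
import math

def trans_prob(rates, t):
    """Transition matrix over a time span t for constant rates (q12, q13, q23).

    Closed form for the progressive model 1 -> 2 -> 3 and 1 -> 3;
    it requires q12 + q13 != q23 (assumption of this sketch).
    """
    a, b, c = rates
    p11 = math.exp(-(a + b) * t)
    p22 = math.exp(-c * t)
    p12 = a / (a + b - c) * (p22 - p11)
    return [[p11, p12, 1.0 - p11 - p12],
            [0.0, p22, 1.0 - p22],
            [0.0, 0.0, 1.0]]

def factor(s, t, i, j, rates1, rates2, cut=48.0):
    """Likelihood factor p_ij(s, t) for one observed interval (states are 1-based)."""
    if t <= cut:                      # interval lies in the first period
        return trans_prob(rates1, t - s)[i - 1][j - 1]
    if s >= cut:                      # interval lies in the second period
        return trans_prob(rates2, t - s)[i - 1][j - 1]
    # Interval contains the cutpoint: remain in i up to the cutpoint, then reach j.
    stay = trans_prob(rates1, cut - s)[i - 1][i - 1]
    return stay * trans_prob(rates2, t - cut)[i - 1][j - 1]

def path_likelihood(times, states, rates1, rates2, cut=48.0):
    """Product of the factors over consecutive observation times of one patient."""
    value = 1.0
    for r in range(1, len(times)):
        value *= factor(times[r - 1], times[r], states[r - 1], states[r],
                        rates1, rates2, cut)
    return value
```

As a usage example, a patient with rates `(0.0042, 0.0018, 0.1000)` before month 48 and `(0.00072, 0.0044, 0.0191)` afterwards (the RT-HT-CT row of Table IV) who relapses at month 30 and dies at month 60 contributes `path_likelihood([0, 30, 60], [1, 2, 3], ...)`; a patient censored in state 1 at month 100 contributes `path_likelihood([0, 100], [1, 1], ...)`.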
The transition intensity matrix for this model is

$$Q(t; z) = \begin{pmatrix} -\big(\lambda_{12}(t)e^{z'\beta_{12}(t)} + \lambda_{13}(t)e^{z'\beta_{13}(t)}\big) & \lambda_{12}(t)e^{z'\beta_{12}(t)} & \lambda_{13}(t)e^{z'\beta_{13}(t)} \\ 0 & -\lambda_{23}(t)e^{z'\beta_{23}(t)} & \lambda_{23}(t)e^{z'\beta_{23}(t)} \\ 0 & 0 & 0 \end{pmatrix}.$$

The entries of this matrix can be estimated by maximum likelihood, taking into account that the matrix is constant on each of the two intervals of the partition determined by the time 48. The computations were carried out in MATLAB, using the Nelder–Mead algorithm [15]. The different factors appearing in the likelihood function are given in the Appendix. The parameters estimated by this procedure, with their standard errors, are given in Tables II and III. The transition rates among states for the different subcohorts determined by the applied treatments are calculated from these tables and are given in Table IV. Some of the estimated parameters are close to zero, and it is possible to improve the model specification.

3.2. Survival probabilities

The forward Kolmogorov equations in this case are

$$\frac{\partial}{\partial t} P(s, t; z) = P(s, t; z)\,Q(t; z).$$

We can derive the transition probability functions from these equations if we know the q-matrix $Q(t; z)$. The mathematical expression of the estimated survival probabilities for patients in state 1 is given in the Appendix. We plot the survival curves for different treatments, compare them with the corresponding empirical survival curves, and test the fit.
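To make the step from Tables II and III to Table IV concrete, the sketch below rebuilds the first-period q-matrix for a treatment group from the reported baseline rates and regression coefficients. It is a hedged illustration rather than the authors' program: the function names are hypothetical, only the period 0 ≤ t ≤ 48 months is coded, and because the published estimates are rounded the results agree with Table IV only to about the third decimal place.

```python
import math

# First-period estimates (0 <= t <= 48 months) as reported in Tables II and III.
BASELINE = {(1, 2): 0.0042, (1, 3): 0.0018, (2, 3): 0.1000}
BETA = {
    (1, 2): (1.5534, 0.2908, 0.6340),
    (1, 3): (-0.1657, -0.4543, 1.2835),
    (2, 3): (-0.9038, -0.2090, -0.1639),
}

def covariate_vector(applied):
    """z = (RT, HT, CT), coded 1 when the treatment was NOT applied and 0 when it was."""
    return tuple(0 if name in applied else 1 for name in ("RT", "HT", "CT"))

def intensity(i, j, z):
    """q_ij(z) = lambda_ij * exp(z' beta_ij) for the first period."""
    linear = sum(zh * bh for zh, bh in zip(z, BETA[i, j]))
    return BASELINE[i, j] * math.exp(linear)

def q_matrix(z):
    """Conservative q-matrix of the progressive three-state model."""
    q12, q13, q23 = intensity(1, 2, z), intensity(1, 3, z), intensity(2, 3, z)
    return [[-(q12 + q13), q12, q13],
            [0.0, -q23, q23],
            [0.0, 0.0, 0.0]]

for group in (("RT", "HT", "CT"), ("RT", "CT"), ("CT",), ()):
    Q = q_matrix(covariate_vector(group))
    label = "-".join(group) or "No treatment"
    print(label, [round(Q[0][1], 4), round(Q[0][2], 4), round(Q[1][2], 4)])
```

The coding of $z$ matters here: the group receiving all three treatments has $z = 0$, so its rates equal the baseline rates, and every omitted treatment multiplies a rate by the exponential of the corresponding coefficient.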

Table II. Maximum-likelihood estimates and standard errors of baseline transition rates among states.

Parameters            0 ≤ t ≤ 48 months   t > 48 months
$\lambda_{12}(t)$         0.0042              0.0007
(SE)                     (0.0017)            (0.0005)
$\lambda_{13}(t)$         0.0018              0.0044
(SE)                     (0.0022)            (0.0014)
$\lambda_{23}(t)$         0.1000              0.0191
(SE)                     (0.0427)            (0.0105)

Table III. Maximum-likelihood estimates and standard errors of regression coefficients.

                      0 ≤ t ≤ 48 months                        t > 48 months
Parameters        $\beta^1_{ij}$  $\beta^2_{ij}$  $\beta^3_{ij}$      $\beta^1_{ij}$  $\beta^2_{ij}$  $\beta^3_{ij}$
$\beta_{12}(t)$    1.5534    0.2908    0.6340         1.0684    0.4020    0.6560
(SE)              (0.3020)  (0.3183)  (0.2747)       (0.6150)  (0.6670)  (0.5760)
$\beta_{13}(t)$   −0.1657   −0.4543    1.2835        −0.6465    0.0702   −0.2380
(SE)              (1.1846)  (0.8241)  (1.0707)       (0.7739)  (0.3591)  (0.4017)
$\beta_{23}(t)$   −0.9038   −0.2090   −0.1639         1.1227    0.1395   −0.9415
(SE)              (0.3289)  (0.3940)  (0.3205)       (0.0002)  (0.6050)  (0.6111)

Table IV. Maximum-likelihood estimates of transition rates among states.

                  $\hat q_{12}(t)$               $\hat q_{13}(t)$               $\hat q_{23}(t)$
Treatment       0 ≤ t ≤ 48   t > 48       0 ≤ t ≤ 48   t > 48       0 ≤ t ≤ 48   t > 48
                months       months       months       months       months       months
CT              0.0269       0.0031       0.00098      0.0025       0.0329       0.0675
RT-CT           0.0057       0.0011       0.0012       0.0047       0.0811       0.0220
HT-CT           0.0201       0.0021       0.0015       0.0023       0.0405       0.0588
RT-HT-CT        0.0042       0.00072      0.0018       0.0044       0.1000       0.0191
No treatment    0.0507       0.0060       0.0035       0.0020       0.0279       0.0263
RT              0.0107       0.0021       0.0042       0.0037       0.0689       0.0086
HT              0.0379       0.0040       0.0056       0.0018       0.0344       0.0229
RT-HT           0.0080       0.0014       0.0066       0.0035       0.0849       0.0075

In Figures 2 to 5, we give the survival functions for the treatments RT-HT-CT, RT-CT, RT, CT and no treatment, together with the corresponding empirical survival curves obtained with the product-limit estimate [16]. The graphs cover a period of ten years. For each graph, the fit has been tested using the likelihood ratio test [17].

Figure 2. Empirical and theoretical curves for RT-HT-CT.

Figure 3. Empirical and theoretical curves for RT-CT.

When data are censored, one way to perform this test is to apply the usual likelihood ratio test in each interval determined by a lifetable associated with the data. To test the equality of the two graphs in the general case, a lifetable with $k$ intervals is calculated and compared with the theoretical curve obtained from the model. The null hypothesis $H_0$ is that the survival probabilities in both curves are identical in each interval of the lifetable. The test statistic is

$$\Lambda = 2\sum_{j=1}^{k} d_j \log\frac{d_j}{d_{j0}} + 2\sum_{j=1}^{k} (n_j - d_j) \log\frac{n_j - d_j}{n_j - d_{j0}}.$$
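The likelihood ratio statistic compares, interval by interval, the deaths observed in the lifetable with the deaths expected under the model. A minimal sketch of that computation follows, assuming the lifetable is available as three parallel lists; the function name, the argument names and the convention that a term with zero count contributes zero are choices of this sketch rather than details given in the paper.

```python
import math

def lr_statistic(at_risk, deaths, model_cond_survival):
    """Likelihood ratio statistic between a lifetable and a model survival curve.

    at_risk[j]             patients at risk at the start of interval j (n_j)
    deaths[j]              observed deaths in interval j (d_j)
    model_cond_survival[j] model probability of surviving interval j,
                           given survival to its start (S_j0)
    """
    stat = 0.0
    for n, d, s0 in zip(at_risk, deaths, model_cond_survival):
        d0 = n * (1.0 - s0)                    # expected deaths under the model
        if d > 0:                              # a zero count contributes nothing
            stat += 2.0 * d * math.log(d / d0)
        if n - d > 0:
            stat += 2.0 * (n - d) * math.log((n - d) / (n - d0))
    return stat
```

When observed and expected deaths coincide in every interval the statistic is zero, and it grows as the empirical and model curves diverge; the value would then be referred to the chi-square bounds discussed in the text.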

Here $n_j$ is the number of patients at risk at the beginning of interval $I_j$, $d_j$ the number of deaths in $I_j$, $d_{j0} = n_j(1 - S_{j0})$, and $S_{j0}$ the probability, in the model, of surviving interval $I_j$ given survival to interval $I_{j-1}$. It is known that the limit distribution of the statistic, under the null hypothesis $H_0$, is bounded by a $\chi^2$ distribution with $k$ d.f. and a $\chi^2$ distribution with $k - s$ d.f., $k$ being the number of intervals considered and $s$ the number of parameters. In this study $s = 24$ (the six baseline rates of Table II and the eighteen regression coefficients of Table III), so we take the lifetable in intervals of three months for all curves, and the number $k$ is greater than 24 in all cases. Moreover, three months is a common period of observation in medical studies.

Figure 4. Empirical and theoretical curves for RT.

Figure 5. Empirical and theoretical curves for no treatment.

Table V. Values of the likelihood ratio test statistic for the fit of the survival functions for each treatment to the empirical survival functions.

Treatment       $k$    $\Lambda$    p-value with $\chi^2_k$   p-value with $\chi^2_{k-s}$
RT-HT-CT        33    31.2032        0.5568                  0.0003
RT-CT           33    48.3538        0.0412                 <0.0001
RT              31    31.6023        0.4362                 <0.0001
CT              28    19.7415        0.8737                  0.0006
No treatment    43    24.6187        0.9890                  0.1735

Table VI. Survival probabilities for treatments at 4, 8, 10 and 12 years following the piecewise Markov model.

Survival probability   4 years   8 years   10 years   12 years
RT-HT-CT               0.7807    0.6122    0.5418     0.4794
RT-CT                  0.7737    0.5832    0.5068     0.4406
HT-CT                  0.5786    0.3102    0.2709     0.2418
RT-HT                  0.5512    0.4560    0.4137     0.3748
RT                     0.5793    0.4645    0.4144     0.3689
CT                     0.5638    0.2221    0.1866     0.1617
HT                     0.4074    0.2005    0.1496     0.1158
No treatment           0.4363    0.1626    0.1062     0.0728

The critical values for the treatments are given in Table V. Table V shows that the fit of the curves is good, except perhaps for the RT-CT treatment, and thus the model provides a good approximation to the survival for the treatments shown in Figures 2–5. Table VI gives the survival probabilities for all treatments at different times.

4. CONCLUDING REMARKS

In previous studies of the data set that we present here, the Cox model was applied and the hazard rate functions were determined for different risk groups. A more detailed study is presented in this paper. The application of the Markov model to the data set takes into account several specific considerations about the disease process. First, the Markov hypothesis is tested, and we have empirical confirmation that the transition rates between states are not affected by the previous sojourn time. Therefore, we assume that the Markov model is appropriate. The homogeneous Markov process has been applied by the authors [3].
We considered that the fit of the survival curves to the empirical ones was not very good, and we therefore examined the homogeneity of the model. A test of homogeneity for a Markov process has been applied, and we have no empirical evidence against the non-homogeneity of the data. Therefore, we allow the transition rates to depend on time. Many models can be applied

hereafter. We have considered several models, fitting different distributions to these times. The data set is censored and the fit was generally not adequate. Overall, the graphical fit of the survival curves is better when the two-piece Markov model is considered. These previous studies and the simplicity of the calculations with the stepwise model have led us to consider a model with one cutpoint ($t = 48$).

The non-homogeneous model we first applied had no covariates. However, it seemed too restrictive to consider the different groups of patients to be homogeneous with respect to lifetime; it is more realistic to assume that the treatments introduce some non-homogeneity within the risk groups. Therefore, different groups of patients have been considered, and covariates have been introduced. The Markov process framework developed by Kay [9], continued by Gentleman et al. [1] and applied by Andersen et al. [18] for studying the effect of disease indicators on survival has been used in this paper to construct survival probabilities for different risk groups determined by the treatments for breast cancer, introducing non-homogeneity in time. The present paper is also indebted to the work of Lu et al. [19], where a Markov model is applied to evaluate prognostic factors influencing the course of HIV.

We have performed specific computational implementations of the model. Some of the estimated parameters are close to zero. A test of the null hypothesis that eight parameters are zero has been applied, and this hypothesis cannot be rejected. Consequently, the model can be simplified, but the parameter values and the plotted curves are not significantly different from those of the unsimplified model. We have considered the most general case. As a result, we have obtained the effect of treatments on the evolution of survival probabilities and relapse times for breast cancer in a cohort of patients.
The initial cohort of 518 subjects can be considered random, as it included all the patients who arrived at the hospital. Nonetheless, the treatments are not applied randomly, but depend on the patient. For example, HT is applied when the patient has hormonal receptors. The contribution of our study is to establish relapse and survival curves for patients given treatments after surgery. The group of doctors with whom we are working is interested in the relapse and survival curves of different risk groups. This study can be extended by taking into account not only treatments but also age, number of axillary nodes affected and size of tumour. This would provide an indicator, for patients arriving at the hospital, of their relapse and survival times in terms of the subcohort to which they belong.

The computations were performed using the MATHEMATICA and MATLAB programs, adapting software for a general homogeneous model with $m$ states to the non-homogeneous extension. The parameters are estimated using the Nelder–Mead algorithm [14], and a survival table results for different periods of time and different treatments. Survival functions for different treatments are established and compared with the corresponding empirical survival function [16], using a goodness-of-fit test between the two curves to show that the model fits well.

APPENDIX

Let $\{X(t);\ t \ge 0\}$ be a Markov process that describes the evolution of a disease. The state space $E$ is finite, with one state being absorbing (death) and the remaining states transient. We assume the notation given in Section 3 for change or censoring times and occupied states of the process. The

120 R. P EREZ-OC ON, J. E. RUIZ-CASTRO AND M. L. G AMIZ-P EREZ likelihood function, with the vector covariates incorporated to the model, is L = n m i p x i (t r 1 ;xi i; r 1;t r i; r ; z i ) i=1 r=1 An approach to the non-homogeneous case can be made assuming piecewise constant transition intensity functions. Our estimate is reduced to the tting of homogeneous continuous models to a series of k intervals, each bounded by points through the observation interval. The q-matrix is allowed to change at certain times but is constant between these times, so we consider Q(t; z)=q l (z); a l 1 6t a l ; l=1; 2;:::;k a 1 ;a 2 ;:::;a k k being cutpoints with a 0 = 0 and a k =. For calculations we dene the intervals I j =[a j 1 ;a j [; J q =]a q 1 ;a q ]; j;q=1; 2;:::;k. Q(t; z) Let pij (l; z) be the transition probability function calculated using the q-matrix Q l (z). Then, the factors in the likelihood function above have dierent expressions: (a) If t i; r 1 I j ; t i; r J j (b) If t i; r 1 I j ; t i; r J j+1 p x i r 1 ;xi r (t i; r 1;t i; r ; z i )=p Qj(zi) x i r 1 ;xi r(t i; r t i; r 1 ; z i ) p x i (t r 1 ;xi i; r 1;t r i; r ; z i )=p Qj(zi) (a x i j t i; r 1 ; z i )p Qj+1(zi) (t r 1 ;xi x i i; r a j ; z i ) r 1 r 1 ;xi r (c) If t i; r 1 I j ; t i; r J q ; q j 2 p x i (t r 1 ;xi i; r 1;t r i; r ; z i )=p Qj(zi) x i r 1 r 1(a j t i; r 1 ; z i ) ;xi q 2 u=j p Qu+1(zi) (a x i u+1 a u ; z i )p Qq(zi) r 1 ;xi x i r 1 r 1 r(t i; r a q 1 ; z i ) ;xi The state of the items in the cutpoints is known. For parameter estimation, we must apply the Chapman Kolmogorov equation in the cutpoints to obtain the transition probability functions. 
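As an illustration of how the factors in cases (a) and (b) enter the estimation, the following Python sketch evaluates the likelihood for a single cutpoint and minimises its negative logarithm with the Nelder-Mead simplex method. It is not the authors' MATHEMATICA/MATLAB code: the three-state structure (0 = no relapse, 1 = relapse, 2 = death), the omission of covariates, the log-rate parametrisation, the helper names and the toy histories are all assumptions made for the example.

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize

CUT = 48.0  # single cutpoint a_1, as in the two-piece model


def q_matrix(rates):
    """Assumed progressive 3-state generator; state 2 (death) is absorbing."""
    q01, q02, q12 = rates
    return np.array([[-(q01 + q02), q01, q02],
                     [0.0, -q12, q12],
                     [0.0, 0.0, 0.0]])


def likelihood_factor(x_prev, x_next, t_prev, t_next, rates_by_piece):
    """One factor of the likelihood between consecutive change/censoring times."""
    Q1, Q2 = (q_matrix(r) for r in rates_by_piece)
    if t_next <= CUT:  # case (a), both times in the first piece
        return expm(Q1 * (t_next - t_prev))[x_prev, x_next]
    if t_prev >= CUT:  # case (a), both times in the second piece
        return expm(Q2 * (t_next - t_prev))[x_prev, x_next]
    # case (b): remain in x_prev up to the cutpoint, then move under Q2
    stay = expm(Q1 * (CUT - t_prev))[x_prev, x_prev]
    move = expm(Q2 * (t_next - CUT))[x_prev, x_next]
    return stay * move


def neg_log_lik(log_rates, histories):
    """Negative log-likelihood; rates are exponentiated to keep them positive."""
    rates = np.exp(log_rates).reshape(2, 3)  # row l holds the rates of piece l
    total = 0.0
    for times, states in histories:
        for r in range(1, len(times)):
            p = likelihood_factor(states[r - 1], states[r],
                                  times[r - 1], times[r], rates)
            total -= np.log(max(p, 1e-300))  # guard against log(0)
    return total


# Made-up histories (months, states); a repeated final state means censoring.
histories = [
    ([0, 30, 55], [0, 1, 2]),
    ([0, 60], [0, 2]),
    ([0, 120], [0, 0]),
    ([0, 20, 40], [0, 1, 2]),
    ([0, 70, 110], [0, 1, 1]),
    ([0, 50, 90], [0, 1, 2]),
    ([0, 36], [0, 2]),
    ([0, 100], [0, 0]),
]

start = np.log(np.full(6, 0.01))  # arbitrary starting value for all six rates
fit = minimize(neg_log_lik, start, args=(histories,),
               method="Nelder-Mead", options={"maxiter": 400})
fitted_rates = np.exp(fit.x).reshape(2, 3)
print(fitted_rates)
```

With more than one cutpoint, case (c) would add one "stay" factor for every piece that is crossed completely; the sketch omits it because the two-piece model has a single cutpoint.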
If $P^{Q(t;z)}$ is the transition matrix calculated with the estimated q-matrix $Q(t; z)$, the matrix expression for these probabilities is:

For $s \in I_j$, $t \in J_j$:

$$P(s, t; z) = P^{Q_j(z)}(t - s; z)$$

For $s \in I_j$, $t \in J_{j+1}$:

$$P(s, t; z) = P^{Q_j(z)}(a_j - s; z)\, P^{Q_{j+1}(z)}(t - a_j; z)$$

For $s \in I_j$, $t \in J_q$, $q - j \ge 2$:

$$P(s, t; z) = P^{Q_j(z)}(a_j - s; z) \left[\prod_{u=j}^{q-2} P^{Q_{u+1}(z)}(a_{u+1} - a_u; z)\right] P^{Q_q(z)}(t - a_{q-1}; z)$$
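The three matrix cases can be computed with one loop over the pieces, since within each piece the transition matrix is the matrix exponential of the constant q-matrix times the time spent there. The Python sketch below is a minimal illustration under assumptions: the function name, the numerical intensities and the three-state layout are hypothetical, and covariates are left out.

```python
import numpy as np
from scipy.linalg import expm


def transition_matrix(s, t, cutpoints, q_matrices):
    """P(s, t) for a generator equal to q_matrices[l] on the l-th interval."""
    edges = [0.0, *cutpoints, np.inf]  # a_0 = 0 and a_k = infinity
    P = np.eye(q_matrices[0].shape[0])
    for l, Q in enumerate(q_matrices):
        lo, hi = max(s, edges[l]), min(t, edges[l + 1])
        if hi > lo:  # [s, t] overlaps this piece
            P = P @ expm(Q * (hi - lo))
    return P


# Hypothetical intensities (per month); state 2 is the absorbing state.
Q1 = np.array([[-0.012, 0.010, 0.002],
               [0.0, -0.030, 0.030],
               [0.0, 0.0, 0.0]])
Q2 = np.array([[-0.006, 0.005, 0.001],
               [0.0, -0.020, 0.020],
               [0.0, 0.0, 0.0]])

P_0_60 = transition_matrix(0.0, 60.0, [48.0], [Q1, Q2])
# Chapman-Kolmogorov: splitting at any intermediate time gives the same matrix.
P_split = (transition_matrix(0.0, 30.0, [48.0], [Q1, Q2])
           @ transition_matrix(30.0, 60.0, [48.0], [Q1, Q2]))
print(np.allclose(P_0_60, P_split))
```

Row 0 of the result, summed over the two transient states, is the probability that a patient who starts in the first state is still alive at time t.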

The survival probability to time $t$, $S(t; z)$, for breast cancer patients with covariate vector $z$ is

$$S(t; z) = p_{11}(0, t; z) + p_{12}(0, t; z)$$

As the model has a good fit up to ten years, taking as cutpoints $a_0 = 0$ and $a_1 = 48$, the survival function in the first ten years after surgical treatment is obtained:

If $t \in [0, 48]$:

$$S(t; z) = p^{Q_1(z)}_{11}(t; z) + p^{Q_1(z)}_{12}(t; z)$$

If $t \in \,]48, \infty[$:

$$S(t; z) = p^{Q_1(z)}_{11}(48; z)\left[p^{Q_2(z)}_{11}(t - 48; z) + p^{Q_2(z)}_{12}(t - 48; z)\right] + p^{Q_1(z)}_{12}(48; z)\, p^{Q_2(z)}_{22}(t - 48; z)$$

Some of these functions are represented in Figures 2-5 for the different values of the covariate vector $z$ indicated in Section 3.

ACKNOWLEDGEMENTS

The authors wish to thank the two anonymous referees for many helpful comments and valuable suggestions which improved the presentation and content of the paper. We are also very grateful to Dr Pedraza, head of the Department of Radiology at the University of Granada, for kindly supplying the data. The first author gratefully acknowledges the financial support by DGES, Proyecto PB97-0827, Ministerio de Educación y Cultura, España.

REFERENCES

1. Gentleman RC, Lawless JF, Lindsey JC, Yan P. Multi-state Markov models for analysing incomplete disease history data with illustrations for HIV disease. Statistics in Medicine 1994; 13:805-821.
2. Marshall G, Jones RH. Multi-state models and diabetic retinopathy. Statistics in Medicine 1995; 14:1975-1983.
3. Pérez-Ocón R, Ruiz-Castro JE, Gámiz-Pérez ML. A multivariate model to measure the effect of treatments in survival to breast cancer. Biometrical Journal 1998; 40(6):703-715.
4. Aalen OO, Farewell VT, De Angelis D, Day NE, Gill ON. A Markov model for HIV disease progression including the effect of HIV diagnosis and treatment: application to AIDS prediction in England and Wales. Statistics in Medicine 1997; 16:2191-2210.
5. Aalen OO, Johansen S. An empirical transition matrix for non-homogeneous Markov chains based on censored observations. Scandinavian Journal of Statistics 1978; 5:141-150.
6. Keiding N, Andersen PK. Nonparametric estimation of transition intensities and transition probabilities: a case study of a two-state Markov process. Applied Statistics 1989; 38(2):319-329.
7. Andersen PK, Hansen LS, Keiding N. Non- and semiparametric estimation of transition probabilities from censored observations of a non-homogeneous Markov process. Scandinavian Journal of Statistics 1991; 18:153-167.
8. Cox DR. Regression models and life tables (with discussion). Journal of the Royal Statistical Society, Series B 1972; 34:187-220.
9. Kay R. A Markov model for analysing cancer markers and disease states in survival studies. Biometrics 1986; 42:855-865.
10. Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. Wiley, 1980.
11. Gutiérrez R, Pérez-Ocón R. Remarks of the paper stochastic analysis of Q-matrix (Discussion on the paper Stochastic analysis of Q-matrix by K. L. Chung). In Selected Topics on Stochastic Modelling, Gutiérrez R, Valderrama M (eds). World Scientific, Singapore, 1994; 17-20.
12. Pérez-Ocón R, Gutiérrez Jáimez R, García Leal J, Ollero Hinojosa J. A Markovian model with exponential transition intensities: application to the two sample problem. In Applied Stochastic Models and Data Analysis, Volume 2, Janssen J, Skiadas CH (eds). World Scientific, Singapore, 1993; 728-738.
13. de Stavola BL. Testing departures from time homogeneity in multistate Markov processes. Applied Statistics 1988; 37:242-250.

14. Andersen PK, Borgan Ø, Gill RD, Keiding N. Statistical Models Based on Counting Processes. Springer-Verlag, NY, 1993.
15. Nelder JA, Mead R. A simplex method for function minimization. Computer Journal 1965; 7:308-313.
16. Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. Journal of the American Statistical Association 1958; 53:457-481.
17. Lawless JF. Statistical Models and Methods for Lifetime Data. Wiley, 1982.
18. Andersen PK, Hansen LS, Keiding N. Assessing the influence of reversible disease indicators on survival. Statistics in Medicine 1991; 10:1061-1067.
19. Lu Y, Stitt FW. Using Markov processes to describe the prognosis of HIV-1 infection. Medical Decision Making 1994; 14:266-272.