A problem with the likelihood ratio test for a change-point hazard rate model

Biometrika (1990), 77, 4, pp. 835-843. Printed in Great Britain

BY ROBIN HENDERSON
Department of Mathematics and Statistics, University of Newcastle-upon-Tyne, Newcastle upon Tyne NE1 7RU, U.K.

SUMMARY

A likelihood ratio test for constant hazard against a step change alternative has received considerable attention in recent years. Additional information in the maximum likelihood estimate of the change-point can seriously affect interpretation of test results. Some simple modifications are considered for which exact percentage points can be derived. Monte Carlo power and mean squared error comparisons are encouraging.

Some key words: Change-point; Consistent estimator; Hazard rate; Sufficient partition; Weighted likelihood ratio.

1. INTRODUCTION

Testing a sequence of variables for a change in distribution at an unknown time point has been considered by many authors over many years. See, for example, James, James & Siegmund (1987) for an illustration and references. One aspect which has not received adequate attention, however, concerns the sufficiency of statistics designed to test the null hypothesis of no change in distribution, and consequently the possibility that further information in the data may influence the interpretation of test results. The present paper considers this point in detail for one particular form of change-point problem, that of testing survival data for constant hazard against the alternative of a step change at an unknown time point. To begin with, however, some more general issues are discussed.

The most common form of change-point problem assumes a sequence x_1, ..., x_n of independent random variables. Under the null hypothesis H_0 the variables are identically distributed, but under the alternative hypothesis H_1 there is a change in distribution at some unknown point k in the sequence (1 ≤ k < n).
That is, the first k observations are drawn from one distribution and the remaining (n - k) are drawn from a different distribution. Thus H_1 contains a family of alternatives indexed by a parameter k which disappears under the null hypothesis. In parametric problems a standard method of testing H_0 against H_1 is to construct a likelihood ratio test, which is achieved by taking each potential change-point k in turn and determining for each the usual log likelihood ratio statistic L_k for testing H_0 against H_1. The overall statistic is then

    L = max_k (L_k),    (1)

which implicitly assumes that all values of k are equally likely under the alternative hypothesis. From a Bayesian viewpoint, therefore, a usually unstated component of H_1 is that the prior distribution of k is uniform over the integers 1 to n - 1 inclusive. If any other distribution is presumed then the statistics L_k should be weighted accordingly before the maximization if L is to be considered as a genuine likelihood ratio.

Consider now the sufficiency of L, or more generally of any statistic S designed to test H_0. In accordance with the sufficiency principle (Cox & Hinkley, 1974, p. 37), S is sufficient for H_0 if equally sized samples giving the same value of S always lead to the same conclusions about the validity of H_0. This implies that there is no additional information about H_0 in any other statistic U obtained from the data, which in turn implies that the conditional distribution of U given S = s must be the same under both H_0 and H_1 for all values s. To see this suppose that there is some value u for which

    pr(U = u | S = s; H_0) ≠ pr(U = u | S = s; H_1),

from which we deduce that the posterior odds on H_0 given U = u and S = s are not the same as those given S = s only, thus contradicting the sufficiency principle. Of course if the distribution of U given S = s is unknown under either H_0 or H_1 then lack of formal sufficiency has no practical consequence with regard to conclusions, since the additional information in U could not be interpreted. Similarly, lack of sufficiency may also be acceptable if additional information can only be obtained at the expense of considerable extra effort. Nonetheless it does not seem sensible to ignore any readily available additional information which may influence conclusions.

Suppose therefore that the procedure used to derive S also yields a consistent estimator k̂ of either the change-point k or the proportion k/n, for example at (1) the value of k corresponding to the maximum of the L_k. Would two samples with the same value of S but different values of k̂ always lead to the same posterior degree of belief in H_0? Sufficient conditions for this are that (i) k̂ is independent of S, and (ii) k̂ has a uniform marginal distribution under H_0.
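The construction at (1) can be sketched in code. The following minimal illustration is not from the paper: it treats the simplest case of independent normal observations with known unit variance and a possible mean change after position k, for which twice the log likelihood ratio at k reduces to k(n - k)(x̄_1 - x̄_2)²/n, where x̄_1 and x̄_2 are the means of the first k and last n - k observations.

```python
import random

def max_lr_normal(x):
    """Maximize over k the twice-log likelihood ratio for a mean change
    after position k, known variance 1.  Returns (L, k_hat) as at (1)."""
    n = len(x)
    total = sum(x)
    best, k_hat, s = -1.0, None, 0.0
    for k in range(1, n):
        s += x[k - 1]                          # sum of first k observations
        m1, m2 = s / k, (total - s) / (n - k)  # means before and after k
        L_k = k * (n - k) / n * (m1 - m2) ** 2
        if L_k > best:
            best, k_hat = L_k, k
    return best, k_hat

random.seed(1)
# illustrative data: change after k = 30, mean 0 then mean 1.5
x = [random.gauss(0, 1) for _ in range(30)] + [random.gauss(1.5, 1) for _ in range(30)]
L, k_hat = max_lr_normal(x)
```

Note that the argmax k̂ = k_hat falls out of the maximization for free, which is exactly the additional statistic whose behaviour under H_0 is at issue in the text.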
The reason is that under the composite H_1 all change-points k are equally likely. Since k̂ is consistent for k, all values of k̂ are therefore at least approximately equally likely under H_1, the approximation becoming exact as sample size increases. So if conditions (i) and (ii) hold then the argument of the preceding paragraph shows that k̂ contains no information about H_0. Of course these conditions are stronger than strictly required, condition (i) in particular, and weaker necessary conditions would involve the conditional distribution of k̂ given S. Such conditions would be difficult to verify in general, however, and so it seems reasonable to investigate the at least approximate validity of (i) and (ii) in practical situations.

With these comments in mind consider the change-point hazard rate problem (Matthews & Farewell, 1982). The hazard function λ(t) of a failure time variable T is modelled as

    λ(t) = λ     (t ≤ τ),
    λ(t) = λρ    (t > τ),    (2)

and a test of H_0: ρ = 1 against H_1: ρ ≠ 1 is required with τ unknown. Note that this is not a standard change-point problem in the sense of a change in distribution somewhere in a sequence of random variables as described above, but the treatment is the same, as will be indicated later.

A number of intriguing aspects have been highlighted for this deceptively simple problem. Matthews & Farewell (1982) suggested that the null hypothesis could be tested via a standard likelihood ratio test, but Nguyen, Rogers & Walker (1984) pointed out that the likelihood is unbounded under the alternative hypothesis, since a singularity appears if ρ → ∞ and τ is taken immediately before the largest observation. Matthews & Farewell (1985) removed the singularity by considering the data as being in effect discrete and re-formulating the likelihood as a product of probabilities rather than densities. Yao (1986) also overcame the problem, but this time by simply constraining the estimate of τ not to fall in the interval between the largest two observations. Worsley (1988) made the attractive and sensible observation that the singularity could be removed without other material effect if the largest observation is artificially considered to be censored.

Another problem concerns the distribution of the likelihood ratio test statistic. There are three unknown parameters under the alternative hypothesis and one under the null, so standard asymptotic theory suggests that a χ² distribution with two degrees of freedom should be appropriate for the usual statistic of twice the log likelihood ratio, for large samples at least. Matthews & Farewell (1982) noted that the problem is not regular, so strictly the asymptotics do not apply, but nonetheless they found simulated percentage points to be close to the χ²_2 equivalents. Worsley (1988) showed how exact percentage points could be obtained for the likelihood ratio test statistic with the final observation censored, and tabulated 90%, 95% and 99% values for various sample sizes. These were considerably higher than the corresponding χ²_2 values and did not seem to converge to a finite limit as the sample size was increased. Worsley's method is outlined in more detail in Section 2 below. Section 2 also considers the marginal distribution under H_0 of the rank in the data of the estimated change-point.
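The non-uniform null behaviour of that rank can also be seen by direct simulation. The sketch below is illustrative rather than the paper's exact Noe-algorithm computation: it simulates unit-exponential samples, treats the largest observation as censored, profiles the piecewise-exponential twice-log likelihood ratio over change-points placed just after each observed failure time (the just-before branch is omitted for brevity), and records the rank of the maximizing change-point.

```python
import random
import math

def lr_profile(t):
    """Twice the log likelihood ratio for constant hazard against a step
    change just after t_k, k = 1, ..., n-1, with the largest observation
    treated as censored (after Worsley, 1988).  Sketch: only the
    change-just-after-t_k branch is used."""
    t = sorted(t)
    n = len(t)
    d = n - 1                                   # observed failures
    S_n = sum(t)                                # total time on test
    out = []
    for k in range(1, n):
        S_k = sum(t[:k]) + (n - k) * t[k - 1]   # time on test before t_k
        U = S_k / S_n
        f = lambda a, u: 0.0 if a == 0 else a * math.log(a / (d * u))
        out.append(2.0 * (f(k, U) + f(d - k, 1.0 - U)))
    return out

random.seed(2)
n, reps = 31, 2000
ranks = []
for _ in range(reps):                           # simulate under H_0
    L = lr_profile([random.expovariate(1.0) for _ in range(n)])
    ranks.append(L.index(max(L)) + 1)           # rank of estimated change-point
p_extreme = sum(r in (1, n - 1) for r in ranks) / reps
p_centre = sum(r in (15, 16) for r in ranks) / reps
```

Under the null hypothesis the estimated rank piles up near 1 and n - 1, the U-shape reported in Section 2.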
This indicates that the likelihood ratio test is not sufficient, and therefore some alternative procedures are examined, the object being to find a test statistic for which the corresponding estimated change-point contains no additional information. Some Monte Carlo power and mean squared error comparisons are given in Section 3 and some general remarks in Section 4 complete the paper.

2. LIKELIHOOD RATIO AND ALTERNATIVES

Consider the following situation. A sample of n independent failure times is available with t_1, ..., t_n denoting the ordered values. The largest observation is considered to be censored as suggested by Worsley (1988), but otherwise all failure times are observed. The effect of random censorship is discussed by Matthews & Farewell (1982) and Worsley (1988) and will not be pursued here. Yao (1986) shows that the likelihood under model (2) is maximized over τ by τ̂ either just before or just after one of the observed failure times, that is for τ̂ ∈ {t_k^+, t_k^-; k = 1, ..., n - 1} in an obvious notation. Writing L_k^+ and L_k^- for twice the log likelihood ratio evaluated at τ = t_k^+ and τ = t_k^- respectively, Worsley (1988) shows that, for k = 1, ..., n - 1,

    L_k^+ = 2[k log{k/((n-1)U_k)} + (n-1-k) log{(n-1-k)/((n-1)(1-U_k))}],
    L_k^- = 2[(k-1) log{(k-1)/((n-1)U_k)} + (n-k) log{(n-k)/((n-1)(1-U_k))}],

where 0 log 0 is taken to be zero and

    U_k = S_k/S_n,    S_k = t_1 + ... + t_k + (n-k)t_k,

so that S_k is the total time on test at t_k. Now let L'_k be the greater of L_k^+ and L_k^- for each k. Then

    L_1 = max {L'_k : k = 1, ..., n - 1}

is the likelihood ratio statistic suggested by Matthews & Farewell (1982) for testing H_0 against H_1, which shows the similarity of treatment with the more common problem discussed at (1). Let k̂_1 be the value of k corresponding to the maximum of the {L'_k}, so that k̂_1 denotes the rank in the sample of the estimate of the change-point τ. Note that L_1 can be considered to be a likelihood ratio test of H_0 with all values of k given equal prior probability, which is the assumption made henceforth. This can be shown to be equivalent to the prior assumption that τ has an exponential distribution with rate λ. We concentrate for simplicity in the main on k̂_1 rather than the maximum likelihood estimate τ̂_1, say, of τ, but shall return to τ̂_1 when considering mean squared error in Section 3.

The formulation of L_k^+ and L_k^- in terms of U_k and constants is useful in determining the exact distribution of the statistic L_1. Since both L_k^+ and L_k^- are convex in U_k, the event {L'_k ≤ x} is the same as the event {a_k ≤ U_k ≤ b_k} for some constants a_k and b_k. Hence L_1 ≤ x if and only if a_k ≤ U_k ≤ b_k for each of k = 1, ..., n - 1. Worsley (1988) shows that under H_0 the distribution of U_1, ..., U_{n-1} is the same as that of n - 1 ordered uniform random variables, and so an algorithm given by Noe (1972) can be applied to find pr(a_k ≤ U_k ≤ b_k, k = 1, ..., n - 1). Thus exact percentage points of L_1 can be determined and are given by Worsley for various sample sizes. As mentioned in Section 1 these are not close to the corresponding χ²_2 values and give no indication of converging to a finite limit as the sample size increases.

By considering terms such as pr(x < L'_k ≤ x + δx, L'_j ≤ x, j ≠ k), Noe's algorithm can be adapted to give a numerical procedure for the calculation of the marginal distribution under H_0 of k̂_1. This is given in Fig. 1(a) for sample size n = 31, which gives 30 uncensored observations. The distribution is drawn for clarity as a continuous curve although of course it is discrete with support 1, ..., 30.

Fig. 1. Distribution of rank of change-point estimate under null hypothesis, n = 31: (a) unadjusted, k̂_1; (b) standardized, k̂_2; (c) weighted, k̂_3; (d) combined, k̂_4. Solid line, unconditional upon test; dashed line, conditional upon test exceeding 95% point.

Figure 1(a) also gives equivalent probabilities conditional upon the likelihood ratio statistic L_1 exceeding the 5% critical value, in other words given a Type I error. In both cases a clear U-shaped distribution is seen, with values of k̂_1 near the extremes of the sample many times more likely than near the centre. For the unconditional distribution, for example, the probability of k̂_1 being equal to the median observation is far smaller than the probability of it being equal to the smallest observation, and the same holds for the conditional distribution. Similarly shaped distributions occur for other sample sizes also.

Knowledge of the shape of the distribution of k̂_1 under the null hypothesis clearly affects one's interpretation of a test based on L_1. Consider for illustration a situation in which two equally sized samples from different populations produce identical values of L_1 with a fairly small p-value, say between 1% and 5%. Suppose further that the first sample yields the estimate k̂_1 = 1, whereas the second sample yields k̂_1 = n/2. One is much more inclined to reject the null hypothesis for the second sample than the first, because given a Type I error on L_1 one expects k̂_1 to be near the extremes, whereas under H_1 all values are presumed to be equally likely. Clearly L_1 is not sufficient and there is additional information in k̂_1.

An improved procedure may involve the joint distribution of L_1 and k̂_1. It is not obvious how to exploit the joint distribution, however, and such a technique in any case is likely to be cumbersome in practice. An alternative possibility is to modify the test procedure in such a way that the corresponding estimator k̂ has a uniform distribution under H_0 irrespective of the value of the test statistic. Knowledge of the value of k̂ would then play no role in the interpretation of the test result, as indicated in Section 1 for more general problems. This is in some way achieved by a suggestion made for other reasons by Worsley (1988) that the estimator of τ be constrained to lie between the p-quantile and (1 - p)-quantile of the sample. Inspection of Fig. 1(a) shows that this would give a much flatter distribution of k̂_1 under H_0, which although not uniform would involve a sufficiently small range of values for the difference to be of no practical concern. Worsley's suggestion, also made in a different context by James et al. (1987), has two important drawbacks. First, the choice of p is crucial but arbitrary, and second, even the strongest evidence of a change near the extremes of the sample would be discounted. Instead we consider three methods of modifying the likelihood ratio procedure. Each involves the calculation of a statistic L_i designed to test H_0 and produces as part of the test procedure the rank k̂_i of an estimated change-point τ̂_i (i = 2, 3, 4).

(i) Standardizing the L_k^+, L_k^- terms. Part of the reason for the uneven distribution of k̂_1 is that the terms L_k^+ and L_k^- have moments which depend upon k. Consequently one expects values with larger means, and possibly also larger variances, to produce the maximum.
The mean and variance of either L_k^+ or L_k^- can be obtained using the observation that the marginal distribution of U_k under H_0 is beta with parameters k and (n - k), and that

    E(log U_k) = E{log(1 - U_{n-k})} = -Σ_{i=k}^{n-1} 1/i,
    var(log U_k) = var{log(1 - U_{n-k})} = Σ_{i=k}^{n-1} 1/i²,
    cov{log U_k, log(1 - U_k)} = -Σ_{i=n}^{∞} 1/i².

The first and second of these moment results can be obtained from standard integrals (Gradsteyn & Ryzhik, 1980). The covariance result can be obtained recursively starting at k = 1. Together the results can be used to determine simple expressions for the means and standard deviations of L_k^+ and L_k^-, say μ_k^+, μ_k^-, σ_k^+ and σ_k^-. In turn both means and standard deviations can be shown to be relatively large, as expected, for k near either 1 or n - 1, and relatively small for k near n/2. A sensible adjustment to the likelihood ratio procedure seems to be, therefore, to standardize the L_k^+ and L_k^- terms before maximization over k. Hence an alternative to L_1 as test statistic is the standardized likelihood ratio statistic

    L_2 = max {(L_k^+ - μ_k^+)/σ_k^+, (L_k^- - μ_k^-)/σ_k^- : k = 1, ..., n - 1}.

Calculation of L_2 involves almost negligible extra effort over calculation of L_1, and Noe's algorithm can again be employed to determine the exact distribution of both L_2 and the corresponding rank k̂_2 of the estimate τ̂_2, defined in the obvious way. The marginal distribution of k̂_2 with n = 31 appears in Fig. 1(b) and, whilst giving improved results, a U-shaped pattern can again be seen. Probabilities range from 0.017 at the centre to 0.113 at the extremes when the unconditional case is considered, and upwards from 0.022 conditional upon L_2 exceeding the 95% point. Note that in the conditional case the maximum values occur at observations 2 and n - 2 rather than 1 and n - 1.

(ii) Weighting the likelihood ratio. An alternative possibility is to weight the likelihood ratio so as to give greater credence to the more central values of L'_k.
The ad hoc scheme considered here gives weight {k(n - k)}^{1/2}/(n/2) to the likelihood ratio evaluated at t_k, so that the appropriate statistic is

    L_3 = max [L'_k + log{4k(n - k)/n²} : k = 1, ..., n - 1],

which is again easily obtained. Once more the exact distribution of both L_3 and the rank k̂_3 of τ̂_3 can be determined using Noe's algorithm. The marginal distribution of k̂_3 under H_0 appears in Fig. 1(c) for sample size n = 31, and again there is an improvement over the standard procedure, but still a U-shaped pattern is apparent. The conditional probabilities range upwards from 0.024 for this sample size.

(iii) Combined approach: weighting and standardizing. Both of the preceding procedures give some improvement over the usual methods but neither is entirely satisfactory. The final possibility considered here is to combine the two approaches, using as test statistic the weighted and standardized likelihood ratio value. Weighting takes place after standardization, where the factor of 2 disappears, and so the fourth test statistic is

    L_4 = max [L_k* + (1/2) log{4k(n - k)/n²} : k = 1, ..., n - 1],

where L_k* is the larger of (L_k^+ - μ_k^+)/σ_k^+ and (L_k^- - μ_k^-)/σ_k^-.
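The moment expressions above are straightforward to evaluate and check numerically. The following sketch, illustrative rather than from the paper, computes the mean and variance of log U_k from the sums quoted in the text, verifies the mean against draws from the Beta(k, n - k) distribution, and evaluates the weight term log{4k(n - k)/n²} used in L_3.

```python
import random
import math

def log_u_moments(k, n):
    """Mean and variance of log U_k when U_k ~ Beta(k, n - k),
    from the harmonic-type sums quoted in the text."""
    mean = -sum(1.0 / i for i in range(k, n))
    var = sum(1.0 / i ** 2 for i in range(k, n))
    return mean, var

def weight(k, n):
    """The ad hoc weight term log{4k(n - k)/n^2} added to L'_k in L_3;
    it is zero-ish near k = n/2 and strongly negative at the extremes."""
    return math.log(4.0 * k * (n - k) / n ** 2)

# Monte Carlo check of the mean for k = 5, n = 31
random.seed(3)
k, n = 5, 31
mu, v = log_u_moments(k, n)
draws = [math.log(random.betavariate(k, n - k)) for _ in range(20000)]
mc_mean = sum(draws) / len(draws)
```

The simulated mean agrees with the harmonic-sum expression, and the weight function confirms the down-weighting of extreme change-points relative to central ones.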

Again Noe's algorithm can be applied to determine the marginal distributions of both L_4 and k̂_4, the rank of τ̂_4. The latter distribution is shown in Fig. 1(d) for n = 31, where a much flatter pattern is apparent for both the unconditional and conditional cases, with probabilities ranging from 0.025 to 0.054 for the former and up to 0.044 for the latter. The distribution conditional upon a significant test statistic is of most interest, and although not uniform the pattern shown is sufficiently flat, with all values in a narrow range, for the difference to be of little or no practical concern. The distribution is similarly roughly uniform at other sample sizes also. The additional information in the value of the estimator therefore does not seriously influence the interpretation of the test result, and hence the test seems to be adequate in this sense. Critical values of L_4 for various sample sizes are given in Table 1.

Table 1. Percentage points of L_4: 90%, 95% and 99% points for various sample sizes, where sample size is the number of uncensored observations.

The method used by Yao (1986) to show consistency of the maximum likelihood estimator τ̂_1 can be adapted to show that τ̂_4 is consistent for τ under H_1, as indeed are the less useful estimators τ̂_2 and τ̂_3. Some additional algebra is necessary to achieve this but the basic procedure follows Yao and so details are omitted.

3. POWER AND MEAN SQUARED ERROR

Some Monte Carlo power comparisons between L_1 and L_4 are given in Table 2, together with some mean squared error comparisons between τ̂_1 and τ̂_4. All results are based on batches of simulations, assuming 5% tests, and since the problem is invariant to multiplicative transformations the value of λ is taken to be one throughout. Two sample sizes are employed, n = 31 and n = 101, giving 30 and 100 uncensored cases respectively. Power is obviously higher for the larger sample size and ρ relatively far from one.
The L_4 statistic generally has higher power than L_1, the only exceptions in the table occurring at τ = 2.25 and ρ < 1, where L_1 has a slight advantage. The estimator τ̂_4 also consistently gives better mean squared error than the estimator τ̂_1, particularly when the hazard is reduced by the change and the change-point occurs early. Note that when ρ < 1 the mean squared error of both τ̂_1 and τ̂_4 tends to be higher than for ρ > 1, since the survival time variance is greater. Ten thousand simulations were also carried out with n = 101 at each of ρ = 0.5 and ρ = 2, with the change-point τ drawn randomly from a unit exponential distribution.

Table 2. Monte Carlo power and mean squared error estimates: power of L_1 and L_4, and mean squared errors of τ̂_1 and τ̂_4, by ρ and τ. Results based on batches of simulations at each combination tabulated; upper values for n = 31, lower for n = 101.

When ρ = 0.5 the L_1 and L_4 statistics had power 0.54 and 0.58 respectively, and the mean squared error of τ̂_1 was 3.49. At ρ = 2 the powers were 0.51 and 0.55, and the mean squared errors were 0.97 and 0.85 respectively. From these and other simulation results, for brevity not detailed here, we conclude that τ̂_4 provides a more reliable estimate of the change-point than τ̂_1, for small sample sizes in particular, and that overall the L_4 statistic is more powerful than L_1. The L_1 statistic has a slight power advantage for both very early and very late changes, which is as expected since the L_4 statistic gives less weight to the extreme sample values, but for most values of the change-point the advantage is with L_4.

4. DISCUSSION

None of the three different procedures given above leads to a uniform marginal distribution under H_0 of the rank of the estimated change-point, as would be desired given the arguments in Section 1. All three however give distributions which are closer to uniformity than obtained by standard maximum likelihood, and for the combined weighting and standardizing procedure at least, the degree of nonuniformity is sufficiently small to be of little practical consequence. The L_4 test statistic also has better power than the likelihood ratio test L_1, and τ̂_4 has smaller mean squared error than the maximum likelihood estimator τ̂_1. No significant extra computational burden is involved in either the calculation of L_4 and τ̂_4 or in the determination of exact distributions, and hence the alternative approach seems to be preferable to the standard in all respects for this problem.
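A crude Monte Carlo of the kind reported in Section 3 can be sketched as follows. This is illustrative only and much smaller than the paper's study: it uses the change-just-after-t_k branch alone, compares the unweighted maximum (in the spirit of L_1) with the weighted version (in the spirit of L_3) rather than the combined L_4, estimates 5% critical values from a simulated null batch, and then estimates power under a step change in hazard.

```python
import random
import math

def stats_pair(t):
    """Unweighted maximum (cf. L_1) and weighted maximum (cf. L_3) of the
    twice-log likelihood ratio profile, change just after each t_k,
    largest observation censored.  Sketch only: the t_k- branch and the
    standardization used in L_2 and L_4 are omitted."""
    t = sorted(t)
    n = len(t)
    d = n - 1
    S_n = sum(t)
    L1 = L3 = -math.inf
    for k in range(1, n):
        S_k = sum(t[:k]) + (n - k) * t[k - 1]
        U = S_k / S_n
        f = lambda a, u: 0.0 if a == 0 else a * math.log(a / (d * u))
        Lk = 2.0 * (f(k, U) + f(d - k, 1.0 - U))
        L1 = max(L1, Lk)
        L3 = max(L3, Lk + math.log(4.0 * k * (n - k) / n ** 2))
    return L1, L3

def sample(n, rho, tau):
    """Failure times with hazard 1 before tau and rho after (lambda = 1),
    by inverting the cumulative hazard of a unit exponential draw."""
    out = []
    for _ in range(n):
        x = random.expovariate(1.0)
        out.append(x if x <= tau else tau + (x - tau) / rho)
    return out

random.seed(4)
n, reps = 31, 400
null = [stats_pair([random.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]
c1 = sorted(v[0] for v in null)[int(0.95 * reps)]   # crude 5% critical values
c3 = sorted(v[1] for v in null)[int(0.95 * reps)]
alt = [stats_pair(sample(n, 4.0, 0.7)) for _ in range(reps)]
pow1 = sum(v[0] > c1 for v in alt) / reps
pow3 = sum(v[1] > c3 for v in alt) / reps
```

With far larger batches, the two statistics standardized and combined as in Section 2, and a grid of (ρ, τ) values, this is the shape of the comparison summarized in Table 2.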
Cobb (1978) noted that, for more general change-point problems, the maximum likelihood estimator of the location of the change is not a sufficient statistic, and additional information about the location can be obtained by conditioning on appropriate ancillary statistics. However, to my knowledge the relationship between the estimated change-point and the value of any proposed test statistic has not previously been considered for other change-point problems. Therefore two statistics designed to detect the presence of a mean change in a sequence of independent normal random variables with common known variance, both of which also yield natural estimates of the change-point as part of the test procedure, have been studied by simulation. One is the likelihood ratio statistic as defined at (1) for this problem and studied in detail by James et al. (1987). The other is a score statistic suggested by Pettitt (1980), which can be considered to be a weighted form of likelihood ratio, with more weight on the more central potential change-points. The results suggest that a U-shaped marginal distribution is again obtained for the maximum unweighted likelihood estimate of the change-point, with a much flatter distribution occurring for the maximum weighted likelihood value. So, interpretation of the standard likelihood ratio test may depend upon the value of the estimated change-point, just as for the change-point hazard rate problem considered in this paper.

Such difficulties may also occur in other more general hypothesis testing problems when a nuisance parameter is present only under the alternative, as considered by Davies (1977). If the marginal distribution under the null hypothesis of the estimated nuisance parameter is markedly different from the prior distribution of that parameter under the alternative, then problems of sufficiency could arise. The possibility of exploiting the joint distribution of test statistic and parameter estimator seems to be worth investigation, or perhaps tests should be made conditional upon the estimator value.

REFERENCES

COBB, G. W. (1978). The problem of the Nile: Conditional solution to the changepoint problem. Biometrika 65.
COX, D. R. & HINKLEY, D. V. (1974). Theoretical Statistics. London: Chapman and Hall.
DAVIES, R. B. (1977). Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika 64.
GRADSTEYN, I. S. & RYZHIK, I. M. (1980). Table of Integrals, Series and Products, corrected and enlarged ed. London: Academic Press.
JAMES, B., JAMES, K. L. & SIEGMUND, D. (1987). Tests for a change-point. Biometrika 74.
MATTHEWS, D. E. & FAREWELL, V. T. (1982). On testing for a constant hazard against a change-point alternative. Biometrics 38.
MATTHEWS, D. E. & FAREWELL, V. T. (1985). On a singularity in the likelihood for a change-point hazard rate model. Biometrika 72.
NGUYEN, H. T., ROGERS, G. S. & WALKER, E. A. (1984). Estimation in change-point hazard rate models. Biometrika 71.
NOE, M. (1972). The calculation of distributions of two-sided Kolmogorov-Smirnov type statistics. Ann. Math. Statist. 43.
PETTITT, A. N. (1980). A simple cumulative sum type statistic for the change point problem with zero-one observations. Biometrika 67.
WORSLEY, K. J. (1988). Exact percentage points of the likelihood-ratio test for a change-point hazard-rate model. Biometrics 44.
YAO, Y. C. (1986). Maximum likelihood estimation in hazard rate models with a change-point. Comm. Statist. A 15.

[Received November. Revised July 1990]


The Effects of Start Prices on the Performance of the Certainty Equivalent Pricing Policy BMI Paper The Effects of Start Prices on the Performance of the Certainty Equivalent Pricing Policy Faculty of Sciences VU University Amsterdam De Boelelaan 1081 1081 HV Amsterdam Netherlands Author: R.D.R.

More information

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Module No. #01 Lecture No. #15 Special Distributions-VI Today, I am going to introduce

More information

Permutation Tests for Comparing Two Populations

Permutation Tests for Comparing Two Populations Permutation Tests for Comparing Two Populations Ferry Butar Butar, Ph.D. Jae-Wan Park Abstract Permutation tests for comparing two populations could be widely used in practice because of flexibility of

More information

3. Mathematical Induction

3. Mathematical Induction 3. MATHEMATICAL INDUCTION 83 3. Mathematical Induction 3.1. First Principle of Mathematical Induction. Let P (n) be a predicate with domain of discourse (over) the natural numbers N = {0, 1,,...}. If (1)

More information

A Robustness Simulation Method of Project Schedule based on the Monte Carlo Method

A Robustness Simulation Method of Project Schedule based on the Monte Carlo Method Send Orders for Reprints to reprints@benthamscience.ae 254 The Open Cybernetics & Systemics Journal, 2014, 8, 254-258 Open Access A Robustness Simulation Method of Project Schedule based on the Monte Carlo

More information

Stat 5102 Notes: Nonparametric Tests and. confidence interval

Stat 5102 Notes: Nonparametric Tests and. confidence interval Stat 510 Notes: Nonparametric Tests and Confidence Intervals Charles J. Geyer April 13, 003 This handout gives a brief introduction to nonparametrics, which is what you do when you don t believe the assumptions

More information

Study Guide for the Final Exam

Study Guide for the Final Exam Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

More information

1.5 Oneway Analysis of Variance

1.5 Oneway Analysis of Variance Statistics: Rosie Cornish. 200. 1.5 Oneway Analysis of Variance 1 Introduction Oneway analysis of variance (ANOVA) is used to compare several means. This method is often used in scientific or medical experiments

More information

The correlation coefficient

The correlation coefficient The correlation coefficient Clinical Biostatistics The correlation coefficient Martin Bland Correlation coefficients are used to measure the of the relationship or association between two quantitative

More information

Survival Analysis of Left Truncated Income Protection Insurance Data. [March 29, 2012]

Survival Analysis of Left Truncated Income Protection Insurance Data. [March 29, 2012] Survival Analysis of Left Truncated Income Protection Insurance Data [March 29, 2012] 1 Qing Liu 2 David Pitt 3 Yan Wang 4 Xueyuan Wu Abstract One of the main characteristics of Income Protection Insurance

More information

The CUSUM algorithm a small review. Pierre Granjon

The CUSUM algorithm a small review. Pierre Granjon The CUSUM algorithm a small review Pierre Granjon June, 1 Contents 1 The CUSUM algorithm 1.1 Algorithm............................... 1.1.1 The problem......................... 1.1. The different steps......................

More information

Normal distribution. ) 2 /2σ. 2π σ

Normal distribution. ) 2 /2σ. 2π σ Normal distribution The normal distribution is the most widely known and used of all distributions. Because the normal distribution approximates many natural phenomena so well, it has developed into a

More information

CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

More information

Life Table Analysis using Weighted Survey Data

Life Table Analysis using Weighted Survey Data Life Table Analysis using Weighted Survey Data James G. Booth and Thomas A. Hirschl June 2005 Abstract Formulas for constructing valid pointwise confidence bands for survival distributions, estimated using

More information

Statistical estimation using confidence intervals

Statistical estimation using confidence intervals 0894PP_ch06 15/3/02 11:02 am Page 135 6 Statistical estimation using confidence intervals In Chapter 2, the concept of the central nature and variability of data and the methods by which these two phenomena

More information

SENSITIVITY ANALYSIS AND INFERENCE. Lecture 12

SENSITIVITY ANALYSIS AND INFERENCE. Lecture 12 This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this

More information

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 17 Shannon-Fano-Elias Coding and Introduction to Arithmetic Coding

More information

FEGYVERNEKI SÁNDOR, PROBABILITY THEORY AND MATHEmATICAL

FEGYVERNEKI SÁNDOR, PROBABILITY THEORY AND MATHEmATICAL FEGYVERNEKI SÁNDOR, PROBABILITY THEORY AND MATHEmATICAL STATIsTICs 4 IV. RANDOm VECTORs 1. JOINTLY DIsTRIBUTED RANDOm VARIABLEs If are two rom variables defined on the same sample space we define the joint

More information

Nonparametric adaptive age replacement with a one-cycle criterion

Nonparametric adaptive age replacement with a one-cycle criterion Nonparametric adaptive age replacement with a one-cycle criterion P. Coolen-Schrijner, F.P.A. Coolen Department of Mathematical Sciences University of Durham, Durham, DH1 3LE, UK e-mail: Pauline.Schrijner@durham.ac.uk

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Math 541: Statistical Theory II Lecturer: Songfeng Zheng Maximum Likelihood Estimation 1 Maximum Likelihood Estimation Maximum likelihood is a relatively simple method of constructing an estimator for

More information

How To Check For Differences In The One Way Anova

How To Check For Differences In The One Way Anova MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

More information

1 Error in Euler s Method

1 Error in Euler s Method 1 Error in Euler s Method Experience with Euler s 1 method raises some interesting questions about numerical approximations for the solutions of differential equations. 1. What determines the amount of

More information

LOGNORMAL MODEL FOR STOCK PRICES

LOGNORMAL MODEL FOR STOCK PRICES LOGNORMAL MODEL FOR STOCK PRICES MICHAEL J. SHARPE MATHEMATICS DEPARTMENT, UCSD 1. INTRODUCTION What follows is a simple but important model that will be the basis for a later study of stock prices as

More information

8 Divisibility and prime numbers

8 Divisibility and prime numbers 8 Divisibility and prime numbers 8.1 Divisibility In this short section we extend the concept of a multiple from the natural numbers to the integers. We also summarize several other terms that express

More information

1 Prior Probability and Posterior Probability

1 Prior Probability and Posterior Probability Math 541: Statistical Theory II Bayesian Approach to Parameter Estimation Lecturer: Songfeng Zheng 1 Prior Probability and Posterior Probability Consider now a problem of statistical inference in which

More information

Lecture 13 - Basic Number Theory.

Lecture 13 - Basic Number Theory. Lecture 13 - Basic Number Theory. Boaz Barak March 22, 2010 Divisibility and primes Unless mentioned otherwise throughout this lecture all numbers are non-negative integers. We say that A divides B, denoted

More information

Tests for Two Survival Curves Using Cox s Proportional Hazards Model

Tests for Two Survival Curves Using Cox s Proportional Hazards Model Chapter 730 Tests for Two Survival Curves Using Cox s Proportional Hazards Model Introduction A clinical trial is often employed to test the equality of survival distributions of two treatment groups.

More information

We can express this in decimal notation (in contrast to the underline notation we have been using) as follows: 9081 + 900b + 90c = 9001 + 100c + 10b

We can express this in decimal notation (in contrast to the underline notation we have been using) as follows: 9081 + 900b + 90c = 9001 + 100c + 10b In this session, we ll learn how to solve problems related to place value. This is one of the fundamental concepts in arithmetic, something every elementary and middle school mathematics teacher should

More information

Hypothesis Testing for Beginners

Hypothesis Testing for Beginners Hypothesis Testing for Beginners Michele Piffer LSE August, 2011 Michele Piffer (LSE) Hypothesis Testing for Beginners August, 2011 1 / 53 One year ago a friend asked me to put down some easy-to-read notes

More information

1 The Brownian bridge construction

1 The Brownian bridge construction The Brownian bridge construction The Brownian bridge construction is a way to build a Brownian motion path by successively adding finer scale detail. This construction leads to a relatively easy proof

More information

1 if 1 x 0 1 if 0 x 1

1 if 1 x 0 1 if 0 x 1 Chapter 3 Continuity In this chapter we begin by defining the fundamental notion of continuity for real valued functions of a single real variable. When trying to decide whether a given function is or

More information

Language Modeling. Chapter 1. 1.1 Introduction

Language Modeling. Chapter 1. 1.1 Introduction Chapter 1 Language Modeling (Course notes for NLP by Michael Collins, Columbia University) 1.1 Introduction In this chapter we will consider the the problem of constructing a language model from a set

More information

The Sample Overlap Problem for Systematic Sampling

The Sample Overlap Problem for Systematic Sampling The Sample Overlap Problem for Systematic Sampling Robert E. Fay 1 1 Westat, Inc., 1600 Research Blvd., Rockville, MD 20850 Abstract Within the context of probability-based sampling from a finite population,

More information

NPV Versus IRR. W.L. Silber -1000 0 0 +300 +600 +900. We know that if the cost of capital is 18 percent we reject the project because the NPV

NPV Versus IRR. W.L. Silber -1000 0 0 +300 +600 +900. We know that if the cost of capital is 18 percent we reject the project because the NPV NPV Versus IRR W.L. Silber I. Our favorite project A has the following cash flows: -1 + +6 +9 1 2 We know that if the cost of capital is 18 percent we reject the project because the net present value is

More information

QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS

QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS This booklet contains lecture notes for the nonparametric work in the QM course. This booklet may be online at http://users.ox.ac.uk/~grafen/qmnotes/index.html.

More information

START Selected Topics in Assurance

START Selected Topics in Assurance START Selected Topics in Assurance Related Technologies Table of Contents Introduction Some Statistical Background Fitting a Normal Using the Anderson Darling GoF Test Fitting a Weibull Using the Anderson

More information

UNDERSTANDING THE TWO-WAY ANOVA

UNDERSTANDING THE TWO-WAY ANOVA UNDERSTANDING THE e have seen how the one-way ANOVA can be used to compare two or more sample means in studies involving a single independent variable. This can be extended to two independent variables

More information

The Kelly criterion for spread bets

The Kelly criterion for spread bets IMA Journal of Applied Mathematics 2007 72,43 51 doi:10.1093/imamat/hxl027 Advance Access publication on December 5, 2006 The Kelly criterion for spread bets S. J. CHAPMAN Oxford Centre for Industrial

More information

Introduction to Quantitative Methods

Introduction to Quantitative Methods Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................

More information

CS 103X: Discrete Structures Homework Assignment 3 Solutions

CS 103X: Discrete Structures Homework Assignment 3 Solutions CS 103X: Discrete Structures Homework Assignment 3 s Exercise 1 (20 points). On well-ordering and induction: (a) Prove the induction principle from the well-ordering principle. (b) Prove the well-ordering

More information

Sample Induction Proofs

Sample Induction Proofs Math 3 Worksheet: Induction Proofs III, Sample Proofs A.J. Hildebrand Sample Induction Proofs Below are model solutions to some of the practice problems on the induction worksheets. The solutions given

More information

171:290 Model Selection Lecture II: The Akaike Information Criterion

171:290 Model Selection Lecture II: The Akaike Information Criterion 171:290 Model Selection Lecture II: The Akaike Information Criterion Department of Biostatistics Department of Statistics and Actuarial Science August 28, 2012 Introduction AIC, the Akaike Information

More information

Non Parametric Inference

Non Parametric Inference Maura Department of Economics and Finance Università Tor Vergata Outline 1 2 3 Inverse distribution function Theorem: Let U be a uniform random variable on (0, 1). Let X be a continuous random variable

More information

Stochastic Inventory Control

Stochastic Inventory Control Chapter 3 Stochastic Inventory Control 1 In this chapter, we consider in much greater details certain dynamic inventory control problems of the type already encountered in section 1.3. In addition to the

More information

Likelihood: Frequentist vs Bayesian Reasoning

Likelihood: Frequentist vs Bayesian Reasoning "PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B University of California, Berkeley Spring 2009 N Hallinan Likelihood: Frequentist vs Bayesian Reasoning Stochastic odels and

More information

Metric Spaces. Chapter 7. 7.1. Metrics

Metric Spaces. Chapter 7. 7.1. Metrics Chapter 7 Metric Spaces A metric space is a set X that has a notion of the distance d(x, y) between every pair of points x, y X. The purpose of this chapter is to introduce metric spaces and give some

More information

Monte Carlo testing with Big Data

Monte Carlo testing with Big Data Monte Carlo testing with Big Data Patrick Rubin-Delanchy University of Bristol & Heilbronn Institute for Mathematical Research Joint work with: Axel Gandy (Imperial College London) with contributions from:

More information

NCSS Statistical Software

NCSS Statistical Software Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, two-sample t-tests, the z-test, the

More information

Tutorial 5: Hypothesis Testing

Tutorial 5: Hypothesis Testing Tutorial 5: Hypothesis Testing Rob Nicholls nicholls@mrc-lmb.cam.ac.uk MRC LMB Statistics Course 2014 Contents 1 Introduction................................ 1 2 Testing distributional assumptions....................

More information

Principle of Data Reduction

Principle of Data Reduction Chapter 6 Principle of Data Reduction 6.1 Introduction An experimenter uses the information in a sample X 1,..., X n to make inferences about an unknown parameter θ. If the sample size n is large, then

More information

http://www.jstor.org This content downloaded on Tue, 19 Feb 2013 17:28:43 PM All use subject to JSTOR Terms and Conditions

http://www.jstor.org This content downloaded on Tue, 19 Feb 2013 17:28:43 PM All use subject to JSTOR Terms and Conditions A Significance Test for Time Series Analysis Author(s): W. Allen Wallis and Geoffrey H. Moore Reviewed work(s): Source: Journal of the American Statistical Association, Vol. 36, No. 215 (Sep., 1941), pp.

More information

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

More information

6.207/14.15: Networks Lecture 15: Repeated Games and Cooperation

6.207/14.15: Networks Lecture 15: Repeated Games and Cooperation 6.207/14.15: Networks Lecture 15: Repeated Games and Cooperation Daron Acemoglu and Asu Ozdaglar MIT November 2, 2009 1 Introduction Outline The problem of cooperation Finitely-repeated prisoner s dilemma

More information

Notes on Continuous Random Variables

Notes on Continuous Random Variables Notes on Continuous Random Variables Continuous random variables are random quantities that are measured on a continuous scale. They can usually take on any value over some interval, which distinguishes

More information

6 PROBABILITY GENERATING FUNCTIONS

6 PROBABILITY GENERATING FUNCTIONS 6 PROBABILITY GENERATING FUNCTIONS Certain derivations presented in this course have been somewhat heavy on algebra. For example, determining the expectation of the Binomial distribution (page 5.1 turned

More information

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4) Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

More information

Estimating Weighing Uncertainty From Balance Data Sheet Specifications

Estimating Weighing Uncertainty From Balance Data Sheet Specifications Estimating Weighing Uncertainty From Balance Data Sheet Specifications Sources Of Measurement Deviations And Uncertainties Determination Of The Combined Measurement Bias Estimation Of The Combined Measurement

More information

AN ANALYSIS OF A WAR-LIKE CARD GAME. Introduction

AN ANALYSIS OF A WAR-LIKE CARD GAME. Introduction AN ANALYSIS OF A WAR-LIKE CARD GAME BORIS ALEXEEV AND JACOB TSIMERMAN Abstract. In his book Mathematical Mind-Benders, Peter Winkler poses the following open problem, originally due to the first author:

More information

What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling

What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling Jeff Wooldridge NBER Summer Institute, 2007 1. The Linear Model with Cluster Effects 2. Estimation with a Small Number of Groups and

More information

Research Article Batch Scheduling on Two-Machine Flowshop with Machine-Dependent Setup Times

Research Article Batch Scheduling on Two-Machine Flowshop with Machine-Dependent Setup Times Hindawi Publishing Corporation Advances in Operations Research Volume 2009, Article ID 153910, 10 pages doi:10.1155/2009/153910 Research Article Batch Scheduling on Two-Machine Flowshop with Machine-Dependent

More information

3 Some Integer Functions

3 Some Integer Functions 3 Some Integer Functions A Pair of Fundamental Integer Functions The integer function that is the heart of this section is the modulo function. However, before getting to it, let us look at some very simple

More information

8 Primes and Modular Arithmetic

8 Primes and Modular Arithmetic 8 Primes and Modular Arithmetic 8.1 Primes and Factors Over two millennia ago already, people all over the world were considering the properties of numbers. One of the simplest concepts is prime numbers.

More information

Association Between Variables

Association Between Variables Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi

More information

Using simulation to calculate the NPV of a project

Using simulation to calculate the NPV of a project Using simulation to calculate the NPV of a project Marius Holtan Onward Inc. 5/31/2002 Monte Carlo simulation is fast becoming the technology of choice for evaluating and analyzing assets, be it pure financial

More information

Statistical tests for SPSS

Statistical tests for SPSS Statistical tests for SPSS Paolo Coletti A.Y. 2010/11 Free University of Bolzano Bozen Premise This book is a very quick, rough and fast description of statistical tests and their usage. It is explicitly

More information

Unit 26 Estimation with Confidence Intervals

Unit 26 Estimation with Confidence Intervals Unit 26 Estimation with Confidence Intervals Objectives: To see how confidence intervals are used to estimate a population proportion, a population mean, a difference in population proportions, or a difference

More information

Modelling the Scores of Premier League Football Matches

Modelling the Scores of Premier League Football Matches Modelling the Scores of Premier League Football Matches by: Daan van Gemert The aim of this thesis is to develop a model for estimating the probabilities of premier league football outcomes, with the potential

More information

Monte Carlo Methods in Finance

Monte Carlo Methods in Finance Author: Yiyang Yang Advisor: Pr. Xiaolin Li, Pr. Zari Rachev Department of Applied Mathematics and Statistics State University of New York at Stony Brook October 2, 2012 Outline Introduction 1 Introduction

More information

99.37, 99.38, 99.38, 99.39, 99.39, 99.39, 99.39, 99.40, 99.41, 99.42 cm

99.37, 99.38, 99.38, 99.39, 99.39, 99.39, 99.39, 99.40, 99.41, 99.42 cm Error Analysis and the Gaussian Distribution In experimental science theory lives or dies based on the results of experimental evidence and thus the analysis of this evidence is a critical part of the

More information

Two-Sample T-Tests Allowing Unequal Variance (Enter Difference)

Two-Sample T-Tests Allowing Unequal Variance (Enter Difference) Chapter 45 Two-Sample T-Tests Allowing Unequal Variance (Enter Difference) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when no assumption

More information

Simple linear regression

Simple linear regression Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

More information

LOGIT AND PROBIT ANALYSIS

LOGIT AND PROBIT ANALYSIS LOGIT AND PROBIT ANALYSIS A.K. Vasisht I.A.S.R.I., Library Avenue, New Delhi 110 012 amitvasisht@iasri.res.in In dummy regression variable models, it is assumed implicitly that the dependent variable Y

More information

Tests for Two Proportions

Tests for Two Proportions Chapter 200 Tests for Two Proportions Introduction This module computes power and sample size for hypothesis tests of the difference, ratio, or odds ratio of two independent proportions. The test statistics

More information

An example of a computable

An example of a computable An example of a computable absolutely normal number Verónica Becher Santiago Figueira Abstract The first example of an absolutely normal number was given by Sierpinski in 96, twenty years before the concept

More information

Simple Regression Theory II 2010 Samuel L. Baker

Simple Regression Theory II 2010 Samuel L. Baker SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

More information