# WITH CENSORED DATA THE TWO SAMPLE PROBLEM. particularly if the mechanism censoring the x values is different from

Save this PDF as:

Size: px
Start display at page:

Download "WITH CENSORED DATA THE TWO SAMPLE PROBLEM. particularly if the mechanism censoring the x values is different from"

## Transcription

1 1. Introduction THE TWO SAMPLE PROBLEM WITH CENSORED DATA BRADLEY EFRON STANFORD UNIVERSITY A medical investigator attempting to compare two different treatments for, say, prolongation of life among disease victims, often finds himself in the following situation: at time T, when it is necessary to end the experiment, or at least evaluate the results up to that time, a certain number of the patients in each treatment group will still be alive. His data will then be represented by two sets of numbers which might look like Xl, X2, X3+, X4, X5+, X6,..., Xm and - yl, Y2+, Y3+, Y4, * *, y.. Here xi and x2 would represent actual lifetimes, while X3+, a "censored" observation, represents a lifetime known only to exceed x3. If all the patients in both treatment groups were treated at time 0, then every + value would be equal to T, a situation that has been investigated by Halperin [1]. Frequently, however, patients enter the investigation at different times after it has begun, and the x+ and y+ values may range from 0 to T. Such a situation, of course, complicates the comparison of the two treatments, particularly if the mechanism censoring the x values is different from that censoring the y values. This may happen, for instance, if the x sequence was run some time ago, so that nearly all the patients have been observed to their death times, while the y sequence is begun later, and contains many censored observations. Gehan [2] and Gilbert [3] have independently proposed the same extension of the Wilcoxon statistic as a solution to the two sample problem with censored data. In this paper the problem is discussed further, and a different test statistic is proposed, which is shown to be, in some ways, superior to the Gehan- Gilbert statistic. 2. A statement of the problem and some notation Suppose xl, x, * *, are independent, identically distributed random variables, having FO(s) = P{x > s} as their common right sided cumulative distribution function (c. d. f.). (Because of the censorship from the right, this is a more convenient function to deal with than the usual left c. d. f. Note that FO(s) is a left continuous, nonincreasing function of s, and that FO(-oo) = 1, F(oo) = 0.) Likewise, let y, y, * *, y be independent, identically distributed 831

2 832 FIFTH BERKELEY SYMPOSIUM: EFRON random variables with common right c. d. f., GO(s) = P{y s}. Both F and GO will be assumed continuous. Our null hypothesis, which we wish to test, is (2.1) Ho: FO = GO, that is, the x and y random variables have the same distribution. What we are considering as alternatives to Ho will become apparent as we discuss the various test statistics. Unfortunately, we are not allowed to observe the x and y values directly. Rather, we are given two additional sequences of numbers, ul, U2,..., Ur, and vi, v2, * * *, v., and our observations consist of the minima, xi = min(x, ul), x2 = min(x2),.. *, Xm = min(xm, ur), (2.2) yi = min(yl, v,), Y2 = min(y, v2), *, y. = min(y, v.). In addition, we know which of the xi are actually x, that is, uncensored observations, and which are the ui, and likewise for the yj. This information will be denoted by the two (random) sequences, 61, 62, * S. m and e-, fe2, * * where r.=lifx=x = ixi = x0l (2.3) t t0if O xi < x, (so xi = ui); 1( if yj = y if yj < yi, (so yj = vj). For the purposes of computing means, variances, efficiencies, and so forth, it is convenient to assume that the ui and vj are independent random variables themselves, with continuous right c. d. f.'s (2.4) H(s) = P{ui _ s}, i = 1, 2, *, in I(s) = P{vj > s}, j = 1, 2, *n, respectively. (However, the reader should keep in mind that such assumptioiis are not essential for the application of the various tests. This point is discussed further in section 5.) Under these assumptions, the xi and the yj are mutually independent random variables with right c. d. f.'s (2.5) F(s) = FO(s)H(s) = P{xi _ s}, i = 1, 2,, n, G(s) = GO(s)I(s) = P{yj _ S}f j = 1, 2, n.., respectively. The bi and fj are mutually independeint Bernoulli random variables, with (2.6) P{S = 1} = P{H _ FO} = f H(s)dF"(s), (2.= 16 = P{I > Gol = - I(s)dG (s). The minus sign compensates for F and GO being decreasing functions of s. Throughout this paper, such notation as P{F _ G} will mean "the probability that a random variable with right c. d. f. F(s) is greater than an independeint random variable with right c. d. f. G(s)," that is, P{F _ G} = - fo F(s)dG(s).

3 TWO SAMPLE, PROBLEM WITH CENSORED DATA 833 In general, the randomii variable xi will tot be independent of 5i, aiid likewvise for yj and ej. The example in section 9 is especially simlple because indepelndelnce does hold in that special case. 3. The test statistic of Gehan and Gilbert Let us define a "scorinig function" if xi _ yj, Ej = 1, (3.1) QG(Xil 1Jj, 6, 'ej) =0 ifxi < iij, &j = l, tl/2 otherwise, and a statistic 1m n (3.2) [l',; =-E E QG(Xi, Yi, &i, ej). Because of the continulity of F, G, H, and I, the condition xi _ yj could be replaced with xi > yj. However, throughout this paper, the definitioins have been choseii to be consisteint even when the c. d. f.'s are not continuous. The statistic WG has beeii proposed indepenidenitly by Gehan [2] and Gilbert [3] (whose names, fortunately, both begin wvith G) as a reasonable extension of Wilcoxon's statistic to the case of censored data. It is easily seen that rnnwg equals the number of (xi, ijj) pairs where x i is knowin definitely to be larger (or as large as) Yj?, plus one half the number of pairs where, on the basis of the given data, x may be larger or mnay be smaller thian,y. It is instructive to follow Gilbert and rewrite QG as 1 if y" < mini(.x'' i, 1,j) 0 if X'I < rin(y'3,?j, uli) (3.3) Q (x~, ~,5 6~, E3) 1/2 if ui < min(x (l, yq.1 Vi) 1/2 if v, < min(x, YJ, a,i) which a simple enumeration of cases shows is equivalent to (3.1). This yields the expectation of WG, namely (3.4) EWG = EQG(Xi, YlJj,iej) = ){"I 'OHI > Go + I [I'-{F"((I > HI- + P>{F"G"H > I,], where FOHI is the riglht c. d. f. of min(x, ui, vj), and so forth. Under the null hypothesis, Ho: F = GO, we have 1l"{F"HI > GO'- = 1',G HI _ FO}, and (3.5) EWG = 2 [P-1PHI > G, +±'-,G"HI _ FP- + 1'-(F"G I > H} + '-(F G"H > I-] = independent of the distributions H and I. (This fact was first brought to my attention by Dr. 'Nathan Alantel, who was kiind enough to send me a preprint of his paper [4].)

4 834 FIFTH BERKELEY SYMPOSIUM: EFRON Gilbert gives the following formula for the variance of WG under 1O (3.6) VarH,(WG) = 12rn [3P{HI > (F0)2} + (n - 1)P{HI2 > (F0)3} + (m - 1)P{H2I > (FO)3}]. In what follows, some advantages and disadvantages of the WG test statistic will be discussed, particularly in comparison with the WV test introduced in section Asymptotically nonparametric tests The statistic WG is not nonparametric, since even under Ho, its distribution will depend on the relationship between H, I, and FO = GO (this can be seen in (3.6), for instance). However, it is "asymptotically nonparametric" in the following sense: if both m and n go to X in such a way that lim[m/(m + n)] = with 0 < X < 1, then, under HO, ( 2law 12j X ~1 i-~ y (4.1) (m + n) (WG-).1\ [ 11-2 a_1 where a2 = P{H12 > (FO)3} (4.2) 2 = P{H21 > (FO)3}. + I 2) Here N(,U, q2) represents the normal law with mean u and variance a-2. (This is a weaker than necessary consequence of the two sample U statistic theorem [5].) Moreover, oi and 2 have consistent estimators as m and n go to infinity. Letting, 1?, 7, and F be the usual (right sided) sample c. d. f.'s calculated from the ui, vj and xi values, respectively, then (4.3) = df f(s)() () will estimate oi consistently, where = max {x: HI(x) > O}, and likewise for a2. Any asymptotically nonparametric statistic can be used to construct an asymptotically level a test of Ho. The only statistics considered in this paper will be those having the asymptotically nonparametric property. It must be remembered that in actual practice, the consistent estimator of the variance necessary to carry out the level a significance test may be more or less difficult to obtain, and this is a definite factor in comparing different test statistics. The following well known result from parametric theory offers some insight into the asymptotically nonparametric property, and will be useful in its own right in section 9. Suppose fe(x) is a density function depending on a k + 1 dimensional parameter 0 = (00, 0', 02, * *, Ok), and having information matrix le = {lij O _, j < k}, (4.4) Iij = Eo a log fe(x) a log fe(x) t3oi agio

5 TWO SAMPLE PROBLEM WITH CENSORED DATA 835 It is desired to test Ho: 0 = 00' on the basis of repeated independent observations xl, X2,..., x, *... from fe(x). Let C be the class of all test statistics tn(x1, x2,..., x.), such that under Ho we have (4.5) n1/2(t.- A) j- N(O,.2(Oo)) for some constant u not depending on Oo c Ho. Then, under suitable regularity conditions, the efficacy of any member of C is bounded above by 1/1 0, where 10 is the upper left entry of I60 l. That is, ( Et (4.6) lim <~e,_ (nẹn Varoe(tn) 100 1=0 for all tn C C. Moreover, the maximum likelihood attains the upper bound. The bound above can be obtained formally from the multivariate Cram6r-Rao inequality. A discussion of this type of theorem, with the suitable conditions, can be found in [6]. The class C, which corresponds to our asymptotically nonparametric tests, is essential to the theorem. Usually there will exist statistics outside of C that do not satisfy (4.6). A genuinely nonparametric test for Ho is mentioned in section 5, but is shown to be very inefficient. Both Gehan and Gilbert propose the same nonparametric test for Ho, under the additional assumption that H = I, that is, identical censoring distributions (namely, the permutation test based on WG). However, this test is not even asymptotically nonparametric when H \$ I. It seems doubtful that a reasonably efficient nonparametric test for HO could be constructed in the general case, but the question remains open. 5. A test of Ho based on cross censorship It is not difficult to generate whole families of asymptotically nonparametric tests of the hypothesis HO: FO = GO as an extension of Gehan and Gilbert's method. For example, let t(x, y) be any bounded real valued function, and define the scoring function Qt(xi, yj, 6i, fj) as f t(xi, yj) if xi _ y, and ej = 1, (5.1) Qt(xi, yj, Si, ej) = -t(yj, xi) if xi < yj and Si = 1, 0 otherwise. Then under Ho the statistic 1m (5.2) WT= E Q,(xi, Y1, Si, ej) n will have expectation zero, since (5.3) EWt = EQ,(xi, ys, Si, e,) = E[t(x, y)lyf < min(x, ui, vj)] - E[t(y'j, x ) x < min(y;, ui, VA)] =0 2

6 836 FIFTH BERKELEY SYMPOSIUM: EFRON under HO by symmetry. (If we let t(x, y) -1, then 2Wt - 1 = WG.) Asymptotic normality and the existence of a consistent estimator of the large sample variance follows again from the two sample U statistic theorem. In this section, a somewhat different type of asymptotically nonparametric test statistic will be discussed briefly. Though the specific statistic derived will be shown to be inefficient relative to both WG and the W statistic introduced in section 8, the method of its construction is simple, and applicable in a wide variety of situations. In the sequel, it will also provide several points of comparison with the W statistic. Suppose, for a moment, that in addition to the data xi, bi, yj, -j, i = 1, 2, -..., m, j = 1, 2, * *, n, we are also given the c. d. f.'s of the two censoring distributions, H and I. This is sometimes a realistic assumption. For instance, if the entry of patients into the experiment is random in time, then H and I may be assumed to be uniform over the period of observation. Using a table of random numbers, draw two new sets of mutually independent random variables, say (5.4) (5 4) ~~~~~v*, V*2,,V* m I, and define the "cross censored" observations xl = min(x1, vi), x2 = min(x2, V2), *, x* = min(xm, v*), yl = min(y1, ut), yi2 = min(y2,2), *, = min(y,, u*). The A and y0 are mutually independent random variables, having right c. d. f.'s FOHI and GOHI, respectively, since xi = min(x, ui, v*) and y; = min(y, vj,u). Under the null hypothesis HO, FOHI = GOHI, and the usual Wilcoxon test is applicable. That is, define _ iif X* > (5.6) ~ ~~~~ Q(,txY) t0 if X*t < y* (5.6)1m -m n W EB, E1 Q(x*i, 0j) - mn j=1.=i Under Ho: FO = GO, this W will have the usual Wilcoxon distribution with (5.7) 1 _ m - +±n + EW = -,X arh10w = 12 mn 2 12 mn When FO #4 GO, the expectation is given by (5.8) EJ4T= P1{F 1 > G0HJI Since we have used a table of random numbers in the construction of W, we know that it is inefficient. Using the usual Rao-Blackwell method, we can get an improved statistic by considering (5.9) WC = E(Wlinformation), the conditional expectation being taken with respect to all of our available information: the xi, yi, As, Ej, and the fuinctions H and I. We will then have

7 TWVO SAMPLE PROBLEM WVITH CENSORED DATA 837 LFWC = E TV IP'FOHI _ GOIII, (5.10) Var Wc < Var W, for all choices of FO, (I, H, and I. It is not difficult to express W, in an easily computable form. Let F and G be the usual (right) sample c. d. f.'s of the xi and yj observations respectively; that is, define N,(s) = (number of xi _ s), N,(s) = (number of Ij > s), (5.11) - Nx(s) F(s) = G(s) = N.s) v"s (Note that P represents a distribution with mass 1/m at the points xi, X2, * Xm and is an estimate of F, not F. Likewise, G puts mass 1/n at yl, Y2, * y,, and estimates G, not GI.) Let x, l, u*, and v* be independent random variables with right c. d. f.'s F?, G, H, and I, respectively, and define * = min(x, v*), y* = min(y, u*). Then (5.12) Pf.f* > y*} m n = E E P{* _.oi ==X - I x;?i = L/j} m n xp{z I Ix i, _.2 yj} f2m n -1 E E[Q(x*, y 7)Ixi, yj] mn i=1 =1= = E(Wlinformation) = Wc. Since by definition P{f.* _ y*} = P{FI > OH', we have (5.13) W, = P{FI > GH) - 1 E p - -f 1i (s)i (s) d[g(s)h(s)], in close analogy with the integral expression for the usual Wilcoxon statistic n the uncensored situation, which is, in our type of notation, - f^ P(s) dg(s). It is easy to show (by the methods used in section 8, or by the theorem on U statistics) that as we let mn and n go to infiniity so that lim[m/(m + n)] = where 0 < X < 1, then W, is asymptotically normal, and under Ho, (5.14) (m + n)12 (Wcl-) j N I(OX + 2 where, if we define F HI = G HI = L, 2 f1 01 = I(L-1(z))z2 dz- 2 1 a'2 (2 = H(z))Z2 H(L-()z dz -- 4

8 838 FIFTH BERKELEY SYMPOSIUM: EFRON Since L(s) < H(s) and L(s) _ I(s) for all values of s, we have (5.16) z3 dzz-- < 2 a z2dz _1 or (5.17) 0 < -2, a2 < 1! =12 The upper bound is attained when there is no censoring, and coincides, of course, with the usual asymptotic variance of the Wilcoxon statistic. If the censoring distributions H and I are not known to us, it is natural to replace them by their sample estimates, Ht and i. Substituting into (5.13) yields the statistic (5.18) We= - P(s)I(s) d[g(s)h7(s)]. This statistic can also be derived from permutation considerations: define (5.19) Q(Xi, Yi, Uk, v{) = 1 if min(xi, vy) > min(yj, Uk),,0 if min(xi, vt) < min(yj, Uk). Then 1 m n m n 12 _ Z n (5.20) - n, Z lfli=lj=lk=l Q(Xi, yj, Uk, Vt). = I17, is asymptotically normal, by the multisample U statistic theorem, and has expectation 1/2 under HO. In some ways, it might be tempting to use (5.18) instead of (5.13) even if H and I were completely known, since given any sample, the randomness of the censoring variables is really of no interest to us. On the other hand, to compute W-, we must know all the ui and vj values, as opposed to WG, WC, and, as it will turn out, TV, where only the ui and vj corresponding to censored xi and yj are needed. Even if all the ui and vj are available, which is often not the case, this is philosophically unsatisfying, particularly from a Bayesian point of view. As commented before, the Wc test will be shown to have low efficiency on the case considered in section 9. This is mainly because it ignores the difference between censored and uncensored xi and yj, as can be seen in (5.13). It is easy to construct cross censorship tests which do not ignore the differences, but the author has not computed efficiencies for any such test. 6. Alternatives to the null hypothesis A desirable property of any test statistic for the two sample problem is that when the null hypothesis is not true, that is when FO #4 GO, the statistic estimates some reasonable measure of the difference between the two distributions. Usually, we are not interested in a simple acceptance or rejection of the null hypothesis, but would like to make a quantitative assessment of the treatment differences. One of the virtues of the usual Wilcoxon statistic, as applied

9 TWO SAMPLE PROBLEM WITH CENSORED DATA 839 to uncensored data, is that it estimates P {FO > GO}, a parameter that is usually very relevant to the investigator. The asymptotic and approximate nature of the tests we are investigating intensifies the need for a test statistic that is also a reasonable estimator. And also, of course, for an asymptotically normal statistic the nature of the expectation when FO #6 GO determines what type of alternatives to the null hypothesis the tests will tend to have the most power against. The statistics we have discussed so far, WG, W, and Wa have the unfortunate property that when FO s GO, their expectations depend on the censoring distributions H and I, which play the role of nuisance parameters in our nonparametric situation. Concentrating our attention on WG, let us write xo»>> y if we can infer from the available data that xo > y, which will be the case if and only if xi _ yj and Ej = 1. Likewise, write y >> xo if yj > xi and bi = 1. Equation (3.4) can then be rewritten as (6.1) EWG = - [P{F HI > G } -P{G HI > F }] = 2 [P{x >>»y >} - P{yf >> t}2] + 1 We see that the WG test will always be consistent when FO is stochastically larger (or smaller) than GO, for then FO(s) > GO(s) for all s, and (6.2) P{F HI > G } - P{GOHI > F } = - _ FO(s)H(s)I(s) dg (s) + J G (s)h(s)i(s) df (s) = - FOHI d(g -FO) + f (G -F )HI df = 2 (GO - FO) d(f HI) + f (GO - FO)HI df > 0, which implies from (6.1) that EWG > 1/2. In cases where FO and GO are not stochastically comparable, the parameter EWG can be positive, negative, or zero, depending on the censoring distributions, and even in simple situations can yield misleading information. Consider for example, the case where the x t and the ui have independent uniform distributions over the interval (-1, 1), while the y ; and vj are uniformly distributed over (- 1/2, 1/2). A simple calculation shows that P{x >> yo;} = 17/96, while P{zo >> x f} = 31/96; that is, among the expected 50 per cent of the (xi, yj) pairs where we know the ordering of (x4, y >), nearly twice as many will favor Y as favor x f. The expectation of WG is in this case, indicating a strong advantage for the y distribution. Such a conclusion would obviously be inappropriate in many situations. One of the major advantages of the W statistic introduced in section 8 is that it estimates the usual Wilcoxon parameter P{FO _ GO}, independently of the censoring distributions H and I.

10 840 FIFTH BERKELEY SYMPOSIUM: EFRON 7. Nonparametric estimates of FO and GO An obvious weakness in the definition of WG can be seen in formula (3.3) in those cases where we are not certain of the ordering between x and y' we ascribe a probability of 1/2 to both possible cases x _ y;, x < yo, irrespective of the relative magnitudes of xi and yj. We will now discuss nonparametric estimates of FO and GO with an eye toward remedying this weakness. Our discussion will be from a point of view particularly suitable to the two sample problem. Given xl, x2, *.., x. and 81, 82,... * Bm, we have already introduced F(s), the sample estimate for F(s), as the usual right sided empirical c. d. f. of the xi. That is, letting (7.1) Ni(s) = number of xi _ s, we define (7.2) F(s) = NZ(S) a left continuous, nonincreasing function of s satisfying F(-oo) = 1, P(oo) = 0, and representing a discrete probability distribution with mass 1/m at x1, x2, xm. Similarly, we would like to define F (s) = Nxo(s)/m, when Ne(s) equals the number of x > s, but because of the censorship, the function Nzo(s) is not available to us. Since x > xi for every i, we do know that xi _ s implies x > s so Nxo(s) _ Ni(s) for every s. For an xi < s that is uncensored, x = xi < s, and xi cannot contribute to Nxo(s). The ambiguous situation is xi < s, xi censored (Si = 0), in which case x will be equal to or greater than s with conditional probability FO(s)/FO(xi). Of course we do not know FO(s), but given any initial estimate of it, say F (s), it seems natural to estimate the conditional probability P{x > slxi < s, Si = 0} by PF(s)/P (xi), and define an improved estimate of FO(s) by 'po(s) (7.3) mf2(s) = Nx(s) + F_ )' Fl(xi) Iterating, we could then use F in place of Al above to get another improved estimate F 3, and so forth, and the question arises whether the sequence PF, P2, 'o,... would converge to a function PO which could then not be further improved by application of (7.3). Such a function would have to satisfy = Nx(s) + E (1I - ) xi< 8 (7.4) mf (s) = Ne(s) + E (1 - Si) xi<8 P0(x2) for all s. We shall call a function satisfying (7.4) a "self-consistent" estimate of FO(s), and prove the following theorem.

11 TWO SAMPLE PROBLEM WITH CENSORED DATA 841 TqEOREM 7.1. Given xl, X2,... - * Xm and 61, a2, *, Sm, there is a uniquze function FO(x) satisfying (7.4) for all values of s. Assume without loss of generality, that x1 < X2 < X3 <... < Xm. Then PF(s) has all the usual properties of a right c. d. f., and represents a discrete distribution with mass (7.5) [m -EF- 'z) at Xk if Xk is uncensored, and mass (7.6) [ l=y i(z) at xm uincensored or not. Here F (s) is defined iteratively by ril S XI, (7.7) F [(s) = NFmk i(1) Xk-i < S < Xk, 2 < k < m, 0sS > Xmn. COROLLARY 7.1. The self-consistent estimate F (s) coincides with Kaplan and Meier's [7] product limit estimate (7.8) P(s) =I (m + 1) s e (Xk-1, Xk], if we define P(s) = 0 for s > xm. They have shown [7] that P(s) is the nonparametric maximum likelihood estimate of FO(s). PROOF. Consider the function F (s) defined by (7.7). For s _ xi, F0(s) = 1 and satisfies (7.4). If 61 = 1, we see that PO(xi) - FO(xl+) = 1/m. If a1 = 0, m - (1 -a1)/po(x1) = m - 1, and PO(xi+) = PO(xi) = 1. Suppose now that for some k, with 2 < k < m -1 k-i1-a i=1 F (x>) F(Xk- +) > Then from (7.7) (7.10) [m - = ] O(s) = N..(s), xk-1 < s5 Xk, and FO(s) satisfies (7.4) for s in the half open interval (Xk-1, Xk]. Since Nz(s) is constant in this interval, so is FO(s), and PO(xk) = Po(xk_l+). If ak = 1, k -a _ k-1 1. i=1 FO(x,) i=1 FO(x,) (7.12) Po(xk+) = [N.(xk) - 1 [m ke1 >, with (7.13) P0(Xk) + fr(xk+) = [m E

12 842 FIFTH BERKELEY SYMPOSIUM: EFRON If 3k = 0, O F (xi) = [7n _ ^ - (7.14) [ I - i1 FO(xi) F(X) i-ij~ FO(x,)] 0r = N(xk) -I = Nx(k+), which implies that (7.11) > 0 in this case. Siice by (7.7) (7.15) [I(Xk) Fx(z = we have P0(xk+) = P0(xk) > 0. In either case, the argument proceeds by induction until we have verified (7.4) fors G (-oo,xml], and shownthatp (xmi_+) > 0, m - Zl=i1(1-50)/1f(xi) = 0. From (7.7), this verifies (7.4) for s E (xmi-, xm], while FO(s) = 0 trivially satisfies (7.4) for s > xm. Since PO(xm) = F0(xm-i+), the jump at xm equals (7.5). Note that if 3,m = 1, m m-11 (7.16) m=r- E > 0, i=1 FO(x1) i=1 Xi while if 3b = 0, (7.17) [m Z Fo(xi) = Nx(xQrt) - 1 = 0, - I-^ Fu I 1 P(qxi)] implying mn- Z'(1-3)/F0(xi) = 0. The argument verifying the uniqueness of ti (s) proceeds by induction in a manner almost identical with the above. To prove the corollary, note that PF(s) = P(s) = 1, for s _ xi. Suppose that F0(xk) = P(xk) for some value of k, with 1 _ k _ m - 1. If 3k = 0, then by the proof above and by definitions (7.7) and (7.8) PF(Xk+) = Ft(Xk) = P(Xk) = P(Xk+), implying that F (s) = P(s) for s E (Xk, Xk±1]. If 3k = 1, then (7.18) t(x1±) - Nx(Xk+) _Nx(Xk+) N.,(Xk) (7.18) P(Xk( k+ =, N.,(Xk) k-i F0(x,) 1 F0(x,) m= k P(Xk) = P(Xk+), mr- k+1i implying again that F 0(s) = P(s) for s G (Xk, Xk+1]. The proof of the corollary is completed by induction. The self-consistent (product limit) estimate is also given by the following simple construction; place probability mass 1/m at each of the points xl < x2 < X3 <... < Xm, (that is, construct the distribution with F(s) as c. d. f.); if x, is the smallest xi that is censored, remove the mass at xi, and redistribute it equally among the m - i1 points to the right of it, xi,+l, Xi,+2, * *, xm. If Xi2 1s the smallest censored value among xi,+,, Xi,+2, * * *, xm, redistribute its mass,

13 TWO SAMPLE PROBLEM WITH CENSORED DATA 843 which will be 1/m + 1/(m - ii), among the m -mn2 points to its right, xi2+1, Xi+21 **.., xm; continue in this way until you reach xm. The resulting distribution has mass (7.19) 1 kil m + I Mn Inm-i) at Xk uncensored, and at xm censored or not, which is easily computed to agree with PO(s) from equation (7.8). This construction sheds some light on the special nature of the largest observation, which the self-consistent estimator always treats as uncensored, irrespective of Sm. In addition to the maximum likelihood property of PO, Kaplan and Meier derive several other results which will be of use to us in the next section. (a) F (s) is a nearly unbiased estimator of FO(s). Specifically, (7.20) 1 2 E >a(s) 1- e-en(8) or equivalently, (7.21) 0 < FO(s) - EPO(s) C FO(s)eEN(8) Since ENS(s) = mf(s), the bias of PO(s) declines exponentially with sample size whenever F(s) = FO(s)H(s) > 0 (that is, whenever it is possible and necessary to estimate FO(s) nonparametrically). In the sequel, we will treat PO(s) as if it were unbiased, since the exponential decline of the bias term easily overwhelms the ml/2 magnification needed for the large sample theory. Harking back, briefly, to the point raised in section 2, equation (7.20) also holds when the ui are a fixed set of constants, say u1 < U2 < U3 <... < U., in which case it is simple to show that the relative bias EPO(s)/FO(s) is a constant within any interval (uk-1, Uk]. (b) P0(s) is a consistent estimator of FO(s), as m goes to infinity. (c) Ml12[PO(s)- F0(s)], considered as a stochastic process in s, approaches in the limit for large m, a normal process with mean 0 and covariance kernel (7.22) r(s, t) = FO(s)FO(t) -df (z), s < t. J- F(z)FO(z)' (The limiting normality is not discussed by Kaplan and Meier, but can be derived easily from (7.7) by standard methods.) The covariance kernel (7.9) represents a distorted Wiener process of the type discussed by Doob in his famous paper on the Kolmogorov-Smirnov statistic [8]. Suppose, for convenience, that FO(s) = 1, so that we are dealing with positive random variables. Define (7.23) a(s) = f F(z)FF (Z) an increasing function of s for s less than M, any value such that FO(M) > 0, and let a-1(s) denote its inverse function. Then for 0 < s < M, the process

14 844 FIFTH BERKELEY SYMPOSIUM: EFRON (7.24) Y (S) = ml'2 PO[a1(s)] - FO[a'(s)] FO[a-1(8)] approaches a standard Wiener process as m goes to infinity. That is, Y approaches a normal process with mean zero and covariance kernel r(s, t) = min(s, t) for 0 < s, t < M. 8. The statistic W We now use F0(S) and c0(s), the self-consistent estimates of FO and G, respectively, to modify the statistic WG along the lines suggested at the beginning of section 7. As mentioned previously, the self-consistent estimates treat the largest observation of each group as if it were uncensored, and we will assume that this is actually the situation, to avoid a host of annoying special cases. That is, we will assume the bi and ej corresponding to the largest xi and largest yj respectively are both equal to 1. Define the scoring function Q(xi, y, bi, ej) to be (8.1) Q(Xi( y, bi, ej) = (xi_y=4xi, yj, &j, 'e, fo', G",, where the conditional expectation is interpreted as if x and yo were actually drawn from F and G0, respectively. Thus, if xi _ yj and Ej = 1, then 0(xi, yj, 6i, Ej) = 1 as before. However, if xi _ y, and Ej = 0, bi = 1, we no longer score 1/2 to indicate equal probability for x _ y and x < y ; rather Q(xi, yi, 3i, Ej) = 1-0(xi)/60(yj) in this case, the conditional probability under G0 that A _ yj is less than x = xi. Table I lists the value of Q(xi, yj, &i, Ej) in all eight possible different cases. TABLE I VALUES OF Q(xi, wj, bi, Eo) (bi, fj) Xi _ yj xi <y (1, 1) 1 0 (0, 1) 1 P(xi) G0(y2)~ ~ ~ ~~~0(j (1, O)1_G(x)) G(xi) L FO(s) dg (s) X(s) P ago(s) (0, O) 1-G ( i)- F0(xi)G1(yj) P1 s(xi)d (yj) The statistic TW is now defined in the usual way in terms of Q(xi, yj, 3i, ej) (8.2) W = mne _ (ji, Yjy As, Ej)

15 TWO SAMPLE PROBLEM WITH CENSORED DATA 845 THEOREM 8.1. W = -f-. PF(s) dgo(s) = P{PF > G }, and is the maximum likelihood estimate of P{F _ G0}. PROOF. To simplify notation, let P{x > y } be the conditional expectation defined at (8.1) and below, and listed in table I. We shall also need #0(yj) = Q0(y3) - G0(yj+), the probability mass function corresponding to the discrete distribution G0(s). From theorem 7.1 (8.3) 9O(yj) = j - c for j = 1, 2, 3, *, n, which can be written as n - E i I tle (8.4) n#0(y3) = 1 + E Ee0O(j) ej= 1. If Ej = 1, then reading from the table, (8.5) Z P{fx > y } = Nz(yj) + E (i -i) FO(j) F0~~~~~iYiP(xi) = mpo(yj) by the self-consistency property (7.4) of F. If ej = 0, then (8.6) P{X _> y>} -= Y, so Yt?,Zh P,x(Y>S_ M G0(yj) (8.7) E > = E q0() A {xf _ y } i = 1 ~~~i=1 Yt >-vi GO(yj) _lyi GO(yj) =1 Remembering that 00(y') is nonzero only for uncensored values of ye, equation (8.5) reduces this last sum to (8.8) P{x' > yo} = m E 60(Yt) P(ye). j==1 vey>vi 0yj) Combining (8.5) and (8.8) yields (8-9) 1=m E P{x? _ y } mnj=i i=i I [te PO(Yj) + o 2 0(ye) PO(y)1 n L~1 ei=ot vevi G(-Yj) j' n,e F (yi) + E 1 't = P0(yj)j0(yj) ej=l

16 846 FIFTH BERKELEY SYMPOSIUM: EFRON by equation (8.4). Thus, (8.10) W = - F (s) dg (s) = P{FPO>_G}, as was to be proved, and the second statement of the theorem follows from the corollary to theorem 7.1, and the usual invariance property of maximum likelihood estimates. It should be mentioned that other consistent estimates of FO and GO exist [7], which conceivably could be used in place of P and G to define the scoring function, as was done in (8.1). However, the uniqueness of the self-consistent estimate insures that the resulting statistic will not be expressible in the form given in theorem 8.1. THEOREM 8.2. Let m and n go to infinity in such a way that lim m/(m + n) = X, with 0 < X < 1. Then (8.11) (m + n)12[jv - P{F0 > GO}] -* N ( law k~0 1-X2) where under Ho, 2 1 z2 dz 1 z3 dz (8.12) 1 4 Jo H[FO-(z)] 4 JoF[FO-0(z)]' 2 1 fz2dz 1 z3dz i24 Jo I[GO-'(z)] 4 JoG[GO-'(z)] Here FO` and GO` are the inverse functions of FO and GO, respectively, and are identical under Ho. PROOF. Only a heuristic argument will be presented here. This argument is somewhat similar to the proof of the Chernoff-Savage theorem [9], and a rigorous proof can be developed along the lines suggested in that paper. Write (8.13) -W = f PO(s) dg (s) =f FOdG + f (PO-FO) dg + f FOd(G -G) + f,x (F-FO) d(g -G), and integrate the third term by parts, yielding (8.14) -W= FOdG + f (PO-FO) dg - f (G0-GO) df + f (PO - FO) d(g - GO) Of these four terms, the first is a constant, while the fourth is asymptotically negligible;

17 TWO SAMPLE PROBLEM WITH CENSORED DATA 847 (8.15) J (F, - FO) d(g- GO) = ()1t2 ml/2(f0- FO) dnl/2(g0 - GO) P L(m + n)1/21 by property c of section 7. (The equation Xm,,n = op[]/(m + n)1/2] means (m + n&)1/2xmfn b as m, n go to infinity.) Consider the second term in (8.14). By property c of section 7 and equation (7.18), fr ml/2[po(s)- FO(s)] dg (s) approaches a normal variate with mean zero and variance (8.16) f Jf r(s, t) dg (s) dg (t) i F ( df (z) dg (t) dg (s). Now under the null hypothesis, GO = FO, this last expression becomes (8.17) -2 f f F0(s)F0(t) df0(z) df (t) df0(s), and a change of variables to FO(z), FO(t) and FO(s) yields (8-18) 2 f f f FLo-tI] dz dt ds Jo zf[fo01(z)] 2 ldt ds dz 1 ( z3 dz 1f z2 dz 2 4 Jo F[FO-'(z)] 4 H[FO-'(z)] ' The last equality follows from (2.5) (8.19) F[FO-'(z)] = H[FO-'(z)]FO[FO-'(z)] = zh[fo-'(z)]. A similar argument gives (8.20) 4fo I[GO(z)] as the limiting variance of the third term in (8.13) under Ho. Since the second and third terms of that expression are independent random variables, this completes the heuristic argument. The expression (8.11) for al fails to converge if H(z) = O{[F0(z)]3}, as z approaches 0, and likewise A2 will not converge if I(z) = 0{[G0(z)]3}. In these cases, the variance of Wk does not diminish at rate 1/(m + n). This situation is illustrated in the next section, and a possible remedy suggested in section 10.

18 848 FIFTH BERKELEY SYMPOSIUM: EFRON In these cases where u2 and A2 are finite, they have obvious consistent estimators in terms of equation (8.11). The second form given for each integral will often be the most convenient to compute with, since PO(z) _ P(z) for all z, and the function P(z) can be computed directly from the observed xi. (To compute f, for instance, requires knowing all the ui values, not just those where xi = ui. As mentioned before, these may not be available, and in any case are not required to compute W.) 9. Asymptotic relation efficiencies of the various tests in the exponential case In this section we will compute the Pitman efficiency (A. R. E.) of W'v, WG, and W, relative to the best parametric test, in the special case where all the random variables involved are exponentially distributed. That is, we assume 8 0 FO(s) = {e Ss- GO(s) = {e ~1' 8< 0, ~ 1' s< 0, = H(s) { s-s I(s) =e-{ a s > O s<0, ~ 1' s <O0, where, for the purposes of the parametric test, a is a known positive parameter, while 0 and 0 are unknown. This example with a = 1 was investigated by Gilbert. Gehan evaluates the more realistic case where H(s) = I(s) = a uniform distribution over [0, T]. Our null hypothesis is Ho: 0 = 1. The observed random variables xi and yj will have right c. d. f.'s, (9.2) F(s) = e(+l)s s G()s s 0, respectively. The Si and ej are Bernoulli random variables, with (9.3) P{-i= 1} = o+ 1 P{e= l} = + a In this case, xi and Si are independent, as are yj and ej. (I am grateful to Dr. J. Sethuraman, for first bringing this useful fact of the independence of xi and bi to my attention.) To simplify matters further, we will let m = n, so the sample sizes remain equal as n goes to infinity. The efficacy Eff(T.) of any test statistic T. for Ho is defined to be OdETRl 0 0 (9.4) Eff (7') ~ =1)2 (9.4) Eff(7) = 2n Vare= i,ol' From the discussion in section 4, we know that the maximum likelihood estimate of 0, say M, achieves the greatest efficacy among all computable test statistics, the class we called C. That efficacy is 1/I00. In our case, assuming 0, (p unknown, but a known,

19 TWO SAMPLE PROBLEM WITH CENSORED DATA 849 (9.5) Eff(M) = 1 0+a ato = 1. From (4.2) and (8.12), the efficacies of WG and IT' at 0 = 1 are 3) (± + 2a + 1)(4 + a + 2) and ) (4) ) (9.7) Eff(W) = a ) Formula (9.7) holds only for 4 > max(1/3, a/3). For 4 < max(1/3, a/3), the situation described at the end of section 8 prevails, and IV has efficacy 0. The Pitman efficiency of WG relative to M, the best parametric test, is by definition (9.8) PIVGI ff( WM) _ 3(4)+ 3 )( +3) 4 ( (4)±a~1)2 a + ) Wheni a = I, that is, when H(s) = I(s) = e- for s >- 0, tlieii PWG/,1j = 3/4 for all values of 4, agreeing with Gilbert's result. For all other values of a and 4), (9.9) 8 (3) < PwGA/1 < 3 with the lower bound achieved at the extreme points 4 = 0, a = 0 aiid 4 = (9, a = 00 lf'or the W statistic we have Pitman efficieney (9.10) PTT/Al = Eff(M) 3(4O - 1) (4) a) (4O + 2 /I) for 4 > nlax(i/3, a/3) 0 for 4) < max(1/3, a/3). Let us consider, the convenience, the case where a _ 1. Then for 4 _ 1/2, we have PWl/M > 3/4. Another way of saying this is that for exponential observations and exponential censoring variables, the Tf' test is more efficient thani the WG test as long as not more than 2/3 of either sample consists of

20 850 FIFTH BERKELEY SYMPOSIUM: EFRON censored observations (for when 4 = 1/2, an expected 1/(0 + 1) = 2/3 percentage of the xi and a/(d) + a) < 2/3 percentage of the yi will be censored under Ho: 0 = 1). That the increase in efficiency can be quite substantial is shown in table II. TABLE II PW,/M IN THE EXPONENTIAL CASE a X X , V M The case of very large 4 corresponds to a very low rate of censoring 1/(1 + 4), in which case both WG and W become equivalent to the ordinary Wilcoxon test, while the F test based on El x/,'l yj corresponds to M. This accounts for the constant top row of the table. The same remark applies to the statistic Wf' discussed in section 5, which by (5.14) has Pitman efficiency (9.11) Pw./m 4 (4 + 4 (4 + 2)2 when a = 1. This is quite low for reasonable values of 4, not attaining even 1/2 until 4 is greater than 4.7. The last column of the table is for a = 0, that is, the case where the yj values are uncensored. 10. Truncating the W statistic The rapid loss of asymptotic efficiency of the WC test when 4 drops below 1/2 in the example treated in section 9 can be alleviated somewhat by considering truncated modifications of the W statistic. We will consider here, briefly, such modifications, first in general, and then as applied to the exponential case as given by (9.1).

### Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

### Nonparametric adaptive age replacement with a one-cycle criterion

Nonparametric adaptive age replacement with a one-cycle criterion P. Coolen-Schrijner, F.P.A. Coolen Department of Mathematical Sciences University of Durham, Durham, DH1 3LE, UK e-mail: Pauline.Schrijner@durham.ac.uk

### OPTIMAL SELECTION BASED ON RELATIVE RANK* (the "Secretary Problem")

OPTIMAL SELECTION BASED ON RELATIVE RANK* (the "Secretary Problem") BY Y. S. CHOW, S. MORIGUTI, H. ROBBINS AND S. M. SAMUELS ABSTRACT n rankable persons appear sequentially in random order. At the ith

### B y R us se ll E ri c Wr ig ht, DV M. M as te r of S ci en ce I n V et er in ar y Me di ca l Sc ie nc es. A pp ro ve d:

E ff ec ts o f El ec tr ic al ly -S ti mu la te d Si lv er -C oa te d Im pl an ts a nd B ac te ri al C on ta mi na ti on i n a Ca ni ne R ad iu s Fr ac tu re G ap M od el B y R us se ll E ri c Wr ig ht,

### Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 17 Shannon-Fano-Elias Coding and Introduction to Arithmetic Coding

### Exact Nonparametric Tests for Comparing Means - A Personal Summary

Exact Nonparametric Tests for Comparing Means - A Personal Summary Karl H. Schlag European University Institute 1 December 14, 2006 1 Economics Department, European University Institute. Via della Piazzuola

### max cx s.t. Ax c where the matrix A, cost vector c and right hand side b are given and x is a vector of variables. For this example we have x

Linear Programming Linear programming refers to problems stated as maximization or minimization of a linear function subject to constraints that are linear equalities and inequalities. Although the study

### 9.2 Summation Notation

9. Summation Notation 66 9. Summation Notation In the previous section, we introduced sequences and now we shall present notation and theorems concerning the sum of terms of a sequence. We begin with a

### Two modified Wilcoxon tests for symmetry about an unknown location parameter

Biometril-a (1982). 69. 2. pp. 377-82 377 Printed in Great Britain Two modified Wilcoxon tests for symmetry about an unknown location parameter BY P. K. BHATTACHARYA Department of Statistics, University

### Permanents, Order Statistics, Outliers, and Robustness

Permanents, Order Statistics, Outliers, and Robustness N. BALAKRISHNAN Department of Mathematics and Statistics McMaster University Hamilton, Ontario, Canada L8S 4K bala@mcmaster.ca Received: November

### AN ALGORITHM FOR DETERMINING WHETHER A GIVEN BINARY MATROID IS GRAPHIC

AN ALGORITHM FOR DETERMINING WHETHER A GIVEN BINARY MATROID IS GRAPHIC W. T. TUTTE. Introduction. In a recent series of papers [l-4] on graphs and matroids I used definitions equivalent to the following.

### Chapter 2: Binomial Methods and the Black-Scholes Formula

Chapter 2: Binomial Methods and the Black-Scholes Formula 2.1 Binomial Trees We consider a financial market consisting of a bond B t = B(t), a stock S t = S(t), and a call-option C t = C(t), where the

### Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Module No. #01 Lecture No. #15 Special Distributions-VI Today, I am going to introduce

### 1 if 1 x 0 1 if 0 x 1

Chapter 3 Continuity In this chapter we begin by defining the fundamental notion of continuity for real valued functions of a single real variable. When trying to decide whether a given function is or

### 15.062 Data Mining: Algorithms and Applications Matrix Math Review

.6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop

### Statistical Machine Learning

Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

### Introduction to Flocking {Stochastic Matrices}

Supelec EECI Graduate School in Control Introduction to Flocking {Stochastic Matrices} A. S. Morse Yale University Gif sur - Yvette May 21, 2012 CRAIG REYNOLDS - 1987 BOIDS The Lion King CRAIG REYNOLDS

### Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Peter Richtárik Week 3 Randomized Coordinate Descent With Arbitrary Sampling January 27, 2016 1 / 30 The Problem

### Tail inequalities for order statistics of log-concave vectors and applications

Tail inequalities for order statistics of log-concave vectors and applications Rafał Latała Based in part on a joint work with R.Adamczak, A.E.Litvak, A.Pajor and N.Tomczak-Jaegermann Banff, May 2011 Basic

### Sums of Independent Random Variables

Chapter 7 Sums of Independent Random Variables 7.1 Sums of Discrete Random Variables In this chapter we turn to the important question of determining the distribution of a sum of independent random variables

### Math 209 Solutions to Assignment 7. x + 2y. 1 x + 2y i + 2. f x = cos(y/z)), f y = x z sin(y/z), f z = xy z 2 sin(y/z).

Math 29 Solutions to Assignment 7. Find the gradient vector field of the following functions: a fx, y lnx + 2y; b fx, y, z x cosy/z. Solution. a f x x + 2y, f 2 y x + 2y. Thus, the gradient vector field

### Basics of Statistical Machine Learning

CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar

### Single item inventory control under periodic review and a minimum order quantity

Single item inventory control under periodic review and a minimum order quantity G. P. Kiesmüller, A.G. de Kok, S. Dabia Faculty of Technology Management, Technische Universiteit Eindhoven, P.O. Box 513,

### Gambling Systems and Multiplication-Invariant Measures

Gambling Systems and Multiplication-Invariant Measures by Jeffrey S. Rosenthal* and Peter O. Schwartz** (May 28, 997.. Introduction. This short paper describes a surprising connection between two previously

### Newton Divided-Difference Interpolating Polynomials

Section 3 Newton Divided-Difference Interpolating Polynomials The standard form for representing an nth order interpolating polynomial is straightforward. There are however, other representations of nth

### Interpretation of Somers D under four simple models

Interpretation of Somers D under four simple models Roger B. Newson 03 September, 04 Introduction Somers D is an ordinal measure of association introduced by Somers (96)[9]. It can be defined in terms

### Multivariate Analysis of Variance (MANOVA): I. Theory

Gregory Carey, 1998 MANOVA: I - 1 Multivariate Analysis of Variance (MANOVA): I. Theory Introduction The purpose of a t test is to assess the likelihood that the means for two groups are sampled from the

### Lectures 5-6: Taylor Series

Math 1d Instructor: Padraic Bartlett Lectures 5-: Taylor Series Weeks 5- Caltech 213 1 Taylor Polynomials and Series As we saw in week 4, power series are remarkably nice objects to work with. In particular,

### Some stability results of parameter identification in a jump diffusion model

Some stability results of parameter identification in a jump diffusion model D. Düvelmeyer Technische Universität Chemnitz, Fakultät für Mathematik, 09107 Chemnitz, Germany Abstract In this paper we discuss

### 5. Convergence of sequences of random variables

5. Convergence of sequences of random variables Throughout this chapter we assume that {X, X 2,...} is a sequence of r.v. and X is a r.v., and all of them are defined on the same probability space (Ω,

### Probability and Statistics

CHAPTER 2: RANDOM VARIABLES AND ASSOCIATED FUNCTIONS 2b - 0 Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be

### 1.- L a m e j o r o p c ió n e s c l o na r e l d i s co ( s e e x p li c a r á d es p u é s ).

PROCEDIMIENTO DE RECUPERACION Y COPIAS DE SEGURIDAD DEL CORTAFUEGOS LINUX P ar a p od e r re c u p e ra r nu e s t r o c o rt a f u e go s an t e un d es a s t r e ( r ot u r a d e l di s c o o d e l a

### Optimal Auctions. Jonathan Levin 1. Winter 2009. Economics 285 Market Design. 1 These slides are based on Paul Milgrom s.

Optimal Auctions Jonathan Levin 1 Economics 285 Market Design Winter 29 1 These slides are based on Paul Milgrom s. onathan Levin Optimal Auctions Winter 29 1 / 25 Optimal Auctions What auction rules lead

### Question: What is the probability that a five-card poker hand contains a flush, that is, five cards of the same suit?

ECS20 Discrete Mathematics Quarter: Spring 2007 Instructor: John Steinberger Assistant: Sophie Engle (prepared by Sophie Engle) Homework 8 Hints Due Wednesday June 6 th 2007 Section 6.1 #16 What is the

### NCSS Statistical Software

Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, two-sample t-tests, the z-test, the

### . (3.3) n Note that supremum (3.2) must occur at one of the observed values x i or to the left of x i.

Chapter 3 Kolmogorov-Smirnov Tests There are many situations where experimenters need to know what is the distribution of the population of their interest. For example, if they want to use a parametric

### MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES Contents 1. Random variables and measurable functions 2. Cumulative distribution functions 3. Discrete

### RINGS WITH A POLYNOMIAL IDENTITY

RINGS WITH A POLYNOMIAL IDENTITY IRVING KAPLANSKY 1. Introduction. In connection with his investigation of projective planes, M. Hall [2, Theorem 6.2]* proved the following theorem: a division ring D in

### General Theory of Differential Equations Sections 2.8, 3.1-3.2, 4.1

A B I L E N E C H R I S T I A N U N I V E R S I T Y Department of Mathematics General Theory of Differential Equations Sections 2.8, 3.1-3.2, 4.1 Dr. John Ehrke Department of Mathematics Fall 2012 Questions

### The Heat Equation. Lectures INF2320 p. 1/88

The Heat Equation Lectures INF232 p. 1/88 Lectures INF232 p. 2/88 The Heat Equation We study the heat equation: u t = u xx for x (,1), t >, (1) u(,t) = u(1,t) = for t >, (2) u(x,) = f(x) for x (,1), (3)

### MATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators...

MATH4427 Notebook 2 Spring 2016 prepared by Professor Jenny Baglivo c Copyright 2009-2016 by Jenny A. Baglivo. All Rights Reserved. Contents 2 MATH4427 Notebook 2 3 2.1 Definitions and Examples...................................

### Test of Hypotheses. Since the Neyman-Pearson approach involves two statistical hypotheses, one has to decide which one

Test of Hypotheses Hypothesis, Test Statistic, and Rejection Region Imagine that you play a repeated Bernoulli game: you win \$1 if head and lose \$1 if tail. After 10 plays, you lost \$2 in net (4 heads

### Lecture 2 ESTIMATING THE SURVIVAL FUNCTION. One-sample nonparametric methods

Lecture 2 ESTIMATING THE SURVIVAL FUNCTION One-sample nonparametric methods There are commonly three methods for estimating a survivorship function S(t) = P (T > t) without resorting to parametric models:

### Chapter 4 Lecture Notes

Chapter 4 Lecture Notes Random Variables October 27, 2015 1 Section 4.1 Random Variables A random variable is typically a real-valued function defined on the sample space of some experiment. For instance,

### General theory of stochastic processes

CHAPTER 1 General theory of stochastic processes 1.1. Definition of stochastic process First let us recall the definition of a random variable. A random variable is a random number appearing as a result

### Chapter 2 Limits Functions and Sequences sequence sequence Example

Chapter Limits In the net few chapters we shall investigate several concepts from calculus, all of which are based on the notion of a limit. In the normal sequence of mathematics courses that students

### EC 6310: Advanced Econometric Theory

EC 6310: Advanced Econometric Theory July 2008 Slides for Lecture on Bayesian Computation in the Nonlinear Regression Model Gary Koop, University of Strathclyde 1 Summary Readings: Chapter 5 of textbook.

### Master s Theory Exam Spring 2006

Spring 2006 This exam contains 7 questions. You should attempt them all. Each question is divided into parts to help lead you through the material. You should attempt to complete as much of each problem

### Double Sequences and Double Series

Double Sequences and Double Series Eissa D. Habil Islamic University of Gaza P.O. Box 108, Gaza, Palestine E-mail: habil@iugaza.edu Abstract This research considers two traditional important questions,

### Convex analysis and profit/cost/support functions

CALIFORNIA INSTITUTE OF TECHNOLOGY Division of the Humanities and Social Sciences Convex analysis and profit/cost/support functions KC Border October 2004 Revised January 2009 Let A be a subset of R m

### MATH 425, PRACTICE FINAL EXAM SOLUTIONS.

MATH 45, PRACTICE FINAL EXAM SOLUTIONS. Exercise. a Is the operator L defined on smooth functions of x, y by L u := u xx + cosu linear? b Does the answer change if we replace the operator L by the operator

### Non Parametric Inference

Maura Department of Economics and Finance Università Tor Vergata Outline 1 2 3 Inverse distribution function Theorem: Let U be a uniform random variable on (0, 1). Let X be a continuous random variable

### An example of a computable

An example of a computable absolutely normal number Verónica Becher Santiago Figueira Abstract The first example of an absolutely normal number was given by Sierpinski in 96, twenty years before the concept

### Stochastic Inventory Control

Chapter 3 Stochastic Inventory Control 1 In this chapter, we consider in much greater details certain dynamic inventory control problems of the type already encountered in section 1.3. In addition to the

### Multiple Testing. Joseph P. Romano, Azeem M. Shaikh, and Michael Wolf. Abstract

Multiple Testing Joseph P. Romano, Azeem M. Shaikh, and Michael Wolf Abstract Multiple testing refers to any instance that involves the simultaneous testing of more than one hypothesis. If decisions about

### Maximum likelihood estimation of mean reverting processes

Maximum likelihood estimation of mean reverting processes José Carlos García Franco Onward, Inc. jcpollo@onwardinc.com Abstract Mean reverting processes are frequently used models in real options. For

### On Small Sample Properties of Permutation Tests: A Significance Test for Regression Models

On Small Sample Properties of Permutation Tests: A Significance Test for Regression Models Hisashi Tanizaki Graduate School of Economics Kobe University (tanizaki@kobe-u.ac.p) ABSTRACT In this paper we

### A direct approach to false discovery rates

J. R. Statist. Soc. B (2002) 64, Part 3, pp. 479 498 A direct approach to false discovery rates John D. Storey Stanford University, USA [Received June 2001. Revised December 2001] Summary. Multiple-hypothesis

### (Refer Slide Time: 00:00:56 min)

Numerical Methods and Computation Prof. S.R.K. Iyengar Department of Mathematics Indian Institute of Technology, Delhi Lecture No # 3 Solution of Nonlinear Algebraic Equations (Continued) (Refer Slide

### arxiv:1112.0829v1 [math.pr] 5 Dec 2011

How Not to Win a Million Dollars: A Counterexample to a Conjecture of L. Breiman Thomas P. Hayes arxiv:1112.0829v1 [math.pr] 5 Dec 2011 Abstract Consider a gambling game in which we are allowed to repeatedly

### Vilnius University. Faculty of Mathematics and Informatics. Gintautas Bareikis

Vilnius University Faculty of Mathematics and Informatics Gintautas Bareikis CONTENT Chapter 1. SIMPLE AND COMPOUND INTEREST 1.1 Simple interest......................................................................

### FIRST YEAR CALCULUS. Chapter 7 CONTINUITY. It is a parabola, and we can draw this parabola without lifting our pencil from the paper.

FIRST YEAR CALCULUS WWLCHENW L c WWWL W L Chen, 1982, 2008. 2006. This chapter originates from material used by the author at Imperial College, University of London, between 1981 and 1990. It It is is

### What is Statistics? Lecture 1. Introduction and probability review. Idea of parametric inference

0. 1. Introduction and probability review 1.1. What is Statistics? What is Statistics? Lecture 1. Introduction and probability review There are many definitions: I will use A set of principle and procedures

### The Relation between Two Present Value Formulae

James Ciecka, Gary Skoog, and Gerald Martin. 009. The Relation between Two Present Value Formulae. Journal of Legal Economics 15(): pp. 61-74. The Relation between Two Present Value Formulae James E. Ciecka,

### Numerical Analysis Lecture Notes

Numerical Analysis Lecture Notes Peter J. Olver 5. Inner Products and Norms The norm of a vector is a measure of its size. Besides the familiar Euclidean norm based on the dot product, there are a number

### Markov Chain Monte Carlo Simulation Made Simple

Markov Chain Monte Carlo Simulation Made Simple Alastair Smith Department of Politics New York University April2,2003 1 Markov Chain Monte Carlo (MCMC) simualtion is a powerful technique to perform numerical

### Set theory, and set operations

1 Set theory, and set operations Sayan Mukherjee Motivation It goes without saying that a Bayesian statistician should know probability theory in depth to model. Measure theory is not needed unless we

### ON THE EMBEDDING OF BIRKHOFF-WITT RINGS IN QUOTIENT FIELDS

ON THE EMBEDDING OF BIRKHOFF-WITT RINGS IN QUOTIENT FIELDS DOV TAMARI 1. Introduction and general idea. In this note a natural problem arising in connection with so-called Birkhoff-Witt algebras of Lie

### Taylor Series and Asymptotic Expansions

Taylor Series and Asymptotic Epansions The importance of power series as a convenient representation, as an approimation tool, as a tool for solving differential equations and so on, is pretty obvious.

### LOGNORMAL MODEL FOR STOCK PRICES

LOGNORMAL MODEL FOR STOCK PRICES MICHAEL J. SHARPE MATHEMATICS DEPARTMENT, UCSD 1. INTRODUCTION What follows is a simple but important model that will be the basis for a later study of stock prices as

### Recursive Estimation

Recursive Estimation Raffaello D Andrea Spring 04 Problem Set : Bayes Theorem and Bayesian Tracking Last updated: March 8, 05 Notes: Notation: Unlessotherwisenoted,x, y,andz denoterandomvariables, f x

### The Delta Method and Applications

Chapter 5 The Delta Method and Applications 5.1 Linear approximations of functions In the simplest form of the central limit theorem, Theorem 4.18, we consider a sequence X 1, X,... of independent and

### Linear Algebra. A vector space (over R) is an ordered quadruple. such that V is a set; 0 V ; and the following eight axioms hold:

Linear Algebra A vector space (over R) is an ordered quadruple (V, 0, α, µ) such that V is a set; 0 V ; and the following eight axioms hold: α : V V V and µ : R V V ; (i) α(α(u, v), w) = α(u, α(v, w)),

### RESULTANT AND DISCRIMINANT OF POLYNOMIALS

RESULTANT AND DISCRIMINANT OF POLYNOMIALS SVANTE JANSON Abstract. This is a collection of classical results about resultants and discriminants for polynomials, compiled mainly for my own use. All results

### Lecture 6: Option Pricing Using a One-step Binomial Tree. Friday, September 14, 12

Lecture 6: Option Pricing Using a One-step Binomial Tree An over-simplified model with surprisingly general extensions a single time step from 0 to T two types of traded securities: stock S and a bond

### THE FUNDAMENTAL THEOREM OF ALGEBRA VIA PROPER MAPS

THE FUNDAMENTAL THEOREM OF ALGEBRA VIA PROPER MAPS KEITH CONRAD 1. Introduction The Fundamental Theorem of Algebra says every nonconstant polynomial with complex coefficients can be factored into linear

### The Black-Scholes pricing formulas

The Black-Scholes pricing formulas Moty Katzman September 19, 2014 The Black-Scholes differential equation Aim: Find a formula for the price of European options on stock. Lemma 6.1: Assume that a stock

### Maximum Likelihood Estimation

Math 541: Statistical Theory II Lecturer: Songfeng Zheng Maximum Likelihood Estimation 1 Maximum Likelihood Estimation Maximum likelihood is a relatively simple method of constructing an estimator for

### THE CENTRAL LIMIT THEOREM TORONTO

THE CENTRAL LIMIT THEOREM DANIEL RÜDT UNIVERSITY OF TORONTO MARCH, 2010 Contents 1 Introduction 1 2 Mathematical Background 3 3 The Central Limit Theorem 4 4 Examples 4 4.1 Roulette......................................

### E3: PROBABILITY AND STATISTICS lecture notes

E3: PROBABILITY AND STATISTICS lecture notes 2 Contents 1 PROBABILITY THEORY 7 1.1 Experiments and random events............................ 7 1.2 Certain event. Impossible event............................

### Course 221: Analysis Academic year , First Semester

Course 221: Analysis Academic year 2007-08, First Semester David R. Wilkins Copyright c David R. Wilkins 1989 2007 Contents 1 Basic Theorems of Real Analysis 1 1.1 The Least Upper Bound Principle................

### Taylor Polynomials and Taylor Series Math 126

Taylor Polynomials and Taylor Series Math 26 In many problems in science and engineering we have a function f(x) which is too complicated to answer the questions we d like to ask. In this chapter, we will

### Multiple Linear Regression in Data Mining

Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple

### CONTROLLABILITY. Chapter 2. 2.1 Reachable Set and Controllability. Suppose we have a linear system described by the state equation

Chapter 2 CONTROLLABILITY 2 Reachable Set and Controllability Suppose we have a linear system described by the state equation ẋ Ax + Bu (2) x() x Consider the following problem For a given vector x in

### ECON20310 LECTURE SYNOPSIS REAL BUSINESS CYCLE

ECON20310 LECTURE SYNOPSIS REAL BUSINESS CYCLE YUAN TIAN This synopsis is designed merely for keep a record of the materials covered in lectures. Please refer to your own lecture notes for all proofs.

### Some Notes on Taylor Polynomials and Taylor Series

Some Notes on Taylor Polynomials and Taylor Series Mark MacLean October 3, 27 UBC s courses MATH /8 and MATH introduce students to the ideas of Taylor polynomials and Taylor series in a fairly limited

### Lecture 4: Random Variables

Lecture 4: Random Variables 1. Definition of Random variables 1.1 Measurable functions and random variables 1.2 Reduction of the measurability condition 1.3 Transformation of random variables 1.4 σ-algebra

### CHAPTER IV - BROWNIAN MOTION

CHAPTER IV - BROWNIAN MOTION JOSEPH G. CONLON 1. Construction of Brownian Motion There are two ways in which the idea of a Markov chain on a discrete state space can be generalized: (1) The discrete time

### STAT 830 Convergence in Distribution

STAT 830 Convergence in Distribution Richard Lockhart Simon Fraser University STAT 830 Fall 2011 Richard Lockhart (Simon Fraser University) STAT 830 Convergence in Distribution STAT 830 Fall 2011 1 / 31

### t := maxγ ν subject to ν {0,1,2,...} and f(x c +γ ν d) f(x c )+cγ ν f (x c ;d).

1. Line Search Methods Let f : R n R be given and suppose that x c is our current best estimate of a solution to P min x R nf(x). A standard method for improving the estimate x c is to choose a direction

### Manual for SOA Exam MLC.

Chapter 5. Life annuities. Extract from: Arcones Manual for the SOA Exam MLC. Spring 2010 Edition. available at http://www.actexmadriver.com/ 1/114 Whole life annuity A whole life annuity is a series of

### EXIT TIME PROBLEMS AND ESCAPE FROM A POTENTIAL WELL

EXIT TIME PROBLEMS AND ESCAPE FROM A POTENTIAL WELL Exit Time problems and Escape from a Potential Well Escape From a Potential Well There are many systems in physics, chemistry and biology that exist

### NCSS Statistical Software

Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, two-sample t-tests, the z-test, the

### Notes V General Equilibrium: Positive Theory. 1 Walrasian Equilibrium and Excess Demand

Notes V General Equilibrium: Positive Theory In this lecture we go on considering a general equilibrium model of a private ownership economy. In contrast to the Notes IV, we focus on positive issues such

### **BEGINNING OF EXAMINATION** The annual number of claims for an insured has probability function: , 0 < q < 1.

**BEGINNING OF EXAMINATION** 1. You are given: (i) The annual number of claims for an insured has probability function: 3 p x q q x x ( ) = ( 1 ) 3 x, x = 0,1,, 3 (ii) The prior density is π ( q) = q,

### NOTES ON LINEAR TRANSFORMATIONS

NOTES ON LINEAR TRANSFORMATIONS Definition 1. Let V and W be vector spaces. A function T : V W is a linear transformation from V to W if the following two properties hold. i T v + v = T v + T v for all

### Life Table Analysis using Weighted Survey Data

Life Table Analysis using Weighted Survey Data James G. Booth and Thomas A. Hirschl June 2005 Abstract Formulas for constructing valid pointwise confidence bands for survival distributions, estimated using

### Probability Generating Functions

page 39 Chapter 3 Probability Generating Functions 3 Preamble: Generating Functions Generating functions are widely used in mathematics, and play an important role in probability theory Consider a sequence