2 Right Censoring and Kaplan-Meier Estimator
In biomedical applications, especially in clinical trials, two important issues arise when studying time-to-event data (we will assume the event to be death; it can be any event of interest):

1. Some individuals are still alive at the end of the study or analysis, so the event of interest, namely death, has not occurred. Therefore we have right censored data.

2. The length of follow-up varies due to staggered entry, so we cannot observe the event for those individuals with insufficient follow-up time.

Note: It is important to distinguish calendar time from patient time.

Figure 2.1: Illustration of censored data (individuals plotted on calendar time, from study start to study end, and on patient time, measured from entry into the study; x marks deaths, o marks censored observations).

In addition to censoring because of insufficient follow-up (i.e., end-of-study censoring due to staggered entry), other reasons for censoring include:

- loss to follow-up: patients stop coming to the clinic or move away;
- deaths from other causes: competing risks.

Censoring from these types of causes may be inherently different from censoring due to staggered entry. We will discuss this in more detail later.
Censoring and differential follow-up create certain difficulties in the analysis of such data, as illustrated by the following example taken from a clinical trial of 146 patients treated after they had a myocardial infarction (MI). The data have been grouped into one-year intervals, and all time is measured in patient time.

Table 2.1: Data from a clinical trial on myocardial infarction (MI)

Year since          Number alive and under observation   Number dying       Number censored
entry into study    at beginning of interval             during interval    or withdrawn
[0, 1)              146                                  27                 3
[1, 2)              116                                  18                 10
[2, 3)              88                                   21                 10
[3, 4)              57                                   9                  3
[4, 5)              45                                   1                  3
[5, 6)              41                                   2                  11
[6, 7)              28                                   3                  5
[7, 8)              20                                   1                  8
[8, 9)              11                                   2                  1
[9, 10)             8                                    2                  6

Question: Estimate the 5-year survival rate, i.e., $S(5) = P[T \geq 5]$.

Two naive and incorrect answers are given by

1. $\hat F(5) = \hat P[T < 5] = \dfrac{76 \text{ deaths in 5 years}}{146 \text{ individuals}} = 52.1\%$, so $\hat S(5) = 1 - \hat F(5) = 47.9\%$.

2. $\hat F(5) = \hat P[T < 5] = \dfrac{76 \text{ deaths in 5 years}}{146 - 29 \text{ (withdrawn in 5 years)}} = 65\%$, so $\hat S(5) = 1 - \hat F(5) = 35\%$.
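As a quick check of the arithmetic, the two naive estimates can be reproduced in a couple of lines of R; this is a minimal sketch using the counts from Table 2.1 (the variable names are ours, not from the original notes):

> deaths <- c(27, 18, 21, 9, 1, 2, 3, 1, 2, 2)      # deaths per one-year interval, from Table 2.1
> withdrawn <- c(3, 10, 10, 3, 3, 11, 5, 8, 1, 6)   # censored/withdrawn per interval, from Table 2.1
> d5 <- sum(deaths[1:5]); w5 <- sum(withdrawn[1:5]) # 76 deaths and 29 withdrawals in the first 5 years
> 1 - d5/146          # naive estimate 1: about 0.479
> 1 - d5/(146 - w5)   # naive estimate 2: about 0.350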
Obviously, we can observe the following:

1. The first estimate would be correct if all censoring occurred after 5 years. Of course, this was not the case, leading to an overly optimistic estimate (i.e., it overestimates $S(5)$).

2. The second estimate would be correct if all individuals censored in the 5 years were censored immediately upon entering the study. This was not the case either, leading to an overly pessimistic estimate (i.e., it underestimates $S(5)$).

Our clinical colleagues have suggested eliminating all individuals who are censored and using the remaining complete data. This would lead to the following estimate:
$$\hat F(5) = \hat P[T < 5] = \frac{76 \text{ deaths in 5 years}}{146 - 60 \text{ (censored)}} = 88.4\%, \qquad \hat S(5) = 1 - \hat F(5) = 11.6\%.$$
This is even more pessimistic than the estimate given by (2).

Life-table Estimate

More appropriate methods use the life-table or actuarial method. The problem with the above two estimates is that they both ignore the fact that each one-year interval experienced censoring (or withdrawal). Obviously we need to take this information into account in order to reduce bias. If we can express $S(5)$ as a function of quantities related to each interval and get a very good estimate for each quantity, then, intuitively, we will get a very good estimate of $S(5)$.

By the definition of $S(5)$, we have
$$
\begin{aligned}
S(5) &= P[T \geq 5] = P[(T \geq 5) \cap (T \geq 4)] = P[T \geq 4]\, P[T \geq 5 \mid T \geq 4] \\
     &= P[T \geq 4]\,\{1 - P[4 \leq T < 5 \mid T \geq 4]\} = P[T \geq 4]\, q_5 \\
     &= P[T \geq 3]\, P[T \geq 4 \mid T \geq 3]\, q_5 = P[T \geq 3]\,\{1 - P[3 \leq T < 4 \mid T \geq 3]\}\, q_5 \\
     &= P[T \geq 3]\, q_4\, q_5 = \cdots = q_1\, q_2\, q_3\, q_4\, q_5,
\end{aligned}
$$
where $q_i = 1 - P[i-1 \leq T < i \mid T \geq i-1]$, $i = 1, 2, \ldots, 5$. So if we can estimate each $q_i$ well, then we will get a very good estimate of $S(5)$. Note that $1 - q_i$ is the mortality rate $m(x)$ at year $x = i-1$ by our definition.
Table 2.2: Life-table estimate of S(5) assuming censoring occurred at the end of each interval

interval [t_{i-1}, t_i)   n(x)   d(x)   w(x)   m(x) = d(x)/n(x)   1 - m(x)   S_R(t_i) = prod(1 - m(x))
[0, 1)                    146    27     3      0.185              0.815      0.815
[1, 2)                    116    18     10     0.155              0.845      0.689
[2, 3)                    88     21     10     0.239              0.761      0.524
[3, 4)                    57     9      3      0.158              0.842      0.442
[4, 5)                    45     1      3      0.022              0.978      0.432

Case 1: Let us first assume that anyone censored in an interval of time is censored at the end of that interval. Then we can estimate each $q_i = 1 - m(i-1)$ in the following way:
$$d(0) \sim \mathrm{Bin}(n(0), m(0)) \;\Longrightarrow\; \hat m(0) = \frac{d(0)}{n(0)} = \frac{27}{146} = 0.185, \qquad \hat q_1 = 1 - \hat m(0) = 0.815,$$
$$d(1) \mid H \sim \mathrm{Bin}(n(1), m(1)) \;\Longrightarrow\; \hat m(1) = \frac{d(1)}{n(1)} = \frac{18}{116} = 0.155, \qquad \hat q_2 = 1 - \hat m(1) = 0.845,$$
where $H$ denotes the data history (i.e., the data before the second interval). The life-table estimate is computed as shown in Table 2.2, so the 5-year survival probability estimate is $\hat S_R(5) = 0.432$. (If the assumption that anyone censored in an interval of time is censored at the end of that interval is true, then the estimator $\hat S_R(5)$ is approximately unbiased for $S(5)$.)

Of course, this estimate $\hat S_R(5)$ will have variation since it was calculated from a sample. We need to estimate its variance in order to make inference on $S(5)$ (for example, to construct a 95% CI for $S(5)$). However, $\hat S_R(5)$ is a product of 5 estimates ($\hat q_1 \cdots \hat q_5$), whose variance is not easy to find directly. But we have
$$\log(\hat S_R(5)) = \log(\hat q_1) + \log(\hat q_2) + \log(\hat q_3) + \log(\hat q_4) + \log(\hat q_5).$$
So if we can find the variance of each $\log(\hat q_i)$, we might be able to find the variance of $\log(\hat S_R(5))$ and hence the variance of $\hat S_R(5)$.
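The Case 1 product is just as easy to compute directly; continuing the small R sketch above (our own code, first five intervals only):

> d <- deaths[1:5]; w <- withdrawn[1:5]
> n.risk <- 146 - cumsum(c(0, (d + w)[-5]))   # at risk at the start of each interval: 146 116 88 57 45
> prod(1 - d/n.risk)                          # S_R(5), censoring assumed at the end of each interval: about 0.432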
For this purpose, let us first introduce a very popular method in statistics, the delta method.

Delta Method: If $\hat\theta \stackrel{a}{\sim} N(\theta, \sigma^2)$, then $f(\hat\theta) \stackrel{a}{\sim} N(f(\theta), [f'(\theta)]^2 \sigma^2)$.

Proof of the delta method: If $\sigma^2$ is small, $\hat\theta$ will be close to $\theta$ with high probability. We can hence expand $f(\hat\theta)$ about $\theta$ using a Taylor expansion:
$$f(\hat\theta) \approx f(\theta) + f'(\theta)(\hat\theta - \theta).$$
We immediately get the (asymptotic) distribution of $f(\hat\theta)$ from this expansion.

Returning to our problem, let $\hat\phi_i = \log(\hat q_i)$. Using the delta method, the variance of $\hat\phi_i$ is approximately
$$\mathrm{var}(\hat\phi_i) = \left(\frac{1}{q_i}\right)^2 \mathrm{var}(\hat q_i).$$
Therefore we need to find and estimate $\mathrm{var}(\hat q_i)$. Of course, we also need to find the covariances among the $\hat\phi_i$ and $\hat\phi_j$ ($i \neq j$). For this purpose, we need the following theorem.

Double expectation theorem (law of iterated conditional expectation and variance): If $X$ and $Y$ are any two random variables (or vectors), then
$$E(X) = E[E(X \mid Y)], \qquad \mathrm{Var}(X) = \mathrm{Var}[E(X \mid Y)] + E[\mathrm{Var}(X \mid Y)].$$

Since $\hat q_i = 1 - \hat m(i-1)$, we have $\mathrm{var}(\hat q_i) = \mathrm{var}(\hat m(i-1))$, which we now compute.
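Before applying the delta method, here is a quick numerical check of it in R; a minimal sketch under assumed values $n = 146$ and $m = 0.185$ taken from the first interval (our own code, not from the notes). It compares the Monte Carlo variance of $\log(\hat q_1) = \log(1 - \hat m(0))$ with the delta-method approximation $[1/q_1]^2\,\mathrm{var}(\hat m(0))$:

> set.seed(1)
> n <- 146; m <- 0.185                 # assumed: first-interval sample size and mortality rate
> mhat <- rbinom(100000, n, m) / n     # simulated mortality estimates m-hat(0)
> var(log(1 - mhat))                   # Monte Carlo variance of log(q1-hat)
> (1/(1 - m))^2 * m*(1 - m)/n          # delta-method approximation

The two numbers agree closely, which is all the delta method promises when $\sigma^2$ is small.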
$$
\begin{aligned}
\mathrm{var}(\hat m(i-1)) &= E[\mathrm{var}(\hat m(i-1) \mid H)] + \mathrm{var}[E(\hat m(i-1) \mid H)] \\
&= E\!\left[\frac{m(i-1)[1 - m(i-1)]}{n(i-1)}\right] + \mathrm{var}[m(i-1)] \\
&= m(i-1)[1 - m(i-1)]\, E\!\left[\frac{1}{n(i-1)}\right],
\end{aligned}
$$
which can be estimated by
$$\frac{\hat m(i-1)[1 - \hat m(i-1)]}{n(i-1)}.$$
Hence the variance of $\hat\phi_i = \log(\hat q_i)$ can be approximately estimated by
$$\left(\frac{1}{\hat q_i}\right)^2 \frac{\hat m(i-1)[1 - \hat m(i-1)]}{n(i-1)} = \frac{\hat m(i-1)}{[1 - \hat m(i-1)]\, n(i-1)} = \frac{d}{(n - d)\, n},$$
where $d = d(i-1)$ and $n = n(i-1)$.

Now let us look at the covariances among the $\hat\phi_i$ and $\hat\phi_j$ ($i \neq j$). It is quite remarkable that they are all approximately equal to zero! For example, let us consider the covariance between $\hat\phi_1$ and $\hat\phi_2$. Since $\hat\phi_1 = \log(\hat q_1)$ and $\hat\phi_2 = \log(\hat q_2)$, using the same argument as for the delta method, we only need to find the covariance between $\hat q_1$ and $\hat q_2$, or equivalently, the covariance between $\hat m(0)$ and $\hat m(1)$. This can be seen from the following:
$$
\begin{aligned}
E[\hat m(0)\,\hat m(1)] &= E[E[\hat m(0)\,\hat m(1) \mid n(0), d(0), w(0)]] = E[\hat m(0)\, E[\hat m(1) \mid n(0), d(0), w(0)]] \\
&= E[\hat m(0)\, m(1)] = m(1)\, E[\hat m(0)] = m(1)\, m(0) = E[\hat m(0)]\, E[\hat m(1)].
\end{aligned}
$$
Therefore the covariance between $\hat m(0)$ and $\hat m(1)$ is zero. Similarly, we can show that the other covariances are zero. Hence
$$\mathrm{var}(\log(\hat S_R(5))) = \mathrm{var}(\hat\phi_1) + \mathrm{var}(\hat\phi_2) + \mathrm{var}(\hat\phi_3) + \mathrm{var}(\hat\phi_4) + \mathrm{var}(\hat\phi_5).$$
Let $\hat\theta = \log(\hat S_R(5))$. Then $\hat S_R(5) = e^{\hat\theta}$, so by the delta method
$$\mathrm{var}(\hat S_R(5)) = (e^{\theta})^2\, \mathrm{var}(\log(\hat S_R(5))) = (S(5))^2 [\mathrm{var}(\hat\phi_1) + \mathrm{var}(\hat\phi_2) + \mathrm{var}(\hat\phi_3) + \mathrm{var}(\hat\phi_4) + \mathrm{var}(\hat\phi_5)],$$
which can be estimated by
$$
\widehat{\mathrm{var}}(\hat S_R(5)) = (\hat S_R(5))^2 \left[ \frac{d(0)}{(n(0)-d(0))\,n(0)} + \frac{d(1)}{(n(1)-d(1))\,n(1)} + \frac{d(2)}{(n(2)-d(2))\,n(2)} + \frac{d(3)}{(n(3)-d(3))\,n(3)} + \frac{d(4)}{(n(4)-d(4))\,n(4)} \right]
= (\hat S_R(5))^2 \sum_{i=0}^{4} \frac{d(i)}{[n(i)-d(i)]\,n(i)}. \quad (2.1)
$$

Case 2: Let us now assume that anyone censored in an interval of time is censored right at the beginning of that interval. Then the life-table estimate is computed as shown in Table 2.3, so the 5-year survival probability estimate is $\hat S_L(5) = 0.400$. (If this assumption is true, then the estimator $\hat S_L(5)$ is approximately unbiased for $S(5)$.) The variance estimate of $\hat S_L(5)$ is similar to that of $\hat S_R(5)$, except that we need to change the sample size for each mortality estimate to $n - w$ in equation (2.1).

Table 2.3: Life-table estimate of S(5) assuming censoring occurred at the beginning of each interval

interval [t_{i-1}, t_i)   n(x)   d(x)   w(x)   m(x) = d(x)/(n(x) - w(x))   1 - m(x)   S_L(t_i) = prod(1 - m(x))
[0, 1)                    146    27     3      0.189                       0.811      0.811
[1, 2)                    116    18     10     0.170                       0.830      0.673
[2, 3)                    88     21     10     0.269                       0.731      0.492
[3, 4)                    57     9      3      0.167                       0.833      0.410
[4, 5)                    45     1      3      0.024                       0.976      0.400

The naive estimates range from 35% to 47.9% for the five-year survival probability, with the complete-case estimator (i.e., eliminating anyone censored) giving an estimate of 11.6%. The life-table estimates range from 40.0% to 43.2%, depending on whether we assume censoring occurred at the left (i.e., beginning) or the right (i.e., end) of each interval. More than likely, censoring occurs during the interval, so neither $\hat S_L$ nor $\hat S_R$ is exactly correct. A compromise is to use the following modification:
Table 2.4: Life-table estimate of S(5) assuming censoring occurred during the interval

interval [t_{i-1}, t_i)   n(x)   d(x)   w(x)   m(x) = d(x)/(n(x) - w(x)/2)   1 - m(x)   S_LT(t_i) = prod(1 - m(x))
[0, 1)                    146    27     3      0.187                         0.813      0.813
[1, 2)                    116    18     10     0.162                         0.838      0.681
[2, 3)                    88     21     10     0.253                         0.747      0.509
[3, 4)                    57     9      3      0.162                         0.838      0.426
[4, 5)                    45     1      3      0.023                         0.977      0.417

That is, when calculating the mortality estimate in each interval, we use $n(x) - w(x)/2$ as the sample size. This number is often referred to as the effective sample size. So the 5-year survival probability estimate is $\hat S_{LT}(5) = 0.417$, which is between $\hat S_L(5) = 0.400$ and $\hat S_R(5) = 0.432$.

Figure 2.2: Life-table estimate of the survival probability for the MI data (survival probability plotted against time in years).

Figure 2.2 shows the life-table estimate of the survival probability assuming censoring occurred during each interval. Here the estimates were connected using straight lines; no special significance should be given to this. From this figure, the median survival time is estimated to be about 3 years.
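Continuing the same R sketch, the other two life-table variants change only the effective sample size, and the last two lines anticipate the variance formula (2.2) given just below (our own code):

> prod(1 - d/(n.risk - w))       # S_L(5), censoring assumed at the beginning: about 0.400
> prod(1 - d/(n.risk - w/2))     # S_LT(5), censoring assumed during the interval: about 0.417
> SLT1 <- 1 - d[1]/(n.risk[1] - w[1]/2)
> sqrt(SLT1^2 * d[1] / ((n.risk[1] - w[1]/2 - d[1]) * (n.risk[1] - w[1]/2)))   # SE of S_LT(1): about 0.032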
The variance estimate of the life-table estimate $\hat S_{LT}(5)$ is similar to equation (2.1), except that the sample size $n(i)$ is changed to $n(i) - w(i)/2$. That is,
$$\widehat{\mathrm{var}}(\hat S_{LT}(5)) = (\hat S_{LT}(5))^2 \sum_{i=0}^{4} \frac{d(i)}{[n(i) - w(i)/2 - d(i)]\,[n(i) - w(i)/2]}. \quad (2.2)$$
Of course, we can also use the above formula to calculate the variance of $\hat S_{LT}(t)$ at other time points. For example,
$$\widehat{\mathrm{var}}(\hat S_{LT}(1)) = (\hat S_{LT}(1))^2 \left\{ \frac{d(0)}{[n(0) - w(0)/2 - d(0)]\,[n(0) - w(0)/2]} \right\} = (0.813)^2 \times \frac{27}{(146 - 3/2 - 27)(146 - 3/2)} \approx 0.00105.$$
Therefore $\widehat{\mathrm{SE}}(\hat S_{LT}(1)) = \sqrt{0.00105} \approx 0.032$.

The calculation presented in Table 2.4 can be implemented using Proc Lifetest in SAS:

options ls=72 ps=60;
Data mi;
  input survtime number status;  /* interval start, count, status: 1 = died, 0 = censored/withdrawn */
  cards;
0 27 1
0 3 0
1 18 1
1 10 0
2 21 1
2 10 0
3 9 1
3 3 0
4 1 1
4 3 0
5 2 1
5 11 0
6 3 1
6 5 0
7 1 1
7 8 0
8 2 1
8 1 0
9 2 1
9 6 0
;
proc lifetest method=life intervals=(0 to 10 by 1);
  time survtime*status(0);
  freq number;
run;
Note that the number of observed events and withdrawals in $[t_{i-1}, t_i)$ were entered after $t_{i-1}$ instead of $t_i$. Part of the output of the above SAS program is the table of life-table survival estimates: for each interval [Lower, Upper), SAS lists the number failed, the number censored, the effective sample size, the conditional probability of failure and its standard error, the survival and failure estimates, the survival standard error, and the median residual lifetime. Here the numbers in the column Conditional Probability of Failure are the estimated mortalities $\hat m(x) = d(x)/(n(x) - w(x)/2)$.

The above life-table estimation can also be implemented in R, using the lifetab() function in the KMsurv package. Here is the R code:

> library(KMsurv)
> tis <- 0:10
> ninit <- 146
> nlost <- c(3,10,10,3,3,11,5,8,1,6)
> nevent <- c(27,18,21,9,1,2,3,1,2,2)
> lifetab(tis, ninit, nlost, nevent)
The output from the above R function is a table with one row per interval and columns nsubs, nlost, nrisk, nevent, surv, pdf, hazard, and the standard errors se.surv, se.pdf, se.hazard (with NA entries where a quantity is not estimable).

Note: Here the numbers in the hazard column are the estimated hazard rates at the midpoint of each interval, obtained by assuming the true survival function $S(t)$ is a straight line within each interval. You can find an explicit expression for this estimator using the relation
$$\lambda(t) = \frac{f(t)}{S(t)}$$
and the assumption that the true survival function $S(t)$ is a straight line on $[t_{i-1}, t_i)$:
$$S(t) = S(t_{i-1}) + \frac{S(t_i) - S(t_{i-1})}{t_i - t_{i-1}}\,(t - t_{i-1}), \qquad t \in [t_{i-1}, t_i).$$
These estimates are very close to the mortality estimates we obtained before (the column Conditional Probability of Failure in the SAS output).
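Working out that expression is a short exercise: with $S(t)$ linear on the interval, the density $f = [S(t_{i-1}) - S(t_i)]/(t_i - t_{i-1})$ is constant there, and $S$ at the midpoint is the average of the endpoint values, so the midpoint hazard reduces to $d(x)/\{(t_i - t_{i-1})\,[n(x) - w(x)/2 - d(x)/2]\}$. A minimal R sketch of this formula (our own variable names; that it reproduces the lifetab() hazard column exactly is an assumption you can check against the output above):

> d <- c(27,18,21,9,1,2,3,1,2,2); w <- c(3,10,10,3,3,11,5,8,1,6)
> n <- 146 - cumsum(c(0, (d + w)[-10]))   # number at risk at the start of each one-year interval
> d / (1 * (n - w/2 - d/2))               # hazard estimate at the midpoint of each interval (width 1)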
Kaplan-Meier Estimator

The Kaplan-Meier or product-limit estimator is the limit of the life-table estimator when the intervals are taken so small that at most one distinct observation occurs within an interval. Kaplan and Meier demonstrated in a paper in JASA (1958) that this estimator is the nonparametric maximum likelihood estimator.

Figure 2.3: An illustrative example of the Kaplan-Meier estimator (ten patients followed on the patient-time axis in years; x marks deaths, o marks censored observations, with the corresponding $\hat m(x)$ and $\hat S(t)$ indicated below the time axis).

We will illustrate through the simple example shown in Figure 2.3 how the Kaplan-Meier estimator is constructed. By convention, the Kaplan-Meier estimate is a right-continuous step function which takes jumps only at the death times. The calculation of the above KM estimate can be implemented using Proc Lifetest in SAS as follows:

Data example;
  input survtime censcode;  /* censcode: 1 = died, 0 = censored */
  cards;
4.5 1
7.5 1
8.5 0
11.5 1
13.5 0
15.5 1
16.5 1
17.5 0
19.5 1
21.5 0
;
Proc lifetest;
  time survtime*censcode(0);
run;

Part of the output from the above program is the table of product-limit survival estimates: for each observed time, SAS lists the survival estimate, the failure probability, its standard error, the number failed, and the number left, with censored observations flagged by an asterisk.

The above Kaplan-Meier estimate can also be obtained using the R function survfit(). The code is given in the following:

> survtime <- c(4.5, 7.5, 8.5, 11.5, 13.5, 15.5, 16.5, 17.5, 19.5, 21.5)
> status <- c(1, 1, 0, 1, 0, 1, 1, 0, 1, 0)
> fit <- survfit(Surv(survtime, status), conf.type=c("plain"))

Then we can use the R function summary() to see the output:

> summary(fit)
Call: survfit(formula = Surv(survtime, status), conf.type = c("plain"))

The summary lists, for each death time, the number at risk (n.risk), the number of events (n.event), the Kaplan-Meier survival estimate, its standard error, and the lower and upper 95% confidence limits.
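The same numbers can be produced by hand, which makes the product-limit construction explicit: at every observed death time, multiply in the factor (1 minus the number of deaths divided by the number at risk). A minimal sketch in R (our own code, not part of the original notes):

> dtimes <- sort(unique(survtime[status == 1]))                    # distinct death times
> n.risk <- sapply(dtimes, function(x) sum(survtime >= x))         # number at risk just before each death time
> d <- sapply(dtimes, function(x) sum(survtime == x & status == 1))
> km <- cumprod(1 - d/n.risk)                                      # KM(t) at each death time
> cbind(dtimes, n.risk, d, km)                                     # compare with summary(fit) above

At the last death time before $t = 17.5$ this gives $KM(16.5) = 0.9 \times 0.889 \times 0.857 \times 0.8 \times 0.75 \approx 0.411$, in agreement with the survfit() output.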
Let $d(x)$ denote the number of deaths at time $x$. Generally $d(x)$ is either zero or one, but we allow the possibility of tied survival times, in which case $d(x)$ may be greater than one. Let $n(x)$ denote the number of individuals at risk just prior to time $x$, i.e., the number of individuals in the sample who neither died nor were censored prior to time $x$. Then the Kaplan-Meier estimate can be expressed as
$$KM(t) = \prod_{x \leq t} \left( 1 - \frac{d(x)}{n(x)} \right).$$
Note: In the notation above, the product changes only at times $x$ where $d(x) \geq 1$, i.e., only at times where we observed deaths.

Non-informative Censoring

In order for the life-table estimates to give unbiased results, there is an important assumption that individuals who are censored are at the same risk of subsequent failure as those who are still alive and uncensored. The risk set at any time point (the individuals still alive and uncensored) should be representative of the entire population alive at the same time. If this is the case, the censoring process is called non-informative. Statistically, if the censoring process is independent of the survival time, then we will automatically have non-informative censoring. In fact, we almost always mean independent censoring when we say non-informative censoring.

If censoring only occurs because of staggered entry, then the assumption of non-informative censoring seems plausible. However, when censoring results from loss to follow-up or death from a competing risk, this assumption is more suspect. If at all possible, censoring from these latter situations should be kept to a minimum.

Greenwood's Formula for the Variance of the Life-table Estimator

The derivation given below is heuristic in nature but tries to capture some of the salient features of the more rigorous treatments given in the theoretical literature on survival analysis. For this reason, we will use some of the notation associated with the counting process approach to survival analysis. In fact, we have already seen it when we discussed the life-table estimator.
It is useful, when considering the product-limit estimator, to partition time into many small intervals, say with interval length $\Delta x$, where $\Delta x$ is small.

Figure 2.4: Partition of the patient-time axis into intervals of length $\Delta x$.

Let $x$ denote some arbitrary time point on the grid above and define
$$Y(x) = \text{number of individuals at risk (i.e., alive and uncensored) at time } x,$$
$$dN(x) = \text{number of observed deaths occurring in } [x, x + \Delta x).$$
Recall: previously, $Y(x)$ was denoted by $n(x)$ and $dN(x)$ was denoted by $d(x)$. It should be straightforward to see that $w(x)$, the number of censored individuals in $[x, x + \Delta x)$, is equal to $\{[Y(x) - Y(x + \Delta x)] - dN(x)\}$.

Note: In theory, we should be able to choose $\Delta x$ small enough that $\{dN(x) > 0 \text{ and } w(x) > 0\}$ never occurs. In practice, however, data may not be collected in that fashion, in which case approximations such as those given with the life-table estimator may be necessary.

With these definitions, the Kaplan-Meier estimator can be written as
$$KM(t) = \prod_{x:\, x + \Delta x \leq t} \left\{ 1 - \frac{dN(x)}{Y(x)} \right\}, \qquad \text{as } \Delta x \to 0,$$
where the product is over all grid points $x$ such that $x + \Delta x \leq t$. This can be modified, if $\Delta x$ is not chosen small enough, to
$$LT(t) = \prod_{x:\, x + \Delta x \leq t} \left\{ 1 - \frac{dN(x)}{Y(x) - w(x)/2} \right\},$$
where $LT(t)$ means the life-table estimator.

If the sample size is large and $\Delta x$ is small, then $dN(x)/Y(x)$ is a small number (i.e., close to zero), as long as $x$ is not close to the right-hand tail of the survival distribution (where $Y(x)$ may be very small).
If this is the case, then
$$\exp\left\{-\frac{dN(x)}{Y(x)}\right\} \approx 1 - \frac{dN(x)}{Y(x)}.$$
Here we used the approximation $e^{-u} \approx 1 - u$ when $u$ is close to zero; this approximation is exact when $dN(x) = 0$. Therefore, the Kaplan-Meier estimator can be approximated by
$$KM(t) \approx \prod_{x:\, x + \Delta x \leq t} \exp\left\{-\frac{dN(x)}{Y(x)}\right\} = \exp\left\{-\sum_{x < t} \frac{dN(x)}{Y(x)}\right\};$$
here and hereafter, $\{x < t\}$ means $\{\text{all grid points } x \text{ such that } x + \Delta x \leq t\}$.

If $\Delta x$ is taken to be small enough that all distinct times (either death times or withdrawal times) are represented at most once in any time interval, then the estimator will be uniquely defined and will not be altered by choosing a finer partition of the grid of time points. In such a case the quantity $\sum_{x < t} dN(x)/Y(x)$ is sometimes represented as $\int_0^t \frac{dN(x)}{Y(x)}$.

Note:

1. Basically, this estimator takes the sum, over all the distinct death times before time $t$, of the number of deaths divided by the number at risk at each of those distinct death times.

2. The estimator
$$\hat\Lambda(t) = \sum_{x < t} \frac{dN(x)}{Y(x)}$$
is referred to as the Nelson-Aalen estimator of the cumulative hazard function $\Lambda(t) = \int_0^t \lambda(x)\,dx$. Recall that $S(t) = \exp(-\Lambda(t))$.

By the definition of an integral,
$$\Lambda(t) = \int_0^t \lambda(x)\,dx \approx \sum_{x < t} \lambda(x)\,\Delta x.$$
By the definition of a hazard function,
$$\lambda(x)\,\Delta x \approx P[x \leq T < x + \Delta x \mid T \geq x].$$
With independent censoring, it would seem reasonable to estimate $\lambda(x)\,\Delta x$, i.e., the conditional probability of dying in $[x, x + \Delta x)$ given being alive at time $x$, by $dN(x)/Y(x)$. Therefore we obtain the Nelson-Aalen estimator
$$\hat\Lambda(t) = \sum_{x < t} \frac{dN(x)}{Y(x)}.$$
We will now show how to estimate the variance of the Nelson-Aalen estimator and then show how this is used to estimate the variance of the Kaplan-Meier estimator.

For a grid point $x$, let $H(x)$ denote the history of all deaths and censoring occurring up to time $x$:
$$H(x) = \{dN(u), w(u); \text{ for all values } u \text{ on our grid of points with } u < x\}.$$
Note the following:

1. Conditional on $H(x)$, we would know the value of $Y(x)$ (i.e., the number at risk at time $x$), and $dN(x)$ would follow a binomial distribution, denoted as
$$dN(x) \mid H(x) \sim \mathrm{Bin}(Y(x), \pi(x)),$$
where $\pi(x)$ is the conditional probability of an individual dying in $[x, x + \Delta x)$ given that the individual was at risk at time $x$ (i.e., $\pi(x) = P[x \leq T < x + \Delta x \mid T \geq x]$). Recall that this probability can be approximated by $\pi(x) \approx \lambda(x)\,\Delta x$.

2. The following are standard results for a binomially distributed random variable:

(a) $E[dN(x) \mid H(x)] = Y(x)\,\pi(x)$,
(b) $\mathrm{Var}[dN(x) \mid H(x)] = Y(x)\,\pi(x)[1 - \pi(x)]$,

(c) $E\!\left[\dfrac{dN(x)}{Y(x)} \,\Big|\, H(x)\right] = \pi(x)$,

(d) $E\!\left\{\left[\dfrac{dN(x)}{Y(x)}\right]\left[1 - \dfrac{dN(x)}{Y(x)}\right]\left[\dfrac{Y(x)}{Y(x) - 1}\right] \,\Big|\, H(x)\right\} = \pi(x)[1 - \pi(x)]$.

Consider the Nelson-Aalen estimator $\hat\Lambda(t) = \sum_{x < t} dN(x)/Y(x)$. We have
$$E[\hat\Lambda(t)] = E\left[\sum_{x < t} \frac{dN(x)}{Y(x)}\right] = \sum_{x < t} E\left[\frac{dN(x)}{Y(x)}\right] = \sum_{x < t} E\left[E\left\{\frac{dN(x)}{Y(x)} \,\Big|\, H(x)\right\}\right] = \sum_{x < t} \pi(x) \approx \sum_{x < t} \lambda(x)\,\Delta x \approx \int_0^t \lambda(x)\,dx = \Lambda(t).$$
Hence $E[\hat\Lambda(t)] = \sum_{x < t} \pi(x)$, and if we take $\Delta x$ smaller and smaller, then in the limit $\sum_{x < t} \pi(x)$ goes to $\Lambda(t)$. Namely, $\hat\Lambda(t)$ is nearly unbiased for $\Lambda(t)$.

How to Estimate the Variance of $\hat\Lambda(t)$

The definition of variance gives
$$\mathrm{Var}(\hat\Lambda(t)) = E[\hat\Lambda(t) - E(\hat\Lambda(t))]^2 = E\left[\sum_{x < t} \left\{\frac{dN(x)}{Y(x)} - \pi(x)\right\}\right]^2.$$
Note: The square of a sum of terms is equal to the sum of the squares plus the sum of all cross-product terms. So the above expectation is equal to
$$\sum_{x < t} E\left[\frac{dN(x)}{Y(x)} - \pi(x)\right]^2 + \sum_{x < t}\;\sum_{\substack{x' < t \\ x' \neq x}} E\left[\left\{\frac{dN(x)}{Y(x)} - \pi(x)\right\}\left\{\frac{dN(x')}{Y(x')} - \pi(x')\right\}\right].$$
We will first demonstrate that the cross-product terms have expectation equal to zero. Let us take one such term and say, without loss of generality, that $x < x'$:
$$E\left[\left\{\frac{dN(x)}{Y(x)} - \pi(x)\right\}\left\{\frac{dN(x')}{Y(x')} - \pi(x')\right\}\right] = E\left[E\left[\left\{\frac{dN(x)}{Y(x)} - \pi(x)\right\}\left\{\frac{dN(x')}{Y(x')} - \pi(x')\right\} \,\Big|\, H(x')\right]\right].$$
Note: Conditional on $H(x')$, the quantities $dN(x)$, $Y(x)$ and $\pi(x)$ are constant, since $x < x'$. Therefore the above expectation is equal to
$$E\left[\left\{\frac{dN(x)}{Y(x)} - \pi(x)\right\}\, E\left[\left\{\frac{dN(x')}{Y(x')} - \pi(x')\right\} \,\Big|\, H(x')\right]\right].$$
The inner conditional expectation is zero, since
$$E\left[\frac{dN(x')}{Y(x')} \,\Big|\, H(x')\right] = \pi(x')$$
by (2.c). Therefore we have shown that
$$E\left[\left\{\frac{dN(x)}{Y(x)} - \pi(x)\right\}\left\{\frac{dN(x')}{Y(x')} - \pi(x')\right\}\right] = 0.$$
Since the cross-product terms have expectation equal to zero, this implies that
$$\mathrm{Var}(\hat\Lambda(t)) = \sum_{x < t} E\left[\frac{dN(x)}{Y(x)} - \pi(x)\right]^2.$$
Using the double expectation theorem again, we get
$$E\left[\frac{dN(x)}{Y(x)} - \pi(x)\right]^2 = E\left[E\left\{\left[\frac{dN(x)}{Y(x)} - \pi(x)\right]^2 \,\Big|\, H(x)\right\}\right] = E\left[\mathrm{Var}\left\{\frac{dN(x)}{Y(x)} \,\Big|\, H(x)\right\}\right] = E\left[\frac{\pi(x)[1 - \pi(x)]}{Y(x)}\right].$$
Therefore, we have
$$\mathrm{Var}(\hat\Lambda(t)) = \sum_{x < t} E\left[\frac{\pi(x)[1 - \pi(x)]}{Y(x)}\right].$$
If we wanted to estimate $\pi(x)[1 - \pi(x)]/Y(x)$, then using (2.d) we might think that
$$\left[\frac{dN(x)}{Y(x)}\right]\left[1 - \frac{dN(x)}{Y(x)}\right]\left[\frac{1}{Y(x) - 1}\right]$$
may be reasonable. We would then use as an estimate for $\mathrm{Var}(\hat\Lambda(t))$ the estimator obtained by summing the above quantity over all grid points $x$ such that $x + \Delta x \leq t$:
$$\widehat{\mathrm{Var}}(\hat\Lambda(t)) = \sum_{x < t} \left[\frac{dN(x)}{Y(x)}\right]\left[1 - \frac{dN(x)}{Y(x)}\right]\left[\frac{1}{Y(x) - 1}\right].$$
In fact, the above variance estimator is unbiased for $\mathrm{Var}(\hat\Lambda(t))$, which can be seen using the following argument:
$$
\begin{aligned}
E\left[\sum_{x < t} \left\{\frac{dN(x)}{Y(x)}\right\}\left\{1 - \frac{dN(x)}{Y(x)}\right\}\left\{\frac{1}{Y(x) - 1}\right\}\right]
&= \sum_{x < t} E\left[E\left\{\left[\frac{dN(x)}{Y(x)}\right]\left[1 - \frac{dN(x)}{Y(x)}\right]\left[\frac{1}{Y(x) - 1}\right] \,\Big|\, H(x)\right\}\right] \\
&= \sum_{x < t} E\left[\frac{\pi(x)[1 - \pi(x)]}{Y(x)}\right] \qquad \text{(by (2.d))} \\
&= \mathrm{Var}[\hat\Lambda(t)]. \qquad \text{(double expectation again)}
\end{aligned}
$$
What this last argument shows is that an unbiased estimator for $\mathrm{Var}[\hat\Lambda(t)]$ is given by
$$\sum_{x < t} \left[\frac{dN(x)}{Y(x)}\right]\left[1 - \frac{dN(x)}{Y(x)}\right]\left[\frac{1}{Y(x) - 1}\right].$$
Note: If the survival data are continuous (i.e., no ties) and $\Delta x$ is taken small enough, then $dN(x)$ takes on the values 0 or 1 only. In this case
$$\left[\frac{dN(x)}{Y(x)}\right]\left[1 - \frac{dN(x)}{Y(x)}\right]\left[\frac{1}{Y(x) - 1}\right] = \frac{dN(x)}{Y^2(x)},$$
and
$$\widehat{\mathrm{Var}}(\hat\Lambda(t)) = \sum_{x < t} \frac{dN(x)}{Y^2(x)},$$
which is also written as
$$\int_0^t \frac{dN(x)}{Y^2(x)}.$$

Remark: We proved that the Nelson-Aalen estimator $\sum_{x < t} dN(x)/Y(x)$ is an unbiased estimator for $\sum_{x < t} \pi(x)$. We argued before that, in the limit as $\Delta x$ goes to zero, $\sum_{x < t} dN(x)/Y(x)$ becomes $\int_0^t dN(x)/Y(x)$. We also argued that $\pi(x) \approx \lambda(x)\,\Delta x$, hence as $\Delta x$ goes to zero, $\sum_{x < t} \pi(x)$ goes to $\int_0^t \lambda(x)\,dx$. These two arguments taken together imply that $\int_0^t dN(x)/Y(x)$ is an unbiased estimator of the cumulative hazard function $\Lambda(t) = \int_0^t \lambda(x)\,dx$, namely
$$E\left[\int_0^t \frac{dN(x)}{Y(x)}\right] = \Lambda(t).$$
Since $\hat\Lambda(t) = \sum_{x < t} dN(x)/Y(x)$ is made up of a sum of random variables that are conditionally uncorrelated, it has a martingale structure, for which there exists a body of theory that enables us to show that
$\hat\Lambda(t)$ is asymptotically normal with mean $\Lambda(t)$ and variance $\mathrm{Var}[\hat\Lambda(t)]$, which can be estimated unbiasedly by
$$\widehat{\mathrm{Var}}(\hat\Lambda(t)) = \sum_{x < t} \left[\frac{dN(x)}{Y(x)}\right]\left[1 - \frac{dN(x)}{Y(x)}\right]\left[\frac{1}{Y(x) - 1}\right],$$
and, in the case of no ties, by
$$\widehat{\mathrm{Var}}(\hat\Lambda(t)) = \sum_{x < t} \frac{dN(x)}{Y^2(x)}.$$
Let us refer to the estimated standard error of $\hat\Lambda(t)$ as
$$\widehat{\mathrm{se}}[\hat\Lambda(t)] = \left\{\sum_{x < t} \left[\frac{dN(x)}{Y(x)}\right]\left[1 - \frac{dN(x)}{Y(x)}\right]\left[\frac{1}{Y(x) - 1}\right]\right\}^{1/2}.$$
The unbiasedness and asymptotic normality of $\hat\Lambda(t)$ about $\Lambda(t)$ allow us to form confidence intervals for $\Lambda(t)$ (at time $t$). Specifically, the $(1-\alpha)$ confidence interval for $\Lambda(t)$ is given by
$$\hat\Lambda(t) \pm z_{\alpha/2}\, \widehat{\mathrm{se}}(\hat\Lambda(t)),$$
where $z_{\alpha/2}$ is the $(1 - \alpha/2)$ quantile of the standard normal distribution. That is, the random interval
$$[\hat\Lambda(t) - z_{\alpha/2}\, \widehat{\mathrm{se}}(\hat\Lambda(t)),\; \hat\Lambda(t) + z_{\alpha/2}\, \widehat{\mathrm{se}}(\hat\Lambda(t))]$$
covers the true value $\Lambda(t)$ with probability $1 - \alpha$. This result can also be used to construct confidence intervals for the survival function $S(t)$. This is seen by realizing that $S(t) = e^{-\Lambda(t)}$, in which case the confidence interval is given by
$$\left[e^{-\hat\Lambda(t) - z_{\alpha/2}\, \widehat{\mathrm{se}}(\hat\Lambda(t))},\; e^{-\hat\Lambda(t) + z_{\alpha/2}\, \widehat{\mathrm{se}}(\hat\Lambda(t))}\right],$$
meaning that this random interval will cover the true value $S(t)$ with probability $1 - \alpha$.

An example: We will use the hypothetical data shown in Figure 2.3 to illustrate the calculation of $\hat\Lambda(t)$, $\widehat{\mathrm{Var}}[\hat\Lambda(t)]$, and confidence intervals for $\Lambda(t)$ and $S(t)$. For illustration, let us take $t = 17$. Note that there are no ties in this example, so
$$\hat\Lambda(t) = \sum_{x < t} \frac{dN(x)}{Y(x)} = \frac{1}{10} + \frac{1}{9} + \frac{1}{7} + \frac{1}{5} + \frac{1}{4} = 0.804,$$
$$\widehat{\mathrm{Var}}[\hat\Lambda(t)] = \sum_{x < t} \frac{dN(x)}{Y^2(x)} = \frac{1}{10^2} + \frac{1}{9^2} + \frac{1}{7^2} + \frac{1}{5^2} + \frac{1}{4^2} = 0.145,$$
$$\widehat{\mathrm{se}}[\hat\Lambda(t)] = \sqrt{0.145} = 0.381.$$
So the 95% confidence interval for $\Lambda(t)$ is $0.804 \pm 1.96 \times 0.381 = [0.0572, 1.551]$, and the Nelson-Aalen estimate of $S(t)$ is $\hat S(t) = e^{-\hat\Lambda(t)} = e^{-0.804} = 0.448$. The 95% confidence interval for $S(t)$ is $[e^{-1.551}, e^{-0.0572}] = [0.212, 0.944]$.

Note: The above Nelson-Aalen-based estimate $\hat S(t) = 0.448$ is different from (but close to) the Kaplan-Meier estimate $KM(t) = 0.411$. It should also be noted that the above confidence interval for the survival probability $S(t)$ is not symmetric about the estimator $\hat S(t)$.
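These numbers are straightforward to reproduce in R; here is a minimal sketch for the Figure 2.3 data (our own code; survtime and status are the vectors entered earlier):

> survtime <- c(4.5, 7.5, 8.5, 11.5, 13.5, 15.5, 16.5, 17.5, 19.5, 21.5)
> status <- c(1, 1, 0, 1, 0, 1, 1, 0, 1, 0)
> dtimes <- survtime[status == 1 & survtime < 17]          # death times before t = 17
> Y <- sapply(dtimes, function(x) sum(survtime >= x))       # number at risk at each of those death times
> Lam <- sum(1/Y); Lam                                      # Nelson-Aalen estimate, about 0.804
> V <- sum(1/Y^2); V                                        # its variance estimate (no ties), about 0.145
> Lam + c(-1, 1) * 1.96 * sqrt(V)                           # 95% CI for Lambda(17): about [0.057, 1.551]
> exp(-(Lam + c(1, -1) * 1.96 * sqrt(V)))                   # 95% CI for S(17): about [0.212, 0.944]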
Another way of getting approximate confidence intervals for $S(t) = e^{-\Lambda(t)}$ is by using the delta method; this method guarantees symmetric confidence intervals. By the delta method, a $(1-\alpha)$ confidence interval for $f(\theta)$ is given by
$$f(\hat\theta) \pm z_{\alpha/2}\, |f'(\hat\theta)|\, \hat\sigma.$$
In our case, $\hat\Lambda(t)$ takes on the role of $\hat\theta$, $\Lambda(t)$ takes on the role of $\theta$, and $f(\theta) = e^{-\theta}$, so that $S(t) = f\{\Lambda(t)\}$, $|f'(\theta)| = |-e^{-\theta}| = e^{-\theta}$, and $\hat S(t) = e^{-\hat\Lambda(t)}$. Consequently, using the delta method we get
$$\hat S(t) \stackrel{a}{\sim} N\big(S(t),\; [S(t)]^2\, \mathrm{Var}[\hat\Lambda(t)]\big),$$
and a $(1-\alpha)$ confidence interval for $S(t)$ is given by
$$\hat S(t) \pm z_{\alpha/2}\, \{\hat S(t)\, \widehat{\mathrm{se}}[\hat\Lambda(t)]\}.$$
Remark: Note that $[\hat S(t)]^2\, \widehat{\mathrm{Var}}[\hat\Lambda(t)]$ is an estimate of $\mathrm{Var}[\hat S(t)]$, where $\hat S(t) = \exp[-\hat\Lambda(t)]$.

Previously, we showed that the Kaplan-Meier estimator
$$KM(t) = \prod_{x < t} \left[1 - \frac{dN(x)}{Y(x)}\right]$$
is well approximated by $\hat S(t) = \exp[-\hat\Lambda(t)]$. Thus a reasonable estimator of $\mathrm{Var}(KM(t))$ is the estimator of $\mathrm{Var}[\exp(-\hat\Lambda(t))]$, or (by the delta method)
$$[\hat S(t)]^2\, \widehat{\mathrm{Var}}[\hat\Lambda(t)] = [\hat S(t)]^2 \sum_{x < t} \frac{dN(x)}{Y^2(x)}.$$
This is very close to (asymptotically the same as) the estimator for the variance of the Kaplan-Meier estimator given by Greenwood, namely
$$\widehat{\mathrm{Var}}\{KM(t)\} = \{KM(t)\}^2 \sum_{x < t} \frac{dN(x)}{[Y(x) - w(x)/2 - dN(x)]\,[Y(x) - w(x)/2]}.$$
Note: SAS uses the above formula to calculate the estimated variance of the life-table estimate of the survival function, replacing $KM(t)$ on both sides by $LT(t)$.

Note: The summation in the above equation can be viewed as the variance estimate for the cumulative hazard estimator defined by $\hat\Lambda_{KM}(t) = -\log[KM(t)]$. Namely,
$$\widehat{\mathrm{Var}}\{\hat\Lambda_{KM}(t)\} = \sum_{x < t} \frac{dN(x)}{[Y(x) - w(x)/2 - dN(x)]\,[Y(x) - w(x)/2]}.$$
In the example shown in Figure 2.3, using the delta-method approximation for getting a confidence interval with the Nelson-Aalen estimator, we get that a 95% CI for $S(t)$ (where $t = 17$)
is
$$e^{-\hat\Lambda(t)} \pm 1.96\, e^{-\hat\Lambda(t)}\, \widehat{\mathrm{se}}[\hat\Lambda(t)] = e^{-0.804} \pm 1.96 \times e^{-0.804} \times 0.381 = [0.114, 0.784].$$
The estimated $\widehat{\mathrm{se}}[\hat S(t)] = 0.448 \times 0.381 = 0.171$.

If we use the Kaplan-Meier estimator, together with Greenwood's formula for estimating the variance, to construct a 95% confidence interval for $S(t)$, we would get
$$KM(t) = \left[1 - \frac{1}{10}\right]\left[1 - \frac{1}{9}\right]\left[1 - \frac{1}{7}\right]\left[1 - \frac{1}{5}\right]\left[1 - \frac{1}{4}\right] = 0.411,$$
$$\widehat{\mathrm{Var}}[KM(t)] = (0.411)^2 \left\{\frac{1}{10 \times 9} + \frac{1}{9 \times 8} + \frac{1}{7 \times 6} + \frac{1}{5 \times 4} + \frac{1}{4 \times 3}\right\} = (0.411)^2 \times 0.182 = 0.031,$$
$$\widehat{\mathrm{se}}[KM(t)] = \sqrt{0.031} = 0.175, \qquad \widehat{\mathrm{Var}}[\hat\Lambda_{KM}(t)] = 0.182, \qquad \widehat{\mathrm{se}}[\hat\Lambda_{KM}(t)] = 0.427.$$
Thus a 95% confidence interval for $S(t)$ is given by
$$KM(t) \pm 1.96\, \widehat{\mathrm{se}}[KM(t)] = 0.411 \pm 0.343 = [0.068, 0.754],$$
which is close to the confidence interval using the delta method, considering the sample size is only 10. In fact, the estimated standard errors for $\hat S(t)$ and $KM(t)$ using the delta method and Greenwood's formula are 0.171 and 0.175, respectively, which agree with each other very well.

Note: If we want to use the R function survfit() to construct a confidence interval for $S(t)$ of the form $KM(t) \pm z_{\alpha/2}\, \widehat{\mathrm{se}}[KM(t)]$, we have to specify the argument conf.type=c("plain") in survfit(). The default constructs the confidence interval for $S(t)$ by exponentiating the confidence interval for the cumulative hazard based on the Kaplan-Meier estimator. For example, a 95% CI for $S(t)$ is
$$KM(t) \times \left[e^{-1.96\, \widehat{\mathrm{se}}[\hat\Lambda_{KM}(t)]},\; e^{1.96\, \widehat{\mathrm{se}}[\hat\Lambda_{KM}(t)]}\right] = 0.411 \times [e^{-0.837}, e^{0.837}] = [0.178, 0.949].$$

Comparison of confidence intervals for $S(t)$:

1. Exponentiating the 95% CI for the cumulative hazard using the Nelson-Aalen estimator: [0.212, 0.944].
2. Delta method using the Nelson-Aalen estimator: [0.114, 0.784].

3. Exponentiating the 95% CI for the cumulative hazard using the Kaplan-Meier estimator: [0.178, 0.949].

4. Kaplan-Meier estimator together with Greenwood's formula for the variance: [0.068, 0.754].

These are relatively close, and the approximations become better with larger sample sizes. Of the different methods for constructing confidence intervals, usually the most accurate is based on exponentiating the confidence interval for the cumulative hazard function based on the Nelson-Aalen estimator. We do not feel that symmetry is necessarily an important feature for a confidence interval to have.

Summary

1. We first estimate $S(t)$ by $KM(t) = \prod_{x \leq t} \left(1 - \frac{d(x)}{n(x)}\right)$, then estimate $\Lambda(t)$ by $\hat\Lambda_{KM}(t) = -\log[KM(t)]$. Their variance estimates are
$$\widehat{\mathrm{Var}}\{\hat\Lambda_{KM}(t)\} = \sum_{x < t} \frac{dN(x)}{[Y(x) - w(x)/2 - dN(x)]\,[Y(x) - w(x)/2]}, \qquad \widehat{\mathrm{Var}}\{KM(t)\} = \{KM(t)\}^2\, \widehat{\mathrm{Var}}\{\hat\Lambda_{KM}(t)\}.$$
The confidence intervals for $S(t)$ can be constructed in two ways:
$$KM(t) \pm z_{\alpha/2}\, \widehat{\mathrm{se}}[KM(t)], \qquad \text{or} \qquad e^{-\hat\Lambda_{KM}(t) \pm z_{\alpha/2}\, \widehat{\mathrm{se}}[\hat\Lambda_{KM}(t)]} = KM(t)\, e^{\pm z_{\alpha/2}\, \widehat{\mathrm{se}}[\hat\Lambda_{KM}(t)]}.$$

2. We first estimate $\Lambda(t)$ by the Nelson-Aalen estimator $\hat\Lambda(t) = \sum_{x < t} \frac{dN(x)}{Y(x)}$, then estimate $S(t)$ by $\hat S(t) = e^{-\hat\Lambda(t)}$. Their variance estimates are given by
$$\widehat{\mathrm{Var}}\{\hat\Lambda(t)\} = \sum_{x < t} \left[\frac{dN(x)}{Y(x)}\right]\left[1 - \frac{dN(x)}{Y(x)}\right]\left[\frac{1}{Y(x) - 1}\right], \qquad \widehat{\mathrm{Var}}\{\hat S(t)\} = \{\hat S(t)\}^2\, \widehat{\mathrm{Var}}\{\hat\Lambda(t)\}.$$
The confidence intervals for $S(t)$ can also be constructed in two ways:
$$\hat S(t) \pm z_{\alpha/2}\, \widehat{\mathrm{se}}[\hat S(t)], \qquad \text{or} \qquad e^{-\hat\Lambda(t) \pm z_{\alpha/2}\, \widehat{\mathrm{se}}[\hat\Lambda(t)]} = \hat S(t)\, e^{\pm z_{\alpha/2}\, \widehat{\mathrm{se}}[\hat\Lambda(t)]}.$$
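All four intervals in the comparison can be reproduced in a few more lines of R, continuing the previous sketch (our own code; Y, Lam and V are as defined there for t = 17):

> km <- prod(1 - 1/Y)                                # Kaplan-Meier estimate at t = 17, about 0.411
> se.LamKM <- sqrt(sum(1/((Y - 1) * Y)))             # Greenwood-type se of -log(KM), about 0.427
> exp(-(Lam + c(1, -1) * 1.96 * sqrt(V)))            # 1. exponentiated Nelson-Aalen interval
> exp(-Lam) + c(-1, 1) * 1.96 * exp(-Lam) * sqrt(V)  # 2. delta method with the Nelson-Aalen estimator
> km * exp(c(-1, 1) * 1.96 * se.LamKM)               # 3. exponentiated Kaplan-Meier interval (survfit default)
> km + c(-1, 1) * 1.96 * km * se.LamKM               # 4. Kaplan-Meier with Greenwood's formula

Any small discrepancies from the numbers quoted in the text are due to rounding of the intermediate quantities there.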
Estimators of quantiles (such as the median, first, and third quartiles) of a distribution can be obtained by inverse relationships. This is most easily illustrated through an example. Suppose we want to estimate the median $S^{-1}(0.5)$, or any other quantile $\varphi = S^{-1}(\theta)$, $0 < \theta < 1$. Then the point estimate of $\varphi$ is obtained (using the Kaplan-Meier estimator of $S(t)$) as
$$\hat\varphi = KM^{-1}(\theta), \qquad \text{i.e., } KM(\hat\varphi) = \theta.$$
An approximate $(1-\alpha)$ confidence interval for $\varphi$ is given by $[\hat\varphi_L, \hat\varphi_U]$, where $\hat\varphi_L$ satisfies
$$KM(\hat\varphi_L) - z_{\alpha/2}\, \widehat{\mathrm{se}}[KM(\hat\varphi_L)] = \theta$$
and $\hat\varphi_U$ satisfies
$$KM(\hat\varphi_U) + z_{\alpha/2}\, \widehat{\mathrm{se}}[KM(\hat\varphi_U)] = \theta.$$

Proof: We prove this argument for a general estimator $\hat S(t)$. So if we use the Kaplan-Meier estimator, then $\hat S(t)$ is $KM(t)$; it can also be the Nelson-Aalen-based estimator. Then
$$P[\hat\varphi_L < \varphi < \hat\varphi_U] = P[S(\hat\varphi_U) < \theta < S(\hat\varphi_L)] = 1 - \big(P[S(\hat\varphi_U) > \theta] + P[S(\hat\varphi_L) < \theta]\big)$$
(note that $S(t)$ is decreasing and $S(\varphi) = \theta$). Denote by $\varphi_U$ the solution to the equation $S(\varphi_U) + z_{\alpha/2}\, \mathrm{se}[\hat S(\varphi_U)] = \theta$. Then $\hat\varphi_U$ will be close to $\varphi_U$. Therefore,
$$
\begin{aligned}
P[S(\hat\varphi_U) > \theta] &= P\big[S(\hat\varphi_U) > \hat S(\hat\varphi_U) + z_{\alpha/2}\, \widehat{\mathrm{se}}[\hat S(\hat\varphi_U)]\big]
= P\left[\frac{\hat S(\hat\varphi_U) - S(\hat\varphi_U)}{\widehat{\mathrm{se}}[\hat S(\hat\varphi_U)]} < -z_{\alpha/2}\right] \\
&\approx P\left[\frac{\hat S(\varphi_U) - S(\varphi_U)}{\widehat{\mathrm{se}}[\hat S(\varphi_U)]} < -z_{\alpha/2}\right]
\approx P[Z < -z_{\alpha/2}] \quad (Z \sim N(0,1)) \;=\; \frac{\alpha}{2}.
\end{aligned}
$$
Similarly, we can show that $P[S(\hat\varphi_L) < \theta] \approx \alpha/2$. Therefore,
$$P[\hat\varphi_L < \varphi < \hat\varphi_U] \approx 1 - \left(\frac{\alpha}{2} + \frac{\alpha}{2}\right) = 1 - \alpha.$$

We illustrate this practice using a simulated data set generated with the following R commands:

> survtime <- rexp(50, 0.2)
> censtime <- rexp(50, 0.1)
> status <- (survtime <= censtime)
> obstime <- survtime*status + censtime*(1-status)
> fit <- survfit(Surv(obstime, status))
> summary(fit)
Call: survfit(formula = Surv(obstime, status))

The summary output lists, for each observed death time, the number at risk (n.risk), the number of events (n.event), the Kaplan-Meier survival estimate, its standard error, and the lower and upper 95% confidence limits.
The true survival time has an exponential distribution with $\lambda = 0.2$/year (so the true mean is 5 years and the median is $5\log(2) \approx 3.5$ years). The (potential) censoring time is independent of the survival time and has an exponential distribution with $\lambda = 0.1$/year (so it is stochastically larger than the survival time). The Kaplan-Meier estimate (solid line) and its 95% confidence intervals (dotted lines) are shown in Figure 2.5, which is generated using the R call plot(fit, xlab="patient time (years)", ylab="survival probability"). Note that these CIs are constructed by exponentiating the CIs for $\Lambda(t)$. From this figure, the median survival time is estimated to be 3.56 years, with 95% confidence interval [2.51, 6.20].

Figure 2.5: Illustration of constructing a 95% CI for the median survival time (Kaplan-Meier estimate with pointwise confidence limits, survival probability against patient time in years).

If we use symmetric confidence intervals for $S(t)$ to construct the confidence interval for the median of the true survival time, then we need to specify conf.type=c("plain") in survfit(), as follows:

> fit <- survfit(Surv(obstime, status), conf.type=c("plain"))

We get the following output using summary():
> summary(fit)
Call: survfit(formula = Surv(obstime, status), conf.type = c("plain"))

The output again lists, for each death time, n.risk, n.event, the survival estimate, its standard error, and the (now symmetric) lower and upper 95% confidence limits. The Kaplan-Meier estimate (solid line) and its symmetric 95% confidence intervals (dotted lines) are shown in Figure 2.6. Note that the Kaplan-Meier estimate itself is the same as before. From this figure, the median survival time is estimated to be 3.56 years, with 95% confidence interval [2.51, 6.12].

Note: If we treat the censored data obstime as if they were uncensored and fit an exponential model, then the best estimate of the median survival time is 2.5 years, with 95% confidence interval [1.8, 3.2] (using the methodology to be presented in the next chapter). These estimates severely underestimate the true median survival time of about 3.5 years.
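The inversion described above can be carried out directly on the survfit object; here is a minimal sketch (our own code, using the fit object above and $\theta = 0.5$) that finds the first time the estimated curve, and each confidence band, drops to 0.5 or below. In recent versions of the survival package, quantile(fit) should report the same quantities.

> s <- summary(fit)
> median.hat <- min(s$time[which(s$surv <= 0.5)])    # point estimate of the median
> phi.L <- min(s$time[which(s$lower <= 0.5)])        # lower confidence limit for the median
> phi.U <- min(s$time[which(s$upper <= 0.5)])        # upper confidence limit for the median
> c(median.hat, phi.L, phi.U)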
Figure 2.6: Illustration of constructing a 95% CI for the median survival time using symmetric CIs for S(t) (survival probability against patient time in years).

Note: If we want a CI for a quantile such as the median survival time at a different confidence level, say 90%, then we need to construct 90% confidence intervals for $S(t)$. This can be done by specifying conf.int=0.9 in the R function survfit(). If we use Proc Lifetest in SAS to compute the Kaplan-Meier estimate, it will produce 95% confidence intervals for the 25%, 50% (median), and 75% quantiles of the true survival time.

Other types of censoring and truncation:

Left censoring: This kind of censoring occurs when the event of interest is only known to have happened before a specific time point. For example, in a study of time to first marijuana use (Example 1.17, page 17 of Klein & Moeschberger), 191 high school boys were asked "When did you first use marijuana?". Some answers were "I have used it but cannot recall when the first time was." For these boys, the time to first marijuana use is left censored at their current age. For the boys who never used marijuana, the time to first marijuana use is right censored at their current age. Of course, we have the exact time to first marijuana use for those boys who remembered when they first used it.
Interval censoring occurs when the event of interest is only known to take place within an interval. For example, in a study comparing time to cosmetic deterioration of the breast for breast cancer patients treated with radiotherapy versus radiotherapy + chemotherapy, patients were examined at each clinical visit for breast retraction, so breast retraction is only known to take place between two clinical visits or is right censored at the end of the study. See Example 1.18 on page 18 of Klein & Moeschberger.

Left truncation occurs when the time to the event of interest in the study sample is greater than a (left) truncation variable. For example, in a study of life expectancy (survival time measured from birth to death) using elderly residents of a retirement community (Example 1.16, page 15 of Klein & Moeschberger), the individuals must survive to a sufficient age to enter the retirement community. Therefore, their survival time is left truncated by their age at entering the community. Ignoring the truncation will lead to a biased sample, and the survival time from the sample will overestimate the underlying life expectancy.

Right truncation occurs when the time to the event of interest in the study sample is less than a (right) truncation variable. A special case is when the study sample consists only of those individuals who have already experienced the event. For example, to study the induction period (also called the latency or incubation period) between infection with the AIDS virus and the onset of clinical AIDS, the ideal approach would be to collect a sample of patients infected with the AIDS virus and then follow them for some period of time until some of them develop clinical AIDS. However, this approach may be too lengthy and costly. An alternative approach is to study those patients who were infected with the AIDS virus through a contaminated blood transfusion and later developed clinical AIDS. In this case, the total number of patients infected with the AIDS virus is unknown. A similar approach can be used to study the induction time for pediatric AIDS: children were infected with the AIDS virus in utero or at birth and later developed clinical AIDS, but the study sample consists only of children known to have developed AIDS. This sampling scheme is similar to the case-control design.
See Example 1.19 on page 19 of Klein & Moeschberger for more description and the data.

Note: The Kaplan-Meier survival estimation approach cannot be directly applied to data with the above types of censoring and truncation; a modified Kaplan-Meier approach or other methods have to be used. As in the right censoring case, the censoring time and truncation time are often assumed to be independent of the time to the event of interest (the survival time). Since right censoring is the most common censoring scheme, we will focus on this special case most of the time in this course. Nonparametric estimation of the survival function (or the cumulative distribution function) for data with other censoring or truncation schemes can be found in Chapters 4 and 5 of Klein & Moeschberger.