# Non-Parametric Estimation in Survival Models


Germán Rodríguez, Spring 2001; revised Spring 2005

We now discuss the analysis of survival data without parametric assumptions about the form of the distribution.

## 1 One Sample: Kaplan-Meier

Our first topic is non-parametric estimation of the survival function. If the data were not censored, the obvious estimate would be the empirical survival function

$$\hat S(t) = \frac{1}{n}\sum_{i=1}^n I\{t_i > t\},$$

where $I$ is the indicator function that takes the value 1 if the condition in braces is true and 0 otherwise. The estimator is simply the proportion alive at $t$.

### 1.1 Estimation with Censored Data

Kaplan and Meier (1958) extended the estimate to censored data. Let $t_{(1)} < t_{(2)} < \ldots < t_{(m)}$ denote the distinct ordered times of death (not counting censoring times). Let $d_i$ be the number of deaths at $t_{(i)}$, and let $n_i$ be the number alive just before $t_{(i)}$. This is the number exposed to risk at time $t_{(i)}$. Then the Kaplan-Meier or product limit estimate of the survivor function is

$$\hat S(t) = \prod_{i:\,t_{(i)} < t} \left(1 - \frac{d_i}{n_i}\right).$$

A heuristic justification of the estimate is as follows. To survive to time $t$ you must first survive to $t_{(1)}$. You must then survive from $t_{(1)}$ to $t_{(2)}$ given

that you have already survived to $t_{(1)}$. And so on. Because there are no deaths between $t_{(i-1)}$ and $t_{(i)}$, we take the probability of dying between these times to be zero. The conditional probability of dying at $t_{(i)}$, given that the subject was alive just before, can be estimated by $d_i/n_i$. The conditional probability of surviving time $t_{(i)}$ is the complement $1 - d_i/n_i$. The overall unconditional probability of surviving to $t$ is obtained by multiplying the conditional probabilities for all relevant times up to $t$.

The Kaplan-Meier estimate is a step function with discontinuities or jumps at the observed death times. Figure 1 shows Kaplan-Meier estimates for the treated and control groups in the famous Gehan data (see Cox, 1972, or Andersen et al., 1993).

Figure 1: Kaplan-Meier Estimates for Gehan Data

If there is no censoring, the K-M estimate coincides with the empirical survival function. If the last observation happens to be a censored case, as is the case in the treated group in the Gehan data, the estimate is undefined beyond the last death.
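The product-limit calculation is easy to sketch in code. The following is an illustrative implementation, not part of the original notes: it follows the usual convention that cases censored exactly at a death time are still counted in the risk set there, and the toy data are made up.

```python
import numpy as np

def kaplan_meier(time, status):
    """Product-limit estimate of the survivor function.

    time   : observation times (to death or censoring)
    status : 1 = death, 0 = censored
    Returns the distinct death times t_(i), the numbers at risk n_i,
    the numbers of deaths d_i, and the estimate just past each t_(i).
    """
    time, status = np.asarray(time, dtype=float), np.asarray(status)
    death_times = np.unique(time[status == 1])              # t_(1) < ... < t_(m)
    n = np.array([(time >= t).sum() for t in death_times])  # at risk just before t_(i)
    d = np.array([((time == t) & (status == 1)).sum() for t in death_times])
    surv = np.cumprod(1.0 - d / n)                          # product of 1 - d_i/n_i
    return death_times, n, d, surv

# With no censoring the estimate is just the empirical survival function:
# each of four deaths removes a quarter of the sample.
t, n, d, s = kaplan_meier([1, 2, 3, 4], [1, 1, 1, 1])
```

With censoring, only the risk sets change; the censored cases never contribute a jump, exactly as the non-parametric maximum likelihood argument of the next section requires.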

### 1.2 Non-parametric Maximum Likelihood

The K-M estimator has a nice interpretation as a non-parametric maximum likelihood estimator (NPML). A rigorous treatment of this notion is beyond the scope of the course, but the original article by Kaplan and Meier provides a more intuitive approach. We consider the contribution to the likelihood of cases that die or are censored at time $t$.

If a subject is censored at $t$, its contribution to the likelihood is $S(t)$. In order to maximize the likelihood we would like to make this as large as possible. Because a survival function must be non-increasing, the best we can do is keep it constant at $t$. In other words, the estimated survival function does not change at censoring times.

If a subject dies at $t$, then this is one of the distinct times of death that we introduced before. Say it is $t_{(i)}$. We need to make the survival function just before $t_{(i)}$ as large as possible. The largest it can be is the value at the previous time of death or 1, whichever is less. We also need to make the survival at $t_{(i)}$ itself as small as possible. This means we need a discontinuity at $t_{(i)}$.

Let $c_i$ denote the number of cases censored between $t_{(i)}$ and $t_{(i+1)}$, and let $d_i$ be the number of cases that die at $t_{(i)}$. Then the likelihood function takes the form

$$L = \prod_{i=1}^m \left[S(t_{(i-1)}) - S(t_{(i)})\right]^{d_i} S(t_{(i)})^{c_i},$$

where the product is over the $m$ distinct times of death, and we take $t_{(0)} = 0$ with $S(t_{(0)}) = 1$. The problem now is to estimate $m$ parameters representing the values of the survival function at the death times $t_{(1)}, t_{(2)}, \ldots, t_{(m)}$.

Write $\pi_i = S(t_{(i)})/S(t_{(i-1)})$ for the conditional probability of surviving from $t_{(i-1)}$ to $t_{(i)}$. Then we can write

$$S(t_{(i)}) = \pi_1 \pi_2 \cdots \pi_i,$$

and the likelihood becomes

$$L = \prod_{i=1}^m (1-\pi_i)^{d_i}\, \pi_i^{c_i}\, (\pi_1\pi_2\cdots\pi_{i-1})^{d_i+c_i}.$$

Note that all cases who die at $t_{(i)}$ or are censored between $t_{(i)}$ and $t_{(i+1)}$ contribute a term $\pi_j$ to each of the previous times of death from $t_{(1)}$ to $t_{(i-1)}$. In addition, those who die at $t_{(i)}$ contribute $1 - \pi_i$, and the censored

cases contribute an additional $\pi_i$. Let $n_i = \sum_{j \ge i} (d_j + c_j)$ denote the total number exposed to risk at $t_{(i)}$. We can then collect terms on each $\pi_i$ and write the likelihood as

$$L = \prod_{i=1}^m (1-\pi_i)^{d_i}\, \pi_i^{n_i - d_i},$$

a binomial likelihood. The m.l.e. of $\pi_i$ is then

$$\hat\pi_i = \frac{n_i - d_i}{n_i} = 1 - \frac{d_i}{n_i}.$$

The K-M estimator follows from multiplying these conditional probabilities.

### 1.3 Greenwood's Formula

From the likelihood function obtained above it follows that the large sample variance of $\hat\pi_i$, conditional on the data $n_i$ and $d_i$, is given by the usual binomial formula

$$\text{var}(\hat\pi_i) = \frac{\pi_i(1-\pi_i)}{n_i}.$$

Perhaps less obviously, $\text{cov}(\hat\pi_i, \hat\pi_j) = 0$ for $i \ne j$, so the covariances of the contributions from different times of death are all zero. You can verify this result by taking logs and then first and second derivatives of the log-likelihood function.

To obtain the large sample variance of $\hat S(t)$, the K-M estimate of the survival function, we need to apply the delta method twice. First we take logs, so that instead of the variance of a product we can find the variance of a sum, working with

$$K_i = \log \hat S(t_{(i)}) = \sum_{j=1}^i \log \hat\pi_j.$$

Now we need to find the variance of the log of $\hat\pi_i$. This will be our first application of the delta method. The large-sample variance of a function $f$ of a random variable $X$ is $\text{var}(f(X)) = (f'(X))^2\,\text{var}(X)$, so we just multiply the variance of $X$ by the square of the derivative of the transformation. In our case the function is the log, and we obtain

$$\text{var}(\log \hat\pi_i) = \left(\frac{1}{\hat\pi_i}\right)^2 \text{var}(\hat\pi_i) = \frac{1-\pi_i}{n_i \pi_i}.$$

Because $K_i$ is a sum and the covariances of the $\hat\pi_i$'s (and hence of the $\log \hat\pi_i$'s) are zero, we find

$$\text{var}(\log \hat S(t_{(i)})) = \sum_{j=1}^i \frac{1-\pi_j}{n_j \pi_j} = \sum_{j=1}^i \frac{d_j}{n_j(n_j - d_j)}.$$

Now we have to use the delta method again, this time to get the variance of the survivor function from the variance of its log:

$$\text{var}(\hat S(t_{(i)})) = \left[\hat S(t_{(i)})\right]^2 \sum_{j=1}^i \frac{1-\hat\pi_j}{n_j \hat\pi_j}.$$

This result is known as Greenwood's formula. You may question the derivation because it conditions on the $n_j$, which are random variables, but the result is in the spirit of likelihood theory, conditioning on all observed quantities, and has been justified rigorously. Peterson (1977) has shown that the K-M estimator $\hat S(t)$ is consistent, and Breslow and Crowley (1974) show that $\sqrt{n}(\hat S(t) - S(t))$ converges in law to a Gaussian process with expectation 0 and a variance-covariance function that may be approximated using Greenwood's formula. For a modern treatment of the estimator from the point of view of counting processes see Andersen et al. (1993).

### 1.4 The Nelson-Aalen Estimator

Consider estimating the cumulative hazard $\Lambda(t)$. A simple approach is to start from an estimator of $S(t)$ and take minus the log. An alternative approach is to estimate the cumulative hazard directly using the Nelson-Aalen estimator:

$$\hat\Lambda(t_{(i)}) = \sum_{j=1}^i \frac{d_j}{n_j}.$$

Intuitively, this expression estimates the hazard at each distinct time of death $t_{(j)}$ as the ratio of the number of deaths to the number exposed. The cumulative hazard up to time $t$ is simply the sum of the hazards at all death times up to $t$, and has a nice interpretation as the expected number of deaths in $(0, t]$ per unit at risk. This estimator has a strong justification in terms of the theory of counting processes.

The variance of $\hat\Lambda(t_{(i)})$ can be approximated by $\text{var}(-\log \hat S(t_{(i)}))$, which we obtained on our way to Greenwood's formula. Therneau and Grambsch (2000) discuss alternative approximations.
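Both Greenwood's formula and the Nelson-Aalen estimator are simple cumulative sums over the $(n_i, d_i)$ counts. A minimal sketch, not from the text, assuming the counts have already been tabulated (note the Greenwood term is undefined at a death time where everyone at risk dies, since then $n_j - d_j = 0$):

```python
import numpy as np

def greenwood_nelson_aalen(n, d, surv):
    """Greenwood variance of the K-M estimate and the Nelson-Aalen
    cumulative hazard, evaluated at each distinct death time t_(i).

    n, d : numbers at risk and numbers of deaths at t_(1) < ... < t_(m)
    surv : K-M estimates S(t_(i))
    """
    n, d, surv = (np.asarray(a, dtype=float) for a in (n, d, surv))
    # Greenwood: var S(t_(i)) = S(t_(i))^2 * sum_{j<=i} d_j / (n_j (n_j - d_j))
    var_S = surv**2 * np.cumsum(d / (n * (n - d)))
    # Nelson-Aalen: Lambda(t_(i)) = sum_{j<=i} d_j / n_j
    Lambda = np.cumsum(d / n)
    return var_S, Lambda

# Illustrative counts: 4 subjects, single deaths at three distinct times
var_S, Lambda = greenwood_nelson_aalen([4, 3, 2], [1, 1, 1], [0.75, 0.5, 0.25])
```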

Breslow (1972) suggested estimating the survival function as

$$\hat S(t) = \exp\{-\hat\Lambda(t)\},$$

where $\hat\Lambda(t)$ is the Nelson-Aalen estimator of the integrated hazard. The Breslow estimator and the K-M estimator are asymptotically equivalent, and are usually quite close to each other, particularly when the number of deaths is small relative to the number exposed.

### 1.5 Expectation of Life

If $\hat S(t_{(m)}) = 0$, then one can estimate $\mu = E(T)$ as the integral of the K-M estimate:

$$\hat\mu = \int_0^\infty \hat S(t)\,dt = \sum_{i=1}^m (t_{(i)} - t_{(i-1)})\,\hat S(t_{(i)}).$$

Can you figure out the variance of $\hat\mu$?

## 2 k-Samples: Mantel-Haenszel

Consider now the problem of comparing two or more survivor functions, for example urban versus rural, or treated versus control. Let $t_{(1)} < t_{(2)} < \ldots < t_{(m)}$ denote the distinct times of death observed in the total sample, obtained by combining all groups of interest. Let $d_{ij}$ be the number of deaths at time $t_{(i)}$ in group $j$, and $n_{ij}$ the number at risk at time $t_{(i)}$ in group $j$. We also let $d_i$ and $n_i$ denote the total number of deaths and subjects at risk at time $t_{(i)}$.

If the survival probabilities are the same in all groups, then the $d_i$ deaths at time $t_{(i)}$ should be distributed among the $k$ groups in proportion to the numbers at risk. Thus, conditional on $d_i$ and the $n_{ij}$,

$$E(d_{ij}) = d_i \frac{n_{ij}}{n_i} = n_{ij} \frac{d_i}{n_i},$$

where the last term shows that we can also view this calculation as applying an overall failure rate $d_i/n_i$ to the $n_{ij}$ subjects in group $j$.

We now proceed beyond the mean to obtain the distribution of these counts. Imagine setting up a contingency table at each distinct failure time, with rows given by survival status and columns given by group membership. The entries in the table are $d_{ij}$ or $n_{ij} - d_{ij}$, the row totals are $d_i$ and $n_i - d_i$, and the column totals are the $n_{ij}$. The distribution of the counts conditional on both the row and column totals is hypergeometric. (We mentioned this distribution briefly in WWS509 when we considered contingency tables with both margins fixed.) The hypergeometric distribution has mean as given above, variance

$$\text{var}(d_{ij}) = \frac{d_i(n_i - d_i)}{n_i - 1}\,\frac{n_{ij}}{n_i}\left(1 - \frac{n_{ij}}{n_i}\right),$$

and covariance

$$\text{cov}(d_{ir}, d_{is}) = -\frac{d_i(n_i - d_i)}{n_i - 1}\,\frac{n_{ir} n_{is}}{n_i^2}.$$

Let $\mathbf{d}_i$ denote the vector of deaths at time $t_{(i)}$, with mean $E(\mathbf{d}_i)$ and variance-covariance matrix $\text{var}(\mathbf{d}_i)$. We sum these over all times to obtain

$$D = \sum_{i=1}^m \left[\mathbf{d}_i - E(\mathbf{d}_i)\right] \quad\text{and}\quad V = \sum_{i=1}^m \text{var}(\mathbf{d}_i).$$

Mantel and Haenszel proposed testing the equality of the $k$ survival functions,

$$H_0: S_1(t) = S_2(t) = \ldots = S_k(t),$$

by treating the quadratic form

$$Q = D' V^- D$$

as a $\chi^2$ statistic with $k-1$ degrees of freedom. Here $V^-$ is any generalized inverse of $V$. Omitting one group from the calculation of $D$ and $V$ will do; the test is invariant to the choice of omitted group. For $k = 2$ we get

$$z = \sqrt{Q} = \frac{\sum_i \left(d_{i1} - E(d_{i1})\right)}{\sqrt{\sum_i \text{var}(d_{i1})}}.$$

An approximation for $k \ge 2$ which does not require matrix inversion treats

$$\sum_i \sum_j \frac{(O_{ij} - E_{ij})^2}{E_{ij}},$$

where $O_{ij}$ denotes observed and $E_{ij}$ expected deaths at time $t_{(i)}$ in group $j$, as a $\chi^2$ statistic with $k-1$ d.f.
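For two groups the whole test reduces to a few lines. This sketch is illustrative, not from the text: it accumulates $\sum_i (d_{i1} - E(d_{i1}))$ and the hypergeometric variances over the distinct death times and returns $Q = z^2$, a $\chi^2$ statistic with 1 d.f.

```python
import numpy as np

def logrank_2sample(time, status, group):
    """Two-sample Mantel-Haenszel (log-rank) chi-squared statistic.

    time   : observation times; status: 1 = death, 0 = censored
    group  : 0/1 group membership
    """
    time, status, group = map(np.asarray, (time, status, group))
    U = V = 0.0
    for t in np.unique(time[status == 1]):
        at_risk = time >= t
        n_i = at_risk.sum()                                  # total at risk
        d_i = ((time == t) & (status == 1)).sum()            # total deaths
        n_i1 = (at_risk & (group == 1)).sum()                # group-1 at risk
        d_i1 = ((time == t) & (status == 1) & (group == 1)).sum()
        U += d_i1 - d_i * n_i1 / n_i                         # observed - expected
        if n_i > 1:                                          # hypergeometric variance
            p = n_i1 / n_i
            V += d_i * (n_i - d_i) / (n_i - 1) * p * (1 - p)
    return U**2 / V
```

A quick hand check: with one subject per group dying at each of the times 1, 3 (group 0) and 2, 4 (group 1), the sums give $U = -2/3$ and $V = 13/18$, so $Q = 8/13$.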

The Mantel-Haenszel test can be derived as a linear rank test, and in that context is often called the log-rank or Savage test. Kalbfleisch and Prentice have proposed an extension to censored data of the Wilcoxon test. Other alternatives have been proposed by Gehan (1965) and Breslow (1970), but the M-H test is the most popular one.

## 3 Regression: Cox's Model

Let us consider the more general problem where we have a vector $x$ of covariates. The k-sample problem can be viewed as the special case where the $x$'s are dummy variables denoting group membership. Recall the basic model

$$\lambda(t, x) = \lambda_0(t)\, e^{x'\beta},$$

and consider estimation of $\beta$ without making any assumptions about the baseline hazard $\lambda_0(t)$.

### 3.1 Cox's Partial Likelihood

In his 1972 paper Cox proposed fitting the model by maximizing a special likelihood. Let $t_{(1)} < t_{(2)} < \ldots < t_{(m)}$ denote the observed distinct times of death, as before, and consider what happens at $t_{(i)}$. Let $R_i$ denote the risk set at $t_{(i)}$, defined as the set of indices of the subjects that are alive just before $t_{(i)}$. Thus, in the absence of earlier censoring, $R_1 = \{1, 2, \ldots, n\}$.

Suppose first that there are no ties in the observation times, so one and only one subject failed at $t_{(i)}$. Let us call this subject $j(i)$. What is the conditional probability that this particular subject would fail at $t_{(i)}$, given the risk set $R_i$ and the fact that exactly one subject fails at that time? Answer:

$$\frac{\lambda(t_{(i)}, x_{j(i)})\,dt}{\sum_{j \in R_i} \lambda(t_{(i)}, x_j)\,dt}.$$

We can write this probability in terms of the baseline hazard and relative risks as

$$\frac{\lambda_0(t_{(i)})\, e^{x_{j(i)}'\beta}}{\sum_{j \in R_i} \lambda_0(t_{(i)})\, e^{x_j'\beta}},$$

and we notice that the baseline hazard cancels, so the probability in question is

$$\frac{e^{x_{j(i)}'\beta}}{\sum_{j \in R_i} e^{x_j'\beta}}$$

and does not depend on the baseline hazard $\lambda_0(t)$.

Cox proposed multiplying these probabilities together over all distinct failure times, and treating the resulting product

$$L = \prod_{i=1}^m \frac{e^{x_{j(i)}'\beta}}{\sum_{j \in R_i} e^{x_j'\beta}}$$

as if it were an ordinary likelihood. In his original paper Cox called this a conditional likelihood because it is a product of conditional probabilities, but he later abandoned the name because it is misleading: $L$ is not itself a conditional probability.

Kalbfleisch and Prentice considered the case where the covariates are fixed over time and showed that $L$ is the marginal likelihood of the ranks of the observations, obtained by considering just the order in which people die and not the actual times at which they die. In 1975 Cox provided a more general justification of $L$ as part of the full likelihood, in fact a part that happens to contain most of the information about $\beta$, and therefore proposed calling $L$ a partial likelihood. This justification is valid even with time-varying covariates. A more rigorous justification of the partial likelihood in terms of the theory of counting processes can be found in Andersen et al. (1993).

### 3.2 The Score and Information

The log of Cox's partial likelihood is

$$\log L = \sum_i \left\{ x_{j(i)}'\beta - \log \sum_{j \in R_i} e^{x_j'\beta} \right\}.$$

Taking derivatives with respect to $\beta$ we find the score to be

$$\frac{\partial \log L}{\partial \beta_r} = \sum_i \left\{ x_{j(i)r} - \frac{\sum_{j \in R_i} e^{x_j'\beta}\, x_{jr}}{\sum_{j \in R_i} e^{x_j'\beta}} \right\}.$$

The term to the right of the minus sign is just a weighted average of $x_r$ over the risk set $R_i$, with weights equal to the relative risks $e^{x_j'\beta}$. Thus, we can

write the score as

$$U_r(\beta) = \frac{\partial \log L}{\partial \beta_r} = \sum_i \left( x_{j(i)r} - A_{ir}(\beta) \right),$$

where $A_{ir}(\beta)$ is the weighted average of $x_r$ over the risk set $R_i$. Taking derivatives again and changing sign, we find the observed information to be

$$-\frac{\partial^2 \log L}{\partial \beta_r \partial \beta_s} = \sum_i \frac{\left(\sum e^{x_j'\beta} x_{jr} x_{js}\right)\left(\sum e^{x_j'\beta}\right) - \left(\sum e^{x_j'\beta} x_{jr}\right)\left(\sum e^{x_j'\beta} x_{js}\right)}{\left(\sum e^{x_j'\beta}\right)^2},$$

where all sums are over the risk set $R_i$. The right-hand side can be written as the difference of two terms. The first term can be interpreted as a weighted average of the cross-product of $x_r$ and $x_s$. The second term is the product of the weighted averages $A_{ir}(\beta)$ and $A_{is}(\beta)$. Thus we can write

$$-\frac{\partial^2 \log L}{\partial \beta_r \partial \beta_s} = \sum_i \left\{ \frac{\sum e^{x_j'\beta} x_{jr} x_{js}}{\sum e^{x_j'\beta}} - A_{ir}(\beta) A_{is}(\beta) \right\}.$$

You may recognize this expression as the old desk calculator formula for a covariance, leading to the observed information

$$I_{rs}(\beta) = -\frac{\partial^2 \log L}{\partial \beta_r \partial \beta_s} = \sum_{i=1}^m C_{irs}(\beta),$$

where $C_{irs}(\beta)$ denotes the weighted covariance of $x_r$ and $x_s$ over the risk set $R_i$, with weights equal to the relative risks $e^{x_j'\beta}$ for $j \in R_i$.

Calculation of the score and information is thus relatively simple. In matrix notation,

$$u(\beta) = \sum_i \left( x_{j(i)} - A_i(\beta) \right) \quad\text{and}\quad I(\beta) = \sum_i C_i(\beta),$$

where $A_i(\beta)$ is the mean and $C_i(\beta)$ is the variance-covariance matrix of $x$ over the risk set $R_i$ with weights $e^{x_j'\beta}$ for $j \in R_i$. Notably, the partial log-likelihood is formally identical to the log-likelihood for a conditional logit model.

### 3.3 The Problem of Ties

The development so far has assumed that only one death occurs at each distinct time $t_{(i)}$. In practice we often observe several deaths, say $d_i$, at $t_{(i)}$. This can happen in several ways:

- The data are discrete, so in fact there is a positive probability of failure at $t_{(i)}$. If this is the case one should really use a discrete model. We discuss possible approaches below.
- The data are continuous but have been grouped, so $d_i$ represents the number of deaths in some interval around $t_{(i)}$. In this case one would probably be better off estimating a separate parameter for each interval, using a complementary log-log binomial model or a Poisson model corresponding to a piece-wise exponential survival model, as discussed in my WWS509 notes.
- The data are continuous and are not grouped, but there are a few ties, resulting perhaps from coarse measurement. In this case we can extend the argument used to build the likelihood.

Let $D_i$ denote the set of indices of the $d_i$ cases who failed at $t_{(i)}$. The probability that the $d_i$ cases that actually fail would be those in $D_i$, given the risk set $R_i$ and the fact that $d_i$ of them fail at $t_{(i)}$, is

$$\frac{\prod_{j \in D_i} e^{x_j'\beta}}{\sum_{P_i} \prod_{j \in P_i} e^{x_j'\beta}},$$

where the sum in the denominator is over all possible permutations $P_i$, or ways of choosing $d_i$ indices from the risk set, and the product is over the set of chosen indices, which we call a permutation. For example, assume four people are at risk and two die. The deaths could be $\{1,2\}, \{1,3\}, \{1,4\}, \{2,3\}, \{2,4\}, \{3,4\}$. We calculate the probability of each of these outcomes. Then we divide the probability of the outcome that actually occurred by the sum of the probabilities of all possible outcomes. This likelihood was proposed by Cox in his original paper.

The numerator is easy to calculate, and has the form $\exp\{S_i'\beta\}$, where $S_i = \sum_{j \in D_i} x_j$ is the sum of the $x$'s over the death set $D_i$. The denominator is difficult to calculate because the number of permutations grows very quickly with $d_i$. Peto (and independently Breslow) proposed approximating the denominator by calculating the sum $\sum_{j \in R_i} e^{x_j'\beta}$ over the entire risk set $R_i$ and raising it to the power $d_i$. This leads to the much simpler expression

$$L \approx \prod_{i=1}^m \frac{e^{S_i'\beta}}{\left(\sum_{j \in R_i} e^{x_j'\beta}\right)^{d_i}}.$$

The Peto-Breslow approximation is reasonably good when $d_i$ is small relative to $n_i$, and is popular because of its simplicity. Efron proposed a better

approximation that requires only modest additional computational effort. Consider again our example where two out of four subjects fail. Suppose the subjects that fail are 1 and 2, and let $r_j = e^{x_j'\beta}$ denote the relative risk of the $j$-th subject. In continuous time one must have failed before the other; we just don't know which. The contributions to the partial likelihood would be

$$\frac{r_1}{r_1 + r_2 + r_3 + r_4} \cdot \frac{r_2}{r_2 + r_3 + r_4}$$

if 1 failed before 2, or

$$\frac{r_2}{r_1 + r_2 + r_3 + r_4} \cdot \frac{r_1}{r_1 + r_3 + r_4}$$

if 2 was the first to fail. In both cases the numerator is $r_1 r_2$. To compute the denominator Peto and Breslow add the risks over the complete risk set both times, using $(r_1 + r_2 + r_3 + r_4)^2$, which is obviously conservative. Efron uses the average risk of subjects 1 and 2 for the second term, so he takes the denominator to be

$$(r_1 + r_2 + r_3 + r_4)(0.5\,r_1 + 0.5\,r_2 + r_3 + r_4).$$

This approximation is much more accurate unless $d_i$ is very large relative to $n_i$. For more details see Therneau and Grambsch (2000, Section 3.3).

### 3.4 Tests of Hypotheses

As usual, we have three approaches to testing hypotheses about $\beta$:

- Likelihood Ratio Test: given two nested models, we treat twice the difference in partial log-likelihoods as a $\chi^2$ statistic with degrees of freedom equal to the difference in the number of parameters.
- Wald Test: we use the fact that in large samples $\hat\beta$ has approximately a multivariate normal distribution with mean $\beta$ and variance-covariance matrix $\text{var}(\hat\beta) = I^{-1}(\beta)$. Thus, under $H_0: \beta = \beta_0$, the quadratic form $(\hat\beta - \beta_0)'\,\text{var}^{-1}(\hat\beta)\,(\hat\beta - \beta_0) \sim \chi^2_p$, where $p$ is the dimension of $\beta$. This test is often used for a subset of $\beta$.
- Score Test: we use the fact that in large samples the score $u(\beta)$ has approximately a multivariate normal distribution with mean 0 and variance-covariance matrix equal to the information matrix. Thus, under $H_0: \beta = \beta_0$, the quadratic form $u(\beta_0)'\, I^{-1}(\beta_0)\, u(\beta_0) \sim \chi^2_p$. Note that this test does not require calculating the m.l.e. $\hat\beta$.
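The three treatments of the tied denominator can be compared numerically on the four-subjects, two-deaths example above. The relative-risk values below are made up for illustration; `exact` sums over all ways of choosing 2 of the 4 subjects at risk, as in Cox's likelihood.

```python
import numpy as np
from itertools import combinations

# Relative risks r_j = exp(x_j' beta) for the four subjects at risk;
# the values are invented. Subjects at indices 0 and 1 die at the tied time.
r = np.array([1.5, 0.8, 1.0, 2.0])
dead = [0, 1]
num = r[dead].prod()                  # numerator r_1 r_2, common to all three

# Exact (Cox): sum over all ways of choosing 2 of the 4 subjects at risk
exact = num / sum(r[list(pair)].prod() for pair in combinations(range(4), 2))

# Peto-Breslow: full risk-set total, raised to the number of deaths
breslow = num / r.sum() ** 2

# Efron: the second factor replaces each death by the average risk of the deaths
efron = num / (r.sum() * (r.sum() - r[dead].mean()))
```

In this example the Breslow denominator is the largest and the exact one the smallest, so the contributions order as Breslow < Efron < exact, with Efron much closer to the exact value.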

One reason for bringing up the score test is that in the k-sample case the score test of $H_0: \beta = 0$ based on Cox's model happens to be the same as the Mantel-Haenszel log-rank test. Here is an outline of the proof. Assume no ties. If $\beta = 0$ then the weights used to calculate $A_i$ and $C_i$ are all 1, $A_{ir}$ happens to be the proportion of the risk set $R_i$ that comes from the $r$-th sample (which is the same as the expected number of deaths in that group), and $C_i$ is a binomial variance-covariance matrix. If there are ties, Cox's approach leads to the test discussed in Section 2. Use of Peto's approximation is equivalent to omitting the factor $(n_i - d_i)/(n_i - 1)$ from the variance-covariance matrix.

All three tests are asymptotically equivalent. The quality of the normal approximations depends on the sample size, the distribution of the cases over the covariate space, and the extent of censoring.

### 3.5 Time-Varying Covariates

A nice feature of the Cox model and partial likelihood is that it extends easily to the case of time-varying covariates. Note that the partial likelihood is built by considering only what happens at each failure time, so we only need to know the values of the covariates at the distinct times of death.

One use of time-varying covariates is to check the assumption of proportionality of hazards. In his original paper Cox analyzes a two-sample problem and introduces an auxiliary covariate

$$z_i = \begin{cases} 0 & \text{in group 0,} \\ t - c & \text{in group 1,} \end{cases}$$

where $c$ is an arbitrary constant close to the mean of $t$, chosen to avoid numerical instability. If the coefficient of $z$ is 0, the assumption of proportionality of hazards is adequate. A positive value indicates that the ratio of hazards for group 1 over group 0 increases over time, while a negative coefficient suggests a declining hazard ratio, a common occurrence.

Another use of time-varying covariates is to represent variables that simply change over time. In a study of contraceptive failure, for example, one may treat frequency of intercourse as a time-varying covariate. Another example of a time-varying covariate is education in an analysis of age at marriage. Note that $x(t)$ may represent the actual value of a variable at time $t$, or any index based on the individual's history up to time $t$. In a study of the effects of breastfeeding on post-partum amenorrhea, for example, $x(t)$ could represent the total number of suckling episodes in the week preceding $t$.
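Since each term of the partial likelihood needs covariate values only at its own death time, a time-varying covariate just means evaluating a function $x_j(t)$ inside the loop. The sketch below is illustrative and makes several simplifying assumptions: no ties, risk sets supplied by the caller, and group assignments and the centering constant $c$ invented to demonstrate Cox's proportionality-check covariate.

```python
import numpy as np

def cox_partial_loglik_tv(beta, death_times, failer, risk_sets, x_at):
    """Cox partial log-likelihood with (possibly) time-varying covariates.

    death_times : distinct failure times t_(1) < ... < t_(m), no ties assumed
    failer      : failer[i] is the index j(i) of the subject dying at t_(i)
    risk_sets   : risk_sets[i] is an array with the indices in R_i
    x_at        : function (j, t) -> covariate vector of subject j at time t
    """
    ll = 0.0
    for t, j, R in zip(death_times, failer, risk_sets):
        eta = np.array([x_at(k, t) @ beta for k in R])  # linear predictors at t
        ll += x_at(j, t) @ beta - np.log(np.exp(eta).sum())
    return ll

# Proportionality check: group-1 subjects get the auxiliary covariate t - c.
# Group assignments and the constant c are purely illustrative.
c = 10.0
group = {0: 0, 1: 1, 2: 0, 3: 1}

def x_at(j, t):
    g = group[j]
    return np.array([g, g * (t - c)])
```

At $\beta = 0$ every term reduces to $-\log |R_i|$, which gives a quick correctness check.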

The partial likelihood function with time-varying covariates does not have an interpretation as a marginal likelihood of the ranks. For further details see Cox and Oakes (1984, Chapter 8).

### 3.6 Estimating the Baseline Survival

Interest so far has focused on the regression coefficients $\beta$. We now consider how to estimate the baseline hazard $\lambda_0(t)$, which dropped out of the partial likelihood.

Kalbfleisch and Prentice (1980, Section 4.3) use an argument similar to the derivation of the Kaplan-Meier estimate, noting that the hazard should assign mass only to the discrete times of death. Let $\pi_i$ denote the conditional survival probability at time $t_{(i)}$ for a baseline subject. To obtain the conditional probability for a subject with covariates $x$ we would need to raise $\pi_i$ to the power $e^{x'\beta}$. This leads to a likelihood of the form

$$L = \prod_{i=1}^m \left\{ \prod_{j \in D_i} \left(1 - \pi_i^{e^{x_j'\beta}}\right) \prod_{j \in R_i - D_i} \pi_i^{e^{x_j'\beta}} \right\}.$$

Meier suggested maximizing this likelihood with respect to both the $\pi_i$ and $\beta$. A simpler approach is to plug in the estimate $\hat\beta$ from the partial likelihood, and maximize the resulting expression with respect to the $\pi_i$ only. If there are no ties, this gives

$$\hat\pi_i = \left(1 - \frac{e^{x_{j(i)}'\hat\beta}}{\sum_{j \in R_i} e^{x_j'\hat\beta}}\right)^{e^{-x_{j(i)}'\hat\beta}}.$$

Think of this as follows. With no covariates, our estimate would be $\hat\pi_i = 1 - d_i/n_i$ with $d_i = 1$, which is the K-M estimate. With covariates we do the same thing, except that we weight each case by its relative risk. The resulting survival probability is then raised to the power $e^{-x_{j(i)}'\hat\beta}$ to turn it into a baseline probability. If there are ties one has to solve

$$\sum_{j \in D_i} \frac{e^{x_j'\hat\beta}}{1 - \pi_i^{e^{x_j'\hat\beta}}} = \sum_{j \in R_i} e^{x_j'\hat\beta}$$

iteratively. A suitable starting value is

$$\log \pi_i = \frac{-d_i}{\sum_{j \in R_i} e^{x_j'\hat\beta}}.$$

The estimate of the baseline survival function is then the step function

$$\hat S_0(t) = \prod_{i:\,t_{(i)} < t} \hat\pi_i.$$

Cox and Oakes (1984, Section 7.8) describe a simpler estimator that extends the Nelson-Aalen estimate of the cumulative hazard to the case of covariates. The estimator can be described somewhat heuristically as follows. Treat the baseline hazard as 0 except at failure times. The expected number of deaths at $t_{(i)}$ can be obtained by summing the hazards over the risk set:

$$E(d_i) = \lambda_0(t_{(i)}) \sum_{j \in R_i} e^{x_j'\beta}.$$

Equating the observed and expected numbers of deaths at $t_{(i)}$ leads us to estimate $\lambda_i = \lambda_0(t_{(i)})$ as

$$\hat\lambda_i = \frac{d_i}{\sum_{j \in R_i} e^{x_j'\beta}},$$

where the sum is over the risk set $R_i$. The cumulative hazard and survival functions are then estimated as

$$\hat\Lambda_0(t) = \sum_{i:\,t_{(i)} < t} \hat\lambda_i \quad\text{and}\quad \hat S_0(t) = e^{-\hat\Lambda_0(t)}.$$

If there are no covariates, these reduce to the ordinary Nelson-Aalen and Breslow estimators described earlier.

Having obtained estimates of the baseline hazard and survival, we can obtain fitted hazards and survival functions for any value of $x$. This task is straightforward when we have time-fixed covariates: all we need to do is multiply the baseline hazard by the relative risk, or raise the baseline survival to the relative risk. With time-varying covariates things get somewhat more complicated, as we have to pick the appropriate hazard at each distinct failure time depending on the values of the covariates at that point.

### 3.7 Martingale Residuals

Residuals play an important role in checking linear and generalized linear models. Not surprisingly, the concept has been extended to survival models. A lot of this work relies heavily on the terminology and notation of counting processes. We will try to convey the essential ideas in a non-technical way.

Instead of focusing on the $i$-th individual's survival time $T_i$, we introduce a function $N_i(t)$ that counts events over time. In survival models $N_i(t)$ will be zero while the subject is alive, and will then become one. (In more general event-history models $N_i(t)$ counts the number of occurrences of the event to subject $i$ by time $t$.) While the survival time $T_i$ is a random variable, $N_i(t)$ is a stochastic process, a function of time.

We also introduce a function to track the $i$-th individual's exposure: $Y_i(t)$ will be one while the individual is in the risk set and zero afterwards. Note that $Y_i(t)$ can become zero due to death or due to censoring. To complete the model we add a hazard function $\lambda_i(t)$ representing the $i$-th individual's risk at time $t$. In a Cox model $\lambda_i(t) = \lambda_0(t) \exp\{x_i'\beta\}$.

In the terminology of counting processes, the process $N_i(t)$ is said to have intensity $Y_i(t)\lambda_i(t)$. The intensity is just the risk if the subject is exposed, and is zero otherwise. The probability that $N_i(t)$ will jump in a small interval $[t, t+dt)$, conditional on the entire history of the process, is given by $\lambda_i(t) Y_i(t)\,dt$, and is proportional to the intensity of the process and the width of the interval.

A key feature of this formulation is that

$$M_i(t) = N_i(t) - \int_0^t \lambda_i(s)\, Y_i(s)\,ds$$

is a martingale, a fact that can be used to establish the properties of tests and estimators using martingale central limit theory. A martingale is essentially a stochastic process without drift. Given two times $0 < t_1 < t_2$, the expectation of $M(t_2)$ given the history up to time $t_1$ is simply $M(t_1)$. In other words, martingale increments have mean zero. Also, martingale increments are uncorrelated, although not necessarily independent.

The integral following the minus sign in the above equation is called the compensator of $N_i(t)$. You may think of it as the conditional expected value of the counting process at time $t$. Subtracting the compensator turns the counting process into a martingale.

This equation immediately suggests the first type of residual one can use in Cox models, the so-called martingale residual:

$$\hat M_i(t) = N_i(t) - \int_0^t Y_i(s)\, e^{x_i'\hat\beta}\, d\hat\Lambda_0(s),$$

where $\hat\Lambda_0(t)$ denotes the Nelson-Aalen estimator of the baseline cumulative hazard. Because this is a discrete function with jumps at the observed failure times, the integral in the above equation should be interpreted as a sum over all $j$ such that $t_{(j)} < t$. Usually the residual is computed at $t = \infty$ (or the largest

observed survival time), in which case the martingale residual is

$$\hat M_i = d_i - e^{x_i'\hat\beta}\, \hat\Lambda_0(t_i),$$

where $d_i$ is the usual death indicator and $t_i$ is the observation time (to death or censoring) for the $i$-th individual. One possible use of residuals is to find outliers, in this case individuals who lived unusually short or long times after taking into account the relative risks associated with their observed characteristics.

Martingale residuals are just one of several types of residuals that have been proposed for survival models. Others include deviance, score, and Schoenfeld residuals. For more details on counting processes, martingales, and residuals see Therneau and Grambsch (2000), especially Section 2.2 and Chapter 4.

## 4 Models for Discrete and Grouped Data

We close with a brief review of alternative approaches for discrete and continuous grouped data, expanding slightly on the WWS509 discussion.

Cox (1972) proposed an alternative version of the proportional hazards model for discrete data. In this case the hazard is the conditional probability of dying at time $t$ given survival up to that point, so that

$$\lambda(t) = \Pr\{T = t \mid T \ge t\}.$$

Cox's discrete logistic model assumes that the conditional odds of dying at $t_{(i)}$ are proportional to some baseline odds, so that

$$\frac{\lambda(t, x)}{1 - \lambda(t, x)} = \frac{\lambda_0(t)}{1 - \lambda_0(t)}\, e^{x'\beta}.$$

Note that taking logs on both sides results in a logit model, where the logit of the discrete hazard is linear in $\beta$. Do not confuse this model with the proportional odds model, where the unconditional odds of survival (or odds of dying) are proportional for different values of $x$. If the $\lambda_0(t)$ are small, so that $1 - \lambda_0(t)$ is close to one, this model will be similar to the proportional hazards model, as the odds are then close to the hazard itself.

A nice property of this model is that the partial likelihood turns out to be identical to that of the continuous-time model. To see this point note

that under this model the hazard at time $t_i$ for an individual with covariate values $x$ is

$$\lambda(t_i, x) = \frac{\theta_i\, e^{x'\beta}}{1 + \theta_i\, e^{x'\beta}},$$

where $\theta_i = \lambda_0(t_i)/(1 - \lambda_0(t_i))$ denotes the baseline conditional odds of dying at $t_i$. Suppose there are two cases exposed at time $t_i$, labelled 1 and 2. The probability that 1 dies and 2 does not is

$$\lambda(t_i, x_1)\left(1 - \lambda(t_i, x_2)\right) = \frac{\theta_i\, e^{x_1'\beta}}{1 + \theta_i\, e^{x_1'\beta}} \cdot \frac{1}{1 + \theta_i\, e^{x_2'\beta}}.$$

The probability that 2 dies and 1 does not has a similar structure, with $\theta_i\, e^{x_2'\beta}$ in the numerator and the same denominator. When we divide the probability of one of these outcomes by the sum of the two, the denominators cancel out, as do the $\theta_i$. Thus, the conditional probability is

$$\frac{e^{x_{j(i)}'\beta}}{\sum_{j \in R_i} e^{x_j'\beta}},$$

which is exactly the same as in the continuous case. (You may wonder why we did not consider the $1-\lambda$'s in the continuous case. It turns out we didn't have to. If you repeat the continuous-time derivation including terms of the form $\lambda\,dt$ for deaths and $1 - \lambda\,dt$ for survivors, you will discover that the $dt$'s for deaths cancel out but those for survivors do not, and as $dt \to 0$ the terms $1 - \lambda\,dt \to 1$ and drop from the likelihood.)

There is an alternative approach to discrete data that is particularly appropriate when you have grouped continuous times; see Kalbfleisch and Prentice (1980). Suppose we wanted a discrete-time model that preserves the relationship between survivor functions in the continuous-time model, namely

$$S(t, x) = S_0(t)^{e^{x'\beta}}.$$

In the discrete case we have $S(t, x) = \prod_{u < t} (1 - \lambda(u, x))$, so we must have

$$1 - \lambda(t_i, x) = (1 - \lambda_0(t_i))^{e^{x'\beta}}.$$

Solving for the hazard we get

$$\lambda(t_i, x) = 1 - (1 - \lambda_0(t_i))^{e^{x'\beta}}.$$

The linearizing transformation for this model is the complementary log-log link, $\log(-\log(1 - \lambda))$.

To see why this model is uniquely appropriate for grouped data, suppose we only observe failures in intervals $[0 = \tau_0, \tau_1), [\tau_1, \tau_2), \ldots, [\tau_{k-1}, \tau_k = \infty)$. Suppose the hazard is continuous and satisfies a standard proportional hazards model. The probability of surviving interval $i$ for a subject with covariates $x$ is

$$\Pr\{T > \tau_i \mid T > \tau_{i-1}, x\} = \frac{S(\tau_i, x)}{S(\tau_{i-1}, x)}.$$

Writing this in terms of the baseline survival we get

$$\left(\frac{S_0(\tau_i)}{S_0(\tau_{i-1})}\right)^{e^{x'\beta}} = \left(e^{-\int_{\tau_{i-1}}^{\tau_i} \lambda_0(t)\,dt}\right)^{e^{x'\beta}}.$$

In view of this result, we define the baseline hazard in interval $(\tau_{i-1}, \tau_i)$ as

$$\lambda_{0i} = 1 - e^{-\int_{\tau_{i-1}}^{\tau_i} \lambda_0(t)\,dt}.$$

The hazard for an individual with covariates $x$ in the same interval then becomes

$$\lambda_i(x) = 1 - (1 - \lambda_{0i})^{e^{x'\beta}},$$

and can be linearized using the c-log-log transformation. This is the only discrete model appropriate for grouped data from a continuous-time proportional hazards model. Unfortunately, it cannot be estimated non-parametrically using a partial likelihood argument. (If you try to construct a partial likelihood you will discover that the $\lambda_{0i}$'s do not drop out of the likelihood.)

In practice, both the logit and the complementary log-log discrete models can be estimated via ordinary likelihood techniques by treating the baseline hazard at each discrete failure time (or interval) as a separate parameter to be estimated, provided of course one has enough failures at each time (or interval). The resulting likelihood is the same as that of a binomial model with logit or c-log-log link, so either model can easily be fitted with standard software.

One difficulty with discrete models generated by grouping continuous data has to do with censored cases that may have been exposed for part of the interval. The only way to handle these cases is to censor them at the

20 beginning of the interval, which throws away some information. This is not a problem with piece-wise exponential models based on the Poisson equivalence, because one can easily take into account partial exposure in the offset. 20
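The last point can be made concrete. A minimal sketch of the exposure bookkeeping in a piece-wise exponential model (the interval boundaries and censoring time are illustrative assumptions): a case censored mid-interval contributes its partial exposure through the Poisson offset instead of having to be censored at the start of the interval.

```python
import numpy as np

# Illustrative setup (assumptions): interval boundaries tau_0 < ... < tau_k
boundaries = np.array([0.0, 5.0, 10.0, 15.0])
t = 7.0   # subject censored at t = 7, inside the interval (5, 10]

# Exposure contributed to each interval: the full width of intervals the
# subject survived, the partial width of the interval containing t, and
# zero thereafter
lo, hi = boundaries[:-1], boundaries[1:]
exposure = np.clip(t - lo, 0.0, hi - lo)

# In the Poisson formulation, each interval with positive exposure becomes
# a pseudo-observation with offset log(exposure); the 2 units of exposure
# in (5, 10] are retained, whereas a grouped discrete-time model would
# have to censor this case at tau_1 = 5
```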
