Non-Parametric Estimation in Survival Models
Germán Rodríguez
Spring, 2001; revised Spring 2005

We now discuss the analysis of survival data without parametric assumptions about the form of the distribution.

1 One Sample: Kaplan-Meier

Our first topic is non-parametric estimation of the survival function. If the data were not censored, the obvious estimate would be the empirical survival function

$$\hat{S}(t) = \frac{1}{n} \sum_{i=1}^{n} I\{t_i > t\},$$

where $I$ is the indicator function that takes the value 1 if the condition in braces is true and 0 otherwise. The estimator is simply the proportion alive at $t$.

1.1 Estimation with Censored Data

Kaplan and Meier (1958) extended the estimate to censored data. Let $t_{(1)} < t_{(2)} < \ldots < t_{(m)}$ denote the distinct ordered times of death (not counting censoring times). Let $d_i$ be the number of deaths at $t_{(i)}$, and let $n_i$ be the number alive just before $t_{(i)}$. This is the number exposed to risk at time $t_{(i)}$. Then the Kaplan-Meier or product-limit estimate of the survivor function is

$$\hat{S}(t) = \prod_{i:\, t_{(i)} < t} \left(1 - \frac{d_i}{n_i}\right).$$

A heuristic justification of the estimate is as follows. To survive to time $t$ you must first survive to $t_{(1)}$. You must then survive from $t_{(1)}$ to $t_{(2)}$ given that you have already survived to $t_{(1)}$. And so on. Because there are no deaths between $t_{(i-1)}$ and $t_{(i)}$, we take the probability of dying between these times to be zero. The conditional probability of dying at $t_{(i)}$ given that the subject was alive just before can be estimated by $d_i/n_i$. The conditional probability of surviving time $t_{(i)}$ is the complement $1 - d_i/n_i$. The overall unconditional probability of surviving to $t$ is obtained by multiplying the conditional probabilities for all relevant times up to $t$.

The Kaplan-Meier estimate is a step function with discontinuities or jumps at the observed death times. Figure 1 shows Kaplan-Meier estimates for the treated and control groups in the famous Gehan data (see Cox, 1972, or Andersen et al., 1993).

[Figure 1: Kaplan-Meier Estimates for Gehan Data]

If there is no censoring, the KM estimate coincides with the empirical survival function. If the last observation happens to be a censored case, as is the case in the treated group in the Gehan data, the estimate is undefined beyond the last death.
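As a concrete illustration (my own sketch, not part of the original notes), the product-limit calculation can be coded in a few lines; the toy data below are made up, with censored observations kept in the risk set at their own time.

```python
import numpy as np

def kaplan_meier(times, death):
    """Product-limit estimate at each distinct death time.

    times: observation times (to death or censoring)
    death: 1 if the observation is a death, 0 if censored
    """
    times = np.asarray(times, dtype=float)
    death = np.asarray(death, dtype=int)
    t_dist = np.unique(times[death == 1])          # distinct death times t_(i)
    surv = []
    s = 1.0
    for t in t_dist:
        n_i = np.sum(times >= t)                   # n_i: at risk just before t_(i)
        d_i = np.sum((times == t) & (death == 1))  # d_i: deaths at t_(i)
        s *= 1.0 - d_i / n_i                       # multiply conditional survival
        surv.append(s)
    return t_dist, np.array(surv)

# Toy data for 6 subjects: times 3, 5, 5+, 8, 10+, 12 ('+' marks censoring)
t, s = kaplan_meier([3, 5, 5, 8, 10, 12], [1, 1, 0, 1, 0, 1])
```

Note the convention that a subject censored at a death time is still counted in the risk set at that time.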
1.2 Non-Parametric Maximum Likelihood

The KM estimator has a nice interpretation as a non-parametric maximum likelihood estimator (NPMLE). A rigorous treatment of this notion is beyond the scope of the course, but the original article by Kaplan and Meier provides a more intuitive approach.

We consider the contribution to the likelihood of cases that die or are censored at time $t$. If a subject is censored at $t$ its contribution to the likelihood is $S(t)$. In order to maximize the likelihood we would like to make this as large as possible. Because a survival function must be non-increasing, the best we can do is keep it constant at $t$. In other words, the estimated survival function doesn't change at censoring times.

If a subject dies at $t$ then this is one of the distinct times of death that we introduced before. Say it is $t_{(i)}$. We need to make the survival function just before $t_{(i)}$ as large as possible. The largest it can be is the value at the previous time of death or 1, whichever is less. We also need to make the survival at $t_{(i)}$ itself as small as possible. This means we need a discontinuity at $t_{(i)}$.

Let $c_i$ denote the number of cases censored between $t_{(i)}$ and $t_{(i+1)}$, and let $d_i$ be the number of cases that die at $t_{(i)}$. Then the likelihood function takes the form

$$L = \prod_{i=1}^{m} [S(t_{(i-1)}) - S(t_{(i)})]^{d_i}\, S(t_{(i)})^{c_i},$$

where the product is over the $m$ distinct times of death, and we take $t_{(0)} = 0$ with $S(t_{(0)}) = 1$.

The problem now is to estimate $m$ parameters representing the values of the survival function at the death times $t_{(1)}, t_{(2)}, \ldots, t_{(m)}$. Write

$$\pi_i = S(t_{(i)})/S(t_{(i-1)})$$

for the conditional probability of surviving from $t_{(i-1)}$ to $t_{(i)}$. Then we can write

$$S(t_{(i)}) = \pi_1 \pi_2 \ldots \pi_i,$$

and the likelihood becomes

$$L = \prod_{i=1}^{m} (1 - \pi_i)^{d_i}\, \pi_i^{c_i}\, (\pi_1 \pi_2 \ldots \pi_{i-1})^{d_i + c_i}.$$

Note that all cases who die at $t_{(i)}$ or are censored between $t_{(i)}$ and $t_{(i+1)}$ contribute a term $\pi_j$ for each of the previous times of death from $t_{(1)}$ to $t_{(i-1)}$. In addition, those who die at $t_{(i)}$ contribute $1 - \pi_i$, and the censored cases contribute an additional $\pi_i$.

Let $n_i = \sum_{j \geq i} (d_j + c_j)$ denote the total number exposed to risk at $t_{(i)}$. We can then collect terms on each $\pi_i$ and write the likelihood as

$$L = \prod_{i=1}^{m} (1 - \pi_i)^{d_i}\, \pi_i^{n_i - d_i},$$

a binomial likelihood. The m.l.e. of $\pi_i$ is then

$$\hat{\pi}_i = \frac{n_i - d_i}{n_i} = 1 - \frac{d_i}{n_i}.$$

The KM estimator follows from multiplying these conditional probabilities.

1.3 Greenwood's Formula

From the likelihood function obtained above it follows that the large-sample variance of $\hat{\pi}_i$, conditional on the data $n_i$ and $d_i$, is given by the usual binomial formula

$$\text{var}(\hat{\pi}_i) = \frac{\pi_i(1 - \pi_i)}{n_i}.$$

Perhaps less obviously, $\text{cov}(\hat{\pi}_i, \hat{\pi}_j) = 0$ for $i \neq j$, so the covariances of the contributions from different times of death are all zero. You can verify this result by taking logs and then first and second derivatives of the log-likelihood function.

To obtain the large-sample variance of $\hat{S}(t)$, the KM estimate of the survival function, we need to apply the delta method twice. First we take logs, so that instead of the variance of a product we can find the variance of a sum, working with

$$K_i = \log \hat{S}(t_{(i)}) = \sum_{j=1}^{i} \log \hat{\pi}_j.$$

Now we need to find the variance of the log of $\hat{\pi}_i$. This will be our first application of the delta method. The large-sample variance of a function $f$ of a random variable $X$ is $\text{var}(f(X)) = (f'(X))^2\, \text{var}(X)$, so we just multiply the variance of $X$ by the square of the derivative of the transformation. In our case the function is the log, and we obtain

$$\text{var}(\log \hat{\pi}_i) = \left(\frac{1}{\pi_i}\right)^2 \text{var}(\hat{\pi}_i) = \frac{1 - \pi_i}{n_i \pi_i}.$$
Because $K_i$ is a sum and the covariances of the $\hat{\pi}_i$'s (and hence of the $\log \hat{\pi}_i$'s) are zero, we find

$$\text{var}(\log \hat{S}(t_{(i)})) = \sum_{j=1}^{i} \frac{1 - \pi_j}{n_j \pi_j} = \sum_{j=1}^{i} \frac{d_j}{n_j(n_j - d_j)}.$$

Now we have to use the delta method again, this time to get the variance of the survivor function from the variance of its log:

$$\text{var}(\hat{S}(t_{(i)})) = [\hat{S}(t_{(i)})]^2 \sum_{j=1}^{i} \frac{1 - \hat{\pi}_j}{n_j \hat{\pi}_j}.$$

This result is known as Greenwood's formula. You may question the derivation because it conditions on the $n_j$, which are random variables, but the result is in the spirit of likelihood theory, conditioning on all observed quantities, and has been justified rigorously. Peterson (1977) has shown that the KM estimator $\hat{S}(t)$ is consistent, and Breslow and Crowley (1974) show that $\sqrt{n}(\hat{S}(t) - S(t))$ converges in law to a Gaussian process with expectation 0 and a variance-covariance function that may be approximated using Greenwood's formula. For a modern treatment of the estimator from the point of view of counting processes see Andersen et al. (1993).

1.4 The Nelson-Aalen Estimator

Consider estimating the cumulative hazard $\Lambda(t)$. A simple approach is to start from an estimator of $S(t)$ and take minus the log. An alternative approach is to estimate the cumulative hazard directly using the Nelson-Aalen estimator

$$\hat{\Lambda}(t_{(i)}) = \sum_{j=1}^{i} \frac{d_j}{n_j}.$$

Intuitively, this expression estimates the hazard at each distinct time of death $t_{(j)}$ as the ratio of the number of deaths to the number exposed. The cumulative hazard up to time $t$ is simply the sum of the hazards at all death times up to $t$, and has a nice interpretation as the expected number of deaths in $(0, t]$ per unit at risk. This estimator has a strong justification in terms of the theory of counting processes.

The variance of $\hat{\Lambda}(t_{(i)})$ can be approximated by $\text{var}(-\log \hat{S}(t_{(i)}))$, which we obtained on our way to Greenwood's formula. Therneau and Grambsch (2000) discuss alternative approximations.
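The running sums in Greenwood's formula and in the Nelson-Aalen estimator can be accumulated in the same pass as the product-limit estimate. A small sketch (my own, with made-up data in which the last subject is censored so that no $n_j - d_j$ is zero):

```python
import numpy as np

def km_with_greenwood(times, death):
    """KM estimate with Greenwood standard errors and the Nelson-Aalen
    cumulative hazard, evaluated at each distinct death time."""
    times = np.asarray(times, dtype=float)
    death = np.asarray(death, dtype=int)
    s, gw, na = 1.0, 0.0, 0.0
    rows = []
    for t in np.unique(times[death == 1]):
        n = np.sum(times >= t)                   # n_i: at risk just before t_(i)
        d = np.sum((times == t) & (death == 1))  # d_i: deaths at t_(i)
        s *= 1 - d / n                           # product-limit update
        gw += d / (n * (n - d))                  # sum in Greenwood's formula
        na += d / n                              # Nelson-Aalen increment
        rows.append((t, s, s * np.sqrt(gw), na)) # (t_(i), S^, se(S^), Lambda^)
    return rows

rows = km_with_greenwood([3, 5, 5, 8, 10, 12], [1, 1, 0, 1, 0, 0])
```

Each row pairs the survival estimate with its Greenwood standard error $\hat{S}(t_{(i)}) \sqrt{\sum d_j / (n_j(n_j - d_j))}$ and the Nelson-Aalen cumulative hazard.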
Breslow (1972) suggested estimating the survival function as

$$\hat{S}(t) = \exp\{-\hat{\Lambda}(t)\},$$

where $\hat{\Lambda}(t)$ is the Nelson-Aalen estimator of the integrated hazard. The Breslow estimator and the KM estimator are asymptotically equivalent, and usually are quite close to each other, particularly when the number of deaths is small relative to the number exposed.

1.5 Expectation of Life

If $\hat{S}(t_{(m)}) = 0$ then one can estimate $\mu = E(T)$ as the integral of the KM estimate:

$$\hat{\mu} = \int_0^\infty \hat{S}(t)\,dt = \sum_{i=1}^{m} (t_{(i)} - t_{(i-1)})\, \hat{S}(t_{(i)}).$$

Can you figure out the variance of $\hat{\mu}$?

2 k Samples: Mantel-Haenszel

Consider now the problem of comparing two or more survivor functions, for example urban versus rural, or treated versus control. Let $t_{(1)} < t_{(2)} < \ldots < t_{(m)}$ denote the distinct times of death observed in the total sample, obtained by combining all groups of interest. Let

$d_{ij}$ = deaths at time $t_{(i)}$ in group $j$, and
$n_{ij}$ = number at risk at time $t_{(i)}$ in group $j$.

We also let $d_i$ and $n_i$ denote the total numbers of deaths and subjects at risk at time $t_{(i)}$.

If the survival probabilities are the same in all groups, then the $d_i$ deaths at time $t_{(i)}$ should be distributed among the $k$ groups in proportion to the numbers at risk. Thus, conditional on $d_i$ and the $n_{ij}$,

$$E(d_{ij}) = d_i \frac{n_{ij}}{n_i} = n_{ij} \frac{d_i}{n_i},$$

where the last term shows that we can also view this calculation as applying an overall failure rate $d_i/n_i$ to the $n_{ij}$ subjects in group $j$.
We now proceed beyond the mean to obtain the distribution of these counts. Imagine setting up a contingency table at each distinct failure time, with rows given by survival status and columns given by group membership. The entries in the table are $d_{ij}$ or $n_{ij} - d_{ij}$, the row totals are $d_i$ and $n_i - d_i$, and the column totals are the $n_{ij}$. The distribution of the counts conditional on both the row and column totals is hypergeometric. (We mentioned this distribution briefly in WWS509 when we considered contingency tables with both margins fixed.) The hypergeometric distribution has mean as given above, variance

$$\text{var}(d_{ij}) = \frac{d_i(n_i - d_i)}{n_i - 1}\, \frac{n_{ij}}{n_i}\left(1 - \frac{n_{ij}}{n_i}\right),$$

and covariance

$$\text{cov}(d_{ir}, d_{is}) = -\frac{d_i(n_i - d_i)}{n_i - 1}\, \frac{n_{ir} n_{is}}{n_i^2}.$$

Let $\mathbf{d}_i$ denote the vector of deaths at time $t_{(i)}$, with mean $E(\mathbf{d}_i)$ and variance-covariance matrix $\text{var}(\mathbf{d}_i)$. We sum these over all times to obtain

$$D = \sum_{i=1}^{m} [\mathbf{d}_i - E(\mathbf{d}_i)] \quad \text{and} \quad V = \sum_{i=1}^{m} \text{var}(\mathbf{d}_i).$$

Mantel and Haenszel proposed testing the equality of the $k$ survival functions,

$$H_0: S_1(t) = S_2(t) = \ldots = S_k(t),$$

by treating the quadratic form

$$Q = D' V^{-} D$$

as a $\chi^2$ statistic with $k - 1$ degrees of freedom. Here $V^{-}$ is any generalized inverse of $V$. Omitting the $i$-th group from the calculation of $D$ and $V$ will do; the test is invariant to the choice of omitted group. For $k = 2$ we get

$$z = \sqrt{Q} = \frac{\sum_i (d_{i1} - E(d_{i1}))}{\sqrt{\sum_i \text{var}(d_{i1})}}.$$

An approximation for $k \geq 2$ which does not require matrix inversion treats

$$\sum_i \sum_j \frac{(O_{ij} - E_{ij})^2}{E_{ij}},$$

where $O_{ij}$ denotes observed and $E_{ij}$ expected deaths at time $t_{(i)}$ in group $j$, as a $\chi^2$ statistic with $k - 1$ d.f.
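For the two-sample case, the $z$ statistic above accumulates observed-minus-expected deaths and hypergeometric variances over the pooled death times. A minimal sketch (mine, not from the notes), with a trivial made-up data set:

```python
import numpy as np

def logrank_two_sample(times, death, group):
    """Two-sample Mantel-Haenszel (log-rank) z statistic.

    times: observation times; death: 1 = death, 0 = censored;
    group: 0/1 group labels."""
    times = np.asarray(times, dtype=float)
    death = np.asarray(death, dtype=int)
    group = np.asarray(group, dtype=int)
    num, var = 0.0, 0.0
    for t in np.unique(times[death == 1]):
        at_risk = times >= t
        n_i = at_risk.sum()                               # total at risk
        n_i1 = (at_risk & (group == 1)).sum()             # at risk in group 1
        d_i = ((times == t) & (death == 1)).sum()         # total deaths
        d_i1 = ((times == t) & (death == 1) & (group == 1)).sum()
        num += d_i1 - d_i * n_i1 / n_i                    # observed minus expected
        if n_i > 1:                                       # hypergeometric variance
            var += (d_i * (n_i - d_i) / (n_i - 1)
                    * (n_i1 / n_i) * (1 - n_i1 / n_i))
    return num / np.sqrt(var)

# Extreme toy example: group 0 dies at times 1 and 2, group 1 at 3 and 4
z = logrank_two_sample([1, 2, 3, 4], [1, 1, 1, 1], [0, 0, 1, 1])
```

A large negative $z$ indicates fewer deaths than expected in group 1, i.e. better survival in group 1.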
The Mantel-Haenszel test can be derived as a linear rank test, and in that context is often called the log-rank or Savage test. Kalbfleisch and Prentice have proposed an extension to censored data of the Wilcoxon test. Other alternatives have been proposed by Gehan (1965) and Breslow (1970), but the MH test is the most popular one.

3 Regression: Cox's Model

Let us consider the more general problem where we have a vector $x$ of covariates. The $k$-sample problem can be viewed as the special case where the $x$'s are dummy variables denoting group membership. Recall the basic model

$$\lambda(t, x) = \lambda_0(t)\, e^{x'\beta},$$

and consider estimation of $\beta$ without making any assumptions about the baseline hazard $\lambda_0(t)$.

3.1 Cox's Partial Likelihood

In his 1972 paper Cox proposed fitting the model by maximizing a special likelihood. Let $t_{(1)} < t_{(2)} < \ldots < t_{(m)}$ denote the observed distinct times of death, as before, and consider what happens at $t_{(i)}$. Let $R_i$ denote the risk set at $t_{(i)}$, defined as the set of indices of the subjects that are alive just before $t_{(i)}$. Thus, $R_0 = \{1, 2, \ldots, n\}$.

Suppose first that there are no ties in the observation times, so one and only one subject failed at $t_{(i)}$. Let's call this subject $j(i)$. What is the conditional probability that this particular subject would fail at $t_{(i)}$, given the risk set $R_i$ and the fact that exactly one subject fails at that time? Answer:

$$\frac{\lambda(t_{(i)}, x_{j(i)})\,dt}{\sum_{j \in R_i} \lambda(t_{(i)}, x_j)\,dt}.$$

We can write this probability in terms of the baseline hazard and relative risks as

$$\frac{\lambda_0(t_{(i)})\, e^{x'_{j(i)}\beta}}{\sum_{j \in R_i} \lambda_0(t_{(i)})\, e^{x'_j\beta}},$$

and we notice that the baseline hazard cancels, so the probability in question is

$$\frac{e^{x'_{j(i)}\beta}}{\sum_{j \in R_i} e^{x'_j\beta}}$$

and does not depend on the baseline hazard $\lambda_0(t)$.

Cox proposed multiplying these probabilities together over all distinct failure times and treating the resulting product

$$L = \prod_{i=1}^{m} \frac{e^{x'_{j(i)}\beta}}{\sum_{j \in R_i} e^{x'_j\beta}}$$

as if it was an ordinary likelihood. In his original paper Cox called this a conditional likelihood because it is a product of conditional probabilities, but later abandoned the name because it is misleading: $L$ is not itself a conditional probability.

Kalbfleisch and Prentice considered the case where the covariates are fixed over time and showed that $L$ is the marginal likelihood of the ranks of the observations, obtained by considering just the order in which people die and not the actual times at which they die.

In 1975 Cox provided a more general justification of $L$ as part of the full likelihood (in fact, a part that happens to contain most of the information about $\beta$) and therefore proposed calling $L$ a partial likelihood. This justification is valid even with time-varying covariates. A more rigorous justification of the partial likelihood in terms of the theory of counting processes can be found in Andersen et al. (1993).

3.2 The Score and Information

The log of Cox's partial likelihood is

$$\log L = \sum_i \left\{ x'_{j(i)}\beta - \log \sum_{j \in R_i} e^{x'_j\beta} \right\}.$$

Taking derivatives with respect to $\beta$ we find the score to be

$$\frac{\partial \log L}{\partial \beta_r} = \sum_i \left\{ x_{j(i)r} - \frac{\sum_{j \in R_i} e^{x'_j\beta} x_{jr}}{\sum_{j \in R_i} e^{x'_j\beta}} \right\}.$$

The term to the right of the minus sign is just a weighted average of $x_r$ over the risk set $R_i$, with weights equal to the relative risks $e^{x'_j\beta}$. Thus, we can write the score as

$$U_r(\beta) = \frac{\partial \log L}{\partial \beta_r} = \sum_i (x_{j(i)r} - A_{ir}(\beta)),$$

where $A_{ir}(\beta)$ is the weighted average of $x_r$ over the risk set $R_i$.

Taking derivatives again and changing sign we find the observed information to be

$$-\frac{\partial^2 \log L}{\partial \beta_r \partial \beta_s} = \sum_i \frac{\left(\sum e^{x'_j\beta} x_{jr} x_{js}\right)\left(\sum e^{x'_j\beta}\right) - \left(\sum e^{x'_j\beta} x_{jr}\right)\left(\sum e^{x'_j\beta} x_{js}\right)}{\left(\sum e^{x'_j\beta}\right)^2},$$

where all sums are over the risk set $R_i$. The right-hand side can be written as the difference of two terms. The first term can be interpreted as a weighted average of the cross-product of $x_r$ and $x_s$. The second term is the product of the weighted averages $A_{ir}(\beta)$ and $A_{is}(\beta)$. Thus we can write

$$-\frac{\partial^2 \log L}{\partial \beta_r \partial \beta_s} = \sum_i \left\{ \frac{\sum e^{x'_j\beta} x_{jr} x_{js}}{\sum e^{x'_j\beta}} - A_{ir}(\beta) A_{is}(\beta) \right\}.$$

You may recognize this expression as the old desk-calculator formula for a covariance, leading to the observed information

$$I_{rs}(\beta) = -\frac{\partial^2 \log L}{\partial \beta_r \partial \beta_s} = \sum_{i=1}^{m} C_{irs}(\beta),$$

where $C_{irs}(\beta)$ denotes the weighted covariance of $x_r$ and $x_s$ over the risk set $R_i$, with weights equal to the relative risks $e^{x'_j\beta}$ for $j \in R_i$.

Calculation of the score and information is thus relatively simple. In matrix notation,

$$u(\beta) = \sum_i (x_{j(i)} - A_i(\beta)) \quad \text{and} \quad I(\beta) = \sum_i C_i(\beta),$$

where $A_i(\beta)$ is the mean and $C_i(\beta)$ is the variance-covariance matrix of $x$ over the risk set $R_i$ with weights $e^{x'_j\beta}$ for $j \in R_i$. Notably, the partial log-likelihood is formally identical with the log-likelihood for a conditional logit model.

3.3 The Problem of Ties

The development so far has assumed that only one death occurs at each distinct time $t_{(i)}$. In practice we often observe several deaths, say $d_i$, at $t_{(i)}$. This can happen in several ways:
- The data are discrete, so in fact there is a positive probability of failure at $t_{(i)}$. If this is the case one should really use a discrete model. We will discuss possible approaches below.

- The data are continuous but have been grouped, so $d_i$ represents the number of deaths in some interval around $t_{(i)}$. In this case one would probably be better off estimating a separate parameter for each interval, using a complementary log-log binomial model or a Poisson model corresponding to a piecewise exponential survival model, as discussed in my WWS509 notes.

- The data are continuous and are not grouped, but there are a few ties resulting perhaps from coarse measurement. In this case we can extend the argument used to build the likelihood.

Let $D_i$ denote the set of indices of the $d_i$ cases who failed at $t_{(i)}$. The probability that the $d_i$ cases that actually fail would be those in $D_i$, given the risk set $R_i$ and the fact that $d_i$ of them fail at $t_{(i)}$, is

$$L_i = \frac{\prod_{j \in D_i} e^{x'_j\beta}}{\sum_{P_i} \prod_{j \in P_i} e^{x'_j\beta}},$$

where the sum in the denominator is over all possible ways $P_i$ of choosing $d_i$ indices from the risk set, and the product is over the set of chosen indices, which we call a permutation. For example, assume four people are at risk and two die. The deaths could be {1,2}, {1,3}, {1,4}, {2,3}, {2,4}, or {3,4}. We calculate the probability of each of these outcomes, and then divide the probability of the outcome that actually occurred by the sum of the probabilities of all possible outcomes.

This likelihood was proposed by Cox in his original paper. The numerator is easy to calculate, and has the form $\exp\{S'_i\beta\}$, where $S_i = \sum_{j \in D_i} x_j$ is the sum of the $x$'s over the death set $D_i$. The denominator is difficult to calculate because the number of permutations grows very quickly with $d_i$.

Peto (and independently Breslow) proposed approximating the denominator by calculating the sum $\sum_{j \in R_i} e^{x'_j\beta}$ over the entire risk set and raising it to $d_i$. This leads to the much simpler expression

$$L \approx \prod_{i=1}^{m} \frac{e^{S'_i\beta}}{\left(\sum_{j \in R_i} e^{x'_j\beta}\right)^{d_i}}.$$

The Peto-Breslow approximation is reasonably good when $d_i$ is small relative to $n_i$, and is popular because of its simplicity. Efron proposed a better approximation that requires only modest additional computational effort.

Consider again our example where two out of four subjects fail. Suppose the subjects that fail are 1 and 2, and let $r_j = e^{x'_j\beta}$ denote the relative risk of the $j$-th subject. In continuous time one must have failed before the other, we just don't know which. The contributions to the partial likelihood would be

$$\frac{r_1}{r_1 + r_2 + r_3 + r_4} \cdot \frac{r_2}{r_2 + r_3 + r_4}$$

if 1 failed before 2, or

$$\frac{r_2}{r_1 + r_2 + r_3 + r_4} \cdot \frac{r_1}{r_1 + r_3 + r_4}$$

if 2 was the first to fail. In both cases the numerator is $r_1 r_2$. To compute the denominator Peto and Breslow add the risks over the complete risk set both times, using

$$(r_1 + r_2 + r_3 + r_4)^2,$$

which is obviously conservative. Efron uses the average risk of subjects 1 and 2 for the second term, so he takes the denominator to be

$$(r_1 + r_2 + r_3 + r_4)(0.5\, r_1 + 0.5\, r_2 + r_3 + r_4).$$

This approximation is much more accurate, unless $d_i$ is very large relative to $n_i$. For more details see Therneau and Grambsch (2000, Section 3.3).

3.4 Tests of Hypotheses

As usual, we have three approaches to testing hypotheses about $\hat{\beta}$:

- Likelihood Ratio Test: given two nested models, we treat twice the difference in partial log-likelihoods as a $\chi^2$ statistic with degrees of freedom equal to the difference in the number of parameters.

- Wald Test: we use the fact that approximately in large samples $\hat{\beta}$ has a multivariate normal distribution with mean $\beta$ and variance-covariance matrix $\text{var}(\hat{\beta}) = I^{-1}(\beta)$. Thus, under $H_0: \beta = \beta_0$, the quadratic form

$$(\hat{\beta} - \beta_0)'\, \text{var}^{-1}(\hat{\beta})\, (\hat{\beta} - \beta_0) \sim \chi^2_p,$$

where $p$ is the dimension of $\beta$. This test is often used for a subset of $\beta$.

- Score Test: we use the fact that approximately in large samples the score $u(\beta)$ has a multivariate normal distribution with mean 0 and variance-covariance matrix equal to the information matrix. Thus, under $H_0: \beta = \beta_0$, the quadratic form

$$u(\beta_0)'\, I^{-1}(\beta_0)\, u(\beta_0) \sim \chi^2_p.$$

Note that this test does not require calculating the m.l.e. $\hat{\beta}$.
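The two tied-denominator approximations for the two-out-of-four example are easy to compare numerically. In this sketch (mine, with made-up relative risks) the Efron denominator lies between the two possible ordered contributions, while the Breslow value is smaller than both:

```python
import numpy as np

# Made-up relative risks r_j = exp(x_j' beta) for the four subjects at risk;
# subjects 1 and 2 are the tied deaths.
r1, r2, r3, r4 = 1.5, 0.8, 1.2, 2.0
tot = r1 + r2 + r3 + r4

# Ordered contributions: 1 before 2, and 2 before 1
c1 = (r1 / tot) * (r2 / (r2 + r3 + r4))
c2 = (r2 / tot) * (r1 / (r1 + r3 + r4))

# Peto-Breslow: use the full risk-set total both times
breslow = r1 * r2 / tot**2

# Efron: second factor uses the average risk of the two tied subjects
efron = r1 * r2 / (tot * (0.5 * r1 + 0.5 * r2 + r3 + r4))
```

Efron's second denominator factor, $0.5 r_1 + 0.5 r_2 + r_3 + r_4$, is exactly the average of the two second-stage risk totals $r_2 + r_3 + r_4$ and $r_1 + r_3 + r_4$, which is why it tracks the ordered contributions so closely.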
One reason for bringing up the score test is that in the $k$-sample case the score test of $H_0: \beta = 0$ based on Cox's model happens to be the same as the Mantel-Haenszel log-rank test. Here is an outline of the proof. Assume no ties. If $\beta = 0$ then the weights used to calculate $A_i$ and $C_i$ are all 1, $A_{ir}$ happens to be the proportion of the risk set $R_i$ that comes from the $r$-th sample (which is the same as the expected number of deaths in that group), and $C_i$ is a binomial variance-covariance matrix. If there are ties, Cox's approach leads to the test discussed in Section 2. Use of Peto's approximation is equivalent to omitting the factor $(n_i - d_i)/(n_i - 1)$ from the variance-covariance matrix.

All three tests are asymptotically equivalent. The quality of the normal approximations depends on the sample size, the distribution of the cases over the covariate space, and the extent of censoring.

3.5 Time-Varying Covariates

A nice feature of the Cox model and partial likelihood is that it extends easily to the case of time-varying covariates. Note that the partial likelihood is built by considering only what happens at each failure time, so we only need to know the values of the covariates at the distinct times of death.

One use of time-varying covariates is to check the assumption of proportionality of hazards. In his original paper Cox analyzes a two-sample problem and introduces an auxiliary covariate

$$z_i = \begin{cases} 0, & \text{in group 0} \\ t - c, & \text{in group 1} \end{cases}$$

where $c$ is an arbitrary constant close to the mean of $t$, chosen to avoid numerical instability. If the coefficient of $z$ is 0, the assumption of proportionality of hazards is adequate. A positive value indicates that the ratio of hazards for group 1 over group 0 increases over time. A negative coefficient suggests a declining hazard ratio, a common occurrence.

Another use of time-varying covariates is to represent variables that simply change over time.
In a study of contraceptive failure, for example, one may treat frequency of intercourse as a time-varying covariate. Another example of a time-varying covariate is education in an analysis of age at marriage. Note that $x(t)$ may represent the actual value of a variable at time $t$, or any index based on the individual's history up to time $t$. In a study of the effects of breastfeeding on post-partum amenorrhea, for example, $x(t)$ could represent the total number of suckling episodes in the week preceding $t$.
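In practice, software usually handles time-varying covariates by splitting each subject's record into episodes over which the covariate is constant, in counting-process (start, stop, event) form. A minimal sketch of that data-preparation step (the function and its layout are my own illustration, not from the notes):

```python
def split_episodes(stop, event, change_points):
    """Split the interval (0, stop] at the covariate change points,
    producing (start, stop, event) episodes in counting-process form.
    Only the final episode carries the subject's event indicator."""
    cuts = [t for t in sorted(change_points) if 0 < t < stop]
    bounds = [0.0] + cuts + [stop]
    episodes = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        episodes.append((start, end, event if end == stop else 0))
    return episodes

# A subject observed to t = 10 who dies, with covariate changes at t = 3 and 7:
episodes = split_episodes(10, 1, [3, 7])
```

Each episode would then be joined with the covariate value that applied over that interval before fitting the model.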
The partial likelihood function with time-varying covariates does not have an interpretation as a marginal likelihood of the ranks. For further details see Cox and Oakes (1984, Chapter 8).

3.6 Estimating the Baseline Survival

Interest so far has focused on the regression coefficients $\beta$. We now consider how to estimate the baseline hazard $\lambda_0(t)$, which dropped out of the partial likelihood.

Kalbfleisch and Prentice (1980, Section 4.3) use an argument similar to the derivation of the Kaplan-Meier estimate, noting that the hazard should assign mass only to the discrete times of death. Let $\pi_i$ denote the conditional survival probability at time $t_{(i)}$ for a baseline subject. To obtain the conditional probability for a subject with covariates $x$ we would need to raise $\pi_i$ to $e^{x'\beta}$. This leads to a likelihood of the form

$$L = \prod_{i=1}^{m} \left[ \prod_{j \in D_i} \left(1 - \pi_i^{e^{x'_j\beta}}\right) \prod_{j \in R_i - D_i} \pi_i^{e^{x'_j\beta}} \right].$$

Meier suggested maximizing this likelihood with respect to both the $\pi_i$ and $\beta$. A simpler approach is to plug in the estimate $\hat{\beta}$ from the partial likelihood, and maximize the resulting expression with respect to the $\pi_i$ only. If there are no ties, this gives

$$\hat{\pi}_i = \left(1 - \frac{e^{x'_{j(i)}\hat{\beta}}}{\sum_{j \in R_i} e^{x'_j\hat{\beta}}}\right)^{e^{-x'_{j(i)}\hat{\beta}}}.$$

Think of this as follows. With no covariates, our estimate would be $\hat{\pi}_i = 1 - d_i/n_i$ with $d_i = 1$, which is the KM estimate. With covariates we do the same thing, except that we weight each case by its relative risk. The resulting survival probability is then raised to the power $e^{-x'_{j(i)}\hat{\beta}}$ to turn it into a baseline probability.

If there are ties, one has to solve

$$\sum_{j \in D_i} \frac{e^{x'_j\hat{\beta}}}{1 - \pi_i^{e^{x'_j\hat{\beta}}}} = \sum_{j \in R_i} e^{x'_j\hat{\beta}}$$

iteratively. A suitable starting value is

$$\log \pi_i = -\frac{d_i}{\sum_{j \in R_i} e^{x'_j\hat{\beta}}}.$$
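The closed form for the no-ties case is simple to compute; a small sketch (mine, with made-up relative risks) that also checks the reduction to the KM factor when all relative risks are one:

```python
import numpy as np

def baseline_cond_survival(risks, failed):
    """Kalbfleisch-Prentice conditional baseline survival pi_i at one
    death time, assuming no ties.

    risks: relative risks exp(x_j' beta-hat) for everyone in the risk set
    failed: index (within risks) of the subject who failed at this time
    """
    risks = np.asarray(risks, dtype=float)
    r_fail = risks[failed]
    # (1 - r_{j(i)} / sum_j r_j) raised to exp(-x'_{j(i)} beta) = 1 / r_{j(i)}
    return (1.0 - r_fail / risks.sum()) ** (1.0 / r_fail)

# With all relative risks equal to 1 this reduces to the KM factor 1 - 1/n_i:
pi = baseline_cond_survival([1.0, 1.0, 1.0, 1.0], failed=0)
```

Multiplying these conditional probabilities over the death times before $t$ then gives the baseline survival curve.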
The estimate of the baseline survival function is then a step function,

$$\hat{S}_0(t) = \prod_{i:\, t_{(i)} < t} \hat{\pi}_i.$$

Cox and Oakes (1984, Section 7.8) describe a simpler estimator that extends the Nelson-Aalen estimate of the cumulative hazard to the case of covariates. The estimator can be described somewhat heuristically as follows. Treat the baseline hazard as 0 except at failure times. The expected number of deaths at $t_{(i)}$ can be obtained by summing the hazards over the risk set:

$$E(d_i) = \sum_{j \in R_i} \lambda_0(t_{(i)})\, e^{x'_j\beta}.$$

Equating the observed and expected numbers of deaths at $t_{(i)}$ leads us to estimate $\lambda_i = \lambda_0(t_{(i)})$ as

$$\hat{\lambda}_i = \frac{d_i}{\sum_{j \in R_i} e^{x'_j\beta}}.$$

The cumulative hazard and survival functions are then estimated as

$$\hat{\Lambda}_0(t) = \sum_{i:\, t_{(i)} < t} \hat{\lambda}_i \quad \text{and} \quad \hat{S}_0(t) = e^{-\hat{\Lambda}_0(t)}.$$

If there are no covariates these reduce to the ordinary Nelson-Aalen and Breslow estimators described earlier.

Having obtained estimates of the baseline hazard and survival, we can obtain fitted hazards and survival functions for any value of $x$. This task is pretty straightforward when we have time-fixed covariates, as all we need to do is multiply the baseline hazard by the relative risk, or raise the baseline survival to the relative risk. With time-varying covariates things get somewhat more complicated, as we have to pick up the appropriate hazard for each distinct failure time depending on the values of the covariates at that point.

3.7 Martingale Residuals

Residuals play an important role in checking linear and generalized linear models. Not surprisingly, the concept has been extended to survival models. A lot of this work relies heavily on the terminology and notation of counting processes. We will try to convey the essential ideas in a non-technical way.
Instead of focusing on the $i$-th individual's survival time $T_i$, we will introduce a function $N_i(t)$ that counts events over time. In survival models $N_i(t)$ will be zero while the subject is alive and will then become one. (In more general event-history models $N_i(t)$ counts the number of occurrences of the event to subject $i$ by time $t$.) While survival time $T_i$ is a random variable, $N_i(t)$ is a stochastic process, a function of time.

We will also introduce a function to track the $i$-th individual's exposure. $Y_i(t)$ will be one while the individual is in the risk set and zero afterwards. Note that $Y_i(t)$ can become zero due to death or due to censoring. To complete the model we add a hazard function $\lambda_i(t)$ representing the $i$-th individual's risk at time $t$. In a Cox model $\lambda_i(t) = \lambda_0(t) \exp\{x'_i\beta\}$.

In the terminology of counting processes, the process $N_i(t)$ is said to have intensity $Y_i(t)\lambda_i(t)$. The intensity is just the risk if the subject is exposed, and is zero otherwise. The probability that $N_i(t)$ will jump in a small interval $[t, t + dt)$, conditional on the entire history of the process, is given by $\lambda_i(t) Y_i(t)\,dt$, and is proportional to the intensity of the process and the width of the interval.

A key feature of this formulation is that

$$M_i(t) = N_i(t) - \int_0^t \lambda_i(u) Y_i(u)\,du$$

is a martingale, a fact that can be used to establish the properties of tests and estimators using martingale central limit theory. A martingale is essentially a stochastic process without drift. Given two times $0 < t_1 < t_2$, the expectation of $M(t_2)$ given the history up to time $t_1$ is simply $M(t_1)$. In other words, martingale increments have mean zero. Also, martingale increments are uncorrelated, although not necessarily independent.

The integral following the minus sign in the above equation is called the compensator of $N_i(t)$. You may think of it as the conditional expected value of the counting process at time $t$.
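A quick simulation (my own sketch, not part of the notes) illustrates the zero-drift property: with a known constant hazard, the differences $M_i(\infty) = N_i(\infty) - \lambda t_i$ between the counting process and its compensator average out to roughly zero.

```python
import numpy as np

rng = np.random.default_rng(42)
lam, n = 0.5, 10_000                      # true constant hazard, sample size

death_t = rng.exponential(1 / lam, n)     # latent survival times
cens_t = rng.uniform(0, 4, n)             # independent censoring times
t_obs = np.minimum(death_t, cens_t)       # observed follow-up
d = (death_t <= cens_t).astype(int)       # death indicator N_i(infinity)

# With lambda_i(t) = lam and Y_i(t) = 1{t <= t_obs}, the compensator
# evaluated at infinity is lam * t_obs, so the martingale difference is:
m = d - lam * t_obs
print(m.mean())                           # close to 0 in large samples
```

The same differences, computed with estimated quantities in place of the true hazard, are the martingale residuals discussed next.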
Subtracting the compensator turns the counting process into a martingale. This equation immediately suggests the first type of residual one can use in Cox models, the so-called martingale residual:

$$\hat{M}_i(t) = N_i(t) - \int_0^t Y_i(u)\, e^{x'_i\hat{\beta}}\, d\hat{\Lambda}_0(u),$$

where $\hat{\Lambda}_0(t)$ denotes the Nelson-Aalen estimator of the baseline cumulative hazard. Because this is a discrete function with jumps at the observed failure times, the integral in the above equation should be interpreted as a sum over all $j$ such that $t_{(j)} < t$. Usually the residual is computed at $t = \infty$ (or the largest observed survival time), in which case the martingale residual is

$$\hat{M}_i = d_i - e^{x'_i\hat{\beta}}\, \hat{\Lambda}_0(t_i),$$

where $d_i$ is the usual death indicator and $t_i$ is the observation time (to death or censoring) for the $i$-th individual. One possible use of residuals is to find outliers, in this case individuals who lived unusually short or long times after taking into account the relative risks associated with their observed characteristics.

Martingale residuals are just one of several types of residuals that have been proposed for survival models. Others include deviance, score and Schoenfeld residuals. For more details on counting processes, martingales and residuals see Therneau and Grambsch (2000), especially Section 2.2 and Chapter ….

4 Models for Discrete and Grouped Data

We close with a brief review of alternative approaches for discrete and continuous grouped data, expanding slightly on the WWS509 discussion.

Cox (1972) proposed an alternative version of the proportional hazards model for discrete data. In this case the hazard is the conditional probability of dying at time $t$ given survival up to that point, so that

$$\lambda(t) = \Pr\{T = t \mid T \geq t\}.$$

Cox's discrete logistic model assumes that the conditional odds of dying at $t_{(i)}$ are proportional to some baseline odds, so that

$$\frac{\lambda(t, x)}{1 - \lambda(t, x)} = \frac{\lambda_0(t)}{1 - \lambda_0(t)}\, e^{x'\beta}.$$

Note that taking logs on both sides results in a logit model, where the logit of the discrete hazard is linear in $\beta$. Do not confuse this model with the proportional odds model, where the unconditional odds of survival (or odds of dying) are proportional for different values of $x$. If the $\lambda_0(t)$ are small, so that $1 - \lambda_0(t)$ is close to one, this model will be similar to the proportional hazards model, as the odds are close to the hazard itself.

A nice property of this model is that the partial likelihood turns out to be identical to that of the continuous-time model. To see this point note that under this model the hazard at time $t_i$ for an individual with covariate values $x$ is

$$\lambda(t_i, x) = \frac{\theta_i e^{x'\beta}}{1 + \theta_i e^{x'\beta}},$$

where $\theta_i = \lambda_0(t_i)/(1 - \lambda_0(t_i))$ denotes the baseline conditional odds of dying at $t_i$. Suppose there are two cases exposed at time $t_i$, labelled 1 and 2. The probability that 1 dies and 2 does not is

$$\lambda(t_i, x_1)(1 - \lambda(t_i, x_2)) = \frac{\theta_i e^{x'_1\beta}}{(1 + \theta_i e^{x'_1\beta})(1 + \theta_i e^{x'_2\beta})}.$$

The probability that 2 dies and 1 does not has a similar structure, with $\theta_i e^{x'_2\beta}$ in the numerator and the same denominator. When we divide the probability of one of these outcomes by the sum of the two, the denominators cancel out, as do the $\theta_i$. Thus, the conditional probability is

$$\frac{e^{x'_{j(i)}\beta}}{\sum_{j \in R_i} e^{x'_j\beta}},$$

which is exactly the same as in the continuous case. (You may wonder why we did not consider the $1 - \lambda$'s in the continuous case. It turns out we didn't have to. If you repeat the continuous-time derivation including terms of the form $\lambda\,dt$ for deaths and $1 - \lambda\,dt$ for survivors, you will discover that the $dt$'s for deaths cancel out but those for survivors do not, and as $dt \to 0$ the terms $1 - \lambda\,dt \to 1$ and drop from the likelihood.)

There is an alternative approach to discrete data that is particularly appropriate when you have grouped continuous times; see Kalbfleisch and Prentice (1980), Sections … and …. Suppose we wanted a discrete-time model that preserves the relationship between survivor functions in the continuous-time model, namely

$$S(t, x) = S_0(t)^{e^{x'\beta}}.$$

In the discrete case we have

$$S(t, x) = \prod_{u < t} (1 - \lambda(u, x)),$$

so we must have

$$1 - \lambda(t_i, x) = (1 - \lambda_0(t_i))^{e^{x'\beta}}.$$

Solving for the hazard we get

$$\lambda(t_i, x) = 1 - (1 - \lambda_0(t_i))^{e^{x'\beta}}.$$
19 The linearizing transformation for this model is the complementary loglog link, log( log(1 λ)). To see why this model is uniquely appropriate for grouped data suppose we only observe failures in intervals [0 = τ 0, τ 1 ), [τ 1, τ 2 ),..., [τ k 1, τ k = ). Suppose the hazard is continuous and satisfies a standard proportional hazards model. The probability of surviving interval i for a subject with covariates x is Pr{T > τ i T > τ i 1, x} = S(τ i, x) S(τ i 1, x). Writing this in terms of the baseline survival we get ( ) S0 (τ i ) e x β = (e τ ) i e λ τ 0 (t)dt x β i 1. S 0 (τ i 1 ) In view of this result, we define the baseline hazard in interval (τ i 1, τ i ) as λ 0i = 1 e τ i τ i 1 λ 0 (t)dt. The hazard for an individual with covariates x in the same interval then becomes λ i (x) = 1 (1 λ 0i ) ex β, and can be linearized using the cloglog transformation. This is the only discrete model appropriate for grouped data from a continuoustime proportional hazards model. Unfortunately, it cannot be estimated nonparametrically using a partial likelihood argument. (If you try to construct a partial likelihood you will discover that the λ 0i s do not drop out of the likelihood.) In practice, both the logit and the complementary loglog discrete models can be estimated via ordinary likelihood techniques by treating the baseline hazard at each discrete failure time (or interval) as a separate parameter to be estimated, provided of course one has enough failures at each time (or interval). The resulting likelihood is the same as that of a binomial model with logit or cloglog link, so either model can be easily fitted with standard software. One difficulty with discrete models generated by grouping continuous data has to do with censored cases that may have been exposed part of the interval. The only way to handle these cases is to censor them at the 19
beginning of the interval, which throws away some information. This is not a problem with piecewise exponential models based on the Poisson equivalence, because one can easily take partial exposure into account in the offset.
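The claim that grouping a continuous-time proportional hazards model reproduces the complementary log-log model exactly can be checked numerically. The sketch below assumes a hypothetical Weibull baseline with cumulative hazard $\Lambda_0(t) = (t/b)^p$ and arbitrary interval boundaries; it verifies that $1 - \lambda_i(x) = (1 - \lambda_{0i})^{e^{x'\beta}}$ holds in every interval:

```python
import math

# Hypothetical Weibull baseline: cumulative hazard Lambda0(t) = (t/b)^p,
# so the PH survivor function is S(t, x) = exp(-Lambda0(t) * exp(x'beta)).
p, b = 1.5, 10.0

def Lambda0(t):
    return (t / b) ** p

def surv(t, eta):
    return math.exp(-Lambda0(t) * math.exp(eta))

taus = [0.0, 2.0, 5.0, 9.0, 14.0]   # arbitrary grouping boundaries
eta = 0.8                           # x'beta for one subject

for i in range(1, len(taus)):
    # Interval baseline hazard: lam0i = 1 - exp(-integral of lambda0 over it).
    lam0i = 1.0 - math.exp(-(Lambda0(taus[i]) - Lambda0(taus[i - 1])))
    # Conditional probability of failing in the interval, given survival so far.
    lam_i = 1.0 - surv(taus[i], eta) / surv(taus[i - 1], eta)
    # The cloglog relation holds exactly, whatever the shape of the baseline
    # hazard within the interval.
    assert abs((1.0 - lam_i) - (1.0 - lam0i) ** math.exp(eta)) < 1e-12
```

On the complementary log-log scale the relation is linear, $\log(-\log(1 - \lambda_i(x))) = \log(-\log(1 - \lambda_{0i})) + x'\beta$, which is exactly what a binomial model with cloglog link fits.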