λ(t, Z) = λ 0 (t) exp(βz)

Size: px
Start display at page:

Download "λ(t, Z) = λ 0 (t) exp(βz)"

Transcription

1 Parametric Survival Analysis So far, we have focused primarily on nonparametric and semi-parametric approaches to survival analysis, with heavy emphasis on the Cox proportional hazards model: λ(t, Z) = λ 0 (t) exp(βz) We used the following estimating approach: We estimated λ 0 (t) nonparametrically, using the Kaplan-Meier estimator, or using the Kalbfleisch/Prentice estimator under the PH assumption We estimated β by assuming a linear model between the log HR and covariates, under the PH model Both estimates were based on maximum likelihood theory. Reading: for parametric models see Collett. 1

2 There are several reasons why we should consider some alternative approaches based on parametric models: The assumption of proportional hazards might not be appropriate (based on major departures) If a parametric model actually holds, then we would probably gain efficiency We may want to handle non-standard situations like interval censoring incorporating population mortality We may want to make some connections with other familiar approaches (e.g. use of the Poisson likelihood) We may want to obtain some estimates for use in designing a future survival study. 2

3 A simple start: Exponential Regression Observed data: (X i, δ i, Z i ) for individual i, Z i = (Z i1, Z i2,..., Z ip ) represents a set of p covariates. Right censoring: Assume that X i = min(t i, U i ) Survival distribution: Assume T i follows an exponential distribution with a parameter λ that depends on Z i, say λ i = Ψ(Z i ). Then we can write: T i exponential(ψ(z i )) 3

4 First, let s review some facts about the exponential distribution (from our first survival lecture): f(t) = λe λt for t 0 S(t) = P (T t) = t f(u)du = e λt F (t) = P (T < t) = 1 e λt λ(t) = f(t) S(t) = λ constant hazard! Λ(t) = t 0 λ(u) du = t 0 λ du = λt 4

5 Now, we say that λ is a constant over time t, but we want to let it depend on the covariate values, so we are setting λ i = Ψ(Z i ) The hazard rate would therefore be the same for any two individuals with the same covariate values. Although there are many possible choices for Ψ, one simple and natural choice is: WHY? Ψ(Z i ) = exp[β 0 + Z i1 β 1 + Z i2 β Z ip β p ] ensures a positive hazard for an individual with Z = 0, the hazard is e β 0. The model is called exponential regression because of the natural generalization from regular linear regression 5

6 Exponential regression for the 2-sample case: Assume we have only a single covariate Z = Z, i.e., (p = 1). Hazard Rate: Ψ(Z i ) = exp(β 0 + Z i β 1 ) Define: Z i = 0 if individual i is in group 0 Z i = 1 if individual i is in group 1 What is the hazard for group 0? What is the hazard for group 1? What is the hazard ratio of group 1 to group 0? What is the interpretation of β 1? 6

7 Likelihood for Exponential Model Under the assumption of right censored data, each person has one of two possible contributions to the likelihood: (a) they have an event at X i (δ i = 1) contribution is L i = S(X i ) }{{} λ(x i) }{{} = e λx i λ survive to X i fail at X i (b) they are censored at X i (δ i = 0) contribution is L i = S(X i ) }{{} = e λx i survive to X i 7

8 The likelihood is the product over all of the individuals: L = i L i = i ( λe λx i ) δi }{{} events ( e λx i ) (1 δi ) }{{} censorings = i λ δ i ( e λx i ) 8

9 Maximum Likelihood for Exponential How do we use the likelihood? first take the log then take the partial derivative with respect to β then set to zero and solve for β this gives us the maximum likelihood estimators 9

10 The log-likelihood is: [ log L = log i λ δ i ( e λx i ) ] = i [δ i log(λ) λx i ] = i [δ i log(λ)] i λx i For the case of exponential regression, we now substitute the hazard λ = Ψ(Z i ) in the above log-likelihood: log L = i [δ i log(ψ(z i ))] i Ψ(Z i )X i (1) 10

11 General Form of Log-likelihood for Right Censored Data In general, whenever we have right censored data, the likelihood and corresponding log likelihood will have the following forms: L = i [λ i (X i )] δ i S i (X i ) log L = i [δ i log (λ i (X i ))] i Λ i (X i ) where λ i (X i ) is the hazard for the individual i who fails at X i Λ i (X i ) is the cumulative hazard for an individual at their failure or censoring time For example, see the derivation of the likelihood for a Cox model on p of Lecture 4 notes. We started with the likelihood above, then substituted the specific forms for λ(x i ) under the PH assumption. 11

12 Consider our model for the hazard rate: λ = Ψ(Z i ) = exp[β 0 + Z i1 β 1 + Z i2 β Z ip β p ] We can write this using vector notation, as follows: Let Z i = (1, Z i1,...z ip ) T and β = (β 0, β 1,...β p ) (Since β 0 is the intercept (i.e., the log hazard rate for the baseline group), we put a 1 as the first term in the vector Z i.) Then, we can write the hazard as: Ψ(Z i ) = exp[βz i ] Now we can substitute Ψ(Z i ) = exp[βz i ] in the log-likelihood shown in (1): log L = n δ i (βz i ) i=1 n X i exp(βz i ) i=1 12

13 Score Equations Taking the derivative with respect to β 0, the score equation is: log L n = [δ i X i exp(βz i )] β 0 i=1 For β k, k = 1,...p, the equations are: log L n = [δ i Z ik X i Z ik exp(βz i )] β k = i=1 n i=1 Z ik [δ i X i exp(βz i )] To find the MLE s, we set the above equations to 0 and solve (simultaneously). The equations above imply that the MLE s are obtained by setting the weighted number of failures ( i Z ikδ i ) equal to the weighted cumulative hazard ( i Z ikλ(x i )). 13

14 To find the variance of the MLE s, we need to take the second derivatives: 2 log L β k β j = n Z ik Z ij X i exp(βz i ) i=1 Some algebra (see Cox and Oakes section 6.2) reveals that where V ar( β) = I(β) 1 = [ Z(I Π)Z T ] 1 Z = (Z 1,..., Z n ) is a (p + 1) n matrix (p covariates plus the 1 for the intercept β 0 ) Π = diag(π 1,..., π n ) (this means that Π is a diagonal matrix, with the terms π 1,..., π n on the diagonal) 14

15 π i is the probability that the i-th person is censored, so (1 π i ) is the probability that they failed. Note: The information I(β) (inverse of the variance) is proportional to the number of failures, not the sample size. This will be important when we talk about study design. 15

16 The Single Sample Problem (Z i = 1 for everyone): First, what is the MLE of β 0? We set log L β 0 = n i=1 [δ i X i exp(β 0 Z i )] equal to 0 and solve: n δ i = i=1 exp( β 0 ) = n [X i exp(β 0 )] i=1 d = exp(β 0 ) d n i=1 X i n i=1 X i ˆλ = d t where d is the total number of deaths (or events), and t = X i is the total person-time contributed by all individuals. 16

17 If d/t is the MLE for λ, what does this imply about the MLE of β 0? Using the previous formula V ar( ˆβ) = [ Z(I Π)Z T ] 1, what is the variance of β 0?: With some matrix algebra, you can show that it is: V ar( β 0 ) = 1 n i=1 (1 π i) = 1 d 17

18 What about ˆλ = e ˆβ 0? By the delta method, V ar(ˆλ) = ˆλ 2 V ar( β 0 ) =? 18

19 The Two-Sample Problem: Z i Subjects Events Follow-up Group 0: Z i = 0 n 0 d 0 t 0 = n 0 i=1 X i Group 1: Z i = 1 n 1 d 1 t 1 = n 1 i=1 X i 19

20 The log-likelihood: log L = n δ i (β 0 + β 1 Z i ) i=1 n X i exp(β 0 + β 1 Z i ) i=1 so log L β 0 = n [δ i X i exp(β 0 + β 1 Z i )] i=1 = (d 0 + d 1 ) (t 0 e β 0 + t 1 e β 0+β 1 ) log L β 1 = n Z i [δ i X i exp(β 0 + β 1 Z i )] i=1 = d 1 t 1 e β 0+β 1 20

21 This implies: ˆλ1 = e ˆβ 0 + ˆβ 1 =? ˆλ 0 = e ˆβ 0 =? ˆβ 0 =? ˆβ 1 =? 21

22 Important Result: The maximum likelihood estimates (MLE s) of the hazard rates under the exponential model are the number of events divided by the person-years of follow-up! (this result will be relied on heavily when we discuss study design) 22

23 Exponential Regression: Means and Medians Mean Survival Time For the exponential distribution, E(T ) = 1/λ. Control Group: T 0 = 1/ˆλ 0 = 1/ exp( ˆβ 0 ) Treatment Group: T 1 = 1/ˆλ 1 = 1/ exp( ˆβ 0 + ˆβ 1 ) 23

24 Median Survival Time This is the value M at which S(t) = e λt = 0.5, so M = median = log(0.5) λ Control Group: ˆM 0 = log(0.5) ˆλ 0 = log(0.5) exp( ˆβ 0 ) Treatment Group: ˆM 1 = log(0.5) ˆλ 1 = log(0.5) exp( ˆβ 0 + ˆβ 1 ) 24

25 Exponential Regression: Variance Estimates and Test Statistics We can also calculate the variances of the MLE s as simple functions of the number of failures: var( ˆβ 0 ) = 1 d 0 var( ˆβ 1 ) = 1 d d 1 25

26 So our test statistics are formed as: For testing H o : β 0 = 0: For testing H o : β 1 = 0: χ 2 w = χ 2 w = ( ˆβ0 ) 2 var( ˆβ 0 ) = [log(d 0/t 0 )] 2 1/d 0 = ( ˆβ1 ) 2 var( ˆβ 1 ) [ log( d 1/t 1 d 0 /t 0 ) 1 d d 1 How would we form confidence intervals for the hazard ratio? ] 2 26

27 The Likelihood Ratio Test Statistic: (An alternative to the Wald test) A likelihood ratio test is based on 2 times the log of the ratio of the likelihoods under the null and alternative. We reject H 0 if 2 log(lr) > χ 2 1,0.05, where LR = L(H 1) L(H 0 ) = L( λ 0, λ 1 ) L( λ) 27

28 For a sample of n independent exponential random variables with parameter λ, the Likelihood is: n L = [λ δ i exp( λx i )] i=1 = λ d exp( λ x i ) = λ d exp( λn x) where d is the number of deaths or failures. The log-likelihood is l = d log(λ) λn x and the MLE is λ = d/(n x) 28

29 2-Sample Case: LR test calculations Data: Group 0: d 0 failures among the n 0 females mean failure time is x 0 = ( n 0 i X i )/n 0 Group 1: d 1 failures among the n 1 males mean failure time is x 1 = ( n 1 i X i )/n 1 Under the alternative hypothesis: L = λ d 1 1 exp( λ 1n 1 x 1 ) λ d 0 0 exp( λ 0n 0 x 0 ) log(l) = d 1 log(λ 1 ) λ 1 n 1 x 1 + d 0 log(λ 0 ) λ 0 n 0 x 0 29

30 The MLE s are: λ 1 = d 1 /(n 1 x 1 ) for males λ 0 = d 0 /(n 0 x 0 ) for females Under the null hypothesis: L = λ d 1+d 0 exp[ λ(n 1 x 1 + n 0 x 0 )] log(l) = (d 1 + d 0 ) log(λ) λ[n 1 x 1 + n 0 x 0 ] The corresponding MLE is λ = (d 1 + d 0 )/[n 1 x 1 + n 0 x 0 ] 30

31 A likelihood ratio test can be constructed by taking twice the difference of the log-likelihoods under the alternative and the null hypotheses: 2 [ ( ) d0 + d 1 (d 0 + d 1 ) log t 0 + t 1 ] d 1 log[d 1 /t 1 ] d 0 log[d 0 /t 0 ] 31

32 Nursing home example: For the females: n 0 = 1173 d 0 = 902 t 0 = x 0 = 265 For the males: n 1 = 418 d 1 = 367 t 1 = x 1 = 181 Plugging these values in, we get a LR test statistic of

33 Hand Calculations using events and follow-up: By adding up los for males to get t 1 and for females to get t 0, I obtained: d 0 = 902 (females) d 1 = 367 (males) t 0 = (female follow-up) t 1 = (male follow-up) This yields an estimated log HR: [ ] [ ] d1/t1 367/75457 ˆβ 1 = log = log d0/t0 902/ = log(1.6756) =

34 The estimated standard error is: var( ˆβ 1 1 ) = + 1 = d 1 d = So the Wald test becomes: χ 2 W = ˆβ 1 2 var( ˆβ 1 ) = ( ) = We can also calculate ˆβ 0 = log(d 0 /t 0 ) = 5.842, along with its standard error se( ˆβ 0 ) = (1/d0) =

35 Exponential Regression in STATA. use nurshome. stset los fail. streg gender, dist(exp) nohr failure _d: fail analysis time _t: los Iteration 0: log likelihood = Iteration 1: log likelihood = Iteration 2: log likelihood = Iteration 3: log likelihood = Iteration 4: log likelihood = Exponential regression -- log relative-hazard form No. of subjects = 1591 Number of obs = 1591 No. of failures = 1269 Time at risk = LR chi2(1) = Log likelihood = Prob > chi2 = _t Coef. Std. Err. z P> z [95% Conf. Interval] gender _cons Since Z = 8.337, the chi-square test is Z 2 =

36 The Weibull Regression Model At the beginning of the course, we saw that the survivorship function for a Weibull random variable is: and the hazard function is: S(t) = exp[ λ(t κ )] λ(t) = κ λ t (κ 1) The Weibull regression model assumes that for someone with covariates Z i, the survivorship function is S(t; Z i ) = exp[ Ψ(Z i )(t κ )] where Ψ(Z i ) is defined as in exponential regression to be: Ψ(Z i ) = exp[β 0 + Z i1 β 1 + Z i2 β Z ip β p ] For the 2-sample problem, we have: Ψ(Z i ) = exp[β 0 + Z i1 β 1 ] 36

37 Weibull MLEs for the 2-sample problem: Log-likelihood: log L = n i=1 δ i log [ κ exp(β 0 + β 1 Z i )X κ 1 ] n i Xi κ exp(β 0 + β 1 Z i ) i=1 exp( ˆβ 0 ) = d 0 /t 0 κ exp( ˆβ 0 + ˆβ 1 ) = d 1 /t 1 κ where t jκ = ˆλ 0 (t) = ˆκ exp( ˆβ 0 ) tˆκ 1 n j i=1 X ˆκ i among n j subjects ˆλ1 (t) = ˆκ exp( ˆβ 0 + ˆβ 1 ) tˆκ 1 ĤR = ˆλ 1 (t)/ˆλ 0 (t) = exp( ˆβ 1 ) ( ) d1 /t 1 κ = exp d 0 /t 0 κ 37

38 Weibull Regression: Means and Medians Mean Survival Time For the Weibull distribution, E(T ) = λ ( 1/κ) Γ[(1/κ) + 1]. Control Group: T 0 = ˆλ ( 1/ˆκ) 0 Γ[(1/ˆκ) + 1] Treatment Group: T 1 = ˆλ ( 1/ˆκ) 1 Γ[(1/ˆκ) + 1] 38

39 Median Survival Time For the Weibull distribution, M = median = [ ] 1/κ log(0.5) λ Control Group: ˆM 0 = [ log(0.5) ˆλ 0 ] 1/ˆκ Treatment Group: ˆM 1 = [ log(0.5) ˆλ 1 ] 1/ˆκ where ˆλ 0 = exp( ˆβ 0 ) and ˆλ 1 = exp( ˆβ 0 + ˆβ 1 ). 39

40 Note: the symbol Γ is the gamma function. If x is an integer, then Γ(x) = (x 1)! In cases where x is not an integer, this function has to be evaluated numerically. In homework and labs, I will supply this value to you. The Weibull regression model is very easy to fit: In stata: Just specify dist(weibull) instead of dist(exp) within the streg command In sas: use model option dist=weibull within the proc lifereg procedure Note: to get more information on these modeling procedures, use the online help facilities. For example, in Stata, you can type:.help streg 40

41 Weibull in Stata:. streg gender, dist(weibull) nohr failure _d: analysis time _t: fail los Fitting constant-only model: Iteration 0: log likelihood = Iteration 1: log likelihood = Iteration 2: log likelihood = Iteration 3: log likelihood = Iteration 4: log likelihood = Fitting full model: Iteration 0: log likelihood = Iteration 1: log likelihood = Iteration 2: log likelihood = Iteration 3: log likelihood = Iteration 4: log likelihood =

42 Weibull regression -- log relative-hazard form No. of subjects = 1591 Number of obs = 1591 No. of failures = 1269 Time at risk = LR chi2(1) = Log likelihood = Prob > chi2 = _t Coef. Std. Err. z P> z [95% Conf. Interval] gender _cons /ln_p p /p

43 Comparison of Exponential with Kaplan-Meier We can see how well the Exponential model fits by comparing the survival estimates for males and females under the exponential model, i.e., P (T t) = e ( ˆλ z t), to the Kaplan-Meier survival estimates: S u r v i v a l L e n g t h o f S t a y ( d a y s ) 43

44 Comparison of Weibull with Kaplan-Meier We can see how well the Weibull model fits by comparing the survival estimates, P (T t) = e ( ˆλ z tˆκ), to the Kaplan-Meier survival estimates. S u r v i v a l Which do you think fits best? L e n g t h o f S t a y ( d a y s ) 44

45 Other useful plots for evaluating fit to exponential and Weibull models log(ŝ(t)) vs t log[ log(ŝ(t))] vs log(t) Why are these useful? If T is exponential, then S(t) = exp( λt)) so log(s(t)) = λt and Λ(t) = λ t a straight line in t with slope λ and intercept=0 45

46 If T is Weibull, then S(t) = exp( (λt) κ ) so log(s(t)) = λt κ then Λ(t) = λt κ and log( log(s(t))) = log(λ) + κ log(t) a straight line in log(t) with slope κ and intercept log(λ). 46

47 So we can calculate our estimated Λ(t) and plot it versus t, and if it seems to form a straight line, then the exponential distribution is probably appropriate for our dataset. Plots for nursing home data: ˆΛ(t) vs t N e g a t i v e L o g S D F L O S 47

48 Or we can plot log ˆΛ(t) versus log(t), and if it seems to form a straight line, then the Weibull distribution is probably appropriate for our dataset. Plots for nursing home data: log[ log(ŝ(t))] vs log(t) L o g N e g a t i v e L o g S D F L o g o f L O S 48

49 Comparison of Methods for the Two-sample problem: Data: Z i Subjects Events Follow-up Group 0: Z i = 0 n 0 d 0 t 0 = n 0 i=1 X i Group 1: Z i = 1 n 1 d 1 t 1 = n 1 i=1 X i In General: λ z (t) = λ(t, Z = z) for z = 0 or 1. The hazard rate depends on the value of the covariate Z. In this case, we are assuming that we only have a single covariate, and it is binary (Z = 1 or Z = 0) 49

50 Reading from Collett: Section(s) Description 4.1.1, Exponential properties Weibull properties 4.3.1, Exponential ML estimation Weibull ML estimation 4.5 General Weibull regression 4.6 Model selection - Weibull regression 4.7 Weibull/AFT model connection Ch.6 AFT - Other parametric models 50

51 MODELS Exponential Regression: λ z (t) = exp(β 0 + β 1 Z) λ 0 = exp(β 0 ) λ 1 = exp(β 0 + β 1 ) HR = exp(β 1 ) 51

52 Weibull Regression: λ z (t) = κ exp(β 0 + β 1 Z) t κ 1 λ 0 = κ exp(β 0 ) t κ 1 λ 1 = κ exp(β 0 + β 1 ) t κ 1 HR = exp(β 1 ) Proportional Hazards Model: λ z (t) = λ 0 (t) exp(β 1 ) λ 0 = λ 0 (t) KM? λ 1 = λ 0 (t) exp(β 1 ) HR = exp(β 1 ) 52

53 Remarks Exponential model is a special case of the Weibull model with κ = 1 (note: Collett uses γ instead of κ) Exponential and Weibull models are both special cases of the Cox PH model. How can you show this? If either the exponential model or the Weibull model is valid, then these models will tend to be more efficient than PH (smaller s.e. s of estimates). This is because they assume a particular form for λ 0 (t), rather than estimating it at every death time. 53

54 For the Exponential model, the hazards are constant over time, given the value of the covariate Z i : Z i = 0 ˆλ 0 = exp( ˆβ 0 ) Z i = 1 ˆλ 0 = exp( ˆβ 0 + ˆβ 1 ) For the Weibull model, we have to estimate the hazard as a function of time, given the estimates of β 0, β 1 and κ: Z i = 0 ˆλ 0 (t) = ˆκ exp( ˆβ 0 ) tˆκ 1 Z i = 1 ˆλ 1 (t) = ˆκ exp( ˆβ 0 + ˆβ 1 ) tˆκ 1 However, the ratio of the hazards is still just exp( ˆβ 1 ), since the other terms cancel out. 54

55 Here s what the estimated hazards look like for the nursing home data: H a z a r d R a t e E x p o n e n t i a l H a z a r d : F e m a l e E x p o n e n t i a l H a z a r d : M a l e W e i b u l l H a z a r d : F e m a l e W e i b u l l H a z a r d : M a l e L e n g t h o f s t a y ( d a y s ) 55

56 Proportional Hazards Model: To get the MLE s for this model, we have to maximize the Cox partial likelihood iteratively. There are not closed form expressions like above. L(β) = = [ n i=1 [ n i=1 e βz i l R(X i ) eβz l ] δi e β 0+β 1 Z i l R(X i ) eβ 0+β 1 Z l ] δi 56

57 Comparison with Proportional Hazards Model. stcox gender, nohr failure _d: fail analysis time _t: los Iteration 0: log likelihood = Iteration 1: log likelihood = Iteration 2: log likelihood = Iteration 3: log likelihood = Refining estimates: Iteration 0: log likelihood = Cox regression -- Breslow method for ties No. of subjects = 1591 Number of obs = 1591 No. of failures = 1269 Time at risk = LR chi2(1) = Log likelihood = Prob > chi2 = _t _d Coef. Std. Err. z P> z [95% Conf. Interval] gender For the PH model, ˆβ 1 = and ĤR = e0.394 =

58 Comparison with the Logrank and Wilcoxon Tests. sts test gender failure _d: analysis time _t: fail los Log-rank test for equality of survivor functions Events gender observed expected Total chi2(1) = Pr>chi2 =

59 . sts test gender, wilcoxon failure _d: analysis time _t: fail los Wilcoxon (Breslow) test for equality of survivor functions Events Sum of gender observed expected ranks Total chi2(1) = Pr>chi2 =

60 Comparison of Hazard Ratios and Test Statistics for effect of Gender Wald Model/Method λ 0 λ 1 HR log(hr) se(log HR) Statistic Exponential Weibull t = t = t = Logrank Wilcoxon Cox PH Ties=Breslow Ties=Discrete Ties=Efron Ties=Exact Score (Discrete)

61 Comparison of Mean and Median Survival Times by Gender Mean Survival Median Survival Model/Method Female Male Female Male Exponential Weibull Kaplan-Meier Cox PH (Kalbfleisch/Prentice) 61

62 The Accelerated Failure Time Model The general form of an accelerated failure time (AFT) model is: where log(t i ) = β AF T Z i + σϵ log(t i ) is the log of a survival time β AF T is the vector of AFT model parameters corresponding to the covariate vector Z i ϵ is a random error term σ is a scale factor In other words, we can model the log-survival times as a linear function of the covariates! The streg command in stata (without the exponential or weibull option) uses this log-linear model for fitting parametric models. 62

63 By choosing different distributions for ϵ, we can obtain different parametric distributions: Exponential Weibull Gamma Log-logistic Normal Lognormal We can compare the predicted survival under any of these parametric distributions to the KM estimated survival to see which one seems to fit best. Once we decide on a certain class of model (say, Gamma), we can evaluate the contributions of covariates by finding the MLE s, and constructing Wald, Score, or LR tests of the covariate effects. 63

64 We can motivate the AFT model by first demonstrating the following two relationships: 1. For the Exponential Model: If the failure times T i = T (Z i ) follow an exponential distribution, i.e., S i (t) = e λ it with λ i = exp(βz i ), then log(t i ) = βz i + ϵ where ϵ follows an extreme value distribution (which just means that e ϵ follows a unit exponential distribution). 64

65 2. For the Weibull Model: If the failure times T i = T (Z i ) follow a Weibull distribution, i.e., S i (t) = e λ it κ with λ i = exp(βz i ), then log(t i ) = σβz i + σϵ where ϵ again follows an extreme value distribution, and σ = 1/κ. In other words, both the Exponential and Weibull model can be written in the form of a log-linear model for the survival times, if we choose the right distribution for ϵ. 65

66 The log-linear form for the exponential can be derived by: (1) Creating a new variable T 0 = T Z exp(βz i ) ( (2) Taking the log of T Z, yielding log(t Z ) = log Step (1): For an exponential model, recall that: T 0 exp(βz i ) ) S i (t) = P r(t Z t) = e λt, with λ = exp(βz i ) It follows that T 0 exp(1): S 0 (t) = P r(t 0 t) = P r(t Z exp(βz) t) = P r(t Z t exp( βz)) = exp [ λ t exp( βz)] = exp [ exp(βz) t exp( βz)] = exp( t) 66

67 Step (2): Now take the log of the survival time: ( ) T 0 log(t Z ) = log exp(βz i ) = log(t 0 ) log (exp(βz i )) = βz i + log(t 0 ) = βz i + ϵ where ϵ = log(t 0 ) follows the extreme value distribution. 67

68 Relationship between Exponential and Weibull If T Z has a Weibull distribution, i.e., S(t) = e λtκ with λ = exp(βz i ), then you can show that the new variable T Z = T κ Z follows an exponential distribution with parameter exp(βz i ). Based on the previous page, we can therefore write: log(t ) = βz + ϵ (where ϵ has an extreme value distribution.) 68

69 But since log(t ) = log(t κ ) = κ log(t ), we can write: log(t ) = log(t )/κ = (1/κ) ( βz i + ϵ) = σβz i + σϵ where σ = 1/κ. 69

70 This motivates the following general definition of the Accelerated Failure Time Model by: log(t i ) = β AF T Z i + σϵ where ϵ is a random error term, σ is a scale factor, Y is the log of a survival random variable, and β AF T = σβ e where β e came from the hazard λ = exp(βz). 70

71 The defining feature of an AFT model is: S(t; Z) = S i (t) = S 0 (ϕ t) That is, the effect of covariates is to accelerate (stretch) or decelerate (shrink) the time-scale. Effect of AFT on hazard: λ i (t) = ϕ λ 0 (ϕt) 71

72 One way to interpret the AFT model is via its effect on median survival times. If S i (t) = 0.5, then S 0 (ϕt) = 0.5. This means: Interpretation: M i = ϕm 0 For ϕ < 1, there is an acceleration of the endpoint (if M 0 = 2yrs in control and ϕ = 0.5, then M i = 1yr. For ϕ > 1, there is a stretching or delay in endpoint In general, the lifetime of individual i is ϕ times what they would have experienced in the reference group Since ϕ must be positive and a function of the covariates, we model ϕ = exp(βz i ). 72

73 When does Proportional hazards = AFT? According to the proportional hazards model: S(t) = S 0 (t) exp(βz i) and according to the accelerated failure time model: S(t) = S 0 (t exp(βz i )) Say T i W eibull(λ, κ). Then λ(t) = λκt (κ 1) 73

74 Under the AFT model: λ i (t) = ϕ λ 0 (ϕt) = e βz i λ 0 (e βz i t) = e βz i λ 0 κ ( e βz i t ) (κ 1) = ( e βz i ) κ λ0 κt (κ 1) = ( e βz i ) κ λ0 (t) But this looks just like the PH model: λ i (t) = exp(β Z i ) λ 0 (t) It turns out that the Weibull distribution (and exponential, since this is just a special case of a Weibull with κ = 1) is the only one for which the accelerated failure time and proportional hazards models coincide. 74

75 Special cases of AFT models Exponential regression: σ = 1, ϵ following the extreme value distribution. Weibull regression: σ arbitrary, ϵ following the extreme value distribution. Lognormal regression: σ arbitrary, ϵ following the normal distribution. 75

76 Examples in stata: Using the streg command, one has the following options of distributions for the log-survival times:. streg trt, dist(lognormal) exponential weibull gompertz lognormal loglogistic gamma 76

77 . streg gender, dist(exponential) nohr _t Coef. Std. Err. z P> z [95% Conf. Interval] gender streg gender, dist(weibull) nohr _t Coef. Std. Err. z P> z [95% Conf. Interval] gender /p streg gender, dist(lognormal) _t Coef. Std. Err. z P> z [95% Conf. Interval] gender _cons sigma

78 . streg gender, dist(gamma) _t Coef. Std. Err. z P> z [95% Conf. Interval] gender _cons sigma This gives a good idea of the sensitivity of the test of gender to the choice of model. It is also easy to get predicted survival curves under any of the parametric models using the following:. streg gender, dist(gamma). stcurv, survival The options hazard and cumhaz can also be substituted for survival above to obtain plots. 78

SUMAN DUVVURU STAT 567 PROJECT REPORT

SUMAN DUVVURU STAT 567 PROJECT REPORT SUMAN DUVVURU STAT 567 PROJECT REPORT SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.

More information

Introduction. Survival Analysis. Censoring. Plan of Talk

Introduction. Survival Analysis. Censoring. Plan of Talk Survival Analysis Mark Lunt Arthritis Research UK Centre for Excellence in Epidemiology University of Manchester 01/12/2015 Survival Analysis is concerned with the length of time before an event occurs.

More information

Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne

Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne Applied Statistics J. Blanchet and J. Wadsworth Institute of Mathematics, Analysis, and Applications EPF Lausanne An MSc Course for Applied Mathematicians, Fall 2012 Outline 1 Model Comparison 2 Model

More information

Comparison of Survival Curves

Comparison of Survival Curves Comparison of Survival Curves We spent the last class looking at some nonparametric approaches for estimating the survival function, Ŝ(t), over time for a single sample of individuals. Now we want to compare

More information

Lecture 2 ESTIMATING THE SURVIVAL FUNCTION. One-sample nonparametric methods

Lecture 2 ESTIMATING THE SURVIVAL FUNCTION. One-sample nonparametric methods Lecture 2 ESTIMATING THE SURVIVAL FUNCTION One-sample nonparametric methods There are commonly three methods for estimating a survivorship function S(t) = P (T > t) without resorting to parametric models:

More information

Gamma Distribution Fitting

Gamma Distribution Fitting Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics

More information

Competing-risks regression

Competing-risks regression Competing-risks regression Roberto G. Gutierrez Director of Statistics StataCorp LP Stata Conference Boston 2010 R. Gutierrez (StataCorp) Competing-risks regression July 15-16, 2010 1 / 26 Outline 1. Overview

More information

HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009

HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009 HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009 1. Introduction 2. A General Formulation 3. Truncated Normal Hurdle Model 4. Lognormal

More information

Lecture 15 Introduction to Survival Analysis

Lecture 15 Introduction to Survival Analysis Lecture 15 Introduction to Survival Analysis BIOST 515 February 26, 2004 BIOST 515, Lecture 15 Background In logistic regression, we were interested in studying how risk factors were associated with presence

More information

Multinomial and Ordinal Logistic Regression

Multinomial and Ordinal Logistic Regression Multinomial and Ordinal Logistic Regression ME104: Linear Regression Analysis Kenneth Benoit August 22, 2012 Regression with categorical dependent variables When the dependent variable is categorical,

More information

Introduction to Event History Analysis DUSTIN BROWN POPULATION RESEARCH CENTER

Introduction to Event History Analysis DUSTIN BROWN POPULATION RESEARCH CENTER Introduction to Event History Analysis DUSTIN BROWN POPULATION RESEARCH CENTER Objectives Introduce event history analysis Describe some common survival (hazard) distributions Introduce some useful Stata

More information

Tests for Two Survival Curves Using Cox s Proportional Hazards Model

Tests for Two Survival Curves Using Cox s Proportional Hazards Model Chapter 730 Tests for Two Survival Curves Using Cox s Proportional Hazards Model Introduction A clinical trial is often employed to test the equality of survival distributions of two treatment groups.

More information

Statistics in Retail Finance. Chapter 6: Behavioural models

Statistics in Retail Finance. Chapter 6: Behavioural models Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural

More information

Parametric Survival Models

Parametric Survival Models Parametric Survival Models Germán Rodríguez grodri@princeton.edu Spring, 2001; revised Spring 2005, Summer 2010 We consider briefly the analysis of survival data when one is willing to assume a parametric

More information

200609 - ATV - Lifetime Data Analysis

200609 - ATV - Lifetime Data Analysis Coordinating unit: Teaching unit: Academic year: Degree: ECTS credits: 2015 200 - FME - School of Mathematics and Statistics 715 - EIO - Department of Statistics and Operations Research 1004 - UB - (ENG)Universitat

More information

Duration Analysis. Econometric Analysis. Dr. Keshab Bhattarai. April 4, 2011. Hull Univ. Business School

Duration Analysis. Econometric Analysis. Dr. Keshab Bhattarai. April 4, 2011. Hull Univ. Business School Duration Analysis Econometric Analysis Dr. Keshab Bhattarai Hull Univ. Business School April 4, 2011 Dr. Bhattarai (Hull Univ. Business School) Duration April 4, 2011 1 / 27 What is Duration Analysis?

More information

Interpretation of Somers D under four simple models

Interpretation of Somers D under four simple models Interpretation of Somers D under four simple models Roger B. Newson 03 September, 04 Introduction Somers D is an ordinal measure of association introduced by Somers (96)[9]. It can be defined in terms

More information

Statistical Analysis of Life Insurance Policy Termination and Survivorship

Statistical Analysis of Life Insurance Policy Termination and Survivorship Statistical Analysis of Life Insurance Policy Termination and Survivorship Emiliano A. Valdez, PhD, FSA Michigan State University joint work with J. Vadiveloo and U. Dias Session ES82 (Statistics in Actuarial

More information

Survival analysis methods in Insurance Applications in car insurance contracts

Survival analysis methods in Insurance Applications in car insurance contracts Survival analysis methods in Insurance Applications in car insurance contracts Abder OULIDI 1-2 Jean-Marie MARION 1 Hérvé GANACHAUD 3 1 Institut de Mathématiques Appliquées (IMA) Angers France 2 Institut

More information

Modeling the Claim Duration of Income Protection Insurance Policyholders Using Parametric Mixture Models

Modeling the Claim Duration of Income Protection Insurance Policyholders Using Parametric Mixture Models Modeling the Claim Duration of Income Protection Insurance Policyholders Using Parametric Mixture Models Abstract This paper considers the modeling of claim durations for existing claimants under income

More information

Regression Modeling Strategies

Regression Modeling Strategies Frank E. Harrell, Jr. Regression Modeling Strategies With Applications to Linear Models, Logistic Regression, and Survival Analysis With 141 Figures Springer Contents Preface Typographical Conventions

More information

Distribution (Weibull) Fitting

Distribution (Weibull) Fitting Chapter 550 Distribution (Weibull) Fitting Introduction This procedure estimates the parameters of the exponential, extreme value, logistic, log-logistic, lognormal, normal, and Weibull probability distributions

More information

Standard errors of marginal effects in the heteroskedastic probit model

Standard errors of marginal effects in the heteroskedastic probit model Standard errors of marginal effects in the heteroskedastic probit model Thomas Cornelißen Discussion Paper No. 320 August 2005 ISSN: 0949 9962 Abstract In non-linear regression models, such as the heteroskedastic

More information

Survival Analysis of Left Truncated Income Protection Insurance Data. [March 29, 2012]

Survival Analysis of Left Truncated Income Protection Insurance Data. [March 29, 2012] Survival Analysis of Left Truncated Income Protection Insurance Data [March 29, 2012] 1 Qing Liu 2 David Pitt 3 Yan Wang 4 Xueyuan Wu Abstract One of the main characteristics of Income Protection Insurance

More information

Topic 3 - Survival Analysis

Topic 3 - Survival Analysis Topic 3 - Survival Analysis 1. Topics...2 2. Learning objectives...3 3. Grouped survival data - leukemia example...4 3.1 Cohort survival data schematic...5 3.2 Tabulation of events and time at risk...6

More information

MULTIPLE REGRESSION EXAMPLE

MULTIPLE REGRESSION EXAMPLE MULTIPLE REGRESSION EXAMPLE For a sample of n = 166 college students, the following variables were measured: Y = height X 1 = mother s height ( momheight ) X 2 = father s height ( dadheight ) X 3 = 1 if

More information

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Beckman HLM Reading Group: Questions, Answers and Examples Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Linear Algebra Slide 1 of

More information

7.1 The Hazard and Survival Functions

7.1 The Hazard and Survival Functions Chapter 7 Survival Models Our final chapter concerns models for the analysis of data which have three main characteristics: (1) the dependent variable or response is the waiting time until the occurrence

More information

SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

More information

III. INTRODUCTION TO LOGISTIC REGRESSION. a) Example: APACHE II Score and Mortality in Sepsis

III. INTRODUCTION TO LOGISTIC REGRESSION. a) Example: APACHE II Score and Mortality in Sepsis III. INTRODUCTION TO LOGISTIC REGRESSION 1. Simple Logistic Regression a) Example: APACHE II Score and Mortality in Sepsis The following figure shows 30 day mortality in a sample of septic patients as

More information

Statistical Models in R

Statistical Models in R Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Structure of models in R Model Assessment (Part IA) Anova

More information

Discussion Section 4 ECON 139/239 2010 Summer Term II

Discussion Section 4 ECON 139/239 2010 Summer Term II Discussion Section 4 ECON 139/239 2010 Summer Term II 1. Let s use the CollegeDistance.csv data again. (a) An education advocacy group argues that, on average, a person s educational attainment would increase

More information

From the help desk: hurdle models

From the help desk: hurdle models The Stata Journal (2003) 3, Number 2, pp. 178 184 From the help desk: hurdle models Allen McDowell Stata Corporation Abstract. This article demonstrates that, although there is no command in Stata for

More information

Statistics in Retail Finance. Chapter 2: Statistical models of default

Statistics in Retail Finance. Chapter 2: Statistical models of default Statistics in Retail Finance 1 Overview > We consider how to build statistical models of default, or delinquency, and how such models are traditionally used for credit application scoring and decision

More information

ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2

ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2 University of California, Berkeley Prof. Ken Chay Department of Economics Fall Semester, 005 ECON 14 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE # Question 1: a. Below are the scatter plots of hourly wages

More information

Ordinal Regression. Chapter

Ordinal Regression. Chapter Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe

More information

Nonlinear Regression Functions. SW Ch 8 1/54/

Nonlinear Regression Functions. SW Ch 8 1/54/ Nonlinear Regression Functions SW Ch 8 1/54/ The TestScore STR relation looks linear (maybe) SW Ch 8 2/54/ But the TestScore Income relation looks nonlinear... SW Ch 8 3/54/ Nonlinear Regression General

More information

11. Analysis of Case-control Studies Logistic Regression

11. Analysis of Case-control Studies Logistic Regression Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

More information

SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg

SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg IN SPSS SESSION 2, WE HAVE LEARNT: Elementary Data Analysis Group Comparison & One-way

More information

BIOM611 Biological Data Analysis

BIOM611 Biological Data Analysis BIOM611 Biological Data Analysis Spring, 2015 Tentative Syllabus Introduction BIOMED611 is a ½ unit course required for all 1 st year BGS students (except GCB students). It will provide an introduction

More information

5 Modeling Survival Data with Parametric Regression

5 Modeling Survival Data with Parametric Regression 5 Modeling Survival Data with Parametric Regression Models 5. The Accelerated Failure Time Model Before talking about parametric regression models for survival data, let us introduce the accelerated failure

More information

Lab 5 Linear Regression with Within-subject Correlation. Goals: Data: Use the pig data which is in wide format:

Lab 5 Linear Regression with Within-subject Correlation. Goals: Data: Use the pig data which is in wide format: Lab 5 Linear Regression with Within-subject Correlation Goals: Data: Fit linear regression models that account for within-subject correlation using Stata. Compare weighted least square, GEE, and random

More information

Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln. Log-Rank Test for More Than Two Groups

Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln. Log-Rank Test for More Than Two Groups Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln Log-Rank Test for More Than Two Groups Prepared by Harlan Sayles (SRAM) Revised by Julia Soulakova (Statistics)

More information

From the help desk: Swamy s random-coefficients model

From the help desk: Swamy s random-coefficients model The Stata Journal (2003) 3, Number 3, pp. 302 308 From the help desk: Swamy s random-coefficients model Brian P. Poi Stata Corporation Abstract. This article discusses the Swamy (1970) random-coefficients

More information

BayesX - Software for Bayesian Inference in Structured Additive Regression

BayesX - Software for Bayesian Inference in Structured Additive Regression BayesX - Software for Bayesian Inference in Structured Additive Regression Thomas Kneib Faculty of Mathematics and Economics, University of Ulm Department of Statistics, Ludwig-Maximilians-University Munich

More information

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September

More information

Regression III: Advanced Methods

Regression III: Advanced Methods Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models

More information

How to set the main menu of STATA to default factory settings standards

How to set the main menu of STATA to default factory settings standards University of Pretoria Data analysis for evaluation studies Examples in STATA version 11 List of data sets b1.dta (To be created by students in class) fp1.xls (To be provided to students) fp1.txt (To be

More information

Sample Size Calculation for Longitudinal Studies

Sample Size Calculation for Longitudinal Studies Sample Size Calculation for Longitudinal Studies Phil Schumm Department of Health Studies University of Chicago August 23, 2004 (Supported by National Institute on Aging grant P01 AG18911-01A1) Introduction

More information

Survival Analysis: Introduction

Survival Analysis: Introduction Survival Analysis: Introduction Survival Analysis typically focuses on time to event data. In the most general sense, it consists of techniques for positivevalued random variables, such as time to death

More information

Checking proportionality for Cox s regression model

Checking proportionality for Cox s regression model Checking proportionality for Cox s regression model by Hui Hong Zhang Thesis for the degree of Master of Science (Master i Modellering og dataanalyse) Department of Mathematics Faculty of Mathematics and

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

More information

A Basic Introduction to Missing Data

A Basic Introduction to Missing Data John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item

More information

Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means

Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means Lesson : Comparison of Population Means Part c: Comparison of Two- Means Welcome to lesson c. This third lesson of lesson will discuss hypothesis testing for two independent means. Steps in Hypothesis

More information

CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES

CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES Examples: Monte Carlo Simulation Studies CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES Monte Carlo simulation studies are often used for methodological investigations of the performance of statistical

More information

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel

More information

Lecture 14: GLM Estimation and Logistic Regression

Lecture 14: GLM Estimation and Logistic Regression Lecture 14: GLM Estimation and Logistic Regression Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South

More information

Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 )

Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) and Neural Networks( 類 神 經 網 路 ) 許 湘 伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 35 13 Examples

More information

Applied Reliability Page 1 APPLIED RELIABILITY. Techniques for Reliability Analysis

Applied Reliability Page 1 APPLIED RELIABILITY. Techniques for Reliability Analysis Applied Reliability Page 1 APPLIED RELIABILITY Techniques for Reliability Analysis with Applied Reliability Tools (ART) (an EXCEL Add-In) and JMP Software AM216 Class 4 Notes Santa Clara University Copyright

More information

Descriptive Statistics

Descriptive Statistics Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

More information

Least Squares Estimation

Least Squares Estimation Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

More information

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Module No. #01 Lecture No. #15 Special Distributions-VI Today, I am going to introduce

More information

Basic Statistical and Modeling Procedures Using SAS

Basic Statistical and Modeling Procedures Using SAS Basic Statistical and Modeling Procedures Using SAS One-Sample Tests The statistical procedures illustrated in this handout use two datasets. The first, Pulse, has information collected in a classroom

More information

Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY

Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY ABSTRACT: This project attempted to determine the relationship

More information

Logit Models for Binary Data

Logit Models for Binary Data Chapter 3 Logit Models for Binary Data We now turn our attention to regression models for dichotomous data, including logistic regression and probit analysis. These models are appropriate when the response

More information

From the help desk: Demand system estimation

From the help desk: Demand system estimation The Stata Journal (2002) 2, Number 4, pp. 403 410 From the help desk: Demand system estimation Brian P. Poi Stata Corporation Abstract. This article provides an example illustrating how to use Stata to

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the

More information

Poisson Models for Count Data

Poisson Models for Count Data Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the

More information

A course on Longitudinal data analysis : what did we learn? COMPASS 11/03/2011

A course on Longitudinal data analysis : what did we learn? COMPASS 11/03/2011 A course on Longitudinal data analysis : what did we learn? COMPASS 11/03/2011 1 Abstract We report back on methods used for the analysis of longitudinal data covered by the course We draw lessons for

More information

Lecture 8: Gamma regression

Lecture 8: Gamma regression Lecture 8: Gamma regression Claudia Czado TU München c (Claudia Czado, TU Munich) ZFS/IMS Göttingen 2004 0 Overview Models with constant coefficient of variation Gamma regression: estimation and testing

More information

Normality Testing in Excel

Normality Testing in Excel Normality Testing in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com

More information

LOGISTIC REGRESSION ANALYSIS

LOGISTIC REGRESSION ANALYSIS LOGISTIC REGRESSION ANALYSIS C. Mitchell Dayton Department of Measurement, Statistics & Evaluation Room 1230D Benjamin Building University of Maryland September 1992 1. Introduction and Model Logistic

More information

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

More information

An Application of Weibull Analysis to Determine Failure Rates in Automotive Components

An Application of Weibull Analysis to Determine Failure Rates in Automotive Components An Application of Weibull Analysis to Determine Failure Rates in Automotive Components Jingshu Wu, PhD, PE, Stephen McHenry, Jeffrey Quandt National Highway Traffic Safety Administration (NHTSA) U.S. Department

More information

Chapter 6: Multivariate Cointegration Analysis

Chapter 6: Multivariate Cointegration Analysis Chapter 6: Multivariate Cointegration Analysis 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie VI. Multivariate Cointegration

More information

II. DISTRIBUTIONS distribution normal distribution. standard scores

II. DISTRIBUTIONS distribution normal distribution. standard scores Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

More information

Module 14: Missing Data Stata Practical

Module 14: Missing Data Stata Practical Module 14: Missing Data Stata Practical Jonathan Bartlett & James Carpenter London School of Hygiene & Tropical Medicine www.missingdata.org.uk Supported by ESRC grant RES 189-25-0103 and MRC grant G0900724

More information

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

More information

Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software

Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software STATA Tutorial Professor Erdinç Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software 1.Wald Test Wald Test is used

More information

Survival Analysis of Dental Implants. Abstracts

Survival Analysis of Dental Implants. Abstracts Survival Analysis of Dental Implants Andrew Kai-Ming Kwan 1,4, Dr. Fu Lee Wang 2, and Dr. Tak-Kun Chow 3 1 Census and Statistics Department, Hong Kong, China 2 Caritas Institute of Higher Education, Hong

More information

Survival Distributions, Hazard Functions, Cumulative Hazards

Survival Distributions, Hazard Functions, Cumulative Hazards Week 1 Survival Distributions, Hazard Functions, Cumulative Hazards 1.1 Definitions: The goals of this unit are to introduce notation, discuss ways of probabilistically describing the distribution of a

More information

ESTIMATING AVERAGE TREATMENT EFFECTS: IV AND CONTROL FUNCTIONS, II Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics

ESTIMATING AVERAGE TREATMENT EFFECTS: IV AND CONTROL FUNCTIONS, II Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics ESTIMATING AVERAGE TREATMENT EFFECTS: IV AND CONTROL FUNCTIONS, II Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009 1. Quantile Treatment Effects 2. Control Functions

More information

Multivariate Logistic Regression

Multivariate Logistic Regression 1 Multivariate Logistic Regression As in univariate logistic regression, let π(x) represent the probability of an event that depends on p covariates or independent variables. Then, using an inv.logit formulation

More information

Nominal and ordinal logistic regression

Nominal and ordinal logistic regression Nominal and ordinal logistic regression April 26 Nominal and ordinal logistic regression Our goal for today is to briefly go over ways to extend the logistic regression model to the case where the outcome

More information

MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could

More information

Handling missing data in Stata a whirlwind tour

Handling missing data in Stata a whirlwind tour Handling missing data in Stata a whirlwind tour 2012 Italian Stata Users Group Meeting Jonathan Bartlett www.missingdata.org.uk 20th September 2012 1/55 Outline The problem of missing data and a principled

More information

The Cox Proportional Hazards Model

The Cox Proportional Hazards Model The Cox Proportional Hazards Model Mario Chen, PhD Advanced Biostatistics and RCT Workshop Office of AIDS Research, NIH ICSSC, FHI Goa, India, September 2009 1 The Model h i (t)=h 0 (t)exp(z i ), Z i =

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

More information

Portfolio Using Queuing Theory

Portfolio Using Queuing Theory Modeling the Number of Insured Households in an Insurance Portfolio Using Queuing Theory Jean-Philippe Boucher and Guillaume Couture-Piché December 8, 2015 Quantact / Département de mathématiques, UQAM.

More information

**BEGINNING OF EXAMINATION** The annual number of claims for an insured has probability function: , 0 < q < 1.

**BEGINNING OF EXAMINATION** The annual number of claims for an insured has probability function: , 0 < q < 1. **BEGINNING OF EXAMINATION** 1. You are given: (i) The annual number of claims for an insured has probability function: 3 p x q q x x ( ) = ( 1 ) 3 x, x = 0,1,, 3 (ii) The prior density is π ( q) = q,

More information

Simple Linear Regression Inference

Simple Linear Regression Inference Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

More information

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression Opening Example CHAPTER 13 SIMPLE LINEAR REGREION SIMPLE LINEAR REGREION! Simple Regression! Linear Regression Simple Regression Definition A regression model is a mathematical equation that descries the

More information

Introduction to mixed model and missing data issues in longitudinal studies

Introduction to mixed model and missing data issues in longitudinal studies Introduction to mixed model and missing data issues in longitudinal studies Hélène Jacqmin-Gadda INSERM, U897, Bordeaux, France Inserm workshop, St Raphael Outline of the talk I Introduction Mixed models

More information

The Landmark Approach: An Introduction and Application to Dynamic Prediction in Competing Risks

The Landmark Approach: An Introduction and Application to Dynamic Prediction in Competing Risks Landmarking and landmarking Landmarking and competing risks Discussion The Landmark Approach: An Introduction and Application to Dynamic Prediction in Competing Risks Department of Medical Statistics and

More information

Chapter 7: Simple linear regression Learning Objectives

Chapter 7: Simple linear regression Learning Objectives Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -

More information

A Log-Robust Optimization Approach to Portfolio Management

A Log-Robust Optimization Approach to Portfolio Management A Log-Robust Optimization Approach to Portfolio Management Dr. Aurélie Thiele Lehigh University Joint work with Ban Kawas Research partially supported by the National Science Foundation Grant CMMI-0757983

More information

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

More information

" Y. Notation and Equations for Regression Lecture 11/4. Notation:

 Y. Notation and Equations for Regression Lecture 11/4. Notation: Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

More information