Comparison of Survival Curves
- Alexander Norris
- 10 years ago
We spent the last class looking at some nonparametric approaches for estimating the survival function, $\hat{S}(t)$, over time for a single sample of individuals. Now we want to compare the survival estimates between two groups.

Example: Time to remission of leukemia patients

How can we form a basis for comparison?

At a specific point in time, we could see whether the confidence intervals for the survival curves overlap. However, the confidence intervals we have been calculating are pointwise: they correspond to a confidence interval for $\hat{S}(t)$ at a single point in time, $t$. In other words, we can't say that the true survival function $S(t)$ is contained between the pointwise confidence intervals with 95% probability.

(Aside: if you're interested, the issue of confidence bands for the estimated survival function is discussed in Section 4.4 of Klein and Moeschberger.)
Looking at whether the confidence intervals for $\hat{S}(t)$ overlap between the 6MP and placebo groups would only focus on comparing the two treatment groups at a single point in time, $t$. We want an overall comparison.

Should we base our overall comparison of $\hat{S}(t)$ on:
- the furthest distance between the two curves?
- the median survival for each group?
- the average hazard? (for exponential distributions, this would be like comparing the mean event times)
- adding up the difference between the two survival estimates over time? $\sum_j [\hat{S}(t_{jA}) - \hat{S}(t_{jB})]$
- a weighted sum of differences, where the weights reflect the number at risk at each time?
- a rank-based test? i.e., we could rank all of the event times, and then see whether the sum of ranks for one group was less than the other.

Nonparametric comparisons of groups

All of these are pretty reasonable options, and we'll see that there have been several proposals for how to compare the survival of two groups. For the moment, we are sticking to nonparametric comparisons.

Why nonparametric?
- fairly robust
- efficient relative to parametric tests
- often simple and intuitive

Before continuing the description of the two-sample comparison, I'm going to try to put this in a general framework to give a perspective of where we're heading in this class.
General Framework for Survival Analysis

We observe $(X_i, \delta_i, Z_i)$ for individual $i$, where
- $X_i$ is a censored failure time random variable
- $\delta_i$ is the failure/censoring indicator
- $Z_i$ represents a set of covariates

Note that $Z_i$ might be a scalar (a single covariate, say treatment or gender) or may be a $(p \times 1)$ vector (representing several different covariates).

These covariates might be:
- continuous
- discrete
- time-varying (more later)

If $Z_i$ is a scalar and is binary, then we are comparing the survival of two groups, like in the leukemia example. More generally though, it is useful to build a model that characterizes the relationship between survival and all of the covariates of interest.

We'll proceed as follows:
- Two group comparisons
- Multigroup and stratified comparisons (stratified logrank)
- Failure time regression models
  - Cox proportional hazards model
  - Accelerated failure time model
Two sample tests
- Mantel-Haenszel logrank test
- Peto & Peto's version of the logrank test
- Gehan's Generalized Wilcoxon
- Peto & Peto's and Prentice's generalized Wilcoxon
- Tarone-Ware and Fleming-Harrington classes
- Cox's F-test (non-parametric version)

References:
- Hosmer & Lemeshow, Section 2.4
- Collett, Section 2.5
- Klein & Moeschberger, Section 7.3
- Kleinbaum, Chapter 2
- Lee, Chapter 5

Mantel-Haenszel Logrank test

The logrank test is the most well known and widely used. It also has an intuitive appeal, building on standard methods for binary data. (Later we will see that it can also be obtained as the score test from a partial likelihood from the Cox Proportional Hazards model.)

First consider the following $(2 \times 2)$ table classifying those with and without the event of interest in a two group setting:

                 Event
  Group    Yes      No             Total
  0        d_0      n_0 - d_0      n_0
  1        d_1      n_1 - d_1      n_1
  Total    d        n - d          n
If the margins of this table are considered fixed, then $d_0$ follows a ? distribution. Under the null hypothesis of no association between the event and group, it follows that

$E(d_0) = \dfrac{n_0 d}{n}$

$\mathrm{Var}(d_0) = \dfrac{n_0 n_1 d (n - d)}{n^2 (n - 1)}$

Therefore, under $H_0$:

$\chi^2_{MH} = \dfrac{[d_0 - n_0 d / n]^2}{\dfrac{n_0 n_1 d (n - d)}{n^2 (n - 1)}} \sim \chi^2_1$

This is the Mantel-Haenszel statistic and is approximately equivalent to the Pearson $\chi^2$ test for equality of the two groups given by:

$\chi^2_p = \sum \dfrac{(o - e)^2}{e}$

Note: recall that the Pearson $\chi^2$ test was derived for the case where only the row margins were fixed, and thus the variance above was replaced by:

$\mathrm{Var}(d_0) = \dfrac{n_0 n_1 d (n - d)}{n^3}$

Example: Toxicity in a clinical trial with two treatments

[Table: Toxicity (Yes / No / Total) by Group; cell counts not recovered in this transcription]

$\chi^2_p = 4.00$ (p = 0.046)
$\chi^2_{MH} = 3.96$ (p = 0.047)
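The relationship between the two statistics can be checked numerically. The sketch below is only an illustration: the function name `mantel_haenszel_2x2` and the counts passed to it are hypothetical (they are not the toxicity table from the lecture), and Python is used here simply because it is easy to run.

```python
def mantel_haenszel_2x2(d0, n0, d1, n1):
    """Return (chi2_MH, chi2_Pearson) for a single 2x2 table.

    d0, d1: number with the event in groups 0 and 1
    n0, n1: group sizes
    """
    n = n0 + n1
    d = d0 + d1
    expected = n0 * d / n
    # Hypergeometric variance (both margins fixed)
    var_mh = n0 * n1 * d * (n - d) / (n**2 * (n - 1))
    # Variance implied by the Pearson test (only row margins fixed)
    var_pearson = n0 * n1 * d * (n - d) / n**3
    chi2_mh = (d0 - expected) ** 2 / var_mh
    chi2_p = (d0 - expected) ** 2 / var_pearson
    return chi2_mh, chi2_p


# Hypothetical counts, chosen only for illustration
chi2_mh, chi2_p = mantel_haenszel_2x2(d0=8, n0=50, d1=16, n1=50)
print(round(chi2_mh, 3), round(chi2_p, 3))
```

The two statistics differ only by the factor n/(n-1), which is why they are nearly identical unless n is small.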
Now suppose we have $K$ $(2 \times 2)$ tables, all independent, and we want to test for a common group effect. The Cochran-Mantel-Haenszel test for a common odds ratio not equal to 1 can be written as:

$\chi^2_{CMH} = \dfrac{\left[\sum_{j=1}^{K} (d_{0j} - n_{0j} d_j / n_j)\right]^2}{\sum_{j=1}^{K} n_{1j} n_{0j} d_j (n_j - d_j) / [n_j^2 (n_j - 1)]}$

where the subscript $j$ refers to the $j$-th table:

                 Event
  Group    Yes       No                 Total
  0        d_0j      n_0j - d_0j        n_0j
  1        d_1j      n_1j - d_1j        n_1j
  Total    d_j       n_j - d_j          n_j

This statistic is distributed approximately as $\chi^2_1$.

How does this apply in survival analysis?

Suppose we observe
  Group 1: $(X_{11}, \delta_{11}), \ldots, (X_{1 n_1}, \delta_{1 n_1})$
  Group 0: $(X_{01}, \delta_{01}), \ldots, (X_{0 n_0}, \delta_{0 n_0})$

We could just count the numbers of failures: e.g., $d_1 = \sum_{j=1}^{n_1} \delta_{1j}$

Example: Leukemia data, just counting up the number of remissions in each treatment group.

[Table: Fail (Yes / No / Total) by Group; cell counts not recovered in this transcription]

$\chi^2_p = 16.8$ (p = 0.001)
$\chi^2_{MH} = 16.4$ (p = 0.001)

But this doesn't account for the time at risk. Conceptually, we would like to compare the KM survival curves. Let's put the components side-by-side and compare.
Cox & Oakes Table 1.1: Leukemia example

[Table: ordered death times with $d_j$, $c_j$, $r_j$ for Group 0 and for Group 1; entries not recovered in this transcription]

Note that I wrote down the number at risk for Group 1 for times 1-5 even though there were no events or censorings at those times.

Logrank Test: Formal Definition

The logrank test is obtained by constructing a $(2 \times 2)$ table at each distinct death time, and comparing the death rates between the two groups, conditional on the number at risk in the groups. The tables are then combined using the Cochran-Mantel-Haenszel test.

Note: The logrank is sometimes called the Cox-Mantel test.

Let $t_1, \ldots, t_K$ represent the $K$ ordered, distinct death times. At the $j$-th death time, we have the following table:

                 Die/Fail
  Group    Yes       No                 Total
  0        d_0j      r_0j - d_0j        r_0j
  1        d_1j      r_1j - d_1j        r_1j
  Total    d_j       r_j - d_j          r_j

where $d_{0j}$ and $d_{1j}$ are the number of deaths in group 0 and 1, respectively, at the $j$-th death time, and $r_{0j}$ and $r_{1j}$ are the number at risk at that time, in groups 0 and 1, respectively.
The logrank test is:

$\chi^2_{logrank} = \dfrac{\left[\sum_{j=1}^{K} (d_{0j} - r_{0j} d_j / r_j)\right]^2}{\sum_{j=1}^{K} \dfrac{r_{1j} r_{0j} d_j (r_j - d_j)}{r_j^2 (r_j - 1)}}$

Assuming the tables are all independent, then this statistic will have an approximate $\chi^2$ distribution with 1 df.

Based on the motivation for the logrank test, which of the survival-related quantities are we comparing at each time point?
- $\sum_{j=1}^{K} w_j [\hat{S}_1(t_j) - \hat{S}_2(t_j)]$ ?
- $\sum_{j=1}^{K} w_j [\hat{\lambda}_1(t_j) - \hat{\lambda}_2(t_j)]$ ?
- $\sum_{j=1}^{K} w_j [\hat{\Lambda}_1(t_j) - \hat{\Lambda}_2(t_j)]$ ?

First several tables of leukemia data: CMH analysis of leukemia data

[SAS output: TABLE 1, 2, 3 and 4 OF TRTMT BY REMISS, CONTROLLING FOR FAILTIME=1, 2, 3 and 4, each showing Frequency and Expected for REMISS = 0, 1 and Total; cell values not recovered in this transcription]
CMH statistic = logrank statistic

SUMMARY STATISTICS FOR TRTMT BY REMISS CONTROLLING FOR FAILTIME

Cochran-Mantel-Haenszel Statistics (Based on Table Scores)

  Statistic   Alternative Hypothesis     DF   Value   Prob
              Nonzero Correlation
              Row Mean Scores Differ
              General Association                            <=== LOGRANK TEST

[numeric entries not recovered in this transcription]

Note: Although CMH works to get the correct logrank test, it would require inputting the $d_j$ and $r_j$ at each time of death for each treatment group. There's an easier way to get the test statistic, which I'll show you shortly.

Calculating logrank statistic by hand

Leukemia Example:

[Table: ordered death times with $d_{0j}$, $r_{0j}$ (Group 0), $d_j$, $r_j$ (Combined), $e_j$, $o_j - e_j$, $v_j$, and column sums; entries not recovered in this transcription]

$o_j = d_{0j}$
$e_j = d_j r_{0j} / r_j$
$v_j = r_{1j} r_{0j} d_j (r_j - d_j) / [r_j^2 (r_j - 1)]$

$\chi^2_{logrank} = \dfrac{(10.251)^2}{\sum_j v_j}$
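The hand calculation can be sketched in a few lines of Python. Everything named here is an assumption made for illustration: the function `logrank_cmh` is hypothetical, and the data are the 6-MP versus placebo remission times as they are commonly reproduced from Cox & Oakes Table 1.1 (typed in from memory of that standard dataset, so they should be checked against the table). Placebo is coded as group 0.

```python
def logrank_cmh(times, events, groups):
    """Two-sample logrank from a 2x2 table at each distinct death time.

    events: 1 = failure, 0 = censored; groups: 0 or 1.
    Returns (sum of o_j - e_j for group 0, sum of v_j, chi-square).
    """
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    o_minus_e = 0.0
    var = 0.0
    for tj in death_times:
        # Subjects censored at tj are still counted as at risk at tj
        r0 = sum(1 for t, g in zip(times, groups) if t >= tj and g == 0)
        r1 = sum(1 for t, g in zip(times, groups) if t >= tj and g == 1)
        r = r0 + r1
        d = sum(1 for t, e in zip(times, events) if t == tj and e == 1)
        d0 = sum(1 for t, e, g in zip(times, events, groups)
                 if t == tj and e == 1 and g == 0)
        o_minus_e += d0 - d * r0 / r
        if r > 1:
            var += r1 * r0 * d * (r - d) / (r**2 * (r - 1))
    return o_minus_e, var, o_minus_e**2 / var


# Assumed data: placebo (group 0), all observed failures
placebo_t = [1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23]
# Assumed data: 6-MP (group 1), 0 marks a censored time
sixmp_t = [6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16, 17, 19, 20, 22, 23, 25, 32, 32, 34, 35]
sixmp_e = [1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0]

times = placebo_t + sixmp_t
events = [1] * len(placebo_t) + sixmp_e
groups = [0] * len(placebo_t) + [1] * len(sixmp_t)

o_minus_e, var, chi2 = logrank_cmh(times, events, groups)
print(round(o_minus_e, 3), round(var, 3), round(chi2, 2))
```

With these data the sum of (o - e) for group 0 should agree with the 10.251 shown in the hand calculation, and the chi-square with the value of about 16.8 reported by the packages.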
Notes about logrank test:

- The logrank statistic depends on ranks of event times only.

- If there are no tied deaths, then the logrank has the form:

  $\dfrac{\left[\sum_{j=1}^{K} (d_{0j} - r_{0j} / r_j)\right]^2}{\sum_{j=1}^{K} r_{1j} r_{0j} / r_j^2}$

- The numerator can be interpreted as $\sum (o - e)$ where $o$ is the observed number of deaths in group 0, and $e$ is the expected number, given the risk set. The expected number equals #deaths $\times$ proportion in group 0 at risk.

- The $(o - e)$ terms in the numerator can be written as

  $\dfrac{r_{0j} r_{1j}}{r_j} (\hat{\lambda}_{0j} - \hat{\lambda}_{1j})$, where $\hat{\lambda}_{gj} = d_{gj} / r_{gj}$ is the estimated hazard in group $g$ at the $j$-th death time.

- It does not matter which group you choose to sum over. To see this, note that if we summed up $(o - e)$ over the death times for the 6MP group we would get the same sum with the opposite sign, and the sum of the variances is the same. So when we square the numerator, the test statistic is the same.

- Analogous to the CMH test for a series of tables at different levels of a confounder, the logrank test is most powerful when odds ratios are constant over time intervals. That is, it is most powerful for proportional hazards.

Checking the assumption of proportional hazards:
- check to see if the estimated survival curves cross; if they do, then this is evidence that the hazards are not proportional
- more formal test: any ideas?

What should be done if the hazards are not proportional?
- If the difference between hazards has a consistent sign, the logrank test usually does well.
- Other tests are available that are more powerful against different alternatives.
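The link between the $(o - e)$ term and the difference in estimated hazards follows from a short calculation. In this sketch, $\hat{\lambda}_{gj} = d_{gj}/r_{gj}$ is assumed to denote the estimated hazard in group $g$ at the $j$-th death time, and it uses only $r_j = r_{0j} + r_{1j}$ and $d_j = d_{0j} + d_{1j}$:

```latex
\begin{align*}
o_j - e_j
  &= d_{0j} - \frac{r_{0j}\, d_j}{r_j}
   = \frac{d_{0j}(r_{0j} + r_{1j}) - r_{0j}(d_{0j} + d_{1j})}{r_j} \\
  &= \frac{d_{0j}\, r_{1j} - d_{1j}\, r_{0j}}{r_j}
   = \frac{r_{0j}\, r_{1j}}{r_j}
     \left( \frac{d_{0j}}{r_{0j}} - \frac{d_{1j}}{r_{1j}} \right)
   = \frac{r_{0j}\, r_{1j}}{r_j}
     \left( \hat{\lambda}_{0j} - \hat{\lambda}_{1j} \right).
\end{align*}
```

Swapping the roles of the two groups flips the sign of every term, which is why the squared numerator does not depend on the group chosen.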
Getting the logrank statistic using Stata:

After declaring data as survival type data using the stset command, issue the sts test command.

  . stset remiss status

    data set name:  leukem
               id:  --  (meaning each record a unique subject)
       entry time:  --  (meaning all entered at time 0)
        exit time:  remiss
   failure/censor:  status

  . sts list, by(trt)

            Beg.          Net    Survivor   Std.
    Time    Total   Fail  Lost   Function   Error   [95% Conf. Int.]
    trt= ... (etc)
    [values not recovered in this transcription]

  . sts test trt

    Log-rank test for equality of survivor functions

            Events
    trt     observed   expected
    Total
    chi2(1) =
    Pr>chi2 =
    [values not recovered in this transcription]

Getting the logrank statistic using SAS
- Still use PROC LIFETEST
- Add STRATA command, with treatment variable
- Gives the chi-square test (2-sided), but also gives you the terms you need to calculate the 1-sided test; this is useful if we want to know which of the two groups has the higher estimated hazard over time.
- The STRATA command also gives the Gehan-Wilcoxon test (which we will talk about next)

  title 'Cox and Oakes example';
  data leukemia;
    input weeks remiss trtmt;
    cards;
  /* data for 6MP group */
  etc
  /* data for placebo group */
  etc
  ;
  proc lifetest data=leukemia;
    time weeks*remiss(0);
    strata trtmt;
    title 'Logrank test for leukemia data';
  run;
Output from leukemia example:

  Logrank test for leukemia data

  Summary of the Number of Censored and Uncensored Values
    TRTMT     Total   Failed   Censored   %Censored
    6-MP
    Control
    Total

  Testing Homogeneity of Survival Curves over Strata
  Time Variable FAILTIME

  Rank Statistics
    TRTMT     Log-Rank   Wilcoxon
    6-MP
    Control

  Covariance Matrix for the Log-Rank Statistics
    TRTMT     6-MP   Control
    6-MP
    Control

  Test of Equality over Strata
                                    Pr >
    Test        Chi-Square   DF   Chi-Square
    Log-Rank                                   <== Here's the one we want!!
    Wilcoxon
    -2Log(LR)

  [numeric entries not recovered in this transcription]

Getting the logrank statistic using Splus:

Instead of the surv.fit command, use the surv.diff command with a group (treatment) variable.

Mantel-Haenszel logrank:

  > logrank <- surv.diff(weeks, remiss, trtmt)
  > logrank
        N   Observed   Expected   (O-E)^2/E
    [rows not recovered in this transcription]
  Chisq= 16.8 on 1 degrees of freedom, p= 4.169e-05
Generalization of logrank test = Linear rank tests

The logrank and other tests can be derived by assigning scores to the ranks of the death times, and are members of a general class of linear rank tests (for more detail, see Lee, ch 5).

First, define

$\hat{\Lambda}(t) = \sum_{j: t_j \le t} \dfrac{d_j}{r_j}$

where $d_j$ and $r_j$ are the number of deaths and the number at risk, respectively, at the $j$-th ordered death time.

Then assign these scores (suggested by Peto and Peto):

  Event               Score
  Death at t_j        $w_j = 1 - \hat{\Lambda}(t_j)$
  Censoring at t_j    $w_j = -\hat{\Lambda}(t_j)$

To calculate the logrank test, simply sum up the scores for group 0.

Example
  Group 0: 15, 18, 19, 19, 20
  Group 1: 16+, 18+, 20+, 23, 24+

Calculation of logrank as a linear rank statistic

[Table: ordered data with Group, $d_j$, $r_j$, $\hat{\Lambda}(t_j)$ and score $w_j$; entries not recovered in this transcription]

The logrank statistic $S$ is the sum of scores for group 0:

$S = 2.75$

The variance is:

$\mathrm{Var}(S) = \dfrac{n_0 n_1 \sum_{j=1}^{n} w_j^2}{n (n - 1)}$

In this case, $\mathrm{Var}(S) = 1.210$, so

$Z = \dfrac{2.75}{\sqrt{1.210}} = 2.50 \implies \chi^2_{logrank} = (2.50)^2 = 6.25$
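The score calculation for this small example can be reproduced as follows. This is a sketch under stated assumptions: the helper name `logrank_scores` is hypothetical, and the cumulative hazard at each observation is taken to include the death time itself (so that a censoring tied with a death is treated as occurring after it), which is the convention that reproduces S = 2.75.

```python
def logrank_scores(times, events):
    """Peto-Peto logrank scores: 1 - Lambda(t) for a death, -Lambda(t) for a
    censoring, where Lambda(t) sums d_j / r_j over death times t_j <= t."""
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    scores = []
    for t, e in zip(times, events):
        cum_haz = 0.0
        for tj in death_times:
            if tj <= t:
                d = sum(1 for u, ev in zip(times, events) if u == tj and ev == 1)
                r = sum(1 for u in times if u >= tj)
                cum_haz += d / r
        scores.append((1.0 if e == 1 else 0.0) - cum_haz)
    return scores


# Example data from the text; 0 in `events` marks a censored time (the "+")
times = [15, 18, 19, 19, 20, 16, 18, 20, 23, 24]
events = [1, 1, 1, 1, 1, 0, 0, 0, 1, 0]
groups = [0] * 5 + [1] * 5

w = logrank_scores(times, events)
S = sum(wi for wi, g in zip(w, groups) if g == 0)
n0, n1 = groups.count(0), groups.count(1)
n = n0 + n1
var_S = n0 * n1 * sum(wi**2 for wi in w) / (n * (n - 1))
chi2 = S**2 / var_S
print(round(S, 2), round(var_S, 3), round(chi2, 2))
```

Because the scores sum to zero over the pooled sample, summing them over group 1 instead would give -S and the same chi-square.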
Why is this form of the logrank equivalent?

The logrank statistic $S$ is equivalent to $\sum (o - e)$ over the distinct death times, where $o$ is the observed number of deaths in group 0, and $e$ is the expected number, given the risk sets.

  At deaths: weights are $1 - \hat{\Lambda}$
  At censorings: weights are $-\hat{\Lambda}$

So we are summing up 1's for deaths (to get $d_{0j}$), and subtracting $\hat{\Lambda}$ at both deaths and censorings. This amounts to subtracting $d_j / r_j$ at each death or censoring time in group 0, at or after the $j$-th death. Since there are a total of $r_{0j}$ of these, we get $e = \sum_j r_{0j} d_j / r_j$.

Why is it called the logrank test?

Since $S(t) = \exp(-\Lambda(t))$, an alternative estimator of $S(t)$ is:

$\hat{S}(t) = \exp(-\hat{\Lambda}(t)) = \exp\left(-\sum_{j: t_j \le t} \dfrac{d_j}{r_j}\right)$

So, we can think of $\hat{\Lambda}(t) = -\log(\hat{S}(t))$ as yielding the log-survival scores used to calculate the statistic.

Comparing the CMH-type Logrank and Linear Rank logrank

A. CMH-type Logrank: We motivated the logrank test through the CMH statistic for testing $H_0: OR = 1$ over $K$ tables, where $K$ is the number of distinct death times. This turned out to be what we get when we use the logrank (default) option in Stata or the strata statement in SAS.

B. Linear Rank logrank: The linear rank version of the logrank test is based on adding up scores for one of the two treatment groups. The particular scores that gave us the same logrank statistic were based on the Nelson-Aalen estimator, i.e., $\hat{\Lambda} = \sum \hat{\lambda}(t_j)$. This is what you get when you use the test statement in SAS.

Here are some comparisons, with a new example to show when the two types of logrank statistics will be equal.
First, let's go back to our example from Chapter 5 of Lee:

Example
  Group 0: 15, 18, 19, 19, 20
  Group 1: 16+, 18+, 20+, 23, 24+

A. The CMH-type logrank statistic: (using the strata statement)

  Rank Statistics
    TRTMT     Log-Rank   Wilcoxon
    Control
    Treated

  Covariance Matrix for the Log-Rank Statistics
    TRTMT     Control   Treated
    Control
    Treated

  Test of Equality over Strata
                                    Pr >
    Test        Chi-Square   DF   Chi-Square
    Log-Rank
    Wilcoxon
    -2Log(LR)

  [numeric entries not recovered in this transcription]

This is exactly the same chi-square test that you would get if you calculated the numerator of the logrank as $\sum (o_j - e_j)$ and the variance as $v_j = r_{1j} r_{0j} d_j (r_j - d_j) / [r_j^2 (r_j - 1)]$:

[Table: ordered death times with $d_{0j}$, $r_{0j}$ (Group 0), $d_j$, $r_j$ (Combined), $e_j$, $o_j - e_j$, $v_j$, and column sums; entries not recovered in this transcription]

$\chi^2_{logrank} = \dfrac{(2.75)^2}{\sum_j v_j}$
B. The linear rank logrank statistic: (using the test statement)

  Univariate Chi-Squares for the LOG RANK Test
                Test        Standard                   Pr >
    Variable    Statistic   Deviation    Chi-Square   Chi-Square
    GROUP

  Covariance Matrix for the LOG RANK Statistics
    Variable    TRTMT
    TRTMT

  [numeric entries not recovered in this transcription]

This is actually very close to what we would get if we use the Nelson-Aalen based "scores":

Calculation of logrank as a linear rank statistic

[Table: ordered data with Group, $d_j$, $r_j$, $\hat{\Lambda}(t_j)$, score $w_j$, and the sum for group 0; entries not recovered in this transcription]

Note that the numerator is the exact same number (2.75) in both versions of the logrank test. The difference in the denominator is due to the way that ties are handled.

CMH-type variance:

$\mathrm{var} = \sum_j \dfrac{r_{1j} r_{0j} d_j (r_j - d_j)}{r_j^2 (r_j - 1)} = \sum_j \dfrac{r_{1j} r_{0j}}{r_j (r_j - 1)} \cdot \dfrac{d_j (r_j - d_j)}{r_j}$

Linear rank type variance:

$\mathrm{var} = \dfrac{n_0 n_1 \sum_{j=1}^{n} w_j^2}{n (n - 1)}$
Now consider an example where there are no tied death times

Example I
  Group 0: 15, 18, 19, 21, 22
  Group 1: 16+, 17+, 20+, 23, 24+

A. The CMH-type logrank statistic: (using the strata statement)

  Rank Statistics
    TRTMT     Log-Rank   Wilcoxon
    Control
    Treated

  Covariance Matrix for the Log-Rank Statistics
    TRTMT     Control   Treated
    Control
    Treated

  Test of Equality over Strata
                                    Pr >
    Test        Chi-Square   DF   Chi-Square
    Log-Rank
    Wilcoxon
    -2Log(LR)

B. The linear rank logrank statistic: (using the test statement)

  Univariate Chi-Squares for the LOG RANK Test
                Test        Standard                   Pr >
    Variable    Statistic   Deviation    Chi-Square   Chi-Square
    TRTMT

  Covariance Matrix for the LOG RANK Statistics
    Variable    TRTMT
    TRTMT

  [numeric entries not recovered in this transcription]

Note that this time, the variances of the two logrank statistics reported by SAS are exactly the same [value not recovered in this transcription].

If there are no tied event times, then the two versions of the test will yield identical results. The more ties we have, the more it matters which version we use.
Gehan's Generalized Wilcoxon Test

First, let's review the Wilcoxon test for uncensored data:

Denote observations from two samples by:

$(X_1, X_2, \ldots, X_n)$ and $(Y_1, Y_2, \ldots, Y_m)$

Order the combined sample and define:

$Z_{(1)} < Z_{(2)} < \cdots < Z_{(m+n)}$

$R_{i1}$ = rank of $X_i$

$R_1 = \sum_{i=1}^{n} R_{i1}$

Reject $H_0$ if $R_1$ is too big or too small, according to

$\dfrac{R_1 - E(R_1)}{\sqrt{\mathrm{Var}(R_1)}} \sim N(0, 1)$

where

$E(R_1) = \dfrac{n (m + n + 1)}{2}$

$\mathrm{Var}(R_1) = \dfrac{m n (m + n + 1)}{12}$

The Mann-Whitney form of the Wilcoxon is defined as:

$U(X_i, Y_j) = U_{ij} = \begin{cases} +1 & \text{if } X_i > Y_j \\ 0 & \text{if } X_i = Y_j \\ -1 & \text{if } X_i < Y_j \end{cases}$

and

$U = \sum_{i=1}^{n} \sum_{j=1}^{m} U_{ij}$.

There is a simple correspondence between $U$ and $R_1$ (with $R_1$ the rank sum of the $n$ observations $X_i$):

$R_1 = n (m + n + 1) / 2 + U / 2$

so $U = 2 R_1 - n (m + n + 1)$

Therefore,

$E(U) = 0$

$\mathrm{Var}(U) = m n (m + n + 1) / 3$
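The correspondence between the Mann-Whitney count and the rank sum is easy to confirm on a toy sample. The sketch below assumes no ties, takes the rank sum over the X sample (of size n), and uses made-up numbers; `mann_whitney_u` and `rank_sum` are hypothetical helper names.

```python
def mann_whitney_u(x, y):
    """Sum over all (X_i, Y_j) pairs of +1 if X_i > Y_j, -1 if X_i < Y_j, 0 if equal."""
    return sum((xi > yj) - (xi < yj) for xi in x for yj in y)


def rank_sum(x, y):
    """Sum of the ranks of the x values in the pooled sample (assumes no ties)."""
    pooled = sorted(x + y)
    return sum(pooled.index(xi) + 1 for xi in x)


# Hypothetical uncensored samples with no ties
x = [3.1, 4.7, 5.2, 8.0]
y = [1.2, 2.5, 6.3]
n, m = len(x), len(y)

U = mann_whitney_u(x, y)
R1 = rank_sum(x, y)
print(U, R1, 2 * R1 - n * (m + n + 1))
```

The first and last printed values agree, which is the identity relating U to the rank sum of the X sample.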
Extending Wilcoxon to censored data

The Mann-Whitney form leads to a generalization for censored data. Define

$U(X_i, Y_j) = U_{ij} = \begin{cases} +1 & \text{if } x_i > y_j \text{ or } x_i^+ \ge y_j \\ 0 & \text{if } x_i = y_j \text{ or lower value censored} \\ -1 & \text{if } x_i < y_j \text{ or } x_i \le y_j^+ \end{cases}$

Then define

$W = \sum_{i=1}^{n} \sum_{j=1}^{m} U_{ij}$

Thus, there is a contribution to $W$ for every comparison where both observations are failures (except for ties), or where a censored observation is greater than or equal to a failure.

Looking at all possible pairs of individuals between the two treatment groups makes this a nightmare to compute by hand!

Gehan found an easier way to compute the above. First, pool the sample of $(n + m)$ observations into a single group, then compare each individual with the remaining $n + m - 1$:

For comparing the $i$-th individual with the $j$-th, define

$U_{ij} = \begin{cases} +1 & \text{if } t_i > t_j \text{ or } t_i^+ \ge t_j \\ -1 & \text{if } t_i < t_j \text{ or } t_i \le t_j^+ \\ 0 & \text{otherwise} \end{cases}$

Then

$U_i = \sum_{j=1}^{m+n} U_{ij}$

Thus, for the $i$-th individual, $U_i$ is the number of observations which are definitely less than $t_i$ minus the number of observations that are definitely greater than $t_i$. We assume censorings occur after deaths, so that if $t_i = 18^+$ and $t_j = 18$, then we add 1 to $U_i$.

The Gehan statistic is defined as

$U = \sum_{i=1}^{m+n} U_i \, 1_{\{i \text{ in group } 0\}} = W$

$U$ has mean 0 and variance

$\mathrm{var}(U) = \dfrac{m n}{(m + n)(m + n - 1)} \sum_{i=1}^{m+n} U_i^2$
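Gehan's pooled-sample shortcut is straightforward to sketch in code. The helper `gehan_scores` below is a hypothetical name, and the data are the small two-group example from Lee used elsewhere in these notes (0 in `events` marks a censored time); censorings tied with a death are treated as occurring after the death, as described above.

```python
def gehan_scores(times, events):
    """U_i = (# observations definitely smaller than t_i)
           - (# observations definitely larger than t_i)."""
    obs = list(zip(times, events))
    scores = []
    for i, (ti, ei) in enumerate(obs):
        u = 0
        for j, (tj, ej) in enumerate(obs):
            if i == j:
                continue
            # j is a failure that precedes i (a tied censoring counts as later)
            if ej == 1 and (tj < ti or (tj == ti and ei == 0)):
                u += 1
            # i is a failure that precedes j
            elif ei == 1 and (ti < tj or (ti == tj and ej == 0)):
                u -= 1
        scores.append(u)
    return scores


times = [15, 18, 19, 19, 20, 16, 18, 20, 23, 24]
events = [1, 1, 1, 1, 1, 0, 0, 0, 1, 0]
groups = [0] * 5 + [1] * 5

u = gehan_scores(times, events)
U = sum(ui for ui, g in zip(u, groups) if g == 0)
m, n = groups.count(0), groups.count(1)
var_U = m * n * sum(ui**2 for ui in u) / ((m + n) * (m + n - 1))
chi2 = U**2 / var_U
print(U, sum(ui**2 for ui in u), round(var_U, 2), round(chi2, 2))
```

The individual scores sum to zero over the pooled sample, so summing over group 1 instead of group 0 only changes the sign of U.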
Example from Lee:
  Group 0: 15, 18, 19, 19, 20
  Group 1: 16+, 18+, 20+, 23, 24+

[Table: Time, Group, $U_i$, $U_i^2$ for each observation, with column sums; entries not recovered in this transcription]

$U = -18$

$\mathrm{Var}(U) = \dfrac{(5)(5)(208)}{(10)(9)} = 57.78$

and $\chi^2 = (-18)^2 / 57.78 = 5.61$

SAS code:

  data leedata;
    infile 'lee.dat';
    input time cens group;
  proc lifetest data=leedata;
    time time*cens(0);
    strata group;
  run;

SAS OUTPUT: Gehan's Wilcoxon test

  Rank Statistics
    TRTMT     Log-Rank   Wilcoxon
    Control
    Treated

  Covariance Matrix for the Wilcoxon Statistics
    TRTMT     Control   Treated
    Control
    Treated

  Test of Equality over Strata
                                    Pr >
    Test        Chi-Square   DF   Chi-Square
    Log-Rank
    Wilcoxon                                   ** this is Gehan's test
    -2Log(LR)

  [numeric entries not recovered in this transcription]
Notes about SAS Wilcoxon Test:

SAS calculates the Wilcoxon as $-U$ instead of $U$, probably so that the sign of the test statistic is consistent with the logrank.

SAS gets something slightly different for the variance, and this does not seem to depend on whether there are ties.

For example, the hypothetical dataset on p.6 without ties yields $U = -15$ and $\sum U_i^2 = 182$, so

$\mathrm{Var}(U) = \dfrac{(5)(5)(182)}{(10)(9)} = 50.56$ and $\chi^2 = \dfrac{(-15)^2}{50.56} = 4.45$

while SAS gives the following:

  Rank Statistics
    TRTMT     Log-Rank   Wilcoxon
    Control
    Treated

  Covariance Matrix for the Wilcoxon Statistics
    TRTMT     Control   Treated
    Control
    Treated

  Test of Equality over Strata
                                    Pr >
    Test        Chi-Square   DF   Chi-Square
    Log-Rank
    Wilcoxon
    -2Log(LR)

  [numeric entries not recovered in this transcription]

Obtaining the Wilcoxon test using Stata

Use the sts test statement, with the appropriate option:

  sts test varlist [if exp] [in range] [, [logrank|wilcoxon|cox] strata(varlist) detail mat(matname1 matname2) notitle noshow ]

logrank, wilcoxon, and cox specify which test of equality is desired. logrank is the default, and cox yields a likelihood ratio test under a Cox model.

Example: (leukemia data)

  . stset remiss status
  . sts test trt, wilcoxon

    Wilcoxon (Breslow) test for equality of survivor functions

            Events                  Sum of
    trt     observed   expected    ranks
    Total
    chi2(1) =
    Pr>chi2 =
    [values not recovered in this transcription]
Generalized Wilcoxon (Peto & Peto, Prentice)

Assign the following scores:
  For a death at $t$: $\hat{S}(t+) + \hat{S}(t-) - 1$
  For a censoring at $t$: $\hat{S}(t+) - 1$

The test statistic is $\sum$(scores) for group 0.

[Table: Time, Group, $d_j$, $r_j$, $\hat{S}(t+)$ and score $w_j$; entries not recovered in this transcription]

$\sum w_j \, 1_{\{j \text{ in group } 0\}} = \cdots + (0.313) + (-0.081) = 2.13$

$\mathrm{Var}(S) = \dfrac{n_0 n_1 \sum_{j=1}^{n} w_j^2}{n (n - 1)}$ [value not recovered in this transcription]

so $Z = 2.13 / 0.765$

The Tarone-Ware class of tests:

This general class of tests is like the logrank test, but adds weights $w_j$. The logrank test, Wilcoxon test, and Peto-Prentice Wilcoxon are included as special cases.

$\chi^2_{tw} = \dfrac{\left[\sum_{j=1}^{K} w_j (d_{1j} - r_{1j} d_j / r_j)\right]^2}{\sum_{j=1}^{K} \dfrac{w_j^2 \, r_{1j} r_{0j} d_j (r_j - d_j)}{r_j^2 (r_j - 1)}}$

  Test                 Weight $w_j$
  Logrank              $w_j = 1$
  Gehan's Wilcoxon     $w_j = r_j$
  Peto/Prentice        $w_j = n \hat{S}(t_j)$
  Fleming-Harrington   $w_j = [\hat{S}(t_j)]^{\alpha}$
  Tarone-Ware          $w_j = \sqrt{r_j}$

Note: these weights $w_j$ are not the same as the scores $w_j$ we've been talking about earlier, and they apply to the CMH-type form of the test statistic rather than $\sum$(scores) over a single treatment group.
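The weighted CMH-type statistic can be sketched for the weights that depend only on the number at risk (logrank, Gehan, Tarone-Ware). This is an illustration under assumptions: `weighted_logrank` is a hypothetical helper, the Peto/Prentice and Fleming-Harrington weights are left out because they additionally need the pooled survival estimate, and the data are the small example from Lee.

```python
import math


def weighted_logrank(times, events, groups, weight):
    """Weighted CMH-type statistic; weight(r_j) gives w_j at a death time with
    r_j at risk. Returns (numerator for group 1, variance, chi-square)."""
    num = 0.0
    var = 0.0
    for tj in sorted({t for t, e in zip(times, events) if e == 1}):
        r0 = sum(1 for t, g in zip(times, groups) if t >= tj and g == 0)
        r1 = sum(1 for t, g in zip(times, groups) if t >= tj and g == 1)
        r = r0 + r1
        d = sum(1 for t, e in zip(times, events) if t == tj and e == 1)
        d1 = sum(1 for t, e, g in zip(times, events, groups)
                 if t == tj and e == 1 and g == 1)
        w = weight(r)
        num += w * (d1 - r1 * d / r)
        if r > 1:
            var += w**2 * r1 * r0 * d * (r - d) / (r**2 * (r - 1))
    return num, var, num**2 / var


times = [15, 18, 19, 19, 20, 16, 18, 20, 23, 24]
events = [1, 1, 1, 1, 1, 0, 0, 0, 1, 0]
groups = [0] * 5 + [1] * 5

weights = {
    "logrank": lambda r: 1.0,
    "gehan": lambda r: float(r),
    "tarone_ware": lambda r: math.sqrt(r),
}
results = {name: weighted_logrank(times, events, groups, w)
           for name, w in weights.items()}
for name, (num, var, chi2) in results.items():
    print(name, round(num, 3), round(var, 4), round(chi2, 3))
```

With Gehan weights the numerator has the same magnitude as the U statistic from the pairwise scores, but the hypergeometric variance used here is not the same as the permutation variance based on the sum of squared U_i, so the chi-squares differ slightly.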
Which test should we use?

CMH-type or Linear Rank?

If there is not a high proportion of ties, then it doesn't really matter since:
- The two Wilcoxons are similar to each other
- The two logrank tests are similar to each other

Note: personally, I tend to use the CMH-type test, which you get with the strata statement in SAS and the sts test command in Stata.

Logrank or Wilcoxon?

- Both tests have the right Type I error for testing the null hypothesis of equal survival, $H_0: S_1(t) = S_2(t)$.

- The choice of which test may therefore depend on the alternative hypothesis, which will drive the power of the test.

- The Wilcoxon is sensitive to early differences between survival, while the logrank is sensitive to later ones. This can be seen by the relative weights they assign to the test statistic:

  LOGRANK numerator = $\sum_j (o_j - e_j)$
  WILCOXON numerator = $\sum_j r_j (o_j - e_j)$

- The logrank is most powerful under the assumption of proportional hazards, which implies an alternative in terms of the survival functions of $H_a: S_1(t) = [S_2(t)]^{\alpha}$.

- The Wilcoxon has high power when the failure times are lognormally distributed, with equal variance in both groups but a different mean. It will turn out that this is the assumption of an accelerated failure time model.

- Both tests will lack power if the survival curves (or hazards) cross. However, that does not necessarily make them invalid!
Comparison between TEST and STRATA in SAS for 2 examples:

Data from Lee (n=10):

  from STRATA:
    Test of Equality over Strata
                                      Pr >
      Test        Chi-Square   DF   Chi-Square
      Log-Rank
      Wilcoxon                                   ** this is Gehan's test
      -2Log(LR)

  from TEST:
    Univariate Chi-Squares for the WILCOXON Test
                  Test        Standard                   Pr >
      Variable    Statistic   Deviation    Chi-Square   Chi-Square
      GROUP

    Univariate Chi-Squares for the LOG RANK Test
                  Test        Standard                   Pr >
      Variable    Statistic   Deviation    Chi-Square   Chi-Square
      GROUP

Previous example with leukemia data:

  from STRATA:
    Test of Equality over Strata
                                      Pr >
      Test        Chi-Square   DF   Chi-Square
      Log-Rank
      Wilcoxon
      -2Log(LR)

  from TEST:
    Univariate Chi-Squares for the WILCOXON Test
                  Test        Standard                   Pr >
      Variable    Statistic   Deviation    Chi-Square   Chi-Square
      GROUP

    Univariate Chi-Squares for the LOG RANK Test
                  Test        Standard                   Pr >
      Variable    Statistic   Deviation    Chi-Square   Chi-Square
      GROUP

[numeric entries not recovered in this transcription]
P-sample and stratified logrank tests

We have been discussing two sample problems. In practice, more complex settings often arise:
- There are more than two treatments or groups, and the question of interest is whether the groups differ from each other.
- We are interested in a comparison between two groups, but we wish to adjust for another factor that may confound the analysis.
- We want to adjust for lots of covariates.

We will first talk about comparing the survival distributions between more than 2 groups, and then about adjusting for other covariates.

P-sample logrank

Suppose we observe data from $P$ different groups, and the data from group $p$ ($p = 1, \ldots, P$) are:

$(X_{p1}, \delta_{p1}), \ldots, (X_{p n_p}, \delta_{p n_p})$

We now construct a $(P \times 2)$ table at each of the $K$ distinct death times, and compare the death rates between the $P$ groups, conditional on the number at risk.

Let $t_1, \ldots, t_K$ represent the $K$ ordered, distinct death times. At the $j$-th death time, we have the following table:

                 Die/Fail
  Group    Yes       No                 Total
  1        d_1j      r_1j - d_1j        r_1j
  .        .         .                  .
  .        .         .                  .
  P        d_Pj      r_Pj - d_Pj        r_Pj
  Total    d_j       r_j - d_j          r_j

where $d_{pj}$ is the number of deaths in group $p$ at the $j$-th death time, and $r_{pj}$ is the number at risk at that time.

The tables are then combined using the CMH approach.
If we were just focusing on this one table, then a $\chi^2_{(P-1)}$ test statistic could be constructed through a comparison of "o"s and "e"s, like before.

Example: Toxicity in a clinical trial with 3 treatments

  TABLE OF GROUP BY TOXICITY
    GROUP     TOXICITY
    Frequency
    Row Pct   no   yes   Total
    Total

  STATISTICS FOR TABLE OF GROUP BY TOXICITY
    Statistic                      DF   Value   Prob
    Chi-Square
    Likelihood Ratio Chi-Square
    Mantel-Haenszel Chi-Square

  Cochran-Mantel-Haenszel Statistics (Based on Table Scores)
    Statistic   Alternative Hypothesis     DF   Value   Prob
                Nonzero Correlation
                Row Mean Scores Differ
                General Association

  [numeric entries not recovered in this transcription]

Formal Calculations:

Let $O_j = (d_{1j}, \ldots, d_{(P-1)j})^T$ be a vector of the observed number of failures in groups 1 to $(P-1)$, respectively, at the $j$-th death time. Given the risk sets $r_{1j}, \ldots, r_{Pj}$, and the fact that there are $d_j$ deaths, then $O_j$ has a distribution like a multivariate version of the Hypergeometric. $O_j$ has mean:

$E_j = \left( \dfrac{d_j r_{1j}}{r_j}, \ldots, \dfrac{d_j r_{(P-1)j}}{r_j} \right)^T$

and variance covariance matrix:

$V_j = \begin{pmatrix} v_{11j} & v_{12j} & \cdots & v_{1(P-1)j} \\ & v_{22j} & \cdots & v_{2(P-1)j} \\ & & \ddots & \vdots \\ & & & v_{(P-1)(P-1)j} \end{pmatrix}$

where the $l$-th diagonal element is:

$v_{llj} = r_{lj} (r_j - r_{lj}) d_j (r_j - d_j) / [r_j^2 (r_j - 1)]$

and the $lm$-th off-diagonal element is:

$v_{lmj} = -r_{lj} r_{mj} d_j (r_j - d_j) / [r_j^2 (r_j - 1)]$
The resulting $\chi^2$ test for a single $(P \times 2)$ table would have $(P-1)$ degrees of freedom and is constructed as follows:

$(O_j - E_j)^T V_j^{-1} (O_j - E_j)$

Generalizing to K tables

Analogous to what we did for the two sample logrank, we replace the $O_j$, $E_j$ and $V_j$ with the sums over the $K$ distinct death times. That is, let $O = \sum_{j=1}^{K} O_j$, $E = \sum_{j=1}^{K} E_j$, and $V = \sum_{j=1}^{K} V_j$. Then, the test statistic is:

$(O - E)^T V^{-1} (O - E)$

Example: Time taken to finish a test with 3 different noise distractions. All tests were stopped after 12 minutes.

[Table: test times by Noise Level (Group 1, Group 2, Group 3); entries not recovered in this transcription]
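The vector and matrix sums above can be accumulated directly. The sketch below is an assumption-laden illustration: `p_sample_logrank` and `solve` are hypothetical helpers written in plain Python (a small Gaussian elimination stands in for the matrix inverse), and the three-group data are invented, not the noise data from the lecture.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def p_sample_logrank(times, events, groups):
    """Return (O - E, V, chi-square) using the first P-1 group labels."""
    keep = sorted(set(groups))[:-1]          # the last group is redundant
    k = len(keep)
    diff = [0.0] * k
    V = [[0.0] * k for _ in range(k)]
    for tj in sorted({t for t, e in zip(times, events) if e == 1}):
        r = sum(1 for t in times if t >= tj)
        d = sum(1 for t, e in zip(times, events) if t == tj and e == 1)
        rg = [sum(1 for t, g in zip(times, groups) if t >= tj and g == lab)
              for lab in keep]
        dg = [sum(1 for t, e, g in zip(times, events, groups)
                  if t == tj and e == 1 and g == lab) for lab in keep]
        c = d * (r - d) / (r**2 * (r - 1)) if r > 1 else 0.0
        for a in range(k):
            diff[a] += dg[a] - d * rg[a] / r
            for b in range(k):
                V[a][b] += c * (rg[a] * (r - rg[a]) if a == b else -rg[a] * rg[b])
    x = solve(V, diff)                        # x = V^{-1} (O - E)
    chi2 = sum(xi * di for xi, di in zip(x, diff))
    return diff, V, chi2


# Hypothetical three-group data; 0 in `events` marks a censored time
times = [5, 7, 9, 12, 6, 8, 10, 12, 9, 11, 12, 12]
events = [1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
groups = [1] * 4 + [2] * 4 + [3] * 4

diff, V, chi2 = p_sample_logrank(times, events, groups)
print([round(v, 3) for v in diff], round(chi2, 3))
```

With P = 2 the matrix V is a single number and the statistic reduces to the two-sample logrank chi-square.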
Let's start the calculations...

Observed data table

[Table: ordered times with $d_{1j}$, $r_{1j}$ (Group 1), $d_{2j}$, $r_{2j}$ (Group 2), $d_{3j}$, $r_{3j}$ (Group 3), $d_j$, $r_j$ (Combined); entries not recovered in this transcription]

Expected table

[Table: ordered times with $o_{1j}$, $e_{1j}$ (Group 1), $o_{2j}$, $e_{2j}$ (Group 2), $o_{3j}$, $e_{3j}$ (Group 3), $o_j$, $e_j$ (Combined); entries not recovered in this transcription]

Doing the P-sample test by hand is cumbersome... Luckily, most statistical packages will do it for you!

P-sample logrank in Stata

  . sts graph, by(group)
  . sts test group, logrank

    Log-rank test for equality of survivor functions

              Events
    group     observed   expected
    Total
    chi2(2) =
    Pr>chi2 =

  . sts test group, wilcoxon

    Wilcoxon (Breslow) test for equality of survivor functions

              Events                  Sum of
    group     observed   expected    ranks
    Total
    chi2(2) =
    Pr>chi2 =

  [values not recovered in this transcription]
SAS program for P-sample logrank

  title 'Testing with noise example';
  data noise;
    input testtime finish group;
    cards;
  ;
  proc lifetest data=noise;
    time testtime*finish(0);
    strata group;
  run;

  Testing Homogeneity of Survival Curves over Strata
  Time Variable TESTTIME

  Rank Statistics
    GROUP     Log-Rank   Wilcoxon

  Covariance Matrix for the Log-Rank Statistics
    GROUP

  Covariance Matrix for the Wilcoxon Statistics
    GROUP

  Test of Equality over Strata
                                    Pr >
    Test        Chi-Square   DF   Chi-Square
    Log-Rank
    Wilcoxon
    -2Log(LR)

  [data lines and numeric entries not recovered in this transcription]
Note: do not use Test in SAS PROC LIFETEST if you want a P-sample logrank. Test will interpret the group variable as a measured covariate (i.e., either ordinal or continuous).

In other words, you will get a trend test with only 1 degree of freedom, rather than a P-sample test with $(P-1)$ df.

For example, here's what we get if we use the TEST statement on the noise example:

  proc lifetest data=noise;
    time testtime*finish(0);
    test group;
  run;

SAS OUTPUT:

  Univariate Chi-Squares for the LOG RANK Test
                Test        Standard                   Pr >
    Variable    Statistic   Deviation    Chi-Square   Chi-Square
    GROUP

  Covariance Matrix for the LOG RANK Statistics
    Variable    GROUP
    GROUP

  Forward Stepwise Sequence of Chi-Squares for the LOG RANK Test
                                    Pr >        Chi-Square    Pr >
    Variable   DF   Chi-Square   Chi-Square    Increment    Increment
    GROUP

  [numeric entries not recovered in this transcription]

The Stratified Logrank

Sometimes, even though we are interested in comparing two groups (or maybe P groups), we know there are other factors that also affect the outcome. It would be useful to adjust for these other factors in some way.

Example: For the nursing home data, a logrank test comparing length of stay for those under and over 85 years of age suggests a significant difference (p=0.03).

However, we know that gender has a strong association with length of stay, and also with age. Hence, it would be a good idea to STRATIFY the analysis by gender when trying to assess the age effect.

A stratified logrank allows one to compare groups, but allows the shapes of the hazards of the different groups to differ across strata. It makes the assumption that the group 1 vs group 2 hazard ratio is constant across strata.

In other words: $\dfrac{\lambda_{1s}(t)}{\lambda_{2s}(t)} = \theta$, where $\theta$ is constant over the strata ($s = 1, \ldots, S$).

This method of adjusting for other variables is not as flexible as that based on a modelling approach.
General setup for the stratified logrank:

Suppose we want to assess the association between survival and a factor (call this X) that has two different levels. Suppose however, that we want to stratify by a second factor, that has S different levels.

First, divide the data into S separate groups. Within group $s$ ($s = 1, \ldots, S$), proceed as though you were constructing the logrank to assess the association between survival and the variable X. That is, let $t_{1s}, \ldots, t_{K_s s}$ represent the $K_s$ ordered, distinct death times in the $s$-th group.

At the $j$-th death time in group $s$, we have the following table:

                 Die/Fail
  X        Yes        No                   Total
  1        d_s1j      r_s1j - d_s1j        r_s1j
  2        d_s2j      r_s2j - d_s2j        r_s2j
  Total    d_sj       r_sj - d_sj          r_sj

Let $O_s$ be the sum of the "o"s obtained by applying the logrank calculations in the usual way to the data from group $s$. Similarly, let $E_s$ be the sum of the "e"s, and $V_s$ be the sum of the "v"s.

The stratified logrank is

$Z = \dfrac{\sum_{s=1}^{S} (O_s - E_s)}{\sqrt{\sum_{s=1}^{S} V_s}}$
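The stratified statistic simply pools the within-stratum pieces before standardizing. The following sketch assumes X is coded 1 for the level whose deaths are counted in O and 0 otherwise; the helper names and the two-stratum data are hypothetical (they are not the nursing home data).

```python
import math


def logrank_components(times, events, x):
    """Return (O, E, V) for the x == 1 level within a single stratum."""
    O = E = V = 0.0
    for tj in sorted({t for t, e in zip(times, events) if e == 1}):
        r = sum(1 for t in times if t >= tj)
        r1 = sum(1 for t, xi in zip(times, x) if t >= tj and xi == 1)
        d = sum(1 for t, e in zip(times, events) if t == tj and e == 1)
        d1 = sum(1 for t, e, xi in zip(times, events, x)
                 if t == tj and e == 1 and xi == 1)
        O += d1
        E += d * r1 / r
        if r > 1:
            V += r1 * (r - r1) * d * (r - d) / (r**2 * (r - 1))
    return O, E, V


def stratified_logrank(times, events, x, strata):
    """Z = sum_s (O_s - E_s) / sqrt(sum_s V_s)."""
    num = 0.0
    den = 0.0
    for s in sorted(set(strata)):
        idx = [i for i, si in enumerate(strata) if si == s]
        O, E, V = logrank_components([times[i] for i in idx],
                                     [events[i] for i in idx],
                                     [x[i] for i in idx])
        num += O - E
        den += V
    return num / math.sqrt(den)


# Hypothetical data: two strata ("F", "M"), X coded 1/0, 0 in `events` = censored
times = [4, 6, 7, 9, 10, 3, 5, 8, 11, 12, 2, 5, 6, 9, 1, 3, 4, 7]
events = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1]
x = [1] * 5 + [0] * 5 + [1] * 4 + [0] * 4
strata = ["F"] * 10 + ["M"] * 8

Z = stratified_logrank(times, events, x, strata)
print(round(Z, 3), round(Z**2, 3))
```

With a single stratum, Z squared is the ordinary two-sample logrank chi-square; risk sets are never mixed across strata, which is what allows the baseline hazards to differ between them.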
Stratified logrank using Stata:

  . use nurshome
  . gen age1=0
  . replace age1=1 if age>85
  . sts test age1, strata(gender)

           failure _d:  cens
     analysis time _t:  los

    Stratified log-rank test for equality of survivor functions

              Events
    age1      observed   expected(*)
    Total
    (*) sum over calculations within gender

    chi2(1) = 3.22
    Pr>chi2 =
    [remaining values not recovered in this transcription]

Stratified logrank using SAS:

  data pop1;
    set pop;
    age1=0;
    if age >85 then age1=1;
  proc lifetest data=pop1 outsurv=survres;
    time stay*censor(1);
    test age1;
    strata gender;

RESULTS (just the logrank part... you can also do a stratified Wilcoxon)

  The LIFETEST Procedure
  Rank Tests for the Association of LSTAY with Covariates Pooled over Strata

  Univariate Chi-Squares for the LOG RANK Test
                Test        Standard                   Pr >
    Variable    Statistic   Deviation    Chi-Square   Chi-Square
    AGE1

  Covariance Matrix for the LOG RANK Statistics
    Variable    AGE1
    AGE1

  Forward Stepwise Sequence of Chi-Squares for the LOG RANK Test
                                    Pr >        Chi-Square    Pr >
    Variable   DF   Chi-Square   Chi-Square    Increment    Increment
    AGE1

  [numeric entries not recovered in this transcription]
