© The Trustees of Indiana University

Comparing Group Means: The T-test and One-way ANOVA Using STATA, SAS, and SPSS

Hun Myoung Park

This document summarizes the methods of comparing group means and illustrates how to conduct the t-test and one-way ANOVA using STATA 9.0, SAS 9.1, and SPSS 13.0.

1. Introduction
2. Univariate Samples
3. Paired (Dependent) Samples
4. Independent Samples with Equal Variances
5. Independent Samples with Unequal Variances
6. One-way ANOVA, GLM, and Regression
7. Conclusion

1. Introduction

The t-test and analysis of variance (ANOVA) compare group means. The mean of a variable to be compared should be substantively interpretable. A t-test may examine gender differences in average salary or racial (white versus black) differences in average annual income. The left-hand side (LHS) variable to be tested should be interval or ratio, whereas the right-hand side (RHS) variable should be binary (categorical).

1.1 T-test and ANOVA

While the t-test is limited to comparing the means of two groups, one-way ANOVA can compare more than two groups. The t-test is therefore a special case of one-way ANOVA. These analyses do not, however, necessarily imply any causality (i.e., a causal relationship between the left-hand and right-hand side variables). Table 1 compares the t-test and one-way ANOVA.

Table 1. Comparison between the T-test and One-way ANOVA
                        T-test                                  One-way ANOVA
LHS (Dependent)         Interval or ratio variable              Interval or ratio variable
RHS (Independent)       Binary variable with only two groups    Categorical variable
Null Hypothesis         µ1 = µ2                                 µ1 = µ2 = µ3 = ...
Prob. Distribution*     T distribution                          F distribution
* In the case of one degree of freedom on the numerator, F = t^2.

The t-test assumes that samples are randomly drawn from normally distributed populations with unknown population means. Otherwise, their means are no longer the best measures of central tendency and the t-test will not be valid. The Central Limit Theorem says, however, that
the distributions of the sample means are approximately normal when N is large. When n1 + n2 >= 30, in practice, you do not need to worry too much about the normality assumption. You may numerically test the normality assumption using the Shapiro-Wilk W (N <= 2,000), Shapiro-Francia W (N <= 5,000), Kolmogorov-Smirnov D (N > 2,000), and Jarque-Bera tests. If N is small and the null hypothesis of normality is rejected, you may try such nonparametric methods as the Kolmogorov-Smirnov test, Kruskal-Wallis test, Wilcoxon rank-sum test, or log-rank test, depending on the circumstances.

1.2 T-test in STATA, SAS, and SPSS

In STATA, the .ttest and .ttesti commands are used to conduct t-tests, whereas the .anova and .oneway commands perform one-way ANOVA. SAS has the TTEST procedure for the t-test, but the UNIVARIATE and MEANS procedures also have options for t-tests. SAS provides various procedures for the analysis of variance, such as the ANOVA, GLM, and MIXED procedures. The ANOVA procedure can handle balanced data only, while GLM and MIXED can analyze either balanced or unbalanced data (having the same or different numbers of observations across groups). However, unbalanced data do not cause any problems in the t-test and one-way ANOVA. In SPSS, the T-TEST, ONEWAY, and UNIANOVA commands are used to perform the t-test and one-way ANOVA. Table 2 summarizes the STATA commands, SAS procedures, and SPSS commands that are associated with the t-test and one-way ANOVA.

Table 2. Related Procedures and Commands in STATA, SAS, and SPSS
                   STATA 9.0 SE                 SAS 9.1        SPSS 13.0
Normality Test     .sktest; .swilk; .sfrancia   UNIVARIATE     EXAMINE
Equal Variance     .oneway                      TTEST          T-TEST
Nonparametric      .ksmirnov; .kwallis          NPAR1WAY       NPAR TESTS
T-test             .ttest                       TTEST; MEANS   T-TEST
ANOVA              .anova; .oneway              ANOVA          ONEWAY
GLM*                                            GLM; MIXED     UNIANOVA
* The STATA .glm command is not used for the t-test, but for the generalized linear model.
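The normality tests mentioned above (Shapiro-Wilk, Kolmogorov-Smirnov, and Jarque-Bera) can also be sketched outside the three packages. The following is a minimal illustration in Python using scipy.stats; the simulated sample is an assumption for demonstration only, not the Fraumeni data used later in this document.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(12345)
sample = rng.normal(loc=20.0, scale=4.0, size=44)  # hypothetical stand-in data

# Shapiro-Wilk W: appropriate for smaller samples (roughly N <= 2,000)
w_stat, w_p = stats.shapiro(sample)

# Kolmogorov-Smirnov D against a normal with the sample's own mean and sd
d_stat, d_p = stats.kstest(sample, "norm",
                           args=(sample.mean(), sample.std(ddof=1)))

# Jarque-Bera test, based on sample skewness and kurtosis
jb_stat, jb_p = stats.jarque_bera(sample)
```

A large p-value in any of these tests means the null hypothesis of normality is not rejected.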
1.3 Data Arrangement There are two types of data arrangement for t-tests (Figure 1). The first data arrangement has a variable to be tested and a grouping variable to classify groups (0 or 1). The second, appropriate especially for paired samples, has two variables to be tested. The two variables in this type are not, however, necessarily paired nor balanced. SAS and SPSS prefer the first data arrangement, whereas STATA can handle either type flexibly. Note that the numbers of observations across groups are not necessarily equal.
Figure 1. Two Types of Data Arrangement

First type                Second type
Variable   Group          Variable1   Variable2
x          0              x           y
x          0              x           y
y          1
y          1

The data set used here is adopted from J. F. Fraumeni's study on cigarette smoking and cancer (Fraumeni 1968). The data are per capita numbers of cigarettes sold by 43 states and the District of Columbia in 1960, together with death rates per hundred thousand people from various forms of cancer. Two variables were added to categorize states into two groups. See the appendix for the details.
2. Univariate Samples

The univariate-sample or one-sample t-test determines whether an unknown population mean µ differs from a hypothesized value c that is commonly set to zero: H0: µ = c. The t statistic follows Student's t probability distribution with n - 1 degrees of freedom,

    t = (ybar - c) / s_ybar ~ t(n - 1),

where ybar is the sample mean of the variable to be tested and n is the number of observations. (1)

Suppose you want to test whether the population mean of the death rates from lung cancer is 20 per 100,000 people at the .01 significance level. Note that the default significance level used in most software is the .05 level.

2.1 T-test in STATA

The .ttest command conducts t-tests in an easy and flexible manner. For a univariate sample test, the command requires that a hypothesized value be explicitly specified. The level() option indicates the confidence level as a percentage. The 99 percent confidence level is equivalent to the .01 significance level.

. ttest lung=20, level(99)

STATA first lists descriptive statistics of the variable lung. The mean and standard deviation of the 44 observations are 19.6532 and 4.2281, respectively, with a standard error of .6374. The t statistic is -.5442 = (19.6532 - 20) / .6374. Finally, the degrees of freedom are 43 = 44 - 1. There are three t-tests at the bottom of the output: the first and third are one-tailed tests, whereas the second is a two-tailed test. The t statistic and its large p-value do not reject the null hypothesis that the population mean of the death rate from lung cancer is 20 at the .01 level. The mean of the death rate may be 20 per 100,000 people. Note that the hypothesized value 20 falls into the 99 percent confidence interval.

(1) ybar = sum(y_i) / n, s^2 = sum((y_i - ybar)^2) / (n - 1), and standard error s_ybar = s / sqrt(n).

The 99 percent confidence interval of the mean is ybar ± t_(alpha/2) * s_ybar = 19.6532 ± 2.695 * .6374, where 2.695 is the critical value with 43 degrees of freedom at the .01 level in the two-tailed test.
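The arithmetic of the one-sample t-test and its confidence interval can be sketched as follows. This is an illustrative Python/scipy version under hypothetical data (the array below is not the lung variable), checking the hand formula against scipy's built-in test.

```python
import numpy as np
from scipy import stats

# Hypothetical death-rate sample, for illustration only
y = np.array([14.2, 17.8, 19.5, 20.1, 22.4, 16.2, 25.0, 18.8, 21.3, 19.9])
c = 20.0  # hypothesized population mean

n = len(y)
se = y.std(ddof=1) / np.sqrt(n)        # standard error s_ybar = s / sqrt(n)
t_manual = (y.mean() - c) / se         # t = (ybar - c) / s_ybar
t_scipy, p_two = stats.ttest_1samp(y, popmean=c)

# 99 percent confidence interval: ybar +/- t(.005, n-1) * se
t_crit = stats.t.ppf(0.995, df=n - 1)
ci = (y.mean() - t_crit * se, y.mean() + t_crit * se)
```

The manually computed t matches scipy's statistic, and the hypothesized value is not rejected whenever it falls inside the confidence interval.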
If you just have the aggregate data (i.e., the number of observations, mean, and standard deviation of the sample), use the .ttesti command to replicate the t-test above. Note that the hypothesized value is specified at the end of the summary statistics.

. ttesti 44 19.6532 4.2281 20, level(99)

2.2 T-test Using the SAS TTEST Procedure

The TTEST procedure conducts various types of t-tests in SAS. The H0 option specifies a hypothesized value, whereas ALPHA indicates a significance level. If omitted, the default values of zero and .05, respectively, are assumed.

PROC TTEST H0=20 ALPHA=.01 DATA=masil.smoking;
   VAR lung;
RUN;

The TTEST procedure reports descriptive statistics (N, mean, standard deviation, and standard error, with their confidence limits) followed by a two-tailed t-test.

You may have a summary data set containing the values of a variable (lung) and their frequencies (count). The FREQ statement of the TTEST procedure provides the solution for this case.

PROC TTEST H0=20 ALPHA=.01 DATA=masil.smoking;
   VAR lung;
   FREQ count;
RUN;

2.3 T-test Using the SAS UNIVARIATE and MEANS Procedures

The SAS UNIVARIATE and MEANS procedures also conduct a t-test for a univariate sample. The UNIVARIATE procedure is basically designed to produce a variety of descriptive statistics of a variable. Its MU0 option tells the procedure to perform a t-test using the hypothesized value specified. VARDEF=DF specifies the divisor (degrees of freedom) used in
computing the variance (standard deviation). (3) The NORMAL option examines whether the variable is normally distributed.

PROC UNIVARIATE MU0=20 VARDEF=DF NORMAL ALPHA=.01 DATA=masil.smoking;
   VAR lung;
RUN;

The UNIVARIATE procedure prints several blocks of output: moments (N = 44, mean, standard deviation, variance, skewness, kurtosis, and standard error of the mean), basic statistical measures of location and variability, tests for location against Mu0 = 20 (Student's t, sign, and signed rank tests), tests for normality (Shapiro-Wilk, Kolmogorov-Smirnov, Cramer-von Mises, and Anderson-Darling), quantiles, and extreme observations.

(3) VARDEF=N uses N as the divisor, while VARDEF=WDF specifies the sum of weights minus one.

The "Tests for Location" block reports the t statistic and its p-value. The "Tests for Normality" block contains several statistics of the normality test. Since N is less than 2,000, you should read the Shapiro-Wilk W, which suggests that lung is normally distributed (p < .535).

The MEANS procedure also conducts t-tests using the T and PROBT options, which request the t statistic and its two-tailed p-value. The CLM option produces the two-tailed confidence interval (or upper and lower limits). The MEAN, STD, and STDERR options respectively print the sample mean, standard deviation, and standard error.

PROC MEANS MEAN STD STDERR T PROBT CLM VARDEF=DF ALPHA=.01 DATA=masil.smoking;
   VAR lung;
RUN;
The MEANS procedure does not, however, have an option to specify a hypothesized value other than zero. Thus, the null hypothesis here is that the population mean of the death rate from lung cancer is zero. The t statistic is 30.83 = (19.6532 - 0) / .6374. The large t statistic and small p-value reject this null hypothesis, reporting a consistent conclusion.

2.4 T-test in SPSS

SPSS has the T-TEST command for t-tests. The /TESTVAL subcommand specifies the value with which the sample mean is compared, whereas /VARIABLES lists the variables to be tested. Like STATA, SPSS specifies a confidence level rather than a significance level in the /CRITERIA=CI() subcommand.

T-TEST
   /TESTVAL = 20
   /VARIABLES = lung
   /MISSING = ANALYSIS
   /CRITERIA = CI(.99).
3. Paired (Dependent) Samples

When two variables are not independent but paired, the difference of the two variables, d_i = y1_i - y2_i, is treated as if it were a single sample. This test is appropriate for pre-post treatment responses. The null hypothesis is that the true mean difference of the two variables is D0: H0: µ_d = D0. (4) The difference D0 is typically assumed to be zero unless explicitly specified.

3.1 T-test in STATA

In order to conduct a paired-sample t-test, you need to list two variables separated by an equal sign. The interpretation of the t-test remains almost unchanged.

. ttest pre=post0, level(95)

The output lists descriptive statistics for pre, post0, and their difference (diff), followed by the t statistic, 35 degrees of freedom, and the one- and two-tailed p-values. The t statistic at 35 degrees of freedom does not reject the null hypothesis that the difference is zero.

Alternatively, you may first compute the difference between the two variables and then conduct a one-sample t-test. Note that the default confidence level, level(95), can be omitted.

. gen d=pre - post0
. ttest d=0

3.2 T-test in SAS

In the TTEST procedure, you have to use the PAIRED statement instead of the VAR statement. For the output of the following procedure, refer to the end of this section.

PROC TTEST DATA=temp.drug;
   PAIRED pre*post0;
RUN;

(4) t = (dbar - D0) / s_dbar ~ t(n - 1), where dbar = sum(d_i) / n, s_d^2 = sum((d_i - dbar)^2) / (n - 1), and s_dbar = s_d / sqrt(n).
The PAIRED statement provides various ways of comparing variables using the asterisk (*) and colon (:) operators. The asterisk requests comparisons between each variable on the left with each variable on the right. The colon requests comparisons between the first variable on the left and the first on the right, the second on the left and the second on the right, and so forth. Consider the following examples.

PROC TTEST;
   PAIRED pre:post0;
   PAIRED (a b)*(c d);   /* Equivalent to PAIRED a*c a*d b*c b*d; */
   PAIRED (a b):(c d);   /* Equivalent to PAIRED a*c b*d; */
   PAIRED (a1-a10)*(b1-b10);
RUN;

The first PAIRED statement is the same as PAIRED pre*post0. The second and third PAIRED statements contrast the differences between the asterisk and colon operators. The hyphen (-) operator in the last statement indicates a1 through a10 and b1 through b10.

Let us consider an example of the PAIRED statement.

PROC TTEST DATA=temp.drug;
   PAIRED (pre)*(post0-post1);
RUN;

The output reports, for each difference (pre - post0 and pre - post1), the N, mean, standard deviation, and standard error with their confidence limits, followed by the t statistic, degrees of freedom, and two-tailed p-value. The first t statistic, for pre versus post0, is identical to that of the previous section. The second, for pre versus post1, rejects the null hypothesis of no mean difference at the .01 level.

In order to use the UNIVARIATE and MEANS procedures, the difference between two paired variables should be computed in advance.

DATA temp.drug;
   SET temp.drug;
   d1 = pre - post0;
   d2 = pre - post1;
RUN;
PROC UNIVARIATE MU0=0 VARDEF=DF NORMAL;
   VAR d1 d2;
RUN;
PROC MEANS MEAN STD STDERR T PROBT CLM;
   VAR d1 d2;
RUN;
PROC TTEST ALPHA=.05;
   VAR d1 d2;
RUN;

3.3 T-test in SPSS

In SPSS, the PAIRS subcommand indicates a paired-sample t-test.

T-TEST PAIRS = pre post0
   /CRITERIA = CI(.95)
   /MISSING = ANALYSIS.
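The equivalence noted above, that a paired t-test is just a one-sample t-test on the differences, can be checked directly. The sketch below uses Python/scipy with hypothetical pre/post responses (not the temp.drug data).

```python
import numpy as np
from scipy import stats

# Hypothetical pre/post responses, paired by subject
pre  = np.array([10.0, 12.5,  9.8, 11.2, 13.0, 10.7, 12.1,  9.5])
post = np.array([ 9.1, 11.9, 10.2, 10.0, 12.1, 10.5, 11.4,  9.9])

# Paired t-test on the two variables
t_paired, p_paired = stats.ttest_rel(pre, post)

# Equivalent one-sample t-test on the differences, as with ".gen d=pre-post0"
t_diff, p_diff = stats.ttest_1samp(pre - post, popmean=0.0)
```

Both calls yield exactly the same t statistic and p-value with n - 1 degrees of freedom.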
4. Independent Samples with Equal Variances

You should check three assumptions when testing the mean difference of two independent samples. First, the samples are drawn from normally distributed populations with unknown parameters. Second, the two samples are independent in the sense that they are drawn from different populations and/or the elements of one sample are not related to those of the other sample. Finally, the population variances of the two groups, sigma1^2 and sigma2^2, are equal. (5) If any one of these assumptions is violated, the t-test is not valid. The example here compares mean death rates from lung cancer between smokers and non-smokers. Let us begin by discussing the equal variance assumption.

4.1 F Test for Equal Variances

The folded form F test is widely used to examine whether two populations have the same variance. The statistic is

    F = s_L^2 / s_S^2 ~ F(n_L - 1, n_S - 1),

where L and S respectively indicate the groups with the larger and smaller sample variances. Unless the null hypothesis of equal variances is rejected, the pooled variance estimate s_pool^2 is used. The null hypothesis of the independent-sample t-test is H0: µ1 - µ2 = D0, and

    t = (ybar1 - ybar2 - D0) / [s_pool * sqrt(1/n1 + 1/n2)] ~ t(n1 + n2 - 2),

where s_pool^2 = [(n1 - 1)s1^2 + (n2 - 1)s2^2] / (n1 + n2 - 2).

When the equal variance assumption is violated, the t-test requires an approximation of the degrees of freedom. The null hypothesis and other components of the t-test, however, remain unchanged. Satterthwaite's approximation for the degrees of freedom is commonly used. Note that the approximation is a real number, not an integer.

    t' = (ybar1 - ybar2 - D0) / sqrt(s1^2/n1 + s2^2/n2) ~ t(df_Satterthwaite),

where

    df_Satterthwaite = (n1 - 1)(n2 - 1) / [(n1 - 1)(1 - c)^2 + (n2 - 1)c^2]  and
    c = (s1^2/n1) / (s1^2/n1 + s2^2/n2).

(5) E(ybar1 - ybar2) = µ1 - µ2 and Var(ybar1 - ybar2) = sigma1^2/n1 + sigma2^2/n2.
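The three equal-variance tests that appear in this document (the folded form F of SAS, Bartlett's test of STATA's .oneway, and Levene's test of SPSS) can be sketched in one place. The following Python/scipy illustration uses two hypothetical samples, not the smoking data.

```python
import numpy as np
from scipy import stats

# Two hypothetical independent samples
g1 = np.array([21.0, 19.5, 24.2, 18.8, 22.1, 20.3, 23.5, 19.9])
g2 = np.array([15.2, 17.9, 16.4, 18.1, 14.8, 16.9, 17.3, 15.8])

v1, v2 = g1.var(ddof=1), g2.var(ddof=1)

# Folded form F (as in SAS): larger sample variance over smaller
f_folded = max(v1, v2) / min(v1, v2)

# Bartlett's test (as in STATA's .oneway) and Levene's test (as in SPSS)
bart_stat, bart_p = stats.bartlett(g1, g2)
lev_stat, lev_p = stats.levene(g1, g2, center="mean")
```

By construction the folded F is at least 1; small p-values from any of these tests argue against using the pooled-variance t-test.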
The SAS TTEST procedure and the SPSS T-TEST command conduct F tests for equal variance. SAS reports the folded form F statistic, whereas SPSS computes Levene's weighted F statistic. In STATA, the .oneway command produces Bartlett's statistic for the equal variance test. The following is an example of Bartlett's test that does not reject the null hypothesis of equal variance.

. oneway lung smoke

The output lists the between-group and within-group sums of squares, degrees of freedom, mean squares, the F statistic, and its p-value, and ends with Bartlett's test for equal variances: chi2(1), Prob>chi2 = 0.77.

STATA, SAS, and SPSS all compute Satterthwaite's approximation of the degrees of freedom. In addition, the SAS TTEST procedure reports the Cochran-Cox approximation, and the STATA .ttest command provides Welch's degrees of freedom.

4.2 T-test in STATA

With the .ttest command, you have to specify a grouping variable, smoke in this example, in the parentheses of the by option.

. ttest lung, by(smoke) level(95)

The output of this two-sample t test with equal variances lists descriptive statistics for the two groups, the combined sample, and their difference (diff), followed by the t statistic, 42 degrees of freedom, and the one- and two-tailed p-values.

Let us first check the equal variance. The F statistic is 1.17 = s_L^2 / s_S^2 ~ F(21, 21). The degrees of freedom of the numerator and denominator are both 21 (= 22 - 1). The p-value of .773, virtually the same as that of Bartlett's test above, does not reject the null hypothesis of equal variance. Thus, the t-test here is valid (p = 0.0000).
    t = (ybar1 - ybar2 - 0) / [s_pool * sqrt(1/22 + 1/22)] ~ t(22 + 22 - 2), where
    s_pool^2 = [(22 - 1)s1^2 + (22 - 1)s2^2] / (22 + 22 - 2).

If only aggregate data of the two variables are available, use the .ttesti command and list the number of observations, mean, and standard deviation of the two variables.

. ttesti #obs1 #mean1 #sd1 #obs2 #mean2 #sd2, level(95)

Suppose a data set is arranged differently (the second type in Figure 1), so that one variable smk_lung has data for smokers and the other, non_lung, for non-smokers. You have to use the unpaired option to indicate that the two variables are not paired. A grouping variable here is not necessary. The output of this two-sample t test with equal variances is identical to that printed above: the same group statistics, the same t statistic, and 42 degrees of freedom.

. ttest smk_lung=non_lung, unpaired

The unpaired option is very useful since it enables you to conduct a t-test without additional data manipulation. You may run the .ttest command with the unpaired option to compare two variables, say leukemia and kidney, as independent samples in STATA. In SAS and SPSS, however, you have to stack up the two variables and generate a grouping variable before the t-test.

. ttest leukemia=kidney, unpaired
The output lists descriptive statistics for leukemia (44 observations) and kidney (44 observations), the combined sample, and the difference, followed by the t statistic with 86 (= 44 + 44 - 2) degrees of freedom. The folded F statistic and its p-value (= .1797) do not reject the null hypothesis of equal variance. The large t statistic rejects the null hypothesis that death rates from leukemia and kidney cancers have the same mean.

4.3 T-test in SAS

The TTEST procedure by default examines the hypothesis of equal variances and provides t statistics for either case. The procedure by default reports Satterthwaite's approximation for the degrees of freedom. Keep in mind that the variable to be tested is grouped by the variable specified in the CLASS statement.

PROC TTEST H0=0 ALPHA=.05 DATA=masil.smoking;
   CLASS smoke;
   VAR lung;
RUN;

The procedure prints descriptive statistics for each group and their difference, then two t-tests, one labeled "Pooled" (equal variances assumed) and one labeled "Satterthwaite" (unequal variances), each with its degrees of freedom, t value, and Pr > |t|, and finally the folded F test for the equality of variances.
The F test for equal variances does not reject the null hypothesis. Thus, the t-test labeled "Pooled" should be referred to in order to get the t statistic and its p-value. If the equal variance assumption is violated, the statistics of Satterthwaite and Cochran should be read instead.

If you have a summary data set with the values of the variable (lung) and their frequencies (count), specify the count variable in the FREQ statement.

PROC TTEST DATA=masil.smoking;
   CLASS smoke;
   VAR lung;
   FREQ count;
RUN;

Now, let us compare the death rates from leukemia and kidney cancer in the second data arrangement type of Figure 1. As mentioned before, you need to rearrange the data set to stack the two variables into one and generate a grouping variable (the first type in Figure 1).

DATA masil.smoking2;
   SET masil.smoking;
   death = leukemia; leu_kid = 'Leukemia'; OUTPUT;
   death = kidney;   leu_kid = 'Kidney';   OUTPUT;
   KEEP leu_kid death;
RUN;

PROC TTEST COCHRAN DATA=masil.smoking2;
   CLASS leu_kid;
   VAR death;
RUN;

The procedure reports descriptive statistics for the Kidney and Leukemia groups and their difference, then the Pooled, Satterthwaite, and Cochran t-tests, each with Pr > |t| < .0001, and the folded F test for the equality of variances.
Compare this SAS output with that of STATA in the previous section.

4.4 T-test in SPSS

In the T-TEST command, you need to use the /GROUPS subcommand in order to specify a grouping variable. SPSS reports Levene's F of .0000, which does not reject the null hypothesis of equal variance (p < .995).

T-TEST
   /GROUPS = smoke(0 1)
   /VARIABLES = lung
   /MISSING = ANALYSIS
   /CRITERIA = CI(.95).
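The pooled-variance t statistic and the Satterthwaite degrees of freedom given in section 4.1 can be verified numerically. The sketch below, in Python/scipy with hypothetical data, checks the hand formulas against scipy's equal- and unequal-variance tests.

```python
import numpy as np
from scipy import stats

# Hypothetical independent samples (unbalanced on purpose)
g1 = np.array([21.0, 19.5, 24.2, 18.8, 22.1, 20.3])
g2 = np.array([15.2, 17.9, 16.4, 18.1, 14.8, 16.9, 17.3])
n1, n2 = len(g1), len(g2)
v1, v2 = g1.var(ddof=1), g2.var(ddof=1)

# Pooled-variance t (equal variances assumed), D0 = 0
s2_pool = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_pool = (g1.mean() - g2.mean()) / np.sqrt(s2_pool * (1.0 / n1 + 1.0 / n2))

# Satterthwaite degrees of freedom (unequal variances)
c = (v1 / n1) / (v1 / n1 + v2 / n2)
df_sat = ((n1 - 1) * (n2 - 1)) / ((n1 - 1) * (1 - c) ** 2 + (n2 - 1) * c ** 2)

t_eq, p_eq = stats.ttest_ind(g1, g2, equal_var=True)      # matches t_pool
t_uneq, p_uneq = stats.ttest_ind(g1, g2, equal_var=False)
# The p-value computed from df_sat should match scipy's Welch p-value
p_manual = 2 * stats.t.sf(abs(t_uneq), df_sat)
```

Note that df_sat is a real number, not an integer, and always lies between min(n1, n2) - 1 and n1 + n2 - 2.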
5. Independent Samples with Unequal Variances

If the assumption of equal variances is violated, we have to compute the adjusted t statistic using the individual sample standard deviations rather than a pooled standard deviation. It is also necessary to use the Satterthwaite, Cochran-Cox (SAS), or Welch (STATA) approximation of the degrees of freedom. In this chapter, you compare mean death rates from kidney cancer between the west (south) and east (north).

5.1 T-test in STATA

As discussed earlier, let us check the equality of variances using the .oneway command. The tabulate option produces a table of summary statistics for the groups.

. oneway kidney west, tabulate

The output tabulates the mean, standard deviation, and frequency of kidney for each group and in total, prints the analysis-of-variance table, and ends with Bartlett's test for equal variances. Bartlett's chi-squared statistic rejects the null hypothesis of equal variance at the .01 level. It is therefore appropriate to use the unequal option in the .ttest command, which calculates Satterthwaite's approximation for the degrees of freedom. Unlike the SAS TTEST procedure, the .ttest command cannot specify a mean difference D0 other than zero. Thus, the null hypothesis is that the mean difference is zero.

. ttest kidney, by(west) unequal level(95)
The output of this two-sample t test with unequal variances reports the group statistics, the t statistic of 2.7817, and Satterthwaite's degrees of freedom in the middle of the output. If you want to get Welch's approximation, use the welch as well as the unequal option; without the unequal option, welch is ignored.

. ttest kidney, by(west) unequal welch

Satterthwaite's approximation is slightly smaller than Welch's. Again, keep in mind that these approximations are not integers, but real numbers. The t statistic 2.7817 and its p-value .0086 reject the null hypothesis of equal population means. The north and east have larger death rates from kidney cancer per 100 thousand people than the south and west.

For aggregate data, use the .ttesti command with the necessary options.

. ttesti #obs1 #mean1 #sd1 #obs2 #mean2 #sd2, unequal welch

As mentioned earlier, the unpaired option of the .ttest command directly compares two variables without data manipulation. The option treats the two variables as independent of each other. The following is an example of the unpaired and unequal options.

. ttest bladder=kidney, unpaired unequal welch

The output of this two-sample t test with unequal variances lists descriptive statistics for bladder and kidney, the combined sample, and the difference, followed by the t statistic and Welch's degrees of freedom.
The F statistic rejects the null hypothesis of equal variance (p < .0001). If the welch option is omitted, Satterthwaite's degrees of freedom will be produced instead. For aggregate data, again, use the .ttesti command without the unpaired option.

. ttesti #obs1 #mean1 #sd1 #obs2 #mean2 #sd2, unequal welch level(95)

5.2 T-test in SAS

The TTEST procedure reports statistics for the cases of both equal and unequal variances. You may add the COCHRAN option to compute the Cochran-Cox approximation for the degrees of freedom.

PROC TTEST COCHRAN DATA=masil.smoking;
   CLASS west;
   VAR kidney;
RUN;

The procedure prints descriptive statistics for the two groups of west and their difference, then the Pooled, Satterthwaite, and Cochran t-tests with their degrees of freedom, t values, and p-values, and finally the folded F test for the equality of variances.
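The contrast between the pooled and the unequal-variance (Welch/Satterthwaite) t-tests described in this chapter can be sketched as follows; the Python/scipy example uses hypothetical groups with visibly different spreads, not the kidney data.

```python
import numpy as np
from scipy import stats

# Hypothetical groups with clearly unequal variances
north = np.array([3.9, 2.1, 4.8, 1.5, 5.2, 2.9, 4.4, 1.8, 5.0, 2.3])
south = np.array([2.4, 2.6, 2.5, 2.3, 2.7, 2.5])
n1, n2 = len(north), len(south)
v1, v2 = north.var(ddof=1), south.var(ddof=1)

t_pooled, p_pooled = stats.ttest_ind(north, south, equal_var=True)
t_welch, p_welch = stats.ttest_ind(north, south, equal_var=False)

# Welch-Satterthwaite degrees of freedom: a real number, and with very
# unequal variances noticeably smaller than n1 + n2 - 2
df_welch = (v1 / n1 + v2 / n2) ** 2 / (
    (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
)
```

When the variances differ substantially, the pooled and unequal-variance tests can lead to different conclusions, which is why the variance check comes first.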