Comparing Group Means: The T-test and One-way ANOVA Using STATA, SAS, and SPSS


The Trustees of Indiana University

Hun Myoung Park

This document summarizes the method of comparing group means and illustrates how to conduct the t-test and one-way ANOVA using STATA 9.0, SAS 9.1, and SPSS 13.0.

1. Introduction
2. Univariate Samples
3. Paired (Dependent) Samples
4. Independent Samples with Equal Variances
5. Independent Samples with Unequal Variances
6. One-way ANOVA, GLM, and Regression
7. Conclusion

1. Introduction

The t-test and analysis of variance (ANOVA) compare group means. The mean of a variable to be compared should be substantively interpretable. A t-test may examine gender differences in average salary or racial (white versus black) differences in average annual income. The left-hand side (LHS) variable to be tested should be interval or ratio, whereas the right-hand side (RHS) variable should be binary (categorical).

1.1 T-test and ANOVA

While the t-test is limited to comparing means of two groups, one-way ANOVA can compare more than two groups. Therefore, the t-test is considered a special case of one-way ANOVA. These analyses do not, however, necessarily imply any causality (i.e., a causal relationship between the left-hand and right-hand side variables). Table 1 compares the t-test and one-way ANOVA.

Table 1. Comparison between the T-test and One-way ANOVA

                      T-test                                  One-way ANOVA
LHS (Dependent)       Interval or ratio variable              Interval or ratio variable
RHS (Independent)     Binary variable with only two groups    Categorical variable
Null Hypothesis       µ1 = µ2                                 µ1 = µ2 = µ3 = ...
Prob. Distribution*   T distribution                          F distribution

* In the case of one degree of freedom on numerator, F = t^2.

The t-test assumes that samples are randomly drawn from normally distributed populations with unknown population means. Otherwise, their means are no longer the best measures of central tendency and the t-test will not be valid. The Central Limit Theorem says, however, that
the distributions of $\bar{y}_1$ and $\bar{y}_2$ are approximately normal when N is large. When $n_1 + n_2 \ge 30$, in practice, you do not need to worry too much about the normality assumption.

You may numerically test the normality assumption using the Shapiro-Wilk W (N<=2000), Shapiro-Francia W (N<=5000), Kolmogorov-Smirnov D (N>2000), and Jarque-Bera tests. If N is small and the null hypothesis of normality is rejected, you may try such nonparametric methods as the Kolmogorov-Smirnov test, Kruskal-Wallis test, Wilcoxon Rank-Sum Test, or Log-Rank Test, depending on the circumstances.

1.2 T-test in SAS, STATA, and SPSS

In STATA, the .ttest and .ttesti commands are used to conduct t-tests, whereas the .anova and .oneway commands perform one-way ANOVA. SAS has the TTEST procedure for the t-test, but the UNIVARIATE and MEANS procedures also have options for the t-test. SAS provides various procedures for the analysis of variance, such as the ANOVA, GLM, and MIXED procedures. The ANOVA procedure can handle balanced data only, while the GLM and MIXED procedures can analyze either balanced or unbalanced data (having the same or different numbers of observations across groups). However, unbalanced data do not cause any problems in the t-test and one-way ANOVA. In SPSS, the T-TEST, ONEWAY, and UNIANOVA commands are used to perform the t-test and one-way ANOVA.

Table 2 summarizes STATA commands, SAS procedures, and SPSS commands that are associated with the t-test and one-way ANOVA.

Table 2. Related Procedures and Commands in STATA, SAS, and SPSS

                  STATA 9.0 SE                  SAS 9.1         SPSS 13.0
Normality Test    .sktest; .swilk; .sfrancia    UNIVARIATE      EXAMINE
Equal Variance    .oneway                       TTEST           T-TEST
Nonparametric     .ksmirnov; .kwallis           NPAR1WAY        NPAR TESTS
T-test            .ttest                        TTEST; MEANS    T-TEST
ANOVA             .anova; .oneway               ANOVA           ONEWAY
GLM*                                            GLM; MIXED      UNIANOVA

* The STATA .glm command is not used for the t-test, but for the generalized linear model.

1.3 Data Arrangement

There are two types of data arrangement for t-tests (Figure 1).
The first data arrangement has a variable to be tested and a grouping variable to classify groups (0 or 1). The second, appropriate especially for paired samples, has two variables to be tested. The two variables in this type are not, however, necessarily paired nor balanced. SAS and SPSS prefer the first data arrangement, whereas STATA can handle either type flexibly. Note that the numbers of observations across groups are not necessarily equal.
Figure 1. Two Types of Data Arrangement
  First type:   Variable, Group (x and y values stacked in one variable)
  Second type:  Variable1, Variable2 (x values in one variable, y values in the other)

The data set used here is adopted from J. F. Fraumeni's study on cigarette smoking and cancer (Fraumeni 1968). The data are per capita numbers of cigarettes sold by 43 states and the District of Columbia in 1960 together with death rates per hundred thousand people from various forms of cancer. Two variables were added to categorize states into two groups. See the appendix for the details.
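The move from the second arrangement in Figure 1 to the first can be sketched in Python. This is only an illustration under stated assumptions: the function name stack_wide_to_long and the numbers are hypothetical and are not taken from the smoking data set.

```python
def stack_wide_to_long(columns):
    """Stack {variable_name: [values]} (second arrangement) into
    (group, value) rows (first arrangement).

    The groups may have different numbers of observations.
    """
    rows = []
    for group, values in columns.items():
        for value in values:
            rows.append((group, value))
    return rows


# Hypothetical values, for illustration only.
wide = {"leukemia": [6.8, 7.1, 6.5], "kidney": [2.9, 3.1]}
long_rows = stack_wide_to_long(wide)
for group, value in long_rows:
    print(group, value)
```

The stacked rows carry one tested variable plus one grouping variable, which is the layout a grouping-variable style of t-test expects.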
2. Univariate Samples

The univariate-sample or one-sample t-test determines whether an unknown population mean µ differs from a hypothesized value c that is commonly set to zero: $H_0: \mu = c$. The t statistic follows Student's T probability distribution with n-1 degrees of freedom,

$t = \frac{\bar{y} - c}{s_{\bar{y}}} \sim t(n - 1)$,

where y is a variable to be tested and n is the number of observations. [1]

Suppose you want to test if the population mean of the death rates from lung cancer is 20 per 100,000 people at the .01 significance level. Note that the default significance level used in most software is the .05 level.

2.1 T-test in STATA

The .ttest command conducts t-tests in an easy and flexible manner. For a univariate sample test, the command requires that a hypothesized value be explicitly specified. The level() option indicates the confidence level as a percentage. The 99 percent confidence level is equivalent to the .01 significance level.

. ttest lung=20, level(99)

One-sample t test
Variable    Obs    Mean    Std. Err.    Std. Dev.    [99% Conf. Interval]
lung
mean = mean(lung)                         t =
Ho: mean = 20                             degrees of freedom = 43
Ha: mean < 20        Ha: mean != 20       Ha: mean > 20
Pr(T < t) =          Pr(|T| > |t|) =      Pr(T > t) =

STATA first lists descriptive statistics of the variable lung, including the mean and the standard deviation (4.228) of the 44 observations. The t statistic is the difference between the sample mean and the hypothesized value 20, divided by the standard error .6374. Finally, the degrees of freedom are 43 = 44 - 1.

There are three t-tests at the bottom of the output above. The first and third are one-tailed tests, whereas the second is a two-tailed test. The t statistic and its large p-value do not reject the null hypothesis that the population mean of the death rate from lung cancer is 20 at the .01 level. The mean of the death rate may be 20 per 100,000 people. Note that the hypothesized value 20 falls into the 99 percent confidence interval. [2]

[1] $\bar{y} = \frac{\sum y_i}{n}$, $s^2 = \frac{\sum (y_i - \bar{y})^2}{n - 1}$, and standard error $s_{\bar{y}} = \frac{s}{\sqrt{n}}$.

[2] The 99 percent confidence interval of the mean is $\bar{y} \pm t_{\alpha/2}\, s_{\bar{y}} = \bar{y} \pm 2.695 \times .6374$, where 2.695 is the critical value with 43 degrees of freedom at the .01 level in the two-tailed test.
If you just have the aggregate data (i.e., the number of observations, mean, and standard deviation of the sample), use the .ttesti command to replicate the t-test above. Note that the hypothesized value is specified at the end of the summary statistics.

. ttesti , level(99)

2.2 T-test Using the SAS TTEST Procedure

The TTEST procedure conducts various types of t-tests in SAS. The H0 option specifies a hypothesized value, whereas the ALPHA option indicates a significance level. If omitted, the default values of zero and .05, respectively, are assumed.

PROC TTEST H0=20 ALPHA=.01 DATA=masil.smoking;
   VAR lung;
RUN;

The TTEST Procedure
Statistics
            Lower CL           Upper CL   Lower CL            Upper CL
Variable  N   Mean     Mean      Mean     Std Dev   Std Dev   Std Dev   Std Err
lung

T-Tests
Variable   DF   t Value   Pr > |t|
lung

The TTEST procedure reports descriptive statistics followed by a two-tailed t-test (the Pr > |t| column). You may have a summary data set containing the values of a variable (lung) and their frequencies (count). The FREQ statement of the TTEST procedure provides the solution for this case.

PROC TTEST H0=20 ALPHA=.01 DATA=masil.smoking;
   VAR lung;
   FREQ count;
RUN;

2.3 T-test Using the SAS UNIVARIATE and MEANS Procedures

The SAS UNIVARIATE and MEANS procedures also conduct a t-test for a univariate sample. The UNIVARIATE procedure is basically designed to produce a variety of descriptive statistics of a variable. Its MU0 option tells the procedure to perform a t-test using the hypothesized value specified. The VARDEF=DF option specifies a divisor (degrees of freedom) used in
computing the variance (standard deviation). [3] The NORMAL option examines if the variable is normally distributed.

PROC UNIVARIATE MU0=20 VARDEF=DF NORMAL ALPHA=.01 DATA=masil.smoking;
   VAR lung;
RUN;

The UNIVARIATE Procedure
Variable: lung

Moments
N                  44    Sum Weights         44
Mean                     Sum Observations
Std Deviation            Variance
Skewness                 Kurtosis
Uncorrected SS           Corrected SS
Coeff Variation          Std Error Mean

Basic Statistical Measures
Location            Variability
Mean                Std Deviation          4.22812
Median              Variance
Mode      .         Range
                    Interquartile Range

Tests for Location: Mu0=20
Test           Statistic    p Value
Student's t    t            Pr > |t|
Sign           M            Pr >= |M|
Signed Rank    S            Pr >= |S|

Tests for Normality
Test                  Statistic    p Value
Shapiro-Wilk          W            Pr < W
Kolmogorov-Smirnov    D            Pr > D
Cramer-von Mises      W-Sq         Pr > W-Sq    >0.500
Anderson-Darling      A-Sq         Pr > A-Sq    >0.500

Quantiles (Definition 5): 100% Max, 99%, 95%, 90%, 75% Q3, 50% Median, 25% Q1, 10%, 5%, 1%, 0% Min
Extreme Observations: Lowest (Value, Obs), Highest (Value, Obs)

[3] The VARDEF=N option uses N as a divisor, while VARDEF=WDF specifies the sum of weights minus one.

The third block of the output above reports a t statistic and its p-value. The fourth block contains several statistics of the normality test. Since N is less than 2,000, you should read the Shapiro-Wilk W, which suggests that lung is normally distributed (p<.535).

The MEANS procedure also conducts t-tests using the T and PROBT options, which request the t statistic and its two-tailed p-value. The CLM option produces the two-tailed confidence interval (or upper and lower limits). The MEAN, STD, and STDERR options respectively print the sample mean, standard deviation, and standard error.

PROC MEANS MEAN STD STDERR T PROBT CLM VARDEF=DF ALPHA=.01 DATA=masil.smoking;
   VAR lung;
RUN;

The MEANS Procedure
Analysis Variable : lung
                                                   Lower 99%      Upper 99%
Mean    Std Dev    Std Error    t Value    Pr > |t|    CL for Mean    CL for Mean
The MEANS procedure does not, however, have an option to set the hypothesized value to anything other than zero. Thus, the null hypothesis here is that the population mean of the death rate from lung cancer is zero. The t statistic is the sample mean divided by its standard error .6374. The large t statistic and small p-value reject the null hypothesis, a conclusion consistent with the confidence interval reported earlier, which does not contain zero.

2.4 T-test in SPSS

SPSS has the T-TEST command for t-tests. The /TESTVAL subcommand specifies the value with which the sample mean is compared, whereas the /VARIABLES subcommand lists the variables to be tested. Like STATA, SPSS specifies a confidence level rather than a significance level in the /CRITERIA=CI() subcommand.

T-TEST
  /TESTVAL = 20
  /VARIABLES = lung
  /MISSING = ANALYSIS
  /CRITERIA = CI(.99).
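The one-sample computation used throughout this section can be checked by hand with a short Python sketch. It is an illustration only: the function names and the five data values are hypothetical, and the critical value must be supplied by the reader (for example, 2.695 for 43 degrees of freedom at the .01 level, as quoted in the text), because the standard library has no t distribution.

```python
import math
import statistics


def one_sample_t(values, c=0.0):
    """Return (t, degrees of freedom, mean, standard error) for H0: mu = c."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)      # divisor n - 1, like VARDEF=DF
    se = sd / math.sqrt(n)             # standard error of the mean
    return (mean - c) / se, n - 1, mean, se


def confidence_interval(mean, se, t_crit):
    """Two-tailed interval: mean +/- critical value * standard error."""
    return mean - t_crit * se, mean + t_crit * se


# Hypothetical sample, not the lung cancer data.
sample = [18.0, 20.0, 22.0, 24.0, 16.0]
t_stat, df, mean, se = one_sample_t(sample, c=18.0)
print(t_stat, df)
```

The null hypothesis is not rejected when the hypothesized value c lies inside the interval returned by confidence_interval, which mirrors the reasoning applied to the STATA output.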
3. Paired (Dependent) Samples

When two variables are not independent, but paired, the difference of these two variables, $d_i = y_{1i} - y_{2i}$, is treated as if it were a single sample. This test is appropriate for pre-post treatment responses. The null hypothesis is that the true mean difference of the two variables is $D_0$, $H_0: \mu_d = D_0$. [4] The difference is typically assumed to be zero unless explicitly specified.

3.1 T-test in STATA

In order to conduct a paired sample t-test, you need to list two variables separated by an equal sign. The interpretation of the t-test remains almost unchanged. The t statistic at 35 degrees of freedom does not reject the null hypothesis that the difference is zero.

. ttest pre=post0, level(95)

Paired t test
Variable    Obs    Mean    Std. Err.    Std. Dev.    [95% Conf. Interval]
pre
post0
diff
mean(diff) = mean(pre - post0)            t =
Ho: mean(diff) = 0                        degrees of freedom = 35
Ha: mean(diff) < 0    Ha: mean(diff) != 0    Ha: mean(diff) > 0
Pr(T < t) =           Pr(|T| > |t|) =        Pr(T > t) =

Alternatively, you may first compute the difference between the two variables, and then conduct a one-sample t-test. Note that the default confidence level, level(95), can be omitted.

. gen d=pre-post0
. ttest d=0

3.2 T-test in SAS

In the TTEST procedure, you have to use the PAIRED statement instead of the VAR statement. For the output of the following procedure, refer to the end of this section.

PROC TTEST DATA=temp.drug;
   PAIRED pre*post0;
RUN;

[4] $t_d = \frac{\bar{d} - D_0}{s_{\bar{d}}} \sim t(n - 1)$, where $\bar{d} = \frac{\sum d_i}{n}$, $s_d^2 = \frac{\sum (d_i - \bar{d})^2}{n - 1}$, and $s_{\bar{d}} = \frac{s_d}{\sqrt{n}}$.
The PAIRED statement provides various ways of comparing variables using asterisk (*) and colon (:) operators. The asterisk requests comparisons between each variable on the left with each variable on the right. The colon requests comparisons between the first variable on the left and the first on the right, the second on the left and the second on the right, and so forth. Consider the following examples.

PROC TTEST;
   PAIRED pre:post0;
   PAIRED (a b)*(c d);        /* Equivalent to PAIRED a*c a*d b*c b*d; */
   PAIRED (a b):(c d);        /* Equivalent to PAIRED a*c b*d; */
   PAIRED (a1-a10)*(b1-b10);
RUN;

The first PAIRED statement is the same as PAIRED pre*post0. The second and third PAIRED statements contrast the asterisk and colon operators. The hyphen (-) operator in the last statement indicates a1 through a10 and b1 through b10. Let us consider an example of the PAIRED statement.

PROC TTEST DATA=temp.drug;
   PAIRED (pre)*(post0-post1);
RUN;

The TTEST Procedure
Statistics
                  Lower CL           Upper CL   Lower CL            Upper CL
Difference     N    Mean     Mean      Mean     Std Dev   Std Dev   Std Dev   Std Err
pre - post0
pre - post1

T-Tests
Difference     DF   t Value   Pr > |t|
pre - post0
pre - post1

The first t statistic for pre versus post0 is identical to that of the previous section. The second for pre versus post1 rejects the null hypothesis of no mean difference at the .01 level (p<.000).

In order to use the UNIVARIATE and MEANS procedures, the difference between two paired variables should be computed in advance.

DATA temp.drug;
   SET temp.drug;
   d1 = pre - post0;
   d2 = pre - post1;
RUN;
PROC UNIVARIATE MU0=0 VARDEF=DF NORMAL;
   VAR d1 d2;
RUN;

PROC MEANS MEAN STD STDERR T PROBT CLM;
   VAR d1 d2;
RUN;

PROC TTEST ALPHA=.05;
   VAR d1 d2;
RUN;

3.3 T-test in SPSS

In SPSS, the PAIRS subcommand indicates a paired sample t-test.

T-TEST PAIRS = pre post0
  /CRITERIA = CI(.95)
  /MISSING = ANALYSIS.
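The equivalence between the paired test and a one-sample test on the differences can be sketched in Python. The function name paired_t and the four pre/post values are hypothetical and serve only to show the arithmetic; they are not the drug data used in this section.

```python
import math
import statistics


def paired_t(first, second, d0=0.0):
    """Paired t statistic: a one-sample test on d_i = first_i - second_i.

    Returns (t, degrees of freedom) for H0: mean difference = d0.
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)          # divisor n - 1
    se = s_d / math.sqrt(n)                # standard error of the mean difference
    return (d_bar - d0) / se, n - 1


# Hypothetical pre/post measurements.
pre = [10.0, 12.0, 9.0, 11.0]
post = [9.0, 10.0, 9.0, 8.0]
t_paired, df_paired = paired_t(pre, post)
print(t_paired, df_paired)
```

Computing the differences first and testing them against zero gives the same statistic, which is why generating d and running a one-sample test reproduces the paired output.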
4. Independent Samples with Equal Variances

You should check three assumptions first when testing the mean difference of two independent samples. First, the samples are drawn from normally distributed populations with unknown parameters. Second, the two samples are independent in the sense that they are drawn from different populations and/or the elements of one sample are not related to those of the other sample. Finally, the population variances of the two groups, $\sigma_1^2$ and $\sigma_2^2$, are equal. [5] If any one of these assumptions is violated, the t-test is not valid. An example here is to compare mean death rates from lung cancer between smokers and nonsmokers. Let us begin with discussing the equal variance assumption.

4.1 F test for Equal Variances

The folded form F test is widely used to examine whether two populations have the same variance. The statistic is

$\frac{s_L^2}{s_S^2} \sim F(n_L - 1, n_S - 1)$,

where L and S respectively indicate groups with larger and smaller sample variances. Unless the null hypothesis of equal variances is rejected, the pooled variance estimate $s_{pool}^2$ is used. The null hypothesis of the independent sample t-test is $H_0: \mu_1 - \mu_2 = D_0$.

$t = \frac{(\bar{y}_1 - \bar{y}_2) - D_0}{s_{pool}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t(n_1 + n_2 - 2)$, where $s_{pool}^2 = \frac{\sum (y_{1i} - \bar{y}_1)^2 + \sum (y_{2j} - \bar{y}_2)^2}{n_1 + n_2 - 2} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$.

When the equal variance assumption is violated, the t-test requires an approximation of the degrees of freedom. The null hypothesis and other components of the t-test, however, remain unchanged. Satterthwaite's approximation for the degrees of freedom is commonly used. Note that the approximation is a real number, not an integer.

$t' = \frac{\bar{y}_1 - \bar{y}_2 - D_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \sim t(df_{Satterthwaite})$, where $df_{Satterthwaite} = \frac{(n_1 - 1)(n_2 - 1)}{(n_1 - 1)(1 - c)^2 + (n_2 - 1)c^2}$ and $c = \frac{s_1^2 / n_1}{s_1^2 / n_1 + s_2^2 / n_2}$.

[5] $E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2$, $Var(\bar{x}_1 - \bar{x}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} = \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$.
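The folded F statistic, the pooled t statistic, and Satterthwaite's approximation can be sketched in Python from summary statistics alone. The function names and the summary values below are hypothetical; the sketch only mirrors the formulas of this section and does not compute p-values, which would need an F and a t distribution.

```python
import math


def folded_f(s1, n1, s2, n2):
    """Folded F: larger sample variance over the smaller one.

    Returns (F, numerator df, denominator df).
    """
    v1, v2 = s1 ** 2, s2 ** 2
    if v1 >= v2:
        return v1 / v2, n1 - 1, n2 - 1
    return v2 / v1, n2 - 1, n1 - 1


def pooled_t(m1, s1, n1, m2, s2, n2, d0=0.0):
    """Equal-variance t statistic and its n1 + n2 - 2 degrees of freedom."""
    df = n1 + n2 - 2
    s2_pool = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = math.sqrt(s2_pool * (1.0 / n1 + 1.0 / n2))
    return (m1 - m2 - d0) / se, df


def satterthwaite_t(m1, s1, n1, m2, s2, n2, d0=0.0):
    """Unequal-variance t statistic with Satterthwaite's degrees of freedom."""
    a, b = s1 ** 2 / n1, s2 ** 2 / n2
    c = a / (a + b)
    df = (n1 - 1) * (n2 - 1) / ((n1 - 1) * (1 - c) ** 2 + (n2 - 1) * c ** 2)
    return (m1 - m2 - d0) / math.sqrt(a + b), df


# Hypothetical summary statistics: mean, standard deviation, group size.
f_stat, f_num_df, f_den_df = folded_f(3.0, 10, 2.0, 15)
t_sat, df_sat = satterthwaite_t(10.0, 3.0, 10, 8.0, 2.0, 15)
print(f_stat, f_num_df, f_den_df, t_sat, df_sat)
```

When the two groups have equal variances and equal sizes, the two t statistics and their degrees of freedom coincide, which is a convenient sanity check on the formulas.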
The SAS TTEST procedure and SPSS T-TEST command conduct F tests for equal variance. SAS reports the folded form F statistic, whereas SPSS computes Levene's weighted F statistic. In STATA, the .oneway command produces Bartlett's statistic for the equal variance test. The following is an example of Bartlett's test that does not reject the null hypothesis of equal variance.

. oneway lung smoke

Analysis of Variance
Source            SS    df    MS    F    Prob > F
Between groups
Within groups
Total
Bartlett's test for equal variances: chi2(1) =      Prob>chi2 = 0.77

STATA, SAS, and SPSS all compute Satterthwaite's approximation of the degrees of freedom. In addition, the SAS TTEST procedure reports the Cochran-Cox approximation and the STATA .ttest command provides Welch's degrees of freedom.

4.2 T-test in STATA

With the .ttest command, you have to specify a grouping variable, smoke in this example, in the parentheses of the by() option.

. ttest lung, by(smoke) level(95)

Two-sample t test with equal variances
Group       Obs    Mean    Std. Err.    Std. Dev.    [95% Conf. Interval]
0
1
combined
diff
diff = mean(0) - mean(1)                  t =
Ho: diff = 0                              degrees of freedom = 42
Ha: diff < 0         Ha: diff != 0        Ha: diff > 0
Pr(T < t) =          Pr(|T| > |t|) =      Pr(T > t) =

Let us first check the equal variance assumption. The F statistic is $1.17 = s_L^2 / s_S^2 \sim F(21, 21)$. The degrees of freedom of the numerator and denominator are 21 (=22-1). The p-value of .773, virtually the same as that of Bartlett's test above, does not reject the null hypothesis of equal variance. Thus, the t-test here is valid (p<.0000).
$t = \frac{(\bar{y}_1 - \bar{y}_2) - 0}{s_{pool}\sqrt{\frac{1}{22} + \frac{1}{22}}} \sim t(22 + 22 - 2)$, where $s_{pool}^2 = \frac{(22 - 1)s_1^2 + (22 - 1)s_2^2}{22 + 22 - 2}$.

If only aggregate data of the two variables are available, use the .ttesti command and list the number of observations, mean, and standard deviation of the two variables.

. ttesti , level(95)

Suppose a data set is arranged differently (second type in Figure 1) so that one variable, smk_lung, has data for smokers and the other, non_lung, for nonsmokers. You have to use the unpaired option to indicate that the two variables are not paired. A grouping variable is not necessary here. Compare the following output with what is printed above.

. ttest smk_lung=non_lung, unpaired

Two-sample t test with equal variances
Variable    Obs    Mean    Std. Err.    Std. Dev.    [95% Conf. Interval]
smk_lung
non_lung
combined
diff
diff = mean(smk_lung) - mean(non_lung)    t =
Ho: diff = 0                              degrees of freedom = 42
Ha: diff < 0         Ha: diff != 0        Ha: diff > 0
Pr(T < t) =          Pr(|T| > |t|) =      Pr(T > t) =

The unpaired option is very useful since it enables you to conduct a t-test without additional data manipulation. You may run the .ttest command with the unpaired option to compare two variables, say leukemia and kidney, as independent samples in STATA. In SAS and SPSS, however, you have to stack up the two variables and generate a grouping variable before conducting t-tests.

. ttest leukemia=kidney, unpaired

Two-sample t test with equal variances
Variable    Obs    Mean    Std. Err.    Std. Dev.    [95% Conf. Interval]
leukemia
kidney
combined
diff
diff = mean(leukemia) - mean(kidney)      t =
Ho: diff = 0                              degrees of freedom = 86
Ha: diff < 0         Ha: diff != 0        Ha: diff > 0
Pr(T < t) =          Pr(|T| > |t|) =      Pr(T > t) =

The folded F statistic, the ratio of the larger sample variance to the smaller one, and its p-value (=.1797) do not reject the null hypothesis of equal variance. The large t statistic rejects the null hypothesis that death rates from leukemia and kidney cancers have the same mean.

4.3 T-test in SAS

The TTEST procedure by default examines the hypothesis of equal variances, and provides t statistics for either case. The procedure by default reports Satterthwaite's approximation for the degrees of freedom. Keep in mind that a variable to be tested is grouped by the variable that is specified in the CLASS statement.

PROC TTEST H0=0 ALPHA=.05 DATA=masil.smoking;
   CLASS smoke;
   VAR lung;
RUN;

The TTEST Procedure
Statistics
                         Lower CL           Upper CL   Lower CL            Upper CL
Variable  smoke       N    Mean     Mean      Mean     Std Dev   Std Dev   Std Dev
lung      0
lung      1
lung      Diff (1-2)

Statistics
Variable  smoke       Std Err   Minimum   Maximum
lung      0
lung      1
lung      Diff (1-2)

T-Tests
Variable  Method          Variances   DF   t Value   Pr > |t|
lung      Pooled          Equal                      <.0001
lung      Satterthwaite   Unequal                    <.0001

Equality of Variances
Variable  Method      Num DF   Den DF   F Value   Pr > F
lung      Folded F

The F test for equal variance does not reject the null hypothesis of equal variances. Thus, the t-test labeled as Pooled should be referred to in order to get the t statistic and its p-value. If the equal variance assumption is violated, the statistics of Satterthwaite and Cochran should be read.

If you have a summary data set with the values of variables (lung) and their frequency (count), specify the count variable in the FREQ statement.

PROC TTEST DATA=masil.smoking;
   CLASS smoke;
   VAR lung;
   FREQ count;
RUN;

Now, let us compare the death rates from leukemia and kidney cancers in the second data arrangement type of Figure 1. As mentioned before, you need to rearrange the data set to stack up the two variables into one and generate a grouping variable (first type in Figure 1).

DATA masil.smoking;
   SET masil.smoking;
   /* Write one row per state for leukemia ... */
   death = leukemia; leu_kid ='Leukemia'; OUTPUT;
   /* ... and a second row for kidney cancer */
   death = kidney; leu_kid ='Kidney'; OUTPUT;
   KEEP leu_kid death;
RUN;

PROC TTEST COCHRAN DATA=masil.smoking;
   CLASS leu_kid;
   VAR death;
RUN;

The TTEST Procedure
Statistics
                           Lower CL           Upper CL   Lower CL            Upper CL
Variable  leu_kid       N    Mean     Mean      Mean     Std Dev   Std Dev   Std Dev   Std Err
death     Kidney
death     Leukemia
death     Diff (1-2)

T-Tests
Variable  Method          Variances   DF   t Value   Pr > |t|
death     Pooled          Equal                      <.0001
death     Satterthwaite   Unequal                    <.0001
death     Cochran         Unequal                    <.0001

Equality of Variances
Variable  Method      Num DF   Den DF   F Value   Pr > F
death     Folded F

Compare this SAS output with that of STATA in the previous section.

4.4 T-test in SPSS

In the T-TEST command, you need to use the GROUPS subcommand in order to specify a grouping variable. SPSS reports Levene's F of .0000, which does not reject the null hypothesis of equal variance (p<.995).

T-TEST GROUPS = smoke(0 1)
  /VARIABLES = lung
  /MISSING = ANALYSIS
  /CRITERIA = CI(.95).
5. Independent Samples with Unequal Variances

If the assumption of equal variances is violated, we have to compute the adjusted t statistic using individual sample standard deviations rather than a pooled standard deviation. It is also necessary to use the Satterthwaite, Cochran-Cox (SAS), or Welch (STATA) approximations of the degrees of freedom. In this chapter, you compare mean death rates from kidney cancer between the west (south) and east (north).

5.1 T-test in STATA

As discussed earlier, let us check equality of variances using the .oneway command. The tabulate option produces a table of summary statistics for the groups.

. oneway kidney west, tabulate

            Summary of kidney
west        Mean    Std. Dev.    Freq.
0
1
Total

Analysis of Variance
Source            SS    df    MS    F    Prob > F
Between groups
Within groups
Total
Bartlett's test for equal variances: chi2(1) =      Prob>chi2 =

Bartlett's chi-squared statistic rejects the null hypothesis of equal variance at the .01 level. It is appropriate to use the unequal option in the .ttest command, which calculates Satterthwaite's approximation for the degrees of freedom. Unlike the SAS TTEST procedure, the .ttest command cannot specify a mean difference $D_0$ other than zero. Thus, the null hypothesis is that the mean difference is zero.

. ttest kidney, by(west) unequal level(95)

Two-sample t test with unequal variances
Group       Obs    Mean    Std. Err.    Std. Dev.    [95% Conf. Interval]
0
1
combined
diff
diff = mean(0) - mean(1)                  t = 2.7817
Ho: diff = 0                              Satterthwaite's degrees of freedom =
Ha: diff < 0         Ha: diff != 0        Ha: diff > 0
Pr(T < t) =          Pr(|T| > |t|) =      Pr(T > t) =

See Satterthwaite's approximation in the middle of the output. If you want to get Welch's approximation, use the welch option together with the unequal option; without the unequal option, welch is ignored.

. ttest kidney, by(west) unequal welch

Two-sample t test with unequal variances
Group       Obs    Mean    Std. Err.    Std. Dev.    [95% Conf. Interval]
0
1
combined
diff
diff = mean(0) - mean(1)                  t = 2.7817
Ho: diff = 0                              Welch's degrees of freedom =
Ha: diff < 0         Ha: diff != 0        Ha: diff > 0
Pr(T < t) =          Pr(|T| > |t|) =      Pr(T > t) =

Satterthwaite's approximation is slightly smaller than Welch's. Again, keep in mind that these approximations are not integers, but real numbers. The t statistic 2.7817 and its p-value .0086 reject the null hypothesis of equal population means. The north and east have larger death rates from kidney cancer per 100 thousand people than the south and west.

For aggregate data, use the .ttesti command with the necessary options.

. ttesti , unequal welch

As mentioned earlier, the unpaired option of the .ttest command directly compares two variables without data manipulation. The option treats the two variables as independent of each other. The following is an example of the unpaired and unequal options.

. ttest bladder=kidney, unpaired unequal welch

Two-sample t test with unequal variances
Variable    Obs    Mean    Std. Err.    Std. Dev.    [95% Conf. Interval]
bladder
kidney
combined
diff
diff = mean(bladder) - mean(kidney)       t =
Ho: diff = 0                              Welch's degrees of freedom =
Ha: diff < 0         Ha: diff != 0        Ha: diff > 0
Pr(T < t) =          Pr(|T| > |t|) =      Pr(T > t) =

The folded F statistic, the ratio of the larger sample variance to the smaller one, rejects the null hypothesis of equal variance (p<.0001). If the welch option is omitted, Satterthwaite's degrees of freedom will be produced instead.

For aggregate data, again, use the .ttesti command without the unpaired option.

. ttesti , unequal welch level(95)

5.2 T-test in SAS

The TTEST procedure reports statistics for cases of both equal and unequal variance. You may add the COCHRAN option to compute the Cochran-Cox approximation for the degrees of freedom.

PROC TTEST COCHRAN DATA=masil.smoking;
   CLASS west;
   VAR kidney;
RUN;

The TTEST Procedure
Statistics
                          Lower CL           Upper CL   Lower CL            Upper CL
Variable  s_west       N    Mean     Mean      Mean     Std Dev   Std Dev   Std Dev
kidney    0
kidney    1
kidney    Diff (1-2)

Statistics
Variable  west         Std Err   Minimum   Maximum
kidney    0
kidney    1
kidney    Diff (1-2)

T-Tests
Variable  Method          Variances   DF   t Value   Pr > |t|
kidney    Pooled          Equal
kidney    Satterthwaite   Unequal
kidney    Cochran         Unequal

Equality of Variances
Variable  Method      Num DF   Den DF   F Value   Pr > F
kidney    Folded F
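The difference between the Satterthwaite and Welch degrees of freedom discussed in this chapter can be illustrated with a short Python sketch. It rests on an assumption that should be checked against the STATA manual: Welch's approximation is taken to be the formula with n+1 divisors and a subtraction of 2, which is how the welch option is usually documented. The function names and the summary statistics are hypothetical.

```python
def satterthwaite_df(s1, n1, s2, n2):
    """Satterthwaite's approximate degrees of freedom (standard form)."""
    a, b = s1 ** 2 / n1, s2 ** 2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


def welch_df(s1, n1, s2, n2):
    """Welch's approximate degrees of freedom.

    Assumption: n + 1 divisors and a final subtraction of 2, as
    documented for the welch option of the STATA ttest command.
    """
    a, b = s1 ** 2 / n1, s2 ** 2 / n2
    return -2.0 + (a + b) ** 2 / (a ** 2 / (n1 + 1) + b ** 2 / (n2 + 1))


# Hypothetical standard deviations and group sizes.
df_s = satterthwaite_df(3.0, 10, 2.0, 15)
df_w = welch_df(3.0, 10, 2.0, 15)
print(round(df_s, 4), round(df_w, 4))
```

Both results are real numbers rather than integers, and for these inputs the Satterthwaite value is the smaller of the two, in line with the pattern described for the STATA output.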
More informationGrowth Is Good for the Poor
Forthcoming: Journal of Economic Growth Growth Is Good for the Poor David Dollar Aart Kraay Development Research Group The World Bank First Draft: March 2000 This Draft: March 2002 Abstract: Average incomes
More informationHow and Why Do Teacher Credentials Matter for Student Achievement? C h a r l e s T. Clotfelter
How and Why Do Teacher Credentials Matter for Student Achievement? C h a r l e s T. Clotfelter H e l e n F. Ladd J a c o b L. Vigdor w o r k i n g p a p e r 2 m a r c h 2 0 0 7 How and why do teacher credentials
More informationSome statistical heresies
The Statistician (1999) 48, Part 1, pp. 1±40 Some statistical heresies J. K. Lindsey Limburgs Universitair Centrum, Diepenbeek, Belgium [Read before The Royal Statistical Society on Wednesday, July 15th,
More informationWe show that social scientists often do not take full advantage of
Making the Most of Statistical Analyses: Improving Interpretation and Presentation Gary King Michael Tomz Jason Wittenberg Harvard University Harvard University Harvard University Social scientists rarely
More informationRegression. Chapter 2. 2.1 Weightspace View
Chapter Regression Supervised learning can be divided into regression and classification problems. Whereas the outputs for classification are discrete class labels, regression is concerned with the prediction
More informationNew Features in JMP 9
Version 9 New Features in JMP 9 The real voyage of discovery consists not in seeking new landscapes, but in having new eyes. Marcel Proust JMP, A Business Unit of SAS SAS Campus Drive Cary, NC 27513 The
More informationEVALUATION OF GAUSSIAN PROCESSES AND OTHER METHODS FOR NONLINEAR REGRESSION. Carl Edward Rasmussen
EVALUATION OF GAUSSIAN PROCESSES AND OTHER METHODS FOR NONLINEAR REGRESSION Carl Edward Rasmussen A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy, Graduate
More information