# Comparing Group Means: The T-test and One-way ANOVA Using STATA, SAS, and SPSS


## Transcription

Hun Myoung Park

The Trustees of Indiana University

This document summarizes the method of comparing group means and illustrates how to conduct the t-test and one-way ANOVA using STATA 9.0, SAS 9.1, and SPSS 13.0.

1. Introduction
2. Univariate Samples
3. Paired (Dependent) Samples
4. Independent Samples with Equal Variances
5. Independent Samples with Unequal Variances
6. One-way ANOVA, GLM, and Regression
7. Conclusion

### 1. Introduction

The t-test and analysis of variance (ANOVA) compare group means. The mean of a variable to be compared should be substantively interpretable. A t-test may examine gender differences in average salary or racial (white versus black) differences in average annual income. The left-hand side (LHS) variable to be tested should be interval or ratio, whereas the right-hand side (RHS) variable should be binary (categorical).

#### 1.1 T-test and ANOVA

While the t-test is limited to comparing the means of two groups, one-way ANOVA can compare more than two groups. Therefore, the t-test is considered a special case of one-way ANOVA. These analyses do not, however, necessarily imply any causality (i.e., a causal relationship between the left-hand and right-hand side variables). Table 1 compares the t-test and one-way ANOVA.

Table 1. Comparison between the T-test and One-way ANOVA

| | T-test | One-way ANOVA |
|---|---|---|
| LHS (Dependent) | Interval or ratio variable | Interval or ratio variable |
| RHS (Independent) | Binary variable with only two groups | Categorical variable |
| Null Hypothesis | $\mu_1 = \mu_2$ | $\mu_1 = \mu_2 = \mu_3 = \dots$ |
| Prob. Distribution* | T distribution | F distribution |

\* In the case of one degree of freedom in the numerator, $F = t^2$.

The t-test assumes that samples are randomly drawn from normally distributed populations with unknown population means. Otherwise, their means are no longer the best measures of central tendency and the t-test will not be valid. The Central Limit Theorem says, however, that the distributions of $\bar{y}_1$ and $\bar{y}_2$ are approximately normal when N is large. When $n_1 + n_2 \ge 30$, in practice, you do not need to worry too much about the normality assumption.

You may numerically test the normality assumption using the Shapiro-Wilk W (N<=2000), Shapiro-Francia W (N<=5000), Kolmogorov-Smirnov D (N>2000), and Jarque-Bera tests. If N is small and the null hypothesis of normality is rejected, you may try such nonparametric methods as the Kolmogorov-Smirnov test, Kruskal-Wallis test, Wilcoxon Rank-Sum Test, or Log-Rank Test, depending on the circumstances.

#### 1.2 T-test in SAS, STATA, and SPSS

In STATA, the .ttest and .ttesti commands are used to conduct t-tests, whereas the .anova and .oneway commands perform one-way ANOVA. SAS has the TTEST procedure for the t-test, but the UNIVARIATE and MEANS procedures also have options for the t-test. SAS provides various procedures for the analysis of variance, such as the ANOVA, GLM, and MIXED procedures. The ANOVA procedure can handle balanced data only, while GLM and MIXED can analyze either balanced or unbalanced data (that is, data with the same or different numbers of observations across groups). However, unbalanced data do not cause any problems in the t-test and one-way ANOVA. In SPSS, the T-TEST, ONEWAY, and UNIANOVA commands are used to perform the t-test and one-way ANOVA.

Table 2 summarizes STATA commands, SAS procedures, and SPSS commands that are associated with the t-test and one-way ANOVA.

Table 2. Related Procedures and Commands in STATA, SAS, and SPSS

| | STATA 9.0 SE | SAS 9.1 | SPSS 13.0 |
|---|---|---|---|
| Normality Test | .sktest; .swilk; .sfrancia | UNIVARIATE | EXAMINE |
| Equal Variance | .oneway | TTEST | T-TEST |
| Nonparametric | .ksmirnov; .kwallis | NPAR1WAY | NPAR TESTS |
| T-test | .ttest | TTEST; MEANS | T-TEST |
| ANOVA | .anova; .oneway | ANOVA | ONEWAY |
| GLM* | | GLM; MIXED | UNIANOVA |

\* The STATA .glm command is not used for the t-test, but for the generalized linear model.
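The footnote to Table 1 relates the two tests: with two groups, the one-way ANOVA F statistic equals the square of the pooled-variance t statistic. The following is a minimal Python sketch of that identity, not part of the original handout. The function names and the two small samples are hypothetical and serve only as illustration.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(g1, g2):
    """Two-sample t statistic using the pooled variance estimate."""
    n1, n2 = len(g1), len(g2)
    s2_pool = ((n1 - 1) * variance(g1) + (n2 - 1) * variance(g2)) / (n1 + n2 - 2)
    return (mean(g1) - mean(g2)) / sqrt(s2_pool * (1 / n1 + 1 / n2))


def oneway_f(*groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical data, chosen only to illustrate the identity.
group_a = [21.0, 19.5, 23.1, 20.4, 22.2]
group_b = [17.8, 18.9, 16.5, 19.2]

t_stat = pooled_t(group_a, group_b)
f_stat = oneway_f(group_a, group_b)
print(t_stat ** 2, f_stat)  # the two values agree up to rounding error
```

The identity holds only for the pooled (equal variance) t statistic; the unequal-variance statistic of Section 5 has no such one-way ANOVA counterpart.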
#### 1.3 Data Arrangement

There are two types of data arrangement for t-tests (Figure 1). The first data arrangement has a variable to be tested and a grouping variable to classify groups (0 or 1). The second, appropriate especially for paired samples, has two variables to be tested. The two variables in this type are not, however, necessarily paired or balanced. SAS and SPSS prefer the first data arrangement, whereas STATA can handle either type flexibly. Note that the numbers of observations across groups are not necessarily equal.
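The two arrangements carry the same information, and moving between them is a simple stacking operation. The Python sketch below shows one way to do it; the column names `x` and `y`, the group codes, and the values are hypothetical and are not taken from the data set used in this document.

```python
def stack(wide):
    """Second arrangement (one column per group) -> first arrangement
    (one value column plus a 0/1/... group code column)."""
    values, groups = [], []
    for code, column in enumerate(wide.values()):
        values.extend(column)
        groups.extend([code] * len(column))
    return values, groups


def unstack(values, groups):
    """First arrangement -> one list of values per group code."""
    columns = {}
    for value, code in zip(values, groups):
        columns.setdefault(code, []).append(value)
    return columns


# Hypothetical unbalanced data: three observations in x, two in y.
wide = {"x": [1.2, 3.4, 2.2], "y": [2.8, 1.9]}

values, groups = stack(wide)
print(values)  # [1.2, 3.4, 2.2, 2.8, 1.9]
print(groups)  # [0, 0, 0, 1, 1]
```

Because the groups need not be the same size, the stacked form is the more general of the two, which may be one reason SAS and SPSS expect it.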

Figure 1. Two Types of Data Arrangement. The first type has one Variable column and one Group column; the second type has two columns, Variable1 and Variable2, holding the values of x and y respectively.

The data set used here is taken from J. F. Fraumeni's study on cigarette smoking and cancer (Fraumeni 1968). The data are per capita numbers of cigarettes sold by 43 states and the District of Columbia in 1960 together with death rates per hundred thousand people from various forms of cancer. Two variables were added to categorize states into two groups. See the appendix for the details.

### 2. Univariate Samples

The univariate-sample or one-sample t-test determines whether an unknown population mean $\mu$ differs from a hypothesized value $c$ that is commonly set to zero: $H_0: \mu = c$. The t statistic follows Student's T probability distribution with $n-1$ degrees of freedom,

$$t = \frac{\bar{y} - c}{s_{\bar{y}}} \sim t(n-1),$$

where $y$ is a variable to be tested and $n$ is the number of observations.¹

Suppose you want to test if the population mean of the death rates from lung cancer is 20 per 100,000 people at the .01 significance level. Note the default significance level used in most software is the .05 level.

#### 2.1 T-test in STATA

The .ttest command conducts t-tests in an easy and flexible manner. For a univariate sample test, the command requires that a hypothesized value be explicitly specified. The level() option indicates the confidence level as a percentage. The 99 percent confidence level is equivalent to the .01 significance level.

    . ttest lung=20, level(99)

[Stata output: One-sample t test of lung, listing Obs, Mean, Std. Err., Std. Dev., and the 99% Conf. Interval, followed by the t statistic for Ho: mean = 20 with 43 degrees of freedom and the p-values Pr(T < t), Pr(|T| > |t|), and Pr(T > t) for Ha: mean < 20, Ha: mean != 20, and Ha: mean > 20.]

STATA first lists descriptive statistics of the variable lung, including the mean and standard deviation of the 44 observations. The t statistic is the difference between the sample mean and the hypothesized value, divided by the standard error. Finally, the degrees of freedom are 43 = 44 - 1.

There are three t-tests at the bottom of the output above. The first and third are one-tailed tests, whereas the second is a two-tailed test. The t statistic and its large p-value do not reject the null hypothesis that the population mean of the death rate from lung cancer is 20 at the .01 level. The mean of the death rate may be 20 per 100,000 people. Note that the hypothesized value 20 falls into the 99 percent confidence interval.²

¹ $\bar{y} = \frac{\sum y_i}{n}$, $s^2 = \frac{\sum (y_i - \bar{y})^2}{n-1}$, and standard error $s_{\bar{y}} = \frac{s}{\sqrt{n}}$.

² The 99 percent confidence interval of the mean is $\bar{y} \pm t_{\alpha/2}\, s_{\bar{y}}$, here the sample mean $\pm\ 2.695 \times .6374$, where 2.695 is the critical value with 43 degrees of freedom at the .01 level in the two-tailed test.
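The formulas in notes 1 and 2 can be checked by hand. The following Python sketch computes the one-sample t statistic, its degrees of freedom, and a confidence interval from a supplied critical value. The eight observations are hypothetical and are not the lung cancer data; the critical value 3.499 is the two-tailed .01-level value for 7 degrees of freedom taken from a standard t table.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(y, c=0.0):
    """Return (t, degrees of freedom, standard error) for H0: mu = c."""
    n = len(y)
    std_err = stdev(y) / sqrt(n)  # stdev() uses the n - 1 divisor
    return (mean(y) - c) / std_err, n - 1, std_err


def confidence_interval(y, t_crit):
    """Two-tailed interval: mean plus or minus t_crit times the standard error."""
    half_width = t_crit * stdev(y) / sqrt(len(y))
    return mean(y) - half_width, mean(y) + half_width


# Hypothetical death rates per 100,000 (illustration only).
rates = [18.0, 21.5, 19.2, 22.4, 20.1, 17.6, 23.0, 19.8]

t, df, se = one_sample_t(rates, c=20.0)
low, high = confidence_interval(rates, t_crit=3.499)
print(t, df, se)
print(low, high)
```

The two results are linked: the hypothesized value lies inside the interval exactly when the absolute t statistic does not exceed the critical value used to build it.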

If you just have the aggregate data (i.e., the number of observations, mean, and standard deviation of the sample), use the .ttesti command to replicate the t-test above. Note the hypothesized value is specified at the end of the summary statistics, so the arguments are the number of observations, the mean, the standard deviation, and the hypothesized value, in that order.

    . ttesti #obs #mean #sd #val, level(99)

#### 2.2 T-test Using the SAS TTEST Procedure

The TTEST procedure conducts various types of t-tests in SAS. The H0 option specifies a hypothesized value, whereas the ALPHA option indicates a significance level. If omitted, the default values zero and .05 respectively are assumed.

    PROC TTEST H0=20 ALPHA=.01 DATA=masil.smoking;
       VAR lung;
    RUN;

[SAS output: The TTEST Procedure. Statistics for lung (N, Lower CL Mean, Mean, Upper CL Mean, Lower CL Std Dev, Std Dev, Upper CL Std Dev, Std Err), followed by T-Tests (DF, t Value, Pr > |t|).]

The TTEST procedure reports descriptive statistics followed by a two-tailed t-test (Pr > |t|).

You may have a summary data set containing the values of a variable (lung) and their frequencies (count). The FREQ statement of the TTEST procedure provides the solution for this case.

    PROC TTEST H0=20 ALPHA=.01 DATA=masil.smoking;
       VAR lung;
       FREQ count;
    RUN;

#### 2.3 T-test Using the SAS UNIVARIATE and MEANS Procedures

The SAS UNIVARIATE and MEANS procedures also conduct a t-test for a univariate sample. The UNIVARIATE procedure is basically designed to produce a variety of descriptive statistics of a variable. Its MU0 option tells the procedure to perform a t-test using the hypothesized value specified. VARDEF=DF specifies the divisor (degrees of freedom) used in computing the variance (standard deviation).³ The NORMAL option examines whether the variable is normally distributed.

    PROC UNIVARIATE MU0=20 VARDEF=DF NORMAL ALPHA=.01 DATA=masil.smoking;
       VAR lung;
    RUN;

[SAS output: The UNIVARIATE Procedure, Variable: lung, in six blocks: Moments (N 44, Sum Weights 44, Mean, Std Deviation, Variance, Skewness, Kurtosis, and related sums); Basic Statistical Measures (location and variability); Tests for Location: Mu0=20 (Student's t, Sign M, Signed Rank S); Tests for Normality (Shapiro-Wilk W, Kolmogorov-Smirnov D, Cramer-von Mises W-Sq, Anderson-Darling A-Sq); Quantiles (Definition 5); Extreme Observations.]

The third block of the output above reports a t statistic and its p-value. The fourth block contains several statistics for the normality test. Since N is less than 2,000, you should read the Shapiro-Wilk W, which suggests that lung is normally distributed (p<.535).

The MEANS procedure also conducts t-tests using the T and PROBT options, which request the t statistic and its two-tailed p-value. The CLM option produces the two-tailed confidence interval (or upper and lower limits). The MEAN, STD, and STDERR options respectively print the sample mean, standard deviation, and standard error.

    PROC MEANS MEAN STD STDERR T PROBT CLM VARDEF=DF ALPHA=.01 DATA=masil.smoking;
       VAR lung;
    RUN;

[SAS output: The MEANS Procedure, Analysis Variable: lung, with columns Mean, Std Dev, Std Error, t Value, Pr > |t|, Lower 99% CL for Mean, and Upper 99% CL for Mean.]

³ VARDEF=N uses N as a divisor, while VARDEF=WDF specifies the sum of weights minus one.

The MEANS procedure does not, however, have an option to specify a hypothesized value other than zero. Thus, the null hypothesis here is that the population mean of the death rate from lung cancer is zero. The t statistic is the sample mean divided by its standard error. The large t statistic and small p-value reject the null hypothesis, reporting a consistent conclusion.

#### 2.4 T-test in SPSS

SPSS has the T-TEST command for t-tests. The /TESTVAL subcommand specifies the value with which the sample mean is compared, whereas the /VARIABLES subcommand lists the variables to be tested. Like STATA, SPSS specifies a confidence level rather than a significance level in the /CRITERIA=CI() subcommand.

    T-TEST
      /TESTVAL = 20
      /VARIABLES = lung
      /MISSING = ANALYSIS
      /CRITERIA = CI(.99).
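The VARDEF= option described in Section 2.3 only changes the divisor of the sum of squared deviations. As a rough Python analogue (an illustration, not a statement about SAS internals), `statistics.variance` divides by n - 1, as VARDEF=DF does, and `statistics.pvariance` divides by n, as VARDEF=N does. The data values are hypothetical.

```python
from statistics import pvariance, variance

# Hypothetical sample (illustration only).
y = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(y)

var_df = variance(y)   # divisor n - 1, analogous to VARDEF=DF
var_n = pvariance(y)   # divisor n, analogous to VARDEF=N

# The two estimates differ only by the factor (n - 1) / n.
print(var_df, var_n, var_df * (n - 1) / n)
```

Since the t statistic is built on the n - 1 divisor, VARDEF=DF is the setting that matches the formulas in Section 2.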

### 3. Paired (Dependent) Samples

When two variables are not independent, but paired, the difference of these two variables, $d_i = y_{1i} - y_{2i}$, is treated as if it were a single sample. This test is appropriate for pre-post treatment responses. The null hypothesis is that the true mean difference of the two variables is $D_0$, $H_0: \mu_d = D_0$.⁴ The difference is typically assumed to be zero unless explicitly specified.

#### 3.1 T-test in STATA

In order to conduct a paired sample t-test, you need to list two variables separated by an equal sign. The interpretation of the t-test remains almost unchanged. The t statistic at 35 degrees of freedom does not reject the null hypothesis that the difference is zero.

    . ttest pre=post0, level(95)

[Stata output: Paired t test, listing Obs, Mean, Std. Err., Std. Dev., and the 95% Conf. Interval for pre, post0, and diff, followed by the t statistic for Ho: mean(diff) = 0 with 35 degrees of freedom, where mean(diff) = mean(pre - post0), and the p-values Pr(T < t), Pr(|T| > |t|), and Pr(T > t).]

Alternatively, you may first compute the difference between the two variables, and then conduct a one-sample t-test. Note that the default confidence level, level(95), can be omitted.

    . gen d=pre-post0
    . ttest d=0

#### 3.2 T-test in SAS

In the TTEST procedure, you have to use the PAIRED statement instead of the VAR statement. For the output of the following procedure, refer to the end of this section.

    PROC TTEST DATA=temp.drug;
       PAIRED pre*post0;
    RUN;

⁴ $t_d = \frac{\bar{d} - D_0}{s_{\bar{d}}} \sim t(n-1)$, where $\bar{d} = \frac{\sum d_i}{n}$, $s_d^2 = \frac{\sum (d_i - \bar{d})^2}{n-1}$, and $s_{\bar{d}} = \frac{s_d}{\sqrt{n}}$.
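The statistic in note 4 is a one-sample t-test applied to the differences. The Python sketch below makes that explicit; the function name and the pre/post values are hypothetical and are not the drug data used in this section.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(y1, y2, d0=0.0):
    """Paired-sample t statistic and degrees of freedom for H0: mu_d = d0."""
    if len(y1) != len(y2):
        raise ValueError("paired samples must have the same length")
    d = [a - b for a, b in zip(y1, y2)]  # d_i = y1_i - y2_i
    n = len(d)
    std_err = stdev(d) / sqrt(n)         # s_d / sqrt(n)
    return (mean(d) - d0) / std_err, n - 1


# Hypothetical pre- and post-treatment responses (illustration only).
pre = [10.0, 12.0, 9.0, 11.0]
post = [8.0, 11.0, 9.0, 8.0]

t_paired, df_paired = paired_t(pre, post)
print(t_paired, df_paired)
```

If every difference is identical, the standard deviation of the differences is zero and the statistic is undefined; the sketch does not guard against that case.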

The PAIRED statement provides various ways of comparing variables using asterisk (*) and colon (:) operators. The asterisk requests comparisons between each variable on the left with each variable on the right. The colon requests comparisons between the first variable on the left and the first on the right, the second on the left and the second on the right, and so forth. Consider the following examples.

    PROC TTEST;
       PAIRED pre:post0;
       PAIRED (a b)*(c d);      /* Equivalent to PAIRED a*c a*d b*c b*d; */
       PAIRED (a b):(c d);      /* Equivalent to PAIRED a*c b*d; */
       PAIRED (a1-a10)*(b1-b10);
    RUN;

The first PAIRED statement is the same as PAIRED pre*post0. The second and the third PAIRED statements contrast the asterisk and colon operators. The hyphen (-) operator in the last statement indicates a1 through a10 and b1 through b10. Let us consider an example of the PAIRED statement.

    PROC TTEST DATA=temp.drug;
       PAIRED (pre)*(post0-post1);
    RUN;

[SAS output: The TTEST Procedure. Statistics for the differences pre - post0 and pre - post1 (N, Lower CL Mean, Mean, Upper CL Mean, Lower CL Std Dev, Std Dev, Upper CL Std Dev, Std Err), followed by T-Tests (DF, t Value, Pr > |t|) for both differences.]

The first t statistic for pre versus post0 is identical to that of the previous section. The second for pre versus post1 rejects the null hypothesis of no mean difference at the .01 level (p<.000).

In order to use the UNIVARIATE and MEANS procedures, the difference between two paired variables should be computed in advance.

    DATA temp.drug;
       SET temp.drug;
       d1 = pre - post0;
       d2 = pre - post1;
    RUN;

    PROC UNIVARIATE MU0=0 VARDEF=DF NORMAL;
       VAR d1 d2;
    RUN;

    PROC MEANS MEAN STD STDERR T PROBT CLM;
       VAR d1 d2;
    RUN;

    PROC TTEST ALPHA=.05;
       VAR d1 d2;
    RUN;

#### 3.3 T-test in SPSS

In SPSS, the PAIRS subcommand indicates a paired sample t-test.

    T-TEST PAIRS = pre post0
      /CRITERIA = CI(.95)
      /MISSING = ANALYSIS.
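The difference between the asterisk and colon operators of the PAIRED statement in Section 3.2 is the difference between a cross product and an element-wise pairing. The Python sketch below mimics only that expansion rule as described in the text; the function names are hypothetical, and how SAS treats lists of unequal length or duplicate pairs is not modelled here.

```python
from itertools import product


def expand_asterisk(left, right):
    """Each variable on the left compared with each variable on the right."""
    return list(product(left, right))


def expand_colon(left, right):
    """First with first, second with second, and so forth."""
    # Assumption: both lists have the same length; zip() would silently
    # truncate otherwise, which may not match SAS behavior.
    return list(zip(left, right))


print(expand_asterisk(["a", "b"], ["c", "d"]))
# [('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd')]
print(expand_colon(["a", "b"], ["c", "d"]))
# [('a', 'c'), ('b', 'd')]
```

With one variable on each side, as in `PAIRED pre*post0` and `PAIRED pre:post0`, both rules yield the same single comparison.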

### 4. Independent Samples with Equal Variances

You should check three assumptions first when testing the mean difference of two independent samples. First, the samples are drawn from normally distributed populations with unknown parameters. Second, the two samples are independent in the sense that they are drawn from different populations and/or the elements of one sample are not related to those of the other sample. Finally, the population variances of the two groups, $\sigma_1^2$ and $\sigma_2^2$, are equal.⁵ If any one of the assumptions is violated, the t-test is not valid.

An example here is to compare mean death rates from lung cancer between smokers and nonsmokers. Let us begin by discussing the equal variance assumption.

#### 4.1 F test for Equal Variances

The folded form F test is widely used to examine whether two populations have the same variance. The statistic is

$$F = \frac{s_L^2}{s_S^2} \sim F(n_L - 1,\ n_S - 1),$$

where L and S respectively indicate the groups with the larger and smaller sample variances. Unless the null hypothesis of equal variances is rejected, the pooled variance estimate $s_{pool}^2$ is used. The null hypothesis of the independent sample t-test is $H_0: \mu_1 - \mu_2 = D_0$.

$$t = \frac{(\bar{y}_1 - \bar{y}_2) - D_0}{s_{pool}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t(n_1 + n_2 - 2),$$

where

$$s_{pool}^2 = \frac{\sum (y_{1i} - \bar{y}_1)^2 + \sum (y_{2j} - \bar{y}_2)^2}{n_1 + n_2 - 2} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}.$$

When the assumption is violated, the t-test requires an approximation of the degrees of freedom. The null hypothesis and other components of the t-test, however, remain unchanged. Satterthwaite's approximation for the degrees of freedom is commonly used. Note that the approximation is a real number, not an integer.

$$t' = \frac{\bar{y}_1 - \bar{y}_2 - D_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \sim t(df_{Satterthwaite}),$$

where

$$df_{Satterthwaite} = \frac{(n_1 - 1)(n_2 - 1)}{(n_1 - 1)(1 - c)^2 + (n_2 - 1)c^2} \quad \text{and} \quad c = \frac{s_1^2 / n_1}{s_1^2 / n_1 + s_2^2 / n_2}.$$

⁵ $E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2$ and $Var(\bar{x}_1 - \bar{x}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} = \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$.
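The pooled and Satterthwaite formulas above translate directly into code. The following Python sketch implements both under the definitions of Section 4.1; the function names and sample values are hypothetical. When the two samples have equal sizes and equal sample variances, the two statistics coincide and the Satterthwaite degrees of freedom reduce to $n_1 + n_2 - 2$, which is a convenient sanity check.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(y1, y2, d0=0.0):
    """Equal-variance t statistic and its n1 + n2 - 2 degrees of freedom."""
    n1, n2 = len(y1), len(y2)
    s2_pool = ((n1 - 1) * variance(y1) + (n2 - 1) * variance(y2)) / (n1 + n2 - 2)
    t = (mean(y1) - mean(y2) - d0) / sqrt(s2_pool * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def satterthwaite_t(y1, y2, d0=0.0):
    """Unequal-variance t statistic with Satterthwaite's degrees of freedom."""
    n1, n2 = len(y1), len(y2)
    a, b = variance(y1) / n1, variance(y2) / n2   # s1^2/n1 and s2^2/n2
    t = (mean(y1) - mean(y2) - d0) / sqrt(a + b)
    c = a / (a + b)
    df = (n1 - 1) * (n2 - 1) / ((n1 - 1) * (1 - c) ** 2 + (n2 - 1) * c ** 2)
    return t, df                                  # df is a real number


# Hypothetical samples (illustration only).
y1 = [1.0, 2.0, 3.0, 4.0]
y2 = [2.0, 3.0, 4.0, 5.0]

t_pool, df_pool = pooled_t(y1, y2)
t_satt, df_satt = satterthwaite_t(y1, y2)
print(t_pool, df_pool)
print(t_satt, df_satt)
```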

The SAS TTEST procedure and SPSS T-TEST command conduct F tests for equal variance. SAS reports the folded form F statistic, whereas SPSS computes Levene's weighted F statistic. In STATA, the .oneway command produces Bartlett's statistic for the equal variance test. The following is an example of Bartlett's test that does not reject the null hypothesis of equal variance.

    . oneway lung smoke

[Stata output: Analysis of Variance table (Source, SS, df, MS, F, Prob > F for Between groups, Within groups, and Total), followed by Bartlett's test for equal variances with chi2(1) and Prob>chi2 = 0.77.]

STATA, SAS, and SPSS all compute Satterthwaite's approximation of the degrees of freedom. In addition, the SAS TTEST procedure reports the Cochran-Cox approximation and the STATA .ttest command provides Welch's degrees of freedom.

#### 4.2 T-test in STATA

With the .ttest command, you have to specify a grouping variable (smoke in this example) in the parentheses of the by option.

    . ttest lung, by(smoke) level(95)

[Stata output: Two-sample t test with equal variances, listing Obs, Mean, Std. Err., Std. Dev., and the 95% Conf. Interval for each group, combined, and diff, followed by the t statistic for Ho: diff = 0 with 42 degrees of freedom, where diff = mean(0) - mean(1), and the p-values Pr(T < t), Pr(|T| > |t|), and Pr(T > t).]

Let us first check the equal variance. The F statistic is $1.17 = s_L^2 / s_S^2 \sim F(21, 21)$. The degrees of freedom of the numerator and denominator are 21 (=22-1). The p-value of .773, virtually the same as that of Bartlett's test above, does not reject the null hypothesis of equal variance. Thus, the t-test here is valid (p<.0000).
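The folded form F statistic used in this check always places the larger sample variance in the numerator, so its degrees of freedom follow the group with the larger variance. A minimal Python sketch of that rule, with hypothetical function name and data, might look like this:

```python
from statistics import variance


def folded_f(y1, y2):
    """Folded form F: larger sample variance over the smaller one.

    Returns (F, numerator df, denominator df).
    """
    v1, v2 = variance(y1), variance(y2)
    if v1 >= v2:
        return v1 / v2, len(y1) - 1, len(y2) - 1
    return v2 / v1, len(y2) - 1, len(y1) - 1


# Hypothetical samples (illustration only).
group0 = [1.0, 2.0, 3.0, 4.0]
group1 = [2.0, 4.0, 6.0, 8.0, 10.0]

f_stat, num_df, den_df = folded_f(group0, group1)
print(f_stat, num_df, den_df)
```

By construction the statistic is never below one, so only the upper tail of the F distribution is relevant when looking up its p-value. The sketch does not handle a sample with zero variance.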

The t statistic follows from the pooled formula in Section 4.1 with $n_1 = n_2 = 22$ and $D_0 = 0$, so it is distributed as $t(22 + 22 - 2) = t(42)$.

If only aggregate data of the two variables are available, use the .ttesti command and list the number of observations, mean, and standard deviation of the two variables.

    . ttesti #obs1 #mean1 #sd1 #obs2 #mean2 #sd2, level(95)

Suppose a data set is differently arranged (second type in Figure 1) so that one variable smk_lung has data for smokers and the other non_lung for non-smokers. You have to use the unpaired option to indicate that the two variables are not paired. A grouping variable here is not necessary. Compare the following output with what is printed above.

    . ttest smk_lung=non_lung, unpaired

[Stata output: Two-sample t test with equal variances for smk_lung and non_lung, with the same layout as above; diff = mean(smk_lung) - mean(non_lung), Ho: diff = 0, 42 degrees of freedom.]

This unpaired option is very useful since it enables you to conduct a t-test without additional data manipulation. You may run the .ttest command with the unpaired option to compare two variables, say leukemia and kidney, as independent samples in STATA. In SAS and SPSS, however, you have to stack up the two variables and generate a grouping variable before running t-tests.

    . ttest leukemia=kidney, unpaired

[Stata output: Two-sample t test with equal variances for leukemia and kidney; diff = mean(leukemia) - mean(kidney), Ho: diff = 0, 86 degrees of freedom.]

The F statistic (the ratio of the larger to the smaller sample variance) and its p-value (=.1797) do not reject the null hypothesis of equal variance. The large t statistic rejects the null hypothesis that death rates from leukemia and kidney cancers have the same mean.

#### 4.3 T-test in SAS

The TTEST procedure by default examines the hypothesis of equal variances, and provides t statistics for either case. The procedure by default reports Satterthwaite's approximation for the degrees of freedom. Keep in mind that a variable to be tested is grouped by the variable that is specified in the CLASS statement.

    PROC TTEST H0=0 ALPHA=.05 DATA=masil.smoking;
       CLASS smoke;
       VAR lung;
    RUN;

[SAS output: The TTEST Procedure. Statistics for lung by smoke group and for Diff (1-2) (N, means, standard deviations, and their confidence limits, Std Err, Minimum, Maximum); T-Tests with the Pooled (Equal variances) and Satterthwaite (Unequal variances) methods, both with Pr > |t| <.0001; Equality of Variances with the Folded F method (Num DF, Den DF, F Value, Pr > F).]

The F test for equal variance does not reject the null hypothesis of equal variances. Thus, the t-test labeled as Pooled should be referred to in order to get the t statistic and its p-value. If the equal variance assumption is violated, the statistics of Satterthwaite and Cochran should be read.

If you have a summary data set with the values of variables (lung) and their frequency (count), specify the count variable in the FREQ statement.

    PROC TTEST DATA=masil.smoking;
       CLASS smoke;
       VAR lung;
       FREQ count;
    RUN;

Now, let us compare the death rates from leukemia and kidney cancer in the second data arrangement type of Figure 1. As mentioned before, you need to rearrange the data set to stack up the two variables into one and generate a grouping variable (first type in Figure 1).

    DATA masil.smoking;
       SET masil.smoking;
       death = leukemia; leu_kid ='Leukemia'; OUTPUT;
       death = kidney; leu_kid ='Kidney'; OUTPUT;
       KEEP leu_kid death;
    RUN;

    PROC TTEST COCHRAN DATA=masil.smoking;
       CLASS leu_kid;
       VAR death;
    RUN;

[SAS output: The TTEST Procedure. Statistics for death by leu_kid (Kidney, Leukemia) and for Diff (1-2); T-Tests with the Pooled (Equal), Satterthwaite (Unequal), and Cochran (Unequal) methods, all with Pr > |t| <.0001; Equality of Variances with the Folded F method.]

Compare this SAS output with that of STATA in the previous section.

#### 4.4 T-test in SPSS

In the T-TEST command, you need to use the GROUPS subcommand in order to specify a grouping variable. SPSS reports Levene's F of .0000, which does not reject the null hypothesis of equal variance (p<.995).

    T-TEST GROUPS = smoke(0 1)
      /VARIABLES = lung
      /MISSING = ANALYSIS
      /CRITERIA = CI(.95).

### 5. Independent Samples with Unequal Variances

If the assumption of equal variances is violated, we have to compute the adjusted t statistic using individual sample standard deviations rather than a pooled standard deviation. It is also necessary to use the Satterthwaite, Cochran-Cox (SAS), or Welch (STATA) approximations of the degrees of freedom. In this chapter, you compare mean death rates from kidney cancer between the west (south) and east (north).

#### 5.1 T-test in STATA

As discussed earlier, let us check equality of variances using the .oneway command. The tabulate option produces a table of summary statistics for the groups.

    . oneway kidney west, tabulate

[Stata output: Summary of kidney by west (Mean, Std. Dev., Freq. for each group and Total); Analysis of Variance table (Source, SS, df, MS, F, Prob > F); Bartlett's test for equal variances with chi2(1) and Prob>chi2.]

Bartlett's chi-squared statistic rejects the null hypothesis of equal variance at the .01 level. It is appropriate to use the unequal option in the .ttest command, which calculates Satterthwaite's approximation for the degrees of freedom. Unlike the SAS TTEST procedure, the .ttest command cannot specify a mean difference $D_0$ other than zero. Thus, the null hypothesis is that the mean difference is zero.

    . ttest kidney, by(west) unequal level(95)

[Stata output: Two-sample t test with unequal variances, listing Obs, Mean, Std. Err., Std. Dev., and the 95% Conf. Interval for each group, combined, and diff; diff = mean(0) - mean(1), t = 2.7817, Ho: diff = 0, with Satterthwaite's degrees of freedom and the p-values Pr(T < t), Pr(|T| > |t|), and Pr(T > t).]

See Satterthwaite's approximation of the degrees of freedom in the middle of the output. If you want to get Welch's approximation, use the welch option as well as the unequal option; without the unequal option, welch is ignored.

    . ttest kidney, by(west) unequal welch

[Stata output: the same two-sample t test with unequal variances, t = 2.7817, now reporting Welch's degrees of freedom.]

Satterthwaite's approximation is slightly smaller than Welch's. Again, keep in mind that these approximations are not integers, but real numbers. The t statistic 2.7817 and its p-value .0086 reject the null hypothesis of equal population means. The north and east have larger death rates from kidney cancer per 100 thousand people than the south and west.

For aggregate data, use the .ttesti command with the necessary options.

    . ttesti #obs1 #mean1 #sd1 #obs2 #mean2 #sd2, unequal welch

As mentioned earlier, the unpaired option of the .ttest command directly compares two variables without data manipulation. The option treats the two variables as independent of each other. The following is an example of the unpaired and unequal options.

    . ttest bladder=kidney, unpaired unequal welch

[Stata output: Two-sample t test with unequal variances for bladder and kidney; diff = mean(bladder) - mean(kidney), Ho: diff = 0, with Welch's degrees of freedom and the p-values Pr(T < t), Pr(|T| > |t|), and Pr(T > t).]

The F statistic (the ratio of the larger to the smaller sample variance) rejects the null hypothesis of equal variance (p<.0001). If the welch option is omitted, Satterthwaite's degrees of freedom will be produced instead.

For aggregate data, again, use the .ttesti command without the unpaired option.

    . ttesti #obs1 #mean1 #sd1 #obs2 #mean2 #sd2, unequal welch level(95)

#### 5.2 T-test in SAS

The TTEST procedure reports statistics for cases of both equal and unequal variance. You may add the COCHRAN option to compute the Cochran-Cox approximation for the degrees of freedom.

    PROC TTEST COCHRAN DATA=masil.smoking;
       CLASS west;
       VAR kidney;
    RUN;

[SAS output: The TTEST Procedure. Statistics for kidney by group and for Diff (1-2) (N, means, standard deviations, and their confidence limits, Std Err, Minimum, Maximum); T-Tests with the Pooled (Equal), Satterthwaite (Unequal), and Cochran (Unequal) methods; Equality of Variances with the Folded F method.]
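Section 5.1 notes that Satterthwaite's degrees of freedom are slightly smaller than Welch's. The Python sketch below computes both from summary statistics, in the spirit of .ttesti. The Satterthwaite function uses the common form equivalent to the formula in Section 4.1; the Welch function follows the formula as I understand it from the Stata documentation (denominators $n + 1$ and a final subtraction of 2), which should be treated as an assumption to verify. The function names and inputs are hypothetical.

```python
def satterthwaite_df(n1, sd1, n2, sd2):
    """Satterthwaite's approximate degrees of freedom from summary statistics."""
    a, b = sd1 ** 2 / n1, sd2 ** 2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


def welch_df(n1, sd1, n2, sd2):
    """Welch's approximate degrees of freedom (assumed form, see lead-in)."""
    a, b = sd1 ** 2 / n1, sd2 ** 2 / n2
    return -2 + (a + b) ** 2 / (a ** 2 / (n1 + 1) + b ** 2 / (n2 + 1))


# Hypothetical summary statistics: two groups of 10 with equal spread.
print(satterthwaite_df(10, 2.0, 10, 2.0))  # approximately 18 = 2n - 2
print(welch_df(10, 2.0, 10, 2.0))          # approximately 20 = 2n

# Hypothetical unbalanced case with unequal spread.
print(satterthwaite_df(12, 1.5, 30, 4.0), welch_df(12, 1.5, 30, 4.0))
```

In the balanced, equal-spread case the two approximations differ by exactly two, which illustrates why the Welch value reported by STATA sits slightly above the Satterthwaite value.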


1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

### Failure to take the sampling scheme into account can lead to inaccurate point estimates and/or flawed estimates of the standard errors.

Analyzing Complex Survey Data: Some key issues to be aware of Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 24, 2015 Rather than repeat material that is

### Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY

Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY ABSTRACT: This project attempted to determine the relationship

### One-sample normal hypothesis Testing, paired t-test, two-sample normal inference, normal probability plots

1 / 27 One-sample normal hypothesis Testing, paired t-test, two-sample normal inference, normal probability plots Timothy Hanson Department of Statistics, University of South Carolina Stat 704: Data Analysis

### ECON Introductory Econometrics. Lecture 17: Experiments

ECON4150 - Introductory Econometrics Lecture 17: Experiments Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 13 Lecture outline 2 Why study experiments? The potential outcome framework.

### Marginal Effects for Continuous Variables Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 21, 2015

Marginal Effects for Continuous Variables Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 21, 2015 References: Long 1997, Long and Freese 2003 & 2006 & 2014,

### Using Stata for Categorical Data Analysis

Using Stata for Categorical Data Analysis NOTE: These problems make extensive use of Nick Cox's tab_chi, which is actually a collection of routines, and Adrian Mander's ipf command. From within Stata,

### Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means

Lesson : Comparison of Population Means Part c: Comparison of Two- Means Welcome to lesson c. This third lesson of lesson will discuss hypothesis testing for two independent means. Steps in Hypothesis

### Handling missing data in Stata a whirlwind tour

Handling missing data in Stata a whirlwind tour 2012 Italian Stata Users Group Meeting Jonathan Bartlett www.missingdata.org.uk 20th September 2012 1/55 Outline The problem of missing data and a principled

### Statistics for Clinical Trial SAS Programmers 1: paired t-test Kevin Lee, Covance Inc., Conshohocken, PA

Statistics for Clinical Trial SAS Programmers 1: paired t-test Kevin Lee, Covance Inc., Conshohocken, PA ABSTRACT This paper is intended for SAS programmers who are interested in understanding common statistical

### xtmixed & denominator degrees of freedom: myth or magic

xtmixed & denominator degrees of freedom: myth or magic 2011 Chicago Stata Conference Phil Ender UCLA Statistical Consulting Group July 2011 Phil Ender xtmixed & denominator degrees of freedom: myth or

### Econ 371 Problem Set #3 Answer Sheet

Econ 371 Problem Set #3 Answer Sheet 4.1 In this question, you are told that an OLS regression analysis of third grade test scores as a function of class size yields the following estimated model. TestScore

### The Chi-Square Diagnostic Test for Count Data Models

The Chi-Square Diagnostic Test for Count Data Models M. Manjón-Antolín and O. Martínez-Ibañez QURE-CREIP Department of Economics, Rovira i Virgili University. 2012 Spanish Stata Users Group Meeting (Universitat

### From the help desk: hurdle models

The Stata Journal (2003) 3, Number 2, pp. 178 184 From the help desk: hurdle models Allen McDowell Stata Corporation Abstract. This article demonstrates that, although there is no command in Stata for

### Dongfeng Li. Autumn 2010

Autumn 2010 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis

### Multinomial and Ordinal Logistic Regression

Multinomial and Ordinal Logistic Regression ME104: Linear Regression Analysis Kenneth Benoit August 22, 2012 Regression with categorical dependent variables When the dependent variable is categorical,

### Regression Analysis. Data Calculations Output

Regression Analysis In an attempt to find answers to questions such as those posed above, empirical labour economists use a useful tool called regression analysis. Regression analysis is essentially a

### Using Stata 11 & higher for Logistic Regression Richard Williams, University of Notre Dame, Last revised March 28, 2015

Using Stata 11 & higher for Logistic Regression Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised March 28, 2015 NOTE: The routines spost13, lrdrop1, and extremes are

### MEASURES OF LOCATION AND SPREAD

Paper TU04 An Overview of Non-parametric Tests in SAS: When, Why, and How Paul A. Pappas and Venita DePuy Durham, North Carolina, USA ABSTRACT Most commonly used statistical procedures are based on the

### NCSS Statistical Software

Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, two-sample t-tests, the z-test, the

### Let's explore SAS Proc T-Test

Let's explore SAS Proc T-Test Ana Yankovsky Research Statistical Analyst Screening Programs, AHS Ana.Yankovsky@albertahealthservices.ca Goals of the presentation: 1. Look at the structure of Proc TTEST;

### Quick Stata Guide by Liz Foster

by Liz Foster Table of Contents Part 1: describe, generate, regress, scatter, sort, summarize, table, tabulate, test, ttest Part 2: Prefixes and Notes: by var:, capture, use of the

### Experimental Design for Influential Factors of Rates on Massive Open Online Courses

Experimental Design for Influential Factors of Rates on Massive Open Online Courses December 12, 2014 Ning Li nli7@stevens.edu Qing Wei qwei1@stevens.edu Yating Lan ylan2@stevens.edu Yilin Wei ywei12@stevens.edu

### Guido's Guide to PROC MEANS A Tutorial for Beginners Using the SAS System Joseph J. Guido, University of Rochester Medical Center, Rochester, NY

Guido's Guide to PROC MEANS A Tutorial for Beginners Using the SAS System Joseph J. Guido, University of Rochester Medical Center, Rochester, NY ABSTRACT PROC MEANS is a basic procedure within BASE SAS

### Testing Group Differences using T-tests, ANOVA, and Nonparametric Measures

Testing Group Differences using T-tests, ANOVA, and Nonparametric Measures Jamie DeCoster Department of Psychology University of Alabama 348 Gordon Palmer Hall Box 870348 Tuscaloosa, AL 35487-0348 Phone:

### Module 9: Nonparametric Tests. The Applied Research Center

Module 9: Nonparametric Tests The Applied Research Center Module 9 Overview: Nonparametric Tests; Parametric vs. Nonparametric Tests; Restrictions of Nonparametric Tests; One-Sample Chi-Square Test

### HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009

HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009 1. Introduction 2. A General Formulation 3. Truncated Normal Hurdle Model 4. Lognormal

### ESTIMATING AVERAGE TREATMENT EFFECTS: IV AND CONTROL FUNCTIONS, II Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics

ESTIMATING AVERAGE TREATMENT EFFECTS: IV AND CONTROL FUNCTIONS, II Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009 1. Quantile Treatment Effects 2. Control Functions

### Guide to Microsoft Excel for calculations, statistics, and plotting data

Page 1/47 Guide to Microsoft Excel for calculations, statistics, and plotting data Topic Page A. Writing equations and text 2 1. Writing equations with mathematical operations 2 2. Writing equations with

### Data Analysis Tools. Tools for Summarizing Data

Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool

### Regression in Stata. Alicia Doyle Lynch Harvard-MIT Data Center (HMDC)

Regression in Stata Alicia Doyle Lynch Harvard-MIT Data Center (HMDC) Documents for Today Find class materials at: http://libraries.mit.edu/guides/subjects/data/ training/workshops.html Several formats

### KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management

KSTAT MINI-MANUAL Decision Sciences 434 Kellogg Graduate School of Management Kstat is a set of macros added to Excel and it will enable you to do the statistics required for this course very easily. To

### August 2012 EXAMINATIONS Solution Part I

August 2012 EXAMINATIONS Solution Part I (1) In a random sample of 600 eligible voters, the probability that less than 38% will be in favour of this policy is closest to (B) (2) In a large random sample,

### REGRESSION LINES IN STATA

REGRESSION LINES IN STATA THOMAS ELLIOTT 1. Introduction to Regression Regression analysis is about exploring linear relationships between a dependent variable and one or more independent variables. Regression

### Syntax Menu Description Options Remarks and examples Stored results Methods and formulas References Also see. level(#) , options2

Title stata.com ttest t tests (mean-comparison tests) Syntax Syntax Menu Description Options Remarks and examples Stored results Methods and formulas References Also see One-sample t test ttest varname

### How to set the main menu of STATA to default factory settings standards

University of Pretoria Data analysis for evaluation studies Examples in STATA version 11 List of data sets b1.dta (To be created by students in class) fp1.xls (To be provided to students) fp1.txt (To be

### Introduction to Quantitative Methods

Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 Descriptive Statistics 2.1 Frequency Tables 2.2 Measures of Central Tendencies

### Data Analysis. Lecture Empirical Model Building and Methods (Empirische Modellbildung und Methoden) SS Analysis of Experiments - Introduction

Data Analysis Lecture Empirical Model Building and Methods (Empirische Modellbildung und Methoden) Prof. Dr. Dr. h.c. Dieter Rombach Dr. Andreas Jedlitschka SS 2014 Analysis of Experiments - Introduction

### GLM I An Introduction to Generalized Linear Models

GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0 ANTITRUST Notice The Casualty Actuarial

### IBM SPSS Missing Values 22

IBM SPSS Missing Values 22 Note Before using this information and the product it supports, read the information in Notices on page 23. Product Information This edition applies to version 22, release 0,

### Statistical Modelling in Stata 5: Linear Models

Statistical Modelling in Stata 5: Linear Models Mark Lunt Arthritis Research UK Centre for Excellence in Epidemiology University of Manchester 08/11/2016 Structure This Week What is a linear model? How

### Statistics Review PSY379

Statistics Review PSY379 Basic concepts Measurement scales Populations vs. samples Continuous vs. discrete variable Independent vs. dependent variable Descriptive vs. inferential stats Common analyses

### Aileen Murphy, Department of Economics, UCC, Ireland. WORKING PAPER SERIES 07-10

AN ECONOMETRIC ANALYSIS OF SMOKING BEHAVIOUR IN IRELAND Aileen Murphy, Department of Economics, UCC, Ireland. DEPARTMENT OF ECONOMICS WORKING PAPER SERIES 07-10 1 AN ECONOMETRIC ANALYSIS OF SMOKING BEHAVIOUR

### SAS 3: Comparing Means

SAS 3: Comparing Means University of Guelph Revised June 2011 Table of Contents: SAS Availability; Goals of the workshop; Data for SAS sessions; Statistical Background; T-test: 1. Independent

### In Chapter 2, we used linear regression to describe linear relationships. The setting for this is a

Math 143 Inference on Regression 1 Review of Linear Regression In Chapter 2, we used linear regression to describe linear relationships. The setting for this is a bivariate data set (i.e., a list of cases/subjects

### Descriptive Statistics

Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

### Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

### Understanding the Statistical Power of a Test

2004 by Jeeshim and KUCC625 (11/28/2004) Understanding the Statistical Power: 1 Understanding the Statistical Power of a Test Hun Myoung Park Software Consultant UITS Center for Statistical and Mathematical

### Outline of Topics. Statistical Methods I. Types of Data. Descriptive Statistics

Statistical Methods I Tamekia L. Jones, Ph.D. (tjones@cog.ufl.edu) Research Assistant Professor Children's Oncology Group Statistics & Data Center Department of Biostatistics Colleges of Medicine and Public

### Regression III: Dummy Variable Regression

Regression III: Dummy Variable Regression Tom Ilvento FREC 408 Linear Regression Assumptions about the error term Mean of Probability Distribution of the Error term is zero Probability Distribution of

### Hypothesis Testing and Statistical Power of a Test *

2004-2008, The Trustees of Indiana University Hypothesis Testing and Statistical Power: 1 Hypothesis Testing and Statistical Power of a Test * Hun Myoung Park (kucc625@indiana.edu) This document provides

### Recall this chart that showed how most of our course would be organized:

Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

### Poisson Models for Count Data

Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the

### Regression in SPSS. Workshop offered by the Mississippi Center for Supercomputing Research and the UM Office of Information Technology

Regression in SPSS Workshop offered by the Mississippi Center for Supercomputing Research and the UM Office of Information Technology John P. Bentley Department of Pharmacy Administration University of

### MODEL I: DRINK REGRESSED ON GPA & MALE, WITHOUT CENTERING

Interpreting Interaction Effects; Interaction Effects and Centering Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 20, 2015 Models with interaction effects

### An analysis method for a quantitative outcome and two categorical explanatory variables.

Chapter 11 Two-Way ANOVA An analysis method for a quantitative outcome and two categorical explanatory variables. If an experiment has a quantitative outcome and two categorical explanatory variables that

### Statistics and research

Statistics and research Usaneya Perngparn Chitlada Areesantichai Drug Dependence Research Center (WHOCC for Research and Training in Drug Dependence) College of Public Health Sciences Chulalongkorn University,

### Random effects and nested models with SAS

Random effects and nested models with SAS /************* classical2.sas ********************* Three levels of factor A, four levels of B Both fixed Both random A fixed, B random B nested within A ***************************************************/

### Analyses on Hurricane Archival Data June 17, 2014

Analyses on Hurricane Archival Data June 17, 2014 This report provides detailed information about analyses of archival data in our PNAS article http://www.pnas.org/content/early/2014/05/29/1402786111.abstract

### Econ 371 Problem Set #3 Answer Sheet

Econ 371 Problem Set #3 Answer Sheet 4.3 In this question, you are told that an OLS regression analysis of average weekly earnings yields the following estimated model. $AWE = 696.7 + 9.6\,Age$, $R^2 = 0.023$,

### Module 5: Multiple Regression Analysis

Using Statistical Data to Make Decisions: Multiple Regression Analysis Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

### Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing. C. Olivia Rud, VP, Fleet Bank

Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing C. Olivia Rud, VP, Fleet Bank ABSTRACT Data Mining is a new term for the common practice of searching through

### IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results

IAPRI Quantitative Analysis Capacity Building Series Multiple regression analysis & interpreting results How important is R-squared? R-squared Published in Agricultural Economics 0.45 Best article of the

### Tutorial 5: Hypothesis Testing

Tutorial 5: Hypothesis Testing Rob Nicholls nicholls@mrc-lmb.cam.ac.uk MRC LMB Statistics Course 2014 Contents 1 Introduction 2 Testing distributional assumptions

### Data Analysis Methodology 1

Data Analysis Methodology 1 Suppose you inherited the database in Table 1.1 and needed to find out what could be learned from it fast. Say your boss entered your office and said, "Here's some software project

### MINITAB ASSISTANT WHITE PAPER

MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

### Multiple Regression YX1 YX2 X1X2 YX1.X2

Multiple Regression Simple or total correlation: relationship between one dependent and one independent variable, Y versus X Coefficient of simple determination: $r^2_{YX_1}$ (or $r^2_{YX_2}$, $r^2_{X_1X_2}$) Partial correlation:

### Regression Analysis: A Complete Example

Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis.

### How to choose a statistical test. Francisco J. Candido dos Reis DGO-FMRP University of São Paulo

How to choose a statistical test Francisco J. Candido dos Reis DGO-FMRP University of São Paulo Choosing the right test One of the most common queries in stats support is "Which analysis should I use?" There

### Confidence Intervals for the Difference Between Two Means

Chapter 47 Confidence Intervals for the Difference Between Two Means Introduction This procedure calculates the sample size necessary to achieve a specified distance from the difference in sample means

### Predictor Coef StDev T P Constant 970667056 616256122 1.58 0.154 X 0.00293 0.06163 0.05 0.963. S = 0.5597 R-Sq = 0.0% R-Sq(adj) = 0.

Statistical analysis using Microsoft Excel Microsoft Excel spreadsheets have become somewhat of a standard for data storage, at least for smaller data sets. This, along with the program often being packaged

### Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

### Introduction. Hypothesis Testing. Hypothesis Testing. Significance Testing

Introduction Hypothesis Testing Mark Lunt Arthritis Research UK Centre for Excellence in Epidemiology University of Manchester 13/10/2015 We saw last week that we can never know the population parameters