Comparing Group Means: The T-test and One-way ANOVA Using STATA, SAS, and SPSS


 Willa Hodges
The Trustees of Indiana University

Hun Myoung Park

This document summarizes the method of comparing group means and illustrates how to conduct the t-test and one-way ANOVA using STATA 9.0, SAS 9.1, and SPSS 13.0.

1. Introduction
2. Univariate Samples
3. Paired (Dependent) Samples
4. Independent Samples with Equal Variances
5. Independent Samples with Unequal Variances
6. One-way ANOVA, GLM, and Regression
7. Conclusion

1. Introduction

The t-test and analysis of variance (ANOVA) compare group means. The mean of a variable to be compared should be substantively interpretable. A t-test may examine gender differences in average salary or racial (white versus black) differences in average annual income. The left-hand side (LHS) variable to be tested should be interval or ratio, whereas the right-hand side (RHS) variable should be binary (categorical).

1.1 T-test and ANOVA

While the t-test is limited to comparing the means of two groups, one-way ANOVA can compare more than two groups. Therefore, the t-test is considered a special case of one-way ANOVA. These analyses do not, however, necessarily imply any causality (i.e., a causal relationship between the left-hand and right-hand side variables). Table 1 compares the t-test and one-way ANOVA.

Table 1. Comparison between the T-test and One-way ANOVA

                         T-test                                 One-way ANOVA
    LHS (Dependent)      Interval or ratio variable             Interval or ratio variable
    RHS (Independent)    Binary variable with only two groups   Categorical variable
    Null Hypothesis      µ1 = µ2                                µ1 = µ2 = µ3 = ...
    Prob. Distribution*  T distribution                         F distribution

    * In the case of one degree of freedom in the numerator, F = t².

The t-test assumes that samples are randomly drawn from normally distributed populations with unknown population means. Otherwise, their means are no longer the best measures of central tendency and the t-test will not be valid. The Central Limit Theorem says, however, that
the distributions of $\bar{y}_1$ and $\bar{y}_2$ are approximately normal when N is large. When $n_1 + n_2 \ge 30$, in practice, you do not need to worry too much about the normality assumption.

You may numerically test the normality assumption using the Shapiro-Wilk W (N <= 2000), Shapiro-Francia W (N <= 5000), Kolmogorov-Smirnov D (N > 2000), and Jarque-Bera tests. If N is small and the null hypothesis of normality is rejected, you may try nonparametric methods such as the Kolmogorov-Smirnov test, Kruskal-Wallis test, Wilcoxon rank-sum test, or log-rank test, depending on the circumstances.

1.2 T-test in SAS, STATA, and SPSS

In STATA, the .ttest and .ttesti commands are used to conduct t-tests, whereas the .anova and .oneway commands perform one-way ANOVA. SAS has the TTEST procedure for the t-test, but the UNIVARIATE and MEANS procedures also have options for the t-test. SAS provides various procedures for the analysis of variance, such as the ANOVA, GLM, and MIXED procedures. The ANOVA procedure can handle balanced data only, while GLM and MIXED can analyze either balanced or unbalanced data (having the same or different numbers of observations across groups). However, unbalanced data do not cause any problems in the t-test and one-way ANOVA. In SPSS, the T-TEST, ONEWAY, and UNIANOVA commands are used to perform the t-test and one-way ANOVA. Table 2 summarizes the STATA commands, SAS procedures, and SPSS commands that are associated with the t-test and one-way ANOVA.

Table 2. Related Procedures and Commands in STATA, SAS, and SPSS

                     STATA 9.0 SE                  SAS 9.1        SPSS 13.0
    Normality Test   .sktest; .swilk; .sfrancia    UNIVARIATE     EXAMINE
    Equal Variance   .oneway                       TTEST          T-TEST
    Nonparametric    .ksmirnov; .kwallis           NPAR1WAY       NPAR TESTS
    T-test           .ttest                        TTEST; MEANS   T-TEST
    ANOVA            .anova; .oneway               ANOVA          ONEWAY
    GLM*                                           GLM; MIXED     UNIANOVA

    * The STATA .glm command is not used for the t-test, but for the generalized linear model.

1.3 Data Arrangement

There are two types of data arrangement for t-tests (Figure 1).
The first data arrangement has a variable to be tested and a grouping variable to classify groups (0 or 1). The second, appropriate especially for paired samples, has two variables to be tested. The two variables in this type are not, however, necessarily paired or balanced. SAS and SPSS prefer the first data arrangement, whereas STATA can handle either type flexibly. Note that the numbers of observations across groups are not necessarily equal.
Figure 1. Two Types of Data Arrangement (first type: a tested variable with x and y values in one column plus a grouping variable; second type: Variable1 holding x values and Variable2 holding y values side by side)

The data set used here is adapted from J. F. Fraumeni's study on cigarette smoking and cancer (Fraumeni 1968). The data are per capita numbers of cigarettes sold by 43 states and the District of Columbia in 1960, together with death rates per hundred thousand people from various forms of cancer. Two variables were added to categorize states into two groups. See the appendix for details.
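The footnote to Table 1 says that F = t² when the numerator has one degree of freedom. With two groups, the one-way ANOVA F statistic should therefore equal the square of the equal-variance (pooled) t statistic described in Section 4. The Python sketch below checks that identity on made-up numbers. The helper names `oneway_f` and `pooled_t` are hypothetical and do not correspond to any STATA, SAS, or SPSS command.

```python
import math
import statistics


def oneway_f(*groups):
    """One-way ANOVA F = MS_between / MS_within."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def pooled_t(y1, y2):
    """Equal-variance two-sample t statistic for H0: mu1 - mu2 = 0."""
    n1, n2 = len(y1), len(y2)
    s2_pool = ((n1 - 1) * statistics.variance(y1)
               + (n2 - 1) * statistics.variance(y2)) / (n1 + n2 - 2)
    se = math.sqrt(s2_pool * (1 / n1 + 1 / n2))
    return (statistics.mean(y1) - statistics.mean(y2)) / se


group0 = [1.0, 2.0, 3.0, 4.0]   # hypothetical values
group1 = [3.0, 5.0, 6.0, 8.0]   # hypothetical values
f = oneway_f(group0, group1)    # 6.0 for these numbers
t = pooled_t(group0, group1)    # negative, since group0 has the smaller mean
```

With these numbers F is 6.0 and t is about -2.449, so t² reproduces F. The sign of t is lost in F, which is why the F test is inherently two-sided.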
2. Univariate Samples

The univariate-sample or one-sample t-test determines whether an unknown population mean µ differs from a hypothesized value c that is commonly set to zero: $H_0: \mu = c$. The t statistic follows Student's t probability distribution with n - 1 degrees of freedom,

$$t = \frac{\bar{y} - c}{s_{\bar{y}}} \sim t(n-1),$$

where y is a variable to be tested and n is the number of observations.¹

Suppose you want to test if the population mean of the death rates from lung cancer is 20 per 100,000 people at the .01 significance level. Note that the default significance level used in most software is the .05 level.

2.1 T-test in STATA

The .ttest command conducts t-tests in an easy and flexible manner. For a univariate sample test, the command requires that a hypothesized value be explicitly specified. The level() option indicates the confidence level as a percentage. The 99 percent confidence level is equivalent to the .01 significance level.

    . ttest lung=20, level(99)

[Output: One-sample t test for lung (Obs, Mean, Std. Err., Std. Dev., 99% Conf. Interval), followed by t, 43 degrees of freedom, and p-values for Ha: mean < 20, Ha: mean != 20, and Ha: mean > 20]

STATA first lists descriptive statistics of the variable lung. The standard deviation of the 44 observations is about 4.228. The t statistic is the difference between the sample mean and 20 divided by the standard error, and the degrees of freedom are 43 = 44 - 1. There are three t-tests at the bottom of the output. The first and third are one-tailed tests, whereas the second is a two-tailed test. The t statistic and its large p-value do not reject the null hypothesis that the population mean of the death rate from lung cancer is 20 at the .01 level. The mean of the death rate may be 20 per 100,000 people. Note that the hypothesized value 20 falls into the 99 percent confidence interval of the mean, $\bar{y} \pm t_{\alpha/2} s_{\bar{y}} = \bar{y} \pm 2.695 \times .6374$, where 2.695 is the critical value with 43 degrees of freedom at the .01 level in the two-tailed test.

¹ $\bar{y} = \frac{\sum y_i}{n}$, $s = \sqrt{\frac{\sum (y_i - \bar{y})^2}{n-1}}$, and the standard error $s_{\bar{y}} = \frac{s}{\sqrt{n}}$.
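The formulas in footnote 1 and the confidence interval $\bar{y} \pm t_{\alpha/2} s_{\bar{y}}$ can be sketched in Python using only the standard library. This illustrates the arithmetic; it is not what STATA or SAS run internally. The data are made up, and the function names are hypothetical. The standard library has no t quantile function, so the caller supplies the critical value, read from a t table.

```python
import math
import statistics


def one_sample_t(y, c=0.0):
    """t statistic, df, mean, and standard error for H0: mu = c."""
    n = len(y)
    mean = statistics.mean(y)
    s = statistics.stdev(y)      # sample standard deviation, divisor n - 1
    se = s / math.sqrt(n)        # standard error of the mean
    t = (mean - c) / se
    return t, n - 1, mean, se


def confidence_interval(mean, se, t_crit):
    """Two-sided interval mean +/- t_crit * se; t_crit comes from a t table."""
    return mean - t_crit * se, mean + t_crit * se


y = [1, 2, 3, 4, 5]                       # hypothetical data, not the smoking data
t, df, mean, se = one_sample_t(y, c=0)
low, high = confidence_interval(mean, se, t_crit=2.776)  # two-tailed .05, 4 df
```

Here t = 3/√0.5 ≈ 4.243 with 4 degrees of freedom. The 95 percent interval excludes the hypothesized value 0, which mirrors the rejection of $H_0$. This is the same duality the text uses when it notes that 20 lies inside the 99 percent interval.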
If you just have the aggregate data (i.e., the number of observations, mean, and standard deviation of the sample), use the .ttesti command to replicate the t-test above. Note that the hypothesized value is specified at the end of the summary statistics. The syntax is ttesti #obs #mean #sd #val, where #obs, #mean, and #sd stand for the sample size, mean, and standard deviation of lung.

    . ttesti #obs #mean #sd 20, level(99)

2.2 T-test Using the SAS TTEST Procedure

The TTEST procedure conducts various types of t-tests in SAS. The H0 option specifies a hypothesized value, whereas the ALPHA option indicates a significance level. If they are omitted, the default values zero and .05, respectively, are assumed.

    PROC TTEST H0=20 ALPHA=.01 DATA=masil.smoking;
        VAR lung;
    RUN;

[Output: The TTEST Procedure. Statistics table for lung (N, lower and upper confidence limits of the mean and standard deviation, Std Err), followed by the T-Tests table (DF, t Value, Pr > |t|)]

The TTEST procedure reports descriptive statistics followed by a two-tailed t-test. You may have a summary data set containing the values of a variable (lung) and their frequencies (count). The FREQ statement of the TTEST procedure handles this case.

    PROC TTEST H0=20 ALPHA=.01 DATA=masil.smoking;
        VAR lung;
        FREQ count;
    RUN;

2.3 T-test Using the SAS UNIVARIATE and MEANS Procedures

The SAS UNIVARIATE and MEANS procedures also conduct a t-test for a univariate sample. The UNIVARIATE procedure is basically designed to produce a variety of descriptive statistics of a variable. Its MU0 option tells the procedure to perform a t-test using the hypothesized value specified. The VARDEF=DF option specifies the divisor (degrees of freedom) used in
computing the variance (standard deviation).³ The NORMAL option examines whether the variable is normally distributed.

    PROC UNIVARIATE MU0=20 VARDEF=DF NORMAL ALPHA=.01 DATA=masil.smoking;
        VAR lung;
    RUN;

[Output: The UNIVARIATE Procedure, Variable: lung. Blocks: Moments (N = 44, Sum Weights, Mean, Std Deviation, Variance, Skewness, Kurtosis, Uncorrected and Corrected SS, Coeff Variation, Std Error Mean); Basic Statistical Measures; Tests for Location: Mu0=20 (Student's t, Sign M, Signed Rank S); Tests for Normality (Shapiro-Wilk W, Kolmogorov-Smirnov D, Cramer-von Mises W-Sq, Anderson-Darling A-Sq); Quantiles (Definition 5); Extreme Observations]

³ VARDEF=N uses N as the divisor, while VARDEF=WDF specifies the sum of weights minus one.
The third block of the output above reports a t statistic and its p-value. The fourth block contains several normality test statistics. Since N is less than 2,000, you should read the Shapiro-Wilk W, which suggests that lung is normally distributed (p=.535).

The MEANS procedure also conducts t-tests using the T and PROBT options, which request the t statistic and its two-tailed p-value. The CLM option produces the two-tailed confidence interval (upper and lower limits). The MEAN, STD, and STDERR options respectively print the sample mean, standard deviation, and standard error.

    PROC MEANS MEAN STD STDERR T PROBT CLM VARDEF=DF ALPHA=.01 DATA=masil.smoking;
        VAR lung;
    RUN;

[Output: The MEANS Procedure, Analysis Variable: lung, with Mean, Std Dev, Std Error, t Value, Pr > |t|, and the lower and upper 99% confidence limits for the mean]
The MEANS procedure does not, however, have an option to set the hypothesized value to anything other than zero. Thus, the null hypothesis here is that the population mean of the death rate from lung cancer is zero. The t statistic is the sample mean divided by its standard error. The large t statistic and small p-value reject the null hypothesis of a zero mean. This is consistent with the earlier finding that the mean death rate may be 20 per 100,000 people.

2.4 T-test in SPSS

SPSS has the T-TEST command for t-tests. The /TESTVAL subcommand specifies the value with which the sample mean is compared, whereas the /VARIABLES subcommand lists the variables to be tested. Like STATA, SPSS specifies a confidence level rather than a significance level, in the /CRITERIA=CI() subcommand.

    T-TEST
      /TESTVAL = 20
      /VARIABLES = lung
      /MISSING = ANALYSIS
      /CRITERIA = CI(.99).
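Section 2 uses three related ideas: the FREQ statement (each stored value counts as many observations), the VARDEF= divisor, and .ttesti (a t-test from n, mean, and SD alone). The Python sketch below strings them together. It assumes FREQ means "repeat each value count times" and maps vardef="df" to the n - 1 divisor and vardef="n" to the n divisor, as described for VARDEF=DF and VARDEF=N. The function names and data are hypothetical.

```python
import math


def summary_from_frequencies(values, counts, vardef="df"):
    """n, mean, and SD for value/frequency pairs.

    vardef="df" divides the sum of squares by n - 1 (like VARDEF=DF);
    vardef="n" divides it by n (like VARDEF=N).
    """
    if vardef not in ("df", "n"):
        raise ValueError("vardef must be 'df' or 'n'")
    n = sum(counts)
    mean = sum(v * k for v, k in zip(values, counts)) / n
    ss = sum(k * (v - mean) ** 2 for v, k in zip(values, counts))
    divisor = n - 1 if vardef == "df" else n
    return n, mean, math.sqrt(ss / divisor)


def t_from_summary(n, mean, sd, hypothesized):
    """One-sample t and df from n, mean, and SD alone (the inputs .ttesti takes)."""
    se = sd / math.sqrt(n)
    return (mean - hypothesized) / se, n - 1


# Values 1, 2, 3 observed once, twice, and once: the same as the data 1, 2, 2, 3
n, mean, sd = summary_from_frequencies([1, 2, 3], [1, 2, 1])
t, df = t_from_summary(n, mean, sd, hypothesized=0)
```

Because only n, the mean, and the SD enter `t_from_summary`, aggregate data are enough to reproduce a one-sample t-test. This is the point of .ttesti.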
3. Paired (Dependent) Samples

When two variables are not independent but paired, the difference of these two variables, $d_i = y_{1i} - y_{2i}$, is treated as if it were a single sample. This test is appropriate for pre-post treatment responses. The null hypothesis is that the true mean difference of the two variables is $D_0$, $H_0: \mu_d = D_0$.⁴ The difference is typically assumed to be zero unless explicitly specified.

3.1 T-test in STATA

In order to conduct a paired sample t-test, you need to list two variables separated by an equal sign. The interpretation of the t-test remains almost unchanged. The t statistic, with 35 degrees of freedom, does not reject the null hypothesis that the difference is zero.

    . ttest pre=post0, level(95)

[Output: Paired t test for pre, post0, and diff (Obs, Mean, Std. Err., Std. Dev., 95% Conf. Interval), followed by mean(diff) = mean(pre - post0), t, 35 degrees of freedom, and p-values for Ha: mean(diff) < 0, Ha: mean(diff) != 0, and Ha: mean(diff) > 0]

Alternatively, you may first compute the difference between the two variables and then conduct a one-sample t-test. Note that the default confidence level, level(95), can be omitted.

    . gen d=pre-post0
    . ttest d=0

3.2 T-test in SAS

In the TTEST procedure, you have to use the PAIRED statement instead of the VAR statement. For the output of the following procedure, refer to the end of this section.

    PROC TTEST DATA=temp.drug;
        PAIRED pre*post0;
    RUN;

⁴ $t = \frac{\bar{d} - D_0}{s_{\bar{d}}} \sim t(n-1)$, where $\bar{d} = \frac{\sum d_i}{n}$, $s_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n-1}}$, and $s_{\bar{d}} = \frac{s_d}{\sqrt{n}}$.
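Footnote 4 reduces the paired test to a one-sample test on the differences. This is also what the STATA alternative does with gen d=pre-post0 followed by ttest d=0. A minimal Python sketch of that reduction follows. The helper name `paired_t` is hypothetical, and the pre/post scores are made up.

```python
import math
import statistics


def paired_t(first, second, d0=0.0):
    """Paired t-test as a one-sample test on d_i = first_i - second_i."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    d = [a - b for a, b in zip(first, second)]
    n = len(d)
    se = statistics.stdev(d) / math.sqrt(n)   # s_d / sqrt(n)
    return (statistics.mean(d) - d0) / se, n - 1


pre = [5, 6, 7, 8]     # hypothetical pre-treatment scores
post = [4, 4, 6, 5]    # hypothetical post-treatment scores
t, df = paired_t(pre, post)
```

The degrees of freedom are the number of pairs minus one, not n1 + n2 - 2, because the pairs collapse into a single sample of differences.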
The PAIRED statement provides various ways of comparing variables using the asterisk (*) and colon (:) operators. The asterisk requests comparisons between each variable on the left and each variable on the right. The colon requests comparisons between the first variable on the left and the first on the right, the second on the left and the second on the right, and so forth. Consider the following examples.

    PROC TTEST;
        PAIRED pre:post0;
        PAIRED (a b)*(c d);  /* Equivalent to PAIRED a*c a*d b*c b*d; */
        PAIRED (a b):(c d);  /* Equivalent to PAIRED a*c b*d; */
        PAIRED (a1-a10)*(b1-b10);
    RUN;

The first PAIRED statement is the same as PAIRED pre*post0. The second and third PAIRED statements contrast the asterisk and colon operators. The hyphen (-) operator in the last statement indicates a1 through a10 and b1 through b10. Let us consider an example of the PAIRED statement.

    PROC TTEST DATA=temp.drug;
        PAIRED (pre)*(post0-post1);
    RUN;

[Output: The TTEST Procedure. Statistics table for the differences pre - post0 and pre - post1 (N, mean and standard deviation with their confidence limits, Std Err), followed by the T-Tests table (DF, t Value, Pr > |t|)]

The first t statistic, for pre versus post0, is identical to that of the previous section. The second, for pre versus post1, rejects the null hypothesis of no mean difference at the .01 level (p<.001).

In order to use the UNIVARIATE and MEANS procedures, the difference between the two paired variables should be computed in advance.

    DATA temp.drug;
        SET temp.drug;
        d1 = pre - post0;
        d2 = pre - post1;
    RUN;
    PROC UNIVARIATE MU0=0 VARDEF=DF NORMAL;
        VAR d1 d2;
    RUN;

    PROC MEANS MEAN STD STDERR T PROBT CLM;
        VAR d1 d2;
    RUN;

    PROC TTEST ALPHA=.05;
        VAR d1 d2;
    RUN;

3.3 T-test in SPSS

In SPSS, the PAIRS subcommand indicates a paired sample t-test.

    T-TEST PAIRS = pre post0
      /CRITERIA = CI(.95)
      /MISSING = ANALYSIS.
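The asterisk and colon operators of the SAS PAIRED statement in Section 3.2 differ as a cross product differs from an element-wise pairing. The Python sketch below only mimics how the variable lists are paired, under that reading of the text. It does not call SAS. The helper names are hypothetical, and requiring equal-length lists for the colon form is an assumption of this sketch, not a documented SAS rule.

```python
from itertools import product


def asterisk_pairs(left, right):
    """(a b)*(c d): each variable on the left with each variable on the right."""
    return list(product(left, right))


def colon_pairs(left, right):
    """(a b):(c d): first with first, second with second, and so on."""
    if len(left) != len(right):
        raise ValueError("this sketch requires lists of equal length")
    return list(zip(left, right))


def variable_range(prefix, first, last):
    """Expand an a1-a10 style list into ['a1', ..., 'a10']."""
    return [f"{prefix}{i}" for i in range(first, last + 1)]


print(asterisk_pairs(["a", "b"], ["c", "d"]))
# [('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd')]
print(colon_pairs(["a", "b"], ["c", "d"]))
# [('a', 'c'), ('b', 'd')]
```

Applied to (a1-a10)*(b1-b10), the asterisk form yields 100 comparisons. The colon form would yield only 10.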
4. Independent Samples with Equal Variances

You should check three assumptions first when testing the mean difference of two independent samples. First, the samples are drawn from normally distributed populations with unknown parameters. Second, the two samples are independent in the sense that they are drawn from different populations and/or the elements of one sample are not related to those of the other sample. Finally, the population variances of the two groups, $\sigma_1^2$ and $\sigma_2^2$, are equal.⁵ If any one of these assumptions is violated, the t-test is not valid.

An example here is to compare mean death rates from lung cancer between smokers and nonsmokers. Let us begin by discussing the equal variance assumption.

4.1 F Test for Equal Variances

The folded form F test is widely used to examine whether two populations have the same variance. The statistic is

$$F = \frac{s_L^2}{s_S^2} \sim F(n_L - 1,\; n_S - 1),$$

where L and S respectively indicate the groups with the larger and smaller sample variances. Unless the null hypothesis of equal variances is rejected, the pooled variance estimate $s_{pool}^2$ is used. The null hypothesis of the independent sample t-test is $H_0: \mu_1 - \mu_2 = D_0$, and

$$t = \frac{(\bar{y}_1 - \bar{y}_2) - D_0}{s_{pool}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t(n_1 + n_2 - 2),$$

where

$$s_{pool}^2 = \frac{\sum (y_{1i} - \bar{y}_1)^2 + \sum (y_{2j} - \bar{y}_2)^2}{n_1 + n_2 - 2} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}.$$

When the assumption is violated, the t-test requires an approximation of the degrees of freedom. The null hypothesis and other components of the t-test, however, remain unchanged. Satterthwaite's approximation for the degrees of freedom is commonly used. Note that the approximation is a real number, not an integer.

$$t' = \frac{(\bar{y}_1 - \bar{y}_2) - D_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \sim t(df_{Satterthwaite}),$$

where

$$df_{Satterthwaite} = \frac{(n_1 - 1)(n_2 - 1)}{(n_1 - 1)(1 - c)^2 + (n_2 - 1)c^2} \quad\text{and}\quad c = \frac{s_1^2/n_1}{s_1^2/n_1 + s_2^2/n_2}.$$

⁵ $E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2$ and $Var(\bar{x}_1 - \bar{x}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} = \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$ when $\sigma_1^2 = \sigma_2^2 = \sigma^2$.
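The Section 4.1 formulas can be checked numerically. The Python sketch below uses only the standard library, with hypothetical function names and made-up data. It computes the folded F statistic, the pooled-variance t, and Satterthwaite's t' with the degrees-of-freedom formula written in terms of c, as above. P-values are left out because they need F and t distribution functions that the standard library does not provide.

```python
import math
import statistics


def folded_f(y1, y2):
    """Larger sample variance over smaller, with (numerator df, denominator df)."""
    v1, v2 = statistics.variance(y1), statistics.variance(y2)
    if v1 >= v2:
        return v1 / v2, len(y1) - 1, len(y2) - 1
    return v2 / v1, len(y2) - 1, len(y1) - 1


def pooled_t(y1, y2, d0=0.0):
    """Equal-variance t statistic and its n1 + n2 - 2 degrees of freedom."""
    n1, n2 = len(y1), len(y2)
    s2_pool = ((n1 - 1) * statistics.variance(y1)
               + (n2 - 1) * statistics.variance(y2)) / (n1 + n2 - 2)
    se = math.sqrt(s2_pool * (1 / n1 + 1 / n2))
    return (statistics.mean(y1) - statistics.mean(y2) - d0) / se, n1 + n2 - 2


def satterthwaite_t(y1, y2, d0=0.0):
    """Unequal-variance t' with Satterthwaite's real-valued degrees of freedom."""
    n1, n2 = len(y1), len(y2)
    a = statistics.variance(y1) / n1
    b = statistics.variance(y2) / n2
    t = (statistics.mean(y1) - statistics.mean(y2) - d0) / math.sqrt(a + b)
    c = a / (a + b)
    df = (n1 - 1) * (n2 - 1) / ((n1 - 1) * (1 - c) ** 2 + (n2 - 1) * c ** 2)
    return t, df


smokers = [1, 2, 3]       # hypothetical values
nonsmokers = [4, 5, 6]    # hypothetical values
```

In this toy example the two sample sizes are equal and so are the sample variances. Then c = 0.5, Satterthwaite's df reduces to n1 + n2 - 2 = 4, and t and t' coincide. With unequal variances the two statistics and their degrees of freedom diverge.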
The SAS TTEST procedure and the SPSS T-TEST command conduct F tests for equal variance. SAS reports the folded form F statistic, whereas SPSS computes Levene's weighted F statistic. In STATA, the .oneway command produces Bartlett's statistic for the equal variance test. The following is an example of Bartlett's test that does not reject the null hypothesis of equal variance.

    . oneway lung smoke

[Output: Analysis of Variance table (Source, SS, df, MS, F, Prob > F for Between groups, Within groups, and Total), followed by Bartlett's test for equal variances: chi2(1), Prob>chi2 = 0.77]

STATA, SAS, and SPSS all compute Satterthwaite's approximation of the degrees of freedom. In addition, the SAS TTEST procedure reports the Cochran-Cox approximation, and the STATA .ttest command provides Welch's degrees of freedom.

4.2 T-test in STATA

With the .ttest command, you have to specify a grouping variable (smoke in this example) in the parentheses of the by option.

    . ttest lung, by(smoke) level(95)

[Output: Two-sample t test with equal variances for groups 0 and 1, combined, and diff (Obs, Mean, Std. Err., Std. Dev., 95% Conf. Interval), followed by diff = mean(0) - mean(1), t, 42 degrees of freedom, and p-values for Ha: diff < 0, Ha: diff != 0, and Ha: diff > 0]

Let us first check the equal variance assumption. The F statistic is $1.17 = s_L^2/s_S^2 \sim F(21, 21)$. The degrees of freedom of the numerator and denominator are both 21 (= 22 - 1). The p-value of .773, virtually the same as that of Bartlett's test above, does not reject the null hypothesis of equal variance. Thus, the t-test here is valid (p=.0000).
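Bartlett's statistic, which .oneway reports, compares the log of the pooled variance with the logs of the individual group variances. The Python sketch below uses the textbook form of the statistic, which is approximately chi-squared with k - 1 degrees of freedom when the variances are equal. It is not STATA's own code, and the function name and data are hypothetical.

```python
import math
import statistics


def bartlett_chi2(*groups):
    """Bartlett's test statistic and its degrees of freedom (k - 1)."""
    k = len(groups)
    ns = [len(g) for g in groups]
    vs = [statistics.variance(g) for g in groups]
    big_n = sum(ns)
    # Pooled variance, as in s_pool^2 of Section 4.1 generalized to k groups
    sp2 = sum((n - 1) * v for n, v in zip(ns, vs)) / (big_n - k)
    numer = (big_n - k) * math.log(sp2) - sum(
        (n - 1) * math.log(v) for n, v in zip(ns, vs))
    corr = 1 + (sum(1 / (n - 1) for n in ns) - 1 / (big_n - k)) / (3 * (k - 1))
    return numer / corr, k - 1


stat_equal, df = bartlett_chi2([1, 2, 3], [4, 5, 6])    # equal variances: statistic 0
stat_unequal, _ = bartlett_chi2([1, 2, 3], [2, 4, 6])   # sample variances 1 and 4
```

The statistic is zero when all sample variances are equal and grows as they spread apart. This is why a large chi-squared value, as in Section 5, rejects equal variances.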
The pooled t statistic is

$$t = \frac{(\bar{y}_1 - \bar{y}_2) - 0}{s_{pool}\sqrt{\frac{1}{22} + \frac{1}{22}}} \sim t(22 + 22 - 2), \quad\text{where}\quad s_{pool}^2 = \frac{(22 - 1)s_1^2 + (22 - 1)s_2^2}{22 + 22 - 2}.$$

If only aggregate data of the two variables are available, use the .ttesti command and list the number of observations, mean, and standard deviation of each group.

    . ttesti #obs1 #mean1 #sd1 #obs2 #mean2 #sd2, level(95)

Suppose a data set is arranged differently (the second type in Figure 1), so that one variable, smk_lung, has data for smokers and the other, non_lung, for nonsmokers. You have to use the unpaired option to indicate that the two variables are not paired. A grouping variable is not necessary here. Compare the following output with what is printed above.

    . ttest smk_lung=non_lung, unpaired

[Output: Two-sample t test with equal variances for smk_lung, non_lung, combined, and diff, followed by diff = mean(smk_lung) - mean(non_lung), t, 42 degrees of freedom, and p-values for Ha: diff < 0, Ha: diff != 0, and Ha: diff > 0]

This unpaired option is very useful since it enables you to conduct a t-test without additional data manipulation. You may run the .ttest command with the unpaired option to compare two variables, say leukemia and kidney, as independent samples in STATA. In SAS and SPSS, however, you have to stack up the two variables and generate a grouping variable before running t-tests.

    . ttest leukemia=kidney, unpaired

[Output: Two-sample t test with equal variances for leukemia, kidney, combined, and diff, followed by diff = mean(leukemia) - mean(kidney), t, 86 degrees of freedom, and p-values for Ha: diff < 0, Ha: diff != 0, and Ha: diff > 0]
The folded F statistic (the larger of the two sample variances divided by the smaller) and its p-value (.1797) do not reject the null hypothesis of equal variance. The large t statistic rejects the null hypothesis that death rates from leukemia and kidney cancers have the same mean.

4.3 T-test in SAS

The TTEST procedure by default examines the hypothesis of equal variances and provides t statistics for either case. It also reports Satterthwaite's approximation for the degrees of freedom by default. Keep in mind that the variable to be tested is grouped by the variable specified in the CLASS statement.

    PROC TTEST H0=0 ALPHA=.05 DATA=masil.smoking;
        CLASS smoke;
        VAR lung;
    RUN;

[Output: The TTEST Procedure. Statistics table for lung by smoke (groups 0 and 1 and Diff (1-2): N, mean and standard deviation with confidence limits, Std Err, Minimum, Maximum); T-Tests table (Pooled, equal variances, Pr > |t| <.0001; Satterthwaite, unequal variances, Pr > |t| <.0001); Equality of Variances table (Folded F: Num DF, Den DF, F Value, Pr > F)]
The F test for equal variance does not reject the null hypothesis of equal variances. Thus, you should read the t-test labeled Pooled for the t statistic and its p-value. If the equal variance assumption is violated, the Satterthwaite and Cochran statistics should be read instead.

If you have a summary data set with the values of variables (lung) and their frequencies (count), specify the count variable in the FREQ statement.

    PROC TTEST DATA=masil.smoking;
        CLASS smoke;
        VAR lung;
        FREQ count;
    RUN;

Now, let us compare the death rates from leukemia and kidney cancer, which are stored in the second data arrangement type of Figure 1. As mentioned before, you need to rearrange the data set: stack the two variables into one and generate a grouping variable (the first type in Figure 1). Because the KEEP statement retains only leu_kid and death, writing the stacked data to a separate data set keeps masil.smoking intact for later analyses. That data set is named masil.stacked here for illustration.

    DATA masil.stacked;  /* hypothetical name; leaves masil.smoking unchanged */
        SET masil.smoking;
        death = leukemia; leu_kid = 'Leukemia'; OUTPUT;
        death = kidney;   leu_kid = 'Kidney';   OUTPUT;
        KEEP leu_kid death;
    RUN;

    PROC TTEST COCHRAN DATA=masil.stacked;
        CLASS leu_kid;
        VAR death;
    RUN;

[Output: The TTEST Procedure. Statistics table for death by leu_kid (Kidney, Leukemia, and Diff (1-2)); T-Tests table (Pooled, Satterthwaite, and Cochran methods, each with Pr > |t| <.0001); Equality of Variances table (Folded F)]
Compare this SAS output with that of STATA in the previous section.

4.4 T-test in SPSS

In the T-TEST command, you need to use the GROUPS subcommand in order to specify a grouping variable. SPSS reports a Levene's F of .0000, which does not reject the null hypothesis of equal variance (p=.995).

    T-TEST GROUPS = smoke(0 1)
      /VARIABLES = lung
      /MISSING = ANALYSIS
      /CRITERIA = CI(.95).
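The SAS DATA step in Section 4.3 turns each state's row into two rows, one per cancer type, through two OUTPUT statements. For readers who prepare data outside SAS, the same wide-to-long reshaping can be sketched in Python. In this sketch, dictionaries stand in for observations, the values are made up, and `stack_columns` is a hypothetical helper.

```python
def stack_columns(rows, labels, value_name="death", group_name="leu_kid"):
    """Emit one output row per (input row, column), like repeated OUTPUT statements.

    labels maps a source column name to the group label written for it.
    """
    stacked = []
    for row in rows:
        for column, label in labels.items():
            stacked.append({group_name: label, value_name: row[column]})
    return stacked


states = [
    {"leukemia": 6.9, "kidney": 2.9},   # hypothetical values
    {"leukemia": 7.1, "kidney": 3.2},   # hypothetical values
]
long_rows = stack_columns(states, {"leukemia": "Leukemia", "kidney": "Kidney"})
```

Each input row yields exactly two output rows, in the order of the `labels` mapping. This matches the two OUTPUT statements, so the stacked data have twice as many observations as states.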
5. Independent Samples with Unequal Variances

If the assumption of equal variances is violated, we have to compute the adjusted t statistic using individual sample standard deviations rather than a pooled standard deviation. It is also necessary to use the Satterthwaite, Cochran-Cox (SAS), or Welch (STATA) approximations of the degrees of freedom. In this section, you compare mean death rates from kidney cancer between the west (south) and east (north).

5.1 T-test in STATA

As discussed earlier, let us check equality of variances using the .oneway command. The tabulate option produces a table of summary statistics for the groups.

    . oneway kidney west, tabulate

[Output: Summary of kidney by west (Mean, Std. Dev., Freq. for each group and Total), the Analysis of Variance table, and Bartlett's test for equal variances: chi2(1), Prob>chi2]

Bartlett's chi-squared statistic rejects the null hypothesis of equal variance at the .01 level. It is therefore appropriate to use the unequal option in the .ttest command, which calculates Satterthwaite's approximation for the degrees of freedom. Unlike the SAS TTEST procedure, the .ttest command cannot specify a mean difference D0 other than zero. Thus, the null hypothesis is that the mean difference is zero.

    . ttest kidney, by(west) unequal level(95)

[Output: Two-sample t test with unequal variances for groups 0 and 1, combined, and diff, followed by diff = mean(0) - mean(1), t, Satterthwaite's degrees of freedom, and p-values for Ha: diff < 0, Ha: diff != 0, and Ha: diff > 0]
See Satterthwaite's approximation of the degrees of freedom in the middle of the output. If you want Welch's approximation, use the welch option together with the unequal option; without the unequal option, welch is ignored.

    . ttest kidney, by(west) unequal welch

[Output: Two-sample t test with unequal variances, with diff = mean(0) - mean(1), t, Welch's degrees of freedom, and p-values for Ha: diff < 0, Ha: diff != 0, and Ha: diff > 0]

Satterthwaite's approximation is slightly smaller than Welch's. Again, keep in mind that these approximations are real numbers, not integers. The t statistic of 2.7817 and its p-value of .0086 reject the null hypothesis of equal population means. The north and east have larger death rates from kidney cancer per 100 thousand people than the south and west.

For aggregate data, use the .ttesti command with the necessary options.

    . ttesti #obs1 #mean1 #sd1 #obs2 #mean2 #sd2, unequal welch

As mentioned earlier, the unpaired option of the .ttest command directly compares two variables without data manipulation. The option treats the two variables as independent of each other. The following is an example of the unpaired and unequal options.

    . ttest bladder=kidney, unpaired unequal welch

[Output: Two-sample t test with unequal variances for bladder, kidney, combined, and diff, followed by diff = mean(bladder) - mean(kidney), t, Welch's degrees of freedom, and p-values for Ha: diff < 0, Ha: diff != 0, and Ha: diff > 0]
The folded F statistic (the larger sample variance divided by the smaller) rejects the null hypothesis of equal variance (p<.0001). If the welch option is omitted, Satterthwaite's degrees of freedom will be produced instead. For aggregate data, again, use the .ttesti command, which needs no unpaired option.

    . ttesti #obs1 #mean1 #sd1 #obs2 #mean2 #sd2, unequal welch level(95)

5.2 T-test in SAS

The TTEST procedure reports statistics for cases of both equal and unequal variance. You may add the COCHRAN option to compute the Cochran-Cox approximation for the degrees of freedom.

    PROC TTEST COCHRAN DATA=masil.smoking;
        CLASS west;
        VAR kidney;
    RUN;

[Output: The TTEST Procedure. Statistics table for kidney by west (groups 0 and 1 and Diff (1-2): N, mean and standard deviation with confidence limits, Std Err, Minimum, Maximum); T-Tests table (Pooled, Satterthwaite, and Cochran methods); Equality of Variances table (Folded F)]
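Section 5 reports both Satterthwaite's and Welch's degrees of freedom. The Python sketch below computes both from the group standard deviations and sizes, which are the same inputs .ttesti takes. The Satterthwaite form is algebraically identical to the c-based formula of Section 4.1, and the sketch checks this. The Welch form, $-2 + (s_1^2/n_1 + s_2^2/n_2)^2 / [(s_1^2/n_1)^2/(n_1+1) + (s_2^2/n_2)^2/(n_2+1)]$, is our reading of STATA's ttest documentation. Treat it as an assumption to verify against your STATA version. The inputs and function names are made up.

```python
def satterthwaite_df(s1, n1, s2, n2):
    """Satterthwaite's degrees of freedom from group SDs and sizes."""
    a, b = s1 ** 2 / n1, s2 ** 2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


def satterthwaite_df_c(s1, n1, s2, n2):
    """The same quantity via c = (s1^2/n1) / (s1^2/n1 + s2^2/n2)."""
    a, b = s1 ** 2 / n1, s2 ** 2 / n2
    c = a / (a + b)
    return (n1 - 1) * (n2 - 1) / ((n1 - 1) * (1 - c) ** 2 + (n2 - 1) * c ** 2)


def welch_df(s1, n1, s2, n2):
    """Welch's degrees of freedom (assumed form; see the lead-in)."""
    a, b = s1 ** 2 / n1, s2 ** 2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 + 1) + b ** 2 / (n2 + 1)) - 2


df_s = satterthwaite_df(1.0, 10, 2.0, 10)   # about 13.24, not an integer
df_w = welch_df(1.0, 10, 2.0, 10)           # about 14.18
```

For these inputs, Satterthwaite's value is somewhat smaller than Welch's, the same ordering the text reports for the kidney data. Both are real numbers, not integers.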
xtmixed & denominator degrees of freedom: myth or magic 2011 Chicago Stata Conference Phil Ender UCLA Statistical Consulting Group July 2011 Phil Ender xtmixed & denominator degrees of freedom: myth or
More informationFrom the help desk: hurdle models
The Stata Journal (2003) 3, Number 2, pp. 178 184 From the help desk: hurdle models Allen McDowell Stata Corporation Abstract. This article demonstrates that, although there is no command in Stata for
More informationNCSS Statistical Software
Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, twosample ttests, the ztest, the
More informationDongfeng Li. Autumn 2010
Autumn 2010 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis
More informationHandling missing data in Stata a whirlwind tour
Handling missing data in Stata a whirlwind tour 2012 Italian Stata Users Group Meeting Jonathan Bartlett www.missingdata.org.uk 20th September 2012 1/55 Outline The problem of missing data and a principled
More informationMultinomial and Ordinal Logistic Regression
Multinomial and Ordinal Logistic Regression ME104: Linear Regression Analysis Kenneth Benoit August 22, 2012 Regression with categorical dependent variables When the dependent variable is categorical,
More informationMEASURES OF LOCATION AND SPREAD
Paper TU04 An Overview of Nonparametric Tests in SAS : When, Why, and How Paul A. Pappas and Venita DePuy Durham, North Carolina, USA ABSTRACT Most commonly used statistical procedures are based on the
More informationQuick Stata Guide by Liz Foster
by Liz Foster Table of Contents Part 1: 1 describe 1 generate 1 regress 3 scatter 4 sort 5 summarize 5 table 6 tabulate 8 test 10 ttest 11 Part 2: Prefixes and Notes 14 by var: 14 capture 14 use of the
More informationLet s explore SAS Proc TTest
Let s explore SAS Proc TTest Ana Yankovsky Research Statistical Analyst Screening Programs, AHS Ana.Yankovsky@albertahealthservices.ca Goals of the presentation: 1. Look at the structure of Proc TTEST;
More informationTesting Group Differences using Ttests, ANOVA, and Nonparametric Measures
Testing Group Differences using Ttests, ANOVA, and Nonparametric Measures Jamie DeCoster Department of Psychology University of Alabama 348 Gordon Palmer Hall Box 870348 Tuscaloosa, AL 354870348 Phone:
More informationExperimental Design for Influential Factors of Rates on Massive Open Online Courses
Experimental Design for Influential Factors of Rates on Massive Open Online Courses December 12, 2014 Ning Li nli7@stevens.edu Qing Wei qwei1@stevens.edu Yating Lan ylan2@stevens.edu Yilin Wei ywei12@stevens.edu
More informationAugust 2012 EXAMINATIONS Solution Part I
August 01 EXAMINATIONS Solution Part I (1) In a random sample of 600 eligible voters, the probability that less than 38% will be in favour of this policy is closest to (B) () In a large random sample,
More informationAileen Murphy, Department of Economics, UCC, Ireland. WORKING PAPER SERIES 0710
AN ECONOMETRIC ANALYSIS OF SMOKING BEHAVIOUR IN IRELAND Aileen Murphy, Department of Economics, UCC, Ireland. DEPARTMENT OF ECONOMICS WORKING PAPER SERIES 0710 1 AN ECONOMETRIC ANALYSIS OF SMOKING BEHAVIOUR
More informationGLM I An Introduction to Generalized Linear Models
GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0 ANTITRUST Notice The Casualty Actuarial
More informationESTIMATING AVERAGE TREATMENT EFFECTS: IV AND CONTROL FUNCTIONS, II Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics
ESTIMATING AVERAGE TREATMENT EFFECTS: IV AND CONTROL FUNCTIONS, II Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009 1. Quantile Treatment Effects 2. Control Functions
More informationHow to set the main menu of STATA to default factory settings standards
University of Pretoria Data analysis for evaluation studies Examples in STATA version 11 List of data sets b1.dta (To be created by students in class) fp1.xls (To be provided to students) fp1.txt (To be
More informationNCSS Statistical Software
Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, twosample ttests, the ztest, the
More informationIBM SPSS Missing Values 22
IBM SPSS Missing Values 22 Note Before using this information and the product it supports, read the information in Notices on page 23. Product Information This edition applies to version 22, release 0,
More informationSyntax Menu Description Options Remarks and examples Stored results Methods and formulas References Also see. level(#) , options2
Title stata.com ttest t tests (meancomparison tests) Syntax Syntax Menu Description Options Remarks and examples Stored results Methods and formulas References Also see Onesample t test ttest varname
More informationData Analysis Tools. Tools for Summarizing Data
Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool
More informationGuide to Microsoft Excel for calculations, statistics, and plotting data
Page 1/47 Guide to Microsoft Excel for calculations, statistics, and plotting data Topic Page A. Writing equations and text 2 1. Writing equations with mathematical operations 2 2. Writing equations with
More informationIntroduction to Quantitative Methods
Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................
More informationHURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009
HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009 1. Introduction 2. A General Formulation 3. Truncated Normal Hurdle Model 4. Lognormal
More informationSAS 3: Comparing Means
SAS 3: Comparing Means University of Guelph Revised June 2011 Table of Contents SAS Availability... 2 Goals of the workshop... 2 Data for SAS sessions... 3 Statistical Background... 4 Ttest... 8 1. Independent
More informationInstitute of Actuaries of India Subject CT3 Probability and Mathematical Statistics
Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in
More informationDescriptive Statistics
Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize
More informationKSTAT MINIMANUAL. Decision Sciences 434 Kellogg Graduate School of Management
KSTAT MINIMANUAL Decision Sciences 434 Kellogg Graduate School of Management Kstat is a set of macros added to Excel and it will enable you to do the statistics required for this course very easily. To
More informationStatistics Review PSY379
Statistics Review PSY379 Basic concepts Measurement scales Populations vs. samples Continuous vs. discrete variable Independent vs. dependent variable Descriptive vs. inferential stats Common analyses
More informationData Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing. C. Olivia Rud, VP, Fleet Bank
Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing C. Olivia Rud, VP, Fleet Bank ABSTRACT Data Mining is a new term for the common practice of searching through
More informationData Analysis Methodology 1
Data Analysis Methodology 1 Suppose you inherited the database in Table 1.1 and needed to find out what could be learned from it fast. Say your boss entered your office and said, Here s some software project
More informationRandom effects and nested models with SAS
Random effects and nested models with SAS /************* classical2.sas ********************* Three levels of factor A, four levels of B Both fixed Both random A fixed, B random B nested within A ***************************************************/
More informationAnalyses on Hurricane Archival Data June 17, 2014
Analyses on Hurricane Archival Data June 17, 2014 This report provides detailed information about analyses of archival data in our PNAS article http://www.pnas.org/content/early/2014/05/29/1402786111.abstract
More informationAdditional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jintselink/tselink.htm
Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jintselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm
More informationHypothesis Testing and Statistical Power of a Test *
20042008, The Trustees of Indiana University Hypothesis Testing and Statistical Power: 1 Hypothesis Testing and Statistical Power of a Test * Hun Myoung Park (kucc625@indiana.edu) This document provides
More informationConfidence Intervals for the Difference Between Two Means
Chapter 47 Confidence Intervals for the Difference Between Two Means Introduction This procedure calculates the sample size necessary to achieve a specified distance from the difference in sample means
More informationRecall this chart that showed how most of our course would be organized:
Chapter 4 OneWay ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical
More informationIAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results
IAPRI Quantitative Analysis Capacity Building Series Multiple regression analysis & interpreting results How important is Rsquared? Rsquared Published in Agricultural Economics 0.45 Best article of the
More informationMODEL I: DRINK REGRESSED ON GPA & MALE, WITHOUT CENTERING
Interpreting Interaction Effects; Interaction Effects and Centering Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 20, 2015 Models with interaction effects
More informationBusiness Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.
Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGrawHill/Irwin, 2008, ISBN: 9780073319889. Required Computing
More informationRegression Analysis: A Complete Example
Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty
More informationPoisson Models for Count Data
Chapter 4 Poisson Models for Count Data In this chapter we study loglinear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the
More informationSUMAN DUVVURU STAT 567 PROJECT REPORT
SUMAN DUVVURU STAT 567 PROJECT REPORT SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.
More informationSimple Linear Regression Inference
Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation
More informationAn analysis method for a quantitative outcome and two categorical explanatory variables.
Chapter 11 TwoWay ANOVA An analysis method for a quantitative outcome and two categorical explanatory variables. If an experiment has a quantitative outcome and two categorical explanatory variables that
More informationIntroduction. Hypothesis Testing. Hypothesis Testing. Significance Testing
Introduction Hypothesis Testing Mark Lunt Arthritis Research UK Centre for Ecellence in Epidemiology University of Manchester 13/10/2015 We saw last week that we can never know the population parameters
More informationDETERMINANTS OF CAPITAL ADEQUACY RATIO IN SELECTED BOSNIAN BANKS
DETERMINANTS OF CAPITAL ADEQUACY RATIO IN SELECTED BOSNIAN BANKS Nađa DRECA International University of Sarajevo nadja.dreca@students.ius.edu.ba Abstract The analysis of a data set of observation for 10
More informationMINITAB ASSISTANT WHITE PAPER
MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. OneWay
More informationGamma Distribution Fitting
Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics
More informationPart 3. Comparing Groups. Chapter 7 Comparing Paired Groups 189. Chapter 8 Comparing Two Independent Groups 217
Part 3 Comparing Groups Chapter 7 Comparing Paired Groups 189 Chapter 8 Comparing Two Independent Groups 217 Chapter 9 Comparing More Than Two Groups 257 188 Elementary Statistics Using SAS Chapter 7 Comparing
More informationVI. Introduction to Logistic Regression
VI. Introduction to Logistic Regression We turn our attention now to the topic of modeling a categorical outcome as a function of (possibly) several factors. The framework of generalized linear models
More informationBiostatistics: Types of Data Analysis
Biostatistics: Types of Data Analysis Theresa A Scott, MS Vanderbilt University Department of Biostatistics theresa.scott@vanderbilt.edu http://biostat.mc.vanderbilt.edu/theresascott Theresa A Scott, MS
More informationNCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )
Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates
More informationChapter 7. Oneway ANOVA
Chapter 7 Oneway ANOVA Oneway ANOVA examines equality of population means for a quantitative outcome and a single categorical explanatory variable with any number of levels. The ttest of Chapter 6 looks
More information11. Analysis of Casecontrol Studies Logistic Regression
Research methods II 113 11. Analysis of Casecontrol Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:
More informationDATA INTERPRETATION AND STATISTICS
PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE
More informationChapter 29 The GENMOD Procedure. Chapter Table of Contents
Chapter 29 The GENMOD Procedure Chapter Table of Contents OVERVIEW...1365 WhatisaGeneralizedLinearModel?...1366 ExamplesofGeneralizedLinearModels...1367 TheGENMODProcedure...1368 GETTING STARTED...1370
More informationFinal Exam Practice Problem Answers
Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal
More informationPredictor Coef StDev T P Constant 970667056 616256122 1.58 0.154 X 0.00293 0.06163 0.05 0.963. S = 0.5597 RSq = 0.0% RSq(adj) = 0.
Statistical analysis using Microsoft Excel Microsoft Excel spreadsheets have become somewhat of a standard for data storage, at least for smaller data sets. This, along with the program often being packaged
More informationTutorial 5: Hypothesis Testing
Tutorial 5: Hypothesis Testing Rob Nicholls nicholls@mrclmb.cam.ac.uk MRC LMB Statistics Course 2014 Contents 1 Introduction................................ 1 2 Testing distributional assumptions....................
More informationNonlinear Regression Functions. SW Ch 8 1/54/
Nonlinear Regression Functions SW Ch 8 1/54/ The TestScore STR relation looks linear (maybe) SW Ch 8 2/54/ But the TestScore Income relation looks nonlinear... SW Ch 8 3/54/ Nonlinear Regression General
More informationFrom the help desk: Bootstrapped standard errors
The Stata Journal (2003) 3, Number 1, pp. 71 80 From the help desk: Bootstrapped standard errors Weihua Guan Stata Corporation Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution
More informationSection Format Day Begin End Building Rm# Instructor. 001 Lecture Tue 6:45 PM 8:40 PM Silver 401 Ballerini
NEW YORK UNIVERSITY ROBERT F. WAGNER GRADUATE SCHOOL OF PUBLIC SERVICE Course Syllabus Spring 2016 Statistical Methods for Public, Nonprofit, and Health Management Section Format Day Begin End Building
More informationModeling Lifetime Value in the Insurance Industry
Modeling Lifetime Value in the Insurance Industry C. Olivia Parr Rud, Executive Vice President, Data Square, LLC ABSTRACT Acquisition modeling for direct mail insurance has the unique challenge of targeting
More informationOneWay Analysis of Variance
OneWay Analysis of Variance Note: Much of the math here is tedious but straightforward. We ll skim over it in class but you should be sure to ask questions if you don t understand it. I. Overview A. We
More informationHYPOTHESIS TESTING: CONFIDENCE INTERVALS, TTESTS, ANOVAS, AND REGRESSION
HYPOTHESIS TESTING: CONFIDENCE INTERVALS, TTESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate
More informationModeration. Moderation
Stats  Moderation Moderation A moderator is a variable that specifies conditions under which a given predictor is related to an outcome. The moderator explains when a DV and IV are related. Moderation
More informationbusiness statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar
business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel
More informationSurvey, Statistics and Psychometrics Core Research Facility University of NebraskaLincoln. LogRank Test for More Than Two Groups
Survey, Statistics and Psychometrics Core Research Facility University of NebraskaLincoln LogRank Test for More Than Two Groups Prepared by Harlan Sayles (SRAM) Revised by Julia Soulakova (Statistics)
More informationLAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING
LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.
More informationMODELING AUTO INSURANCE PREMIUMS
MODELING AUTO INSURANCE PREMIUMS Brittany Parahus, Siena College INTRODUCTION The findings in this paper will provide the reader with a basic knowledge and understanding of how Auto Insurance Companies
More informationt Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon
ttests in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com www.excelmasterseries.com
More informationTwoSample TTests Allowing Unequal Variance (Enter Difference)
Chapter 45 TwoSample TTests Allowing Unequal Variance (Enter Difference) Introduction This procedure provides sample size and power calculations for one or twosided twosample ttests when no assumption
More informationINTERPRETING THE ONEWAY ANALYSIS OF VARIANCE (ANOVA)
INTERPRETING THE ONEWAY ANALYSIS OF VARIANCE (ANOVA) As with other parametric statistics, we begin the oneway ANOVA with a test of the underlying assumptions. Our first assumption is the assumption of
More informationLecture 15. Endogeneity & Instrumental Variable Estimation
Lecture 15. Endogeneity & Instrumental Variable Estimation Saw that measurement error (on right hand side) means that OLS will be biased (biased toward zero) Potential solution to endogeneity instrumental
More informationData Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression
Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction
More informationStatistical Functions in Excel
Statistical Functions in Excel There are many statistical functions in Excel. Moreover, there are other functions that are not specified as statistical functions that are helpful in some statistical analyses.
More informationTwo Correlated Proportions (McNemar Test)
Chapter 50 Two Correlated Proportions (Mcemar Test) Introduction This procedure computes confidence intervals and hypothesis tests for the comparison of the marginal frequencies of two factors (each with
More informationSome Essential Statistics The Lure of Statistics
Some Essential Statistics The Lure of Statistics Data Mining Techniques, by M.J.A. Berry and G.S Linoff, 2004 Statistics vs. Data Mining..lie, damn lie, and statistics mining data to support preconceived
More informationLogistic Regression. http://faculty.chass.ncsu.edu/garson/pa765/logistic.htm#sigtests
Logistic Regression http://faculty.chass.ncsu.edu/garson/pa765/logistic.htm#sigtests Overview Binary (or binomial) logistic regression is a form of regression which is used when the dependent is a dichotomy
More information