The Analysis of Variance ANOVA

1 The Analysis of Variance (ANOVA)
Lecture / Dr. P.'s Clinic Consultant Module in Probability & Statistics in Engineering

2 Today in P&S
Analysis of Variance (ANOVA)
- Definitions
- Single Factor ANOVA
- Setting and assumptions
- The F-statistic
- Tests about the variance of two populations
- F-distribution and F-test
- ANOVA variables and ANOVA table
- ANOVA using MATLAB
- Multiple Comparisons in ANOVA
006 All Rights Reserved, Robi Polikar, Rowan University, Dept. of Electrical and Computer Engineering

3 Definitions
The analysis of variance (ANOVA) refers to a collection of experimental situations and statistical procedures for the analysis of quantitative responses from experimental units. The simplest form is known as single factor ANOVA or one-way ANOVA, and is usually used for comparing the means of
- data sampled from more than two populations, or
- data from experiments in which more than two treatments have been used.
The characteristic that differentiates the treatments or populations from one another is called the factor under study, and the different treatments or populations are referred to as the levels of the factor.

4 Examples
- An experiment to study the effect of geographic demographics (e.g., urban, suburban, rural, international urban, international rural) on overall student success. The factor of interest is the geographic demographic, and there are five different qualitative levels.
- An experiment to study the effect of different diets (Mediterranean, Middle Eastern, Southern US, Chinese, Atkins, Vegetarian, Low Carb) on cancer rates. The factor is the diet, with seven different qualitative levels.
- An experiment to study the effect of precise temperature on bacteria growth rate. The factor is the temperature, and the levels are quantitative in nature [0ºC ~ 0ºC].
- An experiment to study the chip defect rate of different VLSI technologies (0.0 micron, 0.05 micron, 0.08 micron, 0. micron). The factor is the size of the single component (transistor) on the chip, with four quantitative levels.

5 Single Factor ANOVA: Definitions and Assumptions
In all of the above examples, there is one factor with multiple levels, hence a one-way (single factor) analysis of multiple populations. Some definitions and assumptions:
- $I$: number of populations or treatments being compared
- $J_i$: sample size for the $i$th population/treatment. Often $J_i = J$ for $i = 1, 2, \dots, I$, giving $IJ$ observations in total
- $\mu_i$: the mean value of the $i$th population, or the average response when the $i$th treatment is applied
- $X_{ij}$: the random variable that denotes the $j$th measurement taken from the $i$th population
- $x_{ij}$: the observed value of $X_{ij}$
- $\bar{X}_{i\cdot}$: sample mean of the $i$th population, computed over all $J$ of its observations
- $\bar{X}_{\cdot\cdot}$: grand mean, the average of all $IJ$ observations
- $S_i^2$: sample variance of the $i$th population

\[
\bar{X}_{i\cdot} = \frac{1}{J}\sum_{j=1}^{J} X_{ij}, \qquad
\bar{X}_{\cdot\cdot} = \frac{1}{IJ}\sum_{i=1}^{I}\sum_{j=1}^{J} X_{ij}, \qquad
S_i^2 = \frac{1}{J-1}\sum_{j=1}^{J}\left(X_{ij}-\bar{X}_{i\cdot}\right)^2, \quad i = 1, \dots, I
\]

Assumption: All $I$ distributions are normal with the same variance $\sigma^2$. That is, each $X_{ij}$ is normally distributed with $E(X_{ij}) = \mu_i$ and $\mathrm{Var}(X_{ij}) = \sigma^2$. We will accept this assumption as satisfied as long as $\max(\sigma_i) < 2\,\min(\sigma_i)$.

6 Single Factor Experiments
A typical dataset can be summarized as follows:

Treatment   Observations                        Totals          Averages
1           $x_{11}, x_{12}, \dots, x_{1J}$     $x_{1\cdot}$    $\bar{x}_{1\cdot}$
2           $x_{21}, x_{22}, \dots, x_{2J}$     $x_{2\cdot}$    $\bar{x}_{2\cdot}$
...         ...                                 ...             ...
I           $x_{I1}, x_{I2}, \dots, x_{IJ}$     $x_{I\cdot}$    $\bar{x}_{I\cdot}$
                                                $x_{\cdot\cdot}$  $\bar{x}_{\cdot\cdot}$

If we replace each observation with the mean of its treatment, the difference between the observed value and that mean is called the residual:

\[ e_{ij} = x_{ij} - \bar{x}_{i\cdot} \]

The residuals are expected to have a normal distribution, which can be checked using a normality plot.

7 Example: In-Class Exercise
The following data show the number of hours students of different colleges spend on homework. Each row holds the hours reported by one randomly selected student from each college, and the columns represent the different colleges (ENG, LAS, COM, EDU, FPA), so each column contains 6 observations (students) from one college, e.g., 6 from Engineering.

[Data table: 6 rows (1st through 6th student from each college) by 5 columns (ENG, LAS, COM, EDU, FPA), with the column averages $\bar{x}_{i\cdot}$ underneath.]

Do students from certain colleges study harder?
- $I$ = number of populations = 5
- $J$ = sample size of each population = 6

\[
\bar{x}_{i\cdot} = \frac{1}{J}\sum_{j=1}^{J} x_{ij}, \qquad
\bar{x}_{\cdot\cdot} = \frac{1}{IJ}\sum_{i=1}^{I}\sum_{j=1}^{J} x_{ij} = \frac{455}{30} \approx 15.17
\]

8 Hypothesis testing for ANOVA
The hypothesis of interest in one-way ANOVA is
$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_I$ vs. $H_a$: at least two of the means are different.
If $H_0$ is true, then $\mu_1 = \mu_2 = \dots = \mu_I$, and therefore $\bar{x}_{1\cdot}, \bar{x}_{2\cdot}, \dots, \bar{x}_{I\cdot}$ should all be reasonably close to each other. The procedure to test this hypothesis is based on comparing a measure of between-samples variation to a measure of within-sample variation.
- Within-sample variation is the variation within each sample (each population). This variation does not depend on whether $H_0$ is true or false, as it is the inherent variation within each sample, hence an indicator of noise / error within each sample.
- Between-samples variation, however, can indicate whether $H_0$ is true or false. The variation from one sample mean to another will be large only if the population means are truly different, which indicates that $H_0$ is false.
- Therefore, the ratio of the two gives an even stronger indication of whether $H_0$ is true: if the between-samples variation is large, particularly when the within-sample variation (noise) is small, then we have even more evidence against $H_0$.

9 Within / Between?
[Figure: the five samples (ENG, LAS, COM, EDU, FPA), each annotated with its sample mean $\bar{x}_{i\cdot}$ and sample variance $s_i^2$. The variation within each sample, averaged over the samples, is the within-sample variation. The average variation among the sample means is the between-sample variation.]

10 Test-Statistic
The between-samples variation and the within-sample variation can be expressed quantitatively using the mean square for treatments (MSTr) and the mean square error (MSE), respectively:

\[
\mathrm{MSTr} = \frac{J}{I-1}\left[(\bar{X}_{1\cdot}-\bar{X}_{\cdot\cdot})^2 + \dots + (\bar{X}_{I\cdot}-\bar{X}_{\cdot\cdot})^2\right]
= \frac{J}{I-1}\sum_{i=1}^{I}(\bar{X}_{i\cdot}-\bar{X}_{\cdot\cdot})^2
\]

MSTr collects the variations between each sample mean and the overall mean, hence it is a measure of between-samples variation.

\[
\mathrm{MSE} = \frac{S_1^2 + S_2^2 + \dots + S_I^2}{I}
\]

Each sample variance measures the variation (noise) within that sample. The average of all sample variances is then the average within-sample variation, the mean square error.

The test statistic for one-way ANOVA is then

\[ F = \frac{\mathrm{MSTr}}{\mathrm{MSE}} \]
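The MSTr, MSE and F definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `one_way_f` and the data are hypothetical (not the lecture's data), and the sketch assumes equal sample sizes J, as the slide does.

```python
from statistics import mean, variance

def one_way_f(samples):
    """Return (MSTr, MSE, F) for equal-size samples, one list per treatment."""
    I = len(samples)
    J = len(samples[0])
    if any(len(s) != J for s in samples):
        raise ValueError("this sketch assumes equal sample sizes J")
    sample_means = [mean(s) for s in samples]
    # With equal sample sizes, the mean of the sample means is the grand mean.
    grand_mean = mean(sample_means)
    mstr = J / (I - 1) * sum((m - grand_mean) ** 2 for m in sample_means)
    # statistics.variance uses the J-1 divisor, matching S_i^2 above.
    mse = mean([variance(s) for s in samples])
    return mstr, mse, mstr / mse

# Hypothetical data: 3 treatments, 4 observations each.
data = [[4.0, 5.0, 6.0, 5.0], [7.0, 8.0, 9.0, 8.0], [5.0, 6.0, 7.0, 6.0]]
mstr, mse, f = one_way_f(data)
print(mstr, mse, f)
```

For these made-up numbers the sample means are 5, 8 and 6, each sample variance is 2/3, and the ratio works out to F = 14.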

11 Test-statistic
What value of F provides information regarding rejecting $H_0$?
Recall that if $H_0$ is true, then $\mu_1 = \mu_2 = \dots = \mu_I$, and therefore $\bar{x}_{1\cdot}, \bar{x}_{2\cdot}, \dots, \bar{x}_{I\cdot}$ should be reasonably close to each other, and also to the grand mean $\bar{x}_{\cdot\cdot}$. Then the differences between the individual sample means and the grand mean would be small, resulting in a small MSTr. Otherwise, the differences would be large, resulting in a large MSTr.
MSE, however, does not depend on whether $H_0$ is true, as it relies only on the sample variances. Therefore, we can assert that:
- When $H_0$ is true, $E(\mathrm{MSTr}) = E(\mathrm{MSE}) = \sigma^2$
- When $H_0$ is false, $E(\mathrm{MSTr}) > E(\mathrm{MSE}) = \sigma^2$
Therefore, an F value $\gg 1$, indicating that $\mathrm{MSTr} \gg \mathrm{MSE}$, provides justifiable skepticism about $H_0$. The form of the rejection region is therefore $f \ge c$, where $f$ is the observed value of the F statistic and $c$ is the cutoff chosen to give enough benefit of the doubt to $H_0$. That is, $c$ is chosen such that $P(F \ge c \text{ when } H_0 \text{ is indeed true}) = \alpha$, the desired significance level.

12 $\chi^2$-Distribution (Little side step)
The F-distribution is related to the chi-squared ($\chi^2$) distribution. Let $X_1, \dots, X_n$ be a random sample from a normal distribution with parameters $\mu$ and $\sigma$. Then the following random variable has a $\chi^2$-distribution with $\nu = n-1$ degrees of freedom:

\[
\chi^2 = \frac{(n-1)S^2}{\sigma^2} = \frac{\sum_{i=1}^{n}(X_i-\bar{X})^2}{\sigma^2}
\]

The $\chi^2$-distribution is used in computing confidence intervals for the variance (as opposed to the z- or t-distribution used for confidence intervals for the mean).

13 In Matlab
Matlab has several functions to compute various parameters of the $\chi^2$ distribution:
- Y = chi2pdf(X,V) computes the $\chi^2$ pdf at each of the values in X using the corresponding parameters in V. (V can be a vector including several df's, in which case Matlab will compute the pdf for each df.)
- P = chi2cdf(X,V) computes the $\chi^2$ cdf at each of the values in X using the corresponding parameters in V.
- X = chi2inv(P,V) computes the inverse of the $\chi^2$ cdf with parameters specified by V for the corresponding probabilities in P. That is, given an area under the curve, this function computes the corresponding critical value, to the left of which the area is the specified value P (1-alpha).
- [M,V] = chi2stat(NU) returns the mean and variance for the $\chi^2$ distribution with degrees of freedom parameters specified by NU.
- R = chi2rnd(V) generates random numbers from the $\chi^2$ distribution with degrees of freedom parameters specified by V.

14 F-distribution
The F probability distribution has two parameters: $\nu_1$ (number of numerator degrees of freedom) and $\nu_2$ (number of denominator degrees of freedom). If $X_1$ and $X_2$ are independent $\chi^2$ rv's with $\nu_1$ and $\nu_2$ df, then the following ratio has an F-distribution with their respective df's:

\[ F = \frac{X_1/\nu_1}{X_2/\nu_2} \]

Both the $\chi^2$ and F distributions are non-symmetric. However, the F-distribution has the interesting property that

\[ F_{1-\alpha,\,\nu_1,\,\nu_2} = \frac{1}{F_{\alpha,\,\nu_2,\,\nu_1}} \]

15 In Matlab
Matlab has several functions that compute various parameters of the F-distribution:
- Y = fpdf(X,V1,V2) computes the F distribution pdf at each of the values in X using the corresponding parameters in V1 and V2.
- P = fcdf(X,V1,V2) computes the F-distribution cdf at each of the values in X using the corresponding parameters in V1 and V2.
- X = finv(P,V1,V2) computes the inverse of the F-distribution cdf with numerator degrees of freedom V1 and denominator degrees of freedom V2 for the corresponding probabilities in P. That is, given an area under the curve, this function computes the corresponding critical value, to the left of which the area is the specified value P (1-alpha).
- [M,V] = fstat(V1,V2) returns the mean and variance for the F distribution with parameters specified by V1 and V2.
- R = frnd(V1,V2) generates random numbers from the F distribution with numerator degrees of freedom V1 and denominator degrees of freedom V2.

16 F-Test for Equality of Variances (By request)
Let $X_1, \dots, X_m$ and $Y_1, \dots, Y_n$ be independent random samples from normal distributions with std. deviations $\sigma_1$ and $\sigma_2$. If $S_1$ and $S_2$ are the sample std. deviations, then the following random variable has an F-distribution with $\nu_1 = m-1$ and $\nu_2 = n-1$:

\[ F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \]

Under the null hypothesis $\sigma_1^2 = \sigma_2^2$ the unknown variances cancel, so the test statistic computed from the two observed variances is

\[ f = \frac{s_1^2}{s_2^2} \]
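As a small illustration of the variance-ratio statistic, the sketch below computes f together with its two degrees of freedom. The helper name `variance_ratio` and both samples are hypothetical. The critical value still has to come from an F table or from a function such as Matlab's finv.

```python
from statistics import variance

def variance_ratio(x, y):
    """Return (f, nu1, nu2), where f = s1^2 / s2^2 for samples x and y."""
    # statistics.variance is the sample variance (divisor n - 1).
    return variance(x) / variance(y), len(x) - 1, len(y) - 1

# Hypothetical samples, for illustration only.
x = [10.0, 12.0, 14.0, 16.0, 18.0]   # m = 5, s1^2 = 10
y = [11.0, 12.0, 13.0, 14.0]         # n = 4, s2^2 = 5/3
f_ratio, nu1, nu2 = variance_ratio(x, y)
print(f_ratio, nu1, nu2)
```

Here the observed ratio is 6 with 4 and 3 degrees of freedom, and it would be compared against the appropriate F critical value for the chosen significance level.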

17 F-Test for Equality of Variances
[Table of F critical values.]
However, you can use Matlab's finv(.) function for any arbitrary $\alpha$, $\nu_1$ and $\nu_2$.

18 Back to ANOVA: Test-statistic (Reprise)
What value of F provides information regarding rejecting $H_0$?
Again, recall that if $H_0$ is true, then $\mu_1 = \mu_2 = \dots = \mu_I$, and therefore $\bar{x}_{1\cdot}, \bar{x}_{2\cdot}, \dots, \bar{x}_{I\cdot}$ should be reasonably close to each other, and also to the grand mean $\bar{x}_{\cdot\cdot}$. Then the differences between the individual sample means and the grand mean would be small, resulting in a small MSTr. Otherwise, the differences would be large, resulting in a large MSTr.
MSE, however, does not depend on whether $H_0$ is true, as it relies only on the sample variances. Therefore, we can assert that:
- When $H_0$ is true, $E(\mathrm{MSTr}) = E(\mathrm{MSE}) = \sigma^2$
- When $H_0$ is false, $E(\mathrm{MSTr}) > E(\mathrm{MSE}) = \sigma^2$
Therefore, an F value $\gg 1$, indicating that $\mathrm{MSTr} \gg \mathrm{MSE}$, provides justifiable skepticism about $H_0$. The form of the rejection region is therefore $f \ge c$, where $f$ is the observed value of the F statistic and $c$ is the cutoff chosen to give enough benefit of the doubt to $H_0$. That is, $c$ is chosen such that $P(F \ge c \text{ when } H_0 \text{ is indeed true}) = \alpha$, the desired significance level.

19 The F-Test Example
Let $F = \mathrm{MSTr}/\mathrm{MSE}$ be the statistic in a single-factor ANOVA problem involving $I$ populations or treatments with a random sample of $J$ observations from each one. When $H_0$ is true (and the basic assumptions hold), $F$ has an F distribution with $\nu_1 = I-1$ and $\nu_2 = I(J-1)$. The rejection region is then $f \ge F_{\alpha,\,I-1,\,I(J-1)}$ for the significance level $\alpha$.

Parameters $I$, $J$, $\nu_1$, $\nu_2$?
- $I = 4$, $J = 6$
- $\nu_1 = 3$, $\nu_2 = 4 \cdot (6-1) = 20$
- $\alpha = 0.05$
- $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$

[Figure: F-distribution for $\nu_1 = 3$, $\nu_2 = 20$, with the critical value $F_{0.05,\,3,\,20} = 3.10$ marked.]

\[
\mathrm{MSTr} = \frac{J}{I-1}\sum_{i=1}^{4}(\bar{x}_{i\cdot}-\bar{x}_{\cdot\cdot})^2, \qquad
\mathrm{MSE} = \frac{1}{4}\sum_{i=1}^{4} s_i^2 = \frac{1}{4}\left[(46.55)^2 + (40.34)^2 + (37.20)^2 + (39.87)^2\right] = 1691.92
\]

\[
f_{OBS} = \frac{\mathrm{MSTr}}{\mathrm{MSE}} = 25.09 \gg 3.10
\]

20 Other Formulas for ANOVA
In practice, we compute the following related parameters to conduct an F-test, using the sums (instead of the averages) of the $x_{ij}$'s:

\[
x_{i\cdot} = \sum_{j=1}^{J} x_{ij}, \qquad x_{\cdot\cdot} = \sum_{i=1}^{I}\sum_{j=1}^{J} x_{ij}
\]

Total sum of squares: a measure of the total variation in the data, the difference between each measurement and the grand mean.

\[
\mathrm{SST} = \sum_{i=1}^{I}\sum_{j=1}^{J}(x_{ij}-\bar{x}_{\cdot\cdot})^2 = \sum_{i=1}^{I}\sum_{j=1}^{J} x_{ij}^2 - \frac{x_{\cdot\cdot}^2}{IJ}
\]

Treatment sum of squares: the amount of variation that can be attributed to differences in the means of the samples.

\[
\mathrm{SSTr} = \sum_{i=1}^{I}\sum_{j=1}^{J}(\bar{x}_{i\cdot}-\bar{x}_{\cdot\cdot})^2 = \frac{1}{J}\sum_{i=1}^{I} x_{i\cdot}^2 - \frac{x_{\cdot\cdot}^2}{IJ}
\]

Error sum of squares: the amount of variation due to inherent noise in each sample, that is, the variation of each $x_{ij}$ from its sample mean.

\[
\mathrm{SSE} = \sum_{i=1}^{I}\sum_{j=1}^{J}(x_{ij}-\bar{x}_{i\cdot})^2
\]

Fundamental Identity: $\mathrm{SST} = \mathrm{SSTr} + \mathrm{SSE}$

Thus, the total variation (SST) can be partitioned into two pieces: SSE is the variation within samples, which is present whether or not $H_0$ is true, and SSTr is the variation between the samples, which can only be explained by differences in sample means.

21 Sum Squares and Mean Squares
The statistics we compute, SSTr and SSE, are intimately related to MSTr and MSE:

\[
\mathrm{MSTr} = \frac{\mathrm{SSTr}}{I-1}, \qquad \mathrm{MSE} = \frac{\mathrm{SSE}}{I(J-1)}, \qquad
F = \frac{\mathrm{MSTr}}{\mathrm{MSE}} = \frac{\mathrm{SSTr}/(I-1)}{\mathrm{SSE}/[I(J-1)]}
\]

This is the F r.v. with $\nu_1 = I-1$ and $\nu_2 = I(J-1)$.
The computations for the ANOVA test, using the F-test, are often summarized in a tabular form, known as the ANOVA table.
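The computing formulas for the sums of squares, the fundamental identity, and the step from sums of squares to mean squares can be checked numerically. The sketch below is illustrative only: `anova_sums` is a hypothetical helper, the data are made up, and equal sample sizes are assumed.

```python
def anova_sums(samples):
    """Return SST, SSTr, SSE, MSTr, MSE and F for equal-size samples."""
    I, J = len(samples), len(samples[0])
    totals = [sum(s) for s in samples]           # x_i. for each treatment
    grand_total = sum(totals)                    # x_..
    correction = grand_total ** 2 / (I * J)      # x_..^2 / (IJ)
    sst = sum(x * x for s in samples for x in s) - correction
    sstr = sum(t * t for t in totals) / J - correction
    # SSE computed directly from its definition, not from SST - SSTr,
    # so that the fundamental identity can be verified independently.
    sse = sum((x - t / J) ** 2 for s, t in zip(samples, totals) for x in s)
    mstr = sstr / (I - 1)
    mse = sse / (I * (J - 1))
    return {"SST": sst, "SSTr": sstr, "SSE": sse,
            "MSTr": mstr, "MSE": mse, "F": mstr / mse}

# Hypothetical data: 3 treatments, 4 observations each.
data = [[4.0, 5.0, 6.0, 5.0], [7.0, 8.0, 9.0, 8.0], [5.0, 6.0, 7.0, 6.0]]
result = anova_sums(data)
print(result)
# Fundamental identity: SST = SSTr + SSE (up to rounding error).
identity_gap = abs(result["SST"] - (result["SSTr"] + result["SSE"]))
print(identity_gap)
```

The dictionary holds the quantities that make up one ANOVA table: the SS column (SSTr, SSE, SST), the MS column (MSTr, MSE) and the F ratio.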

22 ANOVA Table

Source of variation   df          Sum of squares   Mean square                F            P-value
Treatments            $I-1$       SSTr             MSTr = SSTr/$(I-1)$        MSTr/MSE     p
Error                 $I(J-1)$    SSE              MSE = SSE/$[I(J-1)]$
Total                 $IJ-1$      SST

23 In Matlab
p = anova1(X) performs a one-way ANOVA for comparing the means of two or more columns of data in the m-by-n matrix X, where each column represents an independent sample containing m mutually independent observations. The function returns the p-value for the null hypothesis that all samples in X are drawn from the same population (or from different populations with the same mean). If the p-value is near zero, this casts doubt on the null hypothesis and suggests that at least one sample mean is significantly different from the other sample means.

The anova1 function displays two figures. The first figure is the standard ANOVA table, which divides the variability of the data in X into two parts:
- Variability due to the differences among the column means (variability between groups).
- Variability due to the differences between the data in each column and the column mean (variability within groups).
The second figure displays box plots of each column of X. Large differences in the center lines of the box plots correspond to large values of F and correspondingly small p-values.

The ANOVA test makes the following assumptions about the data in X:
- All sample populations are normally distributed.
- All sample populations have equal variance.
- All observations are mutually independent.
The ANOVA test is known to be robust to modest violations of the first two assumptions.

[p,table,stats] = anova1(...) returns the ANOVA table as a cell array as well as a stats structure that you can use to perform a follow-up multiple comparison test.

24 Example: In-Class Exercise
The following data show the number of hours students of different colleges spend on homework. Each row holds the hours reported by one randomly selected student from each college, and the columns represent the different colleges (ENG, LAS, COM, EDU, FPA), so each column contains 6 observations (students) from one college, e.g., 6 from Engineering.

[Data table: 6 rows (1st through 6th student from each college) by 5 columns (ENG, LAS, COM, EDU, FPA).]

Do students from certain colleges study harder?
- $H_0: \mu_1 = \mu_2 = \dots = \mu_5$
- $I$ = number of populations = 5
- $J$ = sample size of each population = 6

25 Example
With $x_{i\cdot} = \sum_{j=1}^{J} x_{ij}$ and $x_{\cdot\cdot} = \sum_{i=1}^{I}\sum_{j=1}^{J} x_{ij} = 455$, $I = 5$ and $J = 6$:

\[
\mathrm{SSTr} = \frac{1}{J}\sum_{i=1}^{I} x_{i\cdot}^2 - \frac{x_{\cdot\cdot}^2}{IJ} = \frac{1}{6}\sum_{i=1}^{5} x_{i\cdot}^2 - \frac{(455)^2}{30}
\]

\[
\mathrm{SST} = \sum_{i=1}^{I}\sum_{j=1}^{J} x_{ij}^2 - \frac{x_{\cdot\cdot}^2}{IJ} = \sum_{i=1}^{5}\sum_{j=1}^{6} x_{ij}^2 - \frac{(455)^2}{30}
\]

\[
\mathrm{SSE} = \sum_{i=1}^{I}\sum_{j=1}^{J}(x_{ij}-\bar{x}_{i\cdot})^2, \qquad \mathrm{SST} = \mathrm{SSTr} + \mathrm{SSE}
\]

\[
f_{OBS} = \frac{\mathrm{MSTr}}{\mathrm{MSE}} = \frac{\mathrm{SSTr}/(I-1)}{\mathrm{SSE}/[I(J-1)]} = 9.0077
\]

$F_{0.05,\,4,\,25} = 2.76$, so $f_{obs} \gg F_\alpha$: we can reject the null hypothesis that students from all colleges work the same amount.

We can also look at the p-value: what is the probability that, if $H_0$ were true, we would observe an $f_{obs}$ as large as 9.0077? In Matlab: 1-fcdf(9.0077, 4, 25) = 1.197e-004 !!!
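The p-value in this example can be cross-checked without a statistics toolbox. The sketch below is an illustration only: `f_upper_tail` is a hypothetical helper, not a library function. It relies on the identity $P(F > f) = I_x(\nu_2/2, \nu_1/2)$ with $x = \nu_2/(\nu_2 + \nu_1 f)$, and on the fact that the regularized incomplete beta function reduces to a finite sum when its second argument is an integer, so this version only handles an even numerator df (which covers $\nu_1 = 4$ here).

```python
def f_upper_tail(f, nu1, nu2):
    """P(F > f) for an F(nu1, nu2) variable; requires an even nu1."""
    if nu1 % 2 != 0:
        raise ValueError("this closed form needs an even numerator df")
    a = nu2 / 2.0          # first argument of the incomplete beta function
    b = nu1 // 2           # second argument, an integer by assumption
    x = nu2 / (nu2 + nu1 * f)
    # I_x(a, b) = x^a * sum_{k=0}^{b-1} Gamma(a+k) / (Gamma(a) k!) * (1-x)^k
    term, total = 1.0, 0.0
    for k in range(b):
        total += term
        term *= (a + k) / (k + 1) * (1.0 - x)
    return x ** a * total

p_value = f_upper_tail(9.0077, 4, 25)      # observed statistic in the example
tail_at_critical = f_upper_tail(2.76, 4, 25)  # tail area at the critical value
print(p_value, tail_at_critical)
```

The first number should be on the order of 1.2e-4, far below $\alpha = 0.05$, while the second should be close to 0.05, which is what makes 2.76 the critical value for this test.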

26 Solution by Matlab
[p, table, stats] = anova1(data)

p = 1.197e-004

table =
    'Source'     'SS'            'df'    'MS'       'F'         'Prob>F'
    'Columns'    [ ]             [ 4]    [ ]        [9.0076]    [1.197e-004]
    'Error'      [ ]             [25]    [.867]     []          []
    'Total'      [.360e+003]     [29]    []         []          []

stats =
    gnames: [5x1 char]
    n: [6 6 6 6 6]
    source: 'anova1'
    means: [ ]
    df: 25
    s:

28 What Happens After We Reject $H_0$?
Recall that $H_0: \mu_1 = \mu_2 = \dots = \mu_I$.
If $f_{obs} < F_\alpha$, or $p > \alpha$, then we cannot reject $H_0$, and we proceed as if $\mu_1 = \mu_2 = \dots = \mu_I$.
But what happens next, if $f_{obs} > F_\alpha$ and we reject $H_0$? We accept the alternative hypothesis, which means that not all means can be considered equal, so at least two of the means must differ. But which ones?

Multiple Comparisons Procedure
The idea is to check all pairwise differences of means, $\mu_i - \mu_j$ (for all $i < j$), and compute the CI for each.
- Those intervals that do not include zero indicate that $\mu_i$ and $\mu_j$ differ significantly.
- Those intervals that do include zero indicate that $\mu_i$ and $\mu_j$ do not differ significantly.

29 Tukey's Procedure (T-Method for Multiple Comparisons)
Use yet another distribution: the Studentized Range Distribution (tables).
$Q_{\alpha,m,\nu}$: the upper-tail $\alpha$ critical value of the Studentized range distribution with numerator df $m$ and denominator df $\nu$.
With probability $1-\alpha$,

\[
\bar{X}_{i\cdot}-\bar{X}_{j\cdot} - Q_{\alpha,\,I,\,I(J-1)}\sqrt{\mathrm{MSE}/J}
\;\le\; \mu_i - \mu_j \;\le\;
\bar{X}_{i\cdot}-\bar{X}_{j\cdot} + Q_{\alpha,\,I,\,I(J-1)}\sqrt{\mathrm{MSE}/J}
\]

for every $i$ and $j$ with $i < j$. Note that $m = I$ (not $I-1$ as it was in the F-dist.) and $\nu = I(J-1)$.
This formula computes the confidence interval for every $\mu_i - \mu_j$, but do we really need the entire confidence interval? We only need to know whether the CI includes zero or not. There is a simpler form of Tukey's test!

30 Simplified Tukey's Test
1. Select $\alpha$ and extract the corresponding $Q_{\alpha,\,I,\,I(J-1)}$.
2. Calculate $w = Q_{\alpha,\,I,\,I(J-1)}\sqrt{\mathrm{MSE}/J}$.
3. List the sample means in increasing order, and underline those pairs that differ by less than $w$. Any pair of sample means not underlined by the same line corresponds to a pair of population means that are significantly different.
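The three steps above can be sketched as follows. All names and numbers here are hypothetical: `tukey_significant_pairs` is an illustrative helper, the means and MSE are made up, and `q = 3.5` is a placeholder rather than a real entry from a Studentized range table (in practice Q must be looked up for the chosen $\alpha$, $I$ and $I(J-1)$).

```python
import math
from itertools import combinations

def tukey_significant_pairs(means, q, mse, j):
    """Return (w, pairs): w = q * sqrt(mse / j), and the index pairs
    whose sample means differ by more than w (judged significantly different)."""
    w = q * math.sqrt(mse / j)
    pairs = [(a, b) for a, b in combinations(range(len(means)), 2)
             if abs(means[a] - means[b]) > w]
    return w, pairs

# Hypothetical inputs, for illustration only.
sample_means = [10.0, 12.0, 19.0]   # three treatment means
q = 3.5                             # placeholder for Q_{alpha, I, I(J-1)}
w, different = tukey_significant_pairs(sample_means, q, mse=6.0, j=6)
print(w, different)
```

With these inputs w = 3.5, so the first two means (difference 2) would share an underline, while the third mean differs from both of the others by more than w.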

31 Example
Recall our bacteria count example, for which we had the following results:

p = 1.197e-004
    'Source'     'SS'            'df'    'MS'       'F'         'Prob>F'
    'Columns'    [ ]             [ 4]    [ ]        [9.0076]    [1.197e-004]
    'Error'      [ ]             [25]    [.867]     []          []
    'Total'      [.360e+003]     [29]    []         []          []
Means:

Let's compute $w$:

\[
w = Q_{\alpha,\,I,\,I(J-1)}\sqrt{\mathrm{MSE}/J} = Q_{0.05,\,5,\,25}\sqrt{\mathrm{MSE}/J} = 8.03
\]

Sort means: Grp4 Grp3 Grp Grp5 Grp
How to interpret?

32 In Matlab
c = multcompare(stats, alpha) performs a multiple comparison test using the information in the stats structure (from anova1(.)), and returns a matrix c of pairwise comparison results. It also displays an interactive figure presenting a graphical representation of the test.

The output c contains the results of the test in the form of a five-column matrix. Each row of the matrix represents one test, and there is one row for each pair of groups. The entries in the row indicate the means being compared, the estimated difference in means, and a confidence interval for the difference. For example, suppose one row contains the following entries:

    2.0000    5.0000    1.9442    8.2206    14.4971

These numbers indicate that the mean of group 2 minus the mean of group 5 is estimated to be 8.2206, and a 95% confidence interval for the true difference is [1.9442, 14.4971]. In this example the confidence interval does not contain 0.0, so the difference is significant at the 0.05 level. If the confidence interval did contain 0.0, the difference would not be significant at the 0.05 level.

The multcompare function also displays a graph with each group mean represented by a symbol and an interval around the symbol. Two means are significantly different if their intervals are disjoint, and are not significantly different if their intervals overlap. You can use the mouse to select any group, and the graph will highlight any other groups that are significantly different from it.

34 Homework
From Chapter 0, 4
From Chapter 3,,4,8
Analyze the data given in these questions to obtain an ANOVA table. Solve by hand and then by MATLAB, and compare your results. If you do not get the same results, you did not solve correctly!


More information

Coefficient of Determination

Coefficient of Determination Coefficient of Determination The coefficient of determination R 2 (or sometimes r 2 ) is another measure of how well the least squares equation ŷ = b 0 + b 1 x performs as a predictor of y. R 2 is computed

More information

How To Check For Differences In The One Way Anova

How To Check For Differences In The One Way Anova MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

More information

One-Way Analysis of Variance

One-Way Analysis of Variance One-Way Analysis of Variance Note: Much of the math here is tedious but straightforward. We ll skim over it in class but you should be sure to ask questions if you don t understand it. I. Overview A. We

More information

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.

More information

Statistics Review PSY379

Statistics Review PSY379 Statistics Review PSY379 Basic concepts Measurement scales Populations vs. samples Continuous vs. discrete variable Independent vs. dependent variable Descriptive vs. inferential stats Common analyses

More information

Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means

Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means Lesson : Comparison of Population Means Part c: Comparison of Two- Means Welcome to lesson c. This third lesson of lesson will discuss hypothesis testing for two independent means. Steps in Hypothesis

More information

Stat 5303 (Oehlert): Tukey One Degree of Freedom 1

Stat 5303 (Oehlert): Tukey One Degree of Freedom 1 Stat 5303 (Oehlert): Tukey One Degree of Freedom 1 > catch

More information

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation

More information

CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

More information

Solutions to Homework 10 Statistics 302 Professor Larget

Solutions to Homework 10 Statistics 302 Professor Larget s to Homework 10 Statistics 302 Professor Larget Textbook Exercises 7.14 Rock-Paper-Scissors (Graded for Accurateness) In Data 6.1 on page 367 we see a table, reproduced in the table below that shows the

More information

Chapter 5 Analysis of variance SPSS Analysis of variance

Chapter 5 Analysis of variance SPSS Analysis of variance Chapter 5 Analysis of variance SPSS Analysis of variance Data file used: gss.sav How to get there: Analyze Compare Means One-way ANOVA To test the null hypothesis that several population means are equal,

More information

Calculating P-Values. Parkland College. Isela Guerra Parkland College. Recommended Citation

Calculating P-Values. Parkland College. Isela Guerra Parkland College. Recommended Citation Parkland College A with Honors Projects Honors Program 2014 Calculating P-Values Isela Guerra Parkland College Recommended Citation Guerra, Isela, "Calculating P-Values" (2014). A with Honors Projects.

More information

Regression Analysis: A Complete Example

Regression Analysis: A Complete Example Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

More information

2. Simple Linear Regression

2. Simple Linear Regression Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

More information

MULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL. by Michael L. Orlov Chemistry Department, Oregon State University (1996)

MULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL. by Michael L. Orlov Chemistry Department, Oregon State University (1996) MULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL by Michael L. Orlov Chemistry Department, Oregon State University (1996) INTRODUCTION In modern science, regression analysis is a necessary part

More information

Statistical Models in R

Statistical Models in R Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Linear Models in R Regression Regression analysis is the appropriate

More information

Examining Differences (Comparing Groups) using SPSS Inferential statistics (Part I) Dwayne Devonish

Examining Differences (Comparing Groups) using SPSS Inferential statistics (Part I) Dwayne Devonish Examining Differences (Comparing Groups) using SPSS Inferential statistics (Part I) Dwayne Devonish Statistics Statistics are quantitative methods of describing, analysing, and drawing inferences (conclusions)

More information

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4) Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

More information

Final Exam Practice Problem Answers

Final Exam Practice Problem Answers Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal

More information

SCHOOL OF HEALTH AND HUMAN SCIENCES DON T FORGET TO RECODE YOUR MISSING VALUES

SCHOOL OF HEALTH AND HUMAN SCIENCES DON T FORGET TO RECODE YOUR MISSING VALUES SCHOOL OF HEALTH AND HUMAN SCIENCES Using SPSS Topics addressed today: 1. Differences between groups 2. Graphing Use the s4data.sav file for the first part of this session. DON T FORGET TO RECODE YOUR

More information

Chapter 3 RANDOM VARIATE GENERATION

Chapter 3 RANDOM VARIATE GENERATION Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.

More information

Analysis of Variance. MINITAB User s Guide 2 3-1

Analysis of Variance. MINITAB User s Guide 2 3-1 3 Analysis of Variance Analysis of Variance Overview, 3-2 One-Way Analysis of Variance, 3-5 Two-Way Analysis of Variance, 3-11 Analysis of Means, 3-13 Overview of Balanced ANOVA and GLM, 3-18 Balanced

More information

Study Guide for the Final Exam

Study Guide for the Final Exam Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

More information

Post-hoc comparisons & two-way analysis of variance. Two-way ANOVA, II. Post-hoc testing for main effects. Post-hoc testing 9.

Post-hoc comparisons & two-way analysis of variance. Two-way ANOVA, II. Post-hoc testing for main effects. Post-hoc testing 9. Two-way ANOVA, II Post-hoc comparisons & two-way analysis of variance 9.7 4/9/4 Post-hoc testing As before, you can perform post-hoc tests whenever there s a significant F But don t bother if it s a main

More information

Testing for Lack of Fit

Testing for Lack of Fit Chapter 6 Testing for Lack of Fit How can we tell if a model fits the data? If the model is correct then ˆσ 2 should be an unbiased estimate of σ 2. If we have a model which is not complex enough to fit

More information

ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2

ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2 University of California, Berkeley Prof. Ken Chay Department of Economics Fall Semester, 005 ECON 14 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE # Question 1: a. Below are the scatter plots of hourly wages

More information

Lecture 8. Confidence intervals and the central limit theorem

Lecture 8. Confidence intervals and the central limit theorem Lecture 8. Confidence intervals and the central limit theorem Mathematical Statistics and Discrete Mathematics November 25th, 2015 1 / 15 Central limit theorem Let X 1, X 2,... X n be a random sample of

More information

12.5: CHI-SQUARE GOODNESS OF FIT TESTS

12.5: CHI-SQUARE GOODNESS OF FIT TESTS 125: Chi-Square Goodness of Fit Tests CD12-1 125: CHI-SQUARE GOODNESS OF FIT TESTS In this section, the χ 2 distribution is used for testing the goodness of fit of a set of data to a specific probability

More information

Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY

Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY 1. Introduction Besides arriving at an appropriate expression of an average or consensus value for observations of a population, it is important to

More information

Data Analysis Tools. Tools for Summarizing Data

Data Analysis Tools. Tools for Summarizing Data Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool

More information

Friedman's Two-way Analysis of Variance by Ranks -- Analysis of k-within-group Data with a Quantitative Response Variable

Friedman's Two-way Analysis of Variance by Ranks -- Analysis of k-within-group Data with a Quantitative Response Variable Friedman's Two-way Analysis of Variance by Ranks -- Analysis of k-within-group Data with a Quantitative Response Variable Application: This statistic has two applications that can appear very different,

More information

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm

More information

Multivariate Analysis of Variance (MANOVA)

Multivariate Analysis of Variance (MANOVA) Chapter 415 Multivariate Analysis of Variance (MANOVA) Introduction Multivariate analysis of variance (MANOVA) is an extension of common analysis of variance (ANOVA). In ANOVA, differences among various

More information

An analysis method for a quantitative outcome and two categorical explanatory variables.

An analysis method for a quantitative outcome and two categorical explanatory variables. Chapter 11 Two-Way ANOVA An analysis method for a quantitative outcome and two categorical explanatory variables. If an experiment has a quantitative outcome and two categorical explanatory variables that

More information

Estimation of σ 2, the variance of ɛ

Estimation of σ 2, the variance of ɛ Estimation of σ 2, the variance of ɛ The variance of the errors σ 2 indicates how much observations deviate from the fitted surface. If σ 2 is small, parameters β 0, β 1,..., β k will be reliably estimated

More information

Introduction. Statistics Toolbox

Introduction. Statistics Toolbox Introduction A hypothesis test is a procedure for determining if an assertion about a characteristic of a population is reasonable. For example, suppose that someone says that the average price of a gallon

More information

Variables Control Charts

Variables Control Charts MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. Variables

More information

t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon

t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon t-tests in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. [email protected] www.excelmasterseries.com

More information

MULTIPLE REGRESSION WITH CATEGORICAL DATA

MULTIPLE REGRESSION WITH CATEGORICAL DATA DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Posc/Uapp 86 MULTIPLE REGRESSION WITH CATEGORICAL DATA I. AGENDA: A. Multiple regression with categorical variables. Coding schemes. Interpreting

More information

The Chi-Square Test. STAT E-50 Introduction to Statistics

The Chi-Square Test. STAT E-50 Introduction to Statistics STAT -50 Introduction to Statistics The Chi-Square Test The Chi-square test is a nonparametric test that is used to compare experimental results with theoretical models. That is, we will be comparing observed

More information

Statistical Functions in Excel

Statistical Functions in Excel Statistical Functions in Excel There are many statistical functions in Excel. Moreover, there are other functions that are not specified as statistical functions that are helpful in some statistical analyses.

More information

Chapter 4 and 5 solutions

Chapter 4 and 5 solutions Chapter 4 and 5 solutions 4.4. Three different washing solutions are being compared to study their effectiveness in retarding bacteria growth in five gallon milk containers. The analysis is done in a laboratory,

More information

3.4 Statistical inference for 2 populations based on two samples

3.4 Statistical inference for 2 populations based on two samples 3.4 Statistical inference for 2 populations based on two samples Tests for a difference between two population means The first sample will be denoted as X 1, X 2,..., X m. The second sample will be denoted

More information

1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ

1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ STA 3024 Practice Problems Exam 2 NOTE: These are just Practice Problems. This is NOT meant to look just like the test, and it is NOT the only thing that you should study. Make sure you know all the material

More information

A POPULATION MEAN, CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

A POPULATION MEAN, CONFIDENCE INTERVALS AND HYPOTHESIS TESTING CHAPTER 5. A POPULATION MEAN, CONFIDENCE INTERVALS AND HYPOTHESIS TESTING 5.1 Concepts When a number of animals or plots are exposed to a certain treatment, we usually estimate the effect of the treatment

More information

Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

More information

HYPOTHESIS TESTING: POWER OF THE TEST

HYPOTHESIS TESTING: POWER OF THE TEST HYPOTHESIS TESTING: POWER OF THE TEST The first 6 steps of the 9-step test of hypothesis are called "the test". These steps are not dependent on the observed data values. When planning a research project,

More information

Regression step-by-step using Microsoft Excel

Regression step-by-step using Microsoft Excel Step 1: Regression step-by-step using Microsoft Excel Notes prepared by Pamela Peterson Drake, James Madison University Type the data into the spreadsheet The example used throughout this How to is a regression

More information

Comparing Means in Two Populations

Comparing Means in Two Populations Comparing Means in Two Populations Overview The previous section discussed hypothesis testing when sampling from a single population (either a single mean or two means from the same population). Now we

More information

AP: LAB 8: THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics

AP: LAB 8: THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics Ms. Foglia Date AP: LAB 8: THE CHI-SQUARE TEST Probability, Random Chance, and Genetics Why do we study random chance and probability at the beginning of a unit on genetics? Genetics is the study of inheritance,

More information

Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools

Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools Occam s razor.......................................................... 2 A look at data I.........................................................

More information

Lecture Notes Module 1

Lecture Notes Module 1 Lecture Notes Module 1 Study Populations A study population is a clearly defined collection of people, animals, plants, or objects. In psychological research, a study population usually consists of a specific

More information

ANOVA. February 12, 2015

ANOVA. February 12, 2015 ANOVA February 12, 2015 1 ANOVA models Last time, we discussed the use of categorical variables in multivariate regression. Often, these are encoded as indicator columns in the design matrix. In [1]: %%R

More information

MBA 611 STATISTICS AND QUANTITATIVE METHODS

MBA 611 STATISTICS AND QUANTITATIVE METHODS MBA 611 STATISTICS AND QUANTITATIVE METHODS Part I. Review of Basic Statistics (Chapters 1-11) A. Introduction (Chapter 1) Uncertainty: Decisions are often based on incomplete information from uncertain

More information

NCSS Statistical Software

NCSS Statistical Software Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, two-sample t-tests, the z-test, the

More information

Bivariate Statistics Session 2: Measuring Associations Chi-Square Test

Bivariate Statistics Session 2: Measuring Associations Chi-Square Test Bivariate Statistics Session 2: Measuring Associations Chi-Square Test Features Of The Chi-Square Statistic The chi-square test is non-parametric. That is, it makes no assumptions about the distribution

More information

5. Linear Regression

5. Linear Regression 5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4

More information

Parametric and non-parametric statistical methods for the life sciences - Session I

Parametric and non-parametric statistical methods for the life sciences - Session I Why nonparametric methods What test to use? Rank Tests Parametric and non-parametric statistical methods for the life sciences - Session I Liesbeth Bruckers Geert Molenberghs Interuniversity Institute

More information

t-test Statistics Overview of Statistical Tests Assumptions

t-test Statistics Overview of Statistical Tests Assumptions t-test Statistics Overview of Statistical Tests Assumption: Testing for Normality The Student s t-distribution Inference about one mean (one sample t-test) Inference about two means (two sample t-test)

More information

Comparing Multiple Proportions, Test of Independence and Goodness of Fit

Comparing Multiple Proportions, Test of Independence and Goodness of Fit Comparing Multiple Proportions, Test of Independence and Goodness of Fit Content Testing the Equality of Population Proportions for Three or More Populations Test of Independence Goodness of Fit Test 2

More information

Crosstabulation & Chi Square

Crosstabulation & Chi Square Crosstabulation & Chi Square Robert S Michael Chi-square as an Index of Association After examining the distribution of each of the variables, the researcher s next task is to look for relationships among

More information

Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln. Log-Rank Test for More Than Two Groups

Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln. Log-Rank Test for More Than Two Groups Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln Log-Rank Test for More Than Two Groups Prepared by Harlan Sayles (SRAM) Revised by Julia Soulakova (Statistics)

More information

How To Run Statistical Tests in Excel

How To Run Statistical Tests in Excel How To Run Statistical Tests in Excel Microsoft Excel is your best tool for storing and manipulating data, calculating basic descriptive statistics such as means and standard deviations, and conducting

More information

Point Biserial Correlation Tests

Point Biserial Correlation Tests Chapter 807 Point Biserial Correlation Tests Introduction The point biserial correlation coefficient (ρ in this chapter) is the product-moment correlation calculated between a continuous random variable

More information

THE FIRST SET OF EXAMPLES USE SUMMARY DATA... EXAMPLE 7.2, PAGE 227 DESCRIBES A PROBLEM AND A HYPOTHESIS TEST IS PERFORMED IN EXAMPLE 7.

THE FIRST SET OF EXAMPLES USE SUMMARY DATA... EXAMPLE 7.2, PAGE 227 DESCRIBES A PROBLEM AND A HYPOTHESIS TEST IS PERFORMED IN EXAMPLE 7. THERE ARE TWO WAYS TO DO HYPOTHESIS TESTING WITH STATCRUNCH: WITH SUMMARY DATA (AS IN EXAMPLE 7.17, PAGE 236, IN ROSNER); WITH THE ORIGINAL DATA (AS IN EXAMPLE 8.5, PAGE 301 IN ROSNER THAT USES DATA FROM

More information

Independent t- Test (Comparing Two Means)

Independent t- Test (Comparing Two Means) Independent t- Test (Comparing Two Means) The objectives of this lesson are to learn: the definition/purpose of independent t-test when to use the independent t-test the use of SPSS to complete an independent

More information

13: Additional ANOVA Topics. Post hoc Comparisons

13: Additional ANOVA Topics. Post hoc Comparisons 13: Additional ANOVA Topics Post hoc Comparisons ANOVA Assumptions Assessing Group Variances When Distributional Assumptions are Severely Violated Kruskal-Wallis Test Post hoc Comparisons In the prior

More information

Odds ratio, Odds ratio test for independence, chi-squared statistic.

Odds ratio, Odds ratio test for independence, chi-squared statistic. Odds ratio, Odds ratio test for independence, chi-squared statistic. Announcements: Assignment 5 is live on webpage. Due Wed Aug 1 at 4:30pm. (9 days, 1 hour, 58.5 minutes ) Final exam is Aug 9. Review

More information

Module 5: Multiple Regression Analysis

Module 5: Multiple Regression Analysis Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

More information

Confidence Intervals for Cp

Confidence Intervals for Cp Chapter 296 Confidence Intervals for Cp Introduction This routine calculates the sample size needed to obtain a specified width of a Cp confidence interval at a stated confidence level. Cp is a process

More information