The Analysis of Variance (ANOVA)
1 The Analysis of Variance (ANOVA)

Lecture: Dr. P.'s Clinic Consultant Module in Probability & Statistics in Engineering
© 2006 Robi Polikar, Rowan University, Dept. of Electrical and Computer Engineering. All Rights Reserved.
2 Today in P&S

Analysis of Variance (ANOVA)
- Definitions
- Single-factor ANOVA: setting and assumptions
- The F-statistic
- Tests about the variances of two populations: the F-distribution and the F-test
- ANOVA variables and the ANOVA table
- ANOVA using MATLAB
- Multiple comparisons in ANOVA
3 Definitions

The analysis of variance (ANOVA) refers to a collection of experimental situations and statistical procedures for the analysis of quantitative responses from experimental units. The simplest form is known as single-factor ANOVA or one-way ANOVA, and is usually used for comparing the means of
- data sampled from more than two populations, or
- data from experiments in which more than two treatments have been used.
The characteristic that differentiates the treatments or populations from one another is called the factor under study, and the different treatments or populations are referred to as the levels of the factor.
4 Examples

- An experiment to study the role of geographic demographics (e.g., urban, suburban, rural, international urban, international rural) in overall student success. The factor of interest is the geographic demographic, and there are five different qualitative levels.
- An experiment to study the effect of different diets (Mediterranean, Middle Eastern, Southern US, Chinese, Atkins, Vegetarian, Low Carb) on cancer rates. The factor is the diet, with seven different qualitative levels.
- An experiment to study the effect of precise temperature on bacteria growth rate. The factor is the temperature, and the levels are quantitative in nature.
- An experiment to study the chip defect rate of different VLSI technologies. The factor is the size of the single component (transistor) on the chip, with four quantitative levels.
5 Single-Factor ANOVA: Definitions and Assumptions

In all of the above examples there is one factor with multiple levels, hence a one-way (single-factor) analysis of multiple populations. Some definitions and assumptions:
- I: number of populations or treatments being compared.
- J_i: sample size for the i-th population/treatment. Often J_i = J for all i = 1, ..., I, giving I·J observations in total.
- μ_i: the mean value of the i-th population, or the average response when the i-th treatment is applied.
- X_ij: the random variable that denotes the j-th measurement taken from the i-th population; x_ij is the observed value of X_ij.
- X̄_i.: sample mean of the i-th population, computed over all J_i values: X̄_i. = (1/J) Σ_{j=1}^{J} X_ij
- X̄..: grand mean, the average of all I·J observations: X̄.. = (1/(IJ)) Σ_{i=1}^{I} Σ_{j=1}^{J} X_ij
- S_i²: sample variance of the i-th population: S_i² = Σ_{j=1}^{J} (X_ij − X̄_i.)² / (J − 1), for i = 1, ..., I
Assumption: all I distributions are normal with the same variance σ². That is, each X_ij is normally distributed with E(X_ij) = μ_i and Var(X_ij) = σ². We will accept this assumption as satisfied as long as max(σ_i) < 2·min(σ_i), i.e., the largest sample standard deviation is less than twice the smallest.
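These quantities are easy to compute in MATLAB. The following is an illustrative sketch, not part of the original slides; the matrix name X and its J-by-I layout (one column per treatment) are assumptions for the example.

    % Minimal sketch, assuming X is a J-by-I data matrix (one column per treatment).
    [J, I] = size(X);
    xbar_i = mean(X);                 % sample means Xbar_i. (1-by-I)
    xbar   = mean(X(:));              % grand mean Xbar..
    s2_i   = var(X);                  % sample variances S_i^2 (normalized by J-1)
    s_i    = sqrt(s2_i);              % sample standard deviations
    % Informal check of the equal-variance assumption:
    equal_var_ok = max(s_i) < 2*min(s_i);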
6 Single-Factor Experiments

A typical data set can be summarized as follows:

Treatment    Observations                 Total    Average
1            x_11  x_12  ...  x_1J        x_1.     x̄_1.
2            x_21  x_22  ...  x_2J        x_2.     x̄_2.
...          ...                          ...      ...
I            x_I1  x_I2  ...  x_IJ        x_I.     x̄_I.

If we were to replace each trial with the mean of its observations, the difference between the mean and the observed value is called the residual: e_ij = x_ij − x̄_i. The residuals are expected to have a normal distribution, which can be checked using a normality plot.
7 Example: In-Class Exercise

The following data show the number of hours students of different colleges spend on homework; the rows are the numbers of hours spent studying by students randomly selected from each college, and the columns represent the different colleges (ENG, LAS, COM, EDU, FPA), with 6 observations (students) from each college. Do students from certain colleges study harder?

I = number of populations = 5, J = sample size of each population = 6.
The sample mean of each college is x̄_i. = (1/J) Σ_{j=1}^{J} x_ij, and the grand mean is x̄.. = 455/30 ≈ 15.17.
8 Hypothesis Testing for ANOVA

The hypothesis of interest in one-way ANOVA is
H0: μ1 = μ2 = μ3 = ... = μI   vs.   Ha: at least two of the means are different.
If H0 is true, then μ1 = μ2 = ... = μI, and therefore x̄_1., x̄_2., ..., x̄_I. should all be reasonably close to each other. The procedure to test this hypothesis is based on comparing a measure of between-samples variation to a measure of within-samples variation.
- Within-samples variation is the variation within each sample (each population). This variation is independent of whether H0 is true or false, as it is the inherent variation within each sample, and hence an indicator of the noise/error within each sample.
- Between-samples variation, however, can indicate whether H0 is true or false: the variation from one sample mean to another will only change significantly if the population means are truly different, an indication that H0 is false.
Therefore, the ratio of the two gives an even stronger indication of whether H0 is true: if the between-samples variation is large, particularly when the within-samples variation (noise) is small, then we have even more evidence against H0.
9 Within / Between?

(Illustration comparing the five samples, ENG, LAS, COM, EDU, FPA.) The variation within each sample is computed from that sample alone; the average of these is the within-samples variation. The average variation among the sample means x̄_i. is the between-samples variation.
10 Test Statistic

The between-samples variation and the within-samples variation can be expressed quantitatively using the mean square for treatments (MSTr) and the mean square error (MSE), respectively:

MSTr = (J/(I − 1)) · [(X̄_1. − X̄..)² + ... + (X̄_I. − X̄..)²] = (J/(I − 1)) Σ_{i=1}^{I} (X̄_i. − X̄..)²
MSE = (S_1² + S_2² + ... + S_I²) / I

MSTr measures the variation between each sample mean and the overall mean, hence is a measure of the between-samples variation. Each sample variance measures the variation (noise) within that sample; the average of all sample variances is then the average within-sample variation, the mean square error. The test statistic for one-way ANOVA is then

F = MSTr / MSE
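As an illustrative sketch (not from the original slides), MSTr, MSE, and F can be computed directly from a data matrix; the name X and its J-by-I shape are assumptions for the example.

    % Minimal sketch, assuming equal sample sizes and X given as a J-by-I matrix.
    [J, I] = size(X);
    xbar_i = mean(X);                           % treatment sample means
    xbar   = mean(X(:));                        % grand mean
    MSTr   = J/(I-1) * sum((xbar_i - xbar).^2); % between-samples mean square
    MSE    = mean(var(X));                      % average of the I sample variances
    F      = MSTr/MSE;                          % one-way ANOVA test statistic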
11 Test Statistic (continued)

What value of F provides information regarding rejecting H0? Recall that if H0 is true, then μ1 = μ2 = ... = μI, and therefore x̄_1., x̄_2., ..., x̄_I. should be reasonably close to each other, and also to the grand mean x̄.. . Then the differences between the individual sample means and the grand mean would be small, resulting in a small MSTr; otherwise the differences would be large, resulting in a large MSTr. MSE, however, is independent of whether H0 is true, as it relies only on the underlying sample variances. Therefore we can assert that:
- When H0 is true, E(MSTr) = E(MSE) = σ².
- When H0 is false, E(MSTr) > E(MSE) = σ².
Therefore, an F value much greater than 1, indicating that MSTr >> MSE, provides justifiable skepticism about H0. The form of the rejection region is therefore f ≥ c, where f is the observed value of the F statistic and c is a cutoff chosen to give enough benefit of the doubt to H0. That is, c is chosen such that P(F ≥ c when H0 is indeed true) = α, the desired significance level.
12 χ²-Distribution (A Little Side Step)

The F-distribution is related to the chi-squared (χ²) distribution. Let X_1, ..., X_n be a random sample from a normal distribution with parameters μ and σ. Then the following random variable has a χ²-distribution with ν = n − 1 degrees of freedom:

χ² = (n − 1)S²/σ² = Σ_{i=1}^{n} (X_i − X̄)² / σ²

The χ²-distribution is used in computing confidence intervals for the variance (as opposed to the z- or t-distribution, which is used for confidence intervals for the mean).
13 In MATLAB

MATLAB has several functions to compute various quantities of the χ² distribution:
- Y = chi2pdf(X,V) computes the χ² pdf at each of the values in X using the corresponding degrees of freedom in V (V can be a vector of several df values, in which case MATLAB computes the pdf for each df).
- P = chi2cdf(X,V) computes the χ² cdf at each of the values in X using the corresponding parameters in V.
- X = chi2inv(P,V) computes the inverse of the χ² cdf with parameters specified by V for the corresponding probabilities in P. That is, given an area under the curve, this function computes the corresponding critical value, to the left of which the area is the specified value P (1 − α).
- [M,V] = chi2stat(NU) returns the mean and variance of the χ² distribution with degrees of freedom specified by NU.
- R = chi2rnd(V) generates random numbers from the χ² distribution with degrees of freedom specified by V.
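For instance, chi2inv can be used to build the confidence interval for a variance mentioned on the previous slide. The sketch below is illustrative only; the sample vector x and the chosen α are assumptions.

    % Minimal sketch: 100*(1-alpha)% confidence interval for sigma^2 from a sample x.
    alpha = 0.05;
    n  = numel(x);
    s2 = var(x);                                  % sample variance
    lower = (n-1)*s2 / chi2inv(1-alpha/2, n-1);   % divide by the upper chi-square critical value
    upper = (n-1)*s2 / chi2inv(alpha/2,   n-1);   % divide by the lower chi-square critical value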
14 F-Distribution

The F probability distribution has two parameters: ν1 (the number of numerator degrees of freedom) and ν2 (the number of denominator degrees of freedom). If X1 and X2 are independent χ² random variables with ν1 and ν2 df, then the following ratio has an F-distribution with those respective df:

F = (X1/ν1) / (X2/ν2)

Both the χ² and F distributions are non-symmetric. However, the F-distribution has the interesting property that

F_{1−α, ν1, ν2} = 1 / F_{α, ν2, ν1}
15 In MATLAB

MATLAB has several functions that compute various quantities of the F-distribution:
- Y = fpdf(X,V1,V2) computes the F-distribution pdf at each of the values in X using the corresponding parameters in V1 and V2.
- P = fcdf(X,V1,V2) computes the F-distribution cdf at each of the values in X using the corresponding parameters in V1 and V2.
- X = finv(P,V1,V2) computes the inverse of the F-distribution cdf with numerator degrees of freedom V1 and denominator degrees of freedom V2 for the corresponding probabilities in P. That is, given an area under the curve, this function computes the corresponding critical value, to the left of which the area is the specified value P (1 − α).
- [M,V] = fstat(V1,V2) returns the mean and variance of the F distribution with parameters V1 and V2.
- R = frnd(V1,V2) generates random numbers from the F distribution with numerator degrees of freedom V1 and denominator degrees of freedom V2.
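As a quick illustration (not from the slides), finv gives the critical values used later in the lecture, and it can also be used to check the reciprocal property above; the values of α, ν1, and ν2 below are chosen only for the example.

    % Minimal sketch: F critical values via finv, and a check of the property
    % F_{1-alpha, v1, v2} = 1/F_{alpha, v2, v1}, where F_{alpha, v1, v2} denotes
    % the value with upper-tail area alpha.
    alpha = 0.05; v1 = 3; v2 = 20;
    Fupper = finv(1-alpha, v1, v2);     % F_{0.05, 3, 20}, approximately 3.10
    Flower = finv(alpha, v1, v2);       % F_{0.95, 3, 20}
    check  = 1/finv(1-alpha, v2, v1);   % equals Flower by the reciprocal property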
16 F-Test for Equality of Variances (By Request)

Let X_1, ..., X_m and Y_1, ..., Y_n be random (independent) samples from normal distributions with standard deviations σ1 and σ2. If S1 and S2 are the sample standard deviations, then the following random variable has an F-distribution with ν1 = m − 1 and ν2 = n − 1:

F = (S1²/σ1²) / (S2²/σ2²)

The test statistic for the observed values of the two variances is then

f = s1² / s2²
17 F-Test for Equality of Variances

However, rather than relying on printed F tables, you can use MATLAB's finv(.) function for any arbitrary α, ν1, and ν2.
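A sketch of this two-variance F-test in MATLAB is shown below; it is illustrative only, and the sample vectors x and y and the chosen α are assumptions. (Newer Statistics Toolbox releases also provide vartest2 for the same purpose.)

    % Minimal sketch of a two-sided F-test for H0: sigma1^2 = sigma2^2,
    % using two hypothetical samples x and y.
    alpha = 0.05;
    m = numel(x);  n = numel(y);
    f = var(x)/var(y);                    % observed test statistic s1^2/s2^2
    rejectH0 = (f >= finv(1-alpha/2, m-1, n-1)) || (f <= finv(alpha/2, m-1, n-1));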
18 Back to ANOVA: Test Statistic (Reprise)

Again, what value of F provides information regarding rejecting H0? If H0 is true, then μ1 = μ2 = ... = μI, so x̄_1., x̄_2., ..., x̄_I. should be reasonably close to each other and to the grand mean x̄.., making MSTr small; otherwise the differences are large and MSTr is large. MSE is independent of whether H0 is true, as it relies only on the sample variances. Therefore E(MSTr) = E(MSE) = σ² when H0 is true and E(MSTr) > E(MSE) = σ² when H0 is false, so an F value much greater than 1, indicating that MSTr >> MSE, provides justifiable skepticism about H0. The rejection region is f ≥ c, where c is chosen such that P(F ≥ c when H0 is indeed true) = α, the desired significance level.
19 The F-Test: Example

Let F = MSTr/MSE be the statistic in a single-factor ANOVA problem involving I populations or treatments with a random sample of J observations from each one. When H0 is true (and the basic assumptions hold), F has an F distribution with ν1 = I − 1 and ν2 = I(J − 1). The rejection region is then f ≥ F_{α, I−1, I(J−1)} for significance level α.

Example: H0: μ1 = μ2 = μ3 = μ4, with I = 4 and J = 6, so ν1 = 3, ν2 = 4·(6 − 1) = 20, and α = 0.05 gives the critical value F_{0.05, 3, 20} = 3.10. (The slide shows the F density for ν1 = 3, ν2 = 20 with this critical value marked.) Computing MSTr from the four sample means and MSE from the four sample variances gives the observed value f_obs = MSTr/MSE = 5.09, which greatly exceeds 3.10, so H0 is rejected.
20 Other Formulas for ANOVA

In practice, we compute the following related quantities to conduct the F-test, using the sums (instead of the averages) of the x_ij:

x_i. = Σ_{j=1}^{J} x_ij (treatment totals),   x.. = Σ_{i=1}^{I} Σ_{j=1}^{J} x_ij (grand total)

- Treatment sum of squares: SSTr = (1/J) Σ_{i=1}^{I} x_i.² − x..²/(IJ). The amount of variation that can be attributed to differences in the means of the samples.
- Error sum of squares: SSE = Σ_{i=1}^{I} Σ_{j=1}^{J} (x_ij − x̄_i.)². The amount of variation due to the inherent noise in each sample; the variation of each x_ij from its sample mean.
- Total sum of squares: SST = Σ_{i=1}^{I} Σ_{j=1}^{J} x_ij² − x..²/(IJ). A measure of the total variation in the data; the difference between each measurement and the grand mean.

Fundamental identity: SST = SSTr + SSE. Thus the total variation (SST) can be partitioned into two pieces: SSE, the variation present within samples, which is present whether or not H0 is true, and SSTr, the variation between the samples, which can only be explained by differences in the sample means.
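These computational formulas can be sketched in MATLAB as follows; the sketch is illustrative only, and X is again a hypothetical J-by-I data matrix with one column per treatment.

    % Minimal sketch of the computational formulas and the fundamental identity.
    [J, I] = size(X);
    xi_dot = sum(X);                                   % treatment totals x_i.
    x_dd   = sum(X(:));                                % grand total x..
    SSTr = sum(xi_dot.^2)/J - x_dd^2/(I*J);            % treatment sum of squares
    SST  = sum(X(:).^2)     - x_dd^2/(I*J);            % total sum of squares
    SSE  = sum(sum((X - repmat(mean(X), J, 1)).^2));   % error sum of squares
    % SST should equal SSTr + SSE, up to rounding error.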
21 Sums of Squares and Mean Squares

The statistics we compute, SSTr and SSE, are intimately related to MSTr and MSE:

MSTr = SSTr/(I − 1),   MSE = SSE/(I(J − 1))

F = MSTr/MSE = [SSTr/(I − 1)] / [SSE/(I(J − 1))]

which is an F random variable with ν1 = I − 1 and ν2 = I(J − 1). The computations for the ANOVA test, using the F-test, are often summarized in tabular form, known as the ANOVA table.
22 ANOVA Table

Source of Variation    Sum of Squares    df          Mean Square              f
Treatments             SSTr              I − 1       MSTr = SSTr/(I − 1)      MSTr/MSE
Error                  SSE               I(J − 1)    MSE = SSE/(I(J − 1))
Total                  SST               IJ − 1

The table is accompanied by the P-value p = P(F_{I−1, I(J−1)} ≥ f).
23 In MATLAB

p = anova1(X) performs a one-way ANOVA for comparing the means of two or more columns of data in the m-by-n matrix X, where each column represents an independent sample containing m mutually independent observations. The function returns the p-value for the null hypothesis that all samples in X are drawn from the same population (or from different populations with the same mean). If the p-value is near zero, this casts doubt on the null hypothesis and suggests that at least one sample mean is significantly different from the other sample means.

The anova1 function displays two figures. The first figure is the standard ANOVA table, which divides the variability of the data in X into two parts:
- variability due to the differences among the column means (variability between groups), and
- variability due to the differences between the data in each column and the column mean (variability within groups).
The second figure displays box plots of each column of X. Large differences in the center lines of the box plots correspond to large values of F and correspondingly small p-values.

The ANOVA test makes the following assumptions about the data in X:
- All sample populations are normally distributed.
- All sample populations have equal variance.
- All observations are mutually independent.
The ANOVA test is known to be robust to modest violations of the first two assumptions.

[p,table,stats] = anova1(...) returns the ANOVA table as a cell array as well as a stats structure that you can use to perform a follow-up multiple comparison test.
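A minimal usage sketch, with synthetic data rather than the lecture's dataset, might look like this (the group count, sample size, and shift are arbitrary choices for the illustration):

    % Synthetic example: 6 observations per group, 5 groups, with group 5 shifted.
    X = randn(6, 5);
    X(:, 5) = X(:, 5) + 3;
    [p, tbl, stats] = anova1(X);    % displays the ANOVA table and the box plots
    % A small p suggests that at least one column mean differs from the others.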
24 Example: In-Class Exercise

The following data show the number of hours students of different colleges spend on homework; the rows are the numbers of hours spent studying by students randomly selected from each college, and the columns represent the different colleges (ENG, LAS, COM, EDU, FPA), with 6 observations (students) from each college. Do students from certain colleges study harder?

H0: μ1 = μ2 = ... = μ5, with I = number of populations = 5 and J = sample size of each population = 6.
25 Example

Using the treatment totals x_i. = Σ_{j=1}^{J} x_ij and the grand total x.. = 455:

SSTr = (1/J) Σ_{i=1}^{I} x_i.² − x..²/(IJ) = (1/6)[x_1.² + x_2.² + x_3.² + x_4.² + x_5.²] − (455)²/30

SSE = Σ_{i=1}^{I} Σ_{j=1}^{J} (x_ij − x̄_i.)²,   SST = Σ_{i=1}^{I} Σ_{j=1}^{J} x_ij² − x..²/(IJ),   and SST = SSTr + SSE.

f_obs = MSTr/MSE = [SSTr/(I − 1)] / [SSE/(I(J − 1))] = 9.0077

Since F_{0.05, 4, 25} = 2.76 and f_obs >> F_α, we can reject the null hypothesis that students from all colleges work the same amount. We can also look at the p-value: what is the probability, if H0 were true, of observing an f_obs as large as 9.0077? In MATLAB: 1 − fcdf(9.0077, 4, 25) ≈ 1.2e-4.
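The final comparison above can be checked in MATLAB once MSTr and MSE have been computed from the data; the sketch below is illustrative and assumes those two values are already in the workspace.

    % Sketch of the final comparison for this example, assuming MSTr and MSE
    % have already been computed from the 6-by-5 data matrix.
    I = 5;  J = 6;
    fobs  = MSTr/MSE;                        % reported as 9.0077 on the slide
    fcrit = finv(0.95, I-1, I*(J-1));        % F_{0.05,4,25}, approximately 2.76
    p     = 1 - fcdf(fobs, I-1, I*(J-1));    % p-value, approximately 1.2e-4
    rejectH0 = fobs >= fcrit;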
26 Solution by MATLAB

[p, table, stats] = anova1(data)

p =
    1.1971e-004

table =
    'Source'     'SS'    'df'    'MS'    'F'         'Prob>F'
    'Columns'    [  ]    [ 4]    [  ]    [9.0076]    [1.1971e-004]
    'Error'      [  ]    [25]    [  ]    []          []
    'Total'      [  ]    [29]    []      []          []

stats =
    gnames: [5x1 char]
         n: [6 6 6 6 6]
    source: 'anova1'
     means: [ ]
        df: 25
         s:
27 Solution by MATLAB

(Figure: the ANOVA table and the box plots of the five college groups, as displayed by anova1.)
28 What Happens After We Reject H0?

Recall that H0: μ1 = μ2 = ... = μI. If f_obs < F_α, or p > α, then we cannot reject H0, and we accept that μ1 = μ2 = ... = μI. But what happens next if f_obs > F_α and we reject H0? We accept the alternative hypothesis, which means that not all means can be considered equal, so at least two of the means must differ. But which ones?

Multiple comparisons procedure: the idea is to check all pairwise differences μ_i − μ_j (for all i < j) and compute a confidence interval for each.
- Those intervals that do not include zero indicate that μ_i and μ_j differ significantly.
- Those intervals that do include zero indicate that μ_i and μ_j do not differ significantly.
29 Tukey's Procedure (T-Method for Multiple Comparisons)

This uses yet another distribution: the Studentized Range distribution (available in tables). Q_{α,m,ν} is the critical value cutting off an upper-tail area of α for the Studentized Range distribution with numerator df m and denominator df ν. With probability 1 − α,

X̄_i. − X̄_j. − Q_{α, I, I(J−1)} √(MSE/J)  ≤  μ_i − μ_j  ≤  X̄_i. − X̄_j. + Q_{α, I, I(J−1)} √(MSE/J)

for every i and j with i < j. Note that m = I (not I − 1, as it was in the F-distribution) and ν = I(J − 1). This formula computes the confidence interval for every μ_i − μ_j, but do we really need the entire confidence interval? We only need to know whether the CI includes zero or not. There is a simpler form of Tukey's test!
30 Simplified Tukey's Test

1. Select α and extract the corresponding Q_{α, I, I(J−1)}.
2. Calculate w = Q_{α, I, I(J−1)} √(MSE/J).
3. List the sample means in increasing order and underline those pairs that differ by less than w. Any pair not underlined by the same line corresponds to a pair of means that are significantly different.
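Steps 2 and 3 can be sketched in MATLAB as shown below; this is illustrative only, and it assumes the studentized-range value Q has been read from a table for the chosen α, I, and I(J−1), and that MSE, J, and a vector xbar_i of sample means are available.

    % Minimal sketch of the simplified Tukey comparison (Q, MSE, J, and the
    % vector xbar_i of sample means are assumed to be available).
    w = Q * sqrt(MSE/J);
    [sorted_means, order] = sort(xbar_i);              % means in increasing order
    D = abs(bsxfun(@minus, xbar_i(:), xbar_i(:)'));    % all pairwise differences
    significantly_different = D > w;                   % pairs that differ by more than w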
31 Example

Recall our in-class example, for which we had the following results:

p = 1.1971e-004
    'Source'     'SS'    'df'    'MS'    'F'         'Prob>F'
    'Columns'    [  ]    [ 4]    [  ]    [9.0076]    [1.1971e-004]
    'Error'      [  ]    [25]    [  ]    []          []
    'Total'      [  ]    [29]    []      []          []

Let's compute w:

w = Q_{α, I, I(J−1)} √(MSE/J) = Q_{0.05, 5, 25} √(MSE/J) = 8.03

Sort the sample means in increasing order (groups 4, 3, ..., 5, ...) and underline the pairs that differ by less than w. How do we interpret the result?
32 In MATLAB

c = multcompare(stats, alpha) performs a multiple comparison test using the information in the stats structure (from anova1) and returns a matrix c of pairwise comparison results. It also displays an interactive figure presenting a graphical representation of the test.

The output c contains the results of the test in the form of a five-column matrix. Each row of the matrix represents one test, and there is one row for each pair of groups. The entries in the row indicate the means being compared, the estimated difference in means, and a confidence interval for the difference. For example, suppose one row contains the following entries:

2.0000    5.0000    1.9442    8.2206    14.4971

These numbers indicate that the mean of group 2 minus the mean of group 5 is estimated to be 8.2206, and a 95% confidence interval for the true difference of the means is [1.9442, 14.4971]. In this example the confidence interval does not contain 0.0, so the difference is significant at the 0.05 level. If the confidence interval did contain 0.0, the difference would not be significant at the 0.05 level.

The multcompare function also displays a graph with each group mean represented by a symbol and an interval around the symbol. Two means are significantly different if their intervals are disjoint, and are not significantly different if their intervals overlap. You can use the mouse to select any group, and the graph will highlight any other groups that are significantly different from it.
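A brief usage sketch follows; it is illustrative, assumes the stats structure returned by anova1 on a hypothetical data matrix X, and relies on the five-column layout of c described above.

    % Minimal sketch: follow-up multiple comparisons after anova1.
    [p, tbl, stats] = anova1(X);          % X is the J-by-I data matrix
    c = multcompare(stats);               % default 95% intervals
    % A pair differs significantly when its interval excludes zero
    % (columns 3 and 5 of c hold the interval endpoints).
    differs = (c(:,3) > 0) | (c(:,5) < 0);
    sig_pairs = c(differs, 1:2);          % the group indices of those pairs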
33 Solution by MATLAB

c = multcompare(stats)

(Output: the matrix c of all pairwise comparisons for the five college groups, together with the interactive comparison plot.)
34 Homework

From Chapter 0, 4. From Chapter 3,,4,8. Analyze the data given in these questions to obtain an ANOVA table; solve by hand and then by MATLAB, and compare your results. If you do not get the same results, you did not solve correctly!