Outline. Design and Model ANOVA table and F test Meaning of Main Effects
|
|
|
- Juniper Norman
- 9 years ago
- Views:
Transcription
1 Outline 1 Two-factor design Design and Model ANOVA table and F test Meaning of Main Effects 2 Split-plot design Design and Model, CRD at whole-plot level ANOVA table and F test Split plot with RCBD at whole-plot level
2 Two-factor design Full factorial: all combinations of factor levels are observed. Example: factor A (frog species) with a = 5 levels, factor B (carbaryl presence) with b = 2 levels. Full factorial balanced design with r = 3 replicates: total of = 30 observations. Model response = factora + factorb + factora:factorb + error Y i = µ + α j[i] + β k[i] + γ j[i],k[i] + e i with e i i.i.d N (0, σ 2 e) µ is a population mean. α j : main effect of A (frog species), compared to the mean across all species. Constrained to a j=1 α j = 0. β k : main effect of B (carbaryl). Constraint: b k=1 β k = 0. γ jk : interactive effect of A and B. Constraints: a j=1 γ jk = 0 for each carbaryl level k and b k=1 γ jk = 0 for each frog species j.
3 Two factor design: ANOVA table IE(MS) Source df A, B fixed A, B random A a 1 σ 2 e + br P a j=1 α2 j a 1 B b 1 σ 2 e + ar P b j=1 β2 k Pb 1 j,k γ2 jk (a 1)(b 1) σ 2 e + r σ 2 γ + brσ 2 α σ 2 e + r σ 2 γ + arσ 2 β A:B (a 1)(b 1) σe 2 + r σe 2 + rσγ 2 Error ab(r 1) σe 2 σe 2 Total abr 1 We may consider A and/or B as random. If so, their interaction must be considered random too, with γ jk N (0, σ 2 γ). The F test depends if A and B are considered fixed, or if one of them is considered random.
4 Plant alkaloid content in wild and cultivated plants, different populations and seasons: > dat = read.table("alkaloid.txt", header=t) > with(dat, table(type, month, pop)),, pop = A,, pop = B,, pop = C month month month type May Oct type May Oct type May Oct cultivated 9 10 cultivated cultivated wild wild wild 10 8 > datb = subset(dat, pop=="b") > str(datb) data.frame : 40 obs. of 4 variables: $ pop : Factor w/ 3 levels "A","B","C": $ month : Factor w/ 2 levels "May","October": $ type : Factor w/ 2 levels "cultivated","wild": $ alkaloid: int We will focus on population (or block ) B, no missing values: full factorial design with r = 10 replicates.
5 Same type I and III F-tests Because of the balance, the type I and type III SS are equal: > fit1 = lm(sqrt(alkaloid) type*month, data=datb) > fit2 = lm(sqrt(alkaloid) month*type, data=datb) > anova(fit1) Df Sum Sq Mean Sq F value Pr(>F) type month e-07 *** type:month * Residuals > anova(fit2) Df Sum Sq Mean Sq F value Pr(>F) month e-07 *** type month:type * Residuals > drop1(fit2, test="f") Single term deletions Df Sum of Sq RSS AIC <none> month:type *
6 Meaning of main effects F-test for the main effects of A (say, type): H 0 : all α j = 0 versus H A : at least one α j 0. When there are interactive effects type:month, α j is not the effect of wild type in October, nor in May. It is the average effect of wild, taken across both months. Main effect α j = predicted mean Y in treatment j across all values of the other factor(s) predicted mean Ŷ across all treatment levels and all other factor levels. This value might not be meaningful. It might, or might not. Accordingly, the associated f-test might not be meaningful.
7 Meaning of main effects Why does the F-test for the main effect of type (p =.62) disagree with the t-test for the coefficient typewild (p =.057)? > anova(fit1) Df Sum Sq Mean Sq F value Pr(>F) type month e-07 *** type:month * Residuals > summary(fit1) Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept) < 2e-16 *** typewild monthoctober ** typewild:monthoctober * with(datb, interaction.plot(month, type, sqrt(alkaloid)))
8 Meaning of main effects Data from population B mean of sqrt(alkaloid) type µ + α cult µ µ + α wild cultivated wild May month October
9 Meaning of main effects The F-test for the main effect of type corresponds to testing the coefficient type1 when we use the sum contrast: > options(contrasts=c("contr.sum", "contr.poly")) > fit.sum.contrasts = lm(sqrt(alkaloid) type*month, data=datb) > summary(fit.sum.contrasts) Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept) < 2e-16 *** type month e-07 *** type1:month * mu = alphawild = alphacult = with(datb, interaction.plot(month, type, sqrt(alkaloid))) abline(h=mu); abline(h=mu+alphawild, col="darkred") abline(h=mu+alphacult, col="blue")
10 Effect of a few missing data Analysis of the cultivated populations: factorial design but 1 missing obs. > datcult = subset(dat, type=="cultivated") > with(datcult, table(pop, month)) month pop May October A 9 10 B C > fit.cult1 = lm(sqrt(alkaloid) pop*month, data=datcult) > fit.cult2 = lm(sqrt(alkaloid) month*pop, data=datcult)
11 Effect of a few missing data Type I and type III sums of squares are not longer exactly equal: > anova(fit.cult1) Df Sum Sq Mean Sq F value Pr(>F) pop < 2.2e-16 *** month e-13 *** pop:month * Residuals > anova(fit.cult2) Df Sum Sq Mean Sq F value Pr(>F) month e-13 *** pop < 2.2e-16 *** month:pop * Residuals > drop1(fit.cult1, test="f") Single term deletions Df Sum of Sq RSS AIC F value Pr(F) <none> pop:month *
12 Meaning of main effects Data from Cultivated populations mean of sqrt(alkaloid) May October µ + α May µ µ + α Oct A B C pop Main effects are meaningful here: the difference between May and October is consistent.
13 Meaning of main effects > summary(fit.cult2) Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept) < 2e-16 *** month e-13 *** pop < 2e-16 *** pop < 2e-16 *** month1:pop month1:pop ** > mu = > alphamay = > alphaoct = > with(datcult, interaction.plot(pop, month, sqrt(alkaloid))) > abline(h=mu); > abline(h=mu+alphamay) > abline(h=mu+alphaoct)
14 Outline 1 Two-factor design Design and Model ANOVA table and F test Meaning of Main Effects 2 Split-plot design Design and Model, CRD at whole-plot level ANOVA table and F test Split plot with RCBD at whole-plot level
15 Split-plot design Two of more factors, say A (irrigation) and C (barley variety) A blocking factor B, say large plots ( pan ) A single level of factor A (say) is applied to each block, making it difficult to compare the levels of A. All levels of C are observed in each block.
16 Split-plot design Example: each irrigation treatment is applied to a large plot. The 4 varieties are planted in rows, randomized. Plot 1: Irrigation 2 row 1: V4 row 2: V2 row 3: V1 row 4: V3 Plot 2: Irrigation 3 row 1: V3 row 2: V1 row 3: V2 row 4: V4 Plot 3: Irrigation 1 row 1: V4 row 2: V1 row 3: V3 row 4: V2 Here we cannot compare irrigation treatments, but can compare varieties. Why?
17 Split-plot design 6 large plots are used, randomly assigned to irrigation trts: Plot 1: Irrigation 3 row 1: V3 row 2: V1 row 3: V4 row 4: V2 Plot 4: Irrigation 2 row 1: V2 row 2: V1 row 3: V3 row 4: V4 Plot 2: Irrigation 1 row 1: V1 row 2: V4 row 3: V3 row 4: V2 Plot 5: Irrigation 2 row 1: V2 row 2: V4 row 3: V1 row 4: V3 Plot 3: Irrigation 3 row 1: V4 row 2: V3 row 3: V1 row 4: V2 Plot 6: Irrigation 1 row 1: V4 row 2: V3 row 3: V1 row 4: V2 Now we can compare both the soil types and the varieties: 2 EUs (large plots) per irrigation treatment. 1 EU (row) per variety within each large plot.
18 Split-plot Model Model for split plot design response factor A * factor C + whole-plot (random) + error Y i = µ + α j + v k + (αv) jk + γ l + e i with e i N (0, σe) 2 µ is a population mean across all treatments. α j is a (fixed) main effect of A (irrigation) constrained to a j=1 α j = 0. v k is a (fixed) main effect of C (variety) constrained to c k=1 v k = 0. (αv) jk is a (fixed) interaction effect of A and C constrained to a j=1 (αv) jk = 0 for each variety k and c k=1 (αv) jk = 0 for each irrigation treatment j. γ l iid N (0, σ 2 γ) represents the large plot effect, nested in irrigation.
19 Split-plot design: ANOVA table a: # levels for A b: # whole-plots for each level of A c: # levels for B, each observed once in each whole-plot. Source df E(MS) A a 1 σe 2 + cσγ 2 + bc WPErr, or B a(b 1) σe 2 + cσγ 2 P C c 1 σe 2 k + ab v k 2 AC (a 1)(c 1) σ 2 e + b SPErr a(b 1)(c 1) σe 2 Total abc 1 P j α2 j a 1 Pc 1 j,k (αv)2 jk (a 1)(c 1) WPErr: Whole plot error here at the large plot level SPErr: Subplot error here at the row level.
20 Split-plot design: f-tests Under H 0 : α j = 0 for all j, i.e. no main irrigation effect: F = MSA MSWPErr F a 1,a(b 1) Under H 0 : v k = 0 for all k, i.e. no main variety effect: F = MSC MSSPErr F c 1,a(b 1)(c 1) Under H 0 : (αv) jk = 0 for all j, k, i.e. additivity of irrigation and variety effects: F = MSAC MSSPErr F (a 1)(c 1),a(b 1)(c 1)
21 Oats example and R commands block: fields grouped into 6 homogeneous blocks. field: whole plots. 3 fields per block. variety: of oats (Golden Rain, Marvellous and Victory). Assigned to fields randomly. nitro: amount of nitrogen added to subplots. 4 subplots per field, 4 levels of nitrogen, randomized within each field. yield: response. We will first ignore the blocks, to illustrate the simplest split-splot design.
22 Oats example with lmer > oats = read.table("oats.txt", header=t) > oats$nitro = ordered(oats$nitro) # factor with ordered levels > oats$field123 = factor(oats$field123) > oats$field = factor(oats$field) > oats block field field123 variety nitro yield 1 I 1 1 Marvellous I 1 1 Marvellous I 1 1 Marvellous I 1 1 Marvellous I 2 2 Victory I 2 2 Victory I 3 3 GoldenRain I 3 3 GoldenRain II 4 1 Marvellous II 4 1 Marvellous II 4 1 Marvellous VI 17 2 GoldenRain VI 18 3 Marvellous VI 18 3 Marvellous VI 18 3 Marvellous VI 18 3 Marvellous
23 Oats example with lmer > library(lme4) > fit.lmer1 = lmer(yield variety * nitro + (1 field), data=oats) > anova(fit.lmer1) Analysis of Variance Table Df Sum Sq Mean Sq F value variety nitro variety:nitro > pf(0.6114, df1=2, df2=15, lower.tail=f) # 15=3*(6-1) [1] # no main variety effects > pf( ,df1=3, df2=45, lower.tail=f) # 45=3*(6-1)*(4-1) [1] e-12 # main nitro. effects > pf(0.3028, df1=6, df2=45, lower.tail=f) [1] # no interaction effects
24 Danger! coding for whole plot > library(lme4) > fit.wrong = lmer(yield variety * nitro + (1 field123), data=oats) > anova(fit.wrong) Analysis of Variance Table Df Sum Sq Mean Sq F value variety nitro variety:nitro Ex: use field with ID from 1 through 18 or: use field123 with ID from 1 through 3 AND indicate nesting within blocks : > library(lme4) > fit.okay = lmer(yield variety * nitro + (1 block:field123), data=o > anova(fit.okay) Analysis of Variance Table Df Sum Sq Mean Sq F value variety nitro variety:nitro
25 Same analysis with aov > fit.aov1 = aov(yield variety*nitro + Error(field), data=oats) > summary(fit.aov1) Error: field Df Sum Sq Mean Sq F value Pr(>F) variety Residuals Error: Within Df Sum Sq Mean Sq F value Pr(>F) nitro e-12 *** variety:nitro Residuals Same results from f-tests. Again, make sure to use the field column with IDs from 1 to 18.
26 Danger! of not using random effects for whole plots > fit.wrong = lm(yield variety*nitro + field, data=oats) > anova(fit.wrong) Df Sum Sq Mean Sq F value Pr(>F) variety * nitro e-12 *** field e-08 *** variety:nitro Residuals SS are correct, but the f-value and f-test for variety are wrong. Hint to catch the mistake: some coefficient were not estimated. > summary(fit.wrong) Coefficients: (2 not defined because of singularities) Estimate Std. Error t value Pr(> t ) (Intercept) < 2e-16 ***... field field16 NA NA NA NA field17 NA NA NA NA variety1:nitro.l
27 Split plot with RCBD at whole-plot level Often, replication happens in blocks. Ex: in 2 different fields. Randomly assign factor A levels (irrigation) to whole plots within each block. Plot 1: Irrigation 3 row 1: V3 row 2: V1 row 3: V4 row 4: V2 Plot 2: Irrigation 1 row 1: V1 row 2: V4 row 3: V3 row 4: V2 Plot 3: Irrigation 2 row 1: V4 row 2: V3 row 3: V1 row 4: V2 Plot 4: Irrigation 2 row 1: V2 row 2: V1 row 3: V3 row 4: V4 Plot 5: Irrigation 3 row 1: V2 row 2: V4 row 3: V1 row 4: V3 Plot 6: Irrigation 1 row 1: V4 row 2: V3 row 3: V1 row 4: V2
28 Split-plot Model with RCBD at whole-plot level response factor A * factor C + block + whole-plot + error Y i = µ + α j + v k + (αv) jk + β l + γ lm + e i with e i N (0, σ 2 e) and random whole-plot effects. µ, α j, v k and (αv) jk : as before. Overall population mean, fixed main effects of A (irrigation) and C (variety) and their interaction effect, properly constrained. β l : either fixed or random block effects. If fixed, constrained to sum to 0. If random, iid N (0, σ 2 β ). γ lm iid N (0, σ 2 γ): whole-plot effect, nested in irrigation and in blocks. Must be random.
29 Split-plot Model with RCBD at whole-plot level a: # levels for A b: # blocks, one whole-plot for each level of A per block c: # levels for B, each observed once in each whole-plot. Source df E(MS) A a 1 σ 2 e + cσ 2 γ + bc B b 1 σe 2 + cσγ 2 + ac WPErr, or A:B (a 1)(b 1) σe 2 + cσγ 2 P C c 1 σe 2 k + ab v k 2 AC (a 1)(c 1) σ 2 e + b SPErr a(b 1)(c 1) σe 2 Total abc 1 P j α2 j Pa 1 j β2 j b 1 Pc 1 j,k (αv)2 jk (a 1)(c 1)
30 Oats example with blocks, lmer Block effects as fixed: > fit.lmer2 = lmer(yield block + variety*nitro + (1 field), data=oat > anova(fit.lmer2) Analysis of Variance Table Df Sum Sq Mean Sq F value block variety nitro variety:nitro > pf(1.4848, df1=2, df2=10, lower.tail=f) # 10=(3-1)*(6-1) [1] # no main variety effects > pf( , df1=3, df2=45, lower.tail=f) # 45=3*(6-1)*(4-1) [1] e-12 # main nitro. effects > pf(0.3028, df1=6, df2=45, lower.tail=f) # 45=3*(6-1)*(4-1) [1] # no variety:nitro interact
31 Oats example with blocks, lmer Block effects as random: > fit.lmer3 = lmer(yield variety*nitro + (1 block/field123), data=oa > fit.lmer3 = lmer(yield variety*nitro + (1 block/field), data=oats) > fit.lmer3 = lmer(yield variety*nitro + (1 field) + (1 block), data > anova(fit.lmer3) Analysis of Variance Table Df Sum Sq Mean Sq F value variety nitro variety:nitro We get the same f-values: doesn t matter that block effects are random or fixed.
32 Oats example with aov, fixed blocks > fit.aov2 = aov(yield variety*nitro + block + Error(field), data=oa > summary(fit.aov3) Error: field Df Sum Sq Mean Sq F value Pr(>F) variety block * Residuals Error: Within Df Sum Sq Mean Sq F value Pr(>F) nitro e-12 *** variety:nitro Residuals Make sure to use field and not field123! Same f-values, same p-values as with lmer.
33 Oats example with aov Random block effects. aov accepts only one Error term. Factors within this Error() must be nested. > fit.aov3 = aov(yield variety*nitro + Error(block/field), data=oats Warning message: In aov(yield variety * nitro + Error(block/field), data = oats) : Error() model is singular The ANOVA table would be correct. To get rid of the warning: aov prefers field IDs that are repeated across blocks: 1,2,3,1,2,3 etc. > fit.aov3 = aov(yield variety*nitro + Error(block/field123), data=o
34 Oats example with aov > summary(fit.aov3) Error: block Df Sum Sq Mean Sq F value Pr(>F) Residuals Error: block:field123 Df Sum Sq Mean Sq F value Pr(>F) variety Residuals Error: Within Df Sum Sq Mean Sq F value Pr(>F) nitro e-12 *** variety:nitro Residuals
Outline. RCBD: examples and model Estimates, ANOVA table and f-tests Checking assumptions RCBD with subsampling: Model
Outline 1 Randomized Complete Block Design (RCBD) RCBD: examples and model Estimates, ANOVA table and f-tests Checking assumptions RCBD with subsampling: Model 2 Latin square design Design and model ANOVA
Stat 5303 (Oehlert): Tukey One Degree of Freedom 1
Stat 5303 (Oehlert): Tukey One Degree of Freedom 1 > catch
Statistical Models in R
Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Linear Models in R Regression Regression analysis is the appropriate
DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9
DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 Analysis of covariance and multiple regression So far in this course,
Multiple Linear Regression
Multiple Linear Regression A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is
Topic 9. Factorial Experiments [ST&D Chapter 15]
Topic 9. Factorial Experiments [ST&D Chapter 5] 9.. Introduction In earlier times factors were studied one at a time, with separate experiments devoted to each factor. In the factorial approach, the investigator
E(y i ) = x T i β. yield of the refined product as a percentage of crude specific gravity vapour pressure ASTM 10% point ASTM end point in degrees F
Random and Mixed Effects Models (Ch. 10) Random effects models are very useful when the observations are sampled in a highly structured way. The basic idea is that the error associated with any linear,
Comparing Nested Models
Comparing Nested Models ST 430/514 Two models are nested if one model contains all the terms of the other, and at least one additional term. The larger model is the complete (or full) model, and the smaller
BIOL 933 Lab 6 Fall 2015. Data Transformation
BIOL 933 Lab 6 Fall 2015 Data Transformation Transformations in R General overview Log transformation Power transformation The pitfalls of interpreting interactions in transformed data Transformations
MIXED MODEL ANALYSIS USING R
Research Methods Group MIXED MODEL ANALYSIS USING R Using Case Study 4 from the BIOMETRICS & RESEARCH METHODS TEACHING RESOURCE BY Stephen Mbunzi & Sonal Nagda www.ilri.org/rmg www.worldagroforestrycentre.org/rmg
HOW TO USE MINITAB: DESIGN OF EXPERIMENTS. Noelle M. Richard 08/27/14
HOW TO USE MINITAB: DESIGN OF EXPERIMENTS 1 Noelle M. Richard 08/27/14 CONTENTS 1. Terminology 2. Factorial Designs When to Use? (preliminary experiments) Full Factorial Design General Full Factorial Design
ii. More than one per Trt*Block -> EXPLORATORY MODEL
1. What is the experimental unit? WHAT IS RANDOMIZED. Are we taking multiple measures of the SAME e.u.? a. Subsamples AVERAGE FIRST! b. Repeated measures in time 3. Are there blocking variables? a. No
Testing for Lack of Fit
Chapter 6 Testing for Lack of Fit How can we tell if a model fits the data? If the model is correct then ˆσ 2 should be an unbiased estimate of σ 2. If we have a model which is not complex enough to fit
Analysis of Variance. MINITAB User s Guide 2 3-1
3 Analysis of Variance Analysis of Variance Overview, 3-2 One-Way Analysis of Variance, 3-5 Two-Way Analysis of Variance, 3-11 Analysis of Means, 3-13 Overview of Balanced ANOVA and GLM, 3-18 Balanced
Outline. Dispersion Bush lupine survival Quasi-Binomial family
Outline 1 Three-way interactions 2 Overdispersion in logistic regression Dispersion Bush lupine survival Quasi-Binomial family 3 Simulation for inference Why simulations Testing model fit: simulating the
How to calculate an ANOVA table
How to calculate an ANOVA table Calculations by Hand We look at the following example: Let us say we measure the height of some plants under the effect of different fertilizers. Treatment Measures Mean
Exercises on using R for Statistics and Hypothesis Testing Dr. Wenjia Wang
Exercises on using R for Statistics and Hypothesis Testing Dr. Wenjia Wang School of Computing Sciences, UEA University of East Anglia Brief Introduction to R R is a free open source statistics and mathematical
Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares
Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation
Least Squares Regression. Alan T. Arnholt Department of Mathematical Sciences Appalachian State University [email protected].
Least Squares Regression Alan T. Arnholt Department of Mathematical Sciences Appalachian State University [email protected] Spring 2006 R Notes 1 Copyright c 2006 Alan T. Arnholt 2 Least Squares
Two-way ANOVA and ANCOVA
Two-way ANOVA and ANCOVA In this tutorial we discuss fitting two-way analysis of variance (ANOVA), as well as, analysis of covariance (ANCOVA) models in R. As we fit these models using regression methods
Minitab Tutorials for Design and Analysis of Experiments. Table of Contents
Table of Contents Introduction to Minitab...2 Example 1 One-Way ANOVA...3 Determining Sample Size in One-way ANOVA...8 Example 2 Two-factor Factorial Design...9 Example 3: Randomized Complete Block Design...14
Statistical Models in R
Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Structure of models in R Model Assessment (Part IA) Anova
Chapter 19 Split-Plot Designs
Chapter 19 Split-Plot Designs Split-plot designs are needed when the levels of some treatment factors are more difficult to change during the experiment than those of others. The designs have a nested
N-Way Analysis of Variance
N-Way Analysis of Variance 1 Introduction A good example when to use a n-way ANOVA is for a factorial design. A factorial design is an efficient way to conduct an experiment. Each observation has data
Simple Linear Regression Inference
Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation
Other forms of ANOVA
Other forms of ANOVA Pierre Legendre, Université de Montréal August 009 1 - Introduction: different forms of analysis of variance 1. One-way or single classification ANOVA (previous lecture) Equal or unequal
Randomized Block Analysis of Variance
Chapter 565 Randomized Block Analysis of Variance Introduction This module analyzes a randomized block analysis of variance with up to two treatment factors and their interaction. It provides tables of
Lecture 11: Confidence intervals and model comparison for linear regression; analysis of variance
Lecture 11: Confidence intervals and model comparison for linear regression; analysis of variance 14 November 2007 1 Confidence intervals and hypothesis testing for linear regression Just as there was
Section 13, Part 1 ANOVA. Analysis Of Variance
Section 13, Part 1 ANOVA Analysis Of Variance Course Overview So far in this course we ve covered: Descriptive statistics Summary statistics Tables and Graphs Probability Probability Rules Probability
Univariate Regression
Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is
1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96
1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years
SPLIT PLOT DESIGN 2 A 2 B 1 B 1 A 1 B 1 A 2 B 2 A 1 B 1 A 2 A 2 B 2 A 2 B 1 A 1 B. Mathematical Model - Split Plot
SPLIT PLOT DESIGN 2 Main Plot Treatments (1, 2) 2 Sub Plot Treatments (A, B) 4 Blocks Block 1 Block 2 Block 3 Block 4 2 A 2 B 1 B 1 A 1 B 1 A 2 B 2 A 1 B 1 A 2 A 2 B 2 A 2 B 1 A 1 B Mathematical Model
1 Basic ANOVA concepts
Math 143 ANOVA 1 Analysis of Variance (ANOVA) Recall, when we wanted to compare two population means, we used the 2-sample t procedures. Now let s expand this to compare k 3 population means. As with the
UNDERSTANDING THE TWO-WAY ANOVA
UNDERSTANDING THE e have seen how the one-way ANOVA can be used to compare two or more sample means in studies involving a single independent variable. This can be extended to two independent variables
Psychology 205: Research Methods in Psychology
Psychology 205: Research Methods in Psychology Using R to analyze the data for study 2 Department of Psychology Northwestern University Evanston, Illinois USA November, 2012 1 / 38 Outline 1 Getting ready
Mixed 2 x 3 ANOVA. Notes
Mixed 2 x 3 ANOVA This section explains how to perform an ANOVA when one of the variables takes the form of repeated measures and the other variable is between-subjects that is, independent groups of participants
Two-Sample T-Tests Assuming Equal Variance (Enter Means)
Chapter 4 Two-Sample T-Tests Assuming Equal Variance (Enter Means) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when the variances of
1.5 Oneway Analysis of Variance
Statistics: Rosie Cornish. 200. 1.5 Oneway Analysis of Variance 1 Introduction Oneway analysis of variance (ANOVA) is used to compare several means. This method is often used in scientific or medical experiments
Stat 412/512 CASE INFLUENCE STATISTICS. Charlotte Wickham. stat512.cwick.co.nz. Feb 2 2015
Stat 412/512 CASE INFLUENCE STATISTICS Feb 2 2015 Charlotte Wickham stat512.cwick.co.nz Regression in your field See website. You may complete this assignment in pairs. Find a journal article in your field
Data Analysis Tools. Tools for Summarizing Data
Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool
Difference of Means and ANOVA Problems
Difference of Means and Problems Dr. Tom Ilvento FREC 408 Accounting Firm Study An accounting firm specializes in auditing the financial records of large firm It is interested in evaluating its fee structure,particularly
Using R for Linear Regression
Using R for Linear Regression In the following handout words and symbols in bold are R functions and words and symbols in italics are entries supplied by the user; underlined words and symbols are optional
Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear.
Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear. In the main dialog box, input the dependent variable and several predictors.
MEAN SEPARATION TESTS (LSD AND Tukey s Procedure) is rejected, we need a method to determine which means are significantly different from the others.
MEAN SEPARATION TESTS (LSD AND Tukey s Procedure) If Ho 1 2... n is rejected, we need a method to determine which means are significantly different from the others. We ll look at three separation tests
Random effects and nested models with SAS
Random effects and nested models with SAS /************* classical2.sas ********************* Three levels of factor A, four levels of B Both fixed Both random A fixed, B random B nested within A ***************************************************/
Chapter 4 and 5 solutions
Chapter 4 and 5 solutions 4.4. Three different washing solutions are being compared to study their effectiveness in retarding bacteria growth in five gallon milk containers. The analysis is done in a laboratory,
Week 5: Multiple Linear Regression
BUS41100 Applied Regression Analysis Week 5: Multiple Linear Regression Parameter estimation and inference, forecasting, diagnostics, dummy variables Robert B. Gramacy The University of Chicago Booth School
n + n log(2π) + n log(rss/n)
There is a discrepancy in R output from the functions step, AIC, and BIC over how to compute the AIC. The discrepancy is not very important, because it involves a difference of a constant factor that cancels
ANOVA. February 12, 2015
ANOVA February 12, 2015 1 ANOVA models Last time, we discussed the use of categorical variables in multivariate regression. Often, these are encoded as indicator columns in the design matrix. In [1]: %%R
KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management
KSTAT MINI-MANUAL Decision Sciences 434 Kellogg Graduate School of Management Kstat is a set of macros added to Excel and it will enable you to do the statistics required for this course very easily. To
ABSORBENCY OF PAPER TOWELS
ABSORBENCY OF PAPER TOWELS 15. Brief Version of the Case Study 15.1 Problem Formulation 15.2 Selection of Factors 15.3 Obtaining Random Samples of Paper Towels 15.4 How will the Absorbency be measured?
ORTHOGONAL POLYNOMIAL CONTRASTS INDIVIDUAL DF COMPARISONS: EQUALLY SPACED TREATMENTS
ORTHOGONAL POLYNOMIAL CONTRASTS INDIVIDUAL DF COMPARISONS: EQUALLY SPACED TREATMENTS Many treatments are equally spaced (incremented). This provides us with the opportunity to look at the response curve
NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )
Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates
Two-Sample T-Tests Allowing Unequal Variance (Enter Difference)
Chapter 45 Two-Sample T-Tests Allowing Unequal Variance (Enter Difference) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when no assumption
MULTIPLE REGRESSION EXAMPLE
MULTIPLE REGRESSION EXAMPLE For a sample of n = 166 college students, the following variables were measured: Y = height X 1 = mother s height ( momheight ) X 2 = father s height ( dadheight ) X 3 = 1 if
Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011
Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Name: Section: I pledge my honor that I have not violated the Honor Code Signature: This exam has 34 pages. You have 3 hours to complete this
Module 5: Multiple Regression Analysis
Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College
Parametric and non-parametric statistical methods for the life sciences - Session I
Why nonparametric methods What test to use? Rank Tests Parametric and non-parametric statistical methods for the life sciences - Session I Liesbeth Bruckers Geert Molenberghs Interuniversity Institute
Two Related Samples t Test
Two Related Samples t Test In this example 1 students saw five pictures of attractive people and five pictures of unattractive people. For each picture, the students rated the friendliness of the person
An Introduction to Statistical Methods in GenStat
An Introduction to Statistical Methods in GenStat Alex Glaser VSN International, 5 The Waterhouse, Waterhouse Street, Hemel Hempstead, UK email: [email protected] [email protected] Many thanks to Roger
xtmixed & denominator degrees of freedom: myth or magic
xtmixed & denominator degrees of freedom: myth or magic 2011 Chicago Stata Conference Phil Ender UCLA Statistical Consulting Group July 2011 Phil Ender xtmixed & denominator degrees of freedom: myth or
THE KRUSKAL WALLLIS TEST
THE KRUSKAL WALLLIS TEST TEODORA H. MEHOTCHEVA Wednesday, 23 rd April 08 THE KRUSKAL-WALLIS TEST: The non-parametric alternative to ANOVA: testing for difference between several independent groups 2 NON
1.1. Simple Regression in Excel (Excel 2010).
.. Simple Regression in Excel (Excel 200). To get the Data Analysis tool, first click on File > Options > Add-Ins > Go > Select Data Analysis Toolpack & Toolpack VBA. Data Analysis is now available under
Introduction to Analysis of Variance (ANOVA) Limitations of the t-test
Introduction to Analysis of Variance (ANOVA) The Structural Model, The Summary Table, and the One- Way ANOVA Limitations of the t-test Although the t-test is commonly used, it has limitations Can only
The ANOVA for 2x2 Independent Groups Factorial Design
The ANOVA for 2x2 Independent Groups Factorial Design Please Note: In the analyses above I have tried to avoid using the terms "Independent Variable" and "Dependent Variable" (IV and DV) in order to emphasize
POLYNOMIAL AND MULTIPLE REGRESSION. Polynomial regression used to fit nonlinear (e.g. curvilinear) data into a least squares linear regression model.
Polynomial Regression POLYNOMIAL AND MULTIPLE REGRESSION Polynomial regression used to fit nonlinear (e.g. curvilinear) data into a least squares linear regression model. It is a form of linear regression
Final Exam Practice Problem Answers
Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal
Introduction to Design and Analysis of Experiments with the SAS System (Stat 7010 Lecture Notes)
Introduction to Design and Analysis of Experiments with the SAS System (Stat 7010 Lecture Notes) Asheber Abebe Discrete and Statistical Sciences Auburn University Contents 1 Completely Randomized Design
HYPOTHESIS TESTING: POWER OF THE TEST
HYPOTHESIS TESTING: POWER OF THE TEST The first 6 steps of the 9-step test of hypothesis are called "the test". These steps are not dependent on the observed data values. When planning a research project,
Lucky vs. Unlucky Teams in Sports
Lucky vs. Unlucky Teams in Sports Introduction Assuming gambling odds give true probabilities, one can classify a team as having been lucky or unlucky so far. Do results of matches between lucky and unlucky
5. Linear Regression
5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4
Week TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500 6 8480
1) The S & P/TSX Composite Index is based on common stock prices of a group of Canadian stocks. The weekly close level of the TSX for 6 weeks are shown: Week TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500
Calculating P-Values. Parkland College. Isela Guerra Parkland College. Recommended Citation
Parkland College A with Honors Projects Honors Program 2014 Calculating P-Values Isela Guerra Parkland College Recommended Citation Guerra, Isela, "Calculating P-Values" (2014). A with Honors Projects.
SAS Syntax and Output for Data Manipulation:
Psyc 944 Example 5 page 1 Practice with Fixed and Random Effects of Time in Modeling Within-Person Change The models for this example come from Hoffman (in preparation) chapter 5. We will be examining
Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm
Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm
Introduction. Hypothesis Testing. Hypothesis Testing. Significance Testing
Introduction Hypothesis Testing Mark Lunt Arthritis Research UK Centre for Ecellence in Epidemiology University of Manchester 13/10/2015 We saw last week that we can never know the population parameters
data visualization and regression
data visualization and regression Sepal.Length 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 I. setosa I. versicolor I. virginica I. setosa I. versicolor I. virginica Species Species
Doing Multiple Regression with SPSS. In this case, we are interested in the Analyze options so we choose that menu. If gives us a number of choices:
Doing Multiple Regression with SPSS Multiple Regression for Data Already in Data Editor Next we want to specify a multiple regression analysis for these data. The menu bar for SPSS offers several options:
Simple Linear Regression
Chapter Nine Simple Linear Regression Consider the following three scenarios: 1. The CEO of the local Tourism Authority would like to know whether a family s annual expenditure on recreation is related
Lets suppose we rolled a six-sided die 150 times and recorded the number of times each outcome (1-6) occured. The data is
In this lab we will look at how R can eliminate most of the annoying calculations involved in (a) using Chi-Squared tests to check for homogeneity in two-way tables of catagorical data and (b) computing
MULTIPLE REGRESSION WITH CATEGORICAL DATA
DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Posc/Uapp 86 MULTIPLE REGRESSION WITH CATEGORICAL DATA I. AGENDA: A. Multiple regression with categorical variables. Coding schemes. Interpreting
t-test Statistics Overview of Statistical Tests Assumptions
t-test Statistics Overview of Statistical Tests Assumption: Testing for Normality The Student s t-distribution Inference about one mean (one sample t-test) Inference about two means (two sample t-test)
Introduction to Hierarchical Linear Modeling with R
Introduction to Hierarchical Linear Modeling with R 5 10 15 20 25 5 10 15 20 25 13 14 15 16 40 30 20 10 0 40 30 20 10 9 10 11 12-10 SCIENCE 0-10 5 6 7 8 40 30 20 10 0-10 40 1 2 3 4 30 20 10 0-10 5 10 15
Hypothesis testing - Steps
Hypothesis testing - Steps Steps to do a two-tailed test of the hypothesis that β 1 0: 1. Set up the hypotheses: H 0 : β 1 = 0 H a : β 1 0. 2. Compute the test statistic: t = b 1 0 Std. error of b 1 =
Predictor Coef StDev T P Constant 970667056 616256122 1.58 0.154 X 0.00293 0.06163 0.05 0.963. S = 0.5597 R-Sq = 0.0% R-Sq(adj) = 0.
Statistical analysis using Microsoft Excel Microsoft Excel spreadsheets have become somewhat of a standard for data storage, at least for smaller data sets. This, along with the program often being packaged
Exam Solutions. X t = µ + βt + A t,
Exam Solutions Please put your answers on these pages. Write very carefully and legibly. HIT Shenzhen Graduate School James E. Gentle, 2015 1. 3 points. There was a transcription error on the registrar
Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means
Lesson : Comparison of Population Means Part c: Comparison of Two- Means Welcome to lesson c. This third lesson of lesson will discuss hypothesis testing for two independent means. Steps in Hypothesis
CS 147: Computer Systems Performance Analysis
CS 147: Computer Systems Performance Analysis One-Factor Experiments CS 147: Computer Systems Performance Analysis One-Factor Experiments 1 / 42 Overview Introduction Overview Overview Introduction Finding
Chapter 7 Notes - Inference for Single Samples. You know already for a large sample, you can invoke the CLT so:
Chapter 7 Notes - Inference for Single Samples You know already for a large sample, you can invoke the CLT so: X N(µ, ). Also for a large sample, you can replace an unknown σ by s. You know how to do a
STAT 350 Practice Final Exam Solution (Spring 2015)
PART 1: Multiple Choice Questions: 1) A study was conducted to compare five different training programs for improving endurance. Forty subjects were randomly divided into five groups of eight subjects
Quick Stata Guide by Liz Foster
by Liz Foster Table of Contents Part 1: 1 describe 1 generate 1 regress 3 scatter 4 sort 5 summarize 5 table 6 tabulate 8 test 10 ttest 11 Part 2: Prefixes and Notes 14 by var: 14 capture 14 use of the
NCSS Statistical Software
Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, two-sample t-tests, the z-test, the
" Y. Notation and Equations for Regression Lecture 11/4. Notation:
Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through
Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software
STATA Tutorial Professor Erdinç Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software 1.Wald Test Wald Test is used
Regression 3: Logistic Regression
Regression 3: Logistic Regression Marco Baroni Practical Statistics in R Outline Logistic regression Logistic regression in R Outline Logistic regression Introduction The model Looking at and comparing
Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition
Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition Online Learning Centre Technology Step-by-Step - Excel Microsoft Excel is a spreadsheet software application
Bill Burton Albert Einstein College of Medicine [email protected] April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1
Bill Burton Albert Einstein College of Medicine [email protected] April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1 Calculate counts, means, and standard deviations Produce
Analysing Questionnaires using Minitab (for SPSS queries contact -) [email protected]
Analysing Questionnaires using Minitab (for SPSS queries contact -) [email protected] Structure As a starting point it is useful to consider a basic questionnaire as containing three main sections:
