An Introduction to Statistical Methods in GenStat




An Introduction to Statistical Methods in GenStat Alex Glaser VSN International, 5 The Waterhouse, Waterhouse Street, Hemel Hempstead, UK email: alex@vsni.co.uk support@vsni.co.uk Many thanks to Roger Payne for the original slides Aberystwyth, January 2011

Programme Day 1 Introduction to GenStat From t-test to one-way anova Basic principles of design and blocking Treatment structure factorials & interactions and checking the assumptions Day 2 Simple linear regression Multiple linear regression GLM counts and binomial data GLM further models and extensions

Aim of course To give you an overall introduction to the GenStat 13th Edition system.

Learning objectives By the end of the course, you will be able to Navigate the GenStat interface Obtain help from the system where necessary Input and manage data Analyse data through GenStat menus All without the help of the trainer.

Exercise 1.1 What happens when you select input log in the window navigator? Can you see yourself using this feature in your work? If so, how? What happens to the status bar when you click the button? Resize the input log and output window so that you can see both simultaneously. What happens when you click the button? Use the tools customize toolbar menu to add or remove buttons from the toolbar to suit your needs.

Exercise 1.2 What happens to the text in the right hand corner of the status bar if you press the insert key? What do you think this part of the status bar means? Open a new text window using the button. In this window, type the following GenStat command: PRINT 'This is my first time using GenStat'. Execute the command using the Run Submit Line menu option. Now select the Window Event Log entry for this action. Is there an Event log for this action?

GenStat Client Menus Commands GenStat Server

Exercise 2 Find help for what's new in the 13th edition of GenStat. Find help on the GenStat spreadsheet. Open the Tools Options menu and find help about the ECHO COMMANDS setting on the AUDIT TRAIL tab. Open a new text window and type in the word FIT. Place the cursor in the word and press the F1 key. What is FIT? Type in a statistical term and press the F1 key. View the Introduction to GenStat guide (pdf format). View an example program for a two-sample t-test.

Data / Load Menu File Menu ASCII Spreadsheet Database files Other Statistics packages GenStat Save Set up ODBC query Saved ODBC Queries DDE links Spread Menu Spreadsheet Other Statistics packages GIS GenStat Save GenStat session Database files Saved ODBC Queries Saved DDE links Blank / type data Data in GenStat to edit From clipboard Excel Set up ODBC query DDE link Central Data Core

Exercise 3.1 Clear all the data from GenStat and use the file open menu to read the data from the file sulphur.xls from installsets\data Clear all the data from GenStat. Go to the tools spreadsheet options file menu and uncheck the use excel import wizard on file open option. Repeat part 1 using the file open menu. Which approach best suits your way of working? The file bacteria.xls, that you met earlier, contains data from a second experiment in the worksheet called Bacteria Counts. The data are not stored in standard format; the data can be found in the range of cells D3:E13. Clear the data core. Read the data into GenStat using the Excel import wizard button.

Exercise 3.2 & 3.3 Using the data in the iris.gsh file: Produce a scatter plot of Sepal Width versus Petal Width. There is one point in this plot that stands alone. What are the coordinates of this point? Can you suggest a method of easily identifying to which species of iris this unusual point belongs? Produce a scatter plot of Sepal Length versus Petal Length. Give each factor a different symbol and colour. Experiment with labelling. Produce a histogram of Petal lengths versus Petal widths. Using your own data, experiment with the different aspects of the graphics window. That is, explore the different menus and toolbars. If you have not brought your own data sets, experiment with any of the course data files.

Exercise 4.1 Using the Excel Import Wizard, load in the file Traffic.xls. On the second screen enter B3:D43 in the Specified Range box. Click OK on the Select Columns to Convert to Factors menu. Convert Day and Month to factors using the methods of your choice.

Exercise 4.2 Continue using the file Traffic.xls. Select a cell in the Day column. Delete the value, type F and then press return. Repeat the process but with the value G. What property of the GenStat spreadsheet do you think this illustrates? Select the Tools Spreadsheet Options Conversions menu. Check the Allow new factor levels in Edit box. Now repeat the above question. What happens now?

Exercise 4.3 Continue using the file Traffic.xls Create a new variate which contains the log of the Counts. Sort the columns in descending order of the Counts. Use the Spread Manipulate Unstack to create separate variables for each day of the week. Experiment with the Calculate menu with your own data.

1 From t-test to one-way anova In this session you will learn: how to use the t-test to compare two treatments; the T-Test menu; how to use one-way ANOVA to compare several treatments; the model fitted in one-way anova; the statistical philosophy behind one-way anova; the relationship between one-way anova and the t-test for two treatments; how to use the One- and two-way ANOVA menu for one-way anova; how to plot the means from one-way anova; how to do multiple comparisons. Note: topics marked are optional

t-test: suppose we have 2 sets of units that have received 2 different treatments: animals that have been fed two different diets; plots that have been given different fertilisers; subjects with different drugs; plants with different fungicides. Assume the units do not have any special structure, e.g. the animals are all of the same breed, the plots are in a fairly uniform field, the subjects are of similar ages, weights and heights. With 2 treatments we may then do a t-test: assume each group comes from a Normal distribution; usually assume the distributions have the same variance (can check) but may have different means.
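The calculation behind the two-sample t-test can be sketched in a few lines of Python. This is only an illustration under the equal-variance assumption described above: the function name `pooled_t` and the yields are made up for the example (they are not the contents of any course data file), and GenStat's T-Test menu remains the intended way to run the analysis.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample_a, sample_b):
    """Two-sample t statistic, assuming both groups share one variance."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance: the two sample variances weighted by their d.f.
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    # Standard error of the difference between the two means.
    sed = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(sample_a) - mean(sample_b)) / sed
    return t, df, sed


# Hypothetical yields for two treatments (illustrative values only).
treatment_1 = [10.0, 12.0, 11.0, 13.0]
treatment_2 = [14.0, 15.0, 13.0, 16.0]

t_stat, df, sed = pooled_t(treatment_1, treatment_2)
print(f"t = {t_stat:.3f} on {df} d.f., s.e.d. = {sed:.3f}")
```

The t statistic would then be compared with a t distribution on the stated degrees of freedom.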

Data sets data sets for the examples and practicals can be accessed using the Example Data Sets menu filter by the course Guide to Anova and Design select the file click on Open data

t test experiment to study yields from 2 manufacturing methods data in Manufacture.gsh do yields differ more than we would expect from the random variation? can we estimate mean yields from each method?

t test menu Use GenStat menus for simplicity

Output

Practical 1.2 spreadsheet Pots.gsh stores data from a fertilizer experiment 7 plants grown in pots with no fertilizer 8 plants grown in similar conditions with fertilizer do a two-sample t-test to assess whether fertilizer has an effect

One-way analysis of variance: linear model $y_{ij} = \mu + a_i + \varepsilon_{ij}$; represent each mean by grand mean $\mu$ + effect $a_i$; observations described by fitted value $\mu + a_i$ + residual $\varepsilon_{ij}$

Residual variation may arise from many different causes: the units may not be absolutely identical (discuss later how to allocate units to treatments to take account of this) they may experience slightly different conditions during the experiment there may be measurement errors they may be being dealt with by different people during the experiment and you can no doubt think of others! so estimation is not exact analysis must estimate the amount of variation and take account of it in drawing conclusions

One-way anova: linear model $y_{ij} = \mu + a_i + \varepsilon_{ij}$. If treatments have no effect, $a_1 = a_2 = 0$ and $y_{ij} = \mu + \varepsilon_{ij}$: estimate grand mean by average of all data values; assess lack of fit of model by sum of squared residuals ($RSS_0$); degrees of freedom (d.f.) is $n_1 + n_2 - 1$ (fitted 1 parameter $\mu$). Fit full model: estimate $a_i$ by average for group $i$ minus grand mean; assess lack of fit of model by sum of squared residuals ($RSS_1$); this has $n_1 + n_2 - 2$ d.f. (2 parameters as $(n_1 a_1 + n_2 a_2)/(n_1 + n_2) = 0$). Assess treatments: sum of squares due to treatments is $TSS = RSS_0 - RSS_1$ on 1 d.f.; assess underlying variation by residual from full model, $RSS_1$; variance ratio is treatment mean square / residual mean square, $VR = \{TSS / 1\} / \{RSS_1 / (n_1 + n_2 - 2)\}$ on 1 and $(n_1 + n_2 - 2)$ d.f.
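As a rough numerical check of the steps above, the following Python sketch fits the null model and the full model to two made-up groups and forms the variance ratio. The function name and the data are hypothetical; the point is only to show how the residual sum of squares from each model combines into the treatment sum of squares and the variance ratio.

```python
from statistics import mean


def one_way_two_groups(group_1, group_2):
    """Variance ratio for two treatments via the two residual sums of squares."""
    all_values = group_1 + group_2
    grand_mean = mean(all_values)
    # Null model (no treatment effect): residuals about the grand mean.
    rss0 = sum((y - grand_mean) ** 2 for y in all_values)
    # Full model: residuals about each group's own mean.
    rss1 = sum((y - mean(g)) ** 2 for g in (group_1, group_2) for y in g)
    tss = rss0 - rss1                      # treatment s.s. on 1 d.f.
    resid_df = len(all_values) - 2         # n1 + n2 - 2
    vr = (tss / 1) / (rss1 / resid_df)     # variance ratio
    return rss0, rss1, tss, vr


# Hypothetical responses (illustrative values only).
group_1 = [10.0, 12.0, 11.0, 13.0]
group_2 = [14.0, 15.0, 13.0, 16.0]

rss0, rss1, tss, vr = one_way_two_groups(group_1, group_2)
print(f"RSS0 = {rss0:.2f}, RSS1 = {rss1:.2f}, TSS = {tss:.2f}, VR = {vr:.2f}")
```

With two treatments the variance ratio equals the square of the equal-variance t statistic for the same data, which is the relationship between one-way anova and the t-test that this session refers to.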

One and two way ANOVA menu

Output: aov table; tables of means; s.e.'s for differences between means, where $(m_1 - m_2)/\mathrm{s.e.d.} = t$

ANOVA Options menu Options menu controls the output

ANOVA Further Output menu Further Output menu provides more output (without redoing the analysis)

ANOVA Means Plots menu Means Plots menu plots means as points or joined by lines or with original data points too or in a bar chart

Practical 1.4 spreadsheet Pots.gsh stores data from a fertilizer experiment used in Practical 1.2 7 plants grown in pots with no fertilizer 8 plants grown in similar conditions with fertilizer do a one-way analysis of variance to assess if fertilizer has an effect compare results with t-test from Practical 1.2

One way anova with >2 treatments spreadsheet Rat.gsh has data from an experiment to study effect of dietary supplements on gain in weight of rats 5 diet treatments (a-e) 20 rats allocated at random, 4 per treatment can use One-and two-way ANOVA menu, and plot means, as before

Output aov table means s.e.d

Plot of means suppose a-e represent amounts 0-4 of supplement might want to assess linear (& quadratic?) effects of supplement

Multiple comparison tests. In favour: there may be many possible comparisons between pairs of treatment means (with $t$ treatments there are $t(t-1)/2$), so some researchers feel their significance levels should be adjusted to take account of all the tests that they might make. Against: multiple comparisons are unnecessary if you have only a small number of comparisons to make, either because there are few treatments, or because you should have identified beforehand the comparisons that you feel are likely to be of interest; they are also inappropriate if the treatments have any sort of structure, e.g. if levels represent different amounts of a substance like a fertiliser or a drug, then it is illogical to assume that only some of the amounts might have an effect. See on-line help for the menu.
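To see how quickly the number of pairwise comparisons grows, here is a small Python sketch that lists the pairs for five treatments and applies a Bonferroni-style adjustment (dividing the overall significance level by the number of tests). The function name and the 5% level are assumptions made for illustration; this is not a description of how GenStat's Multiple Comparisons menu computes its results.

```python
from itertools import combinations


def bonferroni_level(treatments, alpha=0.05):
    """All pairs of treatments and the per-comparison significance level."""
    pairs = list(combinations(treatments, 2))   # t(t-1)/2 pairs
    return pairs, alpha / len(pairs)


pairs, per_test_alpha = bonferroni_level(["A", "B", "C", "D", "E"])
print(f"{len(pairs)} comparisons, each tested at level {per_test_alpha:.4f}")
```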

Multiple comparisons check that they are enabled on the Menus tab of the Options menu

Multiple comparisons the Multiple Comparisons button will then be available to click on the ANOVA Further Output menu check Multiple Comparisons select Treatment and type of Test click OK (and then Run on the Further Output menu)

Practical 1.9 spreadsheet Octane.gsh (used in Practical 1.7) stores data from an experiment to study the effect of different additives A-E on the octane level of gasoline; do a one-way analysis of variance to assess if Gasoline has an effect; do a Bonferroni multiple-comparison test to compare the types of gasoline

2 Blocking structures In this session you will learn how to improve the precision of an experiment by grouping the units into similar sets called "blocks" how randomization can avoid bias by guarding against unforeseen differences amongst the units how to design and analyse a complete randomized block design how to recognise situations that may require more than one type of blocking how to design and analyse a Latin square design Note: topics marked are optional

Completely randomized design design used for all examples so far no formal structure is imposed on the units assumes units effectively identical e.g. in a field experiment, no systematic differences in underlying fertility, drainage etc of the plots in a glasshouse, assumes that light and temperature are the same for each row of pots in a factory, that workforce behaves in essentially the same way at different times of day, days of the week etc in educational studies, that children in different schools are approximately the same, or students studying different subjects at Universities, or in different year groups etc treatments allocated to units at random

Non-uniform units: for example, a field experiment on a slope, where the best plots may be at the top of the slope. Random allocation of treatments to plots may not seem "fair", e.g. replicates of treatment A mainly on "good" plots & replicates of treatment B mainly on "bad" plots; if there is no actual difference between A & B, this could lead to A appearing to be much better than B. Systematic differences between plots increase the residual sum of squares, & hence the estimate of random variability: treatment differences must be larger to give a significant F-test, and standard errors of differences between treatments will be larger, i.e. the experiment will give less precise results. If you know there are differences between units, avoid bias & improve precision by grouping (blocking) units into homogeneous groups (i.e. groups that are effectively identical).

Randomized block design single grouping factor usually known as blocks within each block same number of units for each treatment (one per treatment in a randomized-complete-block design) treatments are allocated randomly to the units in analysis block-effects are estimated and removed, leading to more-precise estimates e.g.
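The randomization step of a randomized-complete-block design can be sketched as follows: every block receives each treatment once, in an independently shuffled order. The function name, the seed argument and the 8-block, 5-treatment layout are illustrative assumptions; GenStat has its own design facilities, which this does not attempt to reproduce.

```python
import random


def randomized_complete_block(treatments, n_blocks, seed=None):
    """Return one randomly ordered copy of the treatments for each block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)          # independent randomization within the block
        layout.append(order)
    return layout


layout = randomized_complete_block(["A", "B", "C", "D", "E"], n_blocks=8, seed=1)
for block_number, order in enumerate(layout, start=1):
    print(f"block {block_number}: {' '.join(order)}")
```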

One way anova with blocks another experiment to study effect of dietary supplements on gain in weight of rats 8 litters of 5 rats assume rats from same litter more similar than those from different litters 5 Diet treatments (A-E), allocated at random to rats within each litter

No blocking residual m.s. 206.8 variance ratio 0.42 s.e.d. 7.19

With litters as blocks Differences between litters residual m.s. 40.63 (c.f. 206.8) variance ratio 2.13 (c.f. 0.42) s.e.d. 3.19 (c.f. 7.19)
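The effect of removing block differences from the residual can be reproduced on a tiny made-up table. In this sketch the rows are blocks and the columns are treatments, with one unit per treatment in each block; the numbers are invented so that blocks differ strongly, and the function name is hypothetical. It is not the rat data from the slides.

```python
from statistics import mean


def block_anova(table):
    """Residual mean squares with and without a block term.

    table[b][t] is the response for treatment t in block b.
    """
    n_blocks, n_treats = len(table), len(table[0])
    values = [y for row in table for y in row]
    grand = mean(values)
    total_ss = sum((y - grand) ** 2 for y in values)
    block_ss = n_treats * sum((mean(row) - grand) ** 2 for row in table)
    treat_means = [mean([row[t] for row in table]) for t in range(n_treats)]
    treat_ss = n_blocks * sum((m - grand) ** 2 for m in treat_means)
    treat_ms = treat_ss / (n_treats - 1)
    # Ignoring blocks: block-to-block variation stays in the residual.
    unblocked_ms = (total_ss - treat_ss) / (n_blocks * n_treats - n_treats)
    # With blocks: block variation is estimated and removed first.
    blocked_ms = (total_ss - treat_ss - block_ss) / ((n_blocks - 1) * (n_treats - 1))
    return {
        "resid_ms_unblocked": unblocked_ms,
        "resid_ms_blocked": blocked_ms,
        "vr_unblocked": treat_ms / unblocked_ms,
        "vr_blocked": treat_ms / blocked_ms,
    }


# Hypothetical responses: 3 blocks (rows) by 3 treatments (columns).
result = block_anova([[10.0, 12.0, 15.0],
                      [20.0, 23.0, 24.0],
                      [31.0, 32.0, 34.0]])
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

As in the litter example, the residual mean square drops and the variance ratio rises once the block differences are taken out.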

Practical 2.3 spreadsheet Wheatstrains.gsh contains the results from a randomized block design to assess 4 strains of wheat analyse the experiment give your assessment of whether the blocking was worthwhile

Blocking in 2 directions e.g. experiment on pot plants in a glasshouse door in east wall which may cause temperature differences sunlight mainly from the south other e.g. weekday time-of-day school year-group factory weekday time location

Latin square design: a design for $t$ treatments arranged in $t$ rows and $t$ columns (i.e. $t^2$ units); each treatment occurs exactly once in each row and once in each column; randomized by randomly permuting rows & columns e.g.
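One simple way to build and randomize a Latin square, following the description above, is to start from a cyclic square and then permute its rows and columns at random. The sketch below does exactly that; the starting square, the function name and the seed are assumptions made for illustration, and other starting squares are equally valid.

```python
import random


def random_latin_square(treatments, seed=None):
    """Cyclic Latin square with randomly permuted rows and columns."""
    rng = random.Random(seed)
    t = len(treatments)
    # Cyclic starting square: row r is the treatment list shifted by r.
    square = [[treatments[(r + c) % t] for c in range(t)] for r in range(t)]
    rng.shuffle(square)                       # permute the rows
    col_order = list(range(t))
    rng.shuffle(col_order)                    # permute the columns
    return [[row[c] for c in col_order] for row in square]


square = random_latin_square(["A", "B", "C", "D"], seed=3)
for row in square:
    print(" ".join(row))
```

Permuting rows and columns preserves the defining property, so each treatment still appears once in every row and once in every column.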

Latin square example: experiment to assess the (in?)consistency of 6 samplers in assessing the heights of wheat plants; 6 areas of wheat to assess; there may also be ordering effects (accuracy of samplers may vary during the experiment), so a 6 × 6 Latin square was used with blocking factors Areas and Orders

Analysis of Variance menu select Design to be Latin Square

Output between Areas between Orders Samplers more precisely estimated (residual m.s. 3.328 c.f. 5.801)

Practical 2.5 spreadsheet Fabric.gsh contains the results from a Latin square design to assess wear resistance of rubber-covered fabrics column factor is 4 different runs row factor is four positions on testing machine used to generate wear under simulated natural conditions analyse the results

3 Treatment structure In this session you will learn how to: recognise the need for more than one treatment factor; analyse designs with two treatment factors using the One- and two-way ANOVA menu; define and interpret interactions between factors; analyse designs with two treatment factors using the general Analysis of Variance menu; use the Anova Contrasts menu; estimate comparisons between levels of treatments; interpret interactions between treatment contrasts; use model formulae to define the treatment terms to be fitted; include control treatments in a factorial experiment; use covariates to improve precision by using additional background information about the experimental units (not used for blocking). Note: topics marked are optional

Types of treatment experiments may study different types of treatment e.g. several different drugs at a range of different doses several different types of fertiliser varieties of wheat and types of fungicide represent each type of treatment by a different treatment factor, with levels to represent the various possibilities e.g. Drug levels Morphine, Amidone, Phenadoxone, Pethidine; Dose levels 2.5, 5, 10, 15; Nitrogen levels 0, 50, 100, 150; Phosphate levels 50, 100; Fungicide levels Carbendazim, Prochloraz; Amount levels 2, 3, 4.

Two treatment factors experiment on canola (oil-seed rape) 2 treatment factors N (nitrogen) 0, 180, 230 S (sulphur) 0, 10, 20, 40 randomized-block design with 3 blocks (factor block) and 12 plots per block

One and two way ANOVA menu Two-way analysis (Treatment factors N & S) with Blocks (factor block)

Output line for each term: N & S main effects, and N.S interaction table of means for each treatment term s.e.d. for each table of means

Linear model $y_{ijk} = \mu + \beta_i + n_j + s_k + ns_{jk} + \varepsilon_{ijk}$: $\beta_i$ represent the block effects (block stratum in the aov); $\varepsilon_{ijk}$ are the residuals; $n_j$ represent the main effect of nitrogen (N); $s_k$ represent the main effect of sulphur (S); $ns_{jk}$ represent the interaction between nitrogen & sulphur (N.S). The analysis fits each term in turn, so you can decide how complicated a model is required; the analysis-of-variance table has a line for each term, so you can assess whether its parameters are needed in the model; conclusions will be much clearer if there is no interaction

With interaction

Without interaction lines are parallel can decide on best level of S without considering N or best level of N without considering S need present only one-way tables of means

General Analysis of Variance menu Design: Two-way ANOVA (in Randomized Blocks) click on Contrasts button to fit comparisons (or other contrasts)

Comparison contrasts 1 comparison between levels of N clicking OK opens matrix spreadsheet Cont type information into Cont to define comparison

General Analysis of Variance menu notice function Comp in Treatment 1 (1 comparison of N defined by Cont)

Output extra line for N assesses the comparison also extra line for N.S to assess interaction of comparison with S

Practical 3.3 spreadsheet Ratfactorial.gsh contains the results from an experiment to study the effect of 6 different diets on the gain in weight of rats; treatment factors concern the protein in the diet: Amount (High or Low) and Source (Beef, Cereal or Pork); analyse the data as a two-way factorial; fit 2 comparison contrasts between levels of Source: Animal vs Vegetable, and Beef vs Pork

Model formula: defines a model to be fitted in an analysis; formed automatically by the menus, or you can define your own. It is a list of model terms, linked by the operator "+", e.g. A + B has 2 terms representing main effects of factors A & B. Higher-order terms are specified as series of factors separated by dots (e.g. interactions); their meaning depends on the contents of the formula: e.g. in N + S + N.S, N.S is an interaction; e.g. in Block + Block.Plot, Block.Plot represents plot-within-block effects: differences between individual plots after removing the overall similarity between plots in the same block

Operators for formulae: the crossing operator * specifies factorial structures, e.g. N * S is expanded automatically to become N + S + N.S; the nesting operator / occurs most often in block formulae, e.g. Block / Plot is expanded to become Block + Block.Plot

Several operators 3-factor factorial model A * B * C becomes A + B + C + A.B + A.C + B.C + A.B.C 3 nested factors (e.g. block model of split-plot) block / wplot / subplot becomes block + block.wplot + block.wplot.subplot factorial-plus-added-control treatment structure Control / (Drug * Dose) expands to Control + Control.Drug + Control.Dose + Control.Drug.Dose NB: many commands and menus have a FACTORIAL option to control the number of factors/variates in the terms to fit
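The expansions quoted above can be mimicked with a few lines of Python. This is a toy illustration of the crossing and nesting rules for plain lists of factor names, not GenStat's formula parser: the helper names `cross`, `nest` and `nest_within` are invented here, and the sketch ignores features such as the FACTORIAL option.

```python
from itertools import combinations


def cross(*factors):
    """A * B * C: main effects plus every interaction, ordered by size."""
    terms = []
    for size in range(1, len(factors) + 1):
        for combo in combinations(factors, size):
            terms.append(".".join(combo))
    return terms


def nest(*factors):
    """block / wplot / subplot: each factor nested within the ones before."""
    return [".".join(factors[:i]) for i in range(1, len(factors) + 1)]


def nest_within(outer, inner_terms):
    """outer / (inner terms): the outer term, then each inner term within it."""
    return [outer] + [f"{outer}.{term}" for term in inner_terms]


print(" + ".join(cross("A", "B", "C")))
print(" + ".join(nest("block", "wplot", "subplot")))
print(" + ".join(nest_within("Control", cross("Drug", "Dose"))))
```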

Factorial plus added control 4 different fumigants to control nematodes CN, CS, CM and CK 2 levels of dose single and double also include a control treatment none (no fumigant at any dose) randomized-block design 4 blocks 12 plots per block (4 replicates of control treatment in each block) effects proportional analyse log counts

Analysis of Variance menu select Design to be General Treatment Structure (in Randomized Blocks)

Factorial plus added control treatment structure Fumigant / ( Level * Type ) Fumigant represents the overall effect of any fumigant at any (non-zero) dose Fumigant.Level represents comparison between single and double doses (averaged over different types) Fumigant.Type represents overall differences between types (averaged over single and double doses) Fumigant.Level.Type represents the interaction between Level and Type (given that some sort of fumigant has been applied)

Output

Output notice different sed's according to the replication of the means

Covariates provide additional background information: often measurements made before the experiment (not used for blocking), e.g. (log) prior nematode counts; incorporated in the model as linear (regression) terms, $y_{ijkl} = \mu + \beta_i + f_j + ft_{jk} + fl_{jl} + ftl_{jkl} + b\,(x_{ijkl} - \bar{x}) + \varepsilon_{ijkl}$; they improve precision and remove potential biases caused by non-uniformity of units. In the aov table: extra line(s) to assess the effect of covariate(s) on the y-variate, after removing effects of treatments; treatment s.s. (and effects) adjusted to take account of the fact that the plots with the various treatments have different covariate values; cov.ef. for treatment is the efficiency remaining after adjustment; cov.ef. for residual is the amount by which its m.s. has decreased

Output regression coefficient for adjustment in Blocks stratum regression coefficient for adjustment within Blocks combined estimate

Output

Practical 3.7 spreadsheet Ratmuscles.gsh contains data from an experiment to study the effect of electrical stimulation in preventing the wasting away of denervated muscles of rats 3 treatment factors length of each treatment number of treatment periods per day type of current randomized block design with 2 blocks denervated muscles were gastrocnemius muscles on one side of each rat the normal muscle on the other side of each rat was also measured, for use as a covariate in the analysis analyse the experiment

4 Checking the assumptions In this session you will learn what assumptions are needed to ensure validity of an aov why the variance must be homogeneous (e.g. variability of residuals should be the same at high as low response values) how to assess whether the variance is homogeneous that residuals should come from identical and independent Normal distributions how to assess the Normality of the residuals why the model must be additive (i.e. differences between treatment effects must remain the same however large or small the underlying size of the response variable) how to identify outliers how transforming the response variate may correct for failures in the assumptions how to print back-transformed tables of means how to do a random permutation test Note: topics marked are optional

Homogeneity of variance: random variation must be similar over all units; beware: it may change with the size of response; assess by plotting residuals against fitted values (example plots: homogeneous; increasing with response)

Non homogeneity of variance if variation increases with size of response s.e.d.'s between treatment means will be over-estimated for differences between low means under-estimated for differences between larger means this could lead you to the wrong conclusions! if plot of residuals against fitted values indicates non-homogeneity of variances consider transforming the response variate (or using a generalized linear model; see Guide to Linear, Nonlinear and Generalized Linear Models in GenStat)

Normality of residuals histogram should be "bell-shaped" Normal plot residuals in ascending order plotted against Normal quantiles should give an approximately straight line half-normal plot similar to Normal plot but plots absolute residual values
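The coordinates for a Normal plot can be computed by hand as a sanity check. The sketch below sorts the residuals and pairs them with Normal quantiles, using plotting positions $(i - 0.5)/n$; that choice of plotting position is an assumption made for this example, and GenStat may use a different convention. The residuals are made up.

```python
from statistics import NormalDist


def normal_plot_points(residuals):
    """Pairs (Normal quantile, ordered residual) for a Normal plot."""
    n = len(residuals)
    ordered = sorted(residuals)
    standard_normal = NormalDist()   # mean 0, standard deviation 1
    quantiles = [standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))


# Hypothetical residuals (illustrative values only).
points = normal_plot_points([0.3, -1.2, 0.8, -0.1, 2.0])
for quantile, residual in points:
    print(f"{quantile:7.3f}  {residual:6.2f}")
```

If the residuals are roughly Normal, plotting the second column against the first should give an approximately straight line.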

Additivity differences between treatment effects remain the same however large or small the underlying size of the response e.g. in randomized-block design, assume that theoretical value of difference between two treatments remains the same within a block where responses are low, as in one where they are high fitting an additive model when non-additivity is present often leads to detection of (spurious) interactions analysis will be harder to interpret predictions will be unreliable but take care genuine interactions may also occur e.g. if one treatment modifies the mode of action of another data that shows signs of non-additivity often also violates other assumptions use background knowledge of the process if a multiplicative model appropriate take a log transformation for percentage data, consider a logit transformation

Outliers are extreme observations, leading to very large residuals. Look for warnings in the ANOVA Information Summary, or for extreme points in the histogram of residuals, or high or low points in the plot of residuals against fitted values, or points away from the line at the end of the Normal or half-normal plot. Outliers may arise from errors in recording or punching data, if the wrong treatment has been applied to a unit, or where there is a problem in the experimental procedure. Outliers distort treatment means and inflate the error variance, decreasing the precision of estimates. If you have outliers, investigate to see if errors have occurred: if you find an error, try to recover the correct data value; if you cannot find the correct data value, insert a missing value; if you cannot find any possible source of error, perhaps the outlier might be a true data value. Is your model wrong?

Transformations can correct failures of assumptions. To stabilize variance: counts: square root; binomial percentages: angular, i.e. arcsine(sqrt(p/100)); s.e. proportional to mean: log. For non-additivity: multiplicative effects: log, e.g. log10(n+1) for counts; percentages: logit = log(p/(100-p)), with p = 100 (r+½)/(n+1) for binomial. Note: must make inferences on the transformed scale, but can present back-transformed means using the Save and Calculate menus
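The transformations listed above, with their inverses, can be written out as small Python functions. The function names are invented for this sketch, the angular transform is returned in radians (the slides do not say which unit is intended), and the counts at the end are made up to show what a back-transformed mean looks like.

```python
from math import asin, exp, log, log10, sin, sqrt


def angular(p):
    """Angular transform of a percentage p (0 to 100), in radians."""
    return asin(sqrt(p / 100))


def angular_back(a):
    return 100 * sin(a) ** 2


def log_count(n):
    """log10(n + 1), which copes with zero counts."""
    return log10(n + 1)


def log_count_back(x):
    return 10 ** x - 1


def logit(p):
    """Logit of a percentage p strictly between 0 and 100."""
    return log(p / (100 - p))


def logit_back(x):
    return 100 / (1 + exp(-x))


def empirical_percentage(r, n):
    """p = 100 (r + 1/2) / (n + 1) for r successes out of n."""
    return 100 * (r + 0.5) / (n + 1)


# Back-transforming a mean calculated on the log scale (made-up counts).
counts = [9, 99, 999]
mean_log = sum(log_count(n) for n in counts) / len(counts)
back_mean = log_count_back(mean_log)
print(f"mean on log scale = {mean_log:.3f}, back-transformed = {back_mean:.1f}")
```

The back-transformed mean is not the arithmetic mean of the raw counts, which is why inferences should be made on the transformed scale.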

Log transformed data study of plankton numbers 4 types of plankton (treatments) sampled in 12 hauls (blocks) compare analyses for untransformed and log10 transformed numbers

Save the means

Backtransform and print

Practical 4.6 spreadsheet Wine.gsh contains results from an experiment to assess the % alcohol of wine 5 types of wine A-E 3 bottles of each type were tested in a random order analyse the percentages & plot residuals against fitted values transform the percentages using a logit transformation, re-analyse the data & replot residuals against fitted values

Permutation tests if the distributional assumptions are not satisfied, you might use a random permutation test as an alternative way to assess the significance of the terms in the analysis model must still be additive for results to be meaningful but residuals need no longer follow Normal distributions with equal variances click on Permutation Test in ANOVA Further Output menu to open ANOVA Permutation Test menu specify Number of permutations select Seed (0 automatic) click on Run probability for each treatment term is now determined from its distribution over the randomly permuted data sets
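The idea of the random permutation test can be sketched for a one-way layout: compute the variance ratio for the observed data, then recompute it many times after randomly reallocating the treatment labels, and report how often the permuted value is at least as large. Everything here is an illustrative assumption rather than GenStat's implementation: the function names, the made-up data, the convention of counting the observed arrangement in the reference set, and the `seed` argument (which simply seeds Python's generator and does not follow the menu's "0 automatic" rule).

```python
import random
from statistics import mean


def variance_ratio(values, labels):
    """One-way anova variance ratio for values grouped by labels."""
    groups = {}
    for y, g in zip(values, labels):
        groups.setdefault(g, []).append(y)
    grand = mean(values)
    treat_ss = sum(len(ys) * (mean(ys) - grand) ** 2 for ys in groups.values())
    resid_ss = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
    treat_df = len(groups) - 1
    resid_df = len(values) - len(groups)
    return (treat_ss / treat_df) / (resid_ss / resid_df)


def permutation_p_value(values, labels, n_permutations=999, seed=0):
    """Proportion of label permutations with a variance ratio >= observed."""
    rng = random.Random(seed)
    observed = variance_ratio(values, labels)
    shuffled = list(labels)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if variance_ratio(values, shuffled) >= observed:
            count += 1
    # The observed arrangement is counted as one member of the reference set.
    return (count + 1) / (n_permutations + 1)


# Hypothetical responses for three treatments (illustrative values only).
values = [10.0, 12.0, 11.0, 13.0, 14.0, 15.0, 13.0, 16.0, 20.0, 21.0, 19.0, 22.0]
labels = ["A"] * 4 + ["B"] * 4 + ["C"] * 4

observed_vr = variance_ratio(values, labels)
p_value = permutation_p_value(values, labels)
print(f"observed VR = {observed_vr:.1f}, permutation p = {p_value:.3f}")
```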

Practical 4.8 spreadsheet Wine.gsh contains results from an experiment to assess the % alcohol of wine used in Practical 4.6 5 types of wine A-E 3 bottles of each type were tested in a random order analyse the percentages & plot residuals against fitted values assess the differences between the types using a permutation test