3.1. The F distribution [ST&D p. 99]


Topic 3: Fundamentals of analysis of variance

"The analysis of variance is more than a technique for statistical analysis. Once it is understood, ANOVA is a tool that can provide an insight into the nature of variation of natural events." Sokal & Rohlf (1995), BIOMETRY.

3.1. The F distribution [ST&D p. 99]

Assume that you are sampling at random from a normally distributed population (or from two different populations with equal variance) by first sampling $n_1$ items and calculating their variance $s_1^2$ (df: $n_1 - 1$), followed by sampling $n_2$ items and calculating their variance $s_2^2$ (df: $n_2 - 1$). Now consider the ratio of these two sample variances:

$$\frac{s_1^2}{s_2^2}$$

This ratio will be close to 1, because these variances are estimates of the same quantity. The expected distribution of this statistic is called the F-distribution. The F-distribution is determined by two values for degrees of freedom, one for each sample variance. The values found within statistical tables for F (e.g. Table A6) represent $F_{\alpha[df_1, df_2]}$, where $\alpha$ is the proportion of the F-distribution to the right of the given F-value and $df_1$, $df_2$ are the degrees of freedom pertaining to the numerator and denominator of the variance ratio, respectively.

Figure 1. Three representative F-distributions: F(1,40), F(8,6) and F(6,8) (note the similarity of F(1,40) to $\chi^2_1$).

For example, a value $F_{\alpha/2 = 0.025,\ df_1 = 9,\ df_2 = 9} = 4.03$ indicates that the ratio $s_1^2 / s_2^2$, from samples of ten individuals from normally distributed populations with equal variances, is expected to be larger than 4.03 by chance in only 5% of the experiments when the larger variance is placed in the numerator (the alternative hypothesis is $s_1^2 \neq s_2^2$, so it is a two-tailed test with 2.5% in each tail).
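The tabled value quoted above can be checked empirically. The following is a minimal simulation sketch, not part of the original notes: it repeatedly draws two samples of ten from the same normal population and counts how often the variance ratio exceeds 4.03. The function names, the seed and the number of trials are illustrative assumptions.

```python
import random

def sample_variance(values):
    """Unbiased sample variance (divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def exceedance_rate(n1, n2, threshold, trials=20000, seed=1):
    """Fraction of simulated ratios s1^2 / s2^2 that exceed `threshold`."""
    rng = random.Random(seed)  # fixed seed (assumption) so the run is repeatable
    hits = 0
    for _ in range(trials):
        s1 = sample_variance([rng.gauss(0.0, 1.0) for _ in range(n1)])
        s2 = sample_variance([rng.gauss(0.0, 1.0) for _ in range(n2)])
        if s1 / s2 > threshold:
            hits += 1
    return hits / trials

# One tail only: s1^2 always in the numerator, so the rate should be near alpha/2.
upper_tail = exceedance_rate(10, 10, 4.03)
print(f"Simulated P(s1^2/s2^2 > 4.03): {upper_tail:.3f}")
```

With the first sample always in the numerator, the simulated rate should sit near 0.025. Doubling it, to account for the case where the second variance is the larger one, gives the 5% two-tailed figure.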

3.2. Testing the hypothesis of equality of variances [ST&D 116-118]

Suppose $X_1, \ldots, X_m$ are observations drawn from a normal distribution with mean $\mu_X$ and variance $\sigma_X^2$, and $Y_1, \ldots, Y_n$ are drawn from a normal distribution with mean $\mu_Y$ and variance $\sigma_Y^2$. In theory, the F statistic can be used as a test for the hypothesis $H_0: \sigma_X^2 = \sigma_Y^2$ vs. the hypothesis $H_1: \sigma_X^2 \neq \sigma_Y^2$. $H_0$ is rejected at the $\alpha$ level of significance if the ratio $s_X^2 / s_Y^2$ is either $\geq F_{\alpha/2,\ m-1,\ n-1}$ or $\leq F_{1-\alpha/2,\ m-1,\ n-1}$. In practice, this test is rarely used because it is very sensitive to departures from normality.

3.3. Testing the hypothesis of equality of two means [ST&D 98-112]

The ratio between two estimates of $\sigma^2$ can also be used to test differences between means; that is, it can be used to test $H_0: \mu_1 - \mu_2 = 0$ versus $H_1: \mu_1 - \mu_2 \neq 0$. In particular:

$$F = \frac{\text{estimate of } \sigma^2 \text{ from sample means}}{\text{estimate of } \sigma^2 \text{ from individuals}}$$

The denominator is an estimate of $\sigma^2$ provided by the individuals within each sample. That is, it is a weighted average of the sample variances.

The numerator is an estimate of $\sigma^2$ provided by the means among samples. The variance of a population of sample means is $\sigma^2/n$, where $\sigma^2$ is the variance of individuals in a parent population and all samples are of size n. This implies that means may be used to estimate $\sigma^2$ by multiplying the variance of sample means, $\sigma^2/n$, by n.

$$F = \frac{s^2_{among}}{s^2_{within}} = \frac{n\, s^2_{\bar{Y}}}{s^2}$$

When the two populations have different means (but the same variance), the estimate of $\sigma^2$ based on sample means will include a contribution attributable to the difference among population means as well as any random difference (i.e. within-population variance). Thus, in general, if the means differ, the sample means are expected to be more variable than predicted by chance alone.

Example: We will explain the test using a data set of Little and Hills (p. 31). Yields (100 lbs/acre) of wheat varieties 1 and 2 from plots to which the varieties were randomly assigned:

Variety   Replications            Total Y_i.   Mean Ȳ_i.   s_i^2
1         19   14   15   17   20      85          17         6.5
2         23   19   21   19   18     100          20         4.0
                                Y.. = 185    Ȳ.. = 18.5
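The marginal columns of the wheat table can be reproduced directly from the ten observations. This is a small illustrative sketch using Python's standard `statistics` module; the variable names are assumptions introduced here.

```python
from statistics import mean, variance

# Wheat yields (100 lbs/acre); keys are the variety numbers from the table.
yields = {
    1: [19, 14, 15, 17, 20],
    2: [23, 19, 21, 19, 18],
}

# Per-variety total, mean and sample variance (divisor r - 1).
summary = {
    variety: {"total": sum(obs), "mean": mean(obs), "variance": variance(obs)}
    for variety, obs in yields.items()
}

n_total = sum(len(obs) for obs in yields.values())
grand_total = sum(sum(obs) for obs in yields.values())
grand_mean = grand_total / n_total

for variety, stats in summary.items():
    print(variety, stats)
print("grand total:", grand_total, "grand mean:", grand_mean)
```

The totals (85 and 100), means (17 and 20) and variances (6.5 and 4.0) depend only on which values belong to each variety, not on the order of the replications.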

In this experiment, there are two treatment levels (t = 2) and five replications (r = 5) (the symbol t stands for "treatments" and r stands for "replications"). Each observation in the experiment has a unique "address" given by $Y_{ij}$, where i is the index for treatment (i = 1, 2) and j is the index for replication (j = 1, 2, 3, 4, 5). Thus $Y_{24} = 19$. The dot notation is a shorthand alternative to using $\Sigma$. Summation is for all values of the subscript occupied by the dot. Thus $Y_{1.} = 19 + 14 + 15 + 17 + 20$ and $Y_{.2} = 14 + 19$.

We begin by assuming that the two populations have the same (unknown) variance $\sigma^2$ and then test $H_0: \mu_1 = \mu_2$. We do this by obtaining two estimates for $\sigma^2$ and comparing them.

First, we can compute the average variance of individuals within samples, also known as the experimental error. To determine the experimental error, we compute the variance of each sample ($s_1^2$ and $s_2^2$), assume they both estimate a common variance, and then estimate that common variance by pooling the two estimates:

$$s_1^2 = \frac{\sum_j (Y_{1j} - \bar{Y}_{1.})^2}{n_1 - 1}, \qquad s_2^2 = \frac{\sum_j (Y_{2j} - \bar{Y}_{2.})^2}{n_2 - 1}$$

$$s^2_{pooled} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)} = \frac{4(6.5) + 4(4.0)}{4 + 4} = 5.25 = s^2_{within}$$

In this case, since $r_1 = r_2$, the pooled variance is simply the average of the two sample variances. Since pooling $s_1^2$ and $s_2^2$ gives an estimate of $\sigma^2$ based on the variability within samples, let's designate it $s_w^2$ (subscript w = within).

The second estimate of $\sigma^2$ is based on the variation between or among samples. Assuming, by the null hypothesis, that these two samples are random samples drawn from the same population and that, therefore, $\bar{Y}_{1.}$ and $\bar{Y}_{2.}$ both estimate the same population mean, we can estimate the variance of means of that population by $s^2_{\bar{Y}}$. Recall from an earlier topic that the mean $\bar{Y}$ of a set of n random variables drawn from a normal distribution with mean $\mu$ and variance $\sigma^2$ is itself a normally distributed random variable with mean $\mu$ and variance $\sigma^2/n$.

The formula for $s^2_{\bar{Y}}$ is

$$s^2_{\bar{Y}} = \frac{\sum_{i=1}^{t} (\bar{Y}_{i.} - \bar{Y}_{..})^2}{t - 1} = \frac{(17 - 18.5)^2 + (20 - 18.5)^2}{2 - 1} = 4.5$$

and, because the variance of a sample mean is $\sigma^2/n$, n times this quantity provides an estimate for $\sigma^2$ (n is the number of variates on which each sample mean is based). Therefore, the between samples estimate is:

$$s_b^2 = n\, s^2_{\bar{Y}} = 5 \times 4.5 = 22.5$$

These two variances are used in the F test as follows. If the null hypothesis is not true, we would expect the variance between samples to be much larger than the variance within samples ("much larger" means larger than one would expect by chance alone). Therefore, we look at the ratio of these variances and ask whether this ratio is significantly greater than 1. It turns out that under our assumptions (normality, equal variance, etc.), this ratio is distributed according to an $F_{(t-1,\ t(n-1))}$ distribution. That is, we define:

$$F = s_b^2 / s_w^2$$

and test whether this statistic is significantly greater than 1. The F statistic is a measure of how many times larger the variability between the samples is compared to the variability within samples.

In this example, F = 22.5/5.25 = 4.29. The numerator $s_b^2$ is based on 1 df, since there are only two sample means. The denominator, $s_w^2$, is based on pooling the df within each sample, so $df_{den} = t(n-1) = 2(4) = 8$. For these df, we would expect an F value of 4.29 or larger just by chance about 7% of the time. From Table A.6, $F_{0.05,\ 1,\ 8} = 5.32$. Since 4.29 < 5.32, we fail to reject $H_0$ at the 0.05 significance level.

3.3.1. Relationship between F and t

In the case of only two treatments, the square root of the F statistic is distributed according to a t distribution:

$$F_{\alpha,\ 1,\ t(n-1)} = t^2_{\alpha/2,\ t(n-1)}, \quad \text{meaning} \quad t = \sqrt{\frac{s_b^2}{s_w^2}}$$

In the example above, with 5 reps per treatment:

$$F_{(1,8),\ 1-\alpha} = \left(t_{8,\ 1-\alpha/2}\right)^2, \qquad 5.32 = 2.306^2$$

The total degrees of freedom for the t statistic is t(n - 1) since there are tn total observations and they must satisfy t constraint equations, one for each treatment mean. Therefore, we reject the null hypothesis at the $\alpha$ significance level if $|t| > t_{\alpha/2,\ t(n-1)}$.
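The two variance estimates and their ratio can be recomputed in a few lines. The sketch below assumes equal sample sizes, as in the wheat example, and takes the critical value 5.32 from the table lookup quoted in the text rather than computing it. All variable names are illustrative.

```python
import math
from statistics import mean, variance

# Wheat yields for the two varieties (five replications each).
samples = [
    [19, 14, 15, 17, 20],
    [23, 19, 21, 19, 18],
]
t = len(samples)     # number of treatments
n = len(samples[0])  # observations per treatment (equal sizes assumed)

# Within-sample estimate: with equal n, pooling is a plain average.
s2_within = mean([variance(s) for s in samples])

# Between-sample estimate: n times the variance of the sample means.
s2_means = variance([mean(s) for s in samples])
s2_between = n * s2_means

f_stat = s2_between / s2_within
t_stat = math.sqrt(f_stat)  # valid only because there are two treatments

F_CRIT_05_1_8 = 5.32  # tabled F(0.05; 1, 8), as quoted in the text
reject_h0 = f_stat > F_CRIT_05_1_8

print(f"s2_within={s2_within}, s2_between={s2_between}")
print(f"F={f_stat:.2f}, sqrt(F)={t_stat:.2f}, reject H0: {reject_h0}")
```

The square root of F printed here is the same t value that the two-sample t-test produces for these data.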

Here are the computations for our data set:

$$t = \sqrt{\frac{s_b^2}{s_w^2}} = \sqrt{\frac{22.5}{5.25}} = 2.07$$

Since $2.07 < t_{0.025,\ 8} = 2.306$, we fail to reject $H_0$ at the 0.05 significance level.

3.4. The linear additive model [ST&D p. 32, 103, 152]

3.4.1. One population

In statistics, a common model describing the makeup of an observation states that it consists of a mean plus an error. This is a linear additive model. A minimum assumption is that the errors are random, making the model probabilistic rather than deterministic. The simplest linear additive model is this one:

$$Y_i = \mu + \varepsilon_i$$

It is applicable to the problem of estimating or making inferences about population means and variances. This model attempts to explain an observation $Y_i$ as a mean $\mu$ plus a random element of variation $\varepsilon_i$. The $\varepsilon_i$'s are assumed to be from a population of uncorrelated $\varepsilon$'s with mean zero. Independence among $\varepsilon$'s is assured by random sampling.

3.4.2. Two populations

Now consider this model:

$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$$

It is more general than the previous model because it permits us to describe two populations simultaneously. For samples from two populations with possibly different means but a common variance, any given reading is composed of the grand mean $\mu$ of the population, a component $\tau_i$ for the population involved (i.e. $\mu + \tau_1 = \mu_1$ and $\mu + \tau_2 = \mu_2$), and a random deviation $\varepsilon_{ij}$. The subindex i (= 1, 2) indicates the treatment number and the subindex j (= 1, ..., r) indicates the number of observations from each population (replications).

The $\tau_i$, the treatment effects, are measured as deviations from the overall mean [$\mu = (\mu_1 + \mu_2)/2$] such that $\tau_1 + \tau_2 = 0$ or $-\tau_1 = \tau_2$. This does not affect the difference between means, $2\tau$. If $r_1 \neq r_2$ we may set $r_1\tau_1 + r_2\tau_2 = 0$. The $\varepsilon$'s are assumed to be from a single population with normal distribution, mean $\mu = 0$, and variance $\sigma^2$.
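To make the two-population model concrete, each wheat observation can be split into an estimated grand mean, an estimated treatment effect and a residual. This is an illustrative sketch only; the estimates are the usual sample means, and the names `mu_hat`, `tau_hat` and `residuals` are introduced here.

```python
# Wheat yields keyed by variety number.
yields = {
    1: [19, 14, 15, 17, 20],
    2: [23, 19, 21, 19, 18],
}

all_obs = [y for obs in yields.values() for y in obs]
mu_hat = sum(all_obs) / len(all_obs)  # estimate of the grand mean

# Treatment effects estimated as deviations of each mean from the grand mean.
tau_hat = {i: sum(obs) / len(obs) - mu_hat for i, obs in yields.items()}

# Residuals: what remains after removing the mean and the treatment effect.
residuals = {
    i: [y - mu_hat - tau_hat[i] for y in obs] for i, obs in yields.items()
}

print("mu_hat  =", mu_hat)
print("tau_hat =", tau_hat)
for i, obs in yields.items():
    for y, e in zip(obs, residuals[i]):
        # Each line shows Y_ij = mu + tau_i + e_ij for one observation.
        print(f"{y} = {mu_hat} + ({tau_hat[i]:+.1f}) + ({e:+.1f})")
```

With equal replication the estimated effects sum to zero (here -1.5 and +1.5), and the residuals within each variety also sum to zero.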

Another way to express this model, using the dot notation from before, is:

$$Y_{ij} = \bar{Y}_{..} + (\bar{Y}_{i.} - \bar{Y}_{..}) + (Y_{ij} - \bar{Y}_{i.})$$

3.4.3. More than two populations. One-way classification ANOVA

As with the two-sample t-test, the linear model is:

$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$$

where now i = 1, ..., t and j = 1, ..., r. Again, the $\varepsilon_{ij}$ are assumed to be drawn from a normal distribution with mean 0 and variance $\sigma^2$. Two different kinds of assumptions can be made about the $\tau$'s that will differentiate the Model I ANOVA from the Model II ANOVA.

The Model I ANOVA or fixed model: In this model, the $\tau$'s are fixed and

$$\sum \tau_i = 0$$

The constraint $\sum \tau_i = 0$ is a consequence of defining treatment effects as deviations from an overall mean. The null hypothesis is then stated as $H_0: \tau_1 = \ldots = \tau_t = 0$ and the alternative as $H_1$: at least one $\tau_i \neq 0$.

What a Model I ANOVA tests is the differential effects of treatments that are fixed and determined by the experimenter. The word "fixed" refers to the fact that each treatment is assumed to always have the same effect $\tau_i$. Moreover, the set of $\tau$'s are assumed to constitute a finite population and are specific parameters of interest, along with $\sigma^2$. In the case of a false $H_0$ (i.e. some $\tau_i \neq 0$), there will be an additional component of variation due to treatment effects equal to:

$$r\,\frac{\sum \tau_i^2}{t - 1}$$

Since the $\tau_i$ are measured as deviations from a mean, this quantity is analogous to a variance but cannot be called such since it is not based on a random variable but rather on deliberately chosen treatments.

The Model II ANOVA or random model: In this model, the additive effects for each group ($\tau$'s) are not fixed treatments but are random effects. In this case, we have not deliberately planned or fixed the treatment for any group, and the effects on each group are random and only partly under our control. The $\tau$'s themselves are a random sample from a population of $\tau$'s for which the mean is zero and the variance is $\sigma_\tau^2$. When the null hypothesis is false, there will be an additional component of variance equal to $r\sigma_\tau^2$.

Since the effects are random, it is futile to estimate the magnitude of these random effects for any one group or the differences from group to group; but we can estimate their variance, the added variance component among groups: $r\sigma_\tau^2$. We test for its presence and estimate its magnitude, as well as its percentage contribution to the variation. In the fixed model, we draw inferences about particular treatments; in the random model, we draw an inference about the population of treatments. The null hypothesis in this latter case is stated as $H_0: \sigma_\tau^2 = 0$ versus $H_1: \sigma_\tau^2 \neq 0$.

An important point is that the basic setup of data, as well as the computation and significance test, in most cases is the same for both models. It is the purpose which differs between the two models, as do some of the supplementary tests and computations following the initial significance test. For now, we will deal only with the fixed model.

Assumptions of the model [ST&D 174]

1. Treatment and environmental effects are additive.
2. Experimental errors are random, possess a common variance, and are independently and normally distributed about zero mean.

Effects are additive

This means that all effects in the model (treatment effects, random error) cause deviations from the overall mean in an additive manner (rather than, for example, multiplicative).

Error terms are independently and normally distributed

This means there is no correlation between experimental groupings of observations (e.g. by treatment level) and the sizes of the error terms. This could be violated if, for example, treatments are not assigned randomly.

This assumption essentially means that the means and variances of treatments share no correlation. For example, suppose yield is measured and the treatments cause yield to range from 1 g/plant up to 10 g/plant. A range of ± 1 g would be much more "significant" at the low end than the high end but could not be considered any differently by this model.

Variances are homogeneous

This means the variances of the different treatment groups are the same.

3.5. ANOVA: Single factor designs

3.5.1. The Completely Randomized Design (CRD)

In single factor experiments, a single treatment (i.e. factor) is varied to form the different treatment levels. The experiment discussed below is taken from page 141 of ST&D. The experiment involves inoculating five different cultures of one legume, clover, with strains of nitrogen-fixing bacteria from another legume, alfalfa. As a sort of control, a sixth trial was run in which a composite of five clover cultures was inoculated. There are 6 treatments (t = 6) and each treatment is given 5 replications (r = 5).

                          3DOK1     3DOK5     3DOK4     3DOK7     3DOK13    composite   Total
                          19.4      17.7      17.0      20.7      14.3      17.3
                          32.6      24.8      19.4      21.0      14.4      19.4
                          27.0      27.9       9.1      20.5      11.8      19.1
                          32.1      25.2      11.9      18.8      11.6      16.9
                          33.0      24.3      15.8      18.6      14.2      20.8
Σ Y_ij = Y_i.            144.1     119.9      73.2      99.6      66.3      93.5       596.6 (Y..)
Σ Y_ij^2                4287.53   2932.27   1139.42   1989.14    887.29   1758.71    12994.36
Y_i.^2 / r              4152.96   2875.20   1071.65   1984.03    879.14   1748.45    12711.43
Σ (Y_ij - Ȳ_i.)^2        134.57     57.07     67.77      5.11      8.15     10.26      282.93
Ȳ_i. (mean)               28.8      24.0      14.6      19.9      13.3      18.7       19.88
s_i^2 (variance, n-1)     33.64     14.27     16.94      1.28      2.04      2.56

Inoculation of clover with Rhizobium strains [ST&D Table 7.1]

The completely randomized design (CRD) is the basic ANOVA design. It is used when there are t different treatment levels of a single factor (in this case, Rhizobium strain). These treatments are applied to t independent random samples of size r. Let the total sample size for the experiment be designated as n = rt. Let $Y_{ij}$ denote the j-th measurement (replication) recorded from the i-th treatment. WARNING: Some texts interchange the i and the j (i.e. the rows and columns of the table), so be careful.

We wish to test the hypothesis $H_0: \mu_1 = \mu_2 = \mu_3 = \ldots = \mu_t$ against $H_1$: not all the $\mu_i$'s are equal. This is a straightforward extension of the two-sample t test of topic 3.3 since there was nothing special about the value t = 2. Recall that the test statistic was:

$$F = s_b^2 / s_w^2$$

In our new dot notation, we can write:

$$s_w^2 = \frac{\sum_{i=1}^{t}\sum_{j=1}^{r} (Y_{ij} - \bar{Y}_{i.})^2}{t(r-1)} = \frac{SSE}{t(r-1)}, \quad \text{where } SSE = \sum_{i=1}^{t}\sum_{j=1}^{r} (Y_{ij} - \bar{Y}_{i.})^2$$

Here SSE is the sum of squares for error. Also:

$$s_b^2 = \frac{r\sum_{i=1}^{t} (\bar{Y}_{i.} - \bar{Y}_{..})^2}{t - 1} = \frac{SST}{t - 1}, \quad \text{where } SST = r\sum_{i=1}^{t} (\bar{Y}_{i.} - \bar{Y}_{..})^2$$

Here SST is the sum of squares for treatment. Since the variance among treatment means estimates $\sigma^2/r$, the r in the definition formula for SST is required so that the mean square for treatment (MST) will be an estimate of $\sigma^2$ rather than $\sigma^2/r$. This is equivalent to the step we took in example 3.3 above when we multiplied by n in order to estimate the variance between samples ($s_b^2 = n\, s^2_{\bar{Y}}$). Using our new notation, we can write:

$$F = \frac{SST/(t-1)}{SSE/[t(r-1)]} = \frac{SST/(t-1)}{SSE/(n-t)}$$

We can then define:

The mean square for error: MSE = SSE/(n-t). This is the average dispersion of the observations around their respective group means. It is an estimate of a common $\sigma^2$, the experimental error (i.e. the variation among observations treated alike). MSE is a valid estimate of the common $\sigma^2$ if the assumption of equal variances among treatments is true.

The mean square for treatment: MST = SST/(t-1) (MS Model in SAS). This is an independent estimate of $\sigma^2$ when the null hypothesis is true ($H_0: \mu_1 = \mu_2 = \mu_3 = \ldots = \mu_t$). If there are differences among treatment means, there will be an additional source of variation in the experiment due to treatment effects equal to $r\sum\tau_i^2/(t-1)$ (Model I) or $r\sigma_\tau^2$ (Model II) (see topic 3.4.3 and ST&D 155).

F = MST/MSE. The F value is obtained by dividing the treatment mean square by the error mean square. Under the null hypothesis, we expect to find F approximately equal to 1. In general, however, the expected ratio is:

$$\frac{E(MST)}{E(MSE)} = \frac{\sigma^2 + r\sum\tau_i^2/(t-1)}{\sigma^2}$$

As is clear from this formula, the F-test is sensitive to the presence of the added component of variation due to treatment effects. In other words, ANOVA permits us to test whether there are any nonzero treatment effects. That is, to test whether a group of means can be considered random samples from the same population or whether we have sufficient evidence to conclude that the treatments that have affected each group separately have resulted in shifting these means sufficiently so that they can no longer be considered samples from the same population.

Recall that the degrees of freedom is the number of independent, unconstrained quantities underlying a statistic. Underlying SST are t quantities ($\bar{Y}_{i.} - \bar{Y}_{..}$) which have one constraint (they must sum to 0); so $df_{trt} = t - 1$. Underlying SSE are n quantities $Y_{ij}$, which have t constraints for the t sample means; so $df_e = n - t$.

Consider the following equation:

$$\sum_{i=1}^{t}\sum_{j=1}^{r} (Y_{ij} - \bar{Y}_{..})^2 = r\sum_{i=1}^{t} (\bar{Y}_{i.} - \bar{Y}_{..})^2 + \sum_{i=1}^{t}\sum_{j=1}^{r} (Y_{ij} - \bar{Y}_{i.})^2$$

If you take the time to deal with the messy algebra, you will find that this equality is true. The reason this is relevant is because this is just our dot notation for:

TSS = SST + SSE

where TSS is the total sum of squares of the experiment. In other words, sums of squares are perfectly additive. If you were to fully expand the quantity on the left-hand side of the equation, you would find lots of cross product terms of the form $2(\bar{Y}_{i.} - \bar{Y}_{..})(Y_{ij} - \bar{Y}_{i.})$. It turns out, upon simplification, that all of those cross product terms cancel each other out (note that none appear on the right-hand side of the equation). Quantities that satisfy this criterion are said to be orthogonal. Another way of saying this is that we can decompose the total SS into a portion due to variation among groups and another, completely independent portion due to variation within groups.

The degrees of freedom are also additive (i.e. $df_{Tot} = df_{Trt} + df_e$).
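The additivity of the sums of squares and of the degrees of freedom can be confirmed numerically on the Rhizobium data. The following sketch uses the definitional forms given above; the dictionary layout and variable names are illustrative choices.

```python
# Rhizobium data from the CRD table (ST&D Table 7.1), 5 replications each.
data = {
    "3DOK1": [19.4, 32.6, 27.0, 32.1, 33.0],
    "3DOK5": [17.7, 24.8, 27.9, 25.2, 24.3],
    "3DOK4": [17.0, 19.4, 9.1, 11.9, 15.8],
    "3DOK7": [20.7, 21.0, 20.5, 18.8, 18.6],
    "3DOK13": [14.3, 14.4, 11.8, 11.6, 14.2],
    "composite": [17.3, 19.4, 19.1, 16.9, 20.8],
}
t = len(data)  # treatments
r = 5          # replications per treatment
n = t * r      # total observations

grand_mean = sum(sum(obs) for obs in data.values()) / n
group_means = {name: sum(obs) / r for name, obs in data.items()}

# Definitional forms of the three sums of squares.
sst = r * sum((m - grand_mean) ** 2 for m in group_means.values())
sse = sum((y - group_means[name]) ** 2 for name, obs in data.items() for y in obs)
tss = sum((y - grand_mean) ** 2 for obs in data.values() for y in obs)

df_trt, df_err, df_tot = t - 1, n - t, n - 1

print(f"SST={sst:.2f}  SSE={sse:.2f}  TSS={tss:.2f}")
print(f"SST + SSE = {sst + sse:.2f}")
print(f"df: {df_trt} + {df_err} = {df_tot}")
```

TSS is computed here independently of SST and SSE, so the agreement of the two printed totals is a genuine check of the identity rather than a restatement of it.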

The dot notation above provides the "definitional" forms of these quantities (TSS, SST, and SSE). But each also has a friendlier "calculational" form, for when you compute them by hand. The two expressions are mathematically equivalent. The calculational forms make use of a quantity called the "correction factor":

$$C = \frac{(Y_{..})^2}{n} = \frac{\left(\sum_{i,j} Y_{ij}\right)^2}{n}$$

This term is just the squared sum of all observations in the experiment, divided by their total number. Once you calculate C, you can tackle the total SS (TSS):

$$TSS = \sum_{i=1}^{t}\sum_{j=1}^{r} Y_{ij}^2 - C$$

The total SS is the sum of squares that includes all sources of variation. In the dot notation, you see that it is the sum of the squares of the deviation of each observation from the overall mean.

Next, you can tackle the treatment sum of squares (SST):

$$SST = \sum_{i=1}^{t} \frac{Y_{i.}^2}{r} - C$$

The SST is the sum of squares attributable to the variable of classification. This is the SS due to differences among treatment groups and is referred to as the between- or among-groups SS.

Finally, the error sum of squares (SSE):

$$SSE = TSS - SST$$

The SSE is that part of the total sum of squares that cannot be explained by differences among groups. It is the sum of squares among individuals treated alike. It is referred to as the within-groups SS, residual SS, or error SS.
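The calculational forms need only the column totals and the overall sum of squared observations, both of which appear in the Rhizobium table. The sketch below takes those figures from the table as given; the variable names are illustrative.

```python
# Figures taken from the Rhizobium table: treatment totals Y_i. and the
# overall sum of squared observations.
treatment_totals = [144.1, 119.9, 73.2, 99.6, 66.3, 93.5]
sum_of_squares_all = 12994.36
r = 5                      # replications per treatment
n = r * len(treatment_totals)

grand_total = sum(treatment_totals)          # Y..
correction = grand_total ** 2 / n            # correction factor C

tss = sum_of_squares_all - correction                      # total SS
sst = sum(y ** 2 for y in treatment_totals) / r - correction  # treatment SS
sse = tss - sst                                            # error SS by subtraction

print(f"C   = {correction:.2f}")
print(f"TSS = {tss:.2f}")
print(f"SST = {sst:.2f}")
print(f"SSE = {sse:.2f}")
```

These values agree, to rounding, with the sums of squares in the ANOVA table for this experiment, which is the practical point of the calculational forms: no deviations from means ever have to be computed.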

An ANOVA table provides a systematic presentation of everything we've covered until now. The first column of the ANOVA table specifies the components of the linear model. In a single factor CRD, remember, the linear model is just:

$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$$

By this model, we have two named sources of variation: Treatments and Error. The next column indicates the df associated with each of these components. Next is a column with the SS associated with each, followed by a column with the mean squares associated with each. Mean squares, mathematically, are essentially variances; and they are found by dividing SS by their respective df. Finally, the last column in an ANOVA table presents the F statistic, which is a ratio of mean squares (i.e. a ratio of variances).

An ANOVA table (including an additional column of the SS definitional forms):

Source       df               Definition                     SS          MS           F
Treatments   t - 1            r Σ_i (Ȳ_i. - Ȳ..)^2           SST         SST/(t-1)    MST/MSE
Error        t(r-1) = n - t   Σ_i,j (Y_ij - Ȳ_i.)^2          TSS - SST   SSE/(n-t)
Total        n - 1            Σ_i,j (Y_ij - Ȳ..)^2           TSS

The ANOVA table for our Rhizobium experiment would look like this:

Source       df    SS         MS        F
Treatment     5     847.05    169.41    14.37**
Error        24     282.93     11.79
Total        29    1129.98

Notice that the mean square error (MSE = 11.79) is just the pooled variance or the average of variances within each treatment (i.e. MSE = $\sum s_i^2 / t$, where $s_i^2$ is the variance estimated from the i-th treatment). The F value of 14 indicates that the variation among treatments is over 14 times larger than the mean variation within treatments. This value far exceeds the critical F value for such an experimental design at $\alpha = 0.05$ ($F_{crit} = F_{(5,24),\ 0.05} = 2.62$), so we reject $H_0$. At least one of the treatments has a nonzero effect on the response variable, at the specified significance level.
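The remaining columns of the ANOVA table follow mechanically from the sums of squares and degrees of freedom. The short sketch below starts from the SS values in the table and also checks the remark that MSE equals the average of the six treatment variances; the critical value 2.62 is the tabled figure quoted in the text, not something computed here.

```python
# Sums of squares and design constants from the Rhizobium ANOVA table.
sst, sse = 847.05, 282.93
t, r = 6, 5
n = t * r

df_trt = t - 1
df_err = n - t
mst = sst / df_trt   # mean square for treatment
mse = sse / df_err   # mean square for error
f_stat = mst / mse

# Treatment variances s_i^2 from the data table; with equal replication
# their plain average is the pooled variance.
variances = [33.64, 14.27, 16.94, 1.28, 2.04, 2.56]
pooled = sum(variances) / t

F_CRIT_05_5_24 = 2.62  # tabled F(0.05; 5, 24), as quoted in the text
reject_h0 = f_stat > F_CRIT_05_5_24

print(f"{'Source':<10}{'df':>4}{'SS':>10}{'MS':>9}{'F':>8}")
print(f"{'Treatment':<10}{df_trt:>4}{sst:>10.2f}{mst:>9.2f}{f_stat:>8.2f}")
print(f"{'Error':<10}{df_err:>4}{sse:>10.2f}{mse:>9.2f}")
print(f"{'Total':<10}{n - 1:>4}{sst + sse:>10.2f}")
print(f"average of variances = {pooled:.2f}, reject H0: {reject_h0}")
```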

3.5.1.2. Assumptions associated with ANOVA

The assumptions associated with ANOVA can be expressed in terms of the following statistical model: $Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$. First, the $\varepsilon_{ij}$ (the residuals) are assumed to be normally distributed with mean 0 and to possess a common variance $\sigma^2$, independent of treatment level i and sample number j.

3.5.1.2.1. Normal distribution

Recall from the first lecture that the Shapiro-Wilk test statistic W provides a powerful test for normality for small to medium samples (n < 2000). For large populations (n > 2000), the use of the Kolmogorov-Smirnov statistic is recommended.

3.5.1.2.2. Homogeneity of treatment variances

Tests for homogeneity of variance (i.e. homoscedasticity) attempt to determine if the variance is the same within each of the groups defined by the independent variable. Bartlett's test (ST&D 481) can be very inaccurate if the underlying distribution is even slightly nonnormal, and it is not recommended for routine use. Levene's test is much more robust to deviations from normality. Levene's test is an ANOVA of the absolute values of the residuals of the observations from their respective treatment means. If Levene's test leads you to reject the null hypothesis ($H_0$: the mean residual absolute value is the same for all treatments; i.e. the within-treatment variance is the same across all treatment groups; i.e. variances are homogeneous), one option is to perform a Welch's variance-weighted ANOVA (Biometrika 1951 v38, 330) instead of the usual ANOVA to test for differences between group means in a CRD. This alternative to the usual analysis of variance is more robust if variances are not equal.
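Because Levene's test, as described above, is simply a one-way ANOVA on absolute residuals, it can be sketched by reusing an ordinary F computation. The functions `one_way_f` and `levene_f` below are illustrative helpers written for this note, not a library API, and the sketch stops at the F statistic: comparing it with a critical value still requires an F table or statistical software.

```python
def one_way_f(groups):
    """F = MST / MSE for a one-way layout (group sizes may differ)."""
    n = sum(len(g) for g in groups)
    t = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    sst = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (sst / (t - 1)) / (sse / (n - t))

def levene_f(groups):
    """Levene's statistic: one-way ANOVA on |Y_ij - mean_i|."""
    abs_residuals = [[abs(y - sum(g) / len(g)) for y in g] for g in groups]
    return one_way_f(abs_residuals)

# Rhizobium data (ST&D Table 7.1), one list per treatment.
rhizobium = [
    [19.4, 32.6, 27.0, 32.1, 33.0],
    [17.7, 24.8, 27.9, 25.2, 24.3],
    [17.0, 19.4, 9.1, 11.9, 15.8],
    [20.7, 21.0, 20.5, 18.8, 18.6],
    [14.3, 14.4, 11.8, 11.6, 14.2],
    [17.3, 19.4, 19.1, 16.9, 20.8],
]

print(f"ANOVA F on the raw data:   {one_way_f(rhizobium):.2f}")
print(f"Levene F on |residuals|:   {levene_f(rhizobium):.2f}")
```

The Levene statistic has the same degrees of freedom as the original ANOVA, here (5, 24). Two groups with identical spread but different means give a Levene F of zero, which is a quick sanity check of the implementation.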

3.5.1.3. Experimental Procedure: Randomization

Here is how the clover plots might look if this experiment were conducted in the field:

[Field layout: 30 plots, numbered 1 to 30]

The experimental procedure would be: First, randomly (e.g. from a random number table, etc.) select the plot numbers to be assigned to the six treatments (A, B, C, D, E, F). Example: On p. 607 [ST&D], starting from Row 0, columns 88-89 (a random starting point), move downward. Take for treatment A the first 5 random numbers under 30, and so forth:

Treatment A: 05, 9, 3, 0, 6; Treatment B: 4, 6, , 8, 4; etc.

3.5.1.4. Power and sample size

3.5.1.4.1. Power

The power of a test is the probability of detecting a nonzero treatment effect. To calculate the power of the F test in an ANOVA using Pearson and Hartley's power function charts (1951, Biometrika 38:112-130), it is necessary first to calculate a critical value $\phi$. This critical value depends on the number of treatments (t), the number of replications (r), the magnitude of the treatment effects that the investigator wishes to detect (d), an estimate of the population variance ($\sigma^2$ = MSE), and the probability of rejecting a true null hypothesis ($\alpha$).

In a CRD, $y_{ij} = \mu + \tau_i + \varepsilon_{ij}$, where i = 1, 2, ..., t; j = 1, 2, ..., r; $\mu$ is the overall mean; and $\tau_i$ is the treatment effect ($\tau_i = \mu_i - \mu$).

To calculate the power, you first need to calculate $\phi$, a standardized measure (in $\sigma$ units) of the expected differences among means which can be used to determine sample size from the power charts. Its general form:

$$\phi = \sqrt{\frac{r}{MSE}\cdot\frac{\sum \tau_i^2}{t}}$$

With this value, you enter the chart for $\nu_1 = df_{numerator} = df_{treatment} = t - 1$ and choose the x-axis scale for the appropriate $\alpha$ (0.05 or 0.01). The interception of the calculated $\phi$ with the curve for $\nu_2 = df_{denominator} = df_{error} = t(r - 1)$ gives the power of the test (the y-axis on both sides of the chart).
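The general form of $\phi$ and the two degrees of freedom needed to enter the charts are simple to compute once a set of treatment effects has been postulated. The sketch below is illustrative: the function names and the example effects (two extreme treatments at ±7.5 units around the overall mean, four replications, MSE of 12) are assumptions chosen for demonstration.

```python
import math

def phi(taus, r, mse):
    """phi = sqrt( (r / MSE) * (sum of tau_i^2) / t ) for t treatment effects."""
    t = len(taus)
    return math.sqrt((r / mse) * sum(tau ** 2 for tau in taus) / t)

def chart_df(t, r):
    """Degrees of freedom used to read the power charts: (nu1, nu2)."""
    return t - 1, t * (r - 1)

# Hypothetical effects: deviations of each treatment mean from the overall
# mean. They sum to zero, as the fixed-effects model requires.
taus = [-7.5, 0.0, 0.0, 0.0, 0.0, 7.5]
r, mse = 4, 12.0

value = phi(taus, r, mse)
nu1, nu2 = chart_df(len(taus), r)
print(f"phi = {value:.2f}; enter chart nu1 = {nu1}, curve nu2 = {nu2}")
```

The power itself still has to be read from the chart (or computed from the noncentral F distribution in statistical software); this sketch only prepares the inputs.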

Example: Suppose an experiment has t = 6 treatments with r = 2 replications each. Given the MSE and the required $\alpha$ = 5%, you calculate $\phi$ = 1.75. To find the power associated with this value of $\phi$, use Chart $\nu_1 = t - 1 = 5$ and the set of curves to the left ($\alpha$ = 5%). Select curve $\nu_2 = t(r - 1) = 6$. The height of this curve corresponding to the abscissa of $\phi$ = 1.75 is the power of the test. In this case, the power is slightly greater than 0.55. As a rule of thumb, experiments should be designed with a power of at least 80% (i.e. $\beta \leq 0.20$).

3.5.1.4.2. Sample size

To calculate the number of replications required for a given $\alpha$ and desired power, a simplification of the general power formula above can be used. The general power formula can be simplified if we assume all $\tau_i$ are zero except the two extreme treatment effects (let's call them $\tau_K$ and $\tau_L$), so that $d = |\mu_K - \mu_L|$. You can think of d as the difference between the extreme treatment means. Taking $\mu$ to be in the middle of $\mu_K$ and $\mu_L$, $\tau_i = \pm d/2$:

$$\frac{\sum \tau_i^2}{t} = \frac{(d/2)^2 + (d/2)^2}{t} = \frac{d^2/4 + d^2/4}{t} = \frac{d^2}{2t}$$

And the $\phi$ formula simplifies:

$$\phi = \sqrt{\frac{d^2\, r}{2\, t\, MSE}}$$

With this simplified expression, one can estimate the required number of replications for a given $\alpha$ and desired power by:

1) Specifying the constants,
2) Starting with an arbitrary r to compute $\phi$,
3) Using the appropriate Pearson and Hartley chart to find the power, and
4) Iterating the process until a minimum r value is found which satisfies the required power for a given $\alpha$ level.

Example: Suppose that 6 treatments will be involved in a study and the anticipated difference between the extreme means is 15 units. What is the required sample size so that this difference will be detected at $\alpha$ = 1% and power = 90%, knowing that $\sigma^2$ = 12? (note: t = 6, $\alpha$ = 0.01, $\beta$ = 0.10, d = 15, and MSE = 12).

r    df = t(r-1)     φ       (1-β) for α = 1%
2    6(2-1) = 6     1.77     0.22
3    6(3-1) = 12    2.17     0.71
4    6(4-1) = 18    2.50     0.93

Thus 4 replications are required for each treatment to satisfy the required conditions.
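The iteration over r can be laid out programmatically. In the sketch below, $\phi$ and the error degrees of freedom are computed from the simplified formula, while the power values are not computed at all: they are the chart readings from the table above, entered by hand in the dictionary `chart_power`. All names are illustrative.

```python
import math

def phi_simplified(d, r, t, mse):
    """phi = sqrt( d^2 * r / (2 * t * MSE) ), two extreme means d units apart."""
    return math.sqrt(d ** 2 * r / (2 * t * mse))

# Constants of the worked example.
t, d, mse = 6, 15.0, 12.0
target_power = 0.90

# Power read from the Pearson and Hartley chart (alpha = 1%) for each r,
# copied from the table in the text; these are inputs, not computed values.
chart_power = {2: 0.22, 3: 0.71, 4: 0.93}

rows = []
required_r = None
for r in sorted(chart_power):
    row = (r, t * (r - 1), round(phi_simplified(d, r, t, mse), 2), chart_power[r])
    rows.append(row)
    # Stop at the first (smallest) r whose chart power meets the target.
    if required_r is None and chart_power[r] >= target_power:
        required_r = r

print(" r   df    phi   power")
for r, df, ph, power in rows:
    print(f"{r:>2}  {df:>3}  {ph:>5.2f}  {power:>5.2f}")
print("replications required:", required_r)
```

Replacing the hand-entered dictionary with a noncentral F computation from a statistics library would automate step 3, at the cost of an extra dependency.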