Bootstrapping Analogs of the One Way MANOVA Test

Hasthika S. Rupasinghe Arachchige Don and David J. Olive
Southern Illinois University
March 17, 2016

Abstract

The classical one way MANOVA model is used to test whether the mean measurements are the same or differ across p groups, and it assumes that the covariance matrix of each group is the same. This paper suggests using the Olive (2016a) bootstrap technique to develop analogs of the one way MANOVA test. The new tests can have considerable outlier resistance, and the tests do not need the population covariance matrices to be equal.

KEY WORDS: Behrens-Fisher problem, bootstrap, prediction region, coordinatewise median, RMVN estimator.

David J. Olive is Professor, and Hasthika S. Rupasinghe Arachchige Don is a PhD student, Department of Mathematics, Southern Illinois University, Carbondale, IL 62901, USA.

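1 INTRODUCTION

The multivariate linear model $y_i = B^T x_i + \epsilon_i$ for $i = 1, \ldots, n$ has $m \geq 2$ response variables $Y_1, \ldots, Y_m$ and $p$ predictor variables $x_1, x_2, \ldots, x_p$. The $i$th case is $(x_i^T, y_i^T) = (x_{i1}, x_{i2}, \ldots, x_{ip}, Y_{i1}, \ldots, Y_{im})$. The model is written in matrix form as $Z = XB + E$, where the matrices are defined below. The model has $E(\epsilon_k) = 0$ and $\mathrm{Cov}(\epsilon_k) = \Sigma_\epsilon = (\sigma_{ij})$ for $k = 1, \ldots, n$. Then the $p \times m$ coefficient matrix $B = [\beta_1 \; \beta_2 \; \cdots \; \beta_m]$ and the $m \times m$ covariance matrix $\Sigma_\epsilon$ are to be estimated, and $E(Z) = XB$ while $E(Y_{ij}) = x_i^T \beta_j$. The $\epsilon_i$ are assumed to be iid. The univariate linear model corresponds to $m = 1$ response variable, and is written in matrix form as $Y = X\beta + e$. Subscripts are needed for the $m$ univariate linear models $Y_j = X\beta_j + e_j$ for $j = 1, \ldots, m$, where $E(e_j) = 0$. For the multivariate linear model, $\mathrm{Cov}(e_i, e_j) = \sigma_{ij} I_n$ for $i, j = 1, \ldots, m$, where $I_n$ is the $n \times n$ identity matrix.

The $n \times m$ matrix

$$Z = \begin{bmatrix} Y_{1,1} & Y_{1,2} & \cdots & Y_{1,m} \\ Y_{2,1} & Y_{2,2} & \cdots & Y_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ Y_{n,1} & Y_{n,2} & \cdots & Y_{n,m} \end{bmatrix} = [Y_1 \; Y_2 \; \cdots \; Y_m] = \begin{bmatrix} y_1^T \\ \vdots \\ y_n^T \end{bmatrix}.$$

The $n \times p$ design matrix $X$ of predictor variables is not necessarily of full rank $p$, and

$$X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,p} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,p} \end{bmatrix} = [v_1 \; v_2 \; \cdots \; v_p] = \begin{bmatrix} x_1^T \\ \vdots \\ x_n^T \end{bmatrix},$$

where often $v_1 = 1$. The $p \times m$ matrix

$$B = \begin{bmatrix} \beta_{1,1} & \beta_{1,2} & \cdots & \beta_{1,m} \\ \beta_{2,1} & \beta_{2,2} & \cdots & \beta_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ \beta_{p,1} & \beta_{p,2} & \cdots & \beta_{p,m} \end{bmatrix} = [\beta_1 \; \beta_2 \; \cdots \; \beta_m].$$

The $n \times m$ matrix

$$E = \begin{bmatrix} \epsilon_{1,1} & \epsilon_{1,2} & \cdots & \epsilon_{1,m} \\ \epsilon_{2,1} & \epsilon_{2,2} & \cdots & \epsilon_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ \epsilon_{n,1} & \epsilon_{n,2} & \cdots & \epsilon_{n,m} \end{bmatrix} = [e_1 \; e_2 \; \cdots \; e_m] = \begin{bmatrix} \epsilon_1^T \\ \vdots \\ \epsilon_n^T \end{bmatrix}.$$

Considering the $i$th row of $Z$, $X$, and $E$ shows that $y_i^T = x_i^T B + \epsilon_i^T$.

The multivariate linear regression model and the one way MANOVA model are special cases of the multivariate linear model, but double subscripts are useful for describing the one way MANOVA model. Suppose there are independent random samples of size $n_i$ from $p$ different populations (treatments), or $n_i$ cases are randomly assigned to $p$ treatment groups, where $n = \sum_{i=1}^p n_i$. Assume that $m$ response variables $y_{ij} = (Y_{ij1}, \ldots, Y_{ijm})^T$ are measured for the $i$th treatment group and the $j$th case (often an individual or thing) in the group. Hence $i = 1, \ldots, p$ and $j = 1, \ldots, n_i$. The $Y_{ijk}$ follow different one way ANOVA models for $k = 1, \ldots, m$. Assume $E(y_{ij}) = \mu_i$ and $\mathrm{Cov}(y_{ij}) = \Sigma_\epsilon$. Hence the $p$ treatments have different mean vectors $\mu_i$, but a common covariance matrix $\Sigma_\epsilon$. The one way MANOVA is used to test $H_0: \mu_1 = \mu_2 = \cdots = \mu_p$. Often $\mu_i = \mu + \tau_i$, so $H_0$ becomes $H_0: \tau_1 = \cdots = \tau_p$. If $m = 1$, the one way MANOVA model is the one way ANOVA model. MANOVA is useful since it takes into account the correlations between the $m$ response variables. The two sample Hotelling's $T^2$ test that uses a common covariance matrix is a special case of the one way MANOVA model with $p = 2$ groups.

Let $\mu_i = \mu + \tau_i$, where $\sum_{i=1}^p n_i \tau_i = 0$. The $j$th case from the $i$th population or treatment group is $y_{ij} = \mu + \tau_i + \epsilon_{ij}$, where $\epsilon_{ij}$ is an error vector, $i = 1, \ldots, p$ and $j = 1, \ldots, n_i$. Let $\bar{y} = \hat{\mu} = \sum_{i=1}^p \sum_{j=1}^{n_i} y_{ij}/n$ be the overall mean. Let $\bar{y}_i = \sum_{j=1}^{n_i} y_{ij}/n_i$, so $\hat{\tau}_i = \bar{y}_i - \bar{y}$. Let the residual vector be $\hat{\epsilon}_{ij} = y_{ij} - \bar{y}_i = y_{ij} - \hat{\mu} - \hat{\tau}_i$. Then $y_{ij} = \bar{y} + (\bar{y}_i - \bar{y}) + (y_{ij} - \bar{y}_i) = \hat{\mu} + \hat{\tau}_i + \hat{\epsilon}_{ij}$.

Several $m \times m$ matrices will be useful. Let $S_i$ be the sample covariance matrix corresponding to the $i$th treatment group. Then the within sum of squares and cross products matrix is

$$W = (n_1 - 1)S_1 + \cdots + (n_p - 1)S_p = \sum_{i=1}^p \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)(y_{ij} - \bar{y}_i)^T,$$

and $\hat{\Sigma}_\epsilon = W/(n - p)$. The treatment or between sum of squares and cross products matrix is

$$B_T = \sum_{i=1}^p n_i (\bar{y}_i - \bar{y})(\bar{y}_i - \bar{y})^T.$$

The total corrected (for the mean) sum of squares and cross products matrix is

$$T = B_T + W = \sum_{i=1}^p \sum_{j=1}^{n_i} (y_{ij} - \bar{y})(y_{ij} - \bar{y})^T.$$

Note that $S = T/(n - 1)$ is the usual sample covariance matrix of the $y_{ij}$ if it is assumed that all $n$ of the $y_{ij}$ are iid, so that $\mu_i \equiv \mu$ for $i = 1, \ldots, p$.

The one way MANOVA model is $y_{ij} = \mu + \tau_i + \epsilon_{ij}$, where the $\epsilon_{ij}$ are iid with $E(\epsilon_{ij}) = 0$ and $\mathrm{Cov}(\epsilon_{ij}) = \Sigma_\epsilon$. If all $n$ of the $y_{ij}$ are iid with $E(y_{ij}) = \mu$ and $\mathrm{Cov}(y_{ij}) = \Sigma_\epsilon$, it can be shown that $A/df \xrightarrow{P} \Sigma_\epsilon$, where $A = W$, $B_T$, or $T$ and $df$ is the corresponding degrees of freedom.

Let $t_0$ be the test statistic. Often Pillai's trace statistic, the Hotelling-Lawley trace statistic, or Wilks' lambda is used. Wilks' lambda is

$$\Lambda = \frac{|W|}{|B_T + W|} = \frac{|W|}{|T|} = \frac{\left|\sum_{i=1}^p (n_i - 1) S_i\right|}{|(n - 1) S|} = \frac{\left|\sum_{i=1}^p \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)(y_{ij} - \bar{y}_i)^T\right|}{\left|\sum_{i=1}^p \sum_{j=1}^{n_i} (y_{ij} - \bar{y})(y_{ij} - \bar{y})^T\right|}.$$

Then $t_0 = -[n - 0.5(m + p + 2)] \log(\Lambda)$, and the test rejects $H_0$ if $t_0 > \chi^2_{m(p-1)}(1 - \alpha)$. See Johnson and Wichern (1988, p. 238). Following Mardia, Kent, and Bibby (1979, p. 335), let $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_m$ be the eigenvalues of $W^{-1} B_T$. Since $W^{-1} T = W^{-1} B_T + I_m$, the values $1 + \lambda_i$ for $i = 1, \ldots, m$ are the eigenvalues of $W^{-1} T$, and since $\Lambda = |W|/|T| = 1/|W^{-1} T|$,

$$\Lambda = \prod_{i=1}^m (1 + \lambda_i)^{-1}.$$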
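As an illustration of the sums of squares and cross products matrices and the Wilks' lambda test above, the following minimal Python sketch assumes NumPy and SciPy are available. The function names `sscp_matrices` and `wilks_test` are hypothetical and are not from the paper. The sketch builds $W$, $B_T$, and $T$ from a list of per-group $n_i \times m$ data matrices, then evaluates $t_0 = -[n - 0.5(m + p + 2)]\log(\Lambda)$ against the $\chi^2_{m(p-1)}$ approximation. Finally, it checks the identities $T = B_T + W$ and $\Lambda = \prod_{i=1}^m (1 + \lambda_i)^{-1}$ numerically.

```python
import numpy as np
from scipy import stats


def sscp_matrices(groups):
    """Return (W, B_T, T) for a list of p arrays; the ith array is n_i x m."""
    all_y = np.vstack(groups)
    ybar = all_y.mean(axis=0)                  # overall mean, mu-hat
    m = all_y.shape[1]
    W = np.zeros((m, m))
    B_T = np.zeros((m, m))
    for g in groups:
        ybar_i = g.mean(axis=0)                # group mean ybar_i
        resid = g - ybar_i                     # residual vectors eps-hat_ij
        W += resid.T @ resid                   # within SSCP, equals sum of (n_i - 1) S_i
        d = (ybar_i - ybar)[:, None]           # tau-hat_i as a column vector
        B_T += g.shape[0] * (d @ d.T)          # between (treatment) SSCP
    centered = all_y - ybar
    T = centered.T @ centered                  # total corrected SSCP
    return W, B_T, T


def wilks_test(groups):
    """Return Wilks' lambda, t0, and the chi-square p-value with m(p - 1) df."""
    W, B_T, T = sscp_matrices(groups)
    n = sum(g.shape[0] for g in groups)
    p, m = len(groups), W.shape[0]
    # log(Lambda) = log|W| - log|T|, computed with slogdet for numerical stability
    log_lam = np.linalg.slogdet(W)[1] - np.linalg.slogdet(T)[1]
    t0 = -(n - 0.5 * (m + p + 2)) * log_lam
    return np.exp(log_lam), t0, stats.chi2.sf(t0, m * (p - 1))


# Example: p = 4 groups of 30 cases, m = 3 responses, equal population means
rng = np.random.default_rng(0)
groups = [rng.normal(size=(30, 3)) for _ in range(4)]
W, B_T, T = sscp_matrices(groups)
lam_eig = np.linalg.eigvals(np.linalg.solve(W, B_T)).real  # eigenvalues of W^{-1} B_T
lam, t0, pval = wilks_test(groups)
print(np.allclose(T, B_T + W), np.isclose(lam, np.prod(1 / (1 + lam_eig))))
```

Working with $\log|W| - \log|T|$ rather than the ratio of determinants keeps the computation stable when $m$ is large or the data are badly scaled.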

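Following Fujikoshi (2002) and Kakizawa (2009), let the Hotelling-Lawley trace statistic be $U = \mathrm{tr}(B_T W^{-1}) = \mathrm{tr}(W^{-1} B_T) = \sum_{i=1}^m \lambda_i$, and let Pillai's trace statistic be $V = \mathrm{tr}(B_T T^{-1}) = \mathrm{tr}(T^{-1} B_T) = \sum_{i=1}^m \frac{\lambda_i}{1 + \lambda_i}$. If the $y_{ij} - \mu_i$ are iid with common covariance matrix $\Sigma_\epsilon$, and if $H_0$ is true, then under regularity conditions

$$-[n - 0.5(m + p + 2)] \log(\Lambda) \xrightarrow{D} \chi^2_{m(p-1)}, \quad (n - m - p - 1) U \xrightarrow{D} \chi^2_{m(p-1)}, \quad \text{and} \quad (n - 1) V \xrightarrow{D} \chi^2_{m(p-1)}.$$

Note that the common covariance matrix assumption implies that each of the $p$ treatment groups or populations has the same covariance matrix $\Sigma_i = \Sigma_\epsilon$ for $i = 1, \ldots, p$, which is an extremely strong assumption.

A possible alternative method for one way MANOVA is to use the model $Z = XB + E$ with

$$\begin{bmatrix} Y_{111} & Y_{112} & \cdots & Y_{11m} \\ \vdots & \vdots & & \vdots \\ Y_{1,n_1,1} & Y_{1,n_1,2} & \cdots & Y_{1,n_1,m} \\ Y_{211} & Y_{212} & \cdots & Y_{21m} \\ \vdots & \vdots & & \vdots \\ Y_{2,n_2,1} & Y_{2,n_2,2} & \cdots & Y_{2,n_2,m} \\ \vdots & \vdots & & \vdots \\ Y_{p11} & Y_{p12} & \cdots & Y_{p1m} \\ \vdots & \vdots & & \vdots \\ Y_{p,n_p,1} & Y_{p,n_p,2} & \cdots & Y_{p,n_p,m} \end{bmatrix} = X \begin{bmatrix} \beta_{1,1} & \beta_{1,2} & \cdots & \beta_{1,m} \\ \beta_{2,1} & \beta_{2,2} & \cdots & \beta_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ \beta_{p,1} & \beta_{p,2} & \cdots & \beta_{p,m} \end{bmatrix} + E.$$

Here $X$ has full rank. Its first column is $1$, and its $i$th column is an indicator for group $i - 1$ for $i = 2, \ldots, p$. The least squares estimates are $\hat{\beta}_{1k} = \bar{Y}_{p0k} = \hat{\mu}_{pk}$ for $k = 1, \ldots, m$, and $\hat{\beta}_{ik} = \bar{Y}_{i-1,0k} - \bar{Y}_{p0k} = \hat{\mu}_{i-1,k} - \hat{\mu}_{pk}$ for $k = 1, \ldots, m$ and $i = 2, \ldots, p$. Thus testing $H_0: \mu_1 = \cdots = \mu_p$ is equivalent to testing $H_0: LB = 0$, where $L = [0 \; I_{p-1}]$. Press (2005, p. 262) uses the above model. Then $y_{ij} = \mu_i + \epsilon_{ij}$ and

$$B = \begin{bmatrix} \mu_p^T \\ \mu_1^T - \mu_p^T \\ \mu_2^T - \mu_p^T \\ \vdots \\ \mu_{p-2}^T - \mu_p^T \\ \mu_{p-1}^T - \mu_p^T \end{bmatrix}.$$

Following Olive (2016b, ch. 10) and Rupasinghe Arachchige Don (2017), large sample theory can also be used to derive a better test. Let $\Sigma_i$ be the nonsingular population covariance matrix of the $i$th treatment group or population. To simplify the large sample theory, assume $n_i = \pi_i n$, where $0 < \pi_i < 1$ and $\sum_{i=1}^p \pi_i = 1$. Assume $H_0$ is true, and let $\mu_i = \mu$ for $i = 1, \ldots, p$. Then by the multivariate central limit theorem, $\sqrt{n_i}(\bar{y}_i - \mu) \xrightarrow{D} N_m(0, \Sigma_i)$, and $\sqrt{n}(\bar{y}_i - \mu) \xrightarrow{D} N_m(0, \Sigma_i/\pi_i)$. Let

$$w = \begin{bmatrix} \bar{y}_1 - \bar{y}_p \\ \bar{y}_2 - \bar{y}_p \\ \vdots \\ \bar{y}_{p-2} - \bar{y}_p \\ \bar{y}_{p-1} - \bar{y}_p \end{bmatrix}.$$

Then $\sqrt{n}\, w \xrightarrow{D} N_{m(p-1)}(0, \Sigma_w)$, where $\Sigma_w = (\Sigma_{ij})$ is the block matrix with

$$\Sigma_{ij} = \mathrm{Cov}\left(\sqrt{n}(\bar{y}_i - \bar{y}_p), \sqrt{n}(\bar{y}_j - \bar{y}_p)\right) = \frac{\Sigma_p}{\pi_p} \quad \text{for } i \neq j,$$

$$\Sigma_{ii} = \mathrm{Cov}\left(\sqrt{n}(\bar{y}_i - \bar{y}_p)\right) = \frac{\Sigma_i}{\pi_i} + \frac{\Sigma_p}{\pi_p} \quad \text{for } i = j.$$

Hence

$$t_0 = n\, w^T \hat{\Sigma}_w^{-1} w = w^T \left(\frac{\hat{\Sigma}_w}{n}\right)^{-1} w \xrightarrow{D} \chi^2_{m(p-1)}$$

as the $n_i \to \infty$ if $H_0$ is true. Here

$$\frac{\hat{\Sigma}_w}{n} = \begin{bmatrix} \frac{S_1}{n_1} + \frac{S_p}{n_p} & \frac{S_p}{n_p} & \cdots & \frac{S_p}{n_p} \\ \frac{S_p}{n_p} & \frac{S_2}{n_2} + \frac{S_p}{n_p} & \cdots & \frac{S_p}{n_p} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{S_p}{n_p} & \frac{S_p}{n_p} & \cdots & \frac{S_{p-1}}{n_{p-1}} + \frac{S_p}{n_p} \end{bmatrix}$$

is a block matrix whose off diagonal block entries equal $S_p/n_p$ and whose $i$th diagonal block entry is $S_i/n_i + S_p/n_p$ for $i = 1, \ldots, p - 1$. Reject $H_0$ if $t_0 > m(p-1) F_{m(p-1), d_n}(1 - \alpha)$, where $d_n = \min(n_1, \ldots, n_p)$. It may make sense to relabel the groups so that $n_p$ is the largest $n_i$, or so that $S_p/n_p$ has the smallest generalized variance of the $S_i/n_i$. This test may start to outperform the one way MANOVA test if $n \geq (m + p)^2$ and $n_i \geq 10m$ for $i = 1, \ldots, p$.

2 THE PREDICTION REGION METHOD

Since the common covariance matrix assumption $\mathrm{Cov}(\epsilon_k) = \Sigma_\epsilon$ for $k = 1, \ldots, n$ is extremely strong, using the prediction region method to test $H_0: LB = 0$ may be a useful alternative. Take a sample of size $n_i$ with replacement from random sample $i$ for $i = 1, 2, \ldots, p$. Let the $(p-1)m \times 1$ vector

$$w_i^* = \mathrm{vec}(L\hat{B}_i^*) = \left((\hat{\mu}_1^* - \hat{\mu}_p^*)^T, \ldots, (\hat{\mu}_{p-1}^* - \hat{\mu}_p^*)^T\right)^T$$

for $i = 1, \ldots, B$, where $\mathrm{vec}(A)$ stacks the columns of a matrix into a vector. For a robust test, use $w_i^* = ((T_1^* - T_p^*)^T, \ldots, (T_{p-1}^* - T_p^*)^T)^T$, where $T_i$ is a robust location estimator, such as the coordinatewise median or the RMVN location estimator, applied to the cases in the $i$th treatment group. The method likely needs $n \geq 20mp$, $n \geq (m + p)^2$, and $n_i \geq 20m$.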
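The large sample statistic $t_0$ of Section 1 needs only the group means and the matrices $S_i/n_i$, so it avoids the common covariance matrix assumption. The following minimal Python sketch assumes NumPy and SciPy; the function name `large_sample_manova` is hypothetical. It assembles $\hat{\Sigma}_w/n$ block by block, computes $t_0 = w^T(\hat{\Sigma}_w/n)^{-1}w$, and compares $t_0$ with the cutoff $m(p-1)F_{m(p-1),d_n}(1-\alpha)$. The last array in the list plays the role of group $p$.

```python
import numpy as np
from scipy import stats


def large_sample_manova(groups, alpha=0.05):
    """Large sample test of H0: mu_1 = ... = mu_p allowing unequal covariances.

    groups: list of p arrays; the ith has shape (n_i, m) with m >= 2, and the
    last array plays the role of group p. Returns (t0, cutoff, reject).
    """
    p, m = len(groups), groups[0].shape[1]
    means = [g.mean(axis=0) for g in groups]
    # S_i / n_i, where np.cov(..., rowvar=False) uses the divisor n_i - 1
    scaled = [np.cov(g, rowvar=False) / g.shape[0] for g in groups]
    # w stacks ybar_i - ybar_p for i = 1, ..., p - 1
    w = np.concatenate([means[i] - means[-1] for i in range(p - 1)])
    # Sigma_w-hat / n: every m x m block is S_p/n_p; diagonal blocks add S_i/n_i
    cov_w = np.tile(scaled[-1], (p - 1, p - 1))
    for i in range(p - 1):
        cov_w[i * m:(i + 1) * m, i * m:(i + 1) * m] += scaled[i]
    t0 = float(w @ np.linalg.solve(cov_w, w))
    k = m * (p - 1)
    d_n = min(g.shape[0] for g in groups)
    cutoff = float(k * stats.f.ppf(1 - alpha, k, d_n))
    return t0, cutoff, t0 > cutoff


# Example: p = 3 groups with equal means but very different covariance matrices
rng = np.random.default_rng(1)
groups = [s * rng.normal(size=(n_i, 2)) for n_i, s in [(40, 1.0), (60, 2.0), (80, 0.5)]]
print(large_sample_manova(groups))
```

Solving the linear system with `np.linalg.solve` avoids forming the explicit inverse of $\hat{\Sigma}_w/n$. For $p = 2$, the statistic reduces to $w^T(S_1/n_1 + S_2/n_2)^{-1}w$ with $w = \bar{y}_1 - \bar{y}_2$.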

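The bootstrap step of Section 2 can be sketched in Python as follows. This is a simplified version of the prediction region method, stated here as an assumption. It rejects $H_0: LB = 0$ when the squared Mahalanobis distance $D_0^2$ of the zero vector from the mean of the $w_i^*$ exceeds the empirical $1 - \alpha$ quantile of the bootstrap distances $D_i^2$. Both distances use the sample covariance matrix of the $w_i^*$. Olive (2016a) applies a small sample correction to that quantile, which is not reproduced here. The names `location_diffs` and `prediction_region_test` are hypothetical. Passing `loc=lambda g: np.median(g, axis=0)` gives the coordinatewise median version; the RMVN estimator is not implemented.

```python
import numpy as np


def location_diffs(groups, loc):
    """Stack loc(group i) - loc(group p) for i = 1, ..., p - 1 into one vector."""
    centers = [loc(g) for g in groups]
    return np.concatenate([c - centers[-1] for c in centers[:-1]])


def prediction_region_test(groups, loc=None, n_boot=1000, alpha=0.05, seed=None):
    """Simplified bootstrap prediction region test of H0: LB = 0.

    Returns (D0^2, cutoff, reject). n_boot must exceed m(p - 1).
    """
    if loc is None:
        loc = lambda g: g.mean(axis=0)         # sample mean gives mu-hat_i
    rng = np.random.default_rng(seed)
    boot = np.array([
        # resample n_i cases with replacement separately within each group
        location_diffs([g[rng.integers(0, len(g), size=len(g))] for g in groups], loc)
        for _ in range(n_boot)
    ])                                         # shape (n_boot, m(p - 1))
    center = boot.mean(axis=0)
    inv = np.linalg.inv(np.cov(boot, rowvar=False))
    diffs = boot - center
    d2 = np.einsum("ij,jk,ik->i", diffs, inv, diffs)  # squared distances D_i^2
    cutoff = np.quantile(d2, 1 - alpha)
    d0 = float(center @ inv @ center)          # D_0^2 for the H0 value w = 0
    return d0, float(cutoff), bool(d0 > cutoff)


rng = np.random.default_rng(3)
groups = [rng.normal(size=(50, 2)) for _ in range(3)]
print(prediction_region_test(groups, seed=4))
print(prediction_region_test(groups, loc=lambda g: np.median(g, axis=0), seed=4))
```

Resampling within each group separately, rather than from the pooled data, is what lets the groups keep their own covariance structure. The coordinate ordering of the stacked vector does not affect $D_0^2$ or the $D_i^2$.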
3 EXAMPLES AND SIMULATIONS

In Tables 1 to 4,

4 CONCLUSIONS

Pelawa Watagoda and Rupasinghe Arachchige Don (2016) consider bootstrapping analogs of the two sample Hotelling's $T^2$ test, and Konietschke, Bathke, Harrar, and Pauly (2015) bootstrap MANOVA models.

5 REFERENCES

Fujikoshi, Y. (2002), Asymptotic Expansions for the Distributions of Multivariate Basic Statistics and One-Way MANOVA Tests Under Nonnormality, Journal of Statistical Planning and Inference, 108.

Johnson, R.A., and Wichern, D.W. (1988), Applied Multivariate Statistical Analysis, 2nd ed., Prentice Hall, Englewood Cliffs, NJ.

Kakizawa, Y. (2009), Third-Order Power Comparisons for a Class of Tests for Multivariate Linear Hypothesis Under General Distributions, Journal of Multivariate Analysis, 100.

Konietschke, F., Bathke, A.C., Harrar, S.W., and Pauly, M. (2015), Parametric and Nonparametric Bootstrap Methods for General MANOVA, Journal of Multivariate Analysis, 140.

Mardia, K.V., Kent, J.T., and Bibby, J.M. (1979), Multivariate Analysis, Academic Press, London, UK.

Olive, D.J. (2016a), Bootstrapping Hypothesis Tests, unpublished manuscript at (…).

Olive, D.J. (2016b), Robust Multivariate Analysis, Springer, New York, NY, to appear.

Pelawa Watagoda, L.C.R., and Rupasinghe Arachchige Don, H.S. (2016), Bootstrapping Analogs of the Two Sample Hotelling's $T^2$ Test, preprint at (math.siu.edu/olive/stwosample.pdf).

Press, S.J. (2005), Applied Multivariate Analysis: Using Bayesian and Frequentist Methods of Inference, 2nd ed., Dover, Mineola, NY.

Rupasinghe Arachchige Don, H.S. (2017), Bootstrapping Analogs of the One Way MANOVA Test, Ph.D. Thesis, Southern Illinois University, to appear.

Van Aelst, S., and Willems, G. (2011), Robust and Efficient One-Way MANOVA Tests, Journal of the American Statistical Association, 106.


Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne Applied Statistics J. Blanchet and J. Wadsworth Institute of Mathematics, Analysis, and Applications EPF Lausanne An MSc Course for Applied Mathematicians, Fall 2012 Outline 1 Model Comparison 2 Model

More information

SF2940: Probability theory Lecture 8: Multivariate Normal Distribution

SF2940: Probability theory Lecture 8: Multivariate Normal Distribution SF2940: Probability theory Lecture 8: Multivariate Normal Distribution Timo Koski 24.09.2014 Timo Koski () Mathematisk statistik 24.09.2014 1 / 75 Learning outcomes Random vectors, mean vector, covariance

More information

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction

More information

Statistical Models in R

Statistical Models in R Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Structure of models in R Model Assessment (Part IA) Anova

More information

From the help desk: Bootstrapped standard errors

From the help desk: Bootstrapped standard errors The Stata Journal (2003) 3, Number 1, pp. 71 80 From the help desk: Bootstrapped standard errors Weihua Guan Stata Corporation Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution

More information

Permutation Tests for Comparing Two Populations

Permutation Tests for Comparing Two Populations Permutation Tests for Comparing Two Populations Ferry Butar Butar, Ph.D. Jae-Wan Park Abstract Permutation tests for comparing two populations could be widely used in practice because of flexibility of

More information

Multiple group discriminant analysis: Robustness and error rate

Multiple group discriminant analysis: Robustness and error rate Institut f. Statistik u. Wahrscheinlichkeitstheorie Multiple group discriminant analysis: Robustness and error rate P. Filzmoser, K. Joossens, and C. Croux Forschungsbericht CS-006- Jänner 006 040 Wien,

More information

Inner products on R n, and more

Inner products on R n, and more Inner products on R n, and more Peyam Ryan Tabrizian Friday, April 12th, 2013 1 Introduction You might be wondering: Are there inner products on R n that are not the usual dot product x y = x 1 y 1 + +

More information

Introduction to Principal Components and FactorAnalysis

Introduction to Principal Components and FactorAnalysis Introduction to Principal Components and FactorAnalysis Multivariate Analysis often starts out with data involving a substantial number of correlated variables. Principal Component Analysis (PCA) is a

More information

Mehtap Ergüven Abstract of Ph.D. Dissertation for the degree of PhD of Engineering in Informatics

Mehtap Ergüven Abstract of Ph.D. Dissertation for the degree of PhD of Engineering in Informatics INTERNATIONAL BLACK SEA UNIVERSITY COMPUTER TECHNOLOGIES AND ENGINEERING FACULTY ELABORATION OF AN ALGORITHM OF DETECTING TESTS DIMENSIONALITY Mehtap Ergüven Abstract of Ph.D. Dissertation for the degree

More information

Factor Analysis. Chapter 420. Introduction

Factor Analysis. Chapter 420. Introduction Chapter 420 Introduction (FA) is an exploratory technique applied to a set of observed variables that seeks to find underlying factors (subsets of variables) from which the observed variables were generated.

More information

Poisson Models for Count Data

Poisson Models for Count Data Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the

More information

Factor Analysis. Principal components factor analysis. Use of extracted factors in multivariate dependency models

Factor Analysis. Principal components factor analysis. Use of extracted factors in multivariate dependency models Factor Analysis Principal components factor analysis Use of extracted factors in multivariate dependency models 2 KEY CONCEPTS ***** Factor Analysis Interdependency technique Assumptions of factor analysis

More information

Fitting Subject-specific Curves to Grouped Longitudinal Data

Fitting Subject-specific Curves to Grouped Longitudinal Data Fitting Subject-specific Curves to Grouped Longitudinal Data Djeundje, Viani Heriot-Watt University, Department of Actuarial Mathematics & Statistics Edinburgh, EH14 4AS, UK E-mail: vad5@hw.ac.uk Currie,

More information

Notes on Symmetric Matrices

Notes on Symmetric Matrices CPSC 536N: Randomized Algorithms 2011-12 Term 2 Notes on Symmetric Matrices Prof. Nick Harvey University of British Columbia 1 Symmetric Matrices We review some basic results concerning symmetric matrices.

More information

Factorization Theorems

Factorization Theorems Chapter 7 Factorization Theorems This chapter highlights a few of the many factorization theorems for matrices While some factorization results are relatively direct, others are iterative While some factorization

More information

SYSTEMS OF REGRESSION EQUATIONS

SYSTEMS OF REGRESSION EQUATIONS SYSTEMS OF REGRESSION EQUATIONS 1. MULTIPLE EQUATIONS y nt = x nt n + u nt, n = 1,...,N, t = 1,...,T, x nt is 1 k, and n is k 1. This is a version of the standard regression model where the observations

More information

COMPARING DATA ANALYSIS TECHNIQUES FOR EVALUATION DESIGNS WITH NON -NORMAL POFULP_TIOKS Elaine S. Jeffers, University of Maryland, Eastern Shore*

COMPARING DATA ANALYSIS TECHNIQUES FOR EVALUATION DESIGNS WITH NON -NORMAL POFULP_TIOKS Elaine S. Jeffers, University of Maryland, Eastern Shore* COMPARING DATA ANALYSIS TECHNIQUES FOR EVALUATION DESIGNS WITH NON -NORMAL POFULP_TIOKS Elaine S. Jeffers, University of Maryland, Eastern Shore* The data collection phases for evaluation designs may involve

More information

Overview of Non-Parametric Statistics PRESENTER: ELAINE EISENBEISZ OWNER AND PRINCIPAL, OMEGA STATISTICS

Overview of Non-Parametric Statistics PRESENTER: ELAINE EISENBEISZ OWNER AND PRINCIPAL, OMEGA STATISTICS Overview of Non-Parametric Statistics PRESENTER: ELAINE EISENBEISZ OWNER AND PRINCIPAL, OMEGA STATISTICS About Omega Statistics Private practice consultancy based in Southern California, Medical and Clinical

More information

APPLIED MISSING DATA ANALYSIS

APPLIED MISSING DATA ANALYSIS APPLIED MISSING DATA ANALYSIS Craig K. Enders Series Editor's Note by Todd D. little THE GUILFORD PRESS New York London Contents 1 An Introduction to Missing Data 1 1.1 Introduction 1 1.2 Chapter Overview

More information

Review Jeopardy. Blue vs. Orange. Review Jeopardy

Review Jeopardy. Blue vs. Orange. Review Jeopardy Review Jeopardy Blue vs. Orange Review Jeopardy Jeopardy Round Lectures 0-3 Jeopardy Round $200 How could I measure how far apart (i.e. how different) two observations, y 1 and y 2, are from each other?

More information

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not. Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

More information

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA)

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) As with other parametric statistics, we begin the one-way ANOVA with a test of the underlying assumptions. Our first assumption is the assumption of

More information

PROBABILITY AND STATISTICS. Ma 527. 1. To teach a knowledge of combinatorial reasoning.

PROBABILITY AND STATISTICS. Ma 527. 1. To teach a knowledge of combinatorial reasoning. PROBABILITY AND STATISTICS Ma 527 Course Description Prefaced by a study of the foundations of probability and statistics, this course is an extension of the elements of probability and statistics introduced

More information

Non Parametric Inference

Non Parametric Inference Maura Department of Economics and Finance Università Tor Vergata Outline 1 2 3 Inverse distribution function Theorem: Let U be a uniform random variable on (0, 1). Let X be a continuous random variable

More information

The Characteristic Polynomial

The Characteristic Polynomial Physics 116A Winter 2011 The Characteristic Polynomial 1 Coefficients of the characteristic polynomial Consider the eigenvalue problem for an n n matrix A, A v = λ v, v 0 (1) The solution to this problem

More information

Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression

Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression Saikat Maitra and Jun Yan Abstract: Dimension reduction is one of the major tasks for multivariate

More information

Multiple regression - Matrices

Multiple regression - Matrices Multiple regression - Matrices This handout will present various matrices which are substantively interesting and/or provide useful means of summarizing the data for analytical purposes. As we will see,

More information

Linear Algebra Notes

Linear Algebra Notes Linear Algebra Notes Chapter 19 KERNEL AND IMAGE OF A MATRIX Take an n m matrix a 11 a 12 a 1m a 21 a 22 a 2m a n1 a n2 a nm and think of it as a function A : R m R n The kernel of A is defined as Note

More information

Eigenvalues, Eigenvectors, Matrix Factoring, and Principal Components

Eigenvalues, Eigenvectors, Matrix Factoring, and Principal Components Eigenvalues, Eigenvectors, Matrix Factoring, and Principal Components The eigenvalues and eigenvectors of a square matrix play a key role in some important operations in statistics. In particular, they

More information