Multivariate Analysis. Overview




Introduction
Multivariate thinking is a body of thought processes that illuminate the interrelatedness between and within sets of variables. Its essence is to expose the inherent structure and meaning within these sets of variables through the application and interpretation of various statistical methods.

Why the multivariate approach?
The big idea is multiple response outcomes. With univariate (UV) analyses we have just one dependent variable (DV) of interest. Although any analysis of data involving more than one variable could be seen as multivariate (MV), we typically reserve the term for analyses with multiple dependent variables. MV analysis is thus an extension of UV analysis or, conversely, many UV analyses are special cases of MV ones.

Why MV over the univariate approach?
Complexity: the subject or data studied may be more complex than univariate methods can handle.
Reality: in some cases it would be inappropriate to conduct a univariate analysis because the data or research question demands a multivariate one.

Why MV over the univariate approach?
Experimental data: although experimental research can be and often is multivariate, typically subjects are assigned to groups and the manipulation is tied to changes in a single outcome (e.g. different doses of caffeine and test performance). Causality is more easily deduced.
Non-experimental data: survey or inventory data might likewise be analyzed in univariate fashion, but answering the questions that stem from such correlational data will typically require the multivariate approach.

Why not MV?
In the past the computations were overwhelming even with smaller datasets, so MV analyses were typically avoided. Computation is no longer a problem, but there are still reasons not to do an MV analysis.

Why not MV?
Ambiguity: MV analysis may result in a less clear understanding of the data, e.g. group differences on a linear combination of DVs (MANOVA), whereas differences are easily interpreted in a univariate sense. Ambiguity that stems from ignorance of the technique is not a valid reason, however.
Unnecessary complexity: just because structural equation modeling (SEM) looks neat or is popular doesn't mean you have to do one, or that it is the best way to answer your research question.
No free lunch: MV analyses come with their own rules and assumptions that may make the analysis difficult or its conclusions weaker.

Multivariate Pros and Cons Summary
Advantages of using a multivariate statistic: richer, more realistic designs; looks at phenomena in an overarching way (provides multiple levels of analysis); each method differs in the number or type of independent variables (IVs) and DVs; can help control Type I error.
Disadvantages: larger Ns are often required; results are more difficult to interpret; less is known about robustness to violations of assumptions.

Primary purposes of MV analysis
Prediction and explanation; determining structure.

Prediction
The goal in most research situations is to be able to predict outcomes based on prior information. E.g. given a person's gender and region, what will their attitude be on some social issue? Given a number of variables, how well can we predict group membership?
Explanation
Which variables are most important in the prediction of some outcome? In many cases this is the end goal of an analysis, though a very problematic one.

A caveat regarding explanation
Determining variable importance can be a suspect endeavor. A variable deemed statistically significant might not make the cut if the study were conducted again. Depending on a number of factors, results may be sample-specific, i.e. you may not see the same ordering of variables next time.

Structure
A different goal in MV analysis is to determine the structure of the data: is there an underlying dimension that can describe the data in a simpler fashion? Methods involve classification and/or data reduction, often in terms of latent variables (constructs). Example: the observed variables Giddiness, Silliness, Irrationality, Possessiveness and Misunderstanding reduced to the underlying construct of Love. Interest may be in reducing variables (factor analysis), in group membership (cluster analysis), in stimulus structure (multidimensional scaling, MDS), etc.

Prediction and Structure
Both prediction and structure may be the goal of an analysis, as in SEM and path analysis. How well does the model fit the data?

Multivariate Themes
Multiplicity theme: multiple considerations at all levels of focus, with greater multiplicity generally leading to greater reliability, validity, and generalization. These include multiple theories and hypotheses, multiple empirical studies, multiple measures, multiple time points, multiple controls, multiple samples, practical implications, and multiple statistical methods.

Multivariate Themes
Central themes: all multivariate methods focus on variance (systematic and random), covariance, ratios of variances, and linear combinations (e.g., components, factors).
Interpretation themes: big-picture and specific levels, i.e. macro-assessment (e.g., significance test and effect size) and micro-assessment (e.g., examining means or weights).

Things to consider
Initial variable choice comes down to familiarity with previous research, the instrument used, expertise in the field of study, and common sense. Much of the hard work consists of developing a plan of attack and deciding how to study the problem.

Initial Examination of Data
Preliminary analysis: a thorough initial examination of the data is necessary for a full understanding of any research. Such initial analyses provide a better grasp of what is happening in the data and may inform the MV analysis to a certain extent. However, if the actual goal is interpretation of the UV analyses (as one often sees with MANOVA), the MV analysis is unwarranted.

More to consider
(Introduced here; more details as we discuss each method.)
Assumptions are important for inferences beyond the sample.
Normality: a basic assumption of the General Linear Model; concerned with an elliptical pattern of residuals for the data.
Skewness: the distribution of scores is tilted (asymmetrical). Direction is established by the tail; greater skewness means less normality.
Kurtosis: the degree of peakedness of the data. Three types: leptokurtic (thin), mesokurtic (normal), platykurtic (flattened).
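As a rough illustration of the skewness and kurtosis ideas above, the following sketch computes both from simple population moment formulas. The function name `moments` and the two toy samples are assumptions made for this example only; statistical packages often apply small-sample corrections, so their values may differ slightly.

```python
def moments(xs):
    """Return (skewness, excess kurtosis) from population moments."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # variance (population form)
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return skew, excess_kurt

# Hypothetical toy samples, not data from the slides.
symmetric = [1, 2, 3, 4, 5, 6, 7]       # flat and symmetric
right_tailed = [1, 1, 2, 2, 3, 4, 20]   # one large value creates a right tail

sym_skew, sym_kurt = moments(symmetric)
tail_skew, tail_kurt = moments(right_tailed)

print("symmetric:   ", round(sym_skew, 3), round(sym_kurt, 3))
print("right-tailed:", round(tail_skew, 3), round(tail_kurt, 3))
```

Under this convention a positive skewness indicates a right tail, a negative excess kurtosis indicates a platykurtic shape, and a positive one a leptokurtic shape.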

More to consider
Linearity: the data form a relatively straight, oval-shaped band when plotted.
Homoscedasticity: the variance of one variable is equal at all levels of the other variables; assessed through standard deviations across variables and scatter plots. Referred to as homogeneity of variance in ANOVA methods.
Homogeneity of regression: regression slopes between the covariate and the DV are equal across groups of the IV. We do not want the F statistic for this test to be significant; if it is, the assumption for (M)ANCOVA is violated.

More to consider
Multicollinearity: the correlation coefficient (r) between predictors is noticeably large. It causes instability in the statistical procedure, and we can't differentiate which variables are contributing to the outcome.
Singularity: redundant variables bring the determinant of the matrix to zero.
Orthogonality: allows no association among variables. Not realistic in real-world data, but may allow greater interpretability than data that are too related.
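One way to see singularity in practice is to look at the determinant of the correlation matrix. The sketch below assumes NumPy is available and uses made-up predictors; `x3_redundant` is deliberately built as an exact sum of the other two, which is the extreme case of multicollinearity.

```python
import numpy as np

# Hypothetical predictors; x1 and x2 are correlated but not redundant.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
x3_redundant = x1 + x2  # exact linear combination of x1 and x2

# np.corrcoef treats each row as one variable.
R_ok = np.corrcoef(np.vstack([x1, x2]))
R_singular = np.corrcoef(np.vstack([x1, x2, x3_redundant]))

det_ok = np.linalg.det(R_ok)              # clearly above zero
det_singular = np.linalg.det(R_singular)  # zero up to rounding error

print("determinant without the redundant variable:", det_ok)
print("determinant with the redundant variable:   ", det_singular)
```

A determinant at or near zero means the matrix cannot be reliably inverted, which is why procedures that rely on that inverse become unstable.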

More to consider
Outliers affect the mean (inflating or deflating it), disguising the true relationship. They distort the data, creating noise (error) and a loss of power. Transformations (log or square root) may be helpful with outliers: they reshape the distribution, making it more normal. However, you now have a scale with which you are unfamiliar and which you cannot generalize back to the original.
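The effect of a log transformation on a right-tailed variable can be sketched as follows. The values in `raw` are invented for illustration, and `skewness` is a simple population-moment helper rather than any particular package's estimator.

```python
import math

def skewness(xs):
    """Population skewness: third central moment over variance**1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# Hypothetical scores with one extreme value (64) pulling the tail right.
raw = [1, 2, 2, 3, 4, 8, 16, 64]
logged = [math.log(x) for x in raw]  # requires strictly positive values

skew_raw = skewness(raw)
skew_logged = skewness(logged)

print("skewness before log:", round(skew_raw, 2))
print("skewness after log: ", round(skew_logged, 2))
```

The transformed scores are less skewed, but they are now in log units, which is the interpretability cost noted above.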

Some distinctions
Types of data: nominal/categorical, ordinal, and continuous (interval or ratio). The types of variables involved will say much about which analyses are appropriate and/or how one might proceed with a particular analysis.

Types of data
One thing to keep in mind is that these distinctions are largely arbitrary. One can dichotomize a continuous measure into categories (a bad idea most of the time). An ordinal measure (e.g. a Likert question) has a mean, or reflects a construct, that actually falls along a continuum. How the data are to be treated is largely left to the researcher.
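One reason dichotomizing is usually a bad idea is that it discards information. The sketch below, using invented scores and a hypothetical median split, compares the correlation with an outcome before and after the split; it illustrates the general tendency and is not a proof.

```python
import math

def pearson(xs, ys):
    """Pearson correlation computed from deviation scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical continuous predictor and outcome.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]

# Median split: 0 for the lower half of x, 1 for the upper half.
x_split = [0 if value <= 5 else 1 for value in x]

r_full = pearson(x, y)
r_split = pearson(x_split, y)

print("r with continuous x:  ", round(r_full, 3))
print("r with dichotomized x:", round(r_split, 3))
```

In this toy example the relationship looks weaker after the split, even though the underlying scores are unchanged.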

Sample vs. Population
In typical research we are rarely dealing with a population. The goal is not simply to describe our data but to generalize to the real world from which the sample is taken. This is the purpose of inferential analyses, which require certain assumptions to be met. Many analyses and data collections are, for a variety of (not good) reasons, sample-specific and of little use to the scientific community. Take care in the initial phase of research planning to help guard against such a situation.

The linear combination of variables
Whether of IVs or DVs, a linear combination of variables is often necessary to interpret the data. This idea is essential to thinking multivariately.
Multiple regression: finding the linear combination of IVs that best predicts the DV.
MANOVA: finding the linear combination of DVs that maximizes the distinction between groups.
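For the multiple regression case, the "best" linear combination of IVs can be sketched with ordinary least squares. This example assumes NumPy and uses toy data in which the DV is constructed exactly from the weights 1.0, 2.0 and -0.5, so the recovered weights can be checked; real data would include error.

```python
import numpy as np

# Hypothetical data: six cases, two independent variables.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 6.0],
              [6.0, 5.0]])

# The DV is built as an exact linear combination (an assumption of this sketch).
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1]

# Add a column of ones so the first weight acts as the intercept.
design = np.column_stack([np.ones(len(X)), X])

# Least squares finds the weights minimizing squared prediction error.
weights, *_ = np.linalg.lstsq(design, y, rcond=None)
predicted = design @ weights  # the linear combination of the IVs

print("intercept and weights:", np.round(weights, 3))
```

The vector `predicted` is the linear combination itself: each case's IV scores multiplied by their weights and summed.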

How many variables
Considerations: cost, availability, meaningfulness, theory. For ease of understanding and efficiency we typically want the fewest variables that will explain the most (Ockham's razor).

Statistical power and effect size
A problem that has plagued the social sciences is the lack of power to find subtle effects. Some multivariate procedures require relatively large amounts of data (e.g. SEM). Power and sample size are a required consideration before any attempt at research, multivariate or otherwise, though typically sample size will be determined by the practicalities and limitations of the research. After the fact, emphasis should be placed on effect size and model fit rather than p-values. More later.

The matrices of interest
Data matrix: what you see in SPSS or whatever program you're using. It includes the cases and their corresponding values for the variables of interest.
Correlation matrix (R): contains information about the linear relationship between variables. Each entry is a standardized covariance:
$r_{xy} = \frac{\mathrm{cov}_{xy}}{s_x s_y}$
The matrix is square and symmetrical. Typically only the bottom portion is shown, as the top portion is its mirror image and the diagonal contains all ones (each variable is perfectly correlated with itself).

The matrices of interest
Variance/covariance matrix (Σ): square and symmetrical. The variance of each variable is on the diagonal, and covariances with other variables are on the off-diagonals. In some cases you will have the option to use correlations or covariances as the unit of analysis, with some debate about which is better under what circumstances.

The matrices of interest
Sum of squares and cross-products matrix (S): precursor to the variance/covariance matrix (the values before division by N-1). On the diagonal is each variable's sum of squared deviations from its mean. Off-diagonal elements are the sums of the products of the deviation scores for the two variables.
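The chain from data matrix to S, to the variance/covariance matrix, to the correlation matrix can be sketched in a few lines. This assumes NumPy; the four-case, two-variable `data` array is invented for illustration.

```python
import numpy as np

# Hypothetical data matrix: rows are cases, columns are variables.
data = np.array([[2.0, 1.0],
                 [4.0, 3.0],
                 [6.0, 2.0],
                 [8.0, 6.0]])
n = data.shape[0]

# Deviation scores: each value minus its variable's mean.
deviations = data - data.mean(axis=0)

# Sum of squares and cross-products matrix (S).
S = deviations.T @ deviations

# Variance/covariance matrix: S divided by N - 1.
Sigma = S / (n - 1)

# Correlation matrix: each covariance divided by the two standard deviations.
sd = np.sqrt(np.diag(Sigma))
R = Sigma / np.outer(sd, sd)

print("S:\n", S)
print("Sigma:\n", Sigma)
print("R:\n", np.round(R, 3))
```

The results can be compared against `np.cov(data, rowvar=False)` and `np.corrcoef(data, rowvar=False)`, which compute the last two matrices directly.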

Methods of analysis
A host of methods are available to the researcher. The kind of question asked will help guide the choice of an appropriate analysis; however, the data may be amenable to multiple methods, and almost always are.

Degree of relationship
Bivariate r: the degree of linear relationship between two variables; also partial and semi-partial correlation.
Multiple R: the relationship of a set of variables to another (dependent) variable.
Canonical R: the granddaddy, the relationship between sets of variables.
Methods are also available to assess the relationship among non-continuous variables, e.g. chi-square and multiway frequency analysis.

Group Differences
A very popular research question in the social sciences (too popular, really): is group A different from group B? The answer is always yes and, with a large enough sample, statistically significantly so.
Methods: ANOVA and related techniques; MANOVA, the multivariate counterpart; repeated measures.
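The point about large samples can be sketched with the two-sample t statistic for equal group sizes and a common standard deviation, where the standard error of the mean difference is sd * sqrt(2 / n). The function name and the numbers below are assumptions chosen for illustration.

```python
import math

def t_statistic(mean_diff, sd, n_per_group):
    """Two-sample t for equal group sizes and a common standard deviation."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    return mean_diff / standard_error

# A trivially small group difference: 0.05 standard deviations.
tiny_difference = 0.05

t_small = t_statistic(tiny_difference, sd=1.0, n_per_group=50)
t_large = t_statistic(tiny_difference, sd=1.0, n_per_group=20000)

print("t with 50 per group:    ", round(t_small, 2))   # about 0.25
print("t with 20,000 per group:", round(t_large, 2))   # about 5.0
```

The difference between groups is identical in both cases; only the sample size changed, which is one reason to report effect sizes alongside significance tests.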

Predicting group membership
Turns the group difference question the other way around. Methods: discriminant function analysis and logistic regression.

Structure
Data reduction and classification.
Cluster analysis: seeks to identify homogeneous subgroups of cases or variables based on some measure of distance, i.e. a set of groups in which within-group variation is minimized and between-group variation is maximized.
Principal components and factor analysis: reduce a large number of variables to a smaller number. Often used in psychology for the development of inventories.
Structural equation modeling: where factor analysis and regression meet.
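As a minimal sketch of data reduction, principal components can be obtained from an eigendecomposition of the correlation matrix. This assumes NumPy and uses three invented, positively correlated variables; it shows only the basic idea, not rotation or factor analysis proper.

```python
import numpy as np

# Hypothetical data: six cases measured on three correlated variables.
data = np.array([[1.0, 2.0, 1.0],
                 [2.0, 1.0, 3.0],
                 [3.0, 4.0, 2.0],
                 [4.0, 3.0, 5.0],
                 [5.0, 6.0, 4.0],
                 [6.0, 5.0, 6.0]])

R = np.corrcoef(data, rowvar=False)

# eigh suits symmetric matrices; it returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]  # largest component first
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Share of total variance carried by each component.
explained = eigenvalues / eigenvalues.sum()

# Scores on the first component: standardized data times its weights.
z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
scores = z @ eigenvectors[:, :1]

print("proportion of variance explained:", np.round(explained, 3))
```

When the first component carries most of the variance, three variables can be summarized by a single linear combination with relatively little loss.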

Time course of events
How long is it before some event occurs? How does a DV change over the course of time?
The former question can be answered with survival/failure analysis, e.g. survival rates for a disease, or time before failure for a particular electronic part.
The latter is often examined with time-series analysis, where many time periods are available for analysis, e.g. monthly stock prices over the past five years. It is popular in economics.


Decision tree
Although such guides may be useful, as mentioned before, multiple analyses may be appropriate for the data under consideration. The best plan of attack is to have a well-defined research question and to collect data appropriate to the analysis that will best answer that question.

Multivariate Methods: Quick Glance
Organizational chart based on type of research focus (group differences or correlational).

Research focus      IVs: number & scale            DVs: number & scale   Multivariate method
Group differences   1+ categorical & continuous    1 continuous          ANCOVA
Group differences   1+ categorical                 2+ continuous         MANOVA
Group differences   2+ continuous                  1+ categorical        DFA
Group differences   1+ categorical or continuous   1 categorical         LR
Correlational       2+ continuous                  1 continuous          MR
Correlational       2+ continuous                  2+ continuous         CC
Correlational       -                              2+ continuous         PCA & FA

Note: Scale and number of independent (IV) and dependent (DV) categorical or continuous variables. "+" indicates "or more"; ANCOVA = Analysis of Covariance; MANOVA = Multivariate Analysis of Variance; DFA = Discriminant Function Analysis; LR = Logistic Regression; MR = Multiple Regression; CC = Canonical Correlation; PCA/FA = Principal Components/Factor Analysis.

Summary of Methods
The multivariate methods we will look at are a set of tools for analyzing multiple variables in an integrated and powerful way. They allow the examination of richer and perhaps more realistic designs than can be assessed with traditional univariate methods, which analyze only one outcome variable and usually just one or two independent variables (IVs). Compared to univariate methods, multivariate methods allow us to analyze a complex array of variables, providing greater assurance that we can come to some synthesizing conclusions with less error and more validity than if we were to analyze variables in isolation.