Step 5: Conduct Analysis. The CCA Algorithm

Save this PDF as:

Size: px
Start display at page:

Transcription

1 Model Parameterization: Step 5: Conduct Analysis P Dropped species with fewer than 5 occurrences P Log-transformed species abundances P Row-normalized species log abundances (chord distance) P Selected smallest best subset of variables from each set of independent variables P Chose RDA (in combo with chord distance) 1 The CCA Algorithm Assign arbitrary weights to samples Compute species scores as weighted averages Compute sample scores as weighted averages Regress sample scores on environmental variables and assign new sample scores as the predicted values from regression Rescale (standardize) sample scores Subsequent axes computed from residuals CCA estimates species optima, regression coefficients, and site scores using a Gaussian weighted-averaging model combined with regression. P Gradients are assumed to be linear combinations of the environmental variables. P Key assumption is that of unimodal species distributions along environmental gradients. 2

2 Step 6: Interpret Results P Effectiveness of the Constrained Ordination < Total inertia, eigenvalues, percent variance explained, significance tests, species-environment correlation, cumulative species/sample fit. P Interpreting the Constrained Ordination < Canonical coefficients, inter-set and intra-set correlations, biplot scores. < The Triplot: sample vs species scaling, centroid rule vs biplot rule, general interpretation. < Interpreting species-environment relationships. 3 Effectiveness of the Constrained Ordination Total inertia P Sum of the eigenvalues; total variance in the species data, some of which is accounted for by the environmental constraints. Eigenvalues P Variance in the community matrix attributed to a particular axis; measure of importance of the ordination axis. 4

3 Effectiveness of the Constrained Ordination Percent variance of species data P Proportion of species P For ecological data the variance (total inertia) percentage of explained variance explained by each axis. is usually low; often ~ 10%. Not Computed as to worry, as it is an inherent eigenvalue/total inertia feature of data with a strong (reported as cumulative %). presence/absence aspect. 5 Effectiveness of the Constrained Ordination Significance Tests P Monte Carlo global permutation tests of significance of canonical axes. P Test based on 1 st canonical axis has maximum power against the alternative hypothesis that there is a single dominating gradient that determines the relation between the species and environment. 6

4 Effectiveness of the Constrained Ordination Scores for Sites and Species P Ordination scores (coordinates on ordination axes) given for each site (sample) and each species. Two sets of scores are produced for sites (samples): < Linear combination (LC) scores linear combinations of the independent variables. LC scores are in environmental space, each axis formed by linear combinations of environmental variables. < Weighted averages (WA) scores weighted averages (sums) of the species scores that are as simimlar to LC scores as possible. WA scores are in species space. LC scores show were the site should be; the WA scores show where the site is. 7 Effectiveness of the Constrained Ordination LC scores (samples) WA scores (samples) 8

5 Effectiveness of the Constrained Ordination LC versus WA Scores P Longstanding debate (confusion) over whether to use LC or WA scores. P LC scores are very sensitive to noise in the environmental data, whereas the WA scores are much more stable, being largely determined by the species data rather than being a direct projection in environmental space. Use of LC scores will misrepresent the observed community relationships unless the environmental data are both noiseless and meaningful... CCA (RDA) with LC scores is inappropriate where the objective is to describe community structure. (McCune) 9 Effectiveness of the Constrained Ordination Species-environmental correlations P The multiple correlation coefficient of the final regression. Correlation between site scores computed as weighted averages of species scores (WA s) and site scores computed as linear combinations of environmental variables (LC s); measures how well the extracted variation in community composition can be explained by environmental variables. This is a bad measure of goodness of ordination, because it is sensitive to extreme scores (like correlations are), and very sensitive to overfitting or using too many constraints. (Oksanen) 10

6 Effectiveness of the Constrained Ordination Goodness-of-Fit for Species/Samples P Ordination diagnostic to find out which species/samples are illrepresented in the ordination. < % variance (inertia) explained by selected axes. Plot species with >10% explained 11 Interpreting the Constrained Ordination Canonical Coefficients P Regression coefficients of the linear combinations of environmental variables that define the ordination axes; they are eigenvector coefficients and not very useful for interpretation. Unstandardized coefficients Standardized coefficients 12

7 Interpreting the Constrained Ordination IntRA-set Correlation (structure) Coefficients P Correlations between sample scores derived as linear combinations of environmental variables (LC scores) and environmental variables (i.e., loadings); they are related to the canonical coefficients but differ in several important aspects. 13 Interpreting the Constrained Ordination IntRA-set Correlation (structure) Coefficients P Both canonical coefficients and IntRA-set correlations relate to the rate of change in community composition per unit change in the corresponding environmental variable but for canonical coefficients it is assumed that other environmental variables are being held constant, while for IntRA-set correlations other environmental variables are assumed to covary as they do in the data set. P When variables are intercorrelated canonical coefficients become unstable the multicollinearity problem; however, IntRA-set correlations are not effected. 14

8 Interpreting the Constrained Ordination IntRA-set Correlation (structure) Coefficients P Canonical coefficients and IntRA-set correlations indicate which environmental variables are more influential in structuring the ordination, but cannot be viewed as an independent measure of the strength of relationship between communities and environmental variables. P Probably only meaningful if the site scores are derived as LC scores (which is generally not recommended). They have all the problems of correlations, like being sensitive to extreme values, and they focus on the relationship between single constraints and single axes instead of multivariate analysis. (Oksanen) 15 Interpreting the Constrained Ordination IntER-set Correlation (structure) Coefficients P Correlations between species-derived sample scores (WA scores) and environmental variables. < Should be higher than correlations derived from unconstrained ordination. < Generally lower than IntRA-set correlations. < IntRA-set correlation x species-environment correlation. 16

9 Interpreting the Constrained Ordination IntER-set Correlation (structure) Coefficients P Like IntRA-set correlations, they do not become unstable when the environmental variables are strongly correlated with each other (multicollinear). P Probably only meaningful if the site scores are derived as WA scores (which is generally recommended). They have all the problems of correlations, like being sensitive to extreme values, and they focus on the relationship between single constraints and single axes instead of multivariate analysis. (Oksanen) 17 Interpreting the Constrained Ordination Biplot Scores for Environmental Variables P Environmental variables are typically represented as vectors radiating from the centroid of the ordination. P Biplot scores give the coordinates of the heads of the environmental vectors, and are based on the IntRA-set correlations. 18

10 P The triplot displays the major patterns in the species data with respect to the environmental variables. The Canonical Triplot Tri =(1) Samples (2) Species (3) Environment P But there are a number of ways to scale the triplot which affect the exact quantitative interpretation of the relationships shown. 19 Sample Scaling P Optimally displays intersample relationships. P Variance of sample scores on each axis reflects the importance of the axis as measured by the eigenvalue, whereas the variance of the species scores along the axes are equal. The Canonical Triplot 20

11 Species Scaling P Optimally displays interspecies relationships. P Variance of species scores on each axis reflects the importance of the axis as measured by the eigenvalue, whereas the variance of the sample scores along the axes are equal. The Canonical Triplot Generally the preferred scaling 21 The Centroid Principle P Way of interpreting ordination diagrams in unimodal methods (CCA). P Each species' point is at the centroid (weighted average) of the sample points where it occurs (i.e., center of its niche); the samples that contain a particular species are scattered around that species' point in the diagram. The Canonical Triplot 22 Species-Hill s Scaling

12 The Biplot Rule P Way of interpreting ordination diagrams in linear methods (RDA). P An arrow through the species points in direction of maximum change in abundance of the species. The Canonical Triplot P The order of sites projected onto arrow gives the inferred ranking of the relative abundance of the species across sites. Species-Biplot Scaling 23 The Canonical Triplot General Interpretation regardless of scaling P Sample locations indicate their compositional similarity to each other. P Samples tend to be dominated by the species that are located near them (or projected toward them) in ordination space. 24

13 The Canonical Triplot General Interpretation regardless of scaling P Species locations indicate their distributional similarity to each other. P Species tend to be most present and abundant in the sites that are located near them (or in the direction of their projection) in ordination space. 25 The Canonical Triplot General Interpretation regardless of scaling P Length of environmental vector indicates its importance to the ordination. P Direction of the vector indicates its correlation with each of the axes. P Angles between vectors indicate the correlation between the environmental variables themselves. 26

14 The Canonical Triplot General Interpretation regardless of scaling P In unimodal model (CCA), perpendiculars drawn from species to environmental arrow gives approximate ranking of species response to that variable, and whether species has higherthan-average or lower-than average optimum on that environmental variable. Higher-thanaverage Lower-thanaverage 27 The Canonical Triplot General Interpretation regardless of scaling P In linear model (RDA), direction of arrow drawn from orgin to species and environmental arrow gives approximate relationship between species and environmental variable; i.e., whether species has positive or negative relationship with that environmental variable. Positive correlation Negative correlation 28

15 The Canonical Triplot General Interpretation regardless of scaling P Perpendiculars drawn from samples to environmental arrows gives approximate ranking of sample values for that environmental variable. P And whether a sample has a higher-than-average or lower-than-average value on that environmental variable. Higher-thanaverage Lower-thanaverage P It is more natural to represent nominal environmental variables by points instead of arrows. P The points are located at the centroid of sites that belong to that class. P Classes containing sites with high values for a species will tend to lie close to that species point. 29 The Canonical Triplot Nominal Environmental Variables 30

16 The Canonical Triplot Samples-scaling Species-scaling Don t worry; focus on relationships 31 Major Differences Between RDA and CCA CCA/dCCA P Chi-square distance among samples preserved (approximates unimodal species responses). P Rare species are given disproportionate weight in the analysis. P Unequal weighting of sites in the regression (equal to site totals). RDA P Euclidean distance among samples preserved (appropriate with linear species respones and can approximate unimodal responses with appropriate data transformations). P Rare species are not given high weight in the analysis. P Equal weighting of sites in the regression. 32

17 Major Differences Between RDA and CCA CCA/dCCA P Species-environment correlation equals the correlation between the site scores that are weighted averages of the species scores (WA s) and the site scores that are a linear combination of the environmental variables (LC s). P Ordination diagram can be interpreted using the centroid principle. RDA P Species-environment correlation equals the correlation between the site scores that are weighted sums of the species scores (WA s) and the site scores that are a linear combination of the environmental variables (LC s). P Ordination diagram can be interpreted using the biplot rule. 33 Usefulness of Constrained Ordination Displaying Interpretable Relationships P Constrained ordination (as with any ordination) is ultimately about portraying meaningful speciesenvironment relationships. 34

18 Usefulness of Constrained Ordination Displaying Interpretable Relationships P Constrained ordination (as with any ordination) is ultimately about portraying meaningful speciesenvironment relationships. 35 Usefulness of Constrained Ordination Displaying Interpretable Relationships P Constrained ordination (as with any ordination) is ultimately about portraying meaningful speciesenvironment relationships. 36

19 Multivariate Regression Trees (MRT) P Nonparametric procedure useful for exploration, description, and prediction of species-environment relationships (De ath 2002). P Natural extension of Univariate Regression Trees (URT): with the univariate response of URT being replaced by a multivariate response. P Recursive partitioning of the data space such that the populations within each partition become more and more homogeneous. 37 Important Characteristics of MRT P Makes no assumption about the form of the relationships between species and their environment. P Form of multivariate regression in that the response is explained, and can be predicted, by the explanatory variables. P Method of constrained cluster analysis, because it determines clusters that are similar in a chosen measure of species dissimilarity (e.g., assemblage type), with each cluster being defined by a set of environmental values (e.g., habitat type). 38

20 Univariate Regression Trees (URT) 1.At each node, the tree algorithm searches through the variables one by one, beginning with x 1 and continuing up to x M. 2.For each variable it finds the best split (minimizes the total sums of squares or sums of absolute deviations about the median). 3.Then it compares the M best single-variable splits and selects the best of the best. 4.Recursively partition each node until an over-large tree is grown and then pruned back to the desired size (usually based on cross-validated predicted mean square error). 5.Describe each terminal node (mean, median,variance) and the overall fit of the tree (% variance explained). 39 Multivariate Regression Trees (MRT) P Redefine node impurity by summing the univariate impurity measure over the multivariate response. < Minimize the sums of squared Euclidean distances (SSD) of sites about the node centroid. P Additional ways to interpret the results of MRT: < Identification of species that most strongly determine the splits. < Tree biplots to represent group means and species and site information. < Indentification of species that best characterize the groups (ISA). 40

21 Multivariate Regression Trees (MRT) P Compare the MRT solution to unconstrained clustering using a metric equivalent to the MRT impurity and the same number of clusters. P MRT can be extended to work directly from a dissimilarity matrix; clusters can be formed by splitting the data on environmental values that minimize the intersite SSD. 41 MRT vs CCA vs RDA P Unlike CCA/RDA, MRT is not a method of direct gradient analysis, in the sense of locating sites along ecological gradients. < MRT does locate sites (albeit grouped) in an ecological space defined by the environmental variables. P Unlike CCA/RDA, the MRT solution space is nonlinear and includes interactions between the environmental variables. P MRT is a divisive partitioning technique; CCA/RDA model continuous structure. 42

22 MRT vs CCA vs RDA P MRT emphasizes local structure and interactions between environmental effects; CCA/RDA determine global structure. P MRT does not assume particular relationships between species abundances and environmental characteristics; CCA/RDA assume unimodal and linear response models MRT vs CCA vs RDA P MRT has outperformed CCA/RDA in both simulated and real ecological data sets in both explaining and predicting species composition. P The advantage of MRT increases with the strength of interactions and the nonlinearity of relationships between species composition and the environmental variables. 44

4.7. Canonical ordination

Université Laval Analyse multivariable - mars-avril 2008 1 4.7.1 Introduction 4.7. Canonical ordination The ordination methods reviewed above are meant to represent the variation of a data matrix in a

Multivariate Analysis of Ecological Data

Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology

Dimensionality Reduction: Principal Components Analysis

Dimensionality Reduction: Principal Components Analysis In data mining one often encounters situations where there are a large number of variables in the database. In such situations it is very likely

Design & Analysis of Ecological Data. Landscape of Statistical Methods...

Design & Analysis of Ecological Data Landscape of Statistical Methods: Part 3 Topics: 1. Multivariate statistics 2. Finding groups - cluster analysis 3. Testing/describing group differences 4. Unconstratined

Principal Components Analysis (PCA)

Principal Components Analysis (PCA) Janette Walde janette.walde@uibk.ac.at Department of Statistics University of Innsbruck Outline I Introduction Idea of PCA Principle of the Method Decomposing an Association

SOME NOTES ON STATISTICAL INTERPRETATION. Below I provide some basic notes on statistical interpretation for some selected procedures.

1 SOME NOTES ON STATISTICAL INTERPRETATION Below I provide some basic notes on statistical interpretation for some selected procedures. The information provided here is not exhaustive. There is more to

, then the form of the model is given by: which comprises a deterministic component involving the three regression coefficients (

Multiple regression Introduction Multiple regression is a logical extension of the principles of simple linear regression to situations in which there are several predictor variables. For instance if we

Principal Component Analysis

Principal Component Analysis ERS70D George Fernandez INTRODUCTION Analysis of multivariate data plays a key role in data analysis. Multivariate data consists of many different attributes or variables recorded

Technology Step-by-Step Using StatCrunch

Technology Step-by-Step Using StatCrunch Section 1.3 Simple Random Sampling 1. Select Data, highlight Simulate Data, then highlight Discrete Uniform. 2. Fill in the following window with the appropriate

Lecture - 32 Regression Modelling Using SPSS

Applied Multivariate Statistical Modelling Prof. J. Maiti Department of Industrial Engineering and Management Indian Institute of Technology, Kharagpur Lecture - 32 Regression Modelling Using SPSS (Refer

Applied Multivariate Analysis

Neil H. Timm Applied Multivariate Analysis With 42 Figures Springer Contents Preface Acknowledgments List of Tables List of Figures vii ix xix xxiii 1 Introduction 1 1.1 Overview 1 1.2 Multivariate Models

Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP

Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP ABSTRACT In data mining modelling, data preparation

Data Mining Techniques Chapter 6: Decision Trees

Data Mining Techniques Chapter 6: Decision Trees What is a classification decision tree?.......................................... 2 Visualizing decision trees...................................................

For example, enter the following data in three COLUMNS in a new View window.

Statistics with Statview - 18 Paired t-test A paired t-test compares two groups of measurements when the data in the two groups are in some way paired between the groups (e.g., before and after on the

Data Clustering. Dec 2nd, 2013 Kyrylo Bessonov

Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main

Regression Modeling Strategies

Frank E. Harrell, Jr. Regression Modeling Strategies With Applications to Linear Models, Logistic Regression, and Survival Analysis With 141 Figures Springer Contents Preface Typographical Conventions

12/31/2016. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2

PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 Understand when to use multiple Understand the multiple equation and what the coefficients represent Understand different methods

Canonical Correlation Analysis

Canonical Correlation Analysis LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the similarities and differences between multiple regression, factor analysis,

Quantitative Methods for Finance

Quantitative Methods for Finance Module 1: The Time Value of Money 1 Learning how to interpret interest rates as required rates of return, discount rates, or opportunity costs. 2 Learning how to explain

UNDERSTANDING MULTIPLE REGRESSION

UNDERSTANDING Multiple regression analysis (MRA) is any of several related statistical methods for evaluating the effects of more than one independent (or predictor) variable on a dependent (or outcome)

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Week 1 Week 2 14.0 Students organize and describe distributions of data by using a number of different

Least-Squares Intersection of Lines

Least-Squares Intersection of Lines Johannes Traa - UIUC 2013 This write-up derives the least-squares solution for the intersection of lines. In the general case, a set of lines will not intersect at a

Introduction to Principal Components and FactorAnalysis

Introduction to Principal Components and FactorAnalysis Multivariate Analysis often starts out with data involving a substantial number of correlated variables. Principal Component Analysis (PCA) is a

SPSS Explore procedure

SPSS Explore procedure One useful function in SPSS is the Explore procedure, which will produce histograms, boxplots, stem-and-leaf plots and extensive descriptive statistics. To run the Explore procedure,

SPSS: Descriptive and Inferential Statistics. For Windows

For Windows August 2012 Table of Contents Section 1: Summarizing Data...3 1.1 Descriptive Statistics...3 Section 2: Inferential Statistics... 10 2.1 Chi-Square Test... 10 2.2 T tests... 11 2.3 Correlation...

Review Jeopardy. Blue vs. Orange. Review Jeopardy

Review Jeopardy Blue vs. Orange Review Jeopardy Jeopardy Round Lectures 0-3 Jeopardy Round \$200 How could I measure how far apart (i.e. how different) two observations, y 1 and y 2, are from each other?

496 STATISTICAL ANALYSIS OF CAUSE AND EFFECT

496 STATISTICAL ANALYSIS OF CAUSE AND EFFECT * Use a non-parametric technique. There are statistical methods, called non-parametric methods, that don t make any assumptions about the underlying distribution

Lean Six Sigma Analyze Phase Introduction. TECH 50800 QUALITY and PRODUCTIVITY in INDUSTRY and TECHNOLOGY

TECH 50800 QUALITY and PRODUCTIVITY in INDUSTRY and TECHNOLOGY Before we begin: Turn on the sound on your computer. There is audio to accompany this presentation. Audio will accompany most of the online

Factor Analysis. Chapter 420. Introduction

Chapter 420 Introduction (FA) is an exploratory technique applied to a set of observed variables that seeks to find underlying factors (subsets of variables) from which the observed variables were generated.

Steven M. Ho!and. Department of Geology, University of Georgia, Athens, GA 30602-2501

PRINCIPAL COMPONENTS ANALYSIS (PCA) Steven M. Ho!and Department of Geology, University of Georgia, Athens, GA 30602-2501 May 2008 Introduction Suppose we had measured two variables, length and width, and

Fairfield Public Schools

Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model

Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model Xavier Conort xavier.conort@gear-analytics.com Motivation Location matters! Observed value at one location is

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.

Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

Univariate Regression

Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

Canonical Correlation

Chapter 400 Introduction Canonical correlation analysis is the study of the linear relations between two sets of variables. It is the multivariate extension of correlation analysis. Although we will present

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are

The ith principal component (PC) is the line that follows the eigenvector associated with the ith largest eigenvalue.

More Principal Components Summary Principal Components (PCs) are associated with the eigenvectors of either the covariance or correlation matrix of the data. The ith principal component (PC) is the line

Lecture 10: Regression Trees

Lecture 10: Regression Trees 36-350: Data Mining October 11, 2006 Reading: Textbook, sections 5.2 and 10.5. The next three lectures are going to be about a particular kind of nonlinear predictive model,

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

Statistical Data Mining. Practical Assignment 3 Discriminant Analysis and Decision Trees

Statistical Data Mining Practical Assignment 3 Discriminant Analysis and Decision Trees In this practical we discuss linear and quadratic discriminant analysis and tree-based classification techniques.

Assumptions. Assumptions of linear models. Boxplot. Data exploration. Apply to response variable. Apply to error terms from linear model

Assumptions Assumptions of linear models Apply to response variable within each group if predictor categorical Apply to error terms from linear model check by analysing residuals Normality Homogeneity

, has mean A) 0.3. B) the smaller of 0.8 and 0.5. C) 0.15. D) which cannot be determined without knowing the sample results.

BA 275 Review Problems - Week 9 (11/20/06-11/24/06) CD Lessons: 69, 70, 16-20 Textbook: pp. 520-528, 111-124, 133-141 An SRS of size 100 is taken from a population having proportion 0.8 of successes. An

Module 9: Nonparametric Tests. The Applied Research Center

Module 9: Nonparametric Tests The Applied Research Center Module 9 Overview } Nonparametric Tests } Parametric vs. Nonparametric Tests } Restrictions of Nonparametric Tests } One-Sample Chi-Square Test

Introduction to Quantitative Methods

Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................

Quantitative Data Analysis: Choosing a statistical test Prepared by the Office of Planning, Assessment, Research and Quality

Quantitative Data Analysis: Choosing a statistical test Prepared by the Office of Planning, Assessment, Research and Quality 1 To help choose which type of quantitative data analysis to use either before

Principal Component Analysis

Principal Component Analysis Principle Component Analysis: A statistical technique used to examine the interrelations among a set of variables in order to identify the underlying structure of those variables.

Module 5: Multiple Regression Analysis

Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

Statistical matching: Experimental results and future research questions

Statistical matching: Experimental results and future research questions 2015 19 Ton de Waal Content 1. Introduction 4 2. Methods for statistical matching 5 2.1 Introduction to statistical matching 5 2.2

Factor Analysis. Principal components factor analysis. Use of extracted factors in multivariate dependency models

Factor Analysis Principal components factor analysis Use of extracted factors in multivariate dependency models 2 KEY CONCEPTS ***** Factor Analysis Interdependency technique Assumptions of factor analysis

Yiming Peng, Department of Statistics. February 12, 2013

Regression Analysis Using JMP Yiming Peng, Department of Statistics February 12, 2013 2 Presentation and Data http://www.lisa.stat.vt.edu Short Courses Regression Analysis Using JMP Download Data to Desktop

Steven M. Ho!and. Department of Geology, University of Georgia, Athens, GA 30602-2501

CLUSTER ANALYSIS Steven M. Ho!and Department of Geology, University of Georgia, Athens, GA 30602-2501 January 2006 Introduction Cluster analysis includes a broad suite of techniques designed to find groups

C: LEVEL 800 {MASTERS OF ECONOMICS( ECONOMETRICS)}

C: LEVEL 800 {MASTERS OF ECONOMICS( ECONOMETRICS)} 1. EES 800: Econometrics I Simple linear regression and correlation analysis. Specification and estimation of a regression model. Interpretation of regression

The aspect of the data that we want to describe/measure is the degree of linear relationship between and The statistic r describes/measures the degree

PS 511: Advanced Statistics for Psychological and Behavioral Research 1 Both examine linear (straight line) relationships Correlation works with a pair of scores One score on each of two variables ( and

By Hui Bian Office for Faculty Excellence

By Hui Bian Office for Faculty Excellence 1 Email: bianh@ecu.edu Phone: 328-5428 Location: 2307 Old Cafeteria Complex 2 When want to predict one variable from a combination of several variables. When want

11/20/2014. Correlational research is used to describe the relationship between two or more naturally occurring variables.

Correlational research is used to describe the relationship between two or more naturally occurring variables. Is age related to political conservativism? Are highly extraverted people less afraid of rejection

Lecture 2: Descriptive Statistics and Exploratory Data Analysis

Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This

THE HYBRID CART-LOGIT MODEL IN CLASSIFICATION AND DATA MINING. Dan Steinberg and N. Scott Cardell

THE HYBID CAT-LOGIT MODEL IN CLASSIFICATION AND DATA MINING Introduction Dan Steinberg and N. Scott Cardell Most data-mining projects involve classification problems assigning objects to classes whether

Statistics and research

Statistics and research Usaneya Perngparn Chitlada Areesantichai Drug Dependence Research Center (WHOCC for Research and Training in Drug Dependence) College of Public Health Sciences Chulolongkorn University,

Course Objective This course is designed to give you a basic understanding of how to run regressions in SPSS.

SPSS Regressions Social Science Research Lab American University, Washington, D.C. Web. www.american.edu/provost/ctrl/pclabs.cfm Tel. x3862 Email. SSRL@American.edu Course Objective This course is designed

MULTIVARIATE DATA ANALYSIS WITH PCA, CA AND MS TORSTEN MADSEN 2007

MULTIVARIATE DATA ANALYSIS WITH PCA, CA AND MS TORSTEN MADSEN 2007 Archaeological material that we wish to analyse through formalised methods has to be described prior to analysis in a standardised, formalised

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct

Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris

Class #6: Non-linear classification ML4Bio 2012 February 17 th, 2012 Quaid Morris 1 Module #: Title of Module 2 Review Overview Linear separability Non-linear classification Linear Support Vector Machines

Inferential Statistics

Inferential Statistics Sampling and the normal distribution Z-scores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are

SEX DISCRIMINATION PROBLEM

SEX DISCRIMINATION PROBLEM 12. Multiple Linear Regression in SPSS In this section we will demonstrate how to apply the multiple linear regression procedure in SPSS to the sex discrimination data. The numerical

Chapter 2: Looking at Data Relationships (Part 1)

Chapter 2: Looking at Data Relationships (Part 1) Dr. Nahid Sultana Chapter 2: Looking at Data Relationships 2.1: Scatterplots 2.2: Correlation 2.3: Least-Squares Regression 2.5: Data Analysis for Two-Way

Chapter 14: Analyzing Relationships Between Variables

Chapter Outlines for: Frey, L., Botan, C., & Kreps, G. (1999). Investigating communication: An introduction to research methods. (2nd ed.) Boston: Allyn & Bacon. Chapter 14: Analyzing Relationships Between

Graphical and Tabular. Summarization of Data OPRE 6301

Graphical and Tabular Summarization of Data OPRE 6301 Introduction and Re-cap... Descriptive statistics involves arranging, summarizing, and presenting a set of data in such a way that useful information

Outline of Topics. Statistical Methods I. Types of Data. Descriptive Statistics

Statistical Methods I Tamekia L. Jones, Ph.D. (tjones@cog.ufl.edu) Research Assistant Professor Children s Oncology Group Statistics & Data Center Department of Biostatistics Colleges of Medicine and Public

Research Variables. Measurement. Scales of Measurement. Chapter 4: Data & the Nature of Measurement

Chapter 4: Data & the Nature of Graziano, Raulin. Research Methods, a Process of Inquiry Presented by Dustin Adams Research Variables Variable Any characteristic that can take more than one form or value.

Example: Boats and Manatees

Figure 9-6 Example: Boats and Manatees Slide 1 Given the sample data in Table 9-1, find the value of the linear correlation coefficient r, then refer to Table A-6 to determine whether there is a significant

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression

Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction

Doing Quantitative Research 26E02900, 6 ECTS Lecture 2: Measurement Scales. Olli-Pekka Kauppila Rilana Riikkinen

Doing Quantitative Research 26E02900, 6 ECTS Lecture 2: Measurement Scales Olli-Pekka Kauppila Rilana Riikkinen Learning Objectives 1. Develop the ability to assess a quality of measurement instruments

Local outlier detection in data forensics: data mining approach to flag unusual schools

Local outlier detection in data forensics: data mining approach to flag unusual schools Mayuko Simon Data Recognition Corporation Paper presented at the 2012 Conference on Statistical Detection of Potential

Evaluating DETECT Classification Accuracy and Consistency when Data Display Complex Structure

DETECT Classification 1 Evaluating DETECT Classification Accuracy and Consistency when Data Display Complex Structure Mark J. Gierl Jacqueline P. Leighton Xuan Tan Centre for Research in Applied Measurement

Cluster Analysis. Isabel M. Rodrigues. Lisboa, 2014. Instituto Superior Técnico

Instituto Superior Técnico Lisboa, 2014 Introduction: Cluster analysis What is? Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

Descriptive Statistics

Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

Exercise 1.12 (Pg. 22-23)

Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Readings: Ha and Ha Textbook - Chapters 1 8 Appendix D & E (online) Plous - Chapters 10, 11, 12 and 14 Chapter 10: The Representativeness Heuristic Chapter 11: The Availability Heuristic Chapter 12: Probability

Introduction to Principal Component Analysis: Stock Market Values

Chapter 10 Introduction to Principal Component Analysis: Stock Market Values The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from

EPS 625 ANALYSIS OF COVARIANCE (ANCOVA) EXAMPLE USING THE GENERAL LINEAR MODEL PROGRAM

EPS 6 ANALYSIS OF COVARIANCE (ANCOVA) EXAMPLE USING THE GENERAL LINEAR MODEL PROGRAM ANCOVA One Continuous Dependent Variable (DVD Rating) Interest Rating in DVD One Categorical/Discrete Independent Variable

Supervised Feature Selection & Unsupervised Dimensionality Reduction

Supervised Feature Selection & Unsupervised Dimensionality Reduction Feature Subset Selection Supervised: class labels are given Select a subset of the problem features Why? Redundant features much or

Social Media Mining. Data Mining Essentials

Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

Data Mining - Evaluation of Classifiers

Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010

Linear Models and Conjoint Analysis with Nonlinear Spline Transformations

Linear Models and Conjoint Analysis with Nonlinear Spline Transformations Warren F. Kuhfeld Mark Garratt Abstract Many common data analysis models are based on the general linear univariate model, including

SPSS Tests for Versions 9 to 13

SPSS Tests for Versions 9 to 13 Chapter 2 Descriptive Statistic (including median) Choose Analyze Descriptive statistics Frequencies... Click on variable(s) then press to move to into Variable(s): list

Chapter 1 Introduction. 1.1 Introduction

Chapter 1 Introduction 1.1 Introduction 1 1.2 What Is a Monte Carlo Study? 2 1.2.1 Simulating the Rolling of Two Dice 2 1.3 Why Is Monte Carlo Simulation Often Necessary? 4 1.4 What Are Some Typical Situations

!"!!"#\$\$%&'()*+\$(,%!"#\$%\$&'()*""%(+,'-*&./#-\$&'(-&(0*".\$#-\$1"(2&."3\$'45"

!"!!"#\$\$%&'()*+\$(,%!"#\$%\$&'()*""%(+,'-*&./#-\$&'(-&(0*".\$#-\$1"(2&."3\$'45"!"#"\$%&#'()*+',\$\$-.&#',/"-0%.12'32./4'5,5'6/%&)\$).2&'7./&)8'5,5'9/2%.%3%&8':")08';:

POLYNOMIAL AND MULTIPLE REGRESSION. Polynomial regression used to fit nonlinear (e.g. curvilinear) data into a least squares linear regression model.

Polynomial Regression POLYNOMIAL AND MULTIPLE REGRESSION Polynomial regression used to fit nonlinear (e.g. curvilinear) data into a least squares linear regression model. It is a form of linear regression

Analysing Ecological Data

Alain F. Zuur Elena N. Ieno Graham M. Smith Analysing Ecological Data University- una Landesbibliothe;< Darmstadt Eibliothek Biologie tov.-nr. 4y Springer Contents Contributors xix 1 Introduction 1 1.1

Multivariate Analysis of Variance (MANOVA)

Chapter 415 Multivariate Analysis of Variance (MANOVA) Introduction Multivariate analysis of variance (MANOVA) is an extension of common analysis of variance (ANOVA). In ANOVA, differences among various

Statistical Machine Learning

Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes