Lecture 12: Indirect gradient analysis IV: weighted averaging, Correspondence Analysis (CA) and Detrended Correspondence Analysis (DCA)
- Rudolf Flynn
1 Lecture 12: Indirect gradient analysis IV: weighted averaging, Correspondence Analysis (CA) and Detrended Correspondence Analysis (DCA) 1. Why go further than PCA? 2. Weighted averaging 3. Correspondence Analysis (CA) 4. Detrended Correspondence Analysis (DCA) 5. Biplot diagrams 6. Review of all ordination techniques
2 PCA: why go further? The linear model of PCA usually doesn't fit most species data. Species come and go across a gradient, often with a unimodal response (a Gaussian or bell-shaped distribution). If you sample across a broad enough gradient, this unimodal pattern usually occurs. Correspondence analysis is another eigenvector-based technique, one that assumes a unimodal, rather than linear, relationship among the variables. In terms of what you can do with it and what the graphs look like, it is very similar to PCA; the big change is how the axes are derived.
3 Weighted Averaging A form of direct gradient analysis, weighted averaging was first used for vegetation data in 1967 by Robert Whittaker. It led quickly to reciprocal averaging (correspondence analysis) by Curtis and McIntosh in their studies of upland forests in southern Wisconsin. It is an important precursor to the application of more advanced multivariate techniques. Knowledge of species response along a known environmental gradient is used to order stands of vegetation along the same gradient.
4 Weighted averaging The weighted average for each stand is calculated by multiplying the abundance of each species by that species' weighting factor, summing these weighted cover scores across all species, and dividing by the sum of the unweighted cover scores of all species.
5 Weighted averaging: an example Suppose that we had a set of vegetation samples and knew the wetland status of all the species within them. We could then calculate a wetland rating for each sample by weighting the abundance of each species by its wetland ranking.

Sample A:

SPECIES   WETLAND RANKING   COVER   WR x C
Juntri    ...               ...     ...
Bigglu    ...               ...     ...
Lupplu    ...               ...     ...
sum:                        9       15

In this example the wetland rating for the sample is 15/9 = 1.67, reflecting that it lies toward the dry end of the gradient (closer to 1 than to 3). (Figure: our sample placed on the wetland scale.) If there were several samples, they could be ordered in a one-dimensional ordination according to their weighted wetland ratings.
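The calculation above can be sketched in code. The per-species rankings and covers below are hypothetical placeholders (the slide's individual values did not survive), chosen only so that the totals match the example: covers sum to 9 and weighted scores to 15, giving 15/9 = 1.67.

```python
# Weighted averaging for one sample. The rankings and covers are
# hypothetical, chosen so the totals match the slide's example
# (cover sums to 9, ranking x cover sums to 15).
sample_a = {
    # species: (wetland_ranking, cover)
    "Juntri": (1, 5),
    "Bigglu": (2, 2),
    "Lupplu": (3, 2),
}

def wetland_rating(sample):
    """Sum(ranking * cover) / Sum(cover) for a single sample."""
    weighted = sum(rank * cover for rank, cover in sample.values())
    total_cover = sum(cover for _, cover in sample.values())
    return weighted / total_cover

print(round(wetland_rating(sample_a), 2))  # 1.67
```

With several samples, sorting them by this rating gives the one-dimensional ordination described above.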
6 Weighted averaging: an example (cont.) INTERPRETATION The method is a form of direct gradient analysis. It shows only one axis and is useful for situations where there is only one primary environmental gradient under consideration. The major criticism of the method is that the original weights assigned to each species are based on a subjective assessment of the position of the species with respect to the gradient, and these scores could vary from one worker to another.
7 Reciprocal averaging (Correspondence Analysis) Reciprocal averaging works on much the same principle as weighted averaging, but simultaneously orders both the columns and rows of a matrix. Also, rather than forcing an external structure onto the results through a weighting factor, it finds the inherent structure within a data set. It provides the basis for more advanced ordination methods developed later; papers by Hill (1973, 1974) first made CA well known to ecologists.
8 Basic idea of CA: to get a simultaneous ordination of the variables in the columns and rows (sample plots and species) based on the inherent structure of the data, i.e. the redundancy created by groups of species co-occurring within the set of plots and by multiple plots sharing the same groups of species. If you arrange a set of species and samples according to their first-axis CA order, they will look similar to a sorted B-B (Braun-Blanquet) table, with dominant species in the middle and rare ones at the ends. The method of weighted averages is applied to a data matrix such that quadrat scores are derived from species scores and weightings. These are carried out successively using an iterative procedure, and the scores eventually stabilize to give a set of scores for quadrats and a set of scores for species. This two-way weighted averaging is only slightly more complex than the one-way weighted averaging of the preceding example.
9 Calculation of the first axis Calculate the row and column totals and allocate arbitrary weights (scores) to the species. Reciprocal averaging then commences: sample scores are calculated as weighted averages of the species scores, and the averaging process is then applied in reverse to give a new set of scores for the species using the sample scores. To avoid calculation with very small numbers, the scores are rescaled to a 0 to 100 range each iteration. The species scores of the final iteration are the positions of the species along the first axis (0 to 100) of the ordination, and the sample scores are the positions of the samples along the first axis.
10 Reciprocal averaging (an example): a species-by-sample cover matrix (Juntri, Bigglu, Lupplu by Samples 1-3) with a total, an average, and a rescaled score for each sample. Rescaling: (average score - lowest average score) / (range of the average scores) x 100. Example: rescaled value for Sample 1 = (... - ...) / (...) x 100 = ...
11 Reciprocal averaging (an example): the next step is to calculate an average value for each species, weighted by the first-axis sample scores. For Juntri: ((83.3 x 2) + (100 x 8) + (1 x 1)) / 11 = 87.97, and so on for the other species.
12 Reciprocal averaging (an example): this new species vector is used to calculate a new weighted average for each sample, which is then rescaled. For Sample 1: ((87.97 x 2) + (72.03 x 4) + (78.88 x 8)) / 14 = 78.22, and so on. Note: rescaling is done to keep the values from getting very small and to keep them in a reasonable range. It is done once each iteration (for the sample scores in this case).
13 Reciprocal averaging (an example): keep going, alternating weighted averages and rescaling (the slide's table carries the example through six iterations), until the amount of change between successive iterations for both the species and sample vectors is minimized. The species scores are rescaled on the last iteration, so that both the species and sample axes are scaled from 0 to 100.
14 Reciprocal averaging (an example): If there is indeed a diagonal structure to the matrix (where the species and samples correspond to a diagonal along the middle), then the scores will stabilize. Once they stabilize, you have the first axis! The species axis is usually used as the convergence test, and the sample axis is the weighted average of the species axis.
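The iteration just described can be sketched as follows. The 3 x 3 cover matrix is hypothetical (only fragments of the slide's matrix survive; the Juntri row and the Sample 1 column do match the worked example), and the starting scores and tolerance are arbitrary. The structure follows the slides: alternate weighted averages, with the sample scores rescaled to 0-100 each iteration.

```python
# One-axis reciprocal averaging: alternate weighted averaging of species
# and sample scores, rescaling the sample scores each iteration, until
# the scores stabilize.

def rescale(scores):
    """Linearly rescale a list of scores to the range 0-100."""
    lo, hi = min(scores), max(scores)
    return [100.0 * (s - lo) / (hi - lo) for s in scores]

def first_axis(cover, tol=1e-9, max_iter=500):
    n_species, n_samples = len(cover), len(cover[0])
    species = [float(i) for i in range(n_species)]  # arbitrary start
    samples = []
    for _ in range(max_iter):
        # sample scores: weighted average of species scores, then rescale
        samples = rescale([
            sum(cover[i][j] * species[i] for i in range(n_species)) /
            sum(cover[i][j] for i in range(n_species))
            for j in range(n_samples)
        ])
        # species scores: weighted average of sample scores (no rescale)
        new_species = [
            sum(cover[i][j] * samples[j] for j in range(n_samples)) /
            sum(cover[i][j] for j in range(n_samples))
            for i in range(n_species)
        ]
        converged = max(abs(a - b) for a, b in zip(new_species, species)) < tol
        species = new_species
        if converged:
            break
    # rescale species scores once, on the final iteration
    return rescale(species), samples

cover = [[2, 8, 1],   # Juntri (row matches the slide's totals)
         [4, 1, 0],   # Bigglu (hypothetical)
         [8, 0, 1]]   # Lupplu (hypothetical)
species_scores, sample_scores = first_axis(cover)
```

Both returned vectors run from 0 to 100 and give the positions of the species and samples along the first axis.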
15 Calculation of the eigenvalue As with PCA, the eigenvalue can be thought of as a measure of the proportion of the total variation in the data explained by the axis. It is obtained by taking the range of the unscaled scores in the final iteration and dividing by the range of the rescaled scores. In our example, the eigenvalue of the first axis = (...)/100 = ... There is no specific eigenvalue cutoff for significant axes, but generally axes with eigenvalues >0.25 should be considered in the analysis. In our trivial example with 3 plots and 3 species, the eigenvalue of the first axis is close to 0.25, and no higher axes would be close to this.
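As a numeric illustration of the formula above (the slide's own numbers were lost, so the final-iteration scores here are made up):

```python
# Eigenvalue as described above: range of the unscaled scores in the
# final iteration divided by the range of the rescaled scores (100).
# The scores below are invented for illustration only.
unscaled_final = [12, 30, 55, 87]
eigenvalue = (max(unscaled_final) - min(unscaled_final)) / 100.0
print(eigenvalue)  # 0.75
```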
16 Calculation of the second axis It is possible to extract a second axis. This will likely be necessary if there are plots that lie close together on the first axis but which differ greatly in species composition. The second axis is extracted by the same iteration process, with one extra step in which the trial scores for the second axis are made uncorrelated with the first axis. In other words, the linear correlation with the first axis is removed. This is done by taking the trial scores for the second axis and regressing them against the site scores for the first axis; the residuals from this regression become the new trial axis. This is done once for the sample scores at each iteration.
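The extra step can be sketched with a simple least-squares regression. All score values here are hypothetical.

```python
# Make trial second-axis scores uncorrelated with the first axis by
# regressing them on the first-axis site scores and keeping the
# residuals. The scores are hypothetical.
import numpy as np

axis1 = np.array([0.0, 25.0, 50.0, 75.0, 100.0])   # final first-axis scores
trial2 = np.array([10.0, 40.0, 20.0, 80.0, 55.0])  # trial second-axis scores

# least-squares fit trial2 = intercept + slope * axis1
slope, intercept = np.polyfit(axis1, trial2, 1)
residuals = trial2 - (intercept + slope * axis1)   # the new trial axis

# the residuals have (essentially) zero correlation with axis 1
print(abs(np.corrcoef(axis1, residuals)[0, 1]) < 1e-8)
```

Because ordinary least squares leaves residuals orthogonal to the regressor, the new trial axis is linearly uncorrelated with the first axis by construction.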
17 Example data set for reciprocal averaging, from Kent and Coker. It uses binary (presence/absence) data and several species.
18 Forming the ordination The positions of the quadrats and species in the ordination space are determined by the species scores and quadrat scores for the first two axes. The table below provides the scores for the samples and species for the second axis. The first axis scores were derived in Table 6.1 in Kent and Coker.
19 Review of reciprocal averaging (correspondence analysis) Axis 1 Assign random scores to each species. Use these to calculate a weighted average for each sample. Rescale these sample scores. Use the sample scores to calculate a weighted average for each species. Use the species scores to calculate a weighted average for each sample. Continue until scores converge to a unique solution. Axis 2 Assign a new set of random scores to each species. Calculate a trial axis as above. Perform a multiple regression between the trial axis and the final axis obtained (above). Take the residual values as the new trial axis.
20 Problems with CA THE ARCH EFFECT: a mathematical artifact corresponding to no real structure in the data; the second axis is a quadratic distortion of the first axis. In data sets where there is no strong controlling gradient for the second axis, the arch effect is likely to occur. COMPRESSION NEAR THE ENDS OF THE AXES: related to the arch effect; the compressed axis does not show the actual comings and goings of species along the first axis.
21 Arch effect and compression of the axes resulting from CA in an idealized data set In this data set there should be no variation in the y axis scores of plots 1 to 7. The CA will create variation where there is none. It also compresses the Axis 1 scores towards the ends of the axis. Detrending and rescaling in DCA removes these effects.
22 Detrended Correspondence Analysis (DCA) Correcting for the arch effect (detrending) The first axis is divided into a number of segments and within each segment, the second axis scores are recalculated so that they have an average of zero. Correcting for the compression effect Also overcome by segmenting the first axis and rescaling the species ordination (not the sample ordination), such that the coming and going of species is about equal along the gradient.
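The detrending-by-segments step can be sketched as follows. The arched configuration and the choice of four segments are arbitrary; within each segment of the first axis, second-axis scores are shifted so that their mean is zero.

```python
# Detrending by segments: cut the first axis into segments and, within
# each segment, shift the second-axis scores so their mean is zero.
# The arched data and 4 segments are arbitrary choices.
import numpy as np

axis1 = np.linspace(0.0, 100.0, 20)
axis2 = 50.0 - (axis1 - 50.0) ** 2 / 50.0   # artificial arch

n_segments = 4
edges = np.linspace(axis1.min(), axis1.max(), n_segments + 1)
detrended = axis2.copy()
for k in range(n_segments):
    lo, hi = edges[k], edges[k + 1]
    if k < n_segments - 1:
        in_seg = (axis1 >= lo) & (axis1 < hi)
    else:
        in_seg = (axis1 >= lo) & (axis1 <= hi)   # include the right edge
    detrended[in_seg] -= axis2[in_seg].mean()
```

After the loop, each segment of second-axis scores is centered on zero, which flattens the arch (DCA's rescaling of the segments is a separate step, not shown here).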
23 Detrended Correspondence Analysis (DCA): scaling of the axes Scaling of the axes in sd units The axes in DCA are scaled into units that are the average standard deviation of species turnover (sd units). A 50% change in species composition occurs over a distance of about 1 sd unit. Species appear, rise to their modes, and disappear over a distance of about 4 sd units. The more sd units that occur along the axis, the more change in species composition is shown. Thus, the sd units of DCA are a useful measure of beta diversity in the total data set. The positions of samples along the 1st axis are thus shifted to equalize beta diversity, and species have, on average, a habitat breadth (as measured by standard deviations) of 1.
24 Detrended Correspondence Analysis (DCA): an example comparing results from CA with DCA
25 Output from PC-ORD DCA List of residuals: eigenvectors are found by iteration, and the residuals indicate how close a given vector is to an eigenvector at each iteration. A vector is considered an eigenvector if its residual falls below a small threshold. The eigenvalue is calculated at the final iteration. For the first axis it can be considered in the same way as in PCA, as an indicator of the amount of variance accounted for by that axis. However, for higher axes, detrending in DCA destroys the correspondence between the eigenvalue and the structure along the axis. PC-ORD recommends using an after-the-fact coefficient of determination between the relative Euclidean distance in the unreduced species space and the Euclidean distance in the ordination space (Graph / Ordination / Statistics / Percent of Variance). Length of segments: these lengths indicate the degree of rescaling that occurred. For example, small values for the middle segments suggest that the gradient was compressed at the ends before rescaling. Length of the gradient: this is the total length of the axis before and after rescaling. The units are sd units, a measure of beta diversity; four sd units is approximately equal to one complete turnover of species. Note: in PC-ORD 5, the length of the gradient is multiplied by 100.
26 PC-ORD DCA output (cont.) At the end of the output are the species scores for the three axes and their rankings along the first two axes, along with the eigenvalue for each axis. This is followed by a similar table for the sample scores.
27 How do you tell if DCA is for you? A general rule of thumb: if the first axis is less than 2 sd units long, you should consider PCA; if it is more than 4 units long, DCA will likely be appropriate; in between, you will have to look harder at the data and make some educated choices.
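The rule of thumb can be written down directly; the function name and the wording of the in-between case are of course just illustrative.

```python
def suggested_ordination(gradient_length_sd):
    """Rule of thumb from the slide: gradient length of the first
    axis in sd units -> suggested indirect ordination method."""
    if gradient_length_sd < 2:
        return "PCA (short gradient, roughly linear responses)"
    if gradient_length_sd > 4:
        return "DCA (long gradient, clear unimodal responses)"
    return "either; examine the data and make an educated choice"

print(suggested_ordination(1.5))
print(suggested_ordination(5.0))
print(suggested_ordination(3.0))
```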
28 Criticism of DCA The methods used for correcting the arch effect and the compression have no empirical or theoretical basis (Wartenberg et al. 1987). The assumption that species turnover is constant or even along gradients is likely not true. DCA removes, with brute force if needed, any arch, even if the arch reflects real structure in your data set. Because detrending is done by arbitrarily dividing the axis into pieces and then shifting those pieces up or down (to give a constant mean), relationships within a segment are maintained, but relationships across segments can be ripped apart.
29 DCA OVERALL Despite these criticisms, DCA remains one of the most popular methods of indirect gradient analysis and is computationally very efficient. It is the best method to use when there are no environmental data. The interpretation of results from DCA is best carried out with some knowledge of its limitations and with comparison against other techniques. By doing this, one can get a better feel for patterns created by actual structure within the data set.
30 Joint plots, Biplots, Triplots A useful diagram that simultaneously shows the ordination together with the trends and strength of environmental factor correlations with the species data (Gabriel, 1971). A biplot is scaled somewhat differently than a joint plot: biplot scaling is centered and normalized, while joint plot scaling is not. (These scaling options are part of the CCA ordination.) Both species and environmental factors are plotted on the same graph but using different scales. Arrows are drawn from the centroid (origin) of the centered ordination axes to the points representing the environmental variables. The direction of an arrow indicates the direction in which the abundance of a variable increases most rapidly, and its length indicates the rate of change in abundance in that direction. The method of joint plot calculation used in PC-ORD can be applied to any ordination technique. In PC-ORD the number of variables displayed is controlled by the joint plot cutoff: only variables in the 2nd matrix with r² values greater than the cutoff are displayed.
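One way the arrow for a single environmental variable can be derived (a sketch only; PC-ORD's exact scaling may differ): correlate the variable with each ordination axis and draw the arrow from the origin to those correlations. The ordination scores and the "moisture" variable below are simulated.

```python
# Sketch of a joint-plot arrow for one environmental variable: its
# correlation with each ordination axis gives the arrow's direction
# and length. All data are simulated for illustration.
import numpy as np

rng = np.random.default_rng(42)
axis1 = rng.normal(size=50)          # sample scores on ordination axis 1
axis2 = rng.normal(size=50)          # sample scores on ordination axis 2
moisture = 2.0 * axis1 + rng.normal(scale=0.5, size=50)  # tracks axis 1

r1 = np.corrcoef(moisture, axis1)[0, 1]
r2 = np.corrcoef(moisture, axis2)[0, 1]
arrow = (r1, r2)            # arrow drawn from the origin to this point
r_squared = r1**2 + r2**2   # approximates the r2 compared to the cutoff
                            # (exact only if the axes are uncorrelated)
```

Here the arrow points strongly along axis 1, as expected, and a variable whose r_squared falls below the joint-plot cutoff would simply not be drawn.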
31 Joint plot: an example using a sample plot from DCA
32 Review of ordination techniques

Technique: Bray and Curtis (Polar Ordination)
Summary: The first ordination technique. The axes are calculated using floristic similarity coefficients. The samples that are least similar to each other form the ends of the first axis, and other plots are positioned along this axis according to their dissimilarity to both ends. The length of the axis is the dissimilarity between the plots at its two ends. The second axis is artificially made perpendicular to the first axis.
When to use: Any plant community data set can be ordinated with Bray and Curtis. The technique is not used much now because the first and second axes are not mathematically orthogonal (uncorrelated), and there is criticism of the subjective choice of the axis end points. It is, however, an easily understood method and is good for teaching the principles of ordination. It has recently regained favor among some ecologists because modern computer algorithms have eliminated much of the subjectivity in the method (e.g., BCORD used in the PC-ORD program), and it has performed well in comparison with other methods.

Technique: Principal Components Analysis (PCA)
Summary: PCA is based on the assumption that there are linear correlations among the variables being reduced. It is an eigenvector technique. It reduces data sets with many highly correlated variables to a set of components that are uncorrelated with each other. The eigenvalues are correlation coefficients, and can therefore be used directly to measure the variance explained by the ordination.
When to use: Any data set with high linear correlations between variables is appropriate for PCA. This is often the case with series of related data such as climate, soils, or biogeochemical data from plots. It can be appropriate for species data that have mostly linear correlations. It is generally not applied to plant community data with high beta diversity (high species turnover along the first axis).

Technique: Reciprocal Averaging (Correspondence Analysis, CA)
Summary: CA is also an eigenvector method. It simultaneously solves for the scores of plots and species through an iterative process that eventually results in scores that don't change with additional iterations; these scores are the solution for each axis. It is based on the assumption that the data have a unimodal response (Gaussian distribution) to an underlying gradient. The axes are generally scaled from 0 to 100, and both species and samples are plotted in the same ordination space.
When to use: It can be used for most plant community data, but the arch effect and the compression of scores toward the ends of the axes can be severe limitations. A great advantage is that both plots and species are plotted on the same axes.
33 Review of all ordination techniques (cont.)

Technique: Detrended Correspondence Analysis (DCA)
Summary: DCA is similar to CA except that it corrects for the arch effect and the compression of scores toward the ends of the axes. It does this by dividing the axis into segments and recalculating the scores within each segment so that the average score is zero. It also rescales each segment so that the coming and going of species in each segment is about equal. The axes are measured in sd units, which measure the turnover of species along the axes.
When to use: DCA was extensively used in the 1980s and 1990s, but has lost favor in the past few years because the detrending process can have undesirable effects on ordinations. It remains a very good method for ordinating plots and species when there are no environmental data.

Technique: Other methods: Canonical Correspondence Analysis (CCA), Non-metric Multidimensional Scaling (NMDS)
Summary: CCA is a direct ordination method that uses correlation and regression between the floristic data and environmental data. Through an iterative process similar to that in CA and DCA, it selects the linear combination of environmental variables that explains most of the variation in the species scores on each axis. NMDS has only recently gained favor among ecologists (Clarke 1993). It is an iterative search for the best positions of n entities in k dimensions (axes) that minimizes the stress of the k-dimensional configuration.
When to use: These more advanced techniques have gained favor in the past few years, and students wishing to use ordination for their theses are advised to seriously consider them. CCA requires a good set of environmental data, whereas all other ordinations can be based purely on floristic data (environmental relationships are determined indirectly with these other methods).
34 Differences and advantages of NMDS vs. DCA

Topic                                                   DCA                                    NMDS
Computation time                                        Low                                    High
Distance metric                                         Do not need to specify                 Highly sensitive to choice
Simultaneous ordination of species and plots            Yes                                    No
Arch effect                                             Artificially and inelegantly removed   Rarely occurs
Related to direct gradient analysis methods             Yes                                    No
Need to pre-specify number of dimensions                No                                     Yes
Need to pre-specify parameters (no. of segments, etc.)  Yes                                    No
Solution changes depending on no. of axes viewed        No                                     Yes
Handles samples with high noise levels                  Yes                                    No?
Axes show measure of beta diversity (sd units)          Yes                                    No
Used in other disciplines                               No                                     Widely
Axes interpretable as gradients                         Yes                                    No
Derived from a model of species response to gradients   Yes                                    No
Guaranteed to reach the global solution                 Yes                                    No

Adapted from Michael Palmer's web page.
35 Two especially good references for ordination users Analysis of Ecological Communities, by Bruce McCune and James B. Grace, with Dean L. Urban, on methods for analyzing multivariate data in community ecology, published by MjM Software Design: an expanded supplement to the PC-ORD manual, by the authors of PC-ORD. Ordination Methods: An Overview, by Michael Palmer: a condensed modern overview of the wide variety of ordination methods.
More informationSimple Regression Theory II 2010 Samuel L. Baker
SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the
More informationOrthogonal Diagonalization of Symmetric Matrices
MATH10212 Linear Algebra Brief lecture notes 57 Gram Schmidt Process enables us to find an orthogonal basis of a subspace. Let u 1,..., u k be a basis of a subspace V of R n. We begin the process of finding
More informationDESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.
DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,
More informationby the matrix A results in a vector which is a reflection of the given
Eigenvalues & Eigenvectors Example Suppose Then So, geometrically, multiplying a vector in by the matrix A results in a vector which is a reflection of the given vector about the y-axis We observe that
More informationElements of a graph. Click on the links below to jump directly to the relevant section
Click on the links below to jump directly to the relevant section Elements of a graph Linear equations and their graphs What is slope? Slope and y-intercept in the equation of a line Comparing lines on
More informationDATA ANALYSIS II. Matrix Algorithms
DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where
More informationNotes on Orthogonal and Symmetric Matrices MENU, Winter 2013
Notes on Orthogonal and Symmetric Matrices MENU, Winter 201 These notes summarize the main properties and uses of orthogonal and symmetric matrices. We covered quite a bit of material regarding these topics,
More information2. Linearity (in relationships among the variables--factors are linear constructions of the set of variables) F 2 X 4 U 4
1 Neuendorf Factor Analysis Assumptions: 1. Metric (interval/ratio) data. Linearity (in relationships among the variables--factors are linear constructions of the set of variables) 3. Univariate and multivariate
More informationVector Math Computer Graphics Scott D. Anderson
Vector Math Computer Graphics Scott D. Anderson 1 Dot Product The notation v w means the dot product or scalar product or inner product of two vectors, v and w. In abstract mathematics, we can talk about
More informationExample: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C
More informationMATH 423 Linear Algebra II Lecture 38: Generalized eigenvectors. Jordan canonical form (continued).
MATH 423 Linear Algebra II Lecture 38: Generalized eigenvectors Jordan canonical form (continued) Jordan canonical form A Jordan block is a square matrix of the form λ 1 0 0 0 0 λ 1 0 0 0 0 λ 0 0 J = 0
More informationThe ith principal component (PC) is the line that follows the eigenvector associated with the ith largest eigenvalue.
More Principal Components Summary Principal Components (PCs) are associated with the eigenvectors of either the covariance or correlation matrix of the data. The ith principal component (PC) is the line
More informationExploratory data analysis for microarray data
Eploratory data analysis for microarray data Anja von Heydebreck Ma Planck Institute for Molecular Genetics, Dept. Computational Molecular Biology, Berlin, Germany heydebre@molgen.mpg.de Visualization
More informationOverview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written
More informationModule 3: Correlation and Covariance
Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis
More informationLocal outlier detection in data forensics: data mining approach to flag unusual schools
Local outlier detection in data forensics: data mining approach to flag unusual schools Mayuko Simon Data Recognition Corporation Paper presented at the 2012 Conference on Statistical Detection of Potential
More informationSolving Systems of Linear Equations
LECTURE 5 Solving Systems of Linear Equations Recall that we introduced the notion of matrices as a way of standardizing the expression of systems of linear equations In today s lecture I shall show how
More informationIntroduction to Multivariate Analysis
Introduction to Multivariate Analysis Lecture 1 August 24, 2005 Multivariate Analysis Lecture #1-8/24/2005 Slide 1 of 30 Today s Lecture Today s Lecture Syllabus and course overview Chapter 1 (a brief
More informationRegression III: Advanced Methods
Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models
More informationStepwise Regression. Chapter 311. Introduction. Variable Selection Procedures. Forward (Step-Up) Selection
Chapter 311 Introduction Often, theory and experience give only general direction as to which of a pool of candidate variables (including transformed variables) should be included in the regression model.
More informationExploratory Factor Analysis
Exploratory Factor Analysis Definition Exploratory factor analysis (EFA) is a procedure for learning the extent to which k observed variables might measure m abstract variables, wherein m is less than
More informationFactor analysis. Angela Montanari
Factor analysis Angela Montanari 1 Introduction Factor analysis is a statistical model that allows to explain the correlations between a large number of observed correlated variables through a small number
More informationEcology and Simpson s Diversity Index
ACTIVITY BRIEF Ecology and Simpson s Diversity Index The science at work Ecologists, such as those working for the Environmental Agency, are interested in species diversity. This is because diversity is
More informationExploratory Factor Analysis and Principal Components. Pekka Malo & Anton Frantsev 30E00500 Quantitative Empirical Research Spring 2016
and Principal Components Pekka Malo & Anton Frantsev 30E00500 Quantitative Empirical Research Spring 2016 Agenda Brief History and Introductory Example Factor Model Factor Equation Estimation of Loadings
More informationQuadratic forms Cochran s theorem, degrees of freedom, and all that
Quadratic forms Cochran s theorem, degrees of freedom, and all that Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 1, Slide 1 Why We Care Cochran s theorem tells us
More informationMATRIX ALGEBRA AND SYSTEMS OF EQUATIONS
MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS Systems of Equations and Matrices Representation of a linear system The general system of m equations in n unknowns can be written a x + a 2 x 2 + + a n x n b a
More informationHow to report the percentage of explained common variance in exploratory factor analysis
UNIVERSITAT ROVIRA I VIRGILI How to report the percentage of explained common variance in exploratory factor analysis Tarragona 2013 Please reference this document as: Lorenzo-Seva, U. (2013). How to report
More informationFactor Analysis. Advanced Financial Accounting II Åbo Akademi School of Business
Factor Analysis Advanced Financial Accounting II Åbo Akademi School of Business Factor analysis A statistical method used to describe variability among observed variables in terms of fewer unobserved variables
More informationSession 7 Bivariate Data and Analysis
Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares
More informationFACTOR ANALYSIS NASC
FACTOR ANALYSIS NASC Factor Analysis A data reduction technique designed to represent a wide range of attributes on a smaller number of dimensions. Aim is to identify groups of variables which are relatively
More informationBEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES
BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents
More informationANALYSIS OF TREND CHAPTER 5
ANALYSIS OF TREND CHAPTER 5 ERSH 8310 Lecture 7 September 13, 2007 Today s Class Analysis of trends Using contrasts to do something a bit more practical. Linear trends. Quadratic trends. Trends in SPSS.
More informationRisk Decomposition of Investment Portfolios. Dan dibartolomeo Northfield Webinar January 2014
Risk Decomposition of Investment Portfolios Dan dibartolomeo Northfield Webinar January 2014 Main Concepts for Today Investment practitioners rely on a decomposition of portfolio risk into factors to guide
More informationDescriptive Statistics
Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize
More informationSimilarity and Diagonalization. Similar Matrices
MATH022 Linear Algebra Brief lecture notes 48 Similarity and Diagonalization Similar Matrices Let A and B be n n matrices. We say that A is similar to B if there is an invertible n n matrix P such that
More informationTorgerson s Classical MDS derivation: 1: Determining Coordinates from Euclidean Distances
Torgerson s Classical MDS derivation: 1: Determining Coordinates from Euclidean Distances It is possible to construct a matrix X of Cartesian coordinates of points in Euclidean space when we know the Euclidean
More informationMISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group
MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could
More informationNonlinear Iterative Partial Least Squares Method
Numerical Methods for Determining Principal Component Analysis Abstract Factors Béchu, S., Richard-Plouet, M., Fernandez, V., Walton, J., and Fairley, N. (2016) Developments in numerical treatments for
More informationComputer program review
Journal of Vegetation Science 16: 355-359, 2005 IAVS; Opulus Press Uppsala. - Ginkgo, a multivariate analysis package - 355 Computer program review Ginkgo, a multivariate analysis package Bouxin, Guy Haute
More informationTeaching Multivariate Analysis to Business-Major Students
Teaching Multivariate Analysis to Business-Major Students Wing-Keung Wong and Teck-Wong Soon - Kent Ridge, Singapore 1. Introduction During the last two or three decades, multivariate statistical analysis
More informationMedical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu
Medical Information Management & Mining You Chen Jan,15, 2013 You.chen@vanderbilt.edu 1 Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?
More informationChapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS
Chapter Seven Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Section : An introduction to multiple regression WHAT IS MULTIPLE REGRESSION? Multiple
More informationChapter 7 Factor Analysis SPSS
Chapter 7 Factor Analysis SPSS Factor analysis attempts to identify underlying variables, or factors, that explain the pattern of correlations within a set of observed variables. Factor analysis is often
More informationHow to Get More Value from Your Survey Data
Technical report How to Get More Value from Your Survey Data Discover four advanced analysis techniques that make survey research more effective Table of contents Introduction..............................................................2
More informationChapter 11. Correspondence Analysis
Chapter 11 Correspondence Analysis Software and Documentation by: Bee-Leng Lee This chapter describes ViSta-Corresp, the ViSta procedure for performing simple correspondence analysis, a way of analyzing
More informationBindel, Spring 2012 Intro to Scientific Computing (CS 3220) Week 3: Wednesday, Feb 8
Spaces and bases Week 3: Wednesday, Feb 8 I have two favorite vector spaces 1 : R n and the space P d of polynomials of degree at most d. For R n, we have a canonical basis: R n = span{e 1, e 2,..., e
More information1 The Brownian bridge construction
The Brownian bridge construction The Brownian bridge construction is a way to build a Brownian motion path by successively adding finer scale detail. This construction leads to a relatively easy proof
More informationMultiple Regression: What Is It?
Multiple Regression Multiple Regression: What Is It? Multiple regression is a collection of techniques in which there are multiple predictors of varying kinds and a single outcome We are interested in
More informationImproving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP
Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP ABSTRACT In data mining modelling, data preparation
More informationLecture 9: Introduction to Pattern Analysis
Lecture 9: Introduction to Pattern Analysis g Features, patterns and classifiers g Components of a PR system g An example g Probability definitions g Bayes Theorem g Gaussian densities Features, patterns
More informationMachine Learning and Pattern Recognition Logistic Regression
Machine Learning and Pattern Recognition Logistic Regression Course Lecturer:Amos J Storkey Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh Crichton Street,
More informationPsychology 7291, Multivariate Analysis, Spring 2003. SAS PROC FACTOR: Suggestions on Use
: Suggestions on Use Background: Factor analysis requires several arbitrary decisions. The choices you make are the options that you must insert in the following SAS statements: PROC FACTOR METHOD=????
More informationCovariance and Correlation
Covariance and Correlation ( c Robert J. Serfling Not for reproduction or distribution) We have seen how to summarize a data-based relative frequency distribution by measures of location and spread, such
More informationSTATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
More information