Part II Multivariate data. Jiří Militky Computer assisted statistical modeling in the textile research


1 Part II Multivariate data Jiří Militky Computer assisted statistical modeling in the textile research

2 Experimental data peculiarities: small samples (rule of thumb: 100 points per feature); non-normal distributions; presence of heterogeneities and outliers; data-oriented model creation; physical sense of data; uncertainty in model selection. [Figure: concentration C = f(t) rising toward the equilibrium value C_eq over time.]

3 Style of analysis: data exploration, simplification of data structures, interactive model selection, interpretation of results. Depth contours: a multidimensional analog of the median.

4 Primary data. All data are compiled into one matrix X (n × m): each column of X represents one feature (variable); each row of X represents one object (i.e. an observation at one point in time, one person, one piece, etc.). [Figure: data matrix (n × m) with objects as rows and features as columns.]

5 Data transformation I. Linear: centering, scaling, standardization x → (x − µ)/σ. Nonlinear: logarithmic transformation, which 1. reduces the contribution of extremes, 2. reduces right skewness of the data, 3. stabilizes variance (removes heteroskedasticity). Rank: values are replaced by their ranks.
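A minimal sketch of these transformations, assuming Python with numpy/scipy and a hypothetical data matrix X:

```python
import numpy as np
from scipy.stats import rankdata

X = np.random.lognormal(size=(100, 3))   # hypothetical right-skewed data

# Centering and standardization: x -> (x - mu) / sigma, column-wise
X_centered = X - X.mean(axis=0)
X_standard = X_centered / X.std(axis=0, ddof=1)

# Logarithmic transformation: damps extremes, reduces right skewness,
# stabilizes variance (requires positive values)
X_log = np.log(X)

# Rank transformation: each value replaced by its rank within its column
X_rank = np.apply_along_axis(rankdata, 0, X)
```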

6 Linear transformation I. Common PCA is based on column-centered data (covariance matrix). Standardization leads to the correlation matrix R. The differences are caused by different weighting: for centered data, the columns of X are "weighted" according to their length ‖x_i‖ (the standard deviation in the original data); for standardized data, the columns of X are "weighted" to unit length, so the weights are all the same. For features in various units it is suitable to use the correlation matrix.

7 Linear transformation II. Centering removes the absolute term and thus reduces the number of variables; the configuration of the data is not changed, only the origin is shifted. Standardization removes dependence on units and removes heteroskedasticity; it influences parameter estimates (weighted least squares). It is inappropriate for cases where some features are at the noise level (it inflates their importance).

8 Linear transformation III. [Figures: centering; standardization.]

9 Outline: basic problems due to multivariate data; projections of correlated data; dimension reduction techniques.

10 Dimensionality problem I. [Figure: iris data with species setosa, versicolor, virginica.] A basic characteristic of multivariate data is its dimension (number of elements). High dimensions bring huge problems for statistical analysis. Variable reduction: variables often have variability at the noise level and can therefore be excluded from the data (they bring no information). There are also redundancies due to near-linear dependencies between some variables, or due to linkages arising from their physical essence. In both cases it is possible to replace the original set by a reduced number of uncorrelated new variables.

11 Dimensionality problem II. Multivariate curse: the number of data points necessary to achieve a given precision of multivariate estimates is an exponential function of the number of variables. Empty space phenomenon: multivariate data are concentrated in the peripheral part of the variable space. Distance problem: the distance between objects should often be weighted by the strength of the mutual links between variables.

12 Multivariate exploratory analysis I. For n objects (points), m variables (features) expressed on a cardinal scale are defined. The input data matrix X has dimension n × m; it is standard that n is higher than m. The value m defines the problem dimension (number of features).

X =
x_11 ... x_1m
 ...      ...
x_n1 ... x_nm

Aims: (a) assess object similarity or clustering tendency; (b) retrieve outliers, or features of outliers; (c) determine linear relations between features; (d) verify assumptions about the data (normality, no correlation, homogeneity).

13 Multivariate exploratory analysis II. Graphical display for 2D or 3D representation of data. Identification of objects or features appearing to be outlying. Indication of structures in data such as heterogeneities, multiple groups, etc.

14 Multivariate exploratory analysis III. Most methods for multivariate data exploration can be divided into the following categories: generalized scatter graphs, symbols, projections.

15 Profiles. Each object x_i is characterized by m line segments whose size is proportional to the corresponding feature value x_ij, j = 1, ..., m. On the x-axis is the index of the features; the profile is created by joining the end points of the individual segments. It is suitable to scale the y-axis (values of the features). Profiles are simple and easily interpretable; they make it possible to identify outliers and groups of objects with similar behavior.

16 Chernoff faces. Humans have specialized brain areas for face recognition. For d < 20, features can be mapped to face elements.

17 Latent variables I. Scatter graphs in modified coordinates enable simpler interpretation and reduce distortion or artifacts. Latent variables are used as the suitable coordinates; typical latent variables are based on principal component analysis (PCA). This method is useful for cases where the columns of the X matrix are highly correlated. [Figure: the first principal component points in the direction of maximum variability; the second principal component is orthogonal to it.]

18 Latent variables II PCA combined with dynamic graphics (rotation of coordinates).

19 Multivariate geometry. Data lie in a hypercube; consider the diagonal vector v starting from the center and ending in one corner. The angle between this vector and a selected axis e_i is given by

cos θ_i = (v^T e_i) / (‖v‖ ‖e_i‖) = ±1/√m.

For higher m this cosine approaches zero, so diagonal vectors are nearly perpendicular to all axes. In scatter graphs, point clusters oriented in the diagonal directions are then projected to the origin and are not identifiable.
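A small numeric check of this relation (numpy assumed):

```python
import numpy as np

for m in (2, 3, 10, 100):
    v = np.ones(m)                 # diagonal vector of the hypercube
    e1 = np.zeros(m); e1[0] = 1.0  # selected coordinate axis e_1
    cos_theta = (v @ e1) / (np.linalg.norm(v) * np.linalg.norm(e1))
    print(m, cos_theta)            # 1/sqrt(m): 0.707, 0.577, 0.316, 0.100
```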

20 Volume concentration. The volume-concentration phenomenon is not visible from classical 2D or 3D geometry. The volume of a hypersphere with radius r in m-dimensional space is

V_k = π^(m/2) r^m / Γ(m/2 + 1).

The volume of a hypercube with edge length 2r is

V_h = 2^m r^m.

For a hypersphere inscribed in the hypercube, the ratio of the volumes is

V_k / V_h = π^(m/2) / (2^m Γ(m/2 + 1)) → 0 for m → ∞.

The volume of the hypercube is then concentrated in its corners, and the central part is nearly empty.
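The ratio can be evaluated directly; a sketch using scipy's log-gamma to avoid overflow at high m:

```python
import numpy as np
from scipy.special import gammaln

def sphere_to_cube_ratio(m):
    """V_k / V_h = pi^(m/2) / (2^m * Gamma(m/2 + 1))."""
    return np.exp(0.5 * m * np.log(np.pi) - m * np.log(2.0) - gammaln(m / 2.0 + 1.0))

for m in (1, 2, 3, 8, 20):
    print(m, sphere_to_cube_ratio(m))  # 1.0, 0.785, 0.524, 0.0159, ~2.5e-8
```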

21 Volumes ratio. [Figure: influence of the space dimension on the volume ratio of a hypercube and its inscribed hypersphere.] From dimension m = 8 the volume of the sphere is negligible in comparison with the cube volume. Data in multidimensional space are concentrated on the periphery, and covering the central part requires a huge number of points (objects).

22 Multivariate normal distribution I. Let us have a multivariate normal distribution with mean equal to zero and unit covariance matrix. The dependence of the multivariate normal distribution function value at the point x = (1, 1, ..., 1) on the dimension m is shown in the figure.

23 Multivariate normal distribution II. In the central area, the probability of occurrence of the random variable is very low in comparison with the area of the tails. The dependence of the standardized multivariate normal distribution function at the point x = (2, 2, ..., 2) on the dimension m is shown in the figure. The decrease of the probability for higher m is clearly visible.

24 Multivariate normal distribution III. It is well known that the sum of squares of the independent, standardized, normally distributed elements of a random vector x has a chi-squared distribution:

‖x‖² = Σ_{i=1}^m x_i² ~ χ²_m.

Because the mean value is equal to zero, this norm is equal to the distance from the origin. The probability of occurrence of a multivariate normal random vector in a sphere centered at the origin with radius r is then

P(‖x‖ ≤ r) = P(χ²_m ≤ r²).
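This probability is easy to tabulate (scipy assumed):

```python
from scipy.stats import chi2

r = 3.0
for m in (1, 2, 5, 8, 15):
    # P(||x|| <= r) = P(chi2_m <= r^2) for a standard m-variate normal vector;
    # the probability of falling inside the sphere drops quickly with m
    print(m, chi2.cdf(r**2, df=m))
```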

25 Multivariate normal distribution IV. The dependence of the probability of occurrence of a multivariate normal random vector in a sphere centered at the origin with radius r = 3 on the dimension m is shown in the figure. The quick decrease towards zero is visible. Starting from m = 8, the occurrence of individual objects in the central area has small likelihood; for higher dimensions the majority of points will be in the tail area. The tails play here the more important role: the paradox of dimensionality.

26 Data projection I. Usually, the first two PCs are used for a 2D projection; the information from the last two PCs can be interesting as well. These projections preserve angles and distances between objects (points). On the other hand, there is no objective criterion for revealing the hidden structures in the data. The linear projections of multivariate data in projection pursuit satisfy a criterion called the projection index IP(C_i). The projection vectors C_i maximizing IP(C_i) under the constraints C_i^T C_i = 1 are computed. The projection onto these vectors is then C_i^T X.

27 Linear transformation. 2D vectors X in a unit circle with mean (1, 1); Y = A·X, where A is a 2 × 2 matrix:

(Y_1, Y_2)^T = [[a_11, a_12], [a_21, a_22]] · (X_1, X_2)^T.

The shape and the mean are changed: scaling (the a_ii elements), rotation, mirror reflection. Distances between vectors are not invariant: ‖Y^(1) − Y^(2)‖ ≠ ‖X^(1) − X^(2)‖.

28 Data projection II. It is interesting that the index IP corresponding to principal components is

IP(C) = max(C_i^T S C_i) for C_i^T C_i = 1,

where S is the sample covariance matrix (which can be simply robustified). The C_i satisfying the maximum condition is the eigenvector of the matrix S having the i-th largest eigenvalue λ_i, i = 1, 2; C_1 and C_2 are orthogonal. The index IP(C) corresponds to the minimum over all projections C of the maximized log-likelihood for normally distributed data N(C^T µ, C^T S C). For normally distributed samples, the projection onto the first two principal components is then optimal.

29 PCA limitations. [Figure: PCA leads to the vertical axis (no discrimination); projection pursuit (PP) leads to the horizontal axis (two clusters).]

30 Data projection III. Frequently, the selection of clusters in the projection is the main goal. For these purposes, the index is equal to the ratio between the mean inter-object distance D and the mean distance between nearest neighbors d. Some indexes are based on the pdf of the data in the projection, f_P(x); the estimator of f_P(x) is usually a kernel pdf estimator:

IP(C) = ∫ f_P(x)² dx.

Differences from normality, expressed by the pdf φ(x), are included in the index

IP(C) = ∫ φ(x) [f_P(x) − φ(x)] dx.

31 Nonlinear projection. The Sammon algorithm projects from the original space to a reduced space having nearly the same distances between objects. Let d*_ij be the distance between two objects in the original space and d_ij the corresponding distance in the reduced space. The target function E (to be minimized) has the form

E = (1 / Σ_{i<j} d*_ij) Σ_{i<j} (d*_ij − d_ij)² / d*_ij.

The iterative Newton method or heuristically oriented algorithms are used.
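A plain gradient-descent sketch of the Sammon criterion (illustrative only; the original algorithm uses a Newton-type update, and the names here are hypothetical):

```python
import numpy as np

def sammon(X, n_iter=500, lr=0.1, seed=0):
    """Minimize the Sammon stress E by gradient descent on a 2D configuration Y."""
    n = X.shape[0]
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)  # original distances d*_ij
    c = D.sum() / 2.0                                    # sum_{i<j} d*_ij
    Dm = D.copy(); np.fill_diagonal(Dm, 1.0)             # avoid division by zero
    Y = np.random.default_rng(seed).normal(size=(n, 2))
    for _ in range(n_iter):
        d = np.linalg.norm(Y[:, None] - Y[None, :], axis=2)
        np.fill_diagonal(d, 1.0)
        W = (Dm - d) / (Dm * d)                          # (d*_ij - d_ij) / (d*_ij d_ij)
        np.fill_diagonal(W, 0.0)
        # gradient of E = (1/c) sum_{i<j} (d*_ij - d_ij)^2 / d*_ij
        grad = -(2.0 / c) * (W[:, :, None] * (Y[:, None] - Y[None, :])).sum(axis=1)
        Y -= lr * grad
    return Y
```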

32 Comparison of projections.

33 PCA goals. Selection of combinations of the original variables (latent variables) explaining the main part of the overall variance. Discovery of data structures characterizing links between the features of objects. Dimension reduction. Removal of noise and outliers. Creation of an optimal summary of the features (first PC). [Figure: data cloud in the (x_1, x_2) plane with the first PC direction.]

34 PC = principal component (latent variable). PC features: linear combination of the original variables; mutual orthogonality; ordering by importance; rotation of the coordinate system; optimal low-dimensional projection; explanation of the maximal portion of the data variance. [Figure: rotated coordinates (z_1, z_2) in the (x_1, x_2) plane.]

35 PCA utilization: dimensionality reduction; multivariate normality testing; data exploration; indication of multivariate data structure; data projection; special regression models.

36 First PC. PC1 = y_1 explains the maximum of the original data variability. [Figure: first PC axis passing through the overall mean of the dataset.]

37 Second PC. PC2 = y_2 explains the maximum of the variability not included in y_1; y_2 is orthogonal to y_1. [Figure: second PC axis passing through the overall mean of the dataset.]

38 Mathematical formalization. x_C is the centered original data, expressed as deviations from the means:

x_C^T = (x_1 − µ_1, x_2 − µ_2, ..., x_m − µ_m).

First PC: y_1 = V_1^T x_C = Σ_{j=1}^m V_j1 x_Cj. Second PC: y_2 = V_2^T x_C = Σ_{j=1}^m V_j2 x_Cj; generally y_j = V_j^T x_C. The variance of the first PC is

D(y_1) = D(V_1^T x_C) = E[(V_1^T x_C)(V_1^T x_C)^T] = V_1^T E(x_C x_C^T) V_1 = V_1^T C V_1,

and similarly D(y_2) = V_2^T C V_2. Here C is the covariance matrix, V_1 is the vector of loadings for PC1 and V_2 the vector of loadings for PC2; y_j is the j-th PC and V_j its vector of factor loadings.

39 Properties of loadings I. Normalization conditions and mutual orthogonality:

V_1^T V_1 = 1, V_2^T V_2 = 1 and V_1^T V_2 = 0.

Computation of the loadings V_1, V_2, ..., V_m leads to maximization of the variances under these equality constraints. The solution shows that V_j is the eigenvector of the covariance matrix C corresponding to the j-th largest eigenvalue λ_j. Covariance matrix decomposition:

C = V Λ V^T,

where V is an (m × m) matrix whose columns are the loading vectors V_j, and Λ is an (m × m) diagonal matrix whose diagonal elements are the covariance matrix eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_m.

40 Properties of loadings II. The matrix V is orthogonal, i.e. V^T V = E. The variance D(y_j) = λ_j is equal to the j-th eigenvalue. The overall data variance is equal to the sum of the PC variances:

tr C = Σ_{i=1}^m λ_i.

The relative contribution of the j-th PC to the explanation of the data variability is

P_j = λ_j / Σ_{i=1}^m λ_i.
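A compact sketch of these properties (numpy assumed; the data matrix is hypothetical):

```python
import numpy as np

X = np.random.default_rng(1).normal(size=(200, 4))  # hypothetical data
Xc = X - X.mean(axis=0)                             # column centering
C = np.cov(Xc, rowvar=False)                        # covariance matrix

lam, V = np.linalg.eigh(C)                          # C = V Lambda V^T
lam, V = lam[::-1], V[:, ::-1]                      # reorder to lambda_1 >= ... >= lambda_m

print(np.allclose(V.T @ V, np.eye(4)))              # V is orthogonal: V^T V = E
print(np.allclose(lam.sum(), np.trace(C)))          # tr C = sum of eigenvalues
P = lam / lam.sum()                                 # relative contributions P_j
```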

41 Properties of loadings III. The covariances between the j-th PC and the vector of features x_C are

cov(x_C, y_j) = cov(x_C, V_j^T x_C) = E(x_C x_C^T) V_j = C V_j = λ_j V_j.

The covariance between the i-th feature x_Ci and the j-th PC y_j is cov(x_Ci, y_j) = λ_j V_ij, where V_ij is the i-th element of the vector V_j. The correlation coefficient r(x_Ci, y_j) has the form

r(x_Ci, y_j) = λ_j V_ij / (σ_xi √λ_j) = √λ_j V_ij / σ_xi.

42 Properties of loadings IV. Replacement of the centered features x_C by the normalized features

x_N = ((x_1 − µ_1)/σ_x1, (x_2 − µ_2)/σ_x2, ..., (x_m − µ_m)/σ_xm)

reduces the correlation coefficient r(x_Ni, y_j) to the form

cov(x_Ni, y_j) = r(x_Ni, y_j) = V*_ij √λ*_j,

where V*_j and λ*_j are the eigenvectors and eigenvalues of the correlation matrix R.

43 Two features example I. Two features x_1 and x_2, with covariance matrix C and correlation matrix R:

C = [[σ_1², C_12], [C_12, σ_2²]], R = [[1, r], [r, 1]].

PCA for the correlation matrix. Condition for computation of the eigenvalues:

det(R − l·E) = det [[1 − l, r], [r, 1 − l]] = 0.

After rearrangement, (1 − l)² − r² = 0, i.e. l² − 2l + 1 − r² = 0.

44 Two features example II. Solution of the quadratic equation l² − 2l + 1 − r² = 0:

l_1 = 0.5 [2 + √(4 − 4(1 − r²))] = 1 + r = λ_1,
l_2 = 0.5 [2 − √(4 − 4(1 − r²))] = 1 − r = λ_2.

The eigenvectors are the solution of the homogeneous equations (R − λ_i E) V_i = 0, i = 1, 2.

45 Two features example III. The normalized eigenvector V_1* has the form V_1* = (1/√2) (1, 1)^T. The normalized eigenvector V_2* has the form V_2* = (1/√2) (1, −1)^T.

46 Two features example IV. First PC: y_1 = (z_1 + z_2)/√2. Second PC: y_2 = (z_2 − z_1)/√2, where z_i = (x_i − E(x_i))/√D(x_i). Application of normalized features makes the PCs independent of the correlation in the original data. The coordinate system is rotated so that cos α = 1/√2, i.e. by 45°.
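The two-feature result can be verified numerically (numpy assumed):

```python
import numpy as np

r = 0.6
R = np.array([[1.0, r], [r, 1.0]])
lam, V = np.linalg.eigh(R)   # ascending order
print(lam)                   # [1 - r, 1 + r]
print(V)                     # columns proportional to (1, -1)/sqrt(2) and (1, 1)/sqrt(2)
```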

47 Scores plot. [Figure: scores plot of PC1 versus PC2, with acute and chronic groups.] PC scores T = X_c · V are the values of the PCs for all objects. Reconstructed data: X_c = T · V^T, i.e. (n × m) = (n × m) · (m × m). Reduction of the number of PCs: selection of a few PCs (p); replacement of the loading matrix V (m × m) by the reduced loading matrix V_f (m × p); computation of the reduced scores T_f = X_c · V_f.
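A sketch of the score computations (numpy assumed; the data are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
Xc = rng.normal(size=(200, 4))
Xc -= Xc.mean(axis=0)                             # centered data
lam, V = np.linalg.eigh(np.cov(Xc, rowvar=False))
lam, V = lam[::-1], V[:, ::-1]                    # descending eigenvalue order

T = Xc @ V                         # scores: (n x m) = (n x m) * (m x m)
print(np.allclose(Xc, T @ V.T))    # reconstruction from all PCs is exact

p = 2
Vf = V[:, :p]                      # reduced loading matrix (m x p)
Tf = Xc @ Vf                       # reduced scores (n x p)
```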

48 Reduced PC I. The percentage of variability explained by the j-th principal component is

P_j = 100 λ_j / Σ_{i=1}^m λ_i.

In practice, it is often the case that although hundreds of variables are measured, the first few PCs explain almost all of the variability (information) in the data. Scree plot: a bar diagram of the ordered eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_m. Often a gap is visible between the important and unimportant PCs.

49 Reduced PC II. [Figure: scree plot, eigenvalue versus component number.] Due to the use of reduced loadings there are differences between the reconstructed data matrices. The centered matrix X_C is decomposed into the matrix of component scores T (n × k) and the loading matrix V_k^T (k × m), with an information loss, i.e. an error matrix O (n × m):

X_C = T · V_k^T + O.
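The information loss from truncating to k components can be computed directly; a sketch with hypothetical correlated data:

```python
import numpy as np

rng = np.random.default_rng(2)
Xc = rng.normal(size=(100, 6)) @ rng.normal(size=(6, 6))  # correlated data
Xc -= Xc.mean(axis=0)
lam, V = np.linalg.eigh(np.cov(Xc, rowvar=False))
lam, V = lam[::-1], V[:, ::-1]

k = 2
Vk = V[:, :k]
Xcr = Xc @ Vk @ Vk.T                  # reconstruction from k components
O = Xc - Xcr                          # error matrix O (n x m)
print((O**2).sum() / (Xc**2).sum())   # relative information loss
```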

50 Bilinear regression model. The least-squares criterion

S(µ, y, V_k) = Σ_{i=1}^n (x_i − µ − V_k y_i)^T (x_i − µ − V_k y_i)

defines the bilinear regression model X_C = T · V_k^T + O, which has as parameters the scores T and the loadings V_k. The reconstruction is

X_Cr = T · V_k^T = X_C · V_k · V_k^T.

Minimization of the length of O, i.e. of the distance dist(X_C − T V_k^T), can be carried out; the result is the same as for maximization of variance. Residual matrix:

O_r = X_C − X_Cr = X_C − T_f · V_k^T = X_C · (E − V_k · V_k^T).

51 PC interpretation I. Original matrix X_C = (x_1, ..., x_m): n points in the m-dimensional feature space. Score matrix (t_1, ..., t_m): n points in the m- (or k-) dimensional space of the PCs. The j-th PC vector t_j and the i-th feature vector x_Ci are related by

t_j = Σ_{i=1}^m V_ij x_Ci, x_Ci = Σ_{j=1}^m V_ij t_j,

where the V_ij are elements of the matrix V_m (resp. V_k if only k components are used).

52 PC interpretation II. In the feature space, t_j is the weighted sum of the vectors x_Ci with weights V_ij. The length of this vector is

d(t_j) = √(t_j^T t_j) = √(V_j^T X_C^T X_C V_j) = √λ_j.

[Figure: projection p = k·z of a vector x onto z, with cos α = x^T z / ((x^T x)(z^T z))^(1/2).]

The projection t_Pi of the vector x_Ci onto t_j is expressed as t_Pi = t_j·b, where b is the slope. From t_j^T (x_Ci − b·t_j) = 0 it follows that

b = t_j^T x_Ci / (t_j^T t_j) = t_j^T x_Ci / λ_j.

53 PC interpretation III Because t T t k = 0 for # k (vectors t are orthogonal) it is valid t T x Ci = t T m k=1 V Therefore b = V i. and proection vector t Pi = V i t has length p i = t T Pi t Pi Length of t vector is then. m m d( t ) = Vi p V ik = t k i=1 V = V i i λ λ = V i i λ = λ i=1 i

54 PC interpretation IV. The contribution of each original variable to the length of the vector t_j is proportional to the square of V_ij, and the length of this vector is proportional to the standard deviation of the corresponding PC. The variance explained by the j-th PC is thus composed of contributions of the original features, and their importance is expressed by V_ij. A small V_ij means that the i-th original feature contributes little to the variability of the j-th PC and is not important. If the i-th row of the matrix V has all elements small, the i-th feature is unimportant for all PCs.

55 Contribution plot. A graph composed of m groups, with m columns in each group: each group corresponds to one PC and each column represents one feature. The heights of the columns are related to V_ij √λ_j. The heights of the columns in the first group are standardized so that their sum is 100 % (division by the sum of their lengths L_s); the same standardization (division by L_s) is used for the rest of the groups as well. It is then simple to investigate the influence of the features on the PCs.

56 Correlations. Relations between the original features and the PCs are quantified by the correlation coefficients between x_Ci and t_j:

r_ij = cos α = t_j^T x_Ci / √((x_Ci^T x_Ci)(t_j^T t_j)) = √λ_j V_ij / σ_i = p_i / σ_i,

where σ_i is the standard deviation of the i-th feature. When normalized variables are used (the matrix S replaced by the correlation matrix R), the correlation coefficients are directly equal to the partial projections:

r_ij = p_i.

A higher r_ij indicates a higher projection; it means that x_i is close to t_j and contributes markedly to the variance explained by the j-th PC.

57 PCA for correlated data I. Simulated data arise from a 3D normal distribution with zero mean vector and correlation matrix

R = [[1, r, r], [r, 1, r], [r, r, 1]].

N = 500 data points were generated for various values of the pairwise correlation coefficient r.
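A sketch of this simulation (numpy assumed):

```python
import numpy as np

r = 0.9
R = np.array([[1, r, r], [r, 1, r], [r, r, 1]], dtype=float)
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=np.zeros(3), cov=R, size=500)

lam = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
print(lam / lam.sum())   # for r = 0.9 the first PC explains nearly all the variance
```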

58 PCA for correlated data II All correlations are zero

59 PCA for correlated data III All correlations are 0.9

60 False correlations I. Pairwise correlations: r_12 = H, r_13 = H², r_23 = H. Multiple correlation coefficient: R_1(2,3) = H. Partial correlation coefficients: R_13(2) = 0, R_12(3) = H/√(1 + H²). Variable x_3 is then a parasite and does not contribute to the explanation of the variability of feature x_1. For the simulation, H = 0.9 was selected.

61 False correlations II Scree and contribution plots

62 Low paired correlations I. Pairwise correlations: r_12 = H = 0.01, r_13 = 0, r_23 = √(1 − 2H²). Multiple correlation coefficient: R_1(2,3) = 0.71. Partial correlation coefficients: R_12(3) = 0.71, R_13(2) = −0.71. All variables are therefore important.

63 Low paired correlations II. Scree and contribution plots. PCA is not able to fully replace correlation analysis.

64 Distances in feature spaces. Data vectors in m dimensions: X^T = (X_1, ..., X_m), Y^T = (Y_1, ..., Y_m). A distance, or metric, function satisfies d(X, Y) ≥ 0, d(X, Y) = d(Y, X) and d(X, Y) ≤ d(X, Z) + d(Z, Y). Popular distance functions:

Minkowski distance (L_r metric): d(X, Y) = (Σ_{k=1}^m |X_k − Y_k|^r)^(1/r)
Manhattan (city-block) distance (L_1 norm): d(X, Y) = Σ_{k=1}^m |X_k − Y_k|
Euclidean distance (L_2 norm): d(X, Y) = (Σ_{k=1}^m (X_k − Y_k)²)^(1/2)
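These metrics in a few lines (numpy assumed):

```python
import numpy as np

def minkowski(x, y, r):
    """L_r metric: d(X, Y) = (sum_k |X_k - Y_k|^r)^(1/r)."""
    return (np.abs(x - y) ** r).sum() ** (1.0 / r)

x, y = np.array([0.0, 0.0]), np.array([3.0, 4.0])
print(minkowski(x, y, 1))   # Manhattan (L_1): 7.0
print(minkowski(x, y, 2))   # Euclidean (L_2): 5.0
```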

65 Basic metrics. [Figure: Manhattan (city-block) and Euclidean paths between two points in the (X_1, X_2) plane.] Points at identical distance look very different under each metric: imagine that in 10 dimensions!

66 Other metrics. Ultrametric: d_ij ≤ max(d_ih, d_hj) replaces the triangle inequality d_ij ≤ d_ih + d_hj (for any i, h, j). Four-point additive condition: d_hi + d_jk ≤ max[(d_hj + d_ik), (d_hk + d_ij)] replaces the triangle inequality.

67 Invariant distances. The Euclidean distance is not invariant to linear transformations Y = A·X; scaling of units has a strong influence on distances:

‖Y^(1) − Y^(2)‖² = (Y^(1) − Y^(2))^T (Y^(1) − Y^(2)) = (X^(1) − X^(2))^T A^T A (X^(1) − X^(2)).

Orthonormal matrices, A^T A = I, are rigid rotations and leave distances unchanged. The Mahalanobis metric replaces A^T A by the inverse of the covariance matrix; invariance requires standardization plus the covariance matrix.

68 Distances. Mahalanobis distance: d_i² = (x_i − x_A)^T S^(−1) (x_i − x_A). Euclidean distance: d_i² = (x_i − x_A)^T (x_i − x_A).
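Both distances as a sketch (numpy assumed; x_A and S are the location and covariance estimates):

```python
import numpy as np

def mahalanobis_sq(x, x_A, S):
    """d^2 = (x - x_A)^T S^{-1} (x - x_A); solve with S instead of inverting."""
    diff = x - x_A
    return diff @ np.linalg.solve(S, diff)

def euclidean_sq(x, x_A):
    """d^2 = (x - x_A)^T (x - x_A)."""
    diff = x - x_A
    return diff @ diff

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
x_A, S = X.mean(axis=0), np.cov(X, rowvar=False)
print(mahalanobis_sq(X[0], x_A, S), euclidean_sq(X[0], x_A))
```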

69 Outlying objects I. Indication of outliers is sensitive to the presence of masking, where outliers appear to be correct (due to inflation of the covariance matrix), or swamping, where correct values appear to be outliers (due to the presence of outlying points).

70 Outlying objects II. As outliers are identified those objects for which

d_i² > c(p, N, α_N).

For the case of a multivariate normal distribution, c(p, N, α_N) is equal to a quantile of the chi-squared distribution:

c(p, N, α_N) = χ²_p(1 − α/N).
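A sketch of this cutoff rule (scipy assumed; classical, non-robust estimates of x_A and S):

```python
import numpy as np
from scipy.stats import chi2

def flag_outliers(X, alpha=0.05):
    """Flag objects with squared Mahalanobis distance above chi2_p(1 - alpha/N)."""
    N, p = X.shape
    x_A, S = X.mean(axis=0), np.cov(X, rowvar=False)
    diff = X - x_A
    d2 = np.einsum('ij,ij->i', diff @ np.linalg.inv(S), diff)
    return d2 > chi2.ppf(1.0 - alpha / N, df=p)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(100, 3)), [[8.0, 8.0, 8.0]]])  # one gross outlier
print(np.where(flag_outliers(X))[0])   # should flag the appended point (index 100)
```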

71 Outlying objects III. For application of the Mahalanobis distance approach it is necessary to know clean estimators x_A and S. A robust estimator of the covariance matrix can be obtained in the following ways: M estimators; S estimators minimizing det C under constraints; estimators minimizing the volume of the confidence ellipsoid. EDA requires visualization of the outliers, but not corruption of the projections.

72 Simple solution: evaluation of a clean subset of the data. 1. Selection of a starting subset based on the Mahalanobis distance with trimming of suspicious data, or on the distance from the multivariate median; the result is a subset of data with parameters x_AC, S_C. 2. Calculation of residuals d_i² = (x_i − x_AC)^T S_C^(−1) (x_i − x_AC). 3. Iterative modification of the clean subset so that it contains the points with residuals lower than c · χ²_α, where

h = (n + p + 1)/2, c_1 = max(0, (h − r)/(h + r)), c_2 = 1 + (p + 1)/(n − p) + 2/(n − 1 − 3p), c = c_1 + c_2.
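A simplified sketch of the iterative clean-subset idea (the small-sample correction factors above are omitted; a plain chi-squared cutoff is assumed):

```python
import numpy as np
from scipy.stats import chi2

def clean_subset(X, alpha=0.025, n_iter=20):
    """Iteratively re-estimate mean and covariance from points with small residuals."""
    N, p = X.shape
    keep = np.ones(N, dtype=bool)            # start from all points
    cutoff = chi2.ppf(1.0 - alpha, df=p)
    for _ in range(n_iter):
        x_AC = X[keep].mean(axis=0)          # clean location estimate
        S_C = np.cov(X[keep], rowvar=False)  # clean covariance estimate
        diff = X - x_AC
        d2 = np.einsum('ij,ij->i', diff @ np.linalg.inv(S_C), diff)
        new_keep = d2 < cutoff               # keep points with small residuals
        if np.array_equal(new_keep, keep):
            break
        keep = new_keep
    return keep, x_AC, S_C
```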

73 PCA of corrupted normal data. [Figure: PCA axes (axis 1, axis 2) of savings versus yearly income; tolerance ellipses (99 %, 99.9 %) separate the rule from the exceptions.]
