Principal Component Analysis

Mark Richardson
May 2009

Contents
1 Introduction
2 An Example From Multivariate Data Analysis
3 The Technical Details Of PCA
4 The Singular Value Decomposition
5 Image Compression Using PCA
6 Blind Source Separation
7 Conclusions
8 Appendix - MATLAB

1 Introduction

Principal Component Analysis (PCA) is the general name for a technique which uses sophisticated underlying mathematical principles to transform a number of possibly correlated variables into a smaller number of variables called principal components. The origins of PCA lie in multivariate data analysis; however, it has a wide range of other applications, as we will show in due course. PCA has been called "one of the most important results from applied linear algebra" [2], and perhaps its most common use is as the first step in trying to analyse large data sets. Some of the other common applications include de-noising signals, blind source separation and data compression.

In general terms, PCA uses a vector space transform to reduce the dimensionality of large data sets. Using mathematical projection, the original data set, which may have involved many variables, can often be interpreted in just a few variables (the principal components). It is therefore often the case that an examination of the reduced dimension data set will allow the user to spot trends, patterns and outliers in the data far more easily than would have been possible without performing the principal component analysis.

The aim of this essay is to explain the theoretical side of PCA and to provide examples of its application. We will begin with a non-rigorous motivational example from multivariate data analysis, in which we will attempt to extract some meaning from a 17 dimensional data set. After this motivational example, we shall discuss the PCA technique in terms of its linear algebra fundamentals. This will lead us to a method for implementing PCA for real-world data, and we will see that there is a close connection between PCA and the singular value decomposition (SVD) from numerical linear algebra. We will then look at two further examples of PCA in practice: Image Compression and Blind Source Separation.

2 An Example From Multivariate Data Analysis

In this section, we will examine some real life multivariate data in order to explain, in simple terms, what PCA achieves. We will perform a principal component analysis of this data and examine the results, though we will skip over the computational details for now.

Suppose that we are examining the following DEFRA (Department for Environment, Food and Rural Affairs) data showing the consumption in grams (per person, per week) of 17 different types of foodstuff measured and averaged in the four countries of the United Kingdom in 1997. We shall say that the 17 food types are the variables and the 4 countries are the observations. A cursory glance over the numbers in Table 1 does not reveal much; indeed, in general it is difficult to extract meaning from any given array of numbers. Given that this is actually a relatively small data set, we see that a powerful analytical method is absolutely necessary if we wish to observe trends and patterns in larger data.

Table 1: UK food consumption in 1997 (g/person/week). Source: DEFRA website. Columns: England, Wales, Scotland, N Ireland. Rows: Cheese, Carcass meat, Other meat, Fish, Fats and oils, Sugars, Fresh potatoes, Fresh Veg, Other Veg, Processed potatoes, Processed Veg, Fresh fruit, Cereals, Beverages, Soft drinks, Alcoholic drinks, Confectionery.

We need some way of making sense of the above data. Are there any trends present which are not obvious from glancing at the array of numbers? Traditionally, we would use a series of bivariate plots (scatter diagrams) and analyse these to try and determine any relationships between variables; however, the number of such plots required for such a task is typically O(n^2), where n is the number of variables. Clearly, for large data sets, this is not feasible. PCA generalises this idea and allows us to perform such an analysis simultaneously for many variables.

In our example above, we have 17 dimensional data for 4 countries. We can thus imagine plotting the 4 coordinates representing the 4 countries in 17 dimensional space. If there is any correlation between the observations (the countries), this will be observed in the 17 dimensional space by the correlated points being clustered close together, though of course, since we cannot visualise such a space, we are not able to see such clustering directly.

The first task of PCA is to identify a new set of orthogonal coordinate axes through the data. This is achieved by finding the direction of maximal variance through the coordinates in the 17 dimensional space. It is equivalent to obtaining the (least-squares) line of best fit through the plotted data. We call this new axis the first principal component of the data. Once this first principal component has been obtained, we can use orthogonal projection (in linear algebra and functional analysis, a projection is defined as a linear transformation, P, that maps from a given vector space to the same vector space and is such that P^2 = P) to map the coordinates down onto this new axis. In our food example above, the four 17 dimensional coordinates are projected down onto the first principal component to obtain the representation in Figure 1.

Figure 1: Projections onto the first principal component (1-D space), with the points labelled Eng, Wal, Scot and N Ire along the PC1 axis

This type of diagram is known as a score plot. Already, we can see that there are two potential clusters forming, in the sense that England, Wales and Scotland seem to be close together at one end of the principal component, whilst Northern Ireland is positioned at the opposite end of the axis.

The PCA method then obtains a second principal coordinate (axis) which is both orthogonal to the first PC, and is the next best direction for approximating the original data (i.e. it finds the direction of second largest variance in the data, chosen from directions which are orthogonal to the first principal component). We now have two orthogonal principal components defining a plane onto which, similarly to before, we can project our coordinates. This is shown in the 2 dimensional score plot in Figure 2. Notice that the inclusion of the second principal component has highlighted variation between the dietary habits present in England, Scotland and Wales.

Figure 2: Projections onto the first 2 principal components (2-D space), with axes PC1 and PC2

As part of the PCA method (which will be explained in detail later), we automatically obtain information about the contributions of each principal component to the total variance of the coordinates. In fact, in this case approximately 67% of the variance in the data is accounted for by the first principal component, and approximately 97% is accounted for in total by the first two principal components. We have therefore accounted for the vast majority of the variation in the data using a two dimensional plot - a dramatic reduction in dimensionality from seventeen dimensions to two.
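As a rough sketch of how these score-plot coordinates can be computed (the full script used for this section is reproduced in the Appendix), suppose that the Table 1 values have been entered into a 17 x 4 matrix X, with one row per food type and one column per country; the matrix X and the plotting details here are assumptions for illustration, not the essay's own script.

% Sketch, assuming X holds the Table 1 data (17 food types x 4 countries).
[m, n] = size(X);
Xc = X - repmat(mean(X,2), 1, n);         % centre each variable (row)
[E, D] = eig(Xc*Xc'/(n-1));               % eigen-decomposition of the covariance matrix
[lambda, idx] = sort(diag(D), 'descend'); % variances, largest first
E = E(:, idx);                            % principal component directions, most important first
scores = E(:, 1:2)'*Xc;                   % the coordinates plotted in Figures 1 and 2
plot(scores(1,:), scores(2,:), '.', 'markersize', 15)
xlabel('PC1'), ylabel('PC2')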

In practice, it is usually sufficient to include enough principal components so that somewhere in the region of 70-80% of the variation in the data is accounted for [3]. This information can be summarised in a plot of the variances (nonzero eigenvalues) with respect to the principal component number (eigenvector number), which is given in Figure 3, below.

Figure 3: Eigenspectrum (eigenvalue against eigenvector number)

We can also consider the influence of each of the original variables upon the principal components. This information can be summarised in the following plot, in Figure 4.

Figure 4: Load plot (effect(PC1) against effect(PC2)), showing the positions of the 17 food variables

Observe that there is a central group of variables around the middle of each principal component, with four variables on the periphery that do not seem to be part of the group. Recall the 2D score plot (Figure 2), on which England, Wales and Scotland were clustered together, whilst Northern Ireland was the country that was away from the cluster. Perhaps there is some association to be made between the variables that are away from the cluster in Figure 4 and the country that is located away from the rest of the countries in Figure 2, Northern Ireland. A look at the original data in Table 1 reveals that for the three variables Fresh potatoes, Alcoholic drinks and Fresh fruit, there is a noticeable difference between the values for England, Wales and Scotland, which are roughly similar, and Northern Ireland, which is usually significantly higher or lower.

PCA is able to make these associations for us. It has also successfully managed to reduce the dimensionality of our data set down from 17 to 2, allowing us to assert (using Figure 2) that England, Wales and Scotland are similar, with Northern Ireland being different in some way. Furthermore, using Figure 4 we were able to associate certain food types with each cluster of countries.
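The percentages quoted above come straight from the eigenvalues (variances). A small sketch, assuming lambda is the sorted vector of eigenvalues computed in the previous snippet:

% Proportion of the total variance captured by each principal component.
explained  = lambda/sum(lambda);   % approx. 0.67 for PC1 in the food example
cumulative = cumsum(explained);    % approx. 0.97 for the first two PCs
bar(lambda)                        % the eigenspectrum of Figure 3
xlabel('eigenvector number'), ylabel('eigenvalue')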

3 The Technical Details Of PCA

The principal component analysis for the example above took a large set of data and identified an optimal new basis in which to re-express the data. This mirrors the general aim of the PCA method: can we obtain another basis that is a linear combination of the original basis and that re-expresses the data optimally? There are some ambiguous terms in this statement, which we shall address shortly; however, for now let us frame the problem in the following way.

Assume that we start with a data set that is represented in terms of an m x n matrix, X, where the n columns are the samples (e.g. observations) and the m rows are the variables. We wish to linearly transform this matrix, X, into another matrix, Y, also of dimension m x n, so that for some m x m matrix, P,

Y = PX

This equation represents a change of basis. If we consider the rows of P to be the row vectors p_1, p_2, ..., p_m, and the columns of X to be the column vectors x_1, x_2, ..., x_n, then this relation can be interpreted in the following way:

PX = P ( x_1  x_2  ...  x_n ) = [ p_1.x_1   p_1.x_2   ...   p_1.x_n
                                  p_2.x_1   p_2.x_2   ...   p_2.x_n
                                    ...       ...     ...     ...
                                  p_m.x_1   p_m.x_2   ...   p_m.x_n ] = Y

Note that p_i, x_j are in R^m, and so p_i.x_j is just the standard Euclidean inner (dot) product. This tells us that the original data, X, is being projected onto the rows of P. Thus, the rows of P, {p_1, p_2, ..., p_m}, are a new basis for representing the columns of X. The rows of P will later become our principal component directions.

We now need to address the issue of what this new basis should be; indeed, what is the best way to re-express the data in X - in other words, how should we define independence between principal components in the new basis?

Principal component analysis defines independence by considering the variance of the data in the original basis. It seeks to de-correlate the original data by finding the directions in which variance is maximised, and then uses these directions to define the new basis. Recall the definition of the variance of a random variable, Z, with mean mu:

sigma^2_Z = E[(Z - mu)^2]

Suppose we have a vector of n discrete measurements, r~ = (r~_1, r~_2, ..., r~_n), with mean mu_r~. If we subtract the mean from each of the measurements, then we obtain a translated set of measurements r = (r_1, r_2, ..., r_n) that has zero mean. Thus, the variance of these measurements is given by the relation

sigma^2_r = (1/(n-1)) r r^T

(It is in fact correct to divide through by a factor of n-1 rather than n, a fact which we shall not justify here, but which is discussed in [2].) If we have a second vector of n measurements, s = (s_1, s_2, ..., s_n), again with zero mean, then we can generalise this idea to obtain the covariance of r and s. Covariance can be thought of as a measure of how much two variables change together. Variance is thus a special case of covariance, obtained when the two variables are identical. For the zero-mean measurement vectors r and s, the covariance is

sigma^2_rs = (1/(n-1)) r s^T

We can now generalise this idea to our m x n data matrix, X. Recall that m was the number of variables, and n the number of samples. We can therefore think of this matrix, X, in terms of m row vectors, each of length n:

X = [ x_{1,1}  x_{1,2}  ...  x_{1,n}        [ x_1^T
      x_{2,1}  x_{2,2}  ...  x_{2,n}    =     x_2^T       in R^{m x n},   with x_i^T in R^n
        ...      ...    ...    ...             ...
      x_{m,1}  x_{m,2}  ...  x_{m,n} ]        x_m^T ]

Since we have a row vector for each variable, each of these vectors contains all the samples for one particular variable; so, for example, x_i is a vector of the n samples for the i-th variable. It therefore makes sense to consider the following matrix product:

C_X = (1/(n-1)) X X^T = (1/(n-1)) [ x_1 x_1^T   x_1 x_2^T   ...   x_1 x_m^T
                                    x_2 x_1^T   x_2 x_2^T   ...   x_2 x_m^T
                                       ...         ...      ...      ...
                                    x_m x_1^T   x_m x_2^T   ...   x_m x_m^T ]   in R^{m x m}

If we look closely at the entries of this matrix, we see that we have computed all the possible covariance pairs between the m variables. Indeed, on the diagonal entries we have the variances, and on the off-diagonal entries we have the covariances. This matrix is therefore known as the covariance matrix.

Now let us return to the original problem, that of linearly transforming the original data matrix using the relation Y = PX, for some matrix P. We need to decide upon some features that we would like the transformed matrix, Y, to exhibit, and somehow relate these to the features of the corresponding covariance matrix, C_Y.

Covariance can be considered to be a measure of how well correlated two variables are. The PCA method makes the fundamental assumption that the variables in the transformed matrix should be as uncorrelated as possible. This is equivalent to saying that the covariances of different variables in the matrix C_Y should be as close to zero as possible (covariance matrices are always positive definite or positive semi-definite). Conversely, large variance values interest us, since they correspond to interesting dynamics in the system (small variances may well be noise). We therefore have the following requirements for constructing the covariance matrix, C_Y:

1. Maximise the signal, measured by variance (maximise the diagonal entries)
2. Minimise the covariance between variables (minimise the off-diagonal entries)

We thus come to the conclusion that, since the minimum possible covariance is zero, we are seeking a diagonal matrix, C_Y. If we can choose the transformation matrix, P, in such a way that C_Y is diagonal, then we will have achieved our objective.

We now make the assumption that the vectors in the new basis, p_1, p_2, ..., p_m, are orthogonal (in fact, we additionally assume that they are orthonormal). Far from being restrictive, this assumption enables us to proceed by using the tools of linear algebra to find a solution to the problem. Consider the formula for the covariance matrix, C_Y, and our interpretation of Y in terms of X and P:

C_Y = (1/(n-1)) Y Y^T
    = (1/(n-1)) (PX)(PX)^T
    = (1/(n-1)) (PX)(X^T P^T)
    = (1/(n-1)) P (X X^T) P^T

i.e.  C_Y = (1/(n-1)) P S P^T,   where S = X X^T

Note that S is an m x m symmetric matrix, since (X X^T)^T = (X^T)^T (X)^T = X X^T. We now invoke the well known theorem from linear algebra that every square symmetric matrix is orthogonally (orthonormally) diagonalisable. That is, we can write

S = E D E^T

where E is an m x m orthonormal matrix whose columns are the orthonormal eigenvectors of S, and D is a diagonal matrix which has the eigenvalues of S as its (diagonal) entries. The rank, r, of S is the number of orthonormal eigenvectors that it has. If S turns out to be rank-deficient, so that r is less than the size, m, of the matrix, then we simply need to generate m - r additional orthonormal vectors to fill the remaining columns of E.

It is at this point that we make a choice for the transformation matrix, P. By choosing the rows of P to be the eigenvectors of S, we ensure that P = E^T and vice-versa. Thus, substituting this into our derived expression for the covariance matrix, C_Y, gives

C_Y = (1/(n-1)) P S P^T = (1/(n-1)) E^T (E D E^T) E

Now, since E is an orthonormal matrix, we have E^T E = I, where I is the m x m identity matrix. Hence, for this special choice of P, we have

C_Y = (1/(n-1)) D

A last point to note is that, with this method, we automatically gain information about the relative importance of each principal component from the variances. The largest variance corresponds to the first principal component, the second largest to the second principal component, and so on. This therefore gives us a method for organising the data in the diagonalisation stage. Once we have obtained the eigenvalues and eigenvectors of S = X X^T, we sort the eigenvalues in descending order and place them in this order on the diagonal of D. We then construct the orthonormal matrix, E, by placing the associated eigenvectors in the same order to form the columns of E (i.e. we place the eigenvector that corresponds to the largest eigenvalue in the first column, the eigenvector corresponding to the second largest eigenvalue in the second column, and so on).

We have therefore achieved our objective of diagonalising the covariance matrix of the transformed data. The principal components (the rows of P) are the eigenvectors of the covariance matrix, XX^T, and the rows are in order of importance, telling us how "principal" each principal component is.
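The whole of the above construction can be checked numerically in a few lines. The following sketch (not code taken from the essay) uses a randomly generated, mean-centred data matrix; it builds P from the eigenvectors of S = XX^T and confirms that the covariance of Y = PX is diagonal:

% Sketch: PCA by orthogonal diagonalisation of S = X*X'.
X = randn(5, 200);                        % example data: m = 5 variables, n = 200 samples
X = X - repmat(mean(X,2), 1, size(X,2));  % subtract the row means
[m, n] = size(X);
S = X*X';                                 % m x m symmetric matrix
[E, D] = eig(S);                          % S = E*D*E', with orthonormal E
[d, idx] = sort(diag(D), 'descend');      % order eigenvalues, largest first
E = E(:, idx);
P = E';                                   % rows of P are the principal components
Y = P*X;                                  % the data expressed in the new basis
CY = Y*Y'/(n-1);                          % equals D/(n-1), i.e. diagonal
offdiag = norm(CY - diag(diag(CY)))       % numerically zero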

4 The Singular Value Decomposition

In this section, we will examine how the well known singular value decomposition (SVD) from linear algebra can be used in principal component analysis. Indeed, we will show that the derivation of PCA in the previous section and the SVD are closely related. We will not derive the SVD, as it is a well established result and can be found in any good book on numerical linear algebra, such as [4].

Given A in R^{n x m}, not necessarily of full rank, a singular value decomposition of A is

A = U Sigma V^T

where

U in R^{n x n} is orthonormal,
Sigma in R^{n x m} is diagonal,
V in R^{m x m} is orthonormal.

In addition, the diagonal entries, sigma_i, of Sigma are non-negative and are called the singular values of A. They are ordered such that the largest singular value, sigma_1, is placed in the (1,1) entry of Sigma, and the other singular values are placed in order down the diagonal; they satisfy sigma_1 >= sigma_2 >= ... >= sigma_p >= 0, where p = min(n, m). Note that we have reversed the row and column indices in defining the SVD from the way they were defined in the derivation of PCA in the previous section. The reason for doing this will become apparent shortly.

The SVD can be considered to be a general method for understanding change of basis, as can be illustrated by the following argument (which follows [4]). Since U in R^{n x n} and V in R^{m x m} are orthonormal matrices, their columns form bases for, respectively, the vector spaces R^n and R^m. Therefore, any vector b in R^n can be expanded in the basis formed by the columns of U (also known as the left singular vectors of A), and any vector x in R^m can be expanded in the basis formed by the columns of V (also known as the right singular vectors of A). The coefficient vectors for these expansions, b^ and x^, are given by

b^ = U^T b   and   x^ = V^T x

Now, if the relation b = Ax holds, then we can infer the following:

U^T b = U^T A x
   b^ = U^T (U Sigma V^T) x
   b^ = Sigma x^

Thus, the SVD allows us to assert that every matrix is diagonal, so long as we choose the appropriate bases for the domain and range spaces.

How does this link in to the previous analysis of PCA? Consider the n x m matrix, A, for which we have a singular value decomposition, A = U Sigma V^T. There is a theorem from linear algebra which says that the non-zero singular values of A are the square roots of the non-zero eigenvalues of AA^T or A^T A. The assertion, for the case A^T A, is proven in the following way:

A^T A = (U Sigma V^T)^T (U Sigma V^T) = (V Sigma^T U^T)(U Sigma V^T) = V (Sigma^T Sigma) V^T

We observe that A^T A is similar to Sigma^T Sigma, and thus it has the same eigenvalues. Since Sigma^T Sigma is a square (m x m), diagonal matrix, the eigenvalues are in fact the diagonal entries, which are the squares of the singular values. Note that the non-zero eigenvalues of the two matrices AA^T and A^T A are actually identical.

It should also be noted that we have effectively performed an eigenvalue decomposition of the matrix A^T A. Indeed, since A^T A is symmetric, this is an orthogonal diagonalisation, and thus the eigenvectors of A^T A are the columns of V. This will be important in making the practical connection between the SVD and the PCA of the matrix X, which is what we will do next.

Returning to the original m x n data matrix, X, let us define a new n x m matrix, Z:

Z = (1/sqrt(n-1)) X^T

Recall that, since the m rows of X contained the n data samples, we subtracted the row average from each entry to ensure zero mean across the rows. Thus, the new matrix, Z, has columns with zero mean. Consider forming the m x m matrix Z^T Z:

Z^T Z = ( (1/sqrt(n-1)) X^T )^T ( (1/sqrt(n-1)) X^T ) = (1/(n-1)) X X^T

i.e.  Z^T Z = C_X

We find that defining Z in this way ensures that Z^T Z is equal to the covariance matrix of X, C_X. From the discussion in the previous section, the principal components of X (which is what we are trying to identify) are the eigenvectors of C_X. Therefore, if we perform a singular value decomposition of the matrix Z^T Z, the principal components will be the columns of the orthogonal matrix V.

The last step is to relate the SVD of Z^T Z back to the change of basis represented by the relation Y = PX. We wish to project the original data onto the directions described by the principal components. Since we have the relation V = P^T, this is simply

Y = V^T X

If we wish to recover the original data, we simply compute (using the orthogonality of V)

X = V Y
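A corresponding sketch of this SVD route (again not code from the essay itself) shows the pipeline end to end: form Z, take the SVD of Z^T Z, project, and then invert the change of basis to recover the data exactly.

% Sketch: PCA via the SVD of Z'*Z, for a data matrix X with zero-mean rows.
X = randn(5, 200);                        % example data: m = 5 variables, n = 200 samples
X = X - repmat(mean(X,2), 1, size(X,2));  % subtract the row means
[m, n] = size(X);
Z = X'/sqrt(n-1);                         % n x m matrix, so that Z'*Z = C_X
[U, S, V] = svd(Z'*Z);                    % columns of V are the principal components
variances = diag(S);                      % eigenvalues of C_X, largest first
Y = V'*X;                                 % projection onto the principal components
Xrec = V*Y;                               % X recovered, since V is orthogonal
err = norm(X - Xrec)                      % numerically zero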

5 Image Compression Using PCA

In the previous section, we developed a method for principal component analysis which utilised the singular value decomposition of an m x m matrix Z^T Z, where Z = (1/sqrt(n-1)) X^T and X was an m x n data matrix. Since Z^T Z is in R^{m x m}, the matrix V obtained in the singular value decomposition of Z^T Z must also be of dimensions m x m. Recall also that the columns of V are the principal component directions, and that the SVD automatically sorts these components in decreasing order of importance, or "principality", so that the most principal component is the first column of V.

Suppose that, before projecting the data using the relation Y = V^T X, we were to truncate the matrix V so that we kept only the first r < m columns. We would thus have a matrix V~ in R^{m x r}. The projection Y~ = V~^T X is still dimensionally consistent, and the result of the product is a matrix Y~ in R^{r x n}. Suppose that we then wished to transform this data back to the original basis by computing X~ = V~ Y~. We thereby recover the dimensions of the original data matrix and obtain X~ in R^{m x n}.

The matrices X and X~ are of the same dimensions, but they are not the same matrix, since we truncated the matrix of principal components V in order to obtain X~. It is therefore reasonable to conclude that the matrix X~ has, in some sense, less information in it than the matrix X. Of course, in terms of memory allocation on a computer, this is certainly not the case, since both matrices have the same dimensions and would therefore be allotted the same amount of memory. However, the matrix X~ can be computed as the product of two smaller matrices (V~ and Y~). This, together with the fact that the important information in the matrix is captured by the first principal components, suggests a possible method for image compression.

During the subsequent analysis, we shall work with a standard test image that is often used in image processing and image compression. It is a greyscale picture of a butterfly, and is displayed in Figure 5. We will use MATLAB to perform the following analysis, though the principles can be applied in other computational packages.

Figure 5: The Butterfly greyscale test image

MATLAB considers greyscale images as objects consisting of two components: a matrix of pixels, and a colourmap. The Butterfly image above is stored in a 512 x 512 matrix (and therefore has this number of pixels). The colourmap is a 512 x 3 matrix. For RGB colour images, each image can be stored as a single three dimensional matrix, where the third dimension stores three numbers in the range [0, 1] corresponding to each pixel,

representing the intensity of the red, green and blue components. For a greyscale image such as the one we are dealing with, the colourmap matrix has three identical columns, with a scale representing intensity on the one dimensional grey scale. Each element of the pixel matrix contains a number representing a certain intensity of grey scale for an individual pixel. MATLAB displays all of the pixels simultaneously with the correct intensity, and the greyscale image that we see is produced.

The matrix containing the pixel information is our data matrix, X. We will perform a principal component analysis of this matrix, using the SVD method outlined above. The steps involved are exactly as described above and are summarised in the following MATLAB code.

[fly,map] = imread('butterfly.gif');  % load image into MATLAB
fly = double(fly);                    % convert to double precision
image(fly), colormap(map);            % display image
axis off, axis equal
[m,n] = size(fly);
mn = mean(fly,2);                     % compute row mean
X = fly - repmat(mn,1,n);             % subtract row mean to obtain X
Z = 1/sqrt(n-1)*X';                   % create matrix, Z
covZ = Z'*Z;                          % covariance matrix of X, equal to Z'*Z
%% Singular value decomposition
[U,S,V] = svd(covZ);
variances = diag(S).*diag(S);         % compute variances
bar(variances(1:30))                  % scree plot of variances
%% Extract first 40 principal components
PCs = 40;
VV = V(:,1:PCs);
Y = VV'*X;                            % project data onto PCs
ratio = 512/(2*PCs+1);                % compression ratio
XX = VV*Y;                            % convert back to original basis
XX = XX + repmat(mn,1,n);             % add the row means back on
image(XX), colormap(map), axis off;   % display results

Figure 6: MATLAB code for image compression PCA

In this case, we have chosen to use the first 40 (out of 512) principal components. What compression ratio does this equate to? To answer this question, we need to compare the amount of data we would have needed to store previously with what we can now store. Without compression, we would still have our 512 x 512 matrix to store. After selecting the first 40 principal components, we have the two matrices V~ and Y~ (VV and Y in the above MATLAB code) from which we can obtain the pixel matrix by computing the matrix product. Matrix V~ is 512 x 40, whilst matrix Y~ is 40 x 512. There is also one more matrix that we must use if we wish to display our image - the vector of means which we add back on after converting back to the original basis (this is just a 512 x 1 matrix, which we can later copy into a larger matrix to add to X~). We have therefore reduced the number of columns needed from 512 to 2 x 40 + 1 = 81, and the compression ratio is then calculated in the following way:

512 : 81,   i.e. approximately 6.3 : 1 compression

A decent ratio, it seems; however, what does the compressed image look like? The image for 40 principal components (6.3:1 compression) is displayed in Figure 7.

Figure 7: 40 principal components (6.3:1 compression)

The loss in quality is evident (after all, this is lossy compression, as opposed to lossless compression); however, considering the compression ratio, the trade off seems quite good. Let's look next at the eigenspectrum, in Figure 8.

Figure 8: Eigenspectrum (first 20 eigenvalues)

The first principal component accounts for just over 50% of the variance, the first two account for 69.8%, and the first six for 93.8%. This type of plot is not so informative here, as accounting for 93.8% of the variance in the data does not correspond to us seeing a clear image, as is shown in Figure 9.

Figure 9: Image compressed using 6 principal components

On the next page, in Figure 10, a selection of images is shown with an increasing number of principal components retained. In Table 2, the cumulative proportion of variance contributed by the leading eigenvalues is displayed.

Table 2: Cumulative variance accounted for by PCs (eigenvector number against cumulative proportion of variance)

Figure 10: The visual effect of retaining principal components. The twelve panels show, in order: 2 principal components (102.4:1 compression), 6 (39.4:1), 10 (24.4:1), 14 (17.7:1), 20 (12.5:1), 30 (8.4:1), 40 (6.3:1), 60 (4.2:1), 90 (2.8:1), 120 (2.1:1), 150 (1.7:1) and 180 principal components (1.4:1 compression).
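These compression ratios all come from the same counting argument as before: retaining k principal components of the 512 x 512 image means storing a 512 x k matrix, a k x 512 matrix and the 512 x 1 vector of row means, i.e. the equivalent of 2k + 1 columns instead of 512. A short sketch (not part of the original essay) that reproduces the quoted ratios:

% Compression ratios for the component counts shown in Figure 10,
% using ratio = 512/(2k+1) as derived in the text.
k = [2 6 10 14 20 30 40 60 90 120 150 180];
ratio = 512./(2*k + 1);
disp([k; ratio])        % e.g. k = 40 gives approximately 6.3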

6 Blind Source Separation

The final application of PCA in this report is motivated by the "cocktail party problem", a diagram of which is displayed in Figure 11. Imagine we have N people at a cocktail party. The N people are all speaking at once, resulting in a mixture of all the voices. Suppose that we wish to obtain the individual monologues from this mixture - how would we go about doing this?

Figure 11: The cocktail party problem (image courtesy of Gari Clifford, MIT)

The room has been equipped with exactly n microphones, spread around at different points in the room. Each microphone thus records a slightly different version of the combined signal, together with some random noise. By analysing these n combined signals using PCA, it is possible both to de-noise the group signal and to separate out the original sources. A formal statement of this problem is:

- Matrix Z in R^{m x n} consists of m samples of n independent sources
- The signals are mixed together linearly using a matrix A in R^{n x n}
- The matrix of observations is represented as the product X^T = AZ^T
- We attempt to demix the observations by finding W in R^{n x n} such that Y^T = WX^T
- The hope is that Y is approximately Z, and thus W is approximately A^{-1}

These points reflect the assumptions of the blind source separation (BSS) problem:

1. The mixture of source signals must be linear
2. The source signals are independent
3. The mixture (the matrix A) is stationary (constant)
4. The number of observations (microphones) is the same as the number of sources

In order to use PCA for BSS, we need to define independence in terms of the variance of the signals. In analogy with the previous examples and the discussion of PCA in Section 3, we assume that we will be able to de-correlate the individual signals by finding the (orthogonal)

directions of maximal variance for the matrix of observations, X. It is therefore possible to again use the SVD for this analysis. Consider the skinny SVD of X in R^{m x n}:

X = U Sigma V^T,   where U in R^{m x n}, Sigma in R^{n x n}, V in R^{n x n}

Comparing the skinny SVD with the full SVD: in both cases, the n x n matrix V is the same. Assuming that m >= n, the diagonal matrix of singular values, Sigma, is square (n x n) in the skinny case and rectangular (m x n) in the full case, with the additional m - n rows being "ghost" rows (i.e. having entries that are all zero). The first n columns of the matrix U in the full case are identical to the n columns of the skinny-case U, with the additional m - n columns being arbitrary orthogonal appendments.

Recall that we are trying to approximate the original signals matrix (Z ~ Y) by trying to find a matrix (W ~ A^{-1}) such that

Y^T = W X^T

This matrix is obtained by rearranging the equation for the skinny SVD:

X = U Sigma V^T
X^T = V Sigma^T U^T
U^T = Sigma^{-T} V^T X^T

Thus, we identify our approximation to the de-mixing matrix as W = Sigma^{-T} V^T, and our de-mixed signals are therefore the columns of the matrix U. Note that, since the matrix Sigma is square and diagonal, Sigma^{-T} = Sigma^{-1}, which is computed by simply taking the reciprocal of each diagonal entry of Sigma. However, if we are using the SVD method, it is not necessary to worry about explicitly calculating the matrix W, since the SVD automatically delivers us the de-mixed signals.

To illustrate this, consider constructing the matrix Z in R^{2001 x 3} consisting of the following three signals (the columns), sampled at 2001 equispaced points on the interval [0, 20] (the rows).

Figure 12: Three input signals for the BSS problem: sin(x), 2 mod(x/10, 1) and 0.5 sign(sin(pi x/4))

We now construct a 3 x 3 mixing matrix, A, by randomly perturbing the 3 x 3 identity matrix. The MATLAB code A=round((eye(M)+0.1*randn(3,3))*10)/10 achieves this. To demonstrate, an example mixing matrix could therefore be a 3 x 3 matrix whose entries are all within about 0.1 of those of the identity. To simulate the cocktail party, we will also add some noise to the signals before the mix (for this, we use the MATLAB code Z=Z+0.02*randn(2001,3)). After adding this noise and mixing the signals to obtain X via X^T = AZ^T, we obtain the following mixture of signals:

Figure 13: A mixture of the three input signals, with noise added

This mixing corresponds to each of the three microphones in the example above being close to a unique guest of the cocktail party, and therefore predominantly picking up what that individual is saying. We can now go ahead and perform a (skinny) singular value decomposition of the matrix X, using the command [u,s,v]=svd(X,0). Figure 14 shows the separated signals (the columns of U) plotted together, and individually.

Figure 14: The separated signals plotted together, and individually

We observe that the extracted signals are quite easily identifiable representations of the three input signals; however, note that the sine and sawtooth functions have been inverted, and that the scaling is different to that of the inputted signals. The performance of PCA for blind source separation is good in this case because the mixing matrix was close to the identity. If we generate normally distributed random numbers for the elements of the mixing matrix, we can get good results, but we can also get poor results. In the following figures, the upper left plot shows the three mixed signals and the subsequent 3 plots are the SVD extractions. Figures 15, 16 and 17 show examples of relatively good performance in extracting the original signals from the randomly mixed combination, whilst Figure 18 shows a relatively poor performance.

Figures 15, 16, 17 and 18: four randomly mixed combinations (upper left of each) and the corresponding SVD extractions
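The de-mixing matrix itself is never formed in the essay's code, since the columns of U already contain the separated signals. As a sketch (assuming X is the 2001 x 3 matrix of noisy, mixed observations constructed above), forming W = Sigma^{-T} V^T explicitly and applying it gives exactly the same result:

% Sketch: explicit de-mixing via the skinny SVD, X = U*S*V'.
% X is assumed to be the 2001x3 observation matrix from the example above.
[U, S, V] = svd(X, 0);     % skinny SVD
W = inv(S')*V';            % de-mixing matrix W = S^(-T)*V', as derived in the text
Y = (W*X')';               % recovered signals
check = norm(Y - U)        % numerically zero: Y is exactly the columns of U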

7 Conclusions

My aim in writing this article was that somebody with a similar level of mathematical knowledge as myself (i.e. early graduate level) would be able to gain a good introductory understanding of PCA by reading this essay. I hope that they would understand that it is a diverse tool in data analysis, with many applications, three of which we have covered in detail here. I would also hope that they would gain a good understanding of the surrounding mathematics, and of the close link that PCA has with the singular value decomposition.

I embarked upon writing this essay with only one application in mind, that of Blind Source Separation. However, when it came to researching the topic in detail, I found that there were many interesting applications of PCA, and I identified dimensionality reduction in multivariate data analysis and image compression as being two of the most appealing alternative applications.

Though it is a powerful technique, with a diverse range of possible applications, it is fair to say that PCA is not necessarily the best way to deal with each of the sample applications that I have discussed. For the multivariate data analysis example, we were able to identify that the inhabitants of Northern Ireland were in some way different in their dietary habits to those of the other three countries in the UK. We were also able to identify particular food groups with the eating habits of Northern Ireland, yet we were limited in being able to make distinctions between the dietary habits of the English, Scottish and Welsh. In order to explore this avenue, it would perhaps be necessary to perform a similar analysis on just those three countries.

Image compression (and more generally, data compression) is by now a mature field, and there are many sophisticated technologies available that perform this task. JPEG is an obvious and comparable example that springs to mind (JPEG can also involve lossy compression). JPEG utilises the discrete cosine transform to convert the image to a frequency-domain representation and generally achieves much higher quality for similar compression ratios when compared to PCA. Having said this, PCA is a nice technique in its own right for implementing image compression, and it is pleasing to find such a neat implementation.

As we saw in the last example, Blind Source Separation can cause problems for PCA under certain circumstances. PCA will not be able to separate the individual sources if the signals are combined nonlinearly, and can produce spurious results even if the combination is linear. PCA will also fail for BSS if the data is non-Gaussian. In this situation, a well known technique that works is called Independent Component Analysis (ICA). The main philosophical difference between the two methods is that PCA defines independence using variance, whilst ICA defines independence using statistical independence - it identifies the principal components by maximising the statistical independence between each of the components.

Writing the theoretical parts of this essay (Sections 3 and 4) was a very educational experience, and I was aided in doing this by the excellent paper by Jonathon Shlens, A Tutorial on Principal Component Analysis [2], and the famous book on Numerical Linear Algebra by Lloyd N. Trefethen and David Bau III [4]. However, the original motivation for writing this special topic came from the excellent lectures in Signals Processing delivered by Dr. I. Drobnjak and Dr. C. Orphanidou during Hilary term of 2008 at the Oxford University Mathematical Institute.

8 Appendix - MATLAB

Figure 19: MATLAB code: Data Analysis

X = [ ... ];                     % the 17 x 4 data matrix of Table 1 (values omitted here)
covmatrix = X*X';
data = X;
[M,N] = size(data);
mn = mean(data,2);
data = data - repmat(mn,1,N);
Y = data' / sqrt(N-1);
[u,s,PC] = svd(Y);
S = diag(s);
V = S .* S;
signals = PC' * data;

plot(signals(1,1),0,'b.',signals(1,2),0,'b.',...
     signals(1,3),0,'b.',signals(1,4),0,'r.','markersize',15)
xlabel('PC1')
text(signals(1,1)-25,.2,'Eng'), text(signals(1,2)-25,.2,'Wal'),
text(signals(1,3)-2,.2,'Scot'), text(signals(1,4)-3,.2,'N Ire')

plot(signals(1,1),signals(2,1),'b.',signals(1,2),signals(2,2),'b.',...
     signals(1,3),signals(2,3),'b.',signals(1,4),signals(2,4),'r.',...
     'markersize',15)
xlabel('PC1'), ylabel('PC2')
text(signals(1,1)+2,signals(2,1),'Eng')
text(signals(1,2)+2,signals(2,2),'Wal')
text(signals(1,3)+2,signals(2,3),'Scot')
text(signals(1,4)-6,signals(2,4),'N Ire')

plot(PC(1,1),PC(1,2),'m.',PC(2,1),PC(2,2),'m.',...
     PC(3,1),PC(3,2),'m.',PC(4,1),PC(4,2),'m.',...
     PC(5,1),PC(5,2),'m.',PC(6,1),PC(6,2),'m.',...
     PC(7,1),PC(7,2),'m.',PC(8,1),PC(8,2),'m.',...
     PC(9,1),PC(9,2),'m.',PC(10,1),PC(10,2),'m.',...
     PC(11,1),PC(11,2),'m.',PC(12,1),PC(12,2),'m.',...
     PC(13,1),PC(13,2),'m.',PC(14,1),PC(14,2),'m.',...
     PC(15,1),PC(15,2),'m.',PC(16,1),PC(16,2),'m.',...
     PC(17,1),PC(17,2),'m.','markersize',15)
xlabel('effect(PC1)'), ylabel('effect(PC2)')
text(PC(1,1),PC(1,2)-.1,'Cheese'), text(PC(2,1),PC(2,2)-.1,'Carcass meat')
text(PC(3,1),PC(3,2)-.1,'Other meat'), text(PC(4,1),PC(4,2)-.1,'Fish')
text(PC(5,1),PC(5,2)-.1,'Fats and oils'), text(PC(6,1),PC(6,2)-.1,'Sugars')
text(PC(7,1),PC(7,2)-.1,'Fresh potatoes')
text(PC(8,1),PC(8,2)-.1,'Fresh Veg')
text(PC(9,1),PC(9,2)-.1,'Other Veg')
text(PC(10,1),PC(10,2)-.1,'Processed potatoes')
text(PC(11,1),PC(11,2)-.1,'Processed Veg')
text(PC(12,1),PC(12,2)-.1,'Fresh fruit'),
text(PC(13,1),PC(13,2)-.1,'Cereals'), text(PC(14,1),PC(14,2)-.1,'Beverages')
text(PC(15,1),PC(15,2)-.1,'Soft drinks'),
text(PC(16,1),PC(16,2)-.1,'Alcoholic drinks')
text(PC(17,1),PC(17,2)-.1,'Confectionery')
%%
bar(V)
xlabel('eigenvector number'), ylabel('eigenvalue')
%%
t = sum(V); cumsum(V/t)

Figure 20: MATLAB code: Image Compression

clear all; close all; clc

[fly,map] = imread('butterfly.gif');
fly = double(fly);
whos

image(fly)
colormap(map)
axis off, axis equal

[m,n] = size(fly);
mn = mean(fly,2);
X = fly - repmat(mn,1,n);

Z = 1/sqrt(n-1)*X';
covZ = Z'*Z;

[U,S,V] = svd(covZ);

variances = diag(S).*diag(S);
bar(variances,'b')
xlim([0 20])
xlabel('eigenvector number')
ylabel('eigenvalue')

tot = sum(variances)
[[1:512]' cumsum(variances)/tot]

PCs = 40;
VV = V(:,1:PCs);
Y = VV'*X;
ratio = 512/(2*PCs+1)

XX = VV*Y;
XX = XX + repmat(mn,1,n);

image(XX)
colormap(map)
axis off, axis equal

z = 1;
for PCs = [2 6 10 14 20 30 40 60 90 120 150 180]
    VV = V(:,1:PCs);
    Y = VV'*X;
    XX = VV*Y;
    XX = XX + repmat(mn,1,n);
    subplot(4,3,z)
    z = z + 1;
    image(XX)
    colormap(map)
    axis off, axis equal
    title({[num2str(round(10*512/(2*PCs+1))/10) ':1 compression']; ...
           [int2str(PCs) ' principal components']})
end

Figure 21: MATLAB code: Blind Source Separation

clear all; close all; clc;

set(0,'defaultfigureposition',[ ... ],...          % position values not recoverable
    'defaultaxeslinewidth',.9,'defaultaxesfontsize',8,...
    'defaultlinelinewidth',1.0,'defaultpatchlinewidth',1.0,...
    'defaultlinemarkersize',5), format compact, format short

x = [0:.01:20]';
signalA = @(x) sin(x);
signalB = @(x) 2*mod(x/10,1);
signalC = @(x) sign(sin(.25*pi*x))*.5;
Z = [signalA(x) signalB(x) signalC(x)];

[N,M] = size(Z);
Z = Z + 0.02*randn(N,M);
[N,M] = size(Z);
A = round(10*randn(3,3))/10
X = A*Z';
X = X';

figure
subplot(2,2,1)
plot(X,'linewidth',2)
xlim([0,2001])
xlabel('x'), ylabel('y','rotation',0)

[u,s,v] = svd(X,0);

subplot(2,2,2)
plot(u(:,1),'b','linewidth',2)
xlim([0,2001])
xlabel('x'), ylabel('y','rotation',0)

subplot(2,2,3)
plot(u(:,2),'g','linewidth',2)
xlim([0,2001])
xlabel('x'), ylabel('y','rotation',0)

subplot(2,2,4)
plot(u(:,3),'r','linewidth',2)
xlim([0,2001])
xlabel('x'), ylabel('y','rotation',0)

References

[1] UMETRICS, Multivariate Data Analysis (MVA).
[2] Jonathon Shlens, A Tutorial on Principal Component Analysis.
[3] Soren Hojsgaard, Examples of multivariate analysis - Principal component analysis (PCA), Statistics and Decision Theory Research Unit, Danish Institute of Agricultural Sciences.
[4] Lloyd Trefethen & David Bau, Numerical Linear Algebra, SIAM.
[5] Signals and Systems Group, Uppsala University.
[6] Dr. I. Drobnjak, Oxford University MSc MMSC Signals Processing Lecture Notes (PCA/ICA).
[7] Gari Clifford, MIT, Blind Source Separation: PCA & ICA.


More information

Similar matrices and Jordan form

Similar matrices and Jordan form Similar matrices and Jordan form We ve nearly covered the entire heart of linear algebra once we ve finished singular value decompositions we ll have seen all the most central topics. A T A is positive

More information

Pearson s Correlation Coefficient

Pearson s Correlation Coefficient Pearson s Correlation Coefficient In this lesson, we will find a quantitative measure to describe the strength of a linear relationship (instead of using the terms strong or weak). A quantitative measure

More information

Polynomials. Jackie Nicholas Jacquie Hargreaves Janet Hunter

Polynomials. Jackie Nicholas Jacquie Hargreaves Janet Hunter Mathematics Learning Centre Polnomials Jackie Nicholas Jacquie Hargreaves Janet Hunter c 26 Universit of Sdne Mathematics Learning Centre, Universit of Sdne 1 1 Polnomials Man of the functions we will

More information

Example: Document Clustering. Clustering: Definition. Notion of a Cluster can be Ambiguous. Types of Clusterings. Hierarchical Clustering

Example: Document Clustering. Clustering: Definition. Notion of a Cluster can be Ambiguous. Types of Clusterings. Hierarchical Clustering Overview Prognostic Models and Data Mining in Medicine, part I Cluster Analsis What is Cluster Analsis? K-Means Clustering Hierarchical Clustering Cluster Validit Eample: Microarra data analsis 6 Summar

More information

Linear Algebraic Equations, SVD, and the Pseudo-Inverse

Linear Algebraic Equations, SVD, and the Pseudo-Inverse Linear Algebraic Equations, SVD, and the Pseudo-Inverse Philip N. Sabes October, 21 1 A Little Background 1.1 Singular values and matrix inversion For non-smmetric matrices, the eigenvalues and singular

More information

x1 x 2 x 3 y 1 y 2 y 3 x 1 y 2 x 2 y 1 0.

x1 x 2 x 3 y 1 y 2 y 3 x 1 y 2 x 2 y 1 0. Cross product 1 Chapter 7 Cross product We are getting ready to study integration in several variables. Until now we have been doing only differential calculus. One outcome of this study will be our ability

More information

5.1. A Formula for Slope. Investigation: Points and Slope CONDENSED

5.1. A Formula for Slope. Investigation: Points and Slope CONDENSED CONDENSED L E S S O N 5.1 A Formula for Slope In this lesson ou will learn how to calculate the slope of a line given two points on the line determine whether a point lies on the same line as two given

More information

1 2 3 1 1 2 x = + x 2 + x 4 1 0 1

1 2 3 1 1 2 x = + x 2 + x 4 1 0 1 (d) If the vector b is the sum of the four columns of A, write down the complete solution to Ax = b. 1 2 3 1 1 2 x = + x 2 + x 4 1 0 0 1 0 1 2. (11 points) This problem finds the curve y = C + D 2 t which

More information

Component Ordering in Independent Component Analysis Based on Data Power

Component Ordering in Independent Component Analysis Based on Data Power Component Ordering in Independent Component Analysis Based on Data Power Anne Hendrikse Raymond Veldhuis University of Twente University of Twente Fac. EEMCS, Signals and Systems Group Fac. EEMCS, Signals

More information

Factoring bivariate polynomials with integer coefficients via Newton polygons

Factoring bivariate polynomials with integer coefficients via Newton polygons Filomat 27:2 (2013), 215 226 DOI 10.2298/FIL1302215C Published b Facult of Sciences and Mathematics, Universit of Niš, Serbia Available at: http://www.pmf.ni.ac.rs/filomat Factoring bivariate polnomials

More information

Cluster Analysis Overview. Data Mining Techniques: Cluster Analysis. What is Cluster Analysis? What is Cluster Analysis?

Cluster Analysis Overview. Data Mining Techniques: Cluster Analysis. What is Cluster Analysis? What is Cluster Analysis? Cluster Analsis Overview Data Mining Techniques: Cluster Analsis Mirek Riedewald Man slides based on presentations b Han/Kamber, Tan/Steinbach/Kumar, and Andrew Moore Introduction Foundations: Measuring

More information

CHAPTER TEN. Key Concepts

CHAPTER TEN. Key Concepts CHAPTER TEN Ke Concepts linear regression: slope intercept residual error sum of squares or residual sum of squares sum of squares due to regression mean squares error mean squares regression (coefficient)

More information

EQUATIONS OF LINES IN SLOPE- INTERCEPT AND STANDARD FORM

EQUATIONS OF LINES IN SLOPE- INTERCEPT AND STANDARD FORM . Equations of Lines in Slope-Intercept and Standard Form ( ) 8 In this Slope-Intercept Form Standard Form section Using Slope-Intercept Form for Graphing Writing the Equation for a Line Applications (0,

More information

SECTION 7-4 Algebraic Vectors

SECTION 7-4 Algebraic Vectors 7-4 lgebraic Vectors 531 SECTIN 7-4 lgebraic Vectors From Geometric Vectors to lgebraic Vectors Vector ddition and Scalar Multiplication Unit Vectors lgebraic Properties Static Equilibrium Geometric vectors

More information

2.6. The Circle. Introduction. Prerequisites. Learning Outcomes

2.6. The Circle. Introduction. Prerequisites. Learning Outcomes The Circle 2.6 Introduction A circle is one of the most familiar geometrical figures and has been around a long time! In this brief Section we discuss the basic coordinate geometr of a circle - in particular

More information

f x a 0 n 1 a 0 a 1 cos x a 2 cos 2x a 3 cos 3x b 1 sin x b 2 sin 2x b 3 sin 3x a n cos nx b n sin nx n 1 f x dx y

f x a 0 n 1 a 0 a 1 cos x a 2 cos 2x a 3 cos 3x b 1 sin x b 2 sin 2x b 3 sin 3x a n cos nx b n sin nx n 1 f x dx y Fourier Series When the French mathematician Joseph Fourier (768 83) was tring to solve a problem in heat conduction, he needed to epress a function f as an infinite series of sine and cosine functions:

More information

Example 1: Model A Model B Total Available. Gizmos. Dodads. System:

Example 1: Model A Model B Total Available. Gizmos. Dodads. System: Lesson : Sstems of Equations and Matrices Outline Objectives: I can solve sstems of three linear equations in three variables. I can solve sstems of linear inequalities I can model and solve real-world

More information

α = u v. In other words, Orthogonal Projection

α = u v. In other words, Orthogonal Projection Orthogonal Projection Given any nonzero vector v, it is possible to decompose an arbitrary vector u into a component that points in the direction of v and one that points in a direction orthogonal to v

More information

Eigenvalues and Eigenvectors

Eigenvalues and Eigenvectors Chapter 6 Eigenvalues and Eigenvectors 6. Introduction to Eigenvalues Linear equations Ax D b come from steady state problems. Eigenvalues have their greatest importance in dynamic problems. The solution

More information

Principal Component Analysis

Principal Component Analysis Principal Component Analysis ERS70D George Fernandez INTRODUCTION Analysis of multivariate data plays a key role in data analysis. Multivariate data consists of many different attributes or variables recorded

More information

Polynomial Degree and Finite Differences

Polynomial Degree and Finite Differences CONDENSED LESSON 7.1 Polynomial Degree and Finite Differences In this lesson you will learn the terminology associated with polynomials use the finite differences method to determine the degree of a polynomial

More information

1.6. Piecewise Functions. LEARN ABOUT the Math. Representing the problem using a graphical model

1.6. Piecewise Functions. LEARN ABOUT the Math. Representing the problem using a graphical model . Piecewise Functions YOU WILL NEED graph paper graphing calculator GOAL Understand, interpret, and graph situations that are described b piecewise functions. LEARN ABOUT the Math A cit parking lot uses

More information

Solving Quadratic Equations by Graphing. Consider an equation of the form. y ax 2 bx c a 0. In an equation of the form

Solving Quadratic Equations by Graphing. Consider an equation of the form. y ax 2 bx c a 0. In an equation of the form SECTION 11.3 Solving Quadratic Equations b Graphing 11.3 OBJECTIVES 1. Find an ais of smmetr 2. Find a verte 3. Graph a parabola 4. Solve quadratic equations b graphing 5. Solve an application involving

More information

6. The given function is only drawn for x > 0. Complete the function for x < 0 with the following conditions:

6. The given function is only drawn for x > 0. Complete the function for x < 0 with the following conditions: Precalculus Worksheet 1. Da 1 1. The relation described b the set of points {(-, 5 ),( 0, 5 ),(,8 ),(, 9) } is NOT a function. Eplain wh. For questions - 4, use the graph at the right.. Eplain wh the graph

More information

[1] Diagonal factorization

[1] Diagonal factorization 8.03 LA.6: Diagonalization and Orthogonal Matrices [ Diagonal factorization [2 Solving systems of first order differential equations [3 Symmetric and Orthonormal Matrices [ Diagonal factorization Recall:

More information

3. INNER PRODUCT SPACES

3. INNER PRODUCT SPACES . INNER PRODUCT SPACES.. Definition So far we have studied abstract vector spaces. These are a generalisation of the geometric spaces R and R. But these have more structure than just that of a vector space.

More information

that satisfies (2). Then (3) ax 0 + by 0 + cz 0 = d.

that satisfies (2). Then (3) ax 0 + by 0 + cz 0 = d. Planes.nb 1 Plotting Planes in Mathematica Copright 199, 1997, 1 b James F. Hurle, Universit of Connecticut, Department of Mathematics, Unit 39, Storrs CT 669-39. All rights reserved. This notebook discusses

More information

Common Core Unit Summary Grades 6 to 8

Common Core Unit Summary Grades 6 to 8 Common Core Unit Summary Grades 6 to 8 Grade 8: Unit 1: Congruence and Similarity- 8G1-8G5 rotations reflections and translations,( RRT=congruence) understand congruence of 2 d figures after RRT Dilations

More information

CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES. From Exploratory Factor Analysis Ledyard R Tucker and Robert C.

CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES. From Exploratory Factor Analysis Ledyard R Tucker and Robert C. CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES From Exploratory Factor Analysis Ledyard R Tucker and Robert C MacCallum 1997 180 CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES In

More information

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS Systems of Equations and Matrices Representation of a linear system The general system of m equations in n unknowns can be written a x + a 2 x 2 + + a n x n b a

More information

Trigonometry Review Workshop 1

Trigonometry Review Workshop 1 Trigonometr Review Workshop Definitions: Let P(,) be an point (not the origin) on the terminal side of an angle with measure θ and let r be the distance from the origin to P. Then the si trig functions

More information

Connecting Transformational Geometry and Transformations of Functions

Connecting Transformational Geometry and Transformations of Functions Connecting Transformational Geometr and Transformations of Functions Introductor Statements and Assumptions Isometries are rigid transformations that preserve distance and angles and therefore shapes.

More information

Linear Algebra Notes for Marsden and Tromba Vector Calculus

Linear Algebra Notes for Marsden and Tromba Vector Calculus Linear Algebra Notes for Marsden and Tromba Vector Calculus n-dimensional Euclidean Space and Matrices Definition of n space As was learned in Math b, a point in Euclidean three space can be thought of

More information

Au = = = 3u. Aw = = = 2w. so the action of A on u and w is very easy to picture: it simply amounts to a stretching by 3 and 2, respectively.

Au = = = 3u. Aw = = = 2w. so the action of A on u and w is very easy to picture: it simply amounts to a stretching by 3 and 2, respectively. Chapter 7 Eigenvalues and Eigenvectors In this last chapter of our exploration of Linear Algebra we will revisit eigenvalues and eigenvectors of matrices, concepts that were already introduced in Geometry

More information

arxiv:1404.1100v1 [cs.lg] 3 Apr 2014

arxiv:1404.1100v1 [cs.lg] 3 Apr 2014 A Tutorial on Principal Component Analysis Jonathon Shlens Google Research Mountain View, CA 94043 (Dated: April 7, 2014; Version 3.02) arxiv:1404.1100v1 [cs.lg] 3 Apr 2014 Principal component analysis

More information

Linear Equations in Two Variables

Linear Equations in Two Variables Section. Sets of Numbers and Interval Notation 0 Linear Equations in Two Variables. The Rectangular Coordinate Sstem and Midpoint Formula. Linear Equations in Two Variables. Slope of a Line. Equations

More information

Manifold Learning Examples PCA, LLE and ISOMAP

Manifold Learning Examples PCA, LLE and ISOMAP Manifold Learning Examples PCA, LLE and ISOMAP Dan Ventura October 14, 28 Abstract We try to give a helpful concrete example that demonstrates how to use PCA, LLE and Isomap, attempts to provide some intuition

More information

ONVIF Video Analytics Service Specification

ONVIF Video Analytics Service Specification ONVIF 1 Video Analtics. Ver. 2.2.1 ONVIF Video Analtics Service Specification Version 2.2.1 December, 2012 ONVIF 2 Video Analtics. Ver. 2.2.1 2008-2012 b ONVIF: Open Network Video Interface Forum Inc..

More information

Solving Systems of Linear Equations

Solving Systems of Linear Equations LECTURE 5 Solving Systems of Linear Equations Recall that we introduced the notion of matrices as a way of standardizing the expression of systems of linear equations In today s lecture I shall show how

More information

CHAPTER 10 SYSTEMS, MATRICES, AND DETERMINANTS

CHAPTER 10 SYSTEMS, MATRICES, AND DETERMINANTS CHAPTER 0 SYSTEMS, MATRICES, AND DETERMINANTS PRE-CALCULUS: A TEACHING TEXTBOOK Lesson 64 Solving Sstems In this chapter, we re going to focus on sstems of equations. As ou ma remember from algebra, sstems

More information

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( ) Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

More information

( ) is proportional to ( 10 + x)!2. Calculate the

( ) is proportional to ( 10 + x)!2. Calculate the PRACTICE EXAMINATION NUMBER 6. An insurance company eamines its pool of auto insurance customers and gathers the following information: i) All customers insure at least one car. ii) 64 of the customers

More information

Linear algebra and the geometry of quadratic equations. Similarity transformations and orthogonal matrices

Linear algebra and the geometry of quadratic equations. Similarity transformations and orthogonal matrices MATH 30 Differential Equations Spring 006 Linear algebra and the geometry of quadratic equations Similarity transformations and orthogonal matrices First, some things to recall from linear algebra Two

More information

Graphing Linear Equations

Graphing Linear Equations 6.3 Graphing Linear Equations 6.3 OBJECTIVES 1. Graph a linear equation b plotting points 2. Graph a linear equation b the intercept method 3. Graph a linear equation b solving the equation for We are

More information