Rank one SVD: an algorithm for the visualization of a nonnegative matrix




Rank one SVD: an algorithm for the visualization of a nonnegative matrix. L. Labiod and M. Nadif, LIPADE - Université Paris Descartes, France. ECAIS 2013, November 7, 2013.

Outline
1 Data visualization and Co-clustering
2 One Dimensional embedding: Problem formulation
3 Rank one SVD algorithm (R1SVD)
4 Experimental analysis
5 Conclusion and future work


Data visualization and Co-clustering
Notation and definition

Data set representation
A data set can be viewed as an m × n matrix A.
Each of the m rows represents an object (individual).
Each of the n columns represents a feature (attribute).
Each entry a_{ij} is the element of A at the intersection of row i and column j.

Data examples
Documents/words data
Categorical data
Social network - bipartite graph

Data visualization and Co-clustering
Data visualization

The visualization of A consists in an optimal permutation of the rows and columns of the data, i.e. a reorganization of the data that reveals homogeneous blocks. Bertin described the visualization procedure as "simplifying without destroying" and was convinced that simplification is nothing more than regrouping similar things. Späth considered such matrix permutation approaches to have a great advantage over clustering algorithms, because no information of any kind is lost. Arabie and Hubert referred to similar advantages, calling such an approach "non-destructive data analysis" and emphasizing the essential property that no transformation or reduction of the data itself takes place.

Data visualization and Co-clustering
Data visualization

Data visualization methods
Hierarchical clustering
Bond energy algorithm (BEA, McCormick et al. (1972))
Genetic algorithm: stress objective (Niermann (2005))

Figure: Initial data A and reordered A.

Data visualization and Co-clustering
Co-clustering

Simultaneously partition data objects and features.
Direct Clustering (Hartigan, 1975)
Double k-means (Tao Li, 2001)
Spectral Co-clustering (Dhillon, 2001)
Nonnegative matrix tri-factorization: A ≈ USV^T (Ding et al., 2006)
Latent Block Models (Govaert and Nadif, 2003)

Despite the advantages of co-clustering, all these methods require the number of blocks to be known in advance.

Figure: Balanced data (data2) and the reordered data (co-clustering result).

Data visualization and Co-clustering
Illustrative Example: 16 townships data

Table: 16 Townships Data.

Characteristics     A B C D E F G H I J K L M N O P
High School         0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0
Agricult Coop       0 1 1 0 0 0 1 0 0 0 0 1 0 0 1 0
Rail Station        0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0
One Room School     1 0 0 0 1 1 0 0 1 1 0 0 1 1 0 1
Veterinary          0 1 1 1 0 0 1 0 0 0 0 1 0 0 1 0
No Doctor           1 0 0 0 1 1 0 0 1 1 0 0 1 1 0 1
No Water Supply     0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0
Police Station      0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0
Land Reallocation   0 1 1 1 0 0 1 0 0 0 0 1 0 0 1 0

Table: Reorganization of townships and characteristics.

Characteristics     H K B C D G L O M N J I A P F E
High School         1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Railway Station     1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Police Station      1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Agricult Coop       0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0
Veterinary          0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0
Land Reallocation   0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0
One Room School     0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
No Doctor           0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
No Water Supply     0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0

Data visualization and Co-clustering
Illustrative Example: reordered data

Table: Characterization of each cluster of townships by a cluster of characteristics.

Class of townships         Class of characteristics
{H, K}                     {High School, Railway Station, Police Station}
{B, C, D, G, L, O}         {Agricult Coop, Veterinary, Land Reallocation}
{M, N, J, I, A, P, F, E}   {One Room School, No Doctor, No Water Supply}

One Dimensional embedding: Problem formulation
R1SVD problem formulation

Given an m × n nonnegative data matrix A, the adjacency matrix of the associated bipartite graph is

B = \begin{pmatrix} 0 & A \\ A^T & 0 \end{pmatrix},   (1)

from which we define a stochastic data matrix as follows:

S = D^{-1} B = \begin{pmatrix} 0 & D_r^{-1} A \\ D_c^{-1} A^T & 0 \end{pmatrix},  where  D = \begin{pmatrix} D_r & 0 \\ 0 & D_c \end{pmatrix},   (2)

and where D_r and D_c are the diagonal matrices D_r = diag(A\mathbb{1}) and D_c = diag(A^T\mathbb{1}), with \mathbb{1} the all-ones vector.
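To make the construction concrete, here is a minimal numpy sketch (my illustration, not the authors' code) that builds S of equation (2) from a nonnegative A; it assumes A has no zero rows or columns, so that D_r and D_c are invertible.

```python
import numpy as np

# Build the row-stochastic matrix S of Eq. (2) from a nonnegative matrix A.
# Assumes A has no zero rows or columns (so D_r and D_c are invertible).
def stochastic_matrix(A):
    m, n = A.shape
    Dr_inv = np.diag(1.0 / A.sum(axis=1))   # D_r^{-1}, with D_r = diag(A 1)
    Dc_inv = np.diag(1.0 / A.sum(axis=0))   # D_c^{-1}, with D_c = diag(A^T 1)
    # S = D^{-1} B, with B the bipartite adjacency matrix of Eq. (1)
    return np.block([[np.zeros((m, m)), Dr_inv @ A],
                     [Dc_inv @ A.T,     np.zeros((n, n))]])
```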

One Dimensional embedding: Problem formulation
Power method

Since S is nonnegative and stochastic, the Perron-Frobenius theorem tells us that its leading eigenvector is nonnegative and constant. The power method is the well-known technique used to compute the leading eigenvector of S. It consists in the following iterative process:

\pi^{(t)} = S^t \pi^{(0)}  and  \pi^{(t)} \leftarrow \pi^{(t)} / \|\pi^{(t)}\|.   (3)

The right eigenvector corresponds to the uniform distribution (1/(m+n), ..., 1/(m+n))^T. The vector \pi represents the constant left eigenvector of S, scaled so that \pi^T \mathbb{1} = m + n. In matrix notation we have S\pi = \pi and S\mathbb{1} = \mathbb{1}.

One Dimensional embedding: Problem formulation
Power method

\pi^{(t)} = S^t \pi^{(0)}  and  \pi^{(t)} \leftarrow \pi^{(t)} / \|\pi^{(t)}\|.   (4)

At first sight this process might seem uninteresting, since for any starting vector it eventually leads to a vector whose row and column entries all coincide; the corresponding eigenvector \pi = \mathbb{1} is the constant eigenvector of S. However, our practical experience shows that if we stop the power method after a few iterations, the algorithm has a potential application for data reordering. The key idea: early stopping of the power method.
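A minimal sketch of this early-stopping idea (an illustration under my assumptions, not the authors' code): the number of iterations n_iter is the key parameter, since iterating to convergence would return the uninteresting constant vector.

```python
import numpy as np

# Early-stopped power method of Eqs. (3)-(4) (illustrative sketch).
# Stopping after a few iterations preserves components of the starting
# vector along the non-leading eigenvectors, which carry block structure.
def early_stopped_power_method(S, n_iter=5, seed=0):
    rng = np.random.default_rng(seed)
    pi = rng.random(S.shape[0])       # arbitrary nonnegative starting vector
    for _ in range(n_iter):
        pi = S @ pi                   # one multiplication by S
        pi /= np.linalg.norm(pi)      # normalization step of Eq. (3)
    return pi
```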

Rank one SVD algorithm (R1SVD)
R1SVD: a mutual reinforcement principle

Now let us consider \pi = \begin{pmatrix} u \\ v \end{pmatrix}, where u ∈ R^m_+ and v ∈ R^n_+. The upper part u of \pi holds the weights of the documents (rows) and the lower part v holds the weights of the words (columns). Exploiting the block structure of S, we can write

\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} 0 & D_r^{-1} A \\ D_c^{-1} A^T & 0 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix}  \Leftrightarrow  u = D_r^{-1} A v  (a),  v = D_c^{-1} A^T u  (b).   (5)

This iterative process starts with an arbitrary vector u^{(0)} and repeatedly updates v and u by alternating between formulas (a) and (b) of equation (5), until convergence.

Rank one SVD algorithm (R1SVD)
R1SVD algorithm

Input: data A ∈ R^{m×n}_+, D_r and D_c
Output: u, v
Initialize: \tilde{u} = D_r^{-1} A \mathbb{1},  u = \tilde{u} / \|\tilde{u}\|
repeat
    \tilde{v}^{(t+1)} = D_c^{-1} A^T u^{(t)}
    v^{(t+1)} = \tilde{v}^{(t+1)} / \|\tilde{v}^{(t+1)}\|
    \tilde{u}^{(t+1)} = D_r^{-1} A v^{(t+1)}
    u^{(t+1)} = \tilde{u}^{(t+1)} / \|\tilde{u}^{(t+1)}\|
    \gamma^{(t+1)} = \|u^{(t+1)} - u^{(t)}\| + \|v^{(t+1)} - v^{(t)}\|
until stabilization of u and v: |\gamma^{(t+1)} - \gamma^{(t)}| ≤ threshold
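Below is one possible numpy implementation of this pseudocode (a sketch based on my reading of the slides, not the authors' released code). Following the text above, it starts from an arbitrary nonnegative u^{(0)}, works on A directly without ever forming S, and assumes A has no zero rows or columns.

```python
import numpy as np

# Sketch of the R1SVD alternating updates (a) and (b) of Eq. (5),
# with the stabilization-of-gamma stopping rule of the pseudocode.
def r1svd(A, threshold=1e-6, max_iter=100, seed=0):
    dr = A.sum(axis=1)                    # diagonal of D_r (row sums)
    dc = A.sum(axis=0)                    # diagonal of D_c (column sums)
    rng = np.random.default_rng(seed)
    u = rng.random(A.shape[0])            # arbitrary starting vector u^(0)
    u /= np.linalg.norm(u)
    v = np.zeros(A.shape[1])              # placeholder for the first gamma
    gamma_prev = np.inf
    for _ in range(max_iter):
        v_new = (A.T @ u) / dc            # (b): v = D_c^{-1} A^T u
        v_new /= np.linalg.norm(v_new)
        u_new = (A @ v_new) / dr          # (a): u = D_r^{-1} A v
        u_new /= np.linalg.norm(u_new)
        gamma = np.linalg.norm(u_new - u) + np.linalg.norm(v_new - v)
        u, v = u_new, v_new
        if abs(gamma - gamma_prev) <= threshold:   # stabilization of gamma
            break
        gamma_prev = gamma
    return u, v
```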

Rank one SVD algorithm (R1SVD)
Principal steps of the R1SVD algorithm

Compute the first singular vectors u of the matrix D_r^{-1} A and v of the matrix D_c^{-1} A^T, with early stopping of R1SVD.
Sort the vectors u and v in descending (or ascending) order.
Reorganize the rows and columns of the data according to the sorted vectors.

Illustrative example
Figure: Data1; Data1 reordered according to u and v; reordered S_r = AA^T according to u; reordered S_c = A^T A according to v (panels: A, reordered A, reordered S_r, reordered S_c, sorted u, sorted v).
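A usage sketch of these steps on a small hypothetical matrix (illustrative data, not from the paper), reusing the r1svd function sketched above; sorting u and v and permuting A accordingly makes the hidden blocks contiguous.

```python
import numpy as np

# Toy 4x4 matrix with two hidden blocks (rows and columns interleaved).
A = np.array([[1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.]])

u, v = r1svd(A)                          # r1svd defined in the sketch above
row_order = np.argsort(-u)               # sort u in descending order
col_order = np.argsort(-v)               # sort v in descending order
A_reordered = A[np.ix_(row_order, col_order)]
print(A_reordered)                       # the two homogeneous blocks are now contiguous
```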

Experimental analysis
R1SVD: data visualization and co-clustering

Figure: Data1; Data1 reordered according to u and v; reordered S_r = AA^T according to u; reordered S_c = A^T A according to v (panels: A, reordered A, reordered S_r, reordered S_c, sorted u, sorted v).

Figure: Data2; Data2 reordered according to u and v; reordered S_r = AA^T according to u; reordered S_c = A^T A according to v.

Experimental analysis
R1SVD: data co-clustering

Is this reordering meaningful? To answer this question, we use confusion matrices to measure the clustering performance of the co-clustering result provided by our method.

Table: Confusion matrix evaluation on rows and columns of the data.

data1 (rows)      data2 (rows)      data1 (columns)   data2 (columns)
   0  205    0       0    0  795       0    0   40     155    0    0
1614    0    5     626    0    0     397    0    0       0  133    0
   0    0  176       0  579    0       0   63    0       0    0  212
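For completeness, a minimal sketch of such an evaluation (my illustration; how cluster labels are derived from the reordering is not detailed in the slides, so the label vectors here are assumptions):

```python
import numpy as np

# Confusion matrix between true block labels and predicted cluster labels.
# Labels are assumed to be integers 0..k-1; entry C[t, p] counts objects
# with true label t assigned to predicted cluster p.
def confusion_matrix(true_labels, pred_labels):
    k_true = len(np.unique(true_labels))
    k_pred = len(np.unique(pred_labels))
    C = np.zeros((k_true, k_pred), dtype=int)
    for t, p in zip(true_labels, pred_labels):
        C[t, p] += 1
    return C
```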

Conclusion and future work
Conclusion

We have presented an iterative matrix-vector multiplication procedure, called rank-one SVD, for data visualization. The procedure iteratively applies an appropriate stochastic data matrix associated with a bipartite graph and computes the first leading left singular vector, associated with the eigenvalue λ_1. Stopping the algorithm after a few iterations yields a visualization of the data matrix as homogeneous blocks. This approach therefore appears very promising in the co-clustering context.