Text Analytics (Text Mining)




CSE 6242 / CX 4242, Apr 3, 2014. Text Analytics (Text Mining): LSI (uses SVD), Visualization. Duen Horng (Polo) Chau, Georgia Tech. Some lectures are partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John Stasko, Christos Faloutsos, Le Song.

Singular Value Decomposition (SVD): Motivation. Problem #1: text - LSI uses SVD to find "concepts". Problem #2: compression / dimensionality reduction.

SVD - Motivation Problem #1: text - LSI: find concepts

SVD - Motivation. Customer-product matrix, for recommendation systems: products (bread, lettuce, tomatoes, beef, chicken) vs. customer groups (vegetarians, meat eaters).

SVD - Motivation problem #2: compress / reduce dimensionality

Problem - Specification: ~10^6 rows; ~10^3 columns; no updates; random access to any cell(s); small error: OK.

SVD - Definition (reminder: matrix multiplication). [figure, built up over several slides: a 3 x 2 matrix times a 2 x 1 vector yields a 3 x 1 vector]

SVD - Definition: A [n x m] = U [n x r] Λ [r x r] (V [m x r])^T. A: n x m matrix (e.g., n documents, m terms). U: n x r matrix (e.g., n documents, r concepts). Λ: r x r diagonal matrix (r: rank of the matrix; diagonal entries: strength of each concept). V: m x r matrix (e.g., m terms, r concepts).

SVD - Definition. [figure: A (n documents x m terms) = U (n documents x r concepts) x Λ (r x r diagonal; its entries are the concept strengths) x V^T (r concepts x m terms)]
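
As a concrete anchor for these shapes, here is a minimal NumPy sketch (the toy 4-document x 3-term matrix is made up for illustration; NumPy calls the diagonal factor s rather than Λ):

```python
import numpy as np

# Toy document-term matrix: 4 documents (rows) x 3 terms (columns).
A = np.array([[1.0, 1.0, 0.0],
              [2.0, 2.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 2.0]])

# full_matrices=False gives the economy-size SVD used on the slide:
# U is n x r', s holds the r' singular values, Vt is r' x m, r' = min(n, m).
U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(U.shape, s.shape, Vt.shape)           # (4, 3) (3,) (3, 3)
print(np.allclose(A, U @ np.diag(s) @ Vt))  # True: A = U Λ V^T
```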

SVD - Properties. THEOREM [Press+92]: it is always possible to decompose a matrix A into A = U Λ V^T, where U, Λ, V are unique, most of the time; U and V are column-orthonormal, i.e., their columns are unit vectors, orthogonal to each other (U^T U = I and V^T V = I, where I is the identity matrix); and Λ is a diagonal matrix with non-negative diagonal entries, sorted in decreasing order.
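
These properties can be checked numerically; continuing the NumPy sketch above:

```python
# Column-orthonormality: U^T U = I and V^T V = I.
print(np.allclose(U.T @ U, np.eye(U.shape[1])))     # True
print(np.allclose(Vt @ Vt.T, np.eye(Vt.shape[0])))  # True

# Diagonal of Λ: non-negative, sorted in decreasing order.
print(np.all(s >= 0) and np.all(s[:-1] >= s[1:]))   # True
```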

SVD - Example. A = U Λ V^T: a document-term matrix over the terms (data, inf. retrieval, brain, lung), with documents from a CS group and an MD group. [figure: numeric A = U x Λ x V^T]

SVD - Example. A = U Λ V^T: the two columns of U correspond to a CS-concept and an MD-concept.

SVD - Example. A = U Λ V^T: U is the doc-to-concept similarity matrix.

SVD - Example. A = U Λ V^T: the leading diagonal entry of Λ is the strength of the CS-concept.

SVD - Example. A = U Λ V^T: V^T is the term-to-concept similarity matrix.

SVD - Interpretation #1: documents, terms and concepts. U: document-to-concept similarity matrix. V: term-to-concept similarity matrix. Λ: its diagonal elements give the strength of each concept.

SVD - Interpretation #1: documents, terms and concepts. Q: if A is the document-to-term matrix, what is A^T A? A: the term-to-term ([m x m]) similarity matrix. Q: and A A^T? A: the document-to-document ([n x n]) similarity matrix.
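
A quick numeric illustration of these two similarity matrices, continuing the sketch (the variable names are ours):

```python
term_term = A.T @ A  # m x m: entry (i, j) reflects how strongly terms i, j co-occur
doc_doc   = A @ A.T  # n x n: entry (i, j) reflects the term overlap of documents i, j
print(term_term.shape, doc_doc.shape)  # (3, 3) (4, 4)
```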

SVD properties: V holds the eigenvectors of the covariance matrix A^T A, and U holds the eigenvectors of the Gram (inner-product) matrix A A^T. Thus, SVD is closely related to PCA, and can be numerically more stable. For more info, see: http://math.stackexchange.com/questions/3869/what-is-the-intuitive-relationship-between-svd-and-pca ; Ian T. Jolliffe, Principal Component Analysis (2nd ed), Springer, 2002; Gilbert Strang, Linear Algebra and Its Applications (4th ed), Brooks Cole, 2005.
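
This connection is easy to verify on the toy matrix; a hedged check (numpy.linalg.eigh returns eigenvalues in ascending order, so we flip them):

```python
# Eigenvalues of A^T A are the squared singular values of A;
# the corresponding eigenvectors match the columns of V up to sign.
evals, evecs = np.linalg.eigh(A.T @ A)
evals = evals[::-1]  # descending, to match the order of s
print(np.allclose(np.sqrt(np.clip(evals, 0, None)), s))  # True
```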

SVD - Interpretation #2: the best axis to project on ("best" = minimizes the sum of squares of the projection errors). [figure: 2-D points with v1, the first singular vector, as the min-RMS-error axis]

SVD - Interpretation #2. A = U Λ V^T - example: the variance ("spread") of the data along the v1 axis. [figure]

SVD - Interpretation #2. A = U Λ V^T - example: U Λ gives the coordinates of the points along the projection axes. [figure]

SVD - Interpretation #2, more details. Q: how exactly is dimensionality reduction done? A: set the smallest singular values to zero. [figure: the example decomposition with its smallest singular value zeroed out]

SVD - Interpretation #2. [figures, built up over several slides: with the smallest singular value zeroed, the corresponding columns of U and V can be dropped, giving the rank-k approximation A ≈ U_k Λ_k V_k^T]
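
A minimal sketch of that truncation step in NumPy, reusing the earlier decomposition (k = 1 is an arbitrary choice for the toy matrix):

```python
k = 1  # number of concepts (singular values) to keep

# Zeroing the small singular values is equivalent to dropping
# the corresponding columns of U and V.
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Error of the best rank-k approximation (Frobenius norm).
print(np.linalg.norm(A - A_k))
```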

SVD - Interpretation #3: finds non-zero "blobs" in a data matrix. [figure: A = U x Λ x V^T with block structure]

SVD - Interpretation #3: finds non-zero "blobs" in a data matrix = communities (bipartite cores, here). [figure: rows 1, 4, 5, 7 form a core with columns 1, 3, 4]

SVD algorithm: see Numerical Recipes in C (freely available) [Press+92].

SVD - Interpretation #3. Drill: find the SVD, by inspection! Q: rank = ?? [figure: A = ?? x ?? x ??, to be filled in]

SVD - Interpretation #3. A: rank = 2 (2 linearly independent rows/cols). Q: are the column vectors orthogonal? [figure: the decomposition with its entries filled in]

SVD - Interpretation #3: the column vectors are orthogonal, but not unit vectors, so they are normalized (with factors 1/sqrt(3) and 1/sqrt(2)) and the singular values absorb the scaling. [figure: the decomposition with normalized columns built from 1/sqrt(3) and 1/sqrt(2) entries]

SVD - Interpretation #3. Q: how can we check that we are correct?

SVD - Interpretation #3. A: use the SVD properties: the matrix product should give back matrix A; matrix U should be column-orthonormal, i.e., its columns should be unit vectors, orthogonal to each other; ditto for matrix V; matrix Λ should be diagonal, with non-negative values.
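
These checks are mechanical enough to script; a small helper in the same NumPy style (the function name is ours):

```python
def is_valid_svd(A, U, s, Vt, tol=1e-8):
    """Check the SVD properties listed on the slide."""
    reconstructs  = np.allclose(U @ np.diag(s) @ Vt, A, atol=tol)
    u_orthonormal = np.allclose(U.T @ U, np.eye(U.shape[1]), atol=tol)
    v_orthonormal = np.allclose(Vt @ Vt.T, np.eye(Vt.shape[0]), atol=tol)
    nonnegative   = bool(np.all(s >= 0))
    return reconstructs and u_orthonormal and v_orthonormal and nonnegative

print(is_valid_svd(A, U, s, Vt))  # True for the decomposition above
```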

SVD - Complexity: O(n·m·m) or O(n·n·m), whichever is less. Faster versions exist if you only want the singular values, or only the first k singular vectors, or if the matrix is sparse [Berry]. No need to write your own: available in most linear algebra packages (LINPACK, MATLAB, Splus/R, Mathematica, ...).
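
For example, SciPy exposes a sparse, truncated solver (scipy.sparse.linalg.svds; the choice of k = 2 is ours, and k must be smaller than min(n, m)):

```python
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

A_sparse = csr_matrix(A)           # reuse the toy matrix from above
U2, s2, Vt2 = svds(A_sparse, k=2)  # only the top-2 singular triplets
print(s2)                          # note: svds returns ascending order
```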

References: Berry, Michael: http://www.cs.utk.edu/~lsi/ ; Fukunaga, K. (1990). Introduction to Statistical Pattern Recognition, Academic Press; Press, W. H., S. A. Teukolsky, et al. (1992). Numerical Recipes in C, Cambridge University Press.

Case study - LSI. Q1: How to do queries with LSI? Q2: multi-lingual IR (e.g., an English query on Spanish text)?

Case study - LSI. Q1: How to do queries with LSI? Problem: e.g., find documents containing "data". [figure: the (data, inf. retrieval, brain, lung) example decomposition, with CS and MD document rows]

Case study - LSI. Q1: How to do queries with LSI? A: map query vectors into "concept space". How? Take the inner product (cosine similarity) of the query vector q with each "concept" vector v_i. [figure, built up over several slides: a query q over the terms (data, inf. retrieval, brain, lung), plotted in term space with concept axes v1, v2; the projection q · v1 is q's strength along the first concept]

Case study - LSI: compactly, we have q_concept = q V. E.g., for a query q over the terms (data, inf. retrieval, brain, lung), multiplying by V (the term-to-concept similarities) gives q's coordinates in concept space, e.g., its strength along the CS-concept.
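
A minimal sketch of the query mapping with the earlier NumPy objects (the query vector, matching a query for the first term only, is made up):

```python
# Query vector over the m terms; here it mentions only the first term.
q = np.array([1.0, 0.0, 0.0])

# Map the query into concept space: q_concept = q V, where V = Vt.T.
V = Vt.T
q_concept = q @ V
print(q_concept)  # q's strength along each concept
```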

Case study - LSI. Drill: how would the document ("information", "retrieval") be handled by LSI? A: the SAME way: d_concept = d V. E.g., for a document d over the terms (data, inf. retrieval, brain, lung), d V (via the term-to-concept similarities) gives d's strength along each concept, e.g., the CS-concept.

Case study - LSI. Observation: the document ("information", "retrieval") will be retrieved by the query ("data"), even though it does not contain the term "data", because both map onto the same CS-concept.

Case study - LSI. Q2: multi-lingual IR (e.g., an English query on Spanish text)?

Case study - LSI. Problem: given many documents, translated into both languages (e.g., English and Spanish), answer queries across languages.

Case study - LSI. Solution: apply LSI to documents represented over the combined vocabulary of both languages (e.g., data, inf. retrieval, brain, lung, informacion, datos); queries in either language then map into the same concept space. [figure: CS and MD documents over English and Spanish term columns]

Switch Gears to Text Visualization. What comes to your mind? What text visualizations have you seen before?

Word/Tag Cloud (still popular?) http://www.wordle.net

Word Counts (words as bubbles) http://www.infocaptor.com/bubble-my-page

Word Tree http://www.jasondavies.com/wordtree/

Phrase Net: visualize pairs of words that satisfy a particular pattern, e.g., "X and Y". http://www-958.ibm.com/software/data/cognos/manyeyes/page/phrase_net.html