Vector Spaces and Similarity Measures




Vector Spaces and Similarity Measures. R. Basili, Web Mining and Retrieval course, academic year 2008-9. April 10, 2009

Outline: real-valued vector spaces, operations on vectors, linear independence, bases, the inner product, vector norms and their properties, unit vectors, orthogonality, similarity, norms and similarity.

Real-valued Vector Space. Definition: a vector space is a set V of objects called vectors. We refer to a vector simply by x or, using the specific realization called a column vector, x = (x_1, ..., x_n)^T (in Dirac notation, |x>).

Real-valued Vector Space. Definition: a vector space must satisfy the following axioms.

Sum. To every pair x and y of vectors in V there corresponds a vector x + y, called the sum of x and y, such that:
1. the sum is commutative, x + y = y + x
2. the sum is associative, x + (y + z) = (x + y) + z
3. there exists in V a unique vector Φ (called the origin) such that x + Φ = x for all x in V
4. to every x in V there corresponds a unique vector -x such that x + (-x) = Φ

Scalar Multiplication. To every pair α and x, where α is a scalar and x is in V, there corresponds a vector αx, called the product of α and x, such that:
1. associativity: α(βx) = (αβ)x
2. 1x = x for all x in V
3. multiplication by a scalar is distributive w.r.t. vector addition: α(x + y) = αx + αy
4. multiplication by a vector is distributive w.r.t. scalar addition: (α + β)x = αx + βx

Vector Operations. Sum of two vectors x and y: x + y = (x_1 + y_1, ..., x_n + y_n)^T. Multiplication by a scalar α: αx = (αx_1, ..., αx_n)^T. Linear combination: y = c_1 x_1 + ... + c_n x_n.

Linear dependence. A set of vectors {x_1, ..., x_n} is linearly dependent if there exist scalars c_1, ..., c_n, not all 0, such that c_1 x_1 + ... + c_n x_n = 0. A set of vectors {x_1, ..., x_n} is linearly independent if and only if the condition c_1 x_1 + ... + c_n x_n = 0 is satisfied only when c_1 = c_2 = ... = c_n = 0.

Basis. Definition: a basis for an n-dimensional vector space V^n is a set of n linearly independent vectors. Every vector x in V can then be expressed as a linear combination of the basis vectors, x = c_1 x_1 + ... + c_n x_n, where the c_i are called the coordinates of x w.r.t. the basis set {x_1, ..., x_n}.

Inner Product. Definition: an inner product is a real-valued function on the cross product V^n × V^n associating with each pair of vectors (x, y) a unique real number. The function (·,·) has the following properties:
1. (x, y) = (y, x)
2. (x, λy) = λ(x, y)
3. (x_1 + x_2, y) = (x_1, y) + (x_2, y)
4. (x, x) ≥ 0, and (x, x) = 0 iff x = 0
Standard inner product: (x, y) = Σ_{i=1}^{n} x_i y_i. Other notations: x^T y, where x^T is the transpose of x; x · y; or sometimes <x|y> in Dirac notation.
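As a quick sketch (the lecture itself contains no code), the standard inner product can be computed directly; the function name is my own:

```python
# Standard inner product (x, y) = sum_i x_i * y_i.
def inner(x, y):
    assert len(x) == len(y), "vectors must have the same dimension"
    return sum(xi * yi for xi, yi in zip(x, y))

print(inner([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```

Note that inner(x, y) == inner(y, x), matching symmetry property 1 above.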

Norm. Definition: the norm is a function ‖·‖ from V^n to R. Euclidean norm: ‖x‖ = sqrt((x, x)) = (x_1^2 + ... + x_n^2)^{1/2}. Geometrically, the norm represents the length of the vector. Properties:
1. ‖x‖ ≥ 0, and ‖x‖ = 0 if and only if x = 0
2. ‖αx‖ = |α| ‖x‖ for all α and x
3. for all x, y: |(x, y)| ≤ ‖x‖ ‖y‖ (Cauchy-Schwarz)
A vector x in V^n is a unit vector, or normalized, when ‖x‖ = 1.

From Norm to Distance. In V^n we can define the distance between two vectors x and y as: d(x, y) = ‖x − y‖ = sqrt((x − y, x − y)) = ((x_1 − y_1)^2 + ... + (x_n − y_n)^2)^{1/2}. This measure, sometimes written ‖x − y‖_2, is also called the Euclidean distance. Properties: d(x, y) ≥ 0, and d(x, y) = 0 if and only if x = y; d(x, y) = d(y, x) (symmetry); d(x, y) ≤ d(x, z) + d(z, y) (triangle inequality).
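The Euclidean norm and the distance it induces can be sketched in a few lines of Python (function names are illustrative):

```python
import math

# Euclidean norm ||x|| = sqrt((x, x)) and the induced distance d(x, y) = ||x - y||.
def norm(x):
    return math.sqrt(sum(xi * xi for xi in x))

def dist(x, y):
    return norm([xi - yi for xi, yi in zip(x, y)])

print(norm([3, 4]))          # 5.0
print(dist([0, 0], [3, 4]))  # 5.0
```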

From Norm to Distance. An immediate consequence of the Cauchy-Schwarz property is that |(x, y)| / (‖x‖ ‖y‖) ≤ 1, and therefore we can write: (x, y) = ‖x‖ ‖y‖ cos φ, with 0 ≤ φ ≤ π, where φ is the angle between the two vectors x and y. Cosine similarity: cos φ = (x, y) / (‖x‖ ‖y‖) = Σ_{i=1}^{n} x_i y_i / (sqrt(Σ_{i=1}^{n} x_i^2) sqrt(Σ_{i=1}^{n} y_i^2)). If the vectors x and y have norm equal to 1, then cos φ = Σ_{i=1}^{n} x_i y_i = (x, y).
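The cosine formula translates directly into code (a sketch; the helper name is my own):

```python
import math

# cos(phi) = (x, y) / (||x|| * ||y||); for unit vectors this reduces to the inner product.
def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny)

print(cosine([1, 0], [0, 1]))  # 0.0 (orthogonal)
print(cosine([1, 2], [2, 4]))  # ~1.0 (parallel vectors)
```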

Orthogonality. Definition: x and y are orthogonal if and only if (x, y) = 0. Orthonormal basis: a set of linearly independent vectors {x_1, ..., x_n} constitutes an orthonormal basis for the space V^n if and only if (x_i, x_j) = δ_ij, where δ_ij = 1 if i = j and 0 if i ≠ j.

Similarity. Applications: document clusters often provide a structure for organizing large bodies of text for efficient searching and browsing. For example, recent advances in Internet search engines (e.g., http://vivisimo.com/, http://metacrawler.com/) exploit document cluster analysis. Documents as vectors: for this purpose, a document is commonly represented as a vector of suitably normalized frequency counts of words or terms. Each document typically contains only a small percentage of all the words ever used. If we consider each document as a multi-dimensional vector and then try to cluster documents based on their word contents, the problem differs from classic clustering scenarios in several ways.

Similarity. The role of similarity among vectors: document data is high-dimensional, characterized by a very sparse term-document matrix with positive ordinal attribute values and a significant number of outliers. In such situations one truly faces the curse of dimensionality: even after feature reduction, one is left with hundreds of dimensions per object.

Similarity. In the relationship-based clustering process, key cluster analysis activities can be associated with each step. Clustering steps: representation of raw objects (i.e. documents) as vectors of properties with real-valued scores (weights); definition of a proximity measure; clustering algorithm; evaluation.

Similarity and Clustering. A well-known example of a clustering algorithm is k-means.
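k-means alternates an assignment step (each point joins its nearest centroid) and an update step (each centroid moves to its cluster's mean). A minimal sketch in plain Python, with toy data and function names of my own choosing:

```python
# Minimal sketch of Euclidean k-means (Lloyd's algorithm); data and k are toy choices.
def kmeans(points, centroids, iters=20):
    for _ in range(iters):
        # assignment step: each point goes to its nearest centroid (squared distance)
        clusters = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[j].append(p)
        # update step: each centroid moves to the mean of its cluster
        centroids = [[sum(col) / len(cl) for col in zip(*cl)] if cl else list(centroids[j])
                     for j, cl in enumerate(clusters)]
    return centroids

pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(kmeans(pts, [(0, 0), (10, 10)]))  # centroids converge to [0, 0.5] and [10, 10.5]
```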

Similarity. Clustering steps: to obtain features x in F from the raw objects, a suitable object representation has to be found. Given an object O in D, we will refer to such a representation as the feature vector x of O. In the second step, a measure of proximity s has to be defined between objects, i.e. s : D × D → R. The choice of similarity or distance can have a deep impact on clustering quality.

Minkowski distances. The Minkowski distances L_p(x, y), defined as L_p(x, y) = (Σ_{i=1}^{n} |x_i − y_i|^p)^{1/p}, are the standard metrics for geometrical problems. Euclidean distance: for p = 2 we obtain the Euclidean distance ‖x − y‖_2.
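The L_p family can be sketched as one parametric function (the name is my own):

```python
# Minkowski distance L_p(x, y) = (sum_i |x_i - y_i|^p)^(1/p).
def minkowski(x, y, p):
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

print(minkowski([0, 0], [3, 4], 1))  # 7.0 (p=1: Manhattan distance)
print(minkowski([0, 0], [3, 4], 2))  # 5.0 (p=2: Euclidean distance)
```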

Minkowski distances. There are several possibilities for converting an L_p(x, y) distance metric (in [0, ∞), with 0 closest) into a similarity measure (in [0, 1], with 1 closest) via a monotonically decreasing function. Relation between distances and similarities: for Euclidean space, we choose to relate distances d and similarities s using s = e^{−d²}. Consequently, the Euclidean [0,1]-normalized similarity is defined as: s^(E)(x, y) = e^{−‖x − y‖_2²}.
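The s = e^{−d²} transformation is easy to verify numerically (a sketch; the function name is illustrative):

```python
import math

# Convert a Euclidean distance d in [0, inf) into a similarity in (0, 1] via s = exp(-d^2).
def euclid_sim(x, y):
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2)

print(euclid_sim([1, 2], [1, 2]))  # 1.0 (identical vectors: distance 0)
print(euclid_sim([0, 0], [0, 3]))  # exp(-9), very close to 0
```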

Pearson Correlation. In collaborative filtering, correlation is often used to predict a feature from a highly similar mentor group of objects whose features are known. The [0,1]-normalized Pearson correlation is defined as: s^(P)(x, y) = (1/2) ((x − x̄)^T (y − ȳ) / (‖x − x̄‖_2 ‖y − ȳ‖_2) + 1), where x̄ denotes the average feature value of x over all dimensions.

Pearson Correlation. The underlying Pearson correlation can also be seen as a probabilistic measure: r_xy = Σ_i (x_i − x̄)(y_i − ȳ) / ((n − 1) s_x s_y), where x̄ denotes the average feature value of x over all dimensions, and s_x and s_y are the standard deviations of x and y, respectively. The correlation is defined only if both of the standard deviations are finite and both of them are nonzero. It is a corollary of the Cauchy-Schwarz inequality that the correlation cannot exceed 1 in absolute value. The correlation is 1 in the case of an increasing linear relationship, -1 in the case of a decreasing linear relationship, and some value in between in all other cases, indicating the degree of linear dependence between the variables.
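The [0,1]-normalized form maps an increasing linear relationship to 1 and a decreasing one to 0; a sketch (function name my own):

```python
import math

# [0,1]-normalized Pearson correlation:
# s = ((x - mean(x)) . (y - mean(y)) / (||x - mean(x)|| * ||y - mean(y)||) + 1) / 2
def pearson_sim(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cx = [a - mx for a in x]               # center x
    cy = [b - my for b in y]               # center y
    dot = sum(a * b for a, b in zip(cx, cy))
    nx = math.sqrt(sum(a * a for a in cx))
    ny = math.sqrt(sum(b * b for b in cy))
    return (dot / (nx * ny) + 1) / 2

print(pearson_sim([1, 2, 3], [2, 4, 6]))  # ~1.0: increasing linear relationship
print(pearson_sim([1, 2, 3], [6, 4, 2]))  # ~0.0: decreasing linear relationship
```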

Jaccard Similarity. The binary Jaccard coefficient measures the degree of overlap between two sets and is computed as the ratio of the number of features shared by x AND y to the number possessed by x OR y. Example: given the two binary indicator vectors x = (0, 1, 1, 0)^T and y = (1, 1, 0, 0)^T, the cardinality of their intersection is 1 and the cardinality of their union is 3, so their Jaccard coefficient is 1/3. The binary Jaccard coefficient is often used in retail market-basket applications.
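The worked example can be checked directly (a sketch; the helper name is my own):

```python
# Binary Jaccard coefficient: |intersection| / |union| of the "on" positions.
def binary_jaccard(x, y):
    on_x = {i for i, v in enumerate(x) if v}
    on_y = {i for i, v in enumerate(y) if v}
    return len(on_x & on_y) / len(on_x | on_y)

print(binary_jaccard([0, 1, 1, 0], [1, 1, 0, 0]))  # 1/3, as in the example above
```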

Extended Jaccard Similarity. The extended Jaccard coefficient generalizes the binary case to real-valued vectors and is computed as: s^(J)(x, y) = x^T y / (‖x‖_2² + ‖y‖_2² − x^T y).

Dice coefficient. Another similarity measure closely related to the extended Jaccard is the Dice coefficient: s^(D)(x, y) = 2 x^T y / (‖x‖_2² + ‖y‖_2²). The Dice coefficient can be obtained from the extended Jaccard coefficient by adding x^T y to both the numerator and denominator.
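Both formulas share the same ingredients (x^T y, ‖x‖_2², ‖y‖_2²); a sketch with names of my own, which on binary inputs reproduces the binary Jaccard example above:

```python
# Extended Jaccard and Dice coefficients for real-valued vectors.
def ext_jaccard(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx2 = sum(a * a for a in x)
    ny2 = sum(b * b for b in y)
    return dot / (nx2 + ny2 - dot)

def dice(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx2 = sum(a * a for a in x)
    ny2 = sum(b * b for b in y)
    return 2 * dot / (nx2 + ny2)

# On binary vectors the extended Jaccard reduces to the binary Jaccard coefficient:
print(ext_jaccard([0, 1, 1, 0], [1, 1, 0, 0]))  # 1/3
print(dice([0, 1, 1, 0], [1, 1, 0, 0]))         # 0.5
```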

Similarity: discussion. Scale and translation invariance: Euclidean similarity is translation invariant but scale sensitive, while cosine is translation sensitive but scale invariant. The extended Jaccard has aspects of both properties, as illustrated in a figure (omitted in this transcription) showing iso-similarity lines at s = 0.25, 0.5 and 0.75 for the points x = (3, 1)^T and y = (1, 2)^T under Euclidean, cosine, and extended Jaccard similarity.

Similarity: discussion. Thus, for s^(J) → 0, the extended Jaccard behaves like the cosine measure, and for s^(J) → 1, it behaves like the Euclidean distance.

Similarity: discussion. Similarity in clustering: in traditional Euclidean k-means clustering, the optimal cluster representative c_l minimizes the sum of squared error criterion, i.e., c_l = argmin_{z ∈ F} Σ_{x_j ∈ C_l} ‖x_j − z‖_2². Any convex distance-based objective can be translated and extended to the similarity space.

Similarity: discussion. Switching from distances to similarities: consider the generalized objective function f(C_l, z) given a cluster C_l and a representative z: f(C_l, z) = Σ_{x_j ∈ C_l} d(x_j, z)² = Σ_{x_j ∈ C_l} ‖x_j − z‖_2². We use the transformation s = e^{−d²} to express the objective in terms of similarity rather than distance: f(C_l, z) = −Σ_{x_j ∈ C_l} log s(x_j, z).

Similarity: discussion. Switching from distances to similarities: finally, we simplify and transform the objective using a strictly monotonically decreasing function. Instead of minimizing f(C_l, z), we maximize f'(C_l, z) = e^{−f(C_l, z)}. Thus, in the similarity space, the least squared error representative c_l ∈ F for a cluster C_l satisfies: c_l = argmax_{z ∈ F} Π_{x_j ∈ C_l} s(x_j, z). Using the concave evaluation function f', we can obtain optimal representatives for non-Euclidean similarity spaces S.
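For the Euclidean similarity s = e^{−d²}, maximizing the product of similarities is equivalent to minimizing the sum of squared errors, so the best representative is the cluster mean. A brute-force check of this equivalence (a sketch with toy data; names are my own):

```python
import math

# Product of Euclidean similarities exp(-||p - z||^2) over a cluster.
def sim_product(points, z):
    return math.prod(math.exp(-sum((a - b) ** 2 for a, b in zip(p, z)))
                     for p in points)

cluster = [(0.0, 0.0), (2.0, 0.0)]
candidates = [(x / 10, 0.0) for x in range(0, 21)]  # grid along the segment
best = max(candidates, key=lambda z: sim_product(cluster, z))
print(best)  # (1.0, 0.0), the mean of the cluster
```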

Similarity: discussion. To illustrate, the values of the evaluation function f'({x_1, x_2}, z) are used to shade the background of a figure (omitted in this transcription), in which the maximum likelihood representative of x_1 and x_2 is marked.

Similarity: discussion. For cosine similarity, all points on the equi-similarity line are optimal representatives. In a maximum likelihood interpretation, we constructed the distance-similarity transformation such that p(z | c_l) ∝ s(z, c_l). Consequently, we can use the dual interpretations of probabilities in similarity space S and errors in distance space R.

Further similarity measures. Vector similarities: Grefenstette's (fuzzy) set-oriented similarity for capturing dependency relations (head words). Distributional (probabilistic) similarities: Lin similarity (commonalities, Dice-like): sim(x, y) = log P(common(x, y)) / log P(desc(x, y)). Jensen-Shannon total divergence to the mean: A(p, q) = D(p ‖ (p + q)/2) + D(q ‖ (p + q)/2). α-skewed divergence (Lee, 1999): s_α(p, q) = D(p ‖ αp + (1 − α)q) (α = 0.1 or 0.01).
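The two divergence-based measures can be sketched on top of the KL divergence D(p ‖ q) (distributions as plain probability lists; assumes the second argument is nonzero wherever the first is; function names are my own):

```python
import math

# KL divergence D(p || q) = sum_i p_i * log(p_i / q_i), skipping zero-probability terms.
def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Jensen-Shannon total divergence to the mean: A(p, q) = D(p || m) + D(q || m), m = (p+q)/2.
def js(p, q):
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return kl(p, m) + kl(q, m)

# alpha-skewed divergence, following the formula above: D(p || alpha*p + (1-alpha)*q).
def skew(p, q, alpha=0.01):
    mix = [alpha * pi + (1 - alpha) * qi for pi, qi in zip(p, q)]
    return kl(p, mix)

p, q = [0.5, 0.5, 0.0], [0.1, 0.1, 0.8]
print(js(p, q))  # positive; it is 0 only when p == q
print(js(p, p))  # 0.0
```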

References. Vectors, operations, norms and distances: C. J. van Rijsbergen, The Geometry of Information Retrieval, Cambridge University Press, 2004. Distances and similarities: Alexander Strehl, Relationship-based Clustering and Cluster Ensembles for High-dimensional Data Mining, PhD Dissertation, University of Texas at Austin, 2002. URL: http://www.lans.ece.utexas.edu/~strehl/diss/htdi.html.