From Formal Concept Analysis to idempotent co-clustering

1 From Formal Concept Analysis to idempotent co-clustering
Francisco J. Valverde-Albacete, Dep. Lenguajes y Sistemas Informáticos, NLP & IR group, UNED, Spain
02/04/2013, Seminario MAVIR, Madrid, Spain
F.J. Valverde (NLP&IR, UNED) From FCA to kfca NLP&IR / 32

2 Outline
1. Motivation
   - Co-clustering as a DM task
   - A model of batch ad-hoc retrieval
   - Biclustering in IR
2. The basics of Formal Concept Analysis
   - Definitions
   - The Concept Lattice
3. The KFCA analysis of Confusion Matrices
   - Representations of Confusion Matrices
   - $\mathbb{R}_{\min,+}$-FCA of Confusion Matrices
4. Discussion and conclusions

3 Biclustering, coclustering: a definition
Given:
- a set of samples (or objects or observations, etc.) $G$, with $|G| = g$,
- a set of features (or attributes, etc.) $M$, with $|M| = m$, and
- a data matrix $R \in K^{g \times m}$, where $K$ is generally any non-negative section of a field, say $\mathbb{R}^{+}_{0}$.
Direct clustering [Hartigan, 1972]: generate permutations for rows $I$ and columns $J$ so that $R(I, J)$ is block diagonal.
More generally [Mirkin, 1996], generate biclusters, that is, pairs $(A, B)$ of sets of samples $A \subseteq G$ and features $B \subseteq M$ that are naturally related to each other.
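As a minimal sketch of what direct clustering aims for, the snippet below applies a row permutation and a column permutation to a small matrix and obtains a block-diagonal arrangement. The matrix and both permutations are hypothetical and are supplied by hand; finding such permutations is the actual clustering task and is not attempted here.

```python
# Hypothetical 4x4 data matrix whose block structure is hidden by the ordering.
R = [
    [1, 0, 1, 0],
    [0, 2, 0, 2],
    [3, 0, 3, 0],
    [0, 4, 0, 4],
]

# Hand-picked permutations of row indices (I) and column indices (J).
I = [0, 2, 1, 3]
J = [0, 2, 1, 3]

# R(I, J): reorder rows by I and columns by J.
permuted = [[R[i][j] for j in J] for i in I]

for row in permuted:
    print(row)
```

After reordering, the nonzero entries sit in two diagonal blocks, which correspond to the biclusters ({row 0, row 2}, {column 0, column 2}) and ({row 1, row 3}, {column 1, column 3}).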

4 Biclustering definition (II)
Different models of what a matrix is generate different concepts of natural relations and different algorithms:
- As a contingency matrix, find a non-negative factorization minimizing the reconstruction loss.
  - Iterative (direct clustering) techniques
  - Non-negative matrix factorization techniques
- As a bipartite (weighted) graph, maximize or minimize a measure on a cut.
  - Graph-partitioning techniques
  - Spectral coclustering techniques
- As a product of random variables, minimize the loss of mutual information in coclustering taken as a compression of the joint distribution.
  - Information-theoretic techniques

6 A model for batch ad-hoc tasks [Fuhr, 1992]
[Figure: An adaptation of the conceptual model of Fuhr, relating $Q$, $D$ and $R$ through the maps $\alpha_Q$, $\beta_Q$, $\alpha_D$, $\beta_D$.]
Given $D$, $Q$ and $R$, the ideal IR system is $S_{D,Q}(R) = \langle \varrho_R \rangle$, with a relevance function
$\varrho_R : Q \to 2^D$, $q_i \mapsto \varrho_R(q_i) = \{ d_j \in D \mid d_j \, R \, q_i \}$. (1)

7 An IR model solving the batch ad-hoc task
Given a collection $D_T \subseteq D$, a set of topics $Q_T \subseteq Q$, and a set of relevance judgments $R_T \subseteq D_T \times Q_T$, the implemented IR system $S_{D,Q}(\hat{R}) = \langle \varrho_{\hat{R}} \rangle$ is what we can actually build, with approximated relevance $\hat{R} \approx R$ using a retrieval function:
$\varrho_{\hat{R}} : Q \to 2^D$, $q_i \mapsto \varrho_{\hat{R}}(q_i) = \{ d_j \in D \mid d_j \, \hat{R} \, q_i \}$.
For each query $q \in Q$ we have precision $P_{\hat{R}}$ and recall $R_{\hat{R}}$:
$P_{\hat{R}}(q) = \dfrac{|\varrho_R(q) \cap \varrho_{\hat{R}}(q)|}{|\varrho_{\hat{R}}(q)|}$, $R_{\hat{R}}(q) = \dfrac{|\varrho_R(q) \cap \varrho_{\hat{R}}(q)|}{|\varrho_R(q)|}$.
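A minimal sketch of the two measures for a single query, with the relevant and retrieved document sets given as Python sets. The document identifiers are hypothetical, and returning 0.0 for an empty denominator is a convention chosen here, not something the slides specify.

```python
def precision_recall(relevant, retrieved):
    """Precision and recall for one query, from two sets of document ids."""
    hits = len(relevant & retrieved)  # size of the intersection of both sets
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical toy data for a single query q.
relevant = {"d1", "d2", "d3", "d4"}   # ideal answer set for q
retrieved = {"d2", "d3", "d5"}        # answer set the system returned for q

p, r = precision_recall(relevant, retrieved)
print(p, r)
```

Here two of the three retrieved documents are relevant, and two of the four relevant documents are retrieved.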

9 Biclusters appear naturally for relevance relations...
Consider a set of queries $B \in 2^Q$. It is natural to think of the set of documents relevant to all of those queries:
$B^R = \{ d \in D \mid \forall q \in B,\ d R q \}$
Dually, consider a set of documents $A \in 2^D$ and the set of queries for which all of those documents are relevant:
$A^R = \{ q \in Q \mid \forall d \in A,\ d R q \}$
Clearly the following is a bicluster: $(A, B)$ such that $A^R = B$ and $B^R = A$.
Q: What is the organization of $D$ and $Q$ implied by this coclustering?

10 The affordances of Formal Concept Analysis
Affordance 1: FCA implements the (conjunctive) Boolean model of IR [Godin et al., 1986; Valverde-Albacete, 2006].
- There exists a set of keywords $T$ (after normalization, stoplisting, stemming).
- Queries are represented as sets of keywords, $Q \subseteq 2^T$.
- Documents are represented as sets of keywords, $D \subseteq 2^T$.
- Retrieval (estimated relevance) $\hat{R}$ is modelled as inclusion: $d \, \hat{R} \, q \iff q \subseteq d$.
- The retrieval function is the query polar, $\varrho_{\hat{R}}(q) = q^{\hat{R}}$.
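The conjunctive Boolean model above can be sketched in a few lines: documents and queries are keyword sets, and a document is retrieved exactly when it contains every keyword of the query. The collection and the function name `retrieve` are hypothetical.

```python
# Hypothetical collection: each document is its set of (normalized) keywords.
docs = {
    "d1": {"lattice", "concept", "retrieval"},
    "d2": {"lattice", "semiring"},
    "d3": {"concept", "retrieval", "semiring"},
}

def retrieve(query):
    """Return the documents d with query <= d (set inclusion)."""
    return {d for d, terms in docs.items() if query <= terms}

print(retrieve({"concept", "retrieval"}))
print(retrieve({"lattice", "semiring"}))
print(retrieve(set()))  # the empty query is contained in every document
```

Adding keywords to a query can only shrink the answer set, which is the antitone behaviour expected of a polar.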

12 Formal contexts and their polars
A Formal Context $(D, Q, R)$ is a triple of:
- a set of objects $D$,
- a set of attributes $Q$,
- a (Boolean) incidence relation $R \in 2^{D \times Q}$, where $d R q$ reads "object $d$ has attribute $q$".
The polars of the formal context: given $(D, Q, R)$ and subsets of objects $A$ and attributes $B$,
$\varphi(\cdot) : 2^D \to 2^Q$, $\varphi(A) = A^R = \{ q \in Q \mid \forall d \in A,\ d R q \}$
$\psi(\cdot) : 2^Q \to 2^D$, $\psi(B) = B^R = \{ d \in D \mid \forall q \in B,\ d R q \}$

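13 Formal contexts and polars (II)
The polars form an (antitone) Galois connection $(\varphi, \psi) : 2^D \rightleftharpoons 2^Q$:
$\varphi(A) \supseteq_Q B \iff \psi(B) \supseteq_D A$ (2)
The closures of the polars, $\gamma_D = \psi \circ \varphi$ and $\gamma_Q = \varphi \circ \psi$, are:
- monotone: $A_1 \subseteq A_2 \Rightarrow \gamma_D(A_1) \subseteq \gamma_D(A_2)$ and $B_1 \subseteq B_2 \Rightarrow \gamma_Q(B_1) \subseteq \gamma_Q(B_2)$,
- expansive: $\gamma_D(A) \supseteq A$ and $\gamma_Q(B) \supseteq B$,
- idempotent: $\gamma_D(\gamma_D(A)) = \gamma_D(A)$ and $\gamma_Q(\gamma_Q(B)) = \gamma_Q(B)$.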
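The Galois connection and the three closure properties can be checked by brute force on a small context. This is only a sketch: the context below is hypothetical, the names `phi`, `psi` and `gamma_D` mirror the symbols on the slide, and exhaustive enumeration of subsets is feasible only for toy sizes.

```python
from itertools import chain, combinations

D = ("d1", "d2", "d3")
Q = ("q1", "q2", "q3")
# Hypothetical incidence: (d, q) in R means "object d has attribute q".
R = {("d1", "q1"), ("d1", "q2"), ("d2", "q2"), ("d2", "q3"), ("d3", "q2")}

def phi(A):
    """Polar of a set of objects: attributes shared by all objects in A."""
    return frozenset(q for q in Q if all((d, q) in R for d in A))

def psi(B):
    """Polar of a set of attributes: objects having all attributes in B."""
    return frozenset(d for d in D if all((d, q) in R for q in B))

def gamma_D(A):
    """Closure on sets of objects, psi after phi."""
    return psi(phi(A))

def subsets(items):
    """All subsets of items, as frozensets."""
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]

all_A, all_B = subsets(D), subsets(Q)
galois = all((B <= phi(A)) == (A <= psi(B)) for A in all_A for B in all_B)
monotone = all(gamma_D(A1) <= gamma_D(A2)
               for A1 in all_A for A2 in all_A if A1 <= A2)
expansive = all(A <= gamma_D(A) for A in all_A)
idempotent = all(gamma_D(gamma_D(A)) == gamma_D(A) for A in all_A)
print(galois, monotone, expansive, idempotent)
```

For instance, the objects d1 and d2 share only q2, and every object has q2, so the closure of {d1, d2} is the whole of D.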

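15 Concepts
The set of concepts: $(A, B) \in \mathfrak{B}(D, Q, R) \iff \varphi(A) = B$ and $A = \psi(B)$.
Extents and intents: let $c = (A, B) \in \mathfrak{B}(D, Q, R)$. Then
$\operatorname{ext}(\cdot) : \mathfrak{B}(D, Q, R) \to 2^D$, $c = (A, B) \mapsto \operatorname{ext}(c) = A$
$\operatorname{int}(\cdot) : \mathfrak{B}(D, Q, R) \to 2^Q$, $c = (A, B) \mapsto \operatorname{int}(c) = B$
The concept order: $\underline{\mathfrak{B}}(D, Q, R) = \langle \mathfrak{B}(D, Q, R), \le \rangle$, with
$(A_1, B_1) \le (A_2, B_2) \iff A_1 \subseteq A_2 \iff B_1 \supseteq B_2$.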
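A naive way to list every concept of a small context is to close each subset of objects and collect the resulting (extent, intent) pairs. The sketch below does this for a hypothetical context stored as a dictionary from objects to attribute sets; it is exponential in the number of objects and is meant only to illustrate the definition, not to stand in for a practical lattice-building algorithm.

```python
from itertools import combinations

# Hypothetical context: object -> set of attributes it has.
context = {
    "d1": {"q1", "q2"},
    "d2": {"q2", "q3"},
    "d3": {"q2"},
}
all_attributes = set().union(*context.values())

def intent(A):
    """Attributes shared by all objects in A (all attributes if A is empty)."""
    if not A:
        return frozenset(all_attributes)
    return frozenset(set.intersection(*(context[d] for d in A)))

def extent(B):
    """Objects that have every attribute in B."""
    return frozenset(d for d, attrs in context.items() if B <= attrs)

# Close every subset of objects; duplicates collapse in the set.
concepts = set()
for k in range(len(context) + 1):
    for A in combinations(sorted(context), k):
        B = intent(A)
        concepts.add((extent(B), B))

# Print from the bottom concept up, ordered by extent size.
for A, B in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(A), sorted(B))
```

The concept order can then be read off by comparing extents with `<=` on the frozensets.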

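16 The fundamental theorem of Formal Concept Analysis
The Concept Lattice $\underline{\mathfrak{B}}(D, Q, R)$ is a complete lattice in which infima and suprema are given by:
$\bigwedge_{i \in I} (A_i, B_i) = \left( \bigcap_{i \in I} A_i, \left[ \bigcup_{i \in I} B_i \right]^{RR} \right)$
$\bigvee_{i \in I} (A_i, B_i) = \left( \left[ \bigcup_{i \in I} A_i \right]^{RR}, \bigcap_{i \in I} B_i \right)$
A complete lattice $V$ is isomorphic to $\underline{\mathfrak{B}}(D, Q, R)$ if and only if there are mappings $\gamma : D \to V$ with $\gamma(D) \supseteq \mathcal{J}(V)$ and $\mu : Q \to V$ with $\mu(Q) \supseteq \mathcal{M}(V)$ such that $d R q \iff \gamma(d) \le \mu(q)$.
In particular, $V \cong \underline{\mathfrak{B}}(V, V, \le)$.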

17 A logarithmic connection...
By the fundamental theorem, the incidence and the Concept Lattice are interchangeable: they are a pair of analysis and synthesis equations!
Metaphor:
- The concept lattice is the exponential of the formal context.
- The formal context is the logarithm of the concept lattice.

19 Examples: Confusion matrices of multiclass classifiers
The data matrix is in a semiring: $N_{DQ} \in \mathbb{N}^{D \times Q}$.
[Figure: $N_{DQ}$ at SNR = 0 dB, with rows and columns labelled p, m, t, f, th, k, s.]
Notice: no symmetry, certain sparsity.

20 Usual representation: heatmaps

21 Boolean confusion matrix and lattice of confusions
Boolean confusion matrices can be subjected to FCA by simply thresholding counts.
Confusion lattices represent some information about the confusion matrix:
- Stimuli appear in white boxes, percepts in grey.
- The strength of the confusion is not clear.
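Thresholding can be sketched as an element-wise comparison. The count matrix and the threshold value below are hypothetical; the slides do not say which threshold was used, and the choice changes the resulting lattice.

```python
# Hypothetical 3-class confusion counts: rows are stimuli, columns are percepts.
counts = [
    [50, 3, 0],
    [4, 40, 6],
    [0, 9, 41],
]
threshold = 5  # assumed value, chosen only for illustration

# Boolean incidence: stimulus i "is confused with" percept j when the count
# reaches the threshold.
incidence = [[n >= threshold for n in row] for row in counts]

for row in incidence:
    print(["x" if flag else "." for flag in row])
```

The Boolean matrix keeps which confusions occur but discards how strong each one is, which is the limitation noted on the slide.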

23 Data preparation
We substitute $N_{DQ} \in \mathbb{N}^{D \times Q}$ by $MI_{DQ} \in \mathbb{R}_{\min,+}^{D \times Q}$ by computing the pointwise mutual information of the count matrix $N_{DQ}$.
The MLE of the joint probability is
$\hat{P}_{DQ}(a_i, b_j) = \dfrac{n_{ij}}{\sum_{ij} n_{ij}}$,
with marginals, writing $N = \sum_{ij} n_{ij}$,
$\hat{P}_D(a_i) = \sum_j n_{ij} / N$, $\hat{P}_Q(b_j) = \sum_i n_{ij} / N$.
Then the mutual information matrix becomes
$MI_{DQ}(a_i, b_j) = \log \left( \dfrac{\hat{P}_{DQ}(a_i, b_j)}{\hat{P}_D(a_i) \, \hat{P}_Q(b_j)} \right)$.
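The data preparation step can be sketched directly from the formulas. Two choices below are assumptions, since the slide does not fix them: the logarithm is taken in base 2, and a zero count is mapped to minus infinity (the limit of the logarithm as the count tends to zero). The function name `pmi_matrix` and the toy counts are hypothetical.

```python
import math

def pmi_matrix(counts):
    """Pointwise mutual information of a count matrix, using MLE estimates."""
    total = sum(sum(row) for row in counts)          # N
    row_sums = [sum(row) for row in counts]          # N * P_D(a_i)
    col_sums = [sum(col) for col in zip(*counts)]    # N * P_Q(b_j)
    pmi = []
    for i, row in enumerate(counts):
        pmi.append([
            # log( (n/N) / ((row/N) * (col/N)) ) = log( n*N / (row*col) )
            math.log2(n * total / (row_sums[i] * col_sums[j]))
            if n > 0 else -math.inf
            for j, n in enumerate(row)
        ])
    return pmi

# Hypothetical 2x2 count matrix with no confusions at all.
mi = pmi_matrix([[2, 0], [0, 2]])
print(mi)
```

Positive entries mark pairs that co-occur more often than independence would predict, and negative entries mark pairs that co-occur less often.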

24 Generalized Formal Concept Analysis
The entries are now in the min-plus semiring: $MI \in \mathbb{R}_{\min,+}^{D \times Q}$.
[Figure: (pointwise) mutual information from $N_{DQ}$, with rows and columns labelled p, m, t, f, th, k, s.]
Interpretations of $MI(i, j) = \lambda$:
- stimulus $i$ is confused with percept $j$ in degree $\lambda$,
- percept $j$ is taken for stimulus $i$ to degree $\lambda$.

25 Generalized Formal Concept Analysis (cont) [Valverde-Albacete and Peláez-Moreno, 2011]
A $K$-valued formal context is a triple $(D, Q, R)_K$ with:
- $K$, a complete, reflexive idempotent semifield,
- two finite sets, of objects $D$ and of attributes $Q$,
- a $K$-valued incidence between them, $R \in K^{D \times Q}$, where $R(d, q) = \lambda$ reads as "object $d$ has attribute $q$ in degree $\lambda$" or "attribute $q$ is manifested in object $d$ to degree $\lambda$".

26 ϕ-polars
Consider $(D, Q, R)_K$, an invertible $\varphi \in K$ and the bracket $\langle y \mid x \rangle = y^T R x$. Then the ϕ-polars are the maps of the Galois connection $\left( (\cdot)^R_\varphi, {}^R_\varphi(\cdot) \right) : K^D \rightleftharpoons K^Q$:
$(y)^R_\varphi = (y^T R) \backslash \varphi$
${}^R_\varphi(x) = \varphi / (R x)$
[Diagram: the ϕ-polars between $K^D$ and $K^Q$ and their closures $\gamma_{K^D}$ and $\gamma_{K^Q}$.]

27 Formal ϕ-concepts
A (formal) ϕ-concept of the formal context $(D, Q, R)_K$ is a pair $(a, b) \in K^D \times K^Q$ such that $(a)^R_\varphi = b$ and ${}^R_\varphi(b) = a$. We call:
- $a$ the ϕ-extent and $b$ the ϕ-intent of the concept $(a, b)$, and
- $\varphi$ its (minimum) degree of existence.
ϕ-concepts are pairs $(A, B)_\varphi$ with properties similar to those of standard Formal Concept Analysis.
$\varphi \in \mathbb{R}$ describes a minimum degree of existence required for pairs $(A, B) \in \mathbb{R}_{\min,+}^{D} \times \mathbb{R}_{\min,+}^{Q}$ to be considered as members of the ϕ-lattice $\mathfrak{B}^{\varphi}(D, Q, MI_{DQ})_{\mathbb{R}_{\min,+}}$.

28 Basic theorem of K-valued Formal Concept Analysis, finite version, 1st half
The hierarchical order: if $(a_1, b_1)$ and $(a_2, b_2)$ are ϕ-concepts,
$(a_1, b_1) \le (a_2, b_2) \iff a_1 \le_{K^D} a_2 \iff b_1 \le^{op}_{K^Q} b_2$.
Given a reflexive, idempotent semiring $(K, \varphi)$, the ϕ-concept lattice $\underline{\mathfrak{B}}^{\varphi}(D, Q, R)_K$ of a $K$-valued formal context $(D, Q, R)_K$ is a (finite, complete) lattice in which infimum and supremum are given by:
[Equations: $\bigwedge_{t \in T} (a_t, b_t)$ and $\bigvee_{t \in T} (a_t, b_t)$ expressed through the ϕ-polars of the $a_t$ and the $b_t$.]

29 Challenges: Theoretical
- The relationship with the VSM in NLP and IR is very evident.
  - tf-idf is related to mutual information (Roelleke, 2008).
  - (K)FCA is a VSM in a different algebraic setting.
- The entailments are very enticing:
  - There is a concept lattice structure underlying the VSM.
  - There is an actual topology of information that is finer than the discrete topology.
  - KFCA actually shows how IR and IF are two sides of the same coin.
- The development of idempotent semiring algebra is way behind that of normal algebra (e.g. there is no known SVD, so idempotent LSI is unavailable).

30 Challenges: Practical
- The complexity of concept-lattice building algorithms is not good: $O(|D| \, |Q| \, K)$, where $K$ is the number of concepts in the lattice. But Big Data techniques may be of great help.
- Most toolkits deal with the dense context case, which for us is less interesting.
- The theory is agnostic with respect to the interpretations of $D$ and $Q$. This is a mixed blessing.

31 Summary
(K)FCA as a coclustering strategy...
- KFCA does not try to solve the original (direct clustering) task.
- But it provides an alternative look into the task that makes it more realistic and varied:
  - It deals naturally with lack of symmetry (confusion matrices).
  - It deals naturally with data with many objects and few attributes (GED data) or vice versa (itemset analysis).
- Most of the advantages stem from:
  - A very solid theory (FCA).
  - A deep understanding of the maths behind it (order and lattice theory).
  - Appropriateness of use: KFCA deals with counts, probabilities, concentrations: all positive quantities.

32 Thank you!

33 References
- Norbert Fuhr. Probabilistic models in information retrieval. The Computer Journal, 35(3), 1992.
- R. Godin, E. Saunders, and J. Gecsei. Lattice model of browsable data spaces. Information Sciences, 40:89-116, 1986.
- J. Hartigan. Direct clustering of a data matrix. Journal of the American Statistical Association, Jan 1972.
- Boris Mirkin. Mathematical Classification and Clustering, volume 11 of Nonconvex Optimization and Its Applications. Kluwer Academic Publishers, 1996.
- Francisco J. Valverde-Albacete. Combining soft and hard techniques for the analysis of batch retrieval tasks. In Enrique Herrera-Viedma, Gabriella Pasi, and Fabio Crestani, editors, Soft Computing for Information Retrieval on the Web. Models and Applications, volume 197 of Studies in Fuzziness and Soft Computing. Springer, 2006.
- Francisco J. Valverde-Albacete and Carmen Peláez-Moreno. Extending conceptualisation modes for generalised Formal Concept Analysis. Information Sciences, 181, May 2011.