Towards a Conceptual Analysis of Social Networks (Vers une Analyse Conceptuelle des Réseaux Sociaux)
Erick Stattner, Martine Collard
Laboratory of Mathematics and Computer Science (LAMIA), University of the French West Indies and Guiana, France
MARAMI 2012

Motivation
Issues:
  The new science of networks focuses on interactions between entities and investigates new methods and techniques
  Knowledge extraction from data on real-world phenomena studied through interactions among individuals
New data mining techniques:
  Link mining (node classification, link-based clustering, link prediction, frequent patterns, ...)
  Attributed graph mining (cohesive subgraphs, summarization, ...)

Data Mining Task
Context: search for frequent patterns to answer questions such as:
  Which groups of nodes are the most connected?
  Which node properties are most frequently found in connection?
Contribution: search for frequent links in social networks between groups of nodes sharing internal common properties, by combining network structure and node attribute values
[Figure: example network whose nodes are labeled b or r, illustrating the frequent link (b, r)]

Outline (section 1 of 4)
  Frequent pattern discovery
  Node clustering

Pattern Mining in Social Networks: Current Methods
Main methods:
  Link prediction
  Frequent pattern discovery
  Node clustering
  Formal concept analysis

Pattern Mining in Social Networks: Frequent pattern discovery
Frequent pattern discovery:
  pattern = subgraph
  search for subgraphs occurring frequently
    in a large network
    in a set of networks
[Figure: numbered example subgraph patterns over node labels X, Y and Z]

Pattern Mining in Social Networks: Node clustering
Node clustering:
  based on links, to detect subgraphs or "communities"
  objective: identifying groups of nodes that are densely connected in the network, by maximizing intra-cluster links while minimizing inter-cluster links

Pattern Mining in Social Networks: Hybrid node clustering
Hybrid node clustering:
  based on links and on node attribute values
  objective: identifying groups of nodes that share common contacts

Formal concept analysis
Formal concept of links:
  based on links and on nodes
  objective: identifying groups of nodes that share common contacts

Pattern Mining in Social Networks: Observation
Current methods:
  mainly use the network structure
  often ignore node properties
Concept of frequent link:
  combines information from both links and node attribute values
  represents a regularity involving two groups of nodes that share internal common characteristics

Outline (section 2 of 4)
  Knowledge extracted
  Analogy with lattices of itemsets

Conceptual link
$G = (V, E)$: network (directed)
$V$ is defined as a relation $R(A_1, \dots, A_p)$, where $A_1, \dots, A_p$ are the node attributes
Each node $v \in V$ is defined by the itemset $A_1 = a_1$ and ... and $A_p = a_p$, written $a_1 \dots a_p$
For an itemset $m$:
  $V_m$: set of nodes satisfying $m$
  if $sm$ is a sub-itemset of $m$, then $V_m \subseteq V_{sm}$, e.g. $V_{abc} \subseteq V_{ab}$

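The definition of $V_m$ can be illustrated with a minimal sketch. The toy nodes, the attribute names A, B, C and the helper `nodes_satisfying` are hypothetical and only serve to show the inclusion $V_{abc} \subseteq V_{ab}$; they are not taken from the authors' implementation.

```python
# Hypothetical toy data: node id -> attribute values (names are illustrative).
nodes = {
    1: {"A": "a", "B": "b", "C": "c"},
    2: {"A": "a", "B": "b", "C": "x"},
    3: {"A": "a", "B": "y", "C": "c"},
}


def nodes_satisfying(nodes, m):
    """Return V_m: ids of the nodes whose attributes match every item of m."""
    return {
        v
        for v, attrs in nodes.items()
        if all(attrs.get(name) == value for name, value in m.items())
    }


# An itemset is modeled here as a dict {attribute: value}.
v_abc = nodes_satisfying(nodes, {"A": "a", "B": "b", "C": "c"})
v_ab = nodes_satisfying(nodes, {"A": "a", "B": "b"})

# Adding an item to the itemset can only shrink the set of matching nodes.
print(v_abc <= v_ab)
```

Modeling an itemset as a dictionary is one possible choice; any representation that supports testing "all items hold for this node" would do.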
Conceptual link
$G = (V, E)$: network
$I_V$: set of all possible itemsets on $G$
Left-hand side link set: $LE_m = \{ e \in E \;;\; e = (a, b),\ a \in V_m \}$
Right-hand side link set: $RE_m = \{ e \in E \;;\; e = (a, b),\ b \in V_m \}$
Conceptual link:
  $(m_1, m_2) = LE_{m_1} \cap RE_{m_2}$  (1)
  $\phantom{(m_1, m_2)} = \{ e \in E \;;\; e = (a, b),\ a \in V_{m_1} \text{ and } b \in V_{m_2} \}$  (2)

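As a rough illustration of equations (1) and (2), the sketch below builds the left-hand and right-hand link sets on a small toy network and intersects them. The node attribute `color`, the edge list and all function names are assumptions made for the example, not part of the original method's code.

```python
# Hypothetical toy network: node id -> attributes, and directed links (a, b).
nodes = {1: {"color": "b"}, 2: {"color": "b"}, 3: {"color": "r"}, 4: {"color": "r"}}
edges = [(1, 3), (2, 3), (1, 2), (4, 1)]


def satisfies(attrs, m):
    """True if a node's attributes match every item of itemset m."""
    return all(attrs.get(name) == value for name, value in m.items())


def left_links(nodes, edges, m):
    """LE_m: links whose source node belongs to V_m."""
    return {(a, b) for a, b in edges if satisfies(nodes[a], m)}


def right_links(nodes, edges, m):
    """RE_m: links whose target node belongs to V_m."""
    return {(a, b) for a, b in edges if satisfies(nodes[b], m)}


def conceptual_link(nodes, edges, m1, m2):
    """(m1, m2) = LE_m1 intersected with RE_m2, as in equation (1)."""
    return left_links(nodes, edges, m1) & right_links(nodes, edges, m2)


link_br = conceptual_link(nodes, edges, {"color": "b"}, {"color": "r"})
print(sorted(link_br))
```

On this toy network the link (b, r) contains the two links that go from a b-node to an r-node.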
Frequent conceptual link
Support of $l = (m_1, m_2)$:
  $supp[(m_1, m_2)] = \dfrac{|(m_1, m_2)|}{|E|}$
$\beta$: link support threshold
$(m_1, m_2)$ is a frequent conceptual link iff $supp[(m_1, m_2)] > \beta$

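The support and the frequency test can be sketched as follows, reading the support as the fraction of the network's links that belong to the conceptual link. The toy data and the names `support` and `is_frequent` are hypothetical; the strict comparison with $\beta$ follows the slide.

```python
# Hypothetical toy network: node id -> attributes, and directed links (a, b).
nodes = {1: {"color": "b"}, 2: {"color": "b"}, 3: {"color": "r"}, 4: {"color": "r"}}
edges = [(1, 3), (2, 3), (1, 2), (4, 1)]


def satisfies(attrs, m):
    """True if a node's attributes match every item of itemset m."""
    return all(attrs.get(name) == value for name, value in m.items())


def support(nodes, edges, m1, m2):
    """Fraction of links whose source satisfies m1 and whose target satisfies m2."""
    if not edges:
        return 0.0
    hits = sum(
        1 for a, b in edges
        if satisfies(nodes[a], m1) and satisfies(nodes[b], m2)
    )
    return hits / len(edges)


def is_frequent(nodes, edges, m1, m2, beta):
    """A conceptual link is frequent iff its support is strictly above beta."""
    return support(nodes, edges, m1, m2) > beta


supp_br = support(nodes, edges, {"color": "b"}, {"color": "r"})
print(supp_br, is_frequent(nodes, edges, {"color": "b"}, {"color": "r"}, beta=0.4))
```

Two of the four toy links go from a b-node to an r-node, so the link (b, r) is frequent for any threshold strictly below 0.5.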
Frequent Links: Knowledge provided
Frequent links provide knowledge on the groups of nodes that are the most connected in the social network, i.e. knowledge on the properties that are most often connected
Example: bipartite customer-product network
  $m_1$: Gender = M and Interest = computer science
  $m_2$: Category = Science Fiction and Product = book
  $supp[(m_1, m_2)] = 14\%$

Frequent conceptual link: Downward-closure property
Sub and super conceptual links:
  $(sm_1, sm_2)$ is a sub conceptual link of $(m_1, m_2)$: $(sm_1, sm_2) \subseteq (m_1, m_2)$
Downward-closure property:
  if $l$ is frequent, then all its sub-links $sl$ are also frequent
  if $l$ is infrequent, then all its super-links are also infrequent

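A short derivation of why the property holds, sketched from the definitions of $V_m$, $LE_m$, $RE_m$ and of the support given on the previous slides. It assumes that a sub-link is obtained by taking a sub-itemset $sm_1$ of $m_1$ and a sub-itemset $sm_2$ of $m_2$:

```latex
\begin{align*}
sm_1 \subseteq m_1,\ sm_2 \subseteq m_2
  &\implies V_{m_1} \subseteq V_{sm_1},\quad V_{m_2} \subseteq V_{sm_2} \\
  &\implies LE_{m_1} \subseteq LE_{sm_1},\quad RE_{m_2} \subseteq RE_{sm_2} \\
  &\implies LE_{m_1} \cap RE_{m_2} \subseteq LE_{sm_1} \cap RE_{sm_2} \\
  &\implies supp[(m_1, m_2)] \le supp[(sm_1, sm_2)]
\end{align*}
```

Hence, if $supp[(m_1, m_2)] > \beta$, every sub-link also has a support above $\beta$; the statement about infrequent links and their super-links is the contrapositive.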
Maximal frequent conceptual link
$l = (m_1, m_2)$ is a maximal frequent conceptual link iff $\nexists\, l'$ frequent conceptual link such that $l \subset l'$.

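One naive way to keep only the maximal links from an already computed set of frequent links is sketched below. It assumes each link is a pair of frozensets of items and that the sub-link order is component-wise itemset inclusion; the function names and the toy items are hypothetical, and the quadratic scan is for illustration rather than efficiency.

```python
def is_sub_link(l, other):
    """True if l = (sm1, sm2) is a sub-link of other = (m1, m2)."""
    return l[0] <= other[0] and l[1] <= other[1]


def maximal_links(frequent):
    """Keep the frequent links that have no proper frequent super-link."""
    return {
        l for l in frequent
        if not any(l != other and is_sub_link(l, other) for other in frequent)
    }


# Hypothetical set of frequent links over the items "a" and "b".
a, b, ab = frozenset("a"), frozenset("b"), frozenset("ab")
frequent = {(a, a), (ab, a), (b, b)}

maximal = maximal_links(frequent)
print(len(maximal))
```

Here (a, a) is discarded because it is a sub-link of (ab, a), while (b, b) is kept because no frequent link extends it on both sides.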
Conceptual view: Lattice
Extraction of maximal frequent conceptual links on $G$
Concept lattice and search space reduction
[Figure: panels (a) and (b) show the lattice of conceptual links over the itemsets a, b and ab, from (Φ, Φ) at the bottom to (ab, ab) at the top]

Conceptual view
$\beta$: link support threshold
$FL_{Vmax}$: set of all maximal frequent conceptual links on $G$
$FL_{Vmax}$ is the conceptual view of the social network $G$
[Figure: a social network and a support threshold β yield frequent conceptual links, which form the conceptual view; example links are labeled with supports 31%, 22% and 13%]

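One possible way to materialize the conceptual view as a small summary graph is sketched below, under the assumption that each itemset becomes a node and each maximal frequent conceptual link becomes a weighted directed edge. The itemset labels "x", "y", "z" and the function name are hypothetical; only the three support values echo the figure.

```python
def conceptual_view(maximal_links):
    """Build a summary graph {itemset: {itemset: support}} from maximal links."""
    view = {}
    for (m1, m2), supp in maximal_links.items():
        view.setdefault(m1, {})[m2] = supp  # edge m1 -> m2 weighted by support
        view.setdefault(m2, {})             # make sure the target is a node too
    return view


# Hypothetical maximal frequent conceptual links with their supports.
links = {("x", "y"): 0.31, ("y", "z"): 0.22, ("x", "z"): 0.13}

view = conceptual_view(links)
n_nodes = len(view)
n_links = sum(len(targets) for targets in view.values())
print(n_nodes, n_links)
```

Counting the nodes and links of such a graph gives the kind of size measures that can then be compared across support thresholds.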
Outline (section 3 of 4)
  Testbed
  Extracted patterns

Testbed
Sub-network of the proximity contact network (City of Portland) simulated with Episim [Eubank, 2005]
Each node is described by:
  age class, i.e. age / 10
  gender (1-male, 2-female)
  worker status
  type of relationship with the householder
  contact class, i.e. degree / 2
  sociability

General
  Origin     Portland
  Type       Undirected
  #nodes     3000
  #links     4683
  Density    0.00110413
  #comp      1
Degree
  avg        3.087
  max        15
cc
  avg        0.63627
[Figure: degree distribution, for degrees 1 to 15]

Extracted patterns
Some examples of extracted patterns:

β = 0.1
  Maximal cfl                        Support
  ((4; ;1;,, ),( ; ;2;,, ))          0.107
  ((2; ; ;2,, ),( ; ;2;2,, ))        0.105
  (( ;1;1;,, ),( ; ;1;,, ))          0.113
10.7% of the links of the network connect 40-year-old people who have a job to people who do not have a job

β = 0.2
  Maximal cfl                        Support
  (( ;2; ;,, ),( ; ;1;,, ))          0.231
  (( ;1; ;,, ),( ; ;2;,, ))          0.288
  (( ;2; ;,, ),( ;1; ;,, ))          0.297
23.1% of the links of the network connect men to people who have a job

Conceptual view
Summarization

Results
Network measures versus support threshold: number of nodes and links (c), density and clustering coefficient (d), and degree distribution (e).
[Figure: panels (c) and (d) plot the number of nodes and links, and the clustering coefficient and density, against support values from 0.11 to 0.2; panel (e) plots P(k) for degrees 1 to 12]

Conclusion:
  A new approach for extracting frequent patterns from social data
  Combines information from both attribute values and links
  Two interests:
    Extracts novel patterns: the groups of nodes that are most connected
    Provides a kind of summarized representation of the network
Perspectives:
  Optimization
  Scalability

Thanks for your attention!