Towards a Conceptual Analysis of Social Networks (Vers une Analyse Conceptuelle des Réseaux Sociaux)




Erick Stattner, Martine Collard
Laboratory of Mathematics and Computer Science (LAMIA), University of the French West Indies and Guiana, France
MARAMI 2012

Motivation

Issues:
- The "New Science of Networks" focuses on interactions between entities and investigates new methods and techniques.
- Knowledge is extracted from data on real-world phenomena, studied through interactions among individuals.
- New data mining techniques have emerged:
  - Link mining (node classification, link-based clustering, link prediction, frequent patterns, ...)
  - Attributed graph mining (cohesive subgraphs, summarization, ...)

Data Mining Task

Context: searching for frequent patterns to answer questions such as:
- Which groups of nodes are the most connected?
- Which node properties are most frequently found in connection?

Contribution: searching for frequent links in social networks between groups of nodes that share common internal properties, by combining the network structure with node attribute values.

[Figure: a network whose nodes are labeled b or r, illustrating the frequent link (b,r).]

Outline (part 1 of 4): Frequent pattern discovery, Node clustering

Pattern Mining in Social Networks: Current Methods

Main methods:
- Link prediction
- Frequent pattern discovery
- Node clustering
- Formal concept analysis

Pattern Mining in Social Networks: Frequent Pattern Discovery

Frequent pattern discovery: a pattern is a subgraph. The task is to search for subgraphs that occur frequently
- in one large network, or
- in a set of networks.

[Figure: an example network of 11 numbered nodes labeled X, Y or Z, together with small subgraph patterns over the same labels.]
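A complete frequent-subgraph miner is out of scope for a slide, but the notion of pattern support can be sketched on the smallest non-trivial pattern, a single labeled edge. This minimal sketch assumes the "set of networks" setting, where a pattern's support is the number of graphs containing it at least once; the graph encoding and the name `frequent_edge_patterns` are illustrative, not taken from the talk.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical encoding: (node -> label, list of undirected edges).
LabeledGraph = Tuple[Dict[int, str], List[Tuple[int, int]]]


def frequent_edge_patterns(graphs: List[LabeledGraph],
                           min_support: int) -> Dict[Tuple[str, ...], int]:
    """Return single-edge label patterns found in at least `min_support` graphs.

    Only one-edge subgraphs are considered; real miners grow larger patterns.
    Each pattern is counted at most once per graph.
    """
    support: Counter = Counter()
    for labels, edges in graphs:
        # Sort the two end labels so that X-Y and Y-X are the same pattern.
        patterns = {tuple(sorted((labels[u], labels[v]))) for u, v in edges}
        support.update(patterns)
    return {p: s for p, s in support.items() if s >= min_support}


graphs = [
    ({1: "X", 2: "Y", 3: "X"}, [(1, 2), (2, 3)]),
    ({1: "X", 2: "Z", 3: "Y"}, [(1, 2), (1, 3)]),
    ({1: "Y", 2: "Z"}, [(1, 2)]),
]
print(frequent_edge_patterns(graphs, min_support=2))  # {('X', 'Y'): 2}
```

The edge X-Y appears in two of the three toy graphs, so it is the only pattern that reaches a support of 2.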

Pattern Mining in Social Networks: Node Clustering

Node clustering: uses links to detect subgraphs, or "communities".
Objective: identify groups of nodes that are densely connected within the network, by maximizing intra-cluster links while minimizing inter-cluster links.
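The node-clustering objective can be made concrete by scoring a given partition. The sketch below counts intra- and inter-cluster links and also computes Newman's modularity Q, a standard community-quality score that the slide does not mention; it is included only as one common way of weighing intra-cluster links against what a random graph with the same degrees would give. The function name is hypothetical.

```python
from typing import Dict, Hashable, List, Tuple


def partition_quality(edges: List[Tuple[Hashable, Hashable]],
                      community: Dict[Hashable, int]) -> Tuple[int, int, float]:
    """Return (intra-cluster links, inter-cluster links, modularity Q).

    Assumes an undirected graph given as a list of distinct edges, no self-loops.
    Q = sum over communities c of [L_c / m - (d_c / 2m)^2], where L_c is the
    number of links inside c, d_c the total degree of c and m the number of links.
    """
    m = len(edges)
    inside: Dict[int, int] = {}
    degree: Dict[int, int] = {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    intra = sum(inside.values())
    q = sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())
    return intra, m - intra, q


# Two triangles joined by a single bridge, each triangle as its own cluster.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(partition_quality(edges, community))  # intra = 6, inter = 1, Q = 5/14 (about 0.357)
```

Only the bridge (2, 3) crosses the two clusters, which is the kind of partition node clustering aims for.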

Pattern Mining in Social Networks: Hybrid Node Clustering

Hybrid node clustering: uses both links and node attribute values.
Objective: identify groups of nodes that share common contacts.

Formal Concept Analysis

Formal concepts of links: based on both links and nodes.
Objective: identify groups of nodes that share common contacts.
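One way to read "groups of nodes that share common contacts" in formal-concept terms is to take nodes as objects and their contacts as attributes; a concept is then a maximal group of nodes together with all the contacts they share. The slide does not spell out the formal context, so this encoding is an assumption, and the naive closure below (with the hypothetical name `formal_concepts`) is only meant for small examples.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Concept = Tuple[FrozenSet[str], FrozenSet[str]]  # (extent, intent)


def formal_concepts(contacts: Dict[str, Set[str]]) -> List[Concept]:
    """Enumerate the formal concepts of a node x contact context.

    Objects are nodes and attributes are their contacts. Every concept intent
    is an intersection of object intents (the empty intersection being the set
    of all attributes), so we close the object intents under intersection.
    Naive and exponential in the worst case: for toy examples only.
    """
    all_contacts = frozenset().union(*contacts.values())
    intents: Set[FrozenSet[str]] = {all_contacts}
    for attrs in contacts.values():
        # The right-hand side is built before the in-place union runs.
        intents |= {i & frozenset(attrs) for i in intents}
    concepts = []
    for intent in intents:
        # Extent: every node having all contacts of the intent.
        extent = frozenset(n for n, attrs in contacts.items() if intent <= attrs)
        concepts.append((extent, intent))
    return concepts


contacts = {"u": {"x", "y"}, "v": {"x", "y"}, "w": {"y", "z"}}
for extent, intent in formal_concepts(contacts):
    print(sorted(extent), sorted(intent))
```

On this toy context the sketch finds four concepts, among them ({u, v}, {x, y}): u and v are exactly the nodes that share the contacts x and y.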

Pattern Mining in Social Networks: Observation

Current methods mainly use the network structure and often ignore node properties.

The concept of a frequent link:
- combines information from both links and node attribute values;
- represents a regularity involving two groups of nodes, each of which shares common internal characteristics.

Outline (part 2 of 4): Knowledge extracted, Analogy with lattices of itemsets

Conceptual Link

- $G = (V, E)$ is a (directed) network.
- $V$ is defined as a relation $R(A_1, \dots, A_p)$, where $A_1, \dots, A_p$ are node attributes.
- Each node $v \in V$ is described by the itemset $A_1 = a_1 \wedge \dots \wedge A_p = a_p$, written $a_1 \dots a_p$.
- For an itemset $m$, $V_m$ is the set of nodes satisfying $m$.
- If $sm$ is a sub-itemset of $m$, then $V_m \subseteq V_{sm}$; for example, $V_{abc} \subseteq V_{ab}$.

Conceptual Link

- $G = (V, E)$ is a network, and $I_V$ is the set of all possible itemsets on $G$.
- Left-hand-side link set: $LE_m = \{ e \in E \mid e = (a, b),\ a \in V_m \}$
- Right-hand-side link set: $RE_m = \{ e \in E \mid e = (a, b),\ b \in V_m \}$
- Conceptual link:

$$(m_1, m_2) = LE_{m_1} \cap RE_{m_2} \tag{1}$$
$$= \{ e \in E \mid e = (a, b),\ a \in V_{m_1} \text{ and } b \in V_{m_2} \} \tag{2}$$

Frequent Conceptual Link

Support of $l = (m_1, m_2)$:

$$\mathrm{supp}[(m_1, m_2)] = \frac{|(m_1, m_2)|}{|E|}$$

Let $\beta$ be the link support threshold. $(m_1, m_2)$ is a frequent conceptual link iff $\mathrm{supp}[(m_1, m_2)] > \beta$.
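The definitions of $V_m$, $LE_m$, $RE_m$, the conceptual link and its support translate almost directly into set operations. This minimal sketch assumes a directed network given as a list of distinct links and encodes each node's itemset as a set of "attribute=value" strings; the encoding, the function names and the toy data are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Node = str
Edge = Tuple[Node, Node]   # directed link (a, b)
Itemset = FrozenSet[str]   # items encoded as "attribute=value" strings


def nodes_satisfying(m: Itemset, items: Dict[Node, Itemset]) -> Set[Node]:
    """V_m: the nodes whose itemset contains every item of m."""
    return {v for v, desc in items.items() if m <= desc}


def conceptual_link(m1: Itemset, m2: Itemset, edges: List[Edge],
                    items: Dict[Node, Itemset]) -> Set[Edge]:
    """(m1, m2) = LE_m1 intersected with RE_m2."""
    v1, v2 = nodes_satisfying(m1, items), nodes_satisfying(m2, items)
    left = {e for e in edges if e[0] in v1}   # LE_m1: source satisfies m1
    right = {e for e in edges if e[1] in v2}  # RE_m2: target satisfies m2
    return left & right


def support(m1: Itemset, m2: Itemset, edges: List[Edge],
            items: Dict[Node, Itemset]) -> float:
    """supp[(m1, m2)] = |(m1, m2)| / |E| (edges assumed distinct)."""
    return len(conceptual_link(m1, m2, edges, items)) / len(edges)


items = {
    "1": frozenset({"gender=M", "job=1"}),
    "2": frozenset({"gender=F", "job=0"}),
    "3": frozenset({"gender=M", "job=0"}),
    "4": frozenset({"gender=F", "job=1"}),
}
edges = [("1", "2"), ("3", "2"), ("1", "4"), ("2", "3")]
s = support(frozenset({"gender=M"}), frozenset({"job=0"}), edges, items)
beta = 0.4
print(s, s > beta)  # 0.5 True
```

Two of the four links go from a man to someone without a job, so the link is frequent for $\beta = 0.4$ but not for $\beta = 0.5$, since the test is strict.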

Frequent Links: Knowledge Provided

Frequent links provide knowledge about the most connected groups of nodes in the social network, i.e., about the properties most often found in connection.

Example, a bipartite customer-product network:
- $m_1$: Gender = M and Interest = computer science
- $m_2$: Category = Science Fiction and Product = book
- $\mathrm{supp}[(m_1, m_2)] = 14\%$

Frequent Conceptual Link: Downward-Closure Property

Sub- and super-conceptual links: $(sm_1, sm_2)$ is a sub-conceptual link of $(m_1, m_2)$, written $(sm_1, sm_2) \preceq (m_1, m_2)$, when $sm_1$ is a sub-itemset of $m_1$ and $sm_2$ is a sub-itemset of $m_2$.

Downward-closure property:
- if $l$ is frequent, then all its sub-links are also frequent;
- if $l$ is infrequent, then all its super-links are also infrequent.

This holds because $V_m \subseteq V_{sm}$: every link counted in $(m_1, m_2)$ is also counted in $(sm_1, sm_2)$, so $\mathrm{supp}[(sm_1, sm_2)] \ge \mathrm{supp}[(m_1, m_2)]$.
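The downward-closure property is what makes an Apriori-style, level-wise search possible: links are grown one item at a time, and a candidate is discarded as soon as one of its immediate sub-links is infrequent. The sketch below is one straightforward way to do this, not necessarily the authors' algorithm; it assumes both sides of a link are non-empty itemsets, encodes items as hypothetical "attribute=value" strings, and uses illustrative function names.

```python
from itertools import product
from typing import Dict, FrozenSet, Iterator, List, Set, Tuple

Itemset = FrozenSet[str]
Link = Tuple[Itemset, Itemset]
Edges = List[Tuple[str, str]]


def support(link: Link, edges: Edges, items: Dict[str, Itemset]) -> float:
    """Fraction of links (a, b) such that a satisfies link[0] and b satisfies link[1]."""
    m1, m2 = link
    return sum(1 for a, b in edges if m1 <= items[a] and m2 <= items[b]) / len(edges)


def sub_links(link: Link) -> Iterator[Link]:
    """Immediate sub-links: drop one item from one side, keeping both sides non-empty."""
    m1, m2 = link
    if len(m1) > 1:
        for x in m1:
            yield m1 - {x}, m2
    if len(m2) > 1:
        for x in m2:
            yield m1, m2 - {x}


def frequent_links(edges: Edges, items: Dict[str, Itemset], beta: float) -> Set[Link]:
    """Level-wise search for every link with support > beta."""
    vocab = sorted(set().union(*items.values()))
    # Level 1: one item on each side, as in (a, a), (a, b), (b, a), (b, b).
    level = {(frozenset([x]), frozenset([y])) for x, y in product(vocab, repeat=2)}
    frequent: Set[Link] = set()
    while level:
        current = {lk for lk in level if support(lk, edges, items) > beta}
        frequent |= current
        candidates: Set[Link] = set()
        for m1, m2 in current:  # grow each frequent link by one item
            for x in vocab:
                if x not in m1:
                    candidates.add((m1 | {x}, m2))
                if x not in m2:
                    candidates.add((m1, m2 | {x}))
        # Downward closure: prune candidates having an infrequent sub-link.
        level = {c for c in candidates if all(s in current for s in sub_links(c))}
    return frequent


items = {
    "1": frozenset({"gender=M", "job=1"}),
    "2": frozenset({"gender=F", "job=0"}),
    "3": frozenset({"gender=M", "job=0"}),
    "4": frozenset({"gender=F", "job=1"}),
}
edges = [("1", "2"), ("3", "2"), ("1", "4"), ("2", "3")]
print(len(frequent_links(edges, items, beta=0.3)))  # 6
```

Pruning only skips candidates that downward closure proves infrequent, so the result is the same as checking every link, with far fewer support computations.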

Maximal Frequent Conceptual Link

$(m_1, m_2)$ is a maximal frequent conceptual link iff it is frequent and there is no frequent conceptual link $l'$ of which $(m_1, m_2)$ is a proper sub-link.
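Given the set of frequent links, the maximal ones are those that no other frequent link extends on either side. This minimal sketch assumes links are pairs of frozensets and compares them side-wise; the input is a hand-written set of frequent links on a hypothetical toy network, and the function names are illustrative.

```python
from typing import FrozenSet, Set, Tuple

Itemset = FrozenSet[str]
Link = Tuple[Itemset, Itemset]


def is_proper_sub_link(lk: Link, other: Link) -> bool:
    """True if lk is a sub-link of `other` (side-wise item inclusion) and differs from it."""
    return lk != other and lk[0] <= other[0] and lk[1] <= other[1]


def maximal_links(frequent: Set[Link]) -> Set[Link]:
    """Keep the frequent links that are not a proper sub-link of another frequent link."""
    return {lk for lk in frequent if not any(is_proper_sub_link(lk, o) for o in frequent)}


def link(left: Set[str], right: Set[str]) -> Link:
    return frozenset(left), frozenset(right)


frequent = {
    link({"gender=M"}, {"gender=F"}),
    link({"gender=M"}, {"job=0"}),
    link({"job=0"}, {"job=0"}),
    link({"job=1"}, {"gender=F"}),
    link({"gender=M", "job=1"}, {"gender=F"}),
    link({"gender=M"}, {"gender=F", "job=0"}),
}
for m1, m2 in maximal_links(frequent):
    print(sorted(m1), "->", sorted(m2))
```

Here three links are maximal: ({gender=M, job=1}, {gender=F}), ({gender=M}, {gender=F, job=0}) and ({job=0}, {job=0}); the other three are sub-links of them.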

Conceptual View: Lattice

Extraction of the maximal frequent conceptual links on $G$: concept lattice and search-space reduction.

[Figure, panels (a) and (b): the lattice of conceptual links over the items a and b, from (∅, ∅) at the bottom, through (a, a), (a, b), (b, a), (b, b), then (ab, a), (ab, b), (a, ab), (b, ab), up to (ab, ab) at the top.]
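Panel (a) appears to draw every pair of non-empty itemsets over {a, b} above a bottom element (∅, ∅), grouped by total number of items. That reading of the figure is an assumption; under it, the lattice levels can be generated as follows (function names are hypothetical).

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Tuple

Link = Tuple[FrozenSet[str], FrozenSet[str]]


def nonempty_subsets(items: Iterable[str]) -> List[FrozenSet[str]]:
    pool = sorted(items)
    return [frozenset(c) for k in range(1, len(pool) + 1) for c in combinations(pool, k)]


def link_lattice_levels(items: Iterable[str]) -> Dict[int, List[Link]]:
    """Group the conceptual links over `items` by total size |m1| + |m2|.

    Level 0 holds only the bottom element (empty, empty); every other link
    pairs two non-empty itemsets.
    """
    subsets = nonempty_subsets(items)
    levels: Dict[int, List[Link]] = {0: [(frozenset(), frozenset())]}
    for m1 in subsets:
        for m2 in subsets:
            levels.setdefault(len(m1) + len(m2), []).append((m1, m2))
    return levels


for size, links in sorted(link_lattice_levels({"a", "b"}).items()):
    print(size, ["".join(sorted(m1)) + "," + "".join(sorted(m2)) for m1, m2 in links])
```

For the items a and b this yields 1, 4, 4 and 1 links at sizes 0, 2, 3 and 4, the ten elements drawn in each panel. A level-wise search such as Apriori climbs these levels from the bottom and stops where no candidate survives.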

Conceptual View

- $\beta$: link support threshold.
- $FL_V^{max}$: the set of all maximal frequent conceptual links on $G$.
- $FL_V^{max}$ is the conceptual view of the social network $G$.

[Figure: a support threshold β is applied to the social network to obtain the frequent conceptual links and then the conceptual view; the links shown carry supports of 31%, 22% and 13%.]
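Reading the figure, the conceptual view can be treated as a small summary graph whose nodes are the itemsets appearing in $FL_V^{max}$ and whose edges are the maximal links, weighted by their support. The sketch below builds and prints such a graph; the representation and function names are assumptions, and the input is a hand-written toy set of maximal links with toy support values.

```python
from typing import Dict, FrozenSet, Set, Tuple

Itemset = FrozenSet[str]
Link = Tuple[Itemset, Itemset]
View = Tuple[Set[Itemset], Dict[Link, float]]


def conceptual_view(maximal: Dict[Link, float]) -> View:
    """Summary graph: one node per itemset, one weighted edge per maximal link."""
    nodes = {m for link in maximal for m in link}
    return nodes, dict(maximal)


def render(view: View) -> str:
    """One line per edge, strongest support first."""
    _, edges = view
    lines = []
    for (m1, m2), s in sorted(edges.items(), key=lambda kv: -kv[1]):
        lines.append(f"{' & '.join(sorted(m1))} -> {' & '.join(sorted(m2))} ({s:.0%})")
    return "\n".join(lines)


maximal = {
    (frozenset({"gender=M", "job=1"}), frozenset({"gender=F"})): 0.5,
    (frozenset({"gender=M"}), frozenset({"gender=F", "job=0"})): 0.5,
    (frozenset({"job=0"}), frozenset({"job=0"})): 0.5,
}
view = conceptual_view(maximal)
print(len(view[0]), "nodes")  # 5 nodes
print(render(view))
```

A link whose two sides are the same itemset, such as ({job=0}, {job=0}), becomes a self-loop in this summary graph.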

Outline (part 3 of 4): Testbed, Extracted patterns

Testbed

Testbed: a sub-network of the proximity contact network of the City of Portland, simulated with Episim [Eubank, 2005].

Node attributes:
- age class, i.e., age / 10;
- gender (1 = male, 2 = female);
- worker status;
- type of relationship with the householder;
- contact class, derived from the node degree (sociability).

Network characteristics:
- General: origin Portland; undirected; 3000 nodes; 4683 links; density 0.00110413; 1 connected component.
- Degree: average 3.087; maximum 15.
- Clustering coefficient (cc): average 0.63627.

[Figure: degree distribution for degrees 1 to 15, y-axis from 0 to 0.3.]

Extracted Patterns

Some examples of extracted patterns:

β = 0.1
Maximal cfl                       Support
((4; ;1;,, ),( ; ;2;,, ))         0.107
((2; ; ;2,, ),( ; ;2;2,, ))       0.105
(( ;1;1;,, ),( ; ;1;,, ))         0.113
For example, 10.7% of the links in the network connect 40-year-old people who have a job to people who do not have a job.

β = 0.2
Maximal cfl                       Support
(( ;2; ;,, ),( ; ;1;,, ))         0.231
(( ;1; ;,, ),( ; ;2;,, ))         0.288
(( ;2; ;,, ),( ;1; ;,, ))         0.297
For example, 23.1% of the links in the network connect men to people who have a job.

Conceptual View

[Figure: summarization of the network, shown as its conceptual view.]

Results

Network measures versus support threshold: number of nodes and links (c), density and clustering coefficient (d), and degree distribution (e).

[Figure (c): number of nodes and number of links versus support threshold, from 0.11 to 0.2. Figure (d): clustering coefficient and density versus support threshold. Figure (e): degree distribution P(k) for k from 1 to 12.]


Conclusion

- A new approach for extracting frequent patterns from social data
- Combines information from both attribute values and links
- Two benefits:
  - extracts novel patterns: the most connected groups of nodes;
  - provides a kind of summarized representation of the network.

Perspectives:
- Optimization
- Scalability

Thanks for your attention!