Graph Mining and Social Network Analysis



Similar documents
9.1. Graph Mining, Social Network Analysis, and Multirelational Data Mining. Graph Mining

Graph Mining and Social Network Analysis. Data Mining; EECS 4412 Darren Rolfe + Vince Chu

Practical Graph Mining with R. 5. Link Analysis

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

SCAN: A Structural Clustering Algorithm for Networks

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.

Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

Graph Theory and Complex Networks: An Introduction. Chapter 06: Network analysis

IJCSES Vol.7 No.4 October 2013 pp Serials Publications BEHAVIOR PERDITION VIA MINING SOCIAL DIMENSIONS

Protein Protein Interaction Networks

International Journal of World Research, Vol: I Issue XIII, December 2008, Print ISSN: X DATA MINING TECHNIQUES AND STOCK MARKET

Search and Data Mining: Techniques. Applications Anya Yarygina Boris Novikov

Data Exploration and Preprocessing. Data Mining and Text Mining (UIC Politecnico di Milano)

Distance Degree Sequences for Network Analysis

Graphs over Time Densification Laws, Shrinking Diameters and Possible Explanations

Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network

Complex Networks Analysis: Clustering Methods

Mining Large Datasets: Case of Mining Graph Data in the Cloud

A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery

A Statistical Text Mining Method for Patent Analysis

How To Solve The Kd Cup 2010 Challenge

Introduction to Graph Mining

PSG College of Technology, Coimbatore Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS.

Social Media Mining. Data Mining Essentials

Graph Theory and Complex Networks: An Introduction. Chapter 06: Network analysis. Contents. Introduction. Maarten van Steen. Version: April 28, 2014

Building Data Cubes and Mining Them. Jelena Jovanovic

Graph Processing and Social Networks

Information Management course

Scientific Collaboration Networks in China s System Engineering Subject

Subgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro

Social Media Mining. Graph Essentials

An In-depth Comparison of Subgraph Isomorphism Algorithms in Graph Databases

KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS

DATA MINING TECHNIQUES AND APPLICATIONS

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

K-Means Cluster Analysis. Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. ~ Spring~r

Data Mining Solutions for the Business Environment

Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland

Data Mining Analytics for Business Intelligence and Decision Support

Categorical Data Visualization and Clustering Using Subjective Factors

Network Algorithms for Homeland Security

Comparison of K-means and Backpropagation Data Mining Algorithms

ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING)

Investigating Clinical Care Pathways Correlated with Outcomes

Part 2: Community Detection

Graph Theory and Complex Networks: An Introduction. Chapter 08: Computer networks

Mining Social-Network Graphs

How To Cluster Of Complex Systems

1 o Semestre 2007/2008

Network/Graph Theory. What is a Network? What is network theory? Graph-based representations. Friendship Network. What makes a problem graph-like?

Asking Hard Graph Questions. Paul Burkhardt. February 3, 2014

Classification and Prediction

A SOCIAL NETWORK ANALYSIS APPROACH TO ANALYZE ROAD NETWORKS INTRODUCTION

Extracting Information from Social Networks

Data Mining on Social Networks. Dionysios Sotiropoulos Ph.D.

Binary Coded Web Access Pattern Tree in Education Domain

Association Rule Mining

Collective Behavior Prediction in Social Media. Lei Tang Data Mining & Machine Learning Group Arizona State University

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

Evaluation of Different Task Scheduling Policies in Multi-Core Systems with Reconfigurable Hardware

Mining Social Network Graphs

Web Usage Mining: Identification of Trends Followed by the user through Neural Network

Static Data Mining Algorithm with Progressive Approach for Mining Knowledge

Principles of Dat Da a t Mining Pham Tho Hoan hoanpt@hnue.edu.v hoanpt@hnue.edu. n

An Introduction to the Use of Bayesian Network to Analyze Gene Expression Data

Distributed forests for MapReduce-based machine learning

KNIME TUTORIAL. Anna Monreale KDD-Lab, University of Pisa

BIRCH: An Efficient Data Clustering Method For Very Large Databases

Comparison of Data Mining Techniques used for Financial Data Analysis

EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH

3. The Junction Tree Algorithms

An Overview of Knowledge Discovery Database and Data mining Techniques

Customer Classification And Prediction Based On Data Mining Technique

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

Predicting borrowers chance of defaulting on credit loans

Data Mining Applications in Manufacturing

not possible or was possible at a high cost for collecting the data.

Social Networks and Social Media

QUALITY OF SERVICE METRICS FOR DATA TRANSMISSION IN MESH TOPOLOGIES

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

Parallel Algorithms for Small-world Network. David A. Bader and Kamesh Madduri

MapReduce Approach to Collective Classification for Networks

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

D A T A M I N I N G C L A S S I F I C A T I O N

Knowledge Discovery from Data Bases Proposal for a MAP-I UC

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM

Data Mining Practical Machine Learning Tools and Techniques

PREDICTIVE DATA MINING ON WEB-BASED E-COMMERCE STORE

City University of Hong Kong. Information on a Course offered by the Department of Management Sciences with effect from Semester A in 2012 / 2013

Transcription:

Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano)

References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann Series in Data Management Systems (Second Edition) Chapter 9

Graph Mining

Graph Mining Overview Graphs are becoming increasingly important to model many phenomena in a large class of domains (e.g., bioinformatics, computer vision, social analysis) To deal with these needs, many data mining approaches have been extended also to graphs and trees Major approaches Mining frequent subgraphs Indexing Similarity search Classification Clustering

Mining frequent subgraphs Given a labeled graph data set D = {G 1,G 2,,G n } We define support(g) as the percentage of graphs in D where g is a subgraph A frequent subgraph in D is a subgraph with a support greater than min_sup How to find frequent subgraph? Apriori-based approach Pattern-growth approach

AprioriGraph Apply a level-wise iterative algorithm 1. Search for similar size-k frequent subgraphs 2. Merge two similar subgraphs in a size-(k+1) subgraph 3. Check if the new subgraph is frequent 4. Restart from 2. until all similar subgraphs have been considered. Otherwise restart from 1. and move to k+1. What is subgraph size? Number of vertex Number of edges Number of edge-disjoint paths Two subgraphs of size-k are similar if they have the same size-(k-1) subgraph AprioriGraph has a big computational cost (due to the merging step)

PatternGrowthGraph Incrementally extend frequent subgraphs 1. Add to S each frequent subgraphs g E obtained by extending subgraph g 2. Until S is not empty, select a new subgraph g in S to extend and start from 1. How to extend a subgraph? Add a vertex Add an edge The same graph can be discovered many times! Get rid of duplicates once discovered Reduce the generation of duplicates

Mining closed, unlabeled, and constrained subgraphs Closed subgraphs G is closed iff there is no proper supergraph G with the same support of G Reduce the growth of subgraphs discovered Is a more compact representation of knowledge Unlabeled (or partially labeled) graphs Introduce a special label Φ Φ can match any label or not Constrained subgraphs Containment constraint (edges, vertex, subgraphs) Geometric constraint Value constraint

Graph Indexing Indexing is basilar for effective search and query processing How to index graphs? Path-based approach takes the path as indexing unit All the path up to maxl length are indexed Does not scale very well gindex approach takes frequent and discriminative subgraphs as indexing unit A subgraph is frequent if it has a support greater than a threshold A subgraph is discriminative if its support cannot be well approximated by the intersection of the graph sets that contain one of its subgraphs

Graph Classification and Clustering Mining of frequent subgraphs can be effectively used for classification and clustering purposes Classification Frequent and discriminative subgraphs are used as features to perform the classification task A subgraph is discriminative if it is frequent only in one class of graphs and infrequent in the others The threshold on frequency and discriminativeness should be tuned to obtain the desired classification results Clustering The mined frequent subgraphs are used to define similarity between graphs Two graphs that share a large set of patterns should be considered similar and grouped in the same cluster The threshold on frequency can be tuned to find the desired number of clusters As the mining step affects heavily the final outcome, this is an intertwined process rather tan a two-steps process

Social Network Analysis

Social Network A social network is an heterogeneous and multirelational dataset represented by a graph Vertexes represent the objects (entities) Edges represent the links (relationships or interaction) Both objects and links may have attributes Social networks are usually very large Social network can be used to represents many real-world phenomena (not necessarily social) Electrical power grids Phone calls Spread of computer virus WWW

Small World Networks (1) Are social networks random graphs? NO! Internet Map Science Coauthorship High degree of local clustering Protein Network Few degrees of separation

Small World Networks (2) Sarah Peter Jane Ralph Society: Six degrees S. Milgram 1967 F. Karinthy 1929 WWW: 19 degrees Albert et al. 1999

Small World Networks (3) Definitions Node s degree us the number of incident edges Network effective diameter is the max distance within 90% of the network Properties Densification power law Shrinking diameter Heavy-tailed degrees distribution n: number of nodes e: number of eges 1<α<2 Node degrees Preferential attachment model Node rank

Mining social networks (1) Several Link mining tasks can be identified in the analysis of social networks Link based object classification Classification of objects on the basis of its attributes, its links and attributes of objects linked to it E.g., predict topic of a paper on the basis of Keywords occurrence Citations and cocitations Link type prediction Prediction of link type on the basis of objects attributes E.g., predict if a link between two Web pages is an advertising link or not Predicting link existence Predict the presence of a link between two objects

Mining social networks (2) Link cardinality estimation Prediction of the number of links to an object Prediction of the number of objects reachable from a specific object Object reconciliation Discover if two objects are the same on the basis of their attributes and links E.g., predict if two websites are mirrors of each other Group detection Clustering of objects on the basis both of their attributes and their links Subgraph detection Discover characteristic subgraphs within network

Challenges Feature construction Not only the objects attributes need to be considered but also attributes of linked objects Feature selection and aggregation techniques must be applied to reduce the size of search space Collective classification and consolidation Unlabeled data cannot be classified independently New objects can be correlated and need to be considered collectively to consolidate the current model Link prediction The prior probability of link between two objects may be very low Community mining from multirelational networks Many approaches assume an homogenous relationship while social networks usually represent different communities and functionalities

Applications Link Prediction Viral Marketing Community Mining

Link prediction What edges will be added to the network? Given a snapshot of a network at time t, link prediction aims to predict the edges that will be added before a given future time t Link prediction is generally solved assigning to each pair of nodes a weight score(x,y) The higher the score the more likely that link will be added in the near future The score(x,y) can be computed in several way Shortest path: the shortest he path between X and Y the highest is their score Common neighbors: the greater the number of neighbors X and Y have in common, the highest is their score Ensemble of all paths: weighted sum of paths that connects X and Y (shorter paths have usually larger weights)

Viral Marketing Several marketing approaches Mass marketing is targeted on specific segment of customers Direct marketing is target on specific customers solely on the basis of their characteristics Viral marketing tries to exploit the social connections to maximize the output of marketing actions Each customer has a specific network value based on The number of connections Its role in the network (e.g., opinion leader, listener) Role of its connections Viral marketing aims to exploit the network value of customers to predict their influence and to maximize the outcome of marketing actions

Viral Marketing: Random Spreading 500 randomly chosen customers are given a product (from 5000).

Viral Marketing: Directed Spreading The 500 most connected consumers are given a product.

Community Mining In social networks there are usually several kinds of relationships between objects A social network usually contains several relation networks that plays an important role to identify different communities The relation that identify a community can be an hidden relation Relation extraction and selection techniques are generally used to discover communities in social networks Example: