Complex Networks Analysis: Clustering Methods




Complex Networks Analysis: Clustering Methods
Nikolai Nefedov, Spring 2013
ISI, ETH Zurich, nefedov@isi.ee.ethz.ch

Outline
Purpose: to give an overview of modern graph-clustering methods and their applications to the analysis of complex dynamic networks.
Planned topics:
- short introduction to complex networks
- discrete vector calculus, graph Laplacian, graph spectral analysis
- methods of community detection based on modularity maximization
- random walk on graphs, Laplacian dynamics, stability of community detection
- multi-layer graphs: clustering and regularization
- topology detection via system dynamics
- dynamic network analysis and missing links prediction
- applications for real-world datasets (multi-dimensional time series and network analysis)

Complex Systems
Complex vs. complicated
Complex systems (no unique definition):
- a (large) number of interacting elements
- stochastic interactions
- no centralized authority, self-organized
Emerging properties:
- system behavior arises from the interaction structure: a detailed understanding of elements in isolation is not enough
- this holds even if the elements follow simple rules (chaotic behavior)
- evolving structures, system adaptation
- hierarchies, heavy tails, ...
Complex Systems => Statistical physics:
- large-scale regularities
- microscopic origins of macroscopic behavior
- multiple (hierarchical) scales

Complex Systems => Complex Networks
Statistical physics approach:
- a fixed level of abstraction: vertices => interacting elements, edges => interactions
- (statistical) analysis of network structure
- dynamical processes taking place on a network
- dynamics of a network
Graph theory approach (mostly static graphs):
- simple graphs => cuts, structure, factorization, spanning trees, ...
- multigraphs => multiple edges and self-loops
- hypergraphs => a hyper-edge is a set of vertices
- multi-layer graphs => a set of graphs on the same vertices => tensors
- multiplexing graphs

Graph Theory
Origin: Leonhard Euler (1736), the Königsberg bridges problem
L. Euler, Solutio problematis ad geometriam situs pertinentis, Comment. Academiae Sci. I. Petropolitanae 8, 128-140 (1736)
(Euler's theorem: when a graph can be drawn with a single line)


Complex Networks
Statistical physics approach to network analysis:
- statistical analysis (random networks, small-world, scale-free networks)
- network structure analysis
  - clustering, network partition
  - classification (taxonomy => hierarchical classification)
  - clustering => unsupervised classification (problem dependent); relates data to knowledge (basic human activity)
- dynamical processes taking place on a network
  - random walk, opinion (voting) dynamics, synchronization, game strategies, ...
  - convergence, stability, ...
  - distributed computations/control
- dynamics of a network
  - evolving networks
  - interplay between network topology and dynamics on a network
  - adaptive/learning networks


Outline
Purpose: to give an overview of modern graph-clustering methods and their applications to the analysis of complex dynamic networks.
Planned topics:
- short introduction to complex networks
  - complex networks, definitions, basics
  - graph partition: min-cut, normalized-cut, min-ratio-cut
  - brief overview of vector calculus: differential operators (gradient, divergence, Laplace operator)
  - graph Laplacian as a discrete version of the Laplace-Beltrami operator
  - spectral analysis based on the graph Laplacian
  - limits of spectral analysis

Basics: Network Structure
- Network or graph $G = (V, E)$: a set of vertices joined by edges
  - $V = \{v_i\}$, $i = 1, \dots, N$: set of vertices
  - $E = \{e(i,j)\}$: set of links/edges, i.e. (ordered) pairs of elements from $V$; $\max |E| = N(N-1)/2$
- $v_i$ is a neighbor of $v_j$ if there is an edge $e(i,j)$ in $E$
- The number of neighbors $k$ of a vertex $v_i$ is called its degree; in directed networks: in- and out-degrees $k_{in}$, $k_{out}$
- Edge density of the graph: $\rho = |E| / (N(N-1)/2)$; $\rho = 1$ => fully connected, $\rho \ll 1$ => sparse graph
- Cycle/loop: closed path (distinct vertices/edges)
- Graph types: regular, tree, forest
- Bipartite network: two types of nodes, links only between nodes of different types
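As a small illustration of the definitions above, the following sketch computes vertex degrees and the edge density of a simple undirected graph. The function names and the edge-list representation (vertices numbered 0..N-1) are assumptions made for this example, not part of the lecture.

```python
def unique_edges(edges):
    """Drop self-loops and treat (i, j) and (j, i) as the same undirected edge."""
    return {frozenset(e) for e in edges if e[0] != e[1]}


def degrees(num_vertices, edges):
    """Return the degree k of every vertex 0..N-1."""
    k = [0] * num_vertices
    for i, j in unique_edges(edges):
        k[i] += 1
        k[j] += 1
    return k


def edge_density(num_vertices, edges):
    """Return rho = |E| / (N (N - 1) / 2)."""
    max_edges = num_vertices * (num_vertices - 1) / 2
    return len(unique_edges(edges)) / max_edges


# Path graph 0 - 1 - 2 - 3: three of the six possible edges are present.
path = [(0, 1), (1, 2), (2, 3)]
print(degrees(4, path))       # [1, 2, 2, 1]
print(edge_density(4, path))  # 0.5
```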

Basics: Network Structure
- Shortest path between $i$ and $j$: a path with the minimum number of edges
- Distance $d(i,j)$: measure associated with the shortest path between $i$ and $j$
- Average shortest distance: $l = \frac{2}{N(N-1)} \sum_{i<j} d(i,j)$
- Diameter of the graph: $d = \max_{i,j} d(i,j)$
- Connected graph: there is a path between any pair of nodes
- Minimal connected graph => no loops => tree, $|E| = N - 1$ edges
- Forest => collection of trees
- Fully connected (complete) graph: $d(i,j) = 1$ for all $i \ne j$, $|E| = N(N-1)/2$
- Adjacency matrix: $A(i,j) = 1$ if $e(i,j) \in E$, 0 otherwise
- Clique: a fully connected subgraph; k-clique: clique with k vertices
- Motifs: subgraphs that occur often in a network (with respect to a null model)
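The average shortest distance and the diameter can be computed with breadth-first search from every vertex. The sketch below assumes an unweighted, connected, undirected graph stored as a dictionary mapping each vertex to the set of its neighbors; this representation and the function names are choices made for illustration.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def path_statistics(adj):
    """Return (average shortest distance l, diameter d) of a connected graph."""
    n = len(adj)
    total, diameter = 0, 0
    for s in adj:
        dist = bfs_distances(adj, s)
        # Summing over all sources counts every unordered pair twice,
        # which supplies the factor 2 in l = 2 * sum_{i<j} d(i,j) / (N (N - 1)).
        total += sum(dist.values())
        diameter = max(diameter, max(dist.values()))
    return total / (n * (n - 1)), diameter


# Path graph 0 - 1 - 2 - 3
path_adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
avg_l, diam = path_statistics(path_adj)
print(avg_l, diam)
```

For the path graph the pairwise distances sum to 10, so l = 2 * 10 / 12 and the diameter is 3.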

Basics: Network Structure
Centrality measures:
- Node degree = number of neighbors
- Closeness centrality: $c_i = 1 / \sum_{j \ne i} d(i,j)$; measures how far (on average) a vertex is from all other vertices
- Betweenness centrality = number of shortest paths going through a vertex/edge; measures the amount of flow through a vertex/edge; computationally demanding: $b_i = \sum_{l,m} d_i(l,m) / d(l,m)$, where $d(l,m)$ is the number of shortest paths between $l$ and $m$ and $d_i(l,m)$ is the number of those going through node $i$
Clustering coefficient of a node: $C_i = \frac{1}{k_i (k_i - 1)} \sum_{j,k} e_{ij} e_{jk} e_{ki} = \frac{2 E_i}{k_i (k_i - 1)}$, where $E_i$ is the number of edges among the neighbors of $i$
Average clustering coefficient of a graph: $C(G) = \frac{1}{N} \sum_i C_i$; in terms of subgraph counts, clustering is the ratio of triangles to connected triples
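A minimal sketch of the local and average clustering coefficient, $C_i = 2E_i / (k_i(k_i-1))$, for a graph stored as a dictionary of neighbor sets. Assigning $C_i = 0$ to vertices with fewer than two neighbors is a convention assumed here; the slides do not specify it.

```python
def local_clustering(adj, i):
    """C_i = 2 E_i / (k_i (k_i - 1)), with E_i the number of edges among neighbors of i."""
    neighbors = list(adj[i])
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention for vertices with fewer than two neighbors
    links = sum(
        1
        for a in range(k)
        for b in range(a + 1, k)
        if neighbors[b] in adj[neighbors[a]]
    )
    return 2 * links / (k * (k - 1))


def average_clustering(adj):
    """C(G) = (1 / N) * sum_i C_i."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)


# Triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print([local_clustering(g, i) for i in sorted(g)])
print(average_clustering(g))
```

Here vertices 0 and 1 have $C_i = 1$, vertex 2 has $C_i = 1/3$ (one edge among three neighbors), and vertex 3 has $C_i = 0$, giving an average of 7/12.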

Network: Statistical characterization
- Degree distribution $p(k)$: probability that a randomly chosen vertex has degree $k$
- $P(k' | k)$: conditional probability that a vertex of degree $k$ is connected to a vertex of degree $k'$
- Average degree: $\langle k \rangle = 2|E|/N$; sparse graphs: $\langle k \rangle \ll N$
- Average degree of nearest neighbors of node $i$: $k_{nn,i} = \frac{1}{k_i} \sum_j A(i,j)\, k_j$
- Average degree fluctuations: $\langle k^2 \rangle$
- Clustering spectrum (of vertices which have the same degree)
- Topological heterogeneity: homogeneous networks have light tails; heterogeneous networks have skewed, heavy tails
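These statistics are straightforward to estimate from a single graph. The sketch below assumes a dictionary-of-neighbor-sets representation and uses the usual definition of the average nearest-neighbor degree (the mean degree of the neighbors of node i); the function names are illustrative.

```python
from collections import Counter


def degree_distribution(adj):
    """Empirical p(k): fraction of vertices with degree k."""
    n = len(adj)
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {k: c / n for k, c in sorted(counts.items())}


def degree_moments(adj):
    """Return (<k>, <k^2>)."""
    ks = [len(nbrs) for nbrs in adj.values()]
    n = len(ks)
    return sum(ks) / n, sum(k * k for k in ks) / n


def nearest_neighbor_degree(adj, i):
    """Mean degree of the neighbors of i (i must have at least one neighbor)."""
    return sum(len(adj[j]) for j in adj[i]) / len(adj[i])


# Triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(degree_distribution(g))         # {1: 0.25, 2: 0.5, 3: 0.25}
print(degree_moments(g))              # (2.0, 4.5)
print(nearest_neighbor_degree(g, 3))  # 3.0
```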

Stochastic Networks
Stochastic network: not a single graph, but a statistical ensemble
Erdős-Rényi (random) networks $G(N, p)$:
- connect N vertices randomly; each pair is connected with probability p
- ensemble of possible realizations: network properties => averages over the ensemble
- average number of edges: $\langle E \rangle = pN(N-1)/2$
- average degree: $\langle k \rangle = 2\langle E \rangle / N = p(N-1) \approx pN$
Clustering coefficient: $C(G)$ = triangles / connected triples; for E-R networks $C_{ER} = p \approx \langle k \rangle / N$
- practically there is no clustering
- large random networks are tree-like networks
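One realization of the $G(N, p)$ ensemble can be drawn by flipping a biased coin for every vertex pair. This is a minimal sketch with an assumed function name and an optional seed for reproducibility; it loops over all N(N-1)/2 pairs, so it is only suitable for small N.

```python
import random
from itertools import combinations


def erdos_renyi(n, p, seed=None):
    """Draw one graph from G(N, p) as a dict of neighbor sets."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if rng.random() < p:  # each pair is connected independently with probability p
            adj[i].add(j)
            adj[j].add(i)
    return adj


n, p = 200, 0.05
g = erdos_renyi(n, p, seed=0)
num_edges = sum(len(nbrs) for nbrs in g.values()) // 2
print("edges in this realization:", num_edges)
print("ensemble average p N (N - 1) / 2:", p * n * (n - 1) / 2)
```

A single realization fluctuates around the ensemble average; averaging over many seeds would approach $pN(N-1)/2$.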

Erdős-Rényi Networks
Example: N = 3, p = 1/3
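For such a small case the whole ensemble can be enumerated exactly. The sketch below lists all $2^3 = 8$ graphs on three vertices together with their probabilities $p^m (1-p)^{3-m}$, where m is the number of edges; the helper name `ensemble` is an assumption for this example.

```python
from fractions import Fraction
from itertools import combinations, product


def ensemble(n, p):
    """Yield (edge list, probability) for every graph in G(n, p)."""
    pairs = list(combinations(range(n), 2))
    for present in product([0, 1], repeat=len(pairs)):
        m = sum(present)
        prob = p**m * (1 - p) ** (len(pairs) - m)
        edges = [e for e, on in zip(pairs, present) if on]
        yield edges, prob


p = Fraction(1, 3)
graphs = list(ensemble(3, p))
for edges, prob in graphs:
    print(edges, prob)

# The probabilities sum to 1 and the expected number of edges is p N (N - 1) / 2 = 1.
print(sum(prob for _, prob in graphs))
print(sum(len(edges) * prob for edges, prob in graphs))
```

The empty graph has probability 8/27 and the triangle 1/27, so even in this tiny ensemble a realization with no edges is eight times more likely than the complete graph.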

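Erdős-Rényi Networks
- Probability $p_i(k)$ that vertex $i$ has degree $k$ (connected to $k$ vertices, not connected to the other $N - k - 1$): $p_i(k) = \binom{N-1}{k} p^k (1-p)^{N-1-k}$
- Degree distribution for the whole network: $P(k) = \frac{1}{N} \sum_{i=1}^{N} p_i(k)$
- For E-R networks the average degree is $\langle k \rangle = 2\langle E \rangle / N = p(N-1) \approx pN$
- $N \to \infty$ such that $\langle k \rangle = const$ => Poisson distribution: $P(k) = e^{-\langle k \rangle} \frac{\langle k \rangle^k}{k!} = \exp(-pN) \frac{(pN)^k}{k!}$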
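The Poisson limit can be checked numerically by comparing the exact binomial degree distribution with its Poisson approximation for large N at fixed average degree. The function names and the values N = 10000, <k> = 4 are assumptions chosen for illustration.

```python
from math import comb, exp, factorial


def binomial_degree_pmf(n, p, k):
    """Exact E-R degree distribution: C(N-1, k) p^k (1-p)^(N-1-k)."""
    return comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)


def poisson_degree_pmf(mean_k, k):
    """Poisson approximation: exp(-<k>) <k>^k / k!."""
    return exp(-mean_k) * mean_k**k / factorial(k)


n, mean_k = 10_000, 4.0
p = mean_k / n
for k in range(11):
    exact = binomial_degree_pmf(n, p, k)
    approx = poisson_degree_pmf(mean_k, k)
    print(f"k={k:2d}  binomial={exact:.5f}  poisson={approx:.5f}")
```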

Erdős-Rényi Networks
Connected component sizes, $N \to \infty$ such that $\langle k \rangle = const$
[Figure: mean component size of the small subgraphs and relative giant component size versus average degree]
- $\langle k \rangle < 1$: many small subgraphs
- $\langle k \rangle > 1$: giant component + small subgraphs
- $\langle k \rangle = 1$: phase transition (percolation)
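The percolation transition at $\langle k \rangle = 1$ can be observed in a small simulation. The sketch below draws an E-R graph edge by edge and tracks connected components with a union-find structure; the function name, N = 1000, and the sampled average degrees are assumptions for illustration.

```python
import random
from itertools import combinations


def largest_component_fraction(n, mean_k, seed=None):
    """Relative size of the largest connected component of one G(N, p) sample."""
    rng = random.Random(seed)
    p = mean_k / (n - 1)
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if rng.random() < p:
            parent[find(i)] = find(j)  # merge the two components

    sizes = {}
    for v in range(n):
        root = find(v)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n


for mean_k in (0.5, 1.0, 1.5, 3.0):
    print(mean_k, largest_component_fraction(1000, mean_k, seed=0))
```

Below the transition the largest component stays a vanishing fraction of the network, while well above it most vertices belong to a single giant component.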

Erdős-Rényi Networks
- Degree distribution: Poisson (degrees of all nodes are close to the average)
- No correlations: all edges exist independently of each other
- Path lengths grow logarithmically with system size, <l> ~ ln(N)
- Connectivity depends on the average degree <k>: small <k> => several disjoint components; high <k> => giant connected component
- There is a percolation phase transition (from a fragmented to a connected network)
- Very homogeneous networks

Real-World Networks

Network type                  Shortest path   Clustering
Random networks               Short           Low
Real networks                 Short           High
Regular-topology networks*    Long            High

* [Watts & Strogatz 1998]

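Random vs Real-World Networks
Degree distributions:
- Random networks: Poisson distribution, $P(k) = e^{-\langle k \rangle} \langle k \rangle^k / k!$
- Real-world networks: heavy-tail distributions (often a power law, a straight line on logarithmic axes) [Barabási & Albert, 1999]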

Network Models: Small-World
D.J. Watts and S. Strogatz, "Collective dynamics of 'small-world' networks", Nature 393, 440-442, 1998
WS model:
- take a regular clustered network
- rewire the endpoint of each link to a random node with probability p
SWN => a simple model for interpolating between regular and random networks
- randomness is controlled by a single tuning parameter p
- regime of interest: $N \gg k \gg \ln(N) \gg 1$
- clustering coefficient of the WS model, k > 2 [Barrat & Weigt, 2000]: independent of system size
[Figure: degree distribution]
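The two steps of the WS construction can be sketched as follows. This is an illustrative variant, not necessarily the exact procedure of the original paper: it starts from a ring in which every node is linked to its k nearest neighbors (k even, N > k + 1) and redirects the far endpoint of each ring edge with probability p, avoiding self-loops and duplicate edges. The function name is an assumption.

```python
import random


def watts_strogatz(n, k, p, seed=None):
    """Ring lattice with k neighbors per node, each edge rewired with probability p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    # Step 1: regular clustered network (ring lattice).
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    # Step 2: rewire the far endpoint of each lattice edge with probability p.
    for step in range(1, k // 2 + 1):
        for i in range(n):
            j = (i + step) % n
            if j in adj[i] and rng.random() < p:
                candidates = [v for v in range(n) if v != i and v not in adj[i]]
                if candidates:
                    new = rng.choice(candidates)
                    adj[i].discard(j)
                    adj[j].discard(i)
                    adj[i].add(new)
                    adj[new].add(i)
    return adj


regular = watts_strogatz(20, 4, 0.0, seed=1)   # p = 0: the regular ring lattice
small_world = watts_strogatz(20, 4, 0.1, seed=1)
print(sorted(len(nbrs) for nbrs in regular.values()))
print(sorted(len(nbrs) for nbrs in small_world.values()))
```

Rewiring keeps the total number of edges fixed at Nk/2, so only the placement of links changes as p grows.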

Network Models: Small-World Networks
[Figure: clustering and path length versus rewiring probability p, between the regular network and the random network; N = 1000, k = 10, averaged over 20 realizations at each p] [Watts & Strogatz]
Small-world network: short paths, high clustering

Network Models: Small-World Networks
[Figure: epidemic size (number of infected) versus density of shortcuts] [Watts & Strogatz]
- Network structure strongly affects processes taking place on networks
- Dynamics of synchronization and virus spreading: a small number of shortcuts greatly speeds up the process; 3% shortcuts => 50% epidemic

Network Models: Scale-Free Networks
A.-L. Barabási & R. Albert, Emergence of Scaling in Random Networks, Science 286, 509 (1999)
[Figure: degree distributions; power-law distribution on logarithmic axes]

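Power Law Distributions
- Power-law tails, $k > k_{min}$: $P(k) = (\gamma - 1)\, k_{min}^{\gamma - 1}\, k^{-\gamma}$
- Moments: $\langle k^n \rangle = \int_{k_{min}}^{\infty} k^n P(k)\, dk = k_{min}^n \frac{\gamma - 1}{\gamma - 1 - n}$ for $\gamma > n + 1$
- Fluctuations: diverging degree fluctuations $\langle k^2 \rangle$ for $\gamma < 3$; $k_c$ = cut-off due to finite size
- Level of heterogeneity: for $1 < \gamma < 2$ both $\langle k \rangle$ and $\langle k^2 \rangle$ diverge; for $2 < \gamma < 3$ only $\langle k^2 \rangle$ diverges
- Scale invariance: $F(\alpha k) = \alpha^D F(k)$ <=> a shift on a log scale
- Power law: $P(k) = A k^{-\gamma}$, so $P(\alpha k) = A (\alpha k)^{-\gamma} = \alpha^{-\gamma} P(k)$
- For most real-world networks $2 < \gamma < 3$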
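The moment formula makes the divergence conditions easy to check. The sketch below evaluates $\langle k^n \rangle = k_{min}^n (\gamma - 1)/(\gamma - 1 - n)$, returning infinity when $\gamma \le n + 1$, and draws samples from the continuous power law by inverse-transform sampling. Treating k as a continuous variable and the function names are assumptions for this illustration.

```python
import random


def power_law_moment(n, gamma, k_min=1.0):
    """<k^n> for P(k) = (gamma - 1) k_min^(gamma - 1) k^(-gamma), k > k_min."""
    if gamma <= n + 1:
        return float("inf")  # the integral diverges
    return k_min**n * (gamma - 1) / (gamma - 1 - n)


def sample_power_law(size, gamma, k_min=1.0, seed=None):
    """Inverse-transform sampling: k = k_min * (1 - u)^(-1 / (gamma - 1))."""
    rng = random.Random(seed)
    return [k_min * (1.0 - rng.random()) ** (-1.0 / (gamma - 1)) for _ in range(size)]


for gamma in (1.5, 2.5, 3.5):
    print(gamma, power_law_moment(1, gamma), power_law_moment(2, gamma))

# In a finite sample the largest value acts as an effective cut-off.
samples = sample_power_law(10_000, gamma=2.5, seed=0)
print(max(samples))
```

For $\gamma = 2.5$ the mean is finite while the second moment is infinite, which is the regime reported for most real-world networks.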

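Power Law Distributions
Networks with power-law distributions => scale-free networks
[Figure: on logarithmic axes, a power law compared with the Poisson distribution $P(k) = \langle k \rangle^k \exp(-\langle k \rangle) / k!$]
Power law: no characteristic scale (node degree) in the distribution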

Barabási-Albert Model: Scale-Free Networks
Where do networks come from? Networks are not static => growing networks
B-A model of network growth, based on the principle of preferential attachment ("the rich get richer"):
1. Take a small seed network, e.g. a few connected nodes
2. Let a new node of degree m enter the network
3. Connect the new node to existing nodes such that the probability $\pi_i$ of connecting to node $i$ of degree $k_i$ is $\pi_i = k_i / \sum_j k_j$
This results in networks with a power-law degree distribution $P(k) = 2m^2 / k^3$ (average degree $\langle k \rangle = 2m$)
[Figure panels: degree distribution; average shortest path lengths; clustering coefficient]
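The three growth steps can be sketched in a few lines. The seed network (a complete graph on m + 1 nodes) and the function name are assumptions for this example. Preferential attachment is implemented by drawing from a list in which every node appears once per incident edge; because the m targets of a new node are forced to be distinct, the attachment probabilities are only approximately $k_i / \sum_j k_j$.

```python
import random


def barabasi_albert(n, m, seed=None):
    """Grow a network to n nodes, attaching each new node to m existing nodes."""
    rng = random.Random(seed)
    # Step 1: seed network (assumed here: m + 1 fully connected nodes).
    adj = {i: set(range(m + 1)) - {i} for i in range(m + 1)}
    # Node i appears k_i times in `stubs`, so a uniform draw picks i
    # with probability k_i / sum_j k_j.
    stubs = [i for i in adj for _ in adj[i]]
    for new in range(m + 1, n):
        # Steps 2 and 3: a new node of degree m chooses its targets preferentially.
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        adj[new] = set()
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            stubs.extend((new, t))
    return adj


g = barabasi_albert(2000, 3, seed=0)
ks = sorted((len(nbrs) for nbrs in g.values()), reverse=True)
print("largest degrees (hubs):", ks[:5])
print("average degree:", sum(ks) / len(ks))
```

The average degree approaches 2m for large n, while a few early nodes grow into hubs with much larger degree.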

Network Models
[Figure: example networks. Random, p = 0.02; small world, p = 0.1; scale free, <k> = 2]

Network Models: Summary
Erdős-Rényi model:
- short path lengths
- Poisson distribution (no hubs)
- no clustering
Watts-Strogatz small-world model:
- short path lengths
- high clustering (N independent)
- almost constant degrees
Barabási-Albert scale-free model:
- short path lengths
- power-law distribution for degrees
- robustness
- no clustering (may be fixed)
Real-world networks:
- short path lengths
- high clustering
- broad degree distributions, often power laws

Similarity Graphs
Graphs embedded in space:
- Euclidean distance (L2 norm)
- Manhattan distance (L1 norm)
- Cosine similarity
Graphs built from data:
- data points from Euclidean space, sampling of some underlying distribution, ...
- connectivity parameter: k (KNN), ε-neighborhood graph, ...
- similarity measure => fully connected (weighted) matrix
Graphs not embedded in space:
- Neighborhood measures
  - structural equivalence: nodes share the same neighbors => Jaccard coefficient
  - regular equivalence: neighbors of the nodes are similar
  - Pearson correlation coefficient
- Path-dependent measures
- Measures based on random walk
  - commute time: average number of steps for a random walker to hit a target and return
  - escape probability: probability to hit a target before coming back
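A few of these measures are sketched below: distances and cosine similarity for points embedded in space, and the Jaccard coefficient of two neighbor sets as a measure of structural equivalence. The function names and the convention of returning 0 for two isolated nodes are assumptions made for this example.

```python
from math import sqrt


def euclidean(x, y):
    """L2 distance between two points."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """L1 distance between two points."""
    return sum(abs(a - b) for a, b in zip(x, y))


def cosine_similarity(x, y):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y)))


def jaccard(adj, i, j):
    """Shared neighbors of i and j divided by the size of the union of their neighbor sets."""
    union = adj[i] | adj[j]
    return len(adj[i] & adj[j]) / len(union) if union else 0.0


print(euclidean((0, 0), (3, 4)))          # 5.0
print(manhattan((0, 0), (3, 4)))          # 7
print(cosine_similarity((1, 0), (0, 1)))  # 0.0

# Nodes 0 and 1 have identical neighbor sets, so they are structurally equivalent.
g = {0: {2, 3}, 1: {2, 3}, 2: {0, 1}, 3: {0, 1, 4}, 4: {3}}
print(jaccard(g, 0, 1))                   # 1.0
```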