Practical statistical network analysis (with R and igraph)



Similar documents
Social Media Mining. Graph Essentials

DATA ANALYSIS II. Matrix Algorithms

A comparative study of social network analysis tools

Part 2: Community Detection

SGL: Stata graph library for network analysis

Complex Networks Analysis: Clustering Methods

Statistical and computational challenges in networks and cybersecurity

Social Media Mining. Network Measures

Option 1: empirical network analysis. Task: find data, analyze data (and visualize it), then interpret.

Graph/Network Visualization

How To Understand The Network Of A Network

Network/Graph Theory. What is a Network? What is network theory? Graph-based representations. Friendship Network. What makes a problem graph-like?

Equivalence Concepts for Social Networks

Discrete Mathematics & Mathematical Reasoning Chapter 10: Graphs

Tools and Techniques for Social Network Analysis

Network Metrics, Planar Graphs, and Software Tools. Based on materials by Lala Adamic, UMichigan

CSE 326, Data Structures. Sample Final Exam. Problem Max Points Score 1 14 (2x7) 2 18 (3x6) Total 92.

2. (a) Explain the strassen s matrix multiplication. (b) Write deletion algorithm, of Binary search tree. [8+8]

USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS

Graph Theory and Complex Networks: An Introduction. Chapter 06: Network analysis

3. The Junction Tree Algorithms

Social and Economic Networks: Lecture 1, Networks?

V. Adamchik 1. Graph Theory. Victor Adamchik. Fall of 2005

Practical Graph Mining with R. 5. Link Analysis

IE 680 Special Topics in Production Systems: Networks, Routing and Logistics*

in R Binbin Lu, Martin Charlton National Centre for Geocomputation National University of Ireland Maynooth The R User Conference 2011

! E6893 Big Data Analytics Lecture 10:! Linked Big Data Graph Computing (II)

Linear Programming I

Home Page. Data Structures. Title Page. Page 1 of 24. Go Back. Full Screen. Close. Quit

Reductions & NP-completeness as part of Foundations of Computer Science undergraduate course

Graph Theory and Complex Networks: An Introduction. Chapter 06: Network analysis. Contents. Introduction. Maarten van Steen. Version: April 28, 2014

OPTIMAL DESIGN OF DISTRIBUTED SENSOR NETWORKS FOR FIELD RECONSTRUCTION

Data Structures. Chapter 8

SCAN: A Structural Clustering Algorithm for Networks

Graph theoretic approach to analyze amino acid network

Subgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro

Class One: Degree Sequences

Why? A central concept in Computer Science. Algorithms are ubiquitous.

HISTORICAL DEVELOPMENTS AND THEORETICAL APPROACHES IN SOCIOLOGY Vol. I - Social Network Analysis - Wouter de Nooy

Social and Technological Network Analysis. Lecture 3: Centrality Measures. Dr. Cecilia Mascolo (some material from Lada Adamic s lectures)

5. A full binary tree with n leaves contains [A] n nodes. [B] log n 2 nodes. [C] 2n 1 nodes. [D] n 2 nodes.

NETZCOPE - a tool to analyze and display complex R&D collaboration networks

Network (Tree) Topology Inference Based on Prüfer Sequence

Graph Processing and Social Networks

A comparative study of social network analysis tools

Network VisualizationS

Any two nodes which are connected by an edge in a graph are called adjacent node.

Answer: (a) Since we cannot repeat men on the committee, and the order we select them in does not matter, ( )

Open Source Software Developer and Project Networks

RA MODEL VISUALIZATION WITH MICROSOFT EXCEL 2013 AND GEPHI

Social Network Analysis: Visualization Tools

Asking Hard Graph Questions. Paul Burkhardt. February 3, 2014

Warshall s Algorithm: Transitive Closure

The Singular Value Decomposition in Symmetric (Löwdin) Orthogonalization and Data Compression

SECTIONS NOTES ON GRAPH THEORY NOTATION AND ITS USE IN THE STUDY OF SPARSE SYMMETRIC MATRICES

An Introduction to APGL

Graphical degree sequences and realizations

Distributed Computing over Communication Networks: Maximal Independent Set

A SOCIAL NETWORK ANALYSIS APPROACH TO ANALYZE ROAD NETWORKS INTRODUCTION

The Basics of Graphical Models

Link Prediction in Social Networks

Big Graph Processing: Some Background

Finding and counting given length cycles

Transportation Polytopes: a Twenty year Update

One last point: we started off this book by introducing another famously hard search problem:

Follow links for Class Use and other Permissions. For more information send to:

5.1 Bipartite Matching

Structural and functional analytics for community detection in large-scale complex networks

Graph Theory and Complex Networks: An Introduction. Chapter 08: Computer networks

Mining Social-Network Graphs

Complex Network Visualization based on Voronoi Diagram and Smoothed-particle Hydrodynamics

How To Cluster Of Complex Systems

Outline. NP-completeness. When is a problem easy? When is a problem hard? Today. Euler Circuits

Data Structures and Algorithms Written Examination

1. Write the number of the left-hand item next to the item on the right that corresponds to it.

Ramsey numbers for bipartite graphs with small bandwidth

ONLINE DEGREE-BOUNDED STEINER NETWORK DESIGN. Sina Dehghani Saeed Seddighin Ali Shafahi Fall 2015

Chapter 29 Scale-Free Network Topologies with Clustering Similar to Online Social Networks

Map-like Wikipedia Visualization. Pang Cheong Iao. Master of Science in Software Engineering

Hadoop SNS. renren.com. Saturday, December 3, 11

Online EFFECTIVE AS OF JANUARY 2013

USE OF EIGENVALUES AND EIGENVECTORS TO ANALYZE BIPARTIVITY OF NETWORK GRAPHS

General Network Analysis: Graph-theoretic. COMP572 Fall 2009

Graph Theory and Networks in Biology

International Journal of Software and Web Sciences (IJSWS)

What is SNA? A sociogram showing ties

Complex Network Analysis of Brain Connectivity: An Introduction LABREPORT 5

(67902) Topics in Theory and Complexity Nov 2, Lecture 7

Course on Social Network Analysis Graphs and Networks

Groups and Positions in Complete Networks

An approach of detecting structure emergence of regional complex network of entrepreneurs: simulation experiment of college student start-ups

Application of Graph Theory to

Data Mining: Algorithms and Applications Matrix Math Review

5. Binary objects labeling

Dynamic Network Analyzer Building a Framework for the Graph-theoretic Analysis of Dynamic Networks

Social Network Analysis: Lecture 4-Centrality Measures

A Review And Evaluations Of Shortest Path Algorithms

Transcription:

Practical statistical network analysis (with R and igraph) Gábor Csárdi csardi@rmki.kfki.hu Department of Biophysics, KFKI Research Institute for Nuclear and Particle Physics of the Hungarian Academy of Sciences, Budapest, Hungary Currently at Department of Medical Genetics, University of Lausanne, Lausanne, Switzerland

What is a network (or graph)? Practical statistical network analysis WU Wien 2

What is a network (or graph)? Practical statistical network analysis WU Wien 3

What is a network (or graph)? Practical statistical network analysis WU Wien 4

What is a graph? Binary relation (=edges) between elements of a set (=vertices).

What is a graph? Binary relation (=edges) between elements of a set (=vertices). E.g. vertices = {A, B, C, D, E} edges = ({A, B}, {A, C}, {B, C}, {C, E}).

What is a graph? Binary relation (=edges) between elements of a set (=vertices). E.g. vertices = {A, B, C, D, E} edges = ({A, B}, {A, C}, {B, C}, {C, E}). It is better to draw it: E D C A B

What is a graph? Binary relation (=edges) between elements of a set (=vertices). E.g. vertices = {A, B, C, D, E} edges = ({A, B}, {A, C}, {B, C}, {C, E}). It is better to draw it: E C A B D E C B A D Practical statistical network analysis WU Wien 5

Undirected and directed graphs If the pairs are unordered, then the graph is undirected: B vertices = {A, B, C, D, E} A C E edges = ({A, B}, {A, C}, {B, C}, {C, E}). D

Undirected and directed graphs If the pairs are unordered, then the graph is undirected: B vertices = {A, B, C, D, E} A C E edges = ({A, B}, {A, C}, {B, C}, {C, E}). D Otherwise it is directed: B vertices = {A, B, C, D, E} A C E edges = ((A, B), (A, C), (B, C), (C, E)). D Practical statistical network analysis WU Wien 6

The igraph package For classic graph theory and network science. Core functionality is implemented as a C library. High level interfaces from R and Python. GNU GPL. http://igraph.sf.net Practical statistical network analysis WU Wien 7

Vertex and edge ids Vertices are always numbered from zero (!). Numbering is continual, form to V.

Vertex and edge ids Vertices are always numbered from zero (!). Numbering is continual, form to V. We have to translate vertex names to ids: V = {A, B, C, D, E} E = ((A, B), (A, C), (B, C), (C, E)). A =, B =, C = 2, D = 3, E = 4.

Vertex and edge ids Vertices are always numbered from zero (!). Numbering is continual, form to V. We have to translate vertex names to ids: V = {A, B, C, D, E} E = ((A, B), (A, C), (B, C), (C, E)). A =, B =, C = 2, D = 3, E = 4. > g <- graph( c(,,,2,,2, 2,4), n=5 ) Practical statistical network analysis WU Wien 8

Creating igraph graphs igraph objects

Creating igraph graphs igraph objects print(), summary(), is.igraph()

Creating igraph graphs igraph objects print(), summary(), is.igraph() is.directed(), vcount(), ecount() > g <- graph( c(,,,2,,2, 2,4), n=5 ) 2 > g 3 Vertices: 5 4 Edges: 4 5 Directed: TRUE 6 Edges: 7 8 [] -> 9 [] -> 2 [2] -> 2 [3] 2 -> 4 Practical statistical network analysis WU Wien 9

Visualization > g <- graph.tree(4, 4) 2 > plot(g) 3 > plot(g, layout=layout.circle)

Visualization > g <- graph.tree(4, 4) 2 > plot(g) 3 > plot(g, layout=layout.circle) # Force directed layouts 2 > plot(g, layout=layout.fruchterman.reingold) 3 > plot(g, layout=layout.graphopt) 4 > plot(g, layout=layout.kamada.kawai)

Visualization > g <- graph.tree(4, 4) 2 > plot(g) 3 > plot(g, layout=layout.circle) # Force directed layouts 2 > plot(g, layout=layout.fruchterman.reingold) 3 > plot(g, layout=layout.graphopt) 4 > plot(g, layout=layout.kamada.kawai) # Interactive 2 > tkplot(g, layout=layout.kamada.kawai) 3 > l <- layout=layout.kamada.kawai(g)

Visualization > g <- graph.tree(4, 4) 2 > plot(g) 3 > plot(g, layout=layout.circle) # Force directed layouts 2 > plot(g, layout=layout.fruchterman.reingold) 3 > plot(g, layout=layout.graphopt) 4 > plot(g, layout=layout.kamada.kawai) # Interactive 2 > tkplot(g, layout=layout.kamada.kawai) 3 > l <- layout=layout.kamada.kawai(g) # 3D 2 > rglplot(g, layout=l)

Visualization > g <- graph.tree(4, 4) 2 > plot(g) 3 > plot(g, layout=layout.circle) # Force directed layouts 2 > plot(g, layout=layout.fruchterman.reingold) 3 > plot(g, layout=layout.graphopt) 4 > plot(g, layout=layout.kamada.kawai) # Interactive 2 > tkplot(g, layout=layout.kamada.kawai) 3 > l <- layout=layout.kamada.kawai(g) # 3D 2 > rglplot(g, layout=l) # Visual properties 2 > plot(g, layout=l, vertex.color="cyan") Practical statistical network analysis WU Wien

Simple graphs igraph can handle multi-graphs: V = {A, B, C, D, E} E = ((AB), (AB), (AC), (BC), (CE)). > g <- graph( c(,,,,,2,,2, 3,4), n=5 ) 2 > g 3 Vertices: 5 4 Edges: 5 5 Directed: TRUE 6 Edges: 7 8 [] -> 9 [] -> [2] -> 2 [3] -> 2 2 [4] 3 -> 4 Practical statistical network analysis WU Wien

Simple graphs igraph can handle loop-edges: V = {A, B, C, D, E} E = ((AA), (AB), (AC), (BC), (CE)). > g <- graph( c(,,,,,2,,2, 3,4), n=5 ) 2 > g 3 Vertices: 5 4 Edges: 5 5 Directed: TRUE 6 Edges: 7 8 [] -> 9 [] -> [2] -> 2 [3] -> 2 2 [4] 3 -> 4 Practical statistical network analysis WU Wien 2

Creating (more) igraph graphs el <- scan("lesmis.txt") 2 el <- matrix(el, byrow=true, nc=2) 3 gmis <- graph.edgelist(el, dir=false) 4 summary(gmis) Practical statistical network analysis WU Wien 3

Naming vertices V(gmis)$name 2 g <- graph.ring() 3 V(g)$name <- sample(letters, vcount(g)) Practical statistical network analysis WU Wien 4

Creating (more) igraph graphs # A simple undirected graph 2 > g <- graph.formula( Alice-Bob-Cecil-Alice, 3 Daniel-Cecil-Eugene, Cecil-Gordon )

Creating (more) igraph graphs # A simple undirected graph 2 > g <- graph.formula( Alice-Bob-Cecil-Alice, 3 Daniel-Cecil-Eugene, Cecil-Gordon ) # Another undirected graph, ":" notation 2 > g2 <- graph.formula( Alice-Bob:Cecil:Daniel, 3 Cecil:Daniel-Eugene:Gordon )

Creating (more) igraph graphs # A simple undirected graph 2 > g <- graph.formula( Alice-Bob-Cecil-Alice, 3 Daniel-Cecil-Eugene, Cecil-Gordon ) # Another undirected graph, ":" notation 2 > g2 <- graph.formula( Alice-Bob:Cecil:Daniel, 3 Cecil:Daniel-Eugene:Gordon ) # A directed graph 2 > g3 <- graph.formula( Alice +-+ Bob --+ Cecil 3 +-- Daniel, Eugene --+ Gordon:Helen )

Creating (more) igraph graphs # A simple undirected graph 2 > g <- graph.formula( Alice-Bob-Cecil-Alice, 3 Daniel-Cecil-Eugene, Cecil-Gordon ) # Another undirected graph, ":" notation 2 > g2 <- graph.formula( Alice-Bob:Cecil:Daniel, 3 Cecil:Daniel-Eugene:Gordon ) # A directed graph 2 > g3 <- graph.formula( Alice +-+ Bob --+ Cecil 3 +-- Daniel, Eugene --+ Gordon:Helen ) # A graph with isolate vertices 2 > g4 <- graph.formula( Alice -- Bob -- Daniel, 3 Cecil:Gordon, Helen )

Creating (more) igraph graphs # A simple undirected graph 2 > g <- graph.formula( Alice-Bob-Cecil-Alice, 3 Daniel-Cecil-Eugene, Cecil-Gordon ) # Another undirected graph, ":" notation 2 > g2 <- graph.formula( Alice-Bob:Cecil:Daniel, 3 Cecil:Daniel-Eugene:Gordon ) # A directed graph 2 > g3 <- graph.formula( Alice +-+ Bob --+ Cecil 3 +-- Daniel, Eugene --+ Gordon:Helen ) # A graph with isolate vertices 2 > g4 <- graph.formula( Alice -- Bob -- Daniel, 3 Cecil:Gordon, Helen ) # "Arrows" can be arbitrarily long 2 > g5 <- graph.formula( Alice +---------+ Bob ) Practical statistical network analysis WU Wien 5

Vertex/Edge sets, attributes Assigning attributes: set/get.graph/vertex/edge.attribute.

Vertex/Edge sets, attributes Assigning attributes: set/get.graph/vertex/edge.attribute. V(g) and E(g).

Vertex/Edge sets, attributes Assigning attributes: set/get.graph/vertex/edge.attribute. V(g) and E(g). Smart indexing, e.g. V(g)[color=="white"]

Vertex/Edge sets, attributes Assigning attributes: set/get.graph/vertex/edge.attribute. V(g) and E(g). Smart indexing, e.g. V(g)[color=="white"] Easy access of attributes: > g <- erdos.renyi.game(, /) 2 > V(g)$color <- sample( c("red", "black"), 3 vcount(g), rep=true) 4 > E(g)$color <- "grey" 5 > red <- V(g)[ color == "red" ] 6 > bl <- V(g)[ color == "black" ] 7 > E(g)[ red %--% red ]$color <- "red" 8 > E(g)[ bl %--% bl ]$color <- "black" 9 > plot(g, vertex.size=5, layout= layout.fruchterman.reingold) Practical statistical network analysis WU Wien 6

Creating (even) more graphs E.g. from.csv files. > traits <- read.csv("traits.csv", head=f) 2 > relations <- read.csv("relations.csv", head=f) 3 > orgnet <- graph.data.frame(relations) 4 5 > traits[,] <- sapply(strsplit(as.character 6 (traits[,]), split=" "), "[[", ) 7 > idx <- match(v(orgnet)$name, traits[,]) 8 > V(orgnet)$gender <- as.character(traits[,3][idx]) 9 > V(orgnet)$age <- traits[,2][idx] > igraph.par("print.vertex.attributes", TRUE) 2 > orgnet Practical statistical network analysis WU Wien 7

Creating (even) more graphs From the web, e.g. Pajek files. > karate <- read.graph("http://cneurocvs.rmki.kfki.hu/igraph/karate.net", 2 format="pajek") 3 > summary(karate) 4 Vertices: 34 5 Edges: 78 6 Directed: FALSE 7 No graph attributes. 8 No vertex attributes. 9 No edge attributes. Practical statistical network analysis WU Wien 8

Graph representation There is no best format, everything depends on what kind of questions one wants to ask. Iannis Esmeralda Fabien Alice Diana Helen Cecil Jennifer Gigi Bob Practical statistical network analysis WU Wien 9

Graph representation Adjacency matrix. Good for questions like: is Alice connected to Bob? Alice Bob Cecil Diana Esmeralda Fabien Gigi Helen Iannis Jennifer Alice Bob Cecil Diana Esmeralda Fabien Gigi Helen Iannis Jennifer Practical statistical network analysis WU Wien 2

Graph representation Edge list. Not really good for anything. Alice Bob Cecil Alice Diana Cecil Esmeralda Bob Cecil Diana Alice Bob Diana Esmeralda Fabien Gigi Diana Esmeralda Alice Helen Bob Diana Diana Esmeralda Esmeralda Fabien Fabien Gigi Gigi Gigi Helen Helen Helen Helen Helen Helen Iannis Iannis Jennifer Jennifer Practical statistical network analysis WU Wien 2

Graph representation Adjacency lists. GQ: who are the neighbors of Alice? Alice Bob Cecil Diana Esmeralda Fabien Gigi Helen Iannis Jennifer Bob, Esmeralda, Helen, Jennifer Alice, Diana, Gigi, Helen Diana, Fabien, Gigi Bob, Cecil, Esmeralda, Gigi, Helen, Iannis Alice, Diana, Fabien, Helen, Iannis Cecil, Esmeralda, Helen Bob, Cecil, Diana, Helen Alice, Bob, Diana, Esmeralda, Fabien, Gigi, Jennifer Diana, Esmeralda Alice, Helen Practical statistical network analysis WU Wien 22

Graph representation igraph. Flat data structures, indexed edge lists. Easy to handle, good for many kind of questions. Practical statistical network analysis WU Wien 23

Centrality in networks degree Iannis 2 Esmeralda 5 Cecil 3 Fabien 3 Diana 6 Gigi 4 Helen 7 Bob 4 Alice 4 Jennifer 2 Practical statistical network analysis WU Wien 24

Centrality in networks closeness C v = V i v d vi Iannis.53 Esmeralda.69 Cecil.53 Fabien.6 Diana.75 Gigi.64 Helen.82 Bob.64 Alice.6 Jennifer.5 Practical statistical network analysis WU Wien 25

Centrality in networks betweenness B v = g ivj /g ij i j,i v,j v Iannis Esmeralda 4.62 Cecil.83 Fabien.45 Diana 6.76 Gigi.45 Helen. Bob.2 Alice.67 Jennifer Practical statistical network analysis WU Wien 26

Centrality in networks eigenvector centrality E v = λ V i= A iv E i, Ax = λx Iannis.36 Esmeralda.75 Cecil.46 Fabien.49 Diana.88 Gigi.68 Helen Bob.7 Alice.63 Jennifer.36 Practical statistical network analysis WU Wien 27

Centrality in networks page rank E v = d V + d V i= A iv E i Iannis.34 Esmeralda.74 Cecil.47 Fabien.47 Diana.87 Gigi.59 Helen Bob.59 Alice.6 Jennifer.34 Practical statistical network analysis WU Wien 28

Community structure in networks Organizing things, clustering items to see the structure. M. E. J. Newman, PNAS, 3, 8577 8582 Practical statistical network analysis WU Wien 29

Community structure in networks How to define what is modular? Many proposed definitions, here is a popular one: Q = 2 E [A vw p vw ]δ(c v, c w ). vw

Community structure in networks How to define what is modular? Many proposed definitions, here is a popular one: Q = 2 E [A vw p vw ]δ(c v, c w ). vw Random graph null model: p vw = p = V ( V )

Community structure in networks How to define what is modular? Many proposed definitions, here is a popular one: Q = 2 E [A vw p vw ]δ(c v, c w ). vw Random graph null model: p vw = p = V ( V ) Degree sequence based null model: p vw = k vk w 2 E Practical statistical network analysis WU Wien 3

Cohesive blocks (Based on Structural Cohesion and Embeddedness: a Hierarchical Concept of Social Groups by J.Moody and D.White, Americal Sociological Review, 68, 3 27, 23) Definition : A collectivity is structurally cohesive to the extent that the social relations of its members hold it together.

Cohesive blocks (Based on Structural Cohesion and Embeddedness: a Hierarchical Concept of Social Groups by J.Moody and D.White, Americal Sociological Review, 68, 3 27, 23) Definition : A collectivity is structurally cohesive to the extent that the social relations of its members hold it together. Definition 2: A group is structurally cohesive to the extent that multiple independent relational paths among all pairs of members hold it together.

Cohesive blocks (Based on Structural Cohesion and Embeddedness: a Hierarchical Concept of Social Groups by J.Moody and D.White, Americal Sociological Review, 68, 3 27, 23) Definition : A collectivity is structurally cohesive to the extent that the social relations of its members hold it together. Definition 2: A group is structurally cohesive to the extent that multiple independent relational paths among all pairs of members hold it together. Vertex-independent paths and vertex connectivity.

Cohesive blocks (Based on Structural Cohesion and Embeddedness: a Hierarchical Concept of Social Groups by J.Moody and D.White, Americal Sociological Review, 68, 3 27, 23) Definition : A collectivity is structurally cohesive to the extent that the social relations of its members hold it together. Definition 2: A group is structurally cohesive to the extent that multiple independent relational paths among all pairs of members hold it together. Vertex-independent paths and vertex connectivity. Vertex connectivity and network flows. Practical statistical network analysis WU Wien 3

Cohesive blocks Practical statistical network analysis WU Wien 32

Cohesive blocks Practical statistical network analysis WU Wien 33

Rapid prototyping Weighted transitivity c(i) = A3 ii (AA) ii

Rapid prototyping Weighted transitivity c(i) = A3 ii (AA) ii c w (i) = W 3 ii (WW max W) ii

Rapid prototyping Weighted transitivity c(i) = A3 ii (AA) ii c w (i) = W 3 ii (WW max W) ii wtrans <- function(g) { 2 W <- get.adjacency(g, attr="weight") 3 WM <- matrix(max(w), nrow(w), ncol(w)) 4 diag(wm) <- 5 diag( W %*% W %*% W ) / 6 diag( W %*% WM %*% W) 7 } Practical statistical network analysis WU Wien 34

Rapid prototyping Clique percolation (Palla et al., Nature 435, 84, 25) Practical statistical network analysis WU Wien 35

... and the rest Cliques and independent vertex sets. Network flows. Motifs, i.e. dyad and triad census. Random graph generators. Graph isomorphism. Vertex similarity measures, topological sorting, spanning trees, graph components, K-cores, transitivity or clustering coefficient. etc. C-level: rich data type library. Practical statistical network analysis WU Wien 36

Acknowledgement Tamás Nepusz All the people who contributed code, sent bug reports, suggestions The R project Hungarian Academy of Sciences The OSS community in general Practical statistical network analysis WU Wien 37