Graphs, Networks and Python: The Power of Interconnection. Lachlan Blackhall - lachlan@repositpower.com

Similar documents

NetworkX: Network Analysis with Python

NetworkX: Network Analysis with Python

NetworkX: Network Analysis with Python

Complex Networks Analysis: Clustering Methods

Software tools for Complex Networks Analysis. Fabrice Huet, University of Nice Sophia- Antipolis SCALE (ex-oasis) Team

Introduction to Networks and Business Intelligence

An Introduction to APGL

Graph models for the Web and the Internet. Elias Koutsoupias University of Athens and UCLA. Crete, July 2003

USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS

A comparative study of social network analysis tools

Graph Theory and Networks in Biology

Social and Economic Networks: Lecture 1, Networks?

Graphs over Time Densification Laws, Shrinking Diameters and Possible Explanations

Graph Mining Techniques for Social Media Analysis

The average distances in random graphs with given expected degrees

Network/Graph Theory. What is a Network? What is network theory? Graph-based representations. Friendship Network. What makes a problem graph-like?

Effects of node buffer and capacity on network traffic

Robustness of Spatial Databases: Using Network Analysis on GIS Data Models

Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network

DATA ANALYSIS IN PUBLIC SOCIAL NETWORKS

Network VisualizationS

Lecture 26: Small-World Peer-to-Peer Networks and Their Security Issues. Lecture Notes on Computer and Network Security. by Avi Kak

A discussion of Statistical Mechanics of Complex Networks P. Part I

A Unified Network Performance Measure with Importance Identification and the Ranking of Network Components

How To Predict The Growth Of A Network

Enron Network Analysis Tutorial r date()

ATM Network Performance Evaluation And Optimization Using Complex Network Theory

Bioinformatics: Network Analysis

Chapter 29 Scale-Free Network Topologies with Clustering Similar to Online Social Networks

How to do a Business Network Analysis

Healthcare Analytics. Aryya Gangopadhyay UMBC

GENERATING AN ASSORTATIVE NETWORK WITH A GIVEN DEGREE DISTRIBUTION

Network Theory: 80/20 Rule and Small Worlds Theory

Complex Network Visualization based on Voronoi Diagram and Smoothed-particle Hydrodynamics

IC05 Introduction on Networks &Visualization Nov

! E6893 Big Data Analytics Lecture 10:! Linked Big Data Graph Computing (II)

Walk-Based Centrality and Communicability Measures for Network Analysis

Scientific Collaboration Networks in China s System Engineering Subject

Recent Progress in Complex Network Analysis. Models of Random Intersection Graphs

MAGE: Matching Approximate Patterns in Richly-Attributed Graphs

Brian Hayes. A reprint from. American Scientist. the magazine of Sigma Xi, the Scientific Research Society

JustClust User Manual

An Interest-Oriented Network Evolution Mechanism for Online Communities

Degree distribution in random Apollonian networks structures

Open Source Software Developer and Project Networks

Option 1: empirical network analysis. Task: find data, analyze data (and visualize it), then interpret.

How to Analyze Company Using Social Network?

Complex Network Analysis of Brain Connectivity: An Introduction LABREPORT 5

Cluster detection algorithm in neural networks

What is SNA? A sociogram showing ties

Temporal Dynamics of Scale-Free Networks

Optimization with Gurobi and Python

The Topology of Large-Scale Engineering Problem-Solving Networks

MODEL SELECTION FOR SOCIAL NETWORKS USING GRAPHLETS

DECENTRALIZED SCALE-FREE NETWORK CONSTRUCTION AND LOAD BALANCING IN MASSIVE MULTIUSER VIRTUAL ENVIRONMENTS

Gephi Tutorial Quick Start

General Network Analysis: Graph-theoretic. COMP572 Fall 2009

Strategies for Optimizing Public Train Transport Networks in China: Under a Viewpoint of Complex Networks

Combining Spatial and Network Analysis: A Case Study of the GoMore Network

Network Metrics, Planar Graphs, and Software Tools. Based on materials by Lala Adamic, UMichigan

USE OF GRAPH THEORY AND NETWORKS IN BIOLOGY

Statistical Inference for Networks Graduate Lectures. Hilary Term 2009 Prof. Gesine Reinert

Research Article A Comparison of Online Social Networks and Real-Life Social Networks: A Study of Sina Microblogging

Subgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro

Big Data: Opportunities and Challenges for Complex Networks

Social Media Mining. Graph Essentials

Part 2: Community Detection

1. Nondeterministically guess a solution (called a certificate) 2. Check whether the solution solves the problem (called verification)

How Placing Limitations on the Size of Personal Networks Changes the Structural Properties of Complex Networks

A Brief Study of Open Source Graph Databases

DYNAMIC Distributed Federated Databases (DDFD) Experimental Evaluation of the Performance and Scalability of a Dynamic Distributed Federated Database

Transcription:

Graphs, Networks and Python: The Power of Interconnection Lachlan Blackhall - lachlan@repositpower.com

A little about me

Graphs Graph, G = (V, E) V = Vertices / Nodes E = Edges

NetworkX Native graph structures for Python. Nodes can be any hashable object. Edges are tuples of nodes with optional edge data which is stored in a dictionary. Well maintained package

Lets make a Graph import networkx as nx import matplotlib.pyplot as plt G = nx.graph()

Lets add some nodes import networkx as nx import matplotlib.pyplot as plt G = nx.graph() G.add_node('1') G.add_node('2') G.add_node('3') G.add_node('4') G.add_node('5')

Lets add some edges import networkx as nx import matplotlib.pyplot as plt G = nx.graph() G.add_node('1') G.add_node('2') G.add_node('3') G.add_node('4') G.add_node('5') G.add_edge('1', '2') G.add_edge('2', '3') G.add_edge('3', '4') G.add_edge('4', '1') G.add_edge('4', '5')

Lets Visualise it import networkx as nx import matplotlib.pyplot as plt G = nx.graph() G.add_node('1') G.add_node('2') G.add_node('3') G.add_node('4') G.add_node('5') G.add_edge('1', '2') G.add_edge('2', '3') G.add_edge('3', '4') G.add_edge('4', '1') G.add_edge('4', '5') nx.draw_spectral(g) plt.show()

Now a directed graph import networkx as nx import matplotlib.pyplot as plt G = nx.digraph() G.add_nodes_from(['1', '2', '3', '4', '5']) G.add_edge('1', '2') G.add_edge('2', '3') G.add_edge('3', '4') G.add_edge('4', '1') G.add_edge('4', '5') nx.draw_spectral(g) plt.show()

Look Ma - No Nodes import networkx as nx G = nx.digraph() G.add_edge('1', '2') G.add_edge('2', '3') G.add_edge('3', '4') G.add_edge('4', '1') G.add_edge('4', '5')

More Graphs Networkx can generate lots of interesting graphs to experiment with. Lets have a look at a few of them.

More Graphs import networkx as nx import matplotlib.pyplot as plt K_5 = nx.complete_graph(5) nx.draw(k_5) plt.show() barbell = nx.barbell_graph(10, 10) nx.draw(barbell) plt.show()

Erdos-Renyi Named for Paul Erdős and Alfréd Rényi. The Erdős Rényi model is a model for generating random graphs. The model sets an edge between each pair of nodes with equal probability.

Erdos-Renyi import networkx as nx import matplotlib.pyplot as plt er = nx.erdos_renyi_graph(100, 0.15) nx.draw(er) plt.show()

Watts-Strogatz Named for Duncan J. Watts and Steven Strogatz. The Watts Strogatz model produces graphs with small-world properties, including short average path lengths and high clustering. Most nodes are not neighbours but can be reached from every other node by a small number of hops or steps.

Watts-Strogatz import networkx as nx import matplotlib.pyplot as plt ws = nx.watts_strogatz_graph(100, 15, 0.1) nx.draw(ws) plt.show()

Barabasi - Albert Named for Albert-László Barabási and Réka Albert. The Barabási Albert model is an algorithm for generating random scale-free networks. Scale free describes the distribution of node degrees of the network.

Barabasi - Albert import networkx as nx import matplotlib.pyplot as plt ba = nx.barabasi_albert_graph(100, 5) nx.draw(ba) plt.show()

Social Network Analyis One major area of interest in network analysis. Networkx is well suited to this type of analysis. Interested in understanding graph properties that explain the social interaction.

Enron Data Enron declared bankruptcy in 2001 Enron email corpus contains data from about 150 users, mostly senior management of Enron. Converted the corpus to a TSV file of the form SENDER RECIPIENTS EMAIL DATA str str,,str

import csv import networkx as nx G = nx.graph() A Graph

Load the email data import csv import networkx as nx G = nx.graph() with open('enron.tsv', 'rb') as tsvin: tsvin = csv.reader(tsvin, delimiter='\t')

Build the graph import csv import networkx as nx G = nx.graph() with open('enron.tsv', 'rb') as tsvin: tsvin = csv.reader(tsvin, delimiter='\t') for row in tsvin: sender = row[0] recipients = row[1].split(',') for recipient in recipients: G.add_edge(sender, recipient)

And visualise import csv import networkx as nx G = nx.graph() with open('enron.tsv', 'rb') as tsvin: tsvin = csv.reader(tsvin, delimiter='\t') for row in tsvin: sender = row[0] recipients = row[1].split(',') for recipient in recipients: G.add_edge(sender, recipient) nx.write_gexf(g, 'enron.gexf')

So what? Visualisations are pretty but Networkx can help us do better Lets dive into some graph analysis

Node Degree node_degree = nx.degree(g).items() sorted_degrees = sorted(node_degree, key=lambda tup: tup[1]) poi = [email for (email, degree) in sorted_degrees[-10:]] >>> ['tana.jones@enron.com', 'technology.enron@enron.com', '', 'klay@enron.com', 'jeff.skilling@enron.com', 'david.forster@enron.com', 'jeff.dasovich@enron.com', 'outlook.team@enron.com', 'sally.beck@enron.com', 'kenneth.lay@enron.com']

The People Kenneth Lay - Chairman Jeff Skilling - CEO Tana Jones - Senior Legal Specialist Sally Beck - COO David Forster - Vice President Jeff Dasovich - Government Relation Executive

Subgraphs real_poi = ['tana.jones@enron.com', 'klay@enron.com', 'jeff.skilling@enron.com', 'david.forster@enron.com', 'jeff.dasovich@enron.com', 'sally.beck@enron.com', 'kenneth.lay@enron.com'] sub_g = G.subgraph(real_poi) nx.draw(sub_g) plt.show()

MultiGraphs Allows you to have multiple edges between nodes. i.e. one edge per email. Instead of G = nx.graph() Use G = nx.multigraph()

Relationships? Neighbours G.neighbors('kenneth.lay@enron.com') G.neighbors_iter('kenneth.lay@enron.com') Cliques nx.find_cliques(g)

Networkx Functionality

Other useful graph Packages Gephi (http://gephi.github.io/) Pajek (http://pajek.imfm.si/doku.php)

Conclusion Graphs are good Networkx is awesome Have fun :)

Questions???