An Introduction to APGL




Charanpal Dhanjal
February 2012

Abstract

Another Python Graph Library (APGL) is a graph library written using pure Python, NumPy and SciPy. Users new to the library can gain an overview and start to learn some of the core functionality using this short article.

Contents

1 Introduction
2 Download and Installation
3 Graph Storage and Manipulation
  3.1 Creation and Basic Operations
  3.2 Graph Properties
  3.3 Set Operations
  3.4 Input/Output
  3.5 Interfacing with NetworkX and igraph
  3.6 Creating a DictGraph
4 Random Graph Generation
5 Summary

1 Introduction

APGL: Another Python Graph Library (http://packages.python.org/apgl/index.html) has a self-explanatory title: it is a library for graph manipulation using pure Python (http://python.org/). One of the main features of APGL is that the graph objects are based on adjacency matrices implemented using NumPy (http://numpy.scipy.org/), SciPy (http://www.scipy.org/) and Pysparse (http://pysparse.sourceforge.net/), which allows for fast and memory-efficient implementations of numerous algorithms. The sparse graph classes can scale up to millions of vertices and edges on a standard PC.

In this document we show how to install the library and demonstrate, using simple examples, some important functionality: graph creation, manipulation, and input/output.

2 Download and Installation

One can download APGL for Windows, Linux or Mac OS from Sourceforge (http://sourceforge.net/projects/apythongraphlib/) or the Python Package Index, PyPI (http://pypi.python.org/pypi/apgl/). To use this library you must have Python, NumPy and SciPy installed. The code has been verified on Python 2.7.2, NumPy 1.6.1 and SciPy 0.10.0, but should work with other versions. The automatic testing routine requires Python 2.7 or later, or the unittest2 testing framework for Python 2.3-2.6. To install the package, ensure that pip (http://pypi.python.org/pypi/pip) is installed, and then run:

    pip install apgl

If installing from source, unzip the apgl-x.y.z.tar.gz file and then run setup.py as follows:

    python setup.py install

To test the library (recommended), run the following commands in Python:

    import apgl
    apgl.test()

and check that all tests pass.

3 Graph Storage and Manipulation

A graph G = (V, E) is defined by a set of vertices V and a set of edges E ⊆ V × V, in which an edge is a relation between a pair of vertices. The current graph types in APGL are SparseGraph, DenseGraph and PySparseGraph, which use adjacency or weight matrices as the underlying data structure. Note that there is also the DictGraph class, which does not use weight matrices and is described in Section 3.6.

For a graph with n vertices, an adjacency matrix A has a value of 1 at the (i, j)th entry if an edge exists between vertices i and j; otherwise the entry is zero. A weight matrix is identical to an adjacency matrix except that matrix elements can take any real value, and a non-zero element indicates an edge between vertices. In an undirected graph, an edge exists from vertex i to j whenever there is an edge from j to i. A directed graph does not have this constraint.

In APGL, the edges in a graph are stored using weight matrices, and the three different graph classes differ mainly in their underlying storage mechanism.
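The relationship between weight matrices, adjacency matrices and undirected graphs can be illustrated without APGL at all. The following is a minimal sketch using plain NumPy; the helper names adjacency_from_weights and is_undirected are hypothetical and are not part of the APGL API.

```python
import numpy as np

def adjacency_from_weights(W):
    """Return the 0/1 adjacency matrix implied by a weight matrix W."""
    # A non-zero weight indicates an edge; zero indicates no edge
    return (W != 0).astype(int)

def is_undirected(W):
    """A weight matrix describes an undirected graph when it is symmetric."""
    return bool(np.array_equal(W, W.T))

n = 4

# Undirected graph: each edge is stored in both directions
W = np.zeros((n, n))
W[0, 1] = W[1, 0] = 0.1
W[1, 2] = W[2, 1] = 1.0

# Directed graph: an edge from 0 to 1 without the reverse edge
D = np.zeros((n, n))
D[0, 1] = 0.5

A = adjacency_from_weights(W)
print(A)
print(is_undirected(W), is_undirected(D))
```

A dense array like this needs n^2 entries regardless of how many edges exist, which is the motivation for the sparse storage classes described next.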

DenseGraph uses numpy.ndarrays to store adjacencies, whereas SparseGraph uses the scipy.sparse classes and is efficient for the storage of large graphs without many edges. As the scipy.sparse classes are written in Python, we also provide PySparseGraph, which uses Pysparse to store adjacencies. Many matrix operations for Pysparse are written in C and hence may be faster than the scipy.sparse ones. Edge values can currently only be numerical; however, vertices can be labelled with anything using a subclass of the AbstractVertexList class. Currently, there are two general ways of labelling vertices: using numpy.ndarrays in conjunction with VertexList, and with any label using GeneralVertexList. AbstractVertexList can easily be extended in order to define different vertex labelling methods.

3.1 Creation and Basic Operations

We start by demonstrating the creation of a graph using SparseGraph. Notice that DenseGraph and PySparseGraph operate in a near-identical manner.

    import numpy
    from apgl.graph.VertexList import VertexList
    from apgl.graph.SparseGraph import SparseGraph

    numVertices = 5
    numFeatures = 2
    graph = SparseGraph(VertexList(numVertices, numFeatures))

    # Add some edges to the graph
    # Vertices are indexed starting from 0
    graph[0, 1] = 0.1
    graph[1, 2] = 1.0

    # Set the label of the 0th vertex to [2, 3]
    graph.setVertex(0, numpy.array([2, 3]))

    # Display edge weights
    print(graph[1, 2])
    print(graph[1, 3])

The first three lines import NumPy and the graph classes required. Next, a VertexList object is created with 5 vertices and vector labels of size 2, which are all initialised to zero. Using this VertexList object, a SparseGraph object is created with no edges. By default, the SparseGraph is an undirected graph; we will later show how to construct directed graphs.

Two edges are added to the graph: from vertex 0 to 1 with a weight of 0.1, and from 1 to 2 with a weight of 1.0. Notice that one can only add non-zero edge weights, as zero indicates the absence of an edge. Notice also that edges can be referenced as if accessing a matrix directly. Often it is more convenient to add edges in a group rather than individually, and in this case one can use the addEdges method. For example, the above code can be modified as follows to produce the same results:

    # Add some edges to the graph
    edges = numpy.array([[0, 1], [1, 2]], dtype=int)
    edgeValues = numpy.array([0.1, 1.0])
    graph.addEdges(edges, edgeValues)

    # Display the edge weight between vertices 1 and 2
    print(graph[1, 2])

The call to addEdges uses a matrix of size m × 2 as the first parameter and an array of length m of edge values as the second parameter. Each row of the matrix edges corresponds to an edge between two vertices, and the corresponding entry of edgeValues is the weight of that edge.

SparseGraphs are created by default using the SciPy csr_matrix class (Compressed Sparse Row matrix), which allows for fast access to the rows of the adjacency matrix. One can also create a SparseGraph using other types of sparse matrix (currently limited to lil_matrix, csr_matrix, csc_matrix and dok_matrix):

    import numpy
    import scipy.sparse as sps
    from apgl.graph.GeneralVertexList import GeneralVertexList
    from apgl.graph.SparseGraph import SparseGraph

    numVertices = 10
    vList = GeneralVertexList(numVertices)
    Wght = sps.csc_matrix((numVertices, numVertices))
    graph = SparseGraph(vList, W=Wght, undirected=False)

    graph[0, 1] = 1
    graph[0, 2] = 1
    graph.setVertex(0, "abc")
    graph.setVertex(1, 123)

    print(graph.inDegreeDistribution())

Here, we use a different type of vertex list, the GeneralVertexList class, which allows vertex labels to take any value. The sparse matrix used in the graph is a scipy.sparse.csc_matrix, which is in Compressed Sparse Column format. The final parameter used in the constructor specifies that the resulting graph is directed. Following graph construction, the first and second vertices are labelled "abc" and 123 respectively. The final line of the example computes the in-degree distribution of the graph, which is faster when the adjacency matrix is a csc_matrix compared to the default choice of csr_matrix. In this case, however, the speed difference is negligible as the graph is very small.
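To see why the column-oriented format suits in-degree computations, the sketch below works directly with scipy.sparse rather than APGL. It assumes that rows index edge sources and columns index edge targets, and that a "distribution" means a histogram of degrees; whether inDegreeDistribution returns exactly this histogram is an assumption.

```python
import numpy as np
import scipy.sparse as sps

num_vertices = 10
rows = np.array([0, 0])      # edge sources
cols = np.array([1, 2])      # edge targets
vals = np.array([1.0, 1.0])  # edge weights

# Directed weight matrix in Compressed Sparse Column format
W_csc = sps.csc_matrix((vals, (rows, cols)), shape=(num_vertices, num_vertices))
# The same matrix in Compressed Sparse Row format
W_csr = W_csc.tocsr()

# In CSC, indptr marks where each column starts, so consecutive differences
# give the number of stored entries per column, i.e. the in-degrees
in_degrees = np.diff(W_csc.indptr)

# In CSR, the same trick gives entries per row, i.e. the out-degrees
out_degrees = np.diff(W_csr.indptr)

# Histogram: entry k counts the vertices with degree k
in_degree_distribution = np.bincount(in_degrees)
out_degree_distribution = np.bincount(out_degrees)

print(in_degree_distribution)
print(out_degree_distribution)
```

Each format makes one direction a simple read of its index pointer array, whereas the other direction requires scanning all stored entries or converting the matrix.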
With larger graphs, the choice of weight matrix type can significantly affect the speed of graph algorithms.

SparseGraph and DenseGraph have a number of additional methods for querying and modifying the underlying graph. For example, getNumEdges() and getNumVertices() return the number of edges and vertices respectively. The neighbours(vertexId) method returns the set of neighbouring vertices for the given vertexId, and neighbourOf returns the set of vertices which have edges to vertexId for a directed graph.

3.2 Graph Properties

To study the characteristics of graphs, various properties have been proposed in the research literature. Some of the most common ones are shown below (the interested reader is directed to e.g. [1] for more precise definitions of the properties):

clusteringCoefficient() - 3 times the number of triangles divided by the number of connected triples
density() - The proportion of edges vs. total possible number of edges
diameter() - Length of the longest shortest path in the graph
effectiveDiameter(p) - A more robust alternative to the diameter
geodesicDistance() - Mean shortest distance between all pairs of vertices
harmonicGeodesicDistance() - Mean harmonic shortest distance between all pairs of vertices

Several of the methods above require the computation of the shortest paths between all pairs of vertices. The matrix of shortest paths P can be found using the Floyd-Warshall algorithm [2] at a computational cost of O(n^3), where n is the number of vertices in the graph. The (i, j)th entry of P is the length of the shortest path between vertices i and j. To compute diameters and geodesic distances, one can optionally pass in a matrix P as follows:

    from apgl.graph.GeneralVertexList import GeneralVertexList
    from apgl.graph.SparseGraph import SparseGraph

    numVertices = 10
    graph = SparseGraph(GeneralVertexList(numVertices))

    graph[0, 1] = 1
    graph[0, 2] = 1

    P = graph.floydWarshall()
    print(graph.geodesicDistance(P=P))
    print(graph.harmonicGeodesicDistance(P=P))

The above example outputs both the mean geodesic distance and the harmonic mean geodesic distance using the matrix P as computed by the floydWarshall method (note that one can also use findAllDistances, which is based on Dijkstra's algorithm). As P is computed only once but used twice, this usage reduces computational cost compared with making the default calls of the distance methods.

3.3 Set Operations

By considering the edges as a set of pairs of vertices, one can perform various set operations with graphs, and some of these are listed in Table 1.
The first methods in Table 1 consider the edges in the graphs without weights, hence the returned graphs contain adjacency matrices only. For the subgraph method, the graph returned contains only those vertices indexed by vinds and the edges between these vertices.
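The view of edges as a set of vertex pairs can be made concrete with Python's built-in sets. The sketch below is not how APGL implements these operations; the helpers edge, complement and subgraph are hypothetical, the graphs are assumed undirected, and self-loops are assumed to be excluded from the complement.

```python
from itertools import combinations

def edge(u, v):
    """An undirected edge as an unordered pair of vertex indices."""
    return frozenset((u, v))

def complement(edges, num_vertices):
    """All vertex pairs (excluding self-loops) that are not in `edges`."""
    all_pairs = {edge(u, v) for u, v in combinations(range(num_vertices), 2)}
    return all_pairs - edges

def subgraph(edges, vinds):
    """Edges whose endpoints both lie in the vertex index set `vinds`."""
    keep = set(vinds)
    return {e for e in edges if e <= keep}

num_vertices = 4
g1 = {edge(0, 1), edge(1, 2), edge(2, 3)}
g2 = {edge(1, 2), edge(0, 3)}

union = g1 | g2                      # edges in either graph
intersection = g1 & g2               # edges in both graphs
difference = g1 - g2                 # edges in g1 that are not in g2
g1_complement = complement(g1, num_vertices)
g1_sub = subgraph(g1, [0, 1, 2])     # keeps the original vertex indices

print(sorted(sorted(e) for e in g1_complement))
```

Because the pairs carry no weights, the results describe adjacency only, which mirrors the remark above about the returned graphs.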

Example Method Call    Description
g1.union(g2)           Union between graph edges
g1.intersect(g2)       Intersection of graph edges
g1.setdiff(g2)         Find edges in g1 that are not in g2
g1.complement()        Find the graph with edges which are not present in g1
g1.subgraph(vinds)     Compute the subgraph using the selected vertices

Table 1: Methods to perform set operations using graphs g1 and g2, and vertex indices set vinds.

3.4 Input/Output

Simple file reading and writing is possible by using a pre-defined comma separated value format, which is exemplified as follows:

    Vertices
    0
    1
    2
    3
    4
    Edges
    0, 1, 1
    2, 4, 1
    4, 0, 1
    2, 2, 1

Vertex labels can only be integers and must be listed after Vertices. Next, edges are given as a sequence of triples, in which the first two values are the vertices and the last value is the edge weight. If the graph is directed, one should replace Edges with Arcs. The graph corresponding to this file (saved as "test.txt") is read using the code:

    from apgl.io import SimpleGraphReader

    fileName = "test.txt"
    graphReader = SimpleGraphReader()
    graph = graphReader.readFromFile(fileName)

    print(graph.getAllEdges())

    # Save the edges and vertices in testgraph.zip
    graph.save("testgraph")

The graph returned from graphReader is a SparseGraph. Furthermore, the output of the print call is [[1 0] [2 2] [4 0] [4 2]]. In order to write graphs in this format, one can use the SimpleGraphWriter class. Notice that SimpleGraphReader and SimpleGraphWriter work with only the edges of the graph, i.e. vertex labels are not stored. To save complete graphs, including the vertex labels, one can use the save and load methods, which store and load the weight matrices in Matrix Market format.
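As an illustration of the comma separated value format described in this section, the following sketch parses it with the standard library only. It is not the SimpleGraphReader implementation: the function parse_simple_graph is hypothetical, and the layout (one vertex index per line, one edge triple per line) is an assumption about the file format.

```python
def parse_simple_graph(text):
    """Parse the 'Vertices' / 'Edges' (or 'Arcs') text format into Python objects."""
    vertices, edges, directed = [], {}, False
    section = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line == "Vertices":
            section = "vertices"
        elif line in ("Edges", "Arcs"):
            section = "edges"
            directed = (line == "Arcs")  # 'Arcs' marks a directed graph
        elif section == "vertices":
            vertices.append(int(line))   # vertex labels must be integers
        elif section == "edges":
            # Each triple is: first vertex, second vertex, edge weight
            u, v, weight = (field.strip() for field in line.split(","))
            edges[(int(u), int(v))] = float(weight)
        else:
            raise ValueError("data before a section header: %r" % line)
    return vertices, edges, directed

sample = """Vertices
0
1
2
3
4
Edges
0, 1, 1
2, 4, 1
4, 0, 1
2, 2, 1
"""

vertices, edges, directed = parse_simple_graph(sample)
print(vertices)
print(sorted(edges))
```

Keeping the edges in a dictionary keyed by vertex pairs makes it straightforward to pass them on to a matrix-based graph as index and value arrays.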

3.5 Interfacing with NetworkX and igraph

NetworkX (http://networkx.lanl.gov/) is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. In APGL there are methods to convert between NetworkX graphs and APGL ones. To convert from a PySparseGraph, SparseGraph or DenseGraph to a NetworkX undirected Graph or directed DiGraph, one can use the tonetworkx method. See the reference documentation for more details. Similarly, there is toigraph to output igraph (http://igraph.sourceforge.net/) objects.

3.6 Creating a DictGraph

It might seem that the construction of the graph classes is restrictive, as one is required to construct an object with a given number of vertices, and these vertices are indexed using integers. For growing graphs with non-integer names we provide the DictGraph class, which uses a dictionary of dictionaries to store adjacencies. One can easily create and populate a DictGraph and then transfer the edges to a matrix graph:

    from apgl.graph.DictGraph import DictGraph
    from apgl.graph.SparseGraph import SparseGraph
    from apgl.graph.GeneralVertexList import GeneralVertexList

    graph = DictGraph()
    graph.addEdge("a", "b")
    graph.addEdge("a", "c")
    graph.addEdge("a", "d")

    edgeIndices = graph.getAllEdgeIndices()
    graph2 = SparseGraph(GeneralVertexList(graph.getNumVertices()))
    graph2.addEdges(edgeIndices)

The mapping between vertex names in DictGraph and those in SparseGraph can be found using graph.getAllEdgeIndices(). A useful subclass of DictGraph which restricts the input to trees is DictTree; see the reference documentation for more details.

4 Random Graph Generation

As well as creating graphs in the fashion outlined in the preceding examples, one can use a number of graph generators to produce graphs in random and non-random ways. Currently, there are 5 random graph generator types: BarabasiAlbertGenerator, ConfigModelGenerator, ErdosRenyiGenerator, KroneckerGenerator and SmallWorldGenerator. In the following code block, we show how to generate a random graph using an Erdos-Renyi [3] process:

    from apgl.graph.DenseGraph import DenseGraph
    from apgl.graph.GeneralVertexList import GeneralVertexList
    from apgl.generator.ErdosRenyiGenerator import ErdosRenyiGenerator

    numVertices = 20
    graph = DenseGraph(GeneralVertexList(numVertices))

    p = 0.2
    generator = ErdosRenyiGenerator(p)
    graph = generator.generate(graph)

For the ErdosRenyiGenerator object, the probability of an edge between any pair of vertices is set to 0.2, and edges are created independently of the other edges. Furthermore, no self edges are created. Figure 1 shows the resulting graph. Notice that ErdosRenyiGenerator uses the numpy.random module, and consequently identical random graphs can be generated by using the same numpy.random.seed value.

Figure 1: An Erdos-Renyi graph generated using the ErdosRenyiGenerator class.

5 Summary

We exemplified some of the main features of the APGL graph library, with the aim of familiarising the reader with the library. The basic graph types were introduced, as well as how to manipulate graphs, find graph properties, perform set operations, write to and read from files, and generate random graphs. For much more information, see the reference documentation online at http://packages.python.org/apgl/.
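As a supplement to Section 4, the Erdos-Renyi process described there (independent edges with probability p, no self edges, reproducible from a seed) can be sketched in plain NumPy. This is not the ErdosRenyiGenerator implementation: the function erdos_renyi_adjacency is hypothetical, and it uses a local numpy.random.RandomState instead of the global numpy.random.seed so that the example has no side effects.

```python
import numpy as np

def erdos_renyi_adjacency(num_vertices, p, seed=None):
    """Sample a symmetric 0/1 adjacency matrix with independent edges and no self-loops."""
    rng = np.random.RandomState(seed)
    # One independent Bernoulli(p) draw per ordered pair of vertices
    draws = rng.rand(num_vertices, num_vertices) < p
    # Keep only pairs with i < j, which also removes the diagonal (self edges)
    upper = np.triu(draws, k=1)
    # Mirror the upper triangle so that the graph is undirected
    return (upper | upper.T).astype(int)

A1 = erdos_renyi_adjacency(20, 0.2, seed=21)
A2 = erdos_renyi_adjacency(20, 0.2, seed=21)

# The same seed reproduces the same random graph
print(np.array_equal(A1, A2))
# Number of undirected edges in the sample
print(A1.sum() // 2)
```

With 20 vertices there are 190 possible edges, so on average about 38 edges are expected for p = 0.2; any single sample will vary around that figure.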

References

[1] M. E. J. Newman. The structure and function of complex networks. SIAM Review, 45(2):167-256, 2003.
[2] Stephen Warshall. A theorem on boolean matrices. Journal of the ACM, 9(1):11-12, 1962.
[3] Paul Erdős and Alfréd Rényi. On random graphs. Publicationes Mathematicae, 6:290-297, 1959.