An Introduction to APGL

Charanpal Dhanjal
February 2012

Abstract

Another Python Graph Library (APGL) is a graph library written using pure Python, NumPy and SciPy. Users new to the library can gain an overview and start to learn some of the core functionality using this short article.

Contents

1 Introduction
2 Download and Installation
3 Graph Storage and Manipulation
  3.1 Creation and Basic Operations
  3.2 Graph Properties
  3.3 Set Operations
  3.4 Input/Output
  3.5 Interfacing with NetworkX and igraph
  3.6 Creating a DictGraph
4 Random Graph Generation
5 Summary

1 Introduction

APGL: Another Python Graph Library (http://packages.python.org/apgl/index.html) has a self-explanatory title: it is a library for graph manipulation using pure Python (http://python.org/). One of the main features of APGL is that the graph objects are based on adjacency matrices implemented using NumPy (http://numpy.scipy.org/), SciPy (http://www.scipy.org/) and Pysparse (http://pysparse.sourceforge.net/), which allows for fast and memory-efficient implementations of numerous algorithms. The sparse graph classes can scale up to millions of vertices and edges on a standard PC.

In this document we show how to install the library and demonstrate, using simple examples, some important functionality. Graph creation, manipulation, and input/output are exemplified.

2 Download and Installation

One can download APGL for Windows, Linux or Mac OS from Sourceforge (http://sourceforge.net/projects/apythongraphlib/) or the Python Package Index, PyPI (http://pypi.python.org/pypi/apgl/). To use this library you must have Python, NumPy and SciPy installed. The code has been verified on Python 2.7.2, NumPy 1.6.1 and SciPy 0.10.0, but should work with other versions. The automatic testing routine requires Python 2.7 or later, or the unittest2 testing framework for Python 2.3-2.6.

To install the package, ensure that pip (http://pypi.python.org/pypi/pip) is installed, and then run:

    pip install apgl

If installing from source, unzip the apgl-x.y.z.tar.gz file and then run setup.py as follows:

    python setup.py install

To test the library (recommended), use the following commands in Python:

    import apgl
    apgl.test()

and check that all tests pass.

3 Graph Storage and Manipulation

A graph G = (V, E) is defined by a set of vertices V and a set of edges E ⊆ V × V, in which an edge is a relation between a pair of vertices. The current graph types in APGL are SparseGraph, DenseGraph and PySparseGraph, which use adjacency or weight matrices as the underlying data structure. Note that there is also the DictGraph class, which does not use weight matrices and is described in Section 3.6.

For a graph with n vertices, an adjacency matrix A has a value of 1 at the ijth entry if an edge exists between vertices i and j, otherwise the entry is zero. A weight matrix is identical to an adjacency matrix except that matrix elements can take any real value, and a non-zero element indicates an edge between vertices. In an undirected graph, an edge exists from vertex i to j whenever there is an edge from j to i. A directed graph does not have this constraint. In APGL, the edges in a graph are stored using weight matrices, and the three different graph classes differ mainly in their underlying storage mechanism.
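The relationship between weight and adjacency matrices described above can be sketched in a few lines of plain Python. This is only an illustration using nested lists; it is not how APGL stores graphs internally, and the helper names make_weight_matrix and to_adjacency are hypothetical.

```python
# Illustrative sketch only: nested lists stand in for the matrix storage.
# The function names here are hypothetical and are not part of APGL.

def make_weight_matrix(num_vertices, weighted_edges, undirected=True):
    """Build an n x n weight matrix from (i, j, weight) triples."""
    W = [[0.0] * num_vertices for _ in range(num_vertices)]
    for i, j, weight in weighted_edges:
        W[i][j] = weight
        if undirected:
            # An undirected edge i-j is stored symmetrically.
            W[j][i] = weight
    return W


def to_adjacency(W):
    """Replace every non-zero weight by 1 to obtain the adjacency matrix."""
    return [[1 if w != 0 else 0 for w in row] for row in W]


edges = [(0, 1, 0.1), (1, 2, 1.0)]
W = make_weight_matrix(3, edges)
A = to_adjacency(W)

for row in A:
    print(row)
```

Passing undirected=False to the hypothetical make_weight_matrix leaves the entry W[j][i] at zero, which corresponds to the directed case.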

DenseGraph uses numpy.ndarrays to store adjacencies whereas SparseGraph uses the scipy.sparse classes and is efficient for the storage of large graphs without many edges. As the scipy.sparse classes are written in Python, we also provide PySparseGraph, which uses Pysparse to store adjacencies. Many matrix operations for Pysparse are written in C and hence may be faster than the scipy.sparse ones.

Edge values can currently only be numerical; however, vertices can be labelled with anything using a subclass of the AbstractVertexList class. Currently, there are two general ways of labelling vertices: using numpy.ndarrays in conjunction with VertexList, and with any label using GeneralVertexList. AbstractVertexList can easily be extended in order to define different vertex labelling methods.

3.1 Creation and Basic Operations

We start by demonstrating the creation of a graph using SparseGraph. Notice that DenseGraph and PySparseGraph operate in a near-identical manner.

    import numpy
    from apgl.graph.VertexList import VertexList
    from apgl.graph.SparseGraph import SparseGraph

    numVertices = 5
    numFeatures = 2
    graph = SparseGraph(VertexList(numVertices, numFeatures))

    #Add some edges to the graph
    #Vertices are indexed starting from 0
    graph[0, 1] = 0.1
    graph[1, 2] = 1.0

    #Set the label of the 0th vertex to [2, 3]
    graph.setVertex(0, numpy.array([2, 3]))

    #Displays edge weights
    print(graph[1, 2])
    print(graph[1, 3])

The first three lines import NumPy and the graph classes required. Next, a VertexList object is created with 5 vertices and vector labels of size 2, which are all initialised to zero. Using this VertexList object, a SparseGraph object is created with no edges. By default, the SparseGraph is an undirected graph; we will later show how to construct directed graphs.
Two edges are added to the graph: from vertex 0 to 1 with a weight of 0.1, and from 1 to 2 with a weight of 1.0. Notice that one can only add non-zero edge values, as zero indicates the absence of an edge. Notice also that edges can be referenced as if accessing a matrix directly.

Often it is more convenient to add edges in a group rather than individually, and in this case one can use the addEdges method. For example, the above code can be modified as follows to produce the same results:

    #Add some edges to the graph
    edges = numpy.array([[0, 1], [1, 2]], int)
    edgeValues = numpy.array([0.1, 1.0])
    graph.addEdges(edges, edgeValues)

    #Displays edge weight between vertices 1 and 2
    print(graph[1, 2])

The call to addEdges uses a matrix of size m × 2 as the first parameter and an array of m edge values as the second parameter. Each row of the matrix edges corresponds to an edge between two vertices, and the entry of edgeValues at the same position is the weight of that edge.

SparseGraphs are created by default using the SciPy csr_matrix class (Compressed Sparse Row matrix), which allows for fast access to the rows of the adjacency matrix. One can also create a SparseGraph using other types of sparse matrix (currently limited to lil_matrix, csr_matrix, csc_matrix and dok_matrix):

    import numpy
    import scipy.sparse as sps
    from apgl.graph.GeneralVertexList import GeneralVertexList
    from apgl.graph.SparseGraph import SparseGraph

    numVertices = 10
    vList = GeneralVertexList(numVertices)
    Wght = sps.csc_matrix((numVertices, numVertices))
    graph = SparseGraph(vList, W=Wght, undirected=False)

    graph[0, 1] = 1
    graph[0, 2] = 1
    graph.setVertex(0, "abc")
    graph.setVertex(1, 123)

    print(graph.inDegreeDistribution())

Here, we use a different type of vertex list, the GeneralVertexList class, which allows vertex labels to take any value. The sparse matrix used in the graph is a scipy.sparse.csc_matrix, which is in Compressed Sparse Column format. The final parameter used in the constructor specifies that the resulting graph is directed. Following graph construction, the first and second vertices are labelled with "abc" and 123 respectively. The final line of the example computes the in-degree distribution of the graph, which is faster when the adjacency matrix is a csc_matrix compared to the default choice of csr_matrix. In this case, however, the speed difference is negligible as the graph is very small.
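To see why column-oriented storage suits in-degree computations, note that the in-degree of vertex j is the number of non-zero entries in column j of the weight matrix, while the out-degree is the number of non-zero entries in row j. The following is a minimal plain-Python sketch of that idea on a dense nested list. The function names are hypothetical, and the return format of APGL's own inDegreeDistribution method is not assumed to match.

```python
from collections import Counter

# Illustrative sketch only; these helpers are hypothetical, not APGL methods.

def in_degrees(W):
    """In-degree of vertex j = number of non-zero entries in column j."""
    n = len(W)
    return [sum(1 for i in range(n) if W[i][j] != 0) for j in range(n)]


def out_degrees(W):
    """Out-degree of vertex i = number of non-zero entries in row i."""
    return [sum(1 for w in row if w != 0) for row in W]


def degree_distribution(degrees):
    """Entry d of the result is the number of vertices with degree d."""
    counts = Counter(degrees)
    return [counts.get(d, 0) for d in range(max(degrees, default=0) + 1)]


# Directed graph on four vertices with arcs 0 -> 1 and 0 -> 2.
n = 4
W = [[0] * n for _ in range(n)]
W[0][1] = 1
W[0][2] = 1

print(in_degrees(W))                        # [0, 1, 1, 0]
print(degree_distribution(in_degrees(W)))   # [2, 2]
print(degree_distribution(out_degrees(W)))  # [3, 0, 1]
```

In a column-compressed matrix the entries of each column are stored contiguously, so the column counts used by in_degrees can be read off without scanning every row.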
With larger graphs, the choice of weight matrix type can significantly affect the speed of graph algorithms.

SparseGraph and DenseGraph have a number of additional methods for querying and modifying the underlying graph. For example, getNumEdges() and getNumVertices() return the number of edges and vertices respectively. The neighbours(vertexId) method returns the set of neighbouring vertices for the given vertexId, and neighbourOf returns the set of vertices which have edges to vertexId for a directed graph.

3.2 Graph Properties

To study the characteristics of graphs, various properties have been proposed in the research literature. Some of the most common ones are shown below (the interested reader is directed to e.g. [1] for more precise definitions of the properties):

  clusteringCoefficient() - 3 times the number of triangles divided by the number of connected triples
  density() - The proportion of edges vs. total possible number of edges
  diameter() - Length of the longest shortest path in the graph
  effectiveDiameter(p) - A more robust alternative to the diameter
  geodesicDistance() - Mean shortest distance between all pairs of vertices
  harmonicGeodesicDistance() - Mean harmonic shortest distance between all pairs of vertices

Several of the methods above require the computation of the shortest paths between all pairs of vertices. The matrix of shortest paths P can be found using the Floyd-Warshall algorithm [2] at a computational cost of O(n^3), where n is the number of vertices in the graph. The ijth entry of P is the length of the shortest path between vertices i and j. To compute diameters and geodesic distances, one can optionally pass in a matrix P as follows:

    from apgl.graph.GeneralVertexList import GeneralVertexList
    from apgl.graph.SparseGraph import SparseGraph

    numVertices = 10
    graph = SparseGraph(GeneralVertexList(numVertices))

    graph[0, 1] = 1
    graph[0, 2] = 1

    P = graph.floydWarshall()
    print(graph.geodesicDistance(P=P))
    print(graph.harmonicGeodesicDistance(P=P))

The above example outputs both the mean geodesic distance and the harmonic mean geodesic distance using the matrix P as computed by the floydWarshall method (note that one can also use findAllDistances, which is based on Dijkstra's algorithm). As P is computed only once but used twice, this usage reduces computational cost compared with making the default calls of the distance methods.

3.3 Set Operations

By considering the edges as a set of pairs of vertices, one can perform various set operations with graphs, and some of these are listed in Table 1.
The first methods in Table 1 consider the edges in the graphs without weights, hence the resulting returned graphs contain adjacency matrices only. For the subgraph method, the graph returned contains only those vertices indexed by vinds and edges between these vertices.
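The edge-set view described above can be sketched with Python's built-in set type. This is a hedged illustration for undirected, unweighted edges only. The helper names are hypothetical, the complement is assumed to exclude self-loops, and the subgraph helper keeps the original vertex indices, which may differ from what APGL's own methods do.

```python
from itertools import combinations

# Illustrative sketch only; none of these helpers are APGL functions.

def normalise(edges):
    """Store each undirected edge as a sorted pair so (1, 0) equals (0, 1)."""
    return {tuple(sorted(edge)) for edge in edges}


def complement(edges, num_vertices):
    """Vertex pairs (no self-loops, by assumption) that are not edges."""
    all_pairs = set(combinations(range(num_vertices), 2))
    return all_pairs - edges


def subgraph(edges, vinds):
    """Keep only the edges whose two endpoints are both in vinds."""
    keep = set(vinds)
    return {(i, j) for i, j in edges if i in keep and j in keep}


g1 = normalise([(0, 1), (1, 2), (2, 3)])
g2 = normalise([(2, 1), (3, 0)])

print(sorted(g1 | g2))               # union
print(sorted(g1 & g2))               # intersection
print(sorted(g1 - g2))               # edges in g1 that are not in g2
print(sorted(complement(g1, 4)))     # edges missing from g1
print(sorted(subgraph(g1, [0, 1, 2])))
```

Because the weights play no part in these operations, each result is naturally an unweighted edge set, which matches the remark that the returned graphs contain adjacency matrices only.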

  Example Method Call    Description
  g1.union(g2)           Union between graph edges
  g1.intersect(g2)       Intersection of graph edges
  g1.setDiff(g2)         Find edges in g1 that are not in g2
  g1.complement()        Find the graph with edges which are not present in g1
  g1.subgraph(vinds)     Compute the subgraph using the selected vertices

Table 1: Methods to perform set operations using graphs g1 and g2, and vertex indices set vinds.

3.4 Input/Output

Simple file reading and writing is possible by using a pre-defined comma separated value format, which is exemplified as follows:

    Vertices
    0
    1
    2
    3
    4
    Edges
    0, 1, 1
    2, 4, 1
    4, 0, 1
    2, 2, 1

Vertex labels can only be integers and must be listed after Vertices. Next, edges are given as a sequence of triples in which the first two values are the vertices and the last value is the edge weight. If the graph is directed, one should replace Edges with Arcs. The graph corresponding to this file (saved as "test.txt") is read using the code:

    from apgl.io import SimpleGraphReader

    filename = "test.txt"
    graphReader = SimpleGraphReader()
    graph = graphReader.readFromFile(filename)

    print(graph.getAllEdges())

    #Save the edges and vertices in testgraph.zip
    graph.save("testgraph")

The graph returned from graphReader is a SparseGraph. Furthermore, the output of the print call is [[1 0] [2 2] [4 0] [4 2]]. In order to write graphs in this format, one can use the SimpleGraphWriter class. Notice that SimpleGraphReader and SimpleGraphWriter work with only the edges of the graph, i.e. vertex labels are not stored. To save complete graphs, including the vertex labels, one can use the save and load methods, which store and load the weight matrices in matrix market format.
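To make the file format concrete, the following plain-Python sketch parses it by hand. It is not the SimpleGraphReader implementation. It assumes one vertex label per line after the Vertices header and one "i, j, weight" triple per line after the Edges or Arcs header, and the function name parse_simple_graph is hypothetical.

```python
# Illustrative sketch only; the line-per-record layout is an assumption and
# parse_simple_graph is a hypothetical helper, not part of APGL.

def parse_simple_graph(text):
    """Return (vertices, edges, directed) parsed from the simple format."""
    vertices, edges = [], []
    directed = False
    section = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line == "Vertices":
            section = "vertices"
        elif line in ("Edges", "Arcs"):
            section = "edges"
            # The Arcs header marks a directed graph.
            directed = (line == "Arcs")
        elif section == "vertices":
            vertices.append(int(line))
        elif section == "edges":
            i, j, weight = (field.strip() for field in line.split(","))
            edges.append((int(i), int(j), float(weight)))
        else:
            raise ValueError("line before any section header: %r" % line)
    return vertices, edges, directed


example = """Vertices
0
1
2
3
4
Edges
0, 1, 1
2, 4, 1
4, 0, 1
2, 2, 1
"""

vertices, edges, directed = parse_simple_graph(example)
print(vertices)
print(edges)
print(directed)
```

Replacing the Edges header with Arcs in the example string makes the sketch report a directed graph, mirroring the convention described above.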

3.5 Interfacing with NetworkX and igraph

NetworkX (http://networkx.lanl.gov/) is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. In APGL there are methods to convert between NetworkX graphs and APGL ones. To convert from a PySparseGraph, SparseGraph or DenseGraph to a NetworkX undirected Graph or directed DiGraph, one can use the toNetworkX method. See the reference documentation for more details. Similarly, there is toIGraph to output igraph (http://igraph.sourceforge.net/) objects.

3.6 Creating a DictGraph

It might seem that the construction of the graph classes is restrictive, as one is required to construct an object with a fixed number of vertices, and these vertices are indexed using integers. For growing graphs with non-integer names we provide the DictGraph class, which uses a dictionary of dictionaries to store adjacencies. One can easily create and populate a DictGraph and then transfer the edges to a matrix graph:

    from apgl.graph.DictGraph import DictGraph
    from apgl.graph.SparseGraph import SparseGraph
    from apgl.graph.GeneralVertexList import GeneralVertexList

    graph = DictGraph()
    graph.addEdge("a", "b")
    graph.addEdge("a", "c")
    graph.addEdge("a", "d")

    edgeIndices = graph.getAllEdgeIndices()
    graph2 = SparseGraph(GeneralVertexList(graph.getNumVertices()))
    graph2.addEdges(edgeIndices)

The mapping between vertex names in DictGraph and those in SparseGraph can be found using graph.getAllEdgeIndices(). A useful subclass of DictGraph which restricts the input to trees is DictTree; see the reference documentation for more details.

4 Random Graph Generation

As well as creating graphs in the fashion outlined in the preceding examples, one can use a number of graph generators to produce graphs in random and non-random ways. Currently, there are 5 random graph generator types: BarabasiAlbertGenerator, ConfigModelGenerator, ErdosRenyiGenerator, KroneckerGenerator and SmallWorldGenerator. In the following code block, we show how to generate a random graph using an Erdős-Rényi [3] process:

    from apgl.graph.DenseGraph import DenseGraph
    from apgl.graph.GeneralVertexList import GeneralVertexList
    from apgl.generator.ErdosRenyiGenerator import ErdosRenyiGenerator

    numVertices = 20
    graph = DenseGraph(GeneralVertexList(numVertices))

    p = 0.2
    generator = ErdosRenyiGenerator(p)
    graph = generator.generate(graph)

For the ErdosRenyiGenerator object, the probability of an edge between any pair of vertices is set to 0.2, and edges are created independently of the other edges. Furthermore, no self edges are created. Figure 1 shows the resulting graph. Notice that ErdosRenyiGenerator uses the numpy.random module, and consequently identical random graphs can be generated by using the same numpy.random.seed value.

[Figure 1: An Erdős-Rényi graph generated using the ErdosRenyiGenerator class.]

5 Summary

We exemplified some of the main features of the APGL graph library, with the aim of familiarising the reader with the library. The basic graph types were introduced, as well as how to manipulate graphs, find graph properties, perform set operations, write to and read from files, and generate random graphs. For much more information, see the reference documentation online at http://packages.python.org/apgl/.

References

[1] M. E. J. Newman. The structure and function of complex networks. SIAM Review, 45(2):167-256, 2003.
[2] Stephen Warshall. A theorem on boolean matrices. Journal of the ACM, 9(1):11-12, 1962.
[3] Paul Erdős and Alfréd Rényi. On random graphs. Publicationes Mathematicae, 6:290-297, 1959.