An Extensible Graph Toolkit for Mathematica
Eytan Bakshy
SI 708

Introduction & Motivation

The Extensible Graph Toolkit for Mathematica, or EGraph, is a framework for studying networks. The toolkit is a lightweight, extensible object framework that offers significant advantages over the two standard methods of representing graphs in Mathematica: rule lists and Combinatorica graphs. This tutorial demonstrates the use of the kit in the analysis and visualization of static and evolving networks.

Graph Programming with Rule Lists

Rule lists can be used with some of Mathematica's more modern graph utilities, such as adaptive algorithms for computing all-pairs shortest paths, pseudodiameters, and PageRank, and for finding strongly and weakly connected components. The objects in rule lists can easily be plotted using GraphPlot[]. Since the rules can contain arbitrary objects, text or graphics objects are automatically rendered inline as nodes. Here is an example of how one constructs a graph as a rule list, finds its weakly connected components, and uses GraphPlot to plot the data.

    << GraphUtilities`
    rulegraph " "," " 3,4 5,"a" " ","a" 3,6 7,7,7 " "," "
    WeakComponents[rulegraph]
    GraphPlot[rulegraph]
    GraphPlot[rulegraph, VertexLabeling -> True]

[Output: the weakly connected components of rulegraph, followed by two GraphPlot renderings of the graph, the second with vertex labels]
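Because a rule list is an ordinary list of Rule expressions, core list functions apply to it directly, with no package loaded. The following is a minimal sketch under that assumption; the edge list `rules` is hypothetical and is not the tutorial's rulegraph.

```mathematica
(* Hypothetical edge list mixing string and integer vertices. *)
rules = {"a" -> "b", "a" -> "c", "b" -> "c", 3 -> 4};

(* Vertex set: every distinct object appearing on either side of a rule. *)
vertices = Union[Flatten[List @@@ rules]]
(* {3, 4, "a", "b", "c"} *)

(* Out-degree of each source vertex, as {vertex, count} pairs. *)
outdegree = Tally[First /@ rules]
(* {{"a", 2}, {"b", 1}, {3, 1}} *)
```

Vertices that never appear on the left-hand side of a rule (here "c" and 4) have out-degree zero and are absent from the tally.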
Pitfalls with Combinatorica Graph objects and the EGraph Structure

Combinatorica is a package that covers many areas of discrete mathematics, including graph theory. It has a generic Graph object, which allows users to add labels and edge weights and to construct graphs progressively. It also has a number of algorithms to compute basic graph measures. Unfortunately, the structure is not well optimized for large graphs. In this section, we compare the performance characteristics of the Combinatorica Graph structure with those of the EGraph structure.

    << Combinatorica`

The following experiments show that, with the Combinatorica Graph structure, the time to retrieve a vertex's nearest neighbors scales linearly with the number of nodes in the graph (in this case, a random graph). The EGraph structure, on the other hand, uses a double-hash structure so that these operations can be carried out in constant time.

    GraphTiming = Table[{n, First[Timing[Neighborhood[#, 1, 1]]] &[RandomGraph[n, 2/n]]}, {n, 100, 1000, 100}];
    EGraphTiming = Table[{n, First[Timing[AllNeighbors[#, 1]]] &[RandomEGraph[n, 2/n]]}, {n, 100, 1000, 100}];
    ListPlot[{GraphTiming, EGraphTiming}, AxesOrigin -> {0, 0},
      PlotLabel -> "Neighborhood access time vs number of nodes",
      AxesLabel -> {"num nodes", "access time"}]

[Figure: Neighborhood access time vs number of nodes; axes: num nodes, access time]

The difference in performance is even more pronounced when one compares Combinatorica's shortest path algorithm to the one used by EGraph.
    (* gives the time to find the shortest path length for random nodes, averaged over 50 trials *)
    AveragePathLengthTiming[graph_, numnodes_, pathfunction_] :=
      Mean[Table[
        First[Timing[pathfunction[graph, RandomInteger[{1, numnodes}], RandomInteger[{1, numnodes}]]]],
        {50}]]

    GraphTiming = Table[{n, AveragePathLengthTiming[RandomGraph[n, 2/n], n, ShortestPath]}, {n, 100, 1000}];
    EGraphTiming = Table[{n, AveragePathLengthTiming[RandomEGraph[n, 2/n], n, PathLength]}, {n, 100, 1000}];
    ListPlot[{GraphTiming, EGraphTiming}, AxesOrigin -> {0, 0},
      PlotLabel -> "Shortest path calculation time vs number of nodes",
      AxesLabel -> {"num nodes", "time"}]

[Figure: Shortest path calculation time vs number of nodes; axes: num nodes, time]
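The gap between the two neighbor-access timings comes down to scanning versus hashing. The following is a minimal sketch of that idea only, not EGraph's implementation: it assumes Mathematica 10 or later (for Association, GroupBy and Lookup), and the names `edges`, `scanNeighbors`, `outIndex`, `inIndex` and `hashNeighbors` are hypothetical.

```mathematica
(* Hypothetical directed edge list. *)
edges = {1 -> 2, 1 -> 3, 2 -> 3, 4 -> 1};

(* Linear scan: every query walks the whole edge list. *)
scanNeighbors[e_List, v_] :=
  Union[Cases[e, (v -> w_) :> w], Cases[e, (w_ -> v) :> w]];

(* Two hash indexes, built once: out-neighbors and in-neighbors keyed by vertex. *)
outIndex = GroupBy[edges, First -> Last];
inIndex = GroupBy[edges, Last -> First];

(* Each query is two hash lookups, independent of the number of edges. *)
hashNeighbors[v_] := Union[Lookup[outIndex, v, {}], Lookup[inIndex, v, {}]];

scanNeighbors[edges, 1]
(* {2, 3, 4} *)
hashNeighbors[1]
(* {2, 3, 4} *)
```

Building the two indexes costs one pass over the edges, so the approach pays off when many neighbor queries are made against the same graph.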
Getting Started with the EGraph Object

The EGraph structure consists of a double-hash sparse matrix representation of the graph, with forward and backward indices of vertex labels. In most circumstances the labels will be either integers or strings, but the toolkit supports labels of any data type. Finally, the EGraph object contains an attribute store, which can hold arbitrary data about the graph, including individual node properties.

An EGraph object can be instantiated from a list of vertex label pairs, from a list of rules, or from a sparse adjacency matrix.

    ToEGraph[{{0, 1}, {1, 2}, {2, 3}, {0, 2}}]
    ToEGraph[{0 -> 1, 1 -> 2, 2 -> 3, 0 -> 2}]
    ToEGraph[SparseArray[{i_, j_} :> If[RandomReal[1.0] < 0.1, 1, 0], {4, 4}]]

    EGraph of 4 nodes
    EGraph of 4 nodes
    EGraph of 4 nodes

EGraph structures are capable of dealing with non-numerical data as well. Consider the network of trade partners in the UN.

    importlinks = Select[Flatten[Thread[Rule[CountryData[#, "ImportPartners"], #]] & /@ CountryData["UN"]], ...
    exportlinks = Select[Flatten[Thread[Rule[#, CountryData[#, "ExportPartners"]]] & /@ CountryData["UN"]], ...
    tradedata = Join[importlinks, exportlinks];
    Short[tradedata]
    tradegraph = ToEGraph[tradedata]

    {Pakistan -> Afghanistan, UnitedStates -> Afghanistan, <<2048>>, Zimbabwe -> Italy, Zimbabwe -> Germany}
    EGraph of 198 nodes

The graph structure itself is smaller than the base data, because vertex labels are stored only in the backward and forward index hashes. For larger networks, like the Yahoo! QA network, this difference can be quite significant.

    ByteCount[tradedata]
    ByteCount[tradegraph]

    87 584
    48 60

Labels can easily be extracted from the graph and used to label the output of network metrics.

    Short[VertexLabels[tradegraph]]
    Short[PageRanks[tradegraph]]

    {Afghanistan, Albania, Algeria, AmericanSamoa, Andorra, <<188>>, Venezuela, Vietnam, Yemen, Zambia, Zimbabwe}
    {Afghanistan -> 0.004284, Albania -> 0.00305773, <<195>>, Zimbabwe -> 0.00245865}

The EGraph structure can easily be converted into other commonly used representations, like matrices and ordered pairs.
    MatrixPlot[GraphMatrix[tradegraph]]
    EGraphToPairs[tradegraph] // Short

[Figure: matrix plot of the 198 x 198 adjacency matrix of the trade graph]

    {{Afghanistan, UnitedStates}, {Afghanistan, Pakistan}, <<908>>, {Zimbabwe, Italy}, {Zimbabwe, Germany}}

Here is an example of how the PageRank data can be used to visualize the ranking of countries by prestige (mouse over a point to see the country).

    ListPlot[Sort[PageRanks[tradegraph] /. Rule[country_, pr_] :> Tooltip[pr, country]],
      AxesLabel -> {"Ordinal Rank", "PageRank"}]

[Figure: PageRank plotted against ordinal rank; axes: Ordinal Rank, PageRank]

Analysis of Dynamic Network Data

We will demonstrate various ways to analyze the dynamics of network models.

The Diameter of an Evolving Graph

The first example shows how one can perform experiments with NetLogo and the EGraph structure to study properties of evolving networks. In this example, we use the modified preferential attachment model from class.

    << NetLogo`
    NLStart["/Applications/NetLogo 4.0"]
    NLLoadModel["/Users/ebakshy/netlearn/RandAndPrefNetEB.nlogo"]

To perform the experiment, we grow the graph to 5000 nodes and sample its structure every 100 model ticks. Because the behavior of the model might differ from run to run, we perform 20 realizations of the model. The resulting networks are stored in BAG[step, realization #]. (Warning: the simulation takes a considerable amount of time to run.)
    Do[
      NLCommand["set m 2", "setup"];
      Do[
        NLCommand["repeat 100 [go]"];
        BAG[i, j] = ToEGraph[NLGetGraph[], Directed -> False],
        {i, 50}],
      {j, 20}]

Here is how one would plot a simple statistic, like the degree distribution.

    ListLogLogPlot[Cumdist[OutDegrees[BAG[50, 1]]], AxesLabel -> {"Degree", "Count"}, PlotLabel -> "Degree Distribution"]

[Figure: Degree Distribution, log-log plot; axes: Degree, Count]

To plot the effective diameter of the graph over time, we average the results over all realizations at each time step.

    AvgDiameter[i_] := Mean[Table[First[First[PseudoDiameter[BAG[i, j]]]], {j, 20}]]
    ListLinePlot[Table[{NumNodes[BAG[i, 1]], AvgDiameter[i]}, {i, 50}], AxesLabel -> {"Num Nodes", "Pseudo Diameter"}]

[Figure: average pseudo-diameter against number of nodes; axes: Num Nodes, Pseudo Diameter]

The Spread of Opinions in a Network

In this example, we generate a network and observe the diffusion of opinions on it. The structure of the network is then used to generate a two-dimensional representation of agent influence over time.

    NLLoadModel["/Users/ebakshy/netlearn/PrefDiffusionEBOpinion.nlogo"]
    NLCommand["set num-nodes 300", "set gamma 0.3", "set p 0.4", "set sorting? true", "generate-topology"]
    NLCommand["setup"]
    opgraph = ToEGraph[NLGetGraph[], Directed -> False]
    (* this part is ugly but will be more straightforward in future versions *)
    opseries = Apply[Rule, NLDoReport["repeat 2 [spread]", "[list who pro?] of turtles", 200], {2}];
    agentstates = Map[N[VertexLabels[opgraph]] /. # &, opseries];
    SetAttribute[opgraph, "opinions", agentstates]

One way of tracking people's opinions over time is to represent each agent as a row in an ArrayPlot, with each column representing the agent's opinion at time t.
    OpinionPlot[opinions_] := ArrayPlot[opinions /. {True -> Red, False -> Lighter[Pink, 0.5]}, AspectRatio -> 1]

Here are the opinions of all agents over time, with the agents listed in random order. In this case, the spatial ordering of agents in no way reflects the structure of interaction among them.

    OpinionPlot[GetAttribute[opgraph, "opinions"]]

[Figure: opinion array plot with agents in random order]

Here is what the network looks like, using a ball-and-stick plot and a matrix plot.
    GraphPlot[GraphMatrix[opgraph]]
    MatrixPlot[GraphMatrix[opgraph]]

[Figure: ball-and-stick plot and matrix plot of the opinion network]

We can compute the shortest path between all pairs of nodes and use hierarchical clustering to resolve community structure.

    << HierarchicalClustering`
    gdm = GraphDistanceMatrix[GraphMatrix[opgraph]];
    gdordering = ClusterFlatten[DirectAgglomerate[gdm]];
    GraphicsRow[{
      MatrixPlot[gdm, PlotLabel -> "Vertex Distance Matrix (unordered)"],
      MatrixPlot[gdm[[gdordering, gdordering]], PlotLabel -> "Vertex Distance Matrix (clustered)"]}]

[Figure: two matrix plots side by side, "Vertex Distance Matrix (unordered)" and "Vertex Distance Matrix (clustered)"]

This new ordering can be used to visualize agent states over time so that agents' spatial proximity in the plot reflects their structural proximity in the network.

    OpinionPlot[GetAttribute[opgraph, "opinions"][[gdordering]]]

[Figure: opinion array plot with agents in clustered order]

Network Visualization

Fisher-Smith Egonet Plots

The kit has built-in support for "Fisher" plots, which have been used in our work on Yahoo! Answers. Here is one example using the previous data set.
    FisherSample[tradegraph, 20]

[Figure: Fisher plot of 20 sampled egonets from the trade graph]

Using Attribute Stores for Enhancing Network Visualizations

This part is a bit experimental, but it is a good demonstration of how the attribute object can be used to enhance network visualizations.

    importlinks = Select[Flatten[Thread[Rule[CountryData[#, "ImportPartners"], #]] & /@ CountryData["MiddleEast"]], ...
    exportlinks = Select[Flatten[Thread[Rule[#, CountryData[#, "ExportPartners"]]] & /@ CountryData["MiddleEast"]], ...
    mideastgraph = ToEGraph[Join[importlinks, exportlinks]]

We assign each node a shape attribute.

    SetAttribute[mideastgraph, "shape", Tooltip[CountryData[#, "Shape"], #] & /@ VertexLabels[mideastgraph]];
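    GraphPlot[GraphMatrix[mideastgraph],
      VertexRenderingFunction -> (Inset[GetAttribute[mideastgraph, "shape"][[#2]], #1, Automatic, {0.5, 0.5}] &)]

[Figure: graph plot of the Middle East trade network with each vertex drawn as the country's shape]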
Supporting Code

    << GraphUtilities`;

    Format[EGraph[sa_SparseArray, labels_, labelrules_, attributes_]] :=
      "EGraph of " <> ToString[Length[labels]] <> " nodes" <> " ";

    (* accessors for the four parts of an EGraph *)
    GraphMatrix[g_EGraph] := g[[1]]
    VertexLabels[g_EGraph] := g[[2]]
    VertexLabelToIndex[g_EGraph] := g[[3]]
    AttributeObject[g_EGraph] := g[[4]]

    (* PairIntersect[pairs, baselist] is a fast algorithm for reducing a list of ordered pairs to a base set;
       it returns every {a, b} in pairs for which both a and b are in baselist *)
    PairIntersect[pairs_, baselist_] := Module[{membershiprules},
      membershiprules = Dispatch[Thread[Rule[baselist, True]]];
      DeleteCases[Pick[pairs, pairs /. membershiprules, {True, True}], {}]]

    Options[ToEGraph] = {VertexLabeling -> True, Directed -> True};
    Clear[ToEGraph]

    (* rule lists are converted to ordered pairs *)
    ToEGraph[g : {Rule[_, _] ..}, opts : OptionsPattern[]] := ToEGraph[List @@@ g, opts]

    ToEGraph[g : {{_, _} ..}, opts : OptionsPattern[]] :=
      Module[{allvertexlabels, labeltoindex, gtally, storedvertexlabels, storedlabeltoindex, arrayrules, graphmatrix},
        allvertexlabels = Union[Flatten[g]];
        labeltoindex = Dispatch[MapIndexed[Rule[#1, First[#2]] &, allvertexlabels]];
        gtally = Tally[g /. labeltoindex];
        storedvertexlabels = If[OptionValue[VertexLabeling] == True, allvertexlabels, Range[Length[allvertexlabels]]];
        storedlabeltoindex = If[OptionValue[VertexLabeling] == True, labeltoindex, Rule[i_, i]];
        arrayrules = gtally /. {{in_Integer, out_Integer}, count_Integer} :> Rule[{in, out}, count];
        If[OptionValue[Directed] == False,
          arrayrules = Join[arrayrules, arrayrules /. Rule[{i_, j_}, c_] :> Rule[{j, i}, c]]];
        graphmatrix = SparseArray[arrayrules, {Length[allvertexlabels], Length[allvertexlabels]}, 0];
        EGraph[graphmatrix, storedvertexlabels, storedlabeltoindex, Unique[attributeobject]]]

    ToEGraph[m_SparseArray, opts : OptionsPattern[]] :=
      Module[{allvertexlabels, labeltoindex, graphmatrix},
        allvertexlabels = Range[Length[m]];
        labeltoindex = i_ -> i;
        graphmatrix = If[OptionValue[Directed] == False,
          Map[If[# > 0, 1, 0] &, m + Transpose[m], {2}], (* make symmetric *)
          m];
        EGraph[graphmatrix, allvertexlabels, labeltoindex, Unique[attributeobject]]]

    EGraphToPairs[G_EGraph] :=
      First /@ Most[ArrayRules[GraphMatrix[G]]] /. {a_Integer, b_Integer} :> VertexLabels[G][[{a, b}]]

    Clear[RandomEGraph]
    (* the comparison between i and j was lost in extraction; i != j (no self-loops) is assumed *)
    RandomEGraph[n_, p_] := Module[{m},
      ToEGraph[SparseArray[{i_, j_} :> If[RandomReal[1.0] < p, 1, 0] /; i != j, {n, n}]]]

    (* sets arbitrary attribute *)
    SetAttribute[G_EGraph, attr_String, data_] := AttributeObject[G][attr] = data;

    (* sets node attributes by encoding data as vertex label -> data rules *)
    SetAttribute[G_EGraph, attr_String, data : {List[_, _] ..}, opts : OptionsPattern[]] :=
      Module[{datadispatch},
        datadispatch = Dispatch[Rule @@@ data];
        SetAttribute[G, attr, Table[vertex /. datadispatch, {vertex, VertexLabels[G]}]]]

    GetAttribute[G_EGraph, attr_String] := AttributeObject[G][attr]

    NumNodes[G_EGraph] := Length[GraphMatrix[G]]
    (* rows of the matrix are source vertices, so column sums give in-degrees *)
    InDegrees[G_EGraph] := Total[GraphMatrix[G]]
    OutDegrees[G_EGraph] := Total[Transpose[GraphMatrix[G]]]

    (* neighbors by integer index *)
    OutNeighbors[G_EGraph, i_Integer] :=
      Cases[ArrayRules[GraphMatrix[G][[i]]], Rule[{node_Integer}, val_Integer] :> node /; val > 0]
    InNeighbors[G_EGraph, i_Integer] :=
      Cases[ArrayRules[GraphMatrix[G][[All, i]]], Rule[{node_Integer}, val_Integer] :> node /; val > 0]

    (* neighbors by vertex label *)
    OutNeighbors[G_EGraph, i_] := VertexLabels[G][[OutNeighbors[G, i /. VertexLabelToIndex[G]]]]
    InNeighbors[G_EGraph, i_] := VertexLabels[G][[InNeighbors[G, i /. VertexLabelToIndex[G]]]]

    AllNeighbors[G_EGraph, i_] := Join[InNeighbors[G, i], OutNeighbors[G, i]]
    AllNeighbors2[G_EGraph, i_] := Union[Flatten[AllNeighbors[G, #] & /@ AllNeighbors[G, i]]]

    (* ordered pairs of node i and its in or out neighbors *)
    InNeighborPairs[G_EGraph, i_] := Thread[{InNeighbors[G, i], i}]
    OutNeighborPairs[G_EGraph, i_] := Thread[{i, OutNeighbors[G, i]}]
    AllNeighborPairs[G_EGraph, i_] := Join[OutNeighborPairs[G, i], InNeighborPairs[G, i]]

    (* all second nearest neighbors *)
    AllNeighbors2Pairs[G_EGraph, i_] := Module[{myneighbors},
      myneighbors = AllNeighbors[G, i];
      Union[Join[AllNeighborPairs[G, i], Flatten[Map[AllNeighborPairs[G, #] &, myneighbors], 1]]]]

    (* egonet is all nearest neighbors of i and the links between all nearest neighbors *)
    EgoNetPairs[G_EGraph, i_] :=
      Join[AllNeighborPairs[G, i], PairIntersect[AllNeighbors2Pairs[G, i], AllNeighbors[G, i]]]

    AllNeighbors2Graph[G_EGraph, i_] :=
      ToEGraph[AllNeighbors2Pairs[G, i] /. {a_Integer, b_Integer} :> VertexLabels[G][[{a, b}]]]
    EgoNetGraph[G_EGraph, i_] :=
      ToEGraph[EgoNetPairs[G, i] /. {a_Integer, b_Integer} :> VertexLabels[G][[{a, b}]]]

    Unprotect[StrongComponents, WeakComponents, PseudoDiameter, PageRankVector,
      CommunityStructureAssignment, CommunityStructurePartition, CommunityModularity,
      PathLengthMatrix, PathLengthMatrix];

    StrongComponents[G_EGraph] := StrongComponents[GraphMatrix[G]]
    PageRankVector[G_EGraph, opts___] := PageRankVector[GraphMatrix[G], opts]
    PageRanks[G_EGraph, opts___] := Thread[VertexLabels[G] -> PageRankVector[G, opts]]
    WeakComponents[G_EGraph] := WeakComponents[GraphMatrix[G]]
    PseudoDiameter[G_EGraph, opts___] := PseudoDiameter[GraphMatrix[G], opts]
    CommunityStructureAssignment[G_EGraph, opts___] := CommunityStructureAssignment[EGraphToPairs[G]]
    CommunityStructurePartition[G_EGraph, opts___] := CommunityStructurePartition[EGraphToPairs[G]]
    CommunityModularity[G_EGraph, part_List, opts___] := CommunityModularity[EGraphToPairs[G], part, opts]
    PathLengthMatrix[G_EGraph] := GraphDistanceMatrix[GraphMatrix[G]]
    PathLength[G_EGraph, i_Integer, j_Integer] := GraphDistance[GraphMatrix[G], i, j]
    PathLength[G_EGraph, i_, j_] :=
      GraphDistance[GraphMatrix[G], i /. VertexLabelToIndex[G], j /. VertexLabelToIndex[G]]
    PathLength[G_EGraph, {i_, j_}] := PathLength[G, i, j]

    Protect[StrongComponents, WeakComponents, PseudoDiameter, PageRankVector,
      CommunityStructureAssignment, CommunityStructurePartition, CommunityModularity,
      GraphDistanceMatrix, GraphDistance];

    (* cumulative binning for doing log plots *)
    Cumdist[l_] := Module[{valuetally},
      (* sorted in descending order by value *)
      valuetally = Reverse[SortBy[Tally[l], First]];
      (* pair each value with the number of values greater than or equal to it *)
      Transpose[{First /@ valuetally, Accumulate[Last /@ valuetally]}]]

    Clear[FisherSample]

    (* replaces every vertex in a list of pairs with a random real, so that sampled egonets do not share vertices *)
    RandomizePairs[pairs_] := Module[{randlist},
      randlist = Map[Rule[#, RandomReal[]] &, Union[Flatten[pairs]]];
      pairs /. randlist]

    FisherSample[G_EGraph, n_, opts___] := Module[{thegraph},
      thegraph = Flatten[Table[
        Rule @@@ RandomizePairs[EgoNetPairs[G, RandomInteger[{1, Length[VertexLabels[G]]}]]],
        {n}]];
      GraphPlot[thegraph, DirectedEdges -> True,
        EdgeRenderingFunction -> ({Black, Arrowheads[0.01], Arrow[#1, 0.01]} &),
        VertexLabeling -> False, opts]]
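The cumulative binning performed by Cumdist can be checked by hand on a short list. The sketch below restates the same three steps (tally, sort descending by value, accumulate the counts) under the hypothetical name `cumdistSketch`, applied to a made-up degree sequence; it is an illustration and not a replacement for the toolkit's definition.

```mathematica
(* Stand-alone restatement of the cumulative tally; cumdistSketch is a hypothetical name. *)
cumdistSketch[l_List] := Module[{valuetally},
  (* {value, count} pairs, largest value first *)
  valuetally = Reverse[SortBy[Tally[l], First]];
  (* running total of counts: how many entries are >= each value *)
  Transpose[{First /@ valuetally, Accumulate[Last /@ valuetally]}]];

degrees = {1, 1, 2, 3, 3, 3};  (* made-up degree sequence *)
cumdistSketch[degrees]
(* {{3, 3}, {2, 4}, {1, 6}} *)
```

Each pair reads as {degree, number of nodes with at least that degree}, which is the quantity plotted on log-log axes in the degree distribution example.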