Towards Modelling The Internet Topology The Interactive Growth Model Shi Zhou (member of IEEE & IEE) Department of Electronic Engineering Queen Mary, University of London Mile End Road, London, E1 4NS United Kingdom Email: shi.zhou@elec.qmul.ac.uk Tel: +44-2078825334 Fax: +44-2078827997 Raul J. Mondragon Department of Electronic Engineering Queen Mary, University of London Mile End Road, London, E1 4NS United Kingdom Email: r.j.mondragon@elec.qmul.ac.uk Tel: +44-2078825358 Fax: +44-2078827997 Abstract The Internet topology at the autonomous systems level (AS graph) has a powerlaw degree distribution. Recently it was reported that the AS graph shows a rich-club phenomenon, in which rich nodes (a small number of nodes with large numbers of links) are very well connected to each other and rich nodes are connected preferentially to the other rich nodes. The rich-club phenomenon precisely describes and quantifies the topological structure of the AS graph. In this paper we introduce the Interactive Growth (IG) model. We show that the simple and dynamic IG model compares favourably with other Internet power-law topology generators in terms of the degree distribution and the rich-club phenomenon. Index Terms Internet, topology, modeling, networks. I. INTRODUCTION Faloutsos et al [1] discovered that the Internet topology at the autonomous systems level (AS graph) has a power-law degree distribution. Power law topologies have a small number of nodes having large numbers of links. We call these nodes rich nodes. Section II revisits a recent discovery [8] [9] that the AS graph shows a rich-club phenomenon, in which rich nodes are very well connected to each other and rich nodes are connected preferentially to the other rich nodes. The rich-club phenomenon precisely describes and quantifies the topological structure of the AS 1. Shi Zhou is the corresponding author. 2. This research is supported by the U.K. Engineering and Physical Sciences Research Council (EPSRC) under grant no. GR-R30136-01. 1
graph. The rich-club phenomenon is relevant because links between rich nodes are crucial for network properties. Section III reviews several Internet power-law topology generators, such as the Inet-3.0 model [3], the Barabási-Albert (BA) model [4] and the Generalized Linear Preference (GLP) model [6]. In section IV, we introduce the Interactive Growth (IG) model. In section V, we compare the IG model and the other models against the AS graph by examining the details of degree distribution and the rich-club phenomenon. We show that the IG model produces network topologies that resemble the AS graph better than the other models. Finally Section VI concludes the paper with some discussions. II. RICH-CLUB PHENOMENON The fact that rich nodes of power law topologies are very well connected to each other and rich nodes are connected preferentially to the other rich nodes is called the rich-club phenomenon. It is reported [8] that the AS graph shows the rich-club phenomenon. Recently [9] the rich-club phenomenon was measured in the so-called original maps and extended maps of the AS graph. The original maps are based on the BGP routing tables collected by the Oregon route server. The extended map use additional data sources, such as the Looking Glass (LG) data and the Internet Routing Registry (IRR) data. The two maps have similar numbers of nodes, while the extended maps have 40% more links than the original maps. The research shows the majority of the missing links in the original maps are connecting links between rich nodes of the extended maps, and therefore, the extended map shows the rich-club phenomenon significantly stronger than the original map. The rich-club phenomenon of the AS graph is coherent with the study of Tauro et al [10] and Subramanian et al [10]. The later scrutinized the commercial relationships between the AS graph and found that the best-connected ASes are well connected to each other. The rich-club phenomenon precisely describes and quantifies the topological structure of the AS graph. 2
The rich-club phenomenon is relevant because links between rich nodes are important for network properties. In this paper we use the rich-club phenomenon as an metric to distinguish between power-law topology generators. III. INTERNET POWER-LAW TOPOLOGY MODELS Since the discovery of power law degree distribution [1] in the Internet topology, researchers have introduced several Internet models to reproduce this property. A. Inet-3.0 model The Inet-3.0 model [3] is based on measurements of the original maps of the AS graph. This model is important because it is capable of creating networks with accurate details of degree distributions when compared to the measurements. Networks generated by the Inet-3.0 model have fewer links than the AS graph. B. Barabási-Albert (BA) model The BA model [4] shows that a power-law degree distribution can arise from two generic mechanisms: 1) growth, where networks expand continuously by the addition of new nodes, and 2) preferential attachment, where new nodes are attached preferentially to nodes that are already well connected. Let Π(i) denote the probability that a new node will be connected to node i, the linear preference of the BA model is defined as, Π( i ) = ( k i ) / ( k ) (1) where k i is the degree of node i. This model has created great interests in various research areas, and has been applied to the research on error and attack tolerance of the Internet [5]. j j C. Generalized Linear Preference (GLP) model The GLP model [5] has been recently introduced. This model is a modification of the BA model. It reflects the fact that the evolution of the AS graph is mostly due to two operations, the addition of new nodes and the addition of new links between existing nodes. It starts with m 0 nodes connected 3
through m 0-1 links. At each time-step, one of the following two operations is performed: 1) with probability ρ [0,1], m<m 0 new links are added between existing nodes, and 2) with probability 1- ρ, a new node is added and connected to m existing nodes. The GLP model uses the generalized linear preference to choose node i, Π( i ) = ( k i β ) / ( k β ) (2) j where parameter β (,1) can be adjusted such that nodes have a stronger preference of being connected to high degree nodes than predicted by Equation (1). This model matches the AS graph (original map measured in Sept. 2000) in terms of the two properties of the small world [7], the characteristic path length and the clustering coefficient. j IV. INTERACTIVE GROWTH (IG) MODEL The Interactive Growth (IG) model is a modification of the BA model. It also reflects the two operations that account for the evolution of the AS graph, the addition of new nodes and the addition of new links. As shown in Fig. 1, in the GLP model, new links are added between existing nodes, and the addition of new nodes and the addition of new links are two independent operations. Whereas the two operations are related in the IG model. When a new node is connected to existing nodes (host Existing network New li nk peer peer peer host host host New node The GLP model New node A The IG model New Node B Fig. 1 Addition of new nodes and new links in the IG model and the GLP model 4
nodes), new links will connect the host nodes to other existing nodes (peer nodes). During the evolution of the AS graph, new nodes bring new traffic load to its host nodes. It is the increased traffic load that triggers the addition of new links that connect the host nodes to peer nodes in order to balance the traffic load and optimize the network performance. The IG model reflects the joint growth of new nodes and new links. We call this dynamic mechanism the interactive growth. The IG model uses the linear preference of the BA model. Due to the interactive growth, host nodes are not only connected to new nodes but also acquire new links connecting to peer nodes. Their degrees increase at a higher rate than predicted in the BA model. As a consequence, rich nodes of the IG model have significantly higher degrees than those of the BA model. V. MODEL VALIDATION We generate network topologies with the IG model, the GLP model, the Inet-3.0 model and the BA model. The GLP model(1) is grown with parameters of ρ = 0.66, m = 1 (m 0 = 10) and β = 0.6447, which is suggested by its authors. The GLP model(2) is grown with the same parameters except β = 0, TABLE I NETWORK PROPERTIES PROPERTIES AS GRAPH IG MODEL GLP (1) GLP (2) INET-3.0 BA MODEL N 11461 11461 11461 11461 11461 11461 L 32730 34363 34363 34363 24171 34363 k average 5.72 6 6 6 4.22 6 k max 2432 842 517 524 2010 329 P(1) 28.9% 26% 68.4% 52% 40% 0% P(2) 40.3% 33.8% 11.3% 16.3% 36.7% 0% P(3) 11.6% 10.5% 5.1% 7.9% 8.24% 40% N total number of nodes. L total number of links. k average average degree. k max maximum degree. P(k) degree distribution, which is the percentage of node with degree k. 5
10 0 10 4 P(k) 10-1 10-2 GLP model (1) GLP model (2) BA model AS graph Inet 3.0 model IG model k 10 3 10 2 GLP model (1) GLP model (2) BA model AS graph Inet 3.0 model IG model 10-3 10-4 10 1 10-5 10 0 10 1 10 2 10 3 10 4 k Fig. 2 Degree distribution 10 0 10 0 10 1 10 2 10 3 10 4 10 5 r Fig. 3 Node degree k against node rank r which makes its generalized linear preference equivalent to the linear preference of the BA model. The IG model is grown with parameters to match the details of degree distribution of the AS graph. It starts with a random graph of 10 nodes and 10 links. At each time-step, 1) with 40% probability, a new node (as new node A shown in Fig. 1) is connected to one host node and the host node is connected to two peer nodes; and 2) with 60% probability, a new node (as new node B shown in Fig. 1) is connected to two host nodes and one of the host nodes is connected to one peer node. All models have the same number of nodes of the AS graph (an extended map measured on 26 th May, 2001) as shown in Table I. We validate and compare these models with the AS graph by examining the degree distribution and rich-club phenomenon. A. Degree distribution Degree distribution P(k) is the percentage of node with degree k. It is reported [2] that, as shown in Fig. 2, the degree distributions of the AS graph deviates significantly from a strict power law. To take this into consideration, we study the details of degree distribution by examining the low-range degree distribution (degree less than 3), the high-range degree distribution (top 1000 rich nodes) and the maximum degree, which is the largest number of links a node has. 6
1) Low-range degree distribution The low-range degree distribution of the AS graph is important because nodes with degree one and two account for more than 70% of the total nodes. It is interesting to notice that, in the AS graph, the percentage of nodes with degree one is actually smaller than the percentage of nodes with degree two. Fig. 2 and Table I show that the IG model and the Inet-3.0 model closely match the low-range degree distribution of the AS graph. Only in the IG model, P(1) is smaller than P(2). Other models do not resemble this property. For example, in the GLP model (1), P(1)=68.4% is significantly higher than P(2)=11.3%. 2) High-range degree distribution The AS graph has a number of rich nodes with significantly high degrees [2] and these rich nodes play a dominant role in the topology. Fig. 3 is a plot of node degree k against node rank r, where r is the rank of a node on a list sorted in a decreasing order of node degree. Fig. 3 shows that the high-range degree distribution ( r < 10 3 ) of the GLP model(1) and (2) significantly deviate from a strict power law. And the degree distribution (10 1 < r < 10 3 ) of the Inet-3.0 model significantly deviates from that of the AS graph. In general, the IG model matches the high-rang degree distribution of the AS graph more accurately than other models. 3) Maximum degree The AS graph features a large value of maximum degree (2432). The GLP model(1) uses the generalized linear preference with β = 0.6447 and the GLP model(2) uses the linear preference. Table I shows that their maximum degrees are similar. This suggests that, although the generalized linear preference can make nodes have a stronger preference of being connected to high degree nodes, it does not effectively increase the maximum degree of the model. The IG model uses the linear preference of the BA model. But due to the interactive growth, its 7
maximum degree is significantly higher than that of the BA model and the two GLP models. B. Rich-club phenomenon We compare the rich-club phenomenon by examining the rich-club coefficient and the connectivity of rich nodes. 1) Rich-club coefficient To measure how well rich nodes are 100% 10% ϕ (r) 1% 01% AS graph IG model GLP model (1) GLP model (2) Inet_3.0 model BA model connected to each other, we defined the richclub coefficient. The maximum possible number of links that m nodes can have is 0.01% 0.1% 1% 10% 100% r Fig. 4 Rich-club coefficientϕ (r) against node rank r m(m-1)/2. The rich-club coefficient ϕ (r) is defined as the ratio of the actual number of links over the maximum possible number of links between nodes with node rank less than r, where r is normalized by the total number of nodes. Fig. 4 is a plot of the rich-club coefficient ϕ (r) against node rank r on a logarithmic scale. The plot shows that only the IG model closely matches the rich-club coefficient of the AS graph. The rich-club coefficients of the Inet-3.0 model and the BA model are significantly lower than that of the AS graph. This means that rich nodes in these two models are not as well connected to each other as in the AS graph. On the other hand, the rich-club coefficients of the two GLP models are significantly higher than that of the AS graph. This means the rich nodes in these two models are better connected to each other than in the AS graph. To be specific, the AS graph and the IG model have ϕ (1%)=32%, which indicates that the top 1% rich nodes have 32% of the maximum possible number of links, comparing with ϕ (1%)=18% of the Inet-3.0 model, ϕ (1%)=5% of the BA model, ϕ (1%)=72% of the GLP model(1) and ϕ (1%)=50% of the GLP model(2). 8
TABLE II LINK DISTRIBUTION LINK AS GRAPH IG MODEL GLP (1) GLP (2) INET-3.0 2) Connectivity of rich nodes We define l(r i, r j ) as the number of link connecting node i to node j, where node rank r is normalized by the total number of nodes and r i < r j. Table II shows that the majority of links are the links connecting with the top 5% rich node, l(r i <5%, r j ). Fig. 5 illustrates the detail of the connectivity of rich nodes by plotting l(r i <5%, BA MODEL L 32730 34363 34363 34363 24171 34363 l ( r i < 5%, r j ) 29602 26422 32376 29073 22620 15687 l ( r i < 5%, r j <5%) 8919 7806 16210 11540 3697 1511 l (r i < 5%, r j ) number of links connecting with the top 5% rich nodes. l (r i < 5%, r j < 5%) number of links connecting between the top 5% rich nodes. r j ) against corresponding node rank r j, where r j is divided into 5% bins. The AS graph shows a rich-club phenomenon, in which rich nodes are connected preferentially to the other rich nodes. The number of links connecting the top 5% rich nodes to each other ( l(r i <5%, r j <5%) = 8919) is significantly larger than the number l(r i <5%, r j ) GLP model (1) GLP model (2) 18000 AS graph 16000 14000 IG model 12000 Inet-3.0 model 10000 BA model 8000 6000 4000 2000 0 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% Fig. 5 Number of link l( r i < 5%, r j ) against rank r j r j of links connecting the rich nodes to other nodes of lower degrees. The Inet-3.0 model does not show this phenomenon as strong as the AS graph. The BA model does not show this phenomenon at all, in stead, rich nodes are connected to nodes of all degrees with similar probabilities. Whereas the two GLP models show this phenomenon significantly stronger than the AS graph. In the GLP model(1), l(r i <5%, r j <5%) = 16210. As seen from Fig. 4, Table II and Fig. 5, only the IG model closely reproduce the rich-club phenomenon of the AS graph. 9
VI. CONCLUSIONS AND DISCUSSIONS A. The details of the degree distribution of the AS graph The degree distribution of the AS graph deviates from a strict power law. It is important to match the details of the degree distribution. For example, the percentage of nodes with degree one is in fact smaller than the percentage of nodes with degree two. A few of the richest nodes of the AS graph have significantly higher degrees than some models. B. The rich-club phenomenon of the AS graph Tangmunarunkit et al [12] reported that degree-based network topology generators not only match the degree distribution but also match the hierarchy in real networks. We show that networks generated by different degree-based models can show significantly different levels of the rich-club phenomenon even they obey similar power-law degree distributions. The rich-club phenomenon is relevant because links between rich nodes are important for network performance and redundancy. We believe that the rich-club phenomenon is an effective metric to distinguish between power-law topology models. C. Inet-3.0 model The Inet-3.0 model is not a dynamic model. It is based on measurements of the original maps of the AS graph. Networks generated by the Inet-3.0 model have fewer links than the AS graph. Althought this model resembles many details of the degree distribution of (the original maps of) the AS graph, our results show that the majority of the missing links in the Inet-3.0 model are connecting links between rich nodes of the AS graph, and as a result, the Inet-3.0 model does not show the rich-club phenomenon as strong as the AS graph. D. BA model The BA model generates a strict power law degree distribution, which is very different from that of the actual AS graph. Furthermore, it does not show the rich-club phenomenon at all. This means 10
that the network structure of the BA model is fundamentally different from that of the AS graph. The reason for this is that all new links connect between new nodes and existing nodes. The probability for a new node to become a rich node decreases as the model grows. After the initial growth, the number of links between rich nodes rarely increases. At the end, rich nodes are not well connected to each other and rich nodes are connected to all nodes with similar probabilities regardless of the node degree. E. GLP model The GLP model does not reproduce the details of the degree distribution of the AS graph. Moreover, the rich-club phenomenon obtained from the GLP model is significantly stronger than the AS graph. F. IG model The IG model closely matches the details of the degree distribution and the rich-club phenomenon of the AS graph. This dynamic model is simple and the topological properties of the networks, generated by the model, are given by the joint growth of new nodes and new links. The IG model compares favourably with other power-law topology models. We believe that the IG model is a good step towards modelling the Internet topology. VII. REFERENCES [1] M. Faloutsos, P. Faloutsos, and C. Faloutsos, On Power-Law Relationship of the Internet Topology, Proc. of ACM SIGCOMM 99, page 251-262, Aug. 1999. [2] Q. Chen, H. Chang, R. Govindan, S. Jamin, S. J. Shenker and W. Willinger, The Origin of Power Laws in Internet Topologies (Revisited), Proc. of IEEE / INFOCOM 02, Dec. 2002. [3] J. Winick, and S. Jamin, Inet-3.0: Internet Topology Generator, Tech Report UM-CSE-TR- 456-02, University of Michigan, 2002. 11
[4] A.-L. Barabási and R. Albert, Emergence of Scaling in Random Networks, SCIENCE, vol. 286, pp. 509-512, Oct. 1999. [5] R. Albert, H. Jeong, and A.-L. Barabási, Error and attack tolerance of complex networks, NATURE, vol. 406, pp. 378-381, July 2000. [6] T. Bu and D. Towsley, On Distinguishing between Internet Power Law Topology Generators, Proc. of IEEE / INFOCOM 02, Dec. 2002. [7] D. J. Watts and S. H. Strogatz Collective dynamics of small world networks, NATURE, vol. 393, pp. 440-442, 1998. [8] S. Zhou and R. J. Mondragon, The rich-club phenomenon in the AS-level Internet topology, IEEE Communications Letters (submitted), Nov. 2002. [9] S. Zhou and R. J. Mondragon, The missing links in the BGP-based AS connectivity maps, NLANR / PAM2003 (in press), Dec. 2002. [10] L. Subramanian, S. Agarwal, J. Rexford and R. H. Katz, Characterizing the Internet Hierarchy from Multiple Vantage Points, Proc. of INFOCOM 2002. [11] S. L. Tauro, C. Palmer, G. Siganos and M. Faloutsos, A Simple Conceptual Model for the Internet Topology, Proc. of Global Internet 2001. [12] H. Tangmunarunkit, R. Govindan, S. Jamin, S. Shenker and W. Willinger, "Network topology generators: Degree-based vs. structural, Proc. of ACM/SIGCOMM 2002. 12