Graphs over Time Densification Laws, Shrinking Diameters and Possible Explanations



Similar documents
Graph Mining Techniques for Social Media Analysis

School of Computer Science Carnegie Mellon Graph Mining, self-similarity and power laws

Graph models for the Web and the Internet. Elias Koutsoupias University of Athens and UCLA. Crete, July 2003

How To Find Out How A Graph Densifies

(B ) Empirical observation: Shrinking diameters: The effective diameter is, in many cases, actually decreasing as the network grows.

CMU SCS Large Graph Mining Patterns, Tools and Cascade analysis

Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network

Introduction to Networks and Business Intelligence

The Shape of the Network. The Shape of the Internet. Why study topology? Internet topologies. Early work. More on topologies..

The ebay Graph: How Do Online Auction Users Interact?

Some questions... Graphs

Temporal Dynamics of Scale-Free Networks

Complex Networks Analysis: Clustering Methods

USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS

arxiv:cs.dm/ v1 30 Mar 2002

The average distances in random graphs with given expected degrees

The Internet Is Like A Jellyfish

Network/Graph Theory. What is a Network? What is network theory? Graph-based representations. Friendship Network. What makes a problem graph-like?

An Incremental Super-Linear Preferential Internet Topology Model

Small-World Characteristics of Internet Topologies and Implications on Multicast Scaling

An Alternative Web Search Strategy? Abstract

Chapter 29 Scale-Free Network Topologies with Clustering Similar to Online Social Networks

General Network Analysis: Graph-theoretic. COMP572 Fall 2009

Characterizing and Modelling Clustering Features in AS-Level Internet Topology

Network Analysis. BCH 5101: Analysis of -Omics Data 1/34

Online Social Networks and Network Economics. Aris Anagnostopoulos, Online Social Networks and Network Economics

Random graphs and complex networks

A discussion of Statistical Mechanics of Complex Networks P. Part I

Inet-3.0: Internet Topology Generator

Social Network Mining

Tutorial, IEEE SERVICE 2014 Anchorage, Alaska

The Structure of Growing Social Networks

Computer Network Topologies: Models and Generation Tools

Network Theory: 80/20 Rule and Small Worlds Theory

Research Article A Comparison of Online Social Networks and Real-Life Social Networks: A Study of Sina Microblogging

Graph Mining and Social Network Analysis

Studying Graphs for Intelligence Monitoring and Analysis in the Absence of Semantic Information

Social Media Mining. Graph Essentials

Analyzing the Facebook graph?

Dmitri Krioukov CAIDA/UCSD

Effects of node buffer and capacity on network traffic

Towards Modeling Legitimate and Unsolicited Traffic Using Social Network Properties

Scaling Properties of the Internet Graph

ATM Network Performance Evaluation And Optimization Using Complex Network Theory

Small-World Internet Topologies

A dynamic model for on-line social networks

A SURVEY OF PROPERTIES AND MODELS OF ON- LINE SOCIAL NETWORKS

1 Six Degrees of Separation

Looking at the Blogosphere Topology through Different Lenses

Accelerated growth of networks

Algorithms for representing network centrality, groups and density and clustered graph representation

MINFS544: Business Network Data Analytics and Applications

On Generating Graphs with Prescribed Vertex Degrees for Complex Network Modeling 1. College of Computing Georgia Institute of Technology

Online Appendix to Social Network Formation and Strategic Interaction in Large Networks

MINING COMMUNITIES OF BLOGGERS: A CASE STUDY

On Realistic Network Topologies for Simulation

A MULTI-MODEL DOCKING EXPERIMENT OF DYNAMIC SOCIAL NETWORK SIMULATIONS ABSTRACT

The Topology of Large-Scale Engineering Problem-Solving Networks

Cost effective Outbreak Detection in Networks

Distance Degree Sequences for Network Analysis

The Large-Scale Structure of Semantic Networks: Statistical Analyses and a Model of Semantic Growth

Understanding the evolution dynamics of internet topology

Subnet Based Internet Topology Generation

Statistical Mechanics of Complex Networks

ModelingandSimulationofthe OpenSourceSoftware Community

Mobile Call Graphs: Beyond Power-Law and Lognormal Distributions

Euler Paths and Euler Circuits

Applying Social Network Analysis to the Information in CVS Repositories

A Measurement-driven Analysis of Information Propagation in the Flickr Social Network

Complex Network Visualization based on Voronoi Diagram and Smoothed-particle Hydrodynamics

A box-covering algorithm for fractal scaling in scale-free networks

Structure of a large social network

Zipf s law and the Internet

A scalable multilevel algorithm for graph clustering and community structure detection

Analysis of Internet Topologies

arxiv: v1 [cs.ne] 12 Feb 2014

Towards Modelling The Internet Topology The Interactive Growth Model

Graph Theory and Networks in Biology

Emergence of Complexity in Financial Networks

V. Adamchik 1. Graph Theory. Victor Adamchik. Fall of 2005

Efficient Crawling of Community Structures in Online Social Networks

Practical Graph Mining with R. 5. Link Analysis

Mining Social Network Graphs

Application of Social Network Analysis to Collaborative Team Formation

Sampling Biases in IP Topology Measurements

Social Networks and Social Media

Graph Theory and Complex Networks: An Introduction. Chapter 08: Computer networks

Graph Theory Approaches to Protein Interaction Data Analysis

Who Links to Whom: Mining Linkage between Web Sites

Link Prediction in Social Networks

Evaluation of a New Method for Measuring the Internet Degree Distribution: Simulation Results

Foundations of Multidimensional Network Analysis

The mathematics of networks

1. Write the number of the left-hand item next to the item on the right that corresponds to it.

Statistical mechanics of complex networks

Communication Dynamics of Blog Networks

Degree distribution of the FKP network model

Jure Leskovec Stanford University

Social and Economic Networks: Lecture 1, Networks?

Information Network or Social Network? The Structure of the Twitter Follow Graph

Transcription:

Graphs over Time Densification Laws, Shrinking Diameters and Possible Explanations Jurij Leskovec, CMU Jon Kleinberg, Cornell Christos Faloutsos, CMU 1

Introduction What can we do with graphs? What patterns or laws hold for most real-world graphs? How do the graphs evolve over time? Can we generate synthetic but realistic graphs? Needle exchange networks of drug users 2

Evolution of the Graphs How do graphs evolve over time? Conventional Wisdom: Constant average degree: the number of edges grows linearly with the number of nodes Slowly growing diameter: as the network grows the distances between nodes grow Our findings: Densification Power Law: networks are becoming denser over time Shrinking Diameter: diameter is decreasing as the network grows 3

Outline Introduction General patterns and generators Graph evolution Observations Densification Power Law Shrinking Diameters Proposed explanation Community Guided Attachment Proposed graph generation model Forest Fire Model Conclusion 4

Outline Introduction General patterns and generators Graph evolution Observations Densification Power Law Shrinking Diameters Proposed explanation Community Guided Attachment Proposed graph generation model Forest Fire Model Conclusion 5

Graph Patterns Power Law Many lowdegree nodes Few highdegree nodes Internet in December 1998 log(count) vs. log(degree) Y=a*X b 6

Graph Patterns Small-world [Watts and Strogatz],++: 6 degrees of separation Small diameter (Community structure, ) # reachable pairs Effective Diameter hops 7

Graph models: Random Graphs How can we generate a realistic graph? given the number of nodes N and edges E Random graph [Erdos & Renyi, 60s]: Pick 2 nodes at random and link them Does not obey Power laws No community structure 8

Graph models: Preferential attachment Preferential attachment [Albert & Barabasi, 99]: Add a new node, create M out-links Probability of linking a node is proportional to its degree Examples: Citations: new citations of a paper are proportional to the number it already has Rich get richer phenomena Explains power-law degree distributions But, all nodes have equal (constant) out-degree 9

Graph models: Copying model Copying model [Kleinberg, Kumar, Raghavan, Rajagopalan and Tomkins, 99]: Add a node and choose the number of edges to add Choose a random vertex and copy its links (neighbors) Generates power-law degree distributions Generates communities 10

Other Related Work Huberman and Adamic, 1999: Growth dynamics of the world wide web Kumar, Raghavan, Rajagopalan, Sivakumar and Tomkins, 1999: Stochastic models for the web graph Watts, Dodds, Newman, 2002: Identity and search in social networks Medina, Lakhina, Matta, and Byers, 2001: BRITE: An Approach to Universal Topology Generation 11

Why is all this important? Gives insight into the graph formation process: Anomaly detection abnormal behavior, evolution Predictions predicting future from the past Simulations of new algorithms Graph sampling many real world graphs are too large to deal with 12

Outline Introduction General patterns and generators Graph evolution Observations Densification Power Law Shrinking Diameters Proposed explanation Community Guided Attachment Proposed graph generation model Forest Fire Model Conclusion 13

Temporal Evolution of the Graphs N(t) nodes at time t E(t) edges at time t Suppose that N(t+1) = 2 * N(t) Q: what is your guess for E(t+1) =? 2 * E(t) A: over-doubled! But obeying the Densification Power Law 14

Temporal Evolution of the Graphs Densification Power Law networks are becoming denser over time the number of edges grows faster than the number of nodes average degree is increasing or equivalently a densification exponent 15

Graph Densification A closer look Densification Power Law Densification exponent: 1 a 2: a=1: linear growth constant out-degree (assumed in the literature so far) a=2: quadratic growth clique Let s see the real graphs! 16

Densification Physics Citations Citations among physics papers 1992: 1,293 papers, 2,717 citations 2003: 29,555 papers, 352,807 citations For each month M, create a graph of all citations up to month M E(t) 1.69 N(t) 17

Densification Patent Citations Citations among patents granted 1975 E(t) 334,000 nodes 676,000 edges 1.66 1999 2.9 million nodes 16.5 million edges Each year is a datapoint N(t) 18

Densification Autonomous Systems Graph of Internet 1997 E(t) 3,000 nodes 10,000 edges 2000 1.18 6,000 nodes 26,000 edges One graph per day N(t) 19

Densification Affiliation Network Authors linked to their publications 1992 E(t) 318 nodes 272 edges 1.15 2002 60,000 nodes 20,000 authors 38,000 papers 133,000 edges N(t) 20

Graph Densification Summary The traditional constant out-degree assumption does not hold Instead: the number of edges grows faster than the number of nodes average degree is increasing 21

Outline Introduction General patterns and generators Graph evolution Observations Densification Power Law Shrinking Diameters Proposed explanation Community Guided Attachment Proposed graph generation model Forest Fire Model Conclusion 22

Evolution of the Diameter Prior work on Power Law graphs hints at Slowly growing diameter: diameter ~ O(log N) diameter ~ O(log log N) What is happening in real data? Diameter shrinks over time As the network grows the distances between nodes slowly decrease 23

Diameter ArXiv citation graph Citations among physics papers 1992 2003 diameter One graph per year time [years] 24

Diameter Autonomous Systems Graph of Internet diameter One graph per day 1997 2000 number of nodes 25

Diameter Affiliation Network Graph of collaborations in physics authors linked to papers 10 years of data diameter time [years] 26

Diameter Patents Patent citation network 25 years of data diameter time [years] 27

Validating Diameter Conclusions There are several factors that could influence the Shrinking diameter Effective Diameter: Distance at which 90% of pairs of nodes is reachable Problem of Missing past How do we handle the citations outside the dataset? Disconnected components None of them matters 28

Outline Introduction General patterns and generators Graph evolution Observations Densification Power Law Shrinking Diameters Proposed explanation Community Guided Attachment Proposed graph generation model Forest Fire Mode Conclusion 29

Densification Possible Explanation Existing graph generation models do not capture the Densification Power Law and Shrinking diameters Can we find a simple model of local behavior, which naturally leads to observed phenomena? Yes! We present 2 models: Community Guided Attachment obeys Densification Forest Fire model obeys Densification, Shrinking diameter (and Power Law degree distribution) 30

Community structure Let s assume the community structure One expects many within-group friendships and fewer cross-group ones How hard is it to cross communities? Science University CS Math Drama Music Self-similar university community structure Arts 31

Fundamental Assumption If the cross-community linking probability of nodes at tree-distance h is scale-free We propose cross-community linking probability: where: c 1 the Difficulty constant h tree-distance 32

Densification Power Law (1) Theorem: The Community Guided Attachment leads to Densification Power Law with exponent a densification exponent b community structure branching factor c difficulty constant 33

Difficulty Constant Theorem: Gives any non-integer Densification exponent If c = 1: easy to cross communities Then: a=2, quadratic growth of edges near clique If c = b: hard to cross communities Then: a=1, linear growth of edges constant outdegree 34

Room for Improvement Community Guided Attachment explains Densification Power Law Issues: Requires explicit Community structure Does not obey Shrinking Diameters 35

Outline Introduction General patterns and generators Graph evolution Observations Densification Power Law Shrinking Diameters Proposed explanation Community Guided Attachment Proposed graph generation model Forest Fire Model Conclusion 36

Forest Fire model Wish List Want no explicit Community structure Shrinking diameters and: Rich get richer attachment process, to get heavytailed in-degrees Copying model, to lead to communities Community Guided Attachment, to produce Densification Power Law 37

Forest Fire model Intuition (1) How do authors identify references? 1. Find first paper and cite it 2. Follow a few citations, make citations 3. Continue recursively 4. From time to time use bibliographic tools (e.g. CiteSeer) and chase back-links 38

Forest Fire model Intuition (2) How do people make friends in a new environment? 1. Find first a person and make friends 2. Follow a of his friends 3. Continue recursively 4. From time to time get introduced to his friends Forest Fire model imitates exactly this process 39

Forest Fire the Model A node arrives Randomly chooses an ambassador Starts burning nodes (with probability p) and adds links to burned nodes Fire spreads recursively 40

Forest Fire in Action (1) Forest Fire generates graphs that Densify and have Shrinking Diameter E(t) densification diameter 1.21 diameter N(t) 41 N(t)

Forest Fire in Action (2) Forest Fire also generates graphs with heavy-tailed degree distribution in-degree out-degree count vs. in-degree 42 count vs. out-degree

Forest Fire model Justification Densification Power Law: Similar to Community Guided Attachment The probability of linking decays exponentially with the distance Densification Power Law Power law out-degrees: From time to time we get large fires Power law in-degrees: The fire is more likely to burn hubs 43

Forest Fire model Justification Communities: Newcomer copies neighbors links Shrinking diameter 44

Conclusion (1) We study evolution of graphs over time We discover: Densification Power Law Shrinking Diameters Propose explanation: Community Guided Attachment leads to Densification Power Law 45

Conclusion (2) Proposed Forest Fire Model uses only 2 parameters to generate realistic graphs: Heavy-tailed in- and out-degrees Densification Power Law Shrinking diameter 46

Thank you! Questions? jure@cs.cmu.edu 47

Dynamic Community Guided Attachment The community tree grows At each iteration a new level of nodes gets added New nodes create links among themselves as well as to the existing nodes in the hierarchy Based on the value of parameter c we get: a) Densification with heavy-tailed in-degrees b) Constant average degree and heavy-tailed in-degrees c) Constant in- and out-degrees But: Community Guided Attachment still does not obey the shrinking diameter property 48

Densification Power Law (1) Theorem: Community Guided Attachment random graph model, the expected out-degree of a node is proportional to 49

Forest Fire the Model 2 parameters: p forward burning probability r backward burning ratio Nodes arrive one at a time New node v attaches to a random node the ambassador Then v begins burning ambassador s neighbors: Burn X links, where X is binomially distributed Choose in-links with probability r times less than outlinks Fire spreads recursively Node v attaches to all nodes that got burned 50

Forest Fire Phase plots Exploring the Forest Fire parameter space Dense graph Increasing diameter Sparse graph Shrinking diameter 51

Forest Fire Extensions Orphans: isolated nodes that eventually get connected into the network Example: citation networks Orphans can be created in two ways: start the Forest Fire model with a group of nodes new node can create no links Diameter decreases even faster Multiple ambassadors: Example: following paper citations from different fields Faster decrease of diameter 52

Densification and Shrinking Diameter Are the Densification and Shrinking Diameter two different observations of the same phenomena? No! Forest Fire can generate: (1) Sparse graphs with increasing diameter Sparse graphs with decreasing diameter (2) Dense graphs with decreasing diameter 1 2 53