Distance Degree Sequences for Network Analysis



Similar documents
Graph Theory and Complex Networks: An Introduction. Chapter 06: Network analysis

Asking Hard Graph Questions. Paul Burkhardt. February 3, 2014

Graph Theory and Complex Networks: An Introduction. Chapter 06: Network analysis. Contents. Introduction. Maarten van Steen. Version: April 28, 2014

Analysis of Algorithms, I

Mining Social Network Graphs

Social Media Mining. Graph Essentials

MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research

Distributed Computing over Communication Networks: Topology. (with an excursion to P2P)

The Goldberg Rao Algorithm for the Maximum Flow Problem

System Interconnect Architectures. Goals and Analysis. Network Properties and Routing. Terminology - 2. Terminology - 1

Distributed Computing over Communication Networks: Maximal Independent Set

Graph Mining Techniques for Social Media Analysis

Graph Mining and Social Network Analysis

Graph models for the Web and the Internet. Elias Koutsoupias University of Athens and UCLA. Crete, July 2003

How To Understand The Network Of A Network

A discussion of Statistical Mechanics of Complex Networks P. Part I

Evaluation of a New Method for Measuring the Internet Degree Distribution: Simulation Results

CIS 700: algorithms for Big Data

CSE 326, Data Structures. Sample Final Exam. Problem Max Points Score 1 14 (2x7) 2 18 (3x6) Total 92.

Subgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro

Big Graph Processing: Some Background

Social Media Mining. Network Measures

Topological Properties

DATA ANALYSIS II. Matrix Algorithms

Graph/Network Visualization

Algorithmic Aspects of Big Data. Nikhil Bansal (TU Eindhoven)

Graphs over Time Densification Laws, Shrinking Diameters and Possible Explanations

Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network

Home Page. Data Structures. Title Page. Page 1 of 24. Go Back. Full Screen. Close. Quit


Network (Tree) Topology Inference Based on Prüfer Sequence

Graph Processing and Social Networks

Random graphs and complex networks

Routing in packet-switching networks


Dynamic Network Analyzer Building a Framework for the Graph-theoretic Analysis of Dynamic Networks

Graph Theory and Complex Networks: An Introduction. Chapter 08: Computer networks

Business Intelligence and Process Modelling

Fast Sequential Summation Algorithms Using Augmented Data Structures

USE OF EIGENVALUES AND EIGENVECTORS TO ANALYZE BIPARTIVITY OF NETWORK GRAPHS

Algorithmic Techniques for Big Data Analysis. Barna Saha AT&T Lab-Research

Social Network Analysis

An Introduction to APGL

HADI: Fast Diameter Estimation and Mining in Massive Graphs with Hadoop

Simplified External memory Algorithms for Planar DAGs. July 2004

Multimedia Databases. Wolf-Tilo Balke Philipp Wille Institut für Informationssysteme Technische Universität Braunschweig

The average distances in random graphs with given expected degrees

5. A full binary tree with n leaves contains [A] n nodes. [B] log n 2 nodes. [C] 2n 1 nodes. [D] n 2 nodes.


Systems and Algorithms for Big Data Analytics

CSC2420 Fall 2012: Algorithm Design, Analysis and Theory

Data Warehousing und Data Mining

The Internet Is Like A Jellyfish

Efficiency of algorithms. Algorithms. Efficiency of algorithms. Binary search and linear search. Best, worst and average case.

Analysis of Internet Topologies

A SOCIAL NETWORK ANALYSIS APPROACH TO ANALYZE ROAD NETWORKS INTRODUCTION

Dmitri Krioukov CAIDA/UCSD

Class One: Degree Sequences

A Performance Evaluation of Open Source Graph Databases. Robert McColl David Ediger Jason Poovey Dan Campbell David A. Bader

An Empirical Study of Two MIS Algorithms

Big Data & Scripting Part II Streaming Algorithms

Six Degrees of Separation in Online Society

Common Patterns and Pitfalls for Implementing Algorithms in Spark. Hossein

SoSe 2014: M-TANI: Big Data Analytics

Course on Social Network Analysis Graphs and Networks

Discrete Mathematics & Mathematical Reasoning Chapter 10: Graphs

Load balancing in a heterogeneous computer system by self-organizing Kohonen network

Sampling Biases in IP Topology Measurements

1. Introduction Gene regulation Genomics and genome analyses Hidden markov model (HMM)

Distributed Dynamic Load Balancing for Iterative-Stencil Applications

6.852: Distributed Algorithms Fall, Class 2

Graph Database Proof of Concept Report

B490 Mining the Big Data. 2 Clustering

CSE 4351/5351 Notes 7: Task Scheduling & Load Balancing

Option 1: empirical network analysis. Task: find data, analyze data (and visualize it), then interpret.

Inet-3.0: Internet Topology Generator

Design of LDPC codes

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)

Interconnection Networks. Interconnection Networks. Interconnection networks are used everywhere!

Broadcasting in Wireless Networks

Unsupervised Data Mining (Clustering)

Analysis of Internet Topologies: A Historical View

Misleading Stars: What Cannot Be Measured in the Internet?

Principles of Data Mining by Hand&Mannila&Smyth

Scalable Source Routing

USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS

How To Cluster Of Complex Systems

Transcription:

Universität Konstanz Computer & Information Science Algorithmics Group 15 Mar 2005

based on Palmer, Gibbons, and Faloutsos: ANF A Fast and Scalable Tool for Data Mining in Massive Graphs, SIGKDD 02.

Motivation Foundations Exact Algorithm Approximation Algorithm Benefits Web Mining Graph Similarity Internet Router Data

Graphs Motivation Foundations Problems modeled as graphs appear in various ares, including: social networks streets academic citations biology and chemistry the Internet...

Questions Motivation Foundations Some related questions in network analysis: How robust is a network to failures? Are two given networks similar? Given two actors in a network, which one is more influential? Typical networks to be analyzed are LARGE Key issue: Extract a small set of features that describe much of the character of particular actors or the overall network

Definitions I Basics Motivation Foundations Graph G = (V, E), E V V (or ( V 2) if G is undirected) n = V, m = E v, w adjacent (v, w) E (or {v, w} E) Neighborhood Neigh(v) = {w V : (v, w) E} Degree deg(v) = Neigh(v) Distance d(v, w) = length of shortest path from v to w Diameter diam(g) = longest distance in a graph (over all v, w E)

Definitions II Neighborhoods Motivation Foundations h-neighborhood Neigh h (v) = {w V : d(v, w) h} Neigh 0 (v) = {v}, Neigh 1 (v) = Neigh(v) {v}, distance degrees N(v, h) = Neigh h (v) distance degree sequence N(v, 0), N(v, 1), N(v, 2),... Hop plot P(h) = {(v, w) : d(v, w) h} = v V N(v, h) (also called distance distribution)

Motivation Foundations What can we do with those N(v, h)? Compare nodes (their distance degree sequence) Rank nodes (which are the important ones?) Compare graphs (their hop plots)

Exact Algorithm Exact Algorithm Approximation Algorithm Benefits How can we compute the N(v, h) efficiently for each v V and h = 1,..., diam(g) (even for very large instances)? BFS from every vertex? No! (random access to edge file) Idea: Sequentially scan edge file, grow the set of already reached nodes for each node accordingly ANF (Approximate Neighborhood Function) algorithm, Palmer et. al (2002)

Exact Algorithm Exact Algorithm Approximation Algorithm Benefits Input: Graph G = (V, E) Output: h-neighborhood sizes for all h N, v V foreach v V do Neigh 0 (v) {v} for h = 1,..., diam(g) do foreach v V do Neigh h (v) Neigh h 1 (v) foreach (v, w) E do Neigh h (v) Neigh h (v) Neigh h 1 (w)

Exact Algorithm Exact Algorithm Approximation Algorithm Benefits

Exact Algorithm Approximation Algorithm Benefits Exact Algorithm Crucial: Computing the number of distinct elements in foreach (v, w) E do Neigh h (v) Neigh h (v) Neigh h 1 (w) Maintaining for each node v V a bitstring that represents the set of already reached nodes Give each node w its own bit in v s bitstring? No, needs quadratic space! Solution: Approximation to the N(v, h) s by using shorter bit strings

Probabilistic Counting Exact Algorithm Approximation Algorithm Benefits Probabilistic Counting: Flajolet and Martin (1985) Originally designed for data base applications Maintain for each node v V a bitstring of length O(log n) ) j+1 Throw a node to bit j with probability ( 1 2 j 0 1 2 3 4... 1 1 1 1 1 probability 2 4 8 16 32... union of two sets: bitwise OR of the two corresponding bitstrings

Probabilistic Counting, cont d Exact Algorithm Approximation Algorithm Benefits How can we estimate the number of elements which are represented by a given bitstring? look for the leftmost zero bit (say b) bit 0 1 2 3 4 5 6... value 1 1 0 1 0 0 0... the number of elements is proportional to 2 b proportionality factor = 0.77351 improved accuracy by maintaining k bitstrings and averaging over the resulting b s estimation has good provable error bounds!

Basic Exact Algorithm Approximation Algorithm Benefits foreach v V do M(v, 0) concatenation of k bitstrings, each with 1 bit set (P(i) = 1 2 i+1 ) for h = 1,..., diam(g) do foreach v V do M(v, h) M(v, h 1) foreach (v, w) E do M(v, h) M(v, h) M(w, h 1) foreach v V do b average position of leftmost zero bits in the k partial bitstrings in M(v, h) N(v, h) 2b 0.77351

Exact Algorithm Approximation Algorithm Benefits Example Input: a cycle with 5 nodes k = 3 v M(v, 0) M(v, 1) N(v, 1) M(v, 2) N(v, 2) 0 100 100 001 110 110 101 4.1 110 111 101 5.2 1 010 100 100 110 101 101 3.25 110 111 101 5.2 2 100 001 100 110 101 100 3.25 110 111 101 5.2 3 100 100 100 100 111 100 4.1 110 111 101 5.2 4 100 010 100 100 110 101 3.25 110 111 101 5.2 Example: N(2, 1) = 2 (2+1+1)/3 0.77351 = 24/3 0.77351 = 3.25

Benefits Exact Algorithm Approximation Algorithm Benefits Why use the ANF algorithm? Input (edge file) can stay on disk (sequential access, no random access) Scalability, O(diam(G) m) time Linear memory usage, O(m + n) Can be parallelized Good, accurate results (better than sampling etc.)

Web Mining Web Mining Graph Similarity Internet Router Data The Web as a graph Increasing amount of research on graph structure in the WWW Objective: get a more global view to the WWW structure Typical statistics: average path length, distance distribution,...

Web Mining Web Mining Graph Similarity Internet Router Data Example: Compute for each node v V the minimum distance h such that N(v, h) n 2

Graph Similarity Web Mining Graph Similarity Internet Router Data Given two graphs, how can we determine their similarity? One approach: use the hop plot P(h) = {(v, w) : d(v, w) h} Many real-world graphs seem to have a P( ) following a power law P(h) h a, where a is called hop exponent Examples: Cycle: a = 1, Grid: a = 2 intrinsic dimensionality of the graph

Graph Similarity Web Mining Graph Similarity Internet Router Data

Internet Router Data Web Mining Graph Similarity Internet Router Data Fault-tolerance and connectivity of the internet topology Data: Collection of tracert results (285k nodes, 430k edges), pulicly available at www.isi.edu Experiments: Successively delete nodes and compute neighborhood information again

Internet Router Data Web Mining Graph Similarity Internet Router Data

The h-neighborhoods and the hop plots can be useful to reveal structural properties of the networks ANF algorithm yields good approximation to the required information Algorithm scales even to very large instances (> 50m nodes) Other applications: analysis, clustering, visualization,...