Graph Theory and Complex Networks: An Introduction. Chapter 08: Computer networks




Graph Theory and Complex Networks: An Introduction
Maarten van Steen
VU Amsterdam, Dept. Computer Science, Room R4.20, steen@cs.vu.nl
Chapter 08: Computer networks
Version: March 3, 2011

Contents

Chapter                 Description
01: Introduction        History, background
02: Foundations         Basic terminology and properties of graphs
03: Extensions          Directed & weighted graphs, colorings
04: Network traversal   Walking through graphs (cf. traveling)
05: Trees               Graphs without cycles; routing algorithms
06: Network analysis    Basic metrics for analyzing large graphs
07: Random networks     Introduction modeling real-world networks
08: Computer networks   The Internet & WWW seen as a huge graph
09: Social networks     Communities seen as graphs

Introduction

Observation: The Internet as we know it today is a communication network that allows us to exchange messages. The (World Wide) Web is a huge distributed information system, implemented on top of the Internet. The two are very different.

1. The organization and structure of the Internet
2. The organization of overlay (i.e., peer-to-peer) networks
3. The organization and structure of the Web

Computer networks: basics

There are many different kinds of computer networks:
- Traditional networks in buildings and on campus
- Home networks (wired and wireless)
- Networks for mobile phones
- Access networks (with so-called hot spots)
- Networks owned by Internet Service Providers (ISPs)
- ...

The Internet ties all these networks together (well, that's what we think).

For starters: make a distinction between small-area networks and large-area networks.

Small-area networks

[Figure: a small-area network. Labels: LAN 1, LAN 2, LAN 3, routers (including R1), switches, server group, security gateway, firewall, Internet.]

Example: router

Example: switch

Example: security gateway

Example: server group

Addressing

Essence: Each networked device has (in principle) a worldwide unique low-level address, also called its MAC address.

- The MAC address is nothing but a device identifier.
- When a device transmits a message, it always sends its MAC address as part of the message.
- A switch can connect several devices, and discovers their MAC addresses.
- Once a MAC address has been discovered, a switch can forward messages specifically to the associated device.
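The discovery step described above can be sketched as a small table that maps each MAC address to the port it was last seen on. This is a minimal illustration, not a model of any real switch: the class name `LearningSwitch`, the port numbering, and the choice to flood a message to all other ports when the destination is still unknown are assumptions that do not appear in the slides.

```python
class LearningSwitch:
    """Toy switch that learns which port each MAC address was last seen on."""

    def __init__(self, num_ports):
        self.ports = list(range(num_ports))
        self.mac_table = {}  # MAC address -> port number

    def receive(self, in_port, src_mac, dst_mac):
        """Handle one message and return the list of ports it is forwarded to."""
        # Every message carries the sender's MAC address, so the switch
        # can discover where that device is attached.
        self.mac_table[src_mac] = in_port
        if dst_mac in self.mac_table:
            out_port = self.mac_table[dst_mac]
            # Nothing to forward if the destination sits on the incoming port.
            return [] if out_port == in_port else [out_port]
        # Assumption: an unknown destination is flooded to all other ports.
        return [p for p in self.ports if p != in_port]


switch = LearningSwitch(num_ports=4)
first = switch.receive(0, "aa:aa:aa:aa:aa:01", "aa:aa:aa:aa:aa:02")
reply = switch.receive(2, "aa:aa:aa:aa:aa:02", "aa:aa:aa:aa:aa:01")
print(first, reply)
```

The first message is flooded because the destination has not been seen yet; the reply goes only to port 0, where the first sender was discovered.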

Assigning an Internet address

Structure of an IP address

[Figure: a 32-bit address split into a network identifier and a host identifier.]

An IP address consists of a network identifier and a host identifier.
- Network ID: worldwide unique address of a (small-area) network to which messages can be routed
- Host ID: network-wide unique address associated with a device/host

In the Internet, messages are always routed to a network. Internal routers handle subsequent forwarding to the hosts/devices using host IDs.
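As a rough illustration of the split, the sketch below separates a 32-bit address into its two parts using Python's standard `ipaddress` module. The slides do not say where the boundary between the two identifiers lies, so the prefix length (`/24`), the example address, and the helper name `split_address` are assumptions made for this example.

```python
import ipaddress


def split_address(cidr):
    """Split an IPv4 address such as '192.0.2.130/24' into its two parts."""
    iface = ipaddress.ip_interface(cidr)
    host_bits = 32 - iface.network.prefixlen
    address = int(iface.ip)
    network_id = address >> host_bits             # leading bits: the network
    host_id = address & ((1 << host_bits) - 1)    # trailing bits: the host
    return network_id, host_id, str(iface.network)


network_id, host_id, network = split_address("192.0.2.130/24")
print(network, host_id)
```

Routers outside the network only need the network part; the host part matters once the message has arrived inside the network.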

IP addresses and home networks

Observation: Each home (or small organization) is assigned exactly one IP address.

Note: Using a bag of tricks, we can share that address among different devices. For now, it is important to know that all your devices at home have (essentially) the same external IP address.

Consequence: All devices in a home network are seen by the outside world as being one and the same.

Large-area networks

[Figure: a large-area network. Labels: regional ISP, backbone, point of presence, network access point, client, telephone system, server farm, corporate LAN, router.]

Autonomous system

Description: An autonomous system (AS) is an organizational unit that maintains a collection of (interconnected) communication networks. An AS announces its accessible networks as $\langle$AS number, network identifier$\rangle$ pairs.

A simple number... In 2009, there were approximately 25,000 ASes.

Measuring the AS topology

- Each AS $i$ has a number of border gateways: special routers that can transfer messages between AS $i$ and an AS to which that router is linked.
- If $BG_1^i$ of AS $i$ is linked to $BG_1^j$ of AS $j$, there is a physical connection between the two routers.
- Two gateways $BG_1^i$ and $BG_2^i$ of the same AS $i$ are always internally linked: they know how to reach each other through a communication path.
- A border gateway $BG_1^i$ of AS $i$, attached to network $n_i$, announces $\langle i, n_i \rangle$ to its neighboring gateways.
- Assume $BG_1^j$ of AS $j$ is linked to $BG_1^i$. $BG_1^j$ can then announce that it knows a path to $n_i$: $\langle j, i, n_i \rangle$.
- Each other gateway $BG_2^j$ of AS $j$ can announce $\langle j, i, n_i \rangle$ to its linked neighbors.
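The announcement rule can be sketched as prepending an AS number to a received path. The function name `announce`, the tuple representation, and the example AS numbers are illustrative assumptions. The loop check (refusing a path that already contains the gateway's own AS) is not on the slide; it is added here because path-vector protocols such as BGP use it.

```python
def announce(path, own_as):
    """Return the path a gateway of AS own_as passes on, or None to drop it.

    path is a tuple (as_1, ..., as_k, network): a sequence of AS numbers
    followed by the identifier of the destination network.
    """
    if own_as in path[:-1]:
        return None  # assumption: a path that already visits own_as is a loop
    return (own_as,) + tuple(path)


# AS 100 plays the role of AS i, AS 200 the role of AS j.
from_i = (100, "n_i")
from_j = announce(from_i, 200)
looped = announce(from_j, 100)
print(from_j, looped)
```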

Measuring the AS topology

Important observations
- Gateways store and announce entire paths to destinations.
- For proper routing, each gateway needs to store paths to every network in the Internet.

Conclusion: If we read the routing tables from only a few gateways, we should be able to obtain a reasonably complete picture of the AS topology of the Internet.
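Because each stored path lists the ASes it traverses, consecutive AS numbers in a path reveal a link between two ASes. A minimal sketch of that reconstruction follows; the routing-table entries are made up for illustration, and `as_edges` is a hypothetical helper name.

```python
def as_edges(paths):
    """Collect undirected AS-level edges from paths (as_1, ..., as_k, network)."""
    edges = set()
    for path in paths:
        hops = path[:-1]  # the last element is the network identifier
        for a, b in zip(hops, hops[1:]):
            edges.add(frozenset((a, b)))
    return edges


# Hypothetical routing table read from a gateway of AS 1.
routing_table = [(1, 2, 3, "n3"), (1, 2, "n2"), (1, 4, 3, "n3b")]
edges = as_edges(routing_table)

degree = {}
for edge in edges:
    for vertex in edge:
        degree[vertex] = degree.get(vertex, 0) + 1
print(len(edges), degree)
```

Links that never appear on any observed path stay invisible, which is why a topology measured this way can miss existing edges.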

Visualizing the AS topology (CAIDA)

Example topology: October 2008

- Over 30,000 registered autonomous systems (including double registries).
- Over 100,000 edges.
- Note: we may be missing more than 30% of all existing links!

[Figure: vertex degree (1 to 1000) against node ID ranked according to degree (1 to 10,000).]

Example topology: October 2008

Rank:    1     2     3     4     5     6     7     8     9     10
Degree:  3309  2371  2232  2162  1816  1512  1273  1180  1029  1012

Some observations
- Very high clustering coefficient for the top-1000 hubs: an almost complete graph!
- Most paths are no longer than 3 or 4 hops.
- Most ASes are separated by a shortest path of at most length 6.

Peer-to-peer overlay networks

Issue: Large-scale distributed computer systems are spread across the Internet, yet their constituents need to communicate directly with each other. The solution is to organize the system in an overlay network.

Overlay network: A collection of peers, where each peer maintains a partial view of the system. A view is nothing but a list of other peers with whom communication connections can be set up.

Observation: Partial views may change over time, which yields an ever-changing overlay network.

Structured overlay network: Chord

Basics
- Each peer is assigned a unique $m$-bit identifier $id$.
- Every peer is assumed to store data contained in a file.
- Each file has a unique $m$-bit key $k$.
- The peer with the smallest identifier $id \ge k$ is responsible for storing the file with key $k$.
- $succ(k)$: the peer (i.e., node) with the smallest identifier $p \ge k$.

Note: All arithmetic is done modulo $M = 2^m$. In other words, if $x = k \cdot M + y$ with $0 \le y < M$, then $x \bmod M = y$.
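A minimal sketch of $succ(k)$ on a ring of $2^m$ identifiers is given below. The peer set is an assumption taken from the finger-table example later in the deck ($m = 5$, peers 1, 4, 9, 11, 14, 18, 20, 21, 28); the wrap-around to the smallest peer is what "arithmetic modulo $M$" amounts to when no peer has an identifier of at least $k$.

```python
from bisect import bisect_left


def succ(k, peers, m):
    """Return the peer with the smallest identifier >= k on a ring of 2**m ids."""
    ring = sorted(peers)
    k %= 2 ** m
    index = bisect_left(ring, k)   # first position whose identifier is >= k
    return ring[index % len(ring)]  # past the largest peer: wrap to the smallest


PEERS = [1, 4, 9, 11, 14, 18, 20, 21, 28]  # assumed example peer set, m = 5
owners = {k: succ(k, PEERS, 5) for k in (29, 30, 31, 0, 1, 5, 9)}
print(owners)
```

With this peer set, keys 29, 30, 31, 0, and 1 all map to peer 1, and keys 5 to 9 map to peer 9.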

Example

[Figure: Chord ring with identifiers 0 to 31.]
- Peer 1 stores files with keys 29, 30, 31, 0, 1
- Peer 9 stores files with keys 5, 6, 7, 8, 9
- Peer 20 stores file with key 21

Efficient lookups

Partial view = finger table
- Each node $p$ maintains a finger table $FT_p[\,]$ with at most $m$ entries: $FT_p[i] = succ(p + 2^{i-1})$. Note: $FT_p[i]$ points to the first node succeeding $p$ by at least $2^{i-1}$.
- To look up a key $k$, node $p$ forwards the request to the node with index $j$ satisfying $q = FT_p[j] \le k < FT_p[j+1]$.
- If $p < k < FT_p[1]$, the request is also forwarded to $FT_p[1]$.
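The forwarding rule can be sketched as follows. This is one possible reading of the slide, not the reference Chord implementation: comparisons on the ring are expressed as clockwise distances from the current node, the peer set ($m = 5$, nine peers) is assumed from the example slides, and the helper names `clockwise` and `lookup` are hypothetical.

```python
M_BITS = 5
MODULUS = 2 ** M_BITS
PEERS = sorted([1, 4, 9, 11, 14, 18, 20, 21, 28])  # assumed example peer set


def succ(k):
    """Peer with the smallest identifier >= k, wrapping around the ring."""
    k %= MODULUS
    return next((p for p in PEERS if p >= k), PEERS[0])


def finger_table(p):
    """FT_p[i] = succ(p + 2^(i-1)) for i = 1..m, stored as a 0-based list."""
    return [succ(p + 2 ** (i - 1)) for i in range(1, M_BITS + 1)]


def clockwise(a, b):
    """Clockwise distance from a to b; a full turn if a == b."""
    return (b - a) % MODULUS or MODULUS


def lookup(k, start):
    """Return the list of nodes visited while resolving key k from node start."""
    path, p = [start], start
    while p != k % MODULUS:
        fingers = finger_table(p)
        dk = clockwise(p, k)
        if dk <= clockwise(p, fingers[0]):
            # k lies between p and FT_p[1]: FT_p[1] is responsible for k.
            path.append(fingers[0])
            break
        # Otherwise take the farthest finger that does not overshoot k.
        candidates = [f for f in fingers if clockwise(p, f) <= dk]
        p = max(candidates, key=lambda f, here=p: clockwise(here, f))
        path.append(p)
    return path


print(finger_table(4), lookup(15, 4), lookup(22, 4), lookup(18, 20))
```

Each hop strictly reduces the clockwise distance to the key, so the loop terminates; because finger distances roughly double, the number of hops grows with the logarithm of the ring size.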

Example finger tables

[Figure: Chord ring with identifiers 0 to 31 and the finger table of each peer; entry $i$ is $succ(p + 2^{i-1})$.]

Peer p   FT[1]  FT[2]  FT[3]  FT[4]  FT[5]
1        4      4      9      9      18
4        9      9      9      14     20
9        11     11     14     18     28
11       14     14     18     20     28
14       18     18     18     28     1
18       20     20     28     28     4
20       21     28     28     28     4
21       28     28     28     1      9
28       1      1      1      4      14

Example lookup: 15@4

[Figure: Chord ring with finger tables, showing the lookup path.]
1. $FT_4[4] \le 15 < FT_4[5]$: 4 → 14
2. $p = 14 < 15 < FT_p[1]$: 14 → 18

Example lookup: 22@4

[Figure: Chord ring with finger tables, showing the lookup path.]
1. $FT_4[5] \le 22$: 4 → 20
2. $FT_{20}[1] \le 22 < FT_{20}[2]$: 20 → 21
3. $p = 21 < 22 < FT_{21}[1]$: 21 → 28

Example lookup: 18@20

[Figure: Chord ring with finger tables, showing the lookup path.]
1. $p = 20 \not< 18 < FT_p[1]$: the request is not forwarded from 20 to 21
2. $FT_{20}[5] < 18$: 20 → 4
3. $FT_4[4] \le 18 < FT_4[5]$: 4 → 14
4. $p = 14 < 18 < FT_p[1]$: 14 → 18

The Chord graph

Essence: Each peer is represented by a vertex; if $FT_p[i] = j$, add arc $\langle p, j \rangle$, but keep the directed graph strict.

[Figure: the Chord graph on peers 1, 4, 9, 11, 14, 18, 20, 21, 28.]
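A sketch of how such a graph could be built from finger tables is shown below. It reads "strict" as "no loops and no multiple arcs", which a set of ordered pairs enforces automatically; that reading, the peer set, and the function name `chord_arcs` are assumptions for illustration.

```python
def chord_arcs(peers, m):
    """Return the arc set {(p, j)} with j an entry of p's finger table."""
    ring = sorted(peers)
    modulus = 2 ** m

    def succ(k):
        k %= modulus
        return next((q for q in ring if q >= k), ring[0])

    arcs = set()  # a set keeps at most one arc per ordered pair
    for p in ring:
        for i in range(1, m + 1):
            j = succ(p + 2 ** (i - 1))
            if j != p:  # no loops
                arcs.add((p, j))
    return arcs


arcs = chord_arcs([1, 4, 9, 11, 14, 18, 20, 21, 28], m=5)
outdegree = {}
for p, _ in arcs:
    outdegree[p] = outdegree.get(p, 0) + 1
print(len(arcs), outdegree)
```

Several finger-table entries of a peer often point to the same node, so the outdegree is usually smaller than $m$.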

Chord: path lengths

Observation: With $d_n(i,j) = \min\{|i-j|,\ n-|i-j|\}$, we can see that every peer is joined with another peer at distance $\frac{1}{2}n, \frac{1}{4}n, \frac{1}{8}n, \ldots, 1$.

[Figure: average path length (5.5 to 7.5) against network size (2 to 20, x 1000).]
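The observation is easy to check numerically under one simplifying assumption that the slide does not state explicitly: the ring is fully populated, so that every identifier belongs to a peer and $succ(x) = x$. The values $n = 32$ and $p = 7$ are arbitrary choices for the example.

```python
def ring_distance(i, j, n):
    """Distance between identifiers i and j on a ring of n positions."""
    return min(abs(i - j), n - abs(i - j))


n = 32  # assumption: all n = 2^5 identifiers are occupied by peers
p = 7
fingers = [(p + 2 ** (i - 1)) % n for i in range(1, 6)]
distances = sorted(ring_distance(p, f, n) for f in fingers)
print(fingers, distances)
```

The distances are the powers of two from 1 up to $n/2$, which is what lets a lookup roughly halve the remaining distance with each hop.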

Chord: degree distribution

[Figure, two panels: occurrences against outdegree (11 to 17), and occurrences against indegree (20 to 100).]

Chord: clustering coefficient

[Figure: clustering coefficient (0.09 to 0.12) against network size (1 to 20, x 1000).]

Note: CC is computed over the undirected Chord graph; the x-axis shows the number of nodes (x 1000).

Epidemic-based networks

Basics: Consider a collection of peers $P = \{p_1, \ldots, p_n\}$. Each peer can store lots of files. Each file $f$ has a version $v(f)$. The owner of $f$ is a single, unique peer $own(f)$, who can update $f$.

Goal: We want to propagate updates of file $f$ through a network of peers.
- $v(f,p)$ denotes the version of file $f$ at peer $p$.
- $FS(p)$ is the set of files at peer $p$.
- $f \notin FS(p) \Rightarrow v(f,p) = 0$.
- $\forall f, p : v(f, own(f)) \ge v(f,p)$

Epidemic protocol

The core: Each peer $p \in P$ periodically does the following:
1. $\forall f \in FS(p) : v(f,p) > v(f,q) \Rightarrow FS(q) \leftarrow FS(q) \cup \{f@p\}$
2. $\forall f \in FS(q) : v(f,p) < v(f,q) \Rightarrow FS(p) \leftarrow FS(p) \cup \{f@q\}$
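The two steps can be sketched as a single exchange between a peer $p$ and a peer $q$. The slide does not say how $q$ is chosen, so the sketch simply takes both file stores as arguments. Representing $FS(p)$ as a dictionary from file name to version number, with a missing entry meaning version 0, is an assumption made for the example.

```python
def exchange(fs_p, fs_q):
    """One epidemic exchange: afterwards both peers hold the newest versions."""
    # Step 1: files for which p has the newer version are copied to q.
    for name, version in list(fs_p.items()):
        if version > fs_q.get(name, 0):
            fs_q[name] = version
    # Step 2: files for which q has the newer version are copied to p.
    for name, version in list(fs_q.items()):
        if version > fs_p.get(name, 0):
            fs_p[name] = version


fs_p = {"a": 3, "b": 1}
fs_q = {"b": 2, "c": 5}
exchange(fs_p, fs_q)
print(fs_p, fs_q)
```

Repeating such exchanges between changing pairs of peers spreads every update through the whole collection.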

General framework

Active part (peer p):
    repeat
        wait T
        q ← select 1 from PV_p
        R_p ← select s from PV_p
        send R_p ∪ {p} \ {q} to q
        skip
        receive R_q^p from q
        PV_p ← select m from PV_p ∪ R_q^p
    until forever

Passive part (peer q):
    repeat
        skip
        skip
        skip
        receive R_p^q from any p
        R_q ← select s from PV_q
        send R_q ∪ {q} \ {p} to p
        PV_q ← select m from PV_q ∪ R_p^q
    until forever

Newscast

Issue                  Policy   Description
view size              m = 30   Each partial view has size 30
peer selection         random   Each peer uniformly at random selects a peer from its partial view
reference selection    random   A random selection of s peers is selected from a partial view to be exchanged with the selected peer
view size reduction    random   If the view size has grown beyond m, a random selection of references is removed to bring it back to size m
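A single exchange under these random policies could look as follows. This is a sequential sketch of what are really two concurrent threads per peer; the function names, the small parameters ($m = 4$, $s = 2$), and the initial ring-like views are assumptions chosen to keep the example readable, whereas the slide uses $m = 30$.

```python
import random


def select(view, count, rng):
    """Pick at most count references uniformly at random from a view."""
    candidates = sorted(view)  # sorted list: reproducible for a seeded rng
    return set(rng.sample(candidates, min(count, len(candidates))))


def gossip_round(views, p, m, s, rng):
    """Let peer p exchange references with a random peer q from its view."""
    q = rng.choice(sorted(views[p]))                 # peer selection
    r_p = (select(views[p], s, rng) | {p}) - {q}     # what p sends to q
    r_q = (select(views[q], s, rng) | {q}) - {p}     # what q sends back
    views[p] = select(views[p] | r_q, m, rng)        # view size reduction
    views[q] = select(views[q] | r_p, m, rng)
    return q


rng = random.Random(0)
views = {i: {(i + 1) % 10, (i + 2) % 10} for i in range(10)}
partner = gossip_round(views, p=0, m=4, s=2, rng=rng)
print(partner, views[0], views[partner])
```

Because each peer adds its own identifier to what it sends, fresh references to active peers keep circulating through the overlay.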

Newscast: evolution indegree distribution

[Figure, two panels: occurrences against indegree, initially (indegree 20 to 50) and after 200 rounds (indegree 50 to 200).]

Newscast: evolution path length

[Figure: average path length (2.4 to 3.2) against network size (5 to 20, x 1000), for Newscast and for an ER random graph.]

Newscast: evolution cluster coefficient

[Figure: clustering coefficient (0.01 to 0.04) against network size (5 to 20, x 1000), for Newscast and for an ER random graph.]

Question: For which kind of ER(n,p) graphs is this a fair comparison?

The Web

Web basics

Simple view: the Web consists of hyperlinked documents.
- Hyperlinked: document A carries a reference to document B. When the reference is activated, the browser fetches document B.
- A collection of documents forms a site, with its own associated domain name.

Some numbers: It has been estimated that by 2008, there were at least 75 million Web sites from which Google had discovered more than a trillion Web pages.

Web basics

[Figure: a browser on the client machine and a Web server on the server machine.]
1. Get document request (HTTP)
2. Server fetches document
3. Response

Measuring the topology of the Web

Problem: With an estimated size of over a trillion Web pages, pages coming and going, and links changing all the time, how can we ever get a snapshot of the Web? We can't.

Practical issue: crawling the Web. In order to measure anything, we need to be able to identify pages and the links that refer to them.

Web crawler

[Figure: seed document(s) feed the frontier; the crawler removes a reference from the head of the list, fetches the page, stores the page, extracts references, and appends them to the frontier.]

- Start with seed pages
- Store pages to inspect in the frontier
- Analyze each page and store the references found in the frontier

Observation: Using seed documents, we are accessing only the reachable pages.
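The crawl loop can be sketched without any network access by letting a dictionary stand in for the Web. The `fetch_links` callback, the toy link structure, and the `seen` set that prevents a page from entering the frontier twice are assumptions added for the example; the slide itself only describes the frontier.

```python
from collections import deque


def crawl(seeds, fetch_links):
    """Breadth-first crawl; returns pages in the order they were stored."""
    frontier = deque(seeds)
    seen = set(seeds)  # assumption: each page enters the frontier only once
    stored = []
    while frontier:
        page = frontier.popleft()      # remove reference from head of list
        links = fetch_links(page)      # fetch page and extract references
        stored.append(page)            # store page
        for ref in links:
            if ref not in seen:
                seen.add(ref)
                frontier.append(ref)   # append to frontier
    return stored


# Toy Web: page "d" links into the rest, but nothing links to "d".
TOY_WEB = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["a"]}
pages = crawl(["a"], lambda page: TOY_WEB.get(page, []))
print(pages)
```

Page "d" is never found from seed "a", which illustrates the observation that a crawl only sees what is reachable from its seeds.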

Webpages as graphs: The VU website

Sampling the Web

Observation: The Web is so huge that we can only hope to draw a reasonable sample, and hope that this sample represents the structure of the actual Web. We are asking for trouble.

Starting point: Let us try to represent the Web as a bowtie:
- SCC: $\forall v, w \in SCC$, $\exists (v,w)$-path of hyperlinks.
- IN: $\forall v \in IN, w \in SCC$ : $\exists (v,w)$-path, but no $(w,v)$-path.
- OUT: $\forall v \in SCC, w \in OUT$ : $\exists (v,w)$-path, but no $(w,v)$-path.
- TENDRILS: Essentially: the rest.
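Given one page that is known to lie in the SCC, the bowtie components follow from two reachability searches: one along the hyperlinks and one against them. The sketch below assumes such a page is given (in practice one would first have to find the largest strongly connected component), lumps tendrils, tubes, and disconnected pages together as "OTHER", and uses a made-up five-page graph.

```python
def reachable(graph, source):
    """Set of vertices reachable from source, including source itself."""
    seen, stack = {source}, [source]
    while stack:
        v = stack.pop()
        for w in graph.get(v, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def bowtie(graph, scc_member):
    """Split the vertices into SCC, IN, OUT, and OTHER (the rest)."""
    nodes = set(graph) | {w for targets in graph.values() for w in targets}
    reverse = {v: [] for v in nodes}
    for v, targets in graph.items():
        for w in targets:
            reverse[w].append(v)
    forward = reachable(graph, scc_member)      # pages the SCC can reach
    backward = reachable(reverse, scc_member)   # pages that can reach the SCC
    scc = forward & backward
    return {"SCC": scc, "IN": backward - scc, "OUT": forward - scc,
            "OTHER": nodes - forward - backward}


toy = {"i": ["a"], "a": ["b"], "b": ["a", "o"], "o": [], "t": ["o"]}
parts = bowtie(toy, "a")
print(parts)
```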

The Web as a bowtie: Starting from AltaVista

[Figure: bowtie with IN (44 million nodes), SCC (56 million nodes), and OUT (44 million nodes), plus tendrils, a tube, and disconnected components.]

Sampling the Web

Observation: It turns out that for different seeds, we do obtain different bowties.

Component    Sample 1   Sample 2   Sample 3   Sample 4
SCC          56.46%     65.28%     85.87%     72.30%
IN           17.24%     1.69%      2.28%      0.03%
OUT          17.94%     31.88%     11.26%     27.64%
Other        8.36%      1.15%      0.59%      0.02%
Total size   80.57M     18.52M     49.30M     41.29M

Sampling the Web

[Figure: the AltaVista bowtie alongside samples 1 to 4.]

Question: Which conclusion can we draw from these samples?

Web graphs: indegree distribution

[Figure: relative indegree (0.005 to 1.000) against node ID ranked according to degree (1 to 10,000).]

Observation: It turns out that $P[\delta_{in} = k] \propto \frac{1}{k^{2.1}}$: another scale-free network.

Side step: Google's PageRank

Observation: Google uses hyperlinks to a page $p$ as a criterion for the importance of a page:

$$rank(p) = (1 - d) + d \sum_{\langle q,p \rangle \in E} \frac{rank(q)}{\delta_{out}(q)}$$

where $d \in [0,1)$ is a constant (probably 0.85 in the case of Google).

Question: This is a recursive definition. What's going on?
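One way to make sense of the recursion is to treat the formula as an update rule: start from arbitrary ranks and apply it repeatedly until the values stop changing. The sketch below does this for a fixed number of rounds. The starting value 1.0, the iteration count, and the toy graph are assumptions, and every page in the example has at least one outgoing link, so pages without links are not handled.

```python
def pagerank(out_links, d=0.85, iterations=50):
    """Repeatedly apply rank(p) = (1 - d) + d * sum(rank(q) / outdeg(q))."""
    pages = set(out_links) | {p for targets in out_links.values() for p in targets}
    rank = {p: 1.0 for p in pages}  # assumption: arbitrary starting ranks
    for _ in range(iterations):
        new_rank = {p: 1.0 - d for p in pages}
        for q, targets in out_links.items():
            for p in targets:
                # q passes an equal share of its rank along each outgoing link.
                new_rank[p] += d * rank[q] / len(targets)
        rank = new_rank
    return rank


cycle = pagerank({"a": ["b"], "b": ["c"], "c": ["a"]})
ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
print(cycle, ranks)
```

In the three-page cycle every page ends up with rank 1. In the second graph, page "c" receives links from both other pages and obtains a higher rank than "b". Since $d < 1$, each round shrinks the difference between successive rank vectors, so the iteration converges.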

Side step: Google's PageRank

Observation: PageRank is clearly based on indegrees, yet the rank of a page and its indegree turn out to be only weakly correlated.

Observation: When we rank pages according to PageRank: $P[rank = k] \propto \frac{1}{k^{2.1}}$

Observation: Characterizing and sampling the Web is again seen to be far from trivial.

Web graphs: outdegree distribution

[Figure: relative outdegree (0.15 to 1.00) against node ID ranked according to degree (5 to 500).]

Observation: To analyze the Web graph, we need to be very careful regarding measurements and conclusions.