Graph Theory and Complex Networks: An Introduction Maarten van Steen VU Amsterdam, Dept. Computer Science Room R.0, steen@cs.vu.nl Chapter 0: Version: April 8, 0 / Contents Chapter Description 0: Introduction History, background 0: Foundations Basic terminology and properties of graphs 0: Extensions Directed & weighted graphs, colorings 0: Network traversal Walking through graphs (cf. traveling) 0: Trees Graphs without cycles; routing algorithms 0: Basic metrics for analyzing large graphs 07: Random networks Introduction modeling real-world networks 08: Computer networks The Internet & WWW seen as a huge graph 09: Social networks Communities seen as graphs / / Introduction Observation In real-world situations, graphs (or networks) may become very large, making it difficult to (visually) discover properties we need network analysis tools. Vertex degrees: Consider the distribution of degrees: how many vertices have high degrees versus the number of vertices with low degrees. Distance statistics: Focus on where vertices are positioned in the network: far away from each other, central in the network, etc. Clustering: To what extent are my neighbors also adjacent to each other? Centrality: Are there vertices that are more important than others? / /
. Vertex degree. Vertex degree Vertex degree Question Can you visually observe real (nonisomorphic) differences? / /. Vertex degree. Vertex degree Vertex degree: Histogram (a): (b): n = 00, m = 00 n = 00, m = 7 0 0 (a) (b) 0 7 8 9 0 / /. Vertex degree. Vertex degree Vertex degree: Ranked histogram 0 9 8 7 0 8 0 0 0 80 00 0 0 0 80 00 / /
. Distance statistics. Distance statistics Distance statistics G is connected, d(u,v) is distance between vertices u and v: the length of a shortest path between u and v. Eccentricity ε(u): Radius rad(g): Diameter diam(g): max{d(u, v) v V (G)} min{ε(u) u V (G)} max{d(u, v) u, v V (G)} Note Note that these definitions apply to directed as well as undirected graphs. 7 / 7 /. Distance statistics. Distance statistics Path lengths G is connected with vertex V ; d(u) is average length of shortest paths from u to any other vertex v: The average path length d(g): d(u) def = V d(u, v) v V,v u d(g) def = V d(u) = V u V V d(u, v) u,v V,u v 8 / 8 /. Distance statistics. Distance statistics Path lengths The characteristic path length is the median over all d(u). Note The median over n nondecreasing values x,x,...,x n : n odd x (n+)/ n even (x n/ + x n/+ )/ The median separates the higher values from the lower values into two equally-sized subsets. Example {,,,,0,,} [0,,,,,,] M = x (7+)/ = x = 9 / 9 /
. Distance statistics. Distance statistics Example distance statistics 7 Vertex 7 ε(u) v u d(u,v) d(u) 0 7 7.0 0 7 7.7 0 7 7. 7 0 9 9. 0.00 7 7 9 0 9.00 7 0 9.7 0 / 0 / Clustering coefficient Observation Many networks show a high degree of clustering: my neighbors are each other s neighbors. Note An extreme case is formed by having all my neighbors be adjacent to each other neighbors form a complete graph. Question What is the other extreme case? / / Clustering coefficient G is simple, connected, undirected. Vertex v V (G) with neighborset N(v). Let n v = N(v). Note: max. number of edges between neighbors is ( n v ). Let m v is number of edges in subgraph induced by N(v): m v = E(G[N(v)]). Clustering coefficient cc(v): cc(v) def = { mv / ( n v ) = m v n v (n v ) if δ(v) > undefined otherwise / /
Clustering coefficient G is simple, connected and undirected. Let V def = {v V (G) δ(v) > }. Clustering coefficient CC(G) for G: CC(G) def = V v V cc(v) / / Clustering coefficient: triangles A triangle is a complete (sub)graph with exactly vertices. A triple is a (sub)graph with exactly vertices and edges. G is simple and connected with n (G) distinct triangles and n Λ (G) distinct triples. The network transitivity τ(g) def = n (G)/n Λ (G). Notation A triple at v: v is incident to both edges ( in the middle ). n Λ (v) : number of triples at v. / / Clustering coefficient: example 7 Vertex: 7 cc: / 0 / undefined / n Λ : 0 Vertex N() = {,,7}; E(G[N()]) =,7 cc() = Triples at : G[{,,}],G[{,,7}],G[{,,7}] / /
Clustering coefficient versus transitivity Observation Let n (v) be the number of triangles of which v is member cc(v) = n (v) n Λ (v) n Λ (v) = ( δ(v) ) n (G) = v V n (v) (Note: V = {v V δ(v) > }) / / Clustering coefficient versus transitivity x x x v v v... v n v k v v v... v n y G k = G[{x,y,v,v,...,v k }] : cc(u) = if u = v,...,v k k ( k+ ) = k k(k+) = k+ if u = x or u = y y CC(G k ) = k + ( k + + k ) = k + k + k + k + lim k CC(G k) = y 7 / 7 / Clustering coefficient versus transitivity x v v v... v n G k = G[{x,y,v,v,...,v k }] { if u = v,...,v k n Λ (u) = ) ( = k+ ) if u = x,y τ(g k ) = n (G k ) n Λ (u) = ( δ(u) y k = k(k + ) + k k + lim τ(g k) = 0 k 8 / 8 /
Centrality Issue Are there any vertices that are more important than the others? G is (strongly) connected. The center C(G) is the set of vertices with minimal eccentricity: C(G) def = {v V (G) ε(v) = rad(g)} Intuition At the center means at minimal distance to the farthest node. 9 / 9 / Vertex centrality G is (strongly) connected. The (eccentricity based) vertex centrality c E (u) of u: c E (u) def = ε(u) Intuition The higher the centrality, the closer to the center of a graph. 0 / 0 / Closeness G is (strongly) connected. The closeness c C (u) of u: c C (u) def = v V (G) d(u,v) Intuition How close is a vertex to all other nodes? / /
Centrality: example 7 Vertex: 7 ε(u) 7 7 7 9 9 d(u, ) 7 7 9 c C (u): 0.08 0.0 0.07 0.0 0.0 0.07 0.0 / / Betweenness Intuition Important vertices are those whose removal significantly increases the distance between other vertices. Example: cut vertices. G is simple and (strongly) connected. S(x,y) is set of shortest paths between x and y. S(x,u,y) S(x,y) paths that pass through u. Betweenness centrality c B (u) of u: c B (u) def S(x,u,y) = x y u S(x, y) / /