THE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE

Alexander Barvinok

Papers are available at http://www.math.lsa.umich.edu/~barvinok/papers.html

This is joint work with J.A. Hartigan (Yale).
Graphs and degree sequences

We consider graphs (undirected, with no loops or multiple edges) on $n$ labeled vertices $1, \ldots, n$. Let $d_k$ be the degree of the $k$-th vertex, that is, the number of edges incident to $k$.

[Figure: an example graph on 6 labeled vertices, annotated with the degrees $d_1, \ldots, d_6$.]

Given a degree sequence $D = (d_1, \ldots, d_n)$, we consider the set $G(D)$ of all graphs on $\{1, \ldots, n\}$ such that the degree of the $k$-th vertex is $d_k$ for $k = 1, \ldots, n$. Equivalently, $G(D)$ is the set of all $n \times n$ symmetric matrices with zero diagonal, 0-1 entries and row/column sums $d_1, \ldots, d_n$.

Questions:

• Estimate the cardinality $|G(D)|$.

• Assuming that $G(D) \neq \emptyset$, consider $G(D)$ as a finite probability space with the uniform measure. Pick a random graph from $G(D)$. What is it likely to look like?
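The two descriptions of $G(D)$ above (edge sets and symmetric 0-1 matrices) can be sketched in a few lines of Python. The function names `degree_sequence`, `adjacency_matrix` and `in_G`, and the 4-vertex example graph, are illustrative choices and do not come from the slides.

```python
def degree_sequence(n, edges):
    """Return (d_1, ..., d_n) for a simple graph on vertices 1..n."""
    seen = set()
    degrees = [0] * n
    for j, k in edges:
        if j == k:
            raise ValueError("loops are not allowed")
        pair = frozenset((j, k))
        if pair in seen:
            raise ValueError("multiple edges are not allowed")
        seen.add(pair)
        degrees[j - 1] += 1
        degrees[k - 1] += 1
    return tuple(degrees)


def adjacency_matrix(n, edges):
    """Symmetric 0-1 matrix with zero diagonal representing the graph."""
    a = [[0] * n for _ in range(n)]
    for j, k in edges:
        a[j - 1][k - 1] = 1
        a[k - 1][j - 1] = 1
    return a


def in_G(D, edges):
    """True if the graph with these edges belongs to G(D)."""
    return degree_sequence(len(D), edges) == tuple(D)


# Hypothetical example: a triangle on 1, 2, 3 with a pendant edge {3, 4}.
example_edges = [(1, 2), (2, 3), (3, 1), (3, 4)]
example_matrix = adjacency_matrix(4, example_edges)
row_sums = tuple(sum(row) for row in example_matrix)
print(degree_sequence(4, example_edges), row_sums)
```

The row sums of the matrix and the degrees computed from the edge list agree, which is the equivalence stated above.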
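The Erdős-Gallai condition

Assume that $d_1 \geq d_2 \geq \ldots \geq d_n$. Then $G(D)$ is non-empty if and only if

$$\sum_{i=1}^{k} d_i \leq k(k-1) + \sum_{i=k+1}^{n} \min\{k, d_i\} \quad \text{for} \quad k = 1, \ldots, n$$

and

$$\sum_{i=1}^{n} d_i \equiv 0 \mod 2.$$

[Figure: illustration of the condition for $k = 4$.]

Let us consider the space $\mathbb{R}^{\binom{n}{2}}$ of vectors $x = \left(x_{\{j,k\}}\right)$ for $1 \leq j \neq k \leq n$ and the polytope $P(D)$ defined by the equations

$$\sum_{j:\ j \neq k} x_{\{j,k\}} = d_k \quad \text{for} \quad k = 1, \ldots, n$$

and inequalities

$$0 \leq x_{\{j,k\}} \leq 1 \quad \text{for} \quad 1 \leq j \neq k \leq n.$$

Hence

$$G(D) = P(D) \cap \mathbb{Z}^{\binom{n}{2}}.$$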
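The Erdős-Gallai condition translates directly into a short test. The following is a minimal sketch; the function name `erdos_gallai` is an illustrative choice, and the input is sorted inside the function so that the assumption $d_1 \geq \ldots \geq d_n$ holds.

```python
def erdos_gallai(degrees):
    """Return True if the Erdos-Gallai inequalities and the parity
    condition hold, i.e. if some simple graph has these degrees."""
    d = sorted(degrees, reverse=True)
    n = len(d)
    if any(x < 0 for x in d) or sum(d) % 2 != 0:
        return False
    for k in range(1, n + 1):
        lhs = sum(d[:k])
        rhs = k * (k - 1) + sum(min(k, x) for x in d[k:])
        if lhs > rhs:
            return False
    return True


print(erdos_gallai([3, 3, 3, 3]))  # degrees of the complete graph on 4 vertices
print(erdos_gallai([3, 3, 1, 1]))  # even sum, but the inequality fails at k = 2
```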
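The maximum entropy matrix

Let us define

$$H(x) = x \ln \frac{1}{x} + (1 - x) \ln \frac{1}{1 - x} \quad \text{for} \quad 0 \leq x \leq 1$$

[Figure: plot of $H(x)$ for $0 \leq x \leq 1$.]

and

$$H(x) = \sum_{1 \leq j < k \leq n} H\left(x_{\{j,k\}}\right) \quad \text{for} \quad x = \left(x_{\{j,k\}}\right).$$

Since $H$ is strictly concave, it achieves its maximum on the polytope $P(D)$ defined by

$$\sum_{j:\ j \neq k} x_{\{j,k\}} = d_k \quad \text{for} \quad k = 1, \ldots, n$$

$$0 \leq x_{\{j,k\}} \leq 1 \quad \text{for} \quad 1 \leq j \neq k \leq n$$

at a unique point $z = \left(z_{\{j,k\}}\right)$, which we call the maximum entropy matrix.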
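One possible way to compute the maximum entropy matrix numerically is sketched below. It is not taken from the slides and rests on two assumptions: that the maximum lies in the interior of the polytope, so that the Lagrange multiplier conditions give $z_{\{j,k\}} = x_j x_k / (1 + x_j x_k)$ for some positive $x_1, \ldots, x_n$, and that the simple fixed-point iteration $x_j \leftarrow d_j / \sum_{k \neq j} x_k / (1 + x_j x_k)$ converges for the given input. The names `max_entropy_matrix` and `entropy` are illustrative.

```python
import math


def max_entropy_matrix(D, iterations=10000, tol=1e-13):
    """Approximate z by a fixed-point iteration (assumes all d_j > 0
    and an interior maximum; convergence is assumed, not proved here)."""
    n = len(D)
    x = [1.0] * n
    for _ in range(iterations):
        new_x = []
        for j in range(n):
            s = sum(x[k] / (1.0 + x[j] * x[k]) for k in range(n) if k != j)
            new_x.append(D[j] / s)
        change = max(abs(a - b) for a, b in zip(new_x, x))
        x = new_x
        if change < tol:
            break
    z = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            if j != k:
                p = x[j] * x[k]
                z[j][k] = p / (1.0 + p)
    return z


def entropy(z):
    """H(z): sum of the binary entropies of the entries above the diagonal."""
    n = len(z)
    total = 0.0
    for j in range(n):
        for k in range(j + 1, n):
            p = z[j][k]
            total -= p * math.log(p) + (1.0 - p) * math.log(1.0 - p)
    return total


z_regular = max_entropy_matrix([4] * 12)
z_mixed = max_entropy_matrix([3, 3, 3, 3, 4, 4, 4, 4])
print(z_regular[0][1], entropy(z_regular))
print([round(sum(row), 6) for row in z_mixed])
```

For a regular degree sequence the symmetry of the problem and the uniqueness of the maximum force all entries to equal $d/(n-1)$, which gives a simple check on the output.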
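Tame degree sequences

What we can prove, we can prove for tame degree sequences.

For $0 < \delta \leq 1/2$, a degree sequence $D = (d_1, \ldots, d_n)$ is $\delta$-tame if

$$\delta \leq z_{\{j,k\}} \leq 1 - \delta \quad \text{for all} \quad 1 \leq j \neq k \leq n,$$

where $z = \left(z_{\{j,k\}}\right)$ is the maximum entropy matrix.

Example. Fix real numbers $0 < \alpha < \beta < 1$ such that

$$\beta < 2\sqrt{\alpha} - \alpha.$$

There exist a real number $\delta = \delta(\alpha, \beta) > 0$ and a positive integer $n_0 = n_0(\alpha, \beta)$ such that every degree sequence $D = (d_1, \ldots, d_n)$ satisfying

$$\alpha < \frac{d_i}{n-1} < \beta \quad \text{for} \quad i = 1, \ldots, n$$

is $\delta$-tame provided $n > n_0$.

Thus the degree sequences $D = (d_1, \ldots, d_n)$ satisfying

$$0.25 < \frac{d_i}{n-1} < 0.74 \quad \text{for} \quad i = 1, \ldots, n$$

or

$$0.01 < \frac{d_i}{n-1} < 0.18 \quad \text{for} \quad i = 1, \ldots, n$$

or

$$0.81 < \frac{d_i}{n-1} < 0.89 \quad \text{for} \quad i = 1, \ldots, n$$

are $\delta$-tame for some $\delta > 0$ and all sufficiently large $n$.

For $n = 2m$ even, the degree sequence

$$d_1 = \ldots = d_m = 0.75n - 1, \quad d_{m+1} = \ldots = d_n = 0.25n$$

is not $\delta$-tame for any $0 < \delta < 1$ since

$$z_{\{j,k\}} = \begin{cases} 1 & \text{for } 1 \leq j \neq k \leq m \\ 0 & \text{for } m < j \neq k \leq n \\ 1/2 & \text{for } 1 \leq j \leq m \text{ and } m + 1 \leq k \leq n \end{cases}$$

is the only point in $P(D)$.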
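As a quick check of the examples on this slide, the arithmetic can be written out. The first block evaluates the bound $2\sqrt{\alpha} - \alpha$ on $\beta$ for the three ranges; the second confirms that the matrix $z$ of the last example has the required row sums (with $n = 2m$).

```latex
\begin{align*}
\alpha = 0.25:&\quad 2\sqrt{0.25} - 0.25 = 1 - 0.25 = 0.75 > 0.74, \\
\alpha = 0.01:&\quad 2\sqrt{0.01} - 0.01 = 0.2 - 0.01 = 0.19 > 0.18, \\
\alpha = 0.81:&\quad 2\sqrt{0.81} - 0.81 = 1.8 - 0.81 = 0.99 > 0.89.
\end{align*}

\begin{align*}
1 \leq k \leq m:&\quad \sum_{j:\ j \neq k} z_{\{j,k\}}
   = (m - 1) \cdot 1 + m \cdot \tfrac{1}{2} = \tfrac{3m}{2} - 1 = 0.75n - 1, \\
m < k \leq n:&\quad \sum_{j:\ j \neq k} z_{\{j,k\}}
   = (m - 1) \cdot 0 + m \cdot \tfrac{1}{2} = \tfrac{m}{2} = 0.25n.
\end{align*}
```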
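Concentration about the maximum entropy matrix

Theorem. Let us fix numbers $\kappa > 0$ and $0 < \delta \leq 1/2$. Then there exists a number $\gamma(\kappa, \delta) > 0$ such that the following holds.

Suppose that $n \geq \gamma(\kappa, \delta)$ and that $D = (d_1, \ldots, d_n)$ is a $\delta$-tame degree sequence such that $d_1 + \ldots + d_n \equiv 0 \mod 2$. For a set $S \subset \binom{\{1, \ldots, n\}}{2}$, let $\sigma_S(G)$ be the number of edges of a graph $G \in G(D)$ that belong to the set $S$ and let

$$\sigma_S(z) = \sum_{\{j,k\} \in S} z_{\{j,k\}},$$

where $z = \left(z_{\{j,k\}}\right)$ is the maximum entropy matrix. Suppose that $|S| \geq \delta n^2$ and let

$$\epsilon = \delta \frac{\ln n}{\sqrt{n}}.$$

If $\epsilon \leq 1$ then for a random graph $G \in G(D)$, we have

$$\Pr\Bigl\{ G \in G(D) : \ (1 - \epsilon)\sigma_S(z) \leq \sigma_S(G) \leq (1 + \epsilon)\sigma_S(z) \Bigr\} \geq 1 - 2n^{-\kappa n}.$$

[Figure: a set $S$ of pairs and a graph $G$.]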
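The number of graphs with a given degree sequence

Given a degree sequence $D = (d_1, \ldots, d_n)$, let us compute the maximum entropy matrix $z = \left(z_{\{j,k\}}\right)$. We assume that $0 < z_{\{j,k\}} < 1$ for all $j \neq k$. Let us consider the quadratic form $q : \mathbb{R}^n \to \mathbb{R}$ defined by

$$q(t) = \frac{1}{2} \sum_{1 \leq j < k \leq n} \left(z_{\{j,k\}} - z_{\{j,k\}}^2\right) (t_j + t_k)^2 \quad \text{for} \quad t = (t_1, \ldots, t_n)$$

and the Gaussian probability measure on $\mathbb{R}^n$ with the density proportional to $e^{-q}$. Let us define $f, h : \mathbb{R}^n \to \mathbb{R}$ by

$$f(t) = \frac{1}{6} \sum_{1 \leq j < k \leq n} z_{\{j,k\}} \left(1 - z_{\{j,k\}}\right) \left(2z_{\{j,k\}} - 1\right) (t_j + t_k)^3$$

and

$$h(t) = \frac{1}{24} \sum_{1 \leq j < k \leq n} z_{\{j,k\}} \left(1 - z_{\{j,k\}}\right) \left(6z_{\{j,k\}}^2 - 6z_{\{j,k\}} + 1\right) (t_j + t_k)^4$$

for $t = (t_1, \ldots, t_n)$. Let

$$\mu = \mathbf{E} f^2 \quad \text{and} \quad \nu = \mathbf{E} h.$$

Theorem. Let us fix $0 < \delta < 1/2$. Let $D = (d_1, \ldots, d_n)$ be a $\delta$-tame degree sequence such that $d_1 + \ldots + d_n \equiv 0 \mod 2$ and let us define $q$, $\mu$ and $\nu$ as above. Then the value of

$$\frac{2e^{H(z)}}{(2\pi)^{n/2} \sqrt{\det 2q}} \exp\left\{ -\frac{\mu}{2} + \nu \right\}$$

approximates the number $|G(D)|$ of graphs with the degree sequence $D$ within a relative error which approaches 0 as $n \to +\infty$. More precisely, for any $0 < \epsilon \leq 1/2$ the above value approximates $|G(D)|$ within relative error $\epsilon$ provided

$$n \geq \left(\frac{1}{\epsilon}\right)^{\gamma(\delta)}$$

for some constant $\gamma(\delta) > 0$.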
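A minimal sketch of how the value in the theorem might be evaluated for a given matrix $z$ is shown below. It makes several assumptions that are not spelled out on the slide: $\det 2q$ is read as the determinant of the symmetric matrix $Q$ with $2q(t) = t^T Q t$; under the density proportional to $e^{-q}$ the vector $t$ is centered Gaussian with covariance $Q^{-1}$; and the expectations are computed from the Gaussian moment identities $\mathbf{E}\, u^4 = 3\sigma_u^4$ and $\mathbf{E}\, u^3 v^3 = 9\sigma_u^2 \sigma_v^2 c + 6c^3$ with $c = \mathbf{E}\, uv$. The function names are illustrative.

```python
import math


def inverse_and_log_det(matrix):
    """Gauss-Jordan elimination with partial pivoting.
    Returns the inverse and log|det| of a square matrix."""
    n = len(matrix)
    a = [[float(v) for v in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(matrix)]
    log_det = 0.0
    for col in range(n):
        pivot_row = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot_row] = a[pivot_row], a[col]
        pivot = a[col][col]
        log_det += math.log(abs(pivot))
        a[col] = [v / pivot for v in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a], log_det


def estimate_number_of_graphs(z):
    """Evaluate 2 e^{H(z)} / ((2 pi)^{n/2} sqrt(det Q)) * exp(-mu/2 + nu)."""
    n = len(z)
    pairs = [(j, k) for j in range(n) for k in range(j + 1, n)]
    entropy = 0.0
    Q = [[0.0] * n for _ in range(n)]
    c3, c4 = {}, {}
    for j, k in pairs:
        p = z[j][k]
        entropy -= p * math.log(p) + (1 - p) * math.log(1 - p)
        a = p * (1 - p)
        Q[j][k] = Q[k][j] = a      # coefficient of t_j t_k in 2q
        Q[j][j] += a               # coefficient of t_j^2 in 2q
        Q[k][k] += a
        c3[j, k] = a * (2 * p - 1)
        c4[j, k] = a * (6 * p * p - 6 * p + 1)
    cov_t, log_det_Q = inverse_and_log_det(Q)  # covariance of t (assumption above)

    def cov(e1, e2):
        # covariance of t_j + t_k and t_l + t_m for e1 = (j, k), e2 = (l, m)
        return sum(cov_t[u][v] for u in e1 for v in e2)

    var = {e: cov(e, e) for e in pairs}
    nu = sum(c4[e] * 3 * var[e] ** 2 for e in pairs) / 24
    mu = 0.0
    for e1 in pairs:
        for e2 in pairs:
            c = cov(e1, e2)
            mu += c3[e1] * c3[e2] * (9 * var[e1] * var[e2] * c + 6 * c ** 3)
    mu /= 36
    log_value = (math.log(2) + entropy - 0.5 * n * math.log(2 * math.pi)
                 - 0.5 * log_det_Q - mu / 2 + nu)
    return math.exp(log_value)


# 4-regular graphs on 12 vertices: by symmetry every entry of z is 4/11.
n, d = 12, 4
z = [[0.0 if j == k else d / (n - 1) for k in range(n)] for j in range(n)]
estimate = estimate_number_of_graphs(z)
print(estimate)
```

The double loop over pairs costs on the order of $n^4$ operations, which is fine for the small examples considered in the talk but would need a smarter evaluation for large $n$.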
Numerical examples.

The number of 4-regular graphs with 12 vertices is $480413921130 \approx 4.8 \times 10^{11}$, and the formula approximates it within a relative error of 6%.

The number of 4-regular graphs on 17 vertices is $28797220460586826422720 \approx 2.88 \times 10^{22}$, and the formula approximates it within a relative error of 12%.

The number of graphs on 12 vertices with the degree sequence 6, 6, 6, 6, 6, 6, 5, 5, 5, 5, 5, 5 is approximately $2.27 \times 10^{12}$; the formula gives approximately $2.29 \times 10^{12}$, which is within 1%.

The number of graphs on 14 vertices with the degree sequence 7, 7, 7, 7, 7, 7, 7, 4, 4, 4, 4, 4, 4, 4 is approximately $3.27 \times 10^{10}$; the formula gives approximately $3.69 \times 10^{10}$, which is within 25%.
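Some ideas of the proof

Recall that $P(D) \subset \mathbb{R}^{\binom{n}{2}}$ is the polytope defined by

$$\sum_{j:\ j \neq k} x_{\{j,k\}} = d_k \quad \text{for} \quad k = 1, \ldots, n$$

$$0 \leq x_{\{j,k\}} \leq 1 \quad \text{for all} \quad j \neq k.$$

Theorem. Suppose that $P(D)$ has a non-empty interior, that is, contains a point $y = \left(y_{\{j,k\}}\right)$ such that

$$0 < y_{\{j,k\}} < 1 \quad \text{for all} \quad j \neq k.$$

Then, for the maximum entropy matrix $z = \left(z_{\{j,k\}}\right)$ we have

$$0 < z_{\{j,k\}} < 1 \quad \text{for all} \quad j \neq k.$$

Let $X_{\{j,k\}}$ be independent Bernoulli random variables such that

$$\Pr\left(X_{\{j,k\}} = 1\right) = z_{\{j,k\}} \quad \text{and} \quad \Pr\left(X_{\{j,k\}} = 0\right) = 1 - z_{\{j,k\}},$$

where $z = \left(z_{\{j,k\}}\right)$ is the maximum entropy matrix. Then the probability mass function of $X = \left(X_{\{j,k\}}\right)$ is constant on graphs $G \in G(D)$:

$$\Pr\left(X = G\right) = e^{-H(z)} \quad \text{for all} \quad G \in G(D).$$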
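The last identity can be derived in a few lines. The sketch below assumes, as is standard for an interior maximum, that the Lagrange multiplier conditions give $\ln \frac{z_{\{j,k\}}}{1 - z_{\{j,k\}}} = \lambda_j + \lambda_k$ for some real $\lambda_1, \ldots, \lambda_n$; the multipliers $\lambda_j$ and the 0-1 entries $g_{\{j,k\}}$ of a graph $G \in G(D)$ are notation introduced here and not on the slide.

```latex
\begin{align*}
\ln \Pr(X = G)
  &= \sum_{j<k} \Bigl( g_{\{j,k\}} \ln z_{\{j,k\}}
       + \bigl(1 - g_{\{j,k\}}\bigr) \ln\bigl(1 - z_{\{j,k\}}\bigr) \Bigr) \\
  &= \sum_{j<k} \ln\bigl(1 - z_{\{j,k\}}\bigr)
       + \sum_{j<k} g_{\{j,k\}} (\lambda_j + \lambda_k) \\
  &= \sum_{j<k} \ln\bigl(1 - z_{\{j,k\}}\bigr) + \sum_{j=1}^{n} \lambda_j d_j .
\end{align*}
% The last step uses only that G has degrees d_1, ..., d_n.
% Since z lies in P(D), it has the same row sums, so the same computation gives
\begin{align*}
-H(z)
  &= \sum_{j<k} \Bigl( z_{\{j,k\}} \ln z_{\{j,k\}}
       + \bigl(1 - z_{\{j,k\}}\bigr) \ln\bigl(1 - z_{\{j,k\}}\bigr) \Bigr) \\
  &= \sum_{j<k} \ln\bigl(1 - z_{\{j,k\}}\bigr) + \sum_{j=1}^{n} \lambda_j d_j
   = \ln \Pr(X = G).
\end{align*}
```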
Concentration. Since $\mathbf{E} X = z$, if there are sufficiently many graphs in $G(D)$ they will tend to cluster around $z$ by the law of large numbers. Moreover, a random graph in $G(D)$ will behave roughly as a random graph on $\{1, \ldots, n\}$ with independently chosen edges, where edge $\{j,k\}$ is chosen with probability $z_{\{j,k\}}$.

Counting. Let

$$Y_k = \sum_{j:\ j \neq k} X_{\{j,k\}} \quad \text{for} \quad k = 1, \ldots, n$$

and $Y = (Y_1, \ldots, Y_n)$. Then

$$|G(D)| = e^{H(z)} \Pr\left(X \in G(D)\right) = e^{H(z)} \Pr\left(Y = (d_1, \ldots, d_n)\right).$$

Now, $\mathbf{E} Y = (d_1, \ldots, d_n)$ and $Y$ is a linear combination of $\binom{n}{2}$ independent random $n$-vectors. One is tempted to apply the Local Central Limit Theorem, which would give the formula

$$|G(D)| \approx \frac{2e^{H(z)}}{(2\pi)^{n/2} \sqrt{\det Q}},$$

where $Q$ is the covariance matrix of $Y$, that is,

$$q_{jk} = z_{\{j,k\}} \left(1 - z_{\{j,k\}}\right) \quad \text{if} \quad j \neq k \quad \text{and} \quad q_{jj} = d_j - \sum_{k:\ k \neq j} z_{\{j,k\}}^2.$$

The formula must be corrected by the Edgeworth correction factor taking into account the third and fourth moments of Y.
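The counting identity $|G(D)| = e^{H(z)} \Pr(Y = (d_1, \ldots, d_n))$ can be checked by brute force on a very small example. The sketch below is restricted to $d$-regular sequences, where symmetry and uniqueness of the maximum give $z_{\{j,k\}} = d/(n-1)$ for all pairs; the function name `check_counting_identity` is an illustrative choice, and the enumeration over all $2^{\binom{n}{2}}$ graphs is feasible only for tiny $n$.

```python
import math
from itertools import combinations, product


def check_counting_identity(n, d):
    """Return (|G(D)|, e^{H(z)} * Pr(Y = D)) for the d-regular sequence
    on n vertices, enumerating every graph on n labeled vertices."""
    z = d / (n - 1)  # maximum entropy value in the regular case
    pairs = list(combinations(range(n), 2))
    count = 0
    prob = 0.0
    for bits in product((0, 1), repeat=len(pairs)):
        degrees = [0] * n
        for (j, k), b in zip(pairs, bits):
            if b:
                degrees[j] += 1
                degrees[k] += 1
        if all(x == d for x in degrees):
            count += 1
            m = sum(bits)  # number of edges of this graph
            prob += z ** m * (1 - z) ** (len(pairs) - m)
    H = -len(pairs) * (z * math.log(z) + (1 - z) * math.log(1 - z))
    return count, math.exp(H) * prob


count, value = check_counting_identity(6, 2)
print(count, value)
```

In the regular case every graph in $G(D)$ has the same number of edges, so the check is easy to follow by hand; the identity itself does not depend on regularity.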