4. Sum-product algorithm
Outline
- Elimination algorithm
- Sum-product algorithm on a line
- Sum-product algorithm on a tree
Inference tasks on graphical models

Consider an undirected graphical model (a.k.a. Markov random field)

µ(x) = (1/Z) ∏_{c∈C} ψ_c(x_c),

where C is the set of all maximal cliques in G. We want to:
- calculate marginals: µ(x_A) = Σ_{x_{V\A}} µ(x)
- calculate conditional distributions: µ(x_A | x_B) = µ(x_A, x_B) / µ(x_B)
- calculate maximum a posteriori estimates: arg max_{x̂} µ(x̂)
- calculate the partition function Z
- sample from this distribution
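On a small model, all of these tasks can be carried out by brute-force enumeration. A minimal Python sketch, using a hypothetical 3-node chain with made-up potential tables (not from the lecture):

```python
import itertools

# Hypothetical 3-node chain MRF over X = {0, 1}:
# mu(x) = (1/Z) * psi12(x1, x2) * psi23(x2, x3)
psi12 = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}
psi23 = {(0, 0): 1.0, (0, 1): 3.0, (1, 0): 3.0, (1, 1): 1.0}

def weight(x):
    # Unnormalized probability: product of the clique potentials.
    return psi12[(x[0], x[1])] * psi23[(x[1], x[2])]

configs = list(itertools.product([0, 1], repeat=3))
Z = sum(weight(x) for x in configs)            # partition function

def marginal_x1(v):
    # mu(x1 = v): sum out x2 and x3, then normalize by Z.
    return sum(weight(x) for x in configs if x[0] == v) / Z

x_map = max(configs, key=weight)               # a MAP estimate
```

Every step here costs O(|X|^|V|), which is exactly the cost the elimination and sum-product algorithms below avoid.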
Elimination algorithm for calculating marginals

The elimination algorithm is exact but can require O(|X|^|V|) operations. Consider the example

µ(x) ∝ ψ_12(x_1, x_2) ψ_13(x_1, x_3) ψ_25(x_2, x_5) ψ_345(x_3, x_4, x_5),

where we want to compute µ(x_1). Brute-force marginalization,

µ(x_1) ∝ Σ_{x_2, x_3, x_4, x_5 ∈ X} ψ_12(x_1, x_2) ψ_13(x_1, x_3) ψ_25(x_2, x_5) ψ_345(x_3, x_4, x_5),

requires O(|X|^5) operations, where big-O denotes that the number of operations is upper bounded by C·|X|^5 for some constant C.
Consider instead the elimination ordering (5, 4, 3, 2):

µ(x_1) ∝ Σ_{x_2, x_3, x_4, x_5 ∈ X} ψ_12(x_1, x_2) ψ_13(x_1, x_3) ψ_25(x_2, x_5) ψ_345(x_3, x_4, x_5)
= Σ_{x_2, x_3, x_4 ∈ X} ψ_12(x_1, x_2) ψ_13(x_1, x_3) Σ_{x_5 ∈ X} ψ_25(x_2, x_5) ψ_345(x_3, x_4, x_5)    [inner sum ≜ m_5(x_2, x_3, x_4)]
= Σ_{x_2, x_3, x_4 ∈ X} ψ_12(x_1, x_2) ψ_13(x_1, x_3) m_5(x_2, x_3, x_4)
= Σ_{x_2, x_3 ∈ X} ψ_12(x_1, x_2) ψ_13(x_1, x_3) Σ_{x_4 ∈ X} m_5(x_2, x_3, x_4)    [inner sum ≜ m_4(x_2, x_3)]
= Σ_{x_2, x_3 ∈ X} ψ_12(x_1, x_2) ψ_13(x_1, x_3) m_4(x_2, x_3)
= Σ_{x_2 ∈ X} ψ_12(x_1, x_2) Σ_{x_3 ∈ X} ψ_13(x_1, x_3) m_4(x_2, x_3)    [inner sum ≜ m_3(x_1, x_2)]
Continuing,

µ(x_1) ∝ Σ_{x_2 ∈ X} ψ_12(x_1, x_2) m_3(x_1, x_2) = m_2(x_1).

Normalize m_2(·) to get µ(x_1):

µ(x_1) = m_2(x_1) / Σ_{x̂_1} m_2(x̂_1).

The computational complexity depends on the elimination ordering. How do we know which ordering is better?
Computational complexity of the elimination algorithm

Annotate each elimination step with the set of surviving variables S_i and the set of factors Ψ_i it consumes:

µ(x_1) = Σ_{x_2, x_3, x_4 ∈ X} ψ_12(x_1, x_2) ψ_13(x_1, x_3) Σ_{x_5 ∈ X} ψ_25(x_2, x_5) ψ_345(x_3, x_4, x_5)    [m_5(S_5), S_5 = {x_2, x_3, x_4}, Ψ_5 = {ψ_25, ψ_345}]
= Σ_{x_2, x_3 ∈ X} ψ_12(x_1, x_2) ψ_13(x_1, x_3) Σ_{x_4 ∈ X} m_5(x_2, x_3, x_4)    [m_4(S_4), S_4 = {x_2, x_3}, Ψ_4 = {m_5}]
= Σ_{x_2 ∈ X} ψ_12(x_1, x_2) Σ_{x_3 ∈ X} ψ_13(x_1, x_3) m_4(x_2, x_3)    [m_3(S_3), S_3 = {x_1, x_2}, Ψ_3 = {ψ_13, m_4}]
= Σ_{x_2 ∈ X} ψ_12(x_1, x_2) m_3(x_1, x_2)    [m_2(S_2), S_2 = {x_1}, Ψ_2 = {ψ_12, m_3}]
= m_2(x_1).

Total complexity: Σ_i O(|Ψ_i| |X|^{1+|S_i|}) = O(|V| · max_i |Ψ_i| · |X|^{1+max_i |S_i|}).
Induced graph: the elimination algorithm as a transformation of graphs

The induced graph G(G, I), for a graph G and an elimination ordering I, is the union of (the edges of) all the transformed graphs; equivalently, start from G and, for each i ∈ I, connect all pairs in S_i.

Theorem: every maximal clique in G(G, I) corresponds to the domain S_i ∪ {i} of a message for some i, so the size of the largest clique in G(G, I) is 1 + max_i |S_i|. Different orderings I give different cliques, resulting in varying complexity.
Theorem: finding the optimal elimination ordering is NP-hard. Any suggestions? Heuristics give I = (4, 5, 3, 2, 1) for the example.

For Bayesian networks:
- the same algorithm works with conditional probabilities instead of compatibility functions;
- the complexity analysis can be done on the moralized undirected graph;
- intermediate messages do not correspond to a conditional distribution.
Elimination algorithm

input: {ψ_c}_{c∈C}, alphabet X, subset A ⊆ V, elimination ordering I
output: marginal µ(x_A)
1. Initialize the active set Ψ to be the set of input compatibility functions.
2. For each node i in I that is not in A:
   - let S_i be the set of nodes, not including i, that share a compatibility function with i;
   - let Ψ_i be the set of compatibility functions in Ψ involving x_i;
   - compute m_i(x_{S_i}) = Σ_{x_i} ∏_{ψ∈Ψ_i} ψ(x_i, x_{S_i});
   - remove the elements of Ψ_i from Ψ;
   - add m_i to Ψ.
3. Normalize: µ(x_A) = ∏_{ψ∈Ψ} ψ(x_A) / Σ_{x_A} ∏_{ψ∈Ψ} ψ(x_A).
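The algorithm above can be sketched directly in Python. The potential tables below are hypothetical stand-ins for the five-node example (the notes do not specify numeric values); the result is checked against brute-force marginalization:

```python
import itertools

X = [0, 1]  # alphabet

def eliminate(factors, order, keep):
    """Step 2 of the elimination algorithm. `factors` is a list of
    (vars, table) pairs; table maps value tuples (in vars order) to reals."""
    factors = list(factors)
    for i in order:
        if i in keep:
            continue
        Psi_i = [f for f in factors if i in f[0]]       # functions involving x_i
        factors = [f for f in factors if i not in f[0]]
        S_i = sorted({v for vs, _ in Psi_i for v in vs} - {i})
        table = {}
        for xs in itertools.product(X, repeat=len(S_i)):
            assign = dict(zip(S_i, xs))
            total = 0.0
            for xi in X:                                # sum out x_i
                assign[i] = xi
                prod = 1.0
                for vs, t in Psi_i:
                    prod *= t[tuple(assign[v] for v in vs)]
                total += prod
            table[xs] = total
        factors.append((tuple(S_i), table))             # message m_i(x_{S_i})
    return factors

def make_factor(vars_, fn):
    return (vars_, {xs: fn(*xs) for xs in itertools.product(X, repeat=len(vars_))})

# The five-node example from the slides, with hypothetical potential values.
model = [
    make_factor((1, 2), lambda a, b: 1.0 + (a == b)),
    make_factor((1, 3), lambda a, b: 1.0 + 2.0 * a * b),
    make_factor((2, 5), lambda a, b: 1.0 + a + b),
    make_factor((3, 4, 5), lambda a, b, c: 1.0 + a * b + c),
]
remaining = eliminate(model, order=[5, 4, 3, 2], keep={1})

# Step 3: normalize the product of the surviving factors over x_1.
unnorm = []
for x1 in X:
    p = 1.0
    for vs, t in remaining:
        p *= t[tuple(x1 for _ in vs)]
    unnorm.append(p)
mu1 = [v / sum(unnorm) for v in unnorm]

# Sanity check against brute-force marginalization over all |X|^5 configs.
brute = [0.0, 0.0]
for xs in itertools.product(X, repeat=5):
    a = dict(zip((1, 2, 3, 4, 5), xs))
    p = 1.0
    for vs, t in model:
        p *= t[tuple(a[v] for v in vs)]
    brute[a[1]] += p
brute = [v / sum(brute) for v in brute]
```

With the ordering (5, 4, 3, 2), the largest intermediate table is m_5 over S_5 = {2, 3, 4}, matching the complexity analysis above.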
Belief propagation for approximate inference

Given a pairwise MRF

µ(x) = (1/Z) ∏_{(i,j)∈E} ψ_ij(x_i, x_j),

compute the marginal µ(x_i).

Messages: a set of 2|E| messages, one on each directed edge, {ν_{i→j}(x_i)}_{(i,j)∈E}, where ν_{i→j}: X → R_+ encodes our belief (our approximation of P(x_i)).

Message update (O(d_i |X|^2) computations per message, where d_i is the degree of i):

ν^{(t+1)}_{i→j}(x_i) = ∏_{k∈∂i\{j}} Σ_{x_k} ν^{(t)}_{k→i}(x_k) ψ_{ik}(x_i, x_k).

Decision:

ν^{(t)}_i(x_i) = ∏_{k∈∂i} Σ_{x_k} ν^{(t)}_{k→i}(x_k) ψ_{ik}(x_i, x_k) = { ∏_{k∈∂i} ν^{(t)}_{i→k}(x_i) }^{1/(d_i − 1)},
µ(x_i) ≈ ν_i(x_i) / Σ_{x'} ν_i(x').
Sum-product algorithm on a line

(Figure: a 7-node chain x_1, …, x_7 with messages ν_{3→4}(x_3) and ν_{5→4}(x_5) flowing into node 4.)

Consider a chain model

µ(x) = (1/Z) ∏_{i=1}^{n−1} ψ_{i,i+1}(x_i, x_{i+1}).

Remove node i and recursively compute marginals on the two sub-graphs. Writing the product as

ψ(x_1, x_2) ⋯ ψ(x_{i−1}, x_i) · ψ(x_i, x_{i+1}) ⋯ ψ(x_{n−1}, x_n),

where the left block is the joint distribution µ_{i−1→i} on the left sub-graph and the right block is µ_{i+1→i} on the right sub-graph, we get

µ(x_i) = Σ_{x_{[n]\{i}}} µ(x) ∝ Σ_{x_{i−1}, x_{i+1}} ν_{i−1→i}(x_{i−1}) ψ(x_{i−1}, x_i) ψ(x_i, x_{i+1}) ν_{i+1→i}(x_{i+1}),

with the marginal distributions on the sub-graphs

ν_{i−1→i}(x_{i−1}) ≜ Σ_{x_1, …, x_{i−2}} µ_{i−1→i}(x_1, …, x_{i−1}),
ν_{i+1→i}(x_{i+1}) ≜ Σ_{x_{i+2}, …, x_n} µ_{i+1→i}(x_{i+1}, …, x_n).
So

ν_i(x_i) ≜ Σ_{x_{i−1}, x_{i+1}} ν_{i−1→i}(x_{i−1}) ψ(x_{i−1}, x_i) ψ(x_i, x_{i+1}) ν_{i+1→i}(x_{i+1}),
µ(x_i) = ν_i(x_i) / Σ_{x'_i} ν_i(x'_i).

Definitions of the joint distributions and marginals on the sub-graphs:

µ_{i−1→i}(x_1, …, x_{i−1}) ≜ (1/Z_{i−1→i}) ∏_{k∈[i−2]} ψ(x_k, x_{k+1}),
ν_{i−1→i}(x_{i−1}) ≜ Σ_{x_1, …, x_{i−2}} µ_{i−1→i}(x_1, …, x_{i−1}),
µ_{i+1→i}(x_{i+1}, …, x_n) ≜ (1/Z_{i+1→i}) ∏_{k∈{i+2, …, n}} ψ(x_{k−1}, x_k),
ν_{i+1→i}(x_{i+1}) ≜ Σ_{x_{i+2}, …, x_n} µ_{i+1→i}(x_{i+1}, …, x_n).
How can we compute the messages ν recursively? Since

µ_{i→i+1}(x_1, …, x_i) ∝ µ_{i−1→i}(x_1, …, x_{i−1}) ψ(x_{i−1}, x_i),

we have

ν_{i→i+1}(x_i) = Σ_{x_1, …, x_{i−1}} µ_{i→i+1}(x_1, …, x_i)
∝ Σ_{x_1, …, x_{i−1}} µ_{i−1→i}(x_1, …, x_{i−1}) ψ(x_{i−1}, x_i)
= Σ_{x_{i−1}} ν_{i−1→i}(x_{i−1}) ψ(x_{i−1}, x_i),

with initialization ν_{1→2}(x_1) = 1/|X|, so that ν_{2→3}(x_2) ∝ Σ_{x_1} (1/|X|) ψ(x_1, x_2).

How many operations are required? O(n|X|^2) operations to compute one marginal µ(x_i). What if we want all the marginals? Compute all the messages forward and backward in O(n|X|^2) operations, then compute all the marginals in O(n|X|^2) operations.
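The forward/backward recursions above can be sketched as follows; the chain length and the pairwise potential psi are hypothetical, and messages into the endpoints are taken to be the constant 1 (normalization cancels in the marginal):

```python
import itertools

# Message passing on a chain of n nodes over alphabet X = {0, 1},
# with a hypothetical (asymmetric) pairwise potential psi.
n = 6
X = [0, 1]

def psi(a, b):
    return 1.0 + a + 2.0 * b

# Forward: F[i](x_i) sums out everything to the left of node i.
F = [[1.0] * len(X) for _ in range(n)]
for i in range(1, n):
    for b in X:
        F[i][b] = sum(F[i - 1][a] * psi(a, b) for a in X)

# Backward: B[i](x_i) sums out everything to the right of node i.
B = [[1.0] * len(X) for _ in range(n)]
for i in range(n - 2, -1, -1):
    for a in X:
        B[i][a] = sum(psi(a, b) * B[i + 1][b] for b in X)

def marginal(i):
    # mu(x_i) is proportional to the product of the two incoming messages.
    w = [F[i][a] * B[i][a] for a in X]
    return [v / sum(w) for v in w]

# Check node 2 against brute force over all |X|^n configurations.
num = [0.0] * len(X)
for xs in itertools.product(X, repeat=n):
    p = 1.0
    for i in range(n - 1):
        p *= psi(xs[i], xs[i + 1])
    num[xs[2]] += p
brute = [v / sum(num) for v in num]
```

One forward and one backward sweep cost O(n|X|^2) in total, after which every marginal is an O(|X|) lookup, as claimed above.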
Computing the partition function is as easy as computing marginals

From the (unnormalized) messages,

Z_{i→(i+1)} = Σ_{x_1, …, x_i} ∏_{k∈[i−1]} ψ(x_k, x_{k+1}) = Z_{(i−1)→i} Σ_{x_{i−1}, x_i} ν_{(i−1)→i}(x_{i−1}) ψ(x_{i−1}, x_i),

with Z_1 = 1 and Z_n = Z. How many operations do we need? O(n|X|^2) operations.
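The same forward recursion, run without normalizing the messages, yields Z; a short sketch with a hypothetical potential:

```python
import itertools
import math

# Partition function of a chain via unnormalized forward messages:
# f(x_i) <- sum_{x_{i-1}} f(x_{i-1}) psi(x_{i-1}, x_i);  Z = sum_x f(x).
# The potential psi below is hypothetical.
n = 5
X = [0, 1]

def psi(a, b):
    return 1.0 + a * b + 0.5 * (a == b)

f = [1.0] * len(X)                  # unnormalized message into node 0
for i in range(1, n):
    f = [sum(f[a] * psi(a, b) for a in X) for b in X]
Z = sum(f)

# Brute force: sum the product of edge potentials over all configurations.
Z_brute = sum(
    math.prod(psi(xs[i], xs[i + 1]) for i in range(n - 1))
    for xs in itertools.product(X, repeat=n)
)
```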
Sum-product algorithm for hidden Markov models

A hidden Markov model is a sequence of random variables {(X_1, Y_1); (X_2, Y_2); …; (X_n, Y_n)} where
- the hidden states {X_i} form a Markov chain: P{x} = P{x_1} ∏_{i=1}^{n−1} P{x_{i+1} | x_i};
- the {Y_i} are noisy observations: P{y | x} = ∏_{i=1}^{n} P{y_i | x_i}.

(Figure: an undirected chain x_1, …, x_7 with attached observations y_1, …, y_7, and the equivalent directed form.)
Time-homogeneous hidden Markov models: for the sequence {(X_1, Y_1); (X_2, Y_2); …; (X_n, Y_n)},

P{x} = q_0(x_1) ∏_{i=1}^{n−1} q(x_i, x_{i+1}),   P{y | x} = ∏_{i=1}^{n} r(x_i, y_i),

so that

µ(x, y) = (1/Z) ∏_{i=1}^{n−1} ψ_i(x_i, x_{i+1}) ∏_{i=1}^{n} ψ̃_i(x_i, y_i),

with ψ_i(x_i, x_{i+1}) = q(x_i, x_{i+1}) and ψ̃_i(x_i, y_i) = r(x_i, y_i).
We want to compute marginals of the following graphical model on a line (q_0 uniform). By Bayes' theorem,

µ_y(x) = P{x | y} = (1/Z(y)) ∏_{i=1}^{n−1} q(x_i, x_{i+1}) ∏_{i=1}^{n} r(x_i, y_i) = (1/Z(y)) ∏_{i=1}^{n−1} ψ̃_i(x_i, x_{i+1}),

where the observations are absorbed into modified pairwise potentials:

ψ̃_i(x_i, x_{i+1}) = q(x_i, x_{i+1}) r(x_i, y_i)   for i < n−1,
ψ̃_{n−1}(x_{n−1}, x_n) = q(x_{n−1}, x_n) r(x_{n−1}, y_{n−1}) r(x_n, y_n).
Apply the sum-product algorithm to compute the marginals in O(n|X|^2) time:

ν_{i→(i+1)}(x_i) ∝ Σ_{x_{i−1}∈X} q(x_{i−1}, x_i) r(x_{i−1}, y_{i−1}) ν_{(i−1)→i}(x_{i−1}),
ν_{(i+1)→i}(x_i) ∝ Σ_{x_{i+1}∈X} q(x_i, x_{i+1}) r(x_i, y_i) ν_{(i+2)→(i+1)}(x_{i+1}).

This is known as the forward-backward algorithm:
- a special case of the sum-product algorithm;
- the BCJR algorithm for convolutional codes [Bahl, Cocke, Jelinek and Raviv 1974];
- it cannot find the maximum likelihood estimate (cf. the Viterbi algorithm);
- implement the sum-product algorithm for an HMM [Homework 2.7];
- consider an extension of inference on HMMs [Homework 2.8].
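A minimal forward-backward sketch in the spirit of Homework 2.7; the transition table q, emission table r, and observation sequence y are hypothetical, and here the emission r(x_i, y_i) is kept out of the messages and multiplied in at the marginal (a slightly different but equivalent bookkeeping from the potentials ψ̃ above):

```python
import itertools

# Forward-backward on a small HMM with hypothetical q, r, and y.
Xs = [0, 1]                                      # hidden alphabet
q = {(0, 0): 0.8, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.7}
r = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.4, (1, 1): 0.6}
y = [0, 1, 1, 0, 1]
n = len(y)

# Forward (q0 uniform):
# a[i](x_i) = sum_{x_{i-1}} a[i-1](x_{i-1}) r(x_{i-1}, y_{i-1}) q(x_{i-1}, x_i)
a = [[1.0 / len(Xs)] * len(Xs)]
for i in range(1, n):
    a.append([sum(a[-1][u] * r[(u, y[i - 1])] * q[(u, v)] for u in Xs)
              for v in Xs])

# Backward:
# b[i](x_i) = sum_{x_{i+1}} q(x_i, x_{i+1}) r(x_{i+1}, y_{i+1}) b[i+1](x_{i+1})
b = [[1.0] * len(Xs) for _ in range(n)]
for i in range(n - 2, -1, -1):
    b[i] = [sum(q[(u, v)] * r[(v, y[i + 1])] * b[i + 1][v] for v in Xs)
            for u in Xs]

def posterior(i):
    # P(x_i | y) is proportional to a[i](x_i) * r(x_i, y_i) * b[i](x_i).
    w = [a[i][u] * r[(u, y[i])] * b[i][u] for u in Xs]
    return [v / sum(w) for v in w]

# Brute-force check of the posterior at i = 2.
num = [0.0, 0.0]
for xs in itertools.product(Xs, repeat=n):
    p = 1.0 / len(Xs)
    for i in range(n - 1):
        p *= q[(xs[i], xs[i + 1])]
    for i in range(n):
        p *= r[(xs[i], y[i])]
    num[xs[2]] += p
brute = [v / sum(num) for v in num]
```

Each sweep touches n edges with an |X| × |X| sum per edge, giving the O(n|X|^2) total stated above.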
Homework 2.7: consider the S&P 500 index over a period of time. For each week, measure the price movement relative to the previous week: +1 indicates up and −1 indicates down. Model this with a hidden Markov model in which x_t denotes the economic state (good or bad) of week t and y_t denotes the price movement (up or down), with

x_{t+1} = x_t with probability 0.8,
P_{Y_t|X_t}(y_t = +1 | x_t = good) = P_{Y_t|X_t}(y_t = −1 | x_t = bad) = q.
Example: neuron firing patterns

Hypothesis: assemblies of neurons activate in a coordinated way in correspondence with specific cognitive functions; performing the function corresponds to a sequence of these activity states.

Approach: the firing process gives the observed variables; the activity states are the hidden variables.
(Figure: single-subject recordings of ensemble neural spikes and horizontal/vertical eye position, segmented into baseline, planning, and moving epochs.)

Goals: automatically detect (as opposed to manually specify) baseline, plan, and perimovement epochs of neural activity, and detect target movement in advance. The long-term goal is a neural prosthesis to help patients with spinal cord injury or neurodegenerative disease and significantly impaired motor control. [C. Kemere, G. Santhanam, B. M. Yu, A. Afshar, S.I. Ryu, T. H. Meng and …]
Discretize time into 10 ms bins. The hidden state evolves as P(x_{t+1} = j | x_t = i) = A_{ij}, and the spike counts are Poisson:

P(# spikes for measurement k = d | x_t = i) = e^{−λ_{k,i}} λ_{k,i}^d / d!,

giving the posterior state probabilities P(x_t = s) / Σ_{s'} P(x_t = s').
Belief propagation for factor graphs

(Figure: a factor graph with variable nodes x_1, …, x_6 and factor nodes a, b, c, d; e.g. factor ψ_a(x_1, x_5, x_6) attached to variables x_1, x_5, x_6.)

µ(x) = (1/Z) ∏_{a∈F} ψ_a(x_a)
Variable nodes are denoted i, j, etc.; factor nodes a, b, etc. There is a set of messages {ν_{i→a}}_{(i,a)∈E} and {ν̂_{a→i}}_{(a,i)∈E}.

Messages from variables to factors:
ν_{i→a}(x_i) = ∏_{b∈∂i\{a}} ν̂_{b→i}(x_i).

Messages from factors to variables:
ν̂_{a→i}(x_i) = Σ_{x_{∂a\{i}}} ψ_a(x_a) ∏_{j∈∂a\{i}} ν_{j→a}(x_j).

Marginals:
ν_i(x_i) = ∏_{b∈∂i} ν̂_{b→i}(x_i).

This includes belief propagation (= the sum-product algorithm) on (general) Markov random fields, and it is exact on factor trees.
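The two update rules can be iterated on any factor graph; on a factor tree the result is exact. A sketch on a small hypothetical tree-structured factor graph (factors and tables are made up for illustration), checked against enumeration:

```python
import itertools

# Sum-product on a tree-structured factor graph: variables 0..3 over
# {0, 1}; hypothetical factor "a" on (0, 1) and "b" on (1, 2, 3).
X = [0, 1]
factors = {
    "a": ((0, 1), lambda x: 1.0 + 2.0 * (x[0] == x[1])),
    "b": ((1, 2, 3), lambda x: 1.0 + x[0] + x[1] * x[2]),
}
n_vars = 4
var_nbrs = {i: [a for a, (vs, _) in factors.items() if i in vs]
            for i in range(n_vars)}

# Messages, initialized uniform; iterate both update rules.
v2f = {(i, a): [1.0, 1.0] for i in range(n_vars) for a in var_nbrs[i]}
f2v = {(a, i): [1.0, 1.0] for a, (vs, _) in factors.items() for i in vs}
for _ in range(10):                    # a tree needs only ~diameter rounds
    for (i, a) in v2f:                 # variable-to-factor updates
        m = [1.0, 1.0]
        for b in var_nbrs[i]:
            if b != a:
                m = [m[x] * f2v[(b, i)][x] for x in X]
        s = sum(m)
        v2f[(i, a)] = [v / s for v in m]
    for (a, i) in f2v:                 # factor-to-variable updates
        vs, fn = factors[a]
        m = [0.0, 0.0]
        for xs in itertools.product(X, repeat=len(vs)):
            w = fn(xs)
            for j, xj in zip(vs, xs):
                if j != i:
                    w *= v2f[(j, a)][xj]
            m[xs[vs.index(i)]] += w
        s = sum(m)
        f2v[(a, i)] = [v / s for v in m]

def marginal(i):
    # nu_i(x_i): product of all incoming factor-to-variable messages.
    w = [1.0, 1.0]
    for a in var_nbrs[i]:
        w = [w[x] * f2v[(a, i)][x] for x in X]
    return [v / sum(w) for v in w]

# Brute-force check for variable 1.
num = [0.0, 0.0]
for xs in itertools.product(X, repeat=n_vars):
    p = 1.0
    for vs, fn in factors.values():
        p *= fn(tuple(xs[j] for j in vs))
    num[xs[1]] += p
brute = [v / sum(num) for v in num]
```

Normalizing each message is optional on a tree (it cancels in the marginal) but keeps the numbers well scaled.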
Example: decoding LDPC codes

An LDPC code is defined by a factor graph model with variable nodes x_i ∈ {0, 1} and factor nodes ψ_a(x_i, x_j, x_k) = I(x_i ⊕ x_j ⊕ x_k = 0).

(Figure: variables x_1, x_2, x_3, x_4 connected to factors a and b.)

Block length n = 4, number of factors m = 2, allowed messages (codewords) = {0000, 0111, 1010, 1101}.

Decoding using belief propagation (for a BSC with crossover probability ε):

µ_y(x) = (1/Z) ∏_{i∈V} P_{Y|X}(y_i | x_i) ∏_{a∈F} I(⊕_{i∈∂a} x_i = 0).

Use the (parallel) sum-product algorithm to find µ(x_i) and let x̂_i = arg max_{x_i} µ(x_i), which minimizes the bit error rate.
Decoding by sum-product algorithm

(Figures: six slides stepping through the parallel sum-product iterations on the four-variable example, showing the message vectors on each edge and the resulting beliefs µ̂(x_i); the numerical message values, e.g. entries such as 0.7, 0.58, 0.42, are largely lost in extraction.)

The updates iterated on each slide are

ν^{(t+1)}_{i→a}(x_i) ∝ P(y_i | x_i) ∏_{b∈∂i\{a}} ν̂^{(t)}_{b→i}(x_i),
ν̂^{(t+1)}_{a→i}(x_i) = Σ_{x_{∂a\{i}}} ∏_{j∈∂a\{i}} ν_{j→a}(x_j) I(⊕_{j∈∂a} x_j = 0).
Sum-product algorithm on trees

A motivating example: the influenza virus. The complete sequence of the gene of the 1918 influenza virus was recovered [A.H. Reid, T.G. Fanning, J.V. Hultin, and J.K. Taubenberger, Proc. Natl. Acad. Sci. 96 (1999)].
Challenges in phylogeny:
- phylogeny reconstruction: given DNA sequences at vertices (only at leaves), infer the underlying tree T = (V, E);
- phylogeny evaluation: given a tree T = (V, E), evaluate the probability of observed DNA sequences at vertices (only at leaves).

Bayesian network model for phylogeny evaluation: a directed graph T = (V, D) and DNA sequences x = (x_i)_{i∈V} ∈ X^V, with

µ_T(x) = q_o(x_o) ∏_{(i,j)∈D} q_{i,j}(x_i, x_j),

where q_{i,j}(x_i, x_j) is the probability that the descendant is x_j if the ancestor is x_i.
Simplified model: X = {+1, −1}, q_o(x_o) = 1/2, and

q(x_i, x_j) = 1 − q if x_j = x_i,   q if x_j ≠ x_i.

MRF representation: q(x_i, x_j) ∝ e^{θ x_i x_j} with θ = (1/2) log((1−q)/q), so the probability of a given tree of mutations x is

µ_T(x) = (1/Z_θ(T)) ∏_{(i,j)∈E} e^{θ x_i x_j}.

Problem: for a given T, compute the marginal µ_T(x_i). We prove the correctness of the sum-product algorithm for this model, but the same proof holds for any general pairwise MRF (and also for general MRFs and factor graphs).
Define graphical models on sub-trees: T_{j→i} = (V_{j→i}, E_{j→i}) is the subtree rooted at j and excluding i, with

µ_{j→i}(x_{V_{j→i}}) ≜ (1/Z(T_{j→i})) ∏_{(u,v)∈E_{j→i}} e^{θ x_u x_v},
ν_{j→i}(x_j) ≜ Σ_{x_{V_{j→i}}\{j}} µ_{j→i}(x_{V_{j→i}}).
(Figure: node i with neighbors k_1, k_2, k_3, k_4 and subtrees T_{k_l→i} = (V_l, E_l).)

The messages from the neighbors k_1, k_2, k_3, k_4 are sufficient to compute the marginal µ(x_i):

µ_T(x_i) ∝ Σ_{x_{V\{i}}} ∏_{(u,v)∈E} e^{θ x_u x_v}
= Σ_{x_{V_1}, x_{V_2}, x_{V_3}, x_{V_4}} ∏_{l=1}^{4} e^{θ x_i x_{k_l}} ∏_{(u,v)∈E_l} e^{θ x_u x_v}
= ∏_{l=1}^{4} Σ_{x_{k_l}, x_{V_l\{k_l}}} e^{θ x_i x_{k_l}} ∏_{(u,v)∈E_l} e^{θ x_u x_v}
∝ ∏_{l=1}^{4} Σ_{x_{k_l}} e^{θ x_i x_{k_l}} ν_{k_l→i}(x_{k_l}).
Recursion on sub-trees to compute the messages: for a node i with parent j and children k, l,

µ_{i→j}(x_{V_{i→j}}) = (1/Z(T_{i→j})) ∏_{(u,v)∈E_{i→j}} e^{θ x_u x_v}
= (1/Z(T_{i→j})) e^{θ x_i x_k} e^{θ x_i x_l} { ∏_{(u,v)∈E_{k→i}} e^{θ x_u x_v} } { ∏_{(u,v)∈E_{l→i}} e^{θ x_u x_v} }
∝ e^{θ x_i x_k} e^{θ x_i x_l} µ_{k→i}(x_{V_{k→i}}) µ_{l→i}(x_{V_{l→i}}).
Summing out everything but x_i,

ν_{i→j}(x_i) = Σ_{x_{V_{i→j}}\{i}} µ_{i→j}(x_{V_{i→j}})
∝ Σ_{x_{V_{i→j}}\{i}} e^{θ x_i x_k} e^{θ x_i x_l} µ_{k→i}(x_{V_{k→i}}) µ_{l→i}(x_{V_{l→i}})
= { Σ_{x_{V_{k→i}}} e^{θ x_i x_k} µ_{k→i}(x_{V_{k→i}}) } { Σ_{x_{V_{l→i}}} e^{θ x_i x_l} µ_{l→i}(x_{V_{l→i}}) }
= { Σ_{x_k} e^{θ x_i x_k} ν_{k→i}(x_k) } { Σ_{x_l} e^{θ x_i x_l} ν_{l→i}(x_l) },

with uniform initialization ν_{i→j}(x_i) = 1/|X| for all leaves i.
Sum-product algorithm (for our example):

ν_{i→j}(x_i) ∝ ∏_{k∈∂i\{j}} { Σ_{x_k} e^{θ x_i x_k} ν_{k→i}(x_k) },
ν_i(x_i) ≜ ∏_{k∈∂i} { Σ_{x_k} e^{θ x_i x_k} ν_{k→i}(x_k) },
µ_T(x_i) = ν_i(x_i) / Σ_{x'_i} ν_i(x'_i).

What if we want all the marginals? Choose an arbitrary root φ, compute all the messages towards the root (|E| messages), then compute all the messages outwards from the root (|E| messages), then compute all the marginals (n marginals).

How many operations are required? A naive implementation requires O(|X|^2 Σ_i d_i^2): if i has degree d_i, then computing ν_{i→j} requires d_i |X|^2 operations; d_i messages start at each node i, each requiring d_i |X|^2 operations; so the total computation for the 2|E| messages is Σ_i d_i (d_i |X|^2). However, we can compute all marginals in O(n|X|^2) operations.
Let D = {(i,j), (j,i) : (i,j) ∈ E} be the directed version of E (so |D| = 2|E|).

(Sequential) sum-product algorithm
1. Initialize ν_{i→j}(x_i) = 1/|X| for all leaves i.
2. Recursively over (i,j) ∈ D, compute (from the leaves)
   ν_{i→j}(x_i) = ∏_{k∈∂i\{j}} { Σ_{x_k} ψ_{ik}(x_i, x_k) ν_{k→i}(x_k) }.
3. For each i ∈ V compute the marginal
   ν_i(x_i) = ∏_{k∈∂i} { Σ_{x_k} ψ_{ik}(x_i, x_k) ν_{k→i}(x_k) },
   µ_T(x_i) = ν_i(x_i) / Σ_{x'_i} ν_i(x'_i).
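The sequential algorithm can be sketched with a memoized recursion that computes each directed message once (leaves fall out as the uniform message, matching step 1). The tree and the symmetric potential psi below are hypothetical; all five marginals are checked against enumeration:

```python
import itertools

# Sequential sum-product on a pairwise tree MRF (hypothetical model).
X = [0, 1]
edges = [(0, 1), (1, 2), (1, 3), (3, 4)]
nbrs = {i: [] for i in range(5)}
for i, j in edges:
    nbrs[i].append(j)
    nbrs[j].append(i)

def psi(a, b):                       # symmetric, so orientation is irrelevant
    return 1.0 + a + b + 2.0 * (a == b)

msgs = {}

def nu(i, j):
    # nu_{i->j}(x_i) = prod_{k in nbrs(i)\{j}} sum_{x_k} psi(x_i,x_k) nu_{k->i}(x_k);
    # for a leaf i the empty product normalizes to the uniform message.
    if (i, j) not in msgs:
        m = []
        for a in X:
            p = 1.0
            for k in nbrs[i]:
                if k != j:
                    mk = nu(k, i)
                    p *= sum(psi(a, b) * mk[b] for b in X)
            m.append(p)
        s = sum(m)
        msgs[(i, j)] = [v / s for v in m]
    return msgs[(i, j)]

def marginal(i):
    w = []
    for a in X:
        p = 1.0
        for k in nbrs[i]:
            mk = nu(k, i)
            p *= sum(psi(a, b) * mk[b] for b in X)
        w.append(p)
    return [v / sum(w) for v in w]

# Brute-force check of all five marginals.
num = {i: [0.0, 0.0] for i in range(5)}
for xs in itertools.product(X, repeat=5):
    p = 1.0
    for i, j in edges:
        p *= psi(xs[i], xs[j])
    for i in range(5):
        num[i][xs[i]] += p
brute = {i: [v / sum(num[i]) for v in num[i]] for i in range(5)}
```

Memoization plays the role of the leaf-to-root / root-to-leaf schedule: each of the 2|E| directed messages is computed exactly once.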
(Parallel) sum-product algorithm
1. Initialize ν^{(0)}_{i→j}(x_i) = 1/|X| for all (i,j) ∈ D.
2. For t ∈ {0, 1, …, t_max}, for all (i,j) ∈ D compute
   ν^{(t+1)}_{i→j}(x_i) = ∏_{k∈∂i\{j}} { Σ_{x_k} ψ_{ik}(x_i, x_k) ν^{(t)}_{k→i}(x_k) }.
3. For each i ∈ V compute the marginal
   ν_i(x_i) = ∏_{k∈∂i} { Σ_{x_k} ψ_{ik}(x_i, x_k) ν^{(t_max+1)}_{k→i}(x_k) },
   µ_T(x_i) = ν_i(x_i) / Σ_{x'_i} ν_i(x'_i).

This is also called belief propagation. When t_max is larger than the diameter of the tree (the length of the longest path), this converges to the correct marginals [Homework 2.5]. It uses more operations than the sequential version (O(n|X|^2 diam(T)); a naive implementation requires O(|X|^2 diam(T) Σ_i d_i^2)). It naturally extends to general graphs, but with no proof of exactness.
Sum-product algorithm on general graphs

(Loopy) belief propagation
1. Initialize ν^{(0)}_{i→j}(x_i) = 1/|X| for all (i,j) ∈ D.
2. For t ∈ {0, 1, …, t_max}, for all (i,j) ∈ D compute
   ν^{(t+1)}_{i→j}(x_i) = ∏_{k∈∂i\{j}} { Σ_{x_k} ψ_{ik}(x_i, x_k) ν^{(t)}_{k→i}(x_k) }.
3. For each i ∈ V compute the approximate marginal
   ν_i(x_i) = ∏_{k∈∂i} { Σ_{x_k} ψ_{ik}(x_i, x_k) ν^{(t_max+1)}_{k→i}(x_k) },
   µ(x_i) ≈ ν_i(x_i) / Σ_{x'_i} ν_i(x'_i).

This computes approximate marginals in O(n|X|^2 t_max) operations. Generally it does not converge; even if it does, it might be incorrect. Folklore about loopy BP:
- it works better when G has few short loops;
- it works better when ψ_ij(x_i, x_j) = ψ_ij,1(x_i) ψ_ij,2(x_j) + small(x_i, x_j);
- it solves a nonconvex variational principle.
Exercise: partition function on trees

Using the recursion for the (unnormalized) messages,

ν_{i→j}(x_i) = ∏_{k∈∂i\{j}} { Σ_{x_k} ψ_{ik}(x_i, x_k) ν_{k→i}(x_k) },
ν_i(x_i) = ∏_{k∈∂i} { Σ_{x_k} ψ_{ik}(x_i, x_k) ν_{k→i}(x_k) },

it follows that we can easily compute the partition function as

Z(T) = Σ_{x_i} ν_i(x_i).

Alternatively, if we had a black box that computes marginals for any tree, then we could use it to compute partition functions efficiently:

Z(T_{i→j}) = Σ_{x_i∈X} ∏_{k∈∂i\{j}} { Σ_{x_k∈X} ψ_{ik}(x_i, x_k) Z(T_{k→i}) µ_{k→i}(x_k) },
Z(T) = Σ_{x_i∈X} ∏_{k∈∂i} { Σ_{x_k∈X} ψ_{ik}(x_i, x_k) Z(T_{k→i}) µ_{k→i}(x_k) }.

This recursive algorithm naturally extends to general graphs.
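A sketch of the first identity: leaving the messages unnormalized and initializing leaf messages to 1 (rather than 1/|X|, which would only change Z by a constant factor), Z(T) = Σ_{x_i} ν_i(x_i) at any node. The tree and potential are hypothetical:

```python
import itertools
import math

# Partition function of a pairwise tree MRF from unnormalized messages.
X = [0, 1]
edges = [(0, 1), (1, 2), (1, 3)]
nbrs = {i: [] for i in range(4)}
for i, j in edges:
    nbrs[i].append(j)
    nbrs[j].append(i)

def psi(a, b):                       # hypothetical symmetric potential
    return 1.0 + a * b + 0.5 * (a != b)

def nu(i, j):
    # Unnormalized message nu_{i->j}(x_i); leaves return [1, 1].
    out = []
    for a in X:
        p = 1.0
        for k in nbrs[i]:
            if k != j:
                mk = nu(k, i)
                p *= sum(psi(a, b) * mk[b] for b in X)
        out.append(p)
    return out

root = 1
Z = sum(
    math.prod(sum(psi(a, b) * nu(k, root)[b] for b in X) for k in nbrs[root])
    for a in X
)

# Brute force over all configurations.
Z_brute = sum(
    math.prod(psi(xs[i], xs[j]) for i, j in edges)
    for xs in itertools.product(X, repeat=4)
)
```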
Why would one want to compute the partition function? Suppose you observe x = (+1, +1, +1, +1, +1, +1, +1, +1, +1) and you know it comes from one of two models of the form

µ(x) = (1/Z(T)) ∏_{(i,j)∈E} ψ(x_i, x_j)

(e.g. coloring). Which one has the highest likelihood? Evaluating the likelihood of either model requires its partition function Z(T).
Exercise: sampling on the tree

If we have a black box for computing marginals on any tree, we can use it to sample from any distribution on a tree.

Sampling( Tree T = (V, E), ψ = {ψ_ij}_{(ij)∈E} )
1: choose a root o ∈ V;
2: sample X_o ∼ µ_o(·);
3: recursively over i ∈ V (from root to leaves):
4:   compute µ_{i|π(i)}(x_i | x_{π(i)});
5:   sample X_i ∼ µ_{i|π(i)}(· | x_{π(i)});

where π(i) is the parent of node i in the rooted tree T_o. We use the black box to compute the conditional distribution:

µ_T(x_{V_{i→j}} | x_j) ∝ ψ_{ij}(x_i, x_j) µ_{i→j}(x_{V_{i→j}}),
µ_T(x_i | x_j) ∝ ψ_{ij}(x_i, x_j) µ_{i→j}(x_i).
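The procedure above can be sketched as ancestral sampling: draw the root from its marginal, then each child i given its already-sampled parent j from µ(x_i | x_j) ∝ ψ(x_i, x_j) ν_{i→j}(x_i). The tree, potential, and sample size are hypothetical; empirical frequencies are compared to exact marginals:

```python
import itertools
import math
import random

# Ancestral sampling on a pairwise tree MRF (hypothetical model).
random.seed(0)
X = [0, 1]
edges = [(0, 1), (0, 2), (2, 3)]
nbrs = {i: [] for i in range(4)}
for i, j in edges:
    nbrs[i].append(j)
    nbrs[j].append(i)

def psi(a, b):                       # symmetric hypothetical potential
    return 1.0 + 2.0 * a * b + 0.5 * (a == b)

def nu(i, j):
    # Message nu_{i->j}(x_i), computed recursively (unnormalized).
    out = []
    for a in X:
        p = 1.0
        for k in nbrs[i]:
            if k != j:
                mk = nu(k, i)
                p *= sum(psi(a, b) * mk[b] for b in X)
        out.append(p)
    return out

msgs = {(i, j): nu(i, j) for i in range(4) for j in nbrs[i]}
root = 0
root_w = [
    math.prod(sum(psi(a, b) * msgs[(k, root)][b] for b in X) for k in nbrs[root])
    for a in X
]

def sample():
    x = {root: random.choices(X, weights=root_w)[0]}
    stack = [(root, k) for k in nbrs[root]]
    while stack:
        j, i = stack.pop()            # j is the (sampled) parent of i
        w = [psi(a, x[j]) * msgs[(i, j)][a] for a in X]
        x[i] = random.choices(X, weights=w)[0]
        stack += [(i, k) for k in nbrs[i] if k != j]
    return x

# Empirical marginals from many samples vs. exact marginals by enumeration.
N = 20000
counts = [0] * 4
for _ in range(N):
    s = sample()
    for i in range(4):
        counts[i] += s[i]
freq = [c / N for c in counts]

num = {i: [0.0, 0.0] for i in range(4)}
for xs in itertools.product(X, repeat=4):
    p = math.prod(psi(xs[i], xs[j]) for i, j in edges)
    for i in range(4):
        num[i][xs[i]] += p
exact = [num[i][1] / sum(num[i]) for i in range(4)]
```

Each sample costs one categorical draw per node once the messages are cached, so sampling is as cheap as computing marginals.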
Tree decomposition

When we don't have a tree, we can create an equivalent tree graph by enlarging the alphabet from X to X^k. The treewidth of G is the minimum such k. It is NP-hard to determine the treewidth of a graph, and the problem is that in general Treewidth(G) = Θ(n).
Tree decomposition of G = (V, E)

(Figure: a graph on vertices 1, …, and a tree decomposition with bags such as {1,2}, {1,3}, ….)

A tree decomposition is a tree T = (V_T, E_T) and a mapping V: V_T → SUBSETS(V) such that:
- for each i ∈ V there exists at least one u ∈ V_T with i ∈ V(u);
- for each (i,j) ∈ E there exists at least one u ∈ V_T with i, j ∈ V(u);
- if i ∈ V(u_1) and i ∈ V(u_2), then i ∈ V(w) for any w on the path between u_1 and u_2 in T.
.8: Distributed Algorithms Fall, 009 Class Today s plan Leader election in a synchronous ring: Lower bound for comparison-based algorithms. Basic computation in general synchronous networks: Leader election
More informationVariational Mean Field for Graphical Models
Variational Mean Field for Graphical Models CS/CNS/EE 155 Baback Moghaddam Machine Learning Group baback @ jpl.nasa.gov Approximate Inference Consider general UGs (i.e., not tree-structured) All basic
More informationProbabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014
Probabilistic Models for Big Data Alex Davies and Roger Frigola University of Cambridge 13th February 2014 The State of Big Data Why probabilistic models for Big Data? 1. If you don t have to worry about
More informationChapter 11. 11.1 Load Balancing. Approximation Algorithms. Load Balancing. Load Balancing on 2 Machines. Load Balancing: Greedy Scheduling
Approximation Algorithms Chapter Approximation Algorithms Q. Suppose I need to solve an NP-hard problem. What should I do? A. Theory says you're unlikely to find a poly-time algorithm. Must sacrifice one
More informationNear Optimal Solutions
Near Optimal Solutions Many important optimization problems are lacking efficient solutions. NP-Complete problems unlikely to have polynomial time solutions. Good heuristics important for such problems.
More informationIE 680 Special Topics in Production Systems: Networks, Routing and Logistics*
IE 680 Special Topics in Production Systems: Networks, Routing and Logistics* Rakesh Nagi Department of Industrial Engineering University at Buffalo (SUNY) *Lecture notes from Network Flows by Ahuja, Magnanti
More informationCoding and decoding with convolutional codes. The Viterbi Algor
Coding and decoding with convolutional codes. The Viterbi Algorithm. 8 Block codes: main ideas Principles st point of view: infinite length block code nd point of view: convolutions Some examples Repetition
More informationEx. 2.1 (Davide Basilio Bartolini)
ECE 54: Elements of Information Theory, Fall 00 Homework Solutions Ex.. (Davide Basilio Bartolini) Text Coin Flips. A fair coin is flipped until the first head occurs. Let X denote the number of flips
More informationOn the independence number of graphs with maximum degree 3
On the independence number of graphs with maximum degree 3 Iyad A. Kanj Fenghui Zhang Abstract Let G be an undirected graph with maximum degree at most 3 such that G does not contain any of the three graphs
More informationCommunication Cost in Big Data Processing
Communication Cost in Big Data Processing Dan Suciu University of Washington Joint work with Paul Beame and Paris Koutris and the Database Group at the UW 1 Queries on Big Data Big Data processing on distributed
More informationA Sublinear Bipartiteness Tester for Bounded Degree Graphs
A Sublinear Bipartiteness Tester for Bounded Degree Graphs Oded Goldreich Dana Ron February 5, 1998 Abstract We present a sublinear-time algorithm for testing whether a bounded degree graph is bipartite
More informationOutline. NP-completeness. When is a problem easy? When is a problem hard? Today. Euler Circuits
Outline NP-completeness Examples of Easy vs. Hard problems Euler circuit vs. Hamiltonian circuit Shortest Path vs. Longest Path 2-pairs sum vs. general Subset Sum Reducing one problem to another Clique
More informationLecture 22: November 10
CS271 Randomness & Computation Fall 2011 Lecture 22: November 10 Lecturer: Alistair Sinclair Based on scribe notes by Rafael Frongillo Disclaimer: These notes have not been subjected to the usual scrutiny
More informationPart 2: Community Detection
Chapter 8: Graph Data Part 2: Community Detection Based on Leskovec, Rajaraman, Ullman 2014: Mining of Massive Datasets Big Data Management and Analytics Outline Community Detection - Social networks -
More informationCMSC 858T: Randomized Algorithms Spring 2003 Handout 8: The Local Lemma
CMSC 858T: Randomized Algorithms Spring 2003 Handout 8: The Local Lemma Please Note: The references at the end are given for extra reading if you are interested in exploring these ideas further. You are
More informationSeminar. Path planning using Voronoi diagrams and B-Splines. Stefano Martina stefano.martina@stud.unifi.it
Seminar Path planning using Voronoi diagrams and B-Splines Stefano Martina stefano.martina@stud.unifi.it 23 may 2016 This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International
More informationFinding and counting given length cycles
Finding and counting given length cycles Noga Alon Raphael Yuster Uri Zwick Abstract We present an assortment of methods for finding and counting simple cycles of a given length in directed and undirected
More informationTracking Groups of Pedestrians in Video Sequences
Tracking Groups of Pedestrians in Video Sequences Jorge S. Marques Pedro M. Jorge Arnaldo J. Abrantes J. M. Lemos IST / ISR ISEL / IST ISEL INESC-ID / IST Lisbon, Portugal Lisbon, Portugal Lisbon, Portugal
More informationDistributed Computing over Communication Networks: Maximal Independent Set
Distributed Computing over Communication Networks: Maximal Independent Set What is a MIS? MIS An independent set (IS) of an undirected graph is a subset U of nodes such that no two nodes in U are adjacent.
More informationMapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research
MapReduce and Distributed Data Analysis Google Research 1 Dealing With Massive Data 2 2 Dealing With Massive Data Polynomial Memory Sublinear RAM Sketches External Memory Property Testing 3 3 Dealing With
More informationStatistical machine learning, high dimension and big data
Statistical machine learning, high dimension and big data S. Gaïffas 1 14 mars 2014 1 CMAP - Ecole Polytechnique Agenda for today Divide and Conquer principle for collaborative filtering Graphical modelling,
More information1 Homework 1. [p 0 q i+j +... + p i 1 q j+1 ] + [p i q j ] + [p i+1 q j 1 +... + p i+j q 0 ]
1 Homework 1 (1) Prove the ideal (3,x) is a maximal ideal in Z[x]. SOLUTION: Suppose we expand this ideal by including another generator polynomial, P / (3, x). Write P = n + x Q with n an integer not
More informationONLINE DEGREE-BOUNDED STEINER NETWORK DESIGN. Sina Dehghani Saeed Seddighin Ali Shafahi Fall 2015
ONLINE DEGREE-BOUNDED STEINER NETWORK DESIGN Sina Dehghani Saeed Seddighin Ali Shafahi Fall 2015 ONLINE STEINER FOREST PROBLEM An initially given graph G. s 1 s 2 A sequence of demands (s i, t i ) arriving
More informationInformation Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay
Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 17 Shannon-Fano-Elias Coding and Introduction to Arithmetic Coding
More information! Solve problem to optimality. ! Solve problem in poly-time. ! Solve arbitrary instances of the problem. !-approximation algorithm.
Approximation Algorithms Chapter Approximation Algorithms Q Suppose I need to solve an NP-hard problem What should I do? A Theory says you're unlikely to find a poly-time algorithm Must sacrifice one of
More informationA Practical Scheme for Wireless Network Operation
A Practical Scheme for Wireless Network Operation Radhika Gowaikar, Amir F. Dana, Babak Hassibi, Michelle Effros June 21, 2004 Abstract In many problems in wireline networks, it is known that achieving
More informationOn an anti-ramsey type result
On an anti-ramsey type result Noga Alon, Hanno Lefmann and Vojtĕch Rödl Abstract We consider anti-ramsey type results. For a given coloring of the k-element subsets of an n-element set X, where two k-element
More informationLecture 7: NP-Complete Problems
IAS/PCMI Summer Session 2000 Clay Mathematics Undergraduate Program Basic Course on Computational Complexity Lecture 7: NP-Complete Problems David Mix Barrington and Alexis Maciel July 25, 2000 1. Circuit
More informationLarge induced subgraphs with all degrees odd
Large induced subgraphs with all degrees odd A.D. Scott Department of Pure Mathematics and Mathematical Statistics, University of Cambridge, England Abstract: We prove that every connected graph of order
More informationDATA ANALYSIS II. Matrix Algorithms
DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where
More informationCoding Theorems for Turbo-Like Codes Abstract. 1. Introduction.
Coding Theorems for Turbo-Like Codes Dariush Divsalar, Hui Jin, and Robert J. McEliece Jet Propulsion Laboratory and California Institute of Technology Pasadena, California USA E-mail: dariush@shannon.jpl.nasa.gov,
More informationProtein Protein Interaction Networks
Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics
More information1 if 1 x 0 1 if 0 x 1
Chapter 3 Continuity In this chapter we begin by defining the fundamental notion of continuity for real valued functions of a single real variable. When trying to decide whether a given function is or
More informationMessage-passing sequential detection of multiple change points in networks
Message-passing sequential detection of multiple change points in networks Long Nguyen, Arash Amini Ram Rajagopal University of Michigan Stanford University ISIT, Boston, July 2012 Nguyen/Amini/Rajagopal
More informationInternational Journal of Advanced Research in Computer Science and Software Engineering
Volume 3, Issue 7, July 23 ISSN: 2277 28X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Greedy Algorithm:
More informationLife of A Knowledge Base (KB)
Life of A Knowledge Base (KB) A knowledge base system is a special kind of database management system to for knowledge base management. KB extraction: knowledge extraction using statistical models in NLP/ML
More informationMonte Carlo and Empirical Methods for Stochastic Inference (MASM11/FMS091)
Monte Carlo and Empirical Methods for Stochastic Inference (MASM11/FMS091) Magnus Wiktorsson Centre for Mathematical Sciences Lund University, Sweden Lecture 5 Sequential Monte Carlo methods I February
More informationHandout #Ch7 San Skulrattanakulchai Gustavus Adolphus College Dec 6, 2010. Chapter 7: Digraphs
MCS-236: Graph Theory Handout #Ch7 San Skulrattanakulchai Gustavus Adolphus College Dec 6, 2010 Chapter 7: Digraphs Strong Digraphs Definitions. A digraph is an ordered pair (V, E), where V is the set
More informationEfficient Recovery of Secrets
Efficient Recovery of Secrets Marcel Fernandez Miguel Soriano, IEEE Senior Member Department of Telematics Engineering. Universitat Politècnica de Catalunya. C/ Jordi Girona 1 i 3. Campus Nord, Mod C3,
More informationHidden Markov Models Chapter 15
Hidden Markov Models Chapter 15 Mausam (Slides based on Dan Klein, Luke Zettlemoyer, Alex Simma, Erik Sudderth, David Fernandez-Baca, Drena Dobbs, Serafim Batzoglou, William Cohen, Andrew McCallum, Dan
More informationA 2-factor in which each cycle has long length in claw-free graphs
A -factor in which each cycle has long length in claw-free graphs Roman Čada Shuya Chiba Kiyoshi Yoshimoto 3 Department of Mathematics University of West Bohemia and Institute of Theoretical Computer Science
More informationEvery tree contains a large induced subgraph with all degrees odd
Every tree contains a large induced subgraph with all degrees odd A.J. Radcliffe Carnegie Mellon University, Pittsburgh, PA A.D. Scott Department of Pure Mathematics and Mathematical Statistics University
More informationMonte Carlo-based statistical methods (MASM11/FMS091)
Monte Carlo-based statistical methods (MASM11/FMS091) Magnus Wiktorsson Centre for Mathematical Sciences Lund University, Sweden Lecture 6 Sequential Monte Carlo methods II February 7, 2014 M. Wiktorsson
More informationStatistical Machine Translation: IBM Models 1 and 2
Statistical Machine Translation: IBM Models 1 and 2 Michael Collins 1 Introduction The next few lectures of the course will be focused on machine translation, and in particular on statistical machine translation
More informationDynamic Programming and Graph Algorithms in Computer Vision
Dynamic Programming and Graph Algorithms in Computer Vision Pedro F. Felzenszwalb and Ramin Zabih Abstract Optimization is a powerful paradigm for expressing and solving problems in a wide range of areas,
More informationBayesian Networks. Mausam (Slides by UW-AI faculty)
Bayesian Networks Mausam (Slides by UW-AI faculty) Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential space for representation & inference BNs provide
More informationMethods for big data in medical genomics
Methods for big data in medical genomics Parallel Hidden Markov Models in Population Genetics Chris Holmes, (Peter Kecskemethy & Chris Gamble) Department of Statistics and, Nuffield Department of Medicine
More informationPolarization codes and the rate of polarization
Polarization codes and the rate of polarization Erdal Arıkan, Emre Telatar Bilkent U., EPFL Sept 10, 2008 Channel Polarization Given a binary input DMC W, i.i.d. uniformly distributed inputs (X 1,...,
More informationExponential time algorithms for graph coloring
Exponential time algorithms for graph coloring Uriel Feige Lecture notes, March 14, 2011 1 Introduction Let [n] denote the set {1,..., k}. A k-labeling of vertices of a graph G(V, E) is a function V [k].
More information