Markov random fields and Gibbs measures




1. Conditional independence

Suppose $X_i$ is a random element of $(\mathcal{X}_i, \mathcal{B}_i)$, for $i = 1, 2, 3$, with all $X_i$ defined on the same probability space $(\Omega, \mathcal{F}, P)$. The random elements $X_1$ and $X_3$ are said to be conditionally independent given $X_2$ if, at least for all bounded, $\mathcal{B}_i$-measurable real functions $f_i$,
$$P\big(f_1(X_1) f_3(X_3) \mid X_2\big) = P\big(f_1(X_1) \mid X_2\big)\, P\big(f_3(X_3) \mid X_2\big) \quad \text{almost surely}.$$
Equivalently, with $F_i(X_2) = P\big(f_i(X_i) \mid X_2\big)$ for $i = 1, 3$,

<1> $\qquad P\, f_1(X_1) f_2(X_2) f_3(X_3) = P\, F_1(X_2) f_2(X_2) F_3(X_2)$ for all $f_i$.

Notice that equality <1> involves the random elements only through their joint distribution. We could, with no loss of generality, assume $\Omega = \mathcal{X}_1 \times \mathcal{X}_2 \times \mathcal{X}_3$, equipped with its product sigma-field $\mathcal{B} = \mathcal{B}_1 \otimes \mathcal{B}_2 \otimes \mathcal{B}_3$, with $P$ the joint distribution of $(X_1, X_2, X_3)$ and $X_i(\omega) = \omega_i$, the $i$th coordinate map.

Things are greatly simplified if $P$ has a density $p(\omega)$ with respect to a sigma-finite product measure $\lambda = \lambda_1 \otimes \lambda_2 \otimes \lambda_3$ on $\mathcal{B}$. As in the case of densities with respect to Lebesgue measure, the marginal distributions have densities obtained by integrating out some coordinates. For example, the marginal distribution of $\omega_2$ has density
$$p_2(\omega_2) = \lambda_1 \lambda_3\, p(\omega_1, \omega_2, \omega_3)$$
with respect to $\lambda_2$. Similarly, the various conditional distributions are given by conditional densities (cf. Pollard 2001, Section 5.4). For example, the conditional distribution of $(\omega_1, \omega_3)$ given $\omega_2$ has density
$$p_{13|2}(\omega_1, \omega_3 \mid \omega_2) = \frac{p(\omega_1, \omega_2, \omega_3)}{p_2(\omega_2)} \{p_2(\omega_2) > 0\}$$
with respect to $\lambda_1 \otimes \lambda_3$.

Remark. It is traditional to use less formal notation, for example, writing $p(\omega_1, \omega_3 \mid \omega_2)$ instead of $p_{13|2}(\omega_1, \omega_3 \mid \omega_2)$. As long as the arguments are specified symbolically there is no ambiguity. But an expression like $p(a, b \mid c)$ could refer to several different conditional densities evaluated at the values $(a, b, c)$.

The conditional expectations are given by integrals involving conditional densities. For example,
$$P\big(f_1(\omega_1) \mid \omega_2\big) = \lambda_1\, f_1(\omega_1) p_{1|2}(\omega_1 \mid \omega_2) \quad \text{almost surely}.$$
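When each $\lambda_i$ is counting measure on a finite set, the integrals above become finite sums and the formulas can be checked directly. The following is a minimal sketch (my illustration, not part of the original notes; the array names and sizes are invented) that tabulates a joint density, forms $p_2$ and $p_{13|2}$, and confirms the conditional-expectation formula via the tower property.

```python
import numpy as np

rng = np.random.default_rng(0)

# Finite coordinate spaces; each lambda_i is counting measure, so the
# integrals in the text become sums over the discarded coordinates.
n1, n2, n3 = 2, 3, 2
p = rng.random((n1, n2, n3))
p /= p.sum()                              # joint density p(w1, w2, w3)

p2 = p.sum(axis=(0, 2))                   # marginal density p_2(w2)

# Conditional density p_{13|2}(w1, w3 | w2) = p(w1, w2, w3) / p_2(w2).
p13_2 = p / p2[None, :, None]

# p_{1|2} comes from summing p_{13|2} over w3; then the conditional
# expectation is F_1(w2) = P(f1(w1) | w2) = sum_{w1} f1(w1) p_{1|2}(w1 | w2).
f1 = rng.random(n1)
p1_2 = p13_2.sum(axis=2)
F1 = f1 @ p1_2

# Tower property: averaging F_1 against p_2 recovers P f1(w1).
assert np.isclose(F1 @ p2, np.einsum("ijk,i->", p, f1))
```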

Equality <1> then becomes
$$\lambda\, f_1(\omega_1) f_2(\omega_2) f_3(\omega_3) p(\omega_1, \omega_2, \omega_3)
= \lambda_2\, f_2(\omega_2) p_2(\omega_2) \big(\lambda_1\, f_1(\omega_1) p_{1|2}(\omega_1 \mid \omega_2)\big) \big(\lambda_3\, f_3(\omega_3) p_{3|2}(\omega_3 \mid \omega_2)\big)
= \lambda\, f_1(\omega_1) f_2(\omega_2) f_3(\omega_3)\, p_2(\omega_2) p_{1|2}(\omega_1 \mid \omega_2) p_{3|2}(\omega_3 \mid \omega_2).$$
A simple generating-class argument then shows that the conditional independence is equivalent to the factorization

<2> $\qquad p(\omega_1, \omega_2, \omega_3) = p_2(\omega_2)\, p_{1|2}(\omega_1 \mid \omega_2)\, p_{3|2}(\omega_3 \mid \omega_2)$ a.e. $[\lambda]$.

Note that the right-hand side of the last display has the form $\phi(\omega_1, \omega_2) \psi(\omega_2, \omega_3)$. Actually, any factorization

<3> $\qquad p(\omega_1, \omega_2, \omega_3) = \phi(\omega_1, \omega_2) \psi(\omega_2, \omega_3)$

for nonnegative $\phi$ and $\psi$ implies the conditional independence. For, if we define $\Phi(\omega_2) = \lambda_1 \phi(\omega_1, \omega_2)$ and $\Psi(\omega_2) = \lambda_3 \psi(\omega_2, \omega_3)$, we then have
$$\begin{aligned}
p_2(\omega_2) &= \lambda_1 \lambda_3\, p = \Phi(\omega_2) \Psi(\omega_2) \\
p_{1,2}(\omega_1, \omega_2) &= \lambda_3\, p = \phi(\omega_1, \omega_2) \Psi(\omega_2) \\
p_{2,3}(\omega_2, \omega_3) &= \lambda_1\, p = \Phi(\omega_2) \psi(\omega_2, \omega_3) \\
p_{1|2}(\omega_1 \mid \omega_2) &= \phi(\omega_1, \omega_2) / \Phi(\omega_2) \{p_2 > 0\} \\
p_{3|2}(\omega_3 \mid \omega_2) &= \psi(\omega_2, \omega_3) / \Psi(\omega_2) \{p_2 > 0\}
\end{aligned}$$
from which <2> follows.

Similar arguments apply when $P$ is the joint distribution for more than three random elements.

Notation. Let $S$ be a finite index set. (Later the points of $S$ will be called sites at which random variables are defined.) Consider probability densities $p(\omega_i : i \in S)$ with respect to a product measure $\lambda = \otimes_{i \in S} \lambda_i$ on the product sigma-field $\mathcal{B} = \otimes_{i \in S} \mathcal{B}_i$ for a product space $\Omega = \times_{i \in S} \mathcal{X}_i$. For each $A \subseteq S$ write $\omega_A$ for $(\omega_i : i \in A)$.

If $A$, $B$, and $C$ are disjoint subsets with union $S$, then it follows by an argument similar to the one leading from <3> that $\omega_A$ and $\omega_B$ are conditionally independent given $\omega_C$ if and only if the density factorizes as

<4> $\qquad p(\omega) = \phi(\omega_A, \omega_C) \psi(\omega_B, \omega_C).$

(Better: the right-hand side of <4> is a version of $p$.) In fact, this condition follows directly from the earlier argument if we take $X_1 = \omega_A$, $X_3 = \omega_B$, and $X_2 = \omega_C$.

If $A$, $B$, and $C$ are disjoint subsets whose union is not the whole of $S$, the extra variables can complicate the checking of conditional independence. However, a sufficient condition for conditional independence of $\omega_A$ and $\omega_B$ given $\omega_C$ is the existence of a factorization

<5> $\qquad p(\omega) = \phi(\omega_A, \omega_C, \omega_D) \psi(\omega_B, \omega_C, \omega_E)$

where $D$ and $E$ are disjoint subsets of $S \setminus (A \cup B \cup C)$. The integration over $\omega_D$ and $\omega_E$, which is needed to find the density for $\omega_{A \cup B \cup C}$, preserves the factorization needed for conditional independence.
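In the finite case the implication from <3> to <2> can also be seen numerically. The sketch below (again illustrative, not from the notes; all names are invented) builds a joint density of the product form <3> and checks that it factorizes as in <2>.

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2, n3 = 2, 3, 2

# Joint density of the form <3>: p(w1, w2, w3) = phi(w1, w2) psi(w2, w3) / Z.
phi = rng.random((n1, n2))
psi = rng.random((n2, n3))
p = np.einsum("ij,jk->ijk", phi, psi)
p /= p.sum()

p2 = p.sum(axis=(0, 2))                   # marginal of w2
p12 = p.sum(axis=2)                       # joint of (w1, w2)
p23 = p.sum(axis=0)                       # joint of (w2, w3)

# Factorization <2>: p = p_2 * p_{1|2} * p_{3|2} = p_{1,2} * p_{2,3} / p_2.
reconstructed = np.einsum("j,ij,jk->ijk", 1.0 / p2, p12, p23)
assert np.allclose(p, reconstructed)
```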

2. Gibbs distributions

There is a particularly easy way to create probability measures with recognizable conditional independence properties. Let $\mathcal{F}$ be a collection of subsets of $S$. For each $a$ in $\mathcal{F}$, let $\psi_a(\omega) = \psi_a(\omega_a)$ be a nonnegative function that depends on $\omega$ only through $\omega_a$. Provided the number $Z := \lambda \prod_{a \in \mathcal{F}} \psi_a$ is neither zero nor infinite, we can define a Gibbs measure by means of the density

<6> $\qquad p(\omega) = \dfrac{1}{Z} \prod_{a \in \mathcal{F}} \psi_a(\omega_a)$

with respect to $\lambda$. The conditional independence properties are easily seen from the corresponding factor graph, which has vertex set $V = S \cup \mathcal{F}$, with edges drawn only between those $i \in S$ and $a \in \mathcal{F}$ for which $i \in a$. For example, if $S = \{1, 2, 3, 4, 5, 6, 7\}$ and $\mathcal{F} = \{\{1,2,3\}, \{2,3,4,5\}, \{4,5,6,7\}\}$:

[Figure: the factor graph on sites 1 through 7 for this example (left), and the same connectivities drawn on the sites alone (right).]

The same connectivities could be represented, less economically, without the $\mathcal{F}$ vertices, by joining all sites $i, j$ for which there is some $a \in \mathcal{F}$ with $\{i, j\} \subseteq a$, as shown in the graph on the right of the display. If $i$ and $j$ share a common neighbor $a$ in the factor graph (equivalently, if $i$ and $j$ are neighbors in the representation without $\mathcal{F}$), write $i \sim j$. Write $N_i = \{j \in S \setminus \{i\} : i \sim j\}$.

<7> Theorem. Suppose $A$, $B$, and $C$ are disjoint subsets of $S$ with the separation property: each path joining a site in $A$ to a site in $B$ must pass through some site in $C$. Then $\omega_A$ and $\omega_B$ are conditionally independent given $\omega_C$ under the Gibbs measure given by the density <6>.

Proof. Define $V_A$ as the set of vertices from $V$ that can be connected to some site in $A$ by a path that passes through no sites from $C$. By the separation assumption, $A \subseteq V_A$ and $B \subseteq V_A^c$. Sites in $C$ might belong to either $V_A$ or $V_A^c$. Note that $p(\omega) = \phi(\omega) \psi(\omega)$ where $\phi = \prod_{a \in V_A} \psi_a$ and $\psi = \prod_{a \notin V_A} \psi_a$. The function $\phi$ depends on $\omega$ only through the sites in $V_A$. Conservatively, we could write it as $\phi(\omega_A, \omega_C, \omega_D)$, where $D$ denotes the set of all sites in $V_A \setminus (A \cup C)$. Similarly, $\psi(\omega) = \psi(\omega_B, \omega_C, \omega_E)$, where $E \subseteq S \setminus (B \cup C)$ consists of all those sites $i$ connected to some $a \in V_A^c$. There can be no path joining such an $i$ to a point $k$ in $A$ without passing through $C$, for otherwise there would be a path from $k$ to $a$ via $i$ that would force $a$ to belong to $V_A$. In other words, $D$ and $E$ are disjoint subsets of $S \setminus (A \cup B \cup C)$. That is, we have a factorization of $p$ as in <5>, which implies the asserted conditional independence.

<8> Exercise. For each site $i$, show that the conditional distribution of $\omega_i$ given $\omega_{S \setminus i}$ depends only on $(\omega_j : j \in N_i)$. Specifically, show that
$$p_{i|S \setminus i}(\omega_i \mid \omega_{S \setminus i}) = \frac{\prod_{a \in \mathcal{F}: i \in a} \psi_a(\omega_a)}{\lambda_i \prod_{a \in \mathcal{F}: i \in a} \psi_a(\omega_a)},$$
which involves $\omega$ only through $\omega_i$ and $(\omega_j : j \in N_i)$, because the factors not involving site $i$ cancel from the ratio. This fact is called the Markov property for the Gibbs distribution.
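For a concrete check of the Exercise, the following sketch (illustrative only; the site labels, factor tables, and function names are my choices, with sites shifted to 0-based indices) builds the seven-site Gibbs density from the example above and confirms that the full conditional at a site agrees with the one computed from the factors containing that site alone.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)

# Binary coordinates at sites 0..6; factors as in the example, 0-based.
# The psi_a tables are arbitrary positive functions, so Z is finite, nonzero.
F = [(0, 1, 2), (1, 2, 3, 4), (3, 4, 5, 6)]
psis = {a: rng.random((2,) * len(a)) for a in F}

def unnormalized(w):
    out = 1.0
    for a, psi in psis.items():
        out *= psi[tuple(w[i] for i in a)]
    return out

states = list(itertools.product((0, 1), repeat=7))
Z = sum(unnormalized(w) for w in states)
p = {w: unnormalized(w) / Z for w in states}      # Gibbs density <6>

# Exercise <8>: the conditional for site i given everything else can be
# computed either from the full joint or from the factors containing i.
i = 3

def cond_from_joint(w):
    row = [p[w[:i] + (x,) + w[i + 1:]] for x in (0, 1)]
    return row[w[i]] / sum(row)

def cond_from_factors(w):
    def prod(x):
        wx = w[:i] + (x,) + w[i + 1:]
        return np.prod([psis[a][tuple(wx[j] for j in a)]
                        for a in F if i in a])
    return prod(w[i]) / (prod(0) + prod(1))

for w in states:
    assert np.isclose(cond_from_joint(w), cond_from_factors(w))
```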

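Theorem <7> can be verified in the same style. A further sketch (again my illustration; the sets $A$, $B$, $C$ are read off the example's factor graph, where $C = \{3, 4\}$ separates $A = \{0\}$ from $B = \{5, 6\}$ in 0-based labels) checks the implied conditional independence identity $p_{ABC}\, p_C = p_{AC}\, p_{BC}$.

```python
import numpy as np

rng = np.random.default_rng(4)

# The same seven-site example, with p stored as a 7-dimensional array
# (one binary axis per site). Factor tables are arbitrary and positive.
psi1 = rng.random((2, 2, 2))          # on sites (0, 1, 2)
psi2 = rng.random((2, 2, 2, 2))       # on sites (1, 2, 3, 4)
psi3 = rng.random((2, 2, 2, 2))       # on sites (3, 4, 5, 6)
p = np.einsum("abc,bcde,defg->abcdefg", psi1, psi2, psi3)
p /= p.sum()

# A = {0}, B = {5, 6}, C = {3, 4}: every path from A to B passes through C.
pABC = p.sum(axis=(1, 2))             # joint of (w0, w3, w4, w5, w6)
pC = pABC.sum(axis=(0, 3, 4))         # marginal of (w3, w4)
pAC = pABC.sum(axis=(3, 4))           # joint of (w0, w3, w4)
pBC = pABC.sum(axis=0)                # joint of (w3, w4, w5, w6)

# Theorem <7>: p_{AB|C} = p_{A|C} p_{B|C}, i.e. pABC * pC = pAC * pBC.
lhs = np.einsum("acdef,cd->acdef", pABC, pC)
rhs = np.einsum("acd,cdef->acdef", pAC, pBC)
assert np.allclose(lhs, rhs)
```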
3. Hammersley-Clifford

When $P$ is defined by a strictly positive density $p(\omega)$ with respect to $\lambda$, there is a representation for $P$ as a Gibbs measure, with a factor graph that represents any Markov properties that $P$ might have. The representation will follow from an application of the following Lemma to $\log p(\omega)$.

<9> Lemma. Let $g$ be any real-valued function on $\Omega = \times_{i \in S} \mathcal{X}_i$. Let $\bar\omega$ be an arbitrarily chosen, but fixed, point of $\Omega$. There exists a representation $g(\omega) = \sum_{a \subseteq S} \Psi_a(\omega)$ with the following properties.

(i) $\Psi_a(\omega) = \Psi_a(\omega_a)$ depends on $\omega$ only through the coordinates $\omega_a$. In particular, $\Psi_\emptyset$ is a constant.

(ii) If $a \neq \emptyset$ and if $\omega_i = \bar\omega_i$ for any $i$ in $a$, then $\Psi_a(\omega) = 0$.

Proof. For each subset $a \subseteq S$ define
$$g_a(\omega_a) = g_a(\omega) = g(\omega_a, \bar\omega_{S \setminus a})
\quad \text{and} \quad
\Psi_a(\omega) = \sum_{b \subseteq a} (-1)^{\#(a \setminus b)} g_b(\omega_b).$$
Clearly $\Psi_a$ depends on $\omega$ only through $\omega_a$. To prove (ii), for simplicity suppose $1 \in a$ and $\omega_1 = \bar\omega_1$. Divide the subsets of $a$ into pairs: those $b$ for which $1 \notin b$ and the corresponding $b' = b \cup \{1\}$. The subsets $b$ and $b'$ together contribute
$$(-1)^{\#(a \setminus b')} \big( g(\omega_b, \omega_1, \bar\omega_{S \setminus b'}) - g(\omega_b, \bar\omega_1, \bar\omega_{S \setminus b'}) \big) = 0$$
because $\omega_1 = \bar\omega_1$.

Finally, to establish the representation for $g$, consider the coefficient of $g_b$ in
$$\sum_{a \subseteq S} \Psi_a(\omega) = \sum_{b \subseteq a \subseteq S} (-1)^{\#(a \setminus b)} g_b(\omega_b).$$
When $b = a = S$, we recover $(-1)^0 g_S(\omega) = g(\omega)$. When $b$ is a proper subset of $S$, the coefficient of $g_b$ equals $\sum_{a : b \subseteq a \subseteq S} (-1)^{\#(a \setminus b)} = \sum_{D \subseteq S \setminus b} (-1)^{\#D}$, which equals zero because half of the subsets $D$ of $S \setminus b$ have $\#D$ even and the other half have $\#D$ odd.

Applying the Lemma with $g(\omega) = \log p(\omega)$ gives $p(\omega) = \exp\big( \sum_{a \subseteq S} \Psi_a(\omega_a) \big)$, which is a trivial sort of Gibbs density. We could, of course, immediately discard any terms for which $\Psi_a(\omega) = 0$ for all $\omega$. The factorizations that follow from any conditional independence properties of $P$ lead in this way to more interesting representations.

The following argument definitely works when $\Omega$ is finite and $\lambda$ is counting measure. I am not so confident about the general case, because I haven't thought carefully enough about the role of negligible sets and versions.
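In the finite case the construction in the proof of Lemma <9> is easy to carry out explicitly. The sketch below (my illustration, not from the notes; all names are invented) implements $g_a$ and $\Psi_a$ for three binary coordinates and verifies both the representation $g = \sum_a \Psi_a$ and property (ii).

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)

# Three binary coordinates; g is an arbitrary function on the product space,
# tabulated as an array, and wbar is the fixed reference point of the Lemma.
S = (0, 1, 2)
g = rng.random((2, 2, 2))
wbar = (0, 0, 0)

def subsets(s):
    return [c for r in range(len(s) + 1)
            for c in itertools.combinations(s, r)]

def g_b(b, w):
    # g_b(w_b) = g(w_b, wbar_{S\b}): coordinates outside b frozen at wbar.
    return g[tuple(w[i] if i in b else wbar[i] for i in S)]

def Psi(a, w):
    # Psi_a(w) = sum over b subset of a of (-1)^{#(a\b)} g_b(w_b).
    return sum((-1) ** (len(a) - len(b)) * g_b(b, w) for b in subsets(a))

for w in itertools.product((0, 1), repeat=3):
    # Representation: the Psi_a sum back to g.
    assert np.isclose(sum(Psi(a, w) for a in subsets(S)), g[w])
    # Property (ii): Psi_a vanishes once any coordinate in a sits at wbar.
    for a in subsets(S):
        if a and any(w[i] == wbar[i] for i in a):
            assert abs(Psi(a, w)) < 1e-12
```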

<10> Lemma. Suppose $\omega_i$ and $\omega_j$ are conditionally independent given $\omega_C$, where $C = S \setminus \{i, j\}$. Then $\Psi_a \equiv 0$ for every $a$ with $\{i, j\} \subseteq a \subseteq S$.

Proof. Invoke equality <4> with $A = \{i\}$ and $B = \{j\}$ to get
$$\log p(\omega) = f(\omega_i, \omega_C) + h(\omega_j, \omega_C)$$
where $f = \log \phi$ and $h = \log \psi$.

Notice that the representation for $\Psi_a$ from Lemma <9> is linear in $g$. Thus $\Psi_a = \Psi_a^f + \Psi_a^h$, the sum of the terms obtained by applying the same construction first to $f$ and then to $h$. For example,
$$\Psi_a^f(\omega_a) = \sum_{b \subseteq a} (-1)^{\#(a \setminus b)} f(\omega_b, \bar\omega_{S \setminus b}).$$
If $j \in a$, the right-hand side is unaffected if we replace $\omega_j$ by $\bar\omega_j$, because $f$ does not depend on $\omega_j$. However, the left-hand side is zero when $\omega_j = \bar\omega_j$. It follows that $\Psi_a^f(\omega_a) \equiv 0$ when $j \in a$. A similar argument shows that $\Psi_a^h(\omega_a) \equiv 0$ when $i \in a$.

References

Griffeath, D. (1976), Introduction to Markov Random Fields, Chapter 12 of Kemeny, Knapp, and Snell, Denumerable Markov Chains (2nd edition), Springer.

Pollard, D. (2001), A User's Guide to Measure Theoretic Probability, Cambridge University Press.

6 February 2006. Stat 606, version: 6feb06. © David Pollard.