

Variable Elimination 2: Clique Trees
Graphical Models 10-708, Carlos Guestrin, Carnegie Mellon University, October 13th, 2006
Readings: K&F 8.1, 8.2, 8.3, 8.7.1; K&F 9.1, 9.2, 9.3, 9.4

Complexity of variable elimination: graphs with loops
Running example: a student network with variables Coherence, Difficulty, Intelligence, Grade, SAT, Letter, Happy, Job.
Moralize the graph: connect the parents of each node into a clique and remove edge directions. Equivalently, connect all nodes that appear together in an initial factor.
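The moralization step can be sketched in a few lines of Python. The network below is the slides' student example; encoding the BN as a dict mapping each node to its parent list is just one convenient choice.

```python
from itertools import combinations

def moralize(parents):
    """Return the undirected moral graph as a set of frozenset edges."""
    edges = set()
    for child, pars in parents.items():
        # Drop edge directions: connect each node to its parents.
        for p in pars:
            edges.add(frozenset((p, child)))
        # "Marry" the parents: connect the parents into a clique.
        for p, q in combinations(pars, 2):
            edges.add(frozenset((p, q)))
    return edges

bn = {
    "Coherence": [], "Difficulty": ["Coherence"], "Intelligence": [],
    "Grade": ["Difficulty", "Intelligence"], "SAT": ["Intelligence"],
    "Letter": ["Grade"], "Job": ["Letter", "SAT"],
    "Happy": ["Grade", "Job"],
}
moral = moralize(bn)
# Grade's parents Difficulty and Intelligence get married:
print(frozenset(("Difficulty", "Intelligence")) in moral)  # True
```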

Induced graph
The induced graph I_{F,≺} for elimination order ≺ has an edge X_i — X_j if X_i and X_j appear together in a factor generated by VE with elimination order ≺ on the factors F. Example elimination order: {C,D,S,I,L,H,J,G}.

Induced graph and the complexity of VE
Read the complexity of VE off the cliques of the induced graph. Elimination order: {C,D,I,S,L,H,J,G}. The structure of the induced graph encodes the complexity of VE!
Theorem: the scope of every factor generated by VE is a subset of a maximal clique in I_{F,≺}, and every maximal clique in I_{F,≺} corresponds to a factor generated by VE.
Induced width (or treewidth): the size of the largest clique in I_{F,≺}, minus 1. The minimal induced width is the induced width of the best order.
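The induced width of an order can be computed by simulating elimination on the moralized undirected graph: eliminating a node connects all of its remaining neighbors. A minimal sketch, with the student network's moral edges written out by hand:

```python
from collections import defaultdict

def induced_width(edges, order):
    """Simulate elimination: removing a node first connects its remaining
    neighbors. The induced width is the size of the largest neighbor set
    seen, i.e. the largest clique created minus 1."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    width = 0
    for x in order:
        nbrs = adj.pop(x, set())
        width = max(width, len(nbrs))
        for u in nbrs:
            adj[u] |= nbrs - {u}   # fill edges among the neighbors
            adj[u].discard(x)
    return width

# Moral graph of the student network, elimination order from the slides:
moral_edges = [("C","D"),("D","G"),("D","I"),("I","G"),("I","S"),("G","L"),
               ("L","S"),("L","J"),("S","J"),("G","J"),("G","H"),("J","H")]
print(induced_width(moral_edges, list("CDISLHJG")))  # 3: largest clique {G,J,S,L}
```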

Example: large induced width with a small number of parents
A compact representation does not imply easy inference: a network in which every node has few parents can still have large induced width.

Finding the optimal elimination order
Theorem: finding the best elimination order is NP-complete. Decision problem: given a graph, determine whether there exists an elimination order that achieves induced width ≤ K.
Interpretation: the hardness of finding an elimination order is in addition to the hardness of inference itself. In fact, an elimination order can be found in time exponential in the size of the largest clique — the same complexity as inference. Elimination order: {C,D,I,S,L,H,J,G}.

Induced graphs and chordal graphs
Chordal graph: every cycle X_1 — X_2 — … — X_k — X_1 with k ≥ 3 has a chord, i.e. an edge X_i — X_j for non-consecutive i and j.
Theorem: every induced graph is chordal. An optimal elimination order is easily obtained for a chordal graph.

Chordal graphs and triangulation
Triangulation: turning a graph into a chordal graph.
Max Cardinality Search, a simple heuristic:
  Initialize the unobserved nodes X as unmarked
  For k = |X| down to 1:
    X ← the unmarked variable with the most marked neighbors
    σ(X) ← k
    Mark X
Theorem: MCS obtains an optimal order for chordal graphs. It is often not so good on other graphs!
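Max Cardinality Search is easy to sketch. This version assumes the graph is given as a dict of neighbor sets and returns the nodes in elimination order, i.e. the reverse of the marking order (σ assigns positions |X| down to 1, so elimination starts at position 1):

```python
def max_cardinality_order(adj):
    """Max Cardinality Search: repeatedly mark the unmarked node with the
    most marked neighbors; the elimination order is the reverse of the
    marking order."""
    marked, marking = set(), []
    nodes = list(adj)
    while len(marked) < len(nodes):
        x = max((n for n in nodes if n not in marked),
                key=lambda n: len(adj[n] & marked))
        marked.add(x)
        marking.append(x)
    return marking[::-1]

# A small chordal graph: triangle A-B-C with pendant D attached to C.
chordal = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(max_cardinality_order(chordal))
```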

Minimum fill/size/weight heuristics
Many more effective heuristics exist — see the reading. The min (weighted) fill heuristic is often very effective:
  Initialize the unobserved nodes X as unmarked
  For k = 1 to |X|:
    X ← the unmarked variable whose elimination adds the fewest edges
    σ(X) ← k
    Mark X
    Add the fill edges introduced by eliminating X
Weighted version: consider the size of the resulting factor rather than the number of edges added.

Choosing an elimination order
Choosing the best order is NP-complete (reduction from MAX-Clique). Many good heuristics exist, some with guarantees. Ultimately we cannot beat the NP-hardness of inference: even the optimal order can lead to an exponential variable elimination computation. In practice, variable elimination is often very effective, and many (many, many) approximate inference approaches are available when variable elimination is too expensive. Most approximate inference approaches build on ideas from variable elimination.
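The min-fill heuristic can be sketched similarly; this illustrative version counts and adds fill edges on a dict-of-neighbor-sets copy of the graph:

```python
def min_fill_order(adj):
    """Greedy min-fill: repeatedly eliminate the node whose elimination
    adds the fewest fill edges, adding those fill edges as we go."""
    adj = {n: set(nb) for n, nb in adj.items()}  # work on a copy
    order = []

    def fill_edges(x):
        nbrs = list(adj[x])
        return [(u, v) for i, u in enumerate(nbrs) for v in nbrs[i + 1:]
                if v not in adj[u]]

    while adj:
        x = min(adj, key=lambda n: len(fill_edges(n)))
        for u, v in fill_edges(x):       # connect x's neighbors
            adj[u].add(v)
            adj[v].add(u)
        for u in adj[x]:
            adj[u].discard(x)
        order.append(x)
        del adj[x]
    return order

# On a 4-cycle, every node needs one fill edge; ties break by dict order.
cycle4 = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"A", "C"}}
print(min_fill_order(cycle4))
```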

Announcements
Recitation on an advanced topic: Carlos on context-specific independence. Monday Oct 16, 5:30–7:00pm in Wean Hall 4615A.

Most likely explanation (MLE)
Network: Flu, Allergy, Sinus, Headache, Nose. Query: the most likely joint assignment given the evidence, argmax_{f,a,s,h} P(f,a,s,h | n). Using the definition of conditional probability, this equals argmax_{f,a,s,h} P(f,a,s,h,n) / P(n), and the normalization P(n) is irrelevant to the argmax.

Max-marginalization
On the fragment Flu → Sinus with evidence Nose = t: instead of summing a variable out of a factor, maximize it out.

Example of variable elimination for MLE: forward pass
Network: Flu, Allergy, Sinus, Headache, with evidence Nose = t. Eliminate the variables one at a time, replacing each sum with a max.

Example of variable elimination for MLE: backward pass
Same network (Flu, Allergy, Sinus, Headache), evidence Nose = t.

MLE variable elimination algorithm — forward pass
Given a BN and an MLE query max_{x_1,…,x_n} P(x_1,…,x_n, e):
  Instantiate the evidence E = e
  Choose an ordering on the variables, e.g. X_1, …, X_n
  For i = 1 to n, if X_i ∉ E:
    Collect the factors f_1,…,f_k that include X_i
    Generate a new factor by maximizing X_i out of the product of these factors
    Variable X_i has been eliminated!

MLE variable elimination algorithm — backward pass
{x_1*, …, x_n*} will store the maximizing assignment.
  For i = n down to 1, if X_i ∉ E:
    Take the factors f_1,…,f_k that were used when X_i was eliminated
    Instantiate f_1,…,f_k with {x_{i+1}*, …, x_n*}; now each f_j depends only on X_i
    Generate the maximizing assignment for X_i: x_i* ← argmax_{x_i} ∏_j f_j(x_i)

What you need to know about VE
The variable elimination algorithm: to eliminate a variable, combine the factors that include it into a single factor, then marginalize the variable out of the new factor. Cliques in the induced graph correspond to the factors generated by the algorithm. It is an efficient algorithm: exponential only in the induced width, not in the number of variables. If you hear "exact inference is only efficient in tree graphical models," you say: No!!! It is efficient in any graph with low induced width — and even in some with very large induced width (special recitation). The elimination order is important: choosing it is an NP-complete problem, but many good heuristics exist. Variable elimination for MLE: the only difference between probabilistic inference and MLE is sum versus max.
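The forward and backward passes can be made concrete on a tiny chain Flu → Sinus → Nose with evidence Nose = t. All CPT numbers below are made up purely for illustration:

```python
# Binary variables, values True/False. Made-up CPTs.
P_F = {True: 0.1, False: 0.9}
P_S_given_F = {(True, True): 0.8, (False, True): 0.2,
               (True, False): 0.3, (False, False): 0.7}   # key (s, f): P(s|f)
P_N_given_S = {(True, True): 0.9, (False, True): 0.1,
               (True, False): 0.2, (False, False): 0.8}   # key (n, s): P(n|s)

# Forward pass: instantiate N = t, then max out F, remembering the argmax.
# g(s) = max_f P(f) P(s|f)
g, arg_f = {}, {}
for s in (True, False):
    options = {f: P_F[f] * P_S_given_F[(s, f)] for f in (True, False)}
    arg_f[s] = max(options, key=options.get)
    g[s] = options[arg_f[s]]

# Max out S: h = max_s g(s) P(N=t|s)
scores = {s: g[s] * P_N_given_S[(True, s)] for s in (True, False)}
s_star = max(scores, key=scores.get)

# Backward pass: instantiate the maximizing assignments from last to first.
f_star = arg_f[s_star]
print(s_star, f_star, scores[s_star])
```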

What if I want to compute P(X_i | x_0, x_{n+1}) for each i?
Chain: X_0 — X_1 — X_2 — X_3 — X_4 — X_5. We could run variable elimination once for each i — but what is the complexity of running variable elimination for every i?

Reusing computation
On the same chain, the intermediate factors computed while eliminating toward one query can be cached and reused across the other queries.

Cluster graph
A cluster graph for a set of factors F is an undirected graph in which:
  each node i is associated with a cluster C_i;
  it is family preserving: for each factor f_j ∈ F there is a node i such that scope[f_j] ⊆ C_i;
  each edge i — j is associated with a separator S_ij = C_i ∩ C_j.
Example clusters over the student network (variables C, D, I, G, S, L, J, H): CD, DIG, GSI, GJSL, JSL, HGJ.

Factors generated by VE
Student network (Coherence, Difficulty, Intelligence, Grade, SAT, Letter, Happy, Job), elimination order {C,D,I,S,L,H,J,G}.

Cluster graph for VE
VE generates a cluster tree: one clique for each factor used or generated, with an edge i — j if f_i was used to generate f_j. The message from i to j is generated when marginalizing a variable out of f_i. The result is a tree because each factor is used only once.
Proposition: the message δ_ij from i to j satisfies Scope[δ_ij] ⊆ S_ij.
Example cliques: CD, DIG, GSI, GJSL, JSL, HGJ.

Running intersection property
A cluster tree satisfies the running intersection property (RIP) if whenever X ∈ C_i and X ∈ C_j, then X is in every cluster on the (unique) path from C_i to C_j.
Theorem: the cluster tree generated by VE satisfies RIP.

Constructing a clique tree from VE
Select an elimination order, connect the factors that would be generated if you ran VE with that order, then simplify: eliminate any factor that is a subset of a neighbor.

Finding a clique tree from a chordal graph
Triangulate the moralized graph to obtain a chordal graph, then find its maximal cliques — NP-complete in general, but easy for chordal graphs (e.g. via max-cardinality search). A maximum spanning tree then finds a clique tree satisfying RIP!!! Generate a weighted graph over the cliques with edge weight (i, j) equal to the separator size |C_i ∩ C_j|, and take its maximum spanning tree. (Example: the student network with Coherence, Difficulty, Grade, Letter, Intelligence, SAT, Job, Happy.)
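The maximum-spanning-tree construction can be sketched with a small Kruskal-style routine and a union-find; the clique list below is the one from the slides' example:

```python
def clique_tree(cliques):
    """Maximum spanning tree over separator sizes |C_i ∩ C_j|;
    returns the tree as a list of (i, j) index pairs."""
    cliques = [frozenset(c) for c in cliques]
    parent = list(range(len(cliques)))

    def find(i):                       # union-find with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Candidate edges, heaviest separators first.
    edges = sorted(((len(ci & cj), i, j)
                    for i, ci in enumerate(cliques)
                    for j, cj in enumerate(cliques) if i < j),
                   reverse=True)
    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj and w > 0:
            parent[ri] = rj
            tree.append((i, j))
    return tree

# Maximal cliques from the slides (order {C,D,I,S,L,H,J,G}):
cliques = [{"C","D"}, {"D","I","G"}, {"G","S","I"}, {"G","J","S","L"}, {"H","G","J"}]
print(clique_tree(cliques))
```

The result is the chain CD — DIG — GSI — GJSL — HGJ, which satisfies RIP.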

Clique tree and independencies
A clique tree (or junction tree) is a cluster tree that satisfies the RIP. Example cliques: CD, DIG, GSI, GJSL, JSL, HGJ.
Theorem: given a BN with structure G and factors F, and a clique tree T for F, consider an edge C_i — C_j with separator S_ij. Let X be any set of variables on the C_i side of the tree and Y any set of variables on the C_j side. Then (X ⊥ Y | S_ij) holds in the BN; furthermore, I(T) ⊆ I(G).

Variable elimination in a clique tree, part 1
Clique tree for a BN over C, D, I, G, S, L, J, H, with cliques C_1: CD, C_2: DIG, C_3: GSI, C_4: GJSL, C_5: HGJ. Each CPT is assigned to a clique; the initial potential π_0(C_i) is the product of the CPTs assigned to C_i.

Variable elimination in a clique tree, part 2
VE in a clique tree to compute P(X_i): pick a root (any node containing X_i) and send messages recursively from the leaves to the root. Each message multiplies the incoming messages with the clique's initial potential and marginalizes out the variables that are not in the separator. A clique is ready once it has received messages from all of its neighbors.

Belief from messages
Theorem: when clique C_i is ready (it has received messages from all neighbors), its belief π_i(C_i) is the product of its initial factor with the incoming messages.

Choice of root
The message along an edge does not depend on the root!!! Whether the root is node 5 or node 3, the same messages flow along the edges whose direction agrees, so by caching this computation we obtain beliefs for all roots in linear time!!

Shafer-Shenoy algorithm (a.k.a. VE in a clique tree for all roots)
A clique C_i is ready to transmit to neighbor C_j once it has received messages from all neighbors except j; leaves are always ready to transmit. While some C_i is ready to transmit to some C_j, send the message δ_{i→j}. Complexity: linear in the number of cliques — one message is sent in each direction along each edge.
Corollary: at convergence, every clique has the correct belief.
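A two-clique instance makes the message passing concrete. With only two cliques, each is a leaf and immediately ready to transmit, so one sum-product message in each direction calibrates the tree. The chain A → B → C and its CPT numbers below are made up for illustration; C_1 = {A,B}, C_2 = {B,C}, separator {B}:

```python
# Made-up binary CPTs for the chain A -> B -> C; values 0/1.
P_A = {0: 0.6, 1: 0.4}
P_B_A = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}  # key (b, a)
P_C_B = {(0, 0): 0.5, (1, 0): 0.5, (0, 1): 0.1, (1, 1): 0.9}  # key (c, b)

# Initial potentials: each CPT assigned to a clique containing its scope.
pi1 = {(a, b): P_A[a] * P_B_A[(b, a)] for a in (0, 1) for b in (0, 1)}  # over (A,B)
pi2 = {(b, c): P_C_B[(c, b)] for b in (0, 1) for c in (0, 1)}           # over (B,C)

# Message 1 -> 2 over separator {B}: marginalize A out of pi1.
delta12 = {b: sum(pi1[(a, b)] for a in (0, 1)) for b in (0, 1)}
belief2 = {(b, c): pi2[(b, c)] * delta12[b] for (b, c) in pi2}

# Message 2 -> 1: marginalize C out of pi2.
delta21 = {b: sum(pi2[(b, c)] for c in (0, 1)) for b in (0, 1)}
belief1 = {(a, b): pi1[(a, b)] * delta21[b] for (a, b) in pi1}

# Calibration: both cliques now agree on the separator marginal P(B).
mu_from_1 = {b: sum(belief1[(a, b)] for a in (0, 1)) for b in (0, 1)}
mu_from_2 = {b: sum(belief2[(b, c)] for c in (0, 1)) for b in (0, 1)}
print(mu_from_1, mu_from_2)
```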

Calibrated clique tree
Initially, neighboring nodes do not agree on the distribution over their separators. A clique tree is calibrated when neighboring nodes agree on the distribution over each separator; at convergence, the tree is calibrated.

Answering queries with clique trees
Query within a clique: answer directly from the calibrated clique. Incremental updates: on observing evidence Z = z, multiply some clique containing Z by the indicator 1(Z = z). Query outside a clique: use variable elimination!

Message passing with division
Cliques C_1: CD, C_2: DIG, C_3: GSI, C_4: GJSL, C_5: HGJ. Messages can be computed by multiplication — multiply all incoming messages except the one from the target — or by division: compute the clique's full belief, then divide out the message previously received from the target.

Lauritzen-Spiegelhalter algorithm (a.k.a. belief propagation)
  Initialize all separator potentials to 1: µ_ij ← 1; all messages are ready to transmit.
  While some δ_{i→j} is ready to transmit:
    µ'_ij ← Σ_{C_i \ S_ij} π_i
    If µ'_ij ≠ µ_ij:
      δ_{i→j} ← µ'_ij / µ_ij
      π_j ← π_j · δ_{i→j}
      µ_ij ← µ'_ij
      For all neighbors k of j with k ≠ i, δ_{j→k} becomes ready to transmit
Complexity: linear in the number of cliques for the right schedule over the edges (leaves to root, then root to leaves). This is a simplified description — see the reading for details.
Corollary: at convergence, every clique has the correct belief.
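The division trick can be illustrated numerically on a single binary separator. The receiving clique's potential is updated by multiplying in the new separator marginal divided by the stored old one, so earlier contributions along the same edge never need to be re-multiplied from scratch. All numbers below are arbitrary toy values:

```python
# Separator potential, initialized to 1 as in Lauritzen-Spiegelhalter.
mu = {0: 1.0, 1: 1.0}
pi_j = {0: 0.5, 1: 0.5}          # receiving clique's potential over the separator

def send(new_mu, mu, pi_j):
    """Apply one division-based update: delta = new_mu / mu."""
    delta = {s: new_mu[s] / mu[s] for s in mu}
    pi_j = {s: pi_j[s] * delta[s] for s in pi_j}
    return new_mu, pi_j           # store the new separator potential

# First message over the edge: since mu is all ones, this is a plain multiply.
mu, pi_j = send({0: 0.3, 1: 0.7}, mu, pi_j)

# A second, refined message later divides out the first automatically.
mu, pi_j = send({0: 0.2, 1: 0.8}, mu, pi_j)
print(pi_j)
```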

VE versus BP in clique trees
VE messages are the ones computed by multiplication; BP messages are the ones computed by division.

Clique tree invariant
The clique tree potential is the product of the clique potentials divided by the separator potentials:
  π_T(X) = ∏_i π_i(C_i) / ∏_{ij} µ_ij(S_ij)
Clique tree invariant: P(X) = π_T(X).

Belief propagation and the clique tree invariant
Theorem: the invariant is maintained by the BP algorithm! BP reparameterizes the clique potentials and separator potentials; at convergence, the potentials and messages are marginal distributions.

Subtree correctness
A message from i to j is informed if all messages into i (other than the one from j) are informed — a recursive definition; leaves always send informed messages. An informed subtree is one whose incoming messages are all informed.
Theorem: the potential of a connected informed subtree T' is the marginal over scope[T'].
Corollary: at convergence, the clique tree is calibrated: π_i = P(scope[π_i]) and µ_ij = P(scope[µ_ij]).

Clique trees versus VE
Clique tree advantages: multi-query settings, incremental updates, and the pre-computation makes the complexity explicit. Clique tree disadvantages: space requirements (no factors are deleted), slower for a single query, and local structure in the factors may be lost when they are multiplied together into an initial clique potential.

Clique tree summary
Clique trees solve marginal queries for all variables at only twice the cost of a query for a single variable. Cliques correspond to maximal cliques in the induced graph. There are two message-passing approaches: VE (the one that multiplies messages) and BP (the one that divides by the old message). The clique tree invariant holds throughout: the clique tree potential is always the same; we are only reparameterizing the clique potentials. A clique tree for a BN can be constructed from an elimination order or from a triangulated (chordal) graph. The running time is exponential (only) in the size of the largest clique, so we can solve exactly problems with thousands (or millions, or more) of variables, as long as the cliques have tens of nodes (or fewer).