Clique Trees. Amr Ahmed October 23, 2008


Outline: Clique Trees: Representation, Factorization, Inference, and the Relation with VE

Representation. Given a probability distribution P, how do we represent it? What does the representation tell us about P, in terms of the cost of inference and the independence relationships it exposes? The options: the full CPT (joint table), a Bayes network (a list of factors), or a clique tree.

Representation: the full CPT. Space is exponential in the number of variables, inference is exponential, and we can read nothing about the independencies in P. Just bad.

Representation: the past. A Bayes network is a list of factors P(X_i | Pa(X_i)). It is space efficient. For independence, we can read the local Markov independencies directly and compute global independencies via d-separation. For inference, we can use dynamic programming by leveraging the factors, but the structure tells us little immediately about the cost of inference: fix an elimination order, compute the induced graph, and find the largest clique size; inference is exponential in that largest clique size.
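
To make the last point concrete, here is a minimal sketch (my own code, not from the lecture): it builds the induced graph for a given elimination order over a moralized adjacency structure and reports the size of the largest clique created, which is the quantity inference is exponential in. The adjacency below is my reading of the moralized student network used throughout the lecture.

def induced_width(neighbors, order):
    """Size of the largest clique formed while eliminating variables in `order`."""
    adj = {v: set(ns) for v, ns in neighbors.items()}
    largest = 0
    for v in order:
        clique = adj[v] | {v}          # scope of the factor created when eliminating v
        largest = max(largest, len(clique))
        rest = adj.pop(v)
        for u in rest:                 # add fill edges among v's remaining neighbors
            adj[u] |= rest - {u}
            adj[u].discard(v)
    return largest

# Moralized student network (assumed adjacency) and the order used in the lecture:
moral = {'C': {'D'}, 'D': {'C', 'G', 'I'}, 'I': {'D', 'G', 'S'},
         'G': {'D', 'I', 'L', 'H', 'J'}, 'S': {'I', 'J', 'L'},
         'L': {'G', 'J', 'S'}, 'H': {'G', 'J'}, 'J': {'G', 'H', 'L', 'S'}}
print(induced_width(moral, ['C', 'D', 'I', 'H', 'S', 'L', 'J', 'G']))  # 4 (the GJSL clique)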

Representation: today. Clique trees (CT): a tree of cliques that can be constructed from a Bayes network; a Bayes network plus an elimination order gives a CT. The questions to answer: What independencies about P can we read from a CT? How does P factorize over a CT? How do we do inference using a CT? When should you use a CT?

The Big Picture. [Slide: the student Bayes network with nodes Coherence, Difficulty, Intelligence, Grade, SAT, Letter, Happy, Job.] We move from the full CPT, to a list of local factors (of which there are many equivalent forms: PDAG, minimal I-map, etc.), to a tree of cliques, by choosing an elimination order or a max-spanning tree. With the full CPT, inference is just summation. VE operates on factors and caches computations (intermediate factors) within a single inference operation P(X | E). A CT enables caching computations (intermediate factors) across multiple inference operations, e.g. P(X_i, X_j) for all i, j.

Clique Trees: Representation. For a set of factors F (i.e. a Bayes net), a clique tree is an undirected tree in which each node i is associated with a cluster C_i. It is family preserving: for each factor f_j in F there is a node i such that scope[f_j] ⊆ C_i. Each edge i-j is associated with a separator S_ij = C_i ∩ C_j.
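
A minimal sketch of these definitions in code (the helper names and data layout are my own, not the lecture's): clusters, edges, and separators as plain Python data, plus a check that the family-preserving property holds for a list of factor scopes.

def separators(clusters, edges):
    """Separator on each edge (i, j) is the intersection of the two clusters."""
    return {(i, j): clusters[i] & clusters[j] for i, j in edges}

def family_preserving(clusters, factor_scopes):
    """Every factor scope must fit inside at least one cluster."""
    return all(any(scope <= c for c in clusters.values()) for scope in factor_scopes)

# The clique tree from the lecture: CD - GDI - GSI - GJSL - HGJ
clusters = {1: {'C', 'D'}, 2: {'G', 'D', 'I'}, 3: {'G', 'S', 'I'},
            4: {'G', 'J', 'S', 'L'}, 5: {'H', 'G', 'J'}}
edges = [(1, 2), (2, 3), (3, 4), (4, 5)]

# Factor scopes of the student network: P(C), P(D|C), P(G|D,I), P(S|I), P(I),
# P(L|G), P(J|L,S), P(H|J,G)
scopes = [{'C'}, {'D', 'C'}, {'G', 'D', 'I'}, {'S', 'I'}, {'I'},
          {'L', 'G'}, {'J', 'L', 'S'}, {'H', 'J', 'G'}]

print(separators(clusters, edges))          # e.g. edge (2, 3) -> {'G', 'I'}
print(family_preserving(clusters, scopes))  # True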

Clique Trees: Representation. A clique tree must be family preserving over the factors and must satisfy the running intersection property. [Slide: the student Bayes network together with two clique trees over it, e.g. one with cliques CD, GDIS, GLJS, HJG; both are correct clique trees.]

Clique Trees: Representation. What independencies can be read from a CT? I(CT) ⊆ I(G) ⊆ I(P). Use your intuition: how do you block a path? Observe a separator. (Q4) [Slide: the student network and its clique tree, with example statements such as H ⊥ I | G, J and H ⊥ I | G, S read off the separators.]

Clique Trees: Representation. How P factorizes over a CT (when the CT is calibrated): see Q4 and Section 9.2.11. [Slide: the student Bayes network.]
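
For reference, the factorization the slide points to is the standard result for a calibrated clique tree (writing it out here is an addition, not a verbatim slide):

\[
P(\mathcal{X}) \;=\; \frac{\prod_{i} \beta_i(C_i)}{\prod_{(i-j)\in T} \mu_{ij}(S_{ij})},
\qquad \text{where, after calibration, } \beta_i(C_i) = P(C_i) \text{ and } \mu_{ij}(S_{ij}) = P(S_{ij}).
\]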

Representation Summary. A clique tree (like a Bayes net) has two parts: the structure and the potentials (the parallel of CPTs in a BN), namely clique potentials and separator potentials. Upon calibration you can read marginals off the clique and separator potentials. Initialize the clique potentials with factors from the BN by distributing the factors over the cliques (family preserving); the cliques must satisfy RIP. But we need calibration to reach a fixed point of these potentials (see later today). Compare to a BN: in a BN you can only read the local conditionals P(x_i | pa(x_i)) and need VE to answer other queries; in a CT, upon calibration, you can read marginals over cliques, and you only need VE over the calibrated CT for queries whose scope cannot be confined to a single clique.

Clique Tree Construction. Replay VE: connect the factors that would be generated if you ran VE with a given order. Then simplify: eliminate any factor that is a subset of a neighbor. [Slide: the student Bayes network.]

Clique Tree Construction (details). Replay VE with the order C, D, I, H, S, L, J, G. Initial factors: C, DC, GDI, SI, I, LG, JLS, HJG. Eliminate C: multiply CD and C to get a factor over CD, then marginalize out C to get a factor over D. Eliminate D: multiply D and GDI to get a factor over GDI, then marginalize out D to get a factor over GI. Eliminate I: multiply GI, SI, and I to get a factor over GSI, then marginalize out I to get a factor over GS. [Diagram: the chain of generated factors grows as C - CD - D - DGI - GI - GSI - GS, with SI and I attached at the GSI step.]
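
To make the first step concrete, here is a tiny sketch with hypothetical numbers (not from the lecture): multiply P(C) and P(D|C) into a factor over (D, C), then sum out C to get the message over D.

# Hypothetical CPT values for the "Eliminate C" step.
p_c = {0: 0.4, 1: 0.6}                                    # P(C)
p_d_given_c = {(0, 0): 0.9, (1, 0): 0.1,                  # P(D|C), keyed by (d, c)
               (0, 1): 0.3, (1, 1): 0.7}

factor_cd = {(d, c): p_c[c] * p_d_given_c[(d, c)]         # product: factor over (D, C)
             for (d, c) in p_d_given_c}
message_d = {d: sum(factor_cd[(d, c)] for c in p_c)       # sum out C: factor over D
             for d in (0, 1)}
print(message_d)   # {0: 0.54, 1: 0.46}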

Clique Tree Construction (details, continued). Same order and initial factors. (Recap: eliminating I gave a factor over GS.) Eliminate H: just marginalize HJG to get a factor over JG. [Diagram: the chain now includes HJG with its JG factor attached.]

Clique Tree Construction (details, continued). (Recap: eliminating H gave a factor over JG.) Eliminate S: multiply GS and JLS to get GJLS, then marginalize out S to get GJL.

Clique Tree Construction (details, continued). (Recap: eliminating S gave a factor over GJL.) Eliminate L: multiply GJL and LG to get JLG, then marginalize out L to get GJ. Finally eliminate J and G. [Diagram: the full set of factors generated by VE: C, CD, D, DGI, GI, GSI, GS, GJLS, GJL, JG, HJG, I, SI, JLS, LG, G.]

Clique Tree Construction (details, continued). Summarize the CT by removing subsumed nodes: the chain of generated factors collapses to the cliques CD, GDI, GSI, GJSL, HJG. The result satisfies RIP and family preservation (always true, for any elimination order). Finally, distribute the initial factors into the cliques to get the initial beliefs (the parallel of CPTs in a BN), to be used for inference.
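
The construction above can be coded directly. The sketch below (my own helper names and structure, not the lecture's) replays elimination over a set of factor scopes, records the clique created at each step, links each clique to the later clique that absorbs its message, and then prunes cliques that are subsumed by a neighbor.

def clique_tree_from_order(scopes, order):
    """Replay VE over factor scopes (sets of variable names) for an elimination
    order; return (cliques, edges) before pruning subsumed nodes. Assumes every
    variable in `order` appears in at least one scope."""
    factors = [(frozenset(s), None) for s in scopes]    # (scope, producing clique index)
    cliques, edges = [], []
    for v in order:
        used = [f for f in factors if v in f[0]]
        factors = [f for f in factors if v not in f[0]]
        clique = frozenset().union(*(s for s, _ in used))
        idx = len(cliques)
        cliques.append(clique)
        for _, src in used:                 # edge from the clique that produced each message
            if src is not None:
                edges.append((src, idx))
        message = clique - {v}              # marginalize v out
        if message:
            factors.append((message, idx))
    return cliques, edges

def prune_subsumed(cliques, edges):
    """Remove any clique contained in a neighbor, reconnecting its other neighbors."""
    nbrs = {i: set() for i in range(len(cliques))}
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    alive, changed = set(nbrs), True
    while changed:
        changed = False
        for i in sorted(alive):
            absorber = next((j for j in nbrs[i] if cliques[i] <= cliques[j]), None)
            if absorber is None:
                continue
            for k in nbrs[i]:               # reconnect i's neighbors to the absorber
                nbrs[k].discard(i)
                if k != absorber:
                    nbrs[k].add(absorber)
                    nbrs[absorber].add(k)
            alive.discard(i)
            del nbrs[i]
            changed = True
            break
    return sorted(alive), [(a, b) for a in nbrs for b in nbrs[a] if a < b]

scopes = [{'C'}, {'D', 'C'}, {'G', 'D', 'I'}, {'S', 'I'}, {'I'},
          {'L', 'G'}, {'J', 'L', 'S'}, {'H', 'J', 'G'}]
cliques, edges = clique_tree_from_order(scopes, ['C', 'D', 'I', 'H', 'S', 'L', 'J', 'G'])
alive, tree_edges = prune_subsumed(cliques, edges)
print([set(cliques[i]) for i in alive])   # CD, GDI, GSI, HJG, GJSL
print(tree_edges)                         # [(0, 1), (1, 2), (2, 4), (3, 4)]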

Clique Tree Construction: another method. Build it from a triangulated graph. This is still rooted in VE: why? An elimination order induces a triangulation. So: triangulate, extract the max cliques, connect the cliques, and find a max-spanning tree.

Clique Tree Construction: another method (details). Get the chordal graph (add fill edges) for the same order as before, C, D, I, H, S, L, J, G. Extract the max cliques from this graph and compute a maximum-spanning clique tree, weighting each candidate edge by the size of its separator. [Diagram: the candidate clique graph over CD, GDI, GSI, GJSL, HJG with separator-size weights; the maximum spanning tree gives the same clique tree as before.]
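
A minimal sketch of the last step (my own code, assuming the max cliques have already been extracted from the chordal graph): weight every pair of cliques by the size of their intersection and keep a maximum-weight spanning tree, Kruskal-style.

def max_spanning_clique_tree(cliques):
    """Greedy maximum spanning tree over the clique graph, where an edge's
    weight is the size of the separator (intersection of the two cliques)."""
    candidates = sorted(
        ((len(cliques[i] & cliques[j]), i, j)
         for i in range(len(cliques)) for j in range(i + 1, len(cliques))),
        reverse=True)
    parent = list(range(len(cliques)))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = []
    for w, i, j in candidates:
        if w > 0 and find(i) != find(j):
            parent[find(i)] = find(j)
            tree.append((i, j, w))
    return tree

cliques = [{'C', 'D'}, {'G', 'D', 'I'}, {'G', 'S', 'I'}, {'G', 'J', 'S', 'L'}, {'H', 'G', 'J'}]
print(max_spanning_clique_tree(cliques))
# The same tree as before: GSI-GJSL, GDI-GSI, GJSL-HGJ (weight 2 each) and CD-GDI (weight 1).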

The Big Picture (recap). Recall: VE caches computations (intermediate factors) within a single inference operation P(X | E); a CT enables caching computations (intermediate factors) across multiple inference operations, e.g. P(X_i, X_j) for all i, j.

Clique Tree: Inference. To compute P(X), assume X is in some node (the root). Then just run VE, using the elimination order dictated by the tree and the initial factors placed in each clique, which define the initial potentials π_0(C_i). Eliminate C, then D, then I, then H; when done we have P(G, J, S, L). In VE jargon we would give these messages names like g1, g2, etc.

Clique Tree: Inference (2). Start from the initial local beliefs. The first message is δ_{1→2}(D) = Σ_C π_0(C_1) = Σ_C P(C) P(D | C), which is just a factor over D. The next message is δ_{2→3}(G, I) = Σ_D π_0(C_2) δ_{1→2}(D) = Σ_D P(G | I, D) δ_{1→2}(D), which is just a factor over G, I. We are simply doing VE along the partial order determined by the tree, (C, D, I), with H free to appear anywhere in the order. In VE jargon these messages would be called g1, g2, etc.
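
Written out in general (this is the standard sum-product message, added here for reference rather than taken verbatim from a slide): the message clique i sends to neighbor j multiplies the clique's initial potential with every incoming message except the one from j, and sums out everything not in the separator:

\[
\delta_{i\to j}(S_{ij}) \;=\; \sum_{C_i \setminus S_{ij}} \pi_0(C_i) \prod_{k \in \mathrm{Nb}(i)\setminus\{j\}} \delta_{k\to i}(S_{ki}).
\]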

Clique Tree: Inference (3). Start from the initial local beliefs. When we are done, C_5 will have received two messages, one from the left and one from the right. In VE we would end up with factors corresponding to these messages, in addition to all the factors that were distributed into C_5, e.g. P(L | G), P(J | L, G), and we would multiply all of these factors to get the marginal. In a CT we multiply all the factors in C_5, i.e. π_0(C_5), with these two messages to get the calibrated potential of C_5 (which is also the marginal). So what is the deal? Why is this useful?

Clique Tree: Inter-Inference Caching. For P(G, L), use C_5 as the root; for P(I, S), use C_3 as the root. Notice that the same three messages appear in both computations, i.e. the same intermediate factors as in VE.

What is passed across an edge? Consider an edge with edge scope GI: the left side has exclusive scope C, D (variables CDGI) and the right side has exclusive scope S, L, J, H (variables GISLJH). The message summarizes what the right side of the tree cares about in the left side, namely GI (see Theorem 9.2.3). It is computed by multiplying all factors on the left side and eliminating out the exclusive variables, but doing so in steps along the tree: C, then D. Which way the message flows is determined by where the root is; the message itself depends ONLY on the direction of the edge.

Clique Tree Calibration. A two-step process: an upward pass, as before, then a downward pass (after you calibrate the root).
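
A minimal sketch of the two-pass schedule (my own structure, not the lecture's code): given the clique tree as an adjacency dict and a root, list the directed messages in a valid order, leaves toward the root first, then root back out to the leaves.

def calibration_schedule(tree, root):
    order = []
    def upward(i, parent):
        for j in tree[i]:
            if j != parent:
                upward(j, i)
                order.append((j, i))     # child sends to parent after hearing from its children
    upward(root, None)
    def downward(i, parent):
        for j in tree[i]:
            if j != parent:
                order.append((i, j))     # parent sends to child before the child recurses
                downward(j, i)
    downward(root, None)
    return order

# Clique tree from the lecture, numbered 1:CD, 2:GDI, 3:GSI, 4:GJSL, 5:HGJ
tree = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(calibration_schedule(tree, root=4))
# [(1, 2), (2, 3), (3, 4), (5, 4), (4, 3), (3, 2), (2, 1), (4, 5)]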

Intuitively, why does it work? After the upward phase, the root is calibrated. For the downward pass, take C_4: what if it were the root? The two trees (rooted at the original root vs. rooted at C_4) differ only in the direction of the edge C_4-C_6; everything else is the same. So C_4 just needs the message from C_6 that summarizes the status of the separator from the other side of the tree: δ_{6→4}(S_46) = Σ_{C_6 \ S_46} π_0(C_6) δ_{5→6}(S_56). Now C_4 is calibrated and can act recursively as a new root.

Clique Trees. They can compute all clique marginals at roughly double the cost of a single VE run, but you need to store all the intermediate messages. It is not magic: if you store the intermediate factors from VE, you get the same effect. You lose internal structure and some independencies; do you care? For time, no; for space, YES. You can still run VE over the calibrated tree to get marginals over variables that are not in the same clique, and even all pairwise marginals (Q5). Clique trees are good for continuous (repeated) inference, but they cannot be tailored to the evidence: there is only one elimination order.

Queries Outside a Clique (Q5). The tree T is assumed calibrated, so the cliques agree on their separators. See Sections 9.3.4.2 and 9.3.4.3.