Inference in Bayesian networks


Chapter 14.4–5

Complexity of exact inference

Singly connected networks (or polytrees):
  any two nodes are connected by at most one (undirected) path
  time and space cost of variable elimination are O(d^k n)

Multiply connected networks:
  can reduce 3SAT to exact inference: NP-hard
  equivalent to counting 3SAT models: #P-complete

[Figure: a 3-CNF formula (1. A v B v C, 2. C v D v A, 3. B v C v D, combined by an AND node) encoded as a Bayesian network with prior 0.5 on each proposition symbol.]

Inference by stochastic simulation

Basic idea:
1) Draw N samples from a sampling distribution S
2) Compute an approximate posterior probability P̂
3) Show this converges to the true probability P

[Figure: a coin with P = 0.5.]

Outline:
  Sampling from an empty network
  Rejection sampling: reject samples disagreeing with evidence
  Likelihood weighting: use evidence to weight samples
  Markov chain Monte Carlo (MCMC): sample from a stochastic process
    whose stationary distribution is the true posterior

Sampling from an empty network

function Prior-Sample(bn) returns an event sampled from bn
  inputs: bn, a belief network specifying joint distribution P(X1, ..., Xn)
  x <- an event with n elements
  for i = 1 to n do
    xi <- a random sample from P(Xi | parents(Xi))
      given the values of Parents(Xi) in x
  return x
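As a minimal sketch, Prior-Sample can be written in Python. The dictionary encoding of the Cloudy/Sprinkler/Rain/WetGrass network below is illustrative; the CPT entries not visible in the transcription (e.g. P(WetGrass | Sprinkler, not Rain)) are assumed to follow the standard textbook example.

```python
import random

# Hypothetical encoding of the slides' sprinkler network; CPT entries
# not visible in the transcription follow the standard textbook example.
# variable -> (parent tuple, {parent values -> P(variable = True)})
NETWORK = {
    "Cloudy": ((), {(): 0.50}),
    "Sprinkler": (("Cloudy",), {(True,): 0.10, (False,): 0.50}),
    "Rain": (("Cloudy",), {(True,): 0.80, (False,): 0.20}),
    "WetGrass": (("Sprinkler", "Rain"),
                 {(True, True): 0.99, (True, False): 0.90,
                  (False, True): 0.90, (False, False): 0.01}),
}
ORDER = ["Cloudy", "Sprinkler", "Rain", "WetGrass"]  # topological order

def prior_sample(bn=NETWORK, order=ORDER):
    """Draw one event from the joint distribution P(X1, ..., Xn)."""
    x = {}
    for var in order:  # parents are always sampled before their children
        parents, cpt = bn[var]
        p_true = cpt[tuple(x[p] for p in parents)]
        x[var] = random.random() < p_true
    return x
```

Averaging x["Cloudy"] over many samples should approach P(Cloudy) = 0.5, since each event is drawn from the true prior.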

Example

[Figure: the Cloudy -> {Sprinkler, Rain} -> WetGrass network with its CPTs; visible entries: P(Cloudy) = 0.5, P(Sprinkler | Cloudy) = .10, P(Rain | Cloudy) = .80/.20, P(WetGrass | Sprinkler, Rain) = .99/.../.01. Slides 15–21 step through drawing a single sample from this network, one variable at a time in topological order.]

Sampling from an empty network contd.

Probability that Prior-Sample generates a particular event:
  S_PS(x1 ... xn) = Π_{i=1..n} P(xi | parents(Xi)) = P(x1 ... xn)
i.e., the true prior probability.

E.g., S_PS(t, f, t, t) = 0.5 × 0.9 × 0.8 × 0.9 = 0.324 = P(t, f, t, t)

Let N_PS(x1 ... xn) be the number of samples generated for event x1, ..., xn.
Then we have
  lim_{N→∞} P̂(x1, ..., xn) = lim_{N→∞} N_PS(x1, ..., xn) / N
                            = S_PS(x1, ..., xn)
                            = P(x1 ... xn)
That is, estimates derived from Prior-Sample are consistent.

Shorthand: P̂(x1, ..., xn) ≈ P(x1 ... xn)
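The worked product (0.5 × 0.9 × 0.8 × 0.9, which corresponds to the event Cloudy = t, Sprinkler = f, Rain = t, WetGrass = t) can be checked numerically. The network encoding below is a hypothetical reconstruction; CPT entries not visible in the transcription are filled in from the standard textbook example.

```python
# Hypothetical encoding of the sprinkler network; unseen CPT entries
# follow the standard textbook example.
NETWORK = {
    "Cloudy": ((), {(): 0.50}),
    "Sprinkler": (("Cloudy",), {(True,): 0.10, (False,): 0.50}),
    "Rain": (("Cloudy",), {(True,): 0.80, (False,): 0.20}),
    "WetGrass": (("Sprinkler", "Rain"),
                 {(True, True): 0.99, (True, False): 0.90,
                  (False, True): 0.90, (False, False): 0.01}),
}

def event_probability(x, bn=NETWORK):
    """P(x1 ... xn) = product over variables of P(xi | parents(Xi))."""
    p = 1.0
    for var, (parents, cpt) in bn.items():
        p_true = cpt[tuple(x[q] for q in parents)]
        p *= p_true if x[var] else 1.0 - p_true
    return p

# The slide's event (t, f, t, t): 0.5 * 0.9 * 0.8 * 0.9 = 0.324
event = {"Cloudy": True, "Sprinkler": False, "Rain": True, "WetGrass": True}
```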

Rejection sampling

P̂(X | e) estimated from samples agreeing with e

function Rejection-Sampling(X, e, bn, N) returns an estimate of P(X | e)
  local variables: N, a vector of counts over X, initially zero
  for j = 1 to N do
    x <- Prior-Sample(bn)
    if x is consistent with e then
      N[x] <- N[x] + 1 where x is the value of X in x
  return Normalize(N[X])

E.g., estimate P(Rain | Sprinkler = true) using 100 samples
  27 samples have Sprinkler = true
  Of these, 8 have Rain = true and 19 have Rain = false.
  P̂(Rain | Sprinkler = true) = Normalize(⟨8, 19⟩) = ⟨0.296, 0.704⟩

Similar to a basic real-world empirical estimation procedure
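A sketch of Rejection-Sampling in Python, assuming the same hypothetical encoding of the sprinkler network (unseen CPT entries taken from the standard textbook example):

```python
import random

# Hypothetical encoding of the sprinkler network; unseen CPT entries
# follow the standard textbook example.
NETWORK = {
    "Cloudy": ((), {(): 0.50}),
    "Sprinkler": (("Cloudy",), {(True,): 0.10, (False,): 0.50}),
    "Rain": (("Cloudy",), {(True,): 0.80, (False,): 0.20}),
    "WetGrass": (("Sprinkler", "Rain"),
                 {(True, True): 0.99, (True, False): 0.90,
                  (False, True): 0.90, (False, False): 0.01}),
}
ORDER = ["Cloudy", "Sprinkler", "Rain", "WetGrass"]  # topological order

def prior_sample(bn=NETWORK, order=ORDER):
    """Draw one event from the joint distribution."""
    x = {}
    for var in order:
        parents, cpt = bn[var]
        x[var] = random.random() < cpt[tuple(x[p] for p in parents)]
    return x

def rejection_sampling(X, e, n_samples, bn=NETWORK, order=ORDER):
    """Estimate P(X | e) from prior samples consistent with evidence e."""
    counts = {True: 0, False: 0}
    for _ in range(n_samples):
        x = prior_sample(bn, order)
        if all(x[var] == val for var, val in e.items()):  # else reject
            counts[x[X]] += 1
    total = counts[True] + counts[False]
    return {v: c / total for v, c in counts.items()}  # Normalize
```

Under these assumed CPTs the exact posterior is P(Rain | Sprinkler = true) = ⟨0.3, 0.7⟩, so rejection_sampling("Rain", {"Sprinkler": True}, 10000) should land close to that, consistent with the slide's 100-sample estimate ⟨0.296, 0.704⟩.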

Analysis of rejection sampling

P̂(X | e) = α N_PS(X, e)           (algorithm defn.)
          = N_PS(X, e) / N_PS(e)   (normalized by N_PS(e))
          ≈ P(X, e) / P(e)         (property of Prior-Sample)
          = P(X | e)               (defn. of conditional probability)

Hence rejection sampling returns consistent posterior estimates.

Problem: hopelessly expensive if P(e) is small
  P(e) drops off exponentially with number of evidence variables!

Likelihood weighting

Idea: fix evidence variables, sample only nonevidence variables,
and weight each sample by the likelihood it accords the evidence

function Likelihood-Weighting(X, e, bn, N) returns an estimate of P(X | e)
  local variables: W, a vector of weighted counts over X, initially zero
  for j = 1 to N do
    x, w <- Weighted-Sample(bn)
    W[x] <- W[x] + w where x is the value of X in x
  return Normalize(W[X])

function Weighted-Sample(bn, e) returns an event and a weight
  x <- an event with n elements; w <- 1
  for i = 1 to n do
    if Xi has a value xi in e then
      w <- w × P(Xi = xi | parents(Xi))
    else
      xi <- a random sample from P(Xi | parents(Xi))
  return x, w
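The two functions above can be sketched in Python as follows, again over a hypothetical encoding of the sprinkler network (unseen CPT entries taken from the standard textbook example):

```python
import random

# Hypothetical encoding of the sprinkler network; unseen CPT entries
# follow the standard textbook example.
NETWORK = {
    "Cloudy": ((), {(): 0.50}),
    "Sprinkler": (("Cloudy",), {(True,): 0.10, (False,): 0.50}),
    "Rain": (("Cloudy",), {(True,): 0.80, (False,): 0.20}),
    "WetGrass": (("Sprinkler", "Rain"),
                 {(True, True): 0.99, (True, False): 0.90,
                  (False, True): 0.90, (False, False): 0.01}),
}
ORDER = ["Cloudy", "Sprinkler", "Rain", "WetGrass"]  # topological order

def weighted_sample(bn, order, e):
    """Return an event consistent with evidence e and its weight."""
    x, w = {}, 1.0
    for var in order:
        parents, cpt = bn[var]
        p_true = cpt[tuple(x[p] for p in parents)]
        if var in e:                 # evidence: fix value, update weight
            x[var] = e[var]
            w *= p_true if e[var] else 1.0 - p_true
        else:                        # nonevidence: sample as usual
            x[var] = random.random() < p_true
    return x, w

def likelihood_weighting(X, e, n_samples, bn=NETWORK, order=ORDER):
    """Estimate P(X | e) from weighted counts."""
    W = {True: 0.0, False: 0.0}
    for _ in range(n_samples):
        x, w = weighted_sample(bn, order, e)
        W[x[X]] += w
    total = W[True] + W[False]
    return {v: wt / total for v, wt in W.items()}  # Normalize
```

Unlike rejection sampling, every sample is used here; the evidence is accounted for through the weight rather than by discarding inconsistent samples.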

Likelihood weighting example

[Figure: slides 26–32 fix the evidence variables of the Cloudy/Sprinkler/Rain/WetGrass network and sample the remaining variables, accumulating the sample weight as each evidence variable is reached: w = 1.0, then w = 1.0 × 0.1, then w = 1.0 × 0.1 × 0.99 = 0.099.]

Likelihood weighting analysis

Sampling probability for Weighted-Sample is
  S_WS(z, e) = Π_{i=1..l} P(zi | parents(Zi))
Note: pays attention to evidence in ancestors only
  so it lies somewhere in between the prior and posterior distribution

Weight for a given sample z, e is
  w(z, e) = Π_{i=1..m} P(ei | parents(Ei))

Weighted sampling probability is
  S_WS(z, e) w(z, e) = Π_{i=1..l} P(zi | parents(Zi)) × Π_{i=1..m} P(ei | parents(Ei))
                     = P(z, e)   (by standard global semantics of network)

Hence likelihood weighting returns consistent estimates,
but performance still degrades with many evidence variables
because a few samples have nearly all the total weight

Approximate inference using MCMC

State of network = current assignment to all variables.
Generate next state by sampling one variable given its Markov blanket.
Sample each variable in turn, keeping evidence fixed.

function MCMC-Ask(X, e, bn, N) returns an estimate of P(X | e)
  local variables: N[X], a vector of counts over X, initially zero
    Z, the nonevidence variables in bn
    x, the current state of the network, initially copied from e
  initialize x with random values for the variables in Z
  for j = 1 to N do
    for each Zi in Z do
      sample the value of Zi in x from P(Zi | mb(Zi))
        given the values of MB(Zi) in x
      N[x] <- N[x] + 1 where x is the value of X in x
  return Normalize(N[X])

Can also choose a variable to sample at random each time
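MCMC-Ask can be sketched as Gibbs sampling in Python. As before, the network encoding is hypothetical, with unseen CPT entries taken from the standard textbook example; the Markov-blanket conditional is computed from the factors that mention the variable (its own CPT row plus its children's).

```python
import random

# Hypothetical encoding of the sprinkler network; unseen CPT entries
# follow the standard textbook example.
NETWORK = {
    "Cloudy": ((), {(): 0.50}),
    "Sprinkler": (("Cloudy",), {(True,): 0.10, (False,): 0.50}),
    "Rain": (("Cloudy",), {(True,): 0.80, (False,): 0.20}),
    "WetGrass": (("Sprinkler", "Rain"),
                 {(True, True): 0.99, (True, False): 0.90,
                  (False, True): 0.90, (False, False): 0.01}),
}
ORDER = ["Cloudy", "Sprinkler", "Rain", "WetGrass"]  # topological order

def p_given_markov_blanket(var, x, bn):
    """P(var = True | mb(var)), from the factors that mention var."""
    def factor(value):
        y = dict(x, **{var: value})
        p = 1.0
        for v, (parents, cpt) in bn.items():
            if v == var or var in parents:  # var itself and its children
                p_true = cpt[tuple(y[q] for q in parents)]
                p *= p_true if y[v] else 1.0 - p_true
        return p
    pt, pf = factor(True), factor(False)
    return pt / (pt + pf)

def mcmc_ask(X, e, n_sweeps, bn=NETWORK, order=ORDER):
    """Estimate P(X | e) by Gibbs sampling over the nonevidence vars."""
    Z = [v for v in order if v not in e]
    x = dict(e)
    for v in Z:                             # random initial state
        x[v] = random.random() < 0.5
    counts = {True: 0, False: 0}
    for _ in range(n_sweeps):
        for v in Z:                         # resample each var given mb
            x[v] = random.random() < p_given_markov_blanket(v, x, bn)
        counts[x[X]] += 1
    total = counts[True] + counts[False]
    return {val: c / total for val, c in counts.items()}  # Normalize
```

Under these assumed CPTs, mcmc_ask("Rain", {"Sprinkler": True, "WetGrass": True}, 10000) should approach the exact posterior, roughly ⟨0.32, 0.68⟩, in line with the slide's 100-state estimate ⟨0.31, 0.69⟩.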

The Markov chain

With Sprinkler = true, WetGrass = true, there are four states:

[Figure: the four network states, one for each joint assignment to Cloudy and Rain.]

Wander about for a while, average what you see

MCMC example contd.

Estimate P(Rain | Sprinkler = true, WetGrass = true)

Sample Cloudy or Rain given its Markov blanket, repeat.
Count number of times Rain is true and false in the samples.

E.g., visit 100 states
  31 have Rain = true, 69 have Rain = false
  P̂(Rain | Sprinkler = true, WetGrass = true) = Normalize(⟨31, 69⟩) = ⟨0.31, 0.69⟩

Theorem: chain approaches stationary distribution:
long-run fraction of time spent in each state
is exactly proportional to its posterior probability

Markov blanket sampling

Markov blanket of Cloudy is Sprinkler and Rain;
Markov blanket of Rain is Cloudy, Sprinkler, and WetGrass.

Probability given the Markov blanket is calculated as follows:
  P(xi | mb(Xi)) = P(xi | parents(Xi)) × Π_{Zj ∈ Children(Xi)} P(zj | parents(Zj))

Easily implemented in message-passing parallel systems, brains

Main computational problems:
1) Difficult to tell if convergence has been achieved
2) Can be wasteful if Markov blanket is large:
   P(Xi | mb(Xi)) won't change much (law of large numbers)
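As a small numerical illustration of the formula for Cloudy (no parents; children Sprinkler and Rain), using CPT values assumed from the standard sprinkler example:

```python
# P(c | mb(Cloudy)) is proportional to P(c) * P(s | c) * P(r | c).
# CPT values below are assumed from the standard sprinkler example.
def p_cloudy_given_mb(sprinkler, rain):
    def unnorm(c):
        p_s = 0.10 if c else 0.50   # P(Sprinkler = true | Cloudy = c)
        p_r = 0.80 if c else 0.20   # P(Rain = true | Cloudy = c)
        return 0.5 * (p_s if sprinkler else 1 - p_s) \
                   * (p_r if rain else 1 - p_r)
    pt, pf = unnorm(True), unnorm(False)
    return pt / (pt + pf)
```

With Sprinkler = true and Rain = true this gives 0.04 / 0.09 ≈ 0.444, the probability with which a Gibbs step would set Cloudy to true in that state.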

Summary

Exact inference by variable elimination:
  polytime on polytrees, NP-hard on general graphs
  space = time, very sensitive to topology

Approximate inference by LW, MCMC:
  LW does poorly when there is lots of (downstream) evidence
  LW, MCMC generally insensitive to topology
  Convergence can be very slow with probabilities close to 1 or 0
  Can handle arbitrary combinations of discrete and continuous variables