Bayesian network inference

Size: px

Start display at page:

Download "Bayesian network inference"

Edith Bates
7 years ago
Views:

1 Bayesian network inference Given: Query variables: X Evidence observed variables: E = e Unobserved variables: Y Goal: calculate some useful information about the query variables osterior Xe MA estimate arg max x xe Recall: inference via the full joint distribution X e X E = e = X e y y e Since BN s can afford exponential savings in storage of joint distributions can they afford similar savings for inference

2 Bayesian network inference In full generality N-hard More precisely #-hard: equivalent to counting satisfying assignments We can reduce satisfiability to Bayesian network inference Decision problem: is Y > 0 Y = u 1 u 2 u 3 u 1 u 2 u 3 u 2 u 3 u 4 C 1 C 2 C 3 G. Cooper 1990

3 Inference example Query: B j m

4 Inference example Query: B j m = m j a e b m j b m j b = = = e E a A a m a j e b a e b j m j j = = = e E a A a m a j e b a e b j = = e E a A Are we doing any unnecessary work

5 Inference example b j m b e E= e A= a a b e j a m a

6 Exact inference Basic idea: compute the results of sub-expressions in a bottom-up way and cache them for later use Form of dynamic programming Has polynomial time and space complexity for polytrees olytree: at most one undirected path between any two nodes

7 Representing people

8 Review: Bayesian network inference In general harder than satisfiability Efficient inference via dynamic programming is possible for polytrees In other practical cases must resort to approximate methods

9 Approximate inference: Sampling AB Bayesian network kis a generative model Allows us to efficiently generate samples from the joint distributionib i Algorithm for sampling the joint distribution: While not all variables are sampled: ick a variable that is not yet sampled but whose parents are sampled Draw its value from X parentsx

10 Example of sampling from the joint distribution

11 Example of sampling from the joint distribution

12 Example of sampling from the joint distribution

13 Example of sampling from the joint distribution

14 Example of sampling from the joint distribution

15 Example of sampling from the joint distribution

16 Example of sampling from the joint distribution

17 Inference via sampling Suppose we drew N samples from the joint distribution How do we compute X = x e X = x e = x e e # of times x and e happen / # of times e happens / N N Rejection sampling: to compute X = x e keep only the samples in which e happens and find in what proportion p of them x also happens

18 Inference via sampling Rejection sampling: to compute X = x e keep only the samples in which e happens and find in what proportion of them x also happens What if e is a rare event Example: burglary earthquake Rejection sampling ends up throwing away most of the samples

19 Inference via sampling Rejection sampling: to compute X = x e keep only the samples in which e happens and find in what proportion of them x also happens What if e is a rare event Example: burglary earthquake Rejection sampling ends up throwing away most of the samples Likelihood weighting Sample from X = x e but weight each Sample from X = x e but weight each sample by e

20 Inference via sampling: Summary Use the Bayesian network to generate samples from the joint distribution Approximate any desired conditional or marginal probability by empirical frequencies This approach is consistent: in the limit of infinitely many samples frequencies converge to probabilities No free lunch: to get a good approximate of the desired probability you may need an exponential number of samples anyway

21 Y Example: Solving the satisfiability problem by sampling = u1 u2 u3 u1 u2 u3 u2 u3 u4 C 1 C 2 C 3 u i = T = 0.5 Sample values of u 1 u 4 according to u i = T = 0.5 Estimate of Y: # of satisfying assignments / # of sampled assignments Not guaranteed to correctly figure out whether Y > 0 unless you sample every possible assignment!

22 Other approximate inference Variational methods methods Approximate the original network by a simpler one e.g. a polytree and try to minimize the divergence between the simplified and the exact model Belief propagation Iterative ti message passing: each node computes some local estimate and shares it with its neighbors. On the next iteration it uses information from its neighbors to update its estimate.

23 arameter learning Suppose we know the network structure but not the parameters and have a training set of complete observations Training set Sample C S R W 1 T F T T 2 F T F T 3 T F F F 4 T T T T 5 F T F T 6 T F T F.

24 arameter learning Suppose we know the network structure but not the parameters and have a training set of complete observations X arentsx is given by the observed frequencies of the different values of X for each combination of parent values Similar to sampling except your samples come from the training data and not from the model whose parameters are initially unknown

25 arameter learning Incomplete observations Training set Sample C S R W 1 F T T 2 T F T 3 F F F 4 T T T 5 T F T 6 F T F.

26 arameter learning Learning with incomplete observations: EM Expectation Maximization Algorithm t+ 1 = t θ arg maxθ Z = z x θ L x Z = z θ z Z: hidden variables

27 arameter learning What if the network structure is unknown Structure learning algorithms exist but they are pretty complicated C Training set Sample C S R W S 4 R 1 T F T T 2 F T F T 3 T F F F T T T T 5 F T F T W 6 T F T F.

The Basics of Graphical Models

The Basics of Graphical Models David M. Blei Columbia University October 3, 2015 Introduction These notes follow Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan. Many figures