Artificial Intelligence 15-381, Mar 27, 2007: Bayesian Networks


Recap of last lecture

Probability: a precise representation of uncertainty. Probability theory: optimal updating of knowledge based on new information.

Bayesian inference with Boolean variables:

P(D | T) = P(T | D) P(D) / [ P(T | D) P(D) + P(T | ¬D) P(¬D) ]

where P(T | D) is the likelihood, P(D) the prior, and the denominator the normalizing constant. Inference combines sources of knowledge:

P(D | T) = (0.9 × 0.001) / (0.9 × 0.001 + 0.1 × 0.999) ≈ 0.0089

Inference is sequential:

P(D | T1, T2) = P(T2 | D) P(T1 | D) P(D) / [ P(T2) P(T1) ]
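To make the arithmetic concrete, here is a minimal Python sketch of the same update (the function and argument names are ours, for illustration; the numbers are the ones above):

def posterior(p_t_given_d, p_t_given_not_d, p_d):
    """P(D | T) via Bayes' rule for a Boolean hypothesis D and test result T."""
    numerator = p_t_given_d * p_d
    normalizer = p_t_given_d * p_d + p_t_given_not_d * (1 - p_d)
    return numerator / normalizer

print(posterior(0.9, 0.1, 0.001))   # ~0.0089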

Bayesian inference with continuous variables (recap)

posterior p(θ | y, n) ∝ likelihood p(y | θ, n) × prior p(θ | n), i.e.

p(θ | y, n) = p(y | θ, n) p(θ | n) / p(y | n)

where the normalizing constant is p(y | n) = ∫ p(y | θ, n) p(θ | n) dθ, and the likelihood is binomial:

p(y | θ, n) = (n choose y) θ^y (1 − θ)^(n − y)

[Figure: binomial likelihoods p(y | θ, n = 5) for θ = 0.05, 0.2, 0.35, and 0.5; a uniform prior p(θ | y = 0, n = 0); and the beta posterior p(θ | y = 1, n = 5).]

Today: inference with more complex dependencies. How do we represent (model) more complex probabilistic relationships? How do we use these models to draw inferences?
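A small numerical sketch of this beta-binomial update, assuming a uniform prior over θ and normalizing on a grid (the helper names and the grid approximation are ours, not from the slides):

import math

def binomial_likelihood(theta, y, n):
    """p(y | theta, n) for a binomial observation of y successes in n trials."""
    return math.comb(n, y) * theta**y * (1 - theta)**(n - y)

def posterior_density(theta, y, n, grid_size=1000):
    """p(theta | y, n) under a uniform prior, normalized numerically on a grid."""
    grid = [(i + 0.5) / grid_size for i in range(grid_size)]
    normalizer = sum(binomial_likelihood(t, y, n) for t in grid) / grid_size
    return binomial_likelihood(theta, y, n) / normalizer

# y = 1 success in n = 5 trials with a uniform prior gives a Beta(2, 5) posterior.
print(posterior_density(0.2, 1, 5))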

Probabilistic reasoning

Suppose we go to my house and see that the door is open.
- What's the cause? Is it a burglar? Should we go in? Call the police?
- Then again, it could just be my wife. Maybe she came home early.

How should we represent these relationships?

Belief networks

In belief networks, causal relationships are represented in directed acyclic graphs. Arrows indicate causal relationships between the nodes.

Types of probabilistic relationships

How do we represent these relationships? There are four basic structures:
- Direct cause: A → B. Specify P(B | A).
- Indirect cause: A → B → C. Specify P(B | A) and P(C | B). C is independent of A given B.
- Common cause: A → B and A → C. Specify P(B | A) and P(C | A). Are B and C independent?
- Common effect: A → C and B → C. Specify P(C | A, B). Are A and B independent?

Belief networks

In belief networks, causal relationships are represented in directed acyclic graphs. Arrows indicate causal relationships between the nodes. How can we determine what is happening before we go in? We need more information. What else can we observe?
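One quick way to answer the common-cause question is to enumerate a tiny example. The numbers below are hypothetical, chosen only to show that B and C are dependent marginally but independent once A is observed:

p_a = 0.3
p_b_given_a = {True: 0.9, False: 0.1}   # P(B = 1 | A)
p_c_given_a = {True: 0.8, False: 0.2}   # P(C = 1 | A)

def joint_abc(a, b, c):
    """P(A = a, B = b, C = c) under the common-cause factorization P(A) P(B | A) P(C | A)."""
    pa = p_a if a else 1 - p_a
    pb = p_b_given_a[a] if b else 1 - p_b_given_a[a]
    pc = p_c_given_a[a] if c else 1 - p_c_given_a[a]
    return pa * pb * pc

p_b = sum(joint_abc(a, True, c) for a in (True, False) for c in (True, False))
p_c = sum(joint_abc(a, b, True) for a in (True, False) for b in (True, False))
p_bc = sum(joint_abc(a, True, True) for a in (True, False))
print(p_bc, p_b * p_c)   # 0.23 vs 0.1292: B and C are marginally dependent
# Given A, P(B, C | a) = P(B | a) P(C | a) by construction, so they are conditionally independent.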

Explaining away

Suppose we notice that the car is in the garage. Now we infer that it's probably my wife, and not a burglar: this fact explains away the hypothesis of a burglar. Note that there is no direct causal link between the car and the burglar; yet seeing the car changes our beliefs about the burglar. We could also notice that the door was damaged, in which case we reach the opposite conclusion.

How do we make this inference process more precise? Let's start by writing down the conditional probabilities.

Defining the belief network

Each link in the graph represents a conditional relationship between nodes. To compute the inference, we must specify the conditional probabilities. Let's start with the door. What do we specify? The probability that the door is open (O) given each combination of wife home (W) and burglar (B):

W  B  P(O | W, B)
F  F  0.01
F  T  0.25
T  F  0.05
T  T  0.75

Check: does this column have to sum to one? No! Only the full joint distribution does; this is a conditional distribution. But note that P(¬O | W, B) = 1 − P(O | W, B).

What else do we need to specify? The prior probabilities.

Next, the prior probabilities of the root nodes:

P(W) = 0.05
P(B) = 0.001

Finally, we specify the remaining conditionals:

W  P(C | W)
F  0.01
T  0.95

B  P(D | B)
F  0.001
T  0.5

Now what?
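For reference, the tables above can be collected into a few Python dictionaries (a sketch; the node names W, B, O, C, D follow the slides, the dictionary layout is ours):

P_W = 0.05                                           # P(W = 1), wife home
P_B = 0.001                                          # P(B = 1), burglar
P_O = {(False, False): 0.01, (False, True): 0.25,    # P(O = 1 | W, B), door open
       (True, False): 0.05, (True, True): 0.75}
P_C = {False: 0.01, True: 0.95}                      # P(C = 1 | W), car in garage
P_D = {False: 0.001, True: 0.5}                      # P(D = 1 | B), door damaged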

Calculating probabilities using the joint distribution

What is the probability that the door is open, it is my wife and not a burglar, we see the car in the garage, and the door is not damaged? Mathematically, we want to compute the expression

P(o, w, ¬b, c, ¬d) = ?

We can just repeatedly apply the rule relating joint and conditional probabilities, P(x, y) = P(x | y) P(y):

P(o, w, ¬b, c, ¬d) = P(o | w, ¬b, c, ¬d) P(w, ¬b, c, ¬d)
= P(o | w, ¬b) P(w, ¬b, c, ¬d)
= P(o | w, ¬b) P(c | w, ¬b, ¬d) P(w, ¬b, ¬d)
= P(o | w, ¬b) P(c | w) P(w, ¬b, ¬d)
= P(o | w, ¬b) P(c | w) P(¬d | w, ¬b) P(w, ¬b)
= P(o | w, ¬b) P(c | w) P(¬d | ¬b) P(w, ¬b)
= P(o | w, ¬b) P(c | w) P(¬d | ¬b) P(w) P(¬b)

Plugging in the numbers from the tables above:

P(o, w, ¬b, c, ¬d) = P(o | w, ¬b) P(c | w) P(¬d | ¬b) P(w) P(¬b)
= 0.05 × 0.95 × 0.999 × 0.05 × 0.999 ≈ 0.0024

This is essentially the probability that my wife is home and leaves the door open.

Calculating probabilities in a general Bayesian belief network

Note that by specifying all the conditional probabilities, we have also specified the joint probability. For example, for a directed graph in which A has no parents, C has parent A, E has parents A and C, B has parent C, and D has parents C and E:

P(A, B, C, D, E) = P(A) P(B | C) P(C | A) P(D | C, E) P(E | A, C)

The general expression is

P(x_1, ..., x_n) ≡ P(X_1 = x_1, ..., X_n = x_n) = ∏_{i=1}^{n} P(x_i | parents(x_i))

With this we can calculate (in principle) the probability of any joint assignment. This implies that we can also calculate any conditional probability.
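Continuing the Python sketch, the same product can be evaluated directly from the tables defined above (the helper names are ours):

def bernoulli(p, value):
    """Probability of a Boolean value under a distribution with P(true) = p."""
    return p if value else 1 - p

def joint(w, b, o, c, d):
    """P(W = w, B = b, O = o, C = c, D = d) as a product of the CPT entries."""
    return (bernoulli(P_W, w) *
            bernoulli(P_B, b) *
            bernoulli(P_O[(w, b)], o) *
            bernoulli(P_C[w], c) *
            bernoulli(P_D[b], d))

# The event above: door open, wife home, no burglar, car in garage, no damage.
print(joint(w=True, b=False, o=True, c=True, d=False))   # ~0.0024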

Calculating conditional probabilities

Using the joint, we can compute any conditional probability too. The conditional probability of one subset of variables S_1 given another disjoint subset S_2 is

P(S_1 | S_2) = P(S_1, S_2) / P(S_2) = ( Σ p_{S_1, S_2} ) / ( Σ p_{S_2} )

where p_S is shorthand for all the entries of the joint matching the subset S. How many terms are in this sum? Up to 2^N: the number of terms is exponential in the number of variables. In fact, exact inference in general Bayes nets is NP-hard.

So what do we do? There are many approximations:
- stochastic (MCMC) approximations
- variational approximations

There are also special cases of Bayes nets for which there are fast, exact algorithms:
- variable elimination
- belief propagation
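Continuing the sketch, any conditional query can be answered by brute-force enumeration over the joint. With the tables above this also reproduces explaining away: P(B | O) ≈ 0.022, but it drops to about 0.015 once the car is also seen (the enumeration function below is ours):

from itertools import product

def prob(query, evidence):
    """P(query | evidence); both arguments are dicts over the variables 'w', 'b', 'o', 'c', 'd'."""
    def total(fixed):
        s = 0.0
        free = [v for v in 'wbocd' if v not in fixed]
        for values in product([True, False], repeat=len(free)):
            assignment = {**dict(zip(free, values)), **fixed}
            s += joint(**assignment)
        return s
    return total({**evidence, **query}) / total(evidence)

print(prob({'b': True}, {'o': True}))              # burglar given the open door
print(prob({'b': True}, {'o': True, 'c': True}))   # ... and the car: explained away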

Belief networks with multiple causes

In the models above, we specified the conditional distribution by hand: the table P(O | W, B) gives the probability of the variable for each possible value of its causal parents. Can this be specified in a more generic way? Can we avoid having to specify every entry in the conditional table? For this we need a functional form for P(X | parents(X)). One classic example of such a function is the noisy-OR model.

Defining causal relationships using noisy-OR

We assume each cause C_j can produce effect E_i with probability f_ij. The noisy-OR model assumes the parent causes of effect E_i contribute independently. The probability that none of them caused E_i is simply the product of the probabilities that each one did not cause E_i; the probability that any of them caused E_i is one minus that product:

P(E_i | par(E_i)) = P(E_i | C_1, ..., C_n) = 1 − ∏_j (1 − P(E_i | C_j)) = 1 − ∏_j (1 − f_ij)

Example: catching a cold (C) can be caused by being hit by a viral droplet (D), touching a contaminated object (O), or eating contaminated food (F), with parameters f_CD, f_CO, and f_CF:

P(C | D, O, F) = 1 − (1 − f_CD)(1 − f_CO)(1 − f_CF)
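A minimal sketch of the noisy-OR combination rule; the parameter values in the example call are hypothetical:

def noisy_or(f, active):
    """P(effect | causes): f[j] is the probability that cause j alone produces the
    effect, and active[j] says whether cause j is present."""
    p_none = 1.0
    for fj, on in zip(f, active):
        if on:
            p_none *= (1 - fj)
    return 1 - p_none

# Cold example with made-up parameters: hit by a droplet and touched a contaminated
# object, but no contaminated food.
print(noisy_or([0.4, 0.1, 0.2], [True, True, False]))   # 1 - 0.6 * 0.9 = 0.46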

A general one-layer causal network

Such a network can model either causes and effects or, equivalently, stochastic binary features. Each input x_i encodes the probability that the i-th binary input feature is present. The set of features represented by the j-th hidden cause is defined by weights f_ij, which encode the probability that feature i is an instance of cause j.

The data: a set of stochastic binary patterns. Each column is a distinct eight-dimensional binary pattern. There are five underlying causal feature patterns. What are they?

[Figure: the observed stochastic binary patterns and the true hidden causes of the data.]

Recovering these causes is a learning problem, which we'll cover in a later lecture.

Hierarchical statistical models

A Bayesian belief network: the joint probability of the binary states S given weights W is

P(S | W) = ∏_i P(S_i | pa(S_i), W)

The probability of S_i depends only on its parents pa(S_i):

P(S_i | pa(S_i), W) = h(Σ_j S_j w_ji)       if S_i = 1
                    = 1 − h(Σ_j S_j w_ji)   if S_i = 0

The function h specifies how causes are combined: h(u) = 1 − exp(−u), u > 0.

Main points:
- hierarchical structure allows the model to form high-order representations
- upper states are priors for lower states
- weights encode higher-order features
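A small sketch of sampling one layer top-down under this model, with h(u) = 1 − exp(−u); the layer sizes and weight values are hypothetical, just to show the mechanics:

import math
import random

def h(u):
    """Noisy-OR-style activation: probability a unit turns on given total input u >= 0."""
    return 1 - math.exp(-u)

def sample_layer(parent_states, weights):
    """Sample each child unit i on with probability h(sum_j parent_j * w[j][i])."""
    n_children = len(weights[0])
    children = []
    for i in range(n_children):
        u = sum(s * weights[j][i] for j, s in enumerate(parent_states))
        children.append(1 if random.random() < h(u) else 0)
    return children

top = [1 if random.random() < 0.3 else 0 for _ in range(3)]   # top-level causes
w = [[2.0, 0.1, 0.1, 2.0],                                    # hypothetical weights
     [0.1, 2.0, 0.1, 0.1],
     [0.1, 0.1, 2.0, 2.0]]
print(top, sample_layer(top, w))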

Next time
- Inference methods in Bayes nets
- Example with continuous variables
- More general types of probabilistic networks