Artificial Intelligence. Conditional probability. Inference by enumeration. Independence. Lesson 11 (From Russell & Norvig)


Conditional probability

Conditional or posterior probabilities, e.g., P(cavity | toothache) = 0.8, i.e., given that toothache is all I know.

Notation for conditional distributions: P(Cavity | Toothache) is a 2-element vector of 2-element vectors.

If we know more, e.g., cavity is also given, then we have P(cavity | toothache, cavity) = 1.

New evidence may be irrelevant, allowing simplification, e.g., P(cavity | toothache, sunny) = P(cavity | toothache) = 0.8. This kind of inference, sanctioned by domain knowledge, is crucial.

Inference by enumeration

Start with the joint probability distribution. Conditional probabilities can be computed by summing entries of the joint:

P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
                       = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064)
                       = 0.4

Independence

A and B are independent iff P(A | B) = P(A), or P(B | A) = P(B), or P(A, B) = P(A) P(B). For example:

P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity) P(Weather)

32 entries reduced to 12; for n independent biased coins, O(2^n) reduces to O(n).

Absolute independence is powerful but rare. Dentistry is a large field with hundreds of variables, none of which are independent. What to do?
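The enumeration above can be sketched in Python. The four toothache entries (0.108, 0.012, 0.016, 0.064) appear on the slide; the other four values complete the standard Russell & Norvig dentistry table and are assumed here:

```python
# Inference by enumeration: sum rows of the full joint distribution
# P(Toothache, Catch, Cavity). The four toothache entries are on the slide;
# the other four complete the standard Russell & Norvig table.
joint = {  # (toothache, catch, cavity) -> probability
    (True,  True,  True):  0.108, (True,  True,  False): 0.016,
    (True,  False, True):  0.012, (True,  False, False): 0.064,
    (False, True,  True):  0.072, (False, True,  False): 0.144,
    (False, False, True):  0.008, (False, False, False): 0.576,
}

def p(pred):
    """Sum the joint over all worlds satisfying pred."""
    return sum(pr for world, pr in joint.items() if pred(world))

# P(not cavity | toothache) = P(not cavity AND toothache) / P(toothache)
num = p(lambda w: w[0] and not w[2])
den = p(lambda w: w[0])
print(round(num / den, 3))  # 0.4
```

Any conditional query over these three variables reduces to two such sums, which is exactly why this approach blows up: the table itself has 2^n entries.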

Conditional independence

P(Toothache, Cavity, Catch) has 2^3 − 1 = 7 independent entries.

If I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a toothache:
(1) P(catch | toothache, cavity) = P(catch | cavity)

The same independence holds if I haven't got a cavity:
(2) P(catch | toothache, ¬cavity) = P(catch | ¬cavity)

Catch is conditionally independent of Toothache given Cavity:
P(Catch | Toothache, Cavity) = P(Catch | Cavity)

Equivalent statements:
P(Toothache | Catch, Cavity) = P(Toothache | Cavity)
P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)

Bayesian networks

A simple, graphical notation for conditional independence assertions, and hence for compact specification of full joint distributions. It describes how variables interact locally; local interactions chain together to give global, indirect interactions.

Syntax:
a set of nodes, one per variable
a directed, acyclic graph (a link means "directly influences")
a conditional distribution for each node given its parents, P(X_i | Parents(X_i)), given as a conditional probability table (CPT)

Example 1

Topology of the network encodes conditional independence assertions:

P(W=true) = 0.4
P(Cavity=true) = 0.8
P(T=true | Cavity):  Cavity=T: 0.8,  Cavity=F: 0.4
P(C=true | Cavity):  Cavity=T: 0.9,  Cavity=F: 0.05

Weather is independent of the other variables; Toothache and Catch are conditionally independent given Cavity. It is usually easy for a domain expert to decide what direct influences exist.

Example 2

n independent coin flips: P(X1=true) = 0.5, P(X2=true) = 0.5, ..., P(Xn=true) = 0.5. No interactions between the variables: absolute independence.

Can a Bayes net represent every full joint distribution? For example, only distributions whose variables are absolutely independent can be represented by a Bayes net with no arcs.
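Equation (1) can be checked numerically against the full dentistry joint. A sketch; the table values are the standard textbook numbers, not given in this transcript:

```python
# Check that Catch is independent of Toothache given Cavity in the full
# dentistry joint P(Toothache, Catch, Cavity) (textbook values, assumed).
joint = {  # (toothache, catch, cavity) -> probability
    (True,  True,  True):  0.108, (True,  True,  False): 0.016,
    (True,  False, True):  0.012, (True,  False, False): 0.064,
    (False, True,  True):  0.072, (False, True,  False): 0.144,
    (False, False, True):  0.008, (False, False, False): 0.576,
}

def p(pred):
    return sum(pr for w, pr in joint.items() if pred(w))

def cond_p(event, given):
    """P(event | given) by enumeration over the joint."""
    return p(lambda w: event(w) and given(w)) / p(given)

lhs = cond_p(lambda w: w[1], lambda w: w[0] and w[2])  # P(catch | toothache, cavity)
rhs = cond_p(lambda w: w[1], lambda w: w[2])           # P(catch | cavity)
print(round(lhs, 3), round(rhs, 3))  # both 0.9, so equation (1) holds
```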

Calculation of joint probability

Given its parents, each node is conditionally independent of everything except its descendants. Thus

P(x_1, ..., x_n) = ∏_{i=1..n} P(x_i | parents(X_i))

Every BN over a domain implicitly represents some joint distribution (a full joint distribution table) over that domain.

Example 3

I'm at work; neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes the alarm is set off by minor earthquakes. Is there a burglar?

Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls.

Network topology reflects "causal" knowledge:
A burglar can set the alarm off
An earthquake can set the alarm off
The alarm can cause Mary to call
The alarm can cause John to call

Answering queries

P(b | j, ¬m) = P(b, j, ¬m) / P(j, ¬m)

P(b, j, ¬m) = P(b, e, a, j, ¬m) + P(b, ¬e, a, j, ¬m) + P(b, e, ¬a, j, ¬m) + P(b, ¬e, ¬a, j, ¬m)
            = P(b)P(e)P(a | b, e)P(j | a)P(¬m | a)
            + P(b)P(¬e)P(a | b, ¬e)P(j | a)P(¬m | a)
            + P(b)P(e)P(¬a | b, e)P(j | ¬a)P(¬m | ¬a)
            + P(b)P(¬e)P(¬a | b, ¬e)P(j | ¬a)P(¬m | ¬a)

Do the same to calculate P(¬b, j, ¬m) and normalize.

Worst case, for a network with n Boolean variables: O(n · 2^n).
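This query can be sketched directly in Python. The transcript omits the CPT numbers, so the standard values from Russell & Norvig's burglary network are assumed:

```python
from itertools import product

# Burglary network, P(B | j, not-m) by enumeration.
# CPT numbers are the standard Russell & Norvig values (assumed; the
# transcript itself does not give them).
P_B, P_E = 0.001, 0.002                              # priors
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A=true | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(J=true | A)
P_M = {True: 0.70, False: 0.01}                      # P(M=true | A)

def pt(p, v):
    """Probability that a Boolean variable with P(true)=p takes value v."""
    return p if v else 1 - p

def joint(b, e, a, j, m):
    return (pt(P_B, b) * pt(P_E, e) * pt(P_A[(b, e)], a)
            * pt(P_J[a], j) * pt(P_M[a], m))

def p_evidence(b):
    """P(B=b, JohnCalls=true, MaryCalls=false), summing out E and A."""
    return sum(joint(b, e, a, True, False)
               for e, a in product((True, False), repeat=2))

num, alt = p_evidence(True), p_evidence(False)
print(round(num / (num + alt), 4))  # about 0.005: a burglary is still unlikely
```

The two calls to `p_evidence` are exactly the two sums the slide describes, and the final division is the normalization step.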

Laziness and ignorance

The probabilities actually summarize a potentially infinite set of circumstances: the alarm might fail to go off because of high humidity, a power failure, a dead battery, cut wires, or a dead mouse stuck inside the bell; John or Mary might fail to call and report it because they are out to lunch, on vacation, or temporarily deaf, or because of a passing helicopter.

Compactness

A CPT for Boolean X_i with k Boolean parents has 2^k rows for the combinations of parent values. Each row requires one number p for X_i = true (the number for X_i = false is just 1 − p). If each variable has no more than k parents, the complete network requires O(n · 2^k) numbers, i.e., it grows linearly with n, vs. O(2^n) for the full joint distribution. For the burglary net: 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 − 1 = 31). We exploit the fact that the system is locally structured.

Causality? Reverse causality?

Rain causes Traffic. Let's build the joint.
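The burglary-net parameter count can be reproduced directly. A small sketch; the parent counts follow the network described above:

```python
# Parameter count of a Bayes net vs. the full joint distribution.
# Each Boolean node with k Boolean parents needs 2**k numbers
# (one row per parent combination; P(X=false) is 1 - P(X=true)).
parents = {"Burglary": 0, "Earthquake": 0, "Alarm": 2,
           "JohnCalls": 1, "MaryCalls": 1}

bn_params = sum(2 ** k for k in parents.values())
joint_params = 2 ** len(parents) - 1
print(bn_params, joint_params)  # 10 31
```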

Causality?

What do the arrows really mean? Topology may happen to encode causal structure, but what topology really encodes is conditional independencies.

Example 3, again

Consider the following two orders for inserting the variables:
(a) MaryCalls, JohnCalls, Alarm, Burglary, Earthquake, since P(Burglary | Alarm, JohnCalls, MaryCalls) = P(Burglary | Alarm)
(b) MaryCalls, JohnCalls, Earthquake, Burglary, Alarm

When Bayes nets reflect the true causal patterns they are:
often simpler (nodes have fewer parents)
often easier to think about
often easier to elicit from experts

But BNs need not actually be causal. Sometimes no causal net exists over the domain, e.g., consider the variables Traffic and RoofDrips: you end up with arrows that reflect correlation, not causation.

Connection types

Name            Diagram        X ind. Z?        X ind. Z, given Y?
Causal chain    X → Y → Z      Not necessarily  Yes
Common cause    X ← Y → Z      Not necessarily  Yes
Common effect   X → Y ← Z      Yes              Not necessarily

Test question

Network: H → G, H → R, G → R, R → J (Hardworking → Good Grader; Hardworking and Good Grader → Excellent Recommendation; Excellent Recommendation → Landed a Good Job).

P(H=true) = 0.1

P(G=true | H):  H=T: 0.4,  H=F: 0.8

P(R=true | H, G):
  H=false, G=false: 0.2
  H=false, G=true:  0.9
  H=true,  G=false: 0.3
  H=true,  G=true:  0.8

P(J=true | R):  R=false: 0.2,  R=true: 0.7

What can be inferred?

Q: What can be inferred about P(J | R), P(J | H), P(H | G), P(H), P(G), P(J | R, H), and P(J)?

Q: What is the value of P(h, g, ¬r, ¬j)?
A: P(h, g, ¬r, ¬j) = P(h) · P(g | h) · P(¬r | h, g) · P(¬j | h, g, ¬r)
                   = P(h) · P(g | h) · P(¬r | h, g) · P(¬j | ¬r)
                   = 0.1 · 0.4 · 0.2 · 0.8 = 0.0064

Q: What if we want to add another variable, C = HasTheRightConnections?
We need a new prior P(C=true) = ???, and with C as an additional parent of R the CPT becomes P(R=true | H, G, C) with 2^3 = 8 rows, all of which must be elicited anew:

  H=false, G=false, C=false: ??
  H=false, G=false, C=true:  ??
  H=false, G=true,  C=false: ??
  H=false, G=true,  C=true:  ??
  H=true,  G=false, C=false: ??
  H=true,  G=false, C=true:  ??
  H=true,  G=true,  C=false: ??
  H=true,  G=true,  C=true:  ??

The other CPTs, P(H=true) = 0.1, P(G=true | H), and P(J=true | R), are unchanged.

Reachability (the Bayes Ball)

Shade evidence nodes. Start at the source node and try to reach the target by search.
States: a node, along with the previous arc.
Successor function:
  unobserved nodes: to any child; to any parent if coming from a child
  observed nodes: from parent to parent
If you can't reach a node, it's conditionally independent of the start node.

Example

(The network diagram is not included in the transcription.)
L ind. T′, given T? L ind. B? L ind. B, given T? L ind. B, given T′? L ind. B, given T and R?
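The computation of P(h, g, ¬r, ¬j) follows the chain of CPTs. A Python sketch using the CPT numbers from the test question:

```python
# Joint probability via the chain of CPTs for the H, G, R, J network
# (Hardworking, Good Grader, Excellent Recommendation, Good Job).
P_H = 0.1                                        # P(H=true)
P_G = {True: 0.4, False: 0.8}                    # P(G=true | H)
P_R = {(False, False): 0.2, (False, True): 0.9,
       (True, False): 0.3, (True, True): 0.8}    # P(R=true | H, G)
P_J = {False: 0.2, True: 0.7}                    # P(J=true | R)

def pt(p, v):
    """Probability that a Boolean variable with P(true)=p takes value v."""
    return p if v else 1 - p

def joint(h, g, r, j):
    # P(h, g, r, j) = P(h) P(g|h) P(r|h,g) P(j|r)
    return pt(P_H, h) * pt(P_G[h], g) * pt(P_R[(h, g)], r) * pt(P_J[r], j)

# P(h, g, not-r, not-j) = 0.1 * 0.4 * 0.2 * 0.8
print(round(joint(True, True, False, False), 4))  # 0.0064
```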

Naïve Bayes

Conditional independence assumption: the features are independent of each other given the class:

P(X_1, ..., X_n | C) = P(X_1 | C) · P(X_2 | C) · ... · P(X_n | C)

What can we model with naïve Bayes? Any process where each cause has lots of independent effects, it is easy to estimate the CPT for each effect, and we want to reason about the probability of different causes given observed effects.

Structure: a class node C with children X_1, X_2, X_3, ..., X_n.

Naïve Bayes classifiers

Task: classify a new instance D = ⟨x_1, x_2, ..., x_n⟩ into one of the classes c_j ∈ C:

c_MAP = argmax_{c ∈ C} P(c | x_1, x_2, ..., x_n)
      = argmax_{c ∈ C} P(x_1, x_2, ..., x_n | c) P(c) / P(x_1, x_2, ..., x_n)
      = argmax_{c ∈ C} P(x_1, x_2, ..., x_n | c) P(c)

Summary

Bayesian networks provide a natural representation for (causally induced) conditional independence. Topology + CPTs = a compact representation of the joint distribution. Generally easy for domain experts to construct.

CIS 391 - Intro to AI
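The MAP rule above can be sketched as a tiny classifier. The spam-filter class priors and per-word CPTs below are hypothetical numbers, purely for illustration:

```python
import math

# Naive Bayes MAP classification: argmax_c P(c) * prod_i P(x_i | c).
# The priors and conditionals below are hypothetical illustration values.
prior = {"spam": 0.4, "ham": 0.6}
cond = {  # P(word appears | class)
    "spam": {"offer": 0.6, "meeting": 0.05},
    "ham":  {"offer": 0.1, "meeting": 0.4},
}

def classify(words):
    # Work in log space so products of many small factors don't underflow.
    scores = {c: math.log(prior[c]) + sum(math.log(cond[c][w]) for w in words)
              for c in prior}
    return max(scores, key=scores.get)

print(classify(["offer"]))    # spam
print(classify(["meeting"]))  # ham
```

Note that P(x_1, ..., x_n) is never computed: it is the same for every class, so it drops out of the argmax, exactly as in the derivation above.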