CS 331: Artificial Intelligence. Fundamentals of Probability II: Joint Probability Distributions and Marginalization


Full Joint Probability Distributions

The full joint distribution over the three random variables Toothache, Cavity, and Catch (Catch means the dentist's probe catches in my teeth). Thanks to Andrew Moore for some course material.

Toothache  Cavity  Catch   P(Toothache, Cavity, Catch)
false      false   false   0.576
false      false   true    0.144
false      true    false   0.008
false      true    true    0.072
true       false   false   0.064
true       false   true    0.016
true       true    false   0.012
true       true    true    0.108

The last cell means P(Toothache = true, Cavity = true, Catch = true) = 0.108. The probabilities in the last column sum to 1.

From the full joint probability distribution, we can calculate any probability involving the three random variables in this world, e.g.:

P(Toothache = true OR Cavity = true)
  = P(Toothache = true, Cavity = false, Catch = false) + P(Toothache = true, Cavity = false, Catch = true)
  + P(Toothache = false, Cavity = true, Catch = false) + P(Toothache = false, Cavity = true, Catch = true)
  + P(Toothache = true, Cavity = true, Catch = false) + P(Toothache = true, Cavity = true, Catch = true)
  = 0.064 + 0.016 + 0.008 + 0.072 + 0.012 + 0.108
  = 0.28

Marginalization

We can even calculate marginal probabilities (the probability distribution over a subset of the variables), e.g.:

P(Toothache = true, Cavity = true)
  = P(Toothache = true, Cavity = true, Catch = true) + P(Toothache = true, Cavity = true, Catch = false)
  = 0.108 + 0.012 = 0.12

Or even:

P(Cavity = true)
  = P(Toothache = true, Cavity = true, Catch = true) + P(Toothache = true, Cavity = true, Catch = false)
  + P(Toothache = false, Cavity = true, Catch = true) + P(Toothache = false, Cavity = true, Catch = false)
  = 0.108 + 0.012 + 0.072 + 0.008 = 0.2
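As a quick sanity check, the sums above can be reproduced by storing the full joint table as a Python dict and summing the matching rows. This is only an illustrative sketch; the variable names are my own.

```python
# The full joint distribution P(Toothache, Cavity, Catch) from the slides,
# keyed by (toothache, cavity, catch) truth values.
joint = {
    (False, False, False): 0.576, (False, False, True): 0.144,
    (False, True,  False): 0.008, (False, True,  True): 0.072,
    (True,  False, False): 0.064, (True,  False, True): 0.016,
    (True,  True,  False): 0.012, (True,  True,  True): 0.108,
}

# P(Toothache = true OR Cavity = true): sum every entry where either holds.
p_tooth_or_cavity = sum(p for (t, cav, c), p in joint.items() if t or cav)

# Marginalization: P(Toothache = true, Cavity = true) sums Catch out.
p_tooth_and_cavity = sum(p for (t, cav, c), p in joint.items() if t and cav)

# P(Cavity = true) sums out both Toothache and Catch.
p_cavity = sum(p for (t, cav, c), p in joint.items() if cav)

print(round(p_tooth_or_cavity, 3))   # 0.28
print(round(p_tooth_and_cavity, 3))  # 0.12
print(round(p_cavity, 3))            # 0.2
```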

Marginalization

The general marginalization rule for any sets of variables Y and Z:

P(Y) = Σz P(Y, z)    or    P(Y) = Σz P(Y | z) P(z)

where the summation is over all possible combinations of values z of Z (remember Z is a set of variables).

For continuous variables, marginalization involves taking the integral:

P(Y) = ∫ P(Y, z) dz

Normalization

P(Cavity = true | Toothache = true)
  = P(Cavity = true, Toothache = true) / P(Toothache = true)
  = (0.108 + 0.012) / (0.108 + 0.012 + 0.016 + 0.064) = 0.6

P(Cavity = false | Toothache = true)
  = P(Cavity = false, Toothache = true) / P(Toothache = true)
  = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064) = 0.4

Note that 1/P(Toothache = true) remains constant in the two equations. In fact, 1/P(toothache) can be viewed as a normalization constant for P(Cavity | toothache), ensuring it adds up to 1. We will refer to normalization constants with the symbol α:

P(Cavity | toothache) = α P(Cavity, toothache)

Inference

Suppose you get a query such as P(Cavity = true | Toothache = true). Toothache is called the evidence variable because we observe it. More generally, it's a set of variables. Cavity is called the query variable (we'll assume it's a single variable for now). There are also unobserved (aka hidden) variables like Catch.

We will write the query as P(X | e). This is a probability distribution, hence the boldface.

X = query variable (a single variable for now)
E = set of evidence variables
e = the set of observed values for the evidence variables
Y = unobserved variables
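The normalization trick can be sketched in Python against the same joint table: compute the unnormalized column P(Cavity, Toothache = true), then scale by α = 1/P(Toothache = true). Variable names here are illustrative.

```python
# Full joint P(Toothache, Cavity, Catch), keyed by (toothache, cavity, catch).
joint = {
    (False, False, False): 0.576, (False, False, True): 0.144,
    (False, True,  False): 0.008, (False, True,  True): 0.072,
    (True,  False, False): 0.064, (True,  False, True): 0.016,
    (True,  True,  False): 0.012, (True,  True,  True): 0.108,
}

# Unnormalized P(Cavity, Toothache = true): sum Catch out of the joint.
unnorm = {
    cavity: sum(p for (t, cav, c), p in joint.items() if t and cav == cavity)
    for cavity in (True, False)
}

alpha = 1.0 / sum(unnorm.values())   # alpha = 1 / P(Toothache = true)
posterior = {cav: alpha * p for cav, p in unnorm.items()}

print({cav: round(p, 3) for cav, p in posterior.items()})
# {True: 0.6, False: 0.4}
```

The division by the sum is exactly what the α notation hides: the posterior is forced to add up to 1 without ever computing P(Toothache = true) separately from the table.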

Inference

We will write the query as:

P(X | e) = α P(X, e) = α Σy P(X, e, y)

The summation is over all possible combinations of values y of the unobserved variables Y.

X = query variable (a single variable for now)
E = set of evidence variables
e = the set of observed values for the evidence variables
Y = unobserved variables

Computing P(X | e) involves going through all possible entries of the full joint probability distribution and adding up probabilities with X = x_i, E = e, and Y = y. Suppose you have a domain with n Boolean variables. What is the space and time complexity of computing P(X | e)? Both are O(2^n): the table has 2^n entries and a query may touch all of them.

Independence

How do you avoid the exponential space and time complexity of inference? Use independence (aka factoring).

Suppose the full joint distribution now consists of four variables:

Toothache = {true, false}
Catch = {true, false}
Cavity = {true, false}
Weather = {sunny, rain, cloudy, snow}

There are now 32 entries in the full joint distribution table.

Does the weather influence one's dental problems? Is P(Weather = cloudy | Toothache = toothache, Catch = catch, Cavity = cavity) = P(Weather = cloudy)? In other words, is Weather independent of Toothache, Catch and Cavity?

We say that variables X and Y are independent if any of the following hold (note that they are all equivalent):

P(X | Y) = P(X)    or    P(Y | X) = P(Y)    or    P(X, Y) = P(X) P(Y)
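The enumeration procedure can be sketched as a generic function over a full joint table. The function name `enumerate_query`, the name ordering, and the restriction to Boolean variables are my own simplifications; the loop visits every row of the table, which is exactly the exponential cost in question.

```python
# Inference by enumeration: P(X | e) = alpha * sum over hidden values y
# of P(X, e, y). `names` fixes the variable order used in the tuple keys.
def enumerate_query(joint, names, query_var, evidence):
    unnorm = {True: 0.0, False: 0.0}
    for assignment, p in joint.items():        # O(2^n) rows visited
        row = dict(zip(names, assignment))
        if all(row[v] == val for v, val in evidence.items()):
            unnorm[row[query_var]] += p        # hidden variables summed out
    alpha = 1.0 / sum(unnorm.values())
    return {val: alpha * p for val, p in unnorm.items()}

joint = {
    (False, False, False): 0.576, (False, False, True): 0.144,
    (False, True,  False): 0.008, (False, True,  True): 0.072,
    (True,  False, False): 0.064, (True,  False, True): 0.016,
    (True,  True,  False): 0.012, (True,  True,  True): 0.108,
}

posterior = enumerate_query(joint, ["Toothache", "Cavity", "Catch"],
                            "Cavity", {"Toothache": True})
print({v: round(p, 3) for v, p in posterior.items()})  # {True: 0.6, False: 0.4}
```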

Why is independence useful?

Assume that Weather is independent of Toothache, Catch, and Cavity, i.e.:

P(Weather = cloudy | Toothache = toothache, Catch = catch, Cavity = cavity) = P(Weather = cloudy)

Now we can calculate:

P(Weather = cloudy, Toothache = toothache, Catch = catch, Cavity = cavity)
  = P(Weather = cloudy | Toothache = toothache, Catch = catch, Cavity = cavity) * P(Toothache = toothache, Catch = catch, Cavity = cavity)
  = P(Weather = cloudy) * P(Toothache = toothache, Catch = catch, Cavity = cavity)

The P(Weather) table has 4 values and the P(Toothache, Catch, Cavity) table has 8 values, so you now need to store only 12 values to calculate P(Weather, Toothache, Catch, Cavity). If Weather were not independent of Toothache, Catch, and Cavity, you would have needed 32 values.

Another example: suppose you have n coin flips and you want to calculate the joint distribution P(C_1, ..., C_n). If the coin flips are not independent, you need 2^n values in the table. If the coin flips are independent, then:

P(C_1, ..., C_n) = Π_{i=1}^{n} P(C_i)

Each P(C_i) table has 2 entries and there are n of them, for a total of 2n values.

Independence is powerful! But it required extra domain knowledge, a different kind of knowledge than numerical probabilities: it needed an understanding of causation.

Bayes Rule

The product rule can be written in two ways:

P(A, B) = P(A | B) P(B)
P(A, B) = P(B | A) P(A)

You can combine the equations above to get Bayes Rule:

P(A | B) = P(B | A) P(A) / P(B)

More generally, the following is known as Bayes Rule (note that these are distributions, hence the boldface):

P(A | B) = P(B | A) P(A) / P(B)

Sometimes, you can treat 1/P(B) as a normalization constant α:

P(A | B) = α P(B | A) P(A)
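The coin-flip saving can be illustrated directly: store one small table per flip (2n numbers in all) and reconstruct any joint entry as a product. The bias values below are made up for illustration.

```python
from itertools import product
from math import prod

# Per-flip tables P(C_i = heads); P(C_i = tails) = 1 - P(C_i = heads).
# Storing these n numbers replaces a 2^n-entry joint table.
p_heads = [0.5, 0.6, 0.7, 0.5]
n = len(p_heads)

def p_joint(outcome):
    """P(C_1 = o_1, ..., C_n = o_n) as a product of per-flip factors."""
    return prod(p if heads else 1 - p for heads, p in zip(outcome, p_heads))

# The reconstructed joint still sums to 1 over all 2^n outcomes.
total = sum(p_joint(o) for o in product([True, False], repeat=n))
print(round(total, 10))  # 1.0
```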

More General Forms of Bayes Rule

If A takes 2 values:

P(A | B) = P(B | A) P(A) / (P(B | A) P(A) + P(B | ¬A) P(¬A))

If A takes k values:

P(A = v_i | B) = P(B | A = v_i) P(A = v_i) / Σ_{j=1}^{k} P(B | A = v_j) P(A = v_j)

Bayes Rule Example

Meningitis causes stiff necks with probability 0.5. The prior probability of having meningitis is 0.00002. The prior probability of having a stiff neck is 0.05. What is the probability of having meningitis given that you have a stiff neck?

Let m = patient has meningitis, s = patient has stiff neck.

P(s | m) = 0.5
P(m) = 0.00002
P(s) = 0.05

P(m | s) = P(s | m) P(m) / P(s) = (0.5)(0.00002) / 0.05 = 0.0002

Note: even though P(s | m) = 0.5, P(m | s) = 0.0002.

When is Bayes Rule Useful?

Sometimes it's easier to get P(X | Y) than P(Y | X). Information is typically available in the form P(effect | cause) rather than P(cause | effect). For example, P(symptom | disease) is easy to measure empirically, but obtaining P(disease | symptom) is harder.

How is Bayes Rule Used?

In machine learning, we use Bayes rule in the following way:

P(h | D) = P(D | h) P(h) / P(D)

where P(h | D) is the posterior probability, P(D | h) is the likelihood of the data, and P(h) is the prior probability; h = hypothesis, D = data.

Bayes Rule With More Than One Piece of Evidence

Suppose you now have 2 evidence variables, Toothache = true and Catch = true (note that Cavity is uninstantiated below):

P(Cavity | Toothache = true, Catch = true) = α P(Toothache = true, Catch = true | Cavity) P(Cavity)

In order to calculate P(Toothache = true, Catch = true | Cavity), you need a table of 4 probability values. With N evidence variables, you need 2^N probability values.
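The meningitis numbers plug straight into Bayes' rule; a one-line check makes the surprisingly small posterior concrete.

```python
# Bayes' rule on the meningitis example: P(m | s) = P(s | m) P(m) / P(s).
p_s_given_m = 0.5       # probability meningitis causes a stiff neck
p_m = 0.00002           # prior P(m)
p_s = 0.05              # prior P(s)

p_m_given_s = p_s_given_m * p_m / p_s
print(round(p_m_given_s, 6))  # 0.0002
```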

Conditional Independence

Are Toothache and Catch independent? No: if the probe catches in the tooth, it likely has a cavity, which causes the toothache. But given the presence or absence of the cavity, they are independent (since they are both directly caused by the cavity but don't have a direct effect on each other):

P(Toothache = true, Catch = true | Cavity) = P(Toothache = true | Cavity) * P(Catch = true | Cavity)

General form:

P(A, B | C) = P(A | C) P(B | C)

Or equivalently:

P(A | B, C) = P(A | C)    and    P(B | A, C) = P(B | C)

How to think about conditional independence: in P(A | B, C) = P(A | C), if knowing C tells me everything about A, I don't gain anything by also knowing B.

Conditional independence lets us factor the full joint distribution:

P(Toothache, Catch, Cavity)
  = P(Toothache, Catch | Cavity) P(Cavity)
  = P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)

The full table has 7 independent values (the 8 entries have to sum to 1); the factored form needs only 2 independent values for P(Toothache | Cavity), 2 for P(Catch | Cavity), and 1 for P(Cavity). Conditional independence permits probabilistic systems to scale up!

What You Should Know

How to do inference in joint probability distributions
How to use Bayes Rule
Why independence and conditional independence are useful
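The dental table used in these slides actually satisfies this conditional independence, and the factorization can be verified numerically. This is a sketch; the helper name `marg` is my own.

```python
# Full joint P(Toothache, Cavity, Catch), keyed by (toothache, cavity, catch).
joint = {
    (False, False, False): 0.576, (False, False, True): 0.144,
    (False, True,  False): 0.008, (False, True,  True): 0.072,
    (True,  False, False): 0.064, (True,  False, True): 0.016,
    (True,  True,  False): 0.012, (True,  True,  True): 0.108,
}

def marg(cond):
    """Sum the joint entries (t, cav, c) for which cond holds."""
    return sum(p for k, p in joint.items() if cond(*k))

# Check P(T, C | cav) == P(T | cav) * P(C | cav) for all eight combinations.
ok = True
for cav in (True, False):
    p_cav = marg(lambda t, v, c: v == cav)
    for tv in (True, False):
        for cv in (True, False):
            lhs = marg(lambda t, v, c: (t, v, c) == (tv, cav, cv)) / p_cav
            rhs = (marg(lambda t, v, c: t == tv and v == cav) / p_cav
                   * marg(lambda t, v, c: v == cav and c == cv) / p_cav)
            ok = ok and abs(lhs - rhs) < 1e-9
print(ok)  # True
```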