Learning, Bayesian Probability, Graphical Models, and Abduction

Learning, Bayesian Probability, Graphical Models, and Abduction
David Poole, University of British Columbia

Themes: induction and abduction; Bayesian learning; statistical learning; the MDL principle; Bayesian networks; logic programming; probabilistic Horn abduction; neural networks; possible-worlds semantics; logical abduction.

Overview
Causal and evidential modelling & reasoning
Bayesian networks, Bayesian conditioning & abduction
Noise, overfitting, and Bayesian learning

Causal & Evidential Modelling
Causal modelling: from causes of interest to effects observed
    vision: scene → image
    diagnosis: disease → symptoms
    learning: model → data

Evidential modelling: from effects to causes
    vision: image → scene
    diagnosis: symptoms → diseases
    learning: data → model

Causal & Evidential Reasoning
    evidential reasoning: observation → cause
    causal reasoning: cause → prediction

Reasoning & Modelling Strategies
How do we do causal and evidential reasoning, given these modelling strategies?
1. Model evidentially and do only evidential reasoning (Mycin, neural networks).
2. Model both evidentially and causally (problems: consistency, redundancy, knowledge acquisition).
3. Model causally; use different reasoning strategies for causal and evidential reasoning (deduction + abduction, or Bayes' theorem).

Bayes' Rule [de Moivre 1718, Bayes 1763, Laplace 1774]

    P(h | e) = P(e | h) P(h) / P(e)

Proof: P(h ∧ e) = P(e | h) P(h) = P(h | e) P(e).
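As a minimal numerical illustration (not from the talk; the prior and likelihood values are assumptions chosen only for the example), a short Python sketch applying Bayes' rule:

    # Minimal sketch of Bayes' rule: P(h | e) = P(e | h) * P(h) / P(e).
    # The numbers below are illustrative assumptions, not from the talk.

    p_h = 0.01              # prior probability of the hypothesis (e.g., a fault)
    p_e_given_h = 0.9       # probability of the evidence if the hypothesis holds
    p_e_given_not_h = 0.05  # probability of the evidence otherwise

    # P(e) by total probability over h and not-h.
    p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

    # Posterior by Bayes' rule.
    p_h_given_e = p_e_given_h * p_h / p_e
    print(f"P(h | e) = {p_h_given_e:.3f}")   # about 0.154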

Lesson #1
You should know the difference between
    evidential & causal modelling, and
    evidential & causal reasoning.
There seems to be a relationship between Bayes' theorem and abduction: they are used for the same task.

Overview
Causal and evidential modelling & reasoning
Bayesian networks, Bayesian conditioning & abduction
Noise, overfitting, and Bayesian learning

Bayesian Networks
A graphical representation of independence.
DAGs with nodes representing random variables.
Embedded independence assumption: if b1, ..., bn are the parents of a, then

    P(a | b1, ..., bn, v) = P(a | b1, ..., bn)

whenever v is not a descendant of a.
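As a small illustration (not part of the original slides), the following Python sketch encodes a three-node chain a → b → c with assumed conditional probability tables; the independence assumptions let the joint probability factor into the product of each node's probability given its parents.

    from itertools import product

    # Tiny Bayesian network a -> b -> c with illustrative (assumed) CPTs.
    p_a = {True: 0.2, False: 0.8}
    p_b_given_a = {True: {True: 0.7, False: 0.3},
                   False: {True: 0.1, False: 0.9}}   # p_b_given_a[a][b]
    p_c_given_b = {True: {True: 0.6, False: 0.4},
                   False: {True: 0.05, False: 0.95}}  # p_c_given_b[b][c]

    def joint(a, b, c):
        # The independence assumptions let the joint factor as
        # P(a, b, c) = P(a) * P(b | a) * P(c | b).
        return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

    # Sanity check: the factored joint sums to 1 over all assignments.
    print(sum(joint(a, b, c) for a, b, c in product([True, False], repeat=3)))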

Bayesian Network for Overhead Projector
[Network diagram with nodes: power_in_building, projector_plugged_in, light_switch_on, power_in_wire, projector_switch_on, room_light_on, power_in_projector, lamp_works, alan_reading_book, projector_lamp_on, ray_is_awake, screen_lit_up, mirror_working, ray_says_"screen is dark"]

Bayesian networks as logic programs

    projector_lamp_on ←
        power_in_projector ∧
        lamp_works ∧
        projector_working_ok.          (possible hypothesis with associated probability)

    projector_lamp_on ←
        power_in_projector ∧
        ¬lamp_works ∧
        working_with_faulty_lamp.      (possible hypothesis with associated probability)

Probabilities of hypotheses

    P(projector_working_ok) = P(projector_lamp_on | power_in_projector ∧ lamp_works),
        provided as part of the Bayesian network.
    P(¬projector_working_ok) = 1 − P(projector_working_ok)

What do these logic programs mean?
There is a possible world for each assignment of truth values to the possible hypotheses:

    {projector_working_ok, working_with_faulty_lamp}
    {projector_working_ok, ¬working_with_faulty_lamp}
    {¬projector_working_ok, working_with_faulty_lamp}
    {¬projector_working_ok, ¬working_with_faulty_lamp}

The probability of a possible world is the product of the probabilities of the associated hypotheses.
The logic program specifies what else is true in each possible world.
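A small Python sketch (not from the slides; the two prior probabilities are assumed values) that enumerates the four possible worlds over these two hypotheses and computes each world's probability as the product of the probabilities of its hypotheses:

    from itertools import product

    # Assumed prior probabilities for the two independent hypotheses.
    p_hyp = {"projector_working_ok": 0.9, "working_with_faulty_lamp": 0.1}

    # One possible world per truth assignment to the hypotheses.
    for values in product([True, False], repeat=len(p_hyp)):
        world = dict(zip(p_hyp, values))
        # Probability of a world = product of P(h) for true hypotheses
        # and 1 - P(h) for false ones.
        prob = 1.0
        for h, truth in world.items():
            prob *= p_hyp[h] if truth else 1 - p_hyp[h]
        print(world, round(prob, 3))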

Probabilistic logic programs & abduction
The semantics is abductive in nature: the set of explanations of a proposition characterizes the possible worlds in which it is true (assume possible hypotheses and their negations).

    P(g) = Σ over E an explanation of g of P(E)
    P(E) = Π over h ∈ E of P(h)        (the P(h) are given with the logic program)

Conditional Probabilities

    P(g | e) = P(g ∧ e) / P(e)

where the numerator sums over the explanations of g ∧ e and the denominator sums over the explanations of e.
Given evidence e, explain e, then try to explain g from these explanations.
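A Python sketch of this computation, with hypothetical explanation sets and assumed hypothesis probabilities; it relies on the explanations being mutually exclusive, as in probabilistic Horn abduction:

    # Sketch (assumed numbers): compute P(g | e) by summing the probabilities
    # of explanations, where each explanation's probability is the product of
    # the probabilities of its hypotheses.

    p_hyp = {"projector_working_ok": 0.9, "working_with_faulty_lamp": 0.1}

    def prob_of_hypothesis(h):
        # A leading "~" denotes the negation of a possible hypothesis.
        if h.startswith("~"):
            return 1 - p_hyp[h[1:]]
        return p_hyp[h]

    def prob_of_explanation(expl):
        prob = 1.0
        for h in expl:
            prob *= prob_of_hypothesis(h)
        return prob

    # Hypothetical, mutually exclusive explanation sets (illustration only).
    explanations_of_g_and_e = [{"projector_working_ok"}]
    explanations_of_e = [{"projector_working_ok"},
                         {"~projector_working_ok", "working_with_faulty_lamp"}]

    p_g_and_e = sum(prob_of_explanation(E) for E in explanations_of_g_and_e)
    p_e = sum(prob_of_explanation(E) for E in explanations_of_e)
    print("P(g | e) =", p_g_and_e / p_e)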

Lesson #2
Bayesian conditioning = abduction.
The evidence of Bayesian conditioning is what is to be explained.
Condition on all information obtained since the knowledge base was built:

    P(h | e ∧ k) = P_k(h | e)

where P_k is the distribution of a knowledge base that already incorporates k.

Overview
Causal and evidential modelling & reasoning
Bayesian networks, Bayesian conditioning & abduction
Noise, overfitting, and Bayesian learning

Induction
    evidential reasoning: data → model
    causal reasoning: model → prediction

Potential Confusion
Often the data is about some evidential reasoning task, e.g., classification, diagnosis, recognition.
Example: we can do Bayesian learning where the hypotheses are decision trees, neural networks, parametrized distributions, or Bayesian networks.
Example: we can do hill-climbing learning with cross-validation where the hypotheses are decision trees, neural networks, parametrized distributions, or Bayesian networks.

Noise and Overfitting
Most data contains noise (errors, inadequate attributes, spurious correlations).
Overfitting: the learned model fits random correlations in the data.
Example: a more detailed decision tree always fits the training data better, but smaller decision trees usually provide better predictive value.
We need a tradeoff between model simplicity and fit to the data.

Overfitting and Bayes' theorem

    P(h | e) = P(e | h) P(h) / P(e)

where P(e | h) measures the fit to the data, P(h) is the bias (the prior), and P(e) is a normalizing constant.

Minimum description length principle
Choose the best hypothesis given the evidence:

    arg max_h P(h | e)
      = arg max_h P(e | h) P(h) / P(e)
      = arg max_h P(e | h) P(h)
      = arg max_h [ log2 P(e | h) + log2 P(h) ]
      = arg min_h [ (−log2 P(e | h)) + (−log2 P(h)) ]

where −log2 P(e | h) is the number of bits to describe the data in terms of the model, and −log2 P(h) is the number of bits to describe the model.
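A minimal sketch of choosing a hypothesis by description length; the candidate hypotheses and their probabilities below are assumptions for illustration, not from the talk:

    import math

    # Description length of a hypothesis: -log2 P(e | h) - log2 P(h).
    # The probabilities are illustrative assumptions.
    candidates = {
        "simple_model":  {"p_e_given_h": 0.02, "p_h": 0.50},
        "complex_model": {"p_e_given_h": 0.08, "p_h": 0.01},
    }

    def description_length(c):
        return -math.log2(c["p_e_given_h"]) - math.log2(c["p_h"])

    for name, c in candidates.items():
        print(name, round(description_length(c), 2), "bits")

    best = min(candidates, key=lambda name: description_length(candidates[name]))
    print("MDL choice:", best)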

Graphical Models for Learning
Idea: model → data.
Example: parameter estimation for the probability of heads (from [Buntine, JAIR, 1994]).
[Diagram: a parameter node for the probability of heads, with arcs to the observations heads_1, heads_2, ..., heads_N]

Abductive Version of Parameter Learning

    heads(C) ← turns_heads(C, Θ) ∧ prob_heads(Θ).
    tails(C) ← turns_tails(C, Θ) ∧ prob_heads(Θ).

For each C, {turns_heads(C, Θ), turns_tails(C, Θ)} is a set of possible hypotheses.
{prob_heads(θ) : θ ∈ [0, 1]} is a set of possible hypotheses.
For each C, Prob(turns_heads(C, θ)) = θ and Prob(turns_tails(C, θ)) = 1 − θ.
Prob(prob_heads(θ)) = 1, uniform on [0, 1].

Explaining Data
If you observe: heads(c1), tails(c2), tails(c3), heads(c4), heads(c5), ...
For each θ ∈ [0, 1] there is an explanation:

    {prob_heads(θ), turns_heads(c1, θ), turns_tails(c2, θ),
     turns_tails(c3, θ), turns_heads(c4, θ), turns_heads(c5, θ), ...}
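A small Python sketch of weighting these explanations: with a uniform prior density on θ, the explanation for a given θ has density proportional to θ^(#heads) (1 − θ)^(#tails). The grid evaluation below is only an illustration, not part of the original slides:

    # Sketch: weight of the explanation for each candidate value of theta,
    # given the observation sequence from the slide.
    observations = ["heads", "tails", "tails", "heads", "heads"]
    n_heads = observations.count("heads")
    n_tails = observations.count("tails")

    def explanation_weight(theta):
        # Uniform prior density on theta, so the density is proportional to
        # theta^(#heads) * (1 - theta)^(#tails).
        return theta ** n_heads * (1 - theta) ** n_tails

    # Evaluate on a coarse grid of candidate theta values.
    grid = [i / 10 for i in range(11)]
    weights = {theta: explanation_weight(theta) for theta in grid}
    total = sum(weights.values())
    for theta, w in weights.items():
        print(f"theta = {theta:.1f}  weight {w:.4f}  normalized {w / total:.3f}")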

Abductive neural-network learning
[Network diagram: inputs i1, i2, i3; hidden units h1, h2; outputs o1, o2, o3]

    prop(X, o2, V) ←
        prop(X, h1, V1) ∧
        prop(X, h2, V2) ∧
        param(p8, P1) ∧         (abducible)
        param(p10, P2) ∧        (abducible)
        V = 1 / (1 + e^(−(V1·P1 + V2·P2))).    (sigmoid of the additive input)
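A minimal Python sketch of the sigmoid unit in the rule above; the hidden-unit values and parameter values are assumptions for illustration:

    import math

    # The output value is the sigmoid of a weighted sum of the hidden-unit values.
    def sigmoid_unit(values, params):
        total = sum(v * p for v, p in zip(values, params))
        return 1.0 / (1.0 + math.exp(-total))

    v1, v2 = 0.8, 0.3      # assumed values of hidden units h1, h2 for one example
    p8, p10 = 1.5, -2.0    # assumed parameters (weights) on the links into o2
    print(sigmoid_unit([v1, v2], [p8, p10]))   # value of output unit o2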

Lesson #3
Abductive view of Bayesian learning:
    rules imply the data from parameters or possible representations
    parameter values or possible representations are abducible
    rules contain logical variables for the data items

Evidential versus causal modelling

                           Neural Nets          Bayesian Nets
    modelling              evidential           causal
                           sigmoid additive     linear Gaussian
                           (the two forms are related by Bayes' theorem)
    evidential reasoning   direct               abduction
    causal reasoning       none                 direct
    context changes        fragile              robust
    learning               easier               more difficult

Conclusion
Bayesian reasoning is abduction.
A Bayesian network is a causal modelling language: abduction + probability.
What logic-based abduction can learn from the Bayesians:
    handling noise and overfitting
    conditioning: explain everything
    algorithms: exploit sparse structure, exploit extreme distributions, or use stochastic simulation
What the Bayesians can learn: richer representations.