# Lecture 2: Introduction to belief (Bayesian) networks


Outline:

- Conditional independence
- What is a belief network?
- Independence maps (I-maps)

## Recall from last time: Conditional probabilities

Our probabilistic models will compute and manipulate conditional probabilities. Given two random variables $X, Y$, we denote by $p(X = x \mid Y = y)$ the probability of $X$ taking value $x$ given that we know that $Y$ is certain to have value $y$.

This fits the situation when we observe something and want to make an inference about something related but unobserved:

- $p(\text{cancer recurs} \mid \text{tumor measurements})$
- $p(\text{gene expressed} > 1.3 \mid \text{transcription factor concentrations})$
- $p(\text{collision with obstacle} \mid \text{sensor readings})$
- $p(\text{word uttered} \mid \text{sound wave})$

## Recall from last time: Bayes rule

Bayes rule is very simple but very important for relating conditional probabilities:

$$p(x \mid y) = \frac{p(y \mid x)\, p(x)}{p(y)}$$

Bayes rule is a useful tool for inferring the posterior probability of a hypothesis based on evidence and a prior belief in the probability of different hypotheses.

## Using Bayes rule for inference

Often we want to form a hypothesis about the world based on observable variables. Bayes rule is fundamental when viewed as stating the belief given to a hypothesis $H$ in light of evidence $e$:

$$p(H \mid e) = \frac{p(e \mid H)\, p(H)}{p(e)}$$

- $p(H \mid e)$ is sometimes called the posterior probability
- $p(H)$ is called the prior probability
- $p(e \mid H)$ is called the likelihood of the evidence (data)
- $p(e)$ is just a normalizing constant, which can be computed from the requirement that $\sum_h p(H = h \mid e) = 1$:

$$p(e) = \sum_h p(e \mid h)\, p(h)$$

Sometimes we simply write $p(H \mid e) \propto p(e \mid H)\, p(H)$.
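To make the normalization step concrete, here is a minimal Python sketch of Bayes-rule inference over a small hypothesis space. All numbers (hypotheses, priors, likelihoods) are invented for illustration:

```python
# A minimal sketch of Bayes-rule inference over a discrete hypothesis space.
# Priors and likelihoods are invented for illustration.
prior = {'flu': 0.10, 'cold': 0.30, 'healthy': 0.60}       # p(H)
likelihood = {'flu': 0.90, 'cold': 0.60, 'healthy': 0.05}  # p(e | H), e.g. e = "fever"

# p(e) is just the normalizing constant: sum_h p(e | h) p(h)
p_e = sum(likelihood[h] * prior[h] for h in prior)

posterior = {h: likelihood[h] * prior[h] / p_e for h in prior}
print(posterior)  # {'flu': 0.3, 'cold': 0.6, 'healthy': 0.1} -- sums to 1
```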

## Example: Medical diagnosis

A doctor knows that pneumonia causes a fever 95% of the time. She knows that if a person is selected randomly from the population, there is a $10^{-7}$ chance of the person having pneumonia. 1 in 100 people suffer from fever.

You go to the doctor complaining about the symptom of having a fever (evidence). What is the probability that pneumonia is the cause of this symptom (hypothesis)?

$$p(\text{pneumonia} \mid \text{fever}) = \frac{p(\text{fever} \mid \text{pneumonia})\, p(\text{pneumonia})}{p(\text{fever})} = \frac{0.95 \times 10^{-7}}{0.01} = 9.5 \times 10^{-6}$$

## Computing conditional probabilities

Typically, we are interested in the posterior joint distribution of some query variables $Y$ given specific values $e$ for some evidence variables $E$. Let the hidden variables be $Z = X \setminus (Y \cup E)$.

If we have a joint probability distribution, we can compute the answer by using the definition of conditional probability and marginalizing out the hidden variables:

$$p(y \mid e) = \frac{p(y, e)}{p(e)}, \qquad p(y, e) = \sum_z p(y, e, z)$$

This yields the same big problem as before: the joint distribution is too big to handle.
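The diagnosis arithmetic is a one-liner to check; a sketch (reading the garbled prior on the slide as $10^{-7}$):

```python
# Bayes rule for the pneumonia example; the prior is read as 10^-7
# from the (partially garbled) slide text.
p_fever_given_pneumonia = 0.95  # likelihood
p_pneumonia = 1e-7              # prior
p_fever = 0.01                  # 1 in 100 people have a fever

print(p_fever_given_pneumonia * p_pneumonia / p_fever)  # 9.5e-06
```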

## Independence of random variables revisited

We said that two r.v.'s $X$ and $Y$ are independent, denoted $X \perp Y$, if $p(x, y) = p(x)\, p(y)$. But we also know that $p(x, y) = p(x \mid y)\, p(y)$. Hence, two r.v.'s are independent if and only if:

$$p(x \mid y) = p(x) \quad \text{(and vice versa)}, \quad \forall x \in \Omega_X,\ y \in \Omega_Y$$

This means that knowledge about $Y$ does not change the uncertainty about $X$, and vice versa.

Is there a similar, but less stringent, requirement?

## Conditional independence

Two random variables $X$ and $Y$ are conditionally independent given $Z$ if:

$$p(x \mid y, z) = p(x \mid z), \quad \forall x, y, z$$

This means that knowing the value of $Y$ does not change the prediction about $X$ if the value of $Z$ is known. We denote this by $X \perp Y \mid Z$.
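The definition can be checked numerically from a full joint table. Below is a small sketch in which the joint over binary $X, Y, Z$ is built (with invented numbers) as $p(z)\,p(x \mid z)\,p(y \mid z)$, so $X \perp Y \mid Z$ holds by construction:

```python
from itertools import product

# Sketch: verify X ⊥ Y | Z numerically. The joint is constructed (invented
# numbers) as p(z) p(x|z) p(y|z), so the property holds by design.
pz = {0: 0.7, 1: 0.3}
px_z = {0: 0.2, 1: 0.7}   # p(X = 1 | z)
py_z = {0: 0.5, 1: 0.9}   # p(Y = 1 | z)

def bern(p1, v):
    """p(V = v) for a binary variable with p(V = 1) = p1."""
    return p1 if v == 1 else 1.0 - p1

joint = {(x, y, z): pz[z] * bern(px_z[z], x) * bern(py_z[z], y)
         for x, y, z in product([0, 1], repeat=3)}

def cond(x, given):
    """p(X = x | given), where given fixes 'y' and/or 'z'."""
    num = sum(p for (xx, yy, zz), p in joint.items()
              if xx == x and all({'y': yy, 'z': zz}[k] == v for k, v in given.items()))
    den = sum(p for (xx, yy, zz), p in joint.items()
              if all({'y': yy, 'z': zz}[k] == v for k, v in given.items()))
    return num / den

# p(x | y, z) == p(x | z) for every combination of values
for x, y, z in product([0, 1], repeat=3):
    assert abs(cond(x, {'y': y, 'z': z}) - cond(x, {'z': z})) < 1e-12
print("X ⊥ Y | Z holds for every x, y, z")
```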

## Example

Consider the medical diagnosis problem with three random variables: $P$ (patient has pneumonia), $F$ (patient has a fever), $C$ (patient has a cough).

The full joint distribution has $2^3 - 1 = 7$ independent entries.

If someone has pneumonia, we can assume that the probability of a cough does not depend on whether they have a fever:

$$p(C = 1 \mid P = 1, F) = p(C = 1 \mid P = 1) \tag{1}$$

The same equality holds if the patient does not have pneumonia:

$$p(C = 1 \mid P = 0, F) = p(C = 1 \mid P = 0) \tag{2}$$

Hence, $C$ and $F$ are conditionally independent given $P$.

## Example (continued)

The joint distribution can now be written as:

$$p(C, P, F) = p(C \mid P, F)\, p(F \mid P)\, p(P) = p(C \mid P)\, p(F \mid P)\, p(P)$$

Hence, the joint can be described using $2 + 2 + 1 = 5$ numbers instead of 7. Much greater savings happen with more variables.
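Concretely, the factored form lets us rebuild the full 8-entry joint from just 5 numbers; a sketch with invented values:

```python
# Sketch: the full joint over (P, F, C) from 5 numbers (invented values),
# using p(C, P, F) = p(C|P) p(F|P) p(P).
p_p = 0.001                     # p(P = 1)
p_f_given = {0: 0.01, 1: 0.95}  # p(F = 1 | P)
p_c_given = {0: 0.05, 1: 0.80}  # p(C = 1 | P)

joint = {}
for p in (0, 1):
    for f in (0, 1):
        for c in (0, 1):
            joint[(p, f, c)] = ((p_p if p else 1 - p_p)
                                * (p_f_given[p] if f else 1 - p_f_given[p])
                                * (p_c_given[p] if c else 1 - p_c_given[p]))

print(sum(joint.values()))  # 1.0 -- a full joint from 5 parameters instead of 7
```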

## Naive Bayesian model

A common assumption in early diagnosis is that the symptoms are independent of each other given the disease. Let $s_1, \ldots, s_n$ be the symptoms exhibited by a patient (e.g. fever, headache, etc.) and let $D$ be the patient's disease. Then by using the naive Bayes assumption, we get:

$$p(D, s_1, \ldots, s_n) = p(D)\, p(s_1 \mid D) \cdots p(s_n \mid D)$$

The conditional probability of the disease given the symptoms is:

$$p(D \mid s_1, \ldots, s_n) = \frac{p(D, s_1, \ldots, s_n)}{p(s_1, \ldots, s_n)} \propto p(D)\, p(s_1 \mid D) \cdots p(s_n \mid D)$$

because the denominator is just a normalization constant.
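A minimal sketch of this computation, with an invented two-disease, two-symptom model (here `flu`/`cold` and the symptom probabilities are illustrative assumptions, not slide values):

```python
import math

# Sketch of naive Bayes diagnosis: p(D | s_1..s_n) ∝ p(D) ∏_i p(s_i | D).
# All numbers are invented for illustration.
prior = {'flu': 0.2, 'cold': 0.8}                    # p(D)
p_sym = {'flu':  {'fever': 0.90, 'headache': 0.60},
         'cold': {'fever': 0.20, 'headache': 0.30}}  # p(s_i = 1 | D)
observed = ['fever', 'headache']

unnorm = {d: prior[d] * math.prod(p_sym[d][s] for s in observed) for d in prior}
z = sum(unnorm.values())                             # normalization constant
posterior = {d: unnorm[d] / z for d in unnorm}
print(posterior)  # flu ≈ 0.69, cold ≈ 0.31
```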

## Recursive Bayesian updating

The naive Bayes assumption also allows for a very nice, incremental updating of beliefs as more evidence is gathered. Suppose that after knowing symptoms $s_1, \ldots, s_n$, the probability of $D$ is:

$$p(D, s_1, \ldots, s_n) = p(D) \prod_{i=1}^{n} p(s_i \mid D)$$

What happens if a new symptom $s_{n+1}$ appears?

$$p(D, s_1, \ldots, s_n, s_{n+1}) = p(D) \prod_{i=1}^{n+1} p(s_i \mid D) = p(D, s_1, \ldots, s_n)\, p(s_{n+1} \mid D)$$

An even nicer formula can be obtained by taking logs:

$$\log p(D, s_1, \ldots, s_n, s_{n+1}) = \log p(D, s_1, \ldots, s_n) + \log p(s_{n+1} \mid D)$$

## A graphical representation of the naive Bayesian model

*(Figure: a node Diagnosis with an arc to each of the nodes Sympt 1, Sympt 2, ..., Sympt n.)*

- The nodes represent random variables
- The arcs represent influences
- The lack of arcs represents conditional independence relationships
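In code, the recursive update is one addition per disease in log space; a sketch reusing the same invented numbers as above:

```python
import math

# Sketch of recursive updating: keep log p(D, s_1..s_n) per disease and add
# log p(s_{n+1} | D) when a new symptom arrives. Numbers are invented.
log_score = {'flu': math.log(0.2), 'cold': math.log(0.8)}  # log p(D)
p_sym = {'flu':  {'fever': 0.90, 'headache': 0.60},
         'cold': {'fever': 0.20, 'headache': 0.30}}        # p(s | D)

def observe(symptom):
    """Incremental update: log p(D, s_1..s_{n+1}) = log p(D, s_1..s_n) + log p(s_{n+1} | D)."""
    for d in log_score:
        log_score[d] += math.log(p_sym[d][symptom])

observe('fever')      # update after the first symptom...
observe('headache')   # ...and again, incrementally, after the second

z = sum(math.exp(v) for v in log_score.values())
print({d: math.exp(v) / z for d, v in log_score.items()})  # same posterior as the batch computation
```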

## More generally: Bayesian networks

*(Figure: the earthquake network. Nodes E (earthquake), B (burglary), A (alarm), R (radio report), C (call), with arcs E → A, B → A, E → R, A → C; each node carries a conditional probability table — p(E), p(B), p(A | B, E), p(R | E), p(C | A). The numeric CPT entries are not preserved in this transcription.)*

Bayesian networks are a graphical representation of conditional independence relations, using graphs.

## A graphical representation for probabilistic models

Suppose the world is described by a set of r.v.'s $X_1, \ldots, X_n$.

- Define a directed acyclic graph such that each node $i$ corresponds to an r.v. $X_i$. Since this is a one-to-one mapping, we will use $X_i$ to denote both the node in the graph and the corresponding r.v.
- Let $X_{\pi_i}$ be the set of parents of node $X_i$ in the graph.
- We associate with each node the conditional probability distribution of the r.v. $X_i$ given its parents: $p(X_i \mid X_{\pi_i})$.

## Example: A Bayesian (belief) network

*(Figure: the same earthquake network, with a CPD attached to each node: p(E), p(B), p(R | E), p(A | B, E), p(C | A). The numeric entries are not preserved in this transcription.)*

- The nodes represent random variables
- The arcs represent influences
- At each node, we have a conditional probability distribution (CPD) for the corresponding variable given its parents

## Factorization

Let $G$ be a DAG over variables $X_1, \ldots, X_n$. We say that a joint probability distribution $p$ **factorizes** according to $G$ if $p$ can be expressed as a product:

$$p(x_1, \ldots, x_n) = \prod_{i=1}^{n} p(x_i \mid x_{\pi_i})$$

The individual factors $p(x_i \mid x_{\pi_i})$ are called local probabilistic models or conditional probability distributions (CPDs).
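To make the factorization concrete, here is a minimal Python sketch of the earthquake network above. Since the slide's CPT values were not preserved, all numbers below are invented for illustration; the reasoning examples on the next few slides reuse `joint_prob` and `query` from this sketch:

```python
import itertools

# A minimal sketch of the earthquake/burglary network. The CPT numbers are
# invented for illustration -- the slide's actual values were not preserved.
# Each variable is binary; each CPT maps parent values to p(X = 1 | parents).
parents = {
    'B': (),          # burglary
    'E': (),          # earthquake
    'A': ('B', 'E'),  # alarm
    'R': ('E',),      # radio report
    'C': ('A',),      # neighbor calls
}
cpt = {
    'B': {(): 0.01},
    'E': {(): 0.02},
    'A': {(0, 0): 0.01, (0, 1): 0.30, (1, 0): 0.94, (1, 1): 0.95},
    'R': {(0,): 0.001, (1,): 0.90},
    'C': {(0,): 0.05, (1,): 0.70},
}
order = ['B', 'E', 'A', 'R', 'C']

def joint_prob(assign):
    """p(x_1,...,x_n) as the product of local CPDs p(x_i | x_pi_i)."""
    p = 1.0
    for x in order:
        pa = tuple(assign[u] for u in parents[x])
        p1 = cpt[x][pa]
        p *= p1 if assign[x] == 1 else 1.0 - p1
    return p

def query(target, evidence):
    """p(target = 1 | evidence) by brute-force enumeration of the joint."""
    num = den = 0.0
    for values in itertools.product([0, 1], repeat=len(order)):
        assign = dict(zip(order, values))
        if any(assign[v] != val for v, val in evidence.items()):
            continue
        p = joint_prob(assign)
        den += p
        if assign[target] == 1:
            num += p
    return num / den

# Sanity check: the factorized joint sums to 1 over all 2^5 assignments.
print(sum(joint_prob(dict(zip(order, v)))
          for v in itertools.product([0, 1], repeat=5)))  # ~1.0
```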

## Bayesian network definition

A Bayesian network is a DAG $G$ over variables $X_1, \ldots, X_n$, together with a distribution $p$ that factorizes over $G$. $p$ is specified as the set of conditional probability distributions associated with $G$'s nodes.

## Using a Bayes net for reasoning (1)

Computing any entry in the joint probability table is easy because of the factorization property:

$$p(B=1, E=0, A=1, C=1, R=0) = p(B=1)\, p(E=0)\, p(A=1 \mid B=1, E=0)\, p(C=1 \mid A=1)\, p(R=0 \mid E=0)$$

Computing marginal probabilities is also easy. E.g., what is the probability that a neighbor calls?

$$p(C=1) = \sum_{e,b,r,a} p(C=1, e, b, r, a)$$
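Using the sketch after the factorization slide (with its invented CPTs), both computations are one-liners:

```python
# One entry of the joint table, directly from the factorization:
print(joint_prob({'B': 1, 'E': 0, 'A': 1, 'C': 1, 'R': 0}))  # ≈ 0.0064
# A marginal, by summing the joint over all the other variables:
print(query('C', {}))  # p(C = 1) ≈ 0.066
```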

## Using a Bayes net for reasoning (2)

One might want to compute the conditional probability of a variable given evidence that is upstream of it in the graph. E.g., what is the probability of a call in case of a burglary?

$$p(C=1 \mid B=1) = \frac{p(C=1, B=1)}{p(B=1)} = \frac{\sum_{e,r,a} p(C=1, B=1, e, r, a)}{\sum_{c,e,r,a} p(c, B=1, e, r, a)}$$

This is called causal reasoning or prediction.

## Using a Bayes net for reasoning (3)

We might have some evidence and need an explanation for it. In this case, we compute a conditional probability based on evidence that is downstream in the graph. E.g., suppose we got a call. What is the probability of a burglary? What is the probability of an earthquake?

$$p(B=1 \mid C=1) = \frac{p(C=1 \mid B=1)\, p(B=1)}{p(C=1)}, \qquad
p(E=1 \mid C=1) = \frac{p(C=1 \mid E=1)\, p(E=1)}{p(C=1)}$$

This is evidential reasoning or explanation.
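Continuing with the same sketch (and remembering its CPT numbers are invented), both directions of reasoning use the same enumeration:

```python
print(query('C', {'B': 1}))  # causal: p(C = 1 | B = 1) ≈ 0.66
print(query('B', {'C': 1}))  # evidential: p(B = 1 | C = 1) ≈ 0.10
print(query('E', {'C': 1}))  # evidential: p(E = 1 | C = 1) ≈ 0.075
```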

## Using a Bayes net for reasoning (4)

Suppose that you now gather more evidence, e.g. the radio announces an earthquake. What happens to the probabilities?

$$p(E=1 \mid C=1, R=1) \geq p(E=1 \mid C=1) \quad \text{and} \quad p(B=1 \mid C=1, R=1) \leq p(B=1 \mid C=1)$$

The radio report makes the earthquake a more likely cause of the call, and the burglary a less likely one. This is called explaining away.

## I-maps

A directed acyclic graph (DAG) $G$ whose nodes represent random variables $X_1, \ldots, X_n$ is an I-map (independence map) of a distribution $p$ if $p$ satisfies the independence assumptions:

$$X_i \perp \text{Nondescendants}(X_i) \mid X_{\pi_i}, \quad i = 1, \ldots, n$$

where $X_{\pi_i}$ are the parents of $X_i$.
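With the sketch's invented numbers, explaining away is easy to observe:

```python
# Earthquake becomes much more likely once the radio confirms it:
print(query('E', {'C': 1}), query('E', {'C': 1, 'R': 1}))  # ≈ 0.075 vs ≈ 0.99
# ...and the burglary is "explained away":
print(query('B', {'C': 1}), query('B', {'C': 1, 'R': 1}))  # ≈ 0.10 vs ≈ 0.028
```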

## Example

Consider all possible DAG structures over 2 variables. Which graph is an I-map for a distribution in which $X$ and $Y$ are dependent? *(The first joint table is not preserved in this transcription.)*

What about the following distribution, with $p(X = 0) = 0.2$, $p(Y = 0) = 0.4$, and $p(x, y) = p(x)\, p(y)$ for all entries?

| $x$ | $y$ | $p(x, y)$ |
|---|---|---|
| 0 | 0 | 0.08 |
| 0 | 1 | 0.12 |
| 1 | 0 | 0.32 |
| 1 | 1 | 0.48 |

## Example (continued)

In the first example, $X$ and $Y$ are not independent, so the only I-maps are the graphs $X \to Y$ and $Y \to X$, which assume no independence.

In the second example, we have $p(X = 0) = 0.2$, $p(Y = 0) = 0.4$, and for all entries $p(x, y) = p(x)\, p(y)$. Hence $X \perp Y$, and there are three I-maps for the distribution: the graph in which $X$ and $Y$ are not connected, and both graphs above. Note that independence maps may have extra arcs!
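The independence check for the second distribution is mechanical; a short sketch over the reconstructed table:

```python
# Check X ⊥ Y for the second distribution: compare each joint entry
# against the product of the marginals.
joint = {(0, 0): 0.08, (0, 1): 0.12, (1, 0): 0.32, (1, 1): 0.48}
px = {x: sum(p for (xx, y), p in joint.items() if xx == x) for x in (0, 1)}
py = {y: sum(p for (x, yy), p in joint.items() if yy == y) for y in (0, 1)}

print(all(abs(joint[(x, y)] - px[x] * py[y]) < 1e-12
          for x in (0, 1) for y in (0, 1)))
# True: X ⊥ Y, so the empty graph (and both one-arc graphs) are I-maps.
```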
