13.3 Inference Using Full Joint Distribution


 Elwin Thompson
 1 years ago
 Views:
Transcription
1 191 The probability distribution on a single variable must sum to 1 It is also true that any joint probability distribution on any set of variables must sum to 1 Recall that any proposition a is equivalent to the disjunction of all the atomic events in which a holds Call this set of events e(a) Atomic events are mutually exclusive, so the probability of any conjunction of atomic events is zero, by axiom 2 Hence, from axiom 3 P(a) = ei e(a) P(e i ) Given a full joint distribution that specifies the probabilities of all atomic events, this equation provides a simple method for computing the probability of any proposition Inference Using Full Joint Distribution toothache toothache catch catch catch catch cavity cavity E.g., there are six atomic events for cavity toothache: = 0.28 Extracting the distribution over a variable (or some subset of variables), marginal probability, is attained by adding the entries in the corresponding rows or columns E.g., P(cavity) = = 0.2 We can write the following general marginalization (summing out) rule for any sets of variables Y and Z: P(Y) = z Z P(Y, z) 1
2 193 For example: P( Cavity) z { Catch, Toothache} P( Cavity, z) A variant of this rule involves conditional probabilities instead of joint probabilities, using the product rule: P( Y) P( Y z) P( z) z This rule is called conditioning Marginalization and conditioning turn out to be useful rules for all kinds of derivations involving probability expressions toothache toothache catch catch catch catch cavity cavity Computing a conditional probability P(cavity toothache) = P(cavity toothache)/p(toothache) = ( )/( ) = 0.12/0.2 = 0.6 Respectively P( cavity toothache) = ( )/0.2 = 0.4 The two probabilities sum up to one, as they should 2
3 195 1/P(toothache) = 1/0.2 = 5 is a normalization constant ensuring that the distribution P(Cavity toothache) adds up to 1 Let denote the normalization constant P(Cavity toothache) = P(Cavity, toothache) = (P(Cavity, toothache, catch) + P(Cavity, toothache, catch)) = ([0.108, 0.016] + [0.012, 0.064]) = [0.12, 0.08] = [0.6, 0.4] In other words, we can calculate the conditional probability distribution without knowing P(toothache) using normalization 196 More generally: we need to find out the distribution of the query variable X (Cavity), evidence variables E (Toothache) have observed values e, and the remaining unobserved variables are Y (Catch) Evaluation of a query: P(X e) = P(X, e) = y P(X, e, y), where the summation is over all possible ys; i.e., all possible combinations of values of the unobserved variables Y 3
4 197 P(X, e, y) is simply a subset of the joint probability distribution of variables X, E, and Y X, E, and Y together constitute the complete set of variables for the domain Given the full joint distribution to work with, the equation in the previous slide can answer probabilistic queries for discrete variables It does not scale well For a domain described by n Boolean variables, it requires an input table of size O(2 n ) and takes O(2 n ) time to process the table In realistic problems the approach is completely impractical Independence If we expand the previous example with a fourth random variable Weather, which has four possible values, we have to copy the table of joint probabilities four times to have 32 entries together Dental problems have no influence on the weather, hence: P(Weather = cloudy toothache, catch, cavity) = P(Weather = cloudy) By this observation and product rule P(toothache, catch, cavity, Weather = cloudy) = P(Weather = cloudy) P(toothache, catch, cavity) 4
5 199 A similar equation holds for the other values of the variable Weather, and hence P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity) P(Weather) The required joint distribution tables have 8 and 4 elements Propositions a and b are independent if P(a b) = P(a) P(b a) = P(b) P(a b) = P(a) P(b) Respectively variables X and Y are independent of each other if P(X Y) = P(X) P(Y X) = P(Y) P(X, Y) = P(X)P(Y) Independent coin flips: P(C 1,, C n ) can be represented as the product of n singlevariable distributions P(C i ) Bayes Rule and Its Use By the product rule P(a b) = P(a b) P(b) and the commutativity of conjunction P(a b) = P(b a) P(a) Equating the two righthand sides and dividing by P(a), we get the Bayes rule P(b a) = P(a b) P(b) / P(a) The more general case of multivalued variables X and Y conditionalized on some background evidence e P(Y X, e) = P(X Y, e) P(Y e) / P(X e) Using normalization Bayes rule can be written as P(Y X) = P(X Y) P(Y) 5
6 201 Half of meningitis patients have a stiff neck P(s m) = 0.5 The prior probability of meningitis is 1 / : P(m) = 1/ Every 20th patient complains about a stiff neck P(s) = 1/20 What is the probability that a patient complaining about a stiff neck has meningitis? P(m s) = P(s m) P(m) / P(s) = 20 / ( ) = Perhaps the doctor knows that a stiff neck implies meningitis in 1 out of cases The doctor, hence, has quantitative information in the diagnostic direction from symptoms to causes, and no need to use Bayes rule Unfortunately, diagnostic knowledge is often more fragile than causal knowledge If there is a sudden epidemic of meningitis, the unconditional probability of meningitis P(m) will go up The conditional probability P(s m), however, stays the same The doctor who derived diagnostic probability P(m s) directly from statistical observation of patients before the epidemic will have no idea how to update the value The doctor who computes P(m s) from the other three values will see P(m s) go up proportionally with P(m) 6
7 203 All modern probabilistic inference systems are based on the use of Bayes rule On the surface the relatively simple rule does not seem very useful However, as the previous example illustrates, Bayes rule gives a chance to apply existing knowledge We can avoid assessing the probability of the evidence P(s) by instead computing a posterior probability for each value of the query variable m and m and then normalizing the result P(M s) = [ P(s m) P(m), P(s m) P( m) ] Thus, we need to estimate P(s m) instead of P(s) Sometimes easier, sometimes harder 204 When a probabilistic query has more than one piece of evidence the approach based on full joint probability will not scale up P(Cavity toothache catch) Neither will applying Bayes rule scale up in general P(toothache catch Cavity) P(Cavity) We would need variables to be independent, but variable Toothache and Catch obviously are not: if the probe catches in the tooth, it probably has a cavity and that probably causes a toothache Each is directly caused by the cavity, but neither has a direct effect on the other catch and toothache are conditionally independent given Cavity 7
8 205 Conditional independence: P(toothache catch Cavity) = P(toothache Cavity) P(catch Cavity) Plugging this into Bayes rule yields P(Cavity toothache catch) = P(Cavity) P(toothache Cavity) P(catch Cavity) Now we only need three separate distributions The general definition of conditional independence of variables X and Y, given a third variable Z is P(X, Y Z) = P(X Z) P(Y Z) Equivalently, P(X Y, Z) = P(X Z) and P(Y X, Z) = P(Y Z) 206 If all effects are conditionally independent given a single cause, the exponential size of knowledge representation is cut to linear A probability distribution is called a naïve Bayes model if all effects E 1,, E n are conditionally independent, given a single cause C The full joint probability distribution can be written as P(C, E 1,, E n ) = P(C) i P(E i C) It is often used as a simplifying assumption even in cases where the effect variables are not conditionally independent given the cause variable In practice, naïve Bayes systems can work surprisingly well, even when the independence assumption is not true 8
9 PROBABILISTIC REASONING A Bayesian network is a directed graph in which each node is annotated with quantitative probability information 1. A set of random variables makes up the nodes of the network. Variables may be discrete or continuous 2. A set of directed links (arrows) connects pairs of nodes. If there is an arrow from node X to node Y, then X is said to be a parent of Y 3. Each node X i has a conditional probability distribution P(X i Parents(X i )) 4. The graph has no directed cycles and hence is a directed, acyclic graph DAG 208 The intuitive meaning of arrow X Y is that X has a direct influence on Y The topology of the network the set of nodes and links specifies the conditional independence relations that hold in the domain Once the topology of the Bayesian network has been laid out, we need only specify a conditional probability distribution for each variable, given its parents The combination of the topology and the conditional distributions specify implicitly the full joint distribution for all the variables Weather Cavity Toothache Catch 9
10 209 Example (from LA) A new burglar alarm (A) has been installed at home It is fairly reliable at detecting a burglary (B), but also responds on occasion to minor earthquakes (E) Neighbors John (J) and Mary (M) have promised to call you at work when they hear the alarm John always calls when he hears the alarm, but sometimes confuses the telephone ringing with the alarm and calls then, too Mary, on the other hand, likes loud music and sometimes misses the alarm altogether Given the evidence of who has or has not called, we would like to estimate the probability of a burglary 210 P(B).001 Burglary Earthquake P(E).002 Alarm B E P(A) T T.95 T F.94 F T.29 F F.001 A P(J) T.90 F.05 John Calls Mary Calls A P(M) T.70 F.01 10
11 211 The topology of the network indicates that Burglary and earthquakes affect the probability of the alarm s going off Whether John and Mary call depends only on the alarm They do not perceive any burglaries directly, they do not notice minor earthquakes, and they do not confer before calling Mary listening to loud music and John confusing phone ringing to the sound of the alarm can be read from the network only implicitly as uncertainty associated to calling at work The probabilities actually summarize a potentially infinite set of circumstances The alarm might fail to go off due to high humidity, power failure, dead battery, cut wires, a dead mouse stuck inside the bell, etc. John and Mary might fail to call and report an alarm because they are out to lunch, on vacation, temporarily deaf, passing helicopter, etc. 212 The conditional probability tables in the network give the probabilities for the values of the random variable depending on the combination of values for the parent nodes Each row must sum to 1, because the entries represent exhaustive set of cases for the variable Above all variables are Boolean, and therefore it is enough to know that the probability of a true value is p, the probability of false must be 1 p In general, a table for a Boolean variable with k parents contains 2 k independently specifiable probabilities A variable with no parents has only one row, representing the prior probabilities of each possible value of the variable 11
12 The Semantics of Bayesian Networks Every entry in the full joint probability distribution can be calculated from the information in a Bayesian network A generic entry in the joint distribution is the probability of a conjunction of particular assignments to each variable P(X 1 = x 1 X n = x n ), abbreviated as P(x 1,, x n ) The value of this entry is P(x 1,, x n ) = i=1,,n P(x i parents(x i )), where parents(x i ) denotes the specific values of the variables Parents(X i ) P(j m a b e) = P(j a) P(m a) P(a b e) P( b) P( e) = = Constructing Bayesian networks We can rewrite an entry in the joint distribution P(x 1,, x n ), using the product rule, as P(x n x n1,, x 1 ) P(x n1,, x 1 ) Then we repeat the process, reducing each conjunctive probability to a conditional probability and a smaller conjunction. We end up with one big product: P(xn xn 1,,x 1)P(xn 1 xn 2,,x 1) P(x2 x 1)P(x 1) = P(x x,,x 1) This identity holds true for any set of random variables and is called the chain rule The specification of the joint distribution is thus equivalent to the general assertion that, for every variable X i in the network P(X i X i1,, X 1 ) = P(X i Parents(X i )) provided that Parents(X i ) { X i1,, X 1 } n i= 1 i i1 12
13 215 The last condition is satisfied by labeling the nodes in any order that is consistent with the partial order implicit in the graph structure Each node is required to be conditionally independent of its predecessors in the node ordering, given its parents We need to choose as parents of a node all those nodes that directly influence the value of the variable For example, MaryCalls is certainly influenced by whether there is a Burglary or an Earthquake, but not directly influenced We believe the following conditional independence statement to hold: P(MaryCalls JohnCalls, Alarm, Earthquake, Burglary) = P(MaryCalls Alarm) 216 A Bayesian network is a complete and nonredundant representation of the domain In addition, it is a more compact representation than the full joint distribution, due to its locally structured properties Each subcomponent interacts directly with only a bounded number of other components, regardless of the total number of components Local structure is usually associated with linear rather than exponential growth in complexity In domain of a Bayesian network, if each of the n random variables is influenced by at most k others, then specifying each conditional probability table will require at most 2 k numbers Altogether n2 k numbers; the full joint distribution has 2 n cells E.g., n = 30 and k = 5: n2 k = 960 and 2 n >
14 217 In a Bayesian network we can exchange accuracy with complexity It may be wiser to leave out very weak dependencies from the network in order to restrict the complexity, but this yields a lower accuracy Choosing the right topology for the network is a hard problem The correct order in which to add nodes is to add the root causes first, then the variables they influence, and so on, until we reach the leaves, which have no direct causal influence on other variables Adding nodes in the false order makes the network unnecessarily complex and unintuitive 218 Mary Calls John Calls Earthquake Burglary Alarm 14
Logic, Probability and Learning
Logic, Probability and Learning Luc De Raedt luc.deraedt@cs.kuleuven.be Overview Logic Learning Probabilistic Learning Probabilistic Logic Learning Closely following : Russell and Norvig, AI: a modern
More informationArtificial Intelligence
Artificial Intelligence ICS461 Fall 2010 Nancy E. Reed nreed@hawaii.edu 1 Lecture #13 Uncertainty Outline Uncertainty Probability Syntax and Semantics Inference Independence and Bayes' Rule Uncertainty
More informationOutline. Uncertainty. Uncertainty. Making decisions under uncertainty. Probability. Methods for handling uncertainty
Outline Uncertainty Chapter 13 Uncertainty Probability Syntax and Semantics Inference Independence and Bayes' Rule Uncertainty Let action A t = leave for airport t minutes before flight Will A t get me
More informationArtificial Intelligence. Conditional probability. Inference by enumeration. Independence. Lesson 11 (From Russell & Norvig)
Artificial Intelligence Conditional probability Conditional or posterior probabilities e.g., cavity toothache) = 0.8 i.e., given that toothache is all I know tation for conditional distributions: Cavity
More information/ Informatica. Intelligent Systems. Agents dealing with uncertainty. Example: driving to the airport. Why logic fails
Intelligent Systems Prof. dr. Paul De Bra Technische Universiteit Eindhoven debra@win.tue.nl Agents dealing with uncertainty Uncertainty Probability Syntax and Semantics Inference Independence and Bayes'
More informationBasics of Probability
Basics of Probability 1 Sample spaces, events and probabilities Begin with a set Ω the sample space e.g., 6 possible rolls of a die. ω Ω is a sample point/possible world/atomic event A probability space
More informationProbability, Conditional Independence
Probability, Conditional Independence June 19, 2012 Probability, Conditional Independence Probability Sample space Ω of events Each event ω Ω has an associated measure Probability of the event P(ω) Axioms
More informationCS 331: Artificial Intelligence Fundamentals of Probability II. Joint Probability Distribution. Marginalization. Marginalization
Full Joint Probability Distributions CS 331: Artificial Intelligence Fundamentals of Probability II Toothache Cavity Catch Toothache, Cavity, Catch false false false 0.576 false false true 0.144 false
More informationBayesian Networks. Mausam (Slides by UWAI faculty)
Bayesian Networks Mausam (Slides by UWAI faculty) Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential space for representation & inference BNs provide
More informationUncertainty. Chapter 13. Chapter 13 1
Uncertainty Chapter 13 Chapter 13 1 Attribution Modified from Stuart Russell s slides (Berkeley) Chapter 13 2 Outline Uncertainty Probability Syntax and Semantics Inference Independence and Bayes Rule
More informationBayesian Networks Chapter 14. Mausam (Slides by UWAI faculty & David Page)
Bayesian Networks Chapter 14 Mausam (Slides by UWAI faculty & David Page) Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential space for representation
More informationLecture 2: Introduction to belief (Bayesian) networks
Lecture 2: Introduction to belief (Bayesian) networks Conditional independence What is a belief network? Independence maps (Imaps) January 7, 2008 1 COMP526 Lecture 2 Recall from last time: Conditional
More informationCS 771 Artificial Intelligence. Reasoning under uncertainty
CS 771 Artificial Intelligence Reasoning under uncertainty Today Representing uncertainty is useful in knowledge bases Probability provides a coherent framework for uncertainty Review basic concepts in
More informationMarginal Independence and Conditional Independence
Marginal Independence and Conditional Independence Computer Science cpsc322, Lecture 26 (Textbook Chpt 6.12) March, 19, 2010 Recap with Example Marginalization Conditional Probability Chain Rule Lecture
More informationCS 188: Artificial Intelligence. Probability recap
CS 188: Artificial Intelligence Bayes Nets Representation and Independence Pieter Abbeel UC Berkeley Many slides over this course adapted from Dan Klein, Stuart Russell, Andrew Moore Conditional probability
More informationOutline. Probability. Random Variables. Joint and Marginal Distributions Conditional Distribution Product Rule, Chain Rule, Bayes Rule.
Probability 1 Probability Random Variables Outline Joint and Marginal Distributions Conditional Distribution Product Rule, Chain Rule, Bayes Rule Inference Independence 2 Inference in Ghostbusters A ghost
More information22c:145 Artificial Intelligence
22c:145 Artificial Intelligence Fall 2005 Uncertainty Cesare Tinelli The University of Iowa Copyright 200105 Cesare Tinelli and Hantao Zhang. a a These notes are copyrighted material and may not be used
More informationData Modeling & Analysis Techniques. Probability & Statistics. Manfred Huber 2011 1
Data Modeling & Analysis Techniques Probability & Statistics Manfred Huber 2011 1 Probability and Statistics Probability and statistics are often used interchangeably but are different, related fields
More informationBayesian Networks. Read R&N Ch. 14.114.2. Next lecture: Read R&N 18.118.4
Bayesian Networks Read R&N Ch. 14.114.2 Next lecture: Read R&N 18.118.4 You will be expected to know Basic concepts and vocabulary of Bayesian networks. Nodes represent random variables. Directed arcs
More informationLife of A Knowledge Base (KB)
Life of A Knowledge Base (KB) A knowledge base system is a special kind of database management system to for knowledge base management. KB extraction: knowledge extraction using statistical models in NLP/ML
More informationArtificial Intelligence Mar 27, Bayesian Networks 1 P (T D)P (D) + P (T D)P ( D) =
Artificial Intelligence 15381 Mar 27, 2007 Bayesian Networks 1 Recap of last lecture Probability: precise representation of uncertainty Probability theory: optimal updating of knowledge based on new information
More informationThe Basics of Graphical Models
The Basics of Graphical Models David M. Blei Columbia University October 3, 2015 Introduction These notes follow Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan. Many figures
More informationRelational Dynamic Bayesian Networks: a report. Cristina Manfredotti
Relational Dynamic Bayesian Networks: a report Cristina Manfredotti Dipartimento di Informatica, Sistemistica e Comunicazione (D.I.S.Co.) Università degli Studi MilanoBicocca manfredotti@disco.unimib.it
More informationVariable Elimination 1
Variable Elimination 1 Inference Exact inference Enumeration Variable elimination Junction trees and belief propagation Approximate inference Loopy belief propagation Sampling methods: likelihood weighting,
More informationBayesian Tutorial (Sheet Updated 20 March)
Bayesian Tutorial (Sheet Updated 20 March) Practice Questions (for discussing in Class) Week starting 21 March 2016 1. What is the probability that the total of two dice will be greater than 8, given that
More informationDiscrete Mathematics and Probability Theory Fall 2009 Satish Rao,David Tse Note 11
CS 70 Discrete Mathematics and Probability Theory Fall 2009 Satish Rao,David Tse Note Conditional Probability A pharmaceutical company is marketing a new test for a certain medical condition. According
More informationA crash course in probability and Naïve Bayes classification
Probability theory A crash course in probability and Naïve Bayes classification Chapter 9 Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s
More informationIntelligent Systems: Reasoning and Recognition. Uncertainty and Plausible Reasoning
Intelligent Systems: Reasoning and Recognition James L. Crowley ENSIMAG 2 / MoSIG M1 Second Semester 2015/2016 Lesson 12 25 march 2015 Uncertainty and Plausible Reasoning MYCIN (continued)...2 Backward
More informationThe CertaintyFactor Model
The CertaintyFactor Model David Heckerman Departments of Computer Science and Pathology University of Southern California HMR 204, 2025 Zonal Ave Los Angeles, CA 90033 dheck@sumexaim.stanford.edu 1 Introduction
More informationCHAPTER 2 Estimating Probabilities
CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a
More informationConcept Learning. Machine Learning 1
Concept Learning Inducing general functions from specific training examples is a main issue of machine learning. Concept Learning: Acquiring the definition of a general category from given sample positive
More informationProbability Theory. Elementary rules of probability Sum rule. Product rule. p. 23
Probability Theory Uncertainty is key concept in machine learning. Probability provides consistent framework for the quantification and manipulation of uncertainty. Probability of an event is the fraction
More information2. Methods of Proof Types of Proofs. Suppose we wish to prove an implication p q. Here are some strategies we have available to try.
2. METHODS OF PROOF 69 2. Methods of Proof 2.1. Types of Proofs. Suppose we wish to prove an implication p q. Here are some strategies we have available to try. Trivial Proof: If we know q is true then
More informationQuerying Joint Probability Distributions
Querying Joint Probability Distributions Sargur Srihari srihari@cedar.buffalo.edu 1 Queries of Interest Probabilistic Graphical Models (BNs and MNs) represent joint probability distributions over multiple
More informationIntroduction to Probability (1)
Introduction to Probability (1) John Kelleher & Brian Mac Namee 1 Introduction Summarizing Uncertainty 2 Where Do Probabilities Come From? 3 Basic Probability Notation Basics 4 The Axioms of Probabilities
More informationDiscrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 1
CS 70 Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 1 Course Outline CS70 is a course on Discrete Mathematics and Probability for EECS Students. The purpose of the course
More informationInformation Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay
Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture  17 ShannonFanoElias Coding and Introduction to Arithmetic Coding
More informationApplications of Methods of Proof
CHAPTER 4 Applications of Methods of Proof 1. Set Operations 1.1. Set Operations. The settheoretic operations, intersection, union, and complementation, defined in Chapter 1.1 Introduction to Sets are
More informationChapter 1 LOGIC AND PROOF
Chapter 1 LOGIC AND PROOF To be able to understand mathematics and mathematical arguments, it is necessary to have a solid understanding of logic and the way in which known facts can be combined to prove
More informationLecture Note 1 Set and Probability Theory. MIT 14.30 Spring 2006 Herman Bennett
Lecture Note 1 Set and Probability Theory MIT 14.30 Spring 2006 Herman Bennett 1 Set Theory 1.1 Definitions and Theorems 1. Experiment: any action or process whose outcome is subject to uncertainty. 2.
More informationQ1. [19 pts] Cheating at cs188blackjack
CS 188 Spring 2012 Introduction to Artificial Intelligence Practice Midterm II Solutions Q1. [19 pts] Cheating at cs188blackjack Cheating dealers have become a serious problem at the cs188blackjack tables.
More informationA set is a Many that allows itself to be thought of as a One. (Georg Cantor)
Chapter 4 Set Theory A set is a Many that allows itself to be thought of as a One. (Georg Cantor) In the previous chapters, we have often encountered sets, for example, prime numbers form a set, domains
More informationProbability & Statistics Primer Gregory J. Hakim University of Washington 2 January 2009 v2.0
Probability & Statistics Primer Gregory J. Hakim University of Washington 2 January 2009 v2.0 This primer provides an overview of basic concepts and definitions in probability and statistics. We shall
More informationFuzzy Implication Rules. Adnan Yazıcı Dept. of Computer Engineering, Middle East Technical University Ankara/Turkey
Fuzzy Implication Rules Adnan Yazıcı Dept. of Computer Engineering, Middle East Technical University Ankara/Turkey Fuzzy IfThen Rules Remember: The difference between the semantics of fuzzy mapping rules
More informationCORRELATED TO THE SOUTH CAROLINA COLLEGE AND CAREERREADY FOUNDATIONS IN ALGEBRA
We Can Early Learning Curriculum PreK Grades 8 12 INSIDE ALGEBRA, GRADES 8 12 CORRELATED TO THE SOUTH CAROLINA COLLEGE AND CAREERREADY FOUNDATIONS IN ALGEBRA April 2016 www.voyagersopris.com Mathematical
More informationProbability Review. ICPSR Applied Bayesian Modeling
Probability Review ICPSR Applied Bayesian Modeling Random Variables Flip a coin. Will it be heads or tails? The outcome of a single event is random, or unpredictable What if we flip a coin 10 times? How
More informationMath/Stats 425 Introduction to Probability. 1. Uncertainty and the axioms of probability
Math/Stats 425 Introduction to Probability 1. Uncertainty and the axioms of probability Processes in the real world are random if outcomes cannot be predicted with certainty. Example: coin tossing, stock
More information6.3 Conditional Probability and Independence
222 CHAPTER 6. PROBABILITY 6.3 Conditional Probability and Independence Conditional Probability Two cubical dice each have a triangle painted on one side, a circle painted on two sides and a square painted
More informationProbabilistic Networks An Introduction to Bayesian Networks and Influence Diagrams
Probabilistic Networks An Introduction to Bayesian Networks and Influence Diagrams Uffe B. Kjærulff Department of Computer Science Aalborg University Anders L. Madsen HUGIN Expert A/S 10 May 2005 2 Contents
More informationA Few Basics of Probability
A Few Basics of Probability Philosophy 57 Spring, 2004 1 Introduction This handout distinguishes between inductive and deductive logic, and then introduces probability, a concept essential to the study
More informationLearning from Data: Naive Bayes
Semester 1 http://www.anc.ed.ac.uk/ amos/lfd/ Naive Bayes Typical example: Bayesian Spam Filter. Naive means naive. Bayesian methods can be much more sophisticated. Basic assumption: conditional independence.
More informationSolutions to Exercises 8
Discrete Mathematics Lent 2009 MA210 Solutions to Exercises 8 (1) Suppose that G is a graph in which every vertex has degree at least k, where k 1, and in which every cycle contains at least 4 vertices.
More informationChingHan Hsu, BMES, National Tsing Hua University c 2014 by ChingHan Hsu, Ph.D., BMIR Lab
Lecture 2 Probability BMIR Lecture Series in Probability and Statistics ChingHan Hsu, BMES, National Tsing Hua University c 2014 by ChingHan Hsu, Ph.D., BMIR Lab 2.1 1 Sample Spaces and Events Random
More informationInduction. Margaret M. Fleck. 10 October These notes cover mathematical induction and recursive definition
Induction Margaret M. Fleck 10 October 011 These notes cover mathematical induction and recursive definition 1 Introduction to induction At the start of the term, we saw the following formula for computing
More informationThe rule for computing conditional property can be interpreted different. In Question 2, P B
Question 4: What is the product rule for probability? The rule for computing conditional property can be interpreted different. In Question 2, P A and B we defined the conditional probability PA B. If
More informationStatistics in Geophysics: Introduction and Probability Theory
Statistics in Geophysics: Introduction and Steffen Unkel Department of Statistics LudwigMaximiliansUniversity Munich, Germany Winter Term 2013/14 1/32 What is Statistics? Introduction Statistics is the
More informationmax cx s.t. Ax c where the matrix A, cost vector c and right hand side b are given and x is a vector of variables. For this example we have x
Linear Programming Linear programming refers to problems stated as maximization or minimization of a linear function subject to constraints that are linear equalities and inequalities. Although the study
More information10601. Machine Learning. http://www.cs.cmu.edu/afs/cs/academic/class/10601f10/index.html
10601 Machine Learning http://www.cs.cmu.edu/afs/cs/academic/class/10601f10/index.html Course data All uptodate info is on the course web page: http://www.cs.cmu.edu/afs/cs/academic/class/10601f10/index.html
More informationSets and functions. {x R : x > 0}.
Sets and functions 1 Sets The language of sets and functions pervades mathematics, and most of the important operations in mathematics turn out to be functions or to be expressible in terms of functions.
More information3. Mathematical Induction
3. MATHEMATICAL INDUCTION 83 3. Mathematical Induction 3.1. First Principle of Mathematical Induction. Let P (n) be a predicate with domain of discourse (over) the natural numbers N = {0, 1,,...}. If (1)
More informationProbability Models.S1 Introduction to Probability
Probability Models.S1 Introduction to Probability Operations Research Models and Methods Paul A. Jensen and Jonathan F. Bard The stochastic chapters of this book involve random variability. Decisions are
More informationE3: PROBABILITY AND STATISTICS lecture notes
E3: PROBABILITY AND STATISTICS lecture notes 2 Contents 1 PROBABILITY THEORY 7 1.1 Experiments and random events............................ 7 1.2 Certain event. Impossible event............................
More informationIncorporating Evidence in Bayesian networks with the Select Operator
Incorporating Evidence in Bayesian networks with the Select Operator C.J. Butz and F. Fang Department of Computer Science, University of Regina Regina, Saskatchewan, Canada SAS 0A2 {butz, fang11fa}@cs.uregina.ca
More informationAttribution. Modified from Stuart Russell s slides (Berkeley) Parts of the slides are inspired by Dan Klein s lecture material for CS 188 (Berkeley)
Machine Learning 1 Attribution Modified from Stuart Russell s slides (Berkeley) Parts of the slides are inspired by Dan Klein s lecture material for CS 188 (Berkeley) 2 Outline Inductive learning Decision
More informationBasic Probability Concepts
page 1 Chapter 1 Basic Probability Concepts 1.1 Sample and Event Spaces 1.1.1 Sample Space A probabilistic (or statistical) experiment has the following characteristics: (a) the set of all possible outcomes
More informationPart III: Machine Learning. CS 188: Artificial Intelligence. Machine Learning This Set of Slides. Parameter Estimation. Estimation: Smoothing
CS 188: Artificial Intelligence Lecture 20: Dynamic Bayes Nets, Naïve Bayes Pieter Abbeel UC Berkeley Slides adapted from Dan Klein. Part III: Machine Learning Up until now: how to reason in a model and
More informationSolution to Homework 2
Solution to Homework 2 Olena Bormashenko September 23, 2011 Section 1.4: 1(a)(b)(i)(k), 4, 5, 14; Section 1.5: 1(a)(b)(c)(d)(e)(n), 2(a)(c), 13, 16, 17, 18, 27 Section 1.4 1. Compute the following, if
More informationSenior Secondary Australian Curriculum
Senior Secondary Australian Curriculum Mathematical Methods Glossary Unit 1 Functions and graphs Asymptote A line is an asymptote to a curve if the distance between the line and the curve approaches zero
More informationLinear Threshold Units
Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear
More informationDecision Trees and Networks
Lecture 21: Uncertainty 6 Today s Lecture Victor R. Lesser CMPSCI 683 Fall 2010 Decision Trees and Networks Decision Trees A decision tree is an explicit representation of all the possible scenarios from
More informationProof: A logical argument establishing the truth of the theorem given the truth of the axioms and any previously proven theorems.
Math 232  Discrete Math 2.1 Direct Proofs and Counterexamples Notes Axiom: Proposition that is assumed to be true. Proof: A logical argument establishing the truth of the theorem given the truth of the
More informationSection 6.1 Joint Distribution Functions
Section 6.1 Joint Distribution Functions We often care about more than one random variable at a time. DEFINITION: For any two random variables X and Y the joint cumulative probability distribution function
More informationQuestion: What is the probability that a fivecard poker hand contains a flush, that is, five cards of the same suit?
ECS20 Discrete Mathematics Quarter: Spring 2007 Instructor: John Steinberger Assistant: Sophie Engle (prepared by Sophie Engle) Homework 8 Hints Due Wednesday June 6 th 2007 Section 6.1 #16 What is the
More informationBusiness Statistics 41000: Probability 1
Business Statistics 41000: Probability 1 Drew D. Creal University of Chicago, Booth School of Business Week 3: January 24 and 25, 2014 1 Class information Drew D. Creal Email: dcreal@chicagobooth.edu Office:
More informationThe Foundations: Logic and Proofs. Chapter 1, Part III: Proofs
The Foundations: Logic and Proofs Chapter 1, Part III: Proofs Rules of Inference Section 1.6 Section Summary Valid Arguments Inference Rules for Propositional Logic Using Rules of Inference to Build Arguments
More information+ Section 6.2 and 6.3
Section 6.2 and 6.3 Learning Objectives After this section, you should be able to DEFINE and APPLY basic rules of probability CONSTRUCT Venn diagrams and DETERMINE probabilities DETERMINE probabilities
More informationCompression algorithm for Bayesian network modeling of binary systems
Compression algorithm for Bayesian network modeling of binary systems I. Tien & A. Der Kiureghian University of California, Berkeley ABSTRACT: A Bayesian network (BN) is a useful tool for analyzing the
More informationBayesian Updating with Discrete Priors Class 11, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom
1 Learning Goals Bayesian Updating with Discrete Priors Class 11, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom 1. Be able to apply Bayes theorem to compute probabilities. 2. Be able to identify
More information8.3. GEOMETRIC SEQUENCES AND SERIES
8.3. GEOMETRIC SEQUENCES AND SERIES What You Should Learn Recognize, write, and find the nth terms of geometric sequences. Find the sum of a finite geometric sequence. Find the sum of an infinite geometric
More informationProbability OPRE 6301
Probability OPRE 6301 Random Experiment... Recall that our eventual goal in this course is to go from the random sample to the population. The theory that allows for this transition is the theory of probability.
More informationMATH 55: HOMEWORK #2 SOLUTIONS
MATH 55: HOMEWORK # SOLUTIONS ERIC PETERSON * 1. SECTION 1.5: NESTED QUANTIFIERS 1.1. Problem 1.5.8. Determine the truth value of each of these statements if the domain of each variable consists of all
More informationCS 441 Discrete Mathematics for CS Lecture 2. Propositional logic. CS 441 Discrete mathematics for CS. Course administration
CS 441 Discrete Mathematics for CS Lecture 2 Propositional logic Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square Course administration Homework 1 First homework assignment is out today will be posted
More informationJoint Probability Distributions and Random Samples. Week 5, 2011 Stat 4570/5570 Material from Devore s book (Ed 8), and Cengage
5 Joint Probability Distributions and Random Samples Week 5, 2011 Stat 4570/5570 Material from Devore s book (Ed 8), and Cengage Two Discrete Random Variables The probability mass function (pmf) of a single
More informationReview. Bayesianism and Reliability. Today s Class
Review Bayesianism and Reliability Models and Simulations in Philosophy April 14th, 2014 Last Class: Difference between individual and social epistemology Why simulations are particularly useful for social
More informationModelbased Synthesis. Tony O Hagan
Modelbased Synthesis Tony O Hagan Stochastic models Synthesising evidence through a statistical model 2 Evidence Synthesis (Session 3), Helsinki, 28/10/11 Graphical modelling The kinds of models that
More informationConditional Probability, Independence and Bayes Theorem Class 3, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom
Conditional Probability, Independence and Bayes Theorem Class 3, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom 1 Learning Goals 1. Know the definitions of conditional probability and independence
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Raquel Urtasun and Tamir Hazan TTI Chicago April 4, 2011 Raquel Urtasun and Tamir Hazan (TTIC) Graphical Models April 4, 2011 1 / 22 Bayesian Networks and independences
More informationStudy Manual. Probabilistic Reasoning. Silja Renooij. August 2015
Study Manual Probabilistic Reasoning 2015 2016 Silja Renooij August 2015 General information This study manual was designed to help guide your self studies. As such, it does not include material that is
More informationPrediction of Heart Disease Using Naïve Bayes Algorithm
Prediction of Heart Disease Using Naïve Bayes Algorithm R.Karthiyayini 1, S.Chithaara 2 Assistant Professor, Department of computer Applications, Anna University, BIT campus, Tiruchirapalli, Tamilnadu,
More informationWUCT121. Discrete Mathematics. Logic
WUCT121 Discrete Mathematics Logic 1. Logic 2. Predicate Logic 3. Proofs 4. Set Theory 5. Relations and Functions WUCT121 Logic 1 Section 1. Logic 1.1. Introduction. In developing a mathematical theory,
More informationQuadratic Equations and Inequalities
MA 134 Lecture Notes August 20, 2012 Introduction The purpose of this lecture is to... Introduction The purpose of this lecture is to... Learn about different types of equations Introduction The purpose
More informationEE 302 Division 1. Homework 5 Solutions.
EE 32 Division. Homework 5 Solutions. Problem. A fair foursided die (with faces labeled,, 2, 3) is thrown once to determine how many times a fair coin is to be flipped: if N is the number that results
More informationFinite and discrete probability distributions
8 Finite and discrete probability distributions To understand the algorithmic aspects of number theory and algebra, and applications such as cryptography, a firm grasp of the basics of probability theory
More informationAn Empirical Evaluation of Bayesian Networks Derived from Fault Trees
An Empirical Evaluation of Bayesian Networks Derived from Fault Trees Shane Strasser and John Sheppard Department of Computer Science Montana State University Bozeman, MT 59717 {shane.strasser, john.sheppard}@cs.montana.edu
More informationCell Phone based Activity Detection using Markov Logic Network
Cell Phone based Activity Detection using Markov Logic Network Somdeb Sarkhel sxs104721@utdallas.edu 1 Introduction Mobile devices are becoming increasingly sophisticated and the latest generation of smart
More informationECEN 5682 Theory and Practice of Error Control Codes
ECEN 5682 Theory and Practice of Error Control Codes Convolutional Codes University of Colorado Spring 2007 Linear (n, k) block codes take k data symbols at a time and encode them into n code symbols.
More informationProbability II (MATH 2647)
Probability II (MATH 2647) Lecturer Dr. O. Hryniv email Ostap.Hryniv@durham.ac.uk office CM309 http://maths.dur.ac.uk/stats/courses/probmc2h/probability2h.html or via DUO This term we shall consider: Review
More informationStudents in their first advanced mathematics classes are often surprised
CHAPTER 8 Proofs Involving Sets Students in their first advanced mathematics classes are often surprised by the extensive role that sets play and by the fact that most of the proofs they encounter are
More informationSection 1. Statements and Truth Tables. Definition 1.1: A mathematical statement is a declarative sentence that is true or false, but not both.
M3210 Supplemental Notes: Basic Logic Concepts In this course we will examine statements about mathematical concepts and relationships between these concepts (definitions, theorems). We will also consider
More informationThe Binomial Tree and Lognormality
The Binomial Tree and Lognormality The Binomial Tree and Lognormality The usefulness of the binomial pricing model hinges on the binomial tree providing a reasonable representation of the stock price distribution
More information