13.3 Inference Using Full Joint Distribution

The probability distribution on a single variable must sum to 1. It is also true that any joint probability distribution on any set of variables must sum to 1.

Recall that any proposition a is equivalent to the disjunction of all the atomic events in which a holds. Call this set of events e(a). Atomic events are mutually exclusive, so the probability of a conjunction of distinct atomic events is zero, by axiom 2. Hence, from axiom 3,

    P(a) = Σ_{ei ∈ e(a)} P(ei)

Given a full joint distribution that specifies the probabilities of all atomic events, this equation provides a simple method for computing the probability of any proposition.

Inference Using Full Joint Distribution

The full joint distribution for the dentistry domain:

                 toothache           ¬toothache
                 catch    ¬catch     catch    ¬catch
    cavity       0.108    0.012      0.072    0.008
    ¬cavity      0.016    0.064      0.144    0.576

E.g., there are six atomic events for cavity ∨ toothache:

    P(cavity ∨ toothache) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28

Extracting the distribution over a single variable (or some subset of variables), the marginal probability, is attained by adding the entries in the corresponding rows or columns, e.g.

    P(cavity) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2

We can write the following general marginalization (summing-out) rule for any sets of variables Y and Z:

    P(Y) = Σ_{z ∈ Z} P(Y, z)
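
To make the summation over atomic events concrete, here is a minimal Python sketch (not part of the original slides; the dictionary encoding and helper name are illustrative assumptions):

```python
# Full joint distribution for the dentistry domain, keyed by atomic events
# (cavity, toothache, catch). Values are taken from the table above.
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def prob(holds):
    """P(a): sum over the atomic events e_i in e(a), i.e. those where a holds."""
    return sum(p for event, p in joint.items() if holds(event))

print(prob(lambda e: e[0] or e[1]))  # P(cavity or toothache) = 0.28
print(prob(lambda e: e[0]))          # marginal P(cavity) = 0.2
```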

For example:

    P(Cavity) = Σ_{z ∈ {Catch, Toothache}} P(Cavity, z)

A variant of this rule involves conditional probabilities instead of joint probabilities, using the product rule:

    P(Y) = Σ_z P(Y | z) P(z)

This rule is called conditioning. Marginalization and conditioning turn out to be useful rules for all kinds of derivations involving probability expressions.

Computing a conditional probability from the table above:

    P(cavity | toothache) = P(cavity ∧ toothache) / P(toothache)
                          = (0.108 + 0.012) / (0.108 + 0.012 + 0.016 + 0.064)
                          = 0.12 / 0.2 = 0.6

Respectively,

    P(¬cavity | toothache) = (0.016 + 0.064) / 0.2 = 0.4

The two probabilities sum up to one, as they should.
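
The same table supports the ratio definition of conditional probability; a short sketch continuing the code above (the helper name is an assumption):

```python
def cond_prob(holds_a, holds_b):
    """P(a | b) = P(a and b) / P(b), computed from the full joint table."""
    return prob(lambda e: holds_a(e) and holds_b(e)) / prob(holds_b)

print(cond_prob(lambda e: e[0], lambda e: e[1]))      # P(cavity | toothache) = 0.6
print(cond_prob(lambda e: not e[0], lambda e: e[1]))  # P(¬cavity | toothache) = 0.4
```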

1/P(toothache) = 1/0.2 = 5 is a normalization constant ensuring that the distribution P(Cavity | toothache) adds up to 1. Let α denote the normalization constant:

    P(Cavity | toothache) = α P(Cavity, toothache)
                          = α (P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch))
                          = α ([0.108, 0.016] + [0.012, 0.064])
                          = α [0.12, 0.08] = [0.6, 0.4]

In other words, we can calculate the conditional probability distribution without knowing P(toothache), using normalization.

More generally: we need to find out the distribution of the query variable X (Cavity), the evidence variables E (Toothache) have observed values e, and the remaining unobserved variables are Y (Catch). Evaluation of a query:

    P(X | e) = α P(X, e) = α Σ_y P(X, e, y),

where the summation is over all possible y; i.e., all possible combinations of values of the unobserved variables Y.
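
A sketch of the general query evaluation by enumeration, again reusing the `joint` table (the variable indices 0 = Cavity, 1 = Toothache, 2 = Catch are an assumed convention):

```python
def query(x_index, evidence):
    """P(X | e) = alpha * sum_y P(X, e, y), returned as {value: probability}."""
    dist = {}
    for x_value in (True, False):
        dist[x_value] = sum(
            p for event, p in joint.items()            # summing out Y
            if event[x_index] == x_value
            and all(event[i] == v for i, v in evidence.items()))
    alpha = 1.0 / sum(dist.values())                   # normalization constant
    return {v: alpha * p for v, p in dist.items()}

print(query(0, {1: True}))  # P(Cavity | toothache) = {True: 0.6, False: 0.4}
```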

P(X, e, y) is simply a subset of the entries of the joint probability distribution of the variables X, E, and Y. X, E, and Y together constitute the complete set of variables for the domain.

Given the full joint distribution to work with, the equation on the previous slide can answer probabilistic queries for discrete variables. It does not scale well, however: for a domain described by n Boolean variables, it requires an input table of size O(2^n) and takes O(2^n) time to process the table. In realistic problems the approach is completely impractical.

Independence

If we expand the previous example with a fourth random variable Weather, which has four possible values, we have to copy the table of joint probabilities four times, giving 32 entries altogether. Dental problems have no influence on the weather, hence:

    P(Weather = cloudy | toothache, catch, cavity) = P(Weather = cloudy)

By this observation and the product rule,

    P(toothache, catch, cavity, Weather = cloudy) = P(Weather = cloudy) P(toothache, catch, cavity)

A similar equation holds for the other values of the variable Weather, and hence

    P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity) P(Weather)

The required joint distribution tables have 8 and 4 elements.

Propositions a and b are independent if

    P(a | b) = P(a)   or   P(b | a) = P(b)   or   P(a ∧ b) = P(a) P(b)

Respectively, variables X and Y are independent of each other if

    P(X | Y) = P(X)   or   P(Y | X) = P(Y)   or   P(X, Y) = P(X) P(Y)

Independent coin flips: P(C1, ..., Cn) can be represented as the product of n single-variable distributions P(Ci).

Bayes Rule and Its Use

By the product rule P(a ∧ b) = P(a | b) P(b), and by the commutativity of conjunction P(a ∧ b) = P(b | a) P(a). Equating the two right-hand sides and dividing by P(a), we get the Bayes rule:

    P(b | a) = P(a | b) P(b) / P(a)

The more general case of multivalued variables X and Y, conditionalized on some background evidence e:

    P(Y | X, e) = P(X | Y, e) P(Y | e) / P(X | e)

Using normalization, Bayes rule can be written as

    P(Y | X) = α P(X | Y) P(Y)
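
A brute-force independence check against the joint table illustrates the definition (a sketch continuing the earlier code; Toothache and Catch fail the test because both depend on Cavity):

```python
def independent(i, j, tol=1e-9):
    """True if P(X=x, Y=y) = P(X=x) P(Y=y) for all value combinations."""
    for x in (True, False):
        for y in (True, False):
            pxy = prob(lambda e: e[i] == x and e[j] == y)
            px = prob(lambda e: e[i] == x)
            py = prob(lambda e: e[j] == y)
            if abs(pxy - px * py) > tol:
                return False
    return True

print(independent(1, 2))  # False: Toothache and Catch are dependent
```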

Half of meningitis patients have a stiff neck: P(s | m) = 0.5. The prior probability of meningitis is 1/50,000: P(m) = 1/50,000. Every 20th patient complains about a stiff neck: P(s) = 1/20.

What is the probability that a patient complaining about a stiff neck has meningitis?

    P(m | s) = P(s | m) P(m) / P(s) = 0.5 × (1/50,000) × 20 = 1/5,000 = 0.0002

Perhaps the doctor knows directly that a stiff neck implies meningitis in 1 out of 5,000 cases. The doctor, hence, has quantitative information in the diagnostic direction, from symptoms to causes, and no need to use Bayes rule.

Unfortunately, diagnostic knowledge is often more fragile than causal knowledge. If there is a sudden epidemic of meningitis, the unconditional probability of meningitis P(m) will go up. The conditional probability P(s | m), however, stays the same. The doctor who derived the diagnostic probability P(m | s) directly from statistical observation of patients before the epidemic will have no idea how to update the value. The doctor who computes P(m | s) from the other three values will see P(m | s) go up proportionally with P(m).
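
The computation, and the epidemic update it enables, in a few lines (a sketch of the slide's arithmetic):

```python
p_s_given_m = 0.5        # causal knowledge: P(stiff neck | meningitis)
p_m = 1 / 50_000         # prior P(meningitis)
p_s = 1 / 20             # P(stiff neck)

print(p_s_given_m * p_m / p_s)          # P(m | s) = 0.0002, i.e. 1 in 5000

# If an epidemic multiplies the prior by 100, the derived diagnostic
# probability scales with it, while the causal P(s | m) stays fixed:
print(p_s_given_m * (100 * p_m) / p_s)  # 0.02
```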

All modern probabilistic inference systems are based on the use of Bayes rule. On the surface the relatively simple rule does not seem very useful. However, as the previous example illustrates, Bayes rule gives a chance to apply existing knowledge.

We can avoid assessing the probability of the evidence P(s) by instead computing a posterior probability for each value of the query variable, m and ¬m, and then normalizing the result:

    P(M | s) = α [ P(s | m) P(m), P(s | ¬m) P(¬m) ]

Thus, we need to estimate P(s | ¬m) instead of P(s). Sometimes easier, sometimes harder.

When a probabilistic query has more than one piece of evidence, the approach based on the full joint probability will not scale up:

    P(Cavity | toothache ∧ catch)

Neither will applying Bayes rule scale up in general:

    P(Cavity | toothache ∧ catch) = α P(toothache ∧ catch | Cavity) P(Cavity)

With many evidence variables, estimating the joint conditional P(toothache ∧ catch | Cavity) is as hard as the original problem. We would need the evidence variables to be independent, but Toothache and Catch obviously are not: if the probe catches in the tooth, the tooth probably has a cavity, and that probably causes a toothache. Each is directly caused by the cavity, but neither has a direct effect on the other: catch and toothache are conditionally independent given Cavity.

Conditional independence:

    P(toothache ∧ catch | Cavity) = P(toothache | Cavity) P(catch | Cavity)

Plugging this into Bayes rule yields

    P(Cavity | toothache ∧ catch) = α P(Cavity) P(toothache | Cavity) P(catch | Cavity)

Now we only need three separate distributions. The general definition of conditional independence of variables X and Y, given a third variable Z, is

    P(X, Y | Z) = P(X | Z) P(Y | Z)

Equivalently, P(X | Y, Z) = P(X | Z) and P(Y | X, Z) = P(Y | Z).

If all effects are conditionally independent given a single cause, the exponential size of the knowledge representation is cut to linear. A probability distribution is called a naïve Bayes model if all effects E1, ..., En are conditionally independent given a single cause C. The full joint probability distribution can then be written as

    P(C, E1, ..., En) = P(C) Π_i P(Ei | C)

It is often used as a simplifying assumption even in cases where the effect variables are not conditionally independent given the cause variable. In practice, naïve Bayes systems can work surprisingly well, even when the independence assumption is not true.
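
A naive Bayes query for the dentistry domain; the conditional parameters below are derived from the joint table above (e.g. P(toothache | cavity) = 0.12/0.2 = 0.6), so this is a consistent sketch rather than the slides' own code:

```python
prior = {True: 0.2, False: 0.8}        # P(Cavity)
p_toothache = {True: 0.6, False: 0.1}  # P(toothache | Cavity)
p_catch = {True: 0.9, False: 0.2}      # P(catch | Cavity)

# P(C | e1, e2) = alpha * P(C) * P(e1 | C) * P(e2 | C)
unnormalized = {c: prior[c] * p_toothache[c] * p_catch[c] for c in (True, False)}
alpha = 1.0 / sum(unnormalized.values())
print({c: alpha * p for c, p in unnormalized.items()})
# P(Cavity | toothache, catch) ≈ {True: 0.871, False: 0.129}
```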

PROBABILISTIC REASONING

A Bayesian network is a directed graph in which each node is annotated with quantitative probability information:

1. A set of random variables makes up the nodes of the network. Variables may be discrete or continuous.
2. A set of directed links (arrows) connects pairs of nodes. If there is an arrow from node X to node Y, then X is said to be a parent of Y.
3. Each node Xi has a conditional probability distribution P(Xi | Parents(Xi)).
4. The graph has no directed cycles and hence is a directed acyclic graph (DAG).

The intuitive meaning of an arrow X → Y is that X has a direct influence on Y. The topology of the network, the set of nodes and links, specifies the conditional independence relations that hold in the domain. Once the topology of the Bayesian network has been laid out, we need only specify a conditional probability distribution for each variable, given its parents. The combination of the topology and the conditional distributions specifies implicitly the full joint distribution for all the variables.

[Figure: a four-node example network. Weather is an isolated node; Cavity has arrows to Toothache and Catch.]

Example (from L.A.): A new burglar alarm (A) has been installed at home. It is fairly reliable at detecting a burglary (B), but also responds on occasion to minor earthquakes (E). Neighbors John (J) and Mary (M) have promised to call you at work when they hear the alarm. John always calls when he hears the alarm, but sometimes confuses the telephone ringing with the alarm and calls then, too. Mary, on the other hand, likes loud music and sometimes misses the alarm altogether. Given the evidence of who has or has not called, we would like to estimate the probability of a burglary.

The network: Burglary → Alarm ← Earthquake, with Alarm → JohnCalls and Alarm → MaryCalls. The conditional probability tables:

    P(B) = .001            P(E) = .002

    B  E | P(A)            A | P(J)          A | P(M)
    T  T | .95             T | .90           T | .70
    T  F | .94             F | .05           F | .01
    F  T | .29
    F  F | .001

The topology of the network indicates that burglaries and earthquakes affect the probability of the alarm's going off. Whether John and Mary call depends only on the alarm: they do not perceive any burglaries directly, they do not notice minor earthquakes, and they do not confer before calling. Mary listening to loud music and John confusing the phone ringing with the sound of the alarm can be read from the network only implicitly, as uncertainty associated with calling at work.

The probabilities actually summarize a potentially infinite set of circumstances. The alarm might fail to go off due to high humidity, power failure, dead battery, cut wires, a dead mouse stuck inside the bell, etc. John and Mary might fail to call and report an alarm because they are out to lunch, on vacation, temporarily deaf, a helicopter is passing, etc.

The conditional probability tables in the network give the probabilities for the values of the random variable depending on the combination of values for the parent nodes. Each row must sum to 1, because the entries represent an exhaustive set of cases for the variable. Above, all variables are Boolean, and therefore it is enough to know that if the probability of a true value is p, the probability of false must be 1 − p. In general, a table for a Boolean variable with k parents contains 2^k independently specifiable probabilities. A variable with no parents has only one row, representing the prior probabilities of each possible value of the variable.

The Semantics of Bayesian Networks

Every entry in the full joint probability distribution can be calculated from the information in a Bayesian network. A generic entry in the joint distribution is the probability of a conjunction of particular assignments to each variable, P(X1 = x1 ∧ ... ∧ Xn = xn), abbreviated as P(x1, ..., xn). The value of this entry is

    P(x1, ..., xn) = Π_{i=1..n} P(xi | parents(Xi)),

where parents(Xi) denotes the specific values of the variables Parents(Xi). For example:

    P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j | a) P(m | a) P(a | ¬b ∧ ¬e) P(¬b) P(¬e)
                            = 0.90 × 0.70 × 0.001 × 0.999 × 0.998 ≈ 0.000628

Constructing Bayesian networks

We can rewrite an entry in the joint distribution P(x1, ..., xn), using the product rule, as

    P(xn | xn-1, ..., x1) P(xn-1, ..., x1)

Then we repeat the process, reducing each conjunctive probability to a conditional probability and a smaller conjunction. We end up with one big product:

    P(xn | xn-1, ..., x1) P(xn-1 | xn-2, ..., x1) ... P(x2 | x1) P(x1) = Π_{i=1..n} P(xi | xi-1, ..., x1)

This identity holds true for any set of random variables and is called the chain rule. The specification of the joint distribution is thus equivalent to the general assertion that, for every variable Xi in the network,

    P(Xi | Xi-1, ..., X1) = P(Xi | Parents(Xi)),

provided that Parents(Xi) ⊆ { Xi-1, ..., X1 }.
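
The burglary network's local tables and the product formula in code (a sketch using the CPT numbers above):

```python
p_b, p_e = 0.001, 0.002                              # priors P(B), P(E)
p_a = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A=T | B, E)
p_j = {True: 0.90, False: 0.05}                      # P(J=T | A)
p_m = {True: 0.70, False: 0.01}                      # P(M=T | A)

def joint_entry(b, e, a, j, m):
    """P(b, e, a, j, m) = product over nodes of P(x_i | parents(X_i))."""
    def val(p_true, value):  # P(X=value) from P(X=True)
        return p_true if value else 1.0 - p_true
    return (val(p_b, b) * val(p_e, e) * val(p_a[(b, e)], a)
            * val(p_j[a], j) * val(p_m[a], m))

print(joint_entry(False, False, True, True, True))  # ≈ 0.000628
```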

The last condition is satisfied by labeling the nodes in any order that is consistent with the partial order implicit in the graph structure. Each node is required to be conditionally independent of its predecessors in the node ordering, given its parents. We need to choose as parents of a node all those nodes that directly influence the value of the variable. For example, MaryCalls is certainly influenced by whether there is a Burglary or an Earthquake, but not directly influenced. We believe the following conditional independence statement to hold:

    P(MaryCalls | JohnCalls, Alarm, Earthquake, Burglary) = P(MaryCalls | Alarm)

A Bayesian network is a complete and nonredundant representation of the domain. In addition, it is a more compact representation than the full joint distribution, due to its locally structured properties: each subcomponent interacts directly with only a bounded number of other components, regardless of the total number of components. Local structure is usually associated with linear rather than exponential growth in complexity. In the domain of a Bayesian network, if each of the n random variables is influenced by at most k others, then specifying each conditional probability table will require at most 2^k numbers, altogether n·2^k numbers; the full joint distribution has 2^n cells. E.g., with n = 30 and k = 5: n·2^k = 960, while 2^n > 10^9.
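
The compactness comparison as plain arithmetic (a trivial sketch):

```python
n, k = 30, 5
print(n * 2**k)  # 960 numbers suffice for all the network's CPTs
print(2**n)      # 1,073,741,824 cells in the full joint distribution
```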

In a Bayesian network we can exchange accuracy for complexity. It may be wiser to leave out very weak dependencies from the network in order to restrict the complexity, but this yields a lower accuracy. Choosing the right topology for the network is a hard problem. The correct order in which to add nodes is to add the root causes first, then the variables they influence, and so on, until we reach the leaves, which have no direct causal influence on other variables. Adding nodes in the wrong order makes the network unnecessarily complex and unintuitive.

[Figure: the burglary network built with a wrong node ordering (MaryCalls, JohnCalls, Earthquake, Burglary, Alarm), resulting in extra links.]
