# Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Graphical Models. Niels Landwehr


## Transcription

1 Universität Potsdam, Institut für Informatik, Lehrstuhl Maschinelles Lernen. Graphical Models. Niels Landwehr

2 Overview: Graphical Models
Graphical models: a tool for modelling domains with several random variables. For example, medical domains: a joint distribution over attributes of patients, symptoms, and diseases. It can be used to answer any probabilistic query. Example application: Lyme disease diagnostics.

3 Agenda
- Graphical models: syntax and semantics.
- Inference in graphical models (exact and approximate).
- Graphical models in machine learning.


5 Recap: Random Variables, Distributions
Random variables: X, Y, Z, ...
- Discrete random variables: distributions defined by probabilities for the possible values.
- Continuous random variables: distributions defined by densities.
Joint distribution p(X, Y). Conditional distribution p(X | Y) = p(X, Y) / p(Y).
Product rule (discrete or continuous): p(X, Y) = p(X | Y) p(Y)
Sum rule: p(x) = Σ_y p(x, y) for discrete random variables; p(x) = ∫ p(x, y) dy for continuous random variables.
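The sum and product rules above can be checked mechanically on a small discrete example. A minimal sketch in plain Python; the joint-table values are invented purely for illustration:

```python
# Sum and product rule on a hand-made joint distribution p(X, Y)
# over binary X, Y. The numbers are arbitrary illustrative values.

joint = {  # p(x, y)
    (0, 0): 0.3, (0, 1): 0.2,
    (1, 0): 0.1, (1, 1): 0.4,
}

# Sum rule: p(x) = sum_y p(x, y)
p_x = {x: sum(joint[(x, y)] for y in (0, 1)) for x in (0, 1)}

# Conditional distribution: p(y | x) = p(x, y) / p(x)
p_y_given_x = {(y, x): joint[(x, y)] / p_x[x] for x in (0, 1) for y in (0, 1)}

# Product rule: p(y | x) * p(x) recovers the joint entry p(x, y)
for (x, y), p in joint.items():
    assert abs(p_y_given_x[(y, x)] * p_x[x] - p) < 1e-12

print(p_x)
```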

6 Independence of Random Variables
Independence (discrete or continuous): X, Y are independent if and only if p(X, Y) = p(X) p(Y), equivalently if and only if p(X | Y) = p(X), equivalently if and only if p(Y | X) = p(Y).
Conditional independence (discrete or continuous): X, Y are independent given Z if and only if p(X, Y | Z) = p(X | Z) p(Y | Z), equivalently if and only if p(Y | X, Z) = p(Y | Z), equivalently if and only if p(X | Y, Z) = p(X | Z).
... simply the notion of independence applied to the conditional joint distribution p(X, Y | Z).
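Conditional independence does not imply marginal independence, and this is easy to verify numerically. A small sketch with invented numbers: the joint is built so that X ⊥ Y | Z holds by construction, and the check confirms it while also showing that X and Y are coupled once Z is summed out:

```python
# Verify p(X, Y | Z) = p(X | Z) p(Y | Z) on a joint built from the
# factorization p(z) p(x|z) p(y|z). All numbers are illustrative.

p_z = {0: 0.6, 1: 0.4}
p_x_given_z = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}  # [z][x]
p_y_given_z = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.9, 1: 0.1}}  # [z][y]

joint = {(x, y, z): p_z[z] * p_x_given_z[z][x] * p_y_given_z[z][y]
         for x in (0, 1) for y in (0, 1) for z in (0, 1)}

for z in (0, 1):
    pz = sum(joint[(x, y, z)] for x in (0, 1) for y in (0, 1))
    for x in (0, 1):
        for y in (0, 1):
            lhs = joint[(x, y, z)] / pz                      # p(x, y | z)
            rhs = (sum(joint[(x, yy, z)] for yy in (0, 1)) / pz
                   * sum(joint[(xx, y, z)] for xx in (0, 1)) / pz)
            assert abs(lhs - rhs) < 1e-12   # p(x,y|z) = p(x|z) p(y|z)

# X and Y are NOT marginally independent: mixing over Z couples them.
p_x = {x: sum(joint[(x, y, z)] for y in (0, 1) for z in (0, 1)) for x in (0, 1)}
p_y = {y: sum(joint[(x, y, z)] for x in (0, 1) for z in (0, 1)) for y in (0, 1)}
p_xy = {(x, y): sum(joint[(x, y, z)] for z in (0, 1))
        for x in (0, 1) for y in (0, 1)}
print(abs(p_xy[(1, 1)] - p_x[1] * p_y[1]) > 1e-3)  # True
```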

7 Graphical Models: Idea/Goal
Goal: a model for the joint distribution p(X_1, ..., X_N) of a set of random variables X_1, ..., X_N.
Given p(X_1, ..., X_N), we can compute:
- All marginal distributions (by the sum rule): p(X_{i_1}, ..., X_{i_m}) for any {i_1, ..., i_m} ⊆ {1, ..., N}.
- All conditional distributions (from the marginal distributions): p(X_{i_1}, ..., X_{i_m} | X_{i_{m+1}}, ..., X_{i_{m+k}}) for any {i_1, ..., i_{m+k}} ⊆ {1, ..., N}.
This is enough to answer all probabilistic queries ("inference problems") over the random variables X_1, ..., X_N.

8 Graphical Models: Idea/Goal
Graphical models: a combination of probability theory and graph theory.
- Compact, intuitive modeling of p(X_1, ..., X_N). The graph structure represents the structure of the distribution (dependencies between the variables X_1, ..., X_N).
- Insight into the structure of the model; easy to inject prior knowledge.
- Efficient algorithms for inference that exploit the graph structure.
- Many machine learning methods can be represented as graphical models. Tasks such as finding the MAP model or computing Bayes-optimal predictions can be formulated as inference problems in graphical models.

9 Graphical Models: Example
Example: alarm scenario. Our house in Los Angeles has an alarm system. We are on holiday. Our neighbor calls in case he hears the alarm going off. In case of a burglary we would like to return. Unfortunately, our neighbor is not always at home. Unfortunately, the alarm can also be triggered by earthquakes.
5 binary random variables:
- B (Burglary): a burglary has taken place
- E (Earthquake): an earthquake has taken place
- A (Alarm): the alarm is triggered
- N (NeighborCalls): the neighbor calls us
- R (RadioReport): there is a report about an earthquake on the radio

10 Graphical Models: Example
The random variables have a joint distribution p(B, E, A, N, R). How do we specify it? Which dependencies hold?
Example of an inference problem: the neighbor has called (N=1); how likely is it that there was a burglary (B=1)? This depends on several factors: How likely is a burglary a priori? How likely is an earthquake a priori? How likely is it that the alarm is triggered?
(Naive) inference: p(B=1 | N=1) = p(B=1, N=1) / p(N=1) = Σ_{E,A,R} p(B=1, E, A, N=1, R) / Σ_{B,E,A,R} p(B, E, A, N=1, R)
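This naive inference by enumeration can be sketched directly. A minimal Python sketch: the CPT numbers below are illustrative assumptions (the slides do not give numeric tables), and the factorization used is the one derived later in the lecture:

```python
# Naive inference by enumeration: p(B=1 | N=1) in the alarm scenario.
# All CPT numbers are invented for illustration.
from itertools import product

p_b1, p_e1 = 0.01, 0.02                                        # p(B=1), p(E=1)
p_a1 = {(0, 0): 0.01, (0, 1): 0.3, (1, 0): 0.9, (1, 1): 0.95}  # p(A=1 | B, E)
p_n1 = {0: 0.05, 1: 0.8}                                       # p(N=1 | A)
p_r1 = {0: 0.01, 1: 0.5}                                       # p(R=1 | E)

def bern(p1, v):
    """p(V = v) for a binary variable with p(V=1) = p1."""
    return p1 if v else 1 - p1

def joint(b, e, a, n, r):
    # p(B,E,A,N,R) = p(B) p(E) p(A|B,E) p(N|A) p(R|E)
    return (bern(p_b1, b) * bern(p_e1, e) * bern(p_a1[(b, e)], a)
            * bern(p_n1[a], n) * bern(p_r1[e], r))

# p(B=1 | N=1) = p(B=1, N=1) / p(N=1): sum out E, A, R (and B below)
num = sum(joint(1, e, a, 1, r) for e, a, r in product((0, 1), repeat=3))
den = sum(joint(b, e, a, 1, r) for b, e, a, r in product((0, 1), repeat=4))
print(round(num / den, 3))  # much larger than the prior p(B=1)
```

With these (assumed) numbers the posterior is roughly ten times the prior: hearing from the neighbor raises, but does not certify, the probability of a burglary.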

11 Graphical Models: Example
How do we model p(B, E, A, N, R)?
1st attempt: a complete table of probabilities, with one entry P(B, E, A, N, R) per assignment of (B, E, A, N, R).
+ Any distribution p(B, E, A, N, R) can be represented.
- Exponential number of parameters.
- Difficult for humans to specify.
2nd attempt: everything is independent, p(B, E, A, N, R) = p(B) p(E) p(A) p(N) p(R).
+ Linear number of parameters.
- Too restrictive: this independence assumption does not allow any meaningful inference.

12 Graphical Models: Example
Graphical model: selective independence assumptions, motivated by prior knowledge.
Choose a variable ordering, e.g. B < E < A < N < R. Applying the product rule repeatedly:
p(B, E, A, N, R) = p(B, E, A, N) p(R | B, E, A, N)
= p(B, E, A) p(N | B, E, A) p(R | B, E, A, N)
= p(B, E) p(A | B, E) p(N | B, E, A) p(R | B, E, A, N)
= p(B) p(E | B) p(A | B, E) p(N | B, E, A) p(R | B, E, A, N)
The factors describe the distribution of one random variable as a function of the other random variables. Can we simplify these factors? Which dependencies really hold in our domain?

13 Graphical Models: Example
Decomposition into factors according to the product rule:
p(B, E, A, N, R) = p(B) p(E | B) p(A | B, E) p(N | B, E, A) p(R | B, E, A, N)
Conditional independence assumptions (remove variables from the conditioning set):
- p(E | B) = p(E): the earthquake does not depend on the burglary.
- p(A | B, E) = p(A | B, E): the alarm does depend on burglary and earthquake (no simplification).
- p(N | B, E, A) = p(N | A): whether the neighbor calls depends only on the alarm.
- p(R | B, E, A, N) = p(R | E): a report on the radio depends only on the earthquake.
Arriving at the simplified form of the joint distribution, with simplified factors:
p(B, E, A, N, R) = p(B) p(E) p(A | E, B) p(N | A) p(R | E)
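A useful sanity check on the simplified form: because each factor is a properly normalized (conditional) distribution, the product always defines a proper joint distribution, i.e. the 2^5 entries sum to 1 whatever the CPT entries are. A small sketch with illustrative CPT numbers:

```python
# The factorization p(B)p(E)p(A|E,B)p(N|A)p(R|E) sums to 1 over all
# 2^5 assignments. CPT numbers are illustrative assumptions.
from itertools import product

p_b1, p_e1 = 0.01, 0.02
p_a1 = {(0, 0): 0.01, (0, 1): 0.3, (1, 0): 0.9, (1, 1): 0.95}  # p(A=1 | B, E)
p_n1 = {0: 0.05, 1: 0.8}                                       # p(N=1 | A)
p_r1 = {0: 0.01, 1: 0.5}                                       # p(R=1 | E)

def bern(p1, v):
    """p(V = v) for a binary variable with p(V=1) = p1."""
    return p1 if v else 1 - p1

total = sum(bern(p_b1, b) * bern(p_e1, e) * bern(p_a1[(b, e)], a)
            * bern(p_n1[a], n) * bern(p_r1[e], r)
            for b, e, a, n, r in product((0, 1), repeat=5))
print(total)  # 1.0 up to floating-point rounding
assert abs(total - 1.0) < 1e-12
```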

14 Graphical Models: Example
Graphical model for the alarm scenario. The distribution modeled is
p(B, E, A, N, R) = p(B) p(E) p(A | E, B) p(N | A) p(R | E)
[Graph shown on the slide: root nodes B and E with tables P(B=1) and P(E=1); edges B → A and E → A with table P(A=1 | B, E); edge A → N with table P(N=1 | A); edge E → R with table P(R=1 | E).]
A graphical model:
- There is one node for each random variable.
- For each factor of the form p(X | X_1, ..., X_k) there is a directed edge from each X_i to X in the graph.
- The model is parameterized with the conditional distributions p(X | X_1, ..., X_k).

15 Graphical Models: Example
Graphical model for the alarm scenario. Number of parameters: O(2^K) per node, where K is the maximum number of parents of a node. Here 10 parameters instead of the 2^5 - 1 = 31 for the full table. These directed graphical models are also called Bayesian networks.
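The parameter count for binary variables can be sketched in a few lines: each node contributes one free entry p(X=1 | pa(X)) per joint configuration of its parents, i.e. 2^|pa(X)| entries.

```python
# Free parameters of a Bayesian network over binary variables:
# sum over nodes of 2 ** (number of parents).

def bn_parameters(parents):
    """parents maps each node to the list of its parent nodes."""
    return sum(2 ** len(pa) for pa in parents.values())

alarm = {"B": [], "E": [], "A": ["B", "E"], "N": ["A"], "R": ["E"]}
print(bn_parameters(alarm))  # 1 + 1 + 4 + 2 + 2 = 10
print(2 ** 5 - 1)            # full joint table over 5 binary variables: 31
```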

16 Directed Graphical Models: Definition
Given a set of random variables {X_1, ..., X_N}, a directed graphical model over {X_1, ..., X_N} is a directed graph with
- node set X_1, ..., X_N,
- no directed cycles X_{i_1} → X_{i_2} → ... → X_{i_k} → X_{i_1}.
Nodes are associated with parameterized conditional distributions p(X_i | pa(X_i)), where pa(X_i) = {X_j : X_j → X_i} denotes the set of parent nodes of a node.
The graphical model represents a joint distribution over X_1, ..., X_N by
p(X_1, ..., X_N) = ∏_{i=1}^{N} p(X_i | pa(X_i))

17 Directed Graphical Models: Definition
Why does the graph have to be acyclic? Theorem from graph theory: G is acyclic ⇔ there is an ordering ≺ of the nodes such that all directed edges respect the ordering (X → X' implies X ≺ X').
For such an ordering, we can factorize
p(X_1, ..., X_N) = ∏_{i=1}^{N} p(X_i | pa_G(X_i))
by the product rule (variables sorted according to ≺) plus conditional independence assumptions, because all parents pa_G(X_i) come before X_i in the variable ordering.
Counterexample (not a graphical model): the two-node cyclic graph X → Y, Y → X would correspond to p(X, Y) = p(X | Y) p(Y | X), which is not a valid factorization.

18 Graphical Models: Independence
The graph structure of a graphical model implies (conditional) independencies between random variables.
Notation: for variables X, Y, Z we write X ⊥ Y | Z ⇔ p(X | Y, Z) = p(X | Z), read "X independent of Y given Z".
Extension to disjoint sets A, B, C of random variables: A ⊥ B | C ⇔ p(A | B, C) = p(A | C).

19 Graphical Models: Independence
Which independence assumptions of the form A ⊥ B | C are modeled by the graph structure?
- They can be checked by the sum/product rules starting from the modeled joint distribution (lots of work!).
- For graphical models, independence assumptions can be read off the graph structure much more easily.
"D-separation": a set of simple rules from which all independence assumptions encoded in the graph can be derived. The "D" in D-separation stands for "directed", because we are talking about directed graphical models (a similar mechanism exists for undirected models, which we do not cover).

20 Graphical Models: Independence
D-separation: which independence assumptions A ⊥ B | C are modeled by the graph structure? Idea: this can be checked by looking at the paths connecting the random variables.
Notation:
- X ← Y → Z: the path between X and Z has a diverging connection at Y ("tail-to-tail").
- X → Y ← Z: the path between X and Z has a converging connection at Y ("head-to-head").
- X → Y → Z: the path between X and Z has a serial connection at Y ("head-to-tail").

21 Diverging Connections
Joint distribution: p(B, E, A, N, R) = p(B) p(E) p(A | E, B) p(N | A) p(R | E)
(B = burglary, E = earthquake, A = alarm, N = neighbor calls, R = radio report)
Looking at the path A ← E → R. Does A ⊥ R hold?

22 Diverging Connections
Looking at the path A ← E → R. Does A ⊥ R hold?
No: p(A | R) ≠ p(A) [can be derived from the joint distribution].
Intuitively: radio report → probably an earthquake → probably an alarm, so p(A=1 | R=1) > p(A=1 | R=0). Variable R influences variable A through the diverging connection A ← E → R.

23 Diverging Connections
Joint distribution: p(B, E, A, N, R) = p(B) p(E) p(A | E, B) p(N | A) p(R | E); E is observed.
Looking at the path A ← E → R. Does A ⊥ R | E hold?

24 Diverging Connections
Looking at the path A ← E → R with E observed. Does A ⊥ R | E hold?
Yes: p(A | R, E) = p(A | E) [can be derived from the joint distribution].
Intuitively: if we already know that an earthquake has occurred, the probability of an alarm is neither increased nor decreased by a radio report. The diverging path A ← E → R is blocked by the observation of E.

25 Serial Connections
Joint distribution: p(B, E, A, N, R) = p(B) p(E) p(A | E, B) p(N | A) p(R | E)
Looking at the path B → A → N. Does B ⊥ N hold?
No: p(B | N) ≠ p(B) [can be derived from the joint distribution].
Intuitively: neighbor calls → probably an alarm → probably a burglary, so p(B=1 | N=1) > p(B=1 | N=0). Variable N influences variable B through the serial connection B → A → N.

26 Serial Connections
Joint distribution: p(B, E, A, N, R) = p(B) p(E) p(A | E, B) p(N | A) p(R | E); A is observed.
Looking at the path B → A → N. Does B ⊥ N | A hold?

27 Serial Connections
Looking at the path B → A → N with A observed. Does B ⊥ N | A hold?
Yes: p(B | N, A) = p(B | A) [can be derived from the joint distribution].
Intuitively: if we already know that the alarm was triggered, the probability of a burglary neither increases nor decreases because the neighbor calls. The serial connection B → A → N is blocked by the observation of A.

28 Converging Connections
Joint distribution: p(B, E, A, N, R) = p(B) p(E) p(A | E, B) p(N | A) p(R | E)
Looking at the path B → A ← E. Does B ⊥ E hold?

29 Converging Connections
Looking at the path B → A ← E. Does B ⊥ E hold?
Yes: p(B | E) = p(B) [can be derived from the joint distribution].
Intuitively: burglaries are not more or less frequent on days with earthquakes. The converging path B → A ← E is blocked as long as A is not observed.

30 Converging Connections
Joint distribution: p(B, E, A, N, R) = p(B) p(E) p(A | E, B) p(N | A) p(R | E); A is observed.
Looking at the path B → A ← E. Does B ⊥ E | A hold?
No: p(B | E, A) ≠ p(B | A) [can be derived from the joint distribution].
Intuitively: the alarm was triggered. If we additionally observe an earthquake, this explains the alarm, and thus the probability of a burglary is reduced (the "explaining away" phenomenon). The converging path B → A ← E is unblocked by the observation of A.

31 Converging Connections
Joint distribution: p(B, E, A, N, R) = p(B) p(E) p(A | E, B) p(N | A) p(R | E); N is observed.
Looking at the path B → A ← E. Does B ⊥ E | N hold?
No: p(B | E, N) ≠ p(B | N) [can be derived from the joint distribution].
Intuitively: the neighbor calling is an indirect observation of the alarm. Observing an earthquake explains the alarm, and the probability of a burglary is reduced ("explaining away"). The converging path B → A ← E is unblocked by observing the descendant N.
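The explaining-away effect can be demonstrated numerically. A small sketch with illustrative CPT numbers (not from the slides): the posterior of a burglary given the alarm drops sharply once the earthquake is also observed:

```python
# "Explaining away" in the alarm scenario: p(B=1 | A=1) versus
# p(B=1 | A=1, E=1). CPT numbers are illustrative assumptions.
from itertools import product

p_b = {1: 0.01, 0: 0.99}
p_e = {1: 0.02, 0: 0.98}
p_a1 = {(0, 0): 0.01, (0, 1): 0.3, (1, 0): 0.9, (1, 1): 0.95}  # p(A=1 | B, E)

# p(B=1 | A=1): sum out E
num = sum(p_b[1] * p_e[e] * p_a1[(1, e)] for e in (0, 1))
den = sum(p_b[b] * p_e[e] * p_a1[(b, e)] for b, e in product((0, 1), repeat=2))
p_b_given_a = num / den

# p(B=1 | A=1, E=1): the earthquake already explains the alarm
num2 = p_b[1] * p_a1[(1, 1)]
den2 = sum(p_b[b] * p_a1[(b, 1)] for b in (0, 1))
p_b_given_ae = num2 / den2

print(round(p_b_given_a, 3), round(p_b_given_ae, 3))
assert p_b_given_ae < p_b_given_a  # observing E "explains away" B
```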

32 Summary: Paths
Summary: a path ...–X–Y–Z–... is blocked at Y if it has
- a diverging connection at Y, and Y is observed, or
- a serial connection at Y, and Y is observed, or
- a converging connection at Y, and neither Y nor one of its descendants Y' ∈ Descendants(Y) is observed, where Descendants(Y) = {Y' : there is a directed path from Y to Y'}.
If the path is not blocked at Y, it is free at Y.

33 D-Separation: Are All Paths Blocked?
So far we have defined when a path is blocked at a particular node. A path is blocked overall if it is blocked at one of its nodes:
Let X, X' be random variables and C a set of observed random variables with X, X' ∉ C. A path X – X_1 – ... – X_n – X' between X and X' is blocked given C if and only if there is a node X_i such that the path is blocked at X_i given C.
D-separation: are all paths blocked? Let X, Y be random variables and C a set of random variables with X, Y ∉ C. Definition: X and Y are d-separated given C if and only if every path from X to Y is blocked given C.
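The definition translates directly into code: enumerate all (cycle-free) undirected paths between two nodes and apply the blocking rules at every interior node. A minimal sketch for small graphs (path enumeration is exponential in general; real libraries use a reachability-style algorithm instead):

```python
# D-separation by explicit path enumeration, following the blocking
# rules from the slides. Suitable only for small graphs.

def descendants(node, children):
    out, stack = set(), [node]
    while stack:
        for c in children.get(stack.pop(), ()):
            if c not in out:
                out.add(c)
                stack.append(c)
    return out

def d_separated(x, y, observed, edges):
    children, parents = {}, {}
    for u, v in edges:  # directed edge u -> v
        children.setdefault(u, set()).add(v)
        parents.setdefault(v, set()).add(u)

    def neighbors(n):
        return children.get(n, set()) | parents.get(n, set())

    def paths(cur, target, visited):
        if cur == target:
            yield visited
            return
        for n in neighbors(cur):
            if n not in visited:
                yield from paths(n, target, visited + [n])

    def blocked(path):
        for i in range(1, len(path) - 1):
            a, m, b = path[i - 1], path[i], path[i + 1]
            converging = (m in children.get(a, set())
                          and m in children.get(b, set()))
            if converging:
                # blocked unless m or one of its descendants is observed
                if m not in observed and not (descendants(m, children) & observed):
                    return True
            elif m in observed:  # serial or diverging: blocked if m observed
                return True
        return False

    return all(blocked(p) for p in paths(x, y, [x]))

# Alarm network: B -> A, E -> A, A -> N, E -> R
edges = [("B", "A"), ("E", "A"), ("A", "N"), ("E", "R")]
assert d_separated("B", "E", set(), edges)       # B independent of E
assert not d_separated("B", "E", {"A"}, edges)   # unblocked: explaining away
assert not d_separated("A", "R", set(), edges)   # R influences A via E
assert d_separated("A", "R", {"E"}, edges)       # blocked by observing E
```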

34 D-Separation: Correct and Complete
Given a graphical model over random variables {X_1, ..., X_N} with graph structure G, the model defines a joint distribution
p(X_1, ..., X_N) = ∏_{i=1}^{N} p(X_i | pa(X_i))
that depends on the conditional distributions p(X_i | pa(X_i)).
Theorem (D-separation is correct and complete): If X, Y are d-separated given C in G, then X ⊥ Y | C. There are no other independencies that hold irrespective of the choice of the conditional distributions p(X_i | pa(X_i)).
Of course, additional independencies can exist because of the choice of particular p(X_i | pa(X_i)).

35 D-Separation: Example
[Example graph over the nodes A, B, C, D, E, F, G; structure and observed nodes as shown on the slide, figure not reproduced in this transcription.]
Questions: Does A ⊥ F | D hold? Does B ⊥ C hold? Does A ⊥ C hold?
Reminder: a path ...–X–Y–Z–... is blocked at Y if it has a diverging connection at Y and Y is observed, or a serial connection at Y and Y is observed, or a converging connection at Y and neither Y nor any of its descendants Y' ∈ Descendants(Y) is observed. Otherwise the path is free at Y.

36 D-Separation: Example
Does A ⊥ F | D hold? Yes.
Does B ⊥ C hold? No: there is a free path through G.
Does A ⊥ C hold? No: there is a free path through B and G.
Reminder: a path ...–X–Y–Z–... is blocked at Y if it has a diverging connection at Y and Y is observed, or a serial connection at Y and Y is observed, or a converging connection at Y and neither Y nor any of its descendants Y' ∈ Descendants(Y) is observed. Otherwise the path is free at Y.

37 Bayesian Networks: Causality
Often Bayesian networks are constructed in such a way that directed edges correspond to causal influences, e.g. E → A ("earthquakes trigger the alarm system"). However, the model with A → E ("the alarm system triggers an earthquake") is equivalent.
Definition: I(G) = {(X ⊥ Y | C) : X and Y are d-separated given C in G}, the set of all independence assumptions encoded in G.
If I(G) = I(G'), there are no statistical reasons to prefer one of the models: we cannot distinguish between the models based on data. But causal models are often more intuitive.

38 Models of Different Complexity
The complexity of a model depends on the number (and location) of edges in the graph.
- Many edges: few independence assumptions, many parameters, a large class of distributions can be represented.
- Few edges: many independence assumptions, few parameters, a small class of distributions can be represented.

39 Models of Different Complexity
Adding edges: the family of representable distributions becomes larger, I(G) becomes smaller.
Extreme cases: the graph without any edges versus the completely connected graph (connected when viewed as an undirected graph).
- No edges: N parameters (for binary variables); I(G) = {(X ⊥ Y | C) : X, Y random variables, C a set of random variables}.
- Completely connected: 2^N - 1 parameters (for binary variables); I(G) = ∅.


### Statistical Inference. Prof. Kate Calder. If the coin is fair (chance of heads = chance of tails) then

Probability Statistical Inference Question: How often would this method give the correct answer if I used it many times? Answer: Use laws of probability. 1 Example: Tossing a coin If the coin is fair (chance

### Part 2: Community Detection

Chapter 8: Graph Data Part 2: Community Detection Based on Leskovec, Rajaraman, Ullman 2014: Mining of Massive Datasets Big Data Management and Analytics Outline Community Detection - Social networks -

### Series Convergence Tests Math 122 Calculus III D Joyce, Fall 2012

Some series converge, some diverge. Series Convergence Tests Math 22 Calculus III D Joyce, Fall 202 Geometric series. We ve already looked at these. We know when a geometric series converges and what it

### Inference of Probability Distributions for Trust and Security applications

Inference of Probability Distributions for Trust and Security applications Vladimiro Sassone Based on joint work with Mogens Nielsen & Catuscia Palamidessi Outline 2 Outline Motivations 2 Outline Motivations

### An-Najah National University Faculty of Engineering Industrial Engineering Department. Course : Quantitative Methods (65211)

An-Najah National University Faculty of Engineering Industrial Engineering Department Course : Quantitative Methods (65211) Instructor: Eng. Tamer Haddad 2 nd Semester 2009/2010 Chapter 5 Example: Joint

### Reasoning Under Uncertainty: Variable Elimination Example

Reasoning Under Uncertainty: Variable Elimination Example CPSC 322 Lecture 29 March 26, 2007 Textbook 9.5 Reasoning Under Uncertainty: Variable Elimination Example CPSC 322 Lecture 29, Slide 1 Lecture

### Section 6.1 Joint Distribution Functions

Section 6.1 Joint Distribution Functions We often care about more than one random variable at a time. DEFINITION: For any two random variables X and Y the joint cumulative probability distribution function

### Joint Distributions. Tieming Ji. Fall 2012

Joint Distributions Tieming Ji Fall 2012 1 / 33 X : univariate random variable. (X, Y ): bivariate random variable. In this chapter, we are going to study the distributions of bivariate random variables

### Worked examples Basic Concepts of Probability Theory

Worked examples Basic Concepts of Probability Theory Example 1 A regular tetrahedron is a body that has four faces and, if is tossed, the probability that it lands on any face is 1/4. Suppose that one

### CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.

Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott

### Analysis of Algorithms, I

Analysis of Algorithms, I CSOR W4231.002 Eleni Drinea Computer Science Department Columbia University Thursday, February 26, 2015 Outline 1 Recap 2 Representing graphs 3 Breadth-first search (BFS) 4 Applications

### 6. Jointly Distributed Random Variables

6. Jointly Distributed Random Variables We are often interested in the relationship between two or more random variables. Example: A randomly chosen person may be a smoker and/or may get cancer. Definition.

### Probabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014

Probabilistic Models for Big Data Alex Davies and Roger Frigola University of Cambridge 13th February 2014 The State of Big Data Why probabilistic models for Big Data? 1. If you don t have to worry about

### Bayesian Tutorial (Sheet Updated 20 March)

Bayesian Tutorial (Sheet Updated 20 March) Practice Questions (for discussing in Class) Week starting 21 March 2016 1. What is the probability that the total of two dice will be greater than 8, given that

### We have discussed the notion of probabilistic dependence above and indicated that dependence is

1 CHAPTER 7 Online Supplement Covariance and Correlation for Measuring Dependence We have discussed the notion of probabilistic dependence above and indicated that dependence is defined in terms of conditional

### Big Data from a Database Theory Perspective

Big Data from a Database Theory Perspective Martin Grohe Lehrstuhl Informatik 7 - Logic and the Theory of Discrete Systems A CS View on Data Science Applications Data System Users 2 Us Data HUGE heterogeneous

### Why? A central concept in Computer Science. Algorithms are ubiquitous.

Analysis of Algorithms: A Brief Introduction Why? A central concept in Computer Science. Algorithms are ubiquitous. Using the Internet (sending email, transferring files, use of search engines, online

### Mining Social Network Graphs

Mining Social Network Graphs Debapriyo Majumdar Data Mining Fall 2014 Indian Statistical Institute Kolkata November 13, 17, 2014 Social Network No introduc+on required Really? We s7ll need to understand

### P. Jeyanthi and N. Angel Benseera

Opuscula Math. 34, no. 1 (014), 115 1 http://dx.doi.org/10.7494/opmath.014.34.1.115 Opuscula Mathematica A TOTALLY MAGIC CORDIAL LABELING OF ONE-POINT UNION OF n COPIES OF A GRAPH P. Jeyanthi and N. Angel

### ANALYTICS IN BIG DATA ERA

ANALYTICS IN BIG DATA ERA ANALYTICS TECHNOLOGY AND ARCHITECTURE TO MANAGE VELOCITY AND VARIETY, DISCOVER RELATIONSHIPS AND CLASSIFY HUGE AMOUNT OF DATA MAURIZIO SALUSTI SAS Copyr i g ht 2012, SAS Ins titut

### P (x) 0. Discrete random variables Expected value. The expected value, mean or average of a random variable x is: xp (x) = v i P (v i )

Discrete random variables Probability mass function Given a discrete random variable X taking values in X = {v 1,..., v m }, its probability mass function P : X [0, 1] is defined as: P (v i ) = Pr[X =

### Classifying Large Data Sets Using SVMs with Hierarchical Clusters. Presented by :Limou Wang

Classifying Large Data Sets Using SVMs with Hierarchical Clusters Presented by :Limou Wang Overview SVM Overview Motivation Hierarchical micro-clustering algorithm Clustering-Based SVM (CB-SVM) Experimental

### Why the Normal Distribution?

Why the Normal Distribution? Raul Rojas Freie Universität Berlin Februar 2010 Abstract This short note explains in simple terms why the normal distribution is so ubiquitous in pattern recognition applications.

### Question: What is the probability that a five-card poker hand contains a flush, that is, five cards of the same suit?

ECS20 Discrete Mathematics Quarter: Spring 2007 Instructor: John Steinberger Assistant: Sophie Engle (prepared by Sophie Engle) Homework 8 Hints Due Wednesday June 6 th 2007 Section 6.1 #16 What is the

### Statistical Machine Learning from Data

Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Gaussian Mixture Models Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique

### Programming Tools based on Big Data and Conditional Random Fields

Programming Tools based on Big Data and Conditional Random Fields Veselin Raychev Martin Vechev Andreas Krause Department of Computer Science ETH Zurich Zurich Machine Learning and Data Science Meet-up,

### Approximation Algorithms

Approximation Algorithms or: How I Learned to Stop Worrying and Deal with NP-Completeness Ong Jit Sheng, Jonathan (A0073924B) March, 2012 Overview Key Results (I) General techniques: Greedy algorithms

### Causal Independence for Probability Assessment and Inference Using Bayesian Networks

Causal Independence for Probability Assessment and Inference Using Bayesian Networks David Heckerman and John S. Breese Copyright c 1993 IEEE. Reprinted from D. Heckerman, J. Breese. Causal Independence