A crash course in probability and Naïve Bayes classification

Probability theory (Chapter 9)

Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: a person's height, the outcome of a coin toss. We distinguish between discrete and continuous random variables.

The distribution of a discrete random variable gives the probability of each value it can take. Notation: $P(X = x_i)$. These numbers satisfy
$\sum_i P(X = x_i) = 1$.

A joint probability distribution for two variables is a table of joint probabilities; marginal and conditional probabilities can be derived from it. If the two variables are binary, how many parameters does it have? (The table has $2^2 = 4$ entries, so 3 free parameters.) What about the joint probability of d variables, $P(X_1, \dots, X_d)$? How many parameters does it have if each variable is binary? ($2^d - 1$.)
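
The parameter-counting argument is worth spelling out. Here is a short derivation (added for clarity, not on the original slide) for d binary variables:

```latex
% A full joint distribution over d binary variables is a table
% with one entry per configuration:
\[
\underbrace{2 \times 2 \times \cdots \times 2}_{d \text{ variables}} = 2^d \text{ entries,}
\]
% but the entries must sum to one, which removes one degree of freedom:
\[
\#\text{parameters} = 2^d - 1
\quad\text{(e.g. } d = 2 \Rightarrow 2^2 - 1 = 3\text{).}
\]
```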

The rules of probability

Marginalization (sum rule): $P(X) = \sum_Y P(X, Y)$.
Product rule: $P(X, Y) = P(Y \mid X)\, P(X)$.
Independence: X and Y are independent if $P(Y \mid X) = P(Y)$. This implies $P(X, Y) = P(X)\, P(Y)$.

Using probability in learning

We are interested in $P(Y \mid X)$, where X are the features and Y the labels. For example, when classifying spam we could estimate P(Y | Viagra, lottery) and classify an example as spam if $P(Y \mid X) > 0.5$. However, it is usually easier to model $P(X \mid Y)$.

Maximum likelihood

Fit a probabilistic model $P(x \mid \theta)$ to data and estimate $\theta$. Given independent, identically distributed (i.i.d.) data $X = (x_1, x_2, \dots, x_n)$:
Likelihood: $P(X \mid \theta) = P(x_1 \mid \theta)\, P(x_2 \mid \theta) \cdots P(x_n \mid \theta)$
Log likelihood: $\ln P(X \mid \theta) = \sum_{i=1}^{n} \ln P(x_i \mid \theta)$
Maximum likelihood solution: the parameters $\theta$ that maximize $\ln P(X \mid \theta)$.
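
To make the sum and product rules concrete, here is a tiny worked example with made-up numbers (not from the slides):

```latex
% Hypothetical joint distribution over binary X and Y:
% P(X=1, Y=1) = 0.3,  P(X=1, Y=0) = 0.2,
% P(X=0, Y=1) = 0.1,  P(X=0, Y=0) = 0.4.
\[
P(X=1) = \sum_{y} P(X=1, Y=y) = 0.3 + 0.2 = 0.5
\qquad\text{(marginalization)}
\]
\[
P(Y=1 \mid X=1) = \frac{P(X=1, Y=1)}{P(X=1)} = \frac{0.3}{0.5} = 0.6
\qquad\text{(product rule, rearranged)}
\]
% Since P(Y=1) = 0.3 + 0.1 = 0.4 differs from P(Y=1 | X=1) = 0.6,
% X and Y are not independent in this table.
```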

Example: coin toss

Estimate the probability p that a coin lands heads using the result of n coin tosses, h of which resulted in heads.
The likelihood of the data: $P(X \mid p) = p^h (1 - p)^{n - h}$.
Log likelihood: $\ln P(X \mid p) = h \ln p + (n - h) \ln(1 - p)$.
Taking a derivative and setting it to 0:
$\frac{\partial}{\partial p} \ln P(X \mid p) = \frac{h}{p} - \frac{n - h}{1 - p} = 0 \quad\Rightarrow\quad p = \frac{h}{n}$.

Bayes rule

From the product rule, $P(Y, X) = P(Y \mid X)\, P(X)$ and $P(Y, X) = P(X \mid Y)\, P(Y)$. Therefore
$P(Y \mid X) = \frac{P(X \mid Y)\, P(Y)}{P(X)}$.
This is known as Bayes rule: the posterior is the likelihood times the prior, divided by $P(X)$, which can be computed by marginalization:
$P(X) = \sum_Y P(X \mid Y)\, P(Y)$.

Maximum a posteriori and maximum likelihood

The maximum a posteriori (MAP) rule:
$y_{MAP} = \arg\max_Y P(Y \mid X) = \arg\max_Y \frac{P(X \mid Y)\, P(Y)}{P(X)} = \arg\max_Y P(X \mid Y)\, P(Y)$,
since $P(X)$ does not depend on Y and is therefore not important for inferring a label. If we ignore the prior distribution or assume it is uniform, we obtain the maximum likelihood rule:
$y_{ML} = \arg\max_Y P(X \mid Y)$.
A classifier that has access to $P(Y \mid X)$ is a Bayes optimal classifier.
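
A minimal numerical check of the coin-toss result (an illustration added here, not part of the original slides):

```python
import numpy as np

rng = np.random.default_rng(0)

p_true = 0.7                        # true (unknown) heads probability
tosses = rng.random(1000) < p_true  # simulate n = 1000 coin tosses
h, n = tosses.sum(), tosses.size

# Maximum likelihood estimate derived above: p = h / n
p_ml = h / n
print(f"h = {h}, n = {n}, ML estimate p = {p_ml:.3f}")

# Sanity check: the log likelihood h*ln(p) + (n-h)*ln(1-p)
# is highest at (or very near) p = h/n.
grid = np.linspace(0.01, 0.99, 99)
loglik = h * np.log(grid) + (n - h) * np.log(1 - grid)
print("argmax over grid:", grid[np.argmax(loglik)])
```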

Naïve Bayes classifier

We would like to model P(X | Y), where X is a feature vector and Y is its associated label. Example task: predict whether or not a picnic spot is enjoyable. The training data consists of n rows, each with features X = (X_1, X_2, X_3, ..., X_d) and a label Y.

How many parameters does this model have?
Prior P(Y): k - 1 parameters if there are k classes.
Likelihood P(X | Y): (2^d - 1)k parameters for binary features.

Simplifying assumption (conditional independence): given the class label, the features are independent, i.e.
$P(X \mid Y) = P(x_1 \mid Y)\, P(x_2 \mid Y) \cdots P(x_d \mid Y)$.
How many parameters now? dk + k.

Naïve Bayes decision rule:
$y_{NB} = \arg\max_Y P(X \mid Y)\, P(Y) = \arg\max_Y P(Y) \prod_{i=1}^{d} P(x_i \mid Y)$.
If conditional independence holds, NB is an optimal classifier!
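
A short accounting of those parameter counts (a sketch for binary features, added for clarity):

```latex
% Full joint likelihood: one table of 2^d - 1 free entries per class,
% plus k - 1 numbers for the class prior.
\[
\text{full model: } \underbrace{(2^d - 1)\,k}_{P(X \mid Y)} + \underbrace{(k - 1)}_{P(Y)}
\]
% Naive Bayes: each binary feature needs one number P(x_i = 1 | Y = y) per class,
% giving d numbers per class.
\[
\text{naive Bayes: } \underbrace{d\,k}_{P(x_i \mid Y)} + \underbrace{(k - 1)}_{P(Y)} \approx dk + k
\]
```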

Training a Naïve Bayes classifier

Training data: feature matrix X (n × d) and labels y_1, ..., y_n.

Maximum likelihood estimates:
Class prior: $\hat{P}(y) = \frac{|\{i : y_i = y\}|}{n}$.
Likelihood: $\hat{P}(x_j \mid y) = \frac{\hat{P}(x_j, y)}{\hat{P}(y)} = \frac{|\{i : X_{ij} = x_j,\ y_i = y\}| / n}{|\{i : y_i = y\}| / n}$.

Email classification

Suppose our vocabulary contains three words a, b and c, and we use a multivariate Bernoulli model for our emails, with parameters
θ_+ = (0.5, 0.67, 0.33) for spam (+) and θ_− = (0.67, 0.33, 0.33) for ham (−).
This means, for example, that the presence of b is twice as likely in spam (+) compared with ham. The email to be classified contains words a and b but not c, and hence is described by the bit vector x = (1, 1, 0). We obtain the likelihoods
P(x | +) = 0.5 · 0.67 · (1 − 0.33) ≈ 0.222
P(x | −) = 0.67 · 0.33 · (1 − 0.33) ≈ 0.148
The ML classification of x is thus spam.

Email classification: training data
[Table: eight training emails e1–e8 with binary features a?, b?, c? and a class label (+/−), shown both as bit vectors and as count vectors.]
What are the parameters of the model?
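
A small script reproducing the likelihood computation above (the parameter values are the ones given on the slide; everything else is illustrative):

```python
import numpy as np

# Multivariate Bernoulli parameters from the slide:
# P(word present | class) for the words (a, b, c).
theta_spam = np.array([0.5, 0.67, 0.33])   # class +
theta_ham  = np.array([0.67, 0.33, 0.33])  # class -

x = np.array([1, 1, 0])  # the email contains a and b, but not c

def bernoulli_likelihood(x, theta):
    """P(x | class) under a multivariate Bernoulli model."""
    return np.prod(theta**x * (1 - theta)**(1 - x))

print("P(x | +) =", bernoulli_likelihood(x, theta_spam))  # ~0.22
print("P(x | -) =", bernoulli_likelihood(x, theta_ham))   # ~0.15
```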

Email classification: training data
[Table: the same eight training emails as above.]
What are the parameters of the model?
P(+) = 0.5, P(−) = 0.5
P(a | +) = 0.5, P(a | −) = 0.75
P(b | +) = 0.75, P(b | −) = 0.25
P(c | +) = 0.25, P(c | −) = 0.25
These follow from the maximum likelihood estimates
$\hat{P}(y) = \frac{|\{i : y_i = y\}|}{n}$ and $\hat{P}(x_j \mid y) = \frac{|\{i : X_{ij} = x_j,\ y_i = y\}|}{|\{i : y_i = y\}|}$.

Comments on Naïve Bayes

Usually features are not conditionally independent, i.e.
$P(X \mid Y) \neq P(x_1 \mid Y)\, P(x_2 \mid Y) \cdots P(x_d \mid Y)$.
And yet it is one of the most widely used classifiers. It is easy to train, and it often performs well even when the assumption is violated.
Domingos, P., & Pazzani, M. (1997). Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier. Machine Learning, 29.

When there are few training examples

What if you never see a training example containing word a when y = spam? Then the ML estimate gives P(a | spam) = 0, and for any email containing a,
P(x | spam) = P(a | spam) P(b | spam) P(c | spam) = 0.
What to do? Add virtual examples containing a with y = spam, i.e. smooth the counts.
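
A sketch of how the "virtual examples" fix (Laplace smoothing) could be implemented when estimating the per-word probabilities. The training matrix below is made up for illustration, not the table from the slide:

```python
import numpy as np

# Hypothetical binary training data: rows = emails, columns = words (a, b, c).
X = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 1, 1],
              [1, 0, 0],
              [0, 0, 1],
              [0, 1, 0]])
y = np.array([1, 1, 1, 0, 0, 0])   # 1 = spam (+), 0 = ham (-)

def train_bernoulli_nb(X, y, alpha=1.0):
    """Estimates with `alpha` virtual examples per word and class (Laplace smoothing)."""
    classes = np.unique(y)
    priors, theta = {}, {}
    for c in classes:
        Xc = X[y == c]
        priors[c] = len(Xc) / len(X)
        # (emails in class c containing the word + alpha) / (n_c + 2*alpha):
        # no probability can be exactly 0 or 1.
        theta[c] = (Xc.sum(axis=0) + alpha) / (len(Xc) + 2 * alpha)
    return priors, theta

priors, theta = train_bernoulli_nb(X, y)
print(priors)
print(theta)
```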

Naïve Bayes for continuous variables

We need to talk about continuous distributions. The probability of the random variable assuming a value within some given interval from x_1 to x_2 is defined to be the area under the graph of the probability density function between x_1 and x_2:
$P(x_1 \le X \le x_2) = \int_{x_1}^{x_2} f(x)\, dx$.
[Figures: probability density functions f(x) of the uniform and the normal/Gaussian distributions.]

Expectations

Discrete variables: $\mathbb{E}[x] = \sum_x x\, P(x)$.
Continuous variables: $\mathbb{E}[x] = \int x\, f(x)\, dx$.
Conditional expectation (discrete): $\mathbb{E}[x \mid y] = \sum_x x\, P(x \mid y)$.
Approximate expectation (discrete and continuous): $\mathbb{E}[x] \approx \frac{1}{n} \sum_{i=1}^{n} x_i$ for samples $x_1, \dots, x_n$ drawn from the distribution.

The Gaussian (normal) distribution:
$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$.
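
A quick numerical illustration (added here, not from the slides) of the approximate-expectation idea, using samples from a Gaussian:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 2.0, 0.5

# The exact expectation of this Gaussian is mu; approximate it from samples.
samples = rng.normal(mu, sigma, size=10_000)
approx_mean = samples.mean()          # (1/n) * sum_i x_i
approx_e_x2 = (samples**2).mean()     # approximate E[x^2]

print(approx_mean)                    # close to 2.0
print(approx_e_x2)                    # close to mu^2 + sigma^2 = 4.25
```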

Properties of the Gaussian distribution

[Figure: the Gaussian density, with the intervals µ ± 1σ, µ ± 2σ and µ ± 3σ containing 68.26%, 95.44% and 99.72% of the probability mass.]

Standard normal distribution

A random variable having a normal distribution with a mean of 0 and a standard deviation of 1 is said to have a standard normal probability distribution. Converting to the standard normal distribution:
$z = \frac{x - \mu}{\sigma}$.
We can think of z as a measure of the number of standard deviations x is from µ.

Gaussian parameter estimation

Given i.i.d. data $x_1, \dots, x_n$, the likelihood function is
$P(x_1, \dots, x_n \mid \mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)$.
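
For instance (a made-up numerical illustration), converting a value to the standard normal scale and estimating µ and σ by maximum likelihood:

```python
import numpy as np

# z-score: number of standard deviations x lies from the mean
mu, sigma = 100.0, 15.0
x = 130.0
z = (x - mu) / sigma
print(z)   # 2.0 -> x is two standard deviations above the mean

# The ML estimates of a Gaussian's parameters are the sample mean and variance.
rng = np.random.default_rng(0)
data = rng.normal(mu, sigma, size=5000)
mu_ml = data.mean()
sigma_ml = data.std()          # sqrt of the ML variance (ddof=0)
print(mu_ml, sigma_ml)         # close to 100 and 15
```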

Maximum (log) likelihood

Maximizing the log of the Gaussian likelihood,
$\ln P(X \mid \mu, \sigma^2) = -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 - \frac{n}{2} \ln \sigma^2 - \frac{n}{2} \ln(2\pi)$,
gives the sample mean $\mu_{ML} = \frac{1}{n}\sum_i x_i$ and the sample variance $\sigma^2_{ML} = \frac{1}{n}\sum_i (x_i - \mu_{ML})^2$.

Gaussian Naïve Bayes

Assume we have data that belongs to three classes, and assume a likelihood that follows a Gaussian distribution. Likelihood function:
$P(X_i = x \mid Y = y_k) = \frac{1}{\sqrt{2\pi}\,\sigma_{ik}} \exp\!\left(-\frac{(x - \mu_{ik})^2}{2\sigma_{ik}^2}\right)$.
We need to estimate a mean and a variance for each feature in each class.
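
A compact Gaussian naïve Bayes sketch, assuming continuous features and the per-class, per-feature Gaussian likelihood above (an illustrative implementation, not the course's reference code):

```python
import numpy as np

class GaussianNB:
    """Gaussian naive Bayes: one (mu, sigma^2) per feature and class."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.prior = np.array([(y == c).mean() for c in self.classes])
        self.mu = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.var = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes])
        return self

    def predict(self, X):
        # log P(y) + sum_i log N(x_i; mu_ik, var_ik), evaluated for every class
        log_post = []
        for pi, mu, var in zip(self.prior, self.mu, self.var):
            ll = -0.5 * (np.log(2 * np.pi * var) + (X - mu) ** 2 / var).sum(axis=1)
            log_post.append(np.log(pi) + ll)
        return self.classes[np.argmax(np.stack(log_post, axis=1), axis=1)]

# Tiny synthetic check with three classes
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 1.0, size=(50, 2)) for m in (0, 4, 8)])
y = np.repeat([0, 1, 2], 50)
print((GaussianNB().fit(X, y).predict(X) == y).mean())  # training accuracy, near 1
```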

Summary

Naïve Bayes classifier:
- what the assumption is
- why we make it
- how we learn it
Naïve Bayes for discrete data.
Gaussian naïve Bayes.
