A crash course in probability and Naïve Bayes classification


 Annabella Wheeler
 2 years ago
 Views:
Transcription
1 Probability theory A crash course in probability and Naïve Bayes classification Chapter 9 Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish between discrete and continuous variables. The distribution of a discrete random variable: The probabilities of each value it can take. Notation: P(X = x i ). These numbers satisfy: X P (X = x i )=1 i 1 2 Probability theory Probability theory A joint probability distribution for two variables is a table. Marginal Probability If the two variables are binary, how many parameters does it have? Joint Probability Conditional Probability What about joint probability of d variables P(X 1,,X d )? How many parameters does it have if each variable is binary? 4 1
2 Probability theory The Rules of Probability Marginalization: Marginalization Product Rule Product Rule Independence: X and Y are independent if P(Y X) = P(Y) This implies P(X,Y) = P(X) P(Y) Using probability in learning P (Y X) Interested in: 1% X features Y labels 0.5% 0% For example, when classifying spam, we could estimate P(Y Viagara, lottery) We would then classify an example if P(Y X) > 0.5. However, it s usually easier to model P(X Y) Maximum likelihood Fit a probabilistic model P(x θ) to data Estimate θ Given independent identically distributed (i.i.d.) data X = (x 1, x 2,, x n ) Likelihood Log likelihood P (X ) =P (x 1 )P (x 2 ),...,P(x n ) ln P (X ) = nx ln P (x i ) i=1 Maximum likelihood solution: parameters θ that maximize ln P(X θ) 7 2
3 : coin toss Estimate the probability p that a coin lands Heads using the result of n coin tosses, h of which resulted in heads. The likelihood of the data: Log likelihood: Taking a derivative and setting to 0: P (X ) =p h (1 p) n h ln P (X ) =h ln p +(n h)ln(1 ln P (X = h p (n h) (1 p) =0 From the product rule: and: Therefore: P(Y, X) = P(Y X) P(X) P(Y, X) = P(X Y) P(Y) P (Y X) = This is known as Bayes rule Bayes rule P (X Y )P (Y ) P (X) ) p = h n 9 10 P (Y X) = posterior Bayes rule P (X Y )P (Y ) P (X) posterior likelihood prior P(X) can be computed as: P (X) = X Y P (X Y )P (Y ) likelihood prior Maximum aposteriori and maximum likelihood The maximum a posteriori (MAP) rule: y MAP = arg max Y P (X Y )P (Y ) P (Y X) = arg max Y P (X) If we ignore the prior distribution or assume it is uniform we obtain the maximum likelihood rule: y ML = arg max P (X Y ) Y = arg max P (X Y )P (Y ) Y A classifier that has access to P(Y X) is a Bayes optimal classifier. But is not important for inferring a label 12 3
4 Naïve Bayes classifier We would like to model P(X Y), where X is a feature vector, and Y is its associated label. Task:%Predict%whether%or%not%a%picnic%spot%is%enjoyable% % Training&Data:&& X%=%(X 1 %%%%%%%X 2 %%%%%%%%X 3 %%%%%%%% %%%%%%%% %%%%%%%X d )%%%%%%%%%%Y% Naïve Bayes classifier We would like to model P(X Y), where X is a feature vector, and Y is its associated label. Simplifying assumption: conditional independence: given the class label the features are independent, i.e. P (X Y )=P (x 1 Y )P (x 2 Y ),...,P(x d Y ) n&rows& How many parameters now? How many parameters? Prior: P(Y) k1 if k classes Likelihood: P(X Y) (2 d 1)k for binary features Naïve Bayes classifier We would like to model P(X Y), where X is a feature vector, and Y is its associated label. Simplifying assumption: conditional independence: given the class label the features are independent, i.e. P (X Y )=P (x 1 Y )P (x 2 Y ),...,P(x d Y ) Naïve Bayes decision rule: y NB = arg max Y Naïve Bayes classifier P (X Y )P (Y ) = arg max Y dy P (x i Y )P (Y ) i=1 If conditional independence holds, NB is an optimal classifier! How many parameters now? dk + k
5 Training a Naïve Bayes classifier Training data: Feature matrix X (n x d) and labels y 1, y n Maximum likelihood estimates: Class prior: Likelihood: ˆP (y) = {i : y i = y} n ˆP (x i y) = ˆP (x i,y) ˆP (y) = {i : X ij = x i,y i = y} /n {i : y i = y} /n classification Suppose our vocabulary contains three words a, b and c, and we use a multivariate Bernoulli model for our s, with parameters = (0.5,0.67,0.33) = (0.67,0.33,0.33) This means, for example, that the presence of b is twice as likely in spam (+), compared with ham. The to be classified contains words a and b but not c, and hence is described by the bit vector x = (1,1,0). We obtain likelihoods P(x ) = (1 0.33) = P(x ) = (1 0.33) = The ML classification of x is thus spam classification: training data a? b? c? Class e e e e e e e e ount vectors. (right) The same data set What are the parameters of the model? classification: training data a? b? c? Class e e e e e e e e ount vectors. (right) The same data set What are the parameters of the model? ˆP (y) = {i : y i = y} n ˆP (x i y) = ˆP (x i,y) ˆP (y) = {i : X ij = x i,y i = y} /n {i : y i = y} /n
6 classification: training data a? b? c? Class e e e e e e e e ount vectors. (right) The same data set What are the parameters of the model? P(+) = 0.5, P() = 0.5 P(a +) = 0.5, P(a ) = 0.75 P(b +) = 0.75, P(b )= 0.25 P(c +) = 0.25, P(c )= 0.25 ˆP (x i y) = ˆP (x i,y) ˆP (y) ˆP (y) = {i : y i = y} n = {i : Xij = xi,yi = y} /n {i : y i = y} /n Comments on Naïve Bayes Usually features are not conditionally independent, i.e. P (X Y ) 6= P (x 1 Y )P (x 2 Y ),...,P(x d Y ) And yet, one of the most widely used classifiers. Easy to train! It often performs well even when the assumption is violated. Domingos, P., & Pazzani, M. (1997). Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier. Machine Learning. 29, When there are few training examples What if you never see a training example where x 1 =a when y=spam? P(x spam) = P(a spam) P(b spam) P(c spam) = 0 What to do? When there are few training examples What if you never see a training example where x 1 =a when y=spam? P(x spam) = P(a spam) P(b spam) P(c spam) = 0 What to do? Add virtual examples for which x 1 =a when y=spam
7 Naïve Bayes for continuous variables Continuous Probability Distributions Need to talk about continuous distributions! The probability of the random variable assuming a value within some given interval from x 1 to x 2 is defined to be the area under the graph of the probability density function between x 1 and x 2. f (x) Uniform f (x) Normal/Gaussian x x x x 1 x 1 x Expectations The Gaussian (normal) distribution Discrete variables Continuous variables Conditional expectation (discrete) Approximate expectation (discrete and continuous) 7
8 Properties of the Gaussian distribution Standard Normal Distribution 99.72% 95.44% 68.26% A random variable having a normal distribution with a mean of 0 and a standard deviation of 1 is said to have a standard normal probability distribution. µ 3σ µ 1σ µ 2σ µ µ + 1σ µ + 2σ µ + 3σ x Standard Normal Probability Distribution Gaussian Parameter Estimation Converting to the Standard Normal Distribution z x = µ σ Likelihood function We can think of z as a measure of the number of standard deviations x is from µ. 8
9 Maximum (Log) Likelihood 34 Gaussian models Assume we have data that belongs to three classes, and assume a likelihood that follows a Gaussian distribution Likelihood function: P (X i = x Y = y k )= Gaussian Naïve Bayes 1 (x µik ) 2 p exp 2 ik 2 2 ik Need to estimate mean and variance for each feature in each class
10 Summary Naïve Bayes classifier: ² ² ² What s the assumption Why we make it How we learn it Naïve Bayes for discrete data Gaussian naïve Bayes 37 10
Introduction to Machine Learning
Introduction to Machine Learning Brown University CSCI 1950F, Spring 2012 Prof. Erik Sudderth Lecture 5: Decision Theory & ROC Curves Gaussian ML Estimation Many figures courtesy Kevin Murphy s textbook,
More informationProbability Theory. Elementary rules of probability Sum rule. Product rule. p. 23
Probability Theory Uncertainty is key concept in machine learning. Probability provides consistent framework for the quantification and manipulation of uncertainty. Probability of an event is the fraction
More information1 Maximum likelihood estimation
COS 424: Interacting with Data Lecturer: David Blei Lecture #4 Scribes: Wei Ho, Michael Ye February 14, 2008 1 Maximum likelihood estimation 1.1 MLE of a Bernoulli random variable (coin flips) Given N
More informationBayesian Learning. CSL465/603  Fall 2016 Narayanan C Krishnan
Bayesian Learning CSL465/603  Fall 2016 Narayanan C Krishnan ckn@iitrpr.ac.in Outline Bayes Theorem MAP Learners Bayes optimal classifier Naïve Bayes classifier Example text classification Bayesian networks
More informationBayesian Classification
CS 650: Computer Vision Bryan S. Morse BYU Computer Science Statistical Basis Training: ClassConditional Probabilities Suppose that we measure features for a large training set taken from class ω i. Each
More informationBayes and Naïve Bayes. cs534machine Learning
Bayes and aïve Bayes cs534machine Learning Bayes Classifier Generative model learns Prediction is made by and where This is often referred to as the Bayes Classifier, because of the use of the Bayes rule
More informationProbabilities for machine learning
Probabilities for machine learning Roland Memisevic February 2, 2006 Why probabilities? One of the hardest problems when building complex intelligent systems is brittleness. How can we keep tiny irregularities
More informationBayes Rule. How is this rule derived? Using Bayes rule for probabilistic inference: Evidence Cause) Evidence)
Bayes Rule P ( A B = B A A B ( Rev. Thomas Bayes (17021761 How is this rule derived? Using Bayes rule for probabilistic inference: P ( Cause Evidence = Evidence Cause Cause Evidence Cause Evidence: diagnostic
More informationBayesian Decision Theory. Prof. Richard Zanibbi
Bayesian Decision Theory Prof. Richard Zanibbi Bayesian Decision Theory The Basic Idea To minimize errors, choose the least risky class, i.e. the class for which the expected loss is smallest Assumptions
More informationLecture 2: October 24
Machine Learning: Foundations Fall Semester, 2 Lecture 2: October 24 Lecturer: Yishay Mansour Scribe: Shahar Yifrah, Roi Meron 2. Bayesian Inference  Overview This lecture is going to describe the basic
More informationCS340 Machine learning Naïve Bayes classifiers
CS340 Machine learning Naïve Bayes classifiers Document classification Let Y {1,,C} be the class label and x {0,1} d eg Y {spam, urgent, normal}, x i = I(word i is present in message) Bag of words model
More informationBasics of Statistical Machine Learning
CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar
More informationParametric Models Part I: Maximum Likelihood and Bayesian Density Estimation
Parametric Models Part I: Maximum Likelihood and Bayesian Density Estimation Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2015 CS 551, Fall 2015
More informationL10: Probability, statistics, and estimation theory
L10: Probability, statistics, and estimation theory Review of probability theory Bayes theorem Statistics and the Normal distribution Least Squares Error estimation Maximum Likelihood estimation Bayesian
More informationLearning from Data: Naive Bayes
Semester 1 http://www.anc.ed.ac.uk/ amos/lfd/ Naive Bayes Typical example: Bayesian Spam Filter. Naive means naive. Bayesian methods can be much more sophisticated. Basic assumption: conditional independence.
More informationLinear Classification. Volker Tresp Summer 2015
Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong
More informationLinear Threshold Units
Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear
More informationData Modeling & Analysis Techniques. Probability & Statistics. Manfred Huber 2011 1
Data Modeling & Analysis Techniques Probability & Statistics Manfred Huber 2011 1 Probability and Statistics Probability and statistics are often used interchangeably but are different, related fields
More informationMaximum Likelihood, Logistic Regression, and Stochastic Gradient Training
Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Charles Elkan elkan@cs.ucsd.edu January 10, 2014 1 Principle of maximum likelihood Consider a family of probability distributions
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct
More information10601. Machine Learning. http://www.cs.cmu.edu/afs/cs/academic/class/10601f10/index.html
10601 Machine Learning http://www.cs.cmu.edu/afs/cs/academic/class/10601f10/index.html Course data All uptodate info is on the course web page: http://www.cs.cmu.edu/afs/cs/academic/class/10601f10/index.html
More informationStatistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
More informationWhy the Normal Distribution?
Why the Normal Distribution? Raul Rojas Freie Universität Berlin Februar 2010 Abstract This short note explains in simple terms why the normal distribution is so ubiquitous in pattern recognition applications.
More informationNaive Bayesian Classifier
POLYTECHNIC UNIVERSITY Department of Computer Science / Finance and Risk Engineering Naive Bayesian Classifier K. Ming Leung Abstract: A statistical classifier called Naive Bayesian classifier is discussed.
More informationChristfried Webers. Canberra February June 2015
c Statistical Group and College of Engineering and Computer Science Canberra February June (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 829 c Part VIII Linear Classification 2 Logistic
More informationP (x) 0. Discrete random variables Expected value. The expected value, mean or average of a random variable x is: xp (x) = v i P (v i )
Discrete random variables Probability mass function Given a discrete random variable X taking values in X = {v 1,..., v m }, its probability mass function P : X [0, 1] is defined as: P (v i ) = Pr[X =
More informationExample: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation:  Feature vector X,  qualitative response Y, taking values in C
More informationLogistic Regression. Vibhav Gogate The University of Texas at Dallas. Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld.
Logistic Regression Vibhav Gogate The University of Texas at Dallas Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld. Generative vs. Discriminative Classifiers Want to Learn: h:x Y X features
More informationMixture Models. Jia Li. Department of Statistics The Pennsylvania State University. Mixture Models
Mixture Models Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Clustering by Mixture Models General bacground on clustering Example method: means Mixture model based
More informationSummary of Probability
Summary of Probability Mathematical Physics I Rules of Probability The probability of an event is called P(A), which is a positive number less than or equal to 1. The total probability for all possible
More informationRegime Switching Models: An Example for a Stock Market Index
Regime Switching Models: An Example for a Stock Market Index Erik Kole Econometric Institute, Erasmus School of Economics, Erasmus University Rotterdam April 2010 In this document, I discuss in detail
More informationArtificial Intelligence Mar 27, Bayesian Networks 1 P (T D)P (D) + P (T D)P ( D) =
Artificial Intelligence 15381 Mar 27, 2007 Bayesian Networks 1 Recap of last lecture Probability: precise representation of uncertainty Probability theory: optimal updating of knowledge based on new information
More informationSufficient Statistics and Exponential Family. 1 Statistics and Sufficient Statistics. Math 541: Statistical Theory II. Lecturer: Songfeng Zheng
Math 541: Statistical Theory II Lecturer: Songfeng Zheng Sufficient Statistics and Exponential Family 1 Statistics and Sufficient Statistics Suppose we have a random sample X 1,, X n taken from a distribution
More informationClassification Problems
Classification Read Chapter 4 in the text by Bishop, except omit Sections 4.1.6, 4.1.7, 4.2.4, 4.3.3, 4.3.5, 4.3.6, 4.4, and 4.5. Also, review sections 1.5.1, 1.5.2, 1.5.3, and 1.5.4. Classification Problems
More informationProbability & Statistics Primer Gregory J. Hakim University of Washington 2 January 2009 v2.0
Probability & Statistics Primer Gregory J. Hakim University of Washington 2 January 2009 v2.0 This primer provides an overview of basic concepts and definitions in probability and statistics. We shall
More informationCHAPTER 2 Estimating Probabilities
CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a
More informationGaussian Classifiers CS498
Gaussian Classifiers CS498 Today s lecture The Gaussian Gaussian classifiers A slightly more sophisticated classifier Nearest Neighbors We can classify with nearest neighbors x m 1 m 2 Decision boundary
More informationCCNY. BME I5100: Biomedical Signal Processing. Linear Discrimination. Lucas C. Parra Biomedical Engineering Department City College of New York
BME I5100: Biomedical Signal Processing Linear Discrimination Lucas C. Parra Biomedical Engineering Department CCNY 1 Schedule Week 1: Introduction Linear, stationary, normal  the stuff biology is not
More informationProbabilistic Graphical Models Final Exam
Introduction to Probabilistic Graphical Models Final Exam, March 2007 1 Probabilistic Graphical Models Final Exam Please fill your name and I.D.: Name:... I.D.:... Duration: 3 hours. Guidelines: 1. The
More informationMS&E 226: Small Data
MS&E 226: Small Data Lecture 16: Bayesian inference (v3) Ramesh Johari ramesh.johari@stanford.edu 1 / 35 Priors 2 / 35 Frequentist vs. Bayesian inference Frequentists treat the parameters as fixed (deterministic).
More informationLecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
More informationCSCI567 Machine Learning (Fall 2014)
CSCI567 Machine Learning (Fall 2014) Drs. Sha & Liu {feisha,yanliu.cs}@usc.edu September 22, 2014 Drs. Sha & Liu ({feisha,yanliu.cs}@usc.edu) CSCI567 Machine Learning (Fall 2014) September 22, 2014 1 /
More informationMachine Learning
Machine Learning 10701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 8, 2011 Today: Graphical models Bayes Nets: Representing distributions Conditional independencies
More informationLecture 9: Bayesian Learning
Lecture 9: Bayesian Learning Cognitive Systems II  Machine Learning SS 2005 Part II: Special Aspects of Concept Learning Bayes Theorem, MAL / ML hypotheses, Bruteforce MAP LEARNING, MDL principle, Bayes
More informationReview. Lecture 3: Probability Distributions. Poisson Distribution. May 8, 2012 GENOME 560, Spring Su In Lee, CSE & GS
Lecture 3: Probability Distributions May 8, 202 GENOME 560, Spring 202 Su In Lee, CSE & GS suinlee@uw.edu Review Random variables Discrete: Probability mass function (pmf) Continuous: Probability density
More informationMAS108 Probability I
1 QUEEN MARY UNIVERSITY OF LONDON 2:30 pm, Thursday 3 May, 2007 Duration: 2 hours MAS108 Probability I Do not start reading the question paper until you are instructed to by the invigilators. The paper
More informationML, MAP, and Bayesian The Holy Trinity of Parameter Estimation and Data Prediction. Avinash Kak Purdue University. January 4, :19am
ML, MAP, and Bayesian The Holy Trinity of Parameter Estimation and Data Prediction Avinash Kak Purdue University January 4, 2017 11:19am An RVL Tutorial Presentation originally presented in Summer 2008
More informationStatistical Machine Learning from Data
Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Gaussian Mixture Models Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique
More informationSet theory, and set operations
1 Set theory, and set operations Sayan Mukherjee Motivation It goes without saying that a Bayesian statistician should know probability theory in depth to model. Measure theory is not needed unless we
More informationQuestion 2 Naïve Bayes (16 points)
Question 2 Naïve Bayes (16 points) About 2/3 of your email is spam so you downloaded an open source spam filter based on word occurrences that uses the Naive Bayes classifier. Assume you collected the
More informationBayesian Networks Tutorial I Basics of Probability Theory and Bayesian Networks
Bayesian Networks 2015 2016 Tutorial I Basics of Probability Theory and Bayesian Networks Peter Lucas LIACS, Leiden University Elementary probability theory We adopt the following notations with respect
More informationECEN 5682 Theory and Practice of Error Control Codes
ECEN 5682 Theory and Practice of Error Control Codes Convolutional Codes University of Colorado Spring 2007 Linear (n, k) block codes take k data symbols at a time and encode them into n code symbols.
More informationLecture 2: Introduction to belief (Bayesian) networks
Lecture 2: Introduction to belief (Bayesian) networks Conditional independence What is a belief network? Independence maps (Imaps) January 7, 2008 1 COMP526 Lecture 2 Recall from last time: Conditional
More informationGeneral Naïve Bayes. CS 188: Artificial Intelligence Fall Example: Spam Filtering. Example: OCR. Generalization and Overfitting
CS 88: Artificial Intelligence Fall 7 Lecture : Perceptrons //7 General Naïve Bayes A general naive Bayes model: C x E n parameters C parameters n x E x C parameters E E C E n Dan Klein UC Berkeley We
More informationThe Optimality of Naive Bayes
The Optimality of Naive Bayes Harry Zhang Faculty of Computer Science University of New Brunswick Fredericton, New Brunswick, Canada email: hzhang@unbca E3B 5A3 Abstract Naive Bayes is one of the most
More informationLecture 9: Introduction to Pattern Analysis
Lecture 9: Introduction to Pattern Analysis g Features, patterns and classifiers g Components of a PR system g An example g Probability definitions g Bayes Theorem g Gaussian densities Features, patterns
More informationClustering  example. Given some data x i X Find a partitioning of the data into k disjunctive clusters Example: kmeans clustering
Clustering  example Graph Mining and Graph Kernels Given some data x i X Find a partitioning of the data into k disjunctive clusters Example: kmeans clustering x!!!!8!!! 8 x 1 1 Clustering  example
More informationExpectation Discrete RV  weighted average Continuous RV  use integral to take the weighted average
PHP 2510 Expectation, variance, covariance, correlation Expectation Discrete RV  weighted average Continuous RV  use integral to take the weighted average Variance Variance is the average of (X µ) 2
More informationIntroduction to General and Generalized Linear Models
Introduction to General and Generalized Linear Models General Linear Models  part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK2800 Kgs. Lyngby
More informationData Analysis and Uncertainty Part 3: Hypothesis Testing/Sampling
Data Analysis and Uncertainty Part 3: Hypothesis Testing/Sampling Instructor: Sargur N. University at Buffalo The State University of New York srihari@cedar.buffalo.edu Topics 1. Hypothesis Testing 1.
More informationPart III: Machine Learning. CS 188: Artificial Intelligence. Machine Learning This Set of Slides. Parameter Estimation. Estimation: Smoothing
CS 188: Artificial Intelligence Lecture 20: Dynamic Bayes Nets, Naïve Bayes Pieter Abbeel UC Berkeley Slides adapted from Dan Klein. Part III: Machine Learning Up until now: how to reason in a model and
More informationJoint Probability Distributions and Random Samples. Week 5, 2011 Stat 4570/5570 Material from Devore s book (Ed 8), and Cengage
5 Joint Probability Distributions and Random Samples Week 5, 2011 Stat 4570/5570 Material from Devore s book (Ed 8), and Cengage Two Discrete Random Variables The probability mass function (pmf) of a single
More informationMath/Stat 370: Engineering Statistics, Washington State University
Math/Stat 370: Engineering Statistics, Washington State University Haijun Li lih@math.wsu.edu Department of Mathematics Washington State University Week 2 Haijun Li Math/Stat 370: Engineering Statistics,
More informationFinite Markov Chains and Algorithmic Applications. Matematisk statistik, Chalmers tekniska högskola och Göteborgs universitet
Finite Markov Chains and Algorithmic Applications Olle Häggström Matematisk statistik, Chalmers tekniska högskola och Göteborgs universitet PUBLISHED BY THE PRESS SYNDICATE OF THE UNIVERSITY OF CAMBRIDGE
More informationExamination 110 Probability and Statistics Examination
Examination 0 Probability and Statistics Examination Sample Examination Questions The Probability and Statistics Examination consists of 5 multiplechoice test questions. The test is a threehour examination
More informationRegression Estimation  Least Squares and Maximum Likelihood. Dr. Frank Wood
Regression Estimation  Least Squares and Maximum Likelihood Dr. Frank Wood Least Squares Max(min)imization Function to minimize w.r.t. b 0, b 1 Q = n (Y i (b 0 + b 1 X i )) 2 i=1 Minimize this by maximizing
More information4. Joint Distributions of Two Random Variables
4. Joint Distributions of Two Random Variables 4.1 Joint Distributions of Two Discrete Random Variables Suppose the discrete random variables X and Y have supports S X and S Y, respectively. The joint
More informationMarkov Chains. Chapter Introduction and Definitions 110SOR201(2002)
page 5 SOR() Chapter Markov Chains. Introduction and Definitions Consider a sequence of consecutive times ( or trials or stages): n =,,,... Suppose that at each time a probabilistic experiment is performed,
More informationLess naive Bayes spam detection
Less naive Bayes spam detection Hongming Yang Eindhoven University of Technology Dept. EE, Rm PT 3.27, P.O.Box 53, 5600MB Eindhoven The Netherlands. Email:h.m.yang@tue.nl also CoSiNe Connectivity Systems
More informationORF 245 Fundamentals of Statistics Chapter 8 Hypothesis Testing
ORF 245 Fundamentals of Statistics Chapter 8 Hypothesis Testing Robert Vanderbei Fall 2015 Slides last edited on December 11, 2015 http://www.princeton.edu/ rvdb Coin Tossing Example Consider two coins.
More informationThe Exponential Family
The Exponential Family David M. Blei Columbia University November 3, 2015 Definition A probability density in the exponential family has this form where p.x j / D h.x/ expf > t.x/ a./g; (1) is the natural
More informationChapter 10: Introducing Probability
Chapter 10: Introducing Probability Randomness and Probability So far, in the first half of the course, we have learned how to examine and obtain data. Now we turn to another very important aspect of Statistics
More informationHidden Markov Model. Jia Li. Department of Statistics The Pennsylvania State University. Hidden Markov Model
Hidden Markov Model Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Hidden Markov Model Hidden Markov models have close connection with mixture models. A mixture model
More informationRandom Variables and Their Expected Values
Discrete and Continuous Random Variables The Probability Mass Function The (Cumulative) Distribution Function Discrete and Continuous Random Variables The Probability Mass Function The (Cumulative) Distribution
More informationProbabilistic Linear Classification: Logistic Regression. Piyush Rai IIT Kanpur
Probabilistic Linear Classification: Logistic Regression Piyush Rai IIT Kanpur Probabilistic Machine Learning (CS772A) Jan 18, 2016 Probabilistic Machine Learning (CS772A) Probabilistic Linear Classification:
More informationBayesian networks. Data Mining. Abraham Otero. Data Mining. Agenda. Bayes' theorem Naive Bayes Bayesian networks Bayesian networks (example)
Bayesian networks 1/40 Agenda Introduction Bayes' theorem Bayesian networks Quick reference 2/40 1 Introduction Probabilistic methods: They are backed by a strong mathematical formalism. They can capture
More informationCourse: Model, Learning, and Inference: Lecture 5
Course: Model, Learning, and Inference: Lecture 5 Alan Yuille Department of Statistics, UCLA Los Angeles, CA 90095 yuille@stat.ucla.edu Abstract Probability distributions on structured representation.
More informationClassification: Naïve Bayes Classifier Evaluation. Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining
Classification: Naïve Bayes Classifier Evaluation Toon Calders ( t.calders@tue.nl ) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining Last Lecture Classification
More informationThe basics of probability theory. Distribution of variables, some important distributions
The basics of probability theory. Distribution of variables, some important distributions 1 Random experiment The outcome is not determined uniquely by the considered conditions. For example, tossing a
More informationLecture.7 Poisson Distributions  properties, Normal Distributions properties. Theoretical Distributions. Discrete distribution
Lecture.7 Poisson Distributions  properties, Normal Distributions properties Theoretical distributions are Theoretical Distributions 1. Binomial distribution 2. Poisson distribution Discrete distribution
More informationResponse variables assume only two values, say Y j = 1 or = 0, called success and failure (spam detection, credit scoring, contracting.
Prof. Dr. J. Franke All of Statistics 1.52 Binary response variables  logistic regression Response variables assume only two values, say Y j = 1 or = 0, called success and failure (spam detection, credit
More informationMath/Stat 3601: Probability and Statistics, Washington State University
Math/Stat 3601: Probability and Statistics, Washington State University Haijun Li lih@math.wsu.edu Department of Mathematics Washington State University Week 3 Haijun Li Math/Stat 3601: Probability and
More informationCell Phone based Activity Detection using Markov Logic Network
Cell Phone based Activity Detection using Markov Logic Network Somdeb Sarkhel sxs104721@utdallas.edu 1 Introduction Mobile devices are becoming increasingly sophisticated and the latest generation of smart
More information4. Introduction to Statistics
Statistics for Engineers 41 4. Introduction to Statistics Descriptive Statistics Types of data A variate or random variable is a quantity or attribute whose value may vary from one unit of investigation
More informationMachine Learning in Spam Filtering
Machine Learning in Spam Filtering A Crash Course in ML Konstantin Tretyakov kt@ut.ee Institute of Computer Science, University of Tartu Overview Spam is Evil ML for Spam Filtering: General Idea, Problems.
More informationMachine Learning I Week 14: Sequence Learning Introduction
Machine Learning I Week 14: Sequence Learning Introduction Alex Graves Technische Universität München 29. January 2009 Literature Pattern Recognition and Machine Learning Chapter 13: Sequential Data Christopher
More informationFundamental Probability and Statistics
Fundamental Probability and Statistics "There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don't know. But there are
More informationMachine Learning Overview
Machine Learning Overview Sargur N. Srihari University at Buffalo, State University of New York USA 1 Outline 1. What is Machine Learning (ML)? 1. As a scientific Discipline 2. As an area of Computer Science/AI
More informationBayesian Methods of Parameter Estimation
Bayesian Methods of Parameter Estimation Aciel Eshky University of Edinburgh School of Informatics Introduction In order to motivate the idea of parameter estimation we need to first understand the notion
More information13.3 Inference Using Full Joint Distribution
191 The probability distribution on a single variable must sum to 1 It is also true that any joint probability distribution on any set of variables must sum to 1 Recall that any proposition a is equivalent
More information2.8 Expected values and variance
y 1 b a 0 y 0.0 0.2 0.4 0.6 0.8 1.0 1.2 0 a b (a) Probability density function x 0 a b (b) Cumulative distribution function x Figure 2.3: The probability density function and cumulative distribution function
More informationProbability, Conditional Independence
Probability, Conditional Independence June 19, 2012 Probability, Conditional Independence Probability Sample space Ω of events Each event ω Ω has an associated measure Probability of the event P(ω) Axioms
More informationLogistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression
Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max
More informationIntroduction to latent variable models
Introduction to latent variable models lecture 1 Francesco Bartolucci Department of Economics, Finance and Statistics University of Perugia, IT bart@stat.unipg.it Outline [2/24] Latent variables and their
More informationIntroduction to Statistical Machine Learning
CHAPTER Introduction to Statistical Machine Learning We start with a gentle introduction to statistical machine learning. Readers familiar with machine learning may wish to skip directly to Section 2,
More information1 A simple example. A short introduction to Bayesian statistics, part I Math 218, Mathematical Statistics D Joyce, Spring 2016
and P (B X). In order to do find those conditional probabilities, we ll use Bayes formula. We can easily compute the reverse probabilities A short introduction to Bayesian statistics, part I Math 18, Mathematical
More informationMathematical Background
Appendix A Mathematical Background A.1 Joint, Marginal and Conditional Probability Let the n (discrete or continuous) random variables y 1,..., y n have a joint joint probability probability p(y 1,...,
More informationLogic, Probability and Learning
Logic, Probability and Learning Luc De Raedt luc.deraedt@cs.kuleuven.be Overview Logic Learning Probabilistic Learning Probabilistic Logic Learning Closely following : Russell and Norvig, AI: a modern
More informationDiscrete Binary Distributions
Discrete Binary Distributions Carl Edward Rasmussen November th, 26 Carl Edward Rasmussen Discrete Binary Distributions November th, 26 / 4 Key concepts Bernoulli: probabilities over binary variables Binomial:
More informationLecture 2: Statistical Estimation and Testing
Bioinformatics: Indepth PROBABILITY AND STATISTICS Spring Semester 2012 Lecture 2: Statistical Estimation and Testing Stefanie Muff (stefanie.muff@ifspm.uzh.ch) 1 Problems in statistics 2 The three main
More information