1 Gaussian Classifiers CS498

2 Today's lecture The Gaussian; Gaussian classifiers; a slightly more sophisticated classifier

3 Nearest Neighbors We can classify with nearest neighbors. [Figure: templates $m_1$ and $m_2$, an input $x$, and the decision boundary] Can we get a probability?

4 Nearest Neighbors Nearest neighbors offers an intuitive distance measure: $d_i = \sqrt{(x_1 - m_{i,1})^2 + (x_2 - m_{i,2})^2} = \lVert x - m_i \rVert$ [Figure: distances $d_1$, $d_2$ from $x$ to templates $m_1$, $m_2$]
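
A quick aside (not in the slides): a minimal NumPy sketch of this distance computation, using made-up template means $m_1$, $m_2$ and a query point $x$:

```python
import numpy as np

# Hypothetical class templates (means) and a query point.
m = np.array([[0.0, 0.0],    # m_1
              [3.0, 3.0]])   # m_2
x = np.array([1.0, 2.0])

# Euclidean distance from x to each template: d_i = ||x - m_i||.
d = np.linalg.norm(x - m, axis=1)
print("distances:", d)             # d_1, d_2
print("class:", np.argmin(d) + 1)  # the nearest template wins
```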

5 Making a Soft Decision What if I didn't want to classify? What if I wanted a degree of belief? How would you do that? [Figure: distances $d_1$, $d_2$ from $x$ to $m_1$, $m_2$]

6 From a Distance to a Probability If the distance is 0 the probability is high. If the distance is $\infty$ the probability is zero. How do we make a function like that?

7 Here's a first crack at it Use exponentiation: $e^{-\lVert x - m \rVert}$ (equal to 1 at distance 0, decaying toward 0 as the distance grows)

8 Adding an importance factor Let's try to tune the output by adding a factor $c$ denoting importance: $e^{-c\,\lVert x - m \rVert}$

9 One more problem Not all dimensions are equal

10 Adding variable importance to dimensions Somewhat more complicated now: $e^{-(x - m)^T C^{-1} (x - m)}$

11 The Gaussian Distribution This is the idea behind the Gaussian. Adding some normalization we get: $P(x; m, C) = \frac{1}{(2\pi)^{k/2}\,\lvert C \rvert^{1/2}}\, e^{-\frac{1}{2}(x - m)^T C^{-1} (x - m)}$
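
As an illustration (not part of the slides), a small NumPy sketch that evaluates this density directly from the formula, with a made-up mean and covariance:

```python
import numpy as np

def gaussian_pdf(x, m, C):
    """Evaluate the k-dimensional Gaussian density P(x; m, C)."""
    k = len(m)
    diff = x - m
    norm = 1.0 / ((2 * np.pi) ** (k / 2) * np.sqrt(np.linalg.det(C)))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(C) @ diff)

# Made-up parameters for illustration.
m = np.array([0.0, 0.0])
C = np.array([[2.0, 0.5],
              [0.5, 1.0]])
print(gaussian_pdf(np.array([1.0, -1.0]), m, C))
```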

12 Gaussian models We can now describe data using Gaussians. How? That's very easy.

13 Learn Gaussian parameters Estimate the mean: $m = \frac{1}{N}\sum_i x_i$ Estimate the covariance: $C = \frac{1}{N-1}\sum_i (x_i - m)^T (x_i - m)$ (with each sample $x_i$ a row vector)
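
A hedged NumPy sketch of these two estimators on made-up data, with samples stored as rows of X to match the slide's row-vector convention:

```python
import numpy as np

# Synthetic data: N two-dimensional samples, one per row.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ np.array([[1.0, 0.3],
                                          [0.0, 0.8]]) + np.array([2.0, -1.0])

N = X.shape[0]
m = X.mean(axis=0)            # m = (1/N) * sum_i x_i
D = X - m
C = (D.T @ D) / (N - 1)       # C = 1/(N-1) * sum_i (x_i - m)^T (x_i - m)

print("mean:", m)
print("covariance:\n", C)
assert np.allclose(C, np.cov(X.T))   # agrees with the built-in estimator
```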

14 Now we can make classifiers We will use probabilities this time. We'll compute a belief of class assignment.

15 The Classification Process We provide examples of classes. We make models of each class. We assign all new input data to a class.

16 Making an assignment decision Face classification example: having a probability for each face, how do we make a decision?

17 Motivating example Face 1 is more likely. [Figure: template face 1, template face 2, an unknown face $x$, and the posterior $P(y \mid x)$ for $y \in \{\text{face}_1, \text{face}_2\}$]

18 How the decision is made In simple cases the answer is intuitive. To get a complete picture we need to probe a bit deeper.

19 Starting simple Two-class case, $\omega_1$ and $\omega_2$

20 Starting simple Given a sample $x$, is it $\omega_1$ or $\omega_2$? i.e. $P(\omega_i \mid x) = \,?$

21 Getting the answer The class posterior probability is: $P(\omega_i \mid x) = \frac{P(x \mid \omega_i)\,P(\omega_i)}{P(x)}$ (likelihood $\times$ prior, divided by the evidence). To find the answer we need to fill in the terms on the right-hand side.

22 Filling the unknowns Class priors: how much of each class? $P(\omega_1) \approx N_1 / N$, $P(\omega_2) \approx N_2 / N$. Class likelihood $P(x \mid \omega_i)$: requires that we know the distribution of $\omega_i$; we'll assume it is Gaussian.

23 Filling the unknowns Evidence: $P(x) = P(x \mid \omega_1)\,P(\omega_1) + P(x \mid \omega_2)\,P(\omega_2)$. We now have $P(\omega_1 \mid x)$ and $P(\omega_2 \mid x)$.

24 Making the decision Bayes classification rule: if $P(\omega_1 \mid x) > P(\omega_2 \mid x)$ then $x$ belongs to class $\omega_1$; if $P(\omega_1 \mid x) < P(\omega_2 \mid x)$ then $x$ belongs to class $\omega_2$. Easier version (the evidence $P(x)$ is common to both posteriors and cancels): compare $P(x \mid \omega_1)\,P(\omega_1) \gtrless P(x \mid \omega_2)\,P(\omega_2)$. Equiprobable class version: compare $P(x \mid \omega_1) \gtrless P(x \mid \omega_2)$.
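
Putting slides 21-24 together (an illustrative sketch, not the lecture's code; the class models and priors below are made up):

```python
import numpy as np

def gaussian_pdf(x, m, C):
    k = len(m)
    diff = x - m
    return np.exp(-0.5 * diff @ np.linalg.inv(C) @ diff) / (
        (2 * np.pi) ** (k / 2) * np.sqrt(np.linalg.det(C)))

# Hypothetical class models and priors (the priors play the role of N1/N, N2/N).
m1, C1, p1 = np.array([0.0, 0.0]), np.eye(2), 0.7
m2, C2, p2 = np.array([2.0, 2.0]), np.eye(2), 0.3

x = np.array([1.2, 0.9])
lik1, lik2 = gaussian_pdf(x, m1, C1), gaussian_pdf(x, m2, C2)
evidence = lik1 * p1 + lik2 * p2     # P(x)
post1 = lik1 * p1 / evidence         # P(w1 | x)
post2 = lik2 * p2 / evidence         # P(w2 | x)

print("P(w1|x) = %.3f, P(w2|x) = %.3f" % (post1, post2))
print("assign to class", 1 if post1 > post2 else 2)
```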

25 Visualizing the decision Assume Gaussian data: $P(x \mid \omega_i) = \mathcal{N}(x \mid \mu_i, \sigma_i)$ [Figure: decision regions $R_1$ and $R_2$ meeting at the point $x_0$ where $P(\omega_1 \mid x) = P(\omega_2 \mid x)$]

26 Errors in classification We can't win all the time though; some inputs will be misclassified. Probability of error (for equiprobable classes): $P_{\text{error}} = \frac{1}{2}\left(\int_{-\infty}^{x_0} P(x \mid \omega_2)\,dx + \int_{x_0}^{\infty} P(x \mid \omega_1)\,dx\right)$ [Figure: the two error tails on either side of $x_0$ in regions $R_1$, $R_2$]
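
To make the integral concrete, a hedged numerical check for two made-up, equiprobable 1D Gaussians (simple Riemann sum; not from the lecture):

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

mu1, mu2, sigma = 0.0, 2.0, 1.0    # illustrative class densities
x0 = (mu1 + mu2) / 2               # boundary for equal priors and equal sigma

xs = np.linspace(-10, 10, 100001)
dx = xs[1] - xs[0]
# P_error = 1/2 * [ int_{-inf}^{x0} P(x|w2) dx + int_{x0}^{inf} P(x|w1) dx ]
err = 0.5 * (normal_pdf(xs[xs < x0], mu2, sigma).sum() * dx +
             normal_pdf(xs[xs >= x0], mu1, sigma).sum() * dx)
print("P_error ~= %.4f" % err)     # about 0.1587 for these parameters
```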

27 Minimizing misclassifications The Bayes classification rule minimizes the probability of misclassification. [Figure: regions $R_1$, $R_2$ split at $x_0$] Can you do any better by moving the line?

28 Minimizing risk Not all errors are equal! e.g. medical diagnoses. How tolerable a misclassification is depends on the assumed risks. [Figure: regions $R_1$, $R_2$ at $x_0$]

29 Example Minimum classification error [Figure: regions $R_1$, $R_2$ split at $x_0$] versus minimum risk with $\lambda_{21} > \lambda_{12}$, i.e. class 2 is more important [Figure: the same densities with the threshold shifted]

30 True/False Positives/Negatives Naming the outcomes when classifying for $\omega_1$: if $x$ is $\omega_1$ and classified as $\omega_1$, a true positive; if $x$ is $\omega_2$ and classified as $\omega_1$, a false positive; if $x$ is $\omega_1$ and classified as $\omega_2$, a false negative; if $x$ is $\omega_2$ and classified as $\omega_2$, a true negative. False positive = false alarm = type I error; false negative = miss = type II error.

31 Receiver Operating Characteristic Visualize the classification balance: sweeping the decision threshold $x_0$ trades off the true positive rate against the false positive rate, tracing the R.O.C. curve. [Figure: regions $R_1$, $R_2$ at $x_0$ and the resulting R.O.C. curve]
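
One way to trace the curve (an illustrative sketch; the densities are the made-up 1D Gaussians from the error example above, with class 1 as "positive" and points left of the threshold classified positive):

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

mu1, mu2, sigma = 0.0, 2.0, 1.0
xs = np.linspace(-8.0, 10.0, 20001)
dx = xs[1] - xs[0]

tpr, fpr = [], []
for x0 in np.linspace(-4.0, 6.0, 101):   # sweep the decision threshold
    left = xs < x0
    tpr.append(normal_pdf(xs[left], mu1, sigma).sum() * dx)  # true positive rate
    fpr.append(normal_pdf(xs[left], mu2, sigma).sum() * dx)  # false positive rate

# The (fpr, tpr) pairs trace the R.O.C. curve; plot them with e.g. matplotlib.
print(list(zip(np.round(fpr[::25], 3), np.round(tpr[::25], 3))))
```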

32 Classifying Gaussian data Remember that we need the class likelihood to make a decision. For now we'll assume that: $P(x \mid \omega_i) = \mathcal{N}(x \mid \mu_i, \sigma_i)$, i.e. that the input data is Gaussian distributed.

33 Overall methodology Obtain training data; fit a Gaussian model to each class; perform parameter estimation for mean, variance and class priors; define decision regions based on the models and any given constraints.

34 1D example The decision boundary will always be a line separating the two class regions. [Figure: regions $R_1$, $R_2$ split at $x_0$]

35 2D example

36 2D example fitted Gaussians

37 Gaussian decision boundaries The decision boundary is defined as: $P(x \mid \omega_1)\,P(\omega_1) = P(x \mid \omega_2)\,P(\omega_2)$. We can substitute Gaussians and solve to find what the boundary looks like.

38 Discriminant functions Define a function so that we classify $x$ in $\omega_i$ if $g_i(x) > g_j(x)$ for all $j \neq i$. Decision boundaries are now defined by $g_{ij}(x) = 0$, where $g_{ij}(x) \equiv g_i(x) - g_j(x)$ (i.e. where $g_i(x) = g_j(x)$).

39 Back to the data $\Sigma_i = \sigma^2 I$ produces linear boundaries. Discriminant: $g_i(x) = w_i^T x + b_i$ with $w_i = \mu_i / \sigma^2$ and $b_i = -\frac{\mu_i^T \mu_i}{2\sigma^2} + \log P(\omega_i)$
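
A compact sketch of this linear discriminant (parameter values are illustrative):

```python
import numpy as np

def linear_g(x, mu, sigma2, prior):
    """g_i(x) = w_i^T x + b_i for the shared-covariance case Sigma = sigma^2 I."""
    w = mu / sigma2                                # w_i = mu_i / sigma^2
    b = -(mu @ mu) / (2 * sigma2) + np.log(prior)  # b_i
    return w @ x + b

mu1, mu2, sigma2 = np.array([0.0, 0.0]), np.array([2.0, 2.0]), 1.0
x = np.array([0.8, 1.4])
g1, g2 = linear_g(x, mu1, sigma2, 0.5), linear_g(x, mu2, sigma2, 0.5)
print("class", 1 if g1 > g2 else 2)
```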

40 Quadratic boundaries Arbitrary covariance produces more elaborate patterns in the boundary: $g_i(x) = x^T W_i x + w_i^T x + w_{i0}$ with $W_i = -\frac{1}{2}\Sigma_i^{-1}$, $w_i = \Sigma_i^{-1}\mu_i$, and $w_{i0} = -\frac{1}{2}\mu_i^T \Sigma_i^{-1} \mu_i - \frac{1}{2}\log\lvert\Sigma_i\rvert + \log P(\omega_i)$
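
And the matching sketch for the quadratic case (again with illustrative parameters):

```python
import numpy as np

def quadratic_g(x, mu, Sigma, prior):
    """g_i(x) = x^T W_i x + w_i^T x + w_i0 for a class-specific covariance."""
    Sinv = np.linalg.inv(Sigma)
    W = -0.5 * Sinv                            # W_i
    w = Sinv @ mu                              # w_i
    w0 = (-0.5 * mu @ Sinv @ mu
          - 0.5 * np.log(np.linalg.det(Sigma))
          + np.log(prior))                     # w_i0
    return x @ W @ x + w @ x + w0

mu1, S1 = np.array([0.0, 0.0]), np.eye(2)
mu2, S2 = np.array([2.0, 2.0]), np.array([[2.0, 0.8],
                                          [0.8, 1.0]])
x = np.array([1.0, 1.5])
print("class", 1 if quadratic_g(x, mu1, S1, 0.5) > quadratic_g(x, mu2, S2, 0.5) else 2)
```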

41 Quadratic boundaries Arbitrary covariance produces more elaborate patterns in the boundary

42 Quadratic boundaries Arbitrary covariance produces more elaborate patterns in the boundary

43 Example classification run Learning to recognize two handwritten digits. Let's try this in MATLAB.
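
The in-class demo is in MATLAB on digit images; as a stand-in, here is a hedged end-to-end sketch in NumPy on synthetic 2D data, following the methodology of slide 33:

```python
import numpy as np

rng = np.random.default_rng(1)

# 1. Obtain (synthetic) training data for two classes.
X1 = rng.multivariate_normal([0, 0], [[1.0, 0.3], [0.3, 1.0]], size=300)
X2 = rng.multivariate_normal([3, 2], [[1.5, -0.4], [-0.4, 0.8]], size=200)

# 2. Fit a Gaussian to each class; estimate priors from class counts.
def fit(X):
    return X.mean(axis=0), np.cov(X.T)

(m1, C1), (m2, C2) = fit(X1), fit(X2)
N = len(X1) + len(X2)
p1, p2 = len(X1) / N, len(X2) / N

# 3. Classify every point with the Bayes rule, via log-likelihoods.
def log_lik(X, m, C):
    d = X - m
    k = X.shape[1]
    return (-0.5 * np.einsum('ij,jk,ik->i', d, np.linalg.inv(C), d)
            - 0.5 * np.log(np.linalg.det(C)) - 0.5 * k * np.log(2 * np.pi))

X = np.vstack([X1, X2])
y = np.array([1] * len(X1) + [2] * len(X2))
pred = np.where(log_lik(X, m1, C1) + np.log(p1) >
                log_lik(X, m2, C2) + np.log(p2), 1, 2)
print("training accuracy: %.3f" % (pred == y).mean())
```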

44 Recap The Gaussian Bayesian Decision Theory Risk, decision regions Gaussian classifiers
