# Gaussian Classifiers CS498

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Gaussian Classifiers CS498

2 Today s lecture The Gaussian Gaussian classifiers A slightly more sophisticated classifier

3 Nearest Neighbors We can classify with nearest neighbors x m 1 m 2 Decision boundary Can we get a probability?

4 Nearest Neighbors Nearest neighbors offers an intuitive distance measure x d 1 d 2 m 1 m 2 d i (x 1 m i,1 ) 2 +(x 2 m i,2 ) 2 = x m

5 Making a Soft Decision What if I didn t want to classify What if I wanted a degree of belief How would you do that? x d 1 d 2 m 1 m 2

6 From a Distance to a Probability If the distance is 0 the probability is high If the distance is the probability is zero How do we make a function like that?

7 Here s a first crack at it Use exponentiation: 1 e x m

8 Adding an importance factor Let s try to tune the output by adding a factor denoting importance e x m c

9 One more problem Not all dimensions are equal

10 Adding variable importance to dimensions Somewhat more complicated now: e (x m)t C 1 (x m)

11 The Gaussian Distribution This is the idea behind the Gaussian Adding some normalization we get: P(x;m,C) = 1 2π k/2 e (x m) 1/2 C T C 1 (x m)

12 Gaussian models We can now describe data using Gaussians x How? That s very easy

13 Learn Gaussian parameters Estimate the mean: m = 1 x N i Estimate the covariance: C = 1 N 1 (x m)t (x m)

14 Now we can make classifiers We will use probabilities this time We ll compute a belief of class assignment

15 The Classification Process We provide examples of classes We make models of each class We assign all new input data to a class

16 Making an assignment decision Face classification example Having a probability for each face how do we make a decision?

17 Motivating example Face 1 is more likely x y P(y {face 1, face 2 }) Template face 1 Template face 2 Unknown face

18 How the decision is made In simple cases the answer is intuitive To get a complete picture we need to probe a bit deeper

19 Starting simple Two class case, ω 1 and ω 2

20 Starting simple Given a sample x, is it ω 1 or ω 2? i.e. P(ω i x) =?

21 Getting the answer The class posterior probability is: Likelihood P(ω i x) = P(x ω i )P(ω i ) P(x) Evidence Priors To find the answer we need to fill in the terms in the right-hand-side

22 Filling the unknowns Class priors How much of each class? P(ω 1 ) N 1 / N P(ω 2 ) N 2 / N Class likelihood: P(x ω i ) Requires that we know the distribution of ω i We ll assume it is the Gaussian

23 Filling the unknowns Evidence: P(x) = P(x ω 1 )P(ω 1 ) + P(x ω 2 )P(ω 2 ) We now have P(ω 1 x),p(ω 2 x)

24 Making the decision Bayes classification rule If P(ω 1 x) > P(ω 2 x) then x belongs to class ω 1 If P(ω 1 x) < P(ω 2 x) then x belongs to class ω 2 Easier version P(x ω 1 )P(ω 1 ) P(x ω 2 )P(ω 2 ) Equiprobable class version P(x ω 1 ) P(x ω 2 )

25 Visualizing the decision Assume Gaussian data P(x ω i ) = N (x µ i,σ i ) R 1 R 2 P(ω 1 x) = P(ω 2 x) x 0

26 Errors in classification We can t win all the time though Some inputs will be misclassified R 1 R 2 Probability of errors P error = 1 2 x 0 P(x ω 2 )dx x 0 x 0 P(x ω 1 )dx

27 Minimizing misclassifications The Bayes classification rule minimizes any potential misclassifications R 1 R 2 x 0 Can you do any better moving the line?

28 Minimizing risk Not all errors are equal! e.g. medical diagnoses R 1 R 2 x 0 Misclassification is often very tolerable depending on the assumed risks

29 Example Minimum classification error R 1 R 2 x 0 Minimum risk with λ 21 > λ 12 i.e. class 2 is more important R 1 R 2 x 0

30 True/False Positives/Negatives Naming the outcomes classifying for ω 1 x is ω 1 x is ω 2 x classified as ω 1 True positive False positive x classified as ω 2 False negative True negative False positive/false alarm/type I error False negative/miss/type II error

31 Receiver Operating Characteristic Visualize classification balance R.O.C. Curve R 1 R 2 x 0 True positive rate False positive rate

32 Classifying Gaussian data Remember that we need the class likelihood to make a decision For now we ll assume that: P(x ω i ) = N (x µ i,σ i ) i.e. that the input data is Gaussian distributed

33 Overall methodology Obtain training data Fit a Gaussian model to each class Perform parameter estimation for mean, variance and class priors Define decision regions based on models and any given constraints

34 1D example The decision boundary will always be a line separating the two class regions R 1 R 2 x 0

35 2D example

36 2D example fitted Gaussians

37 Gaussian decision boundaries The decision boundary is defined as: P(x ω 1 )P(ω 1 ) = P(x ω 2 )P(ω 2 ) We can substitute Gaussians and solve to find what the boundary looks like

38 Discriminant functions Define a function so that: classify x in ω i if g i (x) > g j (x), i j Decision boundaries are now defined as: g ij (x) ( g i (x) = g j (x))

39 Back to the data Σ i = σ 2 i I produces line boundaries Discriminant: g i (x) = w i T x + b w i = µ i / σ 2 b = µ i T µ i 2σ 2 + log P(ω i )

40 Quadratic boundaries Arbitrary covariance produces more elaborate patterns in the boundary g i (x) = x T W i x + w i T x + w i W i = 1 2 Σ 1 i w i = Σ i 1 µ i w = 1 2 µ T Σ 1 i i µ i 1 2 log Σ i + log P(ω i )

41 Quadratic boundaries Arbitrary covariance produces more elaborate patterns in the boundary

42 Quadratic boundaries Arbitrary covariance produces more elaborate patterns in the boundary

43 Example classification run Learning to recognize two handwritten digits Let s try this in MATLAB

44 Recap The Gaussian Bayesian Decision Theory Risk, decision regions Gaussian classifiers

### Bayesian Classification

CS 650: Computer Vision Bryan S. Morse BYU Computer Science Statistical Basis Training: Class-Conditional Probabilities Suppose that we measure features for a large training set taken from class ω i. Each

### Bayesian Decision Theory. Prof. Richard Zanibbi

Bayesian Decision Theory Prof. Richard Zanibbi Bayesian Decision Theory The Basic Idea To minimize errors, choose the least risky class, i.e. the class for which the expected loss is smallest Assumptions

### Introduction to Machine Learning

Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 5: Decision Theory & ROC Curves Gaussian ML Estimation Many figures courtesy Kevin Murphy s textbook,

### Statistical Machine Learning

Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

### A crash course in probability and Naïve Bayes classification

Probability theory A crash course in probability and Naïve Bayes classification Chapter 9 Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s

### Probabilities for machine learning

Probabilities for machine learning Roland Memisevic February 2, 2006 Why probabilities? One of the hardest problems when building complex intelligent systems is brittleness. How can we keep tiny irregularities

### STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct

### CS340 Machine learning Naïve Bayes classifiers

CS340 Machine learning Naïve Bayes classifiers Document classification Let Y {1,,C} be the class label and x {0,1} d eg Y {spam, urgent, normal}, x i = I(word i is present in message) Bag of words model

### Final Exam, Spring 2007

10-701 Final Exam, Spring 2007 1. Personal info: Name: Andrew account: E-mail address: 2. There should be 16 numbered pages in this exam (including this cover sheet). 3. You can use any material you brought:

### Lecture 2: October 24

Machine Learning: Foundations Fall Semester, 2 Lecture 2: October 24 Lecturer: Yishay Mansour Scribe: Shahar Yifrah, Roi Meron 2. Bayesian Inference - Overview This lecture is going to describe the basic

### L10: Probability, statistics, and estimation theory

L10: Probability, statistics, and estimation theory Review of probability theory Bayes theorem Statistics and the Normal distribution Least Squares Error estimation Maximum Likelihood estimation Bayesian

### CSCI567 Machine Learning (Fall 2014)

CSCI567 Machine Learning (Fall 2014) Drs. Sha & Liu {feisha,yanliu.cs}@usc.edu September 22, 2014 Drs. Sha & Liu ({feisha,yanliu.cs}@usc.edu) CSCI567 Machine Learning (Fall 2014) September 22, 2014 1 /

### CSCC11 Assignment 2: Probability, Estimation, and Classification

CSCC11 Assignment 2: Probability, Estimation, and Classification Written Part (5%): due Friday, October 7th, 11am In the following questions be sure to show all your work. Answers without derivations will

### Math 2015 Lesson 21. We discuss the mean and the median, two important statistics about a distribution. p(x)dx = 0.5

ean and edian We discuss the mean and the median, two important statistics about a distribution. The edian The median is the halfway point of a distribution. It is the point where half the population has

### Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris

Class #6: Non-linear classification ML4Bio 2012 February 17 th, 2012 Quaid Morris 1 Module #: Title of Module 2 Review Overview Linear separability Non-linear classification Linear Support Vector Machines

### CS 688 Pattern Recognition Lecture 4. Linear Models for Classification

CS 688 Pattern Recognition Lecture 4 Linear Models for Classification Probabilistic generative models Probabilistic discriminative models 1 Generative Approach ( x ) p C k p( C k ) Ck p ( ) ( x Ck ) p(

### Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

### L4: Bayesian Decision Theory

L4: Bayesian Decision Theory Likelihood ratio test Probability of error Bayes risk Bayes, MAP and ML criteria Multi-class problems Discriminant functions CSCE 666 Pattern Analysis Ricardo Gutierrez-Osuna

### Christfried Webers. Canberra February June 2015

c Statistical Group and College of Engineering and Computer Science Canberra February June (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 829 c Part VIII Linear Classification 2 Logistic

### Linear Classification. Volker Tresp Summer 2015

Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong

### Bayes and Naïve Bayes. cs534-machine Learning

Bayes and aïve Bayes cs534-machine Learning Bayes Classifier Generative model learns Prediction is made by and where This is often referred to as the Bayes Classifier, because of the use of the Bayes rule

### Parametric Models Part I: Maximum Likelihood and Bayesian Density Estimation

Parametric Models Part I: Maximum Likelihood and Bayesian Density Estimation Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2015 CS 551, Fall 2015

### Linear Models for Classification

Linear Models for Classification Sumeet Agarwal, EEL709 (Most figures from Bishop, PRML) Approaches to classification Discriminant function: Directly assigns each data point x to a particular class Ci

### Mixture Models. Jia Li. Department of Statistics The Pennsylvania State University. Mixture Models

Mixture Models Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Clustering by Mixture Models General bacground on clustering Example method: -means Mixture model based

### Classification Problems

Classification Read Chapter 4 in the text by Bishop, except omit Sections 4.1.6, 4.1.7, 4.2.4, 4.3.3, 4.3.5, 4.3.6, 4.4, and 4.5. Also, review sections 1.5.1, 1.5.2, 1.5.3, and 1.5.4. Classification Problems

### Linear Threshold Units

Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear

### Support Vector Machines

Support Vector Machines Here we approach the two-class classification problem in a direct way: We try and find a plane that separates the classes in feature space. If we cannot, we get creative in two

### MACHINE LEARNING IN HIGH ENERGY PHYSICS

MACHINE LEARNING IN HIGH ENERGY PHYSICS LECTURE #1 Alex Rogozhnikov, 2015 INTRO NOTES 4 days two lectures, two practice seminars every day this is introductory track to machine learning kaggle competition!

### Clustering - example. Given some data x i X Find a partitioning of the data into k disjunctive clusters Example: k-means clustering

Clustering - example Graph Mining and Graph Kernels Given some data x i X Find a partitioning of the data into k disjunctive clusters Example: k-means clustering x!!!!8!!! 8 x 1 1 Clustering - example

### Bayesian Learning. CSL465/603 - Fall 2016 Narayanan C Krishnan

Bayesian Learning CSL465/603 - Fall 2016 Narayanan C Krishnan ckn@iitrpr.ac.in Outline Bayes Theorem MAP Learners Bayes optimal classifier Naïve Bayes classifier Example text classification Bayesian networks

### General Naïve Bayes. CS 188: Artificial Intelligence Fall Example: Spam Filtering. Example: OCR. Generalization and Overfitting

CS 88: Artificial Intelligence Fall 7 Lecture : Perceptrons //7 General Naïve Bayes A general naive Bayes model: C x E n parameters C parameters n x E x C parameters E E C E n Dan Klein UC Berkeley We

### Estimating the Inverse Covariance Matrix of Independent Multivariate Normally Distributed Random Variables

Estimating the Inverse Covariance Matrix of Independent Multivariate Normally Distributed Random Variables Dominique Brunet, Hanne Kekkonen, Vitor Nunes, Iryna Sivak Fields-MITACS Thematic Program on Inverse

### Solving Challenging Non-linear Regression Problems by Manipulating a Gaussian Distribution

Solving Challenging Non-linear Regression Problems by Manipulating a Gaussian Distribution Sheffield Gaussian Process Summer School, 214 Carl Edward Rasmussen Department of Engineering, University of Cambridge

### Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval

Information Retrieval INFO 4300 / CS 4300! Retrieval models Older models» Boolean retrieval» Vector Space model Probabilistic Models» BM25» Language models Web search» Learning to Rank Search Taxonomy!

### Evaluating Machine Learning Algorithms. Machine DIT

Evaluating Machine Learning Algorithms Machine Learning @ DIT 2 Evaluating Classification Accuracy During development, and in testing before deploying a classifier in the wild, we need to be able to quantify

### EE 368 Project: Face Detection in Color Images

EE 368 Project: Face Detection in Color Images Wenmiao Lu and Shaohua Sun Department of Electrical Engineering Stanford University May 26, 2003 Abstract We present in this report an approach to automatic

### Confidence Intervals for the Area Under an ROC Curve

Chapter 261 Confidence Intervals for the Area Under an ROC Curve Introduction Receiver operating characteristic (ROC) curves are used to assess the accuracy of a diagnostic test. The technique is used

### Basics of Statistical Machine Learning

CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar

### Why the Normal Distribution?

Why the Normal Distribution? Raul Rojas Freie Universität Berlin Februar 2010 Abstract This short note explains in simple terms why the normal distribution is so ubiquitous in pattern recognition applications.

### Statistical Data Mining. Practical Assignment 3 Discriminant Analysis and Decision Trees

Statistical Data Mining Practical Assignment 3 Discriminant Analysis and Decision Trees In this practical we discuss linear and quadratic discriminant analysis and tree-based classification techniques.

### 1 Maximum likelihood estimation

COS 424: Interacting with Data Lecturer: David Blei Lecture #4 Scribes: Wei Ho, Michael Ye February 14, 2008 1 Maximum likelihood estimation 1.1 MLE of a Bernoulli random variable (coin flips) Given N

### Classification: Naïve Bayes Classifier Evaluation. Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining

Classification: Naïve Bayes Classifier Evaluation Toon Calders ( t.calders@tue.nl ) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining Last Lecture Classification

### PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical

### Probability & Statistics Primer Gregory J. Hakim University of Washington 2 January 2009 v2.0

Probability & Statistics Primer Gregory J. Hakim University of Washington 2 January 2009 v2.0 This primer provides an overview of basic concepts and definitions in probability and statistics. We shall

### Lecture 22: Introduction to Log-linear Models

Lecture 22: Introduction to Log-linear Models Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina

### Machine Learning Final Project Spam Email Filtering

Machine Learning Final Project Spam Email Filtering March 2013 Shahar Yifrah Guy Lev Table of Content 1. OVERVIEW... 3 2. DATASET... 3 2.1 SOURCE... 3 2.2 CREATION OF TRAINING AND TEST SETS... 4 2.3 FEATURE

### Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training

Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Charles Elkan elkan@cs.ucsd.edu January 10, 2014 1 Principle of maximum likelihood Consider a family of probability distributions

### Markov chains and Markov Random Fields (MRFs)

Markov chains and Markov Random Fields (MRFs) 1 Why Markov Models We discuss Markov models now. This is the simplest statistical model in which we don t assume that all variables are independent; we assume

### Part III: Machine Learning. CS 188: Artificial Intelligence. Machine Learning This Set of Slides. Parameter Estimation. Estimation: Smoothing

CS 188: Artificial Intelligence Lecture 20: Dynamic Bayes Nets, Naïve Bayes Pieter Abbeel UC Berkeley Slides adapted from Dan Klein. Part III: Machine Learning Up until now: how to reason in a model and

### Lecture 9: Continuous

CSC2515 Fall 2007 Introduction to Machine Learning Lecture 9: Continuous Latent Variable Models 1 Example: continuous underlying variables What are the intrinsic latent dimensions in these two datasets?

### Basics Inversion and related concepts Random vectors Matrix calculus. Matrix algebra. Patrick Breheny. January 20

Matrix algebra January 20 Introduction Basics The mathematics of multiple regression revolves around ordering and keeping track of large arrays of numbers and solving systems of equations The mathematical

### Mathematical Background

Appendix A Mathematical Background A.1 Joint, Marginal and Conditional Probability Let the n (discrete or continuous) random variables y 1,..., y n have a joint joint probability probability p(y 1,...,

### Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

### Lecture 9: Bayesian Learning

Lecture 9: Bayesian Learning Cognitive Systems II - Machine Learning SS 2005 Part II: Special Aspects of Concept Learning Bayes Theorem, MAL / ML hypotheses, Brute-force MAP LEARNING, MDL principle, Bayes

### Set theory, and set operations

1 Set theory, and set operations Sayan Mukherjee Motivation It goes without saying that a Bayesian statistician should know probability theory in depth to model. Measure theory is not needed unless we

### Lecture 33: Area by slicing

Lecture : Area by slicing Nathan Pflueger December 1 1 Introduction In this lecture and the next, we ll revisit the idea of Riemann sums, and show how it can be used to convert certain problems to computing

### Lecture 4: Thresholding

Lecture 4: Thresholding c Bryan S. Morse, Brigham Young University, 1998 2000 Last modified on Wednesday, January 12, 2000 at 10:00 AM. Reading SH&B, Section 5.1 4.1 Introduction Segmentation involves

### Support Vector Machine (SVM)

Support Vector Machine (SVM) CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin

### Generating Gaussian Mixture Models by Model Selection For Speech Recognition

Generating Gaussian Mixture Models by Model Selection For Speech Recognition Kai Yu F06 10-701 Final Project Report kaiy@andrew.cmu.edu Abstract While all modern speech recognition systems use Gaussian

### Pooling and Meta-analysis. Tony O Hagan

Pooling and Meta-analysis Tony O Hagan Pooling Synthesising prior information from several experts 2 Multiple experts The case of multiple experts is important When elicitation is used to provide expert

### An Introduction to Bayesian Statistics

An Introduction to Bayesian Statistics Robert Weiss Department of Biostatistics UCLA School of Public Health robweiss@ucla.edu April 2011 Robert Weiss (UCLA) An Introduction to Bayesian Statistics UCLA

### Linear Discrimination. Linear Discrimination. Linear Discrimination. Linearly Separable Systems Pairwise Separation. Steven J Zeil.

Steven J Zeil Old Dominion Univ. Fall 200 Discriminant-Based Classification Linearly Separable Systems Pairwise Separation 2 Posteriors 3 Logistic Discrimination 2 Discriminant-Based Classification Likelihood-based:

### CS395T Computational Statistics with Application to Bioinformatics

CS395T Computational Statistics with Application to Bioinformatics Prof. William H. Press Spring Term, 2010 The University of Texas at Austin Unit 6: Multivariate Normal Distributions and Chi Square The

### Sufficient Statistics and Exponential Family. 1 Statistics and Sufficient Statistics. Math 541: Statistical Theory II. Lecturer: Songfeng Zheng

Math 541: Statistical Theory II Lecturer: Songfeng Zheng Sufficient Statistics and Exponential Family 1 Statistics and Sufficient Statistics Suppose we have a random sample X 1,, X n taken from a distribution

### Feature Extraction and Selection. More Info == Better Performance? Curse of Dimensionality. Feature Space. High-Dimensional Spaces

More Info == Better Performance? Feature Extraction and Selection APR Course, Delft, The Netherlands Marco Loog Feature Space Curse of Dimensionality A p-dimensional space, in which each dimension is a

### Classification Techniques for Remote Sensing

Classification Techniques for Remote Sensing Selim Aksoy Department of Computer Engineering Bilkent University Bilkent, 06800, Ankara saksoy@cs.bilkent.edu.tr http://www.cs.bilkent.edu.tr/ saksoy/courses/cs551

### Summary of Probability

Summary of Probability Mathematical Physics I Rules of Probability The probability of an event is called P(A), which is a positive number less than or equal to 1. The total probability for all possible

### Pattern Analysis. Logistic Regression. 12. Mai 2009. Joachim Hornegger. Chair of Pattern Recognition Erlangen University

Pattern Analysis Logistic Regression 12. Mai 2009 Joachim Hornegger Chair of Pattern Recognition Erlangen University Pattern Analysis 2 / 43 1 Logistic Regression Posteriors and the Logistic Function Decision

### Basics of Probability

Basics of Probability 1 Sample spaces, events and probabilities Begin with a set Ω the sample space e.g., 6 possible rolls of a die. ω Ω is a sample point/possible world/atomic event A probability space

### Classifiers & Classification

Classifiers & Classification Forsyth & Ponce Computer Vision A Modern Approach chapter 22 Pattern Classification Duda, Hart and Stork School of Computer Science & Statistics Trinity College Dublin Dublin

### 2.8 Expected values and variance

y 1 b a 0 y 0.0 0.2 0.4 0.6 0.8 1.0 1.2 0 a b (a) Probability density function x 0 a b (b) Cumulative distribution function x Figure 2.3: The probability density function and cumulative distribution function

### Fisher Information and Cramér-Rao Bound. 1 Fisher Information. Math 541: Statistical Theory II. Instructor: Songfeng Zheng

Math 54: Statistical Theory II Fisher Information Cramér-Rao Bound Instructor: Songfeng Zheng In the parameter estimation problems, we obtain information about the parameter from a sample of data coming

### Topic 2: Scalar random variables. Definition of random variables

Topic 2: Scalar random variables Discrete and continuous random variables Probability distribution and densities (cdf, pmf, pdf) Important random variables Expectation, mean, variance, moments Markov and

### Neural Networks. CAP5610 Machine Learning Instructor: Guo-Jun Qi

Neural Networks CAP5610 Machine Learning Instructor: Guo-Jun Qi Recap: linear classifier Logistic regression Maximizing the posterior distribution of class Y conditional on the input vector X Support vector

### Lecture 9: Shape Description (Regions)

Lecture 9: Shape Description (Regions) c Bryan S. Morse, Brigham Young University, 1998 2000 Last modified on February 16, 2000 at 4:00 PM Contents 9.1 What Are Descriptors?.........................................

### Detecting Corporate Fraud: An Application of Machine Learning

Detecting Corporate Fraud: An Application of Machine Learning Ophir Gottlieb, Curt Salisbury, Howard Shek, Vishal Vaidyanathan December 15, 2006 ABSTRACT This paper explores the application of several

### 2x + y = 3. Since the second equation is precisely the same as the first equation, it is enough to find x and y satisfying the system

1. Systems of linear equations We are interested in the solutions to systems of linear equations. A linear equation is of the form 3x 5y + 2z + w = 3. The key thing is that we don t multiply the variables

### Introduction to Machine Learning

Introduction to Machine Learning Prof. Alexander Ihler Prof. Max Welling icamp Tutorial July 22 What is machine learning? The ability of a machine to improve its performance based on previous results:

### Logistic Regression. Vibhav Gogate The University of Texas at Dallas. Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld.

Logistic Regression Vibhav Gogate The University of Texas at Dallas Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld. Generative vs. Discriminative Classifiers Want to Learn: h:x Y X features

### Jointly Distributed Random Variables

Jointly Distributed Random Variables COMP 245 STATISTICS Dr N A Heard Contents 1 Jointly Distributed Random Variables 1 1.1 Definition......................................... 1 1.2 Joint cdfs..........................................

### 0.1 Dielectric Slab Waveguide

0.1 Dielectric Slab Waveguide At high frequencies (especially optical frequencies) the loss associated with the induced current in the metal walls is too high. A transmission line filled with dielectric

### Monotonicity Hints. Abstract

Monotonicity Hints Joseph Sill Computation and Neural Systems program California Institute of Technology email: joe@cs.caltech.edu Yaser S. Abu-Mostafa EE and CS Deptartments California Institute of Technology

### Data Modeling & Analysis Techniques. Probability & Statistics. Manfred Huber 2011 1

Data Modeling & Analysis Techniques Probability & Statistics Manfred Huber 2011 1 Probability and Statistics Probability and statistics are often used interchangeably but are different, related fields

### Probabilistic Linear Classification: Logistic Regression. Piyush Rai IIT Kanpur

Probabilistic Linear Classification: Logistic Regression Piyush Rai IIT Kanpur Probabilistic Machine Learning (CS772A) Jan 18, 2016 Probabilistic Machine Learning (CS772A) Probabilistic Linear Classification:

### Parametric Models. dh(t) dt > 0 (1)

Parametric Models: The Intuition Parametric Models As we saw early, a central component of duration analysis is the hazard rate. The hazard rate is the probability of experiencing an event at time t i

### Machine Learning. CS 188: Artificial Intelligence Naïve Bayes. Example: Digit Recognition. Other Classification Tasks

CS 188: Artificial Intelligence Naïve Bayes Machine Learning Up until now: how use a model to make optimal decisions Machine learning: how to acquire a model from data / experience Learning parameters

### D&D, sect : CAPM without risk free asset Main points:

D&D, sect 74 77: CAPM without risk free asset Main points: Consider N risky assets, N > 2, no risk free asset Then the frontier portfolio set is an hyperbola (Mentioned without proof on p 11 of 1 September)

### COMP 551 Applied Machine Learning Lecture 6: Performance evaluation. Model assessment and selection.

COMP 551 Applied Machine Learning Lecture 6: Performance evaluation. Model assessment and selection. Instructor: (jpineau@cs.mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/comp551 Unless otherwise

### Lecture 9: Introduction to Pattern Analysis

Lecture 9: Introduction to Pattern Analysis g Features, patterns and classifiers g Components of a PR system g An example g Probability definitions g Bayes Theorem g Gaussian densities Features, patterns

### Advanced Spatial Statistics Fall 2012 NCSU. Fuentes Lecture notes

Advanced Spatial Statistics Fall 2012 NCSU Fuentes Lecture notes Areal unit data 2 Areal Modelling Areal unit data Key Issues Is there spatial pattern? Spatial pattern implies that observations from units

### Quantum Fields in Curved Spacetime

Quantum Fields in Curved Spacetime Lecture 1: Introduction Finn Larsen Michigan Center for Theoretical Physics Yerevan, August 20, 2016. Intro: Fields Setting: many microscopic degrees of freedom interacting

### The Exponential Family

The Exponential Family David M. Blei Columbia University November 3, 2015 Definition A probability density in the exponential family has this form where p.x j / D h.x/ expf > t.x/ a./g; (1) is the natural

### Joint Probability Distributions and Random Samples (Devore Chapter Five)

Joint Probability Distributions and Random Samples (Devore Chapter Five) 1016-345-01 Probability and Statistics for Engineers Winter 2010-2011 Contents 1 Joint Probability Distributions 1 1.1 Two Discrete

### Machine Learning in Spam Filtering

Machine Learning in Spam Filtering A Crash Course in ML Konstantin Tretyakov kt@ut.ee Institute of Computer Science, University of Tartu Overview Spam is Evil ML for Spam Filtering: General Idea, Problems.

### ECONOMICS 101 WINTER 2004 TA LECTURE 5: GENERAL EQUILIBRIUM

ECONOMICS 0 WINTER 2004 TA LECTURE 5: GENERAL EQUILIBRIUM KENNETH R. AHERN. Production Possibility Frontier We now move from partial equilibrium, where we examined only one market at a time, to general

### ORF 245 Fundamentals of Statistics Chapter 8 Hypothesis Testing

ORF 245 Fundamentals of Statistics Chapter 8 Hypothesis Testing Robert Vanderbei Fall 2015 Slides last edited on December 11, 2015 http://www.princeton.edu/ rvdb Coin Tossing Example Consider two coins.

### CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.

Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott