Maximum-Likelihood and Bayesian Parameter Estimation
1 Maximum-Likelihood and Bayesian Parameter Estimation Expectation Maximization (EM)
2 Estimating a Missing Feature Value Estimating a missing variable when the parameters are known: in the absence of a feature value x, the most likely class ω can still be chosen; better, choose the value of x that maximizes the likelihood. Choosing the mean of the missing feature (over all classes) will result in worse performance! This is the case of estimating the hidden variable given the parameters. In EM the unknowns are both: the parameters and the hidden (missing) variables.
3 EM Task Estimate the unknown parameters $\theta$ given measurement data $U$. However, some variables $J$ (the missing variables) need to be integrated out. We want to maximize the posterior probability of $\theta$, given the data $U$, marginalizing over $J$: $\hat{\theta} = \arg\max_{\theta} \sum_{J \in \mathcal{J}} P(\theta, J \mid U)$.
4 EM Principle Estimate the unknown parameters $\theta$ given measurement data $U$, but not the nuisance variables $J$, which need to be integrated out: $\hat{\theta} = \arg\max_{\theta} \sum_{J \in \mathcal{J}} P(\theta, J \mid U)$. Alternate between estimating the unknowns $\theta$ and the hidden variables $J$. At each iteration, instead of finding the best $J \in \mathcal{J}$ given an estimate $\theta$, EM computes a distribution over the space $\mathcal{J}$.
5 k-means Algorithm as EM Estimate the means of k classes when the class labels are unknown. Parameters: the means to be estimated. Hidden variables: the class labels. begin initialize $m_1, m_2, \dots, m_k$; do (E-step) classify the n samples according to the nearest $m_i$; (M-step) recompute $m_i$; until no change in $m_i$; return $m_1, m_2, \dots, m_k$; end. An iterative algorithm derivable from EM.
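The loop above can be sketched in code. This is a minimal 1-D version (the function name, toy data, and random initialization are illustrative, not from the slides):

```python
import random

def k_means(samples, k, iters=100):
    """Hard-assignment EM: the E-step labels each sample with its
    nearest mean, the M-step recomputes each mean from its samples."""
    means = random.sample(samples, k)   # initialize m_1..m_k from the data
    for _ in range(iters):
        # E-step: classify samples according to the nearest mean
        clusters = [[] for _ in range(k)]
        for x in samples:
            nearest = min(range(k), key=lambda j: (x - means[j]) ** 2)
            clusters[nearest].append(x)
        # M-step: recompute each mean from its cluster
        new_means = [sum(c) / len(c) if c else means[j]
                     for j, c in enumerate(clusters)]
        if new_means == means:          # no change in m_i -> stop
            break
        means = new_means
    return sorted(means)
```

For example, `k_means([0.0, 0.5, 1.0, 9.0, 9.5, 10.0], 2)` recovers the two cluster means 0.5 and 9.5.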
6 EM Importance The EM algorithm is widely used for learning in the presence of unobserved variables, e.g., missing features or class labels. It is used even for variables whose values are never directly observed, provided the general form of the pdf governing these variables is known. It has been used to train Radial Basis Function networks and Bayesian belief networks, is the basis for many unsupervised clustering algorithms, and is the basis for the widely used Baum-Welch forward-backward algorithm for HMMs.
7 EM Algorithm Learning in the presence of unobserved variables Only a subset of relevant instance features might be observable Case of unsupervised learning or clustering How many classes are there?
8 EM Principle The EM algorithm iteratively re-estimates the parameters so as to maximize the likelihood given the data that is present.
9 Likelihood Formulation Sample points come from a single distribution, $D = \{x_1, \dots, x_n\}$. Any sample may have good and missing (bad) features, $x_k = \{x_{kg}, x_{kb}\}$, so the features are divided into two sets: $D = D_g \cup D_b$.
10 Central Equation in EM (Likelihood Formulation) $Q(\theta; \theta^i) = E_{D_b}[\ln p(D_g, D_b; \theta) \mid \theta^i, D_g]$. The expected value is taken over the missing features. $\theta^i$ is the current best estimate for the full distribution; $\theta$ is a candidate vector for an improved estimate. The algorithm will select the best candidate $\theta$ and call it $\theta^{i+1}$.
11 Algorithm EM begin initialize $\theta^0$, T, $i \leftarrow 0$; do $i \leftarrow i+1$; E step: compute $Q(\theta; \theta^i)$; M step: $\theta^{i+1} \leftarrow \arg\max_{\theta} Q(\theta; \theta^i)$; until $Q(\theta^{i+1}; \theta^i) - Q(\theta^i; \theta^{i-1}) \le T$; return $\hat{\theta} \leftarrow \theta^{i+1}$; end.
12 EM for a 2D Normal Model Suppose the data consist of 4 points in 2 dimensions, one of which is missing a feature: $D = \{x_1, x_2, x_3, x_4\} = \left\{\binom{0}{2}, \binom{1}{0}, \binom{2}{2}, \binom{*}{4}\right\}$, where $*$ represents the unknown value of the first feature of point $x_4$. Thus our bad data $D_b$ consists of the single feature $x_{41}$, and the good data $D_g$ consists of the rest.
13 EM for a 2D Normal Model Assuming that the model is a Gaussian with diagonal covariance and arbitrary mean, it can be described by the parameter vector $\theta = (\mu_1, \mu_2, \sigma_1^2, \sigma_2^2)^T$.
14 EM for a 2D Normal Model We take our initial guess to be a Gaussian centered on the origin with $\Sigma = I$, that is, $\theta^0 = (0, 0, 1, 1)^T$.
15 EM for a 2D Normal Model To find an improved estimate, we must calculate $Q(\theta; \theta^0) = E_{x_{41}}[\ln p(x_g, x_b; \theta) \mid \theta^0, D_g]$.
16 EM for a 2D Normal Model Simplifying the expectation completes the E step and gives the next estimate $\theta^1 = (0.75, 2.0, 0.938, 2.0)^T$. Iterating to convergence yields the final solution $\theta = (1.0, 2.0, 0.667, 2.0)^T$.
17 EM for a 2D Normal Model The four data points, one missing the value of $x_1$, are shown in red. The initial estimate is a circularly symmetric Gaussian, centered on the origin (gray). (A better initial estimate could have been derived from the 3 known points.) Each iteration leads to an improved estimate, labeled by iteration number i; after 3 iterations, the algorithm has converged.
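As a numerical check on this example, the E- and M-step updates for the single missing feature can be iterated directly (a sketch with my own variable names; the second feature is fully observed, so its ML estimates come out in closed form):

```python
known_x1 = [0.0, 1.0, 2.0]        # observed first features of x1, x2, x3
x2_feats = [2.0, 0.0, 2.0, 4.0]   # second features, all observed

# Fully observed second feature: plain ML estimates.
mu2 = sum(x2_feats) / 4
sig2 = sum((v - mu2) ** 2 for v in x2_feats) / 4

# Missing first feature of x4: EM from the initial guess mu=0, sigma^2=1.
mu1, sig1 = 0.0, 1.0
for _ in range(100):
    # E-step: with diagonal covariance, x41 is independent of x42, so
    # E[x41] = mu1 and E[x41^2] = mu1^2 + sig1 under the current estimate.
    e_x, e_x2 = mu1, mu1 ** 2 + sig1
    # M-step: re-estimate mean and variance using those expectations.
    mu1_new = (sum(known_x1) + e_x) / 4
    sig1 = (sum((v - mu1_new) ** 2 for v in known_x1)
            + e_x2 - 2 * mu1_new * e_x + mu1_new ** 2) / 4
    mu1 = mu1_new

print(round(mu1, 3), round(sig1, 3), mu2, sig2)   # -> 1.0 0.667 2.0 2.0
```

The run converges to $\mu_1 = 1.0$ and $\sigma_1^2 = 0.667$, consistent with the fixed-point equations $\mu_1 = (0 + 1 + 2 + \mu_1)/4$ and $\sigma_1^2 = (2 + \sigma_1^2)/4$.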
18 EM to Estimate the Means of k Gaussians The data D are drawn from a mixture of k distinct normal distributions. A two-step process generates each sample: one of the k distributions is selected at random, then a single random instance $x_i$ is generated according to the selected distribution.
19 Instances Generated by a Mixture of Two Normal Distributions (figure: instances along the x-axis)
20 Example of EM to Estimate the Means of k Gaussians Each instance is generated by choosing one of the k Gaussians with uniform probability and then generating an instance at random according to that Gaussian. Each of the k normal distributions has the same known variance. Learning task: output a hypothesis $h = \langle \mu_1, \dots, \mu_k \rangle$ that describes the means of the k distributions.
21 Estimating the Means of k Gaussians We would like to find a maximum-likelihood hypothesis for these means: a hypothesis h that maximizes $P(D \mid h)$.
22 Maximum-Likelihood Estimate of the Mean of a Single Gaussian Given observed data instances $x_1, x_2, \dots, x_m$ drawn from a single normally distributed source, the problem is to find the mean of that distribution.
23 Maximum-Likelihood Estimate of the Mean of a Single Gaussian The maximum-likelihood estimate of the mean of a normal distribution can be shown to be the one that minimizes the sum of squared errors: $\mu_{ML} = \arg\min_{\mu} \sum_{i=1}^{m} (x_i - \mu)^2$. The right-hand side attains its minimum at $\mu_{ML} = \frac{1}{m}\sum_{i=1}^{m} x_i$, which is the sample mean.
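A quick sanity check of this fact, with toy data of my choosing, via a coarse grid search over candidate means:

```python
xs = [2.0, 3.0, 5.0, 6.0]   # toy sample

def sse(mu):
    """Sum of squared errors that the ML estimate minimizes."""
    return sum((x - mu) ** 2 for x in xs)

sample_mean = sum(xs) / len(xs)
# Grid search over candidate means in [0, 10]: the minimizer of the
# sum of squared errors coincides with the sample mean.
best = min((step / 100 for step in range(0, 1001)), key=sse)
print(sample_mean, best)   # -> 4.0 4.0
```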
24 Mixture of Two Normal Distributions We cannot observe which instances were generated by which distribution. Full description of an instance: $\langle x_i, z_{i1}, z_{i2} \rangle$, where $x_i$ is the observed value of the i-th instance, and $z_{i1}$ and $z_{i2}$ indicate which of the two normal distributions was used to generate $x_i$: $z_{ij} = 1$ if distribution j was used to generate $x_i$, 0 otherwise. $z_{i1}$ and $z_{i2}$ are hidden variables, which have probability distributions associated with them.
25 Hidden Variables Specify the Distribution For example, if $x_i$ was generated by the first normal distribution, then $z_{i1} = 1$ and $z_{i2} = 0$, and the full description of the instance is $(x_i, 1, 0)$.
26 2-Means Problem Full description of an instance: $\langle x_i, z_{i1}, z_{i2} \rangle$, with $x_i$ the observed variable and $z_{i1}, z_{i2}$ hidden variables. If $z_{i1}$ and $z_{i2}$ were observed, we could use the maximum-likelihood estimates for the means: $\mu_j = \frac{\sum_{i=1}^{m} z_{ij} x_i}{\sum_{i=1}^{m} z_{ij}}$. Since we do not know $z_{i1}$ and $z_{i2}$, we will use EM instead.
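To make the "if the labels were observed" case concrete: with hypothetical data and known labels, the ML means are just the per-group sample means.

```python
# Each tuple is (x_i, z_i1, z_i2), with the labels assumed observed.
data = [(-2.0, 1, 0), (-1.0, 1, 0), (3.0, 0, 1), (5.0, 0, 1)]

# mu_j = sum_i z_ij * x_i / sum_i z_ij
mu1 = sum(x * z1 for x, z1, _ in data) / sum(z1 for _, z1, _ in data)
mu2 = sum(x * z2 for x, _, z2 in data) / sum(z2 for _, _, z2 in data)
print(mu1, mu2)   # -> -1.5 4.0
```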
27 EM Algorithm Applied to the k-Means Problem Search for a maximum-likelihood hypothesis by repeatedly re-estimating the expected values of the hidden binary variables $z_{ij}$ given the current hypothesis $\langle \mu_1, \dots, \mu_k \rangle$, then recalculating the maximum-likelihood hypothesis using these expected values for the hidden variables.
28 EM Algorithm for Two Means, with Hidden Variables Regarded as Probabilities 1. Hypothesize the means, then determine the expected values of the hidden variables for all samples. 2. Use these hidden-variable values to recalculate the means.
29 EM Applied to the Two-Means Problem Initialize the hypothesis to $h = \langle \mu_1, \mu_2 \rangle$. Estimate the expected values of the hidden variables $z_{ij}$ given the current hypothesis, then recalculate the maximum-likelihood hypothesis using these expected values. Re-estimate h repeatedly until the procedure converges to a stationary value of h.
30 EM Algorithm for 2-Means Step 1: Calculate the expected value $E[z_{ij}]$ of each hidden variable $z_{ij}$, assuming the current hypothesis $h = \langle \mu_1, \mu_2 \rangle$ holds. Step 2: Calculate a new maximum-likelihood hypothesis $h' = \langle \mu_1', \mu_2' \rangle$, assuming the value taken on by each hidden variable $z_{ij}$ is the expected value $E[z_{ij}]$ calculated in Step 1. Then replace the hypothesis $h = \langle \mu_1, \mu_2 \rangle$ by the new hypothesis $h' = \langle \mu_1', \mu_2' \rangle$ and iterate.
31 EM First Step Calculate the expected value $E[z_{ij}]$ of each hidden variable $z_{ij}$, assuming the current hypothesis $h = \langle \mu_1, \mu_2 \rangle$ holds: $E[z_{ij}] = \frac{p(x = x_i \mid \mu = \mu_j)}{\sum_{n=1}^{2} p(x = x_i \mid \mu = \mu_n)} = \frac{e^{-(x_i - \mu_j)^2 / 2\sigma^2}}{\sum_{n=1}^{2} e^{-(x_i - \mu_n)^2 / 2\sigma^2}}$, the probability that instance $x_i$ was generated by the j-th Gaussian.
32 EM Second Step Calculate a new maximum-likelihood hypothesis $h' = \langle \mu_1', \mu_2' \rangle$ with $\mu_j \leftarrow \frac{\sum_{i=1}^{m} E[z_{ij}]\, x_i}{\sum_{i=1}^{m} E[z_{ij}]}$. Observation: this is similar to the earlier sample-mean calculation for a single Gaussian, $\mu_{ML} = \frac{1}{m}\sum_{i=1}^{m} x_i$, except that each instance is now weighted by $E[z_{ij}]$.
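The two steps can be combined into a small EM routine for the two-means problem (function name, data, and initialization are my own; the variance is assumed known and shared, as the slides state):

```python
import math

def em_two_means(xs, sigma=1.0, mu=(0.0, 1.0), iters=200):
    """EM for a mixture of two equal-variance 1-D Gaussians with unknown
    means: the E-step computes E[z_ij], the M-step takes weighted means."""
    mu1, mu2 = mu
    for _ in range(iters):
        # E-step: E[z_ij] = probability that x_i came from Gaussian j
        resp = []
        for x in xs:
            p1 = math.exp(-(x - mu1) ** 2 / (2 * sigma ** 2))
            p2 = math.exp(-(x - mu2) ** 2 / (2 * sigma ** 2))
            resp.append((p1 / (p1 + p2), p2 / (p1 + p2)))
        # M-step: means weighted by E[z_ij]
        mu1 = sum(r1 * x for (r1, _), x in zip(resp, xs)) / sum(r1 for r1, _ in resp)
        mu2 = sum(r2 * x for (_, r2), x in zip(resp, xs)) / sum(r2 for _, r2 in resp)
    return mu1, mu2
```

On a sample with clusters near -2 and 4, e.g. `em_two_means([-2.1, -1.9, -2.0, 3.9, 4.1, 4.0], mu=(-1.0, 1.0))`, the estimates converge to approximately (-2.0, 4.0).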
33 Clustering using EM
34 Feature Extraction An input image I is passed through feature extraction to produce a feature vector of 74 values.
35 Feature Extraction For an input image I: 2 global features (aspect ratio and stroke ratio) and 72 local features $F(i,j) = \frac{s(i,j)}{N(i)\, S(j)}$ for $i = 1, \dots, 9$ and $j = 0, \dots, 7$, where $s(i,j)$ is the number of components with slope j in subimage i, $N(i)$ is the number of components in subimage i, and $S(j) = \max_i \frac{s(i,j)}{N(i)}$. Together these give a feature vector of 74 values.
36 For One Feature (figure) The class-conditional probabilities $P(I \mid C_1), \dots, P(I \mid C_5)$ are re-estimated over cycles 1 through N; the final cluster centers are the images closest to the means.
37 Clustering (figure) The initial mean for a cluster is initialized with the feature vector of a single image; after a number of cycles the final cluster mean emerges.
38 Essence of the EM Algorithm The current hypothesis is used to estimate the unobserved variables, and the expected values of these variables are then used to calculate an improved hypothesis. It can be shown that on each iteration through the loop, EM increases the likelihood $P(D \mid h)$ unless it is at a local maximum. The algorithm thus converges to a locally maximum-likelihood hypothesis for $\langle \mu_1, \mu_2 \rangle$.
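The monotonicity claim is easy to observe numerically. The sketch below runs EM steps for the two-means problem on made-up data (all names and values are mine) and tracks the mixture log-likelihood, which never decreases:

```python
import math

def loglik(xs, mu1, mu2, sigma=1.0):
    """Log-likelihood of an equal-weight mixture of two Gaussians."""
    c = 1.0 / math.sqrt(2 * math.pi * sigma ** 2)
    return sum(math.log(0.5 * c * math.exp(-(x - mu1) ** 2 / (2 * sigma ** 2))
                        + 0.5 * c * math.exp(-(x - mu2) ** 2 / (2 * sigma ** 2)))
               for x in xs)

xs = [-2.1, -1.9, -2.0, 3.9, 4.1, 4.0]
mu1, mu2 = -0.5, 0.5
lls = [loglik(xs, mu1, mu2)]
for _ in range(10):
    # One EM iteration: E-step responsibilities, M-step weighted means.
    r = [math.exp(-(x - mu1) ** 2 / 2)
         / (math.exp(-(x - mu1) ** 2 / 2) + math.exp(-(x - mu2) ** 2 / 2))
         for x in xs]
    mu1 = sum(ri * x for ri, x in zip(r, xs)) / sum(r)
    mu2 = sum((1 - ri) * x for ri, x in zip(r, xs)) / sum(1 - ri for ri in r)
    lls.append(loglik(xs, mu1, mu2))

print(all(b >= a - 1e-12 for a, b in zip(lls, lls[1:])))   # -> True
```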
39 General Statement of the EM Algorithm In the two-means problem the parameters of interest were $\theta = \langle \mu_1, \mu_2 \rangle$, and the full data were the triples $\langle x_i, z_{i1}, z_{i2} \rangle$, of which only $x_i$ is observed.
40 General Statement of the EM Algorithm (cont'd) Let $X = \{x_1, x_2, \dots, x_m\}$ denote the observed data in a set of m independently drawn instances, let $Z = \{z_1, z_2, \dots, z_m\}$ denote the unobserved data in these same instances, and let $Y = X \cup Z$ denote the full data.
41 General Statement of the EM Algorithm (cont'd) The unobserved Z can be treated as a random variable whose p.d.f. depends on the unknown parameters $\theta$ and on the observed data X. Similarly, Y is a random variable because it is defined in terms of the random variable Z. h denotes the current hypothesized values of the parameters $\theta$; $h'$ denotes the revised hypothesis estimated on each iteration of the EM algorithm.
42 General Statement of the EM Algorithm (cont'd) The EM algorithm searches for the maximum-likelihood hypothesis $h'$ by seeking the $h'$ that maximizes $E[\ln P(Y \mid h')]$.
43 General Statement of the EM Algorithm (cont'd) Define the function $Q(h' \mid h)$ that gives $E[\ln P(Y \mid h')]$ as a function of $h'$, under the assumption that $\theta = h$ and given the observed portion X of the full data Y: $Q(h' \mid h) = E[\ln P(Y \mid h') \mid h, X]$.
44 General Statement of the EM Algorithm Repeat until convergence. Step 1, Estimation (E) step: calculate $Q(h' \mid h)$ using the current hypothesis h and the observed data X to estimate the probability distribution over Y: $Q(h' \mid h) \leftarrow E[\ln P(Y \mid h') \mid h, X]$. Step 2, Maximization (M) step: replace the hypothesis h by the hypothesis $h'$ that maximizes this Q function: $h \leftarrow \arg\max_{h'} Q(h' \mid h)$.
45 Derivation of the k-Means Algorithm from the General EM Algorithm We derive the previously seen algorithm for estimating the means of a mixture of k normal distributions. The parameters to estimate are $\theta = \langle \mu_1, \dots, \mu_k \rangle$; we are given the observed data $X = \{\langle x_i \rangle\}$. The hidden variables $Z = \{\langle z_{i1}, \dots, z_{ik} \rangle\}$ indicate which of the k normal distributions was used to generate $x_i$.
46 Derivation of the k-Means Algorithm from the General EM Algorithm (cont'd) We need to derive an expression for $Q(h' \mid h)$; first we derive an expression for $\ln P(Y \mid h')$.
47 Derivation of the k-Means Algorithm The probability $p(y_i \mid h')$ of a single instance $y_i = \langle x_i, z_{i1}, \dots, z_{ik} \rangle$ of the full data can be written $p(y_i \mid h') = p(x_i, z_{i1}, \dots, z_{ik} \mid h') = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2} \sum_{j=1}^{k} z_{ij} (x_i - \mu_j')^2}$.
48 Derivation of the k-Means Algorithm (cont'd) Given this probability $p(y_i \mid h')$ for a single instance, the log probability $\ln P(Y \mid h')$ for all m instances in the data is $\ln P(Y \mid h') = \ln \prod_{i=1}^{m} p(y_i \mid h') = \sum_{i=1}^{m} \ln p(y_i \mid h') = \sum_{i=1}^{m} \left( \ln \frac{1}{\sqrt{2\pi\sigma^2}} - \frac{1}{2\sigma^2} \sum_{j=1}^{k} z_{ij} (x_i - \mu_j')^2 \right)$.
49 Derivation of the k-Means Algorithm (cont'd) In general, for any function f(z) that is a linear function of z, the equality $E[f(z)] = f(E[z])$ holds. Therefore $E[\ln P(Y \mid h')] = E\left[ \sum_{i=1}^{m} \left( \ln \frac{1}{\sqrt{2\pi\sigma^2}} - \frac{1}{2\sigma^2} \sum_{j=1}^{k} z_{ij} (x_i - \mu_j')^2 \right) \right] = \sum_{i=1}^{m} \left( \ln \frac{1}{\sqrt{2\pi\sigma^2}} - \frac{1}{2\sigma^2} \sum_{j=1}^{k} E[z_{ij}] (x_i - \mu_j')^2 \right)$.
50 Derivation of the k-Means Algorithm To summarize, where $h = \langle \mu_1, \dots, \mu_k \rangle$ and $E[z_{ij}]$ is calculated based on the current hypothesis h and the observed data X, the Q function for the k-means problem is $Q(h' \mid h) = \sum_{i=1}^{m} \left( \ln \frac{1}{\sqrt{2\pi\sigma^2}} - \frac{1}{2\sigma^2} \sum_{j=1}^{k} E[z_{ij}] (x_i - \mu_j')^2 \right)$, where $E[z_{ij}] = \frac{e^{-(x_i - \mu_j)^2 / 2\sigma^2}}{\sum_{n=1}^{k} e^{-(x_i - \mu_n)^2 / 2\sigma^2}}$.
51 Derivation of the k-Means Algorithm: Second (Maximization) Step To find the values $h' = \langle \mu_1', \dots, \mu_k' \rangle$ that maximize Q: $\arg\max_{h'} Q(h' \mid h) = \arg\max_{h'} \sum_{i=1}^{m} \left( \ln \frac{1}{\sqrt{2\pi\sigma^2}} - \frac{1}{2\sigma^2} \sum_{j=1}^{k} E[z_{ij}] (x_i - \mu_j')^2 \right) = \arg\min_{h'} \sum_{i=1}^{m} \sum_{j=1}^{k} E[z_{ij}] (x_i - \mu_j')^2$, which is minimized by $\mu_j \leftarrow \frac{\sum_{i=1}^{m} E[z_{ij}]\, x_i}{\sum_{i=1}^{m} E[z_{ij}]}$. (CSE 555: Srihari)
52 Summary In many parameter estimation tasks, some of the relevant instance variables may be unobservable. In this case, the EM algorithm is useful.
More informationCS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.
Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott
More informationLikelihood: Frequentist vs Bayesian Reasoning
"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B University of California, Berkeley Spring 2009 N Hallinan Likelihood: Frequentist vs Bayesian Reasoning Stochastic odels and
More informationModern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh
Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Peter Richtárik Week 3 Randomized Coordinate Descent With Arbitrary Sampling January 27, 2016 1 / 30 The Problem
More informationLogit Models for Binary Data
Chapter 3 Logit Models for Binary Data We now turn our attention to regression models for dichotomous data, including logistic regression and probit analysis. These models are appropriate when the response
More informationK-Means Clustering Tutorial
K-Means Clustering Tutorial By Kardi Teknomo,PhD Preferable reference for this tutorial is Teknomo, Kardi. K-Means Clustering Tutorials. http:\\people.revoledu.com\kardi\ tutorial\kmean\ Last Update: July
More information1 Introduction to Matrices
1 Introduction to Matrices In this section, important definitions and results from matrix algebra that are useful in regression analysis are introduced. While all statements below regarding the columns
More informationCS 688 Pattern Recognition Lecture 4. Linear Models for Classification
CS 688 Pattern Recognition Lecture 4 Linear Models for Classification Probabilistic generative models Probabilistic discriminative models 1 Generative Approach ( x ) p C k p( C k ) Ck p ( ) ( x Ck ) p(
More informationProbabilistic user behavior models in online stores for recommender systems
Probabilistic user behavior models in online stores for recommender systems Tomoharu Iwata Abstract Recommender systems are widely used in online stores because they are expected to improve both user
More informationThe PageRank Citation Ranking: Bring Order to the Web
The PageRank Citation Ranking: Bring Order to the Web presented by: Xiaoxi Pang 25.Nov 2010 1 / 20 Outline Introduction A ranking for every page on the Web Implementation Convergence Properties Personalized
More informationMachine Learning Big Data using Map Reduce
Machine Learning Big Data using Map Reduce By Michael Bowles, PhD Where Does Big Data Come From? -Web data (web logs, click histories) -e-commerce applications (purchase histories) -Retail purchase histories
More informationTHE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE. Alexander Barvinok
THE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE Alexer Barvinok Papers are available at http://www.math.lsa.umich.edu/ barvinok/papers.html This is a joint work with J.A. Hartigan
More informationPrincipal components analysis
CS229 Lecture notes Andrew Ng Part XI Principal components analysis In our discussion of factor analysis, we gave a way to model data x R n as approximately lying in some k-dimension subspace, where k
More informationIntroduction to Machine Learning Using Python. Vikram Kamath
Introduction to Machine Learning Using Python Vikram Kamath Contents: 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. Introduction/Definition Where and Why ML is used Types of Learning Supervised Learning Linear Regression
More informationParallelization Strategies for Multicore Data Analysis
Parallelization Strategies for Multicore Data Analysis Wei-Chen Chen 1 Russell Zaretzki 2 1 University of Tennessee, Dept of EEB 2 University of Tennessee, Dept. Statistics, Operations, and Management
More informationCommon factor analysis
Common factor analysis This is what people generally mean when they say "factor analysis" This family of techniques uses an estimate of common variance among the original variables to generate the factor
More informationVisualization of Collaborative Data
Visualization of Collaborative Data Guobiao Mei University of California, Riverside gmei@cs.ucr.edu Christian R. Shelton University of California, Riverside cshelton@cs.ucr.edu Abstract Collaborative data
More informationImputing Missing Data using SAS
ABSTRACT Paper 3295-2015 Imputing Missing Data using SAS Christopher Yim, California Polytechnic State University, San Luis Obispo Missing data is an unfortunate reality of statistics. However, there are
More informationExact Inference for Gaussian Process Regression in case of Big Data with the Cartesian Product Structure
Exact Inference for Gaussian Process Regression in case of Big Data with the Cartesian Product Structure Belyaev Mikhail 1,2,3, Burnaev Evgeny 1,2,3, Kapushev Yermek 1,2 1 Institute for Information Transmission
More informationIntroduction to Logistic Regression
OpenStax-CNX module: m42090 1 Introduction to Logistic Regression Dan Calderon This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License 3.0 Abstract Gives introduction
More informationIntroduction to Principal Components and FactorAnalysis
Introduction to Principal Components and FactorAnalysis Multivariate Analysis often starts out with data involving a substantial number of correlated variables. Principal Component Analysis (PCA) is a
More informationRobert Collins CSE598G. More on Mean-shift. R.Collins, CSE, PSU CSE598G Spring 2006
More on Mean-shift R.Collins, CSE, PSU Spring 2006 Recall: Kernel Density Estimation Given a set of data samples x i ; i=1...n Convolve with a kernel function H to generate a smooth function f(x) Equivalent
More informationTopic models for Sentiment analysis: A Literature Survey
Topic models for Sentiment analysis: A Literature Survey Nikhilkumar Jadhav 123050033 June 26, 2014 In this report, we present the work done so far in the field of sentiment analysis using topic models.
More information