Probabilistic clustering and the EM algorithm
|
|
- Scott Hensley
- 7 years ago
- Views:
Transcription
1 Probabilistic clustering and the EM algorithm Guillaume Obozinsi Ecole des Ponts - ParisTech Master MVA EM 1/27
2 Outline 1 Clustering 2 The EM algorithm for the Gaussian mixture model EM 2/27
3 Clustering EM 4/27
4 Supervised, unsupervised and semi-supervised learning Supervised learning Training set composed of pairs {(x 1, y 1 ),..., (x n, y n )}. Learn to classify new points in the classes Unsupervised learning Training set composed of pairs {x 1,..., x n }. Partition the data in a number of classes. Possibly produce a decision rule for new points. Transductive learning Data available at train time composed of train data {(x 1, y 1 ),..., (x n, y n )} + test data {x n+1,..., x n } Classify all the test data Semi-supervised learning Data available at train time composed of labelled data {(x 1, y 1 ),..., (x n, y n )} + unlabelled data {x n+1,..., x n } Produce a classification rule for future points EM 5/27
5 Clustering Clustering is word usually used for unsupervised classification Clustering techniques can be useful to solve semi-supervised classification problem. Clustering is not a well-specified problem Classes might be impossible to infer from the distribution of X alone Several goals possible: Find the modes of the distribution Find a set of denser connected regions supporting most of the density Find a set of denser convex regions supporting most of the density Find a set of denser ellipsoidal regions supporting most of the density Find a set of denser round regions supporting most of the density EM 6/27
6 K-means Key assumption: Data composed of K roundish clusters of similar sizes with centroids (µ 1,, µ K ). Problem can be formulated as: 1 min µ 1,,µ K n Difficult (NP-hard) nonconvex problem. K-means algorithm 1 Draw centroids at random 2 Assign each point to the closest centroid n i=1 min x i µ 2. C { i x i µ 2 = min j x i µ j 2} 3 Recompute centroid as center of mass of the cluster 4 Go to 2 µ 1 C i C x i EM 7/27
7 K-means properties Three remars: K-means is greedy algorithm It can be shown that K-means converges in a finite number of steps. The algorithm however typically get stuc in local minima and it practice it is necessary to try several restarts of the algorithm with a random initialization to have chances to obtain a better solution. Will fail if the clusters are not round EM 8/27
8 K-means++, (Arthur and Vassilvitsii, 2007) Algorithm Choose first center µ 1 uniformly among data points For = 2...K endfor Let D 2 i = min j< x i µ 2 2 Choose the next center among {x 1,..., x n } with probability D 2 i. Solution is log(k) optimal. See Arthur, D. and Vassilvitsii, S. (2007). -means++: the advantages of careful seeding. Proceedings of the 18th annual ACM-SIAM symposium on Discrete algorithms. EM 9/27
9 The Gaussian mixture model and the EM algorithm EM 11/27
10 Jensen s Inequality Consider a function f : R d R 1 if f is convex and if X is a random variable, then E [ f(x) ] f ( E[X] ) 2 if f is strictly convex, we have equality in the previous inequality if and only if X is constant almost surely. EM 12/27
11 Entropy Let X a r.v. with values in the finite set X and p(x) = P (X = x). Quantity of information of the observation x Definition of entropy I(x) := log 1 p(x) H(X) := E [I(X)] = x X p(x) log p(x) Remars: Convention: 0 log 0 = 0 H defined either with natural log or the log in base 2 (i.e. log 2 ). log 2 is better for coding interpretations In this course we will use the natural logarithm. EM 13/27
12 Kullbac-Leibler divergence Definition Let p and q be two finite distributions on X finite. The Kullbac-Leibler divergence is defined by D(p q) = p(x) log p(x) q(x) x X = ( p(x) log p(x) ) q(x) q(x) q(x) x X [ = E X p log p(x) ] q(x) [ p(x) p(x) = E X q log q(x) q(x) ] The KL divergence is not a distance: it is not symmetric. x X with q(x) = 0 and p(x) 0 then D(p q) = +. If EM 14/27
13 Kullbac-Leibler divergence Proposition D(p q) 0 and equality holds if and only if p = q. Proof. W.l.o.g assume q(x) > 0 everywhere. 1 y y log y is convex so by Jensen s inequality, we have [ ( )] [ ] [ ] p(x) p(x) p(x) p(x) D(p q) = E q q(x) log E q log E q = 0 q(x) q(x) q(x) since [ ] p(x) E q = p(x) q(x) q(x) q(x) = p(x) = 1. x X x X 2 D(p q) = 0 iff there is equality in Jensen s inequality p(x) = cq(x) q-a.s., but summing this last equality over x implies that c = 1, in turn implies that p = q. EM 15/27
14 Differential entropy and KL Let X be a r.v. with distribution P and density p w.r.t. a measure µ. Differential entropy: H diff (p) = X p(x) log(p(x))dµ(x) Differential Kullbac Leibler Divergence D diff (p q) = p(x) log p(x) X q(x) dµ(x) [ = E X p log p(x) ] q(x) H diff (p) 0 H diff (p) depends on the reference measure µ. H diff (p) does not capture intrinsic properties of P. However, D diff (p q) does not depend on µ. EM 16/27
15 Gaussian mixture model K components z component indicator z = (z 1,..., z K ) {0, 1} K z M(1, (π 1,..., π K )) K p(z) = =1 π z p(x z; (µ, Σ ) ) = p(x) = K z N (x; µ, Σ ) =1 K π N (x; µ, Σ ) =1 Estimation: [ K ] argmax log π N (x; µ, Σ ) µ,σ =1 π z n x n µ Σ (a) N EM 17/27
16 Applying maximum lielihood to the Gaussian mixture Let Z = {z {0, 1} K K =1 z = 1} p(x) = z Z Issue p(x, z) = z Z K =1 [ π N (x; µ, Σ )] z = K π N (x; µ, Σ ) =1 The marginal log-lielihood l(θ) = i log(p(x(i) )) with θ = ( π, (µ, Σ ) 1 K ) is now complicated No hope to find a simple solution to the maximum lielihood problem By contrast the complete log-lielihood has a rather simple form: l ( θ) = M log p(x (i), z (i) ) = i, i=1 z (i) log N (x(i) ; µ, Σ )+ i, z (i) log(π ), EM 18/27
17 Applying ML to the multinomial mixture l ( θ) = M log p(x (i), z (i) ) = i, i=1 z (i) log N (x(i) ; µ, Σ )+ i, z (i) log(π ), If we new z (i) we could maximize l(θ). If we new θ = ( π, (µ, Σ ) 1 K ), we could find the best z (i) since we could compute the true a posteriori on z (i) given x (i) : p(z (i) = 1 x; θ) = Seems a chicen and egg problem... In addition, we want to solve ( ) max log p(x (i), z (i) ) and not θ i z (i) π N (x (i) ; µ, Σ ) K j=1 π j N (x (i) ; µ j, Σ j ) max θ, z (1),...,z (M) log p(x (i), z (i) ) Can we still use the intuitions above to construct an algorithm maximizing the marginal lielihood? i EM 19/27
18 Principle of the Expectation-Maximization Algorithm log p(x; θ) = log z z p(x, z; θ) = log z q(z) log p(x, z; θ) q(z) p(x, z; θ) q(z) q(z) = E q [log p(x, z; θ)] + H(q) =: L(q, θ) This shows that L(q, θ) log p(x; θ) Moreover: θ L(q, θ) is a concave function. Finally it is possible to show that L(q, θ) = log p(x; θ) KL(q p( x; θ)) So that if we set q(z) = p(z x; θ (t) ) then L(q, θ (t) ) = p(x; θ (t) ). ln p(x θ) L (q, θ) θ old θnew EM 20/27
19 A graphical idea of the EM algorithm ln p(x θ) L (q, θ) θ old θ new EM 21/27
20 Expectation Maximization algorithm Initialize θ = θ 0 WHILE (Not converged) Expectation step 1 q(z) = p(z x; θ (t 1) ) 2 L(q, θ) = E q [ log p(x, z; θ) ] + H(q) Maximization step 1 θ (t) [ ] = argmax E q log p(x, z; θ) θ ENDWHILE ln p(x θ) L (q, θ) θ old θ new θ old = θ (t 1) θ new = θ (t) EM 22/27
21 Expected complete log-lielihood With the notation: q (t) i ] E q (t) [ l(θ) = E q (t) [ M = E q (t) = E q (t) = i, = i, = P q (t) (z (i) i [ ] log p(x, Z; θ) i=1 [ i, E q (t) i = 1) = E q (t) i ] log p(x (i), z (i) ; θ) z (i) log N (x(i), µ, Σ ) + i, ] log N (x (i), µ, Σ ) + [ z (i) q (t) i log N (x(i), µ, Σ ) + i, [ (i)] z, we have i, ] z (i) log(π ) E q (t) i q (t) i log(π ) [ (i)] z log(π ) EM 23/27
22 Expectation step for the Gaussian mixture We computed previously q (t) i (z (i) ), which is a multinomial distribution defined by q (t) i (z (i) ) = p(z (i) x (i) ; θ (t 1) ) Abusing notation we will denote (q (t) i1,..., q(t) ik ) the corresponding vector of probabilities defined by q (t) i q (t) i = P q (t) (z (i) i = p(z(i) = 1 x (i) ; θ (t 1) ) = = 1) = E q (t) i π (t 1) K j=1 π(t 1) j [ (i)] z N (x (i), µ (t 1), Σ (t 1) ) N (x (i), µ (t 1) j, Σ (t 1) j ) EM 24/27
23 Maximization step for the Gaussian mixture ( π t, (µ (t), Σ(t) ) 1 K) = argmax θ ] E q (t) [ l(θ) This yields the updates: µ (t) = i x(i) q (t) i i q(t) i, Σ (t) = i ( x (i) µ (t) )( x (i) µ (t) i q(t) i ) q (t) i and π (t) = i q(t) i i, q(t) i EM 25/27
24 Final EM algorithm for the Multinomial mixture model Initialize θ = θ 0 WHILE (Not converged) Expectation step q (t) i π(t 1) K j=1 π(t 1) j N (x (i), µ (t 1), Σ (t 1) ) N (x (i), µ (t 1) j, Σ (t 1) j ) Maximization step µ (t) = i x(i) q (t) i i q(t) i, Σ (t) = i ( x (i) µ (t) )( x (i) µ (t) i q(t) i ) q (t) i ENDWHILE and π (t) = i q(t) i i, q(t) i EM 26/27
25 EM Algorithm for the Gaussian mixture model III p(x z) p(z x) EM 27/27
Semi-Supervised Support Vector Machines and Application to Spam Filtering
Semi-Supervised Support Vector Machines and Application to Spam Filtering Alexander Zien Empirical Inference Department, Bernhard Schölkopf Max Planck Institute for Biological Cybernetics ECML 2006 Discovery
More informationThe Expectation Maximization Algorithm A short tutorial
The Expectation Maximiation Algorithm A short tutorial Sean Borman Comments and corrections to: em-tut at seanborman dot com July 8 2004 Last updated January 09, 2009 Revision history 2009-0-09 Corrected
More informationSocial Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
More informationNeural Networks Lesson 5 - Cluster Analysis
Neural Networks Lesson 5 - Cluster Analysis Prof. Michele Scarpiniti INFOCOM Dpt. - Sapienza University of Rome http://ispac.ing.uniroma1.it/scarpiniti/index.htm michele.scarpiniti@uniroma1.it Rome, 29
More informationLecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
More informationClustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016
Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES Contents 1. Random variables and measurable functions 2. Cumulative distribution functions 3. Discrete
More informationParallel Programming Map-Reduce. Needless to Say, We Need Machine Learning for Big Data
Case Study 2: Document Retrieval Parallel Programming Map-Reduce Machine Learning/Statistics for Big Data CSE599C1/STAT592, University of Washington Carlos Guestrin January 31 st, 2013 Carlos Guestrin
More informationLinear Threshold Units
Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear
More informationBasics of Statistical Machine Learning
CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar
More informationSupervised and unsupervised learning - 1
Chapter 3 Supervised and unsupervised learning - 1 3.1 Introduction The science of learning plays a key role in the field of statistics, data mining, artificial intelligence, intersecting with areas in
More informationProbabilistic Latent Semantic Analysis (plsa)
Probabilistic Latent Semantic Analysis (plsa) SS 2008 Bayesian Networks Multimedia Computing, Universität Augsburg Rainer.Lienhart@informatik.uni-augsburg.de www.multimedia-computing.{de,org} References
More informationFacebook Friend Suggestion Eytan Daniyalzade and Tim Lipus
Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus 1. Introduction Facebook is a social networking website with an open platform that enables developers to extract and utilize user information
More informationLABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING. ----Changsheng Liu 10-30-2014
LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING ----Changsheng Liu 10-30-2014 Agenda Semi Supervised Learning Topics in Semi Supervised Learning Label Propagation Local and global consistency Graph
More informationChristfried Webers. Canberra February June 2015
c Statistical Group and College of Engineering and Computer Science Canberra February June (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 829 c Part VIII Linear Classification 2 Logistic
More informationTail inequalities for order statistics of log-concave vectors and applications
Tail inequalities for order statistics of log-concave vectors and applications Rafał Latała Based in part on a joint work with R.Adamczak, A.E.Litvak, A.Pajor and N.Tomczak-Jaegermann Banff, May 2011 Basic
More informationData Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/8/2004 Hierarchical
More informationLogistic Regression. Vibhav Gogate The University of Texas at Dallas. Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld.
Logistic Regression Vibhav Gogate The University of Texas at Dallas Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld. Generative vs. Discriminative Classifiers Want to Learn: h:x Y X features
More informationLikelihood, MLE & EM for Gaussian Mixture Clustering. Nick Duffield Texas A&M University
Likelihood, MLE & EM for Gaussian Mixture Clustering Nick Duffield Texas A&M University Probability vs. Likelihood Probability: predict unknown outcomes based on known parameters: P(x θ) Likelihood: eskmate
More informationStatistical Machine Translation: IBM Models 1 and 2
Statistical Machine Translation: IBM Models 1 and 2 Michael Collins 1 Introduction The next few lectures of the course will be focused on machine translation, and in particular on statistical machine translation
More informationStatistical Machine Learning from Data
Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Gaussian Mixture Models Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique
More informationGambling and Data Compression
Gambling and Data Compression Gambling. Horse Race Definition The wealth relative S(X) = b(x)o(x) is the factor by which the gambler s wealth grows if horse X wins the race, where b(x) is the fraction
More informationARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING)
ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING) Gabriela Ochoa http://www.cs.stir.ac.uk/~goc/ OUTLINE Preliminaries Classification and Clustering Applications
More informationIntroduction to Machine Learning Using Python. Vikram Kamath
Introduction to Machine Learning Using Python Vikram Kamath Contents: 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. Introduction/Definition Where and Why ML is used Types of Learning Supervised Learning Linear Regression
More informationTHE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE. Alexander Barvinok
THE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE Alexer Barvinok Papers are available at http://www.math.lsa.umich.edu/ barvinok/papers.html This is a joint work with J.A. Hartigan
More informationStatistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
More informationVariational Mean Field for Graphical Models
Variational Mean Field for Graphical Models CS/CNS/EE 155 Baback Moghaddam Machine Learning Group baback @ jpl.nasa.gov Approximate Inference Consider general UGs (i.e., not tree-structured) All basic
More informationUNSUPERVISED RESTORATION IN GAUSSIAN PAIRWISE MIXTURE MODEL
USUPERVISED RESTORATIO I GAUSSIA PAIRWISE MIXTURE MODE Stéphane Derrode and Wojciech Piecznsi École Centrale Marseille, Institut Fresnel CRS UMR 633, 38, rue F. Joliot-Curie, F-345 Marseille cedex 20,
More informationApplied Algorithm Design Lecture 5
Applied Algorithm Design Lecture 5 Pietro Michiardi Eurecom Pietro Michiardi (Eurecom) Applied Algorithm Design Lecture 5 1 / 86 Approximation Algorithms Pietro Michiardi (Eurecom) Applied Algorithm Design
More informationDATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar
More informationModern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh
Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Peter Richtárik Week 3 Randomized Coordinate Descent With Arbitrary Sampling January 27, 2016 1 / 30 The Problem
More informationBayesian Clustering for Email Campaign Detection
Peter Haider haider@cs.uni-potsdam.de Tobias Scheffer scheffer@cs.uni-potsdam.de University of Potsdam, Department of Computer Science, August-Bebel-Strasse 89, 14482 Potsdam, Germany Abstract We discuss
More informationDirichlet Processes A gentle tutorial
Dirichlet Processes A gentle tutorial SELECT Lab Meeting October 14, 2008 Khalid El-Arini Motivation We are given a data set, and are told that it was generated from a mixture of Gaussian distributions.
More informationRandomization Approaches for Network Revenue Management with Customer Choice Behavior
Randomization Approaches for Network Revenue Management with Customer Choice Behavior Sumit Kunnumkal Indian School of Business, Gachibowli, Hyderabad, 500032, India sumit kunnumkal@isb.edu March 9, 2011
More information2.3 Convex Constrained Optimization Problems
42 CHAPTER 2. FUNDAMENTAL CONCEPTS IN CONVEX OPTIMIZATION Theorem 15 Let f : R n R and h : R R. Consider g(x) = h(f(x)) for all x R n. The function g is convex if either of the following two conditions
More informationUnsupervised Learning and Data Mining. Unsupervised Learning and Data Mining. Clustering. Supervised Learning. Supervised Learning
Unsupervised Learning and Data Mining Unsupervised Learning and Data Mining Clustering Decision trees Artificial neural nets K-nearest neighbor Support vectors Linear regression Logistic regression...
More informationOn Entropy in Network Traffic Anomaly Detection
On Entropy in Network Traffic Anomaly Detection Jayro Santiago-Paz, Deni Torres-Roman. Cinvestav, Campus Guadalajara, Mexico November 2015 Jayro Santiago-Paz, Deni Torres-Roman. 1/19 On Entropy in Network
More informationCS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.
Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott
More informationWeek 1: Introduction to Online Learning
Week 1: Introduction to Online Learning 1 Introduction This is written based on Prediction, Learning, and Games (ISBN: 2184189 / -21-8418-9 Cesa-Bianchi, Nicolo; Lugosi, Gabor 1.1 A Gentle Start Consider
More informationClustering Connectionist and Statistical Language Processing
Clustering Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes Clustering p.1/21 Overview clustering vs. classification supervised
More informationProbabilistic user behavior models in online stores for recommender systems
Probabilistic user behavior models in online stores for recommender systems Tomoharu Iwata Abstract Recommender systems are widely used in online stores because they are expected to improve both user
More informationMath 4310 Handout - Quotient Vector Spaces
Math 4310 Handout - Quotient Vector Spaces Dan Collins The textbook defines a subspace of a vector space in Chapter 4, but it avoids ever discussing the notion of a quotient space. This is understandable
More informationCourse: Model, Learning, and Inference: Lecture 5
Course: Model, Learning, and Inference: Lecture 5 Alan Yuille Department of Statistics, UCLA Los Angeles, CA 90095 yuille@stat.ucla.edu Abstract Probability distributions on structured representation.
More informationIntroduction to Clustering
1/57 Introduction to Clustering Mark Johnson Department of Computing Macquarie University 2/57 Outline Supervised versus unsupervised learning Applications of clustering in text processing Evaluating clustering
More informationNatural Language Processing. Today. Logistic Regression Models. Lecture 13 10/6/2015. Jim Martin. Multinomial Logistic Regression
Natural Language Processing Lecture 13 10/6/2015 Jim Martin Today Multinomial Logistic Regression Aka log-linear models or maximum entropy (maxent) Components of the model Learning the parameters 10/1/15
More informationK-Means Clustering Tutorial
K-Means Clustering Tutorial By Kardi Teknomo,PhD Preferable reference for this tutorial is Teknomo, Kardi. K-Means Clustering Tutorials. http:\\people.revoledu.com\kardi\ tutorial\kmean\ Last Update: July
More information3. The Junction Tree Algorithms
A Short Course on Graphical Models 3. The Junction Tree Algorithms Mark Paskin mark@paskin.org 1 Review: conditional independence Two random variables X and Y are independent (written X Y ) iff p X ( )
More informationPrinciple of Data Reduction
Chapter 6 Principle of Data Reduction 6.1 Introduction An experimenter uses the information in a sample X 1,..., X n to make inferences about an unknown parameter θ. If the sample size n is large, then
More informationOnline Learning. 1 Online Learning and Mistake Bounds for Finite Hypothesis Classes
Advanced Course in Machine Learning Spring 2011 Online Learning Lecturer: Shai Shalev-Shwartz Scribe: Shai Shalev-Shwartz In this lecture we describe a different model of learning which is called online
More informationNotes from Week 1: Algorithms for sequential prediction
CS 683 Learning, Games, and Electronic Markets Spring 2007 Notes from Week 1: Algorithms for sequential prediction Instructor: Robert Kleinberg 22-26 Jan 2007 1 Introduction In this course we will be looking
More informationTensor Methods for Machine Learning, Computer Vision, and Computer Graphics
Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics Part I: Factorizations and Statistical Modeling/Inference Amnon Shashua School of Computer Science & Eng. The Hebrew University
More informationLecture 9: Introduction to Pattern Analysis
Lecture 9: Introduction to Pattern Analysis g Features, patterns and classifiers g Components of a PR system g An example g Probability definitions g Bayes Theorem g Gaussian densities Features, patterns
More informationE3: PROBABILITY AND STATISTICS lecture notes
E3: PROBABILITY AND STATISTICS lecture notes 2 Contents 1 PROBABILITY THEORY 7 1.1 Experiments and random events............................ 7 1.2 Certain event. Impossible event............................
More informationThe Steepest Descent Algorithm for Unconstrained Optimization and a Bisection Line-search Method
The Steepest Descent Algorithm for Unconstrained Optimization and a Bisection Line-search Method Robert M. Freund February, 004 004 Massachusetts Institute of Technology. 1 1 The Algorithm The problem
More informationDistributed Structured Prediction for Big Data
Distributed Structured Prediction for Big Data A. G. Schwing ETH Zurich aschwing@inf.ethz.ch T. Hazan TTI Chicago M. Pollefeys ETH Zurich R. Urtasun TTI Chicago Abstract The biggest limitations of learning
More informationBig Data - Lecture 1 Optimization reminders
Big Data - Lecture 1 Optimization reminders S. Gadat Toulouse, Octobre 2014 Big Data - Lecture 1 Optimization reminders S. Gadat Toulouse, Octobre 2014 Schedule Introduction Major issues Examples Mathematics
More informationThe p-norm generalization of the LMS algorithm for adaptive filtering
The p-norm generalization of the LMS algorithm for adaptive filtering Jyrki Kivinen University of Helsinki Manfred Warmuth University of California, Santa Cruz Babak Hassibi California Institute of Technology
More information10-601. Machine Learning. http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html
10-601 Machine Learning http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html Course data All up-to-date info is on the course web page: http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html
More informationTopic models for Sentiment analysis: A Literature Survey
Topic models for Sentiment analysis: A Literature Survey Nikhilkumar Jadhav 123050033 June 26, 2014 In this report, we present the work done so far in the field of sentiment analysis using topic models.
More informationCluster Analysis. Isabel M. Rodrigues. Lisboa, 2014. Instituto Superior Técnico
Instituto Superior Técnico Lisboa, 2014 Introduction: Cluster analysis What is? Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from
More informationInterpreting Kullback-Leibler Divergence with the Neyman-Pearson Lemma
Interpreting Kullback-Leibler Divergence with the Neyman-Pearson Lemma Shinto Eguchi a, and John Copas b a Institute of Statistical Mathematics and Graduate University of Advanced Studies, Minami-azabu
More informationClass #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris
Class #6: Non-linear classification ML4Bio 2012 February 17 th, 2012 Quaid Morris 1 Module #: Title of Module 2 Review Overview Linear separability Non-linear classification Linear Support Vector Machines
More informationClustering UE 141 Spring 2013
Clustering UE 141 Spring 013 Jing Gao SUNY Buffalo 1 Definition of Clustering Finding groups of obects such that the obects in a group will be similar (or related) to one another and different from (or
More informationEx. 2.1 (Davide Basilio Bartolini)
ECE 54: Elements of Information Theory, Fall 00 Homework Solutions Ex.. (Davide Basilio Bartolini) Text Coin Flips. A fair coin is flipped until the first head occurs. Let X denote the number of flips
More information15 Kuhn -Tucker conditions
5 Kuhn -Tucker conditions Consider a version of the consumer problem in which quasilinear utility x 2 + 4 x 2 is maximised subject to x +x 2 =. Mechanically applying the Lagrange multiplier/common slopes
More informationCreating a NL Texas Hold em Bot
Creating a NL Texas Hold em Bot Introduction Poker is an easy game to learn by very tough to master. One of the things that is hard to do is controlling emotions. Due to frustration, many have made the
More informationMachine Learning. CUNY Graduate Center, Spring 2013. Professor Liang Huang. huang@cs.qc.cuny.edu
Machine Learning CUNY Graduate Center, Spring 2013 Professor Liang Huang huang@cs.qc.cuny.edu http://acl.cs.qc.edu/~lhuang/teaching/machine-learning Logistics Lectures M 9:30-11:30 am Room 4419 Personnel
More informationInformation Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay
Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 17 Shannon-Fano-Elias Coding and Introduction to Arithmetic Coding
More informationTwo-Stage Stochastic Linear Programs
Two-Stage Stochastic Linear Programs Operations Research Anthony Papavasiliou 1 / 27 Two-Stage Stochastic Linear Programs 1 Short Reviews Probability Spaces and Random Variables Convex Analysis 2 Deterministic
More informationMachine Learning for Medical Image Analysis. A. Criminisi & the InnerEye team @ MSRC
Machine Learning for Medical Image Analysis A. Criminisi & the InnerEye team @ MSRC Medical image analysis the goal Automatic, semantic analysis and quantification of what observed in medical scans Brain
More informationCHAPTER 2 Estimating Probabilities
CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a
More informationSpazi vettoriali e misure di similaritá
Spazi vettoriali e misure di similaritá R. Basili Corso di Web Mining e Retrieval a.a. 2008-9 April 10, 2009 Outline Outline Spazi vettoriali a valori reali Operazioni tra vettori Indipendenza Lineare
More informationMachine Learning using MapReduce
Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous
More informationLinear Programming. Widget Factory Example. Linear Programming: Standard Form. Widget Factory Example: Continued.
Linear Programming Widget Factory Example Learning Goals. Introduce Linear Programming Problems. Widget Example, Graphical Solution. Basic Theory:, Vertices, Existence of Solutions. Equivalent formulations.
More informationLoad balancing of temporary tasks in the l p norm
Load balancing of temporary tasks in the l p norm Yossi Azar a,1, Amir Epstein a,2, Leah Epstein b,3 a School of Computer Science, Tel Aviv University, Tel Aviv, Israel. b School of Computer Science, The
More informationUnsupervised learning: Clustering
Unsupervised learning: Clustering Salissou Moutari Centre for Statistical Science and Operational Research CenSSOR 17 th September 2013 Unsupervised learning: Clustering 1/52 Outline 1 Introduction What
More informationAdaptive Online Gradient Descent
Adaptive Online Gradient Descent Peter L Bartlett Division of Computer Science Department of Statistics UC Berkeley Berkeley, CA 94709 bartlett@csberkeleyedu Elad Hazan IBM Almaden Research Center 650
More information1 if 1 x 0 1 if 0 x 1
Chapter 3 Continuity In this chapter we begin by defining the fundamental notion of continuity for real valued functions of a single real variable. When trying to decide whether a given function is or
More information17.3.1 Follow the Perturbed Leader
CS787: Advanced Algorithms Topic: Online Learning Presenters: David He, Chris Hopman 17.3.1 Follow the Perturbed Leader 17.3.1.1 Prediction Problem Recall the prediction problem that we discussed in class.
More informationComparing large datasets structures through unsupervised learning
Comparing large datasets structures through unsupervised learning Guénaël Cabanes and Younès Bennani LIPN-CNRS, UMR 7030, Université de Paris 13 99, Avenue J-B. Clément, 93430 Villetaneuse, France cabanes@lipn.univ-paris13.fr
More informationChapter 11. 11.1 Load Balancing. Approximation Algorithms. Load Balancing. Load Balancing on 2 Machines. Load Balancing: Greedy Scheduling
Approximation Algorithms Chapter Approximation Algorithms Q. Suppose I need to solve an NP-hard problem. What should I do? A. Theory says you're unlikely to find a poly-time algorithm. Must sacrifice one
More informationHT2015: SC4 Statistical Data Mining and Machine Learning
HT2015: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Bayesian Nonparametrics Parametric vs Nonparametric
More informationHow To Solve A Sequential Mca Problem
Monte Carlo-based statistical methods (MASM11/FMS091) Jimmy Olsson Centre for Mathematical Sciences Lund University, Sweden Lecture 6 Sequential Monte Carlo methods II February 3, 2012 Changes in HA1 Problem
More informationDuality in General Programs. Ryan Tibshirani Convex Optimization 10-725/36-725
Duality in General Programs Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: duality in linear programs Given c R n, A R m n, b R m, G R r n, h R r : min x R n c T x max u R m, v R r b T
More informationMoral Hazard. Itay Goldstein. Wharton School, University of Pennsylvania
Moral Hazard Itay Goldstein Wharton School, University of Pennsylvania 1 Principal-Agent Problem Basic problem in corporate finance: separation of ownership and control: o The owners of the firm are typically
More informationMachine Learning Big Data using Map Reduce
Machine Learning Big Data using Map Reduce By Michael Bowles, PhD Where Does Big Data Come From? -Web data (web logs, click histories) -e-commerce applications (purchase histories) -Retail purchase histories
More information5.3 Improper Integrals Involving Rational and Exponential Functions
Section 5.3 Improper Integrals Involving Rational and Exponential Functions 99.. 3. 4. dθ +a cos θ =, < a
More informationAnalysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j
Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j What is Kiva? An organization that allows people to lend small amounts of money via the Internet
More informationHow To Solve The Cluster Algorithm
Cluster Algorithms Adriano Cruz adriano@nce.ufrj.br 28 de outubro de 2013 Adriano Cruz adriano@nce.ufrj.br () Cluster Algorithms 28 de outubro de 2013 1 / 80 Summary 1 K-Means Adriano Cruz adriano@nce.ufrj.br
More informationLogistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression
Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max
More informationIntroduction to Markov Chain Monte Carlo
Introduction to Markov Chain Monte Carlo Monte Carlo: sample from a distribution to estimate the distribution to compute max, mean Markov Chain Monte Carlo: sampling using local information Generic problem
More informationProbability Generating Functions
page 39 Chapter 3 Probability Generating Functions 3 Preamble: Generating Functions Generating functions are widely used in mathematics, and play an important role in probability theory Consider a sequence
More informationDefinition and Properties of the Production Function: Lecture
Definition and Properties of the Production Function: Lecture II August 25, 2011 Definition and : Lecture A Brief Brush with Duality Cobb-Douglas Cost Minimization Lagrangian for the Cobb-Douglas Solution
More informationTesting Cost Inefficiency under Free Entry in the Real Estate Brokerage Industry
Web Appendix to Testing Cost Inefficiency under Free Entry in the Real Estate Brokerage Industry Lu Han University of Toronto lu.han@rotman.utoronto.ca Seung-Hyun Hong University of Illinois hyunhong@ad.uiuc.edu
More informationData Mining Clustering (2) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining
Data Mining Clustering (2) Toon Calders Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining Outline Partitional Clustering Distance-based K-means, K-medoids,
More informationMapReduce Approach to Collective Classification for Networks
MapReduce Approach to Collective Classification for Networks Wojciech Indyk 1, Tomasz Kajdanowicz 1, Przemyslaw Kazienko 1, and Slawomir Plamowski 1 Wroclaw University of Technology, Wroclaw, Poland Faculty
More information203.4770: Introduction to Machine Learning Dr. Rita Osadchy
203.4770: Introduction to Machine Learning Dr. Rita Osadchy 1 Outline 1. About the Course 2. What is Machine Learning? 3. Types of problems and Situations 4. ML Example 2 About the course Course Homepage:
More informationUNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS
UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS Dwijesh C. Mishra I.A.S.R.I., Library Avenue, New Delhi-110 012 dcmishra@iasri.res.in What is Learning? "Learning denotes changes in a system that enable
More informationClassification Techniques for Remote Sensing
Classification Techniques for Remote Sensing Selim Aksoy Department of Computer Engineering Bilkent University Bilkent, 06800, Ankara saksoy@cs.bilkent.edu.tr http://www.cs.bilkent.edu.tr/ saksoy/courses/cs551
More information