Chapter 4: Non-Parametric Classification

1 Chapter 4: Non-Parametric Classification
Outline: Introduction; Density Estimation; Parzen Windows; k_n-Nearest Neighbor Density Estimation; k-Nearest Neighbor (k-NN) Decision Rule

2 Gaussian Mixture Model: a weighted combination of a small but unknown number of Gaussians. Can one recover the parameters of the Gaussians from unlabeled data?

3 Introduction
All parametric densities are unimodal (have a single local maximum), whereas many practical problems involve multimodal densities. Nonparametric procedures can be used with arbitrary distributions and without assuming that the form of the underlying densities is known. Two types of nonparametric methods: (1) estimate the class-conditional densities p(x|ω_j); (2) bypass class-conditional density estimation and directly estimate the a-posteriori probabilities P(ω_j|x).

4 Density Estimation
Basic idea: the probability that a vector x falls in region R is
P = \int_R p(x')\,dx' \qquad (1)
P is a smoothed (or averaged) version of the density function p(x). How do we estimate P? If we have a sample of size n (i.i.d. from p(x)), the probability that exactly k of the n points fall in R is binomial,
P_k = \binom{n}{k} P^k (1 - P)^{n-k} \qquad (2)
and the expected value of k is
E(k) = nP \qquad (3)

5 ML estimation of the unknown P = θ
The maximum of P_k(θ) is achieved when
\hat{\theta} = k/n
so the ratio k/n is a good estimate for the probability P, and hence for the density function p(x). Assume p(x) is continuous and that region R is so small that p does not vary significantly within it. Then
\int_R p(x')\,dx' \approx p(x)\,V \qquad (4)
where x is a point in R and V is the volume enclosed by R.

6 Combining equations (1), (3) and (4) yields
p(x) \approx \frac{k/n}{V}
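A quick numerical sanity check of this relation (a minimal sketch, assuming standard-normal samples and a small interval as the region R; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
samples = rng.normal(size=n)          # draws from p(x) = N(0, 1)

x, h = 0.0, 0.05                      # region R = [x - h, x + h], volume V = 2h
k = np.sum(np.abs(samples - x) <= h)  # number of samples falling in R
print((k / n) / (2 * h))              # close to 0.399, the N(0,1) density at 0
```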

7 Conditions for convergence
The fraction k/(nV) is a space-averaged value of p(x); convergence to the true p(x) is obtained only if V approaches zero. If V → 0 with n fixed and k = 0 (the case where no samples are included in R), the estimate is p(x) = 0. If V → 0 with n fixed and k ≠ 0, the estimate diverges.

8 The volume V needs to approach 0 if we want to obtain p(x) rather than just an averaged version of it. In practice, V cannot be allowed to become arbitrarily small, since the number of samples n is always limited; we have to accept a certain amount of averaging in p(x). Theoretically, if an unlimited number of samples is available, we can circumvent this difficulty as follows: to estimate the density at x, form a sequence of regions R_1, R_2, ... containing x; the first region is used with one sample, the second with two samples, and so on. Let V_n be the volume of R_n, k_n the number of samples falling in R_n, and p_n(x) the n-th estimate for p(x):
p_n(x) = \frac{k_n/n}{V_n} \qquad (7)

9 Necessary conditions for p_n(x) to converge to p(x):
(1) \lim_{n\to\infty} V_n = 0
(2) \lim_{n\to\infty} k_n = \infty
(3) \lim_{n\to\infty} k_n/n = 0
Two ways of defining sequences of regions that satisfy these conditions:
(a) Shrink an initial region, with V_n = 1/\sqrt{n}; it can be shown that p_n(x) → p(x). This is called the Parzen-window estimation method.
(b) Specify k_n as some function of n, such as k_n = \sqrt{n}; the volume V_n is grown until it encloses k_n neighbors of x. This is called the k_n-nearest-neighbor estimation method.

11 Parzen Windows
For simplicity, assume that the region R_n is a d-dimensional hypercube with edge length h_n, so that V_n = h_n^d. Let ϕ(u) be the following window function:
\varphi(u) = 1 \text{ if } |u_j| \le 1/2,\; j = 1,\dots,d; \quad 0 \text{ otherwise.}
Then ϕ((x − x_i)/h_n) is equal to unity if x_i falls within the hypercube of volume V_n centered at x, and equal to zero otherwise.

12 The number of samples in this hypercube is
k_n = \sum_{i=1}^{n} \varphi\!\left(\frac{x - x_i}{h_n}\right)
Substituting k_n into equation (7), we obtain the estimate
p_n(x) = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{V_n}\,\varphi\!\left(\frac{x - x_i}{h_n}\right)
p_n(x) estimates p(x) as an average of window functions of x and the samples x_i, i = 1, ..., n. These functions ϕ can be of any form, e.g., Gaussian.
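A minimal sketch of this estimator in Python, using the hypercube window defined above (the function names and the N(0,1) test data are illustrative choices, not from the slides):

```python
import numpy as np

def hypercube_window(u):
    """phi(u) = 1 if |u_j| <= 1/2 for every coordinate j, else 0."""
    return np.all(np.abs(u) <= 0.5, axis=-1).astype(float)

def parzen_estimate(x, samples, h):
    """p_n(x) = (1/n) sum_i (1/V_n) phi((x - x_i)/h), with V_n = h**d."""
    samples = np.atleast_2d(samples)
    n, d = samples.shape
    u = (x - samples) / h               # (n, d) scaled offsets
    return hypercube_window(u).sum() / (n * h**d)

# Illustrative check: 500 draws from N(0, 1); p(0) should be about 0.399.
rng = np.random.default_rng(0)
data = rng.normal(size=(500, 1))
print(parzen_estimate(np.array([0.0]), data, h=0.5))
```

Replacing hypercube_window with any other non-negative window that integrates to one (e.g., a Gaussian, as on the next slide) gives the other members of the Parzen family.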

13 The behavior of the Parzen-window method
Case where p(x) ~ N(0, 1). Let
\varphi(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2} \quad \text{and} \quad h_n = h_1/\sqrt{n} \quad (n > 1;\; h_1 \text{ a known parameter})
Thus
p_n(x) = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{h_n}\,\varphi\!\left(\frac{x - x_i}{h_n}\right)
is an average of normal densities centered at the samples x_i.

14 Numerical results: for n = 1 and h_1 = 1,
p_1(x) = \varphi(x - x_1) = \frac{1}{\sqrt{2\pi}}\, e^{-(x - x_1)^2/2} \sim N(x_1, 1)
For n = 10 and h = 0.1, the contributions of the individual samples are clearly observable.

17 Analogous results are also obtained in two dimensions.

19 Unknown density p(x) = λ_1 U(a, b) + λ_2 T(c, d), a mixture of a uniform and a triangle density.

21 Parzen-window density estimates

22 Classification example
How do we design classifiers based on Parzen-window density estimation? Estimate the class-conditional densities for each class, then classify a test sample by the label corresponding to the maximum a-posteriori density. The decision region of a Parzen-window classifier depends on the choice of window function and window width; in practice the window function chosen is usually Gaussian.
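A sketch of such a classifier, assuming Gaussian windows and priors estimated from class sample counts (the names and the two-class test data are hypothetical):

```python
import numpy as np

def gaussian_parzen(x, samples, h):
    """Parzen density estimate with a radially symmetric Gaussian window."""
    n, d = samples.shape
    u = (x - samples) / h
    phi = np.exp(-0.5 * np.sum(u * u, axis=1)) / (2 * np.pi) ** (d / 2)
    return phi.sum() / (n * h**d)

def parzen_classify(x, class_samples, h):
    """Pick the class maximizing P(w_j) * p_n(x | w_j)."""
    n_total = sum(len(s) for s in class_samples.values())
    score = {c: (len(s) / n_total) * gaussian_parzen(x, s, h)
             for c, s in class_samples.items()}
    return max(score, key=score.get)

# Hypothetical two-class example with 2-D Gaussian classes.
rng = np.random.default_rng(0)
data = {"w1": rng.normal([0, 0], 1.0, size=(100, 2)),
        "w2": rng.normal([3, 3], 1.0, size=(100, 2))}
print(parzen_classify(np.array([2.5, 2.8]), data, h=0.5))  # likely "w2"
```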

24 k_n-Nearest Neighbor Density Estimation
Let the region volume be a function of the training data: center a region about x and let it grow until it captures k_n samples, where k_n = f(n); these are the k_n nearest neighbors of x. Two possibilities can occur: if the true density is high near x, the region will be small, which provides good resolution; if the true density at x is low, the region will grow large to capture k_n samples. We can obtain a family of estimates by setting k_n = k_1\sqrt{n} and choosing different values for k_1.
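A minimal sketch of this estimate, taking V_n to be the volume of the smallest ball centered at x that contains the k nearest samples (the ball shape is an assumption; the slides do not fix the region's shape):

```python
import numpy as np
from math import gamma, pi

def knn_density(x, samples, k):
    """p_n(x) = (k/n) / V_n, with V_n the volume of the smallest ball
    centered at x containing the k nearest samples."""
    n, d = samples.shape
    r = np.sort(np.linalg.norm(samples - x, axis=1))[k - 1]  # k-th NN distance
    V = (pi ** (d / 2) / gamma(d / 2 + 1)) * r**d            # d-ball volume
    return (k / n) / V

rng = np.random.default_rng(1)
data = rng.normal(size=(1000, 1))
print(knn_density(np.array([0.0]), data, k=int(np.sqrt(1000))))  # ~0.4
```

In one dimension the ball volume reduces to 2r, which matches the illustration on the next slide.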

26 Illustration
For k_n = \sqrt{n} and n = 1, the estimate becomes
p_n(x) = \frac{k_n/n}{V_n} = \frac{1}{V_1} = \frac{1}{2|x - x_1|}

29 Estimation of a-posteriori probabilities
Goal: estimate P(ω_i|x) from a set of n labeled samples. Place a cell of volume V around x and capture k samples. Suppose k_i samples among the k are labeled ω_i; then
p_n(x, \omega_i) = \frac{k_i/n}{V}
and an estimate for P_n(ω_i|x) is
P_n(\omega_i \mid x) = \frac{p_n(x, \omega_i)}{\sum_{j=1}^{c} p_n(x, \omega_j)} = \frac{k_i}{k}

30 k_i/k is the fraction of the samples within the cell that are labeled ω_i. For minimum error rate, the most frequently represented category within the cell is selected. If k is large and the cell is sufficiently small, the performance will approach that of the optimal Bayes rule.

31 Nearest Neighbor Decision Rule
Let D_n = {x_1, x_2, ..., x_n} be a set of n labeled prototypes, and let x' ∈ D_n be the prototype closest to a test point x. The nearest-neighbor rule for classifying x is to assign it the label associated with x'. The nearest-neighbor rule leads to an error rate greater than the minimum possible, namely the Bayes error rate. However, if the number of prototypes is large (unlimited), the error rate of the nearest-neighbor classifier is never worse than twice the Bayes rate (this can be proved). As n → ∞, it is always possible to find x' sufficiently close to x so that P(ω_i|x') ≈ P(ω_i|x).

32 If P(ω_m|x) ≈ 1, then the nearest-neighbor selection is almost always the same as the Bayes selection.

34 Bound on the Nearest Neighbor Error Rate
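For reference, the bound this slide plots is the classical Cover-Hart result (also derived in Duda, Hart & Stork), restated here:

P^{*} \;\le\; P \;\le\; P^{*}\!\left(2 - \frac{c}{c-1}\,P^{*}\right)

where P^* is the Bayes error rate, P the asymptotic nearest-neighbor error rate, and c the number of classes; dropping the quadratic term recovers the factor-of-two statement above.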

35 k-Nearest Neighbor (k-NN) Decision Rule
Classify the test sample x by assigning it the label most frequently represented among the k nearest samples of x.
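A direct, brute-force implementation of this rule (a sketch; the helper names are illustrative, and k = 1 recovers the nearest-neighbor rule above):

```python
import numpy as np
from collections import Counter

def knn_classify(x, prototypes, labels, k=5):
    """k-NN rule: label x by majority vote among its k nearest prototypes."""
    dists = np.linalg.norm(prototypes - x, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Tiny illustrative usage.
X = np.array([[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [2.9, 3.2]])
y = np.array(["w1", "w1", "w2", "w2"])
print(knn_classify(np.array([2.5, 2.5]), X, y, k=3))  # "w2"
```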

37 The k-nearest Neighbor Rule The bounds on k-nn error rate for different values of k. As k increases, the upper bounds get progressively closer to the lower bound the Bayes Error rate. When k goes to infinity, the two bounds meet and k-nn rule becomes optimal.

38 Metric
The nearest-neighbor rule relies on a metric, or distance function, between patterns. A metric D(·,·) is a function that takes two vectors a and b as arguments and returns a scalar. For a function D(·,·) to be called a metric, it must satisfy the following properties for any given vectors a, b and c:
non-negativity: D(a, b) ≥ 0;
reflexivity: D(a, b) = 0 if and only if a = b;
symmetry: D(a, b) = D(b, a);
triangle inequality: D(a, b) + D(b, c) ≥ D(a, c).

39 Metrics and Nearest Neighbors
A metric remains a metric even when each dimension is rescaled (in fact, under any positive-definite linear transformation), but the nearest-neighbor relations can change. The left plot shows a scenario where the black point is closer to x; scaling the x_1 axis by 1/3, however, brings the red point closer to x than the black point.

40 Example Metrics
Minkowski metric:
L_k(a, b) = \left(\sum_{i=1}^{d} |a_i - b_i|^k\right)^{1/k}
k = 1 gives the Manhattan distance; k = 2 gives the Euclidean distance; k = ∞ gives the max-distance (the maximum of the absolute differences between the elements of the two vectors).
Tanimoto metric: used for finding the distance between sets.
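These special cases in code (a small sketch; minkowski is an illustrative helper, not a library call):

```python
import numpy as np

def minkowski(a, b, k):
    """L_k(a, b) = (sum_i |a_i - b_i|**k)**(1/k); k = inf gives max-distance."""
    diff = np.abs(np.asarray(a, float) - np.asarray(b, float))
    if np.isinf(k):
        return diff.max()
    return (diff**k).sum() ** (1.0 / k)

a, b = [0, 0], [3, 4]
print(minkowski(a, b, 1))       # 7.0  Manhattan
print(minkowski(a, b, 2))       # 5.0  Euclidean
print(minkowski(a, b, np.inf))  # 4.0  max-distance
```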

41 Computational Complexity of the k-NN Rule
Complexity, both in time (search) and space (storage of prototypes), has received a lot of attention. Suppose we are given n labeled samples in d dimensions and we seek the sample closest to a test point x; the naive approach requires calculating the distance from x to each stored point and then finding the closest. Three general algorithmic techniques for reducing the computational burden: computing partial distances, pre-structuring, and editing the stored prototypes.

42 Nearest-Neighbor Editing
For each given test point, k-NN has to make n distance computations, followed by selection of the k closest neighbors. Imagine a billion web pages, or a million images. There are different ways to reduce search complexity:
Pre-structuring the data (e.g., divide 2-D data uniformly distributed in [-1, 1]^2 into quadrants and compare the given example only to points in its quadrant).
Build search trees.
Nearest-neighbor editing: eliminate useless prototypes, i.e., prototypes that are surrounded by training examples of the same category label. This leaves the decision boundaries intact; a simple approximation is sketched below.
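A cheap approximation of editing (hedged: the exact algorithm on the next slide uses Voronoi neighbors; this proxy simply drops a prototype when all of its k nearest neighbors share its label):

```python
import numpy as np

def edit_prototypes(X, y, k=5):
    """Approximate nearest-neighbor editing: keep only prototypes that have
    at least one differently labeled point among their k nearest neighbors,
    i.e., points near a decision boundary."""
    keep = []
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                    # exclude the point itself
        nn = np.argsort(d)[:k]
        if not np.all(y[nn] == y[i]):    # surrounded by its own class? drop
            keep.append(i)
    return X[keep], y[keep]
```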

43 Nearest-neighbor editing: Algorithm
The complexity of this algorithm is O(d^3 n^{\lfloor d/2 \rfloor} \ln n), where \lfloor d/2 \rfloor is the floor of d/2.

44 A Branch and Bound Algorithm for Computing k-Nearest Neighbors
K. Fukunaga and P. Narendra, IEEE Trans. Computers, July 1975.

45 k-NN
k-NN generally requires a large number of computations for large n (the number of training samples). Several solutions reduce this complexity:
Hart's method: condense the training dataset while retaining most of the samples that provide sufficient discriminatory information between classes.
Fischer and Patrick: re-order the examples so that many distance computations can be eliminated.
Fukunaga and Narendra: hierarchically decompose the design samples into disjoint subsets and apply a branch-and-bound algorithm for searching through the hierarchy.

46 Stage 1: Hierarchical Decomposition
Given a dataset of n samples, divide it into l groups, and recursively divide each of the l groups further into l groups each. The partitioning must be done such that similar samples are placed in the same group. The k-means algorithm is used to divide the dataset, since it scales well to large datasets.

47 Searching the Tree
For each node p in the tree, evaluate the following quantities:
S_p: set of samples associated with node p
N_p: number of samples at node p
M_p: sample mean of the samples at node p
r_p: farthest distance from the mean M_p to a sample in S_p
Two rules are used for eliminating examples from the nearest-neighbor computation.

48 Rule 1
Rule 1: No sample x_i in S_p can be the nearest neighbor of the test point x if
d(x, M_p) > B + r_p
where B is the distance to the current nearest neighbor.
Proof: for any point x_i in S_p, the triangle inequality gives
d(x, x_i) \ge d(x, M_p) - d(x_i, M_p)
Since d(x_i, M_p) ≤ r_p by definition,
d(x, x_i) \ge d(x, M_p) - r_p
Therefore x_i cannot be the nearest neighbor of x if d(x, M_p) - r_p > B, i.e., if d(x, M_p) > B + r_p.

49 Rule 2
Rule 2: A sample x_i in S_p cannot be the nearest neighbor of x if
d(x, M_p) > B + d(x_i, M_p)
Proof: along the same lines as the previous proof, using the triangle inequality. This rule is applied only at the leaf nodes. The distances d(x_i, M_p) are computed once for all the points in the leaf nodes; this only requires the additional storage of n real numbers.

50 Branch and Bound Search Algorithm
1. Set B = ∞, the current level L = 1, and the current node to the root.
2. Expansion: place all nodes that are immediate successors of the current node in the active set of nodes.
3. For each node in the active set, test Rule 1; remove nodes that fail.
4. Pick the active node closest to the test point x and expand it using step 2. If you are at the final level, go to step 5.
5. Apply Rule 2 to all the points in the current (leaf) node: for points that fail, do not compute the distance to x; for points that pass, compute the distance and update B whenever a closer point is found.
6. Pick the closest point using the computed distances.
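A self-contained sketch of the procedure for 1-NN, assuming a single level of k-means groups (scikit-learn's KMeans stands in for the paper's partitioning; all names are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

def build_nodes(X, l=8, seed=0):
    """One level of the decomposition: l k-means groups, each with its
    sample set S_p, mean M_p, radius r_p, and the cached distances
    d(x_i, M_p) needed by Rule 2."""
    km = KMeans(n_clusters=l, n_init=10, random_state=seed).fit(X)
    nodes = []
    for p in range(l):
        S = X[km.labels_ == p]
        M = S.mean(axis=0)
        d_to_mean = np.linalg.norm(S - M, axis=1)
        nodes.append((S, M, d_to_mean.max(), d_to_mean))
    return nodes

def bb_nearest(x, nodes):
    """Branch-and-bound 1-NN search using Rule 1 and Rule 2."""
    B, best = np.inf, None
    # visit nodes closest-mean-first so the bound B shrinks quickly
    order = np.argsort([np.linalg.norm(x - M) for _, M, _, _ in nodes])
    for p in order:
        S, M, r, d_im = nodes[p]
        d_xM = np.linalg.norm(x - M)
        if d_xM > B + r:                 # Rule 1: prune the whole node
            continue
        for i in range(len(S)):
            if d_xM > B + d_im[i]:       # Rule 2: skip this sample
                continue
            d = np.linalg.norm(x - S[i])
            if d < B:
                B, best = d, S[i]
    return best, B

# Illustrative usage.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 2))
nodes = build_nodes(X, l=16)
print(bb_nearest(np.array([0.5, -0.5]), nodes))
```

Visiting surviving nodes in order of increasing d(x, M_p) is what makes the pruning effective: a tight bound B is found early, so Rule 1 can discard most remaining groups without touching their samples.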

51 Extension to k-nearest Neighbors The same algorithm may be applied with B representing the distance to the k-th closest point in the dataset. When a distance is actually calculated in Step 5, it is compared against distances to its current k-nearest neighbors, and the current k-nearest neighbor table is updated as necessary.
