K-nearest neighbours


Javier Béjar cbea
LSI - FIB
Term 2012/2013

Outline
1 Non Parametric Learning
2 K-nearest neighbours
  K-nn Algorithm
  K-nn Regression
  Advantages and drawbacks
  Application

Non Parametric Learning

Parametric vs Non Parametric Models
In the models that we have seen, we select a hypothesis space and adjust a fixed set of parameters with the training data (h_α(x))
We assume that the parameters α summarize the training data, so we can forget about it afterwards
These methods are called parametric models
When we have a small amount of data it makes sense to have a small set of parameters and to constrain the complexity of the model (avoiding overfitting)

Parametric vs Non Parametric Models
When we have a large quantity of data, overfitting is less of an issue
If the data shows that the hypothesis has to be complex, we can try to adjust to that complexity
A non parametric model is one that cannot be characterized by a fixed set of parameters
A family of non parametric models is Instance Based Learning

Instance Based Learning
Instance based learning is based on the memorization of the dataset
The number of parameters is unbounded and grows with the size of the data
There is no explicit model associated with the learned concepts
The classification is obtained by looking into the memorized examples
The cost of the learning process is 0; all the cost is in the computation of the prediction
This kind of learning is also known as lazy learning

K-nearest neighbours

K-nearest neighbours uses the local neighborhood to obtain a prediction
The K memorized examples most similar to the one being classified are retrieved
A distance function is needed to compare the similarity of the examples
  Euclidean distance: d(x_j, x_k) = \sqrt{\sum_i (x_{j,i} - x_{k,i})^2}
  Manhattan distance: d(x_j, x_k) = \sum_i |x_{j,i} - x_{k,i}|
This means that if we change the distance function, we change how examples are classified
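A minimal sketch of these two distance functions in Python with NumPy (the function and variable names are only illustrative, not from the slides):

    import numpy as np

    def euclidean(xj, xk):
        # d(x_j, x_k) = sqrt(sum_i (x_{j,i} - x_{k,i})^2)
        return np.sqrt(np.sum((xj - xk) ** 2))

    def manhattan(xj, xk):
        # d(x_j, x_k) = sum_i |x_{j,i} - x_{k,i}|
        return np.sum(np.abs(xj - xk))

    xj = np.array([1.0, 2.0, 3.0])
    xk = np.array([2.0, 0.0, 3.0])
    print(euclidean(xj, xk))  # sqrt(5) ~ 2.236
    print(manhattan(xj, xk))  # 3.0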

K-nearest neighbours - hypothesis space (1 neighbour)
[figure: decision regions induced by the 1-nearest-neighbour rule on a set of positive and negative examples]

K-nearest neighbours - Algorithm
Training: store all the examples
Prediction: h(x_new)
  Let x_1, ..., x_k be the k examples most similar to x_new
  h(x_new) = combine_predictions(x_1, ..., x_k)
The parameters of the algorithm are the number k of neighbours and the procedure for combining the predictions of the k examples
The value of k has to be adjusted (cross-validation)
  We can overfit (k too low)
  We can underfit (k too high)
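As an illustration, a minimal sketch of this algorithm in plain NumPy, using Euclidean distance and a majority vote as the combination procedure (all names here are illustrative; the slides do not prescribe an implementation):

    import numpy as np
    from collections import Counter

    def knn_predict(X_train, y_train, x_new, k=5):
        # "Training" is just storing X_train and y_train; all the work happens at prediction time.
        dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))   # Euclidean distance to every stored example
        nearest = np.argsort(dists)[:k]                          # indices of the k most similar examples
        return Counter(y_train[nearest]).most_common(1)[0][0]    # combine predictions by majority vote

    # Toy usage
    X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
    y = np.array([0, 0, 1, 1])
    print(knn_predict(X, y, np.array([0.95, 1.0]), k=3))  # -> 1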

K-nearest neighbours - Prediction
[figure: a query point and its k nearest neighbours determining the predicted class]

Looking for neighbours
Looking for the K nearest examples for a new example can be expensive
The straightforward algorithm has a cost O(n log(k)), not good if the dataset is large
We can use indexing with k-d trees (multidimensional binary search trees)
  They are good only if we have around 2^dim examples, so they are not good for high dimensionality
We can use locality sensitive hashing (approximate k-nn)
  Examples are inserted in multiple hash tables that use hash functions that, with high probability, put together examples that are close
  We retrieve from all the hash tables the examples that are in the bin of the query example
  We compute the k-nn only with these examples
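A sketch of the straightforward search with a bounded heap, which is where the O(n log k) cost comes from (illustrative Python, assuming dense NumPy data):

    import heapq
    import numpy as np

    def k_nearest(X_train, x_new, k=5):
        # Keep the k closest examples seen so far in a heap: O(log k) per update, O(n log k) overall.
        heap = []  # entries are (-distance, index), so heap[0] is the farthest of the kept neighbours
        for i, x in enumerate(X_train):
            d = np.linalg.norm(x - x_new)
            if len(heap) < k:
                heapq.heappush(heap, (-d, i))
            elif d < -heap[0][0]:                      # closer than the farthest kept neighbour
                heapq.heapreplace(heap, (-d, i))
        return sorted((-negd, i) for negd, i in heap)  # (distance, index) pairs, closest first

For large datasets of moderate dimensionality this loop would typically be replaced by a k-d tree (e.g. scipy.spatial.cKDTree) or, in high dimensions, by an approximate LSH index as described above.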

K-nearest neighbours - Variants
There are different possibilities for computing the class from the k nearest neighbours:
  Majority vote
  Distance weighted vote
    Inverse of the distance
    Inverse of the square of the distance
    Kernel functions (Gaussian kernel, tricube kernel, ...)
Once we use weights for the prediction we can relax the constraint of using only k neighbours:
  1 We can use k examples (local model)
  2 We can use all examples (global model)
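A sketch of the distance-weighted vote with inverse-of-the-distance weights (illustrative Python; the small eps is an added assumption to avoid division by zero when a neighbour coincides with the query):

    import numpy as np

    def weighted_vote(X_train, y_train, x_new, k=5, eps=1e-12):
        dists = np.linalg.norm(X_train - x_new, axis=1)
        nearest = np.argsort(dists)[:k]
        weights = 1.0 / (dists[nearest] + eps)                    # inverse-of-the-distance weights
        classes = np.unique(y_train[nearest])
        scores = [weights[y_train[nearest] == c].sum() for c in classes]
        return classes[int(np.argmax(scores))]                    # class with the largest total weight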

K-nearest neighbours - Regression
We can extend this method from classification to regression
Instead of combining the discrete predictions of the k neighbours we have to combine continuous predictions
These predictions can be obtained in different ways:
  Simple interpolation
  Averaging
  Local linear regression
  Local weighted regression
The time complexity of the prediction will depend on the method
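A sketch of the simplest of these, k-nn averaging (illustrative Python):

    import numpy as np

    def knn_average(X_train, y_train, x_new, k=5):
        dists = np.linalg.norm(X_train - x_new, axis=1)
        nearest = np.argsort(dists)[:k]
        return y_train[nearest].mean()   # average of the neighbours' continuous target values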

K-nearest neighbours - Regression
[figures: simple interpolation vs. k-nn averaging of the neighbours' values]

K-nearest neighbours - Regression (linear)
K-nn linear regression fits the best line through the k neighbours
A linear regression problem has to be solved for each query (least squares regression)
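A sketch of this local linear regression, solving an ordinary least squares problem on the k neighbours for every query (illustrative Python):

    import numpy as np

    def knn_linear_regress(X_train, y_train, x_new, k=10):
        dists = np.linalg.norm(X_train - x_new, axis=1)
        nearest = np.argsort(dists)[:k]
        # Fit y ~ w·x + b on the neighbours by least squares, then evaluate the line at x_new.
        A = np.hstack([X_train[nearest], np.ones((k, 1))])          # add an intercept column
        coef, *_ = np.linalg.lstsq(A, y_train[nearest], rcond=None)
        return np.append(x_new, 1.0) @ coef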

K-nearest neighbours - Regression (LWR)
Local weighted regression uses a kernel function to weight the contribution of the neighbours depending on their distance
[figure: a kernel weighting function, peaked at distance 0 and decaying towards ±4]
Kernel functions have a width parameter that determines the decay of the weight (it has to be adjusted)
  Too narrow = overfitting
  Too wide = underfitting
A weighted linear regression problem has to be solved for each query (gradient descent search)
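A sketch of locally weighted regression with a Gaussian kernel; for brevity it solves the weighted least squares problem in closed form rather than by the gradient descent search mentioned above, and width stands for the kernel width parameter (illustrative Python):

    import numpy as np

    def lwr_predict(X_train, y_train, x_new, width=1.0):
        # Gaussian kernel weight for every training example (the "global model" variant).
        d = np.linalg.norm(X_train - x_new, axis=1)
        w = np.exp(-(d ** 2) / (2 * width ** 2))
        A = np.hstack([X_train, np.ones((len(X_train), 1))])   # intercept column
        W = np.diag(w)
        # Weighted least squares: minimise sum_i w_i (y_i - a·x_i - b)^2
        coef = np.linalg.solve(A.T @ W @ A, A.T @ W @ y_train)
        return np.append(x_new, 1.0) @ coef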

K-nearest neighbours - Regression (LWR)
[figure: locally weighted regression fit to a sample of points]

K-nearest neighbours - Advantages
The cost of the learning process is zero
No assumptions have to be made about the characteristics of the concepts to learn
Complex concepts can be learned by local approximation using simple procedures

K-nearest neighbours - Drawbacks
The model cannot be interpreted (there is no description of the learned concepts)
It is computationally expensive to find the k nearest neighbours when the dataset is very large
Performance depends on the number of dimensions that we have (curse of dimensionality) ⇒ Attribute Selection

The curse of dimensionality
The more dimensions we have, the more examples we need to approximate a hypothesis
The number of examples that we have in a given volume of space decreases exponentially with the number of dimensions
This is especially bad for k-nearest neighbours
If the number of dimensions is very high, the nearest neighbours can be very far away
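A short worked example of this effect (a standard geometric argument, not from the slides): for data uniformly distributed in the unit hypercube [0,1]^d, a sub-cube that contains a fraction p of the examples must have side length \ell = p^{1/d}. To capture 10% of the data we need \ell = 0.1^{1/2} \approx 0.32 in 2 dimensions, but \ell = 0.1^{1/10} \approx 0.79 in 10 dimensions, so a "local" neighbourhood already spans almost 80% of the range of every attribute.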

Application: Optical Character Recognition
[figure: samples of the capital letter A in different fonts]
OCR of capital letters
14 attributes (all continuous)
Attributes: horizontal position of box, vertical position of box, width of box, height of box, total number of 'on' pixels, mean x of 'on' pixels in box, ...
20000 instances
26 classes (A-Z)
Validation: 10-fold cross validation

Optical Character Recognition: Models
K-nn 1 (Euclidean distance, weighted): accuracy 96.0%
K-nn 5 (Manhattan distance, weighted): accuracy 95.9%
K-nn 1 (Correlation distance, weighted): accuracy 95.1%
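A hedged sketch of how such an experiment could be reproduced with scikit-learn, assuming the letter data is available as a CSV file with the class label in the first column (the file name, column layout and exact weighting scheme are assumptions, not taken from the slides):

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    # Assumed layout: one row per instance, class label first, then the numeric attributes.
    data = np.genfromtxt("letters.csv", delimiter=",", dtype=str)
    y, X = data[:, 0], data[:, 1:].astype(float)

    # 1-nn, Euclidean distance, distance weighting, evaluated with 10-fold cross validation.
    clf = KNeighborsClassifier(n_neighbors=1, metric="euclidean", weights="distance")
    scores = cross_val_score(clf, X, y, cv=10)
    print("accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))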