K-nearest-neighbor: an introduction to machine learning


K-nearest-neighbor: an introduction to machine learning Xiaojin Zhu jerryzhu@cs.wisc.edu Computer Sciences Department University of Wisconsin, Madison slide 1

Outline Types of learning Classification: features, classes, classifier K-nearest-neighbor, instance-based learning Distance metric Training error, test error Overfitting Held-out set, k-fold cross validation, leave-one-out slide 2

machine learning Learn: to improve automatically with experience. speech recognition autonomous vehicles classify astronomical objects play backgammon predict heart attacks predict stock prices filter spam... slide 3

there are 3 types of learning 1. supervised learning classification: You're given images of scanned handwritten digits (x) and their corresponding labels (y, discrete). Design a ZIP-code reader for the post office to label new scanned digit images. regression: Given the infrared absorption spectrum (x) of diabetic people's blood and the amount of glucose (y, continuous) in the blood, estimate the glucose amount from a new spectrum. Supervised because y was given during training. The x are called instances (examples, points). Each instance is represented by a feature (attribute) vector. slide 4

there are 3 types of learning 2. unsupervised learning clustering: 'cut out' in a digital photo: how can we group pixels (x) into meaningful regions (clusters)? [Shi & Malik 2000] outlier detection: are some of the x abnormal? [Wand et al. 2001] [Weinberger et al. 2005] dimensionality reduction, and more... Unsupervised because we only have (x1 ... xn). slide 5

there are 3 types of learning 3. reinforcement learning world: you're in observable world state x1 robot: I'll take action a1 world: you receive reward r1; now you're in state x2 robot: I'll take action a2 world: you receive reward r2; now you're in state x3... What's the optimal action a when in state x, in order to maximize life-long reward? Reinforcement because learning is via action - reward. slide 6

there are more than 3 types of learning 1. supervised learning: (x,y) classification, regression 2. unsupervised learning: x clustering, outlier detection, dimensionality reduction 3. reinforcement learning: (x,a,r) 4. semi-supervised learning: (x,y), x 5. multi-agent reinforcement learning as game playing: the world is full of other robots (recall the tragedy of the Commons)... slide 7

the design cycle of classification 1. collect data and labels (the real effort) 2. choose features (the real ingenuity) 3. pick a classifier (some ingenuity) 4. train the classifier (some knobs, fairly mechanical) 5. evaluate the classifier (needs care) If the classifier is no good, go back to steps 1-4. slide 8

1-nearest-neighbor (1NN) classifier Training set: (x1,y1), (x2,y2), ..., (xn,yn). Assume xi = (xi(1), xi(2), ..., xi(d)) is a d-dimensional feature vector of real numbers, for all i; yi is a class label in {1...C}, for all i. Your task: determine y_new for x_new. 1NN algorithm: 1. Find the closest point xj to x_new w.r.t. Euclidean distance [(xj(1) - x_new(1))^2 + ... + (xj(d) - x_new(d))^2]^(1/2) 2. Classify by y_1nn = yj slide 9
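The two steps above can be sketched in a few lines of Python (the function names and toy data are my own, not from the slides):

```python
import math

def euclidean(a, b):
    # [(a(1)-b(1))^2 + ... + (a(d)-b(d))^2]^(1/2)
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def one_nn(train, x_new):
    # train is a list of (x, y) pairs.
    # Step 1: find the closest training point; step 2: return its label.
    x_j, y_j = min(train, key=lambda pair: euclidean(pair[0], x_new))
    return y_j
```

For example, with train = [((0, 0), 'a'), ((5, 5), 'b')], the query (1, 1) is closest to (0, 0) and gets label 'a'.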

Instance-based learning 1NN is a special case of instance-based learning: find similar instances to x_new in the training data, then fit the label within those similar instances. A 100-year-old idea. Variations in instance-based learning: How many neighbors to consider? (1 for 1NN) What distance to use? (Euclidean for 1NN) How to combine the neighbors' labels? (the same) Instance-based learning methods whose names you should know, but which we won't cover today: nearest-neighbor regression, kernel regression, locally weighted regression. slide 10

K-nearest-neighbor (knn) Obvious extension to 1NN: consider k neighbors. knn algorithm: 1. Find the k closest training points to x_new w.r.t. Euclidean distance 2. Classify by y_knn = majority vote among the k points How many neighbors to consider? (k) What distance to use? (Euclidean) How to combine the neighbors' labels? (majority vote) slide 11
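A minimal sketch of the knn algorithm, extending the 1NN idea with a majority vote (again, the names are my own):

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn(train, x_new, k):
    # Step 1: sort training points by distance to x_new, keep the k closest.
    neighbors = sorted(train, key=lambda pair: euclidean(pair[0], x_new))[:k]
    # Step 2: majority vote among the k labels.
    # (Counter breaks ties in favor of the label encountered first.)
    votes = Counter(y for _, y in neighbors)
    return votes.most_common(1)[0][0]
```

With an odd k and two classes, ties in the vote cannot occur, which is one reason odd k is a common choice.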

knn demo http://www.cs.cmu.edu/~zhuxj/courseproject/knndemo/knn.html slide 12

Weighted knn Near neighbors should count more than far neighbors: each neighbor casts a vote with weight w_i, depending on its distance d_i. Many choices of weighting are possible. How many neighbors to consider? (k) What distance to use? (Euclidean) How to combine the neighbors' labels? (weighted majority vote) Copied from Andrew Moore slide 13
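One common weighting choice (there are many, as the slide says) is inverse squared distance, w_i = 1/(d_i^2 + eps); the sketch below is my own illustration of that choice, not Andrew Moore's original:

```python
import math
from collections import defaultdict

def weighted_knn(train, x_new, k, eps=1e-9):
    dist = lambda x: math.sqrt(sum((xi - ni) ** 2 for xi, ni in zip(x, x_new)))
    # take the k closest training points, as in plain knn
    neighbors = sorted(train, key=lambda pair: dist(pair[0]))[:k]
    # weighted vote: w_i = 1 / (d_i^2 + eps), so near neighbors dominate
    votes = defaultdict(float)
    for x_i, y_i in neighbors:
        votes[y_i] += 1.0 / (dist(x_i) ** 2 + eps)
    return max(votes, key=votes.get)
```

Note how this can flip the plain majority: with neighbors at distances 0.1 (label 0), 2.9 and 3.0 (label 1), the single near neighbor outvotes the two far ones.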

The distance metric The answer shouldn't change if one uses metric or English units for the features. A Euclidean distance with different scaling factors (a1 ... ad) for different dimensions: [a1(xj(1) - x_new(1))^2 + ... + ad(xj(d) - x_new(d))^2]^(1/2) Dist^2(xi,xj) = (xi1 - xj1)^2 + (xi2 - xj2)^2 Dist^2(xi,xj) = (xi1 - xj1)^2 + (3xi2 - 3xj2)^2 Copied from Andrew Moore slide 14

Some notable distance metrics Scaled Euclidean (diagonal covariance) Mahalanobis (full covariance) L1 norm L-infinity (max) norm Copied from Andrew Moore slide 15
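Sketches of some of these metrics in Python (the Mahalanobis distance generalizes the scaled Euclidean by using a full inverse covariance matrix; it is omitted here to keep the sketch short):

```python
import math

def scaled_euclidean(x, z, a):
    # diagonal scaling: [a1(x1-z1)^2 + ... + ad(xd-zd)^2]^(1/2)
    return math.sqrt(sum(ak * (xk - zk) ** 2 for ak, xk, zk in zip(a, x, z)))

def l1(x, z):
    # L1 norm: sum of absolute coordinate differences
    return sum(abs(xk - zk) for xk, zk in zip(x, z))

def linf(x, z):
    # L-infinity (max) norm: largest absolute coordinate difference
    return max(abs(xk - zk) for xk, zk in zip(x, z))
```

With all scaling factors ak = 1, scaled_euclidean reduces to the plain Euclidean distance.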

Distance metric However, choosing an appropriate distance metric is a hard problem. (Mars climate orbiter, lost in 1999 due to a mix-up of metric and English units within NASA.) Important questions: 1. How do we choose the scaling factors (a1 ... ad)? 2. What if some feature dimensions are simply noise? 3. How do we choose K? These are the parameters of the classifier. slide 16

Evaluate classifiers During training You're given the training set (x1,y1), (x2,y2), ..., (xn,yn). You write knn code. You now have a classifier, a black box which generates y_knn for any x_new. During testing For new test data x_{n+1} ... x_{n+m}, your classifier generates labels y_{n+1}^knn ... y_{n+m}^knn. Test set accuracy: the correct performance measure. You need to know the true test labels y_{n+1} ... y_{n+m}: acc = (# of i with y_i^knn = y_i) / m, where i = n+1, ..., n+m. Test set error = 1 - acc. slide 17

Parameter selection (Same setup as the previous slide, with a question overlaid:) A good idea: pick the parameters (e.g. K) that maximize test set accuracy. But most of the time we only have the training set when we design a classifier. Can we use training set accuracy instead? slide 18

Parameter selection by training set accuracy? You're given the training set (x1,y1), (x2,y2), ..., (xn,yn). You build a knn classifier on them. You compute the training set accuracy acc_train = (# of i with y_i^knn = y_i) / n, where i = 1, ..., n. You use acc_train to select K. There is something very wrong here; can you see why? Hint: consider K=1. slide 19

Overfitting Overfitting is when a learning algorithm performs too well on the training set, compared to its true performance on unseen test data. Never use training accuracy to select parameters; you'll overfit. Overfitting fits the noise (idiosyncrasies) in the training data, not the general underlying regularity. Ideas? slide 20

Leave-one-out cross validation (LOOCV) 1. For each training example (xi, yi): Train a classifier with all training data except (xi, yi). Test the classifier's accuracy on (xi, yi). 2. LOOCV accuracy = average of all n accuracies. Select parameters based on LOOCV accuracy. Easy to compute for knn, but (as we'll see later) expensive for many other classifiers. slide 21
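A sketch of the LOOCV loop; here classify stands for any classifier function taking (training data, query, K), e.g. a knn routine, and the interface is my own choice:

```python
def loocv_accuracy(train, k, classify):
    # classify(train_subset, x_new, k) -> predicted label
    correct = 0
    for i, (x_i, y_i) in enumerate(train):
        rest = train[:i] + train[i + 1:]       # all data except example i
        correct += (classify(rest, x_i, k) == y_i)
    return correct / len(train)
```

To select K, one would call loocv_accuracy once per candidate K and keep the best.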

The holdout set method 1. Randomly choose, say, 30% of the training data and set it aside. 2. Train a classifier with the remaining 70%. 3. Test the classifier's accuracy on the 30%. Select parameters based on holdout set accuracy. Easy to compute, but has higher variance (not as good an estimator of future performance), and wastes data. slide 22
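The three steps above as a sketch (the 30% fraction and the fixed seed are illustrative defaults, and classify is the same hypothetical classifier interface as before):

```python
import random

def holdout_accuracy(data, k, classify, frac=0.3, seed=0):
    # step 1: randomly set aside frac of the data as the holdout set
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_hold = max(1, int(len(shuffled) * frac))
    holdout, train = shuffled[:n_hold], shuffled[n_hold:]
    # steps 2-3: "train" on the rest, test on the holdout set
    correct = sum(classify(train, x, k) == y for x, y in holdout)
    return correct / len(holdout)
```

The variance the slide mentions shows up directly here: different seeds give different splits, and hence different accuracy estimates.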

K-fold cross validation (CV) 1. Randomly divide the training data into k chunks (folds). 2. For each fold: Train a classifier with all training data except the fold. Test the classifier's accuracy on the fold. 3. K-fold CV accuracy = average of all k accuracies. Select parameters based on k-fold CV accuracy. Between LOOCV and the holdout set; often used. People tend to select k = 10 or 5. These methods will not completely prevent overfitting. slide 23
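A sketch of k-fold CV; to avoid clashing with the number of neighbors K, the fold count is called num_folds here (the names are my own):

```python
import random

def kfold_accuracy(data, num_folds, k, classify, seed=0):
    # step 1: shuffle, then split into num_folds roughly equal folds
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::num_folds] for i in range(num_folds)]
    # step 2: hold out each fold in turn, train on the rest
    accs = []
    for held_out in folds:
        train = [pair for f in folds if f is not held_out for pair in f]
        correct = sum(classify(train, x, k) == y for x, y in held_out)
        accs.append(correct / len(held_out))
    # step 3: average the per-fold accuracies
    return sum(accs) / len(accs)
```

Setting num_folds = n recovers LOOCV; num_folds = 2 or 3 approaches the holdout method, which is why k-fold sits between the two.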

What you should know Types of learning K-nearest-neighbor, instance-based learning Distance metric Training error, test error Overfitting Held-out set, k-fold cross validation, leave-one-out slide 24