Introduction to Machine Learning Lecture 1 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu
Introduction
Logistics Prerequisites: basic concepts needed in probability and statistics will be introduced. Workload: 5 homework assignments, mid-term exam, final project. Textbooks: recommended, not mandatory. no single textbook covering the material presented. lecture slides available electronically. Mailing list: join as soon as possible.
Machine Learning Definition: computational methods using experience to improve performance, e.g., to make accurate predictions. Experience: data-driven task, thus statistics, probability. Example: use height and weight to predict gender. Computer science: need to design efficient and accurate algorithms, analysis of complexity, theoretical guarantees.
Examples of Learning Tasks Optical character recognition. Text or document classification, spam detection. Morphological analysis, part-of-speech tagging, statistical parsing. Speech recognition, speech synthesis, speaker verification. Image recognition, face recognition.
Examples of Learning Tasks Fraud detection (credit card, telephone), network intrusion. Games (chess, backgammon). Unassisted control of a vehicle (robots, navigation). Medical diagnosis. Recommendation systems, search engines, information extraction systems.
Some Broad Areas of ML Classification: assign a category to each object (OCR, text classification, speech recognition). note: in difficult tasks, the number of categories can be very large or even infinite. Regression: predict a real value for each object (prices, stock values, economic variables, ratings). measure of success: closeness of predictions. Ranking: order objects according to some criterion (relevant web pages returned by a search engine).
Some Broad Areas of ML Clustering: partition data into homogeneous groups (analysis of very large data sets). Dimensionality reduction: find a lower-dimensional manifold preserving some properties of the data (computer vision). Density estimation: learning the probability distribution according to which the data has been sampled (the distribution is typically selected out of a pre-selected family).
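To make the last point concrete, here is a minimal sketch (Python, not part of the slides) of maximum-likelihood density estimation within a pre-selected family, here one-dimensional Gaussians; the sample values are made up.

    import math

    # Minimal sketch: maximum-likelihood estimation within a pre-selected
    # family (one-dimensional Gaussians). The sample values are made up.
    sample = [1.2, 0.8, 1.5, 0.9, 1.1, 1.4]

    # ML estimates for a Gaussian: the sample mean and the (biased) sample variance.
    mu = sum(sample) / len(sample)
    var = sum((x - mu) ** 2 for x in sample) / len(sample)

    def density(x):
        """Density of the fitted Gaussian at x."""
        return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

    print(f"mu = {mu:.3f}, var = {var:.3f}, p(1.0) = {density(1.0):.3f}")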
Objectives of Machine Learning Algorithms: design of efficient, accurate, and general learning algorithms to deal with large-scale problems (data sets with more than 1-10 million examples). make accurate predictions (unseen examples). handle a variety of different learning problems. Theoretical questions: what can be learned? under what conditions? how well can it be learned computationally?
This course Algorithms: covers most key learning algorithms. nearest-neighbor algorithms. perceptron, Winnow, Halving, Weighted Majority. support vector machines, kernel methods. boosting, bagging. Applications: illustration of the use of algorithms. familiarization with software (assignments). Theory: analysis and introduction to concepts.
Topics Basic notions of probability. Bayesian inference. Nearest-neighbor algorithms. On-line learning with expert advice (Weighted Majority, Exponentiated Average). On-line linear classification (Perceptron, Winnow). Support Vector Machines (SVMs).
Topics Kernel methods. Decision trees. Ensemble methods (boosting, bagging). Logistic regression. Density estimation (maximum likelihood, Maxent models). Multi-class classification.
Topics Linear regression. Kernel ridge regression, Lasso. Neural networks. Clustering. Dimensionality reduction. Introduction to reinforcement learning. Elements of learning theory.
Definitions and Terminology Example: an object or instance of the data used. Features: the set of attributes, often represented as a vector, associated with an example, e.g., height and weight for gender prediction. Labels: in classification, the category associated with an object, e.g., positive or negative in binary classification. in regression, real-valued numbers.
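As a concrete illustration of these terms (a minimal sketch in Python, not from the slides; all numbers are made up), an example from the gender-prediction task can be stored as a feature vector paired with a label.

    # Minimal sketch: labeled examples for the height/weight prediction task.
    # Each example is a feature vector (height in cm, weight in kg) with a
    # binary label (+1 / -1); the values below are made up.
    training_data = [
        ((178.0, 75.0), -1),
        ((165.0, 58.0), +1),
        ((182.0, 80.0), -1),
    ]

    for features, label in training_data:
        height, weight = features
        print(f"features = ({height} cm, {weight} kg), label = {label}")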
Definitions and Terminology Training data: data used to train the algorithm. Test data: data used exclusively to test the algorithm. Some standard learning scenarios: supervised learning: labeled training data. unsupervised learning: no labeled data. semi-supervised learning: labeled training data + unlabeled data. transductive learning: labeled training data + unlabeled test data.
Example - SPAM Detection Problem: classify each e-mail message as spam or non-spam (binary classification problem). Data: large collection of spam and non-spam messages (labeled examples). Features: define features for all examples (e.g., presence or absence of some sequences). critical step (should use prior knowledge). Algorithm: choose a type of algorithm adapted to the problem. typically requires choice of a hypothesis set.
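For instance, "presence or absence of some sequences" could be implemented as 0/1 features (a minimal sketch; the sequences below are made up and are not the course's actual feature set).

    # Minimal sketch: binary features indicating the presence or absence of
    # some character sequences in a message (the sequences are made up).
    SEQUENCES = ["free", "winner", "click here", "meeting", "lecture"]

    def extract_features(message):
        """Map an e-mail message to a 0/1 feature vector."""
        text = message.lower()
        return [1 if seq in text else 0 for seq in SEQUENCES]

    print(extract_features("You are a WINNER! Click here for a free prize."))
    # -> [1, 1, 1, 0, 0]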
Example - SPAM Detection Learning stages: divide the labeled collection into training and test data. use the training data and features to train the machine learning algorithm. predict the labels of examples in the test data to evaluate the algorithm. algorithms may require choosing parameters (number of rounds, learning parameter, trade-off parameter), typically selected using a validation set or cross-validation.
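Putting these stages together, here is a minimal sketch of the train/test workflow (not from the slides; the classifier is a placeholder for the output of any training algorithm).

    import random

    # Minimal sketch of the learning stages; `labeled_data` is a list of
    # (feature_vector, label) pairs, and `classifier` is any function mapping
    # a feature vector to a predicted label.
    def split(labeled_data, test_fraction=0.2, seed=0):
        """Shuffle and divide the labeled collection into training and test data."""
        data = labeled_data[:]
        random.Random(seed).shuffle(data)
        n_test = int(len(data) * test_fraction)
        return data[n_test:], data[:n_test]   # (training data, test data)

    def test_error(classifier, test_data):
        """Fraction of test examples whose predicted label is wrong."""
        return sum(1 for x, y in test_data if classifier(x) != y) / len(test_data)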
Cross-Validation Partition the data into K folds (typically, K = 5 or 10). Train hypothesis h_{θ,k} on all but the kth fold, for k ∈ [1, K]. Compute the K-fold cross-validation error: (1/K) ∑_{k=1}^{K} error(h_{θ,k}, fold k). Choose the value of θ minimizing the CV error. When K = m (the sample size), this is leave-one-out cross-validation and the leave-one-out error.
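A minimal sketch of this selection procedure (train and error are placeholders standing for an arbitrary learning algorithm and its error measure):

    # Minimal sketch of K-fold cross-validation for selecting a parameter theta.
    # `train(theta, data)` returns a hypothesis; `error(hypothesis, data)` returns
    # its error on `data`. Both are placeholders for a concrete algorithm.
    def cv_error(theta, labeled_data, train, error, K=5):
        """(1/K) * sum over k of error(h_{theta,k}, fold k)."""
        folds = [labeled_data[k::K] for k in range(K)]
        total = 0.0
        for k in range(K):
            held_out = folds[k]
            train_data = [x for j, fold in enumerate(folds) if j != k for x in fold]
            h = train(theta, train_data)      # h_{theta,k}: trained on all but fold k
            total += error(h, held_out)       # error on the held-out fold
        return total / K

    def select_theta(thetas, labeled_data, train, error, K=5):
        """Choose the value of theta minimizing the cross-validation error."""
        return min(thetas, key=lambda t: cv_error(t, labeled_data, train, error, K))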
Importance of Features Features: poor features, uncorrelated with the labels, make learning very difficult for all algorithms. good features can be very effective; knowledge of the task often helps. Example: [two groups of binary strings, each paired with an integer label; figure not reproduced here].
Generalization Generalization: not memorization. minimizing error on the training set does not in general guarantee good generalization. overly complex hypotheses could overfit the training sample. how much complexity vs. training sample size?
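A toy illustration of the memorization pitfall (a minimal sketch, not from the slides; the data is made up): a hypothesis that simply stores the training sample achieves zero training error but provides no useful prediction on unseen examples.

    # Minimal sketch: a "memorizing" hypothesis has zero training error but
    # does not generalize; on unseen points it can only return a default label.
    def memorize(training_data):
        table = {x: y for x, y in training_data}
        return lambda x: table.get(x, 0)   # 0 = arbitrary default for unseen x

    train_sample = [((1.0, 2.0), +1), ((3.0, 1.0), -1)]
    h = memorize(train_sample)
    print(h((1.0, 2.0)))   # +1: a training example, recalled exactly
    print(h((1.2, 2.0)))   # 0: an unseen example, no informative prediction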