MA2823: Foundations of Machine Learning



Similar documents
INTRODUCTION TO MACHINE LEARNING 3RD EDITION

Machine Learning. CS494/594, Fall :10 AM 12:25 PM Claxton 205. Slides adapted (and extended) from: ETHEM ALPAYDIN The MIT Press, 2004

Obtaining Value from Big Data

Lecture Slides for INTRODUCTION TO. ETHEM ALPAYDIN The MIT Press, Lab Class and literature. Friday, , Harburger Schloßstr.

Machine Learning Introduction

Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research

Machine Learning CS Lecture 01. Razvan C. Bunescu School of Electrical Engineering and Computer Science

Machine Learning: Overview

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

Example application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health

Machine Learning and Data Mining. Fundamentals, robotics, recognition

CS 2750 Machine Learning. Lecture 1. Machine Learning. CS 2750 Machine Learning.

Introduction to Machine Learning Using Python. Vikram Kamath

Machine Learning and Statistics: What s the Connection?

Network Machine Learning Research Group. Intended status: Informational October 19, 2015 Expires: April 21, 2016

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

Introduction to Data Mining

Data, Measurements, Features

BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics

Information Management course

Learning is a very general term denoting the way in which agents:

Graduate Co-op Students Information Manual. Department of Computer Science. Faculty of Science. University of Regina

CPSC 340: Machine Learning and Data Mining. Mark Schmidt University of British Columbia Fall 2015

Principles of Data Mining by Hand&Mannila&Smyth

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing

Machine Learning. CUNY Graduate Center, Spring Professor Liang Huang.

MACHINE LEARNING IN HIGH ENERGY PHYSICS

Data Mining + Business Intelligence. Integration, Design and Implementation

Machine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer

Data Mining and Machine Learning in Bioinformatics

Data Mining: Overview. What is Data Mining?

: Introduction to Machine Learning Dr. Rita Osadchy

Introduction to Pattern Recognition

Title. Introduction to Data Mining. Dr Arulsivanathan Naidoo Statistics South Africa. OECD Conference Cape Town 8-10 December 2010.

Data Mining and Exploration. Data Mining and Exploration: Introduction. Relationships between courses. Overview. Course Introduction

Learning outcomes. Knowledge and understanding. Competence and skills

ADVANCED MACHINE LEARNING. Introduction

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland

Data Mining for Fun and Profit

An Introduction to Data Mining

CSE 517A MACHINE LEARNING INTRODUCTION

TIETS34 Seminar: Data Mining on Biometric identification

Introduction to Data Mining

Detection. Perspective. Network Anomaly. Bhattacharyya. Jugal. A Machine Learning »C) Dhruba Kumar. Kumar KaKta. CRC Press J Taylor & Francis Croup

Using Artificial Intelligence to Manage Big Data for Litigation

DATA MINING TECHNIQUES AND APPLICATIONS

ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING)

Using Data Mining for Mobile Communication Clustering and Characterization

Machine Learning for Data Science (CS4786) Lecture 1

Applications of Deep Learning to the GEOINT mission. June 2015

Introduction to Learning & Decision Trees

The Data Mining Process

An Overview of Knowledge Discovery Database and Data mining Techniques

Defending Networks with Incomplete Information: A Machine Learning Approach. Alexandre

Foundations of Artificial Intelligence. Introduction to Data Mining

Data Mining Techniques in CRM

A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS

CSci 538 Articial Intelligence (Machine Learning and Data Analysis)

Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD

Data Mining Techniques

What is Data Mining? Data Mining (Knowledge discovery in database) Data mining: Basic steps. Mining tasks. Classification: YES, NO

MS1b Statistical Data Mining

MACHINE LEARNING BASICS WITH R

Software Development Training Camp 1 (0-3) Prerequisite : Program development skill enhancement camp, at least 48 person-hours.

Machine Learning with MATLAB David Willingham Application Engineer

Chapter 12 Discovering New Knowledge Data Mining

Data Mining System, Functionalities and Applications: A Radical Review

Steven C.H. Hoi School of Information Systems Singapore Management University

Classification algorithm in Data mining: An Overview

Machine Learning What, how, why?

Data Mining for Customer Service Support. Senioritis Seminar Presentation Megan Boice Jay Carter Nick Linke KC Tobin

A Case of Study on Hadoop Benchmark Behavior Modeling Using ALOJA-ML

from Larson Text By Susan Miertschin

Index Contents Page No. Introduction . Data Mining & Knowledge Discovery

Introduction to Artificial Intelligence G51IAI. An Introduction to Data Mining

Statistics for BIG data

TDA and Machine Learning: Better Together

Data Warehousing and Data Mining for improvement of Customs Administration in India. Lessons learnt overseas for implementation in India

Supervised Learning (Big Data Analytics)

Support Vector Machines with Clustering for Training with Very Large Datasets

Data Mining is sometimes referred to as KDD and DM and KDD tend to be used as synonyms

Government of Russian Federation. Faculty of Computer Science School of Data Analysis and Artificial Intelligence

Predictive Modeling Techniques in Insurance

Faculty of Science School of Mathematics and Statistics

Machine Learning Capacity and Performance Analysis and R

Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning

Mobile Phone APP Software Browsing Behavior using Clustering Analysis

Role Description. Position of a Data Scientist Machine Learning at Fractal Analytics

Introduction. A. Bellaachia Page: 1

Is a Data Scientist the New Quant? Stuart Kozola MathWorks

Data Mining Solutions for the Business Environment

Machine Learning using MapReduce

Syllabus for MATH 191 MATH 191 Topics in Data Science: Algorithms and Mathematical Foundations Department of Mathematics, UCLA Fall Quarter 2015

Social Media Mining. Data Mining Essentials

Big Data - Lecture 1 Optimization reminders

HT2015: SC4 Statistical Data Mining and Machine Learning

CS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 Real-Time Systems. CSCI 522 High Performance Computing

Analytics on Big Data

Transcription:

MA2823: Foundations of Machine Learning École Centrale Paris Fall 2015 Chloé-Agathe Azencot Centre for Computational Biology, Mines ParisTech chloe agathe.azencott@mines paristech.fr

TAs: Jiaqian Yu jiaqian.yu@centralesupelec.fr Eugene Belilovsky eugene.belilovsky@inria.fr Slides thanks to Ethem Alpaydi, Matthew Blaschko, Trevor Hastie, Rob Tibshirani and Jean-Philippe Vert. Textbook: Elements of Statistical Learning Hastie & Tibshirani Available at http://statweb.stanford.edu/~tibs/elemstatlearn/ Course material & contact http://cazencott.info [Teaching] http://cazencott.info/index.php/pages/ma2823 Foundations of Machine Learning %28Fall 2015%29 chloe agathe.azencott@mines paristech.fr 2

What is (Machine) Learning? 3

Why Learn? Learning: Modifying a behavior based on experience. [F. Bénureau] Machine learning: Programming computers to optimize a performance criterion using example data. There is no need to learn to calculate payroll. Learning is used when Human expertise does not exist (bioinformatics); Humans are unable to explain their expertise (speech recognition, computer vision); Solutions change in time (routing computer networks); Solutions need adapting to new cases (user biometrics). 4

Artificial Intelligence Machine Learning is a subfield of Artificial Intelligence: A system that lives in a changing environment must have the ability to learn in order to adapt. ML algorithms are building blocks that make computers behave more intelligently by generalizing rather than merely storing and retrieving data (like a database system would do). 5

What we talk about when we talk about learning Learning general models from particular examples (data) Data is (mostly) cheap and abundant; Knowledge is expensive and scarce. Example in retail: From customer transactions to consumer behavior People who bought Game of Thrones also bought Lord of the Rings [amazon.com] Goal: Build a model that is a good and useful approximation to the data. 6

Data mining: Applying ML to (large) databases Retail: Market basket analysis, Customer relationship management (CRM). Finance: Credit scoring, fraud detection. Manufacturing: Control, optimization, troubleshooting. Medicine: Medical diagnosis. Telecommunications: Spam filters, intrusion detection, network optimization, routing. Science: Analyze large amounts of data in physics, astronomy, biology. Web: Search engines. 7

What is machine learning? Optimizing a performance criterion using example data or past experience. Role of Statistics: Build mathematical models to make inference from a sample. Role of Computer Science: Efficient algorithms to Solve the optimization problem; Represent and evaluate the model for inference. 8

Classes of machine learning problems Association rule learning: Discover relations between variables Supervised learning: Predict outcome from features Classification Regression Ranking Ordered categories (e.g. scores) Unsupervised learning: Find paterns in the data Dimensionality reduction Clustering Semi-supervised learning: Predict outcome for unlabeled but known instances Reinforcement learning: Maximize cumulative reward 9

Learning associations Market basket analysis: X, Y: products/services P ( Y X ): probability that somebody who buys X also buys Y. Example: P ( chips beer ) = 0.70 P ( chips beer, M, age < 30) = 0.85 10

Classification (or patern recognition) Example: Credit scoring savings Differentiate between low-risk and high-risk customers, from their income and savings θ 2 High risk Low risk θ 1 income Discriminant: IF income > θ 1 AND savings > θ 2 THEN low-risk ELSE high-risk 11

Classification: Applications Face recognition: independently of pose, lighting, occlusion (glasses, beard), make-up, hair style. Character recognition: independently of different handwriting styles. Speech recognition: account for temporal dependency. Medical diagnosis: from symptoms to illnesses. Precision medicine: from clinical & genetic features to diagnosis, prognosis, response to treatment. Biometrics: recognition/authentication using physical or behavioral characteristics: Face, iris, signature... 12

Face recognition example Training examples (one person) Test images ORL dataset, AT&T Labs, Cambridge (UK). 13

Regression Example: Price of a used car x: milage y: price y = f(x θ) f = model θ = parameters θ = (a, b) price mileage 14

Regression: Applications Car navigation: angle of steering Kinematics of a robot arm Binding affinities between molecules Age of onset of a disease Solubility of a chemical in water Yield of a crop Direction of a forest fire 15

Ranking Find a function that orders a given list of items of X As a classification problem: pairwise approach Is {x1, x2} correctly ordered? As a regression problem: pointwise approach Listwise approach: directly optimize for the best list. 16

Ranking: applications Chemoinformatics: virtual screening scoring 3D structures Recommander systems: rank movies according to how likely a user is to view them. rank items according to how likely a consumer is to buy them. Information retrieval: document retrieval web search 17

Uses of supervised learning Prediction of future cases: Use the rule to predict the output for new inputs. Knowledge extraction: Interpret the rule. Assumes the rule is easy to understand. Compression: The rule is simpler than the data it explains. Outlier detection: Find exceptions that are not covered by the rule (they might be fraud, intrusion, errors). 18

Unsupervised learning No output Learn what normally happens Density estimation: Find the structure (regularities) in the data Clustering Dimensionality reduction 19

Clustering Goal: Group objects into clusters, i.e. classes that were unknown beforehand. Objects in the same cluster are more similar to each other than to objects in a different cluster. Motivation: Understanding general characteristics of the data; Visualization; Infering some properties of an object based on how it relates to other objects. 20

Clustering: applications Customer relationship management: Customer segmentation Image compression: Color quantization Document clustering: Group documents by topics (bagof-words) Bioinformatics: Learning motifs. 21

Dimensionality reduction Goal: Reduce the number of input variables Feature selection: Only keep the features (= variables) that are relevant Feature extraction: Transform the data into a space of lower dimension. Goals: Reduce storage space & computational time Remove colinearities Visualization (in 2 or 3 dimensions) and interpretability. 22

Reinforcement learning Output = sequence of actions Learn a policy: sequence of correct actions to reach the goal No supervised output but delayed reward Examples Game playing Robot in a maze Issues: multiple agents, partial observability,... 23

Artificial intelligence Electrical engineering Signal processing Patern recognition Engineering Knowledge discovery Optimization in databases Computer science Data mining Big data Business Data science Inference Discriminant analysis Induction Statistics 24

Learning objectives After this course, you should be able to Identify problems that can be solved by machine learning; Given such a problem, identify and apply the most appropriate classical algorithm(s); Implement some of these algorithms yourself; Evaluate and compare machine learning algorithms for a particular task. 25

Course Syllabus Sep 23 1. Introduction 2. Supervised learning Sep 25 3. Model evaluation & selection Lab: scikit-learn Oct 02 4. Bayesian decision theory Lab: scikit-learn Oct 09 5. Linear and logistic regression Lab: Kaggle challenge Oct 16 Parametric methods 6. Regularized linear regression Lab: Kaggle challenge 26

Nov 06 7. Nearest neighbor methods Lab: Kaggle challenge Nov 13 8. Tree-based methods Lab: Kaggle challenge Nov 20 Non-parametric methods 9. Support vector machines Nov 27 Guest talk: Yunlong Jiao, Ranking Kernels 10. Neural networks Lab: Kaggle challenge Dec 04 11. Dimensionality reduction Lab: Kaggle challenge Dec 11 Unsupervised Learning 12. Clustering Lab: Kaggle challenge 27

challenge project San Francisco Crime Classification Challenge https://www.kaggle.com/c/sf crime Predict the category of crimes that occurred in the city by the bay Multiple classes (Fraud, Kidnapping, DUI...) From location & time Evaluation on Prediction performance Quality of code (3 most upvoted scripts get Kaggle swag) 28

Evaluation Kaggle project Written report Position in the leaderboard Community involvement Introduction: October 2, 2015 Final exam Pen and paper Closed book December 15, 2015 29

Resources: Datasets UCI Repository: http://www.ics.uci.edu/~mlearn/mlrepository.html UCI KDD Archive: http://kdd.ics.uci.edu/summary.data.application.ht ml Statlib: http://lib.stat.cmu.edu/ Delve: http://www.cs.utoronto.ca/~delve/ 30

Resources: Journals Journal of Machine Learning Research Machine Learning Neural Computation Neural Networks IEEE Transactions on Neural Networks IEEE Transactions on Pattern Analysis and Machine Intelligence Annals of Statistics Journal of the American Statistical Association 31

Resources: Conferences International Conference on Machine Learning (ICML) European Conference on Machine Learning (ECML) Neural Information Processing Systems (NIPS) Uncertainty in Artificial Intelligence (UAI) Computational Learning Theory (COLT) International Conference on Artificial Neural Networks (ICANN) International Conference on AI & Statistics (AISTATS) International Conference on Pattern Recognition (ICPR) 32