ADVANCED MACHINE LEARNING. Introduction


1. Introduction
Lecturer: Prof. Aude Billard (aude.billard@epfl.ch)
Teaching Assistants: Guillaume de Chambrier, Nadia Figueroa, Denys Lamotte, Nicola Sommer

2. Course Format
Alternate between lectures + exercises and practice sessions:
- Thursdays, 9h15-11h00
- Fridays, 10h15-11h00
Check the class timetable: http://lasa.epfl.ch/teaching/lectures/ml_msc_advanced/

3. Class Timetable

4. Grading
50% of the grade is based on personal work. Choose between:
1. A mini-project implementing an algorithm and evaluating its performance and sensitivity to parameter choices (preferably done individually).
2. A literature survey on a topic chosen from a list provided in class (may be done in teams of two; report in conference-paper format: 8 pages, 10pt, double column).
Either option amounts to ~25-30 hours of personal work, i.e. count one full week of work.
50% is based on a final oral exam:
- 20 minutes of preparation
- 20 minutes of answers at the blackboard (closed book, but you may bring one double-sided A4 page of personal notes)

5. Prerequisites
Linear algebra, probability and statistics.
Familiarity with basic machine learning techniques:
- Principal Component Analysis
- Clustering with K-means and Gaussian Mixture Models
- Linear and weighted regression

6. Today's Class Content
- Taxonomy and basic concepts of ML
- Examples of ML applications
- Overview of class topics

7. Main Types of Learning Methods
- Supervised learning: the algorithm learns a function or model that best maps a set of inputs to a set of desired outputs.
- Reinforcement learning: the algorithm learns a policy or model of the transitions across a discrete set of input-output states (Markovian world) in order to maximize a reward value (external reinforcement).
- Unsupervised learning: the algorithm learns a model that best represents a set of inputs without any feedback (no desired outputs, no external reinforcement).

8. Main Types of Learning Methods
Supervised learning: the algorithm learns a function or model that best maps a set of inputs to a set of desired outputs. Classes of supervised algorithms:
- Classification
- Regression

9. Classification: Principle
Map an N-dimensional input x ∈ R^N to a set of class labels y ∈ {1, ..., C}, where C is the number of classes.
With two classes (binary classification), learn a function of the type h: R^N → R and set y = sgn(h(x)), so that y ∈ {+1, -1}.
Train on a subset of datapoints (labelled y = +1 or y = -1) and hope to generalize the class labels to unseen datapoints (y = ?).
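The binary setup above, learning h and predicting y = sgn(h(x)), can be sketched with a tiny linear classifier. The perceptron update rule and the 2-D toy dataset below are illustrative assumptions, not material from the course.

```python
# Minimal binary classification sketch: learn a linear h(x) = w.x + b
# and predict y = sgn(h(x)), as in the slide above.

def sgn(v):
    return 1 if v >= 0 else -1

def train_perceptron(points, labels, epochs=20, lr=0.1):
    """Learn weights w and bias b so that sgn(w.x + b) matches the labels."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            if sgn(w[0] * x[0] + w[1] * x[1] + b) != y:  # misclassified point
                w[0] += lr * y * x[0]
                w[1] += lr * y * x[1]
                b += lr * y
    return w, b

# Training subset: class +1 in the upper right, class -1 in the lower left.
X = [(2.0, 2.0), (3.0, 2.5), (0.5, 0.2), (1.0, 0.5)]
Y = [1, 1, -1, -1]
w, b = train_perceptron(X, Y)

def predict(x):
    return sgn(w[0] * x[0] + w[1] * x[1] + b)

# Generalizing the class labels to unseen datapoints:
print(predict((2.5, 3.0)))   # -> 1
print(predict((0.2, 0.1)))   # -> -1
```

The trained h separates the two classes on this toy data and assigns labels to points it never saw, which is exactly the generalization the slide hopes for.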

10. Classification: Overview
Classification can be seen as a supervised clustering process and is usually multi-class: given a set of known classes, the algorithm learns to extract combinations of data features that best predict the true class of the data.
[Figure: original data, and the result of 4-class classification using a Support Vector Machine (SVM)]

11. Classification: Example
Classification of financial data to assess solvency using a Support Vector Machine (SVM).
5 classes to measure insolvency (credit) risk: excellent, good, satisfactory, passable, poor.
Swiderski et al., "Multistage classification by using logistic regression and neural networks for assessment of financial condition of company", Decision Support Systems, 2012.

12. Classification: Issues
Classification of financial data to assess solvency using an SVM:
- The features are crucial: if the features are poor, learning will be poor.
- Features are usually determined by hand (prior knowledge).
- Novel approaches learn the features.
Swiderski et al., "Multistage classification by using logistic regression and neural networks for assessment of financial condition of company", Decision Support Systems, 2012.

13. Classification: Issues
A recurrent problem when applying classification to real-life problems is that classes are often unbalanced, e.g. more samples from the positive class than from the negative class (Swiderski et al., 2012).
This can drastically affect classification performance, as classes with many datapoints have more influence on the error measure during training.

14. Classification Algorithms in this Course
- Support Vector Machine
- Relevance Vector Machine
- Boosting with random projections
- Boosting with random Gaussians
- Random forest
- Gaussian Process

15. Regression: Principle
Map an N-dimensional input x ∈ R^N to a continuous output y. Learn a function of the type f: R^N → R, with y = f(x).
Estimate the f that best predicts the set of training points {x_i, y_i}, i = 1, ..., M.
[Figure: training points (x_1, y_1), ..., (x_4, y_4), with the true function vs. the estimate]

16. Regression: Issues
Map an N-dimensional input x ∈ R^N to a continuous output y; learn f: R^N → R with y = f(x), estimating the f that best predicts the training points {x_i, y_i}, i = 1, ..., M.
The fit is strongly influenced by the choice of:
- the datapoints used for training
- the complexity of the model (interpolation)
[Figure: true function vs. estimate on the training points (x_1, y_1), ..., (x_4, y_4)]
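The estimation problem on these two slides, finding the f that best predicts {x_i, y_i}, can be sketched with the simplest possible model: a 1-D least-squares line. The closed-form solution and the toy data (sampled without noise from a known "true function") are illustrative assumptions, not material from the course.

```python
# Minimal regression sketch: estimate f(x) = a*x + b that best predicts
# the training points {x_i, y_i}, as in the slides above.

def fit_line(xs, ys):
    """Ordinary least squares for a 1-D linear model y = a*x + b."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

# Training points sampled (noiselessly) from the true function y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
a, b = fit_line(xs, ys)
f = lambda x: a * x + b
print(round(f(5.0), 6))  # -> 11.0, the estimate matches the true function
```

With noisy data or a more flexible model class, the fit would depend on which datapoints are used for training and on the model's complexity, which is precisely the issue the slide raises.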

17. Regression: Example
Predicting the optimal position and orientation of the golf club to hit the ball so as to sink it into the goal.
Kronander, Khansari and Billard, JTSC award, IEEE Int. Conf. on Intelligent Robots and Systems, 2011.

18. Regression: Example (continued)
Predicting the optimal position and orientation of the golf club to hit the ball so as to sink it into the goal.
Kronander, Khansari and Billard, JTSC award, IEEE Int. Conf. on Intelligent Robots and Systems, 2011.

19. Regression: Example
Contrasting the predictions of two methods, Gaussian Process Regression (GPR) and Gaussian Mixture Regression (GMR), in terms of precision and of generalization power outside the training data.
Kronander, Khansari and Billard, JTSC award, IEEE Int. Conf. on Intelligent Robots and Systems, 2011.

20. Regression Algorithms in this Course
- Support vector regression
- Relevance vector regression
- Gaussian process regression
- Boosting with random projections
- Boosting with random Gaussians
- Random forest
- Gradient boosting
- Locally weighted projection regression

21. Main Types of Learning Methods
- Supervised learning: the algorithm learns a function or model that best maps a set of inputs to a set of desired outputs.
- Reinforcement learning: the algorithm learns a policy or model of the transitions across a discrete set of input-output states (Markovian world) in order to maximize a reward value (external reinforcement).
- Unsupervised learning: the algorithm learns a model that best represents a set of inputs without any feedback (no desired outputs, no external reinforcement).

22. Reinforcement Learning: Principle
Generate a series of inputs for t time steps by performing action a_t to go from state x_t to state x_{t+1}:
x_1 →(a_1) x_2 →(a_2) ... →(a_{t-1}) x_t
Collect the data D = {x_1, x_2, ..., x_t, a_1, a_2, ..., a_{t-1}} and receive a reward at time t: r_t = 1.
Map the reward to the choice of action a in each state (the credit assignment problem).
Repeat the procedure (generate a series of trials and errors) until you converge to a good map that represents the best choice of action in any state to maximize the total reward.

23. Reinforcement Learning: Issues
As before, generate states x_1, ..., x_t through actions a_1, ..., a_{t-1}, collect the data D = {x_1, ..., x_t, a_1, ..., a_{t-1}}, receive a reward r_t = 1, and map the reward to the choice of action in each state (credit assignment problem).
- The number of states and actions may be infinite (continuous world).
- In a discrete world, visiting all action-state pairs may be prohibitive.
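The trial-and-error loop described above can be sketched with tabular Q-learning on a tiny chain world. Q-learning, the chain environment, the random exploration policy and the hyperparameters are all illustrative assumptions (the course covers its own set of RL algorithms), not material from the slides.

```python
# Minimal reinforcement learning sketch: generate trials, collect
# state/action/reward data, and solve credit assignment by propagating
# the reward back through tabular Q-values.
import random

N_STATES = 5            # states x in {0, ..., 4}; reward r = 1 on reaching state 4
ACTIONS = (-1, +1)      # move left or right along the chain
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma = 0.5, 0.9

random.seed(0)
for trial in range(500):                        # repeat trials (trial and error)
    s = 0
    for step in range(50):                      # cap the episode length
        if s == N_STATES - 1:
            break
        a = random.choice(ACTIONS)              # explore with a random behaviour policy
        s2 = min(max(s + a, 0), N_STATES - 1)   # deterministic transition
        r = 1.0 if s2 == N_STATES - 1 else 0.0
        # credit assignment: propagate the reward back through the Q-values
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

# The converged map: the best action in every non-terminal state is "right" (+1).
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

Even on this 5-state chain the agent needs many trials before the reward propagates back to the start state; with continuous or very large state-action spaces this cost becomes the prohibitive issue the slide points out.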

24. Reinforcement Learning: Example
Goal: to stand up. Reward: 1 if up, 0 otherwise.
Requires 750 trials in simulation + 170 on the real robot to learn.
[Figure: first set of trials vs. behavior at convergence]
Morimoto and Doya, Robotics and Autonomous Systems, 2001.

25. Reinforcement Learning Algorithms in this Course
Discrete and continuous state reinforcement learning methods:
- Random walk
- Dynamic programming
- Power
- Genetic algorithm
[Figure above: solutions found by 4 different techniques for learning]

26. Main Types of Learning Methods
- Supervised learning: the algorithm learns a function or model that best maps a set of inputs to a set of desired outputs.
- Reinforcement learning: the algorithm learns a policy or model of the transitions across a discrete set of input-output states (Markovian world) in order to maximize a reward value (external reinforcement).
- Unsupervised learning: the algorithm learns a model that best represents a set of inputs without any feedback (no desired outputs, no external reinforcement).

27. Unsupervised Learning: Principle
Observe a series of inputs: D = {x_1, x_2, ..., x_M}, x_i ∈ R^N.
Find regularities in the input, e.g. correlations or patterns.
Use these to reduce the dimensionality of the data (via correlations) or to group datapoints (via patterns).
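Finding correlations in unlabeled data and using them to reduce dimensionality, as described above, can be sketched with PCA on 2-D points. The toy dataset and the power-iteration shortcut for the dominant eigenvector are illustrative assumptions, not material from the course.

```python
# Minimal unsupervised learning sketch: observe unlabeled inputs D,
# find the correlation structure (covariance), and project onto the
# dominant eigenvector (PCA) to reduce 2-D data to 1-D.
import math

# Unlabeled inputs D = {x_1, ..., x_M}: points scattered along the line y = x.
D = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]

# Center the data and form the 2x2 covariance matrix.
mx = sum(p[0] for p in D) / len(D)
my = sum(p[1] for p in D) / len(D)
mean = (mx, my)
C = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in D) / len(D)
      for j in range(2)] for i in range(2)]

# Power iteration: repeatedly apply C to find its dominant eigenvector,
# i.e. the direction of largest variance in the data.
v = [1.0, 0.0]
for _ in range(50):
    w = [C[0][0] * v[0] + C[0][1] * v[1], C[1][0] * v[0] + C[1][1] * v[1]]
    norm = math.hypot(w[0], w[1])
    v = [w[0] / norm, w[1] / norm]

# Projecting each centered point onto v reduces the data from 2-D to 1-D.
projections = [(p[0] - mx) * v[0] + (p[1] - my) * v[1] for p in D]
print(v)  # close to the diagonal direction (1/sqrt(2), 1/sqrt(2))
```

Because the points lie almost on the line y = x, the first eigenvector captures nearly all the variance, so the 1-D projections preserve the structure of the original 2-D data.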

28. Unsupervised Learning ~ Structure Discovery
Raw data → trying to find some structure in the data → pattern recognition.

29. Pattern Recognition: Similar Shape
Results of decomposition with Principal Component Analysis: eigenvectors.

30. Pattern Recognition: Similar Shape
The main differences across groups of images get encapsulated in the first eigenvectors.

31. Pattern Recognition: Similar Shape
Detailed features (e.g. glasses) get encapsulated next, in the following eigenvectors.

32. Pattern Recognition: Similar Shape
The data become classifiable in a lower-dimensional space (here plotted on the 1st and 2nd eigenvectors).
Pattern recognition techniques such as PCA can be used to reduce the dimensionality of the dataset.

33. Pattern Recognition: Similar Shape
Subclasses can be discovered by looking at other combinations of projections (here plotted on the 2nd and 3rd eigenvectors).
Pattern recognition techniques such as PCA can be used to reduce the dimensionality of the dataset.

34. Pattern Recognition: Determining Trends
Goal: predicting the evolution of stock market prices.
Issues:
- Data vary over time
- Time is the input variable

35. Pattern Recognition: Determining Trends
Use a moving time window; how should the size of the time window be chosen?
Goal: predicting the evolution of stock market prices.
Challenges:
- Are previous data meaningful for predicting future data?
- Data are very noisy, and each datapoint is seen only once.
- Must decouple the noise from the temporal variations of the time series.
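The moving time window mentioned above can be sketched with the simplest windowed operation: averaging within a sliding window to decouple noise from the underlying temporal variation. The synthetic price series and the window size are illustrative assumptions, not material from the course.

```python
# Minimal moving-time-window sketch: slide a fixed-size window over a
# noisy time series and average inside it to suppress the noise while
# keeping the temporal trend.

def moving_average(series, window):
    """Average each length-`window` slice of the series."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

# A rising trend corrupted by alternating noise (each datapoint seen once).
prices = [10.0 + 0.5 * t + (0.4 if t % 2 == 0 else -0.4) for t in range(10)]
smoothed = moving_average(prices, window=2)

# Averaging adjacent points cancels the alternating noise, recovering the trend.
print([round(p, 2) for p in smoothed])
```

The choice of window size is exactly the trade-off the slide raises: a window too short keeps the noise, while one too long blurs the temporal variations one is trying to predict.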

36. Pattern Recognition: Clustering
Clustering for automatic processing of medical images: enhance visibility.
Multispectral medical image segmentation (left: MRI image from one channel; right: classification from 9-cluster semi-supervised learning).
Clusters should identify patterns such as cerebro-spinal fluid, white matter, striated muscle, tumor (Lundervolt et al., 1996).
Cluster groups of pixels by tissue type and use grey shadings to denote the different groups.

37. Pattern Recognition: Clustering
Clustering assumes that groups of points are similar according to the same metric of similarity.
Challenge: clustering may fail when the metric changes depending on the region of space.
Jain, 2010, "Data clustering: 50 years beyond K-means", Pattern Recognition Letters.

38. Pattern Recognition: Clustering
Different techniques or heuristics can be developed to help the algorithm determine the right boundaries across clusters:
- Manually add a set of constraints across pairs of datapoints
- Determine links between datapoints in the same/different clusters
- Use semi-supervised clustering
Jain, 2010, "Data clustering: 50 years beyond K-means", Pattern Recognition Letters.

39. Pattern Recognition: Clustering
Clustering encompasses a large set of methods that try to find patterns that are similar in some way.
Hierarchical clustering builds a tree-like structure by pairing datapoints according to increasing levels of similarity.

40. Pattern Recognition: Clustering
Hierarchical clustering can be used with arbitrary sets of data.
Example: hierarchical clustering to discover similar temporal patterns of crimes across districts in India.
Chandra et al., "A Multivariate Time Series Clustering Approach for Crime Trends Prediction", IEEE SMC 2008.
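The hierarchical clustering idea on these slides, pairing datapoints at increasing levels of similarity, can be sketched with agglomerative merging. Single-linkage distance and the 1-D toy data are illustrative assumptions, not material from the course.

```python
# Minimal hierarchical (agglomerative) clustering sketch: start with one
# cluster per datapoint and repeatedly merge the two closest clusters,
# building the tree from the most similar pairs upwards.

def single_linkage(points, n_clusters):
    """Agglomerative clustering with single-linkage (min pairwise) distance."""
    clusters = [[p] for p in points]          # start: every point is a cluster
    while len(clusters) > n_clusters:
        # find the pair of clusters with the smallest minimum distance
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]            # merge the closest pair
        del clusters[j]
    return [sorted(c) for c in clusters]

# Two well-separated groups on the real line.
data = [1.0, 1.2, 0.8, 10.0, 10.3, 9.7]
print(single_linkage(data, 2))  # -> [[0.8, 1.0, 1.2], [9.7, 10.0, 10.3]]
```

Stopping the merging at different levels of the tree yields coarser or finer groupings, which is what makes the hierarchy applicable to arbitrary data such as the crime time series in the example.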

41. Unsupervised Learning Algorithms in this Course
- PCA
- Kernel PCA
- Kernel K-means
- Genetic algorithm
- Isomap
- Laplacian eigenmaps
[Figure: projections on the Laplacian eigenvectors; Swiss roll example]

42. Main Types of Learning Methods
- Supervised learning (e.g. classification): the algorithm learns a function or model that best maps a set of inputs to a set of desired outputs.
- Reinforcement learning: the algorithm learns a policy or model of the transitions across a discrete set of input-output states (Markovian world) in order to maximize a reward value (external reinforcement).
- Unsupervised learning (e.g. clustering): the algorithm learns a model that best represents a set of inputs without any feedback (no desired outputs, no external reinforcement).
Semi-supervised clustering sits in between the supervised and unsupervised settings.

43. Data Mining
Pattern recognition with very large amounts of high-dimensional data: tens of thousands to billions of datapoints, in several hundreds of dimensions and more.

44. Data Mining: Examples (Mining Webpages)
- Cluster groups of webpages by topic
- Cluster links across webpages
Other algorithms required:
- Fast methods for crawling the web
- Text processing (Natural Language Processing)
- Understanding semantics
Issues:
- Domain-specific language / terminology
- Foreign languages
- Dynamics of the web (pages disappear / get created)

45. Data Mining: Examples (Mining Webpages)
Personalizing search: improving the user's experience and enhancing commercial market value.
Build user-specific data:
- country of origin
- languages spoken
- list of websites visited in the past days/weeks
- frequency of occurrence of some queries
Link user-specific data to other groups of data, e.g. robotics in Switzerland, academic institutions in Europe.

46. Data Mining, Main Issue: Curse of Dimensionality
Computational costs grow with the number of datapoints M, e.g. O(M) or O(M^2), and may also grow as a function of the number of dimensions.
ML techniques seek linear growth whenever possible, and apply methods for dimensionality reduction whenever possible.
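The O(M^2) cost mentioned above can be made concrete by counting the work of an all-pairs method. The counting function below is an illustrative sketch (e.g. of a naive nearest-neighbour search or pairwise-distance clustering), not material from the course.

```python
# Illustration of O(M^2) growth: any method that compares all pairs of
# datapoints performs M*(M-1)/2 distance computations, which grows
# quadratically with the number of datapoints M.

def pairwise_comparisons(M):
    """Count the distance computations a naive all-pairs method performs."""
    count = 0
    for i in range(M):
        for j in range(i + 1, M):
            count += 1          # one distance computation per pair
    return count

# Doubling the dataset roughly quadruples the work:
print(pairwise_comparisons(1000))   # -> 499500
print(pairwise_comparisons(2000))   # -> 1999000
```

At the data-mining scales on the previous slides (millions to billions of datapoints), this quadratic growth is prohibitive, which is why linear-time methods and dimensionality reduction are preferred whenever possible.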

47. Some Machine Learning Resources
- http://www.machinelearning.org/index.html
- http://www.pascal-network.org/ (Network of excellence on Pattern Recognition, Statistical Modelling and Computational Learning; summer schools and workshops)
Databases:
- http://expdb.cs.kuleuven.be/expdb/index.php
- http://archive.ics.uci.edu/ml/
Journals:
- Machine Learning Journal, Kluwer Publisher
- IEEE Transactions on Signal Processing
- IEEE Transactions on Pattern Analysis
- IEEE Transactions on Pattern Recognition
- The Journal of Machine Learning Research
Conferences:
- ICML: Int. Conf. on Machine Learning
- Neural Information Processing Systems conference; online repository of all research papers at www.nips.org

48. Topics for Literature Survey and Mini-Projects
The exact list of topics for the literature survey and mini-project will be posted by March 4.
Topics for the survey will include:
- Data mining methods for crawling mailboxes
- Data mining methods for crawling GitHub
- Ethical issues in data mining
- Methods for learning the features
- Metrics for clustering
Topics for the mini-project will entail implementing one of:
- Manifold learning (Isomap, Eigenmaps, Locally Linear Embedding)
- Classification (Random Forest)
- Regression (Relevance Vector Regression)
- Non-parametric approximation techniques for mixture models
The programming language is open: MATLAB, C/C++, Python, or embedding into the MLDemos software.

49. Overview of Practicals

50. Summary of Today's Class Content
Taxonomy and basic concepts of ML:
- Structure discovery, classification, regression
- Supervised, unsupervised and reinforcement learning
- Data mining and the curse of dimensionality
Examples of ML applications:
- Pattern recognition in images
- Data mining of the web
Overview of projects and survey topics.