Introduction to Machine Learning. Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011

Similar documents
CS 2750 Machine Learning. Lecture 1. Machine Learning. CS 2750 Machine Learning.

Learning is a very general term denoting the way in which agents:

Principles of Data Mining by Hand&Mannila&Smyth

BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics

An Introduction to Data Mining

Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris

Machine Learning. CUNY Graduate Center, Spring Professor Liang Huang.

Neural Networks and Support Vector Machines

Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research

Data, Measurements, Features

KATE GLEASON COLLEGE OF ENGINEERING. John D. Hromi Center for Quality and Applied Statistics

COLLEGE OF SCIENCE. John D. Hromi Center for Quality and Applied Statistics

STA 4273H: Statistical Machine Learning

A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

Supervised Learning (Big Data Analytics)

Machine learning for algo trading

Artificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence

Chapter 12 Discovering New Knowledge Data Mining

An Overview of Knowledge Discovery Database and Data mining Techniques

Machine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer

Machine Learning and Data Analysis overview. Department of Cybernetics, Czech Technical University in Prague.

Data Mining Practical Machine Learning Tools and Techniques

Social Media Mining. Data Mining Essentials

Machine Learning Introduction

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

Understanding Network Traffic An Introduction to Machine Learning in Networking

Statistical Machine Learning

6.2.8 Neural networks for data mining

Machine Learning for Data Science (CS4786) Lecture 1

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) ( ) Roman Kern. KTI, TU Graz

Linear Models for Classification

HT2015: SC4 Statistical Data Mining and Machine Learning

ADVANCED MACHINE LEARNING. Introduction

Introduction to Data Mining

Machine Learning: Overview

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

Model Combination. 24 Novembre 2009

Learning outcomes. Knowledge and understanding. Competence and skills

Classification algorithm in Data mining: An Overview

Data Mining Part 5. Prediction

Linear Threshold Units

Machine Learning and Data Mining. Fundamentals, robotics, recognition

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

Machine Learning. Mausam (based on slides by Tom Mitchell, Oren Etzioni and Pedro Domingos)

: Introduction to Machine Learning Dr. Rita Osadchy

Learning to Process Natural Language in Big Data Environment

Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval

Data Mining - Evaluation of Classifiers

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus

Using Data Mining for Mobile Communication Clustering and Characterization

Reference Books. Data Mining. Supervised vs. Unsupervised Learning. Classification: Definition. Classification k-nearest neighbors

Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j

Software Development Training Camp 1 (0-3) Prerequisite : Program development skill enhancement camp, at least 48 person-hours.

TIETS34 Seminar: Data Mining on Biometric identification

Machine Learning. Term 2012/2013 LSI - FIB. Javier Béjar cbea (LSI - FIB) Machine Learning Term 2012/ / 34

Supervised and unsupervised learning - 1

What is Learning? CS 391L: Machine Learning Introduction. Raymond J. Mooney. Classification. Problem Solving / Planning / Control

NEURAL NETWORKS A Comprehensive Foundation

MACHINE LEARNING IN HIGH ENERGY PHYSICS

Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning

Introduction to Support Vector Machines. Colin Campbell, Bristol University

Training Methods for Adaptive Boosting of Neural Networks for Character Recognition

Practical Applications of DATA MINING. Sang C Suh Texas A&M University Commerce JONES & BARTLETT LEARNING

MA2823: Foundations of Machine Learning

Big Data and Marketing

Content-Based Recommendation

Big Data Analytics CSCI 4030

Statistical Models in Data Mining

A Simple Introduction to Support Vector Machines

E-commerce Transaction Anomaly Classification

Support Vector Machines with Clustering for Training with Very Large Datasets

The Artificial Prediction Market

Why Ensembles Win Data Mining Competitions

Data Mining: Overview. What is Data Mining?

COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments

Support Vector Machine (SVM)

LCs for Binary Classification

ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING)

MS1b Statistical Data Mining

Simple and efficient online algorithms for real world applications

Decompose Error Rate into components, some of which can be measured on unlabeled data

A Review of Data Mining Techniques

Machine Learning.

An Introduction to Machine Learning

Machine Learning in Python with scikit-learn. O Reilly Webcast Aug. 2014

Introduction to Machine Learning Using Python. Vikram Kamath

Predict Influencers in the Social Network

Pentaho Data Mining Last Modified on January 22, 2007

A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier

Maschinelles Lernen mit MATLAB

How To Cluster

Machine Learning Logistic Regression

SUCCESSFUL PREDICTION OF HORSE RACING RESULTS USING A NEURAL NETWORK

Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski

Welcome. Data Mining: Updates in Technologies. Xindong Wu. Colorado School of Mines Golden, Colorado 80401, USA

Big Data Analytics and Optimization

Comparison of K-means and Backpropagation Data Mining Algorithms

Comparison of Data Mining Techniques used for Financial Data Analysis

Logistic Regression. Vibhav Gogate The University of Texas at Dallas. Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld.

Transcription:

Introduction to Machine Learning Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011 1

Outline 1. What is machine learning? 2. The basic of machine learning 3. Principles and effects of machine learning 4. Different machine learning techniques (hypotheses) Linear classifier (numerical functions) Non-parametric (Instance-based functions) Non-metric (Symbolic functions) Parametric (Probabilistic functions) 5. Practical usage and application: Pattern recognition 6. Conclusion and discussion 2

1. What is machine learning? Description: 1. Optimize a performance criterion using example data or past experience 2. Learning and discovering some rules or properties from the given data set 2-dimensional Feature space 3

1. What is machine learning Notation: (1) X = : (training) data set, which contains N samples, and each sample is an d-dim vector (2) Y = : (training) label set (3) Data pair: Each corresponds to a (4) Another form of data set: N = 1000 d = 2 4

1. What is machine learning? Is learning feasible? Data acquisition Practical usage Universal set (unobserved) Training set (observed) Testing set (unobserved) 5

1. What is machine learning? Is learning feasible? (1) Yes: in the probabilistic way by Hoeffding Inequality (2) No free lunch rule: Training set and testing set come from the same distribution Need to make some assumptions or bias (ex: classifier type, ) 6

1. What is machine learning Disciplines: Computer science and Statistics Statistics: Learning and Inference given samples Computer science: Efficient algorithms for optimization, and model representation & evaluation Two important factors: Modeling Optimization Other related disciplines: Neural networks, Pattern recognition, Information retrieval, Artificial Intelligence, Data mining, Function approximation, 7

Outline 1. What is machine learning? 2. The basic of machine learning 3. Principles and effects of machine learning 4. Different machine learning techniques (hypotheses) Linear classifier (numerical functions) Non-parametric (Instance-based functions) Non-metric (Symbolic functions) Parametric (Probabilistic functions) 5. Practical usage and application: Pattern recognition 6. Conclusion and discussion 8

2. The basic of machine learning Design or learning Types of machine learning What re we seeking for? Machine learning structure (supervised) Optimization criterion Supervised learning strategies 9

2. The basic of machine learning Design or learning: Design: design some rules or strategies (If we know the intrinsic factors!) Learning: Learning from the (training) data Knowledge types: Domain knowledge Data-driven knowledge Example: feature generation or data compression (DCT, Wavelets VS. KL transform, PCA) 10

2. The basic of machine learning Design or learning Types of machine learning What re we seeking for? Machine learning structure (supervised) Optimization criterion Supervised learning strategies 11

2. The basic of machine learning Types of machine learning: Supervised learning ( ) (1) Prediction (2) Classification (discrete labels), Regression (real values) Unsupervised learning ( ) (1) Clustering (2) Probability distribution estimation (3) Finding association (in features) (4) Dimension reduction Reinforcement learning (1) Decision making (robot, chess machine) 12

2. The basic of machine learning Supervised learning Unsupervised learning What we learned Semi-supervised learning 13

2. The basic of machine learning Design or learning Types of machine learning What re we seeking for? Machine learning structure (supervised) Optimization criterion Supervised learning strategies 14

2. The basic of machine learning What re we seeking? (1) Supervised: Low E-out or maximize probabilistic terms E-in: for training set E-out: for testing set Hoeffding inequality and VC-bound tell us that minimizing E-in has some correspondences to reach low E-out With probability δ: About Model complexity (2) Unsupervised: Minimum quantization error, Minimum distance, MAP, MLE(maximum likelihood estimation) 15

2. The basic of machine learning Design or learning Types of machine learning What re we seeking for? Machine learning structure (supervised) Optimization criterion Supervised learning strategies 16

2. The basic of machine learning Machine learning structure: (Supervised) target: f Learning algorithm: A Stable: N, d Correctness Efficiency: fast Assumption, bias 17

2. The basic of machine learning Hypothesis set: (Linear classifier) Hypothesis set Through Learning algorithms 18

2. The basic of machine learning Design or learning Types of machine learning What re we seeking for? Machine learning structure (supervised) Optimization criterion Supervised learning strategies 19

2. The basic of machine learning Optimization criterion (for supervised linear classifier) Assume, where w is an d-dim vector (learned) +1-1 How to optimize with non-continuous error functions? For a sample with label 1: +1 error 1 Loss function -1 20

2. The basic of machine learning Optimization is a big issue in machine learning: Partial differential (Ex: convex optimization ) Genetic algorithm, Neural evolution Make differential available: Non-continuous (Piecewise) differentiable Approximation: (loss functions) VC bound Hypothesis + Loss function VC bound or generalization error 21

2. The basic of machine learning Approximation: Loss (objective) functions Different loss functions may find different final hypothesis For a sample with label 1: 1 error 0/1 loss (Perceptron learning) Logistic regression Hinge loss (SVM) Ada-line Square error (regression) 1 22

2. The basic of machine learning Optimization criterion: (1) Hypothesis type: linear classifier, decision tree, (2) Loss (objective) functions: Hinge loss, square error, (3) Optimization algorithms: (stable, efficient, correct) Gradient descent: back-propagation, general learning Convex optimization: primal & dual problems Dynamic programming: HMM Divide and Conquer: Decision tree induction, ** Hypothesis type affects model complexity ** Loss functions also affects model complexity & final hypothesis ** Algorithms find the desired hypothesis we want 23

2. The basic of machine learning Gradient descent: (searching for w) Loss (w) 24

2. The basic of machine learning Design or learning Types of machine learning What re we seeking for? Machine learning structure (supervised) Optimization criterion Supervised learning strategies 25

2. The basic of machine learning Supervised learning strategies: assume (1) One-shot (Discriminant model) (2) Two-stage (Probabilistic model) Discriminative: Generative: 26

2. The basic of machine learning What re we seeking? (revised) (1) One-shot (Discriminant) Low E-out by E-in or E-app target: f (2) Two-stage (Probabilistic model) Discriminative: Generative: 27

2. The basic of machine learning Supervised learning strategies: Comparison Category: One-shot Two-stage Model: Discriminant Generative, Discriminative Adv: Fewer assumptions Direct towards the goal More flexible Uncertainty & Domain-knowledge Discovery and analysis power Dis-Adv: No probabilistic signal More assumptions Computational complexity Usage: Supervised Supervised & Unsupervised Techniques: SVM, MLP, KNN, LDA, adaboost, CART, random forest GMM, GDA, Graphical models, HMM, Naïve Bayes 28

2. The basic of machine learning Machine learning revised: Theory: Goal: Methods: VC bound 29

Outline 1. What is machine learning? 2. The basic of machine learning 3. Principles and effects of machine learning 4. Different machine learning techniques(hypotheses) Linear classifier (numerical functions) Non-parametric (Instance-based functions) Non-metric (Symbolic functions) Parametric (Probabilistic functions) 5. Practical usage and application: Pattern recognition 6. Conclusion and discussion 30

3. Principles and effects of machine learning Effect 1: General performance With probability δ: (VC bound, Generalization error) Factors: Complex model: Generative error: 31

3. Principles and effects of machine learning Model complexity: :Low :Middle :high 32

3. Principles and effects of machine learning Effect 2: N, VC dimension, and d on generalized error 33

3. Principles and effects of machine learning Effect 3: Under-fitting VS. Over-fitting (fixed N) error (model = hypothesis + loss functions) 34

3. Principles and effects of machine learning Effect 4: Bias VS. Variance (fixed N) (Frequently used in statistics regression) Bias: Ability to fit the training data (lower bias the better) Variance: Final hypothesis variance with the training set error Bias + Variance Variance Bias Universal set (unobserved) 35

3. Principles and effects of machine learning Effect 5: Learning curve (fixed ) error Generalized error N 36

3. Principles and effects of machine learning Principles: Occam s Razor: The simplest model that fits the data is also the most plausible! Sampling bias: If the data is sampled in a biased way, learning will produce a similarity biased outcome! Data snooping: If a data set has affected any step in the learning process, it cannot be fully trusted in assessing the outcome! 37

3. Principles and effects of machine learning Model selection: For a machine learning problems, we have many hypothesis, loss functions, and algorithms to try! 2 parameters for a hypothesis set: (1) Set parameter (manual), (2) Hypothesis parameter (learned) We need a method or strategy to find the best model (hypothesis set + loss function) for training and testing! ** Directly using the achieved training error (E-in) of each model for model selection is risky!! 38

3. Principles and effects of machine learning Model selection: Regularization: Validation: Training set Base set + Validation set (1)Train parameters on Base set (2)Test performance on Validation set 39

3. Principles and effects of machine learning Machine learning revised: Theory: Goal: Methods: VC bound 40

Outline 1. What is machine learning? 2. The basic of machine learning 3. Principles and effects of machine learning 4. Different machine learning techniques(hypotheses) Linear classifier (numerical functions) Non-parametric (Instance-based functions) Non-metric (Symbolic functions) Parametric (Probabilistic functions) 5. Practical usage and application: Pattern recognition 6. Conclusion and discussion 41

4. Different machine learning techniques Linear classifier (numerical functions) Non-parametric (Instance-based functions) Non-metric (Symbolic functions) Parametric (Probabilistic functions) Aggregation 42

4. Different machine learning techniques Linear classifier (numerical functions):, where w is an d-dim vector (learned) Techniques: Logistic regression, perceptron, Ada-line, Support vector machine (SVM), Multi-layer perceptron (MLP) Linear to nonlinear: Feature transform and Kernel 43

4. Different machine learning techniques Feature transform: Support vector machine (SVM): Useful because it combines regularization with feature transform 44

4. Different machine learning techniques Linear classifier (numerical functions) Non-parametric (Instance-based functions) Non-metric (Symbolic functions) Parametric (Probabilistic functions) Aggregation 45

4. Different machine learning techniques Non-parametric (Instance-based functions): Techniques: (adaptive) K nearest neighbor, 46

4. Different machine learning techniques How large K is? About the model complexity Equal weight? Distance metric? 47

4. Different machine learning techniques Linear classifier (numerical functions) Non-parametric (Instance-based functions) Non-metric (Symbolic functions) Parametric (Probabilistic functions) Aggregation 48

4. Different machine learning techniques Non-metric (Symbolic functions): Techniques: Decision tree, 49

4. Different machine learning techniques Linear classifier (numerical functions) Non-parametric (Instance-based functions) Non-metric (Symbolic functions) Parametric (Probabilistic functions) Aggregation 50

4. Different machine learning techniques Parametric (Probabilistic functions): Techniques: Gaussian discriminant analysis (GDA), Naïve Bayes, Graphical models, P(x y = -1) P(x y = 1) 51

4. Different machine learning techniques Linear classifier (numerical functions) Non-parametric (Instance-based functions) Non-metric (Symbolic functions) Parametric (Probabilistic functions) Aggregation 52

4. Different machine learning techniques Aggregation: (ensemble learning) Assume we have trained 5 models and get 5 final hypotheses For a test x g1(x) g2(x) g3(x) g4(x) g5(x) Final decision Techniques: Adaboost, Bagging, Random forest, 53

4. Different machine learning techniques Type Adv Dis-adv Techniques Linear low Too easy Logistic regression, Ada-line, perceptron learning rule, Linear + feature transform Hard to design transform functions 2 nd order polynomial transform, SVM Powerful Over-fitting Computational complexity C-SVM, ν-svm, SVR, SVM + RBF, SVM + poly, MLP Powerful Convergence? Back-propagation MLP Non-parametric Easy idea Computational complexity KNN, adaptive KNN, KNN with kernel smoother, Decision tree Easy idea Over-fitting CART, Parametric Flexible Discovery Computational complexity GDA, Naïve Bayes, Graphical models, PLSA, PCA, Aggregation Powerful Over-fitting Computational complexity Adaboost, Bagging, Random forest, Blending, 54

4. Different machine learning techniques Machine learning revised: Theory: Goal: Methods: VC bound Usage: Linear SVM, Adaboost, Random forest 55

Outline 1. What is machine learning? 2. The basic of machine learning 3. Principles and effects of machine learning 4. Different machine learning techniques (hypotheses) Linear classifier (numerical functions) Non-parametric (Instance-based functions) Non-metric (Symbolic functions) Parametric (Probabilistic functions) 5. Practical usage and application: Pattern recognition 6. Conclusion and discussion 56

5. Practical usage and application Pattern recognition: Data base Test query Feature extraction Feature extraction Classifier training Results 57

5. Practical usage and application Pattern recognition: (1) Data pre-processing Raw data: music signal, image, Feature extraction: DCT, FT, Gabor feature, RGB, MFCC, Feature selection & feature transform: PCA, LDA, ICA, (2) Classifier training Choose a model and set parameters Train the classifier Test the performance, either with validation or regularization Find another model? 58

Outline 1. What is machine learning? 2. The basic of machine learning 3. Principles and effects of machine learning 4. Different machine learning techniques (hypotheses) Linear classifier (numerical functions) Non-parametric (Instance-based functions) Non-metric (Symbolic functions) Parametric (Probabilistic functions) 5. Practical usage and application: Pattern recognition 6. Conclusion and discussion 59

6. Conclusion and discussion Factors: Training set with N and d Assumptions: Hypothesis, loss functions, and same distribution Theory: (1) Cautions: (2) Probabilistic: Assume the basic distribution is known Over-fitting VS. Under-fitting Bias VS. Variance Learning curve 60

6. Conclusion and discussion Machine learning is powerful and widely-used in recent researches such as pattern recognition, data mining, and information retrieval Knowing how to use machine learning algorithms is very important Q&A? 61

Thank you for listening 62