Data Mining: Supervised Methods. Ciro Donalek, donalek@astro.caltech.edu
Supervised Methods: Summary. Artificial Neural Networks, Multilayer Perceptron, Support Vector Machines, Software.
Supervised Methods. In supervised models, the training data includes both the input and the desired results; for some examples the correct results (target values) are known and are given to the model during the learning process. The network has to be able to generalize, i.e., to give the correct results when new data are given in input without knowing the target a priori.
Supervised Methods. These methods are usually fast and very accurate; however, the construction of proper training, validation, and test sets and the choice of the architecture may be a rather troublesome task.
Artificial Neural Networks. An Artificial Neural Network is an information processing paradigm that is inspired by the way biological nervous systems process information: a large number of highly interconnected simple processing elements (neurons) working together to solve specific problems.
Neural Networks. A Neural Network is usually structured into an input layer of neurons, one or more hidden layers, and one output layer. Neurons belonging to adjacent layers are usually fully connected. The values of the functions associated with the connections are called weights. (Figure: simplified model of a NN.)
Neural Networks. Feed-forward: Single Layer Perceptron, MLP, ADALINE (Adaptive Linear Neuron), RBF. Self-organized: SOM. Recurrent: Simple Recurrent Network, Hopfield Network. Stochastic: Boltzmann machines. Modular: Committee of Machines, ASNN (Associative Neural Networks), Ensembles. Others: Instantaneously Trained, Spiking (SNN), Dynamic, Cascades, NeuroFuzzy, PPS, GTM.
Multilayer Perceptron. The MLP is one of the most used supervised models: it consists of multiple layers of computational units, usually interconnected in a feed-forward way. Each neuron in one layer has directed connections to all the neurons of the subsequent layer.
A Simple Artificial Neuron. The basic computational element is often called a node or unit. It receives input from some other units, or from an external source. Each input has an associated weight w, which can be modified so as to model synaptic learning. The unit computes some function f of the weighted sum of its inputs: y = f(Σ_j w_j x_j).
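A minimal sketch of this unit in Python (not from the original slides; the function names are illustrative):

```python
# A single artificial neuron: y = f(sum_j w_j * x_j + b).
import numpy as np

def neuron(x, w, b, f):
    """Apply activation f to the weighted sum of the inputs x."""
    return f(np.dot(w, x) + b)

# Example with a logistic (sigmoid) activation.
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
x = np.array([0.5, -1.2, 3.0])   # inputs from other units
w = np.array([0.4, 0.1, -0.6])   # modifiable synaptic weights
print(neuron(x, w, b=0.1, f=sigmoid))
```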
Activation Functions. A scalar-to-scalar function used by most units to transform their inputs; needed to introduce non-linearity into the network. Examples: linear, logistic, tanh, softmax...
Activation Functions (2). Step function: the output is a certain value A1 if the input sum is above a certain threshold, and A0 if it is below the threshold. When we want to classify an input pattern into one of two groups, we can use a binary classifier with a step activation function. Sigmoid function: has the property of being similar to the step function, but with the addition of a region of uncertainty. Sigmoid functions in this respect are very similar to the input-output relationships of biological neurons.
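A sketch of the activation functions named above, written from their standard definitions (illustrative, not course code):

```python
import numpy as np

def step(a, A0=0.0, A1=1.0, threshold=0.0):
    """Step: output A1 above the threshold, A0 below it."""
    return np.where(a > threshold, A1, A0)

def logistic(a):
    """Sigmoid: like a step, but with a region of uncertainty."""
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a):
    """Softmax: turns a vector into class probabilities summing to 1."""
    e = np.exp(a - np.max(a))   # shift by the max for numerical stability
    return e / e.sum()

print(step(np.array([-2.0, 3.0])), logistic(0.0), softmax(np.array([1.0, 2.0])))
```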
Error Functions. Most methods for supervised learning require a measure of the discrepancy between the network output values and the targets; examples are the sum of squared errors (SSE) and the cross entropy (CE). Using a Multilayer Perceptron with a softmax activation function and cross-entropy error, the network outputs can be interpreted as the conditional probabilities p(C1|x) and p(C2|x), where x is the input vector, C1 the first class, and C2 the second class.
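The two measures can be written as E_SSE = 1/2 Σ (y − t)² and E_CE = −Σ t log y; a minimal sketch (illustrative, not the original course code):

```python
import numpy as np

def sse(y, t):
    """Sum of squared errors between outputs y and targets t."""
    return 0.5 * np.sum((y - t) ** 2)

def cross_entropy(y, t, eps=1e-12):
    """Cross entropy of softmax outputs y against one-hot targets t."""
    return -np.sum(t * np.log(y + eps))

y = np.array([0.8, 0.2])   # network outputs, read as p(C1|x), p(C2|x)
t = np.array([1.0, 0.0])   # one-hot target: the true class is C1
print(sse(y, t), cross_entropy(y, t))
```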
MLP Supervised Learning. Supervised neural networks are adjusted, or trained, so that a particular input leads to a specific target output. This requires that, for a subset of the data in the input space, there is accurate knowledge of the desired property (e.g., the real class).
Back Propagation Learning Process. The output values are compared with the targets to compute the value of some predefined error function; the error is then fed back through the network; using this information, the algorithm adjusts the weights of each connection in order to reduce the value of the error function. After repeating this process for a sufficiently large number of training cycles, the network will usually converge.
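A minimal sketch of this loop for a one-hidden-layer MLP with sigmoid units and SSE error (illustrative; the learning rate `lr` and network sizes are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)        # one input vector
t = np.array([1.0])           # its target output
W1 = rng.normal(size=(4, 3))  # input -> hidden weights
W2 = rng.normal(size=(1, 4))  # hidden -> output weights
lr = 0.5                      # learning rate (assumed)
sig = lambda a: 1.0 / (1.0 + np.exp(-a))

for cycle in range(100):                      # training cycles
    h = sig(W1 @ x)                           # forward pass: hidden layer
    y = sig(W2 @ h)                           # forward pass: output
    err_out = (y - t) * y * (1 - y)           # compare output with target
    err_hid = (W2.T @ err_out) * h * (1 - h)  # feed the error back
    W2 -= lr * np.outer(err_out, h)           # adjust the weights to
    W1 -= lr * np.outer(err_hid, x)           # reduce the error function
print(y)                                      # usually converges toward t
```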
Generalization. Generalization refers to the neural network's ability to produce reasonable outputs for inputs not encountered during training. In other words: NO PANIC when never-before-seen data are given in input!
Datasets. Training set: a set of examples used for learning, where the target value is known. Validation set: a set of examples used to tune the architecture of a classifier and estimate the error. Test set: a set of examples used only to assess the performance of a classifier; it is never used during the training process, so the error on the test set provides an unbiased estimate of the generalization error.
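A sketch of building the three sets with scikit-learn's train_test_split (an assumed tool; the slides don't prescribe one, and the data here is synthetic):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))       # toy inputs
y = rng.integers(0, 2, size=100)    # toy known targets

# 60% training, then split the remainder evenly into validation and test.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=42)
# The test set is touched only once, after training and architecture
# tuning, so its error estimates the generalization error.
```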
Data Selection. Garbage in, garbage out: training, validation, and test data must be representative of the underlying model, and all eventualities must be covered. Unbalanced data sets: since a network minimizes an overall error, the proportion of types of data in the set is critical; one remedy is the inclusion of a loss matrix (Bishop, 1995); often, the best approach is to ensure even representation of different cases, then to interpret the network's decisions accordingly.
Hidden Units. The best number of hidden units depends on: the number of inputs and outputs, the number of training cases, the amount of noise in the targets, the complexity of the function to be learned, and the activation function. Too few hidden units => high training and generalization error, due to underfitting and high statistical bias. Too many hidden units => low training error but high generalization error, due to overfitting and high variance.
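A sketch of exploring this trade-off by comparing training and validation scores as the hidden layer grows (scikit-learn's MLPClassifier stands in for the course MLP; the dataset is synthetic):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + X[:, 1] ** 2 > 0.5).astype(int)   # toy target function
X_train, y_train, X_val, y_val = X[:200], y[:200], X[200:], y[200:]

for n_hidden in (1, 2, 4, 8, 16, 32):
    net = MLPClassifier(hidden_layer_sizes=(n_hidden,),
                        max_iter=2000, random_state=0)
    net.fit(X_train, y_train)
    # Too few units: both scores low (underfitting, high bias).
    # Too many: training score high, validation score drops (overfitting).
    print(n_hidden, net.score(X_train, y_train), net.score(X_val, y_val))
```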
MLP Data Exploration: Regression. Data table: statistical correlation mapping without any prior assumption on the functional form of the data distribution; machine learning algorithms are well suited for this. Curve fitting: find a well-defined and known function underlying the data.
MLP Data Exploration: Classification. Crisp classification: given an input, the classifier returns its label. Probabilistic classification: given an input, the classifier returns its probabilities of belonging to each class; useful when some mistakes can be more costly than others (winner-take-all rule).
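A sketch contrasting the two modes with an assumed scikit-learn model (predict gives the crisp label, predict_proba the class probabilities):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(int)

net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                    random_state=0).fit(X, y)
x_new = np.array([[0.1, -0.5]])
print(net.predict(x_new))          # crisp: a single label
probs = net.predict_proba(x_new)   # probabilistic: p(class | x)
print(probs)
print(np.argmax(probs, axis=1))    # winner-take-all rule
```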
Results: Confusion Matrix. In the confusion matrix the network predictions Y are compared with the targets T: the rows represent the true classes and the columns the predicted classes. (Tables: training-set and test-set confusion matrices for the classes Galaxy and Star.)
Performances. The performance of the classifiers is rated on the following three criteria, supposing we have two classes A and B: completeness: the percentage of objects of class A correctly classified as such; contamination: the percentage of objects of class A incorrectly classified as objects belonging to class B; classification rate: the overall percentage of objects correctly classified.
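A sketch computing the three criteria from a 2x2 confusion matrix (rows = true classes, columns = predicted, as above; the counts are made up for illustration):

```python
import numpy as np

cm = np.array([[90, 10],    # true A: 90 classified A, 10 classified B
               [ 5, 95]])   # true B:  5 classified A, 95 classified B

completeness_A = cm[0, 0] / cm[0].sum()        # A correctly classified
contamination_A = cm[0, 1] / cm[0].sum()       # A misclassified as B
classification_rate = np.trace(cm) / cm.sum()  # overall correct fraction
print(completeness_A, contamination_A, classification_rate)
```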
Combining Models. It is often found that improved performance can be obtained by combining models together in some way, instead of using a single model in isolation. In this way, individual classifiers may be optimized or trained differently. (Diagram: an input feeds experts Exp 1 ... Exp K, whose outputs go to a combiner.) Committee Machines: a combination of experts that "vote" together on a given example.
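A sketch of such a committee with scikit-learn's VotingClassifier (assumed tooling): two differently trained experts vote on each example:

```python
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

committee = VotingClassifier(estimators=[
    ('mlp', MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                          random_state=0)),
    ('svm', SVC(kernel='rbf')),
], voting='hard')               # each expert casts one vote
committee.fit(X, y)
print(committee.predict(X[:5]))
```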
Combining Models. (Diagram: inputs 1..n feed NN 1; its output, together with additional information, feeds NN 2, which produces the final output.)
A Priori Knowledge. An alternative to model combination is to select one of the models to make the classification, letting the choice of the model be driven by an input parameter or by a priori knowledge. Predictions are made using the average of the predictions made by the two classifiers in which each sample falls.
Support Vector Machines. Support vector machines (SVM) are a group of supervised learning methods that can be applied to classification or regression. In a short period of time, SVMs found numerous applications in many scientific fields like physics, biology, and chemistry: drug design (discriminating between ligands and nonligands, inhibitors and noninhibitors, etc.), quantitative structure-activity relationships (QSAR, where SVM regression is used to predict various physical, chemical, or biological properties), chemometrics (optimization of chromatographic separation or compound concentration prediction from spectral data, for example), sensors (for qualitative and quantitative prediction from sensor data), chemical engineering (fault detection and modeling of industrial processes), text mining (automatic recognition of scientific information).
SVM Hyperplanes. SVM models were originally defined for the classification of linearly separable classes of objects. For any particular set of two-class objects, an SVM finds the unique hyperplane having the maximum margin. (Figure: H3, green, doesn't separate the two classes; H1, blue, does, but with a small margin; H2, red, separates them with the maximum margin.)
SVM Classification. SVM can be used to separate classes that cannot be separated with a linear classifier. The training vectors are mapped into a higher-dimensional feature space using nonlinear functions ϕ. The feature space is a high-dimensional space in which the two classes can be separated with a linear classifier.
SVM Classification. The mapping into a higher-dimensional space is done using the so-called kernel functions.
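A sketch trying different kernels with scikit-learn's SVC, which wraps the LIBSVM library listed under Software (the dataset is synthetic and not linearly separable):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1).astype(int)  # circular boundary

for kernel in ('linear', 'poly', 'rbf'):
    clf = SVC(kernel=kernel, degree=3)  # degree applies to 'poly' only
    clf.fit(X, y)
    # clf.support_vectors_ holds the objects that fix the margin.
    print(kernel, clf.score(X, y))
```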
SVM Support Vectors. A special characteristic of SVM is that the solution to a classification problem is represented by the support vectors that determine the maximum-margin hyperplane. These objects, shown inside circles in the figure, are called support vectors. The hyperplane H1 defines the border with the class +1 objects, whereas the hyperplane H2 defines the border with the class -1 objects. Two objects from class +1 define the hyperplane H1, and three objects from class -1 define the hyperplane H2.
SVM Example. (Panels: Linear, Poly=2, Poly=3, Poly=10.) The linear kernel doesn't work; the polynomial kernels discriminate perfectly between the two classes, but high degrees should be treated with care to avoid overfitting.
Datasets. IRIS (Bi): consists of 3 classes of 50 instances each and 4 numerical attributes (sepal and petal lengths and widths in cm); each class refers to a type of Iris plant (Setosa, Versicolor, Virginica); the first class is linearly separable from the others, while the latter two are not linearly separable from each other.
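A sketch applying an SVM to IRIS via scikit-learn, which ships the dataset and wraps LIBSVM (assumed tooling, not the course setup):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)   # 150 instances, 4 attributes
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = SVC(kernel='linear').fit(X_train, y_train)
print(clf.score(X_test, y_test))
# Setosa is linearly separable; Versicolor and Virginica overlap,
# so a nonlinear kernel (e.g. 'rbf') may do better on those two.
```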
Datasets. PQ Artifacts (Ay): 2 main classes and 4 numeric attributes; the classes are: true objects, artifacts.
Software. FANN: Fast Artificial Neural Networks, http://leenissen.dk/fann/ ; Netlab (Matlab toolbox), http://www.ncrg.aston.ac.uk/netlab/index.php ; LIBSVM, http://www.csie.ntu.edu.tw/~cjlin/libsvm/ ; VONEURAL/DAME, http://voneural.na.infn.it/
Send your comments...