The SHOGUN Machine Learning Toolbox (and its python interface)
Sören Sonnenburg 1,2, Gunnar Rätsch 2, Sebastian Henschel 2, Christian Widmer 2, Jonas Behr 2, Alexander Zien 2, Fabio de Bona 2, Alexander Binder 1, Christian Gehl 1, and Vojtech Franc 3
1 Berlin Institute of Technology, Germany
2 Friedrich Miescher Laboratory, Max Planck Society, Germany
3 Center for Machine Perception, Czech Republic
Outline
1 Introduction
2 Machine Learning
3 The SHOGUN Machine Learning Toolbox
About Me
1997-2002 studied Computer Science
2002-2009 ML & Bioinformatics research at Fraunhofer Institute FIRST and Max Planck Society
2008 PhD: Machine Learning for Genomic Sequence Analysis
2009- Researcher at Berlin Institute of Technology
Open Source Involvement
Debian Developer http://www.debian.org
Machine Learning OSS http://mloss.org
Machine Learning Data http://mldata.org
Main author of SHOGUN - this talk
More about me: http://sonnenburgs.de/soeren
Machine Learning - Learning from Data
What is Machine Learning and what can it do for you?
What is ML? AIM: Learning from empirical data!
Applications:
speech and handwriting recognition
search engines, natural language processing
medical diagnosis, bioinformatics, chemoinformatics
detecting credit card fraud
computer vision, object recognition
stock market analysis
network security, intrusion detection
brain-machine interfaces ...
Further Reading
Books on Machine Learning ... and many more ...
Formal Definition
A more formal definition:
Example $x_i \in \mathcal{X}$, for example, a text document
Label $y_i \in \mathcal{Y}$, for example, whether the document is spam
Training Data: data consisting of examples and associated labels, used for training the machine
Testing Data: data consisting only of examples, used for generating predictions
Predictions: output of the trained machine
Machine Learning: Main Tasks
Supervised Learning: We have both input and labels for each example. The aim is to learn about the pattern between input and labels. (The input is sometimes also called example.)
Unsupervised Learning: We do not have labels for the examples, but wish to discover the underlying structure of the data.
Reinforcement Learning: How an autonomous agent that senses and acts in its environment can learn to choose optimal actions to achieve its goals.
Estimators
Basic Notion: We want to estimate the relationship between the examples $x_i$ and the associated labels $y_i$.
Formally: We want to choose an estimator $f : \mathcal{X} \to \mathcal{Y}$.
Intuition: We would like a function $f$ which correctly predicts the label $y$ for a given example $x$.
Question: How do we measure how well we are doing?
Loss Functions
Basic Notion: We characterize the quality of an estimator by a loss function.
Formally: We define a loss function as $\ell(f(x_i), y_i) : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}_+$.
Intuition: For a given label $y_i$ and a given prediction $f(x_i)$, we want a positive value telling us how much error there is.
Loss Functions: Classification
[Figure: toy classification data, axes "GC content before" / "GC content after"]
In binary classification ($\mathcal{Y} = \{-1, +1\}$), one may use the 0/1-loss function:
$$\ell(f(x_i), y_i) = \begin{cases} 0 & \text{if } f(x_i) = y_i \\ 1 & \text{if } f(x_i) \neq y_i \end{cases}$$
Loss Functions: Regression
In regression ($\mathcal{Y} = \mathbb{R}$), one often uses the squared loss function:
$$\ell(f(x_i), y_i) = (f(x_i) - y_i)^2.$$
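To make the two losses concrete, here is a minimal NumPy sketch (plain NumPy, not part of SHOGUN); the sign() in the classification loss is an illustrative choice so that real-valued predictions can be compared to ±1 labels:

import numpy as np

def zero_one_loss(predictions, labels):
    # 0/1-loss for binary classification: 1 where sign(f(x_i)) != y_i, else 0
    return (np.sign(predictions) != labels).astype(float)

def squared_loss(predictions, labels):
    # squared loss for regression: (f(x_i) - y_i)^2 per example
    return (predictions - labels) ** 2

# tiny usage example with made-up numbers
y_class = np.array([-1, +1, +1])
f_class = np.array([-0.5, 2.0, -1.3])
print(zero_one_loss(f_class, y_class))   # [0. 0. 1.]

y_reg = np.array([0.0, 1.0])
f_reg = np.array([0.5, 1.0])
print(squared_loss(f_reg, y_reg))        # [0.25 0.  ]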
Measuring Complexity: Expected vs. Empirical Risk
Expected Risk: the average loss on unseen examples. We would like it to be as small as possible, but it is hard to compute.
Empirical Risk: We can compute the average loss on the training data. We define the empirical risk to be:
$$R_{\mathrm{emp}}(f, X, Y) = \frac{1}{n} \sum_{i=1}^{n} \ell(f(x_i), y_i).$$
Basic Notion: Instead of minimizing the expected risk, we minimize the empirical risk. This is called empirical risk minimization.
Question: How do we know that our estimator will perform well on unseen data?
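As a hedged illustration (plain NumPy, reusing zero_one_loss from the sketch above), the empirical risk is simply the mean of the per-example losses; the linear predictor w is a made-up example, and examples are assumed to be stored as columns as in the SVM toy data later in the talk:

def empirical_risk(f, X, Y, loss):
    # average loss of predictor f over the training examples (columns of X)
    predictions = np.array([f(x) for x in X.T])
    return loss(predictions, Y).mean()

# hypothetical linear predictor f(x) = sign(w . x)
w = np.array([0.3, -0.7])
f = lambda x: np.sign(w.dot(x))
# empirical_risk(f, traindata_real, trainlab, zero_one_loss)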
Measuring Complexity: Simple vs. Complex Functions
[Figure: three candidate decision functions of increasing complexity on the same data, axes "GC content before" / "GC content after"]
Which function is preferable?
Occam's razor (a.k.a. Occam's Law of Parsimony, William of Occam, 14th century) [http://www.franciscans.org]:
"Entities should not be multiplied beyond necessity" (do not make the hypothesis more complex than necessary)
Measuring Complexity: Special Case: Complexity of Hyperplanes
[Figure: linearly separable toy data with a separating hyperplane, axes "GC content before" / "GC content after"]
What is the complexity of a hyperplane classifier?
Vladimir Vapnik and Alexey Chervonenkis: Vapnik-Chervonenkis (VC) dimension (VapChe71, Vap95) [http://tinyurl.com/cl8jo9, http://tinyurl.com/d7lmux]
Measuring Complexity: Larger Margin = Less Complex
Support Vector Machines (SVMs): Maximum Margin = Minimum Complexity
Minimize complexity by maximizing the margin (irrespective of the dimension of the space).
Useful idea: find the hyperplane that classifies all points correctly while maximizing the margin (= SVM).
Putting Things Together: Summary of Empirical Inference
Learn a function $f : \mathcal{X} \to \mathcal{Y}$ given $N$ labeled examples $(x_i, y_i) \in \mathcal{X} \times \mathcal{Y}$.
Three important ingredients:
Model $f_\theta$ parametrized by some parameters $\theta \in \Theta$
Loss function $\ell(f(x), y)$ measuring the deviation between the prediction $f(x)$ and the label $y$
Complexity term $P[f]$ defining model classes with limited complexity
Most algorithms find $\theta$ in $f_\theta$ by minimizing, for a given regularization parameter $C$:
$$\theta^* = \operatorname*{argmin}_{\theta \in \Theta} \Big( \underbrace{\sum_{i=1}^{N} \ell(f_\theta(x_i), y_i)}_{\text{empirical error}} + \underbrace{C \, P[f_\theta]}_{\text{complexity term}} \Big)$$
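For intuition, a minimal NumPy sketch of this regularized objective for a linear model $f_\theta(x) = \theta^\top x$ with squared loss and an L2 complexity term (an illustrative choice, not tied to any particular SHOGUN solver):

import numpy as np

def regularized_objective(theta, X, Y, C):
    # empirical error (squared loss) plus C times an L2 complexity term
    predictions = theta.dot(X)                 # linear model, examples as columns
    empirical_error = ((predictions - Y) ** 2).sum()
    complexity = theta.dot(theta)              # P[f_theta] = ||theta||^2
    return empirical_error + C * complexity

# e.g. regularized_objective(np.array([0.3, -0.7]), X, Y, C=1.0)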
Support Vector Machines (SVMs)
Given: points $x_i \in \mathcal{X}$ ($i = 1, \ldots, N$) with labels $y_i \in \{-1, +1\}$
Task: find the hyperplane that maximizes the margin
Decision function: $f(x) = \mathbf{w} \cdot \mathbf{x} + b$
Support Vector Machines: SVM with Kernels
SVM decision function in kernel feature space:
$$f(x) = \sum_{i=1}^{N} y_i \alpha_i \underbrace{\Phi(x) \cdot \Phi(x_i)}_{= k(x, x_i)} + b \qquad (1)$$
Training: find the parameters $\alpha$
Corresponds to solving a quadratic optimization problem (QP)
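A hedged NumPy sketch of evaluating decision function (1), assuming a Gaussian kernel of the form exp(-||x - x_i||^2 / width) (the convention used here is an assumption, not necessarily SHOGUN's exact parametrization):

import numpy as np

def gaussian_kernel(x, xi, width):
    # k(x, x_i) = exp(-||x - x_i||^2 / width)
    return np.exp(-np.sum((x - xi) ** 2) / width)

def svm_decision(x, support_vectors, labels, alphas, b, width):
    # f(x) = sum_i y_i * alpha_i * k(x, x_i) + b for a trained SVM
    return sum(y_i * a_i * gaussian_kernel(x, x_i, width)
               for x_i, y_i, a_i in zip(support_vectors, labels, alphas)) + b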
Measuring the Performance: Measuring Performance in Practice
What to do in practice: split the data into training and validation sets; use the error on the validation set as an estimate of the expected error.
A. Cross-validation: split the data into c disjoint parts; use each part in turn as the validation set and the rest as the training set.
B. Random splits: randomly split the data set into two parts, for example 80% of the data for training and 20% for validation; repeat this many times.
See, for instance, (DudHarSto01) for more details.
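A hedged NumPy sketch of the random-split procedure (B); examples are assumed to be stored as columns, and train_and_eval is a hypothetical callback that trains a model and returns its validation error:

import numpy as np

def random_split_error(features, labels, train_and_eval, train_frac=0.8, repeats=10):
    # repeatedly split into train/validation and average the validation error
    n = features.shape[1]
    errors = []
    for _ in range(repeats):
        perm = np.random.permutation(n)
        cut = int(train_frac * n)
        tr, va = perm[:cut], perm[cut:]
        errors.append(train_and_eval(features[:, tr], labels[tr],
                                      features[:, va], labels[va]))
    return np.mean(errors)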
Measuring the Performance: Model Selection
Do not train on the test set!
Use a subset of the data for training; from that subset, split further to select the model.
Model selection = find the best parameters:
Regularization parameter C
Other parameters (kernel, ...)
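As a hedged sketch of such a model-selection loop: a plain grid search over C and the kernel width, reusing the SHOGUN classes from the SVM training example later in the talk. The class names come from those slides; the grid values and the placeholders train_X, train_y, val_X, val_y are purely illustrative:

from numpy import mean, sign
from shogun.Features import Labels, RealFeatures
from shogun.Kernel import GaussianKernel
from shogun.Classifier import LibSVM

# train_X, train_y, val_X, val_y are placeholders for a train/validation split
best = (None, None, float('inf'))   # (C, width, validation error)
for C in [0.1, 1.0, 10.0]:
    for width in [0.5, 2.1, 10.0]:
        feats = RealFeatures(train_X)
        svm = LibSVM(C, GaussianKernel(feats, feats, width), Labels(train_y))
        svm.train()
        out = svm.classify(RealFeatures(val_X)).get_labels()
        err = mean(sign(out) != val_y)
        if err < best[2]:
            best = (C, width, err)
print best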
SHOGUN Machine Learning Toolbox - Overview I
History
1999 initiated by S. Sonnenburg and G. Rätsch (SHOGUN)
2006 first public release (June)
2008 used in 3rd-party code (PyMVPA)
Now: several other contributors, mostly from Berlin and Tübingen; packaged for Debian, Ubuntu, MacPorts; > 1000 installations
Unified (large-scale) learning for various feature types and settings
Machine Learning Methods Overview
Classification (focus: Support Vector Machines)
Regression (Kernel Ridge Regression, SVR)
Distributions (Hidden Markov Models, ...)
Clustering
Performance Measures
SHOGUN Machine Learning Toolbox - Overview II
Focus: Large-Scale Learning with ...
15 implementations of Support Vector Machine solvers
35 kernels (focus on string kernels for bioinformatics)
Multiple Kernel Learning
Linear Methods
Implementation and Interfaces
Implemented in C++ (> 130,000 lines of code)
Interfaces: libshogun, python, octave, R, matlab, cmdline
Over 600 examples
Doxygen documentation
Inline python documentation (e.g. help(GaussianKernel))
Test suite ensuring that obvious bugs do not slip through
A Tutorial: Installing SHOGUN
Steps to install SHOGUN
LINUX: $ sudo apt-get install shogun-python-modular
MacOSX: $ sudo port install shogun
If that does not work or you are on a different OS, download the source code, untar it, and read INSTALL or follow http://www.shogun-toolbox.org/doc/installation.html
DON'T: do not use the legacy static python interface (just called "python").
A Tutorial: Simple code example: SVM Training
A) Generate Toy Data

from numpy import concatenate as con
from numpy import ones, mean, sign
from numpy.random import randn

num=1000; dist=1; width=2.1; C=1.0
traindata_real=con((randn(2,num)-dist, randn(2,num)+dist), axis=1)
testdata_real=con((randn(2,num)-dist, randn(2,num)+dist), axis=1)
trainlab=con((-ones(num), ones(num)))
testlab=con((-ones(num), ones(num)))
B) Train and Apply SVM with SHOGUN

from shogun.Features import Labels, RealFeatures
from shogun.Kernel import GaussianKernel
from shogun.Classifier import LibSVM

feats_train=RealFeatures(traindata_real)
kernel=GaussianKernel(feats_train, feats_train, width)
labels=Labels(trainlab)
svm=LibSVM(C, kernel, labels)
svm.train()
out=svm.classify(RealFeatures(testdata_real)).get_labels()
testerr=mean(sign(out)!=testlab)
print testerr
Features: Efficient Native Feature Representations
Input Features
Dense vectors/matrices (SimpleFeatures): uint8_t ... float64_t
Sparse vectors/matrices (SparseFeatures): uint8_t ... float64_t
Variable-length vectors/matrices (StringFeatures): uint8_t ... float64_t
Loading and saving as HDF5, ASCII, binary
All of numpy
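A hedged sketch of constructing these feature types from the python modular interface. RealFeatures appears in the SVM example above; SparseRealFeatures and StringCharFeatures (with the DNA alphabet) are the names one would expect for the sparse and string variants of that era, so treat the exact constructor calls as assumptions:

from numpy import array
from scipy.sparse import csc_matrix
from shogun.Features import RealFeatures, SparseRealFeatures, StringCharFeatures, DNA

# dense 2x3 float64 matrix (examples are columns)
dense = RealFeatures(array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))

# sparse matrix via the scipy.sparse typemap mentioned on the SWIG slide
sparse = SparseRealFeatures(csc_matrix(array([[0.0, 1.0], [2.0, 0.0]])))

# variable-length DNA strings
strings = StringCharFeatures(['ACGT', 'ACGTACGT', 'TTT'], DNA)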
Features: Interfaces
Interface Types
Static interfaces (legacy; single object of each type only)
NEW: Modular interfaces (really object oriented, SWIG based)
Support for all feature types: dense, sparse, strings
Possible by defining generic get/set functions, e.g.
void set_int(int32_t scalar);
void set_real(float64_t scalar);
void set_bool(bool scalar);
void set_vector(float64_t* vector, int32_t len);
void set_matrix(float64_t* m, int32_t rws, int32_t cls);
...
set/get functions for access from python, R, octave, matlab
Features: The Eierlegendewollmilchsau(TM) Interface
Embed Interface A from Interface B:
possible to run python code from octave
possible to run octave code from python
possible to run R code from python
...
Demo: use matplotlib to plot functions from octave.
Features: Modular Python SWIG-based Interface
Typemaps for numpy, scipy.sparse, files, lists of strings defined
Wrapping a C++ object with SWIG (Kernel)
%{
#include <shogun/kernel/GaussianKernel.h>
%}
%rename(GaussianKernel) CGaussianKernel;
%include <shogun/kernel/GaussianKernel.h>
Wrapping a C++ object with SWIG (Classifier)
%{
#include <shogun/classifier/svm/LibSVM.h>
%}
%rename(LibSVM) CLibSVM;
%include <shogun/classifier/svm/LibSVM.h>
Features: Unique Features of SHOGUN I
Input Features
possible to stack together features of arbitrary types (sparse, dense, string) via CombinedFeatures and DotFeatures
chains of preprocessors (e.g. subtracting the mean) can be attached to each feature object (on-the-fly preprocessing)
Kernels
works with custom pre-computed kernels
possible to stack together kernels via CombinedKernel (weighted linear combination of a number of sub-kernels, not necessarily working on the same domain)
kernel weighting can be learned using MKL
Methods (e.g., SVMs) can be trained using a unified interface
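A hedged sketch of combining two kernels on the same dense features. CombinedKernel and CombinedFeatures are named on this slide and traindata_real/trainlab come from the toy-data example above; the append_* method names, the LinearKernel class, and the GaussianKernel(cache_size, width) constructor are assumptions about the modular API:

from shogun.Features import RealFeatures, CombinedFeatures, Labels
from shogun.Kernel import CombinedKernel, GaussianKernel, LinearKernel
from shogun.Classifier import LibSVM

feats = CombinedFeatures()
feats.append_feature_obj(RealFeatures(traindata_real))   # input for sub-kernel 1
feats.append_feature_obj(RealFeatures(traindata_real))   # input for sub-kernel 2

kernel = CombinedKernel()
kernel.append_kernel(GaussianKernel(10, 2.1))   # (assumed) cache size, width
kernel.append_kernel(LinearKernel())
kernel.init(feats, feats)

svm = LibSVM(1.0, kernel, Labels(trainlab))
svm.train()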
Features: Unique Features of SHOGUN II
Large Scale
multiprocessor parallelization (training with up to 10 million examples and kernels)
implements the COFFIN framework (dynamic feature/example generation; training on 200,000,000 dimensions and 50,000,000 examples)
Community Integration
documentation available, many many examples
Debian package, MacOSX port, mailing list, open SVN repository
... and many more ...
Applications and Demo Application Genomic Signals Transcription Start (Sonnenburg et al., 2006) Acceptor Splice Site (Sonnenburg et al., 2007) Donor Splice Site (Sonnenburg et al., 2007) Alternative Splicing (Rätsch et al., 2005) Transsplicing (Schweikert et al., 2009) Translation Initiation (Sonnenburg et al., 2008) Genefinding Splice form recognition - msplicer (Rätsch et al. 2008) Genefinding - mgene (Schweikert et al., 2009)
Applications and Demo: Demo I
Support Vector Classification
Task: separate 2 clouds of Gaussian distributed points in 2D
Simple code example: SVM Training
lab = Labels(labels)
train = RealFeatures(features)
gk = GaussianKernel(train, train, width)
svm = LibSVM(10.0, gk, lab)
svm.train()
Applications and Demo: Demo II
Support Vector Regression
Task: learn a sine function
Simple code example: Support Vector Regression
lab = Labels(labels)
train = RealFeatures(features)
gk = GaussianKernel(train, train, width)
svm = LibSVR(10.0, gk, lab)
svm.train()
Applications and Demo: Demo III
Hidden Markov Model
Task: 3 loaded dice are thrown 1000 times; find out which die was used when
Clustering
Task: find a clustering of 3 clouds of Gaussian distributed points in 2D
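For the clustering demo, a hedged sketch along the same lines as the SVM demos. KMeans and EuclidianDistance are the class names one would expect in the python modular interface of that era; treat them, their constructors, and get_cluster_centers() as assumptions:

from shogun.Features import RealFeatures
from shogun.Distance import EuclidianDistance
from shogun.Clustering import KMeans

train = RealFeatures(features)          # the 2D point clouds from the demo
distance = EuclidianDistance(train, train)
kmeans = KMeans(3, distance)            # 3 clusters, one per cloud
kmeans.train()
centers = kmeans.get_cluster_centers()  # cluster centers, one column per cluster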
Applications and Demo: Summary
SHOGUN Machine Learning Toolbox
Unified framework with various interfaces
Applicable to huge datasets (> 50 million examples)
Algorithms: HMM, LDA, LPM, Perceptron, SVM, SVR + many kernels, ...
Documentation, Examples, Source Code
Implementation http://www.shogun-toolbox.org
Documentation http://www.shogun-toolbox.org/doc
More machine learning software http://mloss.org
Machine Learning Data http://mldata.org
We need your help: documentation, examples, testing, extensions