TMVA, the Toolkit for Multivariate Data Analysis with ROOT
Helge Voss (speaker), Max-Planck-Institut für Kernphysik, Heidelberg, Germany
Andreas Höcker, CERN, Switzerland
Jörg Stelzer, CERN, Switzerland
Fredrik Tegenfeldt, Iowa State U., USA

Multivariate classification methods based on machine-learning techniques play a fundamental role in today's high-energy physics analyses, which deal with ever smaller signals in ever larger data sets. TMVA is a toolkit integrated in the ROOT framework that implements a large variety of multivariate classification algorithms, ranging from simple rectangular cut optimisation and likelihood estimators, over linear and nonlinear discriminants, to more recent developments such as boosted decision trees, rule fitting and support vector machines. All classifiers can be trained, tested and evaluated simultaneously: they all see the same training data and are afterwards tested on the same independent test data, allowing meaningful comparisons between the methods for a particular use case. Here, an overview of the package and the classifiers currently implemented is presented.

XI International Workshop on Advanced Computing and Analysis Techniques in Physics Research, April 2007, Amsterdam, the Netherlands

A full list of contributors is given in arXiv:physics/0703039 [1].

© Copyright owned by the author(s) under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike Licence.
1. Introduction

The Toolkit for Multivariate Analysis (TMVA) provides a ROOT-integrated environment for the processing and parallel evaluation of sophisticated multivariate classification techniques. The classification is done in terms of two event categories, e.g. signal and background. While TMVA is specifically designed for the needs of high-energy physics (HEP) applications, it is not necessarily restricted to these. The idea behind the software package is to provide a large variety of powerful multivariate classifiers in one common environment, allowing for easy use and comparison of the different classification techniques on any given problem. TMVA provides convenient possibilities for preprocessing the data prior to feeding them into any of the classifiers. Much auxiliary information about the data is provided, such as the correlation matrix, variable rankings and separation powers, and finally a full background rejection versus signal efficiency curve for each trained classifier. With this information the user is in a position to quickly find the optimal data selection procedure for a given classification problem. The package currently includes:

- Rectangular cut optimisation;
- Projective likelihood estimation;
- Multi-dimensional likelihood estimation (PDE range search and k-NN);
- Linear and nonlinear discriminant analysis (H-Matrix, Fisher, FDA);
- Artificial neural networks (three different implementations);
- Support Vector Machine;
- Boosted/bagged decision trees;
- Predictive learning via rule ensembles.

The TMVA software package consists of object-oriented implementations in C++/ROOT for each of these discrimination techniques and of auxiliary tools such as parameter fitting and variable transformations. It provides training, testing and performance evaluation algorithms and visualisation scripts. A short description of the various classifiers is given in Sec. 4, while a detailed description, including the options available for tuning the individual classifiers, is given in the TMVA manual [1]. TMVA is an open-source product, distributed both via the ROOT package [2] and separately via SourceForge [3].

Several similar combined multivariate classification ("machine learning") packages exist, with rising importance in most fields of science and industry; in the HEP community there is also StatPatternRecognition [4, 5]. The idea of parallel training and evaluation of MVA-based classification in HEP was pioneered by the Cornelius package, developed by the Tagging Group of the BABAR Collaboration [6].
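To give an impression of this workflow, the following is a minimal sketch of a complete TMVA training job (the Factory class that drives it is described in Sec. 2; the tree, variable and option names here are illustrative, not prescriptive, and the option strings are left at their defaults):

    // Minimal TMVA training macro (sketch; names and options illustrative).
    #include "TFile.h"
    #include "TTree.h"
    #include "TMVA/Factory.h"
    #include "TMVA/Types.h"

    void trainTMVA(TTree* sigTree, TTree* bkgTree) {
       // Target file receiving evaluation results and control plots
       TFile* outputFile = TFile::Open("TMVA.root", "RECREATE");

       // The Factory organises training, testing and evaluation
       TMVA::Factory factory("TMVAnalysis", outputFile, "");

       factory.AddSignalTree(sigTree, 1.0);      // global signal weight
       factory.AddBackgroundTree(bkgTree, 1.0);  // global background weight

       // Any tree variable or formula may serve as input
       factory.AddVariable("var1", 'F');
       factory.AddVariable("var2", 'F');
       factory.AddVariable("var1+var2", 'F');

       // Default splitting into training and test samples
       factory.PrepareTrainingAndTestTree("", "");

       // Book several classifiers; all see the same training/test data
       factory.BookMethod(TMVA::Types::kLikelihood, "Likelihood", "");
       factory.BookMethod(TMVA::Types::kBDT,        "BDT",        "");
       factory.BookMethod(TMVA::Types::kMLP,        "MLP",        "");

       factory.TrainAllMethods();     // train on the training sample
       factory.TestAllMethods();      // test on the independent test sample
       factory.EvaluateAllMethods();  // benchmark quantities, rejection curves

       outputFile->Close();
    }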
2. Data Preprocessing, Training and Testing

Training and testing of the classifiers is performed on user-supplied data sets with known event classification. These data are supplied in the form of either ROOT trees or ASCII text files. Individual event weights may be attributed when specified in the data set, allowing the use of Monte Carlo samples that include weighted events (some limitations may be encountered when using negative event weights). Any subset of the variables provided in the data set may be selected, as well as variable combinations and formulas, just as they are available for the Draw command of a ROOT tree. All classifiers see the same training data and are afterwards all tested on the same test data, a data set statistically independent of the training data, and the evaluation prescriptions are the same for all classifiers.

A Factory class organises the interaction between the user and the TMVA analysis steps, including preanalysis and preprocessing of the training data to assess basic properties of the chosen variables as input to the classifiers. As preanalysis, the linear correlation coefficients of the input variables are calculated and displayed, and a preliminary ranking is derived (later superseded by the classifier-specific variable rankings). Preprocessing of the data set includes the application of conventional preselection cuts that are common to all classifiers. In addition, two variable transformations are available: decorrelation via the square root of the covariance matrix, and a principal component decomposition. These transformations can be chosen individually for any particular classifier. Each classifier writes its transformation into its weight file once the training has converged; the weight files contain all information needed for the later application of the trained classifier. For testing and application, the transformation is read back from the weight file and applied while the trained classifier processes event data provided in the same format as the original training data.

Removing linear correlations from the data sample may be useful for classifiers that do not intrinsically take variable correlations into account, such as the projective likelihood; since correlations are present in most realistic use cases, ignoring them results in a loss of performance. Other classifiers too, such as rectangular cuts or decision trees, and even the multidimensional likelihood approaches, underperform in the presence of variable correlations. A demonstration of the decorrelation procedure is shown in Fig. 1, where it is applied to a toy Monte Carlo sample with linearly correlated, Gaussian-distributed variables that is supplied together with the TMVA package.
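The square-root decorrelation can be made explicit. A sketch of the transformation, using the symmetric square root of the covariance matrix C of the input variables x, obtained by diagonalisation:

    C = S D S^{T}, \qquad C^{1/2} = S \sqrt{D}\, S^{T}, \qquad x \;\mapsto\; x' = \left( C^{1/2} \right)^{-1} x,

where S is the matrix of eigenvectors of C and D the diagonal matrix of its (non-negative) eigenvalues; the transformed variables x' have a unit covariance matrix. The principal component decomposition instead projects onto the eigenvector basis, x' = S^{T}(x - \bar{x}), ordering the components by their variance.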
After the training, the classifiers are subjected to testing and evaluation in order to assess their performance. (While training and testing are performed on statistically independent data samples to ensure unbiased testing, a third data sample would be needed for a fully unbiased performance estimate if parameter tuning of the individual classifiers were involved in the procedure.) The optimal classifier for a specific analysis strongly depends on the problem at hand, and no general recommendations can be given. To ease the choice, TMVA computes and displays a number of benchmark quantities with which the classifiers can be compared on the independent test sample:

- The signal efficiency at three representative background efficiencies (the efficiency is equal to 1 − rejection) obtained from a cut on the classifier output. Also given is the area under the background rejection versus signal efficiency curve (the larger the area, the better the overall performance).

- The separation ⟨S²⟩ of a classifier y, defined by the integral [6]

    \langle S^2 \rangle = \frac{1}{2} \int \frac{\left( y_S(y) - y_B(y) \right)^2}{y_S(y) + y_B(y)} \, dy ,   (2.1)

  where y_S and y_B are the signal and background PDFs of y, respectively. The separation is zero for identical signal and background shapes, and one for shapes with no overlap.

- The discrimination significance of a classifier, defined as the difference between the classifier means for signal and background, divided by the quadratic sum of their root-mean-squares.
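As a worked example of Eq. (2.1), the separation can be estimated from binned classifier-output distributions. A sketch, assuming two histograms with identical binning, each normalised such that its bin contents sum to one (in that case the bin-wise sum reproduces the integral over y):

    #include "TH1F.h"

    // Approximate Eq. (2.1) by a sum over bins of the signal and
    // background classifier-output histograms (unit-normalised contents).
    double Separation(const TH1F& hSig, const TH1F& hBkg) {
       double sep = 0.;
       for (int i = 1; i <= hSig.GetNbinsX(); ++i) {
          const double s = hSig.GetBinContent(i);  // y_S in this bin
          const double b = hBkg.GetBinContent(i);  // y_B in this bin
          if (s + b > 0.) sep += 0.5 * (s - b) * (s - b) / (s + b);
       }
       return sep;  // 0 for identical shapes, 1 for non-overlapping shapes
    }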
Figure 1: Correlations between input variables. Left: correlation between var4 and var3 for signal training events from a toy Monte Carlo. Right: the same events after applying the linear decorrelation transformation.

In addition, smooth background rejection/efficiency versus signal efficiency curves are written to the target ROOT file. These and many other results and control plots, such as the MVA-output distributions (Fig. 2), variable distributions, correlation matrices and scatter plots, as well as classifier-specific information like the neural-network architecture, are conveniently plotted using custom ROOT macros executed via a graphical user interface that comes with the TMVA distribution. The TMVA training/testing job runs alternatively as a ROOT script, as a standalone executable against which the TMVA shared library is linked, or as a python script via the PyROOT interface. Each classifier trained in one of these applications writes its configuration and training results into a result ("weight") file.

3. Classifier Application

The application of the trained classifiers to select events from a data sample with unknown signal and background composition is handled via a lightweight Reader object. It reads and interprets the weight files of the chosen classifier and can be included in any C++ executable, ROOT macro or python analysis job.
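A minimal sketch of the Reader usage (the variable names and the weight-file path are illustrative):

    #include "TMVA/Reader.h"

    void applyTMVA() {
       TMVA::Reader reader;

       // Variables must be registered in the same order as for training;
       // the Reader keeps pointers to the user's local variables.
       Float_t var1, var2;
       reader.AddVariable("var1", &var1);
       reader.AddVariable("var2", &var2);

       // Book the classifier from its weight file (illustrative path)
       reader.BookMVA("BDT", "weights/TMVAnalysis_BDT.weights.txt");

       // In the event loop: fill var1, var2 for the current event, then
       var1 = 0.5; var2 = -1.2;                      // example values
       double mvaValue = reader.EvaluateMVA("BDT");  // classifier response
       (void)mvaValue;                               // e.g. cut on it here
    }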
Figure 2: Example plots of classifier output distributions for signal and background events from the academic test sample. Shown are the likelihood (upper left), PDE range search (upper right), MLP (lower left) and boosted decision trees (lower right).

For standalone use of the trained classifiers, TMVA also generates lightweight C++ response classes, which contain the encoded information from the weight files so that the weight files themselves are no longer required. These classes depend neither on TMVA or ROOT nor on any other external library. Emphasis has been put on the clarity and functionality of the Factory and Reader interfaces to the user applications, which will hardly exceed a few lines of code. All classifiers run with reasonable default configurations and should perform satisfactorily in average applications. It is stressed, however, that to solve a concrete problem all classifiers require at least some specific tuning to deploy their maximum classification capability. Individual optimisation and customisation of the classifiers is achieved via configuration strings that are detailed in [1].

Figure 3: Example of the background rejection versus signal efficiency obtained by cutting on the classifier outputs (Fisher, MLP, BDT, PDERS, Likelihood) for the events of the test sample.

4. The Classifiers
All TMVA classification methods inherit from a common base class that implements the basic functionality shared by all classifiers. Options common to all classifiers, which can be specified upon booking, include the normalisation of the input variables, the variable transformations applied as preprocessing, the creation of probability density functions from the resulting MVA-output distributions, and the level of printout generated during application and testing.

4.1 Rectangular Cut Optimisation

Although not a real multivariate classifier, the simplest and most common technique for selecting signal events from a mixed sample of signal and background events is the application of an ensemble of rectangular cuts on the discriminating variables. Unlike all other classifiers in TMVA, the cut classifier returns only a binary response (signal or background). The cut optimisation performed by TMVA maximises the background rejection at given signal efficiency, and scans the full range of the latter quantity. A dedicated analysis optimisation in which, e.g., the signal significance is maximised rather than the background rejection requires the expected signal and background yields to be known before applying the cuts; this is not the case for a multi-purpose discrimination package and is hence not used by TMVA. However, the cut ensemble leading to maximum significance corresponds to a particular working point on the efficiency curve and can easily be derived after the cut optimisation scan has converged. Assuming a number of events large enough for Gaussian statistics to apply, the significance of a signal is given by

    S = \frac{\varepsilon_S N_S}{\sqrt{\varepsilon_S N_S + \varepsilon_B(\varepsilon_S)\, N_B}} ,

where ε_{S(B)} and N_{S(B)} are the signal (background) efficiencies for a cut ensemble and the event yields before applying the cuts, respectively. The background efficiency ε_B is expressed as a function of ε_S using the TMVA evaluation curve obtained from the test data sample. The maximum significance is then found at the root of the derivative

    \frac{dS}{d\varepsilon_S} = N_S \, \frac{\varepsilon_B(\varepsilon_S) N_B + \frac{1}{2}\varepsilon_S N_S - \frac{1}{2}\varepsilon_S \frac{d\varepsilon_B(\varepsilon_S)}{d\varepsilon_S} N_B}{\left( \varepsilon_S N_S + \varepsilon_B(\varepsilon_S) N_B \right)^{3/2}} = 0 ,   (4.1)

whose position depends on the problem.
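The maximum-significance working point can equally be found numerically by scanning the evaluation curve. A sketch, assuming the curve is available as tabulated (ε_S, ε_B) pairs and that the expected yields N_S, N_B are known:

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Scan tabulated (epsS, epsB) points of the evaluation curve and return
    // the signal efficiency maximising S = epsS*NS / sqrt(epsS*NS + epsB*NB).
    double BestSignalEfficiency(const std::vector<double>& epsS,
                                const std::vector<double>& epsB,
                                double NS, double NB) {
       double best = 0., bestSig = 0.;
       for (std::size_t i = 0; i < epsS.size(); ++i) {
          const double denom = epsS[i]*NS + epsB[i]*NB;
          if (denom <= 0.) continue;                       // skip empty points
          const double sig = epsS[i]*NS / std::sqrt(denom);
          if (sig > bestSig) { bestSig = sig; best = epsS[i]; }
       }
       return best;  // working point of maximum significance
    }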
TMVA cut optimisation is performed with the use of multivariate parameter fits. Satisfactory fit results are obtained with either Monte Carlo sampling or a Genetic Algorithm, both implemented in TMVA. The training events are sorted into binary trees prior to the optimisation, which significantly reduces the computing time required to determine the number of events passing a given cut ensemble.

4.2 Projective Likelihood Estimation

Maximum-likelihood classification is based on building, from the training data, a model of one-dimensional probability density functions (PDFs) that reproduces the input variables for signal and background. For a given event, the likelihood of being of signal type is obtained by multiplying the signal probability densities of all input variables and normalising this by the sum of the signal and background likelihoods. Correlations among the variables are ignored. The probability density functions are constructed either by histogramming the variable distributions of the training data with subsequent spline smoothing, or with an unbinned kernel density estimator.

4.3 Multi-dimensional Likelihood Estimation (k-NN and PDE Range Search)

These are generalisations of the projective likelihood classifier to n_var dimensions, where n_var is the number of input variables used. If the multidimensional PDFs for signal and background were known, the multi-dimensional likelihood would exploit the full information contained in the input variables and would hence be optimal. In practice, however, huge training samples are necessary to sufficiently populate the multidimensional phase space for a good estimate of the underlying PDF (due to correlations between the input variables, only a sub-space of the full phase space may be populated). Kernel estimation or event-counting methods are typically used to approximate the shape of the PDF for finite training statistics.

The k-nearest-neighbour (k-NN) method compares an observed (test) event to reference events from a training data set. It searches for a fixed number of closest events in the training sample; the fraction of signal events in this neighbourhood is then used as a probability density estimator (PDE) for the test event's phase-space point. To enhance the sensitivity within the volume, a polynomial kernel estimate that weights events according to their distance from the test event is implemented as an alternative to pure event counting.

A variant of the k-NN classifier is the PDE range search, or PDERS, suggested in Ref. [7]. Here the PDE for a given test event is obtained by counting the (normalised) numbers of signal and background training events that occur in a fixed volume around the test event. The classification of the test event may then be conducted on the basis of the majority of the nearest training events. In the TMVA implementation, the n_var-dimensional volume that encloses the "vicinity" is user-defined and can also be made adaptive to ensure a certain number of events in the volume. Again, a number of kernel functions may be used to weight the reference events according to their distance from the test event. The multi-dimensional likelihood estimators use sorted binary trees to reduce the computing time for the range search.
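The core of the k-NN estimate is simple. A brute-force sketch (TMVA itself uses binary trees for the neighbour search; the event structure here is illustrative), assuming k does not exceed the training-sample size:

    #include <algorithm>
    #include <cstddef>
    #include <utility>
    #include <vector>

    struct TrainEvent {
       std::vector<double> x;  // coordinates in the n_var-dimensional space
       bool isSignal;
    };

    // Fraction of signal events among the k nearest training events:
    // the probability density estimator at the test point.
    double KnnSignalFraction(const std::vector<TrainEvent>& train,
                             const std::vector<double>& test, std::size_t k) {
       std::vector<std::pair<double, bool> > dist;  // (distance^2, isSignal)
       for (std::size_t i = 0; i < train.size(); ++i) {
          double d2 = 0.;
          for (std::size_t j = 0; j < test.size(); ++j) {
             const double dx = train[i].x[j] - test[j];
             d2 += dx * dx;
          }
          dist.push_back(std::make_pair(d2, train[i].isSignal));
       }
       // Bring the k closest events to the front
       std::partial_sort(dist.begin(), dist.begin() + k, dist.end());
       std::size_t nSig = 0;
       for (std::size_t i = 0; i < k; ++i)
          if (dist[i].second) ++nSig;
       return double(nSig) / double(k);
    }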
4.4 Linear and Nonlinear Discriminant Analysis (H-Matrix, Fisher, FDA)

Linear discriminant analysis finds the hyperplane in the variable phase space that (best) separates signal from background. The standard method of Fisher discriminants [8] works in a linearly transformed variable space in which, projected onto the axis defining the separating hyperplane, the events of different classes are pushed as far apart as possible, while events of the same class are confined to a close vicinity. The linearity of this classifier is reflected in the metric with which "far apart" and "close vicinity" are determined: the covariance matrix of the discriminating variable space.

The H-Matrix method is a simple estimator built on the relative difference of the two multivariate χ² values of the event being compatible with either signal (χ²_S) or background (χ²_B). The respective χ²_{S(B)} values are calculated using the sample means for signal and background and their respective covariance matrices obtained from the training data.

The function discriminant analysis (FDA) lets the user choose any parametrised function as the discrimination boundary, and FDA fits its parameters, requiring the signal (background) function value to be as close as possible to 1 (0). Its advantage over the more involved and automatic nonlinear discriminators is the simplicity and transparency of the discrimination expression. A shortcoming is that FDA underperforms for involved problems with complicated, phase-space dependent nonlinear correlations, for which it is difficult to guess a discriminating function for the decision boundary.

4.5 Artificial Neural Networks

Artificial neural networks (ANNs) are classifiers that feed the weighted input variables of a test event into a set of nodes, also called neurons. Each node generates an output in response to its input according to its (nonlinear) activation function; this output can either be fed as input into consecutive nodes or used as the final response of the network. With each node acting as a basis function, an appropriate choice of the arrangement and number of nodes and of their interconnecting weights in principle allows any decision boundary to be approximated. For a given node arrangement (architecture), the training of a neural network consists of finding the interconnecting weights such that the separation between background and signal events is optimised.

There are three different implementations of ANNs in TMVA, all of them feed-forward networks: the nodes are arranged in an array with connections in one direction only, i.e. forward, from the input nodes (through the hidden layers) towards the output nodes, without cycles or loops. The CFANN was adapted from a FORTRAN code developed at the Université Blaise Pascal in Clermont-Ferrand and uses random Monte Carlo sampling for the weight fitting during training. The other two networks both use standard back-propagation for the weight optimisation; one is an interface to the network previously implemented in ROOT, the other is our own development offering additional flexibility in the choice of activation functions.
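A sketch of how such a feed-forward response is computed for a single hidden layer with sigmoid activations (the weights are those found during training; bias terms are omitted for brevity, and all names are illustrative):

    #include <cmath>
    #include <cstddef>
    #include <vector>

    double Sigmoid(double a) { return 1. / (1. + std::exp(-a)); }

    // Response of a feed-forward network with one hidden layer: each hidden
    // node applies its activation to a weighted sum of the inputs, and the
    // output node combines the hidden-node outputs.
    double NetworkResponse(const std::vector<double>& input,
                           const std::vector<std::vector<double> >& wHidden, // [node][input]
                           const std::vector<double>& wOutput) {             // [node]
       double out = 0.;
       for (std::size_t n = 0; n < wHidden.size(); ++n) {
          double sum = 0.;
          for (std::size_t i = 0; i < input.size(); ++i)
             sum += wHidden[n][i] * input[i];
          out += wOutput[n] * Sigmoid(sum);
       }
       return Sigmoid(out);  // final network output in (0, 1)
    }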
4.6 Support Vector Machine

Support Vector Machines are nonlinear discrimination algorithms following an idea from [9, 10]: signal and background are separated by a simple linear hyperplane in a nonlinearly transformed variable space. The hyperplane is completely defined by the events closest to the separating plane, the so-called support vectors. The variable-space transformation that allows the linear hyperplane separation of signal and background is never actually specified or calculated, but is replaced by the choice of so-called kernel functions that define a (nonlinear) metric in the variable space. Polynomial, sigmoidal and Gaussian kernel functions are available in TMVA. Their parameters (e.g. the width of the Gaussian) are to be chosen so as to adapt to the size and structure of the nonlinear features that need to be captured in the decision boundary.

4.7 Boosted/Bagged Decision Trees

A decision tree is a classifier structured as a binary tree. For each test event, repeated left/right (yes/no) decisions are performed on one variable at a time until the event reaches a so-called leaf node, which classifies it as either signal or background. The collection of leaf nodes splits the phase space into many disjoint regions, each classified as signal- or background-like. The tree structure is defined during the training (tree-building) phase: starting from the whole training sample, consecutive binary splits are determined using, at each node, the variable and cut value that give the maximum separation between signal and background. When the splitting is finished, each leaf node is classified as signal or background according to the majority of the training events that end up in it. In TMVA, the stopping criterion for the splitting during the training phase is the minimum number of events required in a leaf node. Small event numbers in the leaf nodes can capture small discriminating features of the phase space, but may easily result in overtraining, i.e. the capture of statistical fluctuations in the training sample rather than genuine features of the underlying PDFs. A pruning algorithm with adjustable pruning strength is therefore applied after the tree building to remove statistically insignificant nodes from the decision tree.

Boosted decision trees extend the single decision tree: the classifications of the individual members of an ensemble of decision trees are combined into one classifier, given by a (weighted) majority vote of the individual trees. The individual trees are derived from the same training sample by reweighting the events. In TMVA, the standard AdaBoost [11] algorithm is implemented, which calculates the boost weight used in the next tree building from the number of misclassified training events in the previously trained tree. Rather than boosting, bagging as defined in [12] uses events randomly drawn, with replacement, from the original training sample to construct different training samples; an ensemble of decision trees is then built from this collection of resampled training sets. A variant of this idea implemented in TMVA uses random event weights to create the different training samples from which the decision trees of the ensemble are constructed. Both bagging and boosting stabilise the response of the decision trees with respect to fluctuations in the training sample.
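The AdaBoost reweighting step can be sketched as follows (a sketch of the standard algorithm [11]; the event structure is illustrative, and the weighted misclassification rate is assumed to lie between 0 and 0.5): events misclassified by the current tree are boosted by α = ln((1 − err)/err), and the weights are renormalised before the next tree is built.

    #include <cmath>
    #include <cstddef>
    #include <vector>

    struct Event { double weight; bool misclassified; };

    // One AdaBoost step: compute the weighted misclassification rate err,
    // boost misclassified events by exp(alpha), and renormalise the weights.
    double AdaBoostStep(std::vector<Event>& events) {
       double sumW = 0., sumWrong = 0.;
       for (std::size_t i = 0; i < events.size(); ++i) {
          sumW += events[i].weight;
          if (events[i].misclassified) sumWrong += events[i].weight;
       }
       const double err   = sumWrong / sumW;              // assume 0 < err < 0.5
       const double alpha = std::log((1. - err) / err);   // boost weight
       double sumWnew = 0.;
       for (std::size_t i = 0; i < events.size(); ++i) {
          if (events[i].misclassified) events[i].weight *= std::exp(alpha);
          sumWnew += events[i].weight;
       }
       for (std::size_t i = 0; i < events.size(); ++i)
          events[i].weight *= sumW / sumWnew;             // keep total weight fixed
       return alpha;  // also weights this tree's vote in the ensemble
    }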
4.8 Predictive Learning via Rule Ensembles

This classifier is a TMVA implementation of the Friedman-Popescu RuleFit method described in [13]. The idea is to use an ensemble of so-called rules to create a scoring function with good classification power. Each rule is defined by a simple sequence of cuts that selects either signal or background. The easiest way to create an ensemble of rules is to extract it from an ensemble of decision trees: every node in a tree (except the root node) corresponds to the sequence of cuts required to reach that node from the root node, and can be regarded as a rule. A rule returns the value 1 if an event satisfies its cuts and 0 otherwise, and the rules are taken as basis functions for the final classifier, which is constructed as a linear combination of the rules in the ensemble, yielding a scoring function as the response of the RuleFit classifier. The coefficients (rule weights) of the linear combination that maximise the separation power between signal and background events are calculated using a regularised minimisation procedure [14].

4.9 Classifier Discussion

There is obviously no general answer to the question of which classifier should be used. To guide the user, we have attempted an assessment of various relevant classifier properties in Table 1. Simplicity is a virtue, but only if it does not come at the expense of discrimination power. Robustness with respect to overtraining can become an issue when the training sample is scarce, and some methods require more attention than others in this regard. For example, boosted decision trees are particularly vulnerable to overtraining if used without care; to circumvent overtraining, a problem-specific adjustment of the pruning-strength parameter is required.

To assess whether a linear discriminant analysis (LDA) could be sufficient for a classification problem, the user is advised to analyse the correlations among the discriminating variables by inspecting scatter and profile plots (it is not enough to print the correlation coefficients, which by definition are linear only). Using an LDA greatly reduces the number of parameters to be adjusted and hence allows smaller training samples; it is usually also robust with respect to generalisation to larger data samples. For intermediate problems, the function discriminant analysis (FDA) with some selected nonlinearity may be found sufficient; it is always useful to cross-check its performance against several of the sophisticated nonlinear classifiers to see how much can be gained over the simple and very transparent FDA. For problems that require a high degree of optimisation and allow a large number of input variables to be formed, complex nonlinear methods like neural networks, the support vector machine, boosted decision trees and/or RuleFit are more appropriate. Very involved multi-dimensional variable correlations with strong nonlinearities are usually best mapped by the multidimensional probability density estimators such as PDERS and k-NN.

For RuleFit we emphasise that the TMVA implementation differs from Friedman-Popescu's original code [13], with (as yet) slightly better robustness and out-of-the-box performance for the latter version. In particular, the behaviour of the original code with respect to nonlinear correlations and the curse of dimensionality would have merited two stars in Table 1. (An interface to Friedman-Popescu's original code has now been implemented in TMVA.) We also point out that the excellent performance for predominantly linearly correlated input variables is achieved somewhat artificially by adding a Fisher-like term to the RuleFit classifier (this is true for both implementations).
Table 1: Assessment of classifier properties for the classifiers Cuts, Likelihood, PDE-RS, k-NN, H-Matrix, Fisher, ANN, BDT, RuleFit and SVM. The criteria are the performance for no or linear correlations and for nonlinear correlations, the speed of training and of response, the robustness against overtraining and against weak variables, the curse of dimensionality, and the transparency. The symbols stand for the attributes good, fair and bad. "Curse of dimensionality" refers to the burden of the required increase in training statistics and processing time when adding more input variables. See also the comments in the text. The FDA classifier is not represented here since its properties depend on the chosen function.

5. Conclusion

TMVA is a toolkit that unifies highly customisable multivariate classification algorithms in a single framework, ensuring convenient use and an objective performance assessment: all classifiers see the same training and test data and are evaluated following the same prescription. Source code and library of TMVA-v3.5 and higher versions are part of the standard ROOT distribution kit (v5 and higher). The newest TMVA development version can be downloaded from SourceForge.net [3].

Acknowledgements

The fast growth of TMVA would not have been possible without the contributions and feedback from many developers (mentioned as co-authors of the users guide [1]) as well as René Brun and the ROOT team. We thank in particular the CERN summer students Matt Jachowski (Stanford U.) for the implementation of TMVA's MLP neural network, and Yair Mahalalel (Tel Aviv U.) for a significant improvement of PDERS. The Support Vector Machine has been contributed to TMVA by Andrzej Zemla and Marcin Wolter (IFJ PAN Krakow), and the k-NN classifier has been written by Rustem Ospanov (Texas U.).

References

[1] A. Höcker, P. Speckmayer, J. Stelzer, F. Tegenfeldt, H. Voss, K. Voss, A. Christov, S. Henrot-Versillé, M. Jachowski, A. Krasznahorkay Jr., Y. Mahalalel, R. Ospanov, X. Prudent, M. Wolter, A. Zemla, arXiv:physics/0703039 (2007).
[2] R. Brun and F. Rademakers, ROOT - An Object Oriented Data Analysis Framework, Proceedings AIHENP'96 Workshop, Lausanne, Sep. 1996, Nucl. Inst. & Meth. in Phys. Res. A 389 (1997). See also http://root.cern.ch/.

[3] http://tmva.sourceforge.net/

[4] I. Narsky, StatPatternRecognition: A C++ Package for Statistical Analysis of High Energy Physics Data, physics/0507143 (2005).

[5] Web pages giving information on available statistical tools in HEP and other areas of science.

[6] The BABAR Physics Book, BABAR Collaboration (P.F. Harrison and H. Quinn, editors, et al.), SLAC-R-504 (1998); S. Versillé, PhD Thesis, LPNHE (1998).

[7] T. Carli and B. Koblitz, Nucl. Instrum. Meth. A 501, 576 (2003) [hep-ex/9].

[8] R.A. Fisher, Annals of Eugenics 7, 179 (1936).

[9] V. Vapnik and A. Lerner, Pattern recognition using generalized portrait method, Automation and Remote Control 24, 774 (1963).

[10] V. Vapnik and A. Chervonenkis, A note on one class of perceptrons, Automation and Remote Control 25 (1964).

[11] Y. Freund and R.E. Schapire, J. of Computer and System Sciences 55, 119 (1997).

[12] L. Breiman, J. Friedman, R. Olshen and C. Stone, Classification and Regression Trees, Wadsworth (1984).

[13] J. Friedman and B.E. Popescu, Predictive Learning via Rule Ensembles, Technical Report, Statistics Department, Stanford University (2004).

[14] J. Friedman and B.E. Popescu, Gradient Directed Regularization for Linear Regression and Classification, Technical Report, Statistics Department, Stanford University (2003).