Solving Regression Problems Using Competitive Ensemble Models

Save this PDF as:

Size: px
Start display at page:

Transcription

2 512 Y. Frayman, B.F. Rolfe, and G.I. Webb a weighted average of all the weak predictors. The AdaBoost is shown to be effective for reducing bias and variance in a wide range of classification problems [3]. However, the recent work on boosting [13] have shown that AdaBoost algorithm is an optimization method for finding a model that minimizes a particular exponential loss function, namely the Bernoulli likelihood. Recent investigations of the boosting algorithm from the statistical viewpoint [18] have found that while boosting algorithm appears complex on surface, it is similar to the ways linear models are fitted in statistics. This opens the possibility to unite seemingly different, but essentially similar approaches to ensemble modeling from machine learning and statistical and engineering viewpoints. The current trend in boosting with major developments in boosted regression [9] is going in a direction similar to the statistical approach to model mixing. The boosted regression approach follows the spirit of AdaBoost algorithm by repeatedly performing weighted tree regression followed by increasing weighting of the poorly predicted observations and decreasing the weighting of the better predicted samples. The work on Gradient Boosting Machine [14] uses the connection between boosting and optimization more explicitly. At each iteration the algorithm determines the direction, the gradient, in which it needs to improve the fit to the data and selects a particular model from the available class of functions that is most in agreement with the direction. While this is a significant improvement on the approach of AdaBoost, the Gradient Boosting Machine algorithm is, however, increasingly similar to other non parametric regression models such as neural networks. As such the Gradient Boosting Machine still does not provide a reliably guidance to selection of ensemble members and effective means of combining such ensemble members. This leads us to analysis of the available methods for creation of ensemble members and their combination for efficient ensembles. 2 Ensemble Modeling The effectiveness of an ensemble can be measured by the extent to which the members are error independent (show different patterns of generalization) [19]. The ideal would be a set of models where each of the models generalize well, and when they do make errors on new data, these errors are not shared with any other models [19]. 2.1 Creating Ensemble Members In order to create efficient ensembles we need to consider the relative merits of methods of creating ensemble members, and to choose and apply one that is likely to result in models that generalize differently. There are several possible ways to achieve this objective that include the following: Sampling data: A set of models for an ensemble is commonly created by using some form of sampling, such that each model in the ensemble is trained on a different sub sample of the training data. Re-sampling methods which have been used for this purpose include cross-validation, and bootstrapping [5]. In bagging [5], a training set containing N cases is perturbed by sampling with replacement (bootstrap) N times from the training set. The perturbed data set may contain repeats. This procedure can be repeated several times to create a number of different, although overlapping, data sets. A method similar

3 Solving Regression Problems Using Competitive Ensemble Models 513 to the sampling is the use of disjoint training sets, that is sampling without replacement. There is then no overlap between the data used to train different models. However, the presence of correlation between the errors of the ensemble members could reduce the effectiveness of the ensemble [15]. While sampling the data might be an effective way of producing models that generalize differently, this will not necessarily result in low error correlations as it requires a representative training set where a function being inferred is similar to that which generated the test set [8]. However, two representative training sets could lead to very similar functions being inferred, so their pattern of errors on the new data would be very similar. On the other hand, if ensemble members are trained using unrepresentative training sets, the resulting generalization performance would be poor. Each member might show different patterns of generalization, but as the amount of errors increases so does the probability that their errors will overlap. Boosting and adaptive resampling: In boosting, as discussed previously, a series of weak learners could be converted to a strong learner as a result of training the members of an ensemble on patterns that have been filtered by previously trained members [20], [23]. AdaBoost algorithm [11] has training sets adaptively resampled, such that the weights in the resampling are increased for those cases which are most often predicted incorrectly. Varying the learning method employed: The learning method used to train the models could be varied while holding the data constant when creating ensemble members. An ensemble might be constructed from models generated by a combination of learning techniques such as various statistical methods, linear regression, neural networks, k nearest neighbors, decision trees, and Markov chains [7]. While this approach is not commonly used, in our opinion, it is a very promising approach to ensemble learning, as the use of different learning methods for ensemble members is more likely to result in different patterns of generalization than sampling the data. Furthermore, there is a possibility to combine varying both the learning method and the data, for example, with adaptive re sampling of the data. In this paper we will investigate the approach of varying the learning method for creation of ensemble members and compare it with approaches that use sampling the data. 2.2 Combining Ensemble Members The next step in ensemble learning is to find an effective way of combining model outputs. While there exists several possible ways of combining models in an ensemble, the co operative combination is the most dominant one. In co operative combination it is assumed that all of the ensemble members will make some contribution to the ensemble decision, even though this contribution may be weighted in some way. Methods of combining the models in co operative fashion include the following: Averaging and weighted averaging: Linear combination of the outputs of the ensemble members are one of the most popular aggregation methods. A single output can be created from a set of model outputs via simple averaging, or by means of a weighted average that takes account of the relative accuracy of the models to be combined. Stacked generalization: Stacked generalization [24] uses an additional model that learns how to combine the models with weights that vary over the feature space. The

4 514 Y. Frayman, B.F. Rolfe, and G.I. Webb outputs from a set of level 0 generalizers are used as the input to a level 1 generalizer, which is trained to produce the appropriate output. It is also possible to view other methods of combining, such as averaging, as instances of stacking with a simple level 1 generalizer. In competitive combination, on the other hand, it is assumed that for each input only the most appropriate ensemble member will be selected based on either the inputs or outputs of the models [21]. The two main methods for a competitive combination (selection) are: Gating: Under the divide and conquer approach employed by mixtures-of-experts [16] and hierarchical mixtures-of-experts [17] the complex problem is decomposed into a set of simpler problems. The data is partitioned into regions and the simple surfaces are fitted to the data into each region. The regions have soft boundaries where data points may lie simultaneously in multiple regions. Such decomposition ensures that the errors made by the expert models will not be correlated as they deal with different data points. A gating model is used to output a set of scalar coefficients that weights the contributions of the various inputs. Rule based switching: In this case, the switching between the models can be triggered on the basis of the input or the output of one of the models. For example, in the study on the diagnosis of myocardial infarction (heart attack), two models were optimized separately by varying the proportion of high risk and low risk patients in the training sets. The first model was trained to make as few positive errors as possible, and the second model was trained to make as few negative errors as possible [4]. The output of the first model was used unless it exceeds a threshold in which case the output of the second model was used. There exists other examples of rule based switching, for example, switching of control to the most appropriate model depending on the current situation as it is exploited in behavior based robotics [6]. Our empirical observation [10] has been that the better results can be obtained through the use of a more explicit rule based switching between models. 3 Computational Experiments 3.1 Experimental Set Up To evaluate the performance of a competitive ensemble model, several public domain regression data sets were selected from DELVE (Data for Evaluating Learning in Valid Experiments) (see delve/). DELVE is a standardized environment designed to evaluate the performance of methods that learn relationships based primarily on empirical data. DELVE makes it possible for users to compare their learning methods with other methods on many data sets. The DELVE learning methods and evaluation procedures are well documented, such that meaningful comparisons can be made. Since our approach involves ensembles, we compared the performance of our competitive ensemble model to that of multivariable adaptive regression splines (MARS) with bagging, mixture of experts, hierarchical mixture of experts, neural network ensemble and also with a standard linear regression as a baseline method. The competitive ensemble model was used in accordance with DELVE guidelines. The performance of other

6 516 Y. Frayman, B.F. Rolfe, and G.I. Webb 3.2 Regression Methods Used The competitive (selection) model used in this paper was created following the guidelines discussed previously. A non linear model (neural network) was trained to select the appropriate output of the ensemble members as a final output of the ensemble model based on the performance of ensemble members on a particular data tuple. The aim here is basically to create a global model (ensemble model) where each of the ensemble members is acting as a local predictor in the area of its best performance. In such a rule based switching [22], the control is switched between the ensemble members depending on the output of one of the members. For the ensemble members (level 0 generalizers) we have selected a linear method (linear regression), an efficient nonlinear method (multilayer perceptron (MLP) with two hidden layers with 20 nodes each), a logistic regression (MLP with a single hidden node) and a clustering algorithm (KNN) with k equal 10. The reasons for selecting these learning methods for creation of the ensemble members are that these methods are very different in nature and as such have different learning biases. In our experience, these learning methods have a different pattern of generalization and as such can produce an efficient ensemble. As a level 1 generalizer (selector model) another MLP consisting of 2 hidden layers of 20 hidden nodes each was used. All MLPs were fully connected with hyperbolic tangent hidden units and linear output units. All ensemble members were trained using the back propagation learning algorithm with early stopping to avoiding over fitting. The pattern (on line) learning was used. The learning rate of 0.05 and momentum of 0.99 were used for all MLPs to avoid local minima. The structure and the parameters of both MLPs and the KNN were selected based on our experience with other data sets and no attempt was made to optimize the level 0 models to the data sets considered. The structure of the competitive model is in Fig 1. In this case all the available inputs in a training set were supplied to the level 0 models and the selection model. Selection model also receives the output of the level 0 models. In the learning phase a binary value is assigned to the output of the selection model that represents the best model or otherwise for a particular data tuple. The selection model is solving a classification problem which is to choose the appropriate output of the one of the level 0 models as the final output of the ensemble. Supplying the input data to the selection model is not strictly necessary as the MLP with 2 hidden layers is a very efficient classifier, but may help the selection model to distinguish between the level 0 models in some cases. For comparison with our selection model we have used several popular regression methods: multivariable adaptive regression splines with bagging, mixture of-experts, hierarchical mixture of-experts and ensemble of neural networks. We have already discussed mixture of-experts and hierarchical mixture of-experts methods. In the following, we will briefly describe the rest of the methods. Multivariable adaptive regression splines (MARS) [12] were created to provide the advantages of tree based regression methods without the disadvantages of the response being discontinuous along the boundaries. Here each step basis function in the predictor space is replaced by a pair of linear basis functions. The new splits are not required to

7 Solving Regression Problems Using Competitive Ensemble Models 517 Outputs (Selection) Level 1 model Level 0 models Inputs Fig. 1. Selection Model. depend on the previous splits. The final solution is made smooth by replacing the linear functions with cubic functions after backward deletion of unnecessary basic functions. Table 2. Results from Boston data-set. Standardized Standard error Significance Method estimated for difference of difference expected loss estimate (F-test) Selection ensemble Linear regression < 0.05 HME (ensemble learning) < 0.05 HME (early stopping) < 0.05 HME (growing and early stopping) < 0.05 MARS version 3.6 with Bagging < 0.05 ME (ensemble learning) < 0.05 ME (early stopping) < 0.05 MLP (early stopping) < 0.05 MARS was used in conjunction with the bagging procedure [5]. Using this method one trains MARS on a number of bootstrap samples of the training set and averages the resulting predictions. The bootstrap samples are generated by sampling the original training set with replacement. Samples of the same size as the original training set are used. The ensemble of neural networks from DELVE repository trains ensembles of MLPs using early stopping to avoiding over fitting. The networks all have identical architecture: fully connected with a single hidden layer of hyperbolic tangent units and linear output units. The minimization algorithm is based on conjugate gradients. A fraction of the training examples are held out for validation, and performance on this set is monitored while the iterative learning procedure is applied. The learning is stopped as soon as minimum in validation error is achieved.

8 518 Y. Frayman, B.F. Rolfe, and G.I. Webb The number of hidden units is chosen to be the smallest such that the number of weights is at least as large as the total number of training cases after removal of the validation cases. One third of the training examples were used for validation and the rest for training. Table 3. Results from Kin-8nh and Kin-32nh data sets. Kin-8nh Standarized Standard error Significance Method estimated for difference of difference expected loss estimate (T-test) Selection ensemble Linear regression HME (ensemble learning) HME (early stopping) HME (growing and early stopping) MARS version 3.6 with Bagging ME (ensemble learning) ME (early stopping) MLP (early stopping) Kin-32nh Standarized Standard error Significance Method estimated for difference of difference expected loss estimate (T-test) Selection ensemble Linear regression HME (ensemble learning) HME (early stopping) HME (growing and early stopping) MARS version 3.6 with Bagging ME (ensemble learning) ME (early stopping) MLP (early stopping) Experimental Results and Discussion For each of the data sets the attribute variables were normalized to a median of zero and an average absolute deviation from the median of one. This enabled each level 0 model to learn from the same set of data as certain models have limits on the input and output data ranges. A standard squared loss function between the target output value and the predicted value was used to calculate the error for each ensemble member. The predicted values were then un normalised before they were compared to the target values. We have only run the competitive model once over each training and testing sets using the same parameters for the ensemble members, in accordance with DELVE guidelines. All the results obtained are thus the averages of these runs.

9 Solving Regression Problems Using Competitive Ensemble Models 519 DELVE uses an ANOVA (ANalysis Of VAriance) model to estimate the statistics in evaluating the performance of different models. Tables 2 6 contain the resulting statistical values from the competitive (selection) ensemble model and other ensemble models on the five data sets considered. Each table includes the standardized estimated squared error loss that is calculated against a simple baseline method within DELVE. The second statistic is the standard error of the estimated expected difference. This gives an indication of the variability of the estimated expected difference between the two methods. The final statistic is either F test or T test value which is a probability that the null hypothesis (no difference between the expected loss of the two methods) is true. This means that high values of the F test or of the T test would indicate the two methods are probably similar, and low values of the tests indicate that the difference between the performances of different methods are statistically significant. Table 4. Results from Pumadyn-8nh and Pumadyn-32nh data sets. Pumadyn-8nh Standardized Standard error Significance Method estimated for difference of difference expected loss estimate (F-test) Selection ensemble Linear regression HME (ensemble learning) HME (early stopping) HME (growing and early stopping) MARS version 3.6 with Bagging ME (ensemble learning) ME (early stopping) MLP (early stopping) Pumadyn-32nh Standarized Standard error Significance Method estimated for difference of difference expected loss estimate (T-test) Selection ensemble Linear regression HME (ensemble learning) HME (early stopping) HME (growing and early stopping) MARS version 3.6 with Bagging ME (ensemble learning) ME (early stopping) MLP (early stopping) In Tables 2 6, HME (ensemble learning) is a Hierarchical mixture of experts trained using Bayesian methods, HME (early stopping) is a Hierarchical mixtures of experts trained using early stopping, HME (growing and early stopping) is a Hierarchical mixtures of experts trained using growing and early stopping, ME (ensemble learning) is a Mixtures of experts trained using Bayesian methods, ME (early stopping) is a Mix-

10 520 Y. Frayman, B.F. Rolfe, and G.I. Webb Median House Price Boston Target Selection Ensemble MARS Prediction Samples Fig. 2. First 100 samples of the Boston data, plot of the selection ensemble model and the MARS model version 3.6 with bagging versus the actual target. tures of experts trained using early stopping and MLP (early stopping) is a Multilayer perceptron ensembles trained with early stopping. The actual predictions of a competitive model and the best of the comparison models are shown in Figs 2 4. For clarity only the first 100 samples of testing set are shown for respective data sets. Tables 2 6 show that the competitive (selection) model is significantly better, to a confidence of 95% (p =0.05), than any of the other methods. This demonstrates that the competitive (selection) combination in conjunction with varying the learning method approach for creating ensemble members is working better than the co operative combination in conjunction with sampling the data approach for creating ensemble members. Distance to End-Effector Kin-8nh Target Selection Ensemble MLP Ensemble Distance to End-Effector Kin-32nh Target Selection Ensemble Mixture of Experts Prediction Samples Samples Fig. 3. First 100 samples of the (a) Kin-8nh data, plot of the selection ensemble model and the MLP ensemble model with early stopping versus the actual target (b) the Kin-32nh data, plot of the selection ensemble model and the mixture-of-experts model with early stopping versus the actual target.

11 Solving Regression Problems Using Competitive Ensemble Models 521 Angular Acceleration Pumadyn-8nh Target Selection Ensemble MARS Prediction Angular Acceleration Pumadyn-32nh Target Selection Ensemble MARS Prediction Samples Samples Fig. 4. First 100 samples of the (a) Pumadyn-8nh data, plot of the selection ensemble model and the MARS model version 3.6 with bagging versus the actual target, (b) the Pumadyn-32nh data, plot of the selection ensemble model and the MARS model version 3.6 with bagging versus the actual target. However, while the competitive model is much better than the other models, there is still much room for improvement considering the actual prediction accuracy. The prediction of the competitive (selection) model for Boston Housing data set can be considered good, as seen in Fig. 2, even though there are some predictions that are not quite accurate. In case of Figs. 3 and 4, the results of the selection model are not as good, especially in case of Fig. 3(b), even though the selection model out performs the other ensemble models. We obviously need to keep in mind that we explicitly selected the most difficult data sets of that available in DELVE. However, based on our experience with real data sets [10] the selected regression problems are no more difficult than the many real non linear problems with a high degree of noise. This just shows that the ultimate goal of the prediction is not just to achieve better results than the competing methods, but to achieve the best possible prediction. References 1. Barnett, J. A. Computational methods for a mathematical theory of evidence, Proceedings of IJCAI, pp , Bates, J. M. and C. W. J. Granger. The combination of forecasts. Operations Research Quaterly, 20: , Bauer, E. and Kohavi, R. An empirical comparison of voting classification algorithms: bagging, boosting and variants. Machine Learning, 36(1,2), , Baxt, W. G. Improving the accuracy of an artificial neural network using multiple differently trained networks. Neural Computation, 4: , Breiman, L. Bagging predictors. Machine Learning, 26(2): , Brooks, R. A. A robust layered control system for a mobile robot. IEEE Journal of Robotics and Automation, 2:14 23, Catfolis, T. and Meert, K. Hybridization and specialization of real time recurrent learning based neural networks, Connectionist Science, 9(1):51 70, 1997.

12 522 Y. Frayman, B.F. Rolfe, and G.I. Webb 8. Denker, J., Schwartz, D., Wittner, B., Solla, S., Howard, R., Jackel, L. and Hopfield, J. Large automatic learning, rule extraction and generalisation. Complex Systems, 1: , Drucker, H. Improving regressors using boosting techniques. Proceedings of the 14th International Conference on Machine Learning, pp , Frayman, Y., Rolfe B. F., Hodgson, P. D. and Webb G. I. Predicting the rolling force in hot steel rolling mill using an ensemble model. Proceedings of the IASTED International Conference on Artificial Intelligence and Applications (AIA 2002), (in press). 11. Freund, Y. and R.Schapire. A decision theoretic generalization of on line learning and an application to boosting. Journal of Computer and System Sciences, 55(1): , Friedman J. Multivariate adaptive regression splines (with discussion). Annals of Statistics, 19(1), 1 82, Friedman, J., Hastie, T., and Tibshirani, R. Additive logistic regression: a statistical view of boosting (with discussion), Annals of Statistics, 28(2), , Friedman, J. Greedy function approximation: a gradient boosting machine. Annals of Statistics, 29(4) Hashem, S. Optimal linear combinations of neural networks. Neural Networks, 10(4): , Jacobs, R. A, Jordan, M. I., Nowlan, S. J., and Hinton, G. E. Adaptive mixtures of local experts. Neural Computation, 3:79 97, Jordan, M. I. and Jacobs R. A. Hierarchical mixtures of experts and the em algorithm. Neural Computation, 6(2): , Ridgeway, G. The state of boosting. Computing Science and Statistics, 31: , Rogova, G. Combining the results of several neural network classifiers. Neural Networks, 7(5): , Schapire, R. E. The strength of weak learnability. Machine Learning, 5: , Sharkey, A.J.C. (Ed.) Combining artificial neural nets: ensemble and modular multi-net systems, Springer-Verlag, Ting, K. M. The characterisation of predictive accuracy and decision combination. Proceedings of the 13th International Conference on Machine Learning, pp , Webb, G. MultiBoosting: a technique for combining boosting and wagging. Machine Learning, 40(2): , Wolpert, D.H. Stacked generalization. Neural Networks, 5: , 1992.

Data Mining Practical Machine Learning Tools and Techniques

Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea

L25: Ensemble learning

L25: Ensemble learning Introduction Methods for constructing ensembles Combination strategies Stacked generalization Mixtures of experts Bagging Boosting CSCE 666 Pattern Analysis Ricardo Gutierrez-Osuna

On the effect of data set size on bias and variance in classification learning

On the effect of data set size on bias and variance in classification learning Abstract Damien Brain Geoffrey I Webb School of Computing and Mathematics Deakin University Geelong Vic 3217 With the advent

REVIEW OF ENSEMBLE CLASSIFICATION

Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IJCSMC, Vol. 2, Issue.

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05

Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 2015-03-05 Roman Kern (KTI, TU Graz) Ensemble Methods 2015-03-05 1 / 38 Outline 1 Introduction 2 Classification

Advanced Ensemble Strategies for Polynomial Models

Advanced Ensemble Strategies for Polynomial Models Pavel Kordík 1, Jan Černý 2 1 Dept. of Computer Science, Faculty of Information Technology, Czech Technical University in Prague, 2 Dept. of Computer

Chapter 11 Boosting. Xiaogang Su Department of Statistics University of Central Florida - 1 -

Chapter 11 Boosting Xiaogang Su Department of Statistics University of Central Florida - 1 - Perturb and Combine (P&C) Methods have been devised to take advantage of the instability of trees to create

Model Combination. 24 Novembre 2009

Model Combination 24 Novembre 2009 Datamining 1 2009-2010 Plan 1 Principles of model combination 2 Resampling methods Bagging Random Forests Boosting 3 Hybrid methods Stacking Generic algorithm for mulistrategy

A Learning Algorithm For Neural Network Ensembles

A Learning Algorithm For Neural Network Ensembles H. D. Navone, P. M. Granitto, P. F. Verdes and H. A. Ceccatto Instituto de Física Rosario (CONICET-UNR) Blvd. 27 de Febrero 210 Bis, 2000 Rosario. República

Ensemble Data Mining Methods

Ensemble Data Mining Methods Nikunj C. Oza, Ph.D., NASA Ames Research Center, USA INTRODUCTION Ensemble Data Mining Methods, also known as Committee Methods or Model Combiners, are machine learning methods

A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier

A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier G.T. Prasanna Kumari Associate Professor, Dept of Computer Science and Engineering, Gokula Krishna College of Engg, Sullurpet-524121,

Data Mining. Nonlinear Classification

Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15

Gerry Hobbs, Department of Statistics, West Virginia University

Decision Trees as a Predictive Modeling Method Gerry Hobbs, Department of Statistics, West Virginia University Abstract Predictive modeling has become an important area of interest in tasks such as credit

Data Mining - Evaluation of Classifiers

Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010

Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk

Introduction to Machine Learning and Data Mining Prof. Dr. Igor Trakovski trakovski@nyus.edu.mk Neural Networks 2 Neural Networks Analogy to biological neural systems, the most robust learning systems

Data Mining Methods: Applications for Institutional Research

Data Mining Methods: Applications for Institutional Research Nora Galambos, PhD Office of Institutional Research, Planning & Effectiveness Stony Brook University NEAIR Annual Conference Philadelphia 2014

Chapter 6. The stacking ensemble approach

82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk

Introduction to Machine Learning and Data Mining Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk Ensembles 2 Learning Ensembles Learn multiple alternative definitions of a concept using different training

New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Introduction

Introduction New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Predictive analytics encompasses the body of statistical knowledge supporting the analysis of massive data sets.

SUCCESSFUL PREDICTION OF HORSE RACING RESULTS USING A NEURAL NETWORK

SUCCESSFUL PREDICTION OF HORSE RACING RESULTS USING A NEURAL NETWORK N M Allinson and D Merritt 1 Introduction This contribution has two main sections. The first discusses some aspects of multilayer perceptrons,

New Ensemble Combination Scheme

New Ensemble Combination Scheme Namhyoung Kim, Youngdoo Son, and Jaewook Lee, Member, IEEE Abstract Recently many statistical learning techniques are successfully developed and used in several areas However,

Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Unit # 11 Sajjad Haider Fall 2013 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right

AUTOMATION OF ENERGY DEMAND FORECASTING. Sanzad Siddique, B.S.

AUTOMATION OF ENERGY DEMAND FORECASTING by Sanzad Siddique, B.S. A Thesis submitted to the Faculty of the Graduate School, Marquette University, in Partial Fulfillment of the Requirements for the Degree

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL

The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University

Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris

Class #6: Non-linear classification ML4Bio 2012 February 17 th, 2012 Quaid Morris 1 Module #: Title of Module 2 Review Overview Linear separability Non-linear classification Linear Support Vector Machines

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Evaluating the Accuracy of a Classifier Holdout, random subsampling, crossvalidation, and the bootstrap are common techniques for

Principles of Data Mining by Hand&Mannila&Smyth

Principles of Data Mining by Hand&Mannila&Smyth Slides for Textbook Ari Visa,, Institute of Signal Processing Tampere University of Technology October 4, 2010 Data Mining: Concepts and Techniques 1 Differences

Neural Networks and Support Vector Machines

INF5390 - Kunstig intelligens Neural Networks and Support Vector Machines Roar Fjellheim INF5390-13 Neural Networks and SVM 1 Outline Neural networks Perceptrons Neural networks Support vector machines

Bootstrapping Big Data

Bootstrapping Big Data Ariel Kleiner Ameet Talwalkar Purnamrita Sarkar Michael I. Jordan Computer Science Division University of California, Berkeley {akleiner, ameet, psarkar, jordan}@eecs.berkeley.edu

Comparison of Data Mining Techniques used for Financial Data Analysis

Comparison of Data Mining Techniques used for Financial Data Analysis Abhijit A. Sawant 1, P. M. Chawan 2 1 Student, 2 Associate Professor, Department of Computer Technology, VJTI, Mumbai, INDIA Abstract

Data quality in Accounting Information Systems

Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania

Classification of Bad Accounts in Credit Card Industry

Classification of Bad Accounts in Credit Card Industry Chengwei Yuan December 12, 2014 Introduction Risk management is critical for a credit card company to survive in such competing industry. In addition

Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets

Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets http://info.salford-systems.com/jsm-2015-ctw August 2015 Salford Systems Course Outline Demonstration of two classification

Building lighting energy consumption modelling with hybrid neural-statistic approaches

EPJ Web of Conferences 33, 05009 (2012) DOI: 10.1051/ epjconf/ 20123305009 C Owned by the authors, published by EDP Sciences, 2012 Building lighting energy consumption modelling with hybrid neural-statistic

Introduction to Machine Learning. Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011

Introduction to Machine Learning Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011 1 Outline 1. What is machine learning? 2. The basic of machine learning 3. Principles and effects of machine learning

HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION

HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION Chihli Hung 1, Jing Hong Chen 2, Stefan Wermter 3, 1,2 Department of Management Information Systems, Chung Yuan Christian University, Taiwan

CI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore.

CI6227: Data Mining Lesson 11b: Ensemble Learning Sinno Jialin PAN Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore Acknowledgements: slides are adapted from the lecture notes

Building Ensembles of Neural Networks with Class-switching

Building Ensembles of Neural Networks with Class-switching Gonzalo Martínez-Muñoz, Aitor Sánchez-Martínez, Daniel Hernández-Lobato and Alberto Suárez Universidad Autónoma de Madrid, Avenida Francisco Tomás

Evaluation of Feature Selection Methods for Predictive Modeling Using Neural Networks in Credits Scoring

714 Evaluation of Feature election Methods for Predictive Modeling Using Neural Networks in Credits coring Raghavendra B. K. Dr. M.G.R. Educational and Research Institute, Chennai-95 Email: raghavendra_bk@rediffmail.com

Mining Direct Marketing Data by Ensembles of Weak Learners and Rough Set Methods

Mining Direct Marketing Data by Ensembles of Weak Learners and Rough Set Methods Jerzy B laszczyński 1, Krzysztof Dembczyński 1, Wojciech Kot lowski 1, and Mariusz Paw lowski 2 1 Institute of Computing

Knowledge Discovery and Data Mining. Bootstrap review. Bagging Important Concepts. Notes. Lecture 19 - Bagging. Tom Kelsey. Notes

Knowledge Discovery and Data Mining Lecture 19 - Bagging Tom Kelsey School of Computer Science University of St Andrews http://tom.host.cs.st-andrews.ac.uk twk@st-andrews.ac.uk Tom Kelsey ID5059-19-B &

Leveraging Ensemble Models in SAS Enterprise Miner

ABSTRACT Paper SAS133-2014 Leveraging Ensemble Models in SAS Enterprise Miner Miguel Maldonado, Jared Dean, Wendy Czika, and Susan Haller SAS Institute Inc. Ensemble models combine two or more models to

Data Mining Chapter 6: Models and Patterns Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Data Mining Chapter 6: Models and Patterns Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Models vs. Patterns Models A model is a high level, global description of a

CS570 Data Mining Classification: Ensemble Methods

CS570 Data Mining Classification: Ensemble Methods Cengiz Günay Dept. Math & CS, Emory University Fall 2013 Some slides courtesy of Han-Kamber-Pei, Tan et al., and Li Xiong Günay (Emory) Classification:

Feature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification

Feature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification Henrik Boström School of Humanities and Informatics University of Skövde P.O. Box 408, SE-541 28 Skövde

A New Ensemble Model for Efficient Churn Prediction in Mobile Telecommunication

2012 45th Hawaii International Conference on System Sciences A New Ensemble Model for Efficient Churn Prediction in Mobile Telecommunication Namhyoung Kim, Jaewook Lee Department of Industrial and Management

CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.

Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott

Azure Machine Learning, SQL Data Mining and R

Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:

Chapter 12 Discovering New Knowledge Data Mining

Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to

CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES

CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES Claus Gwiggner, Ecole Polytechnique, LIX, Palaiseau, France Gert Lanckriet, University of Berkeley, EECS,

Data Mining Classification: Alternative Techniques. Instance-Based Classifiers. Lecture Notes for Chapter 5. Introduction to Data Mining

Data Mining Classification: Alternative Techniques Instance-Based Classifiers Lecture Notes for Chapter 5 Introduction to Data Mining by Tan, Steinbach, Kumar Set of Stored Cases Atr1... AtrN Class A B

DATA MINING TECHNIQUES AND APPLICATIONS

DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin

MS1b Statistical Data Mining

MS1b Statistical Data Mining Yee Whye Teh Department of Statistics Oxford http://www.stats.ox.ac.uk/~teh/datamining.html Outline Administrivia and Introduction Course Structure Syllabus Introduction to

6.2.8 Neural networks for data mining

6.2.8 Neural networks for data mining Walter Kosters 1 In many application areas neural networks are known to be valuable tools. This also holds for data mining. In this chapter we discuss the use of neural

International Journal of Computer Trends and Technology (IJCTT) volume 4 Issue 8 August 2013

A Short-Term Traffic Prediction On A Distributed Network Using Multiple Regression Equation Ms.Sharmi.S 1 Research Scholar, MS University,Thirunelvelli Dr.M.Punithavalli Director, SREC,Coimbatore. Abstract:

Lecture/Recitation Topic SMA 5303 L1 Sampling and statistical distributions

SMA 50: Statistical Learning and Data Mining in Bioinformatics (also listed as 5.077: Statistical Learning and Data Mining ()) Spring Term (Feb May 200) Faculty: Professor Roy Welsch Wed 0 Feb 7:00-8:0

Welcome Xindong Wu Data Mining: Updates in Technologies Dept of Math and Computer Science Colorado School of Mines Golden, Colorado 80401, USA Email: xwu@ mines.edu Home Page: http://kais.mines.edu/~xwu/

Internet Traffic Prediction by W-Boost: Classification and Regression

Internet Traffic Prediction by W-Boost: Classification and Regression Hanghang Tong 1, Chongrong Li 2, Jingrui He 1, and Yang Chen 1 1 Department of Automation, Tsinghua University, Beijing 100084, China

SURVIVABILITY ANALYSIS OF PEDIATRIC LEUKAEMIC PATIENTS USING NEURAL NETWORK APPROACH

330 SURVIVABILITY ANALYSIS OF PEDIATRIC LEUKAEMIC PATIENTS USING NEURAL NETWORK APPROACH T. M. D.Saumya 1, T. Rupasinghe 2 and P. Abeysinghe 3 1 Department of Industrial Management, University of Kelaniya,

Towards better accuracy for Spam predictions

Towards better accuracy for Spam predictions Chengyan Zhao Department of Computer Science University of Toronto Toronto, Ontario, Canada M5S 2E4 czhao@cs.toronto.edu Abstract Spam identification is crucial

Data Mining Techniques for Prognosis in Pancreatic Cancer

Data Mining Techniques for Prognosis in Pancreatic Cancer by Stuart Floyd A Thesis Submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUE In partial fulfillment of the requirements for the Degree

Artificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence

Artificial Neural Networks and Support Vector Machines CS 486/686: Introduction to Artificial Intelligence 1 Outline What is a Neural Network? - Perceptron learners - Multi-layer networks What is a Support

A Content based Spam Filtering Using Optical Back Propagation Technique

A Content based Spam Filtering Using Optical Back Propagation Technique Sarab M. Hameed 1, Noor Alhuda J. Mohammed 2 Department of Computer Science, College of Science, University of Baghdad - Iraq ABSTRACT

Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com

SPSS-SA Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com SPSS-SA Training Brochure 2009 TABLE OF CONTENTS 1 SPSS TRAINING COURSES FOCUSING

Training Methods for Adaptive Boosting of Neural Networks for Character Recognition

Submission to NIPS*97, Category: Algorithms & Architectures, Preferred: Oral Training Methods for Adaptive Boosting of Neural Networks for Character Recognition Holger Schwenk Dept. IRO Université de Montréal

Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier

International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-1, Issue-6, January 2013 Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing

Using multiple models: Bagging, Boosting, Ensembles, Forests

Using multiple models: Bagging, Boosting, Ensembles, Forests Bagging Combining predictions from multiple models Different models obtained from bootstrap samples of training data Average predictions or

Assessing Data Mining: The State of the Practice

Assessing Data Mining: The State of the Practice 2003 Herbert A. Edelstein Two Crows Corporation 10500 Falls Road Potomac, Maryland 20854 www.twocrows.com (301) 983-3555 Objectives Separate myth from reality

Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

Optimization of technical trading strategies and the profitability in security markets

Economics Letters 59 (1998) 249 254 Optimization of technical trading strategies and the profitability in security markets Ramazan Gençay 1, * University of Windsor, Department of Economics, 401 Sunset,

An Introduction to Statistical Machine Learning - Overview -

An Introduction to Statistical Machine Learning - Overview - Samy Bengio bengio@idiap.ch Dalle Molle Institute for Perceptual Artificial Intelligence (IDIAP) CP 592, rue du Simplon 4 1920 Martigny, Switzerland

Government of Russian Federation. Faculty of Computer Science School of Data Analysis and Artificial Intelligence

Government of Russian Federation Federal State Autonomous Educational Institution of High Professional Education National Research University «Higher School of Economics» Faculty of Computer Science School

Online Forecasting of Stock Market Movement Direction Using the Improved Incremental Algorithm

Online Forecasting of Stock Market Movement Direction Using the Improved Incremental Algorithm Dalton Lunga and Tshilidzi Marwala University of the Witwatersrand School of Electrical and Information Engineering

Studying Auto Insurance Data

Studying Auto Insurance Data Ashutosh Nandeshwar February 23, 2010 1 Introduction To study auto insurance data using traditional and non-traditional tools, I downloaded a well-studied data from http://www.statsci.org/data/general/motorins.

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be

Stock Trading by Modelling Price Trend with Dynamic Bayesian Networks

Stock Trading by Modelling Price Trend with Dynamic Bayesian Networks Jangmin O 1,JaeWonLee 2, Sung-Bae Park 1, and Byoung-Tak Zhang 1 1 School of Computer Science and Engineering, Seoul National University

Why Ensembles Win Data Mining Competitions

Why Ensembles Win Data Mining Competitions A Predictive Analytics Center of Excellence (PACE) Tech Talk November 14, 2012 Dean Abbott Abbott Analytics, Inc. Blog: http://abbottanalytics.blogspot.com URL:

Supervised Learning (Big Data Analytics)

Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used

Supporting Online Material for

www.sciencemag.org/cgi/content/full/313/5786/504/dc1 Supporting Online Material for Reducing the Dimensionality of Data with Neural Networks G. E. Hinton* and R. R. Salakhutdinov *To whom correspondence

DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES

DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES Vijayalakshmi Mahanra Rao 1, Yashwant Prasad Singh 2 Multimedia University, Cyberjaya, MALAYSIA 1 lakshmi.mahanra@gmail.com

INTRODUCTION TO NEURAL NETWORKS

INTRODUCTION TO NEURAL NETWORKS Pictures are taken from http://www.cs.cmu.edu/~tom/mlbook-chapter-slides.html http://research.microsoft.com/~cmbishop/prml/index.htm By Nobel Khandaker Neural Networks An

II. RELATED WORK. Sentiment Mining

Sentiment Mining Using Ensemble Classification Models Matthew Whitehead and Larry Yaeger Indiana University School of Informatics 901 E. 10th St. Bloomington, IN 47408 {mewhiteh, larryy}@indiana.edu Abstract

THE HYBRID CART-LOGIT MODEL IN CLASSIFICATION AND DATA MINING. Dan Steinberg and N. Scott Cardell

THE HYBID CAT-LOGIT MODEL IN CLASSIFICATION AND DATA MINING Introduction Dan Steinberg and N. Scott Cardell Most data-mining projects involve classification problems assigning objects to classes whether

Social Media Mining. Data Mining Essentials

Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

Ensemble Approach for the Classification of Imbalanced Data

Ensemble Approach for the Classification of Imbalanced Data Vladimir Nikulin 1, Geoffrey J. McLachlan 1, and Shu Kay Ng 2 1 Department of Mathematics, University of Queensland v.nikulin@uq.edu.au, gjm@maths.uq.edu.au

DATA ANALYTICS USING R

DATA ANALYTICS USING R Duration: 90 Hours Intended audience and scope: The course is targeted at fresh engineers, practicing engineers and scientists who are interested in learning and understanding data

Handling missing data in large data sets. Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza

Handling missing data in large data sets Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza The problem Often in official statistics we have large data sets with many variables and

SOFTWARE EFFORT ESTIMATION USING RADIAL BASIS FUNCTION NEURAL NETWORKS Ana Maria Bautista, Angel Castellanos, Tomas San Feliu

International Journal Information Theories and Applications, Vol. 21, Number 4, 2014 319 SOFTWARE EFFORT ESTIMATION USING RADIAL BASIS FUNCTION NEURAL NETWORKS Ana Maria Bautista, Angel Castellanos, Tomas

Winning the Kaggle Algorithmic Trading Challenge with the Composition of Many Models and Feature Engineering

IEICE Transactions on Information and Systems, vol.e96-d, no.3, pp.742-745, 2013. 1 Winning the Kaggle Algorithmic Trading Challenge with the Composition of Many Models and Feature Engineering Ildefons

Cross-Validation. Synonyms Rotation estimation

Comp. by: BVijayalakshmiGalleys0000875816 Date:6/11/08 Time:19:52:53 Stage:First Proof C PAYAM REFAEILZADEH, LEI TANG, HUAN LIU Arizona State University Synonyms Rotation estimation Definition is a statistical

Bike sharing model reuse framework for tree-based ensembles

Bike sharing model reuse framework for tree-based ensembles Gergo Barta 1 Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Magyar tudosok krt. 2.

The Artificial Prediction Market

The Artificial Prediction Market Adrian Barbu Department of Statistics Florida State University Joint work with Nathan Lay, Siemens Corporate Research 1 Overview Main Contributions A mathematical theory

Lecture 13: Validation

Lecture 3: Validation g Motivation g The Holdout g Re-sampling techniques g Three-way data splits Motivation g Validation techniques are motivated by two fundamental problems in pattern recognition: model

Neural Networks & Boosting

Neural Networks & Boosting Bob Stine Dept of Statistics, School University of Pennsylvania Questions How is logistic regression different from OLS? Logistic mean function for probabilities Larger weight

Graduate Programs in Statistics Course Titles STAT 100 CALCULUS AND MATR IX ALGEBRA FOR STATISTICS. Differential and integral calculus; infinite series; matrix algebra STAT 195 INTRODUCTION TO MATHEMATICAL