Encog: Library of Interchangeable Machine Learning Models for Java and C#

Size: px
Start display at page:

Download "Encog: Library of Interchangeable Machine Learning Models for Java and C#"

Transcription

1 Journal of Machine Learning Research 16 (2015) Submitted 11/14; Published 6/15 Encog: Library of Interchangeable Machine Learning Models for Java and C# Jeff Heaton College of Engineering and Computing Nova Southeastern University Fort Lauderdale, FL 33314, USA Editor: Cheng Soon Ong Abstract This paper introduces the Encog library for Java and C#, a scalable, adaptable, multiplatform machine learning framework that was first released in Encog allows a variety of machine learning models to be applied to data sets using regression, classification, and clustering. Various supported machine learning models can be used interchangeably with minimal recoding. Encog uses efficient multithreaded code to reduce training time by exploiting modern multicore processors. The current version of Encog can be downloaded from Keywords: Java, C#, neural network, support vector machine, open source software 1. Intention and Goals This paper describes the Encog API for Java and C# that is provided as a JAR or DLL library. The C# version of Encog is also compatible with the Xamarin Mono package. Encog has an active community that has provided many enhancements that are beyond the scope of this paper. This includes extensions such as Javascript, GPU processing, C/C++ support, Scala support, and interfaces to various automated trading platforms. The scope of this paper is limited to the Java and C# API. Encog allows the Java or C# programmer to experiment with a wide range of machine language models using a simple, consistent interface for clustering, regression, and classifications. This allows the programmer to construct applications that discover which model provides the most suitable fit for the data. Encog provides basic tools for automated model selection. Most Encog models are implemented as efficient multithreaded algorithms to reduce processing time. This often allows Encog to perform more efficiently than many other Java and C# libraries, as demonstrated empirically by Taheri (2014) and Matviykiv and Faitas (2012). Luhasz et al. (2013) and Ramos-Pollán et al. (2012) also saw favorable results when evaluating Encog to similar libraries. The Encog s API is presented in an intuitive object-oriented paradigm that allows various models, optimization algorithms, and training algorithms to be highly interchangeable. However, beneath the API, the models are represented as one and two-dimensional arrays. This internal representation allows for highly efficient calculation. The API shields the programmer from the complexity of model calculation and fitting. c 2015 Jeff Heaton.

2 Heaton Encog contains nearly 400 unit tests to ensure consistency between the Java and C# model implementations. Expected results are calculated and cross-checked between the two platforms. A custom pseudorandom number generator (PRNG) is used in both language s unit tests to ensure that even stochastic models produce consistent, verifiable test results. Encog contains nearly 150 examples to demonstrate the use of the API in a variety of scenarios. These examples include simple prediction, time series, simulation, financial applications, path finding, curve fitting, and other applications. Documentation for Encog is provided as Java/C# docs and an online wiki. Additionally, discussion groups and a Stack Overflow tag are maintained for support. Links to all of these resources can be found at 2. Framework Overview The design goal of Encog is to provide interchangeable models with efficient, internal implementations. The Encog framework supports machine learning models with multiple training algorithms. These models are listed here: Adaline, Feedforward, Hopfield, PNN/GRNN, RBF & NEAT neural networks generalized linear regression (GLM) genetic programming (tree-based) k-means clustering k-nearest neighbors linear regression self-organizing map (SOM) simple recurrent network (Elman and Jordan) support vector machine (SVM) Encog provides optimization algorithms such as particle swarm optimization (PSO) (Poli, 2008), genetic algorithms (GA), Nelder-Mead and simulated annealing. These algorithms can optimize a vector to minimize a loss function; consequently, these algorithms can fit model parameters to data sets. Propagation-training algorithms for neural network fitting, such as back propagation (Rumelhart et al., 1988), resilient propagation (Riedmiller and Braun, 1992), Levenberg- Marquardt (Marquardt, 1963), quickpropagation (Fahlman, 1988), and scaled conjugate gradient (Møller, 1993) are included. Neural network pruning and model selection can be used to find optimal network architectures. Neural network architectures can be automatically built by a genetic algorithm using NEAT and HyperNEAT (Stanley and Miikkulainen, 2002). A number of preprocessing tools are built into the Encog library. Collected data can be divided into training, test, and validation sets. Time-series data can be encoded into data windows. Quantitative data can be normalized by range or z-score to prevent biases in some models. Masters (1993) normalizes qualitative data using one-of-n encoding or equilateral encoding. Encog uses these normalization techniques. 1244

3 Encog: Interchangeable Machine Learning Models for Java and C# Encog also contains extensive support for genetic programming using a tree representation (Koza, 1993). A full set of mathematical and programming functions are provided. Additionally, new functions can be defined. Constant nodes can either be drawn from a constant pool or generated as needed. Rules can optionally be added to simplify expressions and penalize specific genome patterns. 3. API Overview One of the central design philosophies of Encog is to allow models to be quickly interchanged without a great deal of code modification. A classification example will demonstrate this interchangeability, using the iris data set (Fisher, 1936). Portions of this classification example are presented in this paper, using the Java programming language. The complete example, in both Java and C#, is provided in the Encog Quick Start Guide (available from The Quick Start Guide also provides regression and time-series examples. The following example learns to predict the species of an iris flower by using four types of measurements from each flower. To begin, the program loads the iris data set s CSV file. In addition to CSV, Encog contains classes to read fixed-length text, JDBC, ODBC, and XML data sources. The iris data set is loaded, and the four measurement columns are defined as continuous values. VersatileDataSource source = new CSVDataSource(irisFile, false, CSVFormat.DECIMAL_POINT); VersatileMLDataSet data = new VersatileMLDataSet(source); data.definesourcecolumn("sepal-length", 0, ColumnType.continuous); data.definesourcecolumn("sepal-width", 1, ColumnType.continuous); data.definesourcecolumn("petal-length", 2, ColumnType.continuous); data.definesourcecolumn("petal-width", 3, ColumnType.continuous); The species of Iris is defined as nominal value. Defining the columns as continuous, nominal or ordinal allows Encog to determine the appropriate way to encode these data for a model. For specialized cases, it is possible to override Encog s encoding defaults for any model type. ColumnDefinition outputcolumn = data.definesourcecolumn("species", 4, ColumnType.nominal); Once the columns have been defined, the file is analyzed to determine minimum, maximum, and other statistical properties of the columns. This allows the columns to be properly normalized and encoded by Encog for modeling. data.analyze(); data.definesingleoutputothersinput(outputcolumn); Next the model type is defined to be a feedforward neural network. EncogModel model = new EncogModel(data); model.selectmethod(data, MLMethodFactory.TYPE_FEEDFORWARD); Only the above line needs to be changed to switch to model types that include the following: MLMethodFactory.SVM: support vector machine MLMethodFactory.TYPE RBFNETWORK: RBF neural network 1245

4 Heaton MLMethodFactor.TYPE NEAT: NEAT neural network MLMethodFactor.TYPE PNN: probabilistic neural network Next the data set is normalized and encoded. Encog will automatically determine the correct normalization type based on the model chosen in the last step. For model validation, 30% of the data are held back. Though the validation sampling is random, a seed of 1001 is used so that the items selected for validation remain constant between program runs. Finally, the default training type is selected. data.normalize(); model.holdbackvalidation(0.3, true, 1001); model.selecttrainingtype(data); The example trains using a 5-fold cross-validated technique that chooses the model with the best validation score. The resulting training and validation errors are displayed. MLRegression bestmethod = (MLRegression)model.crossvalidate(5, true); System.out.println( "Training error: " + EncogUtility.calculateRegressionError( bestmethod, model.gettrainingdataset())); System.out.println( "Validation error: " + EncogUtility.calculateRegressionError (bestmethod, model.getvalidationdataset())); Display normalization parameters and final model. NormalizationHelper helper = data.getnormhelper(); System.out.println(helper.toString()); System.out.println("Final model: " + bestmethod); 4. Future Plans and Conclusions A number of enhancements are planned for Encog. Gradient boosting machines (GBM) and deep learning are two future model additions. Several planned enhancements will provide interoperability with other machine learning packages. Future versions of Encog will have the ability to read and write Weka Attribute-Relation File Format (ARFF) and libsvm data files. Encog will gain the ability to load and save models in the Predictive Model Markup Language (PMML) format. A code contribution by Mosca (2012) will soon be integrated, enhancing Encog s ensemble learning capabilities. Acknowledgments The Encog community has been very helpful for bug reports, bug fixes, and feature suggestions. Contributors to Encog include Olivier Guiglionda, Seema Singh, César Roberto de Souza, and others. A complete list of contributors to Encog can be found at the GitHub repository: Alan Mosca, Department of Computer Science and Information Systems, Birkbeck, University of London, UK, created Encog s ensemble functionality. Matthew Dean, Marc Fletcher and Edmund Owen, Semiconductor Physics Research Group, University of Cambridge, UK, created Encog s RBF Neural network model. 1246

5 Encog: Interchangeable Machine Learning Models for Java and C# References S. Fahlman. An empirical study of learning speed in back-propagation networks. Technical report, Carnegie Mellon University, R. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(7): , J. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. Complex adaptive systems. MIT Press, ISBN G. Luhasz, V. Munteanu, and V. Negru. Data mining considerations for knowledge acquisition in real time strategy games. In IEEE 11th International Symposium on Intelligent Systems and Informatics, SISY 2013, Subotica, Serbia, September 26-28, 2013, pages , doi: /SISY D. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. SIAM Journal on Applied Mathematics, 11(2): , doi: / T. Masters. Practical Neural Network Recipes in C++. Academic Press Professional, Inc., San Diego, CA, USA, ISBN O. Matviykiv and O. Faitas. Data classification of spectrum analysis using neural network. Lviv Polytechnic National University, M. Møller. A scaled conjugate gradient algorithm for fast supervised learning. Neural Networks, 6(4): , A. Mosca. Extending Encog: a study on classifier ensemble techniques. Master s thesis, Birkbeck, University of London, R. Poli. Analysis of the publications on the applications of particle swarm optimisation. J. Artif. Evol. App., 2008:4:1 4:10, January ISSN Raúl Ramos-Pollán, Miguel Ángel Guevara-López, and Eugénio C. Oliveira. A software framework for building biomedical machine learning classifiers through grid computing resources. J. Medical Systems, 36(4): , doi: /s M. Riedmiller and H. Braun. Rprop: A fast adaptive learning algorithm. Technical report, Proc. of ISCIS VII), Universitat, D. Rumelhart, G. Hinton, and R. Williams. Neurocomputing: Foundations of research. In James A. Anderson and Edward Rosenfeld, editors, Neurocomputing: Foundations of Research, chapter Learning Representations by Back-propagating Errors, pages MIT Press, Cambridge, MA, USA, ISBN K. Stanley and R. Miikkulainen. Evolving neural networks through augmenting topologies. Evol. Comput., 10(2):99 127, June ISSN T. Taheri. Benchmarking and comparing Encog, Neuroph and JOONE neural networks. June Accessed:

BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376

BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376 Course Director: Dr. Kayvan Najarian (DCM&B, [email protected]) Lectures: Labs: Mondays and Wednesdays 9:00 AM -10:30 AM Rm. 2065 Palmer Commons Bldg. Wednesdays 10:30 AM 11:30 AM (alternate weeks) Rm.

More information

NEURAL NETWORKS A Comprehensive Foundation

NEURAL NETWORKS A Comprehensive Foundation NEURAL NETWORKS A Comprehensive Foundation Second Edition Simon Haykin McMaster University Hamilton, Ontario, Canada Prentice Hall Prentice Hall Upper Saddle River; New Jersey 07458 Preface xii Acknowledgments

More information

14.10.2014. Overview. Swarms in nature. Fish, birds, ants, termites, Introduction to swarm intelligence principles Particle Swarm Optimization (PSO)

14.10.2014. Overview. Swarms in nature. Fish, birds, ants, termites, Introduction to swarm intelligence principles Particle Swarm Optimization (PSO) Overview Kyrre Glette kyrrehg@ifi INF3490 Swarm Intelligence Particle Swarm Optimization Introduction to swarm intelligence principles Particle Swarm Optimization (PSO) 3 Swarms in nature Fish, birds,

More information

Projects - Neural and Evolutionary Computing

Projects - Neural and Evolutionary Computing Projects - Neural and Evolutionary Computing 2014-2015 I. Application oriented topics 1. Task scheduling in distributed systems. The aim is to assign a set of (independent or correlated) tasks to some

More information

An Introduction to Data Mining

An Introduction to Data Mining An Introduction to Intel Beijing [email protected] January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail

More information

Université de Montpellier 2 Hugo Alatrista-Salas : [email protected]

Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr Université de Montpellier 2 Hugo Alatrista-Salas : [email protected] WEKA Gallirallus Zeland) australis : Endemic bird (New Characteristics Waikato university Weka is a collection

More information

Waffles: A Machine Learning Toolkit

Waffles: A Machine Learning Toolkit Journal of Machine Learning Research 12 (2011) 2383-2387 Submitted 6/10; Revised 3/11; Published 7/11 Waffles: A Machine Learning Toolkit Mike Gashler Department of Computer Science Brigham Young University

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014 RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

More information

NEURAL NETWORKS IN DATA MINING

NEURAL NETWORKS IN DATA MINING NEURAL NETWORKS IN DATA MINING 1 DR. YASHPAL SINGH, 2 ALOK SINGH CHAUHAN 1 Reader, Bundelkhand Institute of Engineering & Technology, Jhansi, India 2 Lecturer, United Institute of Management, Allahabad,

More information

Active Learning SVM for Blogs recommendation

Active Learning SVM for Blogs recommendation Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the

More information

ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION

ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION ISSN 9 X INFORMATION TECHNOLOGY AND CONTROL, 00, Vol., No.A ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION Danuta Zakrzewska Institute of Computer Science, Technical

More information

Is a Data Scientist the New Quant? Stuart Kozola MathWorks

Is a Data Scientist the New Quant? Stuart Kozola MathWorks Is a Data Scientist the New Quant? Stuart Kozola MathWorks 2015 The MathWorks, Inc. 1 Facts or information used usually to calculate, analyze, or plan something Information that is produced or stored by

More information

Getting Even More Out of Ensemble Selection

Getting Even More Out of Ensemble Selection Getting Even More Out of Ensemble Selection Quan Sun Department of Computer Science The University of Waikato Hamilton, New Zealand [email protected] ABSTRACT Ensemble Selection uses forward stepwise

More information

Performance Comparison between Backpropagation Algorithms Applied to Intrusion Detection in Computer Network Systems

Performance Comparison between Backpropagation Algorithms Applied to Intrusion Detection in Computer Network Systems Performance Comparison between Backpropagation Algorithms Applied to Intrusion Detection in Computer Network Systems Iftikhar Ahmad, M.A Ansari, Sajjad Mohsin Department of Computer Sciences, Federal Urdu

More information

Advanced analytics at your hands

Advanced analytics at your hands 2.3 Advanced analytics at your hands Neural Designer is the most powerful predictive analytics software. It uses innovative neural networks techniques to provide data scientists with results in a way previously

More information

Financial Trading System using Combination of Textual and Numerical Data

Financial Trading System using Combination of Textual and Numerical Data Financial Trading System using Combination of Textual and Numerical Data Shital N. Dange Computer Science Department, Walchand Institute of Rajesh V. Argiddi Assistant Prof. Computer Science Department,

More information

Scalable Developments for Big Data Analytics in Remote Sensing

Scalable Developments for Big Data Analytics in Remote Sensing Scalable Developments for Big Data Analytics in Remote Sensing Federated Systems and Data Division Research Group High Productivity Data Processing Dr.-Ing. Morris Riedel et al. Research Group Leader,

More information

The basic data mining algorithms introduced may be enhanced in a number of ways.

The basic data mining algorithms introduced may be enhanced in a number of ways. DATA MINING TECHNOLOGIES AND IMPLEMENTATIONS The basic data mining algorithms introduced may be enhanced in a number of ways. Data mining algorithms have traditionally assumed data is memory resident,

More information

CS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 Real-Time Systems. CSCI 522 High Performance Computing

CS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 Real-Time Systems. CSCI 522 High Performance Computing CS Master Level Courses and Areas The graduate courses offered may change over time, in response to new developments in computer science and the interests of faculty and students; the list of graduate

More information

Programming Neural Networks with Encog3 in C#

Programming Neural Networks with Encog3 in C# Programming Neural Networks with Encog3 in C# Programming Neural Networks with Encog3 in C# Jeff Heaton Heaton Research, Inc. St. Louis, MO, USA v Publisher: Heaton Research, Inc Programming Neural Networks

More information

Application of Event Based Decision Tree and Ensemble of Data Driven Methods for Maintenance Action Recommendation

Application of Event Based Decision Tree and Ensemble of Data Driven Methods for Maintenance Action Recommendation Application of Event Based Decision Tree and Ensemble of Data Driven Methods for Maintenance Action Recommendation James K. Kimotho, Christoph Sondermann-Woelke, Tobias Meyer, and Walter Sextro Department

More information

Introduction Predictive Analytics Tools: Weka

Introduction Predictive Analytics Tools: Weka Introduction Predictive Analytics Tools: Weka Predictive Analytics Center of Excellence San Diego Supercomputer Center University of California, San Diego Tools Landscape Considerations Scale User Interface

More information

Knowledge Discovery from patents using KMX Text Analytics

Knowledge Discovery from patents using KMX Text Analytics Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs [email protected] Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers

More information

Data Mining and Neural Networks in Stata

Data Mining and Neural Networks in Stata Data Mining and Neural Networks in Stata 2 nd Italian Stata Users Group Meeting Milano, 10 October 2005 Mario Lucchini e Maurizo Pisati Università di Milano-Bicocca [email protected] [email protected]

More information

COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments

COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments Contents List of Figures Foreword Preface xxv xxiii xv Acknowledgments xxix Chapter 1 Fraud: Detection, Prevention, and Analytics! 1 Introduction 2 Fraud! 2 Fraud Detection and Prevention 10 Big Data for

More information

KnowledgeSTUDIO HIGH-PERFORMANCE PREDICTIVE ANALYTICS USING ADVANCED MODELING TECHNIQUES

KnowledgeSTUDIO HIGH-PERFORMANCE PREDICTIVE ANALYTICS USING ADVANCED MODELING TECHNIQUES HIGH-PERFORMANCE PREDICTIVE ANALYTICS USING ADVANCED MODELING TECHNIQUES Translating data into business value requires the right data mining and modeling techniques which uncover important patterns within

More information

Introducing diversity among the models of multi-label classification ensemble

Introducing diversity among the models of multi-label classification ensemble Introducing diversity among the models of multi-label classification ensemble Lena Chekina, Lior Rokach and Bracha Shapira Ben-Gurion University of the Negev Dept. of Information Systems Engineering and

More information

How To Use Neural Networks In Data Mining

How To Use Neural Networks In Data Mining International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and

More information

Decision Tree Learning on Very Large Data Sets

Decision Tree Learning on Very Large Data Sets Decision Tree Learning on Very Large Data Sets Lawrence O. Hall Nitesh Chawla and Kevin W. Bowyer Department of Computer Science and Engineering ENB 8 University of South Florida 4202 E. Fowler Ave. Tampa

More information

life science data mining

life science data mining life science data mining - '.)'-. < } ti» (>.:>,u» c ~'editors Stephen Wong Harvard Medical School, USA Chung-Sheng Li /BM Thomas J Watson Research Center World Scientific NEW JERSEY LONDON SINGAPORE.

More information

Pentaho Data Mining Last Modified on January 22, 2007

Pentaho Data Mining Last Modified on January 22, 2007 Pentaho Data Mining Copyright 2007 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their respective owners. For the latest information, please visit our web site at www.pentaho.org

More information

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015 An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content

More information

Beating the NCAA Football Point Spread

Beating the NCAA Football Point Spread Beating the NCAA Football Point Spread Brian Liu Mathematical & Computational Sciences Stanford University Patrick Lai Computer Science Department Stanford University December 10, 2010 1 Introduction Over

More information

Big Data Analytics and Optimization

Big Data Analytics and Optimization Big Data Analytics and Optimization C e r t i f i c a t e P r o g r a m i n E n g i n e e r i n g E x c e l l e n c e e.edu.in http://www.insof LIST OF COURSES Essential Business Skills for a Data Scientist...

More information

Data Mining mit der JMSL Numerical Library for Java Applications

Data Mining mit der JMSL Numerical Library for Java Applications Data Mining mit der JMSL Numerical Library for Java Applications Stefan Sineux 8. Java Forum Stuttgart 07.07.2005 Agenda Visual Numerics JMSL TM Numerical Library Neuronale Netze (Hintergrund) Demos Neuronale

More information

Neural network software tool development: exploring programming language options

Neural network software tool development: exploring programming language options INEB- PSI Technical Report 2006-1 Neural network software tool development: exploring programming language options Alexandra Oliveira [email protected] Supervisor: Professor Joaquim Marques de Sá June 2006

More information

Leveraging Ensemble Models in SAS Enterprise Miner

Leveraging Ensemble Models in SAS Enterprise Miner ABSTRACT Paper SAS133-2014 Leveraging Ensemble Models in SAS Enterprise Miner Miguel Maldonado, Jared Dean, Wendy Czika, and Susan Haller SAS Institute Inc. Ensemble models combine two or more models to

More information

Classification of Bad Accounts in Credit Card Industry

Classification of Bad Accounts in Credit Card Industry Classification of Bad Accounts in Credit Card Industry Chengwei Yuan December 12, 2014 Introduction Risk management is critical for a credit card company to survive in such competing industry. In addition

More information

Neural Network Add-in

Neural Network Add-in Neural Network Add-in Version 1.5 Software User s Guide Contents Overview... 2 Getting Started... 2 Working with Datasets... 2 Open a Dataset... 3 Save a Dataset... 3 Data Pre-processing... 3 Lagging...

More information

GURLS: A Least Squares Library for Supervised Learning

GURLS: A Least Squares Library for Supervised Learning Journal of Machine Learning Research 14 (2013) 3201-3205 Submitted 1/12; Revised 2/13; Published 10/13 GURLS: A Least Squares Library for Supervised Learning Andrea Tacchetti Pavan K. Mallapragada Center

More information

Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms

Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Scott Pion and Lutz Hamel Abstract This paper presents the results of a series of analyses performed on direct mail

More information

How To Predict Web Site Visits

How To Predict Web Site Visits Web Site Visit Forecasting Using Data Mining Techniques Chandana Napagoda Abstract: Data mining is a technique which is used for identifying relationships between various large amounts of data in many

More information

Knowledge Discovery and Data Mining. Bootstrap review. Bagging Important Concepts. Notes. Lecture 19 - Bagging. Tom Kelsey. Notes

Knowledge Discovery and Data Mining. Bootstrap review. Bagging Important Concepts. Notes. Lecture 19 - Bagging. Tom Kelsey. Notes Knowledge Discovery and Data Mining Lecture 19 - Bagging Tom Kelsey School of Computer Science University of St Andrews http://tom.host.cs.st-andrews.ac.uk [email protected] Tom Kelsey ID5059-19-B &

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

Lecture 6. Artificial Neural Networks

Lecture 6. Artificial Neural Networks Lecture 6 Artificial Neural Networks 1 1 Artificial Neural Networks In this note we provide an overview of the key concepts that have led to the emergence of Artificial Neural Networks as a major paradigm

More information

Index Contents Page No. Introduction . Data Mining & Knowledge Discovery

Index Contents Page No. Introduction . Data Mining & Knowledge Discovery Index Contents Page No. 1. Introduction 1 1.1 Related Research 2 1.2 Objective of Research Work 3 1.3 Why Data Mining is Important 3 1.4 Research Methodology 4 1.5 Research Hypothesis 4 1.6 Scope 5 2.

More information

Analysis Tools and Libraries for BigData

Analysis Tools and Libraries for BigData + Analysis Tools and Libraries for BigData Lecture 02 Abhijit Bendale + Office Hours 2 n Terry Boult (Waiting to Confirm) n Abhijit Bendale (Tue 2:45 to 4:45 pm). Best if you email me in advance, but I

More information

Clustering Connectionist and Statistical Language Processing

Clustering Connectionist and Statistical Language Processing Clustering Connectionist and Statistical Language Processing Frank Keller [email protected] Computerlinguistik Universität des Saarlandes Clustering p.1/21 Overview clustering vs. classification supervised

More information

Learning outcomes. Knowledge and understanding. Competence and skills

Learning outcomes. Knowledge and understanding. Competence and skills Syllabus Master s Programme in Statistics and Data Mining 120 ECTS Credits Aim The rapid growth of databases provides scientists and business people with vast new resources. This programme meets the challenges

More information

From Raw Data to. Actionable Insights with. MATLAB Analytics. Learn more. Develop predictive models. 1Access and explore data

From Raw Data to. Actionable Insights with. MATLAB Analytics. Learn more. Develop predictive models. 1Access and explore data 100 001 010 111 From Raw Data to 10011100 Actionable Insights with 00100111 MATLAB Analytics 01011100 11100001 1 Access and Explore Data For scientists the problem is not a lack of available but a deluge.

More information

Novelty Detection in image recognition using IRF Neural Networks properties

Novelty Detection in image recognition using IRF Neural Networks properties Novelty Detection in image recognition using IRF Neural Networks properties Philippe Smagghe, Jean-Luc Buessler, Jean-Philippe Urban Université de Haute-Alsace MIPS 4, rue des Frères Lumière, 68093 Mulhouse,

More information

Distributed forests for MapReduce-based machine learning

Distributed forests for MapReduce-based machine learning Distributed forests for MapReduce-based machine learning Ryoji Wakayama, Ryuei Murata, Akisato Kimura, Takayoshi Yamashita, Yuji Yamauchi, Hironobu Fujiyoshi Chubu University, Japan. NTT Communication

More information

HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION

HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION Chihli Hung 1, Jing Hong Chen 2, Stefan Wermter 3, 1,2 Department of Management Information Systems, Chung Yuan Christian University, Taiwan

More information

IBM SPSS Neural Networks 22

IBM SPSS Neural Networks 22 IBM SPSS Neural Networks 22 Note Before using this information and the product it supports, read the information in Notices on page 21. Product Information This edition applies to version 22, release 0,

More information

Practical Applications of DATA MINING. Sang C Suh Texas A&M University Commerce JONES & BARTLETT LEARNING

Practical Applications of DATA MINING. Sang C Suh Texas A&M University Commerce JONES & BARTLETT LEARNING Practical Applications of DATA MINING Sang C Suh Texas A&M University Commerce r 3 JONES & BARTLETT LEARNING Contents Preface xi Foreword by Murat M.Tanik xvii Foreword by John Kocur xix Chapter 1 Introduction

More information

Random forest algorithm in big data environment

Random forest algorithm in big data environment Random forest algorithm in big data environment Yingchun Liu * School of Economics and Management, Beihang University, Beijing 100191, China Received 1 September 2014, www.cmnt.lv Abstract Random forest

More information

Machine Learning in Python with scikit-learn. O Reilly Webcast Aug. 2014

Machine Learning in Python with scikit-learn. O Reilly Webcast Aug. 2014 Machine Learning in Python with scikit-learn O Reilly Webcast Aug. 2014 Outline Machine Learning refresher scikit-learn How the project is structured Some improvements released in 0.15 Ongoing work for

More information

IT services for analyses of various data samples

IT services for analyses of various data samples IT services for analyses of various data samples Ján Paralič, František Babič, Martin Sarnovský, Peter Butka, Cecília Havrilová, Miroslava Muchová, Michal Puheim, Martin Mikula, Gabriel Tutoky Technical

More information

SELECTING NEURAL NETWORK ARCHITECTURE FOR INVESTMENT PROFITABILITY PREDICTIONS

SELECTING NEURAL NETWORK ARCHITECTURE FOR INVESTMENT PROFITABILITY PREDICTIONS UDC: 004.8 Original scientific paper SELECTING NEURAL NETWORK ARCHITECTURE FOR INVESTMENT PROFITABILITY PREDICTIONS Tonimir Kišasondi, Alen Lovren i University of Zagreb, Faculty of Organization and Informatics,

More information

Machine Learning with MATLAB David Willingham Application Engineer

Machine Learning with MATLAB David Willingham Application Engineer Machine Learning with MATLAB David Willingham Application Engineer 2014 The MathWorks, Inc. 1 Goals Overview of machine learning Machine learning models & techniques available in MATLAB Streamlining the

More information

Supervised Learning (Big Data Analytics)

Supervised Learning (Big Data Analytics) Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used

More information

Oracle Database. Products Available on the Oracle Database Examples Media. Oracle Database Examples. Examples Installation Guide 11g Release 2 (11.

Oracle Database. Products Available on the Oracle Database Examples Media. Oracle Database Examples. Examples Installation Guide 11g Release 2 (11. Oracle Database Examples Installation Guide 11g Release 2 (11.2) E10846-01 August 2009 This document describes how to install and configure the products available on the Oracle Database Examples media.

More information

Data Mining: Concepts and Techniques. Jiawei Han. Micheline Kamber. Simon Fräser University К MORGAN KAUFMANN PUBLISHERS. AN IMPRINT OF Elsevier

Data Mining: Concepts and Techniques. Jiawei Han. Micheline Kamber. Simon Fräser University К MORGAN KAUFMANN PUBLISHERS. AN IMPRINT OF Elsevier Data Mining: Concepts and Techniques Jiawei Han Micheline Kamber Simon Fräser University К MORGAN KAUFMANN PUBLISHERS AN IMPRINT OF Elsevier Contents Foreword Preface xix vii Chapter I Introduction I I.

More information

Visualization of large data sets using MDS combined with LVQ.

Visualization of large data sets using MDS combined with LVQ. Visualization of large data sets using MDS combined with LVQ. Antoine Naud and Włodzisław Duch Department of Informatics, Nicholas Copernicus University, Grudziądzka 5, 87-100 Toruń, Poland. www.phys.uni.torun.pl/kmk

More information

Prof. Pietro Ducange Students Tutor and Practical Classes Course of Business Intelligence 2014 http://www.iet.unipi.it/p.ducange/esercitazionibi/

Prof. Pietro Ducange Students Tutor and Practical Classes Course of Business Intelligence 2014 http://www.iet.unipi.it/p.ducange/esercitazionibi/ Prof. Pietro Ducange Students Tutor and Practical Classes Course of Business Intelligence 2014 http://www.iet.unipi.it/p.ducange/esercitazionibi/ Email: [email protected] Office: Dipartimento di Ingegneria

More information

Maschinelles Lernen mit MATLAB

Maschinelles Lernen mit MATLAB Maschinelles Lernen mit MATLAB Jérémy Huard Applikationsingenieur The MathWorks GmbH 2015 The MathWorks, Inc. 1 Machine Learning is Everywhere Image Recognition Speech Recognition Stock Prediction Medical

More information

Classification of Titanic Passenger Data and Chances of Surviving the Disaster Data Mining with Weka and Kaggle Competition Data

Classification of Titanic Passenger Data and Chances of Surviving the Disaster Data Mining with Weka and Kaggle Competition Data Proceedings of Student-Faculty Research Day, CSIS, Pace University, May 2 nd, 2014 Classification of Titanic Passenger Data and Chances of Surviving the Disaster Data Mining with Weka and Kaggle Competition

More information

Chapter 12 Discovering New Knowledge Data Mining

Chapter 12 Discovering New Knowledge Data Mining Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to

More information

American International Journal of Research in Science, Technology, Engineering & Mathematics

American International Journal of Research in Science, Technology, Engineering & Mathematics American International Journal of Research in Science, Technology, Engineering & Mathematics Available online at http://www.iasir.net ISSN (Print): 2328-349, ISSN (Online): 2328-3580, ISSN (CD-ROM): 2328-3629

More information

Data Mining for Customer Service Support. Senioritis Seminar Presentation Megan Boice Jay Carter Nick Linke KC Tobin

Data Mining for Customer Service Support. Senioritis Seminar Presentation Megan Boice Jay Carter Nick Linke KC Tobin Data Mining for Customer Service Support Senioritis Seminar Presentation Megan Boice Jay Carter Nick Linke KC Tobin Traditional Hotline Services Problem Traditional Customer Service Support (manufacturing)

More information

COMPUTATIONIMPROVEMENTOFSTOCKMARKETDECISIONMAKING MODELTHROUGHTHEAPPLICATIONOFGRID. Jovita Nenortaitė

COMPUTATIONIMPROVEMENTOFSTOCKMARKETDECISIONMAKING MODELTHROUGHTHEAPPLICATIONOFGRID. Jovita Nenortaitė ISSN 1392 124X INFORMATION TECHNOLOGY AND CONTROL, 2005, Vol.34, No.3 COMPUTATIONIMPROVEMENTOFSTOCKMARKETDECISIONMAKING MODELTHROUGHTHEAPPLICATIONOFGRID Jovita Nenortaitė InformaticsDepartment,VilniusUniversityKaunasFacultyofHumanities

More information

DATA MINING IN FINANCE

DATA MINING IN FINANCE DATA MINING IN FINANCE Advances in Relational and Hybrid Methods by BORIS KOVALERCHUK Central Washington University, USA and EVGENII VITYAEV Institute of Mathematics Russian Academy of Sciences, Russia

More information

Learning to Process Natural Language in Big Data Environment

Learning to Process Natural Language in Big Data Environment CCF ADL 2015 Nanchang Oct 11, 2015 Learning to Process Natural Language in Big Data Environment Hang Li Noah s Ark Lab Huawei Technologies Part 1: Deep Learning - Present and Future Talk Outline Overview

More information

Chapter 6. The stacking ensemble approach

Chapter 6. The stacking ensemble approach 82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

More information

DATA MINING TOOL FOR INTEGRATED COMPLAINT MANAGEMENT SYSTEM WEKA 3.6.7

DATA MINING TOOL FOR INTEGRATED COMPLAINT MANAGEMENT SYSTEM WEKA 3.6.7 DATA MINING TOOL FOR INTEGRATED COMPLAINT MANAGEMENT SYSTEM WEKA 3.6.7 UNDER THE GUIDANCE Dr. N.P. DHAVALE, DGM, INFINET Department SUBMITTED TO INSTITUTE FOR DEVELOPMENT AND RESEARCH IN BANKING TECHNOLOGY

More information

In this presentation, you will be introduced to data mining and the relationship with meaningful use.

In this presentation, you will be introduced to data mining and the relationship with meaningful use. In this presentation, you will be introduced to data mining and the relationship with meaningful use. Data mining refers to the art and science of intelligent data analysis. It is the application of machine

More information

An Evaluation of Neural Networks Approaches used for Software Effort Estimation

An Evaluation of Neural Networks Approaches used for Software Effort Estimation Proc. of Int. Conf. on Multimedia Processing, Communication and Info. Tech., MPCIT An Evaluation of Neural Networks Approaches used for Software Effort Estimation B.V. Ajay Prakash 1, D.V.Ashoka 2, V.N.

More information

Knowledge Based Descriptive Neural Networks

Knowledge Based Descriptive Neural Networks Knowledge Based Descriptive Neural Networks J. T. Yao Department of Computer Science, University or Regina Regina, Saskachewan, CANADA S4S 0A2 Email: [email protected] Abstract This paper presents a

More information

EFFICIENCY OF DECISION TREES IN PREDICTING STUDENT S ACADEMIC PERFORMANCE

EFFICIENCY OF DECISION TREES IN PREDICTING STUDENT S ACADEMIC PERFORMANCE EFFICIENCY OF DECISION TREES IN PREDICTING STUDENT S ACADEMIC PERFORMANCE S. Anupama Kumar 1 and Dr. Vijayalakshmi M.N 2 1 Research Scholar, PRIST University, 1 Assistant Professor, Dept of M.C.A. 2 Associate

More information

Machine Learning. 01 - Introduction

Machine Learning. 01 - Introduction Machine Learning 01 - Introduction Machine learning course One lecture (Wednesday, 9:30, 346) and one exercise (Monday, 17:15, 203). Oral exam, 20 minutes, 5 credit points. Some basic mathematical knowledge

More information

Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD

Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD Predictive Analytics Techniques: What to Use For Your Big Data March 26, 2014 Fern Halper, PhD Presenter Proven Performance Since 1995 TDWI helps business and IT professionals gain insight about data warehousing,

More information

Data Mining Analytics for Business Intelligence and Decision Support

Data Mining Analytics for Business Intelligence and Decision Support Data Mining Analytics for Business Intelligence and Decision Support Chid Apte, T.J. Watson Research Center, IBM Research Division Knowledge Discovery and Data Mining (KDD) techniques are used for analyzing

More information

Sanjeev Kumar. contribute

Sanjeev Kumar. contribute RESEARCH ISSUES IN DATAA MINING Sanjeev Kumar I.A.S.R.I., Library Avenue, Pusa, New Delhi-110012 [email protected] 1. Introduction The field of data mining and knowledgee discovery is emerging as a

More information

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University

More information

Predictive Model Representation and Comparison: Towards Data and Predictive Models Governance

Predictive Model Representation and Comparison: Towards Data and Predictive Models Governance Predictive Model Representation and Comparison Towards Data and Predictive Models Governance Mokhairi Makhtar, Daniel C. Neagu, Mick Ridley School of Computing, Informatics and Media, University of Bradford

More information

Web Document Clustering

Web Document Clustering Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,

More information

Study Plan for the Master Degree In Industrial Engineering / Management. (Thesis Track)

Study Plan for the Master Degree In Industrial Engineering / Management. (Thesis Track) Study Plan for the Master Degree In Industrial Engineering / Management (Thesis Track) Plan no. 2005 T A. GENERAL RULES AND CONDITIONS: 1. This plan conforms to the valid regulations of programs of graduate

More information

Prediction Model for Crude Oil Price Using Artificial Neural Networks

Prediction Model for Crude Oil Price Using Artificial Neural Networks Applied Mathematical Sciences, Vol. 8, 2014, no. 80, 3953-3965 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2014.43193 Prediction Model for Crude Oil Price Using Artificial Neural Networks

More information

NEURAL NETWORK FUNDAMENTALS WITH GRAPHS, ALGORITHMS, AND APPLICATIONS

NEURAL NETWORK FUNDAMENTALS WITH GRAPHS, ALGORITHMS, AND APPLICATIONS NEURAL NETWORK FUNDAMENTALS WITH GRAPHS, ALGORITHMS, AND APPLICATIONS N. K. Bose HRB-Systems Professor of Electrical Engineering The Pennsylvania State University, University Park P. Liang Associate Professor

More information

Make Better Decisions Through Predictive Intelligence

Make Better Decisions Through Predictive Intelligence IBM SPSS Modeler Professional Make Better Decisions Through Predictive Intelligence Highlights Easily access, prepare and model structured data with this intuitive, visual data mining workbench Rapidly

More information

What s Cooking in KNIME

What s Cooking in KNIME What s Cooking in KNIME Thomas Gabriel Copyright 2015 KNIME.com AG Agenda Querying NoSQL Databases Database Improvements & Big Data Copyright 2015 KNIME.com AG 2 Querying NoSQL Databases MongoDB & CouchDB

More information

Master Specialization in Knowledge Engineering

Master Specialization in Knowledge Engineering Master Specialization in Knowledge Engineering Pavel Kordík, Ph.D. Department of Computer Science Faculty of Information Technology Czech Technical University in Prague Prague, Czech Republic http://www.fit.cvut.cz/en

More information

Eastern Washington University Department of Computer Science. Questionnaire for Prospective Masters in Computer Science Students

Eastern Washington University Department of Computer Science. Questionnaire for Prospective Masters in Computer Science Students Eastern Washington University Department of Computer Science Questionnaire for Prospective Masters in Computer Science Students I. Personal Information Name: Last First M.I. Mailing Address: Permanent

More information

Football Match Winner Prediction

Football Match Winner Prediction Football Match Winner Prediction Kushal Gevaria 1, Harshal Sanghavi 2, Saurabh Vaidya 3, Prof. Khushali Deulkar 4 Department of Computer Engineering, Dwarkadas J. Sanghvi College of Engineering, Mumbai,

More information

INTERNATIONAL JOURNAL FOR ENGINEERING APPLICATIONS AND TECHNOLOGY DATA MINING IN HEALTHCARE SECTOR. [email protected]

INTERNATIONAL JOURNAL FOR ENGINEERING APPLICATIONS AND TECHNOLOGY DATA MINING IN HEALTHCARE SECTOR. ankitanandurkar2394@gmail.com IJFEAT INTERNATIONAL JOURNAL FOR ENGINEERING APPLICATIONS AND TECHNOLOGY DATA MINING IN HEALTHCARE SECTOR Bharti S. Takey 1, Ankita N. Nandurkar 2,Ashwini A. Khobragade 3,Pooja G. Jaiswal 4,Swapnil R.

More information

Machine Learning and Data Mining -

Machine Learning and Data Mining - Machine Learning and Data Mining - Perceptron Neural Networks Nuno Cavalheiro Marques ([email protected]) Spring Semester 2010/2011 MSc in Computer Science Multi Layer Perceptron Neurons and the Perceptron

More information

Data Mining. Supervised Methods. Ciro Donalek [email protected]. Ay/Bi 199ab: Methods of Computa@onal Sciences hcp://esci101.blogspot.

Data Mining. Supervised Methods. Ciro Donalek donalek@astro.caltech.edu. Ay/Bi 199ab: Methods of Computa@onal Sciences hcp://esci101.blogspot. Data Mining Supervised Methods Ciro Donalek [email protected] Supervised Methods Summary Ar@ficial Neural Networks Mul@layer Perceptron Support Vector Machines SoLwares Supervised Models: Supervised

More information