SVM Ensemble Model for Investment Prediction

Save this PDF as:
 WORD  PNG  TXT  JPG

Size: px
Start display at page:

Download "SVM Ensemble Model for Investment Prediction"

Transcription

1 19 SVM Ensemble Model for Investment Prediction Chandra J, Assistant Professor, Department of Computer Science, Christ University, Bangalore Siji T. Mathew, Research Scholar, Christ University, Dept of Computer Science, Bangalore ABSTRACT This paper analyses the usage of SVM Ensemble for Investment service. Objective of this work is to construct a model for investment prediction. Ensemble learning is a machine-learning paradigm where multiple models are trained to solve the problem. In this paper, a detailed study of SVM ensemble is done. An insurance dataset obtained from UCI knowledge discovery in Databases Archive was taken. The AdaBoost, multiclassifier SVM Ensemble was created and tested with the insurance dataset. From this work, the SVM Ensemble produces better accuracy than other ensembles. The knowledge flow of SVM ensemble model was created using Weka tool. This model helps the user to predict the best policy for investment Keywords SVM Ensemble, AdaBoost, Multiclassifier, Accuracy, ROC. 1. INTRODUCTION In this research, The SVM Ensemble model is created which is highly reliable for investment sector. We present a novel method based on SVM Ensemble classification. The Business intelligence helps the companies in their decision support system. A business intelligence technology gives both historical and current views of the businesses process and using this knowledge the decision-making system can produce good results. Different categories of Policies were selected from UCI knowledge discovery in Databases Archive. Data mining is the process of analysing data from different perspectives and summarizing it into useful information. Data are any facts, text or numbers that can be processed out. The information is useful to increase the revenue and reduce the costs of the organization. Datamining is called as data or knowledge discovery from data. Datamining is the process of finding correlations or knowledge among different fields in large relational databases where the information is stored and available for mining. Datamining tools allow the users to analyze the data from different dimensions. A data mining system contains data, information and knowledge to extract these data and information. Companies with a strong consumer focus mainly use datamining in these days. It is applied in retail field, financial sector, communication media, and in marketing organizations. Datamining facilitate these companies to determine relationships among company internal factors such as price, product positioning, or staff skills, and external factors such as competition in products, economic indicators, and customer demographics. The impacts of different factors determine the effects on sales, corporate profits, product quality and customer satisfaction. With the application of datamining, a retailer could use point-of-sale records of customer purchases to send targeted promotions based on an individual's purchase history [14]. By mining demographic data collected from various sources, the retailer could develop products and promotions to appeal to specific customer segments. 2. METHODOLOGY In this research work, the multi-classifier SVM Ensemble creation uses AdaBoost.M1 algorithm, C- SVC and Majority voting method. A. AdaBoost Algorithm AdaBoost is a Machine Learning algorithm used for improving the performance of learning algorithm [18]. AdaBoost algorithm is used to boost the accuracy of the Support Vector Machines. Let D be the given dataset with d class-labelled tuples (X 1, y1), (X 2, y2) (X d, Y d ). In the initial step, AdaBoost assign each training tuple an equal weight of 1/d. For generating the ensemble, AdaBoost algorithm require k rounds through the rest of the algorithm. In round I, the tuples from D are sampled to form a training set, D i of size d. sampling with replacement is used in AdaBoost. The selection of each sample depends on its weight. A classifier model, M i is derived from the training tuple of D i its error is calculated Di as the test set. If the tuple was incorrectly classified, its weight is increased. If a tuple is classified as correct, its weight is decreased. These weights are used to generate classifiers in the next round. B. Support Vector Machines A Support Vector Machine is a type of classifier, which can handle both linear and non-linear data. A classifier takes a feature vector and assigns a class or a label to the vector. The goal of the Support Vector Machine is to predict target value of data instances in testing set with better accuracy. The number of

2 20 elements in the feature vector corresponds to its dimensionality. Support Vector Machine consists of a set of related supervised learning methods. During the training stage, Support Vector Machine finds the maximum-margin hyperplane between different classes. The best hyperplane is selected so that the distance from it to the nearest data point on each side is maximised. Such classifiers are called as maximummargin hyperplane. This is the line in two dimensions, plane in three dimensions or hyperplane in higher dimensions that maximises the distance to the nearest data point. Cross-validation is used to reduce the chances of overfitting. The vectors or the data points that are closest to the hyperplane are called the support vectors. Let D be the dataset for linear SVM. It contains a set of points of the from D = {(Xi, yi ) Xi Rp, yi {-1,1}} Where yi is either 1 or 1 and it indicate the class to which the point Xi belongs. Equation for any hyper plane can be written as w. x b = 0, Where. Indicate the dot product, b is a constant and w is the normal vector to the hyper plane. Hyper plane with maximum distance away to separate the dataset can be described as w. x b = 1 And w. x - b= -1. The Support Vector Machines are linear functions of the form f(x) = w T x + b, where w is the weight vector and x is the input vector. Let the set of training examples be {(x 1, y 1 ), (x 2, y 2 ),, (x n, y n )}, where x i is an input vector and y i is its class label, y i {1, -1}. To find the linear function: Minimize: 1 W T W 2 Subject to the constraint: Y i (W T X i + b)>=1, i=1,2,3.n Where the index i, represents number of training cases. The C-SVC classifier with linear kernel is available in LIBSVM of WEKA. The C-SVC is used for experimental result. The Main features of LIBSVM include different SVM formulations, efficient multi-class classification, Cross validation for model selection, Probability estimates, various kernel functions and Weighted SVM for unbalanced data. C. SVM Ensemble SVM Ensemble classifier is a collection of several SVM classifiers whose individual decisions are combined to classify the test samples [4].An ensemble shows better performance than individual classifiers from which it is constructed. The SVM Ensemble classification prediction includes two levels: classifier construction and the usage of the classifier. The Model is constructed from the training set. Each sample in the training set is assumed to belong to a predefined class, as determined by the class attribute label [5]. Using WEKA knowledge flow and classifier selection, a boosted SVM Ensemble is created [10]. The created model is used for further prediction. The later involves the use of SVM Ensemble built to predict or classify the output. The processes start by training an SVM classifier with a less imbalanced subset of data, and then classify the entire training data set with the SVM to identify the incorrectly classified examples. Training another SVM classifier can reinforce the first classifier [15]. The process is repeated until we obtain an SVM ensemble in which each classifier tries to enhance the performance of its previous one. The trained models are aggregated using majority voting to obtain a collective outcome. SVM ensemble was created using C- SVC. SVM is a kernel-based algorithm. A kernel is a function that transforms the input data to a high-dimensional space where the problem is solved. Kernel functions can be linear or non-linear. SVM has built-in mechanisms that automatically choose appropriate settings based on the data. In C-SVC, SVM type available is LIBSVM, which uses One-against-one approach for multi class classification and builds k (k- 1)/2 binary classifiers for given K classifier. The classifiers are trained separately with classes against each other. The testing stage is one of the most critical phases of any classification model development process, which gives the model developer a most informative measure about the classifier performance. The classifier can helpful to justify its use, leading to possible optimisation. The final decision of ensemble is obtained after combining the individual predictions of ensemble members. Majority voting is used for the aggregation. D. Majority voting The majority voting is a method for combining several SVMs output. Majority vote counts the votes for each

3 21 class over the input classifiers and selects the majority class [15]. It is used for aggregating the results form various ensemble models. Let f k, (k=1,2,., K) be a decision function of the k th SVM in the SVM ensemble. Also let C j (j=1, 2 C) denote a label of the j th class. Then, the number of SVMs whose decisions are known to the j th class be N j =# { k f k (x) = Cj }. The final decision of the SVM Ensemble fmv(x) for a given test vector x due to the majority voting is determined by fmv(x) = arg j max Nj. In majority voting, the models are trained and allowed to vote. The one with high majority vote is selected for the individual models. By aggregating, the accuracy of the SVMs ensemble is increased. method is used. The final result is obtained and saved automatically into a file. This file is taken and used in the developed Investment prediction tool. The accuracy value of the policies varies from one policy to another. The policy, which is having maximum accuracy, is taken as the best policy among that category. 3. EXPERIMENTAL DESIGN A. Dataset Description This research makes use of dataset available from UCI Knowledge Discovery in Databases Archive [20]. Different policies are added in order to make necessary analysis. B. Performance Measures Performance metrics are used to assess how accurately the model predicts the known values. If the model performs well and meets the business requirements, it can then be applied to new data to predict the future. For evaluating the performance of a classifier confusion matrix, accuracy values etc are used. The accuracy is defined as, Accuracy = TP+TN/(TP+FP+TN+FN). Where TP, FP, TN and FN are the numbers of true positive predictions, false positive predictions, true negative predictions and false negative predictions, respectively. The Precision, Recall, F-Measures etc are also listed from the WEKA classifier performance evaluator. 4. RESULT ANALYSIS The SVM Ensemble model is created in weka knowledge flow. The first experiment was done using the Fire Policy. The knowledge flow for Fire policy is given in figure 2. There are three different fire policies. Each fire policy data is loaded using CSV loader data source. The loaded dataset is assigned into a train-test split maker using a class assigner. The class assigner assigns the class label of each policy. The loaded dataset is divided into training data and as testing data. 75% of data is taken as the training data and 25% of data was taken as testing data. The training data is used to build the SVM ensemble classifier. The test data is used to test the build classifier. A multi classified SVM ensemble is created. There are three C-SVC Support Vector Machines were selected and applied to AdaBoost. M1, to create a SVM ensemble.the kernel type is chosen as linear. This result is applied to Multiclassifier. In multiclass classifier one-against-one Figure 2.1.SVM Ensemble Model using Knowledge flow Figure 4.1 Parameter settings for SVM Ensemble Model Below table shows the detailed analyzed value, obtained for Fire policies with SVM Ensemble model and single SVM. Table 1. Single SVM vs SVM ensemble RE-1 RE-2 RE-3 F-Measure ROC-Area Single SVM SVM ensemble

4 22 The same policies were tested with single SVM. From the values obtained after the testing, it is found out that SVM ensemble gives better accuracy than single SVM Following figure shows, 38% of customers were chosen for RE-1 policy and 37% people had chosen RE-2 policy. Only 25% of the customers were taken RE-3 policy. 25% 37% fire policies 38% RE-1 RE-2 RE-3 from a given list. The detailed analysis of the Insurance dataset was done using the suggested model which was loaded in the WEKA knowledge flow. The policy, which was having maximum accuracy, has taken as the best policy among that category. From this research work, it is found that the accuracy of SVM Ensemble is better than other ensemble methods and is showing a better Investment prediction for different categories of policies. 5. CONCLUSION The experiment was carried out for different policies in the same way and found that the SVM ensemble performs better than the single classifier. Fast and accurate classifier is used for making investment prediction.svm Ensemble classifier is well-known as the state-of-art ensemble classifier for insurance field. In future, this research work can be extended to develop a Prediction Tool for investment field. REFERENCES Figure 4.2: Percentage of Policyholders The best fire policy output is shown in Investment Predictor Tool. All categories of policies were listed out in the Investment Predictor Tool. To get best policy for Fire policy, select fire policy from the list of Policy list and click on Get Result button. The analyzed results of three fire policies are stored, while running the WEKA knowledge flow model. The best policy having maximum accuracy is shown in fire policy. The RE-1 policy is selected as the best policy. This model was created and tested with other investment policies also. Figure 5.1: Best Policy prediction for FIRE In this paper, the data is collected and analyzed using multi-classifier SVM. Based on SVM ensemble classifier, the best policies were predicted for each category. The prediction helps to choose the best policy [1] Siji T.Mathew, Chandra J.Estimating the performance of SVM using Personal equity plan, in proceedings Of International Conf On Mathematics In Engineering &Business Management, pp , March 9-10,2012. [2] Siji T. Mathew, SVM Ensemble for Insurance Data Analysis, Mphil Dissertation, Christ University, Jun [3] Chawla, Nitesh V (2004). Learning Ensembles from Bites : A Scalable and Accurate Approach. Journal of Machine Learning Research, pp [4] C.Cortes,V.Vapnik,Support vector network, Machine Learning, pp ,1995. [5] Fumera, G, Roli, F, A theoretical and experimental analysis of linear combiners for multiple classifier systems Pattern Analysis and Machine Intelligence,IEEE Transactions, vol.27, no.6, pp [6] Harris Drucker et al, Support vector machines for spam categorization IEEE Transactions On Neural Networks,1999. [7] Kuncheva LI.,Combining Pattern Classifiers. Methods and Algorithms,2004. [8] Leon Bottou and Chih-Jen Lin, Support vectormachine Solvers in Large Scale Kernel Machines. Weston editors, MIT Press, Cambridge, MA, pp [9] Ma Chao, Chen Xihong, A New Algorithm of Support Vector Machine Ensemble and Its Application,Intelligent Human- Machine Systems and Cybernetics (IHMSC), 2nd International Conference, ,2010. [10] Mark Hall, et al(2009),the WEKA Data Mining

5 23 [11] Software: An Update SIGKDD Explorations, vol 11. [12] Nicolas Garcia-Pedrajas, Constructing Ensembles of Classifiers by Means of Weighted Instance Selection, IEEE Transactions On Neural Networks, vol. 20, NO. 2,2009. [13] Nikunj C. Oza and Kagan Tumer, Key Real- World Applications of Classifier Ensembles Information Fusion, Special Issue on Applications of Ensemble Methods, pp. 4 20,2008. [14] Petr Hajek, Vladimir Olej, Municipal Revenue Prediction by Support Vector Machine ensemble, ICCOMP 10:Proc. of the 14th WSEAS international Conf. On Computers: part of the 14th WSEAS CSCC multi Conf. vol.1,2010. [15] Ravi V.H et al, Soft computing system for bank performance Prediction. Elsevier Applied Soft Computing, [16] Robi Policar Ensemble based systems in decision making IEEE circuits and systems magazine, [17] Shin, Kim, An application of support vector machine in bankruptcy prediction model Expert System with applications,vol ,2005. [18] Bhardwaj M., Gupta, T., Grover T, Bhatnagar V An efficient classifier ensemble using SVM, Methods and Models in Computer Science, ICM2CS, Proceeding of International Conf, vol.2,2009. [19] Yan-Shi Dong, Ke-Song Han, Boosting SVM Classifiers By Ensemble, Special interest tracks and posters of the 14thinternational Conf on World Wide web May, pp.10-14,2005. [20] Ye Li, Yun Ze cal, et al, Fault diagnosis based on support vector machine ensemble, Machine Learning and Cybernetics. Pp ,2005. [21] Dataset available

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05 Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 2015-03-05 Roman Kern (KTI, TU Graz) Ensemble Methods 2015-03-05 1 / 38 Outline 1 Introduction 2 Classification

More information

Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms

Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Scott Pion and Lutz Hamel Abstract This paper presents the results of a series of analyses performed on direct mail

More information

Knowledge Discovery from patents using KMX Text Analytics

Knowledge Discovery from patents using KMX Text Analytics Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs anton.heijs@treparel.com Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers

More information

Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier

Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-1, Issue-6, January 2013 Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing

More information

Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval

Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval Information Retrieval INFO 4300 / CS 4300! Retrieval models Older models» Boolean retrieval» Vector Space model Probabilistic Models» BM25» Language models Web search» Learning to Rank Search Taxonomy!

More information

Classification algorithm in Data mining: An Overview

Classification algorithm in Data mining: An Overview Classification algorithm in Data mining: An Overview S.Neelamegam #1, Dr.E.Ramaraj *2 #1 M.phil Scholar, Department of Computer Science and Engineering, Alagappa University, Karaikudi. *2 Professor, Department

More information

Support Vector Machines with Clustering for Training with Very Large Datasets

Support Vector Machines with Clustering for Training with Very Large Datasets Support Vector Machines with Clustering for Training with Very Large Datasets Theodoros Evgeniou Technology Management INSEAD Bd de Constance, Fontainebleau 77300, France theodoros.evgeniou@insead.fr Massimiliano

More information

Data Mining Part 5. Prediction

Data Mining Part 5. Prediction Data Mining Part 5. Prediction 5.1 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Classification vs. Numeric Prediction Prediction Process Data Preparation Comparing Prediction Methods References Classification

More information

A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier

A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier G.T. Prasanna Kumari Associate Professor, Dept of Computer Science and Engineering, Gokula Krishna College of Engg, Sullurpet-524121,

More information

A fast multi-class SVM learning method for huge databases

A fast multi-class SVM learning method for huge databases www.ijcsi.org 544 A fast multi-class SVM learning method for huge databases Djeffal Abdelhamid 1, Babahenini Mohamed Chaouki 2 and Taleb-Ahmed Abdelmalik 3 1,2 Computer science department, LESIA Laboratory,

More information

Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification

Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Tina R. Patil, Mrs. S. S. Sherekar Sant Gadgebaba Amravati University, Amravati tnpatil2@gmail.com, ss_sherekar@rediffmail.com

More information

Towards better accuracy for Spam predictions

Towards better accuracy for Spam predictions Towards better accuracy for Spam predictions Chengyan Zhao Department of Computer Science University of Toronto Toronto, Ontario, Canada M5S 2E4 czhao@cs.toronto.edu Abstract Spam identification is crucial

More information

Azure Machine Learning, SQL Data Mining and R

Azure Machine Learning, SQL Data Mining and R Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:

More information

HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION

HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION Chihli Hung 1, Jing Hong Chen 2, Stefan Wermter 3, 1,2 Department of Management Information Systems, Chung Yuan Christian University, Taiwan

More information

Chapter 6. The stacking ensemble approach

Chapter 6. The stacking ensemble approach 82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

More information

Using Random Forest to Learn Imbalanced Data

Using Random Forest to Learn Imbalanced Data Using Random Forest to Learn Imbalanced Data Chao Chen, chenchao@stat.berkeley.edu Department of Statistics,UC Berkeley Andy Liaw, andy liaw@merck.com Biometrics Research,Merck Research Labs Leo Breiman,

More information

Behavior Analysis of SVM Based Spam Filtering Using Various Kernel Functions and Data Representations

Behavior Analysis of SVM Based Spam Filtering Using Various Kernel Functions and Data Representations ISSN: 2278-181 Vol. 2 Issue 9, September - 213 Behavior Analysis of SVM Based Spam Filtering Using Various Kernel Functions and Data Representations Author :Sushama Chouhan Author Affiliation: MTech Scholar

More information

Artificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence

Artificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence Artificial Neural Networks and Support Vector Machines CS 486/686: Introduction to Artificial Intelligence 1 Outline What is a Neural Network? - Perceptron learners - Multi-layer networks What is a Support

More information

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear

More information

Data Mining. Nonlinear Classification

Data Mining. Nonlinear Classification Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15

More information

A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE

A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE Kasra Madadipouya 1 1 Department of Computing and Science, Asia Pacific University of Technology & Innovation ABSTRACT Today, enormous amount of data

More information

Keywords Data mining, Classification Algorithm, Decision tree, J48, Random forest, Random tree, LMT, WEKA 3.7. Fig.1. Data mining techniques.

Keywords Data mining, Classification Algorithm, Decision tree, J48, Random forest, Random tree, LMT, WEKA 3.7. Fig.1. Data mining techniques. International Journal of Emerging Research in Management &Technology Research Article October 2015 Comparative Study of Various Decision Tree Classification Algorithm Using WEKA Purva Sewaiwar, Kamal Kant

More information

DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES

DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES Vijayalakshmi Mahanra Rao 1, Yashwant Prasad Singh 2 Multimedia University, Cyberjaya, MALAYSIA 1 lakshmi.mahanra@gmail.com

More information

AUTO CLAIM FRAUD DETECTION USING MULTI CLASSIFIER SYSTEM

AUTO CLAIM FRAUD DETECTION USING MULTI CLASSIFIER SYSTEM AUTO CLAIM FRAUD DETECTION USING MULTI CLASSIFIER SYSTEM ABSTRACT Luis Alexandre Rodrigues and Nizam Omar Department of Electrical Engineering, Mackenzie Presbiterian University, Brazil, São Paulo 71251911@mackenzie.br,nizam.omar@mackenzie.br

More information

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be

More information

Classifiers & Classification

Classifiers & Classification Classifiers & Classification Forsyth & Ponce Computer Vision A Modern Approach chapter 22 Pattern Classification Duda, Hart and Stork School of Computer Science & Statistics Trinity College Dublin Dublin

More information

Addressing the Class Imbalance Problem in Medical Datasets

Addressing the Class Imbalance Problem in Medical Datasets Addressing the Class Imbalance Problem in Medical Datasets M. Mostafizur Rahman and D. N. Davis the size of the training set is significantly increased [5]. If the time taken to resample is not considered,

More information

CI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore.

CI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore. CI6227: Data Mining Lesson 11b: Ensemble Learning Sinno Jialin PAN Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore Acknowledgements: slides are adapted from the lecture notes

More information

Supervised Learning (Big Data Analytics)

Supervised Learning (Big Data Analytics) Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used

More information

Support Vector Machines Explained

Support Vector Machines Explained March 1, 2009 Support Vector Machines Explained Tristan Fletcher www.cs.ucl.ac.uk/staff/t.fletcher/ Introduction This document has been written in an attempt to make the Support Vector Machines (SVM),

More information

Ensemble Data Mining Methods

Ensemble Data Mining Methods Ensemble Data Mining Methods Nikunj C. Oza, Ph.D., NASA Ames Research Center, USA INTRODUCTION Ensemble Data Mining Methods, also known as Committee Methods or Model Combiners, are machine learning methods

More information

Active Learning SVM for Blogs recommendation

Active Learning SVM for Blogs recommendation Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the

More information

MAXIMIZING RETURN ON DIRECT MARKETING CAMPAIGNS

MAXIMIZING RETURN ON DIRECT MARKETING CAMPAIGNS MAXIMIZING RETURN ON DIRET MARKETING AMPAIGNS IN OMMERIAL BANKING S 229 Project: Final Report Oleksandra Onosova INTRODUTION Recent innovations in cloud computing and unified communications have made a

More information

A New Ensemble Model for Efficient Churn Prediction in Mobile Telecommunication

A New Ensemble Model for Efficient Churn Prediction in Mobile Telecommunication 2012 45th Hawaii International Conference on System Sciences A New Ensemble Model for Efficient Churn Prediction in Mobile Telecommunication Namhyoung Kim, Jaewook Lee Department of Industrial and Management

More information

1. Classification problems

1. Classification problems Neural and Evolutionary Computing. Lab 1: Classification problems Machine Learning test data repository Weka data mining platform Introduction Scilab 1. Classification problems The main aim of a classification

More information

FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS

FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS Breno C. Costa, Bruno. L. A. Alberto, André M. Portela, W. Maduro, Esdras O. Eler PDITec, Belo Horizonte,

More information

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES ISSN: 2229-6956(ONLINE) ICTACT JOURNAL ON SOFT COMPUTING, JULY 2012, VOLUME: 02, ISSUE: 04 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES V. Dheepa 1 and R. Dhanapal 2 1 Research

More information

An Approach to Detect Spam Emails by Using Majority Voting

An Approach to Detect Spam Emails by Using Majority Voting An Approach to Detect Spam Emails by Using Majority Voting Roohi Hussain Department of Computer Engineering, National University of Science and Technology, H-12 Islamabad, Pakistan Usman Qamar Faculty,

More information

Mining Direct Marketing Data by Ensembles of Weak Learners and Rough Set Methods

Mining Direct Marketing Data by Ensembles of Weak Learners and Rough Set Methods Mining Direct Marketing Data by Ensembles of Weak Learners and Rough Set Methods Jerzy B laszczyński 1, Krzysztof Dembczyński 1, Wojciech Kot lowski 1, and Mariusz Paw lowski 2 1 Institute of Computing

More information

T-61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier. Santosh Tirunagari : 245577

T-61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier. Santosh Tirunagari : 245577 T-61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier Santosh Tirunagari : 245577 January 20, 2011 Abstract This term project gives a solution how to classify an email as spam or

More information

Overview. Evaluation Connectionist and Statistical Language Processing. Test and Validation Set. Training and Test Set

Overview. Evaluation Connectionist and Statistical Language Processing. Test and Validation Set. Training and Test Set Overview Evaluation Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes training set, validation set, test set holdout, stratification

More information

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015 An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content

More information

Data Mining Practical Machine Learning Tools and Techniques

Data Mining Practical Machine Learning Tools and Techniques Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea

More information

CLASSIFYING NETWORK TRAFFIC IN THE BIG DATA ERA

CLASSIFYING NETWORK TRAFFIC IN THE BIG DATA ERA CLASSIFYING NETWORK TRAFFIC IN THE BIG DATA ERA Professor Yang Xiang Network Security and Computing Laboratory (NSCLab) School of Information Technology Deakin University, Melbourne, Australia http://anss.org.au/nsclab

More information

SURVIVABILITY OF COMPLEX SYSTEM SUPPORT VECTOR MACHINE BASED APPROACH

SURVIVABILITY OF COMPLEX SYSTEM SUPPORT VECTOR MACHINE BASED APPROACH 1 SURVIVABILITY OF COMPLEX SYSTEM SUPPORT VECTOR MACHINE BASED APPROACH Y, HONG, N. GAUTAM, S. R. T. KUMARA, A. SURANA, H. GUPTA, S. LEE, V. NARAYANAN, H. THADAKAMALLA The Dept. of Industrial Engineering,

More information

Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier

Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier D.Nithya a, *, V.Suganya b,1, R.Saranya Irudaya Mary c,1 Abstract - This paper presents,

More information

Constrained Classification of Large Imbalanced Data by Logistic Regression and Genetic Algorithm

Constrained Classification of Large Imbalanced Data by Logistic Regression and Genetic Algorithm Constrained Classification of Large Imbalanced Data by Logistic Regression and Genetic Algorithm Martin Hlosta, Rostislav Stríž, Jan Kupčík, Jaroslav Zendulka, and Tomáš Hruška A. Imbalanced Data Classification

More information

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University

More information

Introduction to Support Vector Machines. Colin Campbell, Bristol University

Introduction to Support Vector Machines. Colin Campbell, Bristol University Introduction to Support Vector Machines Colin Campbell, Bristol University 1 Outline of talk. Part 1. An Introduction to SVMs 1.1. SVMs for binary classification. 1.2. Soft margins and multi-class classification.

More information

Classification of Bad Accounts in Credit Card Industry

Classification of Bad Accounts in Credit Card Industry Classification of Bad Accounts in Credit Card Industry Chengwei Yuan December 12, 2014 Introduction Risk management is critical for a credit card company to survive in such competing industry. In addition

More information

Decision Support Systems

Decision Support Systems Decision Support Systems 50 (2011) 602 613 Contents lists available at ScienceDirect Decision Support Systems journal homepage: www.elsevier.com/locate/dss Data mining for credit card fraud: A comparative

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

Data Clustering. Dec 2nd, 2013 Kyrylo Bessonov

Data Clustering. Dec 2nd, 2013 Kyrylo Bessonov Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main

More information

Combining SVM classifiers for email anti-spam filtering

Combining SVM classifiers for email anti-spam filtering Combining SVM classifiers for email anti-spam filtering Ángela Blanco Manuel Martín-Merino Abstract Spam, also known as Unsolicited Commercial Email (UCE) is becoming a nightmare for Internet users and

More information

International Journal of Software and Web Sciences (IJSWS) www.iasir.net

International Journal of Software and Web Sciences (IJSWS) www.iasir.net International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) ISSN (Print): 2279-0063 ISSN (Online): 2279-0071 International

More information

Getting Even More Out of Ensemble Selection

Getting Even More Out of Ensemble Selection Getting Even More Out of Ensemble Selection Quan Sun Department of Computer Science The University of Waikato Hamilton, New Zealand qs12@cs.waikato.ac.nz ABSTRACT Ensemble Selection uses forward stepwise

More information

Advanced Ensemble Strategies for Polynomial Models

Advanced Ensemble Strategies for Polynomial Models Advanced Ensemble Strategies for Polynomial Models Pavel Kordík 1, Jan Černý 2 1 Dept. of Computer Science, Faculty of Information Technology, Czech Technical University in Prague, 2 Dept. of Computer

More information

Data Mining - Evaluation of Classifiers

Data Mining - Evaluation of Classifiers Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010

More information

Big Data Analytics CSCI 4030

Big Data Analytics CSCI 4030 High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising

More information

Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Knowledge Discovery and Data Mining Unit # 11 Sajjad Haider Fall 2013 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015

International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015 RESEARCH ARTICLE OPEN ACCESS Data Mining Technology for Efficient Network Security Management Ankit Naik [1], S.W. Ahmad [2] Student [1], Assistant Professor [2] Department of Computer Science and Engineering

More information

II. RELATED WORK. Sentiment Mining

II. RELATED WORK. Sentiment Mining Sentiment Mining Using Ensemble Classification Models Matthew Whitehead and Larry Yaeger Indiana University School of Informatics 901 E. 10th St. Bloomington, IN 47408 {mewhiteh, larryy}@indiana.edu Abstract

More information

E-commerce Transaction Anomaly Classification

E-commerce Transaction Anomaly Classification E-commerce Transaction Anomaly Classification Minyong Lee minyong@stanford.edu Seunghee Ham sham12@stanford.edu Qiyi Jiang qjiang@stanford.edu I. INTRODUCTION Due to the increasing popularity of e-commerce

More information

Selecting Data Mining Model for Web Advertising in Virtual Communities

Selecting Data Mining Model for Web Advertising in Virtual Communities Selecting Data Mining for Web Advertising in Virtual Communities Jerzy Surma Faculty of Business Administration Warsaw School of Economics Warsaw, Poland e-mail: jerzy.surma@gmail.com Mariusz Łapczyński

More information

IDENTIFIC ATION OF SOFTWARE EROSION USING LOGISTIC REGRESSION

IDENTIFIC ATION OF SOFTWARE EROSION USING LOGISTIC REGRESSION http:// IDENTIFIC ATION OF SOFTWARE EROSION USING LOGISTIC REGRESSION Harinder Kaur 1, Raveen Bajwa 2 1 PG Student., CSE., Baba Banda Singh Bahadur Engg. College, Fatehgarh Sahib, (India) 2 Asstt. Prof.,

More information

Data Mining Practical Machine Learning Tools and Techniques

Data Mining Practical Machine Learning Tools and Techniques Counting the cost Data Mining Practical Machine Learning Tools and Techniques Slides for Section 5.7 In practice, different types of classification errors often incur different costs Examples: Loan decisions

More information

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents

More information

Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j

Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j What is Kiva? An organization that allows people to lend small amounts of money via the Internet

More information

Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News

Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News Sushilkumar Kalmegh Associate Professor, Department of Computer Science, Sant Gadge Baba Amravati

More information

Introduction to Machine Learning. Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011

Introduction to Machine Learning. Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011 Introduction to Machine Learning Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011 1 Outline 1. What is machine learning? 2. The basic of machine learning 3. Principles and effects of machine learning

More information

Mining Wiki Usage Data for Predicting Final Grades of Students

Mining Wiki Usage Data for Predicting Final Grades of Students Mining Wiki Usage Data for Predicting Final Grades of Students Gökhan Akçapınar, Erdal Coşgun, Arif Altun Hacettepe University gokhana@hacettepe.edu.tr, erdal.cosgun@hacettepe.edu.tr, altunar@hacettepe.edu.tr

More information

Mining the Software Change Repository of a Legacy Telephony System

Mining the Software Change Repository of a Legacy Telephony System Mining the Software Change Repository of a Legacy Telephony System Jelber Sayyad Shirabad, Timothy C. Lethbridge, Stan Matwin School of Information Technology and Engineering University of Ottawa, Ottawa,

More information

Choosing the Best Classification Performance Metric for Wrapper-based Software Metric Selection for Defect Prediction

Choosing the Best Classification Performance Metric for Wrapper-based Software Metric Selection for Defect Prediction Choosing the Best Classification Performance Metric for Wrapper-based Software Metric Selection for Defect Prediction Huanjing Wang Western Kentucky University huanjing.wang@wku.edu Taghi M. Khoshgoftaar

More information

Introducing diversity among the models of multi-label classification ensemble

Introducing diversity among the models of multi-label classification ensemble Introducing diversity among the models of multi-label classification ensemble Lena Chekina, Lior Rokach and Bracha Shapira Ben-Gurion University of the Negev Dept. of Information Systems Engineering and

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014 RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

More information

A Content based Spam Filtering Using Optical Back Propagation Technique

A Content based Spam Filtering Using Optical Back Propagation Technique A Content based Spam Filtering Using Optical Back Propagation Technique Sarab M. Hameed 1, Noor Alhuda J. Mohammed 2 Department of Computer Science, College of Science, University of Baghdad - Iraq ABSTRACT

More information

Email Classification Using Data Reduction Method

Email Classification Using Data Reduction Method Email Classification Using Data Reduction Method Rafiqul Islam and Yang Xiang, member IEEE School of Information Technology Deakin University, Burwood 3125, Victoria, Australia Abstract Classifying user

More information

A DECISION TREE BASED PEDOMETER AND ITS IMPLEMENTATION ON THE ANDROID PLATFORM

A DECISION TREE BASED PEDOMETER AND ITS IMPLEMENTATION ON THE ANDROID PLATFORM A DECISION TREE BASED PEDOMETER AND ITS IMPLEMENTATION ON THE ANDROID PLATFORM ABSTRACT Juanying Lin, Leanne Chan and Hong Yan Department of Electronic Engineering, City University of Hong Kong, Hong Kong,

More information

Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu

Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Introduction to Machine Learning Lecture 1 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Introduction Logistics Prerequisites: basics concepts needed in probability and statistics

More information

A Novel Feature Selection Method Based on an Integrated Data Envelopment Analysis and Entropy Mode

A Novel Feature Selection Method Based on an Integrated Data Envelopment Analysis and Entropy Mode A Novel Feature Selection Method Based on an Integrated Data Envelopment Analysis and Entropy Mode Seyed Mojtaba Hosseini Bamakan, Peyman Gholami RESEARCH CENTRE OF FICTITIOUS ECONOMY & DATA SCIENCE UNIVERSITY

More information

KEITH LEHNERT AND ERIC FRIEDRICH

KEITH LEHNERT AND ERIC FRIEDRICH MACHINE LEARNING CLASSIFICATION OF MALICIOUS NETWORK TRAFFIC KEITH LEHNERT AND ERIC FRIEDRICH 1. Introduction 1.1. Intrusion Detection Systems. In our society, information systems are everywhere. They

More information

ISSN: 2321-7782 (Online) Volume 2, Issue 10, October 2014 International Journal of Advance Research in Computer Science and Management Studies

ISSN: 2321-7782 (Online) Volume 2, Issue 10, October 2014 International Journal of Advance Research in Computer Science and Management Studies ISSN: 2321-7782 (Online) Volume 2, Issue 10, October 2014 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online

More information

International Journal of Advance Research in Computer Science and Management Studies

International Journal of Advance Research in Computer Science and Management Studies Volume 2, Issue 12, December 2014 ISSN: 2321 7782 (Online) International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online

More information

Experiments in Web Page Classification for Semantic Web

Experiments in Web Page Classification for Semantic Web Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address

More information

A Survey of Classification Techniques in the Area of Big Data.

A Survey of Classification Techniques in the Area of Big Data. A Survey of Classification Techniques in the Area of Big Data. 1PrafulKoturwar, 2 SheetalGirase, 3 Debajyoti Mukhopadhyay 1Reseach Scholar, Department of Information Technology 2Assistance Professor,Department

More information

Data Mining Techniques for Prognosis in Pancreatic Cancer

Data Mining Techniques for Prognosis in Pancreatic Cancer Data Mining Techniques for Prognosis in Pancreatic Cancer by Stuart Floyd A Thesis Submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUE In partial fulfillment of the requirements for the Degree

More information

Anti-Spam Filter Based on Naïve Bayes, SVM, and KNN model

Anti-Spam Filter Based on Naïve Bayes, SVM, and KNN model AI TERM PROJECT GROUP 14 1 Anti-Spam Filter Based on,, and model Yun-Nung Chen, Che-An Lu, Chao-Yu Huang Abstract spam email filters are a well-known and powerful type of filters. We construct different

More information

Spam detection with data mining method:

Spam detection with data mining method: Spam detection with data mining method: Ensemble learning with multiple SVM based classifiers to optimize generalization ability of email spam classification Keywords: ensemble learning, SVM classifier,

More information

A User s Guide to Support Vector Machines

A User s Guide to Support Vector Machines A User s Guide to Support Vector Machines Asa Ben-Hur Department of Computer Science Colorado State University Jason Weston NEC Labs America Princeton, NJ 08540 USA Abstract The Support Vector Machine

More information

Multiclass Classification. 9.520 Class 06, 25 Feb 2008 Ryan Rifkin

Multiclass Classification. 9.520 Class 06, 25 Feb 2008 Ryan Rifkin Multiclass Classification 9.520 Class 06, 25 Feb 2008 Ryan Rifkin It is a tale Told by an idiot, full of sound and fury, Signifying nothing. Macbeth, Act V, Scene V What Is Multiclass Classification? Each

More information

Predicting required bandwidth for educational institutes using prediction techniques in data mining (Case Study: Qom Payame Noor University)

Predicting required bandwidth for educational institutes using prediction techniques in data mining (Case Study: Qom Payame Noor University) 260 IJCSNS International Journal of Computer Science and Network Security, VOL.11 No.6, June 2011 Predicting required bandwidth for educational institutes using prediction techniques in data mining (Case

More information

Semi-Supervised Learning for Blog Classification

Semi-Supervised Learning for Blog Classification Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence (2008) Semi-Supervised Learning for Blog Classification Daisuke Ikeda Department of Computational Intelligence and Systems Science,

More information

Feature Subset Selection in E-mail Spam Detection

Feature Subset Selection in E-mail Spam Detection Feature Subset Selection in E-mail Spam Detection Amir Rajabi Behjat, Universiti Technology MARA, Malaysia IT Security for the Next Generation Asia Pacific & MEA Cup, Hong Kong 14-16 March, 2012 Feature

More information

Predictive Data modeling for health care: Comparative performance study of different prediction models

Predictive Data modeling for health care: Comparative performance study of different prediction models Predictive Data modeling for health care: Comparative performance study of different prediction models Shivanand Hiremath hiremat.nitie@gmail.com National Institute of Industrial Engineering (NITIE) Vihar

More information

Ensemble Approach for the Classification of Imbalanced Data

Ensemble Approach for the Classification of Imbalanced Data Ensemble Approach for the Classification of Imbalanced Data Vladimir Nikulin 1, Geoffrey J. McLachlan 1, and Shu Kay Ng 2 1 Department of Mathematics, University of Queensland v.nikulin@uq.edu.au, gjm@maths.uq.edu.au

More information

Situational Awareness at Internet Scale: Detection of Extremely Rare Crisis Periods

Situational Awareness at Internet Scale: Detection of Extremely Rare Crisis Periods Situational Awareness at Internet Scale: Detection of Extremely Rare Crisis Periods 2008 Sandia Workshop on Data Mining and Data Analysis David Cieslak, dcieslak@cse.nd.edu, http://www.nd.edu/~dcieslak/,

More information

The Artificial Prediction Market

The Artificial Prediction Market The Artificial Prediction Market Adrian Barbu Department of Statistics Florida State University Joint work with Nathan Lay, Siemens Corporate Research 1 Overview Main Contributions A mathematical theory

More information

Customer Classification And Prediction Based On Data Mining Technique

Customer Classification And Prediction Based On Data Mining Technique Customer Classification And Prediction Based On Data Mining Technique Ms. Neethu Baby 1, Mrs. Priyanka L.T 2 1 M.E CSE, Sri Shakthi Institute of Engineering and Technology, Coimbatore 2 Assistant Professor

More information

Intrusion Detection via Machine Learning for SCADA System Protection

Intrusion Detection via Machine Learning for SCADA System Protection Intrusion Detection via Machine Learning for SCADA System Protection S.L.P. Yasakethu Department of Computing, University of Surrey, Guildford, GU2 7XH, UK. s.l.yasakethu@surrey.ac.uk J. Jiang Department

More information