# SVM Ensemble Model for Investment Prediction

## Transcription

The number of elements in the feature vector corresponds to its dimensionality. Support Vector Machine (SVM) is a set of related supervised learning methods. During the training stage, the SVM finds the maximum-margin hyperplane between the classes: the hyperplane is selected so that the distance from it to the nearest data point on each side is maximised. This is a line in two dimensions, a plane in three dimensions, or a hyperplane in higher dimensions. The data points closest to the hyperplane are called the support vectors. Cross-validation is used to reduce the chances of overfitting.

Let $D$ be the dataset for a linear SVM. It contains a set of points of the form

$$D = \{(x_i, y_i) \mid x_i \in \mathbb{R}^p,\ y_i \in \{-1, 1\}\},$$

where $y_i$ is either $1$ or $-1$ and indicates the class to which the point $x_i$ belongs. Any hyperplane can be written as

$$w \cdot x - b = 0,$$

where $\cdot$ denotes the dot product, $b$ is a constant and $w$ is the normal vector to the hyperplane. The two hyperplanes that separate the dataset with the maximum distance between them can be described as

$$w \cdot x - b = 1 \quad \text{and} \quad w \cdot x - b = -1.$$

Support Vector Machines can equivalently be written as linear functions of the form $f(x) = w^T x + b$, where $w$ is the weight vector and $x$ is the input vector; in this form the offset $b$ enters with the opposite sign to the hyperplane equation above. Let the set of training examples be $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where $x_i$ is an input vector and $y_i \in \{1, -1\}$ is its class label. To find the linear function:

$$\text{minimize} \quad \tfrac{1}{2}\, w^T w \qquad \text{subject to} \quad y_i\,(w^T x_i + b) \ge 1, \quad i = 1, 2, \ldots, n,$$

where the index $i$ runs over the $n$ training cases.

The C-SVC classifier with a linear kernel is available in the LIBSVM classifier of WEKA, and C-SVC is used for the experimental results. The main features of LIBSVM include different SVM formulations, efficient multi-class classification, cross-validation for model selection, probability estimates, various kernel functions and weighted SVM for unbalanced data.
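To make the constraint and objective above concrete, the following is a minimal sketch in plain Python. The four data points and the values of `w` and `b` are hand-picked assumptions for illustration; they are not learned by a solver and do not come from the paper's dataset.

```python
# Toy 2-D data with labels in {1, -1}; values are illustrative assumptions.
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [1, 1, -1, -1]

# Hypothetical hyperplane parameters, chosen by hand (not trained).
w = (0.5, 0.5)
b = -1.0


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def decision(x, w, b):
    """Linear decision function f(x) = w^T x + b."""
    return dot(w, x) + b


def satisfies_margin(points, labels, w, b):
    """Check the constraint y_i * (w^T x_i + b) >= 1 for every training case."""
    return all(y * decision(x, w, b) >= 1 for x, y in zip(points, labels))


def objective(w):
    """Value of the minimised quantity (1/2) * w^T w."""
    return 0.5 * dot(w, w)


def margin_width(w):
    """Distance between the two margin hyperplanes, 2 / ||w||."""
    return 2.0 / dot(w, w) ** 0.5


def support_vectors(points, labels, w, b, tol=1e-9):
    """Points lying exactly on a margin hyperplane, i.e. y_i * f(x_i) == 1."""
    return [x for x, y in zip(points, labels)
            if abs(y * decision(x, w, b) - 1.0) < tol]


print(satisfies_margin(points, labels, w, b))
print(objective(w), margin_width(w))
print(support_vectors(points, labels, w, b))
```

A smaller `objective(w)` corresponds to a wider margin, which is why minimising $\tfrac{1}{2} w^T w$ under the constraints yields the maximum-margin hyperplane.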
#### C. SVM Ensemble

An SVM ensemble classifier is a collection of several SVM classifiers whose individual decisions are combined to classify the test samples [4]. An ensemble can show better performance than the individual classifiers from which it is constructed.

SVM ensemble classification involves two stages: classifier construction and classifier usage. The model is constructed from the training set. Each sample in the training set is assumed to belong to a predefined class, as determined by the class attribute label [5]. Using WEKA Knowledge Flow and classifier selection, a boosted SVM ensemble is created [10]. The latter stage uses the constructed SVM ensemble to predict or classify the output.

The process starts by training an SVM classifier on a less imbalanced subset of the data and then classifying the entire training set with that SVM to identify the incorrectly classified examples. Training a further SVM classifier can then reinforce the first one [15]. The process is repeated until we obtain an SVM ensemble in which each classifier tries to improve on the performance of its predecessor. The trained models are aggregated using majority voting to obtain a collective outcome.

The SVM ensemble was created using C-SVC. SVM is a kernel-based algorithm: a kernel is a function that transforms the input data to a high-dimensional space where the problem is solved. Kernel functions can be linear or non-linear. SVM has built-in mechanisms that automatically choose appropriate settings based on the data. C-SVC is an SVM type available in LIBSVM, which uses the one-against-one approach for multi-class classification and builds $k(k-1)/2$ binary classifiers for $k$ classes. Each classifier is trained separately on one pair of classes.
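The one-against-one count can be checked with a short sketch. It only enumerates the class pairs that would each get a binary classifier; it does not train anything. The helper name `one_against_one_pairs` is hypothetical, and the three class labels are borrowed from the fire policies named later in the paper.

```python
from itertools import combinations


def one_against_one_pairs(classes):
    """Return every unordered pair of classes; one binary classifier per pair."""
    return list(combinations(classes, 2))


# Three classes, as in the fire-policy experiment.
classes = ["RE-1", "RE-2", "RE-3"]
pairs = one_against_one_pairs(classes)
k = len(classes)

print(pairs)                      # [('RE-1', 'RE-2'), ('RE-1', 'RE-3'), ('RE-2', 'RE-3')]
print(len(pairs), k * (k - 1) // 2)
```

For three classes this gives three binary classifiers; the count grows quadratically with the number of classes.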
The testing stage is one of the most critical phases of any classification model development process, because it gives the model developer the most informative measure of the classifier's performance. This measure can help to justify the classifier's use and can lead to possible optimisation. The final decision of the ensemble is obtained by combining the individual predictions of the ensemble members. Majority voting is used for the aggregation.

#### D. Majority Voting

Majority voting is a method for combining the outputs of several SVMs. It counts the votes for each class over the input classifiers and selects the majority class [15]. It is used for aggregating the results from the various ensemble models. Let $f_k$ ($k = 1, 2, \ldots, K$) be the decision function of the $k$-th SVM in the SVM ensemble, and let $C_j$ ($j = 1, 2, \ldots, C$) denote the label of the $j$-th class. Then the number of SVMs whose decision is the $j$-th class is

$$N_j = \#\{\, k \mid f_k(x) = C_j \,\}.$$

The final decision of the SVM ensemble for a given test vector $x$ under majority voting is

$$f_{mv}(x) = \arg\max_j N_j.$$

In majority voting, the models are trained and allowed to vote, and the class with the most votes across the individual models is selected. Aggregating in this way increases the accuracy of the SVM ensemble.

### 3. Experimental Design

#### A. Dataset Description

This research makes use of a dataset available from the UCI Knowledge Discovery in Databases Archive [20]. Different policies are added in order to make the necessary analysis.

#### B. Performance Measures

Performance metrics are used to assess how accurately the model predicts the known values. If the model performs well and meets the business requirements, it can then be applied to new data to predict the future. The confusion matrix, accuracy values and similar measures are used to evaluate the performance of a classifier. The accuracy is defined as

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN},$$

where $TP$, $FP$, $TN$ and $FN$ are the numbers of true positive, false positive, true negative and false negative predictions, respectively. Precision, recall, F-measure and related values are also listed by the WEKA classifier performance evaluator.

### 4. Result Analysis

The SVM ensemble model is created in WEKA Knowledge Flow. The final result is obtained and saved automatically to a file, which is then used by the developed investment prediction tool. The accuracy varies from one policy to another. The policy with the maximum accuracy is taken as the best policy in its category.
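The majority-voting rule, the accuracy measure and the best-policy selection described above can be made concrete with a small sketch. All inputs are toy values invented for illustration, not results from the paper; the function names are hypothetical, and the tie-breaking rule (the first label seen wins) is an assumption that the text does not specify.

```python
def majority_vote(predictions):
    """Return the label C_j with the largest vote count N_j.

    `predictions` holds f_k(x) for k = 1..K. Ties go to the label that
    appears first in `predictions` (an assumption, not from the source).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = {}
    for label in predictions:
        counts[label] = counts.get(label, 0) + 1
    # max() returns the first key with the highest count.
    return max(counts, key=counts.get)


def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + FP + TN + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


def best_policy(accuracy_by_policy):
    """Pick the policy whose classifier achieved the highest accuracy."""
    return max(accuracy_by_policy, key=accuracy_by_policy.get)


# Toy inputs, invented for illustration only.
votes = ["RE-2", "RE-1", "RE-1"]
print(majority_vote(votes))                     # RE-1
print(accuracy(tp=40, tn=45, fp=5, fn=10))      # 0.85
print(best_policy({"policy_a": 0.85, "policy_b": 0.80}))  # policy_a
```

Note the parentheses around `tp + tn`: without them, operator precedence would divide only `tn` by the total.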
The first experiment was done using the Fire policy. The knowledge flow for the Fire policy is given in figure 2. There are three different fire policies. Each fire policy dataset is loaded using the CSV loader data source. The loaded dataset is passed to a train-test split maker through a class assigner, which assigns the class label of each policy. The dataset is divided into training data and testing data: 75% of the data is taken as training data and 25% as testing data. The training data is used to build the SVM ensemble classifier, and the test data is used to test the built classifier. A multi-class SVM ensemble is created. Three C-SVC Support Vector Machines were selected and applied to AdaBoost.M1 to create an SVM ensemble. The kernel type is chosen as linear. This result is applied to the multiclass classifier, in which the one-against-one method is used.

[Figure 2.1. SVM Ensemble Model using Knowledge Flow]

[Figure 4.1. Parameter settings for SVM Ensemble Model]

The table below shows the detailed values obtained for the Fire policies with the SVM ensemble model and with a single SVM.

[Table 1. Single SVM vs SVM ensemble. Columns: RE-1, RE-2, RE-3, F-Measure, ROC-Area. Rows: Single SVM, SVM ensemble.]
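The 75%/25% division described above can be sketched in plain Python. This is not the WEKA train-test split maker itself. Shuffling with a fixed seed before cutting is an assumption made here so that the example is reproducible, and `train_test_split` is a hypothetical helper name.

```python
import random


def train_test_split(rows, train_fraction=0.75, seed=0):
    """Shuffle a copy of `rows` and cut it into (train, test) parts.

    The shuffle and the fixed seed are assumptions of this sketch.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    rng = random.Random(seed)
    shuffled = list(rows)          # leave the caller's list untouched
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Stand-in for 100 loaded policy records.
rows = list(range(100))
train, test = train_test_split(rows)
print(len(train), len(test))       # 75 25
```

The training part would be used to build the ensemble and the held-out part to evaluate it, so no test record influences training.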

The same policies were tested with a single SVM. From the values obtained after testing, it is found that the SVM ensemble gives better accuracy than the single SVM. Figure 4.2 shows that 38% of customers chose the RE-1 policy and 37% chose the RE-2 policy. Only 25% of the customers took the RE-3 policy.

[Figure 4.2: Percentage of Policyholders. Pie chart of fire policies: RE-1 38%, RE-2 37%, RE-3 25%.]

The best fire policy output is shown in the Investment Predictor Tool, which lists all categories of policies. To get the best policy for the Fire category, select the fire policy from the policy list and click the Get Result button. The analyzed results of the three fire policies are stored while the WEKA Knowledge Flow model runs. The policy with the maximum accuracy is shown as the best fire policy. The RE-1 policy is selected as the best policy. This model was also created and tested with other investment policies.

[Figure 5.1: Best Policy prediction for FIRE]

In this paper, the data is collected and analyzed using a multi-classifier SVM. Based on the SVM ensemble classifier, the best policies were predicted for each category. The prediction helps to choose the best policy from a given list. The detailed analysis of the insurance dataset was done using the suggested model, which was loaded in WEKA Knowledge Flow. The policy with the maximum accuracy was taken as the best policy in its category. From this research work, it is found that the accuracy of the SVM ensemble is better than that of other ensemble methods and that it gives a better investment prediction for different categories of policies.

### 5. Conclusion

The experiment was carried out for different policies in the same way, and it was found that the SVM ensemble performs better than the single classifier. A fast and accurate classifier is used for making the investment prediction. The SVM ensemble classifier is well known as a state-of-the-art ensemble classifier for the insurance field. In future, this research work can be extended to develop a prediction tool for the investment field.

### References

[1] Siji T. Mathew, Chandra J., "Estimating the performance of SVM using Personal equity plan," in Proceedings of the International Conf. on Mathematics in Engineering & Business Management, March 9-10, 2012.

[2] Siji T. Mathew, "SVM Ensemble for Insurance Data Analysis," MPhil Dissertation, Christ University, Jun.

[3] Chawla, Nitesh V. (2004), "Learning Ensembles from Bites: A Scalable and Accurate Approach," Journal of Machine Learning Research.

[4] C. Cortes, V. Vapnik, "Support vector network," Machine Learning, 1995.

[5] Fumera, G., Roli, F., "A theoretical and experimental analysis of linear combiners for multiple classifier systems," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 6.

[6] Harris Drucker et al., "Support vector machines for spam categorization," IEEE Transactions on Neural Networks, 1999.

[7] Kuncheva L. I., Combining Pattern Classifiers: Methods and Algorithms, 2004.

[8] Leon Bottou and Chih-Jen Lin, "Support vector machine Solvers," in Large Scale Kernel Machines, Weston editors, MIT Press, Cambridge, MA.

[9] Ma Chao, Chen Xihong, "A New Algorithm of Support Vector Machine Ensemble and Its Application," Intelligent Human-Machine Systems and Cybernetics (IHMSC), 2nd International Conference, 2010.

[10], [11] Mark Hall et al. (2009), "The WEKA Data Mining Software: An Update," SIGKDD Explorations, vol. 11.

[12] Nicolas Garcia-Pedrajas, "Constructing Ensembles of Classifiers by Means of Weighted Instance Selection," IEEE Transactions on Neural Networks, vol. 20, no. 2, 2009.

[13] Nikunj C. Oza and Kagan Tumer, "Key Real-World Applications of Classifier Ensembles," Information Fusion, Special Issue on Applications of Ensemble Methods, pp. 4-20, 2008.

[14] Petr Hajek, Vladimir Olej, "Municipal Revenue Prediction by Support Vector Machine ensemble," ICCOMP 10: Proc. of the 14th WSEAS International Conf. on Computers, part of the 14th WSEAS CSCC Multi Conf., vol. 1, 2010.

[15] Ravi V. H. et al., "Soft computing system for bank performance Prediction," Elsevier Applied Soft Computing.

[16] Robi Polikar, "Ensemble based systems in decision making," IEEE Circuits and Systems Magazine.

[17] Shin, Kim, "An application of support vector machine in bankruptcy prediction model," Expert Systems with Applications, 2005.

[18] Bhardwaj M., Gupta T., Grover T., Bhatnagar V., "An efficient classifier ensemble using SVM," Methods and Models in Computer Science, ICM2CS, Proceedings of International Conf., vol. 2, 2009.

[19] Yan-Shi Dong, Ke-Song Han, "Boosting SVM Classifiers By Ensemble," Special interest tracks and posters of the 14th International Conf. on World Wide Web, May, pp. 10-14, 2005.

[20] Ye Li, Yun Ze Cal, et al., "Fault diagnosis based on support vector machine ensemble," Machine Learning and Cybernetics, 2005.

[21] Dataset available


A Content based Spam Filtering Using Optical Back Propagation Technique Sarab M. Hameed 1, Noor Alhuda J. Mohammed 2 Department of Computer Science, College of Science, University of Baghdad - Iraq ABSTRACT

### Email Classification Using Data Reduction Method

Email Classification Using Data Reduction Method Rafiqul Islam and Yang Xiang, member IEEE School of Information Technology Deakin University, Burwood 3125, Victoria, Australia Abstract Classifying user

### A DECISION TREE BASED PEDOMETER AND ITS IMPLEMENTATION ON THE ANDROID PLATFORM

A DECISION TREE BASED PEDOMETER AND ITS IMPLEMENTATION ON THE ANDROID PLATFORM ABSTRACT Juanying Lin, Leanne Chan and Hong Yan Department of Electronic Engineering, City University of Hong Kong, Hong Kong,

### Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu

Introduction to Machine Learning Lecture 1 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Introduction Logistics Prerequisites: basics concepts needed in probability and statistics

### A Novel Feature Selection Method Based on an Integrated Data Envelopment Analysis and Entropy Mode

A Novel Feature Selection Method Based on an Integrated Data Envelopment Analysis and Entropy Mode Seyed Mojtaba Hosseini Bamakan, Peyman Gholami RESEARCH CENTRE OF FICTITIOUS ECONOMY & DATA SCIENCE UNIVERSITY

### MACHINE LEARNING CLASSIFICATION OF MALICIOUS NETWORK TRAFFIC

MACHINE LEARNING CLASSIFICATION OF MALICIOUS NETWORK TRAFFIC KEITH LEHNERT AND ERIC FRIEDRICH 1. Introduction 1.1. Intrusion Detection Systems. In our society, information systems are everywhere. They

### ISSN: 2321-7782 (Online) Volume 2, Issue 10, October 2014 International Journal of Advance Research in Computer Science and Management Studies

ISSN: 2321-7782 (Online) Volume 2, Issue 10, October 2014 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online

### International Journal of Advance Research in Computer Science and Management Studies

Volume 2, Issue 12, December 2014 ISSN: 2321-7782 (Online) International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online

### Experiments in Web Page Classification for Semantic Web

Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address

### A Survey of Classification Techniques in the Area of Big Data.

A Survey of Classification Techniques in the Area of Big Data. Praful Koturwar 1, Sheetal Girase 2, Debajyoti Mukhopadhyay 3 1 Research Scholar, Department of Information Technology 2 Assistant Professor, Department

### Data Mining Techniques for Prognosis in Pancreatic Cancer

Data Mining Techniques for Prognosis in Pancreatic Cancer by Stuart Floyd A Thesis Submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUTE In partial fulfillment of the requirements for the Degree

### Anti-Spam Filter Based on Naïve Bayes, SVM, and KNN model

AI TERM PROJECT GROUP 14 Anti-Spam Filter Based on Naïve Bayes, SVM, and KNN model Yun-Nung Chen, Che-An Lu, Chao-Yu Huang Abstract spam email filters are a well-known and powerful type of filters. We construct different

### Spam detection with data mining method:

Spam detection with data mining method: Ensemble learning with multiple SVM based classifiers to optimize generalization ability of email spam classification Keywords: ensemble learning, SVM classifier,

### A User's Guide to Support Vector Machines

A User's Guide to Support Vector Machines Asa Ben-Hur Department of Computer Science Colorado State University Jason Weston NEC Labs America Princeton, NJ 08540 USA Abstract The Support Vector Machine

### Multiclass Classification. 9.520 Class 06, 25 Feb 2008 Ryan Rifkin

Multiclass Classification 9.520 Class 06, 25 Feb 2008 Ryan Rifkin It is a tale Told by an idiot, full of sound and fury, Signifying nothing. Macbeth, Act V, Scene V What Is Multiclass Classification? Each

### Predicting required bandwidth for educational institutes using prediction techniques in data mining (Case Study: Qom Payame Noor University)

IJCSNS International Journal of Computer Science and Network Security, VOL.11 No.6, June 2011 Predicting required bandwidth for educational institutes using prediction techniques in data mining (Case

### Semi-Supervised Learning for Blog Classification

Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence (2008) Semi-Supervised Learning for Blog Classification Daisuke Ikeda Department of Computational Intelligence and Systems Science,

### Feature Subset Selection in E-mail Spam Detection

Feature Subset Selection in E-mail Spam Detection Amir Rajabi Behjat, Universiti Technology MARA, Malaysia IT Security for the Next Generation Asia Pacific & MEA Cup, Hong Kong 14-16 March, 2012 Feature

### Predictive Data modeling for health care: Comparative performance study of different prediction models

Predictive Data modeling for health care: Comparative performance study of different prediction models Shivanand Hiremath hiremat.nitie@gmail.com National Institute of Industrial Engineering (NITIE) Vihar

### Ensemble Approach for the Classification of Imbalanced Data

Ensemble Approach for the Classification of Imbalanced Data Vladimir Nikulin 1, Geoffrey J. McLachlan 1, and Shu Kay Ng 2 1 Department of Mathematics, University of Queensland v.nikulin@uq.edu.au, gjm@maths.uq.edu.au

### Situational Awareness at Internet Scale: Detection of Extremely Rare Crisis Periods

Situational Awareness at Internet Scale: Detection of Extremely Rare Crisis Periods 2008 Sandia Workshop on Data Mining and Data Analysis David Cieslak, dcieslak@cse.nd.edu, http://www.nd.edu/~dcieslak/,

### The Artificial Prediction Market

The Artificial Prediction Market Adrian Barbu Department of Statistics Florida State University Joint work with Nathan Lay, Siemens Corporate Research 1 Overview Main Contributions A mathematical theory