SVM Ensemble Model for Investment Prediction
Chandra J, Assistant Professor, Department of Computer Science, Christ University, Bangalore
Siji T. Mathew, Research Scholar, Department of Computer Science, Christ University, Bangalore

ABSTRACT
This paper analyses the use of an SVM ensemble for investment services. The objective of this work is to construct a model for investment prediction. Ensemble learning is a machine-learning paradigm in which multiple models are trained to solve the same problem. In this paper, a detailed study of the SVM ensemble is carried out. An insurance dataset obtained from the UCI Knowledge Discovery in Databases Archive was used. A multi-classifier SVM ensemble based on AdaBoost was created and tested with the insurance dataset. In this work, the SVM ensemble produced better accuracy than the other ensembles. The knowledge flow of the SVM ensemble model was created using the WEKA tool. This model helps the user to predict the best policy for investment.

Keywords
SVM Ensemble, AdaBoost, Multiclassifier, Accuracy, ROC.

1. INTRODUCTION
In this research, an SVM ensemble model is created that is intended to be highly reliable for the investment sector. We present a method based on SVM ensemble classification. Business intelligence supports companies in their decision support systems. Business intelligence technology gives both historical and current views of business processes, and a decision-making system that uses this knowledge can produce good results. Different categories of policies were selected from the UCI Knowledge Discovery in Databases Archive.

Data mining is the process of analysing data from different perspectives and summarizing it into useful information. Data are any facts, text, or numbers that can be processed. The resulting information can be used to increase the revenue and reduce the costs of an organization. Data mining is also called knowledge discovery from data.
Data mining is the process of finding correlations or patterns among different fields in large relational databases where the information is stored and available for mining. Data mining tools allow users to analyze data from different dimensions. A data mining system works with data, information, and knowledge, and provides the means to extract them. Today, data mining is used mainly by companies with a strong consumer focus. It is applied in retail, the financial sector, communication media, and marketing organizations. Data mining enables these companies to determine relationships among internal factors such as price, product positioning, or staff skills, and external factors such as product competition, economic indicators, and customer demographics. It also lets them determine the impact of these factors on sales, corporate profits, product quality, and customer satisfaction. With data mining, a retailer could use point-of-sale records of customer purchases to send targeted promotions based on an individual's purchase history [14]. By mining demographic data collected from various sources, the retailer could develop products and promotions that appeal to specific customer segments.

2. METHODOLOGY
In this research work, the multi-classifier SVM ensemble is built using the AdaBoost.M1 algorithm, C-SVC, and the majority voting method.

A. AdaBoost Algorithm
AdaBoost is a machine learning algorithm used for improving the performance of a learning algorithm [18]. Here, AdaBoost is used to boost the accuracy of the Support Vector Machines. Let D be the given dataset with d class-labelled tuples (X_1, y_1), (X_2, y_2), ..., (X_d, y_d). In the initial step, AdaBoost assigns each training tuple an equal weight of 1/d. Generating the ensemble requires k rounds through the rest of the algorithm. In round i, the tuples from D are sampled to form a training set D_i of size d. Sampling with replacement is used in AdaBoost.
The selection of each sample depends on its weight. A classifier model M_i is derived from the training tuples of D_i, and its error is calculated using D_i as the test set. If a tuple was incorrectly classified, its weight is increased. If a tuple was correctly classified, its weight is decreased. These weights are used to generate the classifier in the next round.

B. Support Vector Machines
A Support Vector Machine is a type of classifier that can handle both linear and non-linear data. A classifier takes a feature vector and assigns a class or label to the vector. The goal of the Support Vector Machine is to predict the target values of the data instances in the testing set with better accuracy. The number of elements in the feature vector corresponds to its dimensionality. Support Vector Machines comprise a set of related supervised learning methods. During the training stage, the Support Vector Machine finds the maximum-margin hyperplane between the classes. The best hyperplane is selected so that the distance from it to the nearest data point on each side is maximised. Such a hyperplane is called the maximum-margin hyperplane. It is a line in two dimensions, a plane in three dimensions, or a hyperplane in higher dimensions that maximises the distance to the nearest data point. Cross-validation is used to reduce the chances of overfitting. The vectors, or data points, that are closest to the hyperplane are called the support vectors.

Let D be the dataset for a linear SVM. It contains a set of points of the form

\[ D = \{ (x_i, y_i) \mid x_i \in \mathbb{R}^p,\ y_i \in \{-1, 1\} \} \]

where y_i is either 1 or -1 and indicates the class to which the point x_i belongs. The equation of any hyperplane can be written as

\[ w \cdot x - b = 0 \]

where \cdot denotes the dot product, b is a constant, and w is the normal vector to the hyperplane. The two margin hyperplanes that separate the data with the maximum distance between them can be described as

\[ w \cdot x - b = 1 \quad \text{and} \quad w \cdot x - b = -1 \]

Support Vector Machines are linear functions of the form f(x) = w^T x + b, where w is the weight vector and x is the input vector (here the offset is written with a positive sign, so b plays the role of -b in the hyperplane equations above). Let the set of training examples be {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, where x_i is an input vector and y_i is its class label, y_i \in \{1, -1\}. The linear function is found by solving

\[ \min_{w,\, b} \ \frac{1}{2} w^T w \quad \text{subject to} \quad y_i (w^T x_i + b) \ge 1, \quad i = 1, 2, \ldots, n \]

where i indexes the n training cases.

The C-SVC classifier with a linear kernel is available in LIBSVM in WEKA. C-SVC is used for the experimental results. The main features of LIBSVM include different SVM formulations, efficient multi-class classification, cross-validation for model selection, probability estimates, various kernel functions, and weighted SVM for unbalanced data.

C. SVM Ensemble
An SVM ensemble classifier is a collection of several SVM classifiers whose individual decisions are combined to classify the test samples [4]. An ensemble typically shows better performance than the individual classifiers from which it is constructed. SVM ensemble classification includes two levels: construction of the classifier and use of the classifier. The model is constructed from the training set. Each sample in the training set is assumed to belong to a predefined class, as determined by the class attribute label [5]. Using the WEKA Knowledge Flow and classifier selection, a boosted SVM ensemble is created [10]. The created model is used for further prediction. The latter level involves using the SVM ensemble that was built to predict or classify the output.

The process starts by training an SVM classifier on a less imbalanced subset of the data and then classifying the entire training set with that SVM to identify the incorrectly classified examples. Training another SVM classifier can reinforce the first classifier [15]. The process is repeated until we obtain an SVM ensemble in which each classifier tries to improve on the performance of the previous one. The trained models are aggregated using majority voting to obtain a collective outcome.

The SVM ensemble was created using C-SVC. SVM is a kernel-based algorithm. A kernel is a function that transforms the input data into a high-dimensional space where the problem is solved. Kernel functions can be linear or non-linear. SVM has built-in mechanisms that automatically choose appropriate settings based on the data. C-SVC is an SVM type available in LIBSVM, which uses the one-against-one approach for multi-class classification and builds k(k-1)/2 binary classifiers for k classes. Each classifier is trained separately on a pair of classes.
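The reweighting and resampling steps described in Section 2.A can be sketched in a few lines of Python. This is a minimal illustration of one common textbook formulation of AdaBoost.M1, not WEKA's implementation: the function names `adaboost_m1_update` and `sample_training_set`, the four-tuple toy data, and the fixed random seed are all assumptions made for the example, and the base classifier (an SVM in the paper) is replaced by a hand-written list of correct/incorrect flags.

```python
import random


def adaboost_m1_update(weights, correct):
    """Return (new_weights, error) after one AdaBoost.M1 reweighting step.

    weights: current tuple weights, assumed to sum to 1.
    correct: booleans, True where this round's classifier was right.
    """
    # Weighted error of the round: total weight of the misclassified tuples.
    error = sum(w for w, ok in zip(weights, correct) if not ok)
    if error == 0 or error >= 0.5:
        raise ValueError("AdaBoost.M1 needs 0 < error < 0.5 to reweight")
    # Shrink the weight of correctly classified tuples by error / (1 - error).
    beta = error / (1.0 - error)
    updated = [w * beta if ok else w for w, ok in zip(weights, correct)]
    # Normalising makes the misclassified tuples relatively heavier.
    total = sum(updated)
    return [w / total for w in updated], error


def sample_training_set(tuples, weights, seed=0):
    """Draw len(tuples) tuples with replacement, in proportion to weight."""
    rng = random.Random(seed)  # hypothetical fixed seed, for repeatability
    return rng.choices(tuples, weights=weights, k=len(tuples))


# Toy data (hypothetical): d = 4 tuples, each starting with weight 1/d.
d = 4
initial = [1.0 / d] * d
# Pretend the round's classifier got only the last tuple wrong.
new_weights, error = adaboost_m1_update(initial, [True, True, True, False])
resampled = sample_training_set(["t1", "t2", "t3", "t4"], new_weights)
```

In this toy round the misclassified tuple ends up carrying half of the total weight, so it is far more likely to be drawn into the training set of the next round.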
The testing stage is one of the most critical phases of any classification model development process, because it gives the model developer the most informative measure of classifier performance. The test results can help to justify the use of the classifier and point to possible optimisation. The final decision of the ensemble is obtained by combining the individual predictions of the ensemble members. Majority voting is used for the aggregation.

D. Majority voting
Majority voting is a method for combining the outputs of several SVMs. It counts the votes for each class over the input classifiers and selects the majority class [15]. It is used for aggregating the results from the various ensemble models. Let f_k (k = 1, 2, ..., K) be the decision function of the k-th SVM in the SVM ensemble, and let C_j (j = 1, 2, ..., C) denote the label of the j-th class. Then the number of SVMs whose decision is the j-th class is

\[ N_j = \#\{ k \mid f_k(x) = C_j \} \]

The final decision of the SVM ensemble for a given test vector x under majority voting is

\[ f_{mv}(x) = \arg\max_j N_j \]

In majority voting, the models are trained and allowed to vote. The class that receives the most votes from the individual models is selected. Aggregating in this way increases the accuracy of the SVM ensemble.

The final result is obtained and saved automatically into a file. This file is used in the developed investment prediction tool. The accuracy value varies from one policy to another. The policy with the maximum accuracy is taken as the best policy in its category.

3. EXPERIMENTAL DESIGN
A. Dataset Description
This research makes use of a dataset available from the UCI Knowledge Discovery in Databases Archive [20]. Different policies are added in order to carry out the necessary analysis.

B. Performance Measures
Performance metrics are used to assess how accurately the model predicts the known values. If the model performs well and meets the business requirements, it can then be applied to new data to predict the future. The confusion matrix, accuracy, and similar measures are used to evaluate the performance of a classifier. Accuracy is defined as

\[ \text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \]

where TP, FP, TN, and FN are the numbers of true positive, false positive, true negative, and false negative predictions, respectively. Precision, recall, F-measure, and related measures are also reported by the WEKA classifier performance evaluator.

4. RESULT ANALYSIS
The SVM ensemble model is created in the WEKA Knowledge Flow.
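The majority-voting rule of Section 2.D and the accuracy measure of Section 3.B can be sketched as follows. This is a minimal illustration, not the WEKA implementation: the function names, the example votes, and the confusion-matrix counts are hypothetical, and ties are resolved in favour of the class that was voted for first, which is an assumption the paper does not specify.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the class label predicted by the most ensemble members.

    predictions: one class label per SVM in the ensemble, i.e. f_k(x).
    Ties go to the label that appears first (an assumption of this sketch).
    """
    counts = Counter(predictions)  # N_j for every class label C_j
    return counts.most_common(1)[0][0]


def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + FP + TN + FN)."""
    return (tp + tn) / (tp + fp + tn + fn)


# Hypothetical votes from three SVMs for one test vector.
decision = majority_vote(["RE-1", "RE-2", "RE-1"])

# Hypothetical confusion-matrix counts, for illustration only.
acc = accuracy(tp=40, tn=45, fp=5, fn=10)
```

With these made-up inputs, two of the three members vote for RE-1, so RE-1 is the ensemble decision, and 85 of the 100 predictions are correct.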
The first experiment was done using the fire policy. The knowledge flow for the fire policy is given in Figure 2.1. There are three different fire policies. The data for each fire policy is loaded using a CSV loader data source. The loaded dataset is passed to a train-test split maker through a class assigner. The class assigner assigns the class label of each policy. The loaded dataset is divided into training data and testing data: 75% of the data is taken as training data and 25% as testing data. The training data is used to build the SVM ensemble classifier. The test data is used to test the built classifier. A multi-classifier SVM ensemble is created. Three C-SVC Support Vector Machines were selected and applied to AdaBoost.M1 to create the SVM ensemble. The kernel type is chosen as linear. This result is applied to the multiclass classifier. In the multiclass classifier, the one-against-one method is used.

Figure 2.1. SVM Ensemble Model using Knowledge Flow

Figure 4.1. Parameter settings for SVM Ensemble Model

The table below shows the detailed values obtained for the fire policies with the SVM ensemble model and with a single SVM.

Table 1. Single SVM vs SVM ensemble
Columns: RE-1, RE-2, RE-3, F-Measure, ROC-Area
Rows: Single SVM, SVM ensemble
[Table values not recoverable]
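The 75%/25% division into training and testing data described above can be sketched in plain Python. This is only an illustration of the idea, not the behaviour of WEKA's train-test split maker: the function name `train_test_split`, the shuffling step, the seed, and the twenty placeholder rows are assumptions made for the example.

```python
import random


def train_test_split(rows, train_fraction=0.75, seed=1):
    """Shuffle the rows and split them into (training, testing) lists."""
    rng = random.Random(seed)   # hypothetical seed, for repeatability
    shuffled = list(rows)       # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Twenty placeholder rows standing in for the records of one policy.
rows = list(range(20))
train, test = train_test_split(rows)
```

Every row lands in exactly one of the two lists, so the classifier is never tested on a record it was trained on.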
The same policies were tested with a single SVM. From the values obtained after testing, it was found that the SVM ensemble gives better accuracy than the single SVM. Figure 4.2 shows that 38% of customers chose the RE-1 policy and 37% chose the RE-2 policy. Only 25% of the customers took the RE-3 policy.

Figure 4.2: Percentage of Policyholders (fire policies: RE-1 38%, RE-2 37%, RE-3 25%)

The best fire policy output is shown in the Investment Predictor Tool. All categories of policies are listed in the Investment Predictor Tool. To get the best fire policy, select the fire policy from the policy list and click the Get Result button. The analyzed results of the three fire policies are stored while the WEKA Knowledge Flow model runs. The policy with the maximum accuracy is shown as the best fire policy. The RE-1 policy is selected as the best policy. This model was also created and tested with other investment policies.

Figure 5.1: Best Policy prediction for FIRE

In this paper, the data is collected and analyzed using a multi-classifier SVM. Based on the SVM ensemble classifier, the best policies were predicted for each category. The prediction helps to choose the best policy from a given list. The detailed analysis of the insurance dataset was done using the suggested model, which was loaded in the WEKA Knowledge Flow. The policy with the maximum accuracy was taken as the best policy in its category. From this research work, it is found that the accuracy of the SVM ensemble is better than that of the other ensemble methods, and that it gives a better investment prediction for different categories of policies.

5. CONCLUSION
The experiment was carried out for different policies in the same way, and it was found that the SVM ensemble performs better than the single classifier. A fast and accurate classifier is used for making investment predictions. The SVM ensemble classifier is well known as a state-of-the-art ensemble classifier for the insurance field. In future, this research work can be extended to develop a prediction tool for the investment field.

REFERENCES
[1] Siji T. Mathew, Chandra J., Estimating the performance of SVM using Personal equity plan, in Proceedings of the International Conf. on Mathematics in Engineering & Business Management, March 9-10, 2012.
[2] Siji T. Mathew, SVM Ensemble for Insurance Data Analysis, MPhil Dissertation, Christ University, Jun.
[3] Chawla, Nitesh V. (2004), Learning Ensembles from Bites: A Scalable and Accurate Approach, Journal of Machine Learning Research.
[4] C. Cortes, V. Vapnik, Support vector network, Machine Learning, 1995.
[5] Fumera, G., Roli, F., A theoretical and experimental analysis of linear combiners for multiple classifier systems, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 6.
[6] Harris Drucker et al., Support vector machines for spam categorization, IEEE Transactions on Neural Networks, 1999.
[7] Kuncheva L. I., Combining Pattern Classifiers: Methods and Algorithms, 2004.
[8] Leon Bottou and Chih-Jen Lin, Support vector machine Solvers, in Large Scale Kernel Machines, Weston editors, MIT Press, Cambridge, MA.
[9] Ma Chao, Chen Xihong, A New Algorithm of Support Vector Machine Ensemble and Its Application, Intelligent Human-Machine Systems and Cybernetics (IHMSC), 2nd International Conference, 2010.
[10] Mark Hall et al. (2009), The WEKA Data Mining Software: An Update, SIGKDD Explorations, vol. 11.
[11] Nicolas Garcia-Pedrajas, Constructing Ensembles of Classifiers by Means of Weighted Instance Selection, IEEE Transactions on Neural Networks, vol. 20, no. 2, 2009.
[12] Nikunj C. Oza and Kagan Tumer, Key Real-World Applications of Classifier Ensembles, Information Fusion, Special Issue on Applications of Ensemble Methods, pp. 4-20, 2008.
[13] Petr Hajek, Vladimir Olej, Municipal Revenue Prediction by Support Vector Machine ensemble, ICCOMP 10: Proc. of the 14th WSEAS International Conf. on Computers: part of the 14th WSEAS CSCC multi Conf., vol. 1, 2010.
[14] Ravi V. H. et al., Soft computing system for bank performance Prediction, Elsevier Applied Soft Computing.
[15] Robi Polikar, Ensemble based systems in decision making, IEEE Circuits and Systems Magazine.
[16] Shin, Kim, An application of support vector machine in bankruptcy prediction model, Expert Systems with Applications, 2005.
[17] Bhardwaj M., Gupta T., Grover T., Bhatnagar V., An efficient classifier ensemble using SVM, Methods and Models in Computer Science, ICM2CS, Proceedings of International Conf., vol. 2, 2009.
[18] Yan-Shi Dong, Ke-Song Han, Boosting SVM Classifiers By Ensemble, Special interest tracks and posters of the 14th International Conf. on World Wide Web, May, pp. 10-14, 2005.
[19] Ye Li, Yun Ze cal, et al., Fault diagnosis based on support vector machine ensemble, Machine Learning and Cybernetics, 2005.
[20] Dataset available
High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising
More informationSpam detection with data mining method:
Spam detection with data mining method: Ensemble learning with multiple SVM based classifiers to optimize generalization ability of email spam classification Keywords: ensemble learning, SVM classifier,
More informationHow To Cluster
Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main
More informationClassification of Bad Accounts in Credit Card Industry
Classification of Bad Accounts in Credit Card Industry Chengwei Yuan December 12, 2014 Introduction Risk management is critical for a credit card company to survive in such competing industry. In addition
More informationFeature Subset Selection in E-mail Spam Detection
Feature Subset Selection in E-mail Spam Detection Amir Rajabi Behjat, Universiti Technology MARA, Malaysia IT Security for the Next Generation Asia Pacific & MEA Cup, Hong Kong 14-16 March, 2012 Feature
More informationMining Wiki Usage Data for Predicting Final Grades of Students
Mining Wiki Usage Data for Predicting Final Grades of Students Gökhan Akçapınar, Erdal Coşgun, Arif Altun Hacettepe University gokhana@hacettepe.edu.tr, erdal.cosgun@hacettepe.edu.tr, altunar@hacettepe.edu.tr
More informationSelecting Data Mining Model for Web Advertising in Virtual Communities
Selecting Data Mining for Web Advertising in Virtual Communities Jerzy Surma Faculty of Business Administration Warsaw School of Economics Warsaw, Poland e-mail: jerzy.surma@gmail.com Mariusz Łapczyński
More informationE-commerce Transaction Anomaly Classification
E-commerce Transaction Anomaly Classification Minyong Lee minyong@stanford.edu Seunghee Ham sham12@stanford.edu Qiyi Jiang qjiang@stanford.edu I. INTRODUCTION Due to the increasing popularity of e-commerce
More informationIntroducing diversity among the models of multi-label classification ensemble
Introducing diversity among the models of multi-label classification ensemble Lena Chekina, Lior Rokach and Bracha Shapira Ben-Gurion University of the Negev Dept. of Information Systems Engineering and
More informationData Mining Framework for Direct Marketing: A Case Study of Bank Marketing
www.ijcsi.org 198 Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing Lilian Sing oei 1 and Jiayang Wang 2 1 School of Information Science and Engineering, Central South University
More informationInternational Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
More informationScalable Developments for Big Data Analytics in Remote Sensing
Scalable Developments for Big Data Analytics in Remote Sensing Federated Systems and Data Division Research Group High Productivity Data Processing Dr.-Ing. Morris Riedel et al. Research Group Leader,
More informationInternational Journal of Advance Research in Computer Science and Management Studies
Volume 2, Issue 12, December 2014 ISSN: 2321 7782 (Online) International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online
More informationEmail Classification Using Data Reduction Method
Email Classification Using Data Reduction Method Rafiqul Islam and Yang Xiang, member IEEE School of Information Technology Deakin University, Burwood 3125, Victoria, Australia Abstract Classifying user
More informationIDENTIFIC ATION OF SOFTWARE EROSION USING LOGISTIC REGRESSION
http:// IDENTIFIC ATION OF SOFTWARE EROSION USING LOGISTIC REGRESSION Harinder Kaur 1, Raveen Bajwa 2 1 PG Student., CSE., Baba Banda Singh Bahadur Engg. College, Fatehgarh Sahib, (India) 2 Asstt. Prof.,
More informationAnti-Spam Filter Based on Naïve Bayes, SVM, and KNN model
AI TERM PROJECT GROUP 14 1 Anti-Spam Filter Based on,, and model Yun-Nung Chen, Che-An Lu, Chao-Yu Huang Abstract spam email filters are a well-known and powerful type of filters. We construct different
More informationData Mining Techniques for Prognosis in Pancreatic Cancer
Data Mining Techniques for Prognosis in Pancreatic Cancer by Stuart Floyd A Thesis Submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUE In partial fulfillment of the requirements for the Degree
More informationA Novel Feature Selection Method Based on an Integrated Data Envelopment Analysis and Entropy Mode
A Novel Feature Selection Method Based on an Integrated Data Envelopment Analysis and Entropy Mode Seyed Mojtaba Hosseini Bamakan, Peyman Gholami RESEARCH CENTRE OF FICTITIOUS ECONOMY & DATA SCIENCE UNIVERSITY
More informationIntroduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu
Introduction to Machine Learning Lecture 1 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Introduction Logistics Prerequisites: basics concepts needed in probability and statistics
More informationEnhancing Quality of Data using Data Mining Method
JOURNAL OF COMPUTING, VOLUME 2, ISSUE 9, SEPTEMBER 2, ISSN 25-967 WWW.JOURNALOFCOMPUTING.ORG 9 Enhancing Quality of Data using Data Mining Method Fatemeh Ghorbanpour A., Mir M. Pedram, Kambiz Badie, Mohammad
More informationPredicting required bandwidth for educational institutes using prediction techniques in data mining (Case Study: Qom Payame Noor University)
260 IJCSNS International Journal of Computer Science and Network Security, VOL.11 No.6, June 2011 Predicting required bandwidth for educational institutes using prediction techniques in data mining (Case
More informationAnalysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News
Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News Sushilkumar Kalmegh Associate Professor, Department of Computer Science, Sant Gadge Baba Amravati
More informationREVIEW OF ENSEMBLE CLASSIFICATION
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IJCSMC, Vol. 2, Issue.
More informationChoosing the Best Classification Performance Metric for Wrapper-based Software Metric Selection for Defect Prediction
Choosing the Best Classification Performance Metric for Wrapper-based Software Metric Selection for Defect Prediction Huanjing Wang Western Kentucky University huanjing.wang@wku.edu Taghi M. Khoshgoftaar
More informationSemi-Supervised Learning for Blog Classification
Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence (2008) Semi-Supervised Learning for Blog Classification Daisuke Ikeda Department of Computational Intelligence and Systems Science,
More informationBEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES
BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents
More informationIntrusion Detection via Machine Learning for SCADA System Protection
Intrusion Detection via Machine Learning for SCADA System Protection S.L.P. Yasakethu Department of Computing, University of Surrey, Guildford, GU2 7XH, UK. s.l.yasakethu@surrey.ac.uk J. Jiang Department
More informationThe Artificial Prediction Market
The Artificial Prediction Market Adrian Barbu Department of Statistics Florida State University Joint work with Nathan Lay, Siemens Corporate Research 1 Overview Main Contributions A mathematical theory
More informationII. RELATED WORK. Sentiment Mining
Sentiment Mining Using Ensemble Classification Models Matthew Whitehead and Larry Yaeger Indiana University School of Informatics 901 E. 10th St. Bloomington, IN 47408 {mewhiteh, larryy}@indiana.edu Abstract
More informationA Content based Spam Filtering Using Optical Back Propagation Technique
A Content based Spam Filtering Using Optical Back Propagation Technique Sarab M. Hameed 1, Noor Alhuda J. Mohammed 2 Department of Computer Science, College of Science, University of Baghdad - Iraq ABSTRACT
More informationISSN: 2321-7782 (Online) Volume 2, Issue 10, October 2014 International Journal of Advance Research in Computer Science and Management Studies
ISSN: 2321-7782 (Online) Volume 2, Issue 10, October 2014 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online
More informationBIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376
Course Director: Dr. Kayvan Najarian (DCM&B, kayvan@umich.edu) Lectures: Labs: Mondays and Wednesdays 9:00 AM -10:30 AM Rm. 2065 Palmer Commons Bldg. Wednesdays 10:30 AM 11:30 AM (alternate weeks) Rm.
More informationCustomer Classification And Prediction Based On Data Mining Technique
Customer Classification And Prediction Based On Data Mining Technique Ms. Neethu Baby 1, Mrs. Priyanka L.T 2 1 M.E CSE, Sri Shakthi Institute of Engineering and Technology, Coimbatore 2 Assistant Professor
More informationADVANCED MACHINE LEARNING. Introduction
1 1 Introduction Lecturer: Prof. Aude Billard (aude.billard@epfl.ch) Teaching Assistants: Guillaume de Chambrier, Nadia Figueroa, Denys Lamotte, Nicola Sommer 2 2 Course Format Alternate between: Lectures
More informationApplication of Event Based Decision Tree and Ensemble of Data Driven Methods for Maintenance Action Recommendation
Application of Event Based Decision Tree and Ensemble of Data Driven Methods for Maintenance Action Recommendation James K. Kimotho, Christoph Sondermann-Woelke, Tobias Meyer, and Walter Sextro Department
More informationDATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
More informationCase Study Report: Building and analyzing SVM ensembles with Bagging and AdaBoost on big data sets
Case Study Report: Building and analyzing SVM ensembles with Bagging and AdaBoost on big data sets Ricardo Ramos Guerra Jörg Stork Master in Automation and IT Faculty of Computer Science and Engineering
More informationA DECISION TREE BASED PEDOMETER AND ITS IMPLEMENTATION ON THE ANDROID PLATFORM
A DECISION TREE BASED PEDOMETER AND ITS IMPLEMENTATION ON THE ANDROID PLATFORM ABSTRACT Juanying Lin, Leanne Chan and Hong Yan Department of Electronic Engineering, City University of Hong Kong, Hong Kong,
More informationA User s Guide to Support Vector Machines
A User s Guide to Support Vector Machines Asa Ben-Hur Department of Computer Science Colorado State University Jason Weston NEC Labs America Princeton, NJ 08540 USA Abstract The Support Vector Machine
More informationA Health Degree Evaluation Algorithm for Equipment Based on Fuzzy Sets and the Improved SVM
Journal of Computational Information Systems 10: 17 (2014) 7629 7635 Available at http://www.jofcis.com A Health Degree Evaluation Algorithm for Equipment Based on Fuzzy Sets and the Improved SVM Tian
More informationMulticlass Classification. 9.520 Class 06, 25 Feb 2008 Ryan Rifkin
Multiclass Classification 9.520 Class 06, 25 Feb 2008 Ryan Rifkin It is a tale Told by an idiot, full of sound and fury, Signifying nothing. Macbeth, Act V, Scene V What Is Multiclass Classification? Each
More informationEnsemble Learning Better Predictions Through Diversity. Todd Holloway ETech 2008
Ensemble Learning Better Predictions Through Diversity Todd Holloway ETech 2008 Outline Building a classifier (a tutorial example) Neighbor method Major ideas and challenges in classification Ensembles
More information