Applied Mathematical Sciences, Vol. 7, 2013, no. 112, HIKARI Ltd,

Save this PDF as:
 WORD  PNG  TXT  JPG

Size: px
Start display at page:

Download "Applied Mathematical Sciences, Vol. 7, 2013, no. 112, 5591-5597 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2013."

Transcription

1 Applied Mathematical Sciences, Vol. 7, 2013, no. 112, HIKARI Ltd, Accuracy Rate of Predictive Models in Credit Screening Anirut Suebsing Department of Innovation Management, Faculty of Management Science Ubonratchathani University, Ubonratchathani, 34190, Thailand Chakarin Vajiramedhin Department of Innovation Management, Faculty of Management Science Ubonratchathani University, Ubonratchathani, 34190, Thailand Copyright 2013 Anirut Suebsing and Chakarin Vajiramedhin. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Abstract Nowadays, data mining has been playing an important role in the various disciplines of sciences and technologies. However, one of the essential process of data mining is the data preparation because if without this process, the data mining tasks will provide bad results as well, especially in terms of classification task. There are several sub procedures of the data preparation but one of the most significant sub procedures ignored by many researches is data transformation techniques. Therefore, this paper presents the transformation techniques are important in the classification task of data mining, particularly our proposed approach in this paper. This paper focuses on increasing the accuracy of credit card screening predictive model as in the last several years, the credit card business is the extremely competitive market share that a number of credit card payment defaults are escalating. Thus, predictive model based on data mining can be applied to filter customers data to reduce a number of default customers. However, with the proposed approach in this paper, it can enhance the performance of accuracy rate of the predictive models for the credit screening data set effectively. During the evaluation phase, all accuracy rates of each predictive model based on the proposed approach shows the high performance compared with the accuracy rates of each predictive model that without using the proposed approach. Keywords: data mining, predictive model, credit screening

2 5592 Anirut Suebsing and Chakarin Vajiramedhin 1 Introduction Nowadays, data mining plays important in the highly competitive business world. That discovers the valuable information from a very large transaction database for supporting the business. The information has been applied to analysis, plan or manage [10]. In the financial industry, the data mining has been applied successfully in several fields such as the credit card approval. In the last several years, credit card business is the extremely competitive market share. The customers can sign up for credit cards easier using marketing strategies that focus on a few conditions. A number of credit card payment defaults are increasing. Therefore, data mining technique can be applied to filter customers data to reduce a number of default customers. Generally, the financial institutions have a screening of customers information before approving the credit card by using efficient techniques such as automatic predictive model. The predictive model is a supervised learning tool for predicting the customers behavior from their history data. The input data format is an important key of the accuracy performance besides the best predictive solution. If the format is suitable for processing, the predictive model would have the best performance. On the other hand, if the input data do not meet the agreement, the predictive model would make the experimental error. Thus the appropriate data help to enhance the efficiency and correctness of the algorithm. Data transformation is one of the processing data steps that converts the input data into suitable form. In this paper, we applied the data transformation with eight predictive model algorithms to predict credit card customer data. In this paper, we focus on how to increase an accuracy rate of predictive model for a credit approval to help banks to decrease a risk for a credit approval. The goal of scrutinization is to identify a group of customers who have a high probability to defraud the banks by ignoring to disburse money to the banks. In data mining based on the data transformation, it helps the banks to predict customers profiles or attributes are able to give the credit approval so that reduce the risk of bad debt. The remainders of this paper are organized as follows: Section 2 describes the research objectives. Section 3 reviews of the predictive model algorithms and data transformations. Section 4 describes the methodology in this paper. Section 5 illustrates the experimental results. Section 6 discusses the performance evaluation. Finally, conclusion is discussed in Section 7. 2 Related Work 2.1 Data transformation Data preprocessing is an important part of data mining preparing the input data before the data are processed [9] (see Fig.1). This step consists of several parts

3 Accuracy rate of predictive models 5593 such as data cleaning, data integration, data transformation and data reduction. The input data can be processed in their raw collected form but the accuracy percentage of predictive model may be reduced in case of incomplete data. That the quality of knowledge extracted from the data can be enhanced by its transformation [3]. Thus, data transformation is the phase in preprocessing that converts a source data format into appropriate data form. Fig.1 Data preprocessing for Predictive. 2.2 Predictive Model Predictive model is the area of data mining that is usually applied to classify the group of data such as a classification of customers credit card. The predictive model is a process to create a model of future behavior from the collected data. The model may be created from a simple or complex equation (see Fig. 2). In the financial industry, the future behavior predictive is very important thus many banks have adopted the predictive to predict the customer data for screening the customer information before approving the credit card. Thus, many predictive models ([2], [6], [7], [8]) have been proposed. Each model has its own advantages and disadvantages that the performance is measured by the percentage of accuracy predicting. Fig.2 Data preprocessing for Predictive. 3 Data Transform for Predictive Model In this section, an approach, which is used to transform a raw data to a new data by using the binary coding method, is a method for turning any characters into numerical values to help a predictive model for credit approval to enhance its accuracy rate. The data sets used in this paper must be changed numerical value.

4 5594 Anirut Suebsing and Chakarin Vajiramedhin The proposed approach is to divide a nominal attribute into n binary attributes, which is called binary coding or dummy coding, if there are possible values (here, n > 2), with 0/1 representing the absence or presence of each value. For instance, an attribute of sex in the data set which has two values as follows: man, woman, then this attribute can be turned into numeric based on binary coding as follows: Fig.3 An example of mapping nominal based on binary coding technique. In this paper, we focus on how to increase an accuracy rate of predictive model for a credit approval to help banks to decrease a risk for a credit approval. The goal of scrutinization is to identify a group of customers who have a high probability to defraud the banks by ignoring to disburse money to the banks. In data mining based on the data transformation, it helps the banks to predict customers profiles or attributes are able to give the credit approval so that reduce the risk of bad debt. After mapping character or nominal data into numerical data, we make all values of attributes in each data sets equivalent by using the Min-Max normalization. The Min-max normalization is the simplest normalization technique and the most commonly used to standardize the range of independent attributes or features of data sets ([4], [5]). Min-max normalization is the simplest normalization technique that is best-suited for the cases where the bounds of the scores produced by a classifier are known. V min = minimum of the value in the data set. V max = maximum of the value in the data set. In this case, given a set of matching values x i, i=1, 2,, M, the set of normalized scores is given by as follows:

5 Accuracy rate of predictive models 5595 x x i = V 1 min max V V min (1) Now, the experimental raw data sets have transformed by the proposed approach and they are ready to be used for increase the accuracy rate of the predictive model in credit screening. 4 Performance Evaluation In order to evaluate our proposed approach, the data sets from the UCI repository [1] (a Japanese credit screening data set) and KNN, Logistic, Smo, NaiveBayes, Kstar, VotedPerceptron, SGD, MultiClassClassifier algorithms, we used eight algorithms to evaluate the proposed approach because of increasing reliability of evaluating the proposed approach. Note that the measurement in this paper of the experimental results is based on the standard metrics for evaluations of accuracy rate for truth positive (TP) refers to the ratio between the number of correctly predicted values and the total number of values. This section shows the accuracy rate of different predictive models used the data set provided by the proposed approach and the original data set so that we compare the accuracy rate of each predictive model that used the different data sets. Table 1. Accuracy rate of different predictive models that used the proposed data set and original data set From Table 1, it shows all accuracy rates of each predictive model used the proposed data set are higher than the accuracy rates of each predictive model employed the original data set significantly especially the accuracy rates of KNN, Logistic, Kstar, and MultiClassClassifier models that are able to provide the rate more than 98 percent.

6 5596 Anirut Suebsing and Chakarin Vajiramedhin Fig.4 Accuracy rate of different predictive models. From the Result Section (see Fig.4), it shows the proposed approach can enhance the accuracy rate of the predictive models effectively. Because the proposed approach has transformed all characters of each attribute or column in the original data set to the numeric values in terms of binary code. Thus, it leads to many predictive model algorithms developed to support the numeric data sets can compute and a build the predictive model efficiently even if in the real-world applications, most of the predictive model algorithms can build the predictive model based on every type of data sets. Furthermore, the Min-Max method, which is used to make all values of attributes in the data set equivalent, is one of techniques proposed in the paper. After the data set was transformed to the numeric values, all values of attributes in the data set has to be made to standardization in order to avoid outlier values because this problem will make any predictive models built by any predictive model algorithms forecast ineffectively. Therefore, this method affects to the accuracy rate of the predictive model also. 5 Conclusion The proposed approach can reform the Japanese credit screening data set to the new one to improve the performance of accuracy rate of the predictive models. All accuracy rates of each predictive model shows the high performance compared with the accuracy rates of each predictive model used the original data set. When we focus on the accuracy rates of each predictive model based on the proposed data set in Table 1, it found that every accuracy rate of each predictive model is able to show performance impressively especially KNN, Logistic, Kstar, and MultiClassClassifier models providing the rate over than 98 percent.

7 Accuracy rate of predictive models 5597 Acknowledgement We would like to thank the Faculty of Management Science, Ubonratchathani University for the financial support. We are also grateful to the owner of the datasets, the UCI repository, for providing the available datasets. References [1] Asuncion, A., Newman, D., CA: University of California, School of Information and Computer Science, Uci machine learning repository. Irvine, February 8, [2] Hung, S., Yen, D. C., Wang H., Applying data mining to telecom churn manage-ment. Expert Systems with Applications, 2006, [3] Kusiak, A., Feature Transformation Methods in Data Mining. IEEE Transactions on Electronics Packaging Manufacturing, 24, 2001, [4] Li, C., Biswas, G., Unsupervised learning with mixed numeric and nominal data. IEEE Transactions on Knowledge and Data Engineering, 14, 2002, [5] Modugno, R., Pirlo, G., Impedovo, D., Score normalization by dynamic time warping, 2010 IEEE International Conference on Computational Intelligence for Measurement Systems and Applications (CIMSA), 2010, [6] Singh, R., Aggarwal., R. R., Comparative Evaluation of Predictive Modeling Techniques on Credit Card Data. International Journal of Computer Theory and Engineering, 3, 2011, [7] Wang, G., Et al., Predicting credit card holder churn in banks of China using data mining and MCDM IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, 2010, [8] Yu, Y., Song, L., The Issue and Risk Analysis of the Credit card Fourth International Conference on Business Intelligence and Financial Engineering, 2011, [9] A. Suebsing, N. Hiransakolwong, A novel technique for feature subset selection based on cosine similarity, Applied Mathematical Sciences, vol.6, no. 133, 2012, [10] C. Vajiramedhin, J. Werapun, EBPA: An efficient data structure for frequent closed itemset mining Applied Mathematical Sciences, vol.7, no.30, 2013, Received: August 16, 2013

Feature Selection with Data Balancing for. Prediction of Bank Telemarketing

Feature Selection with Data Balancing for. Prediction of Bank Telemarketing Applied Mathematical Sciences, Vol. 8, 2014, no. 114, 5667-5672 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2014.47222 Feature Selection with Data Balancing for Prediction of Bank Telemarketing

More information

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,

More information

Data Mining Approach For Subscription-Fraud. Detection in Telecommunication Sector

Data Mining Approach For Subscription-Fraud. Detection in Telecommunication Sector Contemporary Engineering Sciences, Vol. 7, 2014, no. 11, 515-522 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ces.2014.4431 Data Mining Approach For Subscription-Fraud Detection in Telecommunication

More information

Data Mining: Overview. What is Data Mining?

Data Mining: Overview. What is Data Mining? Data Mining: Overview What is Data Mining? Recently * coined term for confluence of ideas from statistics and computer science (machine learning and database methods) applied to large databases in science,

More information

Data Mining: A Preprocessing Engine

Data Mining: A Preprocessing Engine Journal of Computer Science 2 (9): 735-739, 2006 ISSN 1549-3636 2005 Science Publications Data Mining: A Preprocessing Engine Luai Al Shalabi, Zyad Shaaban and Basel Kasasbeh Applied Science University,

More information

DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES

DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES Vijayalakshmi Mahanra Rao 1, Yashwant Prasad Singh 2 Multimedia University, Cyberjaya, MALAYSIA 1 lakshmi.mahanra@gmail.com

More information

An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset

An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset P P P Health An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset Peng Liu 1, Elia El-Darzi 2, Lei Lei 1, Christos Vasilakis 2, Panagiotis Chountas 2, and Wei Huang

More information

A Survey on Outlier Detection Techniques for Credit Card Fraud Detection

A Survey on Outlier Detection Techniques for Credit Card Fraud Detection IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 16, Issue 2, Ver. VI (Mar-Apr. 2014), PP 44-48 A Survey on Outlier Detection Techniques for Credit Card Fraud

More information

Enhanced Boosted Trees Technique for Customer Churn Prediction Model

Enhanced Boosted Trees Technique for Customer Churn Prediction Model IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 03 (March. 2014), V5 PP 41-45 www.iosrjen.org Enhanced Boosted Trees Technique for Customer Churn Prediction

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

Domain Classification of Technical Terms Using the Web

Domain Classification of Technical Terms Using the Web Systems and Computers in Japan, Vol. 38, No. 14, 2007 Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J89-D, No. 11, November 2006, pp. 2470 2482 Domain Classification of Technical Terms Using

More information

New Ensemble Combination Scheme

New Ensemble Combination Scheme New Ensemble Combination Scheme Namhyoung Kim, Youngdoo Son, and Jaewook Lee, Member, IEEE Abstract Recently many statistical learning techniques are successfully developed and used in several areas However,

More information

Random Forest Based Imbalanced Data Cleaning and Classification

Random Forest Based Imbalanced Data Cleaning and Classification Random Forest Based Imbalanced Data Cleaning and Classification Jie Gu Software School of Tsinghua University, China Abstract. The given task of PAKDD 2007 data mining competition is a typical problem

More information

Prediction Model for Crude Oil Price Using Artificial Neural Networks

Prediction Model for Crude Oil Price Using Artificial Neural Networks Applied Mathematical Sciences, Vol. 8, 2014, no. 80, 3953-3965 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2014.43193 Prediction Model for Crude Oil Price Using Artificial Neural Networks

More information

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University

More information

Standardization and Its Effects on K-Means Clustering Algorithm

Standardization and Its Effects on K-Means Clustering Algorithm Research Journal of Applied Sciences, Engineering and Technology 6(7): 399-3303, 03 ISSN: 040-7459; e-issn: 040-7467 Maxwell Scientific Organization, 03 Submitted: January 3, 03 Accepted: February 5, 03

More information

FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS

FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS Breno C. Costa, Bruno. L. A. Alberto, André M. Portela, W. Maduro, Esdras O. Eler PDITec, Belo Horizonte,

More information

Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier

Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier D.Nithya a, *, V.Suganya b,1, R.Saranya Irudaya Mary c,1 Abstract - This paper presents,

More information

CS 6220: Data Mining Techniques Course Project Description

CS 6220: Data Mining Techniques Course Project Description CS 6220: Data Mining Techniques Course Project Description College of Computer and Information Science Northeastern University Spring 2013 General Goal In this project, you will have an opportunity to

More information

ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION

ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION ISSN 9 X INFORMATION TECHNOLOGY AND CONTROL, 00, Vol., No.A ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION Danuta Zakrzewska Institute of Computer Science, Technical

More information

ASSOCIATION RULE MINING ON WEB LOGS FOR EXTRACTING INTERESTING PATTERNS THROUGH WEKA TOOL

ASSOCIATION RULE MINING ON WEB LOGS FOR EXTRACTING INTERESTING PATTERNS THROUGH WEKA TOOL International Journal Of Advanced Technology In Engineering And Science Www.Ijates.Com Volume No 03, Special Issue No. 01, February 2015 ISSN (Online): 2348 7550 ASSOCIATION RULE MINING ON WEB LOGS FOR

More information

Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees

Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees Jing Wang Computer Science Department, The University of Iowa jing-wang-1@uiowa.edu W. Nick Street Management Sciences Department,

More information

Fig. 1 A typical Knowledge Discovery process [2]

Fig. 1 A typical Knowledge Discovery process [2] Volume 4, Issue 7, July 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Review on Clustering

More information

Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr

Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr WEKA Gallirallus Zeland) australis : Endemic bird (New Characteristics Waikato university Weka is a collection

More information

Categorical Data Visualization and Clustering Using Subjective Factors

Categorical Data Visualization and Clustering Using Subjective Factors Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,

More information

Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing

Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing www.ijcsi.org 198 Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing Lilian Sing oei 1 and Jiayang Wang 2 1 School of Information Science and Engineering, Central South University

More information

Role of Customer Response Models in Customer Solicitation Center s Direct Marketing Campaign

Role of Customer Response Models in Customer Solicitation Center s Direct Marketing Campaign Role of Customer Response Models in Customer Solicitation Center s Direct Marketing Campaign Arun K Mandapaka, Amit Singh Kushwah, Dr.Goutam Chakraborty Oklahoma State University, OK, USA ABSTRACT Direct

More information

IDENTIFIC ATION OF SOFTWARE EROSION USING LOGISTIC REGRESSION

IDENTIFIC ATION OF SOFTWARE EROSION USING LOGISTIC REGRESSION http:// IDENTIFIC ATION OF SOFTWARE EROSION USING LOGISTIC REGRESSION Harinder Kaur 1, Raveen Bajwa 2 1 PG Student., CSE., Baba Banda Singh Bahadur Engg. College, Fatehgarh Sahib, (India) 2 Asstt. Prof.,

More information

Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control

Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control Andre BERGMANN Salzgitter Mannesmann Forschung GmbH; Duisburg, Germany Phone: +49 203 9993154, Fax: +49 203 9993234;

More information

Tweaking Naïve Bayes classifier for intelligent spam detection

Tweaking Naïve Bayes classifier for intelligent spam detection 682 Tweaking Naïve Bayes classifier for intelligent spam detection Ankita Raturi 1 and Sunil Pranit Lal 2 1 University of California, Irvine, CA 92697, USA. araturi@uci.edu 2 School of Computing, Information

More information

A Regression Approach for Forecasting Vendor Revenue in Telecommunication Industries

A Regression Approach for Forecasting Vendor Revenue in Telecommunication Industries A Regression Approach for Forecasting Vendor Revenue in Telecommunication Industries Aida Mustapha *1, Farhana M. Fadzil #2 * Faculty of Computer Science and Information Technology, Universiti Tun Hussein

More information

Data Mining for Fun and Profit

Data Mining for Fun and Profit Data Mining for Fun and Profit Data mining is the extraction of implicit, previously unknown, and potentially useful information from data. - Ian H. Witten, Data Mining: Practical Machine Learning Tools

More information

Data Mining Part 5. Prediction

Data Mining Part 5. Prediction Data Mining Part 5. Prediction 5.1 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Classification vs. Numeric Prediction Prediction Process Data Preparation Comparing Prediction Methods References Classification

More information

Building A Smart Academic Advising System Using Association Rule Mining

Building A Smart Academic Advising System Using Association Rule Mining Building A Smart Academic Advising System Using Association Rule Mining Raed Shatnawi +962795285056 raedamin@just.edu.jo Qutaibah Althebyan +962796536277 qaalthebyan@just.edu.jo Baraq Ghalib & Mohammed

More information

Comparison of K-means and Backpropagation Data Mining Algorithms

Comparison of K-means and Backpropagation Data Mining Algorithms Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and

More information

COURSE RECOMMENDER SYSTEM IN E-LEARNING

COURSE RECOMMENDER SYSTEM IN E-LEARNING International Journal of Computer Science and Communication Vol. 3, No. 1, January-June 2012, pp. 159-164 COURSE RECOMMENDER SYSTEM IN E-LEARNING Sunita B Aher 1, Lobo L.M.R.J. 2 1 M.E. (CSE)-II, Walchand

More information

Knowledge Discovery and Data Mining. Structured vs. Non-Structured Data

Knowledge Discovery and Data Mining. Structured vs. Non-Structured Data Knowledge Discovery and Data Mining Unit # 2 1 Structured vs. Non-Structured Data Most business databases contain structured data consisting of well-defined fields with numeric or alphanumeric values.

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014 RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

More information

A Fast and Efficient Method to Find the Conditional Functional Dependencies in Databases

A Fast and Efficient Method to Find the Conditional Functional Dependencies in Databases International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 3, Issue 5 (August 2012), PP. 56-61 A Fast and Efficient Method to Find the Conditional

More information

Numerical Algorithms for Predicting Sports Results

Numerical Algorithms for Predicting Sports Results Numerical Algorithms for Predicting Sports Results by Jack David Blundell, 1 School of Computing, Faculty of Engineering ABSTRACT Numerical models can help predict the outcome of sporting events. The features

More information

Manjeet Kaur Bhullar, Kiranbir Kaur Department of CSE, GNDU, Amritsar, Punjab, India

Manjeet Kaur Bhullar, Kiranbir Kaur Department of CSE, GNDU, Amritsar, Punjab, India Volume 5, Issue 6, June 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Multiple Pheromone

More information

A fast multi-class SVM learning method for huge databases

A fast multi-class SVM learning method for huge databases www.ijcsi.org 544 A fast multi-class SVM learning method for huge databases Djeffal Abdelhamid 1, Babahenini Mohamed Chaouki 2 and Taleb-Ahmed Abdelmalik 3 1,2 Computer science department, LESIA Laboratory,

More information

Detection of Heart Diseases by Mathematical Artificial Intelligence Algorithm Using Phonocardiogram Signals

Detection of Heart Diseases by Mathematical Artificial Intelligence Algorithm Using Phonocardiogram Signals International Journal of Innovation and Applied Studies ISSN 2028-9324 Vol. 3 No. 1 May 2013, pp. 145-150 2013 Innovative Space of Scientific Research Journals http://www.issr-journals.org/ijias/ Detection

More information

Impact of Boolean factorization as preprocessing methods for classification of Boolean data

Impact of Boolean factorization as preprocessing methods for classification of Boolean data Impact of Boolean factorization as preprocessing methods for classification of Boolean data Radim Belohlavek, Jan Outrata, Martin Trnecka Data Analysis and Modeling Lab (DAMOL) Dept. Computer Science,

More information

Hedging Against Foreign Exchange Risk of Peso-Dollar Rates Using Futures

Hedging Against Foreign Exchange Risk of Peso-Dollar Rates Using Futures Applied Mathematical Sciences, Vol. 8, 2014, no. 110, 5469-5476 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2014.47595 Hedging Against Foreign Exchange Risk of Peso-Dollar Rates Using Futures

More information

Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets

Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets http://info.salford-systems.com/jsm-2015-ctw August 2015 Salford Systems Course Outline Demonstration of two classification

More information

Data, Measurements, Features

Data, Measurements, Features Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are

More information

A Review of Anomaly Detection Techniques in Network Intrusion Detection System

A Review of Anomaly Detection Techniques in Network Intrusion Detection System A Review of Anomaly Detection Techniques in Network Intrusion Detection System Dr.D.V.S.S.Subrahmanyam Professor, Dept. of CSE, Sreyas Institute of Engineering & Technology, Hyderabad, India ABSTRACT:In

More information

Implementation of Data Mining Techniques to Perform Market Analysis

Implementation of Data Mining Techniques to Perform Market Analysis Implementation of Data Mining Techniques to Perform Market Analysis B.Sabitha 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, P.Balasubramanian 4 PG Scholar, Indian Institute of Information Technology, Srirangam,

More information

Forecasting Stock Prices using a Weightless Neural Network. Nontokozo Mpofu

Forecasting Stock Prices using a Weightless Neural Network. Nontokozo Mpofu Forecasting Stock Prices using a Weightless Neural Network Nontokozo Mpofu Abstract In this research work, we propose forecasting stock prices in the stock market industry in Zimbabwe using a Weightless

More information

Customer Classification And Prediction Based On Data Mining Technique

Customer Classification And Prediction Based On Data Mining Technique Customer Classification And Prediction Based On Data Mining Technique Ms. Neethu Baby 1, Mrs. Priyanka L.T 2 1 M.E CSE, Sri Shakthi Institute of Engineering and Technology, Coimbatore 2 Assistant Professor

More information

Internet of Things for Smart Crime Detection

Internet of Things for Smart Crime Detection Contemporary Engineering Sciences, Vol. 7, 2014, no. 15, 749-754 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ces.2014.4685 Internet of Things for Smart Crime Detection Jeong-Yong Byun, Aziz

More information

Pattern Mining and Querying in Business Intelligence

Pattern Mining and Querying in Business Intelligence Pattern Mining and Querying in Business Intelligence Nittaya Kerdprasop, Fonthip Koongaew, Phaichayon Kongchai, Kittisak Kerdprasop Data Engineering Research Unit, School of Computer Engineering, Suranaree

More information

Enhancing Quality of Data using Data Mining Method

Enhancing Quality of Data using Data Mining Method JOURNAL OF COMPUTING, VOLUME 2, ISSUE 9, SEPTEMBER 2, ISSN 25-967 WWW.JOURNALOFCOMPUTING.ORG 9 Enhancing Quality of Data using Data Mining Method Fatemeh Ghorbanpour A., Mir M. Pedram, Kambiz Badie, Mohammad

More information

A CRM -DATA MINING FRAMEWORK FOR DIRECT BANK MARKETING USING CLASSIFICATION MODEL

A CRM -DATA MINING FRAMEWORK FOR DIRECT BANK MARKETING USING CLASSIFICATION MODEL A CRM -DATA MINING FRAMEWORK FOR DIRECT BANK MARKETING USING CLASSIFICATION MODEL Miss. Akanksha P. Ingole 1, Prof. P.D.Thakare 2 1 M.E. Department of Computer science & Engineering, Jagadamba college

More information

Available online at Math. Finance Lett. 2014, 2014:2 ISSN A RULE OF THUMB FOR REJECT INFERENCE IN CREDIT SCORING

Available online at  Math. Finance Lett. 2014, 2014:2 ISSN A RULE OF THUMB FOR REJECT INFERENCE IN CREDIT SCORING Available online at http://scik.org Math. Finance Lett. 2014, 2014:2 ISSN 2051-2929 A RULE OF THUMB FOR REJECT INFERENCE IN CREDIT SCORING GUOPING ZENG AND QI ZHAO Think Finance, 4150 International Plaza,

More information

Comparison of Data Mining Techniques used for Financial Data Analysis

Comparison of Data Mining Techniques used for Financial Data Analysis Comparison of Data Mining Techniques used for Financial Data Analysis Abhijit A. Sawant 1, P. M. Chawan 2 1 Student, 2 Associate Professor, Department of Computer Technology, VJTI, Mumbai, INDIA Abstract

More information

End-User Satisfaction in ERP System: Application of Logit Modeling

End-User Satisfaction in ERP System: Application of Logit Modeling Applied Mathematical Sciences, Vol. 8, 2014, no. 24, 1187-1192 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2014.4284 End-User Satisfaction in ERP System: Application of Logit Modeling Hashem

More information

DATA PREPARATION FOR DATA MINING

DATA PREPARATION FOR DATA MINING Applied Artificial Intelligence, 17:375 381, 2003 Copyright # 2003 Taylor & Francis 0883-9514/03 $12.00 +.00 DOI: 10.1080/08839510390219264 u DATA PREPARATION FOR DATA MINING SHICHAO ZHANG and CHENGQI

More information

Open Access A Facial Expression Recognition Algorithm Based on Local Binary Pattern and Empirical Mode Decomposition

Open Access A Facial Expression Recognition Algorithm Based on Local Binary Pattern and Empirical Mode Decomposition Send Orders for Reprints to reprints@benthamscience.ae The Open Electrical & Electronic Engineering Journal, 2014, 8, 599-604 599 Open Access A Facial Expression Recognition Algorithm Based on Local Binary

More information

Introduction to Learning & Decision Trees

Introduction to Learning & Decision Trees Artificial Intelligence: Representation and Problem Solving 5-38 April 0, 2007 Introduction to Learning & Decision Trees Learning and Decision Trees to learning What is learning? - more than just memorizing

More information

DATA MINING TOOL FOR INTEGRATED COMPLAINT MANAGEMENT SYSTEM WEKA 3.6.7

DATA MINING TOOL FOR INTEGRATED COMPLAINT MANAGEMENT SYSTEM WEKA 3.6.7 DATA MINING TOOL FOR INTEGRATED COMPLAINT MANAGEMENT SYSTEM WEKA 3.6.7 UNDER THE GUIDANCE Dr. N.P. DHAVALE, DGM, INFINET Department SUBMITTED TO INSTITUTE FOR DEVELOPMENT AND RESEARCH IN BANKING TECHNOLOGY

More information

CLASSIFICATION JELENA JOVANOVIĆ. Web:

CLASSIFICATION JELENA JOVANOVIĆ.   Web: CLASSIFICATION JELENA JOVANOVIĆ Email: jeljov@gmail.com Web: http://jelenajovanovic.net OUTLINE What is classification? Binary and multiclass classification Classification algorithms Performance measures

More information

Beating the MLB Moneyline

Beating the MLB Moneyline Beating the MLB Moneyline Leland Chen llxchen@stanford.edu Andrew He andu@stanford.edu 1 Abstract Sports forecasting is a challenging task that has similarities to stock market prediction, requiring time-series

More information

Visualization of large data sets using MDS combined with LVQ.

Visualization of large data sets using MDS combined with LVQ. Visualization of large data sets using MDS combined with LVQ. Antoine Naud and Włodzisław Duch Department of Informatics, Nicholas Copernicus University, Grudziądzka 5, 87-100 Toruń, Poland. www.phys.uni.torun.pl/kmk

More information

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015 An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content

More information

Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification

Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Tina R. Patil, Mrs. S. S. Sherekar Sant Gadgebaba Amravati University, Amravati tnpatil2@gmail.com, ss_sherekar@rediffmail.com

More information

Research of Postal Data mining system based on big data

Research of Postal Data mining system based on big data 3rd International Conference on Mechatronics, Robotics and Automation (ICMRA 2015) Research of Postal Data mining system based on big data Xia Hu 1, Yanfeng Jin 1, Fan Wang 1 1 Shi Jiazhuang Post & Telecommunication

More information

Choosing the Best Classification Performance Metric for Wrapper-based Software Metric Selection for Defect Prediction

Choosing the Best Classification Performance Metric for Wrapper-based Software Metric Selection for Defect Prediction Choosing the Best Classification Performance Metric for Wrapper-based Software Metric Selection for Defect Prediction Huanjing Wang Western Kentucky University huanjing.wang@wku.edu Taghi M. Khoshgoftaar

More information

Fraudulent Behavior Forecast in Telecom Industry Based on Data Mining Technology

Fraudulent Behavior Forecast in Telecom Industry Based on Data Mining Technology Communications of the IIMA Volume 7 Issue 4 Article 1 2007 Fraudulent Behavior Forecast in Telecom Industry Based on Data Mining Technology Sen Wu School of Economics and Management University of Science

More information

Data Mining Solutions for the Business Environment

Data Mining Solutions for the Business Environment Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania ruxandra_stefania.petre@yahoo.com Over

More information

A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH

A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH 205 A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH ABSTRACT MR. HEMANT KUMAR*; DR. SARMISTHA SARMA** *Assistant Professor, Department of Information Technology (IT), Institute of Innovation in Technology

More information

The Role of Size Normalization on the Recognition Rate of Handwritten Numerals

The Role of Size Normalization on the Recognition Rate of Handwritten Numerals The Role of Size Normalization on the Recognition Rate of Handwritten Numerals Chun Lei He, Ping Zhang, Jianxiong Dong, Ching Y. Suen, Tien D. Bui Centre for Pattern Recognition and Machine Intelligence,

More information

Cluster analysis and Association analysis for the same data

Cluster analysis and Association analysis for the same data Cluster analysis and Association analysis for the same data Huaiguo Fu Telecommunications Software & Systems Group Waterford Institute of Technology Waterford, Ireland hfu@tssg.org Abstract: Both cluster

More information

Dr. Antony Selvadoss Thanamani, Head & Associate Professor, Department of Computer Science, NGM College, Pollachi, India.

Dr. Antony Selvadoss Thanamani, Head & Associate Professor, Department of Computer Science, NGM College, Pollachi, India. Enhanced Approach on Web Page Classification Using Machine Learning Technique S.Gowri Shanthi Research Scholar, Department of Computer Science, NGM College, Pollachi, India. Dr. Antony Selvadoss Thanamani,

More information

Methodology Framework for Analysis and Design of Business Intelligence Systems

Methodology Framework for Analysis and Design of Business Intelligence Systems Applied Mathematical Sciences, Vol. 7, 2013, no. 31, 1523-1528 HIKARI Ltd, www.m-hikari.com Methodology Framework for Analysis and Design of Business Intelligence Systems Martin Závodný Department of Information

More information

Web Document Clustering

Web Document Clustering Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,

More information

A Novel Feature Selection Method Based on an Integrated Data Envelopment Analysis and Entropy Mode

A Novel Feature Selection Method Based on an Integrated Data Envelopment Analysis and Entropy Mode A Novel Feature Selection Method Based on an Integrated Data Envelopment Analysis and Entropy Mode Seyed Mojtaba Hosseini Bamakan, Peyman Gholami RESEARCH CENTRE OF FICTITIOUS ECONOMY & DATA SCIENCE UNIVERSITY

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association

More information

Research on Clustering Analysis of Big Data Yuan Yuanming 1, 2, a, Wu Chanle 1, 2

Research on Clustering Analysis of Big Data Yuan Yuanming 1, 2, a, Wu Chanle 1, 2 Advanced Engineering Forum Vols. 6-7 (2012) pp 82-87 Online: 2012-09-26 (2012) Trans Tech Publications, Switzerland doi:10.4028/www.scientific.net/aef.6-7.82 Research on Clustering Analysis of Big Data

More information

Personalized Expedia Hotel Searches

Personalized Expedia Hotel Searches Personalized Expedia Hotel Searches Xinxing Jiang, Yao Xiao, and Shunji Li Stanford University December 13, 2013 Abstract In this paper, we propose machine learning algorithms with search data of Expedia

More information

Decision Trees for Mining Data Streams Based on the Gaussian Approximation

Decision Trees for Mining Data Streams Based on the Gaussian Approximation International Journal of Computer Sciences and Engineering Open Access Review Paper Volume-4, Issue-3 E-ISSN: 2347-2693 Decision Trees for Mining Data Streams Based on the Gaussian Approximation S.Babu

More information

Data Mining in Web Search Engine Optimization and User Assisted Rank Results

Data Mining in Web Search Engine Optimization and User Assisted Rank Results Data Mining in Web Search Engine Optimization and User Assisted Rank Results Minky Jindal Institute of Technology and Management Gurgaon 122017, Haryana, India Nisha kharb Institute of Technology and Management

More information

Neural Networks in Data Mining

Neural Networks in Data Mining IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 03 (March. 2014), V6 PP 01-06 www.iosrjen.org Neural Networks in Data Mining Ripundeep Singh Gill, Ashima Department

More information

Word Spotting in Cursive Handwritten Documents using Modified Character Shape Codes

Word Spotting in Cursive Handwritten Documents using Modified Character Shape Codes Word Spotting in Cursive Handwritten Documents using Modified Character Shape Codes Sayantan Sarkar Department of Electrical Engineering, NIT Rourkela sayantansarkar24@gmail.com Abstract.There is a large

More information

Grid Density Clustering Algorithm

Grid Density Clustering Algorithm Grid Density Clustering Algorithm Amandeep Kaur Mann 1, Navneet Kaur 2, Scholar, M.Tech (CSE), RIMT, Mandi Gobindgarh, Punjab, India 1 Assistant Professor (CSE), RIMT, Mandi Gobindgarh, Punjab, India 2

More information

Experiments in Web Page Classification for Semantic Web

Experiments in Web Page Classification for Semantic Web Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address

More information

Use of Data Mining Techniques to Improve the Effectiveness of Sales and Marketing

Use of Data Mining Techniques to Improve the Effectiveness of Sales and Marketing Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 4, April 2015,

More information

What is Customer Relationship Management? Customer Relationship Management Analytics. Customer Life Cycle. Objectives of CRM. Three Types of CRM

What is Customer Relationship Management? Customer Relationship Management Analytics. Customer Life Cycle. Objectives of CRM. Three Types of CRM Relationship Management Analytics What is Relationship Management? CRM is a strategy which utilises a combination of Week 13: Summary information technology policies processes, employees to develop profitable

More information

International Journal of Electronics and Computer Science Engineering 1449

International Journal of Electronics and Computer Science Engineering 1449 International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and

More information

SPATIAL DATA CLASSIFICATION AND DATA MINING

SPATIAL DATA CLASSIFICATION AND DATA MINING , pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal

More information

Action Reducts. Computer Science Department, University of Pittsburgh at Johnstown, Johnstown, PA 15904, USA 2

Action Reducts. Computer Science Department, University of Pittsburgh at Johnstown, Johnstown, PA 15904, USA 2 Action Reducts Seunghyun Im 1, Zbigniew Ras 2,3, Li-Shiang Tsay 4 1 Computer Science Department, University of Pittsburgh at Johnstown, Johnstown, PA 15904, USA sim@pitt.edu 2 Computer Science Department,

More information

Semi-Supervised and Unsupervised Machine Learning. Novel Strategies

Semi-Supervised and Unsupervised Machine Learning. Novel Strategies Brochure More information from http://www.researchandmarkets.com/reports/2179190/ Semi-Supervised and Unsupervised Machine Learning. Novel Strategies Description: This book provides a detailed and up to

More information

IMPROVING QUALITY OF FEEDBACK MECHANISM

IMPROVING QUALITY OF FEEDBACK MECHANISM IMPROVING QUALITY OF FEEDBACK MECHANISM IN UN BY USING DATA MINING TECHNIQUES Mohammed Alnajjar 1, Prof. Samy S. Abu Naser 2 1 Faculty of Information Technology, The Islamic University of Gaza, Palestine

More information

Getting Even More Out of Ensemble Selection

Getting Even More Out of Ensemble Selection Getting Even More Out of Ensemble Selection Quan Sun Department of Computer Science The University of Waikato Hamilton, New Zealand qs12@cs.waikato.ac.nz ABSTRACT Ensemble Selection uses forward stepwise

More information

Chapter 6. The stacking ensemble approach

Chapter 6. The stacking ensemble approach 82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

More information

Robust Outlier Detection Technique in Data Mining: A Univariate Approach

Robust Outlier Detection Technique in Data Mining: A Univariate Approach Robust Outlier Detection Technique in Data Mining: A Univariate Approach Singh Vijendra and Pathak Shivani Faculty of Engineering and Technology Mody Institute of Technology and Science Lakshmangarh, Sikar,

More information

Prediction of Stock Performance Using Analytical Techniques

Prediction of Stock Performance Using Analytical Techniques 136 JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 5, NO. 2, MAY 2013 Prediction of Stock Performance Using Analytical Techniques Carol Hargreaves Institute of Systems Science National University

More information

A Framework for Data Migration between Various Types of Relational Database Management Systems

A Framework for Data Migration between Various Types of Relational Database Management Systems A Framework for Data Migration between Various Types of Relational Database Management Systems Ahlam Mohammad Al Balushi Sultanate of Oman, International Maritime College Oman ABSTRACT Data Migration is

More information

CHURN PREDICTION IN MOBILE TELECOM SYSTEM USING DATA MINING TECHNIQUES

CHURN PREDICTION IN MOBILE TELECOM SYSTEM USING DATA MINING TECHNIQUES International Journal of Scientific and Research Publications, Volume 4, Issue 4, April 2014 1 CHURN PREDICTION IN MOBILE TELECOM SYSTEM USING DATA MINING TECHNIQUES DR. M.BALASUBRAMANIAN *, M.SELVARANI

More information