FEATURE SELECTIONS AND CLASSIFICATION MODEL FOR CUSTOMER CHURN
|
|
|
- Meagan Wilkins
- 9 years ago
- Views:
Transcription
1 FEATURE SELECTIONS AND CLASSIFICATION MODEL FOR CUSTOMER CHURN 1 SYAFIQAH NAFIS, 2 MOKHAIRI MAKHTAR, 3 MOHD KHALID AWANG, 4 M. NORDIN A.RAHMAN, 5 MUSTAFA MAT DERIS 1, 2, 3, 4 Faculty of Informatics and Computing, Universiti Sultan Zainal Abidin, Kuala Terengganu, Malaysia 5 Faculties of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Malaysia 1 [email protected], 2 [email protected], 3 [email protected], 4 [email protected], 5 [email protected] ABSTRACT As customers actively exercise their right to change to a better service and since engaging new customers is more costly compared to retaining loyal customers, customer churn has become the main focus for one organization. This phenomenon affects many industries such as telecommunication companies which need to provide excellent service in order to win over the competition. Several models were developed in previous research using various methods such as the conventional statistical method, decision tree based model and neural network based approach in predicting customer churn. Several experiments were conducted in this research for feature selection and classification from selected customer churn dataset to compare its usefulness among the different feature selections and classifications using a data mining tool. The results from the experiments showed that the Logistic Model Tree (LMT) method is the best method for this dataset with a 95% accuracy enhanced using neural network from previous research. Keywords: Customer Churn, Classification, Feature Selection, Telecommunications, Data Mining 1. INTRODUCTION Telecommunications is the number one requirement today of the young and the elderly due to advances in computer and network technologies. The telecommunications industry has grown well due to this phenomenon. However, there is stiff competition amongst the telecommunication companies since customers actively exercise their right to switch from one provider to another which satisfies their needs defined as, churn. Churn is defined as turnover action by a customer of a business or service. There are also common clashes whereby customers demand excellent service with a cheaper price while are strictly focused on their business goal i.e. to increase the churn rate. The fact is that acquiring a new customer is more expensive as compared to retaining existing subscribers as mentioned in previous research [1]. Econsultancy is the leading source of independent advice and insight on digital marketing and e-commerce founded in Econsultancy published the Cross-channel Marketing Report 2013 which denied some compelling statistics for those who believed that customer retention is undervalued as illustrated in Figure 1 and Figure 2. Figure 1 illustrates a company s concern on relationship marketing. It shows that only 30% of its staff is committed to customer relationship management and about 22% do not really care about relationship marketing which in turn indicates that their awareness of customer relationship is relatively minimal. Figure 2 shows that 70% of the respondents agree that it is cheaper to retain customers than to acquire new customers which supports [1]. 356
2 Does your company carry out relationship marketing? Yes, we are Yes, to a certain very commited extent to this No, we don't Figure 1.Company Awareness on Relationship Marketing Figure 2.Customer Relationship Management Awareness. Data Mining (DM) has been proposed as a technique to predict customer churn. Decision tree, neural network and Naïve Bayes are a few examples of classification methods in data mining. Classification refers to a process in data mining for pattern extraction and categorization. Rapid developments in the classification field in recent years have triggered researchers to develop a model to overcome the churn problem in s. Past research has proven that classification is capable of assisting Customer Relationship Management (CRM) concerning churn management as in [2]. CRM is a huge concern in this issue to optimize their marketing strategies or take corrective action to prevent customer churn as well as to maintain and improve their services. In short, if CRMs are fully aware which customers are at high risk of churn, they can design a treatment program to address the issue. For instance, CRMs can offer new telecommunication packages or services to retain 357
3 existing customers as well as attract new subscribers. In this paper, it targets to evaluate usefulness among different feature selections and classifications in order to identify the best model to classify customer churn, besides improving accuracy using Neural Network in [3] by demonstrating several experiments for feature selection and classification from selected customer churn dataset. This paper is organized as follows: In section 2, it describes related work in customer churn, feature selection and classification. Section 3 gives detailed explanation on the tested dataset, software tool used and classification method applied. Section 4 discusses the experiment and analysis followed by the conclusion and future works in section LITERATURE REVIEW 2.1 Related Work Recently customer churn is the main concern for many companies to maximize profit. Investment in CRM technology has become crucial because of the competitive growth of telecommunication companies which has reached the stage of maturity. Besides, many highly competitive organizations understand that retaining existing and valuable customers is their core managerial strategy to survive in the industry [4]. Churn can be divided into two types which is voluntary churn and involuntary churn. Voluntary churn is varied and more complex. For example, due to the poor service provided, a customer decides to terminate the service. Meanwhile, if a customer is disconnected by the service provider for fraud or non-payment, it is categorized as involuntary churn. The focus in this paper is to overcome voluntary churn because voluntary churn is an unexpected occurrence. Based on a thorough study in churn management, it was found that a large number of studies have been conducted in various areas. Thus, this research identified which method is the best for customer churn classification. Table 1 below, explains briefly the various methods used in customer churn prediction in different domains. Table 1.Previous Research in Different Domains Case Study Cellular Network Services [3] Method Neural Network Based approach Insurance Customer [5] Churn Index and Decision Tree CART. Telecommunications Industry [6] Decision Tree Analysis Food Industry [7] Random Forest and Boosted Trees Techniques Telecommunication Company [8] Mobile Industry [ 9] Telephony Wireless Industry [10] Australia Industry [11] Nursing Bank Customer (India scenario) [12] Neural Network (Back Propagation) Naïve Bayes and Bayesian Network Naïve Bayes Qualitative Design CART, C Classification Model Generally, classification can be referred to as a process to categorize objects according to the characteristics of the objects. Classification is part of the data mining process. In data mining, classification is defined as an analyzing task for a set of pre-classified data objects to study a model or function that can be used or applied to unseen data objects before being placed into one of several predefined classes [13]. Classification has been widely used in many domains such as the application of classification in multimedia data management [14], in classification [15] and images classification [16]. A thorough search of classification methodologies used in literature revealed that no one methodology can be served as a recipe for a comparison of all computational classification methods [17]. 2.3 Feature Selection Method Feature selection is one of the important phases in data pre-processing for data mining. The objective of feature selection is to select the best subset containing the least number of dimensions that contribute the most to accuracy, and discards the remaining, unimportant dimensions [18]. Basically, two approaches are used in feature selection that is the forward selection and backward selection. This paper uses the forward selection in the Best First feature selection method. Two feature selection evaluators are used. They are the 358
4 CfsSubsetEvaluator and the Leave One Out evaluator. The two feature sets are defined as follows: a) CfsSubsetEvaluator Phone, International Plan, Day Min and Customer Service Call b) Leave One out Evaluator Voice Mail Plan, Day Min, Eve Min and Customer Service Call 3. CASE STUDY 3.1 WEKA WEKA was selected as a DM tool for feature selection and classification because of its simple interface as used in [19]. It is developed on Java platform which provides a collection of machine learning and data mining algorithms for data classification, clustering, association rule, and evaluation [20]. WEKA is a software which is distributed under the GNU General Public License, followed by some lessons learned over a period spanning its development and maintenance [21]. In this paper, the WEKA tool was selected for feature selection and classification due to its open source nature and ease of use of its interface which allows the user to apply data mining methods directly to the selected dataset. 3.2 Dataset The challenge in finding dataset is business confidentiality and privacy. Thus, this paper used a dataset acquired from UCI Repository of Machine Learning Database of the University of California Irvine. The classification is implemented on WEKA, a tool for data mining. The dataset consists of 3,333 cleaned objects and 20 instances along with one indicator whether or not to churn. The instances are briefly explained as follows: State: Categorical variable, for the 50 states and the district of Columbia Account length: Integer-valued variable for how long account has been active Area code: Categorical variable Phone number: Essentially a surrogate key for customer identification International Plan: Dichotomous categorical having yes or no value Voice Mail Plan: Dichotomous categorical variable yes or no value Number of voice mail messages: Integervalued variable Total day minutes: Continuous variable for number of minutes customer has used the service during the day Total day calls: Integer-valued variable Total day charge: Continuous variable based on foregoing two variables Total evening minutes: Continuous variable for number of minutes customer has used the service during the evening Total evening calls: Integer-valued variable Total evening charge: Continuous variable based on previous two variables Total night minutes: Continuous variable for storing minutes the customer has used the service during the night Total night calls: Integer-valued variable Total night charge: Continuous variable based on foregoing two variables Total international minutes: Continuous variable for minutes customer has used service to make international calls Total international calls: Integer-valued variable Total international charge: Continuous variable based on foregoing two variables Number of calls to customer service: Integer-valued variable In this research, 10 fold-cross validations were used. The dataset was tested using four different classification methods which include: 359
5 decision tree, decision rules, Naïve Bayes and 4 neural networks. These four types of methods were chosen because they are commonly used in data mining Decision Tree Decision tree is one of the well-known classifiers among researchers. It creates a decision tree from data to classify a class. A decision-tree model classifies an instance by sorting it through the tree to the appropriate leaf node, i.e. each node represents a classification [4]. A common decision tree learning algorithm adopts a top-down recursive divide-and conquer strategy to construct a decision tree [13]. Figure 3 shows a simple example of a decision tree. Figure 3. Example Of A Decision Tree It starts from a root node representing the whole training data at the top. The data is split into two or more subsets (nodes) based on the values of a chosen attribute. This results in a decision tree in which data is partitioned into several nodes or leaves along branches. For each subset a child node is created and the subset is associated with the child at the bottom. Then the process is repeated separately on the data in each of the child nodes, and so on, until a termination criterion is satisfied. Many decision tree learning algorithms exist such as Random Tree, Random Forest and C Decision Rules Decision rules are a set of simple if-then rules. The example of decision rule is if age < 9 and gender = L, then the example is classified as Class X. Thus, this proves that it is an expressive, yet simple and human readable classification. Examples of decision rules are RIPPER and PART Naïve Bayes The Naïve Bayes classifier is a simple classifier to use and efficient to learn because it is based on Bayes theorem [13]. Bayes Theorem, P(C i X) = P(X C i )P(C i) P(X) Naïve Bayes just requires considering one attribute in each class separately from the training data, thus making it easier and faster. Naïve Bayes always competes well with more sophisticated classifiers and is one of the most effective and efficient inductive learning algorithms for data mining along with machine learning [22] Neural Network Neural network also refers to artificial neural network. It simply means a study to mimic the human brain although the human brain is much more complex compared to the artificial neural network developed so far. The neural network technique consists of developing a computer based decision support system to perform as a human brain. Previous research on neural network showed better performance in predicting diseases in the medical field [23]. But Multilayer Perceptron is the simplest and commonly used neural network architecture. Figure 4. Example of a Neural Network 4. RESULTS AND DISCUSSION Few experiments have been conducted to find the best feature selection and classification for experimented datasets with 10-fold cross validation. There are four classification techniques used with different feature selections and classifiers, which include Decision Tree, Naïve Bayes, Decision Rules and Neural Network. Accuracy is used to evaluate the results from the classifiers. Table 2 lists the results for decision tree classifications. Results show that the J48 method gives the highest accuracy of % without any feature selection. Accuracy for all decision tree methods decreases or remains the same if feature selection is included in the experiment. This result indicates that feature selection is not needed in 360
6 decision tree classification for the selected dataset because decision tree classification comprises a flowchart like structure whereby removing one attribute could affect the flowchart. Table 3 above lists Naïve Bayes classification methods. Bayes Net with feature selection (a) gives the highest accuracy of %. But the rest of the methods decrease in accuracy with feature selection. At this point, it can be concluded that Naïve Bayes classification methods do not need feature selection beforehand because Naïve Bayes classifiers are based on the probability model. Table 4 shows the classification methods for Decision Rule. JRip without feature selection provides the highest accuracy (94.99%). As in Decision Tree classification, add-in feature selection in Decision Rule classification reduces accuracy. This shows that all instances are important for classifying with regards to Decision Rule classifiers Table 2. Decision Tree Classification Name of Method Decision Hoeffding J48 Random Random REPTree LMT Stump Tree Forest Tree No feature selection accuracy (%) Accuracy with feature selection (a)(%) Accuracy with feature selection (b)(%) Table 3. Naïve Bayes Classification Name of Method Bayes Net Naïve Bayes Naïve Bayes Updateable Accuracy without feature selection (%) Accuracy with feature selection (a) (%) Accuracy with feature selection (a) (%) Table 4. Decision Table Classification Name Method of Accuracy without feature selection (%) Accuracy with feature selection (a) (%) Accuracy with feature selection (b) (%) Decision Table JRip OneR PART
7 Table 5. Various classification methods with state and phone attributes removed Type of classification Name of Method Accuracy (%) Neural Network Multilayer Perceptron Neural Network Voted Perceptron Decision Tree J Decision Tree LMT Decision Rule PART Decision Rule JRip Naïve Bayes Naïve Bayes Naïve Bayes Naïve Bayes Updateable Table 5 summarizes the best method from various classification techniques. This experiment is based on [3] which removed state and phone since these variables were used for identification purposes only. Thus, this step can be considered as the feature selection. No feature selection method in WEKA is used. Logistic Model Trees (LMT) achieved the highest accuracy of 95.32%. In short, it can be concluded that feature selection is needed but selecting the right feature selection is crucial. Therefore, this calls for future research. Multiple Classification Accuracy Multiple Classification Accuracy Figure 5. Multiple Classification Accuracy using Different Feature Selections Figure 5 and Figure 6 summarize the best classification methods among the four classifiers. Figure 1 shows accuracy in having a feature selection (b) and not having a feature selection (b). Figure 2 shows the best classification method when attributes such as state and phone are removed. LMT classifier secured the highest accuracy among the other five classifiers. 362
8 Multiple Classification Accuracy Multilayer Perceptron J48 LMT PART Jrip Figure 6. Multiple Classification Accuracy with state and phone attributes removed Table 6. Confusion Matrix for JRip Classifier Multiple Classification Accuracy JRip Classifier True Label False True Totals False True Total Table 7. Confusion Matrix for LMT Classifier LMT Classifier True Label False True Totals False True Total Table 6 and Table 7 show two different confusion matrixes for LMT and JRip. The LMT classifier classified about 90% churners while JRip was able to predict 89% churners. In short, the LMT is relatively better in predicting customer churn. Hence, it is the best classification model for this dataset but at the same time the LMT is also good for classifying non-churners with 96% accuracy. However, it is crucial to predict the churners as compared to the non-churners class as the objective of the company is to maximize profit by focusing on churners. For the non-churners, error in classifying them would not affect the CRM or the overall company profit. 5. CONCLUSION Churn prediction is crucial not only in the telecommunications industry since customers are aware of their right to choose the best service. 363
9 Thus, having the best prediction model would assist in retaining loyal customers in CRM. For the tested dataset, it can be concluded that it cannot represent the current telecommunications phenomenon and do not fit in with the case in Malaysia even though the Logistic Model Tree gives the best accuracy of 95% for tested dataset. The result would be different if different datasets were tested using the same method. For example, the latest and current mobile telecommunication phenomenon would have different features, which would impact on the feature selection sets for example, its mobile data usage and roaming usage. For future work, several issues could be considered. First, careful attention needs to be exercised when choosing feature selection in order to improve churn accuracy and that only important features are selected. For example, its application in soft computing such as Rough Set Theory needs to be considered. Secondly, it is a huge success if a model can predict churners beforehand although it is better to know when customers churn. Throughout this research, data limitation is identified. Experimented dataset do not represent current situation in Malaysia since it is retrieved from established repository due to business confidentiality and privacy. Finally, models can be tested for its fit for purpose with the latest/current telecommunications or mobile network sector ACKNOWLEDGEMENTS This work is partially supported by Universiti Sultan Zainal Abidin (UniSZA) and The Ministry of Education, Malaysia (Grant No.FRGS/2/2013/ICT07/UniSZA/02/2). REFERENCES: [1] J. Lu and D. Ph, Predicting Customer Churn in the Telecommunications Industry An Application of Survival Analysis Modeling Using SAS â, [2] J. Hadden, A. Tiwari, R. Roy, and D. Ruta, Churn Prediction using Complaints Data, no. 1999, [3] A. Sharma, A Neural Network based Approach for Predicting Customer Churn in Cellular Network Services, vol. 27, no. 11, pp , [4] C.-F. Tsai and M.-Y. Chen, Variable selection by association rules for customer churn prediction of multimedia on demand, Expert Syst. Appl., vol. 37, no. 3, pp , Mar [5] R. A. Soeini and K. V. Rodpysh, Applying Data Mining to Insurance Customer Churn Management, vol. 30, pp , [6] N. A. Haris, O. Support, and S. Division, Data Mining in Churn Analysis Model for Telecommunication Industry, vol. 1, no. 19, pp , [7] S. Nabavi and S. Jafari, Providing a Customer Churn Prediction Model Using Random Forest and Boosted Trees Techniques ( Case Study: Solico Food Industries Group ), vol. 3, no. 6, pp , [8] R. J. Jadhav, Churn Prediction in Telecommunication Using Data Mining Technology, vol. 2, no. 2, pp , [9] C. Kirui, L. Hong, W. Cheruiyot, and H. Kirui, Predicting Customer Churn in Mobile Telephony Industry Using Probabilistic Classifiers in Data Mining, vol. 10, no. 2, pp , [10] S. V Nath, Customer Churn Analysis in the Wireless Industry: A Data Mining Approach Customer Churn Analysis in the Wireless Industry: A Data Mining Approach, no. 561, pp. 1 20, [11] A. J. Dawson, H. Stasa, M. a Roche, C. S. E. Homer, and C. Duffield, Nursing churn and turnover in Australian hospitals: Nurses perceptions and suggestions for supportive strategies., BMC Nurs., vol. 13, p. 11, [12] U. D. Prasad, Prediction of Churn Behavior of Bank Customers, Bus. Intell. J., vol. 5, pp , [13] A. An, Classification Methods, pp , Jun [14] M. N. A. Rahman, Y. M. Lazim, and F. Mohamed, Applying Rough Set Theory in Multimedia Data Classification, vol. 1, no. 3, pp , [15] Z. Z. Z. Zhu, An Classification Model Based on Rough Set and Support Vector Machine, 2008 Fifth Int. Conf. Fuzzy Syst. Knowl. Discov., vol. 5, pp , [16] N. S. Kamarudin, M. Makhtar, S. A. Fadzli, M. Mohamad, F. S. Mohamad, M. F. Abdul Kadir, Comparison of Image Classification Techniques Using Caltech 101 Dataset, vol. 71, no. 1,
10 [17] Q. A. Al-radaideh, The Impact of Classification Evaluation Methods on Rough Set Based Classifier, no. 1, pp. 2 6, [18] L. Ladha and T. Deepa, Feature Selection Methods and Algorithms, Int. J. Comput. Sci. Eng., vol. 3, pp , [19] P. Ozer, Data Mining Algorithms for Classification, no. January, [20] A. A. A. Hafieza Ismail, Fadhilah Ahmad, Seminar Penyelidikan Siswazah UniSZA Peringkat Kebangsaan ( SEMPSIS ), Implementing WEKA as a Data Mining Tool to Analyze Students Academic Performances using Naïve Bayes Classifier Nur Hafieza Ismail, Fadhilah Ahmad, Azwa Abdul Aziz University Sultan, no. July 2011, [21] M. A. Hall and I. H. Witten, WEKA Experiences with a Java Open-Source Project, J. Mach. Learn. Res., vol. 11, pp , [22] A. M. Taha, A. Mustapha, and S. Der Chen, Naive Bayes-guided bat algorithm for feature selection, Sci. World J., vol. 2013, [23] D. R. Chowdhury, M. Chatterjee, and R. K. Samanta, An Artificial Neural Network Model for Neonatal Disease Diagnosis, Int. J. Artif. Intell. Expert Syst., vol. 2, pp ,
First Semester Computer Science Students Academic Performances Analysis by Using Data Mining Classification Algorithms
First Semester Computer Science Students Academic Performances Analysis by Using Data Mining Classification Algorithms Azwa Abdul Aziz, Nor Hafieza IsmailandFadhilah Ahmad Faculty Informatics & Computing
DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES
DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES Vijayalakshmi Mahanra Rao 1, Yashwant Prasad Singh 2 Multimedia University, Cyberjaya, MALAYSIA 1 [email protected]
A Neural Network based Approach for Predicting Customer Churn in Cellular Network Services
A Neural Network based Approach for Predicting Customer Churn in Cellular Network Services Anuj Sharma Information Systems Area Indian Institute of Management, Indore, India Dr. Prabin Kumar Panigrahi
Enhanced Boosted Trees Technique for Customer Churn Prediction Model
IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 03 (March. 2014), V5 PP 41-45 www.iosrjen.org Enhanced Boosted Trees Technique for Customer Churn Prediction
Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification
Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Tina R. Patil, Mrs. S. S. Sherekar Sant Gadgebaba Amravati University, Amravati [email protected], [email protected]
A Content based Spam Filtering Using Optical Back Propagation Technique
A Content based Spam Filtering Using Optical Back Propagation Technique Sarab M. Hameed 1, Noor Alhuda J. Mohammed 2 Department of Computer Science, College of Science, University of Baghdad - Iraq ABSTRACT
BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL
The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University
Churn Prediction. Vladislav Lazarov. Marius Capota. [email protected]. [email protected]
Churn Prediction Vladislav Lazarov Technische Universität München [email protected] Marius Capota Technische Universität München [email protected] ABSTRACT The rapid growth of the market
EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH
EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH SANGITA GUPTA 1, SUMA. V. 2 1 Jain University, Bangalore 2 Dayanada Sagar Institute, Bangalore, India Abstract- One
International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
Introduction to Data Mining
Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association
Prediction of Heart Disease Using Naïve Bayes Algorithm
Prediction of Heart Disease Using Naïve Bayes Algorithm R.Karthiyayini 1, S.Chithaara 2 Assistant Professor, Department of computer Applications, Anna University, BIT campus, Tiruchirapalli, Tamilnadu,
ENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA
ENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA D.Lavanya 1 and Dr.K.Usha Rani 2 1 Research Scholar, Department of Computer Science, Sree Padmavathi Mahila Visvavidyalayam, Tirupati, Andhra Pradesh,
DATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
Data quality in Accounting Information Systems
Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam
Using Data Mining for Mobile Communication Clustering and Characterization
Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer
Rule based Classification of BSE Stock Data with Data Mining
International Journal of Information Sciences and Application. ISSN 0974-2255 Volume 4, Number 1 (2012), pp. 1-9 International Research Publication House http://www.irphouse.com Rule based Classification
How To Identify A Churner
2012 45th Hawaii International Conference on System Sciences A New Ensemble Model for Efficient Churn Prediction in Mobile Telecommunication Namhyoung Kim, Jaewook Lee Department of Industrial and Management
Keywords Data mining, Classification Algorithm, Decision tree, J48, Random forest, Random tree, LMT, WEKA 3.7. Fig.1. Data mining techniques.
International Journal of Emerging Research in Management &Technology Research Article October 2015 Comparative Study of Various Decision Tree Classification Algorithm Using WEKA Purva Sewaiwar, Kamal Kant
ANALYSIS OF FEATURE SELECTION WITH CLASSFICATION: BREAST CANCER DATASETS
ANALYSIS OF FEATURE SELECTION WITH CLASSFICATION: BREAST CANCER DATASETS Abstract D.Lavanya * Department of Computer Science, Sri Padmavathi Mahila University Tirupati, Andhra Pradesh, 517501, India [email protected]
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
Prediction of Stock Performance Using Analytical Techniques
136 JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 5, NO. 2, MAY 2013 Prediction of Stock Performance Using Analytical Techniques Carol Hargreaves Institute of Systems Science National University
Data Mining as a tool to Predict the Churn Behaviour among Indian bank customers
Data Mining as a tool to Predict the Churn Behaviour among Indian bank customers Manjit Kaur Department of Computer Science Punjabi University Patiala, India [email protected] Dr. Kawaljeet Singh University
International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015
RESEARCH ARTICLE OPEN ACCESS Data Mining Technology for Efficient Network Security Management Ankit Naik [1], S.W. Ahmad [2] Student [1], Assistant Professor [2] Department of Computer Science and Engineering
SURVEY OF TEXT CLASSIFICATION ALGORITHMS FOR SPAM FILTERING
I J I T E ISSN: 2229-7367 3(1-2), 2012, pp. 233-237 SURVEY OF TEXT CLASSIFICATION ALGORITHMS FOR SPAM FILTERING K. SARULADHA 1 AND L. SASIREKA 2 1 Assistant Professor, Department of Computer Science and
Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News
Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News Sushilkumar Kalmegh Associate Professor, Department of Computer Science, Sant Gadge Baba Amravati
Predicting Student Performance by Using Data Mining Methods for Classification
BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 13, No 1 Sofia 2013 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.2478/cait-2013-0006 Predicting Student Performance
Chapter 12 Discovering New Knowledge Data Mining
Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to
Towards applying Data Mining Techniques for Talent Mangement
2009 International Conference on Computer Engineering and Applications IPCSIT vol.2 (2011) (2011) IACSIT Press, Singapore Towards applying Data Mining Techniques for Talent Mangement Hamidah Jantan 1,
INTERNATIONAL JOURNAL FOR ENGINEERING APPLICATIONS AND TECHNOLOGY DATA MINING IN HEALTHCARE SECTOR. [email protected]
IJFEAT INTERNATIONAL JOURNAL FOR ENGINEERING APPLICATIONS AND TECHNOLOGY DATA MINING IN HEALTHCARE SECTOR Bharti S. Takey 1, Ankita N. Nandurkar 2,Ashwini A. Khobragade 3,Pooja G. Jaiswal 4,Swapnil R.
Data Mining Algorithms Part 1. Dejan Sarka
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka ([email protected]) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
Chapter 6. The stacking ensemble approach
82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described
Data Mining Classification: Decision Trees
Data Mining Classification: Decision Trees Classification Decision Trees: what they are and how they work Hunt s (TDIDT) algorithm How to select the best split How to handle Inconsistent data Continuous
Classification algorithm in Data mining: An Overview
Classification algorithm in Data mining: An Overview S.Neelamegam #1, Dr.E.Ramaraj *2 #1 M.phil Scholar, Department of Computer Science and Engineering, Alagappa University, Karaikudi. *2 Professor, Department
Dr. U. Devi Prasad Associate Professor Hyderabad Business School GITAM University, Hyderabad Email: [email protected]
96 Business Intelligence Journal January PREDICTION OF CHURN BEHAVIOR OF BANK CUSTOMERS USING DATA MINING TOOLS Dr. U. Devi Prasad Associate Professor Hyderabad Business School GITAM University, Hyderabad
Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing
www.ijcsi.org 198 Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing Lilian Sing oei 1 and Jiayang Wang 2 1 School of Information Science and Engineering, Central South University
Application of Data Mining in Medical Decision Support System
Application of Data Mining in Medical Decision Support System Habib Shariff Mahmud School of Engineering & Computing Sciences University of East London - FTMS College Technology Park Malaysia Bukit Jalil,
Final Project Report
CPSC545 by Introduction to Data Mining Prof. Martin Schultz & Prof. Mark Gerstein Student Name: Yu Kor Hugo Lam Student ID : 904907866 Due Date : May 7, 2007 Introduction Final Project Report Pseudogenes
Working with telecommunications
Working with telecommunications Minimizing churn in the telecommunications industry Contents: 1 Churn analysis using data mining 2 Customer churn analysis with IBM SPSS Modeler 3 Types of analysis 3 Feature
A Review of Data Mining Techniques
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,
A Regression Approach for Forecasting Vendor Revenue in Telecommunication Industries
A Regression Approach for Forecasting Vendor Revenue in Telecommunication Industries Aida Mustapha *1, Farhana M. Fadzil #2 * Faculty of Computer Science and Information Technology, Universiti Tun Hussein
Experiments in Web Page Classification for Semantic Web
Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address
STATISTICA. Financial Institutions. Case Study: Credit Scoring. and
Financial Institutions and STATISTICA Case Study: Credit Scoring STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table of Contents INTRODUCTION: WHAT
CHURN PREDICTION IN MOBILE TELECOM SYSTEM USING DATA MINING TECHNIQUES
International Journal of Scientific and Research Publications, Volume 4, Issue 4, April 2014 1 CHURN PREDICTION IN MOBILE TELECOM SYSTEM USING DATA MINING TECHNIQUES DR. M.BALASUBRAMANIAN *, M.SELVARANI
Nine Common Types of Data Mining Techniques Used in Predictive Analytics
1 Nine Common Types of Data Mining Techniques Used in Predictive Analytics By Laura Patterson, President, VisionEdge Marketing Predictive analytics enable you to develop mathematical models to help better
Data Mining in Education: Data Classification and Decision Tree Approach
Data Mining in Education: Data Classification and Decision Tree Approach Sonali Agarwal, G. N. Pandey, and M. D. Tiwari Abstract Educational organizations are one of the important parts of our society
Customer Classification And Prediction Based On Data Mining Technique
Customer Classification And Prediction Based On Data Mining Technique Ms. Neethu Baby 1, Mrs. Priyanka L.T 2 1 M.E CSE, Sri Shakthi Institute of Engineering and Technology, Coimbatore 2 Assistant Professor
Data Mining for Knowledge Management. Classification
1 Data Mining for Knowledge Management Classification Themis Palpanas University of Trento http://disi.unitn.eu/~themis Data Mining for Knowledge Management 1 Thanks for slides to: Jiawei Han Eamonn Keogh
Comparison of Data Mining Techniques used for Financial Data Analysis
Comparison of Data Mining Techniques used for Financial Data Analysis Abhijit A. Sawant 1, P. M. Chawan 2 1 Student, 2 Associate Professor, Department of Computer Technology, VJTI, Mumbai, INDIA Abstract
Keywords data mining, prediction techniques, decision making.
Volume 5, Issue 4, April 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Analysis of Datamining
Data Mining for Fun and Profit
Data Mining for Fun and Profit Data mining is the extraction of implicit, previously unknown, and potentially useful information from data. - Ian H. Witten, Data Mining: Practical Machine Learning Tools
Data Mining Analysis (breast-cancer data)
Data Mining Analysis (breast-cancer data) Jung-Ying Wang Register number: D9115007, May, 2003 Abstract In this AI term project, we compare some world renowned machine learning tools. Including WEKA data
Use of Data Mining in Banking
Use of Data Mining in Banking Kazi Imran Moin*, Dr. Qazi Baseer Ahmed** *(Department of Computer Science, College of Computer Science & Information Technology, Latur, (M.S), India ** (Department of Commerce
Effective Analysis and Predictive Model of Stroke Disease using Classification Methods
Effective Analysis and Predictive Model of Stroke Disease using Classification Methods A.Sudha Student, M.Tech (CSE) VIT University Vellore, India P.Gayathri Assistant Professor VIT University Vellore,
REVIEW OF HEART DISEASE PREDICTION SYSTEM USING DATA MINING AND HYBRID INTELLIGENT TECHNIQUES
REVIEW OF HEART DISEASE PREDICTION SYSTEM USING DATA MINING AND HYBRID INTELLIGENT TECHNIQUES R. Chitra 1 and V. Seenivasagam 2 1 Department of Computer Science and Engineering, Noorul Islam Centre for
Data Mining Applications in Fund Raising
Data Mining Applications in Fund Raising Nafisseh Heiat Data mining tools make it possible to apply mathematical models to the historical data to manipulate and discover new information. In this study,
Data Mining: A Preprocessing Engine
Journal of Computer Science 2 (9): 735-739, 2006 ISSN 1549-3636 2005 Science Publications Data Mining: A Preprocessing Engine Luai Al Shalabi, Zyad Shaaban and Basel Kasasbeh Applied Science University,
Data Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
In this presentation, you will be introduced to data mining and the relationship with meaningful use.
In this presentation, you will be introduced to data mining and the relationship with meaningful use. Data mining refers to the art and science of intelligent data analysis. It is the application of machine
IDENTIFYING BANK FRAUDS USING CRISP-DM AND DECISION TREES
IDENTIFYING BANK FRAUDS USING CRISP-DM AND DECISION TREES Bruno Carneiro da Rocha 1,2 and Rafael Timóteo de Sousa Júnior 2 1 Bank of Brazil, Brasília-DF, Brazil [email protected] 2 Network Engineering
An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
Data Mining. Nonlinear Classification
Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15
Extension of Decision Tree Algorithm for Stream Data Mining Using Real Data
Fifth International Workshop on Computational Intelligence & Applications IEEE SMC Hiroshima Chapter, Hiroshima University, Japan, November 10, 11 & 12, 2009 Extension of Decision Tree Algorithm for Stream
Role of Social Networking in Marketing using Data Mining
Role of Social Networking in Marketing using Data Mining Mrs. Saroj Junghare Astt. Professor, Department of Computer Science and Application St. Aloysius College, Jabalpur, Madhya Pradesh, India Abstract:
An Introduction to Data Mining
An Introduction to Intel Beijing [email protected] January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail
DATA MINING AND REPORTING IN HEALTHCARE
DATA MINING AND REPORTING IN HEALTHCARE Divya Gandhi 1, Pooja Asher 2, Harshada Chaudhari 3 1,2,3 Department of Information Technology, Sardar Patel Institute of Technology, Mumbai,(India) ABSTRACT The
Role of Customer Response Models in Customer Solicitation Center s Direct Marketing Campaign
Role of Customer Response Models in Customer Solicitation Center s Direct Marketing Campaign Arun K Mandapaka, Amit Singh Kushwah, Dr.Goutam Chakraborty Oklahoma State University, OK, USA ABSTRACT Direct
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE Kasra Madadipouya 1 1 Department of Computing and Science, Asia Pacific University of Technology & Innovation ABSTRACT Today, enormous amount of data
Equity forecast: Predicting long term stock price movement using machine learning
Equity forecast: Predicting long term stock price movement using machine learning Nikola Milosevic School of Computer Science, University of Manchester, UK [email protected] Abstract Long
Data Mining Techniques and its Applications in Banking Sector
Data Mining Techniques and its Applications in Banking Sector Dr. K. Chitra 1, B. Subashini 2 1 Assistant Professor, Department of Computer Science, Government Arts College, Melur, Madurai. 2 Assistant
Data Mining: Overview. What is Data Mining?
Data Mining: Overview What is Data Mining? Recently * coined term for confluence of ideas from statistics and computer science (machine learning and database methods) applied to large databases in science,
Analyzing Customer Churn in the Software as a Service (SaaS) Industry
Analyzing Customer Churn in the Software as a Service (SaaS) Industry Ben Frank, Radford University Jeff Pittges, Radford University Abstract Predicting customer churn is a classic data mining problem.
Data Mining Approach For Subscription-Fraud. Detection in Telecommunication Sector
Contemporary Engineering Sciences, Vol. 7, 2014, no. 11, 515-522 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ces.2014.4431 Data Mining Approach For Subscription-Fraud Detection in Telecommunication
Data Mining Solutions for the Business Environment
Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania [email protected] Over
How To Use Neural Networks In Data Mining
International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and
Decision Support System For A Customer Relationship Management Case Study
61 Decision Support System For A Customer Relationship Management Case Study Ozge Kart 1, Alp Kut 1, and Vladimir Radevski 2 1 Dokuz Eylul University, Izmir, Turkey {ozge, alp}@cs.deu.edu.tr 2 SEE University,
An Evaluation of Neural Networks Approaches used for Software Effort Estimation
Proc. of Int. Conf. on Multimedia Processing, Communication and Info. Tech., MPCIT An Evaluation of Neural Networks Approaches used for Software Effort Estimation B.V. Ajay Prakash 1, D.V.Ashoka 2, V.N.
Data Mining Practical Machine Learning Tools and Techniques
Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea
Predicting Students Final GPA Using Decision Trees: A Case Study
Predicting Students Final GPA Using Decision Trees: A Case Study Mashael A. Al-Barrak and Muna Al-Razgan Abstract Educational data mining is the process of applying data mining tools and techniques to
Electronic Payment Fraud Detection Techniques
World of Computer Science and Information Technology Journal (WCSIT) ISSN: 2221-0741 Vol. 2, No. 4, 137-141, 2012 Electronic Payment Fraud Detection Techniques Adnan M. Al-Khatib CIS Dept. Faculty of Information
A New Approach for Evaluation of Data Mining Techniques
181 A New Approach for Evaluation of Data Mining s Moawia Elfaki Yahia 1, Murtada El-mukashfi El-taher 2 1 College of Computer Science and IT King Faisal University Saudi Arabia, Alhasa 31982 2 Faculty
Detection. Perspective. Network Anomaly. Bhattacharyya. Jugal. A Machine Learning »C) Dhruba Kumar. Kumar KaKta. CRC Press J Taylor & Francis Croup
Network Anomaly Detection A Machine Learning Perspective Dhruba Kumar Bhattacharyya Jugal Kumar KaKta»C) CRC Press J Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint of the Taylor
Churn Prediction in MMORPGs: A Social Influence Based Approach
Churn Prediction in MMORPGs: A Social Influence Based Approach Jaya Kawale Dept of Computer Science & Engg University Of Minnesota Minneapolis, MN 55455 Email: [email protected] Aditya Pal Dept of Computer
Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier
International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-1, Issue-6, January 2013 Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing
Utilization of Neural Network for Disease Forecasting
Utilization of Neural Network for Disease Forecasting Oyas Wahyunggoro 1, Adhistya Erna Permanasari 1, and Ahmad Chamsudin 1,2 1 Department of Electrical Engineering and Information Technology, Gadjah
Heart Disease Diagnosis Using Predictive Data mining
ISSN (Online) : 2319-8753 ISSN (Print) : 2347-6710 International Journal of Innovative Research in Science, Engineering and Technology Volume 3, Special Issue 3, March 2014 2014 International Conference
Data Mining Techniques for Prognosis in Pancreatic Cancer
Data Mining Techniques for Prognosis in Pancreatic Cancer by Stuart Floyd A Thesis Submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUE In partial fulfillment of the requirements for the Degree
AUTO CLAIM FRAUD DETECTION USING MULTI CLASSIFIER SYSTEM
AUTO CLAIM FRAUD DETECTION USING MULTI CLASSIFIER SYSTEM ABSTRACT Luis Alexandre Rodrigues and Nizam Omar Department of Electrical Engineering, Mackenzie Presbiterian University, Brazil, São Paulo [email protected],[email protected]
COMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction
COMP3420: Advanced Databases and Data Mining Classification and prediction: Introduction and Decision Tree Induction Lecture outline Classification versus prediction Classification A two step process Supervised
A Secured Approach to Credit Card Fraud Detection Using Hidden Markov Model
A Secured Approach to Credit Card Fraud Detection Using Hidden Markov Model Twinkle Patel, Ms. Ompriya Kale Abstract: - As the usage of credit card has increased the credit card fraud has also increased
Introduction. A. Bellaachia Page: 1
Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.
Hexaware E-book on Predictive Analytics
Hexaware E-book on Predictive Analytics Business Intelligence & Analytics Actionable Intelligence Enabled Published on : Feb 7, 2012 Hexaware E-book on Predictive Analytics What is Data mining? Data mining,
Clustering Marketing Datasets with Data Mining Techniques
Clustering Marketing Datasets with Data Mining Techniques Özgür Örnek International Burch University, Sarajevo [email protected] Abdülhamit Subaşı International Burch University, Sarajevo [email protected]
Standardization and Its Effects on K-Means Clustering Algorithm
Research Journal of Applied Sciences, Engineering and Technology 6(7): 399-3303, 03 ISSN: 040-7459; e-issn: 040-7467 Maxwell Scientific Organization, 03 Submitted: January 3, 03 Accepted: February 5, 03
Predictive Data modeling for health care: Comparative performance study of different prediction models
Predictive Data modeling for health care: Comparative performance study of different prediction models Shivanand Hiremath [email protected] National Institute of Industrial Engineering (NITIE) Vihar
