Performance Study on Data Discretization Techniques Using Nutrition Dataset
2009 International Symposium on Computing, Communication, and Control (ISCCC 2009)
Proc. of CSIT vol. 1 (2011), IACSIT Press, Singapore

Performance Study on Data Discretization Techniques Using Nutrition Dataset

Nor Liyana Mohd Shuib 1, Azuraliza Abu Bakar 2 and Zulaiha Ali Othman 2+

1 Department of Information Science, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia
2 Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor Darul Ehsan, Malaysia

+ Corresponding author. Tel.: ; fax: address: [email protected]

Abstract. Data mining has been widely used in the medical and health care domain to build predictive models. Data preprocessing is one of the important steps in the data mining process, as it consumes about sixty percent of the effort in a data mining project. Data discretization is one of the pre-processing methods; it can make the learning process faster and more accurate. In this paper we propose nutrition data classification modelling using two discretization techniques, namely Boolean Reasoning and the Entropy Algorithm. Both techniques were selected from a detailed study of fifty discretization techniques available to date. The purpose of this work is to compare the performance of different data discretization techniques and to find the most suitable discretization technique for the nutrition data set. The nutrition data set was obtained from a survey and contains 160 attributes and 820 records. Both techniques are used to discretize the nutrition data set, and their classification performance is evaluated in terms of accuracy and the number of rules. The experimental results show that Boolean Reasoning performs better than the Entropy Algorithm, giving higher classification accuracy on the nutrition data set.

Keywords: Data Mining, Data Pre-processing, Discretization

1. Introduction

Huge volumes of data are available today because of advances in computer software and storage media. However, these data are often dirty for several reasons, such as incomplete data, missing data, noisy data and inconsistent data [1]. Using dirty data can have a large impact on data mining results because it can lead to wrong interpretations [2]. Therefore, pre-processing is very important to ensure that the data to be used are clean and appropriate for data mining. Data preparation and preprocessing are the key to solving this problem [3]. However, because the process is time consuming and tedious, pre-processing is often omitted by researchers, which leads to inaccurate models. Good data preprocessing helps to create a better model and consumes less time.

There are many pre-processing techniques, each with its own functions and advantages. Discretization is one of them. Most classification tasks require the data to be in discrete form before mining can be performed. Using continuous attributes involves large storage, misinterpretation and long rules. Hence, discretization is needed to convert continuous attributes to discrete attributes in order to increase prediction accuracy. Discrete attributes are a key factor in data mining because they are represented by simple intervals, which are understandable and easier to use. Rules over discrete attributes are usually shorter and easier to understand, which increases prediction accuracy. Most of the algorithms in the literature require discrete attributes, which leads data mining practitioners and researchers to perform data discretization before or during data mining.

Most real data sets contain continuous attributes, including data sets from health care. In this research, a nutrition data set from a general hospital in Malaysia, consisting of 820 objects and 160 attributes, is used. This data set is used to understand the functions of foods and their relation to health. The objective of mining this data set is to identify patients' dietary patterns and how these patterns could lead to disease. 60-year-old people and their dietary patterns were chosen as the domain. The data set is also used to compare the performance of different data discretization techniques and to find the most suitable discretization technique for the nutrition data set.

2. Literature Review

Data mining has been widely used in the medical and health care domain [4; 5; 6; 7]. Data pre-processing is one of the most important steps in the data mining process, as it consumes about sixty percent of the effort in a data mining project. Its steps are data integration, data selection, data cleaning, data reduction and data transformation. Data reduction refers to two approaches: reducing the dimensionality of the data, or reducing the data distribution. One approach to reducing the data distribution is data discretization. Data discretization is defined as a data reduction method that changes the original continuous attributes into discrete attributes [8]. It creates an appropriate number of intervals for the data values, thus transforming continuous values into discrete values. Smaller data intervals usually contribute to a more accurate predictive model that achieves higher prediction rates on new cases. Discretization is required particularly for rule-based data mining models such as decision tree and rough set classifiers.
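The transformation from continuous values to intervals can be illustrated with a minimal sketch. It assumes the cut points are already known; the function name `discretize`, the protein values and the cut points are hypothetical and are not taken from the nutrition data set.

```python
from bisect import bisect_right

def discretize(values, cut_points):
    """Map each continuous value to the index of the interval it falls in.

    With sorted cut points c1 < c2 < ... < ck, interval 0 is (-inf, c1),
    interval i is [c_i, c_(i+1)), and interval k is [c_k, +inf).
    """
    cuts = sorted(cut_points)
    # bisect_right counts how many cut points are <= v, which is the interval index.
    return [bisect_right(cuts, v) for v in values]

# Hypothetical daily protein intakes (grams) and hypothetical cut points.
protein = [32.5, 48.0, 55.1, 71.3, 90.0]
cuts = [45.0, 60.0, 80.0]
labels = discretize(protein, cuts)  # -> [0, 1, 1, 2, 3]
```

Three cut points yield four intervals, so a continuous attribute with many distinct values is reduced to at most four discrete values. How the cut points themselves are chosen is what distinguishes one discretization technique from another.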
Based on the study in [9], two discretization techniques were chosen: Boolean Reasoning [10; 11] and the Entropy Algorithm [12].

2.1. Boolean Reasoning

Boolean Reasoning (BR) was suggested by [10] and was used by [11] in rough set theory. BR is developed based on rough set theory and Boolean reasoning. It is a supervised technique that considers all attributes at the same time and produces a smaller set of cut points. A cut point is defined as a real value that divides continuous values into intervals [14]. BR was chosen because it is suitable for rough set classification and was the best approach for research involving classification and recognition [11]. Moreover, this method has not been widely used by researchers in discretization.

2.2. Entropy Algorithm

The Entropy Algorithm (Ent-MDLP) is an entropy-based discretization technique suggested by Fayyad & Irani [13; 14; 15]. Ent-MDLP uses the entropy minimization heuristic (EMH) to discretize continuous attributes into intervals. It also uses the minimum description length criterion [16] to control the number of intervals. Ent-MDLP is a supervised technique that uses class information entropy to choose cut points. Ent-MDLP was chosen because it is widely used in discretization research [13; 17]. It is also one of the best discretization methods reported in the literature [18; 19; 20].

3. Model Development

In this research, a rough set algorithm is chosen as the data mining tool. The rough set [21] mining algorithm requires the data to be discretized, which suits our research objective of comparing discretization techniques. Model development can be divided into three steps: data preparation and pre-processing, model development (mining), and evaluation and testing. The framework for model development is illustrated in Figure 1.

3.1. Preparation of Data and Pre-processing

The data set collected is a nutrition data set from the UKM Hospital (HUKM).
The data were collected from 820 patients and have 160 attributes. Of the 160 attributes, 56 are continuous. The first step in preparing the data is the selection of attributes. This is done by removing redundant attributes (two attributes that carry the same knowledge) and unimportant attributes (attributes that contain insignificant knowledge for modelling). Then the pre-processing is carried out. The pre-processing stage involves filling in missing values (using the attribute mean), attribute construction, concept hierarchies, nearest-neighbour replacement techniques, and discretization. Ent-MDLP and BR are the techniques chosen for discretization.

Fig. 1: Model Development Framework

3.2. Mining

After data pre-processing, the data set is divided into training data and testing data using k-fold cross validation. The nutrition data set was randomly split into 10 folds. Training data are used to develop a model, while testing data are used to determine the accuracy of the model acquired. The models are built using rough set theory. Two models are developed: one uses the data set discretized with BR, and the other uses the data set discretized with Ent-MDLP. Rosetta [22], a rough set application, is used to mine the data.

3.3. Evaluation and Testing

Evaluation is based on classifier accuracy, number of rules, minimum rule length, maximum rule length and number of intervals.

4. Results

BR and Ent-MDLP were used as the comparison techniques. Evaluation of the nutrition data set modelling is based on classifier accuracy (ACC), number of rules (NR), minimum rule length (min_l), maximum rule length (max_l) and number of intervals (I). The min_l and max_l are considered because they indicate the complexity of the rules, where shorter rules are expected to perform better than longer, more specific rules. The best models over all folds are shown in Table 1. The results show comparable performance for BR and Ent-MDLP. However, the best accuracy (ACC), 90.52% with 6772 rules, was obtained from model 4 using BR.
The highest accuracy obtained with Ent-MDLP was 87.93% with 7548 rules. On average, BR gives 85.15% accuracy while Ent-MDLP gives 83.27%. Both techniques showed equal performance in min_l and max_l. The number of rules generated (NR) was also comparable. For each attribute in the data set, the number of intervals (I) set by BR and Ent-MDLP differed significantly. BR outperformed Ent-MDLP by producing fewer intervals. However, this result does not indicate any significant relation to the accuracy of either technique. A smaller number of intervals generally favours modelling accuracy, but if the number of intervals is too small, information in the data may be lost. Therefore, in terms of the number of generated intervals and the quality of the rules as knowledge, Ent-MDLP seems to be a better choice.
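How an entropy-based method arrives at a number of intervals for one attribute can be illustrated with a small sketch. It follows the commonly cited form of the Fayyad & Irani procedure (choose the cut that minimises class entropy, accept it only if the MDL criterion is met, then recurse on both sides). It is an assumption-laden illustration, not the Rosetta implementation used in the experiments; all function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Class entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_cut(pairs):
    """Return (cut, split_entropy, left_labels, right_labels) or None.

    `pairs` is a list of (value, label) sorted by value. Candidate cuts are
    midpoints between adjacent distinct values.
    """
    n = len(pairs)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        e = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if best is None or e < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, e, left, right)
    return best

def mdl_accepts(labels, split_entropy, left, right):
    """MDL stopping rule: accept the cut only if the gain exceeds its cost."""
    n = len(labels)
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    gain = entropy(labels) - split_entropy
    delta = log2(3 ** k - 2) - (
        k * entropy(labels) - k1 * entropy(left) - k2 * entropy(right)
    )
    return gain > (log2(n - 1) + delta) / n

def ent_mdlp(values, labels):
    """Recursively collect the accepted cut points for one attribute."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    found = best_cut(pairs)
    if found is None:
        return []
    cut, split_entropy, left, right = found
    if not mdl_accepts([lab for _, lab in pairs], split_entropy, left, right):
        return []
    lo = [p for p in pairs if p[0] < cut]
    hi = [p for p in pairs if p[0] >= cut]
    return ent_mdlp(*zip(*lo)) + [cut] + ent_mdlp(*zip(*hi))

# Toy attribute whose classes separate cleanly at 4.5 (hypothetical data).
cuts = ent_mdlp([1, 2, 3, 4, 5, 6, 7, 8], list("aaaabbbb"))  # -> [4.5]
```

In this sketch the stopping rule, not the user, decides how many intervals an attribute receives: an attribute whose values carry no class information gets no cut at all, while one with several class boundaries is split repeatedly.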
Table 1: Best model from all folds. Columns: m; BR (ACC, NR, Min_L, Max_L); Ent-MDLP (ACC, NR, Min_L, Max_L).

The number of intervals (I). Columns: Attribute (prot, fat, cho, ca, p, fe, na, k, retinol); rows: BR, Ent-MDLP.

5. Discussion and Conclusion

In this study, two discretization techniques were used for modelling the patient nutrition data. The experimental results showed that each technique performed better in different aspects. BR gives higher accuracy, fewer rules and fewer intervals. Ent-MDLP gives lower accuracy, more rules and more intervals. Both findings have their own advantages and drawbacks. Although BR produced the best model, the shorter rules it generates may contribute to a loss of knowledge. On the other hand, Ent-MDLP showed performance comparable to BR with a larger number of intervals. This is a good indication that, although it produces many distinct values for an attribute, which implies less loss of the original knowledge, it does not affect the accuracy of the model.

Discretization is one of the important techniques in data mining. Discrete attributes produce short and precise results (rules) compared to continuous attributes. This research investigated two techniques, BR and Ent-MDLP, to identify the most suitable discretization technique for the nutrition data set in order to produce a better model. Based on the experimental results, Boolean Reasoning performs better than the Entropy Algorithm, giving higher classification accuracy on the nutrition data set. Further research can compare these techniques with neural networks.

6. Acknowledgement

We would like to thank the IRPA EA004 group for providing the nutrition data set used in this research.

7. References

[1] A. Storkey. Data Mining and Exploration: Introduction. School of Informatics (online) courses/dme/slides/intro4up.pdf.
[2] P. Wright. Knowledge discovery preprocessing: determining record usability. ACM Southeast Regional Conference. 1998, pp
[3] D. Pyle. Data preparation for data mining. San Francisco: Morgan Kaufmann Publishers, 1999.
[4] A. Kusiak, K.H. Kernstine, J.A. Kern, K.A. McLaughlin, and T.L. Tseng. Data Mining: Medical and Engineering Case Studies. Proceedings of the IIE Research 2000 Conference, Cleveland, OH, May 2000, pp. 1-7.
[5] I. Kononenko, I. Bratko, and M. Kokar. Application of machine learning to medical diagnosis. In R.S. Michalski, I. Bratko and M. Kubat (Eds), Machine Learning in Data Mining: Methods and Applications, Wiley, New York, 1998, pp
[6] M.R. Kraft, K.C. Desouza and I. Androwich. Data Mining in Healthcare Information Systems: Case Study of a Veterans' Administration Spinal Cord Injury Population. In Proceedings of the 36th Annual Hawaii International Conference on System Sciences (HICSS'03). IEEE Computer Society, Washington. 2003, 6 (6):
[7] Peng Liu, Lei Lei, Junjie Yin, Wei Zhang, Wu Naijun and E. El-Darzi. Healthcare Data Mining: Prediction Inpatient Length of Stay. 3rd International IEEE Conference on Intelligent Systems. 2006, pp
[8] N. Goharian and D. Grossman. Data mining: data preprocessing. Illinois Institute of Technology (online) CS422-Slides/DM-Preprocessing.pdf
[9] Nor Liyana Mohd Shuib. Discretization Techniques in Data Mining: A Case Study on Nutrition Data Set. Master Thesis. Universiti Kebangsaan Malaysia
[10] H.S. Nguyen and A. Skowron. Boolean reasoning for feature extraction problems. International Symposium on Methodologies for Intelligent System. 1997, pp
[11] Z. Pawlak and A. Skowron. Rough sets and Boolean reasoning. Information Sciences (1): pp
[12] U.M. Fayyad and K.B. Irani. Multi-interval discretization of continuous valued attributes for classification learning. Proceedings of IJCAI , pp
[13] J. Dougherty, R. Kohavi and M. Sahami. Supervised and unsupervised discretization of continuous features. Proc. Twelfth International Conference on Machine Learning. 1995, pp
[14] H. Liu, F. Hussain, C.L. Tan and M. Dash. Discretization: an enabling technique. Data Mining and Knowledge Discovery. 2002, 6: pp
[15] Ying Yang. Discretization for Naive-Bayes Learning. Ph.D. Thesis. Monash University
[16] J. Rissanen. Modelling by shortest data description. Automatica. 1978, 14: pp
[17] J. Gama and C. Pinto. Discretization from data streams: applications to histograms and data mining. Proceedings of the 2006 ACM Symposium on Applied Computing. 2006, pp
[18] J. Cerquides and R. López de Màntaras. Proposal and empirical comparison of a parallelizable distance-based discretization method. 3rd Int. Conference on Knowledge Discovery and Data Mining (KDD'97) pp
[19] X. Liu and H. Wang. A discretization algorithm based on a heterogeneity criterion. IEEE Transactions on Knowledge and Data Engineering. 2005, 17(9): pp
[20] F. Tay and L. Shen. A modified chi2 algorithm for discretization. IEEE Transactions on Knowledge and Data Engineering. 2002, 14(3): pp
[21] Z. Pawlak. Rough sets. International Journal of Computer and Information Science. 1982, 11: pp
[22] A. Øhrn. ROSETTA: Technical Reference Manual (online) aleks/rosetta.
Healthcare Data Mining: Prediction Inpatient Length of Stay
3rd International IEEE Conference Intelligent Systems, September 2006 Healthcare Data Mining: Prediction Inpatient Length of Peng Liu, Lei Lei, Junjie Yin, Wei Zhang, Wu Naijun, Elia El-Darzi 1 Abstract
A Framework for Data Warehouse Using Data Mining and Knowledge Discovery for a Network of Hospitals in Pakistan
, pp.217-222 http://dx.doi.org/10.14257/ijbsbt.2015.7.3.23 A Framework for Data Warehouse Using Data Mining and Knowledge Discovery for a Network of Hospitals in Pakistan Muhammad Arif 1,2, Asad Khatak
Network Intrusion Detection Using a HNB Binary Classifier
2015 17th UKSIM-AMSS International Conference on Modelling and Simulation Network Intrusion Detection Using a HNB Binary Classifier Levent Koc and Alan D. Carswell Center for Security Studies, University
Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing
www.ijcsi.org 198 Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing Lilian Sing oei 1 and Jiayang Wang 2 1 School of Information Science and Engineering, Central South University
Towards applying Data Mining Techniques for Talent Mangement
2009 International Conference on Computer Engineering and Applications IPCSIT vol.2 (2011) (2011) IACSIT Press, Singapore Towards applying Data Mining Techniques for Talent Mangement Hamidah Jantan 1,
Data Mining Classification Techniques for Human Talent Forecasting
Data Mining Classification Techniques for Human Talent Forecasting Hamidah Jantan 1, Abdul Razak Hamdan 2 and Zulaiha Ali Othman 2 1 1 Faculty of Computer and Mathematical Sciences UiTM, Terengganu, 23000
An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset
P P P Health An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset Peng Liu 1, Elia El-Darzi 2, Lei Lei 1, Christos Vasilakis 2, Panagiotis Chountas 2, and Wei Huang
PREDICTING STOCK PRICES USING DATA MINING TECHNIQUES
The International Arab Conference on Information Technology (ACIT 2013) PREDICTING STOCK PRICES USING DATA MINING TECHNIQUES 1 QASEM A. AL-RADAIDEH, 2 ADEL ABU ASSAF 3 EMAN ALNAGI 1 Department of Computer
IDENTIFYING BANK FRAUDS USING CRISP-DM AND DECISION TREES
IDENTIFYING BANK FRAUDS USING CRISP-DM AND DECISION TREES Bruno Carneiro da Rocha 1,2 and Rafael Timóteo de Sousa Júnior 2 1 Bank of Brazil, Brasília-DF, Brazil [email protected] 2 Network Engineering
Rule based Classification of BSE Stock Data with Data Mining
International Journal of Information Sciences and Application. ISSN 0974-2255 Volume 4, Number 1 (2012), pp. 1-9 International Research Publication House http://www.irphouse.com Rule based Classification
Comparative Analysis of Supervised and Unsupervised Discretization Techniques
, Vol., No. 3, 0 Comparative Analysis of Supervised and Unsupervised Discretization Techniques Rajashree Dash, Rajib Lochan Paramguru, Rasmita Dash 3, Department of Computer Science and Engineering, ITER,Siksha
International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
Divide-n-Discover Discretization based Data Exploration Framework for Healthcare Analytics
for Healthcare Analytics Si-Chi Chin,KiyanaZolfaghar,SenjutiBasuRoy,AnkurTeredesai,andPaulAmoroso Institute of Technology, The University of Washington -Tacoma,900CommerceStreet,Tacoma,WA980-00,U.S.A.
ANALYSIS OF FEATURE SELECTION WITH CLASSFICATION: BREAST CANCER DATASETS
ANALYSIS OF FEATURE SELECTION WITH CLASSFICATION: BREAST CANCER DATASETS Abstract D.Lavanya * Department of Computer Science, Sri Padmavathi Mahila University Tirupati, Andhra Pradesh, 517501, India [email protected]
AnalysisofData MiningClassificationwithDecisiontreeTechnique
Global Journal of omputer Science and Technology Software & Data Engineering Volume 13 Issue 13 Version 1.0 Year 2013 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals
Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control
Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control Andre BERGMANN Salzgitter Mannesmann Forschung GmbH; Duisburg, Germany Phone: +49 203 9993154, Fax: +49 203 9993234;
The Research of Data Mining Based on Neural Networks
2011 International Conference on Computer Science and Information Technology (ICCSIT 2011) IPCSIT vol. 51 (2012) (2012) IACSIT Press, Singapore DOI: 10.7763/IPCSIT.2012.V51.09 The Research of Data Mining
FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS
FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS Breno C. Costa, Bruno. L. A. Alberto, André M. Portela, W. Maduro, Esdras O. Eler PDITec, Belo Horizonte,
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
Explanation-Oriented Association Mining Using a Combination of Unsupervised and Supervised Learning Algorithms
Explanation-Oriented Association Mining Using a Combination of Unsupervised and Supervised Learning Algorithms Y.Y. Yao, Y. Zhao, R.B. Maguire Department of Computer Science, University of Regina Regina,
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam
Elia El-Darzi School of Computer Science, University of Westminster, London, UK
The current issue and full text archive of this journal is available at www.emeraldinsight.com/1741-0398.htm Applying data mining algorithms to inpatient dataset with missing values Peng Liu School of
Data Quality Mining: Employing Classifiers for Assuring consistent Datasets
Data Quality Mining: Employing Classifiers for Assuring consistent Datasets Fabian Grüning Carl von Ossietzky Universität Oldenburg, Germany, [email protected] Abstract: Independent
Comparison of K-means and Backpropagation Data Mining Algorithms
Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and
Extension of Decision Tree Algorithm for Stream Data Mining Using Real Data
Fifth International Workshop on Computational Intelligence & Applications IEEE SMC Hiroshima Chapter, Hiroshima University, Japan, November 10, 11 & 12, 2009 Extension of Decision Tree Algorithm for Stream
Introduction to Data Mining
Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association
Evaluating an Integrated Time-Series Data Mining Environment - A Case Study on a Chronic Hepatitis Data Mining -
Evaluating an Integrated Time-Series Data Mining Environment - A Case Study on a Chronic Hepatitis Data Mining - Hidenao Abe, Miho Ohsaki, Hideto Yokoi, and Takahira Yamaguchi Department of Medical Informatics,
Impact of Boolean factorization as preprocessing methods for classification of Boolean data
Impact of Boolean factorization as preprocessing methods for classification of Boolean data Radim Belohlavek, Jan Outrata, Martin Trnecka Data Analysis and Modeling Lab (DAMOL) Dept. Computer Science,
Mobile Phone APP Software Browsing Behavior using Clustering Analysis
Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis
Discretization and grouping: preprocessing steps for Data Mining
Discretization and grouping: preprocessing steps for Data Mining PetrBerka 1 andivanbruha 2 1 LaboratoryofIntelligentSystems Prague University of Economic W. Churchill Sq. 4, Prague CZ 13067, Czech Republic
Robust Outlier Detection Technique in Data Mining: A Univariate Approach
Robust Outlier Detection Technique in Data Mining: A Univariate Approach Singh Vijendra and Pathak Shivani Faculty of Engineering and Technology Mody Institute of Technology and Science Lakshmangarh, Sikar,
Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis
IOSR Journal of Computer Engineering (IOSRJCE) ISSN: 2278-0661, ISBN: 2278-8727 Volume 6, Issue 5 (Nov. - Dec. 2012), PP 36-41 Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis
Data Mining based on Rough Set and Decision Tree Optimization
Data Mining based on Rough Set and Decision Tree Optimization College of Information Engineering, North China University of Water Resources and Electric Power, China, [email protected] Abstract This paper
How To Solve The Kd Cup 2010 Challenge
A Lightweight Solution to the Educational Data Mining Challenge Kun Liu Yan Xing Faculty of Automation Guangdong University of Technology Guangzhou, 510090, China [email protected] [email protected]
Enhancing Quality of Data using Data Mining Method
JOURNAL OF COMPUTING, VOLUME 2, ISSUE 9, SEPTEMBER 2, ISSN 25-967 WWW.JOURNALOFCOMPUTING.ORG 9 Enhancing Quality of Data using Data Mining Method Fatemeh Ghorbanpour A., Mir M. Pedram, Kambiz Badie, Mohammad
PREPROCESSING OF WEB LOGS
PREPROCESSING OF WEB LOGS Ms. Dipa Dixit Lecturer Fr.CRIT, Vashi Abstract-Today s real world databases are highly susceptible to noisy, missing and inconsistent data due to their typically huge size data
Financial Trading System using Combination of Textual and Numerical Data
Financial Trading System using Combination of Textual and Numerical Data Shital N. Dange Computer Science Department, Walchand Institute of Rajesh V. Argiddi Assistant Prof. Computer Science Department,
SURVIVABILITY ANALYSIS OF PEDIATRIC LEUKAEMIC PATIENTS USING NEURAL NETWORK APPROACH
330 SURVIVABILITY ANALYSIS OF PEDIATRIC LEUKAEMIC PATIENTS USING NEURAL NETWORK APPROACH T. M. D.Saumya 1, T. Rupasinghe 2 and P. Abeysinghe 3 1 Department of Industrial Management, University of Kelaniya,
A Regression Approach for Forecasting Vendor Revenue in Telecommunication Industries
A Regression Approach for Forecasting Vendor Revenue in Telecommunication Industries Aida Mustapha *1, Farhana M. Fadzil #2 * Faculty of Computer Science and Information Technology, Universiti Tun Hussein
International Journal of Advanced Computer Technology (IJACT) ISSN:2319-7900 PRIVACY PRESERVING DATA MINING IN HEALTH CARE APPLICATIONS
PRIVACY PRESERVING DATA MINING IN HEALTH CARE APPLICATIONS First A. Dr. D. Aruna Kumari, Ph.d, ; Second B. Ch.Mounika, Student, Department Of ECM, K L University, [email protected]; Third C.
DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES
DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES Vijayalakshmi Mahanra Rao 1, Yashwant Prasad Singh 2 Multimedia University, Cyberjaya, MALAYSIA 1 [email protected]
Data Mining for Customer Service Support. Senioritis Seminar Presentation Megan Boice Jay Carter Nick Linke KC Tobin
Data Mining for Customer Service Support Senioritis Seminar Presentation Megan Boice Jay Carter Nick Linke KC Tobin Traditional Hotline Services Problem Traditional Customer Service Support (manufacturing)
Evaluation of Feature Selection Methods for Predictive Modeling Using Neural Networks in Credits Scoring
714 Evaluation of Feature election Methods for Predictive Modeling Using Neural Networks in Credits coring Raghavendra B. K. Dr. M.G.R. Educational and Research Institute, Chennai-95 Email: [email protected]
ClusterOSS: a new undersampling method for imbalanced learning
1 ClusterOSS: a new undersampling method for imbalanced learning Victor H Barella, Eduardo P Costa, and André C P L F Carvalho, Abstract A dataset is said to be imbalanced when its classes are disproportionately
A Framework for Data Migration between Various Types of Relational Database Management Systems
A Framework for Data Migration between Various Types of Relational Database Management Systems Ahlam Mohammad Al Balushi Sultanate of Oman, International Maritime College Oman ABSTRACT Data Migration is
An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
An Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
Customer Classification And Prediction Based On Data Mining Technique
Customer Classification And Prediction Based On Data Mining Technique Ms. Neethu Baby 1, Mrs. Priyanka L.T 2 1 M.E CSE, Sri Shakthi Institute of Engineering and Technology, Coimbatore 2 Assistant Professor
Enhanced Boosted Trees Technique for Customer Churn Prediction Model
IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 03 (March. 2014), V5 PP 41-45 www.iosrjen.org Enhanced Boosted Trees Technique for Customer Churn Prediction
Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier
Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier D.Nithya a, *, V.Suganya b,1, R.Saranya Irudaya Mary c,1 Abstract - This paper presents,
Applied Mathematical Sciences, Vol. 7, 2013, no. 112, 5591-5597 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2013.
Applied Mathematical Sciences, Vol. 7, 2013, no. 112, 5591-5597 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2013.38457 Accuracy Rate of Predictive Models in Credit Screening Anirut Suebsing
College information system research based on data mining
2009 International Conference on Machine Learning and Computing IPCSIT vol.3 (2011) (2011) IACSIT Press, Singapore College information system research based on data mining An-yi Lan 1, Jie Li 2 1 Hebei
AUTO CLAIM FRAUD DETECTION USING MULTI CLASSIFIER SYSTEM
AUTO CLAIM FRAUD DETECTION USING MULTI CLASSIFIER SYSTEM ABSTRACT Luis Alexandre Rodrigues and Nizam Omar Department of Electrical Engineering, Mackenzie Presbiterian University, Brazil, São Paulo [email protected],[email protected]
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE Kasra Madadipouya 1 1 Department of Computing and Science, Asia Pacific University of Technology & Innovation ABSTRACT Today, enormous amount of data
Healthcare Measurement Analysis Using Data mining Techniques
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 03 Issue 07 July, 2014 Page No. 7058-7064 Healthcare Measurement Analysis Using Data mining Techniques 1 Dr.A.Shaik
How To Identify A Churner
2012 45th Hawaii International Conference on System Sciences A New Ensemble Model for Efficient Churn Prediction in Mobile Telecommunication Namhyoung Kim, Jaewook Lee Department of Industrial and Management
Predicting Student Performance by Using Data Mining Methods for Classification
BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 13, No 1 Sofia 2013 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.2478/cait-2013-0006 Predicting Student Performance
Web Document Clustering
Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,
A General Approach to Incorporate Data Quality Matrices into Data Mining Algorithms
A General Approach to Incorporate Data Quality Matrices into Data Mining Algorithms Ian Davidson 1st author's affiliation 1st line of address 2nd line of address Telephone number, incl country code 1st
Big Data Classification: Problems and Challenges in Network Intrusion Prediction with Machine Learning
Big Data Classification: Problems and Challenges in Network Intrusion Prediction with Machine Learning By: Shan Suthaharan Suthaharan, S. (2014). Big data classification: Problems and challenges in network
Model Trees for Classification of Hybrid Data Types
Model Trees for Classification of Hybrid Data Types Hsing-Kuo Pao, Shou-Chih Chang, and Yuh-Jye Lee Dept. of Computer Science & Information Engineering, National Taiwan University of Science & Technology,
Prediction of Stock Performance Using Analytical Techniques
136 JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 5, NO. 2, MAY 2013 Prediction of Stock Performance Using Analytical Techniques Carol Hargreaves Institute of Systems Science National University
Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100
Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100 Erkan Er Abstract In this paper, a model for predicting students' performance levels is proposed which employs three
Data Mining Solutions for the Business Environment
Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania Over
Data Mining: A Preprocessing Engine
Journal of Computer Science 2 (9): 735-739, 2006 ISSN 1549-3636 2005 Science Publications Data Mining: A Preprocessing Engine Luai Al Shalabi, Zyad Shaaban and Basel Kasasbeh Applied Science University,
Knowledge Based Descriptive Neural Networks
Knowledge Based Descriptive Neural Networks J. T. Yao Department of Computer Science, University of Regina, Regina, Saskatchewan, CANADA S4S 0A2 Abstract This paper presents a
Using Data Mining for Mobile Communication Clustering and Characterization
Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer
Application of Data Mining Methods in Health Care Databases
6th International Conference on Applied Informatics Eger, Hungary, January 27-31, 2004. Application of Data Mining Methods in Health Care Databases Ágnes Vathy-Fogarassy Department of Mathematics and
Data Mining using Artificial Neural Network Rules
Data Mining using Artificial Neural Network Rules Pushkar Shinde MCOERC, Nasik Abstract - Diabetes patients are increasing in number so it is necessary to predict, treat and diagnose the disease. Data
A Review of Data Mining Techniques
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,
Constrained Classification of Large Imbalanced Data by Logistic Regression and Genetic Algorithm
Constrained Classification of Large Imbalanced Data by Logistic Regression and Genetic Algorithm Martin Hlosta, Rostislav Stríž, Jan Kupčík, Jaroslav Zendulka, and Tomáš Hruška A. Imbalanced Data Classification
A Review of Missing Data Treatment Methods
A Review of Missing Data Treatment Methods Liu Peng, Lei Lei Department of Information Systems, Shanghai University of Finance and Economics, Shanghai, 200433, P.R. China ABSTRACT Missing data is a common
Addressing the Class Imbalance Problem in Medical Datasets
Addressing the Class Imbalance Problem in Medical Datasets M. Mostafizur Rahman and D. N. Davis the size of the training set is significantly increased [5]. If the time taken to resample is not considered,
Social Media Mining. Data Mining Essentials
Introduction Data production rate has increased dramatically (Big Data) and we are able to store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
In this presentation, you will be introduced to data mining and the relationship with meaningful use.
In this presentation, you will be introduced to data mining and the relationship with meaningful use. Data mining refers to the art and science of intelligent data analysis. It is the application of machine
DATA PREPARATION FOR DATA MINING
Applied Artificial Intelligence, 17:375-381, 2003 Copyright © 2003 Taylor & Francis 0883-9514/03 $12.00 +.00 DOI: 10.1080/08839510390219264 DATA PREPARATION FOR DATA MINING SHICHAO ZHANG and CHENGQI
Data Mining Analytics for Business Intelligence and Decision Support
Data Mining Analytics for Business Intelligence and Decision Support Chid Apte, T.J. Watson Research Center, IBM Research Division Knowledge Discovery and Data Mining (KDD) techniques are used for analyzing
A Statistical Text Mining Method for Patent Analysis
A Statistical Text Mining Method for Patent Analysis Department of Statistics Cheongju University Abstract Most text data from diverse document databases are unsuitable for analytical
Data Mining Part 5. Prediction
Data Mining Part 5. Prediction 5.1 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Classification vs. Numeric Prediction Prediction Process Data Preparation Comparing Prediction Methods References Classification
Knowledge Discovery from patents using KMX Text Analytics
Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers
Random forest algorithm in big data environment
Random forest algorithm in big data environment Yingchun Liu * School of Economics and Management, Beihang University, Beijing 100191, China Received 1 September 2014, www.cmnt.lv Abstract Random forest
Automatic Resolver Group Assignment of IT Service Desk Outsourcing
Automatic Resolver Group Assignment of IT Service Desk Outsourcing in Banking Business Padej Phomasakha Na Sakolnakorn*, Phayung Meesad ** and Gareth Clayton*** Abstract This paper proposes a framework
A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS
A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS Mrs. Jyoti Nawade 1, Dr. Balaji D 2, Mr. Pravin Nawade 3 1 Lecturer, JSPM's Bhivrabai Sawant Polytechnic, Pune (India) 2 Assistant
Chapter 6. The stacking ensemble approach
This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging, etc. are also described
Grid Density Clustering Algorithm
Grid Density Clustering Algorithm Amandeep Kaur Mann 1, Navneet Kaur 2, Scholar, M.Tech (CSE), RIMT, Mandi Gobindgarh, Punjab, India 1 Assistant Professor (CSE), RIMT, Mandi Gobindgarh, Punjab, India 2
Heart Disease Diagnosis Using Predictive Data mining
ISSN (Online) : 2319-8753 ISSN (Print) : 2347-6710 International Journal of Innovative Research in Science, Engineering and Technology Volume 3, Special Issue 3, March 2014 2014 International Conference
Single Level Drill Down Interactive Visualization Technique for Descriptive Data Mining Results
, pp.33-40 http://dx.doi.org/10.14257/ijgdc.2014.7.4.04 Single Level Drill Down Interactive Visualization Technique for Descriptive Data Mining Results Muzammil Khan, Fida Hussain and Imran Khan Department
INVESTIGATIONS INTO EFFECTIVENESS OF GAUSSIAN AND NEAREST MEAN CLASSIFIERS FOR SPAM DETECTION
INVESTIGATIONS INTO EFFECTIVENESS OF GAUSSIAN AND NEAREST MEAN CLASSIFIERS FOR SPAM DETECTION Upasna Attri C.S.E. Department, DAV Institute of Engineering and Technology, Jalandhar (India) Harpreet Kaur
CHURN PREDICTION IN MOBILE TELECOM SYSTEM USING DATA MINING TECHNIQUES
International Journal of Scientific and Research Publications, Volume 4, Issue 4, April 2014 1 CHURN PREDICTION IN MOBILE TELECOM SYSTEM USING DATA MINING TECHNIQUES DR. M.BALASUBRAMANIAN *, M.SELVARANI
Introducing diversity among the models of multi-label classification ensemble
Introducing diversity among the models of multi-label classification ensemble Lena Chekina, Lior Rokach and Bracha Shapira Ben-Gurion University of the Negev Dept. of Information Systems Engineering and
An innovative application of a constrained-syntax genetic programming system to the problem of predicting survival of patients
An innovative application of a constrained-syntax genetic programming system to the problem of predicting survival of patients Celia C. Bojarczuk 1, Heitor S. Lopes 2 and Alex A. Freitas 3 1 Departamento
Impact of Feature Selection on the Performance of Wireless Intrusion Detection Systems
2009 International Conference on Computer Engineering and Applications IPCSIT vol.2 (2011) © (2011) IACSIT Press, Singapore Impact of Feature Selection on the Performance of Wireless Intrusion Detection Systems
Email Classification Using Data Reduction Method
Email Classification Using Data Reduction Method Rafiqul Islam and Yang Xiang, member IEEE School of Information Technology Deakin University, Burwood 3125, Victoria, Australia Abstract Classifying user
How To Use Neural Networks In Data Mining
International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and
