An Empirical Comparison of Ensemble and Hybrid Classification

Proc. of Int. Conf. on Recent Trends in Signal Processing, Image Processing and VLSI, ICrtSIV
DOI: 03.AETS
Association of Computer Electronics and Electrical Engineers, 2014

B V Sumana 1 and T. Santhanam 2
1 Assistant Professor, Department of Computer Science, Vijaya College Jayanagar, Bangalore, India [email protected]
2 Associate Professor and Head, Department of Computer Applications, DG Vaishnav College, Chennai, India [email protected]

Abstract

Data mining has been applied successfully in almost all fields, including the medical domain. Medical data mining is the process of extracting useful knowledge and hidden patterns from medical data. This paper proposes a hybrid model with hybrid feature selection for classifying the Cleveland Heart dataset, and compares its performance with that of base classifiers and ensemble classifiers. The model is developed in four stages. In the first stage, the Cleveland Heart dataset from the UCI repository is cleaned by deleting all instances with missing values. In the second stage, Fuzzy and Rough Set are used in a cascaded fashion for relevant feature extraction. In the third stage, the resulting dataset is clustered into two segments using K-means, and incorrectly clustered samples are eliminated to obtain the final samples. Finally, five different classifiers are trained on the correctly clustered samples from the previous stage to build the final classifier model, using 10-fold cross validation. Experimental results showed that the proposed hybrid model achieved higher classification accuracy than the base classifiers and the ensemble classifiers. The highest accuracy obtained was 99.54%.

Index Terms: Classification, Clustering, Fuzzy, Rough Set, K-means, Hybrid

I. INTRODUCTION

Recent technological advances have led society to generate large amounts of data in almost all fields, such as business, marketing, surveillance, science, medicine, economics, fraud detection and sports. The volume of stored data is growing exponentially, from terabytes to petabytes, and may eventually reach yottabytes. Such data often contains hidden information that is not available until the data is analyzed. This huge amount of data is a key source for knowledge extraction that supports cost savings and decision making. Human analysts may take weeks to extract useful information, so automated techniques are needed. Data mining has become an important tool for transforming these data into information. It has proved successful in almost all fields, including the medical domain. Knowledge extracted from medical data through data mining gives physicians an additional source of knowledge for decisions in their practice, treatment planning, risk analysis and other predictions. Disease diagnosis is one of the applications in which data mining is producing successful results. Data mining, which is a confluence of disciplines such as machine learning, statistics, pattern recognition and visualization, provides various techniques such as association, regression, prediction, clustering and

classification [1]. The two most widely used data mining techniques are classification and clustering. Classification is a supervised learning technique whose goal is to predict the target class for each case in the data, where the classes are predefined. Clustering is an unsupervised learning technique whose goal is to partition the data into groups such that objects within a cluster are similar to one another and dissimilar to objects in other clusters [2].

Although various classification algorithms are available in the literature, researchers face the problem of choosing the best model for a particular data set, because these traditional algorithms suffer from common problems such as computational complexity, getting stuck in local minima, or over-fitting to the training data [3]. One of the most widely used approaches to overcome these problems is ensemble learning, which is primarily used to improve the performance of a classifier. It is a process in which the predictions of multiple classifiers are combined to classify new samples with better prediction accuracy [4]. The commonly used ensemble techniques are bagging, boosting, voting and stacking. However, recent research has noted some problems: ensembles do not always improve the accuracy of the model, but tend to increase the error of each individual base classifier [5]. The use of many classifiers makes ensembles more complex and produces output that is very hard to analyze [6]. It is stated in [7] that not all ensemble classifiers improve classification accuracy on all datasets. Sarvestan Soltani et al. [8] found that an ensemble takes more time to build than a single classifier. The most challenging tasks in developing ensemble classifiers are:
(i) Choosing the combination of classifiers to be used.
(ii) Keeping the base classifiers simple, so that they do not over-fit.
(iii) To get a good ensemble, the base learners should be as accurate as possible and as distinct as possible.
(iv) Accuracy is sometimes poor because of the difficulty of selecting the correct combination of classifiers.

To overcome these problems, recent research has shown enhanced results in disease diagnosis when more than one technique is hybridized. Hybridization is an emerging approach in which more than one technique is combined, for example clustering and classification, or clustering and association. The literature shows that recent research has focused on hybridizing more than one technique, but none of the surveyed papers used hybrid feature selection. To fill this gap, a hybrid model combining clustering and classification is proposed, with hybrid feature selection that combines fuzzy and rough set techniques to optimize the accuracy of the classifier. The objective of this paper is to review the hybrid clustering and classification model [9, 10] and to compare the performance of the hybrid model with that of single classification models and ensembles on the Cleveland Heart dataset, based on classification accuracy, error rate, specificity, sensitivity and the time taken to build the model. This work helped us to propose the best model for the heart dataset. It also offers guidance to researchers who face the problem of choosing the best algorithm for a particular dataset, by proposing a hybrid model that combines classification and clustering to diagnose heart disease.

The heart dataset was selected because heart disease is one of the most common causes of death globally. As estimated in million people died from cardiovascular disease. 80% of these deaths took place in low- and middle-income countries, and it is estimated that by 2030 more than 23 million people will die annually from CVDs [11]. The number of deaths in the USA between 1 January 2013 and 31 March 2013 was 161,894.
About 9.5 million deaths, which is about one in six deaths worldwide, occur in India every year [12]. Heart disease is a condition that affects the heart's ability to work. The categories of heart disease include coronary heart disease, cardiomyopathy, cardiovascular disease, congestive heart failure and heart attack. Heart attacks and strokes are usually acute events, mainly caused by a blockage that prevents blood from flowing to the heart or brain. The most common reason for this is a build-up of fatty deposits on the inner walls of the blood vessels that supply the heart or brain. Strokes can also be caused by bleeding from a blood vessel in the brain or by blood clots.

The overall objective of this paper is to compare the performance of single classification models and ensembles with that of the hybrid model on the heart dataset, segmenting patients into two clusters, one with presence of disease and one with absence of disease, and to find which model performs best. The rest of the paper is organized as follows. Section II provides a brief review of the related work. Section III explains the proposed model and the Heart dataset used in this study. Section IV presents an overview of the classifiers used. Section V discusses the performance measures adopted in this study. Section VI reports the results of the experiment, which followed the standard guidelines. Finally, Sections VII and VIII conclude the paper with an outlook on future work.

II. LITERATURE REVIEW

A great deal of research has been done on various medical data sets, and much is still in progress. It is not possible to list all of it, so only a few studies are listed here. Table I lists some studies on single, ensemble and hybrid clustering and classification methods performed using medical datasets from the UCI repository.

TABLE I. DATA MINING TECHNIQUES APPLIED ON DIFFERENT MEDICAL DATASETS
Type Author Year Data set Technique Sellappan Palaniappan et al Heart Naïve Bayes, Decision Trees, Neural Network Sarvestan Soltani A. et al Breast Cancer MLP,SOM,RBF,PNN Anbarasi et al. Genetic with Decision tree,naïve Bayes and 2010 Heart Classification via Clustering AH Chen et al Heart ANN Single Sam Chao et al 2009 Rajeswari K et al Heart Decision Tree Algorithms Umair Abdullah et al Shariq Bashir et al Thanh-Trung Nguyen 2010 Apriori and FP Growth Sunil Joshi et al Xiaoyong Lin et al Lior Rokach et al Sotiris et al Indra Bhan et al Ensemble Srimani et al classifiers Bendi Venkata Ramana Liver 2012 Rotation Forest et al. Sarwesh et al Shantakumar B.Patil et al Heart P. Rajendran et al Medical Image Association and Classification Sung Ho Ha et al chest S Kartik et al Liver Rough Set and Classification Asha T et al Tuberculosis Hybrid Sarojini et al. Diabetes. heart and 2011 cancer Asha Gowda Karegowda Diabetes 2012 et al. Clustering and Classification Shomona Gracia Jacob et Lymphography 2012 al. Shezad Shaikh et al NSL-KDD

III. METHODS AND MATERIALS

This experiment used WEKA, an open source tool, and the Cleveland Heart dataset from the UCI Machine Learning Repository [13]. The experiment was conducted using 10-fold cross validation to test the accuracy and time complexity of the classifiers.

A. Proposed Model

The methodology consists of four stages, based on clustering and classification, that classify the dataset into two clusters.
In the first stage, the selected Heart dataset is cleaned by deleting all instances with missing values. In the second stage, Fuzzy Rough Set is used in a cascaded fashion for relevant feature extraction. In the third stage, the resulting dataset is clustered into two segments using K-means, and incorrectly clustered samples are eliminated to obtain the final samples. Finally, five different classifiers are trained on the correctly clustered samples from the previous stage to build the final classifier model, using 10-fold cross validation.

1) Data Preprocessing: Data preprocessing is an important preliminary step in the data mining process. It includes cleaning, integration, transformation, feature extraction and selection [14]. The accuracy and quality of the analysis depend on the quality of the data. The data may contain missing values as well as noisy, irrelevant and redundant information. If the data is not handled properly, the mining process will produce misleading results. The Heart dataset used in this study has 7 missing values, which would lead to misleading results during classification if left untreated. There are many approaches to handling missing data. In our

approach, the Heart data set was refined by deleting the records containing missing values, since they made up only 2% of the data samples, and the data was transformed into a form appropriate for clustering. Data preprocessing thus acts as the preliminary step that makes the data suitable for clustering.

2) Feature Selection: Data may contain many redundant or irrelevant features. Redundant features are those that provide no additional information, and irrelevant features are those that provide no useful information. The classification accuracy of a given algorithm generally depends on the nature of the dataset rather than on the algorithm itself. The main characteristics of a dataset are its attributes, its classes and its number of instances. Feature selection is a form of dimensionality reduction in which the input data is transformed into a reduced set of features: irrelevant features are eliminated and a subset of relevant features is selected for model construction, which optimizes the accuracy of the classifiers. In this approach, Fuzzy-Rough set Feature Selection (FRFS) was adopted to select the best attributes, and clustering was used as a reduction technique, through which the wrongly clustered instances were eliminated to obtain the final samples. FRFS was adopted because it can analyze both quantitative and qualitative features, and can reduce a mixture of nominal and continuous valued features based only on the original data, without any additional information about the data. Although Rough set theory, proposed by Pawlak (1982), has been used successfully for extracting feature subsets, it is limited to nominal data. Fuzzy set theory is therefore combined with Rough set to handle continuous data.
Hence the hybrid FRFS can handle a mixture of nominal and continuous valued features.

3) Clustering Using K-Means Algorithm: Clustering is an unsupervised learning technique in which data elements are segmented into related groups without prior knowledge of the group definitions. Its basic task is to group objects into meaningful categories and to develop classification labels automatically. Numerous clustering methods are available in the literature. The K-means algorithm is one of the most widely recognized clustering algorithms and is applied in numerous scientific and industrial applications. K-means is adopted in the present approach because it is an unsupervised partitioning method that is simple and takes relatively little computation time [15, 16, 17]. The algorithm takes as input a positive integer k, denoting the number of clusters, and groups the data according to their characteristic values into k distinct clusters, so that the objects in one cluster are similar to each other and dissimilar to the objects in other clusters.

Finally, the relevant features identified in the second stage and the correctly clustered instances identified in the third stage were given as input to five different WEKA classifiers, using 10-fold cross validation. The performance of the classifiers was evaluated based on the confusion matrix. Table II summarizes the process.

TABLE II. SUMMARY OF PREPROCESSING
Columns: Data set; No. of instances; No. of instances after preprocessing; No. of incorrectly clustered instances using K-means; Error in clustering (%); No. of instances after elimination of wrongly clustered instances
Heart % 219

IV. OVERVIEW OF THE CLASSIFIERS

A large number of classifiers are available in the literature, such as Bayesian, rule-based, neural network and tree classifiers. Whatever their type, their ultimate goal is to predict the class.
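Before the individual classifiers are described, the instance-elimination step of Stage 3 (Section III) can be illustrated with a small sketch. It is not the WEKA procedure used in the study: the toy points and labels are hypothetical, the deterministic initialization is an assumption made for reproducibility, and mapping each cluster to its majority class is one plausible reading of "incorrectly clustered". Note that the filter relies on the class labels.

```python
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_two(points, iters=100):
    """Lloyd's k-means with k = 2; returns a cluster index (0 or 1) per point."""
    # Assumed deterministic start: the first point and the point farthest from it.
    centroids = [points[0], max(points, key=lambda p: sq_dist(p, points[0]))]
    assign = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_assign = [min((0, 1), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_assign == assign:  # assignments stable, so converged
            break
        assign = new_assign
        # Update step: each centroid becomes the mean of its members.
        for c in (0, 1):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return assign

def correctly_clustered(labels, assign):
    """Indices of instances whose label matches their cluster's majority class."""
    majority = {
        c: Counter(l for l, a in zip(labels, assign) if a == c).most_common(1)[0][0]
        for c in set(assign)
    }
    return [i for i, (l, a) in enumerate(zip(labels, assign)) if majority[a] == l]

# Hypothetical toy data: two well-separated groups, one odd label in each.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (1.1, 1.3),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.3)]
labels = [0, 0, 0, 1, 1, 1, 1, 0]

assign = kmeans_two(points)
kept = correctly_clustered(labels, assign)
error_pct = 100.0 * (len(points) - len(kept)) / len(points)
print(kept, error_pct)
```

Only the instances in `kept` would be passed on to the classifiers; `error_pct` corresponds to the "Error in clustering (%)" column of Table II.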
In our approach, the Cleveland Heart dataset was evaluated using five widely used classifiers.

Bayesian classifiers, Naive Bayes: Naive Bayes (NB) is a probabilistic classification method based on Bayes' theorem. It assumes that the input features are conditionally independent of each other given the class.

Support Vector Machines using Sequential Minimal Optimization: The Support Vector Machine (SVM) algorithm builds a hyperplane to separate instances into their respective classes. SMO, a fast and efficient version of SVM implemented in WEKA, uses the sequential minimal optimization algorithm to train a support vector classifier with polynomial or Gaussian kernels.

Instance Based Learners: IBk is an instance-based classifier in which classification is done by a majority vote of the k neighboring instances.

Trees, J48 Decision Tree: Decision trees take a divide-and-conquer approach to the learning problem. J48 is the WEKA implementation of C4.5. The tree comprises nodes (attributes) at every stage, which are structured with the help of training examples.

Rule Learner: PART is a rule learner proposed by Frank and Witten (1998), which combines C4.5 and Ripper. The algorithm generates an ordered set of rules called a decision list. New data is compared to each rule in the list, and the item is assigned the category of the first matching rule (a default is applied if no rule matches). PART builds a decision tree in each iteration using C4.5's heuristics and makes the best leaf into a rule.

Ensembles:
Bagging combines multiple models generated by training a single algorithm on random sub-samples of a given dataset. Equal-weight voting is used during the fusion process.
AdaBoost: Boosting, in contrast to bagging, gives more weight to misclassified instances in its successive models and combines the models by weighted voting.
Rotation Forest is a successful ensemble technique in which each tree is trained on the whole data set in a rotated feature space.
Dagging creates a number of disjoint, stratified folds out of the data and feeds each chunk of data to a copy of the supplied base classifier.
Random Forest constructs random forests by bagging ensembles of random trees.

The characteristics of the dataset used are given in Table III.

TABLE III. DATA SET DESCRIPTIONS
Columns: Data Set; No. of Attributes Including class; No. of Classes; No. of Instances; Missing Values
Heart Cleveland Yes (7)

Sl no.  Attribute  Description  Values
1  Age  Age in years  Continuous
2  Sex  Male or female  1 = male; 0 = female
3  cp  Chest pain type  1 = typical angina; 2 = atypical angina; 3 = non-anginal pain; 4 = asymptomatic
4  Trestbps  Resting blood pressure  Continuous value in mm Hg
5  Chol  Serum cholesterol  Continuous value in mg/dl
6  Restecg  Resting electrocardiographic results  0 = normal; 1 = having ST-T wave abnormality; 2 = left ventricular hypertrophy
7  Fbs  Fasting blood sugar  1 = > 120 mg/dl; 0 = ≤ 120 mg/dl
8  Thalach  Maximum heart rate achieved  Continuous value
9  Exang  Exercise induced angina  0 = no; 1 = yes
10  Oldpeak  ST depression induced by exercise relative to rest  Continuous value
11  Slope  Slope of the peak exercise ST segment  1 = upsloping; 2 = flat; 3 = downsloping
12  Ca  Number of major vessels colored by fluoroscopy  0-3
13  Thal  Defect type  3 = normal; 6 = fixed defect; 7 = reversible defect

V. ESTIMATIONS FOR MODEL PERFORMANCE

A. Stratified 10 Fold Cross Validation Method

In this study, stratified cross validation with 10 folds was used to evaluate the classifier models. Cross validation is a statistical technique used to evaluate the performance of a predictive model, and also to compare learning algorithms, by dividing the data into two segments, one used to train a model and the other used

to validate the model [18]. Stratification is the process of partitioning the data such that each class is properly represented in both the training and test sets [19]. In stratified 10-fold cross validation, the data is divided randomly into 10 parts, in each of which the classes are represented in approximately the same proportions as in the full dataset. Each part is held out in turn and the learning scheme is trained on the remaining nine-tenths; its error rate is then calculated on the holdout set. The learning procedure is thus executed 10 times on different training sets, and the 10 error rates are averaged to yield an overall error estimate. When an accurate error estimate is sought, it is standard procedure to repeat the cross-validation process 10 times [19].

B. Performance Measures

Supervised machine learning (ML) offers several ways of evaluating the performance of classifiers. The quality of a classification algorithm is measured using the confusion matrix, which records the correctly and incorrectly recognized examples for each class. Table IV presents a confusion matrix for binary classification, where TP denotes true positives, TN true negatives, FP false positives and FN false negatives.

TABLE IV. CONFUSION MATRIX
Actual class  Predicted: Test Negative (T-)  Predicted: Test Positive (T+)
Disease Absent (D-)  True Negative (TN)  False Positive (FP)
Disease Present (D+)  False Negative (FN)  True Positive (TP)

The measures derived from the confusion matrix are as follows.

Accuracy: The accuracy of a classifier is the percentage of the test set tuples that are correctly classified by the classifier.
Accuracy = (TP + TN) / (TP + TN + FP + FN)

Sensitivity: Sensitivity, also referred to as the true positive rate, is the proportion of positive tuples that are correctly identified.
Sensitivity = TP / (TP + FN)

Specificity: Specificity is the true negative rate, that is, the proportion of negative tuples that are correctly identified.
Specificity = TN / (TN + FP)

Time: The amount of time required to build the model.

VI. EXPERIMENTAL RESULTS

This section presents the experimental results and their analysis. The classification accuracy of five classification algorithms was analyzed on the Heart dataset from the UCI repository, and the accuracy of the proposed model was compared empirically with the benchmark comparison of results given in [20].

TABLE V. EXPERIMENTAL RESULTS USING BASIC LEARNING CLASSIFIERS
Columns: Performance Evaluators; Naïve Bayes; K-NN (IBk); PART; C4.5 (J48); SVM (SMO)
Rows: Accuracy; Error Rate; Sensitivity; Specificity; Time to build the model

TABLE VI. EXPERIMENTAL RESULTS USING ENSEMBLE CLASSIFIERS
Columns: Performance Evaluators; Bagging; AdaBoost (AdaBoostM1); Rotation Forest; Dagging; Random Forest
Rows: Accuracy; Error Rate; Sensitivity; Specificity; Time to build the model
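The measures defined in Section V, which are the quantities reported in the result tables, can be computed directly from the confusion-matrix counts. The following is a minimal sketch in Python, not the WEKA output used in the study. The function names and the toy label vectors are illustrative assumptions, and the error rate is taken here as the complement of accuracy.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, TN, FP and FN for a binary problem."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1          # disease present, test positive
        elif a != positive and p != positive:
            tn += 1          # disease absent, test negative
        elif a != positive and p == positive:
            fp += 1          # disease absent, test positive
        else:
            fn += 1          # disease present, test negative
    return tp, tn, fp, fn

def measures(tp, tn, fp, fn):
    """Accuracy, error rate, sensitivity and specificity as fractions."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,   # assumed to be 1 - accuracy
        "sensitivity": tp / (tp + fn),     # true positive rate
        "specificity": tn / (tn + fp),     # true negative rate
    }

# Hypothetical labels: 1 = disease present, 0 = disease absent.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

tp, tn, fp, fn = confusion_counts(actual, predicted)
m = measures(tp, tn, fp, fn)
print(tp, tn, fp, fn, m)
```

In a 10-fold cross validation, the counts from the ten held-out folds would be accumulated before the measures are computed, or the per-fold measures averaged.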

TABLE VII. EXPERIMENTAL RESULTS USING PROPOSED HYBRID MODEL (WITHOUT FEATURE SELECTION)
Columns: Performance Evaluators; Naïve Bayes; K-NN (IBk); PART; C4.5 (J48); SVM (SMO)
Rows: Accuracy; Error Rate; Sensitivity; Specificity; Time to build the model

TABLE VIII. EXPERIMENTAL RESULTS USING PROPOSED HYBRID MODEL + HYBRID FEATURE SELECTION (FUZZY + ROUGH SET FRFS WITH ANTSEARCH)
Columns: Performance Evaluators; Naïve Bayes; K-NN (IBk); PART; C4.5 (J48); SVM (SMO)
Rows: Accuracy; Error Rate; Sensitivity; Specificity; Time to build the model

TABLE IX. COMPARISON OF PROPOSED HYBRID MODEL OVER BENCHMARK COMPARISON OF CLASSIFIERS
Data set: Heart Cleveland
Accuracy range, benchmark single classifier model: 46.2% to 90.0%
Accuracy, proposed hybrid model: 99.54%
Classifier with highest accuracy on proposed model: PART

The experimental results in Tables V, VI, VII and VIII show that cascaded K-means clustering and classification, combined with cascaded Fuzzy Rough Set Feature Selection, gave enhanced classification accuracy. Table IX compares the proposed model with the benchmark comparison of results given in [20].

A. Research Findings

1) Table VIII shows that the proposed model achieved good accuracy compared with the hybrid models from the literature and with the benchmark comparison of results given in [20]. The highest accuracy obtained by the proposed model on the Cleveland Heart dataset is 99.54%.

2) The literature shows that ensemble classifiers do not always give enhanced results [4]; the outcome depends on the type of base classifiers used. The proposed model in this study, in contrast, gives enhanced results irrespective of the classifier when compared with traditional base classifiers and ensemble classifiers. It was also found that integrating feature selection with K-means enhances the accuracy of the classifier even further.

VII. CONCLUSION

Various data mining techniques are available for diagnosis of a disease.
The main goal of this research was to identify the improvement offered by the hybrid model over the single classifier model and the ensemble model. The accuracy of different base classifiers depends on the type of the data and the type of the features. Ensemble classifiers have an advantage over traditional classifiers because they work on the principle that, when dealing with a complicated problem, a group of experts with varied experience in the same area has a higher probability of reaching a satisfactory solution than a single expert. To get good ensemble accuracy, the base classifiers should be simple and accurate so that they do not over-fit, because a known disadvantage of ensemble learning is that trying to maximize classifier accuracy tends to increase the error of each individual base classifier. In this study, therefore, a hybrid model of clustering and classification with hybrid feature selection was proposed to diagnose the presence or absence of heart disease and to enhance the classification accuracy on the data set, tested with 10-fold cross validation. The experimental results showed that, irrespective of the type of classifier, the proposed hybrid approach, combining preprocessing and hybrid feature selection, gave enhanced classification accuracy on the Heart data set, because in the feature selection phase the redundant and irrelevant

features were eliminated, in the clustering phase the irrelevant instances were eliminated, and the resulting dataset was then used to train the different classifiers.

VIII. FUTURE WORK

The accuracy of the classifier also depends on the error rate of the clustering algorithm. Our future work will therefore focus on applying a clustering algorithm that produces a lower error rate than K-means. The investigation also found that feature selection before clustering enhanced the classification accuracy, so our future work will also focus on applying different feature selection algorithms and on testing the performance of the proposed model on different datasets.

REFERENCES

[1] Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, ISBN
[2] Pang-Ning Tan, Michael Steinbach and Vipin Kumar, Introduction to Data Mining, Pearson/Addison Wesley, ISBN
[3] T. G. Dietterich, Ensemble methods in machine learning, in MCS 00: Proceedings of the First International Workshop on Multiple Classifier Systems, pp. 1-15, Springer-Verlag, London, UK, 2000.
[4] Ethem Alpaydın, Introduction to Machine Learning, Second Edition, MIT Press,
[5] K. J. Cios and G. W. Moore, Uniqueness of medical data mining, Artificial Intelligence in Medicine 26(1-2), 2002,
[6] Sarwesh Site (M.Tech Scholar, LNCT Bhopal, India) and Sadhna K. Mishra (Prof., LNCT Bhopal, India), A Review of Ensemble Technique for Improving Majority Voting for Classifier, IJARCSSE, Volume 3, Issue 1, ISSN: X
[7] P. K. Srimani and Manjula Sanjay Koti, Medical Diagnosis Using Ensemble Classifiers - A Novel Machine-Learning Approach, Journal of Advanced Computing (2013) 1: 9-27, doi: /jac
[8] A. Sarvestan Soltani, A. A. Safavi, M. N. Parandeh and M. Salehi, Predicting Breast Cancer Survivability using data mining techniques, Software Technology and Engineering (ICSTE), 2nd International Conference, 2010, vol. 2, pp
[9] Asha Gowda Karegowda, M. A. Jayaram and A. S. Manjunath, Cascading K-means Clustering and K-Nearest Neighbor Classifier for Categorization of Diabetic Patients, International Journal of Engineering and Advanced Technology (IJEAT), ISSN: , Volume-1, Issue-3, Feb 2012.
[10] T. Asha, S. Natarajan and K. N. B. Murthy, A Data Mining Approach to the Diagnosis of Tuberculosis by Cascading Clustering and Classification, Journal of Computing, vol. 3, no. 4, 2011.
[11] World Health Organization, Cardiovascular diseases (CVDs), Fact sheet, updated March 2013. Available at.
[12] Prabhat Jha, Vendhan Gajalakshmi, Prakash C Gupta, Rajesh Kumar, Prem Mony, Neeraj Dhingra, Richard Peto and RGI-CGHR Prospective Study Collaborators, Prospective study of 1 million deaths in India: Rationale, Design and Validation result, Mauricio Hernandez Avila, Academic Editor.
[13] UCI repository of machine learning databases. Irvine, CA: University of California, Department of Information and Computer Science. {
[14] S. S. Baskar, L. Arockiam and S. Charles, A Systematic Approach on Data Pre-processing In Data Mining, COMPUSOFT, International Journal of Advanced Computer Technology, ISSN
[15] Khaled Hammouda and Fakhreddine Karray, A Comparative Study of Data Clustering Techniques, University of Waterloo, Ontario, Canada, Volume 13, Issues 2-3, November 1997, pp
[16] Pallavi and Sunila Godara, A Comparative Performance Analysis of Clustering Algorithms, International Journal of Engineering Research and Applications (IJERA), ISSN: , Vol. 1, Issue 3, pp
[17] Narendra Sharma, Aman Bajpai and Ratnesh Litoriya, Comparison the various clustering algorithms of weka tools, International Journal of Emerging Technology and Advanced Engineering, Volume 2, Issue 5, May 2012.
[18] Payam Refaeilzadeh, Lei Tang and Huan Liu (Arizona State University), Cross-Validation, Encyclopedia of Database Systems, Springer US, 2009, pp
[19] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Second Edition, Morgan Kaufmann Pub, 2005.
[20] Benchmark datasets used for classification: comparison of results, Computational Intelligence Laboratory, Department of Informatics, Nicolaus Copernicus University {


Intelligent Heart Disease Prediction System Using Data Mining Techniques *Ms. Ishtake S.H, ** Prof. Sanap S.A. Intelligent Heart Disease Prediction System Using Data Mining Techniques *Ms. Ishtake S.H, ** Prof. Sanap S.A. *Department of Copmputer science, MIT, Aurangabad, Maharashtra, India. ** Department of Computer

More information

Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier

Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-1, Issue-6, January 2013 Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing

More information

Predictive Data Mining for Medical Diagnosis: An Overview of Heart Disease Prediction

Predictive Data Mining for Medical Diagnosis: An Overview of Heart Disease Prediction Predictive Data Mining for Medical Diagnosis: An Overview of Heart Disease Prediction Jyoti Soni Ujma Ansari Dipesh Sharma Student, M.Tech (CSE). Professor Reader Raipur Institute of Technology Raipur

More information

Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification

Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Tina R. Patil, Mrs. S. S. Sherekar Sant Gadgebaba Amravati University, Amravati [email protected], [email protected]

More information

HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION

HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION Chihli Hung 1, Jing Hong Chen 2, Stefan Wermter 3, 1,2 Department of Management Information Systems, Chung Yuan Christian University, Taiwan

More information

FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS

FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS Breno C. Costa, Bruno. L. A. Alberto, André M. Portela, W. Maduro, Esdras O. Eler PDITec, Belo Horizonte,

More information

ANALYSIS OF FEATURE SELECTION WITH CLASSFICATION: BREAST CANCER DATASETS

ANALYSIS OF FEATURE SELECTION WITH CLASSFICATION: BREAST CANCER DATASETS ANALYSIS OF FEATURE SELECTION WITH CLASSFICATION: BREAST CANCER DATASETS Abstract D.Lavanya * Department of Computer Science, Sri Padmavathi Mahila University Tirupati, Andhra Pradesh, 517501, India [email protected]

More information

REVIEW OF HEART DISEASE PREDICTION SYSTEM USING DATA MINING AND HYBRID INTELLIGENT TECHNIQUES

REVIEW OF HEART DISEASE PREDICTION SYSTEM USING DATA MINING AND HYBRID INTELLIGENT TECHNIQUES REVIEW OF HEART DISEASE PREDICTION SYSTEM USING DATA MINING AND HYBRID INTELLIGENT TECHNIQUES R. Chitra 1 and V. Seenivasagam 2 1 Department of Computer Science and Engineering, Noorul Islam Centre for

More information

ENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA

ENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA ENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA D.Lavanya 1 and Dr.K.Usha Rani 2 1 Research Scholar, Department of Computer Science, Sree Padmavathi Mahila Visvavidyalayam, Tirupati, Andhra Pradesh,

More information

Data Mining Practical Machine Learning Tools and Techniques

Data Mining Practical Machine Learning Tools and Techniques Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea

More information

DATA MINING AND REPORTING IN HEALTHCARE

DATA MINING AND REPORTING IN HEALTHCARE DATA MINING AND REPORTING IN HEALTHCARE Divya Gandhi 1, Pooja Asher 2, Harshada Chaudhari 3 1,2,3 Department of Information Technology, Sardar Patel Institute of Technology, Mumbai,(India) ABSTRACT The

More information

Keywords data mining, prediction techniques, decision making.

Keywords data mining, prediction techniques, decision making. Volume 5, Issue 4, April 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Analysis of Datamining

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

Artificial Neural Network Approach for Classification of Heart Disease Dataset

Artificial Neural Network Approach for Classification of Heart Disease Dataset Artificial Neural Network Approach for Classification of Heart Disease Dataset Manjusha B. Wadhonkar 1, Prof. P.A. Tijare 2 and Prof. S.N.Sawalkar 3 1 M.E Computer Engineering (Second Year)., Computer

More information

A Novel Approach for Heart Disease Diagnosis using Data Mining and Fuzzy Logic

A Novel Approach for Heart Disease Diagnosis using Data Mining and Fuzzy Logic A Novel Approach for Heart Disease Diagnosis using Data Mining and Fuzzy Logic Nidhi Bhatla GNDEC, Ludhiana, India Kiran Jyoti GNDEC, Ludhiana, India ABSTRACT Cardiovascular disease is a term used to describe

More information

A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE

A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE Kasra Madadipouya 1 1 Department of Computing and Science, Asia Pacific University of Technology & Innovation ABSTRACT Today, enormous amount of data

More information

SVM Ensemble Model for Investment Prediction

SVM Ensemble Model for Investment Prediction 19 SVM Ensemble Model for Investment Prediction Chandra J, Assistant Professor, Department of Computer Science, Christ University, Bangalore Siji T. Mathew, Research Scholar, Christ University, Dept of

More information

Prediction of Heart Disease Using Naïve Bayes Algorithm

Prediction of Heart Disease Using Naïve Bayes Algorithm Prediction of Heart Disease Using Naïve Bayes Algorithm R.Karthiyayini 1, S.Chithaara 2 Assistant Professor, Department of computer Applications, Anna University, BIT campus, Tiruchirapalli, Tamilnadu,

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015

International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015 RESEARCH ARTICLE OPEN ACCESS Data Mining Technology for Efficient Network Security Management Ankit Naik [1], S.W. Ahmad [2] Student [1], Assistant Professor [2] Department of Computer Science and Engineering

More information

A Content based Spam Filtering Using Optical Back Propagation Technique

A Content based Spam Filtering Using Optical Back Propagation Technique A Content based Spam Filtering Using Optical Back Propagation Technique Sarab M. Hameed 1, Noor Alhuda J. Mohammed 2 Department of Computer Science, College of Science, University of Baghdad - Iraq ABSTRACT

More information

Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Knowledge Discovery and Data Mining Unit # 11 Sajjad Haider Fall 2013 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right

More information

REVIEW OF ENSEMBLE CLASSIFICATION

REVIEW OF ENSEMBLE CLASSIFICATION Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IJCSMC, Vol. 2, Issue.

More information

Data Mining. Nonlinear Classification

Data Mining. Nonlinear Classification Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15

More information

Genetic Neural Approach for Heart Disease Prediction

Genetic Neural Approach for Heart Disease Prediction Genetic Neural Approach for Heart Disease Prediction Nilakshi P. Waghulde 1, Nilima P. Patil 2 Abstract Data mining techniques are used to explore, analyze and extract data using complex algorithms in

More information

International Journal of Advance Research in Computer Science and Management Studies

International Journal of Advance Research in Computer Science and Management Studies Volume 2, Issue 12, December 2014 ISSN: 2321 7782 (Online) International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online

More information

An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset

An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset P P P Health An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset Peng Liu 1, Elia El-Darzi 2, Lei Lei 1, Christos Vasilakis 2, Panagiotis Chountas 2, and Wei Huang

More information

Data Mining - Evaluation of Classifiers

Data Mining - Evaluation of Classifiers Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010

More information

DataMining Clustering Techniques in the Prediction of Heart Disease using Attribute Selection Method

DataMining Clustering Techniques in the Prediction of Heart Disease using Attribute Selection Method DataMining Clustering Techniques in the Prediction of Heart Disease using Attribute Selection Method Atul Kumar Pandey*, Prabhat Pandey**, K.L. Jaiswal***, Ashish Kumar Sen**** ABSTRACT Heart disease is

More information

Improving spam mail filtering using classification algorithms with discretization Filter

Improving spam mail filtering using classification algorithms with discretization Filter International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) International Journal of Emerging Technologies in Computational

More information

AUTO CLAIM FRAUD DETECTION USING MULTI CLASSIFIER SYSTEM

AUTO CLAIM FRAUD DETECTION USING MULTI CLASSIFIER SYSTEM AUTO CLAIM FRAUD DETECTION USING MULTI CLASSIFIER SYSTEM ABSTRACT Luis Alexandre Rodrigues and Nizam Omar Department of Electrical Engineering, Mackenzie Presbiterian University, Brazil, São Paulo [email protected],[email protected]

More information

How To Solve The Kd Cup 2010 Challenge

How To Solve The Kd Cup 2010 Challenge A Lightweight Solution to the Educational Data Mining Challenge Kun Liu Yan Xing Faculty of Automation Guangdong University of Technology Guangzhou, 510090, China [email protected] [email protected]

More information

INTERNATIONAL JOURNAL FOR ENGINEERING APPLICATIONS AND TECHNOLOGY DATA MINING IN HEALTHCARE SECTOR. [email protected]

INTERNATIONAL JOURNAL FOR ENGINEERING APPLICATIONS AND TECHNOLOGY DATA MINING IN HEALTHCARE SECTOR. ankitanandurkar2394@gmail.com IJFEAT INTERNATIONAL JOURNAL FOR ENGINEERING APPLICATIONS AND TECHNOLOGY DATA MINING IN HEALTHCARE SECTOR Bharti S. Takey 1, Ankita N. Nandurkar 2,Ashwini A. Khobragade 3,Pooja G. Jaiswal 4,Swapnil R.

More information

REVIEW ON PREDICTION SYSTEM FOR HEART DIAGNOSIS USING DATA MINING TECHNIQUES

REVIEW ON PREDICTION SYSTEM FOR HEART DIAGNOSIS USING DATA MINING TECHNIQUES International Journal of Latest Research in Engineering and Technology (IJLRET) ISSN: 2454-5031(Online) ǁ Volume 1 Issue 5ǁOctober 2015 ǁ PP 09-14 REVIEW ON PREDICTION SYSTEM FOR HEART DIAGNOSIS USING

More information

Intelligent and Effective Heart Disease Prediction System using Weighted Associative Classifiers

Intelligent and Effective Heart Disease Prediction System using Weighted Associative Classifiers Intelligent and Effective Heart Disease Prediction System using Weighted Associative Classifiers Jyoti Soni, Uzma Ansari, Dipesh Sharma Computer Science Raipur Institute of Technology, Raipur C.G., India

More information

DATA MINING TECHNIQUES AND APPLICATIONS

DATA MINING TECHNIQUES AND APPLICATIONS DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,

More information

CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.

CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning. Lecture Machine Learning Milos Hauskrecht [email protected] 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht [email protected] 539 Sennott

More information

International Journal of Software and Web Sciences (IJSWS) www.iasir.net

International Journal of Software and Web Sciences (IJSWS) www.iasir.net International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) ISSN (Print): 2279-0063 ISSN (Online): 2279-0071 International

More information

Experiments in Web Page Classification for Semantic Web

Experiments in Web Page Classification for Semantic Web Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address

More information

Classification algorithm in Data mining: An Overview

Classification algorithm in Data mining: An Overview Classification algorithm in Data mining: An Overview S.Neelamegam #1, Dr.E.Ramaraj *2 #1 M.phil Scholar, Department of Computer Science and Engineering, Alagappa University, Karaikudi. *2 Professor, Department

More information

Comparison of Data Mining Techniques used for Financial Data Analysis

Comparison of Data Mining Techniques used for Financial Data Analysis Comparison of Data Mining Techniques used for Financial Data Analysis Abhijit A. Sawant 1, P. M. Chawan 2 1 Student, 2 Associate Professor, Department of Computer Technology, VJTI, Mumbai, INDIA Abstract

More information

Impelling Heart Attack Prediction System using Data Mining and Artificial Neural Network

Impelling Heart Attack Prediction System using Data Mining and Artificial Neural Network General Article International Journal of Current Engineering and Technology E-ISSN 2277 4106, P-ISSN 2347-5161 2014 INPRESSCO, All Rights Reserved Available at http://inpressco.com/category/ijcet Impelling

More information

Towards better accuracy for Spam predictions

Towards better accuracy for Spam predictions Towards better accuracy for Spam predictions Chengyan Zhao Department of Computer Science University of Toronto Toronto, Ontario, Canada M5S 2E4 [email protected] Abstract Spam identification is crucial

More information

Comparison of K-means and Backpropagation Data Mining Algorithms

Comparison of K-means and Backpropagation Data Mining Algorithms Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and

More information

Keywords Data mining, Classification Algorithm, Decision tree, J48, Random forest, Random tree, LMT, WEKA 3.7. Fig.1. Data mining techniques.

Keywords Data mining, Classification Algorithm, Decision tree, J48, Random forest, Random tree, LMT, WEKA 3.7. Fig.1. Data mining techniques. International Journal of Emerging Research in Management &Technology Research Article October 2015 Comparative Study of Various Decision Tree Classification Algorithm Using WEKA Purva Sewaiwar, Kamal Kant

More information

Université de Montpellier 2 Hugo Alatrista-Salas : [email protected]

Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr Université de Montpellier 2 Hugo Alatrista-Salas : [email protected] WEKA Gallirallus Zeland) australis : Endemic bird (New Characteristics Waikato university Weka is a collection

More information

Data Mining Part 5. Prediction

Data Mining Part 5. Prediction Data Mining Part 5. Prediction 5.1 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Classification vs. Numeric Prediction Prediction Process Data Preparation Comparing Prediction Methods References Classification

More information

Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier

Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier D.Nithya a, *, V.Suganya b,1, R.Saranya Irudaya Mary c,1 Abstract - This paper presents,

More information

Random forest algorithm in big data environment

Random forest algorithm in big data environment Random forest algorithm in big data environment Yingchun Liu * School of Economics and Management, Beihang University, Beijing 100191, China Received 1 September 2014, www.cmnt.lv Abstract Random forest

More information

A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier

A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier G.T. Prasanna Kumari Associate Professor, Dept of Computer Science and Engineering, Gokula Krishna College of Engg, Sullurpet-524121,

More information

Data Mining Techniques for Prognosis in Pancreatic Cancer

Data Mining Techniques for Prognosis in Pancreatic Cancer Data Mining Techniques for Prognosis in Pancreatic Cancer by Stuart Floyd A Thesis Submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUE In partial fulfillment of the requirements for the Degree

More information

ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION

ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION ISSN 9 X INFORMATION TECHNOLOGY AND CONTROL, 00, Vol., No.A ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION Danuta Zakrzewska Institute of Computer Science, Technical

More information

BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376

BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376 Course Director: Dr. Kayvan Najarian (DCM&B, [email protected]) Lectures: Labs: Mondays and Wednesdays 9:00 AM -10:30 AM Rm. 2065 Palmer Commons Bldg. Wednesdays 10:30 AM 11:30 AM (alternate weeks) Rm.

More information

Data Mining Approach to Detect Heart Dieses

Data Mining Approach to Detect Heart Dieses International Journal of Advanced Computer Science and Information Technology (IJACSIT) Vol. 2, No. 4, 2013, Page: 56-66, ISSN: 2296-1739 Helvetic Editions LTD, Switzerland www.elvedit.com Data Mining

More information

Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News

Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News Sushilkumar Kalmegh Associate Professor, Department of Computer Science, Sant Gadge Baba Amravati

More information

Predicting the Analysis of Heart Disease Symptoms Using Medicinal Data Mining Methods

Predicting the Analysis of Heart Disease Symptoms Using Medicinal Data Mining Methods Predicting the Analysis of Heart Disease Symptoms Using Medicinal Data Mining Methods V. Manikantan & S. Latha Department of Computer Science and Engineering, Mahendra Institute of Technology Tiruchengode,

More information

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University

More information

Supervised Learning (Big Data Analytics)

Supervised Learning (Big Data Analytics) Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used

More information

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05 Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 2015-03-05 Roman Kern (KTI, TU Graz) Ensemble Methods 2015-03-05 1 / 38 Outline 1 Introduction 2 Classification

More information

Predicting Student Performance by Using Data Mining Methods for Classification

Predicting Student Performance by Using Data Mining Methods for Classification BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 13, No 1 Sofia 2013 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.2478/cait-2013-0006 Predicting Student Performance

More information

BIDM Project. Predicting the contract type for IT/ITES outsourcing contracts

BIDM Project. Predicting the contract type for IT/ITES outsourcing contracts BIDM Project Predicting the contract type for IT/ITES outsourcing contracts N a n d i n i G o v i n d a r a j a n ( 6 1 2 1 0 5 5 6 ) The authors believe that data modelling can be used to predict if an

More information

HIGH DIMENSIONAL UNSUPERVISED CLUSTERING BASED FEATURE SELECTION ALGORITHM

HIGH DIMENSIONAL UNSUPERVISED CLUSTERING BASED FEATURE SELECTION ALGORITHM HIGH DIMENSIONAL UNSUPERVISED CLUSTERING BASED FEATURE SELECTION ALGORITHM Ms.Barkha Malay Joshi M.E. Computer Science and Engineering, Parul Institute Of Engineering & Technology, Waghodia. India Email:

More information

Keywords Data Mining, Knowledge Discovery, Direct Marketing, Classification Techniques, Customer Relationship Management

Keywords Data Mining, Knowledge Discovery, Direct Marketing, Classification Techniques, Customer Relationship Management Volume 4, Issue 6, June 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Simplified Data

More information

DETECTION OF HEALTH CARE USING DATAMINING CONCEPTS THROUGH WEB

DETECTION OF HEALTH CARE USING DATAMINING CONCEPTS THROUGH WEB DETECTION OF HEALTH CARE USING DATAMINING CONCEPTS THROUGH WEB Mounika NaiduP, Mtech(CS), ASCET, Gudur, pmounikanaidu@gmailcom C Rajendra, Prof, HOD, CSE Dept, ASCET, Gudur, hodcse@audisankaracom Abstract:

More information

A Comparative Analysis of Classification Techniques on Categorical Data in Data Mining

A Comparative Analysis of Classification Techniques on Categorical Data in Data Mining A Comparative Analysis of Classification Techniques on Categorical Data in Data Mining Sakshi Department Of Computer Science And Engineering United College of Engineering & Research Naini Allahabad [email protected]

More information

Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Knowledge Discovery and Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Evaluating the Accuracy of a Classifier Holdout, random subsampling, crossvalidation, and the bootstrap are common techniques for

More information

Application of Data Mining Techniques to Model Breast Cancer Data

Application of Data Mining Techniques to Model Breast Cancer Data Application of Data Mining Techniques to Model Breast Cancer Data S. Syed Shajahaan 1, S. Shanthi 2, V. ManoChitra 3 1 Department of Information Technology, Rathinam Technical Campus, Anna University,

More information

Reference Books. Data Mining. Supervised vs. Unsupervised Learning. Classification: Definition. Classification k-nearest neighbors

Reference Books. Data Mining. Supervised vs. Unsupervised Learning. Classification: Definition. Classification k-nearest neighbors Classification k-nearest neighbors Data Mining Dr. Engin YILDIZTEPE Reference Books Han, J., Kamber, M., Pei, J., (2011). Data Mining: Concepts and Techniques. Third edition. San Francisco: Morgan Kaufmann

More information

PERFORMANCE ANALYSIS OF CLASSIFICATION DATA MINING TECHNIQUES OVER HEART DISEASE DATA BASE

PERFORMANCE ANALYSIS OF CLASSIFICATION DATA MINING TECHNIQUES OVER HEART DISEASE DATA BASE PERFORMANCE ANALYSIS OF CLASSIFICATION DATA MINING TECHNIQUES OVER HEART DISEASE DATA BASE N. Aditya Sundar 1, P. Pushpa Latha 2, M. Rama Chandra 3 1 Asst.professor, CSE Department, GMR Institute of Technology,

More information

Data Mining Solutions for the Business Environment

Data Mining Solutions for the Business Environment Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania [email protected] Over

More information

Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing

Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing www.ijcsi.org 198 Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing Lilian Sing oei 1 and Jiayang Wang 2 1 School of Information Science and Engineering, Central South University

More information

Addressing the Class Imbalance Problem in Medical Datasets

Addressing the Class Imbalance Problem in Medical Datasets Addressing the Class Imbalance Problem in Medical Datasets M. Mostafizur Rahman and D. N. Davis the size of the training set is significantly increased [5]. If the time taken to resample is not considered,

More information

Enhanced Boosted Trees Technique for Customer Churn Prediction Model

Enhanced Boosted Trees Technique for Customer Churn Prediction Model IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 03 (March. 2014), V5 PP 41-45 www.iosrjen.org Enhanced Boosted Trees Technique for Customer Churn Prediction

More information

New Ensemble Combination Scheme

New Ensemble Combination Scheme New Ensemble Combination Scheme Namhyoung Kim, Youngdoo Son, and Jaewook Lee, Member, IEEE Abstract Recently many statistical learning techniques are successfully developed and used in several areas However,

More information

Evaluation of Feature Selection Methods for Predictive Modeling Using Neural Networks in Credits Scoring

Evaluation of Feature Selection Methods for Predictive Modeling Using Neural Networks in Credits Scoring 714 Evaluation of Feature election Methods for Predictive Modeling Using Neural Networks in Credits coring Raghavendra B. K. Dr. M.G.R. Educational and Research Institute, Chennai-95 Email: [email protected]

More information

Data Mining Algorithms Part 1. Dejan Sarka

Data Mining Algorithms Part 1. Dejan Sarka Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka ([email protected]) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses

More information

EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH

EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH SANGITA GUPTA 1, SUMA. V. 2 1 Jain University, Bangalore 2 Dayanada Sagar Institute, Bangalore, India Abstract- One

More information

Maschinelles Lernen mit MATLAB

Maschinelles Lernen mit MATLAB Maschinelles Lernen mit MATLAB Jérémy Huard Applikationsingenieur The MathWorks GmbH 2015 The MathWorks, Inc. 1 Machine Learning is Everywhere Image Recognition Speech Recognition Stock Prediction Medical

More information

A Review of Missing Data Treatment Methods

A Review of Missing Data Treatment Methods A Review of Missing Data Treatment Methods Liu Peng, Lei Lei Department of Information Systems, Shanghai University of Finance and Economics, Shanghai, 200433, P.R. China ABSTRACT Missing data is a common

More information

Predictive Data modeling for health care: Comparative performance study of different prediction models

Predictive Data modeling for health care: Comparative performance study of different prediction models Predictive Data modeling for health care: Comparative performance study of different prediction models Shivanand Hiremath [email protected] National Institute of Industrial Engineering (NITIE) Vihar

More information

Overview. Evaluation Connectionist and Statistical Language Processing. Test and Validation Set. Training and Test Set

Overview. Evaluation Connectionist and Statistical Language Processing. Test and Validation Set. Training and Test Set Overview Evaluation Connectionist and Statistical Language Processing Frank Keller [email protected] Computerlinguistik Universität des Saarlandes training set, validation set, test set holdout, stratification

Data Mining as a tool to Predict the Churn Behaviour among Indian bank customers

Manjit Kaur, Department of Computer Science, Punjabi University, Patiala, India [email protected] Dr. Kawaljeet Singh University

Introduction to Data Mining

Jay Urbain. Credits: Nazli Goharian & David Grossman @ IIT. Outline: Introduction, Data Pre-processing, Data Mining Algorithms, Naïve Bayes, Decision Tree, Neural Network, Association

A Survey on classification & feature selection technique based ensemble models in health care domain

Garima Sahu, M.Tech (CSE), Raipur Institute of Technology (R.I.T.), Raipur, Chhattisgarh, India [email protected]

CLUSTERING AND PREDICTIVE MODELING: AN ENSEMBLE APPROACH

Except where reference is made to the work of others, the work described in this thesis is my own or was done in collaboration with my advisory

MS1b Statistical Data Mining

Yee Whye Teh, Department of Statistics, Oxford. http://www.stats.ox.ac.uk/~teh/datamining.html Outline: Administrivia and Introduction, Course Structure, Syllabus, Introduction to

Using multiple models: Bagging, Boosting, Ensembles, Forests

Bagging: Combining predictions from multiple models. Different models obtained from bootstrap samples of training data. Average predictions or
