EFFECT OF DISCRETIZATION METHOD ON THE DIAGNOSIS OF PARKINSON'S DISEASE


International Journal of Innovative Computing, Information and Control
ICIC International ©2011
Volume 7, Number 8, August 2011

Ersin Kaya, Oğuz Findik, İsmail Babaoğlu and Ahmet Arslan
Department of Computer Engineering
Faculty of Engineering and Architecture
Selçuk University
Selçuklu, Konya 42075, Turkey
{ ersinkaya; oguzf; ibabaoglu; ahmetarslan }@selcuk.edu.tr

Received April 2010; revised January 2011

Abstract. This study analyzes the effect of discretization on the diagnosis of Parkinson's disease using several classification methods. The entropy-based method is used for discretization, and support vector machines, C4.5, k-nearest neighbors and Naïve Bayes are used for classification. Parkinson's disease is first diagnosed without any preprocessing. The Parkinson's disease dataset is then classified again after entropy-based discretization has been applied to it. Comparing the two sets of results shows that discretization increases classification success in the diagnosis of Parkinson's disease by 4.1% to 12.8%.

Keywords: Parkinson's disease, Entropy-based discretization method, Classification methods

1. Introduction. Parkinson's disease is a nervous system disorder that arises mostly in men in their 50s. The disease was first described by James Parkinson, after whom it is named [1]. Symptoms such as poverty of movement, slowness of movement, rigidity and rest tremor are commonly observed in patients with Parkinson's disease [2]. At present, no cure for Parkinson's disease is available. However, if the disease is diagnosed early, drug treatments that mitigate the symptoms can be administered in clinical settings [3]. Research into the disease shows that voice impairment occurs in 90% of patients with Parkinson's disease [4,5].
Much research has used voice disorders for the diagnosis of Parkinson's disease [6]. Little et al. used linear discriminant analysis (LDA) to identify the characteristics of voice data to be used in diagnosing the disease. They then built a diagnostic model from the selected features with a support vector machine (SVM) classifier [7].

Preprocessing the data before classification increases classification performance [8,9]. Discretization is an important type of preprocessing in data mining: it transforms the continuous-valued features in a dataset into discrete values. Research shows that discretizing continuous-valued features increases classification performance. Polat et al., working on the diagnosis of optic nerve disease, showed that discretization increases classification performance when used with traditional methods such as artificial neural networks (ANN), least squares support vector machines and C4.5 [9]. Abraham et al., working with 28 publicly available medical datasets, showed the effect of discretization on the success of Naïve Bayes classification [10]. Demsar et al. built a predictive model on data from trauma patients comprising 69 examples and 174 features, using decision tree and Naïve Bayes classification methods. Their study showed that discretization methods had a positive effect on classification success [11]. Acid et al. introduced a model that evaluates the performance of the emergency service of a Spanish hospital using a Bayesian network; in their study, some continuous-valued features were transformed into interval-valued features [12].

This study uses the Parkinson's disease dataset from the University of California Irvine (UCI) machine learning repository. The continuous-valued features in the data are transformed into interval-valued features by entropy-based discretization. The original and the discretized data are classified with the Naïve Bayes, C4.5, k-nearest neighbor (k-nn) and SVM classifiers. The results are compared to show the effect of discretization on classification accuracy.

2. Materials and Methods.

The Parkinson dataset. This study uses the dataset obtained from the UCI machine learning repository. The dataset covers 32 people of both sexes, a subset of whom are Parkinson's patients. Seven biomedical voice measurements were obtained from subjects S21, S27 and S35, and six from each of the others. The dataset consists of 195 measurements and 22 features. A detailed analysis of the dataset is shown in Table 1.

Discretization. Discretization is an important preprocessing method in data analysis. It transforms continuous-valued features into interval-valued features. Because the data are transformed into a more meaningful form, classification performance can improve. The literature describes many discretization methods, such as entropy-based, equal frequency and equal width discretization [13,14]. The common steps of discretization methods are shown in Figure 1 and can be summarized as follows.
First, the values of the continuous-valued feature in the dataset are sorted. Then, the candidate cut points for this feature are determined. A fitness value is computed for each candidate cut point, and the values of the feature are split at the candidate with the best fitness value. These steps are applied recursively until the stopping criterion is met. A discretization method is characterized by how it determines the candidate cut points, how it computes their fitness values, and which stopping criterion it uses.

Entropy-based discretization. Entropy-based discretization is a commonly used method proposed by Fayyad and Irani [15]. In this method, candidate cut points are determined for the continuous-valued feature, and the cut point is selected according to the entropy of the candidates. The entropy of a candidate cut point is defined by the following expressions:

$$E(A, T; S) = \frac{|S_1|}{|S|}\,\mathrm{Ent}(S_1) + \frac{|S_2|}{|S|}\,\mathrm{Ent}(S_2) \tag{1}$$

$$\mathrm{Ent}(S) = -\sum_{i=1}^{Z} p(C_i, S)\,\log_2 p(C_i, S) \tag{2}$$

where $A$ is the feature to be discretized, $T$ is the candidate cut point, $S$ is the set of samples, $S_1$ and $S_2$ are the subsets of $S$ to the left and right of the cut point, respectively, $Z$ is the number of classes in the dataset, $C_i$ is the decision value of the $i$th class, and $p(C_i, S)$ is the proportion of samples in $S$ that belong to class $C_i$.
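The cut-point selection in Equations (1) and (2) can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' implementation: the function names are hypothetical, and the choice of candidates (midpoints between adjacent distinct sorted values) is an assumption, since the text does not specify how candidates are generated.

```python
from collections import Counter
from math import log2


def ent(labels):
    """Class entropy Ent(S) of a list of class labels, as in Eq. (2)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def split_entropy(left, right):
    """Weighted entropy E(A, T; S) of a binary split, as in Eq. (1)."""
    n = len(left) + len(right)
    return len(left) / n * ent(left) + len(right) / n * ent(right)


def best_cut_point(values, labels):
    """Return (cut, E) for the candidate with minimum E, or None.

    Assumption: candidates are midpoints between adjacent distinct
    sorted values; the source text does not fix this choice.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    best = None
    for k in range(1, len(pairs)):
        lo, hi = pairs[k - 1][0], pairs[k][0]
        if lo == hi:
            continue  # equal values cannot be separated by a cut
        left = [c for _, c in pairs[:k]]
        right = [c for _, c in pairs[k:]]
        e = split_entropy(left, right)
        if best is None or e < best[1]:
            best = ((lo + hi) / 2, e)
    return best
```

For example, `best_cut_point([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1])` selects the cut at 6.5, where both sides of the split contain a single class and the weighted entropy is zero.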

Table 1. Detailed analysis of the dataset
Columns: Features, Max, Min, Median, Mean, SD (numeric entries not preserved in this transcription).
Features: MDVP:Fo (Hz), MDVP:Fhi (Hz), MDVP:Flo (Hz), MDVP:Jitter (%), MDVP:Jitter (Abs), MDVP:RAP, MDVP:PPQ, Jitter:DDP, MDVP:Shimmer, MDVP:Shimmer (dB), Shimmer:APQ3, Shimmer:APQ5, MDVP:APQ, Shimmer:DDA, NHR, HNR, RPDE, DFA, spread1, spread2, D2, PPE.
Feature, names of the features obtained from biomedical voice measurements; Max, maximum value of the features; Min, minimum value of the features; Median, median value of the features; Mean, mean value of the features; SD, standard deviation of the features; NoC, number of the cut points obtained after discretization; CP, value of the cut points obtained after discretization.

After the cut point with the minimum entropy is selected, the values of the continuous-valued feature are split into two parts. The procedure is then repeated on each part until the stopping criterion is reached. In the entropy-based discretization method, a cut point is accepted only if the following condition holds; otherwise the recursion stops for that part:

$$\mathrm{Gain}(A, T; S) > \frac{\log_2(N - 1)}{N} + \frac{\Delta(A, T; S)}{N} \tag{3}$$

$$\mathrm{Gain}(A, T; S) = \mathrm{Ent}(S) - E(A, T; S) \tag{4}$$

$$\Delta(A, T; S) = \log_2(3^Z - 2) - \left[ Z\,\mathrm{Ent}(S) - Z_1\,\mathrm{Ent}(S_1) - Z_2\,\mathrm{Ent}(S_2) \right] \tag{5}$$

where $A$ is the feature to be discretized, $T$ is the candidate cut point, $S$ is the set of samples, $S_1$ and $S_2$ are the subsets of $S$ to the left and right of the cut point, respectively, $N$ is the number of samples in $S$, $Z$ is the number of classes in the dataset, and $Z_1$ and $Z_2$ are the numbers of classes present in $S_1$ and $S_2$, respectively.

Naïve Bayes classifier. Naïve Bayes is a probabilistic classification method [16]. For a new sample, a score is calculated for each class in the training data, and the sample is assigned to the class with the highest score, denoted $v_{NB}$.

Figure 1. General steps of discretization method [17].

The class $v_{NB}$ is defined by the following expression:

$$v_{NB} = \arg\max_{v_j} \; p(v_j) \prod_{i} p(a_i \mid v_j) \tag{6}$$

where $j$ indexes the classes in the dataset, $i$ indexes the condition features, $a_i$ is the value of the $i$th feature, and $v_j$ is the class value of the $j$th class.

C4.5 decision tree classifier. The decision tree classifier is a simple classification method. A decision tree is composed of nodes, branches and leaves. Nodes correspond to features, branches to feature values, and leaves to values of the decision feature. Each path from the root node to a leaf denotes a rule of the form "if condition1 and condition2 and ... then decision". Nodes and branches correspond to the condition terms of the rule, and leaves correspond to its decision term. In this study, the C4.5 method is used to create the decision tree. In this method, the feature with the maximum gain becomes the root node. The gains are then recalculated within the subset of samples on each branch of the root node, and the feature with the maximum gain within each subset becomes a sub-node [18,19]. Tree construction continues until each branch denotes a class. Gain is defined by the following expressions:

$$\mathrm{Gain}(S, A) = \mathrm{Ent}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Ent}(S_v) \tag{7}$$

$$\mathrm{Ent}(S) = -\sum_{i=1}^{Z} p(C_i, S)\,\log_2 p(C_i, S) \tag{8}$$

where $S$ is the set of samples, $A$ is the feature whose gain is calculated, $S_v$ is the subset of samples in which feature $A$ takes the value $v$, $Z$ is the number of classes in the dataset, and $p(C_i, S)$ is the proportion of samples in $S$ that belong to class $C_i$.

k-nearest neighbor classifier. k-nn is a supervised learning algorithm. The neighborhood parameter k is set in the initialization stage of k-nn. The k training samples closest to the new sample are found, and the class of the new sample is determined from these k samples by majority voting [20]. Distance measures such as Euclidean, Hamming and Manhattan are used to calculate the distances between samples.

Support vector machine classifier. SVM, which is based on statistical learning theory, is one of the most commonly used classification techniques. It was first proposed by Vapnik [21]. In its basic linear form, SVM separates two classes from each other optimally: the aim is to find the separating hyperplane that maximizes the margin between the classes. As a learning method, SVM is often used to train and design radial basis function (RBF) networks, and it is generally more successful than comparable artificial neural networks. The formulation and detailed concepts of this classifier can be found in [22-28].

3. Experimental Results.
This study analyzes the effect of discretization on the diagnosis of Parkinson's disease using several classification methods. The Parkinson dataset used in the study is available online in the UCI repository. Entropy-based discretization is used as the discretization method. It was selected because it is a supervised discretization method, that is, it uses the class labels when choosing cut points. The dataset and its discretized form are classified with the Naïve Bayes, C4.5, k-nn and SVM classification methods, and the two sets of classification results are compared. To make the results more reliable, every classification is evaluated with 5-fold cross-validation. With the SVM classifier, the dataset is classified in both discretized and non-discretized forms using RBF, linear and polynomial kernels. The search ranges for the SVM kernel parameters c and σ are [0.1, 30000] and [0.001, 10], respectively. The RBF kernel was found to be the optimum kernel for SVM. The parameters of the optimum RBF kernel are G = [value missing in source] and c = 2. The k parameter of k-nn is set to 5. Euclidean distance is used as the distance measure between samples in k-nn and is given as follows:

$$D(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \tag{9}$$

where $n$ is the number of features in the dataset, and $x$ and $y$ are samples in the dataset.

Classification accuracy, sensitivity, specificity and area under the ROC curve (AUC) are used to compare the results. The measures are as follows:

$$CA = \frac{TP + TN}{TP + TN + FP + FN} \tag{10}$$

$$SEN = \frac{TP}{TP + FN} \tag{11}$$

$$SPE = \frac{TN}{TN + FP} \tag{12}$$

$$AUC = \text{Area under the ROC curve} \tag{13}$$

where $CA$, $SEN$ and $SPE$ denote classification accuracy, sensitivity and specificity, respectively. $TP$ is the number of healthy samples predicted as healthy, and $TN$ is the number of patient samples predicted as patient. For Equations (11) and (12) to hold with the healthy class as positive, $FP$ is the number of patient samples predicted as healthy, and $FN$ is the number of healthy samples predicted as patient.

The twenty-two continuous-valued features in the Parkinson's disease dataset are discretized using the entropy-based discretization method. The numbers and values of the cut points of the features are given in Table 2.

Table 2. Cut-points of features
Columns: No, Features, NoC, CP (numeric entries not preserved in this transcription).
Features: 1 MDVP:Fo (Hz), 2 MDVP:Fhi (Hz), 3 MDVP:Flo (Hz), 4 MDVP:Jitter (%), 5 MDVP:Jitter (Abs), 6 MDVP:RAP, 7 MDVP:PPQ, 8 Jitter:DDP, 9 MDVP:Shimmer, 10 MDVP:Shimmer (dB), 11 Shimmer:APQ3, 12 Shimmer:APQ5, 13 MDVP:APQ, 14 Shimmer:DDA, 15 NHR, 16 HNR, 17 RPDE, 18 DFA, 19 spread1, 20 spread2, 21 D2, 22 PPE.
Feature, names of the features obtained from biomedical voice measurements; NoC, the number of cut points obtained after discretization; CP, the values of the cut points obtained after discretization.

The classification accuracy, sensitivity, specificity and AUC obtained with the Naïve Bayes, C4.5, k-nn and SVM classifiers on both the discretized and the non-discretized dataset are given in Table 3. With entropy-based discretization, the classification accuracies of the Naïve Bayes, C4.5, k-nn and SVM classifiers increased by 8.2%, 4.1%, 9.2% and 12.8%, and their AUC values increased by 0.94%, 7.24%, 8.42% and 8.82%, respectively.
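As a small illustration of Equations (10)-(12), the sketch below counts the confusion-matrix entries from lists of actual and predicted labels and derives CA, SEN and SPE from them. The function names and the toy labels are hypothetical and are not taken from the study; the healthy class is treated as positive, as in the text, and the counts follow the standard confusion-matrix convention.

```python
def confusion_counts(actual, predicted, positive):
    """Count TP, TN, FP, FN with one class treated as positive."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if a == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1
    return tp, tn, fp, fn


def classification_measures(tp, tn, fp, fn):
    """Return (CA, SEN, SPE) as fractions, following Eqs. (10)-(12).

    Assumes both classes occur, so no denominator is zero.
    """
    ca = (tp + tn) / (tp + tn + fp + fn)
    sen = tp / (tp + fn)
    spe = tn / (tn + fp)
    return ca, sen, spe


# Made-up labels for illustration only; not data from the study.
actual = ["healthy", "healthy", "patient", "patient", "patient"]
predicted = ["healthy", "patient", "patient", "patient", "healthy"]
counts = confusion_counts(actual, predicted, positive="healthy")
ca, sen, spe = classification_measures(*counts)
```

Under 5-fold cross-validation, one reasonable choice is to accumulate the four counts over all folds before computing the measures; whether the study averaged per-fold values instead is not stated in the text.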

Table 3. Classification results
Columns: CA (%), Sen, Spe, AUC (numeric entries not preserved in this transcription).
Rows: Naïve Bayes, C4.5, k-nn and SVM, each on the non-discretized and the discretized dataset.
CA, SEN, SPE and AUC denote classification accuracy, sensitivity, specificity and area under the ROC curve, respectively.

The ROC curves for the healthy and unhealthy classes obtained using Naïve Bayes, C4.5, k-nn and SVM are shown in Figures 2-5. As the ROC curves show, classification accuracy increased after the dataset was discretized. The results also indicate that discretization is very promising for the diagnosis of Parkinson's disease. The best model for diagnosing Parkinson's disease was SVM with the discretized dataset. Discretization can therefore be used as a preprocessing step on medical datasets, where it can help diagnose diseases more accurately.

4. Conclusion. In this study, the Parkinson's disease dataset obtained from the UCI machine learning repository is used. The Naïve Bayes, C4.5, k-nn and SVM classifiers are used to classify the dataset. The dataset is classified with both discretized and non-discretized features in order to show the effectiveness of discretization in the diagnosis of Parkinson's disease. The results show that discretization increases classification accuracy in the diagnosis of Parkinson's disease.

Figure 2. ROC curves for (a) the healthy class and (b) the unhealthy class, obtained by classifying both the discretized and the non-discretized dataset using Naïve Bayes

Figure 3. ROC curves for (a) the healthy class and (b) the unhealthy class, obtained by classifying both the discretized and the non-discretized dataset using C4.5

Figure 4. ROC curves for (a) the healthy class and (b) the unhealthy class, obtained by classifying both the discretized and the non-discretized dataset using k-nn

Figure 5. ROC curves for (a) the healthy class and (b) the unhealthy class, obtained by classifying both the discretized and the non-discretized dataset using SVM

REFERENCES

[1] A. E. Lang and A. M. Lozano, Parkinson's disease: First of two parts, The New England Journal of Medicine, vol.339.
[2] N. Singh, V. Pillay and Y. E. Choonara, Advances in the treatment of Parkinson's disease, Progr. Neurobiol, vol.81, pp.29-44.
[3] National Collaborating Centre for Chronic Conditions, Parkinson's disease: National clinical guideline for diagnosis and management in primary and secondary care, Royal College of Physicians.
[4] A. K. Ho, R. Iansek, C. Marigliani, J. L. Bradshaw and S. Gates, Speech impairment in a large sample of patients with Parkinson's disease, Behavioural Neurology, vol.11.
[5] J. A. Logemann, H. B. Fisher, B. Boshes and E. R. Blonsky, Frequency and co-occurrence of vocal-tract dysfunctions in speech of a large sample of Parkinson patients, Journal of Speech and Hearing Disorders, vol.43, pp.47-57.
[6] M. A. Little, P. E. McSharry, S. J. Roberts, D. A. Costello and I. M. Moroz, Exploiting nonlinear recurrence and fractal scaling properties for voice disorder detection, Biomedical Engineering Online, vol.6, pp.23-58.
[7] M. A. Little, E. McSharry, E. J. Hunter, J. Spielman and L. O. Ramig, Suitability of dysphonia measurements for telemonitoring of Parkinson's disease, IEEE Transactions on Biomedical Engineering, vol.56.
[8] A. Kumar and D. Zhang, Hand-geometry recognition using entropy-based discretization, IEEE Transactions on Information Forensics and Security, vol.2.
[9] K. Polat, S. Kara, A. Güven and S. Güneş, Utilization of discretization method on the diagnosis of optic nerve disease, Computer Methods and Programs in Biomedicine, vol.91.
[10] R. Abraham, J. Simha and S. Iyengar, A comparative analysis of discretization methods for medical datamining with Naïve Bayesian classifier, The 9th International Conference on Information Technology.
[11] J. Demsar, B. Zupan, N. Aoki, M. J. Wall, T. H. Granchi and J. R. Beck, Feature mining and predictive model construction from severe trauma patient's data, International Journal of Medical Informatics, vol.63, pp.41-50.
[12] S. Acid, L. M. Campos, J. M. Fernandez-Luna, S. Rodriguez, J. M. Rodriguez and J. L. Salcedo, A comparison of learning algorithms for Bayesian networks: A case study based on data from an emergency medical service, Artificial Intelligence in Medicine, vol.30.
[13] M. K. Ismail and V. Ciesielski, An empirical investigation of the impact of discretization on common data distributions, Design and Application of Hybrid Intelligent Systems.
[14] H. Kodaz, S. Özşen, A. Arslan and S. Güneş, Medical application of information gain based artificial immune recognition system (AIRS): Diagnosis of thyroid disease, Expert Systems with Applications, vol.36, no.2.
[15] U. M. Fayyad and K. B. Irani, Multi-interval discretization of continuous-valued attributes for classification learning, The 13th International Joint Conference on Artificial Intelligence.
[16] H. Kima and S. Chen, Associative Naïve Bayes classifier: Automated linking of gene ontology to medline documents, Pattern Recognition, vol.42.
[17] C. Hsu, H. Huang and T. Wong, On why discretization works for Naïve Bayesian, Lecture Notes in Computer Science.
[18] M. Hill and M. T. Mitchell, Machine Learning, Singapore.
[19] J. R. Quinlan, Induction of C4.5 decision trees, Machine Learning, vol.1.
[20] G. Shakhnarovich, T. Darrell and P. Indyk, Nearest-Neighbor Methods in Learning and Vision, MIT Press.
[21] V. Vapnik, The Nature of Statistical Learning Theory, Springer, New York.
[22] K. Y. Chen and C. H. Wang, A hybrid SARIMA and support vector machines in forecasting the production values of the machinery industry in Taiwan, Expert Systems with Applications, vol.32.
[23] E. Çomak, A. Arslan and İ. Türkoğlu, A decision support system based on support vector machines for diagnosis of the heart valve diseases, Computers in Biology and Medicine, vol.37, pp.21-27.
[24] K. Takeuchi and N. Collier, Bio-medical entity extraction using support vector machines, Artificial Intelligence in Medicine, vol.33, no.2, 2003.
[25] J. Chen and F. Pan, A new online support vector machine algorithm, ICIC Express Letters, vol.4, no.1.
[26] Z. Chen, W. Hong and C. Wang, RNA secondary structure prediction with plane pseudoknots based on support vector machine, ICIC Express Letters, vol.3, no.4(b).
[27] B. R. Chang and H. F. Tsai, Training support vector regression by quantum-neuron-based hopfield neural net with nested local adiabatic evolution, International Journal of Innovative Computing, Information and Control, vol.5, no.4.
[28] N. Begum, M. A. Fattah and F. Ren, Automatic text summarization using support vector machine, International Journal of Innovative Computing, Information and Control, vol.5, no.7, 2009.


More information

SURVEY OF TEXT CLASSIFICATION ALGORITHMS FOR SPAM FILTERING

SURVEY OF TEXT CLASSIFICATION ALGORITHMS FOR SPAM FILTERING I J I T E ISSN: 2229-7367 3(1-2), 2012, pp. 233-237 SURVEY OF TEXT CLASSIFICATION ALGORITHMS FOR SPAM FILTERING K. SARULADHA 1 AND L. SASIREKA 2 1 Assistant Professor, Department of Computer Science and

More information

Machine learning for algo trading

Machine learning for algo trading Machine learning for algo trading An introduction for nonmathematicians Dr. Aly Kassam Overview High level introduction to machine learning A machine learning bestiary What has all this got to do with

More information

life science data mining

life science data mining life science data mining - '.)'-. < } ti» (>.:>,u» c ~'editors Stephen Wong Harvard Medical School, USA Chung-Sheng Li /BM Thomas J Watson Research Center World Scientific NEW JERSEY LONDON SINGAPORE.

More information

Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal

Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal Learning Example Chapter 18: Learning from Examples 22c:145 An emergency room in a hospital measures 17 variables (e.g., blood pressure, age, etc) of newly admitted patients. A decision is needed: whether

More information

Using Data Mining for Mobile Communication Clustering and Characterization

Using Data Mining for Mobile Communication Clustering and Characterization Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer

More information

A Comparative Analysis of Classification Techniques on Categorical Data in Data Mining

A Comparative Analysis of Classification Techniques on Categorical Data in Data Mining A Comparative Analysis of Classification Techniques on Categorical Data in Data Mining Sakshi Department Of Computer Science And Engineering United College of Engineering & Research Naini Allahabad sakshikashyap09@gmail.com

More information

Prediction and Diagnosis of Heart Disease by Data Mining Techniques

Prediction and Diagnosis of Heart Disease by Data Mining Techniques Prediction and Diagnosis of Heart Disease by Data Mining Techniques Boshra Bahrami, Mirsaeid Hosseini Shirvani* Department of Computer Engineering, Sari Branch, Islamic Azad University Sari, Iran Boshrabahrami_znu@yahoo.com;

More information

A fast multi-class SVM learning method for huge databases

A fast multi-class SVM learning method for huge databases www.ijcsi.org 544 A fast multi-class SVM learning method for huge databases Djeffal Abdelhamid 1, Babahenini Mohamed Chaouki 2 and Taleb-Ahmed Abdelmalik 3 1,2 Computer science department, LESIA Laboratory,

More information

Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms

Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Scott Pion and Lutz Hamel Abstract This paper presents the results of a series of analyses performed on direct mail

More information

Classifying Large Data Sets Using SVMs with Hierarchical Clusters. Presented by :Limou Wang

Classifying Large Data Sets Using SVMs with Hierarchical Clusters. Presented by :Limou Wang Classifying Large Data Sets Using SVMs with Hierarchical Clusters Presented by :Limou Wang Overview SVM Overview Motivation Hierarchical micro-clustering algorithm Clustering-Based SVM (CB-SVM) Experimental

More information

DATA MINING TECHNIQUES AND APPLICATIONS

DATA MINING TECHNIQUES AND APPLICATIONS DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,

More information

Email Spam Detection A Machine Learning Approach

Email Spam Detection A Machine Learning Approach Email Spam Detection A Machine Learning Approach Ge Song, Lauren Steimle ABSTRACT Machine learning is a branch of artificial intelligence concerned with the creation and study of systems that can learn

More information

Healthcare Data Mining: Prediction Inpatient Length of Stay

Healthcare Data Mining: Prediction Inpatient Length of Stay 3rd International IEEE Conference Intelligent Systems, September 2006 Healthcare Data Mining: Prediction Inpatient Length of Peng Liu, Lei Lei, Junjie Yin, Wei Zhang, Wu Naijun, Elia El-Darzi 1 Abstract

More information

Machine Learning in Spam Filtering

Machine Learning in Spam Filtering Machine Learning in Spam Filtering A Crash Course in ML Konstantin Tretyakov kt@ut.ee Institute of Computer Science, University of Tartu Overview Spam is Evil ML for Spam Filtering: General Idea, Problems.

More information

Data quality in Accounting Information Systems

Data quality in Accounting Information Systems Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania

More information

Data Mining Analysis (breast-cancer data)

Data Mining Analysis (breast-cancer data) Data Mining Analysis (breast-cancer data) Jung-Ying Wang Register number: D9115007, May, 2003 Abstract In this AI term project, we compare some world renowned machine learning tools. Including WEKA data

More information

IDENTIFIC ATION OF SOFTWARE EROSION USING LOGISTIC REGRESSION

IDENTIFIC ATION OF SOFTWARE EROSION USING LOGISTIC REGRESSION http:// IDENTIFIC ATION OF SOFTWARE EROSION USING LOGISTIC REGRESSION Harinder Kaur 1, Raveen Bajwa 2 1 PG Student., CSE., Baba Banda Singh Bahadur Engg. College, Fatehgarh Sahib, (India) 2 Asstt. Prof.,

More information

Applied Mathematical Sciences, Vol. 7, 2013, no. 112, 5591-5597 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2013.

Applied Mathematical Sciences, Vol. 7, 2013, no. 112, 5591-5597 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2013. Applied Mathematical Sciences, Vol. 7, 2013, no. 112, 5591-5597 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2013.38457 Accuracy Rate of Predictive Models in Credit Screening Anirut Suebsing

More information

Discretization and grouping: preprocessing steps for Data Mining

Discretization and grouping: preprocessing steps for Data Mining Discretization and grouping: preprocessing steps for Data Mining PetrBerka 1 andivanbruha 2 1 LaboratoryofIntelligentSystems Prague University of Economic W. Churchill Sq. 4, Prague CZ 13067, Czech Republic

More information

Categorical Data Visualization and Clustering Using Subjective Factors

Categorical Data Visualization and Clustering Using Subjective Factors Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION

ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION ISSN 9 X INFORMATION TECHNOLOGY AND CONTROL, 00, Vol., No.A ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION Danuta Zakrzewska Institute of Computer Science, Technical

More information

Predicting Student Performance by Using Data Mining Methods for Classification

Predicting Student Performance by Using Data Mining Methods for Classification BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 13, No 1 Sofia 2013 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.2478/cait-2013-0006 Predicting Student Performance

More information

ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING)

ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING) ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING) Gabriela Ochoa http://www.cs.stir.ac.uk/~goc/ OUTLINE Preliminaries Classification and Clustering Applications

More information

First Semester Computer Science Students Academic Performances Analysis by Using Data Mining Classification Algorithms

First Semester Computer Science Students Academic Performances Analysis by Using Data Mining Classification Algorithms First Semester Computer Science Students Academic Performances Analysis by Using Data Mining Classification Algorithms Azwa Abdul Aziz, Nor Hafieza IsmailandFadhilah Ahmad Faculty Informatics & Computing

More information

Data Mining Classification: Decision Trees

Data Mining Classification: Decision Trees Data Mining Classification: Decision Trees Classification Decision Trees: what they are and how they work Hunt s (TDIDT) algorithm How to select the best split How to handle Inconsistent data Continuous

More information

COMBINED METHODOLOGY of the CLASSIFICATION RULES for MEDICAL DATA-SETS

COMBINED METHODOLOGY of the CLASSIFICATION RULES for MEDICAL DATA-SETS COMBINED METHODOLOGY of the CLASSIFICATION RULES for MEDICAL DATA-SETS V.Sneha Latha#, P.Y.L.Swetha#, M.Bhavya#, G. Geetha#, D. K.Suhasini# # Dept. of Computer Science& Engineering K.L.C.E, GreenFields-522502,

More information

DATA MINING USING INTEGRATION OF CLUSTERING AND DECISION TREE

DATA MINING USING INTEGRATION OF CLUSTERING AND DECISION TREE DATA MINING USING INTEGRATION OF CLUSTERING AND DECISION TREE 1 K.Murugan, 2 P.Varalakshmi, 3 R.Nandha Kumar, 4 S.Boobalan 1 Teaching Fellow, Department of Computer Technology, Anna University 2 Assistant

More information

COMPARING NEURAL NETWORK ALGORITHM PERFORMANCE USING SPSS AND NEUROSOLUTIONS

COMPARING NEURAL NETWORK ALGORITHM PERFORMANCE USING SPSS AND NEUROSOLUTIONS COMPARING NEURAL NETWORK ALGORITHM PERFORMANCE USING SPSS AND NEUROSOLUTIONS AMJAD HARB and RASHID JAYOUSI Faculty of Computer Science, Al-Quds University, Jerusalem, Palestine Abstract This study exploits

More information

DATA MINING-BASED PREDICTIVE MODEL TO DETERMINE PROJECT FINANCIAL SUCCESS USING PROJECT DEFINITION PARAMETERS

DATA MINING-BASED PREDICTIVE MODEL TO DETERMINE PROJECT FINANCIAL SUCCESS USING PROJECT DEFINITION PARAMETERS DATA MINING-BASED PREDICTIVE MODEL TO DETERMINE PROJECT FINANCIAL SUCCESS USING PROJECT DEFINITION PARAMETERS Seungtaek Lee, Changmin Kim, Yoora Park, Hyojoo Son, and Changwan Kim* Department of Architecture

More information

Decision Support System on Prediction of Heart Disease Using Data Mining Techniques

Decision Support System on Prediction of Heart Disease Using Data Mining Techniques International Journal of Engineering Research and General Science Volume 3, Issue, March-April, 015 ISSN 091-730 Decision Support System on Prediction of Heart Disease Using Data Mining Techniques Ms.

More information

Data Mining Techniques for Prognosis in Pancreatic Cancer

Data Mining Techniques for Prognosis in Pancreatic Cancer Data Mining Techniques for Prognosis in Pancreatic Cancer by Stuart Floyd A Thesis Submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUE In partial fulfillment of the requirements for the Degree

More information

AnalysisofData MiningClassificationwithDecisiontreeTechnique

AnalysisofData MiningClassificationwithDecisiontreeTechnique Global Journal of omputer Science and Technology Software & Data Engineering Volume 13 Issue 13 Version 1.0 Year 2013 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals

More information

Bagged Ensemble Classifiers for Sentiment Classification of Movie Reviews

Bagged Ensemble Classifiers for Sentiment Classification of Movie Reviews www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 3 Issue 2 February, 2014 Page No. 3951-3961 Bagged Ensemble Classifiers for Sentiment Classification of Movie

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association

More information

E-commerce Transaction Anomaly Classification

E-commerce Transaction Anomaly Classification E-commerce Transaction Anomaly Classification Minyong Lee minyong@stanford.edu Seunghee Ham sham12@stanford.edu Qiyi Jiang qjiang@stanford.edu I. INTRODUCTION Due to the increasing popularity of e-commerce

More information

Reference Books. Data Mining. Supervised vs. Unsupervised Learning. Classification: Definition. Classification k-nearest neighbors

Reference Books. Data Mining. Supervised vs. Unsupervised Learning. Classification: Definition. Classification k-nearest neighbors Classification k-nearest neighbors Data Mining Dr. Engin YILDIZTEPE Reference Books Han, J., Kamber, M., Pei, J., (2011). Data Mining: Concepts and Techniques. Third edition. San Francisco: Morgan Kaufmann

More information

Improving spam mail filtering using classification algorithms with discretization Filter

Improving spam mail filtering using classification algorithms with discretization Filter International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) International Journal of Emerging Technologies in Computational

More information

Electroencephalography Analysis Using Neural Network and Support Vector Machine during Sleep

Electroencephalography Analysis Using Neural Network and Support Vector Machine during Sleep Engineering, 23, 5, 88-92 doi:.4236/eng.23.55b8 Published Online May 23 (http://www.scirp.org/journal/eng) Electroencephalography Analysis Using Neural Network and Support Vector Machine during Sleep JeeEun

More information

The treatment of missing values and its effect in the classifier accuracy

The treatment of missing values and its effect in the classifier accuracy The treatment of missing values and its effect in the classifier accuracy Edgar Acuña 1 and Caroline Rodriguez 2 1 Department of Mathematics, University of Puerto Rico at Mayaguez, Mayaguez, PR 00680 edgar@cs.uprm.edu

More information

EFFICIENCY OF DECISION TREES IN PREDICTING STUDENT S ACADEMIC PERFORMANCE

EFFICIENCY OF DECISION TREES IN PREDICTING STUDENT S ACADEMIC PERFORMANCE EFFICIENCY OF DECISION TREES IN PREDICTING STUDENT S ACADEMIC PERFORMANCE S. Anupama Kumar 1 and Dr. Vijayalakshmi M.N 2 1 Research Scholar, PRIST University, 1 Assistant Professor, Dept of M.C.A. 2 Associate

More information

EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH

EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH SANGITA GUPTA 1, SUMA. V. 2 1 Jain University, Bangalore 2 Dayanada Sagar Institute, Bangalore, India Abstract- One

More information

Predictive Data modeling for health care: Comparative performance study of different prediction models

Predictive Data modeling for health care: Comparative performance study of different prediction models Predictive Data modeling for health care: Comparative performance study of different prediction models Shivanand Hiremath hiremat.nitie@gmail.com National Institute of Industrial Engineering (NITIE) Vihar

More information

An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset

An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset P P P Health An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset Peng Liu 1, Elia El-Darzi 2, Lei Lei 1, Christos Vasilakis 2, Panagiotis Chountas 2, and Wei Huang

More information

Towards better accuracy for Spam predictions

Towards better accuracy for Spam predictions Towards better accuracy for Spam predictions Chengyan Zhao Department of Computer Science University of Toronto Toronto, Ontario, Canada M5S 2E4 czhao@cs.toronto.edu Abstract Spam identification is crucial

More information

Prediction of Heart Disease Using Naïve Bayes Algorithm

Prediction of Heart Disease Using Naïve Bayes Algorithm Prediction of Heart Disease Using Naïve Bayes Algorithm R.Karthiyayini 1, S.Chithaara 2 Assistant Professor, Department of computer Applications, Anna University, BIT campus, Tiruchirapalli, Tamilnadu,

More information

Gender Identification using MFCC for Telephone Applications A Comparative Study

Gender Identification using MFCC for Telephone Applications A Comparative Study Gender Identification using MFCC for Telephone Applications A Comparative Study Jamil Ahmad, Mustansar Fiaz, Soon-il Kwon, Maleerat Sodanil, Bay Vo, and * Sung Wook Baik Abstract Gender recognition is

More information

Implementation of Data Mining Techniques to Perform Market Analysis

Implementation of Data Mining Techniques to Perform Market Analysis Implementation of Data Mining Techniques to Perform Market Analysis B.Sabitha 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, P.Balasubramanian 4 PG Scholar, Indian Institute of Information Technology, Srirangam,

More information

Data Mining. 1 Introduction 2 Data Mining methods. Alfred Holl Data Mining 1

Data Mining. 1 Introduction 2 Data Mining methods. Alfred Holl Data Mining 1 Data Mining 1 Introduction 2 Data Mining methods Alfred Holl Data Mining 1 1 Introduction 1.1 Motivation 1.2 Goals and problems 1.3 Definitions 1.4 Roots 1.5 Data Mining process 1.6 Epistemological constraints

More information

An Overview of Data Mining Techniques Applied for Heart Disease Diagnosis and Prediction

An Overview of Data Mining Techniques Applied for Heart Disease Diagnosis and Prediction Lecture Notes on Information Theory Vol. 2, No. 4, December 2014 An Overview of Data Mining Techniques Applied for Heart Disease Diagnosis and Prediction Salha M. Alzahani, Afnan Althopity, Ashwag Alghamdi,

More information

Rule based Classification of BSE Stock Data with Data Mining

Rule based Classification of BSE Stock Data with Data Mining International Journal of Information Sciences and Application. ISSN 0974-2255 Volume 4, Number 1 (2012), pp. 1-9 International Research Publication House http://www.irphouse.com Rule based Classification

More information

Neural Networks Lesson 5 - Cluster Analysis

Neural Networks Lesson 5 - Cluster Analysis Neural Networks Lesson 5 - Cluster Analysis Prof. Michele Scarpiniti INFOCOM Dpt. - Sapienza University of Rome http://ispac.ing.uniroma1.it/scarpiniti/index.htm michele.scarpiniti@uniroma1.it Rome, 29

More information

Application of Event Based Decision Tree and Ensemble of Data Driven Methods for Maintenance Action Recommendation

Application of Event Based Decision Tree and Ensemble of Data Driven Methods for Maintenance Action Recommendation Application of Event Based Decision Tree and Ensemble of Data Driven Methods for Maintenance Action Recommendation James K. Kimotho, Christoph Sondermann-Woelke, Tobias Meyer, and Walter Sextro Department

More information

Equity forecast: Predicting long term stock price movement using machine learning

Equity forecast: Predicting long term stock price movement using machine learning Equity forecast: Predicting long term stock price movement using machine learning Nikola Milosevic School of Computer Science, University of Manchester, UK Nikola.milosevic@manchester.ac.uk Abstract Long

More information

Customer Data Mining and Visualization by Generative Topographic Mapping Methods

Customer Data Mining and Visualization by Generative Topographic Mapping Methods Customer Data Mining and Visualization by Generative Topographic Mapping Methods Jinsan Yang and Byoung-Tak Zhang Artificial Intelligence Lab (SCAI) School of Computer Science and Engineering Seoul National

More information

Tweaking Naïve Bayes classifier for intelligent spam detection

Tweaking Naïve Bayes classifier for intelligent spam detection 682 Tweaking Naïve Bayes classifier for intelligent spam detection Ankita Raturi 1 and Sunil Pranit Lal 2 1 University of California, Irvine, CA 92697, USA. araturi@uci.edu 2 School of Computing, Information

More information

Advanced Ensemble Strategies for Polynomial Models

Advanced Ensemble Strategies for Polynomial Models Advanced Ensemble Strategies for Polynomial Models Pavel Kordík 1, Jan Černý 2 1 Dept. of Computer Science, Faculty of Information Technology, Czech Technical University in Prague, 2 Dept. of Computer

More information

How To Cluster

How To Cluster Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main

More information

Predicting required bandwidth for educational institutes using prediction techniques in data mining (Case Study: Qom Payame Noor University)

Predicting required bandwidth for educational institutes using prediction techniques in data mining (Case Study: Qom Payame Noor University) 260 IJCSNS International Journal of Computer Science and Network Security, VOL.11 No.6, June 2011 Predicting required bandwidth for educational institutes using prediction techniques in data mining (Case

More information

Network Intrusion Detection Using a HNB Binary Classifier

Network Intrusion Detection Using a HNB Binary Classifier 2015 17th UKSIM-AMSS International Conference on Modelling and Simulation Network Intrusion Detection Using a HNB Binary Classifier Levent Koc and Alan D. Carswell Center for Security Studies, University

More information

Knowledge Discovery from patents using KMX Text Analytics

Knowledge Discovery from patents using KMX Text Analytics Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs anton.heijs@treparel.com Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers

More information

Data Mining for Knowledge Management. Classification

Data Mining for Knowledge Management. Classification 1 Data Mining for Knowledge Management Classification Themis Palpanas University of Trento http://disi.unitn.eu/~themis Data Mining for Knowledge Management 1 Thanks for slides to: Jiawei Han Eamonn Keogh

More information

A Survey on classification & feature selection technique based ensemble models in health care domain

A Survey on classification & feature selection technique based ensemble models in health care domain A Survey on classification & feature selection technique based ensemble models in health care domain GarimaSahu M.Tech (CSE) Raipur Institute of Technology,(R.I.T.) Raipur, Chattishgarh, India garima.sahu03@gmail.com

More information

BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376

BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376 Course Director: Dr. Kayvan Najarian (DCM&B, kayvan@umich.edu) Lectures: Labs: Mondays and Wednesdays 9:00 AM -10:30 AM Rm. 2065 Palmer Commons Bldg. Wednesdays 10:30 AM 11:30 AM (alternate weeks) Rm.

More information

BIG DATA IN HEALTHCARE THE NEXT FRONTIER

BIG DATA IN HEALTHCARE THE NEXT FRONTIER BIG DATA IN HEALTHCARE THE NEXT FRONTIER Divyaa Krishna Sonnad 1, Dr. Jharna Majumdar 2 2 Dean R&D, Prof. and Head, 1,2 Dept of CSE (PG), Nitte Meenakshi Institute of Technology Abstract: The world of

More information

Data Mining Essentials

Data Mining Essentials This chapter is from Social Media Mining: An Introduction. By Reza Zafarani, Mohammad Ali Abbasi, and Huan Liu. Cambridge University Press, 2014. Draft version: April 20, 2014. Complete Draft and Slides

More information

Biomarker Discovery and Data Visualization Tool for Ovarian Cancer Screening

Biomarker Discovery and Data Visualization Tool for Ovarian Cancer Screening , pp.169-178 http://dx.doi.org/10.14257/ijbsbt.2014.6.2.17 Biomarker Discovery and Data Visualization Tool for Ovarian Cancer Screening Ki-Seok Cheong 2,3, Hye-Jeong Song 1,3, Chan-Young Park 1,3, Jong-Dae

More information

Document Image Retrieval using Signatures as Queries

Document Image Retrieval using Signatures as Queries Document Image Retrieval using Signatures as Queries Sargur N. Srihari, Shravya Shetty, Siyuan Chen, Harish Srinivasan, Chen Huang CEDAR, University at Buffalo(SUNY) Amherst, New York 14228 Gady Agam and

More information

Machine Learning in FX Carry Basket Prediction

Machine Learning in FX Carry Basket Prediction Machine Learning in FX Carry Basket Prediction Tristan Fletcher, Fabian Redpath and Joe D Alessandro Abstract Artificial Neural Networks ANN), Support Vector Machines SVM) and Relevance Vector Machines

More information