EFFECT OF DISCRETIZATION METHOD ON THE DIAGNOSIS OF PARKINSON'S DISEASE


International Journal of Innovative Computing, Information and Control
ICIC International ©2011 ISSN
Volume 7, Number 8, August 2011 pp

Ersin Kaya, Oğuz Findik, İsmail Babaoğlu and Ahmet Arslan
Department of Computer Engineering
Faculty of Engineering and Architecture
Selçuk University
Selçuklu, Konya 42075, Turkey
{ ersinkaya; oguzf; ibabaoglu; ahmetarslan }@selcuk.edu.tr
Received April 2010; revised January 2011

Abstract. Using several classification methods, this study analyzes the effect of discretization on the diagnosis of Parkinson's disease. The entropy-based discretization method is used for discretization. Support vector machines, C4.5, k-nearest neighbors and Naïve Bayes are used as the classifiers. First, Parkinson's disease is diagnosed without any preprocessing. Then, the Parkinson's disease dataset is classified again after entropy-based discretization has been applied to it. A comparison of the two sets of results shows that discretization increases the classification accuracy for the diagnosis of Parkinson's disease by 4.1% to 12.8%.

Keywords: Parkinson's disease, Entropy-based discretization method, Classification methods

1. Introduction. Parkinson's disease is a nervous system disorder that arises mostly in men in their 50s. It is named after James Parkinson, who first described it [1]. Symptoms such as poverty of movement, slowness of movement, rigidity and rest tremor are commonly observed in patients with Parkinson's disease [2]. No cure for Parkinson's disease is currently available. However, if the disease is diagnosed early, drug treatments that mitigate the symptoms can be applied in clinical settings [3]. Research shows that voice impairment occurs in about 90% of patients with Parkinson's disease [4,5].
Much research has used voice disorders for the diagnosis of Parkinson's disease [6]. Little et al. used linear discriminant analysis (LDA) to identify the characteristics of the voice data to be used in diagnosing the disease. They then built a diagnostic model on the selected features with a support vector machine (SVM) classifier [7]. Preprocessing the data before classification can increase classification performance [8,9].

Discretization is an important type of preprocessing in data mining. It transforms the continuous-valued features of a dataset into discrete values. Research shows that discretizing continuous-valued features can increase classification performance. Polat et al., working on the diagnosis of optic nerve disease, showed that discretization increases classification performance when used with traditional methods such as artificial neural networks (ANN), least squares support vector machines and C4.5 [9]. Abraham et al., studying 28 publicly available medical datasets, pointed out the effect of discretization on the success of Naïve Bayes classification [10]. Demsar et al. built a predictive model on data consisting of 69 examples and 174 features from trauma patients, using decision tree and Naïve Bayes classification methods. Their study showed that discretization had a positive effect on classification success [11]. Acid et al. introduced a model that uses a Bayesian network to evaluate the performance of the emergency service of a Spanish hospital. In their study, some continuous-valued features were transformed into interval-valued features [12].

In this study, the Parkinson's disease dataset from the University of California Irvine (UCI) machine learning repository is used. The continuous-valued features of the data are transformed into interval-valued features by entropy-based discretization. The original data and the discretized data are classified with Naïve Bayes, C4.5, k-nearest neighbor (k-nn) and SVM classifiers. The results are compared with each other to show the effect of discretization on classification accuracy.

2. Materials and Methods.

The Parkinson dataset. In this study, the dataset obtained from the UCI machine learning repository is used. This dataset is composed of 32 people of both sexes, of them being Parkinson's patients. Seven biomedical voice measurements were obtained from each of subjects S21, S27 and S35, and six from each of the others. The dataset is composed of 195 measurements and 22 features. A detailed analysis of the dataset is shown in Table 1.

Discretization. Discretization is an important preprocessing method in data analysis. Discretization methods transform continuous-valued features into interval-valued features. Because the data are transformed into a more meaningful form, classification can become more effective. The literature contains many discretization methods, such as entropy-based, equal-frequency and equal-width discretization [13,14]. The common steps of discretization methods are shown in Figure 1 and can be summarized as follows.
First, the values of the continuous-valued feature are sorted. Then, candidate cut points are determined for this feature. The fitness value of each candidate cut point is computed, and the values of the feature are split at the candidate cut point with the best fitness value. These steps are applied recursively to the resulting parts until a stopping criterion is met. A discretization method is therefore defined by three choices:

- how it determines the candidate cut points;
- how it computes their fitness values;
- its stopping criterion.

Entropy-based discretization. The entropy-based discretization method, proposed by Fayyad and Irani [15], is a commonly used discretization method. In this method, candidate cut points are determined for the continuous-valued feature. The cut point is then selected according to the entropy of each candidate cut point. This entropy is defined by the following expressions:

$$E(A,T;S) = \frac{|S_1|}{|S|}\,\mathrm{Ent}(S_1) + \frac{|S_2|}{|S|}\,\mathrm{Ent}(S_2) \tag{1}$$

$$\mathrm{Ent}(S) = -\sum_{i=1}^{Z} p(C_i,S)\,\log_2 p(C_i,S) \tag{2}$$

where:

- A is the feature to be discretized;
- T is the candidate cut point;
- S is the set of samples;
- S_1 and S_2 are the subsets of samples to the left and right of the split, respectively;
- Z is the number of classes in the dataset;
- C_i is the decision value of the ith class;
- p(C_i, S) is the proportion of samples in S that belong to class C_i.
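The Python sketch below makes Eqs. (1) and (2) concrete. It computes Ent(S) and E(A, T; S) and picks the candidate cut point with the minimum entropy. It then recurses on both parts, using the MDLP acceptance test of Fayyad and Irani [15] (Eqs. (3)-(5) in this paper) as the stopping criterion. This is an illustration under stated assumptions, not the authors' implementation:

- The function names are hypothetical.
- Taking the midpoints between consecutive distinct values as candidate cut points is a simplifying assumption; Fayyad and Irani restrict the candidates to class boundary points to save work.
- The number of classes z is counted within the current interval S, as in Fayyad and Irani's formulation.

```python
import math
from collections import Counter

def class_entropy(labels):
    """Ent(S), Eq. (2): -sum_i p(C_i, S) * log2 p(C_i, S)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def partition(values, labels, cut):
    """Split the labels into S1 (value <= cut) and S2 (value > cut)."""
    s1 = [y for x, y in zip(values, labels) if x <= cut]
    s2 = [y for x, y in zip(values, labels) if x > cut]
    return s1, s2

def split_entropy(values, labels, cut):
    """E(A, T; S), Eq. (1): size-weighted class entropy of S1 and S2."""
    s1, s2 = partition(values, labels, cut)
    n = len(labels)
    return len(s1) / n * class_entropy(s1) + len(s2) / n * class_entropy(s2)

def best_cut(values, labels):
    """(cut, entropy) minimizing E(A, T; S); candidates are midpoints (assumption)."""
    xs = sorted(set(values))
    cuts = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    if not cuts:
        return None  # a single distinct value cannot be split
    return min(((t, split_entropy(values, labels, t)) for t in cuts), key=lambda c: c[1])

def mdlp_accepts(values, labels, cut):
    """MDLP test of Fayyad and Irani: keep the cut only if the gain is large enough."""
    s1, s2 = partition(values, labels, cut)
    n, ent_s = len(labels), class_entropy(labels)
    gain = ent_s - split_entropy(values, labels, cut)
    # z counts the classes present in the current interval S (assumption, see lead-in)
    z, z1, z2 = len(set(labels)), len(set(s1)), len(set(s2))
    delta = math.log2(3 ** z - 2) - (z * ent_s - z1 * class_entropy(s1) - z2 * class_entropy(s2))
    return gain > math.log2(n - 1) / n + delta / n

def discretize(values, labels):
    """Top-down recursive discretization; returns the sorted accepted cut points."""
    found = best_cut(values, labels)
    if found is None or not mdlp_accepts(values, labels, found[0]):
        return []  # stopping criterion reached for this interval
    t = found[0]
    left = [(x, y) for x, y in zip(values, labels) if x <= t]
    right = [(x, y) for x, y in zip(values, labels) if x > t]
    cuts = [t]
    for part in (left, right):
        cuts += discretize([x for x, _ in part], [y for _, y in part])
    return sorted(cuts)
```

For example, `discretize([1, 2, 3, 10, 11, 12], ['a', 'a', 'a', 'b', 'b', 'b'])` returns `[6.5]`. The cut at 6.5 separates the two classes perfectly and passes the MDLP test, while the two pure halves are not split further.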

Table 1. Detailed analysis of the dataset
Columns: Features, Max, Min, Median, Mean, SD
Features: MDVP:Fo (Hz), MDVP:Fhi (Hz), MDVP:Flo (Hz), MDVP:Jitter (%), MDVP:Jitter (Abs), MDVP:RAP, MDVP:PPQ, Jitter:DDP, MDVP:Shimmer, MDVP:Shimmer (db), Shimmer:APQ, Shimmer:APQ, MDVP:APQ, Shimmer:DDA, NHR, HNR, RPDE, DFA, spread, Spread, D, PPE
Feature, names of the features obtained from biomedical voice measurements; Max, maximum value of the features; Min, minimum value of the features; Median, median value of the features; Mean, mean value of the features; SD, standard deviation of the features; NoC, number of the cut points obtained after discretization; CP, value of the cut points obtained after discretization.

After the cut point with the minimum entropy has been selected, the values of the continuous-valued feature are split into two parts. This procedure is then repeated on each part until the stopping criterion is reached. In the entropy-based discretization method, the stopping criterion is defined by the following expressions:

$$\mathrm{Gain}(A,T;S) > \frac{\log_2(N-1)}{N} + \frac{\Delta(A,T;S)}{N} \tag{3}$$

$$\mathrm{Gain}(A,T;S) = \mathrm{Ent}(S) - E(A,T;S) \tag{4}$$

$$\Delta(A,T;S) = \log_2\!\left(3^{Z}-2\right) - \left[Z\cdot\mathrm{Ent}(S) - Z_1\cdot\mathrm{Ent}(S_1) - Z_2\cdot\mathrm{Ent}(S_2)\right] \tag{5}$$

where:

- A is the feature to be discretized;
- T is the candidate cut point;
- S is the set of samples;
- S_1 and S_2 are the subsets of samples to the left and right of the split, respectively;
- N is the number of samples in S;
- Z is the number of classes in the dataset;
- Z_1 and Z_2 are the numbers of classes present in S_1 and S_2, respectively.

A cut point is accepted only if inequality (3) holds; otherwise, splitting of that part stops.

Naïve Bayes classifier. Naïve Bayes is a probabilistic classification method [16]. For a new sample, the value v_NB is calculated for each class in the training data. The new sample is assigned to the class with the highest v_NB value.
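As an illustration of this decision rule (written out as Eq. (6) in this paper), the sketch below trains a Naïve Bayes model on already-discretized features by counting. It then assigns a new sample to the class with the largest product of the class prior and the per-feature conditional probabilities.

This is a minimal sketch, not the authors' code. The following are assumptions for illustration:

- the function names and the bin labels;
- the Laplace smoothing parameter `alpha`, since the paper does not state how zero counts are handled.

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Count class frequencies and per-class value frequencies of each feature."""
    priors = Counter(labels)
    cond = defaultdict(Counter)  # (feature index, class) -> counts of feature values
    domains = defaultdict(set)   # feature index -> distinct values seen in training
    for row, y in zip(rows, labels):
        for i, a in enumerate(row):
            cond[(i, y)][a] += 1
            domains[i].add(a)
    return priors, cond, domains, len(labels)

def predict_nb(model, row, alpha=1.0):
    """Return argmax_j P(v_j) * prod_i P(a_i | v_j), with Laplace smoothing alpha."""
    priors, cond, domains, n = model
    scores = {}
    for v, n_v in priors.items():
        score = n_v / n  # prior P(v_j)
        for i, a in enumerate(row):
            k = len(domains[i])  # number of distinct values of feature i
            score *= (cond[(i, v)][a] + alpha) / (n_v + alpha * k)  # P(a_i | v_j)
        scores[v] = score
    return max(scores, key=scores.get)
```

For example, consider a model trained on the hypothetical binned rows `("low", "low")`, `("low", "high")` (healthy) and `("high", "high")` twice (PD). It predicts "healthy" for `("low", "low")` and "pd" for `("high", "high")`. Working with counts is what makes discretized features convenient for Naïve Bayes: no density has to be assumed for continuous values.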

Figure 1. General steps of discretization method [17].

v_NB is defined by the following expression:

$$v_{NB} = \arg\max_{v_j} P(v_j) \prod_{i} P(a_i \mid v_j) \tag{6}$$

where:

- j indexes the classes in the dataset;
- i indexes the condition features;
- a_i is the value of the ith feature;
- v_j is the class value of the jth class.

C4.5 decision tree classifier. The decision tree is a simple classification method. Decision trees are composed of nodes, branches and leaves, which correspond to the features, the values of the features and the values of the decision feature, respectively. Each path from the root node to a leaf denotes a rule of the form "if condition1 and condition2 and ... then decision". The nodes and branches correspond to the condition terms, and the leaf corresponds to the decision term.

In this study, the C4.5 method is used to build the decision tree. In this method, the feature with the maximum gain is selected as the root node. The gains are then recalculated on the subset of samples in each branch of the root node, and the feature with the maximum gain within each subset becomes a sub-node [18,19]. Tree construction continues until each branch denotes a class. Gain is defined by the following expressions:

$$\mathrm{Gain}(S,A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v) \tag{7}$$

$$\mathrm{Entropy}(S) = -\sum_{i=1}^{Z} p(C_i,S)\,\log_2 p(C_i,S) \tag{8}$$

where:

- S is the set of samples;
- A is the feature whose gain is calculated;
- S_v is the subset of samples in which feature A takes the value v;
- Z is the number of classes in the dataset;
- p(C_i, S) is the proportion of samples in S that belong to class C_i.

k-nearest neighbor classifier. k-nn is a supervised learning algorithm. The neighborhood parameter k is set in the initialization stage of k-nn. The k training samples closest to a new sample are found. The class of the new sample is then determined from these k samples by majority voting [20]. Distance measures such as the Euclidean, Hamming and Manhattan distances are used to calculate the distances between samples.

Support vector machine classifier. SVM, which is based on statistical learning theory, is one of the most commonly used classification techniques. It was first proposed by Vapnik [21]. In its basic linear form, SVM separates two classes optimally: it seeks the separating hyperplane that maximizes the margin between the classes. SVM is also often used to train and design radial basis function (RBF) networks, and it often performs better than comparable artificial neural networks. The formulations and detailed concepts of this classifier can be found in [22-28].

3. Experimental Results.
Using different classification methods, this study analyzed the effect of discretization on the diagnosis of Parkinson's disease. The dataset used is the Parkinson's dataset available online in the UCI repository. Entropy-based discretization is used as the discretization method. It was chosen because it is a supervised discretization method, that is, it uses the class labels when placing cut points.

The original dataset and its discretized form are both classified with Naïve Bayes, C4.5, k-nn and SVM, and the two sets of results are compared. To make the results more reliable, k-fold cross-validation is used; each classification is performed with 5-fold cross-validation.

With the SVM classifier, both the discretized and non-discretized datasets are classified using RBF, linear and polynomial kernels. The search ranges of the SVM parameters c and σ are [0.1, 30000] and [0.001, 10], respectively. The RBF kernel was found to be the best kernel for SVM. The parameters of the optimum RBF kernel are and 2 for G and c, respectively.

The parameter k of k-nn is set to 5. The Euclidean distance is used as the distance measure between samples in k-nn:

$$D(x,y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \tag{9}$$

6 4674 E. KAYA, O. FINDIK, İ. BABAOĞLU AND A. ARSLAN where n is the number of the features in the dataset, x and y are the samples in the dataset. Classification accuracy, sensitivity, specificity and area under the ROC curve (AUC) measurements are utilized to compare the results. The measurements are as follows: T P + T N CA = (10) T P + T N + F P + F N T P SEN = (11) T P + F N T N SP E = (12) T N + F P AUC = Area Under the ROC curve (13) where, CA, SEN and SP E denoted classification accuracy, sensitivity and specificity, respectively. T P is number of healthy prediction in healthy samples. T N is number of patient prediction in patient samples. F P is number of patient prediction in healthy samples. F N is number of healthy prediction in patient samples. Twenty two continuous-valued features in Parkinson s disease dataset are discretizated by using entropy-based discretization method. Numbers and values of the cut-points of features are given in Table 2. Table 2. Cut-points of features No Features NoC CP No Features NoC CP 1 MDVP:Fo (Hz) Shimmer:APQ MDVP:Fhi (Hz) MDVP:APQ MDVP:Flo (Hz) Shimmer:DDA MDVP:Jitter (%) NHR MDVP:Jitter (Abs) HNR MDVP:RAP RPDE MDVP:PPQ DFA Jitter:DDP spread MDVP:Shimmer spread MDVP:Shimmer (db) D Shimmer:APQ PPE Feature, names of the features obtained from biomedical voice measurements; NoC, the number of the cut points obtained after discretization; CP, the value of the cut-points obtained after discretization. The classification accuracy, sensitivity, specificity and AUC which are obtained from both discretizated and non-discretizated forms of the classification processes using naive Bayes, C4.5, k-nn and SVM classifiers are given in Table 3. By using entropy-based discretization method, the classification accuracies and AUC values of Naive Bayes, C4.5, k-nn, SVM classifiers have increased to 8.2%, 4.1%, 9.2%, 12.8% and 0.94%, 7.24%, 8.42%, 8.82%, respectively.
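The measures of Eqs. (10)-(13) can be computed as in the following sketch. Following the definition of TP above, the positive class is taken to be "healthy"; the class labels and function names are assumptions.

The AUC is computed with the rank (Mann-Whitney) formulation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. This is one standard way to obtain the area under the ROC curve, not necessarily the one the authors used.

```python
import math

def confusion_counts(y_true, y_pred, positive="healthy"):
    """Return (TP, TN, FP, FN), treating `positive` as the positive class."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1  # positive sample predicted positive
            else:
                fn += 1  # positive sample predicted negative
        elif p == positive:
            fp += 1      # negative sample predicted positive
        else:
            tn += 1      # negative sample predicted negative
    return tp, tn, fp, fn

def ca_sen_spe(y_true, y_pred, positive="healthy"):
    """Classification accuracy, sensitivity and specificity, Eqs. (10)-(12)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    ca = (tp + tn) / (tp + tn + fp + fn)
    sen = tp / (tp + fn) if tp + fn else math.nan
    spe = tn / (tn + fp) if tn + fp else math.nan
    return ca, sen, spe

def auc(scores, y_true, positive="healthy"):
    """Area under the ROC curve via pairwise (Mann-Whitney) comparisons."""
    pos = [s for s, t in zip(scores, y_true) if t == positive]
    neg = [s for s, t in zip(scores, y_true) if t != positive]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))
```

Here `scores` are the classifier's scores for the positive class. A 5-fold cross-validation study would apply these functions to the pooled or per-fold predictions.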

Table 3. Classification results
Columns: CA (%), SEN, SPE, AUC
Rows: Naïve Bayes (non-discretized, discretized); C4.5 (non-discretized, discretized); k-nn (non-discretized, discretized); SVM (non-discretized, discretized)
CA, SEN, SPE and AUC denote classification accuracy, sensitivity, specificity and area under the ROC curve, respectively.

Figures 2-5 show the ROC curves for the healthy and unhealthy classes obtained using Naïve Bayes, C4.5, k-nn and SVM. As the ROC curves show, classification performance improved after the dataset was discretized. These results indicate that discretization is a promising preprocessing step for the diagnosis of Parkinson's disease. The best model for diagnosing Parkinson's disease was SVM with the discretized dataset. Discretization can therefore be used as a preprocessing step for medical datasets, where it may help diagnose diseases more accurately.

4. Conclusion. In this study, the Parkinson's disease dataset obtained from the UCI machine learning repository is used. Naïve Bayes, C4.5, k-nn and SVM classifiers are used to classify the dataset. The dataset is classified with both discretized and non-discretized features to show the effectiveness of discretization for the diagnosis of Parkinson's disease. The results show that discretization increases the classification accuracy of the diagnosis of Parkinson's disease.

Figure 2. (a) ROC curve for the healthy class and (b) ROC curve for the unhealthy class, obtained by classifying both the discretized and non-discretized datasets using Naïve Bayes

Figure 3. (a) ROC curve for the healthy class and (b) ROC curve for the unhealthy class, obtained by classifying both the discretized and non-discretized datasets using C4.5

Figure 4. (a) ROC curve for the healthy class and (b) ROC curve for the unhealthy class, obtained by classifying both the discretized and non-discretized datasets using k-nn

Figure 5. (a) ROC curve for the healthy class and (b) ROC curve for the unhealthy class, obtained by classifying both the discretized and non-discretized datasets using SVM

REFERENCES
[1] A. E. Lang and A. M. Lozano, Parkinson's disease: First of two parts, The New England Journal of Medicine, vol.339.
[2] N. Singh, V. Pillay and Y. E. Choonara, Advances in the treatment of Parkinson's disease, Progr. Neurobiol., vol.81, pp.29-44.
[3] National Collaborating Centre for Chronic Conditions, Parkinson's disease: National clinical guideline for diagnosis and management in primary and secondary care, Royal College of Physicians.
[4] A. K. Ho, R. Iansek, C. Marigliani, J. L. Bradshaw and S. Gates, Speech impairment in a large sample of patients with Parkinson's disease, Behavioural Neurology, vol.11.
[5] J. A. Logemann, H. B. Fisher, B. Boshes and E. R. Blonsky, Frequency and co-occurrence of vocal tract dysfunctions in speech of a large sample of Parkinson patients, Journal of Speech and Hearing Disorders, vol.43, pp.47-57.
[6] M. A. Little, P. E. McSharry, S. J. Roberts, D. A. Costello and I. M. Moroz, Exploiting nonlinear recurrence and fractal scaling properties for voice disorder detection, Biomedical Engineering Online, vol.6, pp.23-58.
[7] M. A. Little, E. McSharry, E. J. Hunter, J. Spielman and L. O. Ramig, Suitability of dysphonia measurements for telemonitoring of Parkinson's disease, IEEE Transactions on Biomedical Engineering, vol.56.
[8] A. Kumar and D. Zhang, Hand-geometry recognition using entropy-based discretization, IEEE Transactions on Information Forensics and Security, vol.2.
[9] K. Polat, S. Kara, A. Güven and S. Güneş, Utilization of discretization method on the diagnosis of optic nerve disease, Computer Methods and Programs in Biomedicine, vol.91.
[10] R. Abraham, J. Simha and S. Iyengar, A comparative analysis of discretization methods for medical datamining with Naïve Bayesian classifier, The 9th International Conference on Information Technology.
[11] J. Demsar, B. Zupan, N. Aoki, M. J. Wall, T. H. Granchi and J. R. Beck, Feature mining and predictive model construction from severe trauma patient's data, International Journal of Medical Informatics, vol.63, pp.41-50.
[12] S. Acid, L. M. Campos, J. M. Fernandez-Luna, S. Rodriguez, J. M. Rodriguez and J. L. Salcedo, A comparison of learning algorithms for Bayesian networks: A case study based on data from an emergency medical service, Artificial Intelligence in Medicine, vol.30.
[13] M. K. Ismail and V. Ciesielski, An empirical investigation of the impact of discretization on common data distributions, Design and Application of Hybrid Intelligent Systems.
[14] H. Kodaz, S. Özşen, A. Arslan and S. Güneş, Medical application of information gain based artificial immune recognition system (AIRS): Diagnosis of thyroid disease, Expert Systems with Applications, vol.36, no.2.
[15] U. M. Fayyad and K. B. Irani, Multi-interval discretization of continuous-valued attributes for classification learning, The 13th International Joint Conference on Artificial Intelligence.
[16] H. Kima and S. Chen, Associative Naïve Bayes classifier: Automated linking of gene ontology to medline documents, Pattern Recognition, vol.42.
[17] C. Hsu, H. Huang and T. Wong, On why discretization works for Naïve Bayesian, Lecture Notes in Computer Science.
[18] M. Hill and M. T. Mitchell, Machine Learning, Singapore.
[19] J. R. Quinlan, Induction of C4.5 decision trees, Machine Learning, vol.1.
[20] G. Shakhnarovich, T. Darrell and P. Indyk, Nearest-Neighbor Methods in Learning and Vision, MIT Press.
[21] V. Vapnik, The Nature of Statistical Learning Theory, Springer, New York.
[22] K. Y. Chen and C. H. Wang, A hybrid SARIMA and support vector machines in forecasting the production values of the machinery industry in Taiwan, Expert Systems with Applications, vol.32.
[23] E. Çomak, A. Arslan and İ. Türkoğlu, A decision support system based on support vector machines for diagnosis of the heart valve diseases, Computers in Biology and Medicine, vol.37, pp.21-27.
[24] K. Takeuchi and N. Collier, Bio-medical entity extraction using support vector machines, Artificial Intelligence in Medicine, vol.33, no.2, 2003.

10 4678 E. KAYA, O. FINDIK, İ. BABAOĞLU AND A. ARSLAN [25] J. Chen and F. Pan, A new online support vector machine algorithm, ICIC Express Letters, vol.4, no.1, pp , [26] Z. Chen, W. Hong and C. Wang, RNA secondary structure prediction with plane pseudoknots based on support vector machine, ICIC Express Letters, vol.3, no.4(b), pp , [27] B. R. Chang and H. F. Tsai, Training support vector regression by quantum-neuron-based hopfield neural net with nested local adiabatic evolution, International Journal of Innovative Computing, Information and Control, vol.5, no.4, pp , [28] N. Begum, M. A. Fattah and F. Ren, Automatic text summarization using support vector machine, International Journal of Innovative Computing, Information and Control, vol.5, no.7, pp , 2009.