Chapter 7

Diagnosis and Prognosis of Breast Cancer using Histopathological Data

In the previous chapter, a method for the classification of mammograms using wavelet analysis and an adaptive neuro-fuzzy inference system (ANFIS) was analyzed. In this chapter, cytologically proved tumors are evaluated using a support vector machine (SVM), a radial basis function neural network (RBFNN) and an autoassociative neural network (AANN), based on the analysis of histopathological data obtained from the fine needle aspirate (FNA) procedure. Diagnosis of breast cancer is carried out using the polynomial kernel of the SVM and the RBFNN. Accurate prediction of cancer prognosis is critical to cancer treatment. Prognosis is a medical term denoting the doctor's prediction of how a patient will progress, and whether there is a chance of recovery. In this chapter, prognosis of breast cancer is also carried out using a different set of histopathological data, and the classifiers SVM and AANN are used to predict the long-term behavior of the disease.

7.1 Introduction

A pathologist is a physician who analyzes cells and tissues under a microscope. The pathologist's report helps to characterize specimens taken during biopsy or other surgical procedures and also helps to determine the treatment. Histology is the study of tissues, including cellular structure and function. To determine a tumor's histologic grade, pathologists examine the tissue for cellular patterns under a microscope. A sample of breast cells may be taken from a breast biopsy, and the findings of the pathologist are recorded to form a database that serves as input to the classifier.
In this chapter, histopathological data obtained from the Wisconsin breast cancer database is used in the diagnosis and prognosis of breast cancer.

7.2 Dataset used for Diagnosis of Breast Cancer

In this section, histopathological data are used to demonstrate the applicability of SVM and RBFNN to medical diagnosis and decision making. The database containing 699 instances of breast cancer cases, obtained from the Wisconsin diagnosis breast cancer database [29], is used for this purpose. The feature vector has nine attributes related to the frequency of cell mitosis (rate of cell division), nuclear pleomorphism (change in cell size, shape and uniformity), etc. The nine features used for classification are clump thickness, uniformity of cell size, uniformity of cell shape, marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli and mitosis. These nine characteristics are found to differ significantly between benign and malignant samples. Each of the nine cytological characteristics of a breast FNA is graded on a scale of 1 to 10 at the time of sample collection. Of the total data, 65.5% belong to the benign class and the remaining 34.5% to the malignant class.

7.3 Techniques for Diagnosis of Breast Cancer

Radial Basis Function Neural Network

The RBFNN has a feed-forward architecture as shown in Fig. 7.1. The construction of a radial basis function network in its most basic form involves three different layers. The input layer is made up of N_I units for an N_I-dimensional input vector. The input layer is fully connected to the second layer, a hidden layer of N_H units. The hidden layer units are fully connected to the N_C output layer units, where N_C is the number of output classes. The output layer supplies the response of the network to the activation patterns applied to the input layer.
The transformation from the input space to the hidden-unit space is nonlinear, whereas the transformation from the hidden-unit space to the output space is linear [180].
Fig. 7.1: Architecture of a radial basis function neural network.

The activation functions (AFs) of the hidden layer are chosen to be Gaussian and are characterized by their mean vectors (centers) \mu_i and covariance matrices \Sigma_i, i = 1, 2, ..., N_H. For simplicity it is assumed that the covariance matrices are of the form \Sigma_i = \sigma_i^2 I, i = 1, 2, ..., N_H. Then the activation function of the i-th hidden unit for an input vector x_j is given by (7.1):

    g_i(x_j) = \exp\left( -\frac{\| x_j - \mu_i \|^2}{2\sigma_i^2} \right)    (7.1)

The \mu_i and \sigma_i^2 are estimated using a suitable clustering algorithm. The number of AFs in the network and their spread influence the smoothness of the mapping. The number of hidden units is determined empirically, and it is assumed that \sigma_i^2 = \sigma^2, where \sigma^2 is given in (7.2):

    \sigma^2 = \frac{\eta\, l^2}{2}    (7.2)

In (7.2), l is the maximum distance between the chosen centers and \eta is an empirical factor which serves to control the smoothness of the mapping function. Therefore (7.1) is rewritten as

    g_i(x_j) = \exp\left( -\frac{\| x_j - \mu_i \|^2}{\eta\, l^2} \right)    (7.3)

The hidden layer units are fully connected to the N_C output units through weights \lambda_{ik}. The output units are linear, and the response of the k-th output for an input x_j is given by

    y_k(x_j) = \sum_{i=0}^{N_H} \lambda_{ik}\, g_i(x_j), \quad k = 1, 2, \ldots, N_C    (7.4)

where g_0(x_j) = 1. Given N_T cytology feature vectors from N_C classes (benign and malignant), training the RBFNN involves estimating \mu_i, i = 1, 2, ..., N_H, \eta, l^2 and \lambda_{ik}, i = 1, 2, ..., N_H. Training the RBFNN involves two stages [181]. First, the basis functions must be established using an algorithm to cluster the data in the training set. Typical ways to do this include Kohonen self-organizing maps [182], k-means clustering, decision trees, genetic algorithms or orthogonal least squares algorithms [183]. In this study, k-means clustering is used: all objects are sorted into a predefined number of groups by minimizing the total squared Euclidean distance of each object to its nearest cluster center. Next, it is necessary to fix the weights linking the hidden and output layers. If the neurons in the output layer have linear activation functions, these weights can be calculated directly using matrix inversion (via singular value decomposition) and matrix multiplication. Because of this direct calculation of weights, an RBFNN is usually much quicker to train than an equivalent multi-layer perceptron (MLP).

7.3.1 Experimental Results and Discussion

Nine cytological features of breast fine-needle aspirates, reported to differ between benign and malignant samples, of 699 patients are used to train and test the models. All the features are first normalized between -1 and +1 so that the classifier has a common range to work with. A program has been written in the C language for this purpose.
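The forward computation of the RBFNN in (7.2)-(7.4) can be sketched in a few lines. The following is a minimal pure-Python illustration, not the thesis's C implementation; the function names and toy centers are assumptions:

```python
import math

def rbf_activations(x, centers, eta):
    """Hidden-layer outputs per (7.3): g_i(x) = exp(-||x - mu_i||^2 / (eta * l^2)),
    where l is the maximum distance between the chosen centers, as in (7.2)."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    # l^2: squared maximum distance between any two centers
    l2 = max(dist2(c1, c2) for c1 in centers for c2 in centers)
    return [math.exp(-dist2(x, mu) / (eta * l2)) for mu in centers]

def rbf_output(x, centers, weights, eta):
    """Linear output layer per (7.4), with g_0 = 1 acting as a bias term."""
    g = [1.0] + rbf_activations(x, centers, eta)
    return sum(w * gi for w, gi in zip(weights, g))
```

For an input coinciding with a center, the corresponding activation is exactly 1; activations decay with squared distance, scaled by the common spread \eta l^2.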
Training and Testing RBFNN

In this implementation, the unsupervised k-means algorithm was used to estimate the hidden-layer parameters from a set of training data. After this initial stage, the weights in the output layer are computed. The training phase thus consists of two steps. In the first step, appropriate centers are generated from the training patterns using the k-means algorithm. Initially, the dataset containing 699 patterns is stored as two data files, one containing the data of the benign class (458 instances) and the other the data of the malignant class (241 instances). A program has been written to generate the required number of centroids for each of the class datasets. All the generated means are then combined into a single file, and the computed centers are copied into the corresponding links: evenly distributed centers from the training patterns are selected and assigned to the links between the input and hidden layers. The second step is the computation of the weights between the hidden layer and the output layer. Another program has been written to test the data using the weights so generated. The performance of the classifier has been determined by varying the number of centroids in each run. The classifier output for the test data has been compared with the original class attribute to identify the true positive, true negative, false positive and false negative values. Table 7.1 gives these values in the form of a confusion matrix and Table 7.2 shows the performance metrics calculated from this confusion matrix. The overall performance of RBFNN is arrived at by taking the average performance values over the different cluster settings and is shown in Fig. 7.2.

Support Vector Machine

A support vector machine performs classification by constructing an N-dimensional hyperplane that optimally separates the data into two categories. A brief overview of SVM is given in Section 4.4.1 of Chapter 4.
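The first training step above, generating centroids with k-means, can be sketched as follows. This is a hypothetical minimal Python version for illustration (the thesis used a C program); the function name and toy data are assumptions:

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Minimal k-means: repeatedly assign each point to its nearest centroid,
    then recompute each centroid as the mean of its assigned points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # initialize from the data
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[i])))
            clusters[j].append(p)
        for i, cl in enumerate(clusters):
            if cl:                             # keep old centroid if cluster empty
                centroids[i] = tuple(sum(c) / len(cl) for c in zip(*cl))
    return centroids
```

Run once per class file, this yields the required number of centroids per class, which are then merged into a single set of hidden-unit centers.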
Table 7.1: RBFNN: Confusion matrix for diagnosis of breast cancer using histopathological data.

    No. of clusters (k)   tp    tn   fp   fn
    1                     434   234   9   22
    4                     441   231  12   15
    5                     445   232  11   11
    10                    445   231  12   11

Table 7.2: RBFNN: Performance measures (%) for diagnosis of breast cancer using histopathological data.

    No. of clusters   Accuracy   Specificity   Sensitivity   F-Score
    1                 95.56      96.29         95.17         98.31
    4                 96.14      95.06         96.71         97.67
    5                 96.85      95.47         97.58         96.91
    10                96.70      95.06         97.58         97.14

Training and Testing SVM

SVMTorch is used for training and testing the model [173]. In order to evaluate the result, three-fold cross validation is used. A program has been written in C to divide the data randomly into three different sets for training and testing the classifier. The training data includes the nine feature attributes and a class attribute, while the test data has only the nine feature attributes, excluding the class attribute. The polynomial kernel based SVM is trained using two-thirds (433 instances) of the data, randomly chosen, and tested with the remaining one-third (233 instances) of the data to evaluate the classifier's effectiveness. Training and testing is done using all the three randomly
divided sets (three-fold cross validation) to ensure fair and unbiased classification. The classifier output for the test data is compared with the original class attribute to identify the true positive, true negative, false positive and false negative values. Table 7.3 gives these values in the form of a confusion matrix and Table 7.4 shows the performance metrics calculated from this confusion matrix. The overall performance of SVM is arrived at by taking the average performance values of the different cross runs and is shown in Fig. 7.3.

Fig. 7.2: Overall performance of RBFNN for diagnosis of breast cancer using histopathological data.

Accuracy estimates how effective the algorithm is overall, i.e. the probability that the predicted class label is correct. Table 7.5 shows that the accuracy of RBFNN in classifying benign and malignant masses using the cytological data is better (96.31%) than that of SVM (92.11%). Sensitivity (specificity) estimates the probability that a positive (negative) sample is labeled correctly, assessing the effectiveness of the algorithm on a single class. Here positive
Table 7.3: SVM: Confusion matrix for diagnosis of breast cancer using histopathological data.

    Cross run   tp   tn    fp   fn
    1           82   122    7   22
    2           51   176    3    3
    3           73   140    9   11

Table 7.4: SVM: Performance measures (%) for diagnosis of breast cancer using histopathological data.

    Cross run   Accuracy   Specificity   Sensitivity   F-Score
    1           87.50      94.57         78.85         85.43
    2           97.42      98.32         94.44         92.43
    3           91.41      93.95         86.90         88.32

refers to the benign mass and negative refers to the malignant mass. Referring to Table 7.5, it can be observed that the specificity of both RBFNN and SVM is above 95%, indicating that they are equally good at identifying malignant masses correctly. However, RBFNN is far better than SVM at identifying benign masses, with a sensitivity of 96.73% compared to 86.73% for SVM, indicating that SVM fails to identify many of the true positives. The F-score is a composite measure combining precision and sensitivity; RBFNN, having the higher sensitivity, also has the higher F-score, as seen in Fig. 7.4.
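The performance measures reported throughout this chapter follow directly from the confusion-matrix counts. A minimal sketch using the standard definitions (here the F-score is the conventional harmonic mean of precision and sensitivity, which may differ slightly from the tabulated values):

```python
def metrics(tp, tn, fp, fn):
    """Standard measures from a binary confusion matrix (positive = benign here)."""
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    precision   = tp / (tp + fp)
    f_score     = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, specificity, sensitivity, f_score

# First row of Table 7.1 (k = 1): tp=434, tn=234, fp=9, fn=22
acc, spec, sens, f = metrics(434, 234, 9, 22)
# 100*acc ~ 95.56, 100*spec ~ 96.29, 100*sens ~ 95.17 (cf. Table 7.2, k = 1)
```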
Fig. 7.3: Overall performance of SVM for breast cancer diagnosis using histopathological data.

7.4 Dataset used for Prognosis of Breast Cancer

The word prognosis is often used in medical reports stating a physician's view of a case: prognosis is a medical term denoting the doctor's prediction of how a patient will progress, and whether there is a chance of recovery. In other words, prognosis is the prediction of the long-term behavior of the disease. In this work, prognosis is done using cytological features, and classifiers, namely a support vector machine and an autoassociative neural network, are used to classify the disease as either recurrent or non-recurrent. Three-fold cross validation is done to avoid bias in the classification, and the performance metrics of SVM and AANN, such as accuracy, sensitivity, specificity and F-score, are found and compared. The dataset of 198 samples from the Wisconsin prognosis breast cancer database [29] is taken as input. Two-thirds of the dataset are used for training the classifier, and one-third for testing the classifier. The
Fig. 7.4: Graph comparing the performance of SVM and RBFNN for diagnosis of breast cancer using histopathological data.

first attribute of each sample, an ID number, is discarded. The remaining 34 attributes are considered for training and testing the classifier. The attributes include time, radius, texture, perimeter, area, smoothness, compactness, concavity, symmetry, fractal dimension, etc. These features characterize the texture of the cell, which gives the physician a clue for arriving at the prognosis. In this work, an attempt has been made to automate prognosis using these features and the pattern classification models SVM and AANN.

7.5 Techniques for Prognosis of Breast Cancer

SVM and AANN are used to classify the cancer as recurrent or non-recurrent based on the cytological data. SVM has been dealt with in detail in Chapter 4. This section discusses AANN.
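The random three-fold division used for cross validation in this chapter can be sketched as follows; this is a hypothetical Python illustration of the role played by the thesis's C program, with assumed function names:

```python
import random

def three_fold_splits(samples, seed=0):
    """Shuffle once, cut into three near-equal folds, and yield
    (train, test) pairs where each fold serves as the test set once."""
    data = list(samples)
    random.Random(seed).shuffle(data)
    k = 3
    folds = [data[i::k] for i in range(k)]     # round-robin split into 3 folds
    for i in range(k):
        test = folds[i]
        train = [s for j, f in enumerate(folds) if j != i for s in f]
        yield train, test
```

Each sample appears in exactly one test fold, so averaging the metrics over the three runs gives an unbiased overall estimate.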
Table 7.5: Performance comparison (%) of SVM and RBFNN for diagnosis of breast cancer using histopathological data.

    Measure       SVM     RBFNN
    Accuracy      92.11   96.31
    Sensitivity   86.73   96.73
    Specificity   95.61   95.47
    F-Score       88.72   97.50

Autoassociative Neural Network

An autoassociative neural network is a special class of feed-forward neural network architecture with some interesting properties that can be exploited for pattern recognition tasks [184]. Separate AANN models are used to capture the distribution of the feature vectors of each class, namely recurrent and non-recurrent. A five-layer autoassociative neural network model is shown in Fig. 7.5. An autoassociative neural network has the same number of neurons in the input and output layers, and fewer in the dimension-compression hidden layer. The network is trained using the input vector itself as the desired output. This training organizes a compression (encoding) network between the input layer and the middle hidden layer, and a decoding network between that hidden layer and the output layer, as shown in Fig. 7.5. Each autoassociative network is trained independently using the feature vectors of its class. As a result, the squared error between an input and the output is generally minimized by the network of the class to which the input pattern belongs. This property enables the classification of an unknown input pattern: the pattern is fed to all the networks and is assigned to the class with the minimum squared error. The processing units of the input and output layers are linear, whereas the units in the hidden layers are nonlinear [185]. During training of the network, the target vectors are the same as the input vectors. To realize the input vectors at the output layer, the network projects an M-dimensional vector in the input space R^M onto a vector in the subspace R^N, and then maps it back onto the M-dimensional space, where N < M. The network performs nonlinear
Fig. 7.5: An autoassociative neural network.

principal component analysis, projecting the input vectors onto the subspace R^N. The subspace R^N is the space spanned by the first N principal components derived from the training data. The value of N is determined by the number of units in the dimension-compression layer. The mapping of the subspace R^N back to the M-dimensional space R^M determines the way in which the subspace R^N is embedded in the original space R^M. It has been shown that an AANN trained with a dataset captures the subspace and the hypersurface along the maximum variance of the data [186], [169]. In other words, an AANN can be used to capture the distribution of a given dataset. In this work, this property of the AANN is used to classify recurrent and non-recurrent cancer cases.
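The minimum-squared-error decision rule described above can be sketched independently of how the networks are trained. In the sketch below, the two "trained" models are purely hypothetical stand-in functions that reconstruct toward a class prototype; a real AANN would be the trained five-layer network's forward pass:

```python
def squared_error(x, y):
    """Squared reconstruction error between input and network output."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def classify(x, models):
    """models maps class label -> forward function (input -> reconstruction).
    The pattern is assigned to the class whose network reconstructs it best."""
    return min(models, key=lambda label: squared_error(x, models[label](x)))

# Toy stand-ins for trained per-class networks (hypothetical): each one
# "reconstructs" any input as its own class prototype.
models = {
    "recurrent":     lambda x: [1.0, 1.0],
    "non-recurrent": lambda x: [0.0, 0.0],
}
```

An unknown pattern close to the recurrent prototype yields a smaller reconstruction error under the recurrent model, and is labeled accordingly.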
7.5.1 Experimental Results and Discussion

Thirty-three cytological features of breast fine-needle aspirates, reported to differ between recurrent and non-recurrent cases, of 198 patients are used to train and test the models.

Training and Testing SVM

In order to evaluate the result, three-fold cross validation is used. The training data includes the thirty-four feature attributes and a class attribute, while the test data has only the thirty-three feature attributes, excluding the class attribute. The data are normalized between -1 and +1 so that the classifier has a common range to work with. SVMTorch is used for training and testing the model [173]. The polynomial kernel based SVM is trained using two-thirds of the data, randomly chosen, and tested with the remaining one-third of the data to evaluate the classifier's effectiveness. Table 7.6 shows the confusion matrix and Table 7.7 depicts the performance measures for the three different cross runs. The overall performance of SVM is arrived at by taking the average performance values of the different cross runs and is shown in Fig. 7.6.

Table 7.6: SVM: Confusion matrix for prognosis of breast cancer using histopathological data.

    Cross run   tp   tn   fp   fn
    1           40    3    6    9
    2           44    3    6   17
    3           47   75    8   10
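The normalization of each attribute to the range [-1, +1], used for both the diagnosis and prognosis data, can be sketched as a per-column min-max rescaling. This is a hypothetical Python version of the preprocessing step (the thesis used a C program):

```python
def normalize_features(rows):
    """Scale each feature column linearly to [-1, +1] using its min and max."""
    cols = list(zip(*rows))
    scaled_cols = []
    for col in cols:
        lo, hi = min(col), max(col)
        if hi == lo:                      # constant feature: map to 0
            scaled_cols.append([0.0] * len(col))
        else:
            scaled_cols.append([2.0 * (v - lo) / (hi - lo) - 1.0 for v in col])
    return [list(r) for r in zip(*scaled_cols)]
```

The minimum of each column maps to -1 and the maximum to +1, giving every attribute the common range the classifiers expect.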
Table 7.7: SVM: Performance measures (%) for prognosis of breast cancer using histopathological data.

    Cross run   Accuracy   Specificity   Sensitivity   F-Score
    1           74.13      33.33         81.63         84.20
    2           74.20      38.40         82.40         83.90
    3           67.10      33.30         72.10         79.20

Table 7.8: AANN: Confusion matrix for prognosis of breast cancer using histopathological data.

    Cross run   Epochs   tp   tn   fp   fn
    1           100      30   20    2    8
                500      32   20    2    6
                1000     32   20    4    4
    2           100      29   19    1   11
                500      33   19    1    7
                1000     31   18    2    9
    3           100      29   20    2    9
                500      32   20    2    6
                1000     32   20    2    8

Training and Testing AANN

The structure of the AANN model used is 12L 38N 4N 38N 12L, where L denotes linear units and N denotes nonlinear units. The activation function of the nonlinear units is the hyperbolic tangent. The network is trained using the error backpropagation learning algorithm for 100, 500 and 1000 epochs. Table 7.8 gives the results in the form of a confusion matrix, and the performance metrics calculated for the three different cross runs are shown in Table 7.9. The AANN gives better performance for
Fig. 7.6: Overall performance of SVM for prognosis of breast cancer using histopathological data.

500 epochs, and the final overall performance is calculated by taking the average of all three cross runs at 500 epochs; it is shown in Fig. 7.7. The comparison of SVM and AANN, obtained by taking the average performance values of the different cross runs, is shown in Fig. 7.8. It can be seen that the accuracy of AANN (86.66%) is far better than that of SVM (71.81%). It is also seen from Fig. 7.8 that the specificity of SVM is very poor compared to that of AANN. This implies that SVM could not perform well in identifying non-recurring cases correctly, resulting in more false positives.
Table 7.9: AANN: Performance measures (%) for prognosis of breast cancer using histopathological data.

    Cross run   Epochs   Accuracy   Specificity   Sensitivity   F-Score
    1           100      83.40      90.90         78.90         84.02
                500      86.70      90.90         84.21         88.88
                1000     86.70      83.40         88.90         85.70
    2           100      80.00      95.00         72.80         85.70
                500      86.60      95.00         82.50         88.87
                1000     81.60      90.00         77.50         88.00
    3           100      81.67      90.90         76.30         82.84
                500      86.67      90.90         84.20         89.18
                1000     83.33      90.90         78.94         84.92

7.6 Summary

In this chapter, the use of support vector machines and radial basis function neural networks in actual clinical diagnosis was examined. Known sets of cytologically proved tumor data obtained from the Wisconsin breast cancer database were used to train the models to categorize cancer patients according to their diagnosis. The experimental results show that RBFNN gives better performance than SVM for breast cancer classification. Methods were also proposed to arrive at a prognosis using AANN and SVM; cytological features from the Wisconsin breast cancer database were used for this purpose. The experimental results reveal that AANN is better than SVM for prognosis of breast cancer. This work indicates that RBFNN and AANN can be effectively used for breast cancer diagnosis and prognosis to help oncologists.
Fig. 7.7: Overall performance of AANN for prognosis of breast cancer using histopathological data.

Fig. 7.8: Performance comparison of SVM and AANN for prognosis of breast cancer using histopathological data.