Chapter 7. Diagnosis and Prognosis of Breast Cancer using Histopathological Data
- Henry Tyler
In the previous chapter, a method for the classification of mammograms using wavelet analysis and an adaptive neuro-fuzzy inference system (ANFIS) was analyzed. In this chapter, cytologically proved tumors are evaluated using a support vector machine (SVM), a radial basis function neural network (RBFNN) and an autoassociative neural network (AANN), based on the analysis of histopathological data obtained from the fine needle aspirate (FNA) procedure. Diagnosis of breast cancer is carried out using an SVM with a polynomial kernel and an RBFNN. Accurate prognosis prediction is critical to cancer treatment. Prognosis is a medical term denoting the doctor's prediction of how a patient will progress, and whether there is a chance of recovery. In this chapter, prognosis of breast cancer is also carried out using a different set of histopathological data, with SVM and AANN classifiers used to predict the long-term behavior of the disease.

7.1 Introduction

A pathologist is a physician who analyzes cells and tissues under a microscope. The pathologist's report helps to characterize specimens taken during biopsy or other surgical procedures and also helps to determine the treatment. Histology is the study of tissues, including cellular structure and function. To determine a tumor's histologic grade, pathologists examine the tissue for cellular patterns under a microscope. A sample of breast cells may be taken from a breast biopsy; the findings of the pathologist are recorded to form a database, which serves as input to the classifier.
In this chapter, histopathological data obtained from the Wisconsin breast cancer database are used in the diagnosis and prognosis of breast cancer.

7.2 Dataset used for Diagnosis of Breast Cancer

In this section, histopathological data are used to demonstrate the applicability of SVM and RBFNN to medical diagnosis and decision making. A database of 699 instances of breast cancer cases obtained from the Wisconsin diagnosis breast cancer database [29] is used for this purpose. The feature vector has nine attributes related to the frequency of cell mitosis (rate of cell division), nuclear pleomorphism (change in cell size, shape and uniformity), etc. The nine features used for classification are clump thickness, uniformity of cell size, uniformity of cell shape, marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli and mitosis. These nine cytological characteristics of breast FNA are found to differ significantly between benign and malignant samples, and each is graded from 1 to 10 at the time of sample collection. Of the total data, 65.5% belong to the benign class and the remaining 34.5% belong to the malignant class.

7.3 Techniques for Diagnosis of Breast Cancer

Radial Basis Function Neural Network

The RBFNN has a feedforward architecture, as shown in Fig. 7.1. In its most basic form, a radial basis function network consists of three layers. The input layer is made up of $N_I$ units for an $N_I$-dimensional input vector. The input layer is fully connected to the second layer, which is a hidden layer of $N_H$ units. The hidden layer units are fully connected to the $N_C$ output layer units, where $N_C$ is the number of output classes. The output layer supplies the response of the network to the activation patterns applied to the input layer. The transformation from the input space to the hidden-unit space is nonlinear, whereas the transformation from the hidden-unit space to the output space is linear [180].
Fig. 7.1: Architecture of a radial basis function neural network.

The activation functions (AFs) of the hidden layers are chosen to be Gaussian and are characterized by their mean vectors (centers) $\mu_i$ and covariance matrices $\Sigma_i$, $i = 1, 2, \ldots, N_H$. For simplicity it is assumed that the covariance matrices are of the form $\Sigma_i = \sigma_i^2 I$, where $i = 1, 2, \ldots, N_H$. Then the activation function of the $i$th hidden unit for an input vector $x_j$ is given by (7.1):

$$g_i(x_j) = \exp\left(-\frac{\lVert x_j - \mu_i \rVert^2}{2\sigma_i^2}\right) \tag{7.1}$$

The $\mu_i$ and $\sigma_i^2$ are estimated using a suitable clustering algorithm. The number of AFs in the network and their spread influence the smoothness of the mapping. The number of hidden units is empirically determined and it is assumed that $\sigma_i^2 = \sigma^2$, where $\sigma^2$ is given in (7.2).

$$\sigma^2 = \frac{\eta\, l^2}{2} \tag{7.2}$$

In (7.2), $l$ is the maximum distance between the chosen centers and $\eta$ is the empirical factor which serves to control the smoothness of the mapping function. Therefore (7.1)
is rewritten as

$$g_i(x_j) = \exp\left(-\frac{\lVert x_j - \mu_i \rVert^2}{\eta\, l^2}\right) \tag{7.3}$$

The hidden layer units are fully connected to the $N_C$ output layer units through weights $\lambda_{ik}$. The output units are linear, and the response of the $k$th output for an input $x_j$ is given by

$$y_k(x_j) = \sum_{i=0}^{N_H} \lambda_{ik}\, g_i(x_j), \quad k = 1, 2, \ldots, N_C \tag{7.4}$$

where $g_0(x_j) = 1$. Given $N_T$ cytology feature vectors from $N_C$ classes (that is, benign and malignant), training the RBFNN involves estimating $\mu_i$, $i = 1, 2, \ldots, N_H$, $\eta$, $l^2$ and $\lambda_{ik}$, $i = 1, 2, \ldots, N_H$.

Training the RBFNN involves two stages [181]. First, the basis functions must be established using an algorithm to cluster the data in the training set. Typical ways to do this include Kohonen self-organizing maps [182], k-means clustering, decision trees, genetic algorithms and orthogonal least squares algorithms [183]. In this study, k-means clustering is used. k-means clustering sorts all objects into a predefined number of groups by minimizing the total squared Euclidean distance of each object from its nearest cluster center. Next, it is necessary to fix the weights linking the hidden and the output layers. If the neurons in the output layer have linear activation functions, these weights can be calculated directly using matrix inversion (using singular value decomposition) and matrix multiplication. Because the weights are calculated directly, an RBFNN is usually much quicker to train than an equivalent multi-layer perceptron (MLP).

Experimental Results and Discussion

Nine cytological features of breast fine-needle aspirates from 699 patients, reported to differ between benign and malignant samples, are used to train and test the models. All the features are first normalized between -1 and +1 so that the classifier has a common range to work with. A program has been written in the C language for that purpose.
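As an illustration of (7.2) to (7.4), the following minimal Python sketch evaluates the Gaussian hidden units with the shared width and the linear output layer. It is not the program used in this work: the function names, the toy centers, the value of eta and the weights are all assumptions chosen only to make the formulas concrete.

```python
import math
from itertools import combinations


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def max_center_distance_sq(centers):
    """l^2 in (7.2): squared maximum distance between any two centers.

    Needs at least two centers, otherwise there is no pair to compare.
    """
    return max(squared_distance(a, b) for a, b in combinations(centers, 2))


def hidden_activations(x, centers, eta):
    """Gaussian activations of (7.3), preceded by the bias unit g_0 = 1."""
    l_sq = max_center_distance_sq(centers)
    return [1.0] + [
        math.exp(-squared_distance(x, mu) / (eta * l_sq)) for mu in centers
    ]


def rbf_outputs(x, centers, eta, weights):
    """Linear outputs of (7.4); weights[i][k] plays the role of lambda_ik."""
    g = hidden_activations(x, centers, eta)
    n_outputs = len(weights[0])
    return [
        sum(g[i] * weights[i][k] for i in range(len(g)))
        for k in range(n_outputs)
    ]


# Toy values (assumptions, not taken from the chapter).
centers = [[0.0, 0.0], [1.0, 1.0]]
eta = 0.5
weights = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # rows: bias, unit 1, unit 2
scores = rbf_outputs([0.0, 0.0], centers, eta, weights)
predicted_class = scores.index(max(scores))  # index of the largest output
```

In this sketch the input is assigned to the class whose output unit responds most strongly; the chapter does not state the decision rule explicitly, so that choice is also an assumption.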
Training and Testing RBFNN

In this implementation, the unsupervised k-means algorithm is used to estimate the hidden-layer weights from a set of training data. After the initial training and the estimation of the hidden-layer weights, the weights in the output layer are computed.

The training phase consists of two steps. In the first step, appropriate centers are generated from the training patterns using the k-means algorithm. Initially, the dataset containing 699 patterns is stored as two data files, one containing the data of the benign class (458 instances) and the other the data of the malignant class (241 instances). A program has been written to generate the required number of centroids for each of the class datasets. All the generated means are then combined into a single file. The computed centers are copied into the corresponding links: evenly distributed centers from the training patterns are selected and assigned to the links between the input and the hidden layer. The second step is the computation of the weights between the hidden layer and the output layer.

Another program has been written to test the data using the weights so generated. The performance of the classifier is evaluated by varying the number of centroids in each run. The classifier output for the test data is compared with the original class attribute to identify true positives, true negatives, false positives and false negatives. Table 7.1 gives these values in the form of a confusion matrix and Table 7.2 shows the performance metrics calculated from this confusion matrix. The overall performance of RBFNN is obtained by averaging the performance values over the different numbers of clusters, and it is shown in Fig. 7.2.

Support vector machine

A support vector machine performs classification by constructing an N-dimensional hyperplane that optimally separates the data into two categories. A brief overview of SVM is given in Chapter 4.
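The first RBFNN training step described under "Training and Testing RBFNN" (clustering each class separately and pooling the centroids) can be sketched as follows. This is a hedged illustration rather than the program written for this work: it uses plain Lloyd-style k-means seeded with the first k points, and the function names and toy two-dimensional data are assumptions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean_vector(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def k_means(points, k, max_iter=100):
    """Plain Lloyd iteration; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k), key=lambda i: squared_distance(p, centroids[i])
            )
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        updated = [
            mean_vector(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def class_wise_centers(benign, malignant, k):
    """Cluster each class separately, then pool the centroids."""
    return k_means(benign, k) + k_means(malignant, k)


# Toy data (assumption): two features on the 1-10 grading scale.
benign = [[1.0, 1.0], [4.0, 4.0], [1.0, 2.0], [4.0, 5.0]]
malignant = [[8.0, 8.0], [10.0, 10.0], [8.0, 9.0], [10.0, 9.0]]
centers = class_wise_centers(benign, malignant, k=2)
```

The pooled list corresponds to the single file of combined means mentioned above; varying `k` reproduces the idea of evaluating the classifier for different numbers of centroids.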
Table 7.1: RBFNN: Confusion matrix for diagnosis of breast cancer using histopathological data.
Columns: No. of clusters (k), tp, tn, fp, fn.

Table 7.2: RBFNN: Performance measures for diagnosis of breast cancer using histopathological data.
Columns: No. of clusters, Accuracy (%), Specificity (%), Sensitivity (%), F-Score (%).

Training and Testing SVM

SVM Torch is used for training and testing the model [173]. Three-fold cross-validation is used to evaluate the result. A program has been written in C to divide the data randomly into three different sets for training and testing the classifier. The training data include the nine feature attributes and a class attribute, while the test data have only the nine feature attributes, without the class attribute. The polynomial kernel based SVM is trained using two thirds (466 instances) of the data, randomly chosen, and tested with the remaining one third (233 instances) to evaluate the classifier's effectiveness. Training and testing is done using all the three randomly
divided sets (three-fold cross-validation) to ensure fair and unbiased classification. The classifier output for the test data is compared with the original class attribute to identify true positives, true negatives, false positives and false negatives. Table 7.3 gives these values in the form of a confusion matrix and Table 7.4 shows the performance metrics calculated from this confusion matrix. The overall performance of SVM is obtained by averaging the performance values of the different cross runs, and it is shown in Fig. 7.3.

Fig. 7.2: Overall performance of RBFNN for diagnosis of breast cancer using histopathological data.

Accuracy approximates how effective the algorithm is by showing the probability of the true value of the class label. Table 7.5 shows that the accuracy of RBFNN in classifying benign and malignant masses using cytological data (96.31%) is better than that of SVM (92.11%). Sensitivity and specificity approximate the probability of the positive and the negative label, respectively, being true (they assess the effectiveness of the algorithm on a single class). Here positive
refers to benign mass and negative refers to malignant mass. Referring to Table 7.5, it can be observed that the specificity of both RBFNN and SVM is around 96%, indicating that they are equally good at identifying malignant masses correctly. However, RBFNN is far better than SVM at identifying benign masses, with a sensitivity of 96.73% compared with 86.73% for SVM, indicating that SVM fails to identify many of the true positives. F-Score is a composite measure which favors algorithms with higher sensitivity and challenges those with higher specificity. RBFNN, having a higher sensitivity, has a higher F-Score than SVM, as seen in Fig. 7.4.

Table 7.3: SVM: Confusion matrix for diagnosis of breast cancer using histopathological data.
Columns: Cross run, tp, tn, fp, fn.

Table 7.4: SVM: Performance measures for diagnosis of breast cancer using histopathological data.
Columns: Cross run, Accuracy (%), Specificity (%), Sensitivity (%), F-Score (%).
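The performance measures discussed above can be derived from the four confusion-matrix counts as in the following sketch. The chapter does not give the F-Score formula, so the usual harmonic mean of precision and sensitivity is assumed here; the function names and the counts in the example are hypothetical and are not results from this work.

```python
def performance_measures(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity and F-score, all in percent.

    Positive is the benign class and negative the malignant class,
    following the convention stated in the text.
    """
    total = tp + tn + fp + fn
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        "sensitivity": 100.0 * tp / (tp + fn),  # true positive rate
        "specificity": 100.0 * tn / (tn + fp),  # true negative rate
        # Assumed definition: F1 = 2*tp / (2*tp + fp + fn)
        "f_score": 100.0 * 2 * tp / (2 * tp + fp + fn),
    }


def average_measures(runs):
    """Average each measure over several runs (cross runs or cluster counts)."""
    return {
        name: sum(run[name] for run in runs) / len(runs)
        for name in runs[0]
    }


# Hypothetical counts for two runs (not taken from Tables 7.1 to 7.4).
run_a = performance_measures(tp=90, tn=45, fp=5, fn=10)
run_b = performance_measures(tp=95, tn=40, fp=10, fn=5)
overall = average_measures([run_a, run_b])
```

Averaging per-run measures in this way mirrors how the overall performance figures are described as being obtained.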
Fig. 7.3: Overall performance of SVM for breast cancer diagnosis using histopathological data.

7.4 Dataset used for Prognosis of Breast Cancer

The word prognosis is often used in medical reports to convey a physician's view on a case. Prognosis is a medical term denoting the doctor's prediction of how a patient will progress, and whether there is a chance of recovery. In other words, prognosis is the prediction of the long-term behavior of the disease. In this work, prognosis is done using cytological features and two classifiers, a support vector machine and an autoassociative neural network, which classify the disease as either recurrent or non-recurrent. Three-fold cross-validation is used to avoid bias in classification, and performance metrics such as the accuracy, sensitivity, specificity and F-score of SVM and AANN are computed and compared.

The dataset of 198 samples from the Wisconsin prognosis breast cancer database [29] is taken as input. Two thirds of the dataset are used for training the classifier, and one third is used for testing the classifier. The
first attribute of each sample, which is an ID number, is discarded. The remaining 34 attributes are considered for training and testing the classifier. The attributes include time, radius, texture, perimeter, area, smoothness, compactness, concavity, symmetry, fractal dimension, etc. These features specify the texture of the cell, which gives the physician a clue for arriving at the prognosis. In this work, an attempt has been made to automate prognosis using these features and two pattern classification models, namely SVM and AANN.

Fig. 7.4: Graph comparing performance of SVM and RBFNN for diagnosis of breast cancer using histopathological data.

7.5 Techniques for Prognosis of Breast Cancer

SVM and AANN are used to classify the cancer as recurrent or non-recurrent based on cytological data. SVM has been dealt with in detail in Chapter 4. This section discusses AANN.
Table 7.5: Performance comparison of SVM and RBFNN for diagnosis of breast cancer using histopathological data.
Columns: Measures (%), SVM, RBF. Rows: Accuracy, Sensitivity, Specificity, F-Score.

Autoassociative Neural Network

An autoassociative neural network is a special class of feedforward neural network architecture with some interesting properties that can be exploited for pattern recognition tasks [184]. Separate AANN models are used to capture the distribution of the feature vectors of each class, namely recurrent and non-recurrent. A five-layer autoassociative neural network model is shown in Fig. 7.5. An autoassociative neural network has the same number of neurons in the input and output layers, and fewer in the hidden layers. The network is trained using the input vector itself as the desired output. This training produces a compression (encoding) network between the input layer and the hidden layer, and a decoding network between the hidden layer and the output layer, as shown in Fig. 7.5. Each autoassociative network is trained independently for its class using the feature vectors of that class. As a result, the squared error between an input and the output is generally minimized by the network of the class to which the input pattern belongs. This property makes it possible to classify an unknown input pattern: the unknown pattern is fed to all networks and is assigned to the class with the minimum squared error.

The processing units of the input layer and the output layer are linear, whereas the units in the hidden layer are nonlinear [185]. During training of the network, the target vectors are the same as the input vectors. To realize the input vectors at the output layer, the network projects an $M$-dimensional vector in the input space $R^M$ onto a vector in the subspace $R^N$, and then maps it back onto the $M$-dimensional space, where $N < M$. The network performs nonlinear
principal component analysis by projecting the input vectors onto the subspace $R^N$. The subspace $R^N$ is the space spanned by the first $N$ principal components derived from the training data. The value of $N$ is determined by the number of units in the dimension compression layer. The mapping of the subspace $R^N$ back to the $M$-dimensional space $R^M$ determines the way in which the subspace $R^N$ is embedded in the original space $R^M$. It has been shown that an AANN trained with a dataset will capture the subspace and the hypersurface along the maximum variance of the data [186], [169]. In other words, an AANN can be used to capture the distribution of the given dataset. In this work, this feature of AANN is used to classify recurrent and non-recurrent cancer cases.

Fig. 7.5: An autoassociative neural network.
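The minimum-squared-error decision rule described for the per-class AANN models can be sketched as below. Training a real five-layer AANN is outside the scope of this sketch, so each class model is replaced by a deliberately crude stand-in that always reconstructs the class mean; that stand-in, the function names and the toy vectors are all assumptions used only to show how the rule operates.

```python
def squared_error(x, x_hat):
    """Squared reconstruction error between an input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def classify(x, class_models):
    """Feed x to every class model and pick the smallest reconstruction error.

    class_models maps a label to a callable returning the reconstruction of x.
    """
    errors = {
        label: squared_error(x, model(x))
        for label, model in class_models.items()
    }
    return min(errors, key=errors.get), errors


def mean_reconstructor(training_vectors):
    """Stand-in for a trained AANN (assumption): reconstructs the class mean."""
    n = len(training_vectors)
    mean = [sum(col) / n for col in zip(*training_vectors)]
    return lambda x: mean


# Toy two-dimensional feature vectors (hypothetical, already scaled).
models = {
    "recurrent": mean_reconstructor([[0.8, 0.9], [0.9, 0.7]]),
    "non-recurrent": mean_reconstructor([[-0.7, -0.8], [-0.9, -0.6]]),
}
label, errors = classify([0.7, 0.8], models)
```

With trained AANNs, each callable would instead run the input through the encoding and decoding layers of the network for that class; the comparison of squared errors stays the same.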
7.5.1 Experimental Results and Discussion

Thirty-three cytological features of breast fine-needle aspirates from 198 patients, reported to differ between recurrent and non-recurrent cases, are used to train and test the models.

Training and testing the SVM

Three-fold cross-validation is used to evaluate the result. The training data include thirty-three feature attributes and a class attribute, while the test data have only the thirty-three feature attributes, without the class attribute. The data are normalized between -1 and +1 so that the classifier has a common range to work with. SVM Torch is used for training and testing the model [173]. The polynomial kernel based SVM is trained using two thirds of the data, randomly chosen, and tested with the remaining one third to evaluate the classifier's effectiveness. Table 7.6 shows the confusion matrix and Table 7.7 gives the performance measures for the three different cross runs. The overall performance of SVM is obtained by averaging the performance values of the different cross runs, and it is shown in Fig. 7.6.

Table 7.6: SVM: Confusion matrix for prognosis of breast cancer using histopathological data.
Columns: Cross run, tp, tn, fp, fn.
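The preprocessing described in this section (scaling to the range -1 to +1, then a random three-way division in which each third serves once as the test set) might look like the following sketch. The chapter does not give the normalization formula or the splitting procedure of the C program, so column-wise min-max scaling and a seeded shuffle are assumptions, as are the function names and the toy rows.

```python
import random


def scale_to_range(rows, low=-1.0, high=1.0):
    """Column-wise min-max scaling of feature rows to [low, high]."""
    columns = list(zip(*rows))
    mins = [min(c) for c in columns]
    maxs = [max(c) for c in columns]
    midpoint = (low + high) / 2  # used for constant columns
    return [
        [
            low + (high - low) * (v - mn) / (mx - mn) if mx > mn else midpoint
            for v, mn, mx in zip(row, mins, maxs)
        ]
        for row in rows
    ]


def three_fold_splits(n_samples, seed=0):
    """Return three (train_indices, test_indices) pairs, one per cross run."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::3] for i in range(3)]
    splits = []
    for held_out in range(3):
        train = [i for f in range(3) if f != held_out for i in folds[f]]
        splits.append((train, folds[held_out]))
    return splits


scaled = scale_to_range([[1, 10], [5, 20], [10, 30]])  # toy rows
splits = three_fold_splits(198)  # 198 samples, as in the prognosis dataset
```

For 198 samples each cross run trains on 132 samples and tests on the remaining 66, which matches the two-thirds and one-third proportions stated above.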
Table 7.7: SVM: Performance measures for prognosis of breast cancer using histopathological data.
Columns: Cross run, Accuracy (%), Specificity (%), Sensitivity (%), F-Score (%).

Table 7.8: AANN: Confusion matrix for prognosis of breast cancer using histopathological data.
Columns: Cross run, Epochs, tp, tn, fp, fn.

Training and Testing AANN

The structure of the AANN model used is 12L 38N 4N 38N 12L, where L denotes linear units and N denotes nonlinear units. The activation function of the nonlinear units is the hyperbolic tangent function. The network is trained using the error backpropagation learning algorithm for 100, 500 and 1000 epochs. Table 7.8 gives the results in the form of a confusion matrix, and the performance metrics calculated for the three different cross runs are shown in Table 7.9. The AANN gives better performance for
500 epochs, and the final overall performance is calculated by averaging all three cross runs at 500 epochs; it is shown in Fig. 7.7. The comparison of SVM and AANN, obtained by averaging the performance values of the different cross runs, is shown in Fig. 7.8. It can be seen that the accuracy of AANN (86.66%) is far better than that of SVM (71.81%). It is also seen from Fig. 7.8 that the specificity of SVM is very poor compared to AANN. This implies that SVM does not perform well in identifying non-recurring cases correctly, resulting in more false positives.

Fig. 7.6: Overall performance of SVM for prognosis of breast cancer using histopathological data.
Table 7.9: AANN: Performance measures for prognosis of breast cancer using histopathological data.
Columns: Cross run, Epochs, Accuracy (%), Specificity (%), Sensitivity (%), F-Score (%).

Summary

In this chapter, the use of support vector machines and radial basis function neural networks in actual clinical diagnosis was examined. Known sets of cytologically proved tumor data obtained from the Wisconsin breast cancer database were used to train the models to categorize cancer patients according to their diagnosis. Experimental results show that RBFNN gives better performance than SVM for breast cancer classification. Methods were also proposed to arrive at a prognosis using AANN and SVM. Cytological features from the Wisconsin breast cancer database were used for this purpose. The experimental results reveal that AANN is better than SVM for prognosis of breast cancer. This work indicates that RBFNN and AANN can be effectively used for breast cancer diagnosis and prognosis to help oncologists.
Fig. 7.7: Overall performance of AANN for prognosis of breast cancer using histopathological data.

Fig. 7.8: Performance comparison of SVM and AANN for prognosis of breast cancer using histopathological data.
Machine learning for algo trading An introduction for nonmathematicians Dr. Aly Kassam Overview High level introduction to machine learning A machine learning bestiary What has all this got to do with
More informationNovelty Detection in image recognition using IRF Neural Networks properties
Novelty Detection in image recognition using IRF Neural Networks properties Philippe Smagghe, Jean-Luc Buessler, Jean-Philippe Urban Université de Haute-Alsace MIPS 4, rue des Frères Lumière, 68093 Mulhouse,
More informationSocial Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
More informationBIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376
Course Director: Dr. Kayvan Najarian (DCM&B, kayvan@umich.edu) Lectures: Labs: Mondays and Wednesdays 9:00 AM -10:30 AM Rm. 2065 Palmer Commons Bldg. Wednesdays 10:30 AM 11:30 AM (alternate weeks) Rm.
More informationApplication of Data Mining Techniques to Model Breast Cancer Data
Application of Data Mining Techniques to Model Breast Cancer Data S. Syed Shajahaan 1, S. Shanthi 2, V. ManoChitra 3 1 Department of Information Technology, Rathinam Technical Campus, Anna University,
More informationAccurate and robust image superresolution by neural processing of local image representations
Accurate and robust image superresolution by neural processing of local image representations Carlos Miravet 1,2 and Francisco B. Rodríguez 1 1 Grupo de Neurocomputación Biológica (GNB), Escuela Politécnica
More informationVisualization of Breast Cancer Data by SOM Component Planes
International Journal of Science and Technology Volume 3 No. 2, February, 2014 Visualization of Breast Cancer Data by SOM Component Planes P.Venkatesan. 1, M.Mullai 2 1 Department of Statistics,NIRT(Indian
More informationPredictive Data modeling for health care: Comparative performance study of different prediction models
Predictive Data modeling for health care: Comparative performance study of different prediction models Shivanand Hiremath hiremat.nitie@gmail.com National Institute of Industrial Engineering (NITIE) Vihar
More informationINTERNATIONAL JOURNAL FOR ENGINEERING APPLICATIONS AND TECHNOLOGY DATA MINING IN HEALTHCARE SECTOR. ankitanandurkar2394@gmail.com
IJFEAT INTERNATIONAL JOURNAL FOR ENGINEERING APPLICATIONS AND TECHNOLOGY DATA MINING IN HEALTHCARE SECTOR Bharti S. Takey 1, Ankita N. Nandurkar 2,Ashwini A. Khobragade 3,Pooja G. Jaiswal 4,Swapnil R.
More informationSupport Vector Machines with Clustering for Training with Very Large Datasets
Support Vector Machines with Clustering for Training with Very Large Datasets Theodoros Evgeniou Technology Management INSEAD Bd de Constance, Fontainebleau 77300, France theodoros.evgeniou@insead.fr Massimiliano
More informationARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING)
ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING) Gabriela Ochoa http://www.cs.stir.ac.uk/~goc/ OUTLINE Preliminaries Classification and Clustering Applications
More informationVolume 2, Issue 9, September 2014 International Journal of Advance Research in Computer Science and Management Studies
Volume 2, Issue 9, September 2014 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com
More informationEnvironmental Remote Sensing GEOG 2021
Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class
More informationOBJECTIVES By the end of this segment, the community participant will be able to:
Cancer 101: Cancer Diagnosis and Staging Linda U. Krebs, RN, PhD, AOCN, FAAN OCEAN Native Navigators and the Cancer Continuum (NNACC) (NCMHD R24MD002811) Cancer 101: Diagnosis & Staging (Watanabe-Galloway
More informationData Mining Algorithms Part 1. Dejan Sarka
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka (dsarka@solidq.com) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
More informationMethods and Applications for Distance Based ANN Training
Methods and Applications for Distance Based ANN Training Christoph Lassner, Rainer Lienhart Multimedia Computing and Computer Vision Lab Augsburg University, Universitätsstr. 6a, 86159 Augsburg, Germany
More informationA new Approach for Intrusion Detection in Computer Networks Using Data Mining Technique
A new Approach for Intrusion Detection in Computer Networks Using Data Mining Technique Aida Parbaleh 1, Dr. Heirsh Soltanpanah 2* 1 Department of Computer Engineering, Islamic Azad University, Sanandaj
More informationUsing Data Mining for Mobile Communication Clustering and Characterization
Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer
More informationMethod of Combining the Degrees of Similarity in Handwritten Signature Authentication Using Neural Networks
Method of Combining the Degrees of Similarity in Handwritten Signature Authentication Using Neural Networks Ph. D. Student, Eng. Eusebiu Marcu Abstract This paper introduces a new method of combining the
More informationStock Prediction using Artificial Neural Networks
Stock Prediction using Artificial Neural Networks Abhishek Kar (Y8021), Dept. of Computer Science and Engineering, IIT Kanpur Abstract In this work we present an Artificial Neural Network approach to predict
More informationRecurrent Neural Networks
Recurrent Neural Networks Neural Computation : Lecture 12 John A. Bullinaria, 2015 1. Recurrent Neural Network Architectures 2. State Space Models and Dynamical Systems 3. Backpropagation Through Time
More informationComparison of Supervised and Unsupervised Learning Algorithms for Pattern Classification
Comparison of Supervised and Unsupervised Learning Algorithms for Pattern Classification R. Sathya Professor, Dept. of MCA, Jyoti Nivas College (Autonomous), Professor and Head, Dept. of Mathematics, Bangalore,
More informationA Content based Spam Filtering Using Optical Back Propagation Technique
A Content based Spam Filtering Using Optical Back Propagation Technique Sarab M. Hameed 1, Noor Alhuda J. Mohammed 2 Department of Computer Science, College of Science, University of Baghdad - Iraq ABSTRACT
More informationRole of Neural network in data mining
Role of Neural network in data mining Chitranjanjit kaur Associate Prof Guru Nanak College, Sukhchainana Phagwara,(GNDU) Punjab, India Pooja kapoor Associate Prof Swami Sarvanand Group Of Institutes Dinanagar(PTU)
More informationPATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical
More informationNetwork Intrusion Detection using Semi Supervised Support Vector Machine
Network Intrusion Detection using Semi Supervised Support Vector Machine Jyoti Haweliya Department of Computer Engineering Institute of Engineering & Technology, Devi Ahilya University Indore, India ABSTRACT
More informationData Mining using Artificial Neural Network Rules
Data Mining using Artificial Neural Network Rules Pushkar Shinde MCOERC, Nasik Abstract - Diabetes patients are increasing in number so it is necessary to predict, treat and diagnose the disease. Data
More informationNeural Network Structures
3 Neural Network Structures This chapter describes various types of neural network structures that are useful for RF and microwave applications. The most commonly used neural network configurations, known
More informationMathematical Models of Supervised Learning and their Application to Medical Diagnosis
Genomic, Proteomic and Transcriptomic Lab High Performance Computing and Networking Institute National Research Council, Italy Mathematical Models of Supervised Learning and their Application to Medical
More informationApplication of Data mining in Medical Applications
Application of Data mining in Medical Applications by Arun George Eapen A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Master of Applied Science
More informationModelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches
Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic
More informationNeural Network Add-in
Neural Network Add-in Version 1.5 Software User s Guide Contents Overview... 2 Getting Started... 2 Working with Datasets... 2 Open a Dataset... 3 Save a Dataset... 3 Data Pre-processing... 3 Lagging...
More informationMulti-class Classification: A Coding Based Space Partitioning
Multi-class Classification: A Coding Based Space Partitioning Sohrab Ferdowsi, Svyatoslav Voloshynovskiy, Marcin Gabryel, and Marcin Korytkowski University of Geneva, Centre Universitaire d Informatique,
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct
More informationFunctional Data Analysis of MALDI TOF Protein Spectra
Functional Data Analysis of MALDI TOF Protein Spectra Dean Billheimer dean.billheimer@vanderbilt.edu. Department of Biostatistics Vanderbilt University Vanderbilt Ingram Cancer Center FDA for MALDI TOF
More informationData quality in Accounting Information Systems
Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania
More informationKeywords data mining, prediction techniques, decision making.
Volume 5, Issue 4, April 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Analysis of Datamining
More informationW6.B.1. FAQs CS535 BIG DATA W6.B.3. 4. If the distance of the point is additionally less than the tight distance T 2, remove it from the original set
http://wwwcscolostateedu/~cs535 W6B W6B2 CS535 BIG DAA FAQs Please prepare for the last minute rush Store your output files safely Partial score will be given for the output from less than 50GB input Computer
More informationKnowledge Discovery from patents using KMX Text Analytics
Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs anton.heijs@treparel.com Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers
More informationSupervised Learning (Big Data Analytics)
Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used
More informationFinal Project Report
CPSC545 by Introduction to Data Mining Prof. Martin Schultz & Prof. Mark Gerstein Student Name: Yu Kor Hugo Lam Student ID : 904907866 Due Date : May 7, 2007 Introduction Final Project Report Pseudogenes
More informationIntroduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk
Introduction to Machine Learning and Data Mining Prof. Dr. Igor Trakovski trakovski@nyus.edu.mk Neural Networks 2 Neural Networks Analogy to biological neural systems, the most robust learning systems
More informationBlood Vessel Classification into Arteries and Veins in Retinal Images
Blood Vessel Classification into Arteries and Veins in Retinal Images Claudia Kondermann and Daniel Kondermann a and Michelle Yan b a Interdisciplinary Center for Scientific Computing (IWR), University
More informationSupervised and unsupervised learning - 1
Chapter 3 Supervised and unsupervised learning - 1 3.1 Introduction The science of learning plays a key role in the field of statistics, data mining, artificial intelligence, intersecting with areas in
More informationREVIEW OF HEART DISEASE PREDICTION SYSTEM USING DATA MINING AND HYBRID INTELLIGENT TECHNIQUES
REVIEW OF HEART DISEASE PREDICTION SYSTEM USING DATA MINING AND HYBRID INTELLIGENT TECHNIQUES R. Chitra 1 and V. Seenivasagam 2 1 Department of Computer Science and Engineering, Noorul Islam Centre for
More informationAUTOMATED CLASSIFICATION OF BLASTS IN ACUTE LEUKEMIA BLOOD SAMPLES USING HMLP NETWORK
AUTOMATED CLASSIFICATION OF BLASTS IN ACUTE LEUKEMIA BLOOD SAMPLES USING HMLP NETWORK N. H. Harun 1, M.Y.Mashor 1, A.S. Abdul Nasir 1 and H.Rosline 2 1 Electronic & Biomedical Intelligent Systems (EBItS)
More informationUsing artificial intelligence for data reduction in mechanical engineering
Using artificial intelligence for data reduction in mechanical engineering L. Mdlazi 1, C.J. Stander 1, P.S. Heyns 1, T. Marwala 2 1 Dynamic Systems Group Department of Mechanical and Aeronautical Engineering,
More informationAUTOMATION OF ENERGY DEMAND FORECASTING. Sanzad Siddique, B.S.
AUTOMATION OF ENERGY DEMAND FORECASTING by Sanzad Siddique, B.S. A Thesis submitted to the Faculty of the Graduate School, Marquette University, in Partial Fulfillment of the Requirements for the Degree
More informationImpelling Heart Attack Prediction System using Data Mining and Artificial Neural Network
General Article International Journal of Current Engineering and Technology E-ISSN 2277 4106, P-ISSN 2347-5161 2014 INPRESSCO, All Rights Reserved Available at http://inpressco.com/category/ijcet Impelling
More informationElectroencephalography Analysis Using Neural Network and Support Vector Machine during Sleep
Engineering, 23, 5, 88-92 doi:.4236/eng.23.55b8 Published Online May 23 (http://www.scirp.org/journal/eng) Electroencephalography Analysis Using Neural Network and Support Vector Machine during Sleep JeeEun
More informationSVM Ensemble Model for Investment Prediction
19 SVM Ensemble Model for Investment Prediction Chandra J, Assistant Professor, Department of Computer Science, Christ University, Bangalore Siji T. Mathew, Research Scholar, Christ University, Dept of
More informationChapter 2 The Research on Fault Diagnosis of Building Electrical System Based on RBF Neural Network
Chapter 2 The Research on Fault Diagnosis of Building Electrical System Based on RBF Neural Network Qian Wu, Yahui Wang, Long Zhang and Li Shen Abstract Building electrical system fault diagnosis is the
More informationChapter 6. The stacking ensemble approach
82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described
More information