Chapter 7. Diagnosis and Prognosis of Breast Cancer using Histopathological Data


 Henry Tyler
In the previous chapter, a method for classifying mammograms using wavelet analysis and an adaptive neuro-fuzzy inference system (ANFIS) was analyzed. In this chapter, cytologically proven tumors are evaluated using a support vector machine (SVM), a radial basis function neural network (RBFNN) and an autoassociative neural network (AANN), based on the analysis of histopathological data obtained from the fine needle aspirate (FNA) procedure. Diagnosis of breast cancer is carried out using the polynomial kernel of SVM and RBFNN. Accurate prediction of cancer prognosis is critical to cancer treatment. Prognosis is a medical term denoting the doctor's prediction of how a patient will progress, and whether there is a chance of recovery. In this chapter, prognosis of breast cancer is also carried out using a different set of histopathological data, and the classifiers SVM and AANN are used to predict the long-term behavior of the disease.

7.1 Introduction

A pathologist is a physician who analyzes cells and tissues under a microscope. The pathologist's report helps to characterize specimens taken during biopsy or other surgical procedures and also helps to determine the treatment. Histology is the study of tissues, including cellular structure and function. To determine a tumor's histologic grade, pathologists examine the tissue for cellular patterns under a microscope. A sample of breast cells may be taken from a breast biopsy, and the findings of the pathologist are recorded to form a database that serves as input to the classifier.
In this chapter, histopathological data obtained from the Wisconsin breast cancer database is used in the diagnosis and prognosis of breast cancer.

7.2 Dataset used for Diagnosis of Breast Cancer

In this section, histopathological data are used to demonstrate the applicability of SVM and RBFNN to medical diagnosis and decision making. The database, containing 699 instances of breast cancer cases obtained from the Wisconsin diagnosis breast cancer database [29], is used for this purpose. The feature vector has nine attributes related to the frequency of cell mitosis (rate of cell division), nuclear pleomorphism (change in cell size, shape and uniformity), etc. The nine features used for classification are clump thickness, uniformity of cell size, uniformity of cell shape, marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli and mitosis. These nine cytological characteristics of breast FNA are reported to differ significantly between benign and malignant samples, and each is graded from 1 to 10 at the time of sample collection. Of the total data, 65.5% belong to the benign class and the remaining 34.5% belong to the malignant class.

7.3 Techniques for Diagnosis of Breast Cancer

Radial Basis Function Neural Network

The RBFNN has a feed-forward architecture as shown in Fig. 7.1. The construction of a radial basis function network in its most basic form involves three different layers. The input layer is made up of $N_I$ units for an $N_I$-dimensional input vector. The input layer is fully connected to the second layer, which is a hidden layer of $N_H$ units. The hidden layer units are fully connected to the $N_C$ output layer units, where $N_C$ is the number of output classes. The output layer supplies the response of the network to the activation patterns applied to the input layer. The transformation from the input space to the hidden-unit space is nonlinear, whereas the transformation from the hidden-unit space to the output space is linear [180].
Fig. 7.1: Architecture of a radial basis function neural network.

The activation functions (AFs) of the hidden layer are chosen to be Gaussian and are characterized by their mean vectors (centers) $\mu_i$ and covariance matrices $\Sigma_i$, $i = 1, 2, \ldots, N_H$. For simplicity it is assumed that the covariance matrices are of the form $\Sigma_i = \sigma_i^2 I$, $i = 1, 2, \ldots, N_H$. Then the activation function of the $i$th hidden unit for an input vector $x_j$ is given by (7.1):

$$ g_i(x_j) = \exp\left( -\frac{\lVert x_j - \mu_i \rVert^2}{2\sigma_i^2} \right) \qquad (7.1) $$

The $\mu_i$ and $\sigma_i^2$ are estimated using a suitable clustering algorithm. The number of AFs in the network and their spread influence the smoothness of the mapping. The number of hidden units is determined empirically, and it is assumed that $\sigma_i^2 = \sigma^2$, where $\sigma^2$ is given in (7.2):

$$ \sigma^2 = \frac{\eta\, l^2}{2} \qquad (7.2) $$

In (7.2), $l$ is the maximum distance between the chosen centers and $\eta$ is an empirical factor which serves to control the smoothness of the mapping function. Substituting (7.2) into (7.1), the activation function is rewritten as

$$ g_i(x_j) = \exp\left( -\frac{\lVert x_j - \mu_i \rVert^2}{\eta\, l^2} \right) \qquad (7.3) $$

The hidden layer units are fully connected to the $N_C$ output layer units through weights $\lambda_{ik}$. The output units are linear, and the response of the $k$th output for an input $x_j$ is given by

$$ y_k(x_j) = \sum_{i=0}^{N_H} \lambda_{ik}\, g_i(x_j), \quad k = 1, 2, \ldots, N_C \qquad (7.4) $$

where $g_0(x_j) = 1$. Given $N_T$ cytology feature vectors from $N_C$ classes (benign and malignant), training the RBFNN involves estimating $\mu_i$, $i = 1, 2, \ldots, N_H$, $\eta$, $l^2$ and $\lambda_{ik}$, $i = 1, 2, \ldots, N_H$.

Training the RBFNN involves two stages [181]. First, the basis functions must be established using an algorithm to cluster the data in the training set. Typical ways to do this include Kohonen self-organizing maps [182], k-means clustering, decision trees, genetic algorithms or orthogonal least squares algorithms [183]. In this study, k-means clustering is used. k-means clustering sorts all objects into a predefined number of groups by minimizing the total squared Euclidean distance of each object from its nearest cluster center. Next, it is necessary to fix the weights linking the hidden and the output layers. If the neurons in the output layer have linear activation functions, these weights can be calculated directly using matrix inversion (by singular value decomposition) and matrix multiplication. Because the weights are calculated directly, an RBFNN is usually much quicker to train than an equivalent multilayer perceptron (MLP).

Experimental Results and Discussion

Nine cytological features of breast fine-needle aspirates, reported to differ between benign and malignant samples, from 699 patients are used to train and test the models. All the features are first normalized between -1 and +1 so that the classifier has a common range to work with. A program has been written in the C language for that purpose.
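The normalization and the two-stage RBFNN training described above can be illustrated with a minimal NumPy sketch. This is not the C implementation used in this study: the function names (`scale_to_unit_range`, `kmeans`, `hidden_activations`, `train_rbfnn`, `predict_rbfnn`), the default `eta = 1.0`, the random initialisation of k-means and the one-hot target coding are all assumptions made for illustration. The hidden layer follows (7.3), the output layer follows (7.4), and the output weights come from an SVD-based pseudo-inverse.

```python
import numpy as np


def scale_to_unit_range(X):
    """Linearly rescale each feature column to the range [-1, +1]."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    return 2.0 * (X - lo) / span - 1.0


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns k cluster centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)
        for c in range(k):
            members = X[nearest == c]
            if len(members):  # keep the old center if a cluster becomes empty
                centers[c] = members.mean(axis=0)
    return centers


def hidden_activations(X, centers, eta):
    """Gaussian activations of Eq. (7.3), with g_0 = 1 as the first column."""
    gaps = centers[:, None, :] - centers[None, :, :]
    l2 = (gaps ** 2).sum(axis=2).max()  # squared maximum distance between centers
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.hstack([np.ones((len(X), 1)), np.exp(-d2 / (eta * l2))])


def train_rbfnn(X, labels, k_per_class, eta=1.0):
    """Stage 1: per-class k-means centers. Stage 2: linear output weights."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    centers = np.vstack([kmeans(X[labels == c], k_per_class) for c in classes])
    G = hidden_activations(X, centers, eta)
    targets = (labels[:, None] == classes[None, :]).astype(float)  # one-hot coding
    weights = np.linalg.pinv(G) @ targets  # pseudo-inverse computed via SVD
    return classes, centers, weights, eta


def predict_rbfnn(model, X):
    """Evaluate Eq. (7.4) and return the class with the largest output."""
    classes, centers, weights, eta = model
    outputs = hidden_activations(np.asarray(X, dtype=float), centers, eta) @ weights
    return classes[outputs.argmax(axis=1)]
```

Calling `train_rbfnn(scale_to_unit_range(features), labels, k_per_class=k)` for several values of `k` mirrors the practice of varying the number of centroids from run to run.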
Training and Testing RBFNN

In this implementation, the k-means unsupervised algorithm is used to estimate the hidden-layer weights from a set of training data. After this initial training and the estimation of the hidden-layer weights, the weights in the output layer are computed. The training phase thus consists of two steps.

In the first step, appropriate centers are generated from the training patterns using the k-means algorithm. Initially the dataset containing 699 patterns is stored as two data files, one containing the data of the benign class (458 instances) and the other the data of the malignant class (241 instances). A program has been written to generate the required number of centroids for each of the class datasets. All the generated means are then combined into a single file. The computed centers are copied into the corresponding links: evenly distributed centers from the training patterns are selected and assigned to the links between the input and hidden layers.

The second step is the computation of the weights between the hidden layer and the output layer. Another program has been written to test the data using the weights so generated. The performance of the classifier is evaluated by varying the number of centroids in each run. The classifier output for the test data is compared with the original class attribute to identify the true positives, true negatives, false positives and false negatives. Table 7.1 gives these values in the form of a confusion matrix and Table 7.2 shows the performance metrics calculated from this confusion matrix. The overall performance of RBFNN is obtained by averaging the performance values over the different numbers of clusters and is shown in Fig. 7.2.

Support vector machine

A support vector machine performs classification by constructing an N-dimensional hyperplane that optimally separates the data into two categories. A brief overview of SVM is given in Chapter 4.
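For reference, a common textbook form of the polynomial kernel and of the resulting SVM decision function is given below. The symbols $\alpha_i$, $y_i$, $b$, $c$, $d$ and $N_s$ (the number of support vectors) are notation assumed here for illustration; the exact parameterization used by the software in this study may differ.

```latex
K(x, z) = \left( x^{\top} z + c \right)^{d}, \qquad
f(x) = \operatorname{sign}\!\left( \sum_{i=1}^{N_s} \alpha_i \, y_i \, K(x_i, x) + b \right)
```

Here $x_i$ are the support vectors with labels $y_i \in \{-1, +1\}$, and $d$ is the degree of the polynomial.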
Table 7.1: RBFNN: Confusion matrix for diagnosis of breast cancer using histopathological data. (Columns: No. of clusters (k), tp, tn, fp, fn.)

Table 7.2: RBFNN: Performance measures for diagnosis of breast cancer using histopathological data. (Columns: No. of clusters, Accuracy, Specificity, Sensitivity, F-score; performance in %.)

Fig. 7.2: Overall performance of RBFNN for diagnosis of breast cancer using histopathological data.

Training and Testing SVM

SVM Torch is used for training and testing the model [173]. Three-fold cross validation is used to evaluate the result. A program has been written in C to divide the data randomly into three different sets for training and testing the classifier. The training data include the nine feature attributes and a class attribute, while the test data have only the nine feature attributes, excluding the class attribute. The polynomial kernel based SVM is trained using two thirds (466 instances) of the data, randomly chosen, and tested with the remaining one third (233 instances) to evaluate the classifier's effectiveness. Training and testing are done using all three randomly divided sets (three-fold cross validation) to ensure fair and unbiased classification. The classifier output for the test data is compared with the original class attribute to identify the true positives, true negatives, false positives and false negatives. Table 7.3 gives these values in the form of a confusion matrix and Table 7.4 shows the performance metrics calculated from this confusion matrix. The overall performance of SVM is obtained by averaging the performance values over the different cross runs and is shown in Fig. 7.3.

Table 7.3: SVM: Confusion matrix for diagnosis of breast cancer using histopathological data. (Columns: Cross run, tp, tn, fp, fn.)

Table 7.4: SVM: Performance measures for diagnosis of breast cancer using histopathological data. (Columns: Cross run, Accuracy, Specificity, Sensitivity, F-score; performance in %.)

Accuracy approximates how effective the algorithm is by showing the probability of the true value of the class label. Table 7.5 shows that the accuracy of RBFNN in classifying benign and malignant masses using cytological data (96.31%) is better than that of SVM (92.11%).

Sensitivity and specificity approximate the probability of the positive and negative label, respectively, being true; each assesses the effectiveness of the algorithm on a single class. Here positive refers to a benign mass and negative refers to a malignant mass. Referring to Table 7.5, it can be observed that the specificity of both RBFNN and SVM is around 96%, indicating that they are equally good at identifying malignant masses correctly. However, RBFNN is far better than SVM at identifying benign masses, with a sensitivity of 96.73% compared with 86.73% for SVM, indicating that SVM fails to identify many of the true positives.

F-score is a composite measure which favors algorithms with higher sensitivity and challenges those with higher specificity. RBFNN, having a higher sensitivity, has a higher F-score than SVM, as seen in Fig. 7.4.
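The way the performance measures follow from a confusion matrix can be sketched as below. The function names `performance_measures` and `average_over_runs` are hypothetical, the counts in the usage note are made up for illustration, and the F-score is assumed to be the usual harmonic mean of precision and sensitivity, since the chapter does not state its formula.

```python
def performance_measures(tp, tn, fp, fn):
    """Return accuracy, sensitivity, specificity and F-score in percent."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # share of actual positives labelled positive
    specificity = tn / (tn + fp)   # share of actual negatives labelled negative
    precision = tp / (tp + fp)
    # Assumed definition: harmonic mean of precision and sensitivity.
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": 100 * accuracy,
        "sensitivity": 100 * sensitivity,
        "specificity": 100 * specificity,
        "f_score": 100 * f_score,
    }


def average_over_runs(runs):
    """Average each measure over a list of per-run result dictionaries."""
    return {name: sum(run[name] for run in runs) / len(runs) for name in runs[0]}
```

For example, `performance_measures(tp=90, tn=40, fp=10, fn=10)` describes a run with 100 actual positives and 50 actual negatives, and `average_over_runs` combines the three cross runs into one overall figure.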
Fig. 7.3: Overall performance of SVM for breast cancer diagnosis using histopathological data.

Fig. 7.4: Graph comparing performance of SVM and RBFNN for diagnosis of breast cancer using histopathological data.

7.4 Dataset used for Prognosis of Breast Cancer

The word prognosis is often used in medical reports describing a physician's view on a case. Prognosis is a medical term denoting the doctor's prediction of how a patient will progress, and whether there is a chance of recovery. In other words, prognosis is the prediction of the long-term behavior of the disease. In this work, prognosis is done using cytological features and two classifiers, a support vector machine and an autoassociative neural network, which classify the disease as either recurrent or nonrecurrent. Three-fold cross validation is used to avoid bias in classification, and the performance metrics (accuracy, sensitivity, specificity and F-score) of SVM and AANN are computed and compared.

The dataset of 198 samples from the Wisconsin prognosis breast cancer database [29] is taken as input. Two thirds of the dataset are used for training the classifier and one third for testing it. The first attribute of each sample, which is an ID number, is discarded. The remaining 34 attributes are considered for training and testing the classifier. The attributes include time, radius, texture, perimeter, area, smoothness, compactness, concavity, symmetry, fractal dimension, etc. These features specify the texture of the cell, which gives the physician the clue for arriving at the prognosis. In this work, an attempt has been made to automate prognosis using these features and the pattern classification models SVM and AANN.

7.5 Techniques for Prognosis of Breast Cancer

SVM and AANN are used to classify the cancer as recurrent or nonrecurrent based on cytological data. SVM has been dealt with in detail in Chapter 4. This section discusses AANN.
Table 7.5: Performance comparison of SVM and RBFNN for diagnosis of breast cancer using histopathological data. (Columns: Measures (%), SVM, RBF; rows: Accuracy, Sensitivity, Specificity, F-score.)

Autoassociative Neural Network

The autoassociative neural network is a special class of feed-forward neural network architecture with some interesting properties that can be exploited for pattern recognition tasks [184]. Separate AANN models are used to capture the distribution of the feature vectors of each class, namely recurrent and nonrecurrent. A five-layer autoassociative neural network model is shown in Fig. 7.5.

Fig. 7.5: An autoassociative neural network.

An autoassociative neural network has the same number of neurons in the input and output layers, and fewer in the hidden layers. The network is trained using the input vector itself as the desired output. This training organizes a compression (encoding) network between the input layer and the hidden layer, and a decoding network between the hidden layer and the output layer, as shown in Fig. 7.5. Each autoassociative network is trained independently for one class using the feature vectors of that class. As a result, the squared error between an input and the output is generally smallest for the network of the class to which the input pattern belongs. This property makes it possible to classify an unknown input pattern: the unknown pattern is fed to all the networks and is assigned to the class whose network gives the minimum squared error.

The processing units of the input layer and the output layer are linear, whereas the units in the hidden layers are nonlinear [185]. During training of the network, the target vectors are the same as the input vectors. To reproduce the input vectors at the output layer, the network projects an M-dimensional vector in the input space $R^M$ onto a vector in the subspace $R^N$, and then maps it back onto the M-dimensional space, where $N < M$. The network thus performs a nonlinear principal component analysis, projecting the input vectors onto the subspace $R^N$. The subspace $R^N$ is the space spanned by the first N principal components derived from the training data. The value of N is determined by the number of units in the dimension compression layer. The mapping of the subspace $R^N$ back to the M-dimensional space $R^M$ determines the way in which the subspace $R^N$ is embedded in the original space $R^M$. It has been shown that an AANN trained with a dataset captures the subspace and the hypersurface along the maximum variance of the data [186], [169]. In other words, an AANN can be used to capture the distribution of a given dataset. In this work, this property of the AANN is used to classify recurrent and nonrecurrent cancer cases.
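The decision rule described above (one model per class, assign the pattern to the class with the smallest squared reconstruction error) can be sketched as follows. Note the substitution: instead of the nonlinear five-layer AANN trained by backpropagation, this sketch uses a linear projection onto the first N principal directions of each class as a stand-in, so it illustrates only the classification rule, not the network used in this work. The class `LinearAutoassociator` and the function `classify_by_min_error` are hypothetical names.

```python
import numpy as np


class LinearAutoassociator:
    """Linear stand-in for an AANN: project onto N principal directions and back."""

    def __init__(self, n_components):
        self.n_components = n_components

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        # Rows of vt are the principal directions of the centered class data.
        _, _, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.basis_ = vt[: self.n_components]
        return self

    def reconstruct(self, X):
        centered = np.asarray(X, dtype=float) - self.mean_
        # Compress to N dimensions, then map back to the original M dimensions.
        return centered @ self.basis_.T @ self.basis_ + self.mean_

    def squared_error(self, X):
        X = np.asarray(X, dtype=float)
        return ((X - self.reconstruct(X)) ** 2).sum(axis=1)


def classify_by_min_error(models, X):
    """models maps a class label to its fitted model; pick the least-error class."""
    labels = list(models)
    errors = np.column_stack([models[label].squared_error(X) for label in labels])
    return [labels[i] for i in errors.argmin(axis=1)]
```

In use, one model would be fitted on the recurrent training vectors and another on the nonrecurrent ones, and every test vector would be passed to `classify_by_min_error` with both models.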
7.5.1 Experimental Results and Discussion

Thirty-three cytological features of breast fine-needle aspirates, reported to differ between recurrent and nonrecurrent cases, from 198 patients are used to train and test the models.

Training and testing the SVM

Three-fold cross validation is used to evaluate the result. The training data include thirty-three feature attributes and a class attribute, while the test data have only the thirty-three feature attributes, excluding the class attribute. The data are normalized between -1 and +1 so that the classifier has a common range to work with. SVM Torch is used for training and testing the model [173]. The polynomial kernel based SVM is trained using two thirds of the data, randomly chosen, and tested with the remaining one third to evaluate the classifier's effectiveness. Table 7.6 shows the confusion matrix and Table 7.7 gives the performance measures for the three different cross runs. The overall performance of SVM is obtained by averaging the performance values over the different cross runs and is shown in Fig. 7.6.

Table 7.6: SVM: Confusion matrix for prognosis of breast cancer using histopathological data. (Columns: Cross run, tp, tn, fp, fn.)
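The three-fold protocol used in this chapter (random division into three sets, training on two and testing on the third, once per cross run) can be sketched as below. The study used a C program for this step; the Python function `three_fold_splits` and its `seed` parameter are hypothetical and shown only to make the procedure concrete.

```python
import random


def three_fold_splits(n_samples, seed=0):
    """Yield (train_indices, test_indices) for each of the three cross runs."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)           # random division of the data
    folds = [indices[i::3] for i in range(3)]      # three nearly equal sets
    for run in range(3):
        test = folds[run]
        train = [i for j, fold in enumerate(folds) if j != run for i in fold]
        yield train, test
```

With 198 samples each run trains on 132 samples and tests on 66; with the 699 diagnosis samples the same split gives 466 training and 233 test samples per run.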
Table 7.7: SVM: Performance measures for prognosis of breast cancer using histopathological data. (Columns: Cross run, Accuracy, Specificity, Sensitivity, F-score; performance in %.)

Fig. 7.6: Overall performance of SVM for prognosis of breast cancer using histopathological data.

Table 7.8: AANN: Confusion matrix for prognosis of breast cancer using histopathological data. (Columns: Cross run, Epochs, tp, tn, fp, fn.)

Training and Testing AANN

The structure of the AANN model used is 12L 38N 4N 38N 12L, where L denotes linear units and N denotes nonlinear units. The activation function of the nonlinear units is the hyperbolic tangent function. The network is trained using the error backpropagation learning algorithm for 100, 500 and 1000 epochs. Table 7.8 gives the results in the form of a confusion matrix, and the performance metrics calculated for the three different cross runs are shown in Table 7.9. The AANN gives its best performance at 500 epochs, so the final overall performance is calculated by averaging the three cross runs at 500 epochs and is shown in Fig. 7.7. The comparison of SVM and AANN, obtained by averaging the performance values over the different cross runs, is shown in Fig. 7.8. It can be seen that the accuracy of AANN (86.66%) is far better than that of SVM (71.81%). It is also seen from Fig. 7.8 that the specificity of SVM is very poor compared to that of AANN. This implies that SVM does not perform well in identifying nonrecurring cases correctly, resulting in more false positives.
Table 7.9: AANN: Performance measures for prognosis of breast cancer using histopathological data. (Columns: Cross run, Epochs, Accuracy, Specificity, Sensitivity, F-score; performance in %.)

Summary

In this chapter, the use of support vector machines and radial basis function neural networks in actual clinical diagnosis was examined. Known sets of cytologically proven tumor data obtained from the Wisconsin breast cancer database were used to train the models to categorize cancer patients according to their diagnosis. Experimental results show that RBFNN gives better performance than SVM for breast cancer classification. Methods were also proposed to arrive at a prognosis using AANN and SVM. Cytological features from the Wisconsin breast cancer database were used for this purpose. The experimental results reveal that AANN is better than SVM for prognosis of breast cancer. This work indicates that RBFNN and AANN can be used effectively for breast cancer diagnosis and prognosis to help oncologists.
Fig. 7.7: Overall performance of AANN for prognosis of breast cancer using histopathological data.

Fig. 7.8: Performance comparison of SVM and AANN for prognosis of breast cancer using histopathological data.
More informationBEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES
BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents
More informationPredict the Popularity of YouTube Videos Using Early View Data