Visualization of Breast Cancer Data by SOM Component Planes



Similar documents
USING SELF-ORGANIZING MAPS FOR INFORMATION VISUALIZATION AND KNOWLEDGE DISCOVERY IN COMPLEX GEOSPATIAL DATASETS

VISUALIZATION OF GEOSPATIAL DATA BY COMPONENT PLANES AND U-MATRIX

Self Organizing Maps: Fundamentals

Visual analysis of self-organizing maps

Monitoring of Complex Industrial Processes based on Self-Organizing Maps and Watershed Transformations

Self-Organizing g Maps (SOM) COMP61021 Modelling and Visualization of High Dimensional Data

On the use of Three-dimensional Self-Organizing Maps for Visualizing Clusters in Geo-referenced Data

USING SELF-ORGANISING MAPS FOR ANOMALOUS BEHAVIOUR DETECTION IN A COMPUTER FORENSIC INVESTIGATION

Visualization of large data sets using MDS combined with LVQ.

Using Smoothed Data Histograms for Cluster Visualization in Self-Organizing Maps

A Discussion on Visual Interactive Data Exploration using Self-Organizing Maps

ENERGY RESEARCH at the University of Oulu. 1 Introduction Air pollutants such as SO 2

UNIVERSITY OF BOLTON SCHOOL OF ENGINEERING MS SYSTEMS ENGINEERING AND ENGINEERING MANAGEMENT SEMESTER 1 EXAMINATION 2015/2016 INTELLIGENT SYSTEMS

Data topology visualization for the Self-Organizing Map

Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets

ViSOM A Novel Method for Multivariate Data Projection and Structure Visualization

Visualising Class Distribution on Self-Organising Maps

Reconstructing Self Organizing Maps as Spider Graphs for better visual interpretation of large unstructured datasets

Customer Data Mining and Visualization by Generative Topographic Mapping Methods

6.2.8 Neural networks for data mining

Mobile Phone APP Software Browsing Behavior using Clustering Analysis

ANALYSIS OF MOBILE RADIO ACCESS NETWORK USING THE SELF-ORGANIZING MAP

Data Mining and Neural Networks in Stata

Analysis of Performance Metrics from a Database Management System Using Kohonen s Self Organizing Maps

EVALUATION OF NEURAL NETWORK BASED CLASSIFICATION SYSTEMS FOR CLINICAL CANCER DATA CLASSIFICATION

Comparing large datasets structures through unsupervised learning

Cartogram representation of the batch-som magnification factor

INTERACTIVE DATA EXPLORATION USING MDS MAPPING

Icon and Geometric Data Visualization with a Self-Organizing Map Grid

Data Clustering and Topology Preservation Using 3D Visualization of Self Organizing Maps

A Computational Framework for Exploratory Data Analysis

Data Mining using Rule Extraction from Kohonen Self-Organising Maps

Visualization of Topology Representing Networks

Data Mining on Sequences with recursive Self-Organizing Maps

An Analysis on Density Based Clustering of Multi Dimensional Spatial Data

Classification of Engineering Consultancy Firms Using Self-Organizing Maps: A Scientific Approach

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

Neural networks and their rules for classification in marine geology

Information Visualization with Self-Organizing Maps

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

Visualization of Crime Trajectories with Self-Organizing Maps: A Case Study on Evaluating the Impact of Hurricanes on Spatio- Temporal Crime Hotspots

Comparison of Supervised and Unsupervised Learning Classifiers for Travel Recommendations

EFFICIENT DATA PRE-PROCESSING FOR DATA MINING

Models of Cortical Maps II

Advanced Web Usage Mining Algorithm using Neural Network and Principal Component Analysis

How To Use Neural Networks In Data Mining

Online data visualization using the neural gas network

Quality Assessment in Spatial Clustering of Data Mining

Learning Vector Quantization: generalization ability and dynamics of competing prototypes

LVQ Plug-In Algorithm for SQL Server

A Learning Based Method for Super-Resolution of Low Resolution Images

A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS

Fuel Cell Health Monitoring Using Self Organizing Maps

Using Data Mining for Mobile Communication Clustering and Characterization

Expanding Self-organizing Map for Data. Visualization and Cluster Analysis

Stabilization by Conceptual Duplication in Adaptive Resonance Theory

Data Mining Techniques Chapter 7: Artificial Neural Networks

A Study of Web Log Analysis Using Clustering Techniques

Visualizing class probability estimators

The Research of Data Mining Based on Neural Networks

A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization

COMPARISON OF OBJECT BASED AND PIXEL BASED CLASSIFICATION OF HIGH RESOLUTION SATELLITE IMAGES USING ARTIFICIAL NEURAL NETWORKS

Proceedings - AutoCarto Columbus, Ohio, USA - September 16-18, 2012

Knowledge Discovery in Stock Market Data

MANAGING QUEUE STABILITY USING ART2 IN ACTIVE QUEUE MANAGEMENT FOR CONGESTION CONTROL

Visualization methods for patent data

IMPROVED METHODS FOR CLUSTER IDENTIFICATION AND VISUALIZATION IN HIGH-DIMENSIONAL DATA USING SELF-ORGANIZING MAPS. A Thesis Presented.

Introduction to Pattern Recognition

Self Organizing Maps for Visualization of Categories

Machine Learning Introduction

ultra fast SOM using CUDA

Supervised and unsupervised learning - 1

CS 2750 Machine Learning. Lecture 1. Machine Learning. CS 2750 Machine Learning.

Detecting Denial of Service Attacks Using Emergent Self-Organizing Maps

Web Traffic Mining Using a Concurrent Neuro-Fuzzy Approach

Credit Card Fraud Detection Using Self Organised Map

Application of self-organizing maps to clustering of high-frequency financial data

Visualization and Data Mining of Pareto Solutions Using Self-Organizing Map

Comparative Analysis of EM Clustering Algorithm and Density Based Clustering Algorithm Using WEKA tool.

Multiscale Object-Based Classification of Satellite Images Merging Multispectral Information with Panchromatic Textural Features

Segmentation of stock trading customers according to potential value

Nonlinear Discriminative Data Visualization

Lluis Belanche + Alfredo Vellido. Intelligent Data Analysis and Data Mining

A General Tracking and Auditing Architecture for the OpenACS framework

CITY UNIVERSITY OF HONG KONG 香 港 城 市 大 學. Self-Organizing Map: Visualization and Data Handling 自 組 織 神 經 網 絡 : 可 視 化 和 數 據 處 理

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

Social Media Mining. Data Mining Essentials

A Review of Data Clustering Approaches

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

Visual decisions in the analysis of customers online shopping behavior

Comparison of K-means and Backpropagation Data Mining Algorithms

2002 IEEE. Reprinted with permission.

USING SELF-ORGANIZED MAPS AND ANALYTIC HIERARCHY PROCESS FOR EVALUATING CUSTOMER PREFERENCES IN NETBOOK DESIGNS

Data Quality Mining: Employing Classifiers for Assuring consistent Datasets

Big Data: Rethinking Text Visualization

What is Data Mining? Data Mining (Knowledge discovery in database) Data mining: Basic steps. Mining tasks. Classification: YES, NO

: Introduction to Machine Learning Dr. Rita Osadchy

Machine Learning CS Lecture 01. Razvan C. Bunescu School of Electrical Engineering and Computer Science

INTRUSION DETECTION SYSTEM USING SELF ORGANIZING MAP

Transcription:

International Journal of Science and Technology Volume 3 No. 2, February, 2014 Visualization of Breast Cancer Data by SOM Component Planes P.Venkatesan. 1, M.Mullai 2 1 Department of Statistics,NIRT(Indian Council for Medical Research),Chennai 2 Department of Mathematics, Ethiraj College For Women,Chennai ABSTRACT The component plane presentation of integrated self-organized maps(som) is a powerful artificial intelligence tool for analysis of large, complex, biological databases. This approach allows the display of multi-dimensional SOM outputs disease data bases in multiple sample specific presentation providing distinct advantages in visual inspection of biological significance of features clustered in each unit. This paper attempts to analyse the behavior of the various attributes constituting breast cancer and the correlation between them through their component planes. This additional information obtained will, definitely, complement the results obtained through other techniques. Keywords: Medical diagnosis, Artificial intelligence (AI),Artificial Neural Unit(BMU),Component Plane(CP). network(ann), Self Organizing Map(SOM), Best Matching I. INTRODUCTION Artificial Neural Network (ANN) is a powerful AI technique playing a vital role in the medical field. It has emerged as one of the indispensible tools in clinical diagnosis for its ability to explore medical data.ann is capable of handling data exploration efficiently and effectively and hence enhances the decision making in the medical field when used for classification, clustering, pattern recognition and prediction[akay et al 1994,Bishop 1995,Pattichis et al 1995]. Supervised and unsupervised are two of the modes in which ANN performs. supervised learning necessitates the presence of a desired output result for each input while training the network where as desired output is not available for unsupervised learning. Self organizing map(som), introduced by Teuvo Kohonen(2001), is one of the best known unsupervised, competitive, natural learning algorithm[kohonen T,1995]. Data mining is non-trivial extraction of implicit, unknown and potentially useful information from data ( Frawley et al, 1992). Its aim is to search through the data for its hidden features and other relations. Scatter plot is one of the traditional techniques to identify the dependencies or the relations between the variables. But this technique does not seem to be practical where variables are more in number. Visualization technique based on component planes (Miguel A et al,2007 ) helps to find the correlating attributes. II. SELF ORGANISING MAP A self-organizing map consists of components called nodes or neurons [Kohonen T, 1982]. Weight vector of the same dimension as the input data vectors is associated with each node in the map space. The usual arrangement is a regular spaced nodes in a hexagonal or rectangular grid [Ultsch&Siemon, 1989]. SOM describes a mapping from a higher dimensional input space to a lower dimensional map space, usually two dimensional. SOM combines the features of vector quantization and vector projection. During the learning process, weight vector for each node is found. In each iteration, the most similar node called the best matching unit (BMU) is found by a similarity measure. Learning takes place with an objective of grouping similar nodes of the BMU and updates their corresponding weight. SOM is trained using unsupervised learning to produce a lowdimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map[lipp,1987].self organizing maps are different from other artificial neural networks in the sense that they use neighbourhood function to preserve the topological properties of the input space.som is organized so that similar data are mapped on to the same node or to the neighbouring nodes and there by similar input patterns are spatially clustered and get organized themselves. IJST 2014 IJST Publications UK. All rights reserved. 127

SOM Process Competitive process identifies the BMU, Co-operative process locates the neurons closer to BMU using a neighbourhood function, most preferably Gaussian hj,i(x)(n) = exp [- d 2 j,i / 2 σ(n)] having the property of being symmetric about the maximum point, monotonically decreasing with time (a unique feature of SOM). Here, d ij is the lateral distance between the BMU and excited neuron. σ explains the width of the neighbourhood, which shrinks with time. Adaption process enables the excited neurons to increment weight vector by suitable adjustment wj(n+1) = wj(n) + n hji (x) (x wj(n)) Self organizing of the neurons and fine tuning of feature map take place during the adaptation process. Topological ordering, density matching and feature selection are the distinct properties of SOM making it the most effective tool. Component Planes Visualization of data by component planes is one among the various visualization techniques using SOM grid(marcos et al 2004,Miguel A et al,2007).they show the emerging patterns of data distribution on SOM grid(kohonen,2001), and detect dependencies among variables and contribution of each one to the SOM grid. This depends on code book vectors and hence a more comprehensive method. Each component plane exhibits the behavior of an attribute constituting the data using colour coding. With the help of component planes, data set can be explored for interpretation of patterns and association of the attributes and hence gain knowledge about the data set.kaski et al (1996) applied both U-matrix and CP to classify countries by their socioeconomic global indexes. Winter et al(1994) used CP to analyze the racial segregation issue of South Africa. Data base Breast cancer data obtained from UCI -Machine Learning Repository is used for the purpose. Data consists of details of 699 individuals tested for breast cancer based on 32 real valued features contributing the disease attributes and nine important features, viz, radius, texture, perimeter, area, smoothness, compactness, concavity, concave points and symmetry. The component planes for the attributes are obtained for both actual and normalized data with Gaussian neighborhood and bubble kernel. SOM_PAK (version 3.1) has been used in matlab (version 7.9.0.529-R2009b) for the purpose (Vesanto.J etal,1999). Correlation co-efficient (given within brackets) are obtained with the help of viscovery SOMine 5 software. Map is trained using batch algorithm. Figure-1: Component Planes(Actual Data) with Gaussian neighbourhood IJST 2014 IJST Publications UK. All rights reserved. 128

Figure-2.Component Planes (Normalized Data) with Gaussian neighbourhood U-matrix 1.79 Variable1 0.999 Variable2 Variable3 0.982 0.996 1.11 0.502 0.492 0.501 Variable4 0.429 0.998 Variable5 d 0.00489 0.999 Variable6 d 0.0015 0.999 Variable7 d 0.0061 0.991 0.515 0.503 0.503 0.498 d 0.0314 Variable8 Variable9 0.987 d 0.00688 0.991 d 0.0060 d 0.006 0.496 0.497 d 0.00388 d 0.0033 Figure-3 Component Planes(Normalized Data) with Bubble neighbourhood Final quantization error : 2.489 ; Final topographic error : 0.033 IJST 2014 IJST Publications UK. All rights reserved. 129

Observation Figure 1. shows a U-Matrix and nine CPs for each of nine attributes where high values are primarily located below the first third of the map. U-matrix has three short clusters in the lower part. This suggests that these areas could be associated with radius, texture, perimeter and compactness. Moreover, while comparing the U-matrix and CP of radius, both of them have void space in similar position which might suggest that radius seem to dominate the decision. At this point, one can make two inferences that data set does not have a similarity or the U-matrix could not separate the dataset in a proper manner. For the CPs, high value is associated with red colour and low values with blue colour. Some visible patterns can be observed from the component planes. CPs for texture and perimeter have a very similar colour pattern indicating a strong correlation between them.(0.6568, the highest correlation among the attributes).both of them have deep dark left bottom corner an identical nature of their CPs. It is also observed that the attribute symmetry contributes very less to the prediction and is negatively correlated (-0.3469) to compactness as their CPS do not have similar colour coding of the regions. From figure-2 it is inferred that CPs pertaining to the actual data and normalized data look identical with the Gaussian neighbourhood. CPs pertaining to bubble neighbourhood (fig.3) looks totally different and does not help us to draw inferences as the case in Gaussian neighbourhood.performance of Gaussian function can be attributed to its nature such as uni-modal, monotonic decreasing with increasing neighbourhood (necessary condition for convergence), translation invariance (independent of location of winning neuron), attainment of maximum value at the winning neuron and dependence on neighbourhood (the necessary condition for co-operation among neighbouring neurons) while bubble neighbourhood uses step function. The quantization error assesses the vector quantization properties. Quantization being 2.489 implies the fact that more neurons represent, stability, better input. The simplest measure of topology preservation, the topographic error, being 0.33 indicates the better quality of the map. III. CONCLUSION Comparison of U-matrix with the component planes show the contribution of the attributes to the data set and comparison between the component planes show their correlations visually, which is an interesting and important aspect. Moreover, the attributes whose contribution is negligible (revealed by their respective component planes) can be ignored and attributes contributing almost equivalently can help to remove the redundancy in data, there by number of attributes can, further, be reduced enabling the reduction of data size, a welcoming aspect indeed. REFERENCES [1]. Bishop,C.M(1995).Neural Networks for pattern recognition, Oxford University Press, NEW YORK [2]. Brause, R.W.,(2001). Medical analysis and diagnosis by neural networks. Proceedings of Second International Symposium on Medical data Analysis. Oct 08-09, Springer-Verlag, LONDON, UK.,pp: 1-13 [3]. Frawley.W.J,Piatetsky-Shapiro& Matheus(1992). Knowledge discovery in databases an overview AI Magazine 13: 57-70 [4]. Kasi.S,Kohonen.T(1996) Exploratory Data Analysis by the self Organizing Map: Proceedings of the Third International conference on Neural Networks in the capital Markets 498-507 [5]. Kohonen, T. (1982) Self-organizing formation of topologically correct feature maps, Biological Cyberbetics. Volume 43, 1982, pp:59-96 [6]. Kohonen, T. (1995) Self-Organizing Maps, Springer Series in Information Sciences, Vol. 30, Springer, Berlin, Heidelberg, NEW YORK, 1995 [7]. Kohonen, T. (1997) Self-Organizing Maps,2 nd edn.,springer Verlag [8]. Kohonen, T. (2001) Self-Organizing Maps, Springer Series in Information Sciences, Vol. 30, Springer, Berlin, Heidelberg, NEW YORK, 2001 [9]. Marcos A,Antonio M,Jose SM(2004) Visulization of geospatial data by component planes and U-Matrix Proceedings of geoinormation.75-89 [10]. Miguel A,Andres P(2007) Improving the correlation hunting in a large quantity of SOM Component planes ICANN,Part II,LNCS 4669.379-388 [11]. Pattichi C.S(1995) IEEE Tranc Biomed Eng 42: 486-496 [12]. Ultsch A &Siemon H. P(1989) Exploratory Data Analysis: Using Kohonen Networks on Transputers, Research Report No. 329, University of Dortmund, 1989 [13]. Vesanto J(1999) SOM based visualization methods. Intelligent Data Analysis,3:111-126, [14]. Vesanto, J., Ahola, J (1999).: Hunting for Correlations in Data Using the Self-Organizing Map. In: Proceeding of the International ICSC Congress on IJST 2014 IJST Publications UK. All rights reserved. 130

Computational Intelligence Methods and Applications, pp. 279 285 [15]. Vesanto.J,Himberg.J,Alhoniemi.E,Parhankangas,J (2000) SOM Toolbox for Matlab 5, Report A57,April 2000,ISBN 951-22-4951-0 [16]. Winter,K,Hewitson.B (1994). Self organizing mapsapplications to cencus data.in Hewitson.B,Crane.R,editors,Neural nets:applications in geography.kluwer. IJST 2014 IJST Publications UK. All rights reserved. 131