# Layout Based Visualization Techniques for Multi Dimensional Data

Save this PDF as:

Size: px
Start display at page:

Download "Layout Based Visualization Techniques for Multi Dimensional Data"

## Transcription

1 Layout Based Visualization Techniques for Multi Dimensional Data Wim de Leeuw Robert van Liere Center for Mathematics and Computer Science, CWI Amsterdam, the Netherlands October 27, 2000 Abstract In this paper we present an overview for the interactive visualization of structural information in tabular multidimensional data. We first provide an overview of methods for layout of high dimensional data. Then we discuss a number of interactive visualization methods that can be used to present the structure of these high dimensional spaces. We discuss the use of the described methods in three applications. Keywords: Information Visualization, Multi Dimensional Scaling, Kohonen Maps, 1 Introduction A large class of data can be characterized by tables. This table is a matrix of attribute variables in one dimension and the outcome of specific cases in the other. Discovery and understanding of the structure in this type of data is a crucial part of science and business. Structure can take on many forms: clusters, regular patterns, outliers etc. Two approaches in the analysis of data can be destinguished. In the first approach the data in the elements are aggregated into some new information in the other approach the data elements are laid out on a two or threedimensional space in some way. Of the first type are methods such as principal component analysis, k-means and hierarchical clustering algorithms. The second method will be described in more detail later as it is the focus of work in this paper. Both methods can serve as a basis for visualization. Interactive visualization can be an aid in the process of discovery and understanding. The process of transforming the data tables into an image can be considered as a pipe line consisting of a number of steps; the so called visualization pipeline. By manipulation of the steps the user can interact with the system. For data tables the following steps can be discerned in the pipeline (see Figure 1): matrix generation, layout, and mapping. Manipulation is an important ingredient of visualization which can affect each of these steps. 1

2 pre processing layout mapping image manipulation Figure 1: The steps for visualization table data A distance matrix or adjacency matrix is generated by defining a metric by which the similarity or dissimilarity between cases in the table can be determined. Depending on the data type in the table, numeric, boolean or textual, many different metrics exist to calculate this difference. Using this distance matrix a layout of the samples is determined. This layout can be considered as a mapping from a high dimensional space to a lower dimensional space. Depending on the application the layout space can have one, two or three dimensions. This layout forms the basis for the visualization of the data set. A distinction can be made between discrete mapping and continuous mapping. In case of discrete mapping each sample is represented by a separate icon. Visualization of attributes of the samples can be realized via color or shape mappings. In continuous mapping the underlying data samples are not explicitly represented but some form of aggregation is applied such that the overall properties of the set are reflected in the visualization. Manipulation of the visualization allows the user to investigate different aspects of the data. In all but the smallest data sets it is impossible to present all information contained in the data automatically in a single image. Therefore the user should be able to manipulate the parameters in the visualization pipeline in a meaningful and understandable way to expose certain aspects of the information in the data. Others have done work on frameworks for information visualization [1]. The paper is structured as follows... 2 Methods As described the visualization of sets of objects consists of three steps: layout, visualization and manipulation. This section presents an overview of methods which can be used to for each of these steps. The input of the pipeline is a collection data samples and some way to calculate distances between samples. Depending on the nature of the data (numerical, textual or boolean) many different distance metrics can be used. 2.1 Layout In the layout stage a position is determined for each object in the set. These positions are chosen such that the structural properties of the set are conserved. This step can be regarded as a projection from the multi dimensional data space to a two dimensional or three dimensional Euclidean space. We assume a two dimensional projection space here, however all presented methods are in theory equally capable of producing three 2

3 dimensional layouts. The methods for layout can be classified based on the representation of the mapping function, for some methods the mapping function used for a particular set of data is not explicitly formulated. This means that if a new data element is added the method does not have the ability to position it in the existing layout. Other criteria to look at the method is dependence of ordering of the input or initial status of the algorithms. Some algorithms deliver the same result independent of ordering and initial position others can be influenced by the order. In Multi Dimensional Scaling (MDS) [2], also known as Sammon s Mapping the idea is to approximate the original distance relations between samples in the layout as good as possible. Formally, if is the distance between sample and and is the position of the minimum of the equation is calculated. Usually some iterative process is used for the minimization. By Self Organizing Maps (SOM) or Kohonen Maps[3] the layout of data samples is generated by a trained neural network. The the neural network consists of a two dimensional arrangement of nodes (neurons). This network is trained using a collection of training samples. During training of the network the reference vector associated which each node are iteratively improved such that a good distribution of the data samples over space is achieved. The trained network represents a non-linear mapping from n-dimensional space to the two dimensional node array. After training the response to an input samples produces localized signal in the network. Generative Topographic Mapping (GTM) [4] is a technique in which an explicit topographic mapping function between the input data and the mapping space is found. The idea is to use a function, which maps a distribution in mapping space into the original data space. This is combined with a Gaussian noise model such that a sample mapped through the has a random position with a Gaussian distribution. A so called EM algorithm is used to find a combination of the 2D distribution function and mapping function which gives the optimal representation of the original data. The methods described can be divided into two categories. The first category contains methods in which the position in the mapping space are explicitly taken into account in the layout process for example MDS. In the second category this is not the case for example Self Organizing maps. 2.2 Visualization This section deals with the presentation of the generated layout to the user. How the information is presented to the user depends on: The structure of the underlying data. The size of the set The question which must be answered.... (1) 3

4 A global distinction can be made in discrete and continuous presentation. In the first case the individual samples of the data are presented as separate items. In the second case an aggregate of the data is presented in which the individual data samples might not be visible but the structure of the data set remains intact. If the data set is small icons can be used to show the individual samples. Attribute data can be encoded in the color and shape of the icon. If the number of samples grows this method becomes less suitable due to cluttering. A continuous representation is useful for the presentation of global structure in large data sets. 2.3 Manipulation 3 Applications 3.1 City distances The idea is to construct a map based on a distance table. The distance table used in this case was a table found on an old road map of the Netherlands and presents the road distances between 39 towns in the Netherlands. Based on this data a map was constructed using the mass spring system the towns were modeled as nodes and the distance matrix was filled with the found distances. The road distances are larger or equal to the the straight distances. The mass spring system was used to find the best fit of the towns on a plane. (See figure 2 left) This actually corresponds pretty well with the actual map of the Netherlands (See figure 2 right). The biggest abberation are the towns in the south west. An explanation can be found in the fact that a large detour is needed to reach these places from almost all other places due to the water around these places. The visualization can also be considered as a distance map of the Netherlands it shows which places, although geometrically are close are far away from a driving perspective. 3.2 Image feature spaces Image classification is a field in which the goal is to to allow images to be retrieved from data repositories subject to a user defined query. To be able to process these queries the images are classified based on a collection of attributes, such as color, texture and object shape. The usefulness of the attributes for the query system is an important question for the developers of image retrieval systems. Our goal was the development of a system in which feature developers can experiment with features on wide varieties of image sets. The input of our system is a single attribute vector consisting of the individual attributes. Because the individual features usually are vectors a user controllable scaling is applied to the weight of the attributes in the attribute vector. The distance matrix used for the layout of the images was defined by the Euclidean distance between the feature vectors. This way of presentation gives a global overview of the structure of the feature space as well as similarity relations among images. In addition, the method 4

5 Figure 2: City distance visualization. Left: the found layout of towns based on the road distances. Right: the solution overlayed on a map of the Netherlands allows a user to interactively scale each dimension of the feature space. In this way the user can explore the relation between features. We applied our methods to a synthetic test set of 3276 images. The test set consisted of 36 groups of images with distinct hue values. Each group had 91 textures of varying frequency and orientation. For each image, 6 feature vectors were computed: a 1 four-dimensional Gabor feature vector for texture analysis and 5 distinct color-based features vectors. The color-based features vectors including a hue histogram, a hue histogram of the center region of the image, and 3 hue transition histograms. For transition histograms, the hue is first dithered to 16 bins; then the histogram of the 256 resulting combinations is recorded. As a pre-processing step, the images were segmented into 32, 128, and 256 tiles, and each tile was replaced by its dominant hue. The dimensionality of the feature space spanned by the 6 features vectors is 804. Figure 3 shows a snapshot of the user interface. The left panel shows the graph view: an arrangement of the graph in the visualization space. Small dots are used to represent vertices. Grey lines represent edges between points with distances below the threshold distance. Edges also provide additional feedback on the state and progression of the layout algorithm. For example, very long edges will indicate that the layout algorithm has not reached an equilibrium. Some selected vertices are annotated with a thumbnail image. The right panel shows the splat field, which is color encoded from white (low density values) to black (high density values). In Figure 3, the mass spring algorithm has reached an equilibrium. Users can drag vertices to other positions, after which the mass spring algorithm will compute a new equilibrium. Animation is used to display each step of the mass spring system evolving to an equilibrium. In this way, the user can study how an arrangement evolves towards another. The graph provides a 2D view in which the images are displayed according to their mutual dissimilarities and similar images are clustered. A problem with the graph view 5

6 Figure 3: Two views of a graph arrangement for the test set. The left panel shows graph view with vertices and edges. The right panel shows the splat field. Some vertices are annotated with a thumbnail image. is the potential cluttering, making it difficult to estimate density of vertices in dense regions. The splat field provides a 2D view of a continuous density field. Colors are used to show which areas have a high density of vertices. In this way, the user can see in a glance which images are similar. 3.3 Citations The goal of this application is the analysis of the citation index of all IEEE Vis 9X papers. We show that clustering of citations leads to specific topics in visualization. We have applied our method to the analysis of the IEEE Vis 9X citation index. The input data set are BibTeX entries of all papers in the proceedings of the IEEE Vis 9X conferences and all references to papers in this set from other papers in the set. The data set consists 599 BibTeX entries and 881 references. The graph represents papers as a vertices and references as edges. The goals of the visualization was item to test the hypothesis that topics in visualization could be identified by only analyzing the density of the references. The motivation of this hypothesis is that papers in one topic often refer to other papers in the same topic. The distance matrix used for the layout was the reference matrix. This matrix which has the dimension of the total number of papers in both directions and each element contains true if a paper references the other. Figure 4 shows the output of the spring mass algorithm. As can be seen, aside from the papers which are not referenced and do not reference papers, there is a single main graph which can not be partitioned without breaking edges. The number of elements in the graph is too large and cluttered to get insight into the structure of the graph. Figure 4 shows a slightly zoomed in 2D rendition of the same splat field. One can clearly see various clusters of papers. For example, the papers in the large dark region in the middle of the image deal with flow visualization. The (smaller) region 6

7 Figure 4: Left: All papers published in IEEE Vis 9X conferences, represented is discs and references between the papers represented as lines. Right: Interacting with the citation index. The influence of a group of papers is drawn with yellow (incoming) and blue (outgoing) references. Papers in the region selected by the highlighted contour on the right are shown as discs. below are papers describing visualization systems. The region on the right are volume visualization papers. Discrete discs (red dots) and lines (yellow lines for incoming references and blue lines for outgoing references) are used for representing vertices and edges are also drawn as annotation. The region at the top left contain information visualization papers. The distance to the other peaks in the field clearly illustrates the distinction between information visualization and other data visualization topics. Figure 4 also illustrates some interaction techniques. The contour line around the right region is used as a selection criterion. In this way all papers in an area, in this case volume visualization, can be selected. Also, the influence of a paper is shown by drawing the edges representing references to the selected papers. The user can also pick individual papers and show all information related to that individual paper. 4 Conclusion References [1] M. Kreuseler N. López H. Schumann. A scalable framework for information visualization. In Proceedings IEEE Symposium on Information Visualization 2000, pages 27 36, [2] S.S. Schiffman M.L. Reynolds F.W. Young. Introduction to Multidimensional Scaling, Theory, Methods, and Applications. Academic Press,

8 [3] T. Kohonen. Self-Organizing Maps. Springer-Verlag Berlin Heidelberg New York, [4] C.M. Bishop M Svensén C.K.I. Williams. GTM: A principled alternative to the self-organizing map. Advances in Neural Information Processing Systems, 9: ,

### GraphSplatting: visualizing graphs as continuous fields

GraphSplatting: visualizing graphs as continuous fields Robert van Liere Wim de Leeuw Center for Mathematics and Computer Science Amsterdam, the Netherlands Abstract This paper introduces GraphSplatting,

### Clustering & Visualization

Chapter 5 Clustering & Visualization Clustering in high-dimensional databases is an important problem and there are a number of different clustering paradigms which are applicable to high-dimensional data.

### Self Organizing Maps: Fundamentals

Self Organizing Maps: Fundamentals Introduction to Neural Networks : Lecture 16 John A. Bullinaria, 2004 1. What is a Self Organizing Map? 2. Topographic Maps 3. Setting up a Self Organizing Map 4. Kohonen

### USING SELF-ORGANIZING MAPS FOR INFORMATION VISUALIZATION AND KNOWLEDGE DISCOVERY IN COMPLEX GEOSPATIAL DATASETS

USING SELF-ORGANIZING MAPS FOR INFORMATION VISUALIZATION AND KNOWLEDGE DISCOVERY IN COMPLEX GEOSPATIAL DATASETS Koua, E.L. International Institute for Geo-Information Science and Earth Observation (ITC).

### 9. Text & Documents. Visualizing and Searching Documents. Dr. Thorsten Büring, 20. Dezember 2007, Vorlesung Wintersemester 2007/08

9. Text & Documents Visualizing and Searching Documents Dr. Thorsten Büring, 20. Dezember 2007, Vorlesung Wintersemester 2007/08 Slide 1 / 37 Outline Characteristics of text data Detecting patterns SeeSoft

### Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic

### Visualization of Breast Cancer Data by SOM Component Planes

International Journal of Science and Technology Volume 3 No. 2, February, 2014 Visualization of Breast Cancer Data by SOM Component Planes P.Venkatesan. 1, M.Mullai 2 1 Department of Statistics,NIRT(Indian

### Triangular Visualization

Triangular Visualization Tomasz Maszczyk and W lodzis law Duch Department of Informatics, Nicolaus Copernicus University, Toruń, Poland tmaszczyk@is.umk.pl;google:w.duch http://www.is.umk.pl Abstract.

### Introduction to machine learning and pattern recognition Lecture 1 Coryn Bailer-Jones

Introduction to machine learning and pattern recognition Lecture 1 Coryn Bailer-Jones http://www.mpia.de/homes/calj/mlpr_mpia2008.html 1 1 What is machine learning? Data description and interpretation

### . Learn the number of classes and the structure of each class using similarity between unlabeled training patterns

Outline Part 1: of data clustering Non-Supervised Learning and Clustering : Problem formulation cluster analysis : Taxonomies of Clustering Techniques : Data types and Proximity Measures : Difficulties

### Lecture 20: Clustering

Lecture 20: Clustering Wrap-up of neural nets (from last lecture Introduction to unsupervised learning K-means clustering COMP-424, Lecture 20 - April 3, 2013 1 Unsupervised learning In supervised learning,

### Chapter ML:XI (continued)

Chapter ML:XI (continued) XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained

### Classication with Kohonen Self-Organizing Maps

Classication with Kohonen Self-Organizing Maps Mia Louise Westerlund Soft Computing, Haskoli Islands, April 24, 2005 1 Introduction 1.1 Background A neural network is dened in [1] as a set of nodes connected

### SELF-ORGANISING MAPPING NETWORKS (SOM) WITH SAS E-MINER

SELF-ORGANISING MAPPING NETWORKS (SOM) WITH SAS E-MINER C.Sarada, K.Alivelu and Lakshmi Prayaga Directorate of Oilseeds Research, Rajendranagar, Hyderabad saradac@yahoo.com Self Organising mapping networks

### INTERACTIVE DATA EXPLORATION USING MDS MAPPING

INTERACTIVE DATA EXPLORATION USING MDS MAPPING Antoine Naud and Włodzisław Duch 1 Department of Computer Methods Nicolaus Copernicus University ul. Grudziadzka 5, 87-100 Toruń, Poland Abstract: Interactive

### CHAPTER 3 DATA MINING AND CLUSTERING

CHAPTER 3 DATA MINING AND CLUSTERING 3.1 Introduction Nowadays, large quantities of data are being accumulated. The amount of data collected is said to be almost doubled every 9 months. Seeking knowledge

### DATA VISUALIZATION GABRIEL PARODI STUDY MATERIAL: PRINCIPLES OF GEOGRAPHIC INFORMATION SYSTEMS AN INTRODUCTORY TEXTBOOK CHAPTER 7

DATA VISUALIZATION GABRIEL PARODI STUDY MATERIAL: PRINCIPLES OF GEOGRAPHIC INFORMATION SYSTEMS AN INTRODUCTORY TEXTBOOK CHAPTER 7 Contents GIS and maps The visualization process Visualization and strategies

### The Value of Visualization 2

The Value of Visualization 2 G Janacek -0.69 1.11-3.1 4.0 GJJ () Visualization 1 / 21 Parallel coordinates Parallel coordinates is a common way of visualising high-dimensional geometry and analysing multivariate

### A Survey of Kernel Clustering Methods

A Survey of Kernel Clustering Methods Maurizio Filippone, Francesco Camastra, Francesco Masulli and Stefano Rovetta Presented by: Kedar Grama Outline Unsupervised Learning and Clustering Types of clustering

### Topic Maps Visualization

Topic Maps Visualization Bénédicte Le Grand, Laboratoire d'informatique de Paris 6 Introduction Topic maps provide a bridge between the domains of knowledge representation and information management. Topics

### A Computational Framework for Exploratory Data Analysis

A Computational Framework for Exploratory Data Analysis Axel Wismüller Depts. of Radiology and Biomedical Engineering, University of Rochester, New York 601 Elmwood Avenue, Rochester, NY 14642-8648, U.S.A.

### High dimensional Data visualization and clustering using Self Organizing Maps

High dimensional Data visualization and clustering using Self Organizing Maps Oliver Bernhardt oliver.bernhardt@fh-hagenberg.at Department of Medical- and Bioinformatics Upper Austria University of Applied

### Self-Organizing g Maps (SOM) COMP61021 Modelling and Visualization of High Dimensional Data

Self-Organizing g Maps (SOM) Ke Chen Outline Introduction ti Biological Motivation Kohonen SOM Learning Algorithm Visualization Method Examples Relevant Issues Conclusions 2 Introduction Self-organizing

### Visualization of large data sets using MDS combined with LVQ.

Visualization of large data sets using MDS combined with LVQ. Antoine Naud and Włodzisław Duch Department of Informatics, Nicholas Copernicus University, Grudziądzka 5, 87-100 Toruń, Poland. www.phys.uni.torun.pl/kmk

### Using Smoothed Data Histograms for Cluster Visualization in Self-Organizing Maps

Technical Report OeFAI-TR-2002-29, extended version published in Proceedings of the International Conference on Artificial Neural Networks, Springer Lecture Notes in Computer Science, Madrid, Spain, 2002.

### Data Mining and Clustering Techniques

DRTC Workshop on Semantic Web 8 th 10 th December, 2003 DRTC, Bangalore Paper: K Data Mining and Clustering Techniques I. K. Ravichandra Rao Professor and Head Documentation Research and Training Center

### Data Mining. Cluster Analysis: Advanced Concepts and Algorithms

Data Mining Cluster Analysis: Advanced Concepts and Algorithms Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 More Clustering Methods Prototype-based clustering Density-based clustering Graph-based

### Visualization methods for patent data

Visualization methods for patent data Treparel 2013 Dr. Anton Heijs (CTO & Founder) Delft, The Netherlands Introduction Treparel can provide advanced visualizations for patent data. This document describes

### Content-Based Image Retrieval

Content-Based Image Retrieval Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr Image retrieval Searching a large database for images that match a query: What kind

### CHAPTER 5 SENDER AUTHENTICATION USING FACE BIOMETRICS

74 CHAPTER 5 SENDER AUTHENTICATION USING FACE BIOMETRICS 5.1 INTRODUCTION Face recognition has become very popular in recent years, and is used in many biometric-based security systems. Face recognition

### A Study of Web Log Analysis Using Clustering Techniques

A Study of Web Log Analysis Using Clustering Techniques Hemanshu Rana 1, Mayank Patel 2 Assistant Professor, Dept of CSE, M.G Institute of Technical Education, Gujarat India 1 Assistant Professor, Dept

### EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin

### Visualizing an Auto-Generated Topic Map

Visualizing an Auto-Generated Topic Map Nadine Amende 1, Stefan Groschupf 2 1 University Halle-Wittenberg, information manegement technology na@media-style.com 2 media style labs Halle Germany sg@media-style.com

### 2/3/2009. Color. Today! Sensing Color Coding color systems Models of Reflectance Applications

Color Today! Sensing Color Coding color systems Models of Reflectance Applications 1 Color Complexity Many theories, measurement techniques, and standards for colors, yet no one theory of human color perception

### Data topology visualization for the Self-Organizing Map

Data topology visualization for the Self-Organizing Map Kadim Taşdemir and Erzsébet Merényi Rice University - Electrical & Computer Engineering 6100 Main Street, Houston, TX, 77005 - USA Abstract. The

### An Analysis on Density Based Clustering of Multi Dimensional Spatial Data

An Analysis on Density Based Clustering of Multi Dimensional Spatial Data K. Mumtaz 1 Assistant Professor, Department of MCA Vivekanandha Institute of Information and Management Studies, Tiruchengode,

### Digital Image Processing Using Matlab. Haris Papasaika-Hanusch Institute of Geodesy and Photogrammetry, ETH Zurich

Haris Papasaika-Hanusch Institute of Geodesy and Photogrammetry, ETH Zurich haris@geod.baug.ethz.ch Images and Digital Images A digital image differs from a photo in that the values are all discrete. Usually

### Knowledge Discovery and Data Mining. Structured vs. Non-Structured Data

Knowledge Discovery and Data Mining Unit # 2 1 Structured vs. Non-Structured Data Most business databases contain structured data consisting of well-defined fields with numeric or alphanumeric values.

### STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and

Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table

### Data Mining Cluster Analysis: Advanced Concepts and Algorithms. ref. Chapter 9. Introduction to Data Mining

Data Mining Cluster Analysis: Advanced Concepts and Algorithms ref. Chapter 9 Introduction to Data Mining by Tan, Steinbach, Kumar 1 Outline Prototype-based Fuzzy c-means Mixture Model Clustering Density-based

### Cluster Analysis: Advanced Concepts

Cluster Analysis: Advanced Concepts and dalgorithms Dr. Hui Xiong Rutgers University Introduction to Data Mining 08/06/2006 1 Introduction to Data Mining 08/06/2006 1 Outline Prototype-based Fuzzy c-means

### An Open Framework for Reverse Engineering Graph Data Visualization. Alexandru C. Telea Eindhoven University of Technology The Netherlands.

An Open Framework for Reverse Engineering Graph Data Visualization Alexandru C. Telea Eindhoven University of Technology The Netherlands Overview Reverse engineering (RE) overview Limitations of current

### 2002 IEEE. Reprinted with permission.

Laiho J., Kylväjä M. and Höglund A., 2002, Utilization of Advanced Analysis Methods in UMTS Networks, Proceedings of the 55th IEEE Vehicular Technology Conference ( Spring), vol. 2, pp. 726-730. 2002 IEEE.

### Decision Support via Big Multidimensional Data Visualization

Decision Support via Big Multidimensional Data Visualization Audronė Lupeikienė 1, Viktor Medvedev 1, Olga Kurasova 1, Albertas Čaplinskas 1, Gintautas Dzemyda 1 1 Vilnius University, Institute of Mathematics

### A Simple Feature Extraction Technique of a Pattern By Hopfield Network

A Simple Feature Extraction Technique of a Pattern By Hopfield Network A.Nag!, S. Biswas *, D. Sarkar *, P.P. Sarkar *, B. Gupta **! Academy of Technology, Hoogly - 722 *USIC, University of Kalyani, Kalyani

### Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets

Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets Macario O. Cordel II and Arnulfo P. Azcarraga College of Computer Studies *Corresponding Author: macario.cordel@dlsu.edu.ph

### Reconstructing Self Organizing Maps as Spider Graphs for better visual interpretation of large unstructured datasets

Reconstructing Self Organizing Maps as Spider Graphs for better visual interpretation of large unstructured datasets Aaditya Prakash, Infosys Limited aaadityaprakash@gmail.com Abstract--Self-Organizing

### Monitoring of Complex Industrial Processes based on Self-Organizing Maps and Watershed Transformations

Monitoring of Complex Industrial Processes based on Self-Organizing Maps and Watershed Transformations Christian W. Frey 2012 Monitoring of Complex Industrial Processes based on Self-Organizing Maps and

### Data Mining Project Report. Document Clustering. Meryem Uzun-Per

Data Mining Project Report Document Clustering Meryem Uzun-Per 504112506 Table of Content Table of Content... 2 1. Project Definition... 3 2. Literature Survey... 3 3. Methods... 4 3.1. K-means algorithm...

### Information Visualization Multivariate Data Visualization Krešimir Matković

Information Visualization Multivariate Data Visualization Krešimir Matković Vienna University of Technology, VRVis Research Center, Vienna Multivariable >3D Data Tables have so many variables that orthogonal

### DATA ANALYTICS USING R

DATA ANALYTICS USING R Duration: 90 Hours Intended audience and scope: The course is targeted at fresh engineers, practicing engineers and scientists who are interested in learning and understanding data

### COMPARISON OF OBJECT BASED AND PIXEL BASED CLASSIFICATION OF HIGH RESOLUTION SATELLITE IMAGES USING ARTIFICIAL NEURAL NETWORKS

COMPARISON OF OBJECT BASED AND PIXEL BASED CLASSIFICATION OF HIGH RESOLUTION SATELLITE IMAGES USING ARTIFICIAL NEURAL NETWORKS B.K. Mohan and S. N. Ladha Centre for Studies in Resources Engineering IIT

### The Scientific Data Mining Process

Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

### Utilizing spatial information systems for non-spatial-data analysis

Jointly published by Akadémiai Kiadó, Budapest Scientometrics, and Kluwer Academic Publishers, Dordrecht Vol. 51, No. 3 (2001) 563 571 Utilizing spatial information systems for non-spatial-data analysis

### CONNECTIONIST THEORIES OF LEARNING

CONNECTIONIST THEORIES OF LEARNING Themis N. Karaminis, Michael S.C. Thomas Department of Psychological Sciences, Birkbeck College, University of London London, WC1E 7HX UK tkaram01@students.bbk.ac.uk,

### Security Threats to Signal Classifiers Using Self-Organizing Maps

Security Threats to Signal Classifiers Using Self-Organizing Maps T. Charles Clancy Awais Khawar tcc@umd.edu awais@umd.edu Department of Electrical and Computer Engineering University of Maryland, College

### Integration of Cluster Analysis and Visualization Techniques for Visual Data Analysis

Integration of Cluster Analysis and Visualization Techniques for Visual Data Analysis M. Kreuseler, T. Nocke, H. Schumann, Institute of Computer Graphics University of Rostock, D-18059 Rostock, Germany

### An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

### Clustering Hierarchical clustering and k-mean clustering

Clustering Hierarchical clustering and k-mean clustering Genome 373 Genomic Informatics Elhanan Borenstein The clustering problem: A quick review partition genes into distinct sets with high homogeneity

### Assessment. Presenter: Yupu Zhang, Guoliang Jin, Tuo Wang Computer Vision 2008 Fall

Automatic Photo Quality Assessment Presenter: Yupu Zhang, Guoliang Jin, Tuo Wang Computer Vision 2008 Fall Estimating i the photorealism of images: Distinguishing i i paintings from photographs h Florin

### IMPROVED VIRTUAL MOUSE POINTER USING KALMAN FILTER BASED GESTURE TRACKING TECHNIQUE

39 IMPROVED VIRTUAL MOUSE POINTER USING KALMAN FILTER BASED GESTURE TRACKING TECHNIQUE D.R.A.M. Dissanayake, U.K.R.M.H. Rajapaksha 2 and M.B Dissanayake 3 Department of Electrical and Electronic Engineering,

### Customer Data Mining and Visualization by Generative Topographic Mapping Methods

Customer Data Mining and Visualization by Generative Topographic Mapping Methods Jinsan Yang and Byoung-Tak Zhang Artificial Intelligence Lab (SCAI) School of Computer Science and Engineering Seoul National

### Fraudulent Behavior Forecast in Telecom Industry Based on Data Mining Technology

Communications of the IIMA Volume 7 Issue 4 Article 1 2007 Fraudulent Behavior Forecast in Telecom Industry Based on Data Mining Technology Sen Wu School of Economics and Management University of Science

### VOL. 2, NO. 1, January 2012 ISSN ARPN Journal of Systems and Software AJSS Journal. All rights reserved

Color and Texture Based Image Retrieval Sridhar, Gowri Department of Information Science and Technology College of Engineering, Guindy Anna University Chennai {ssridhar@annauniv.edu, manoharan.gowri@gmail.com}

### Neural Networks Kohonen Self-Organizing Maps

Neural Networks Kohonen Self-Organizing Maps Mohamed Krini Christian-Albrechts-Universität zu Kiel Faculty of Engineering Institute of Electrical and Information Engineering Digital Signal Processing and

### GIS Tutorial 1. Lecture 2 Map design

GIS Tutorial 1 Lecture 2 Map design Outline Choropleth maps Colors Vector GIS display GIS queries Map layers and scale thresholds Hyperlinks and map tips 2 Lecture 2 CHOROPLETH MAPS Choropleth maps Color-coded

### BIG DATA VISUALIZATION. Team Impossible Peter Vilim, Sruthi Mayuram Krithivasan, Matt Burrough, and Ismini Lourentzou

BIG DATA VISUALIZATION Team Impossible Peter Vilim, Sruthi Mayuram Krithivasan, Matt Burrough, and Ismini Lourentzou Let s begin with a story Let s explore Yahoo s data! Dora the Data Explorer has a new

### PRACTICAL DATA MINING IN A LARGE UTILITY COMPANY

QÜESTIIÓ, vol. 25, 3, p. 509-520, 2001 PRACTICAL DATA MINING IN A LARGE UTILITY COMPANY GEORGES HÉBRAIL We present in this paper the main applications of data mining techniques at Electricité de France,

### Course 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Oman College of Management and Technology Course 803401 DSS Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization CS/MIS Department Information Sharing

### Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear

### Big Data: Rethinking Text Visualization

Big Data: Rethinking Text Visualization Dr. Anton Heijs anton.heijs@treparel.com Treparel April 8, 2013 Abstract In this white paper we discuss text visualization approaches and how these are important

### Cartogram representation of the batch-som magnification factor

ESANN 2012 proceedings, European Symposium on Artificial Neural Networs, Computational Intelligence Cartogram representation of the batch-som magnification factor Alessandra Tosi 1 and Alfredo Vellido

### Neural Networks. CAP5610 Machine Learning Instructor: Guo-Jun Qi

Neural Networks CAP5610 Machine Learning Instructor: Guo-Jun Qi Recap: linear classifier Logistic regression Maximizing the posterior distribution of class Y conditional on the input vector X Support vector

### Segmentation of building models from dense 3D point-clouds

Segmentation of building models from dense 3D point-clouds Joachim Bauer, Konrad Karner, Konrad Schindler, Andreas Klaus, Christopher Zach VRVis Research Center for Virtual Reality and Visualization, Institute

### Comparing large datasets structures through unsupervised learning

Comparing large datasets structures through unsupervised learning Guénaël Cabanes and Younès Bennani LIPN-CNRS, UMR 7030, Université de Paris 13 99, Avenue J-B. Clément, 93430 Villetaneuse, France cabanes@lipn.univ-paris13.fr

### MVA ENS Cachan. Lecture 2: Logistic regression & intro to MIL Iasonas Kokkinos Iasonas.kokkinos@ecp.fr

Machine Learning for Computer Vision 1 MVA ENS Cachan Lecture 2: Logistic regression & intro to MIL Iasonas Kokkinos Iasonas.kokkinos@ecp.fr Department of Applied Mathematics Ecole Centrale Paris Galen

### Graph/Network Visualization

Graph/Network Visualization Data model: graph structures (relations, knowledge) and networks. Applications: Telecommunication systems, Internet and WWW, Retailers distribution networks knowledge representation

### EFFICIENT K-MEANS CLUSTERING ALGORITHM USING RANKING METHOD IN DATA MINING

EFFICIENT K-MEANS CLUSTERING ALGORITHM USING RANKING METHOD IN DATA MINING Navjot Kaur, Jaspreet Kaur Sahiwal, Navneet Kaur Lovely Professional University Phagwara- Punjab Abstract Clustering is an essential

### The Overview Panel in Gephi 0.9

Gephi Cheat Sheet 1 Appearance To change the size and color of the network according to some values. The Overview Panel in Gephi 0.9 Where all the functions are available to explore the network visually.

### Centroid Distance Function and the Fourier Descriptor with Applications to Cancer Cell Clustering

Centroid Distance Function and the Fourier Descriptor with Applications to Cancer Cell Clustering By, Swati Bhonsle Alissa Klinzmann Mentors Fred Park Department of Mathematics Ernie Esser Department of

### Supplemental material for TVCG submission Visual Adjacency Lists for Dynamic Graphs

Supplemental material for TVCG submission Visual Adjacency Lists for Dynamic Graphs In these slides, we provide a more detailed analysis of the case studies in our paper. 1. Migration data 1.1. Visualization

### 2.Animation Tool: Synfig

Standard:11 2.Animation Tool: Synfig In previous chapter we learn Multimedia and basic building block of multimedia. To create a multimedia presentation using these building blocks we need application

### Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Turban, Aronson, and Liang Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

### Threshold Logic. 2.1 Networks of functions

2 Threshold Logic 2. Networks of functions We deal in this chapter with the simplest kind of computing units used to build artificial neural networks. These computing elements are a generalization of the

Image Segmentation Preview Segmentation subdivides an image to regions or objects Two basic properties of intensity values Discontinuity Edge detection Similarity Thresholding Region growing/splitting/merging

### Parallel Data Selection Based on Neurodynamic Optimization in the Era of Big Data

Parallel Data Selection Based on Neurodynamic Optimization in the Era of Big Data Jun Wang Department of Mechanical and Automation Engineering The Chinese University of Hong Kong Shatin, New Territories,

### COMP3420: Advanced Databases and Data Mining. Web data mining

COMP3420: Advanced Databases and Data Mining Web data mining Lecture outline The Web as a data source Challenges the Web poses to data mining Types of Web data mining Mining the Web page layout structure

### They can be obtained in HQJHQH format directly from the home page at: http://www.engene.cnb.uam.es/downloads/kobayashi.dat

HQJHQH70 *XLGHG7RXU This document contains a Guided Tour through the HQJHQH platform and it was created for training purposes with respect to the system options and analysis possibilities. It is not intended

### COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments

Contents List of Figures Foreword Preface xxv xxiii xv Acknowledgments xxix Chapter 1 Fraud: Detection, Prevention, and Analytics! 1 Introduction 2 Fraud! 2 Fraud Detection and Prevention 10 Big Data for

### Practical Graph Mining with R. 5. Link Analysis

Practical Graph Mining with R 5. Link Analysis Outline Link Analysis Concepts Metrics for Analyzing Networks PageRank HITS Link Prediction 2 Link Analysis Concepts Link A relationship between two entities

### Segmentation & Clustering

EECS 442 Computer vision Segmentation & Clustering Segmentation in human vision K-mean clustering Mean-shift Graph-cut Reading: Chapters 14 [FP] Some slides of this lectures are courtesy of prof F. Li,

### Role of Neural network in data mining

Role of Neural network in data mining Chitranjanjit kaur Associate Prof Guru Nanak College, Sukhchainana Phagwara,(GNDU) Punjab, India Pooja kapoor Associate Prof Swami Sarvanand Group Of Institutes Dinanagar(PTU)

### Analysis of Performance Metrics from a Database Management System Using Kohonen s Self Organizing Maps

WSEAS Transactions on Systems Issue 3, Volume 2, July 2003, ISSN 1109-2777 629 Analysis of Performance Metrics from a Database Management System Using Kohonen s Self Organizing Maps Claudia L. Fernandez,

### Big Data Text Mining and Visualization. Anton Heijs

Copyright 2007 by Treparel Information Solutions BV. This report nor any part of it may be copied, circulated, quoted without prior written approval from Treparel7 Treparel Information Solutions BV Delftechpark

### CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen

CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen LECTURE 3: DATA TRANSFORMATION AND DIMENSIONALITY REDUCTION Chapter 3: Data Preprocessing Data Preprocessing: An Overview Data Quality Major

### Visualizing the Top 400 Universities

Int'l Conf. e-learning, e-bus., EIS, and e-gov. EEE'15 81 Visualizing the Top 400 Universities Salwa Aljehane 1, Reem Alshahrani 1, and Maha Thafar 1 saljehan@kent.edu, ralshahr@kent.edu, mthafar@kent.edu

### Data visualization and clustering. Genomics is to no small extend a data science

Data visualization and clustering Genomics is to no small extend a data science [www.data2discovery.org] Data visualization and clustering Genomics is to no small extend a data science [Andersson et al.,

### Connecting Segments for Visual Data Exploration and Interactive Mining of Decision Rules

Journal of Universal Computer Science, vol. 11, no. 11(2005), 1835-1848 submitted: 1/9/05, accepted: 1/10/05, appeared: 28/11/05 J.UCS Connecting Segments for Visual Data Exploration and Interactive Mining