INTERACTIVE DATA EXPLORATION USING MDS MAPPING
|
|
|
- Blake Strickland
- 10 years ago
- Views:
Transcription
1 INTERACTIVE DATA EXPLORATION USING MDS MAPPING Antoine Naud and Włodzisław Duch 1 Department of Computer Methods Nicolaus Copernicus University ul. Grudziadzka 5, Toruń, Poland Abstract: Interactive exploratory data analysis can be realised by using dimensionality reduction techniques integrated in data visualization software. This work presents an adaptation of one multidimensional scaling algorithm to provide it with generalization capability, allowing the display of new data on an existing mapping. The ensuing relative mapping is used to help the understanding of classification results. Keywords: Data visualization, Exploratory Data Analysis, multidimensional scaling. 1 INTRODUCTION In many applications understanding of the data is of primary importance. Thus usual approach to data analysis in machine learning community is based on logical rules and the lack of comprehensibility is perceived as the biggest drawback of neural network and pattern recognition methods [4]. Interpretation of logical rules is not so straightforward as it may seem since rules may not be stable and logical true-false answers near decision borders of the classifier are misleading. If the decision borders are complex, a large number of crisp rules are needed for proper approximation. Fuzzy rules may be more appropriate in such case, although they also do not provide optimal decision borders. An alternative way to understand the structure of the data is based on visualization. New case, given for classification, should be viewed in relation to known cases stored in the training database and in relation to the decision borders provided by the classifier used. Visualization may always be used to understand decision borders of any classifier, and in some cases may provide more information than logical rules. Unfortunately our spatial imagination is restricted to 3 dimensions and thus we cannot directly view relations in highly dimensional spaces. Self-organizing topographic feature maps (SOM) [8], known also as the Kohonen networks, were design as a visualization and classification tool. They have found many applications as systems capable of unsupervised learning and visualization of data. Most classification problems belong to the supervised learning class. SOM networks appear to be among the worst classifiers in such tasks [12]. They also do not seem to provide good maps that do preserve correct topographical relations among multidimensional objects [6, 7]. Quantitative measures for the distortions of original data topography by SOM mapping have been introduced in [3]. The same idea has already been published in psychometric journal by Kruskal 1 address: {naud, duch}@phys.uni.torun.pl
2 [10] and is known as the multidimensional scaling" (MDS), and in engineering journal [14], where it is known as the Sammon mapping". A quantitative measure of topographical distortion based on differences between distances of objects in the original space and distances between projections of these objects in the low-dimensional space is introduced and minimized in MDS methods. In this paper we will show how MDS maps displaying classification borders may be formed. In the next section some information about the software allowing for the interactive data exploration is given, in the third section a method to place a single new object on existing map is described, and in the fourth section sensitivity of such maps to initial conditions and computational experiments are described. A short discussion concludes this paper. 2 INTERACTIVE DATA EXPLORATION USING MDS MAP- PING The TCM (Topographically Correct Mapping) interactive software developed by us performs a mapping of multivariate data from a high-dimensional space (D-dimensional space, D 3) to data points in a lower d-dimensional space (d D). Usually d =2in order to allow visualization by scatterplots. This dimensionality reduction, obtained by a multidimensional scaling (MDS) procedure, is such that similar (or close, in the sense of the Euclidean distance in the D-dimensional space) multivariate objects or cases are mapped on representative points close to each other in the d-dimensional representation space. The mapping should preserve topography of data vectors. Linear projections are not able to preserve topography; such mappings have to be non-linear. Topography preservation may be unreachable in the whole feature space if the analyzed multivariate data is intrinsically high dimensional and cannot be imbedded in a lower dimensional space without distortion. Therefore MDS mappings may be confusing and may give incorrect ideas about data structure. When zooming on a small neighborhood of a particular vector, distortions of such mapping should become smaller. Therefore the software should be interactive, allowing for such zooming. The MDS algorithm used here (Kruskal s nonmetric scaling [10]) requires only a dissimilarity or distance matrix of the multivariate objects. Preservation of data topography allows the display of the data structure, the clusters, outliers or class overlapping. The program should be treated as an exploratory data analysis tool and should have a Graphical User Interface to allow interactive exploration. Perhaps the best known software for interactive data exploration is the XGobi [15], allowing to see 3-dimensional projections from different perspective, but without preservation of multidimensional topographical relations. The minimization involved in TCM makes it difficult to provide real-time tour in the multidimensional space, as it is done with the Xgobi projections. In TCM the initial MDS algorithm has been adapted to perform MDS mapping of new points on a set of points previously mapped, i.e. to perform a relative mapping. The user may select a subset of a mapped dataset using the mouse and to create a new map, zooming on this subset only. The value of topographical distortion measure should decrease during zooming and allows an evaluation of the confidence one may have in correct representation of the original data topography.
3 3 RELATIVE MDS MAPPING In classification tasks, it can be interesting to see where a new data points "falls" among known cases, and discover the class topology of its neighboring known cases (classified and labeled), to get an insight on how a classifier would classify this new data. The realization of this purpose gives rise to the need for a method that allows the mapping of one new point on a set of data points previously mapped, using a topology-preserving mapping. MDS is a topology preserving mapping, but it does not offer the possibility to project new points on an existing mapped set of points. To get a mapping presenting the previously mapped known points together with the new ones requires a complete re-run of the MDS algorithm on the new and the old data points. Let us denote by N f the number of known data points, N m the number of new data points, N t = N m + N f the total number of points considered during the mapping, F = {P i,i =1,N f } the set of known data points and by M = {P i,i =1,N m } the set of new data points. The computational cost involved can be reduced using the following scheme: 1. Map set F using normal MDS mapping, 2. Map set M relatively to the mapped set F using the relative MDS mapping technique. Relative MDS mapping differs from normal MDS by the fact that during the minimization of the topography distortion measure (which will be called Stress" here) only the points from M are allowed to move while the points from F are kept fixed. This is achieved by modifying the Stress function so that it sums only over the distances that change during iterations, i.e. the distances between the fixed and the moving points, and the moving points interpoint distances. The original Stress function: is redefined as: S r (x) = N m ij N t ( ) 2 S(x) = w ij ˆdij d ij (1) ij ( ) 2 N m w ij ˆdij d ij + N t i=1 j=n m+1 w ij ( ˆdij d ij ) 2 (2) Relative mapping can also be used to visualize large datasets, in which the high number of points makes MDS techniques unusable due to the exponentially increasing computation time. Various approaches have been proposed to tackle this problem, especially in the case of Sammon mapping. One category of such approaches, called in [2] the frame method, is to first select a subset of N b points and to map it, and then to add sequentially the remaining points to the mapping. Relative mapping can be used to perform this second step, taking the points already mapped as fixed points, and the first point to be mapped as the moving point. In this way relative mapping offers an alternative to methods designed for the purpose of giving generalization capability to Sammon mapping such as the ANN Sammon mapping [9], Neuroscale [16], distance mapping [13] or incremental scaling [1].
4 4 SENSITIVITY OF MDS ALGORITHM TO INITIAL CON- DITIONS AND COMPUTATIONAL EXPERIMENTS MDS algorithm may be sensitive to initial conditions, making the interpretation of maps difficult. The configuration of representative points that is chosen to start the algorithm may have a strong influence on the final configuration. This is due to the fact that MDS relies on the minimization of a multivariate function by local minimization methods that get easily stucked in local minimums. We have applied two strategies for algorithm initialization. The first one was to set the initial configuration randomly and map it. This is repeated a number of times (20 times is often enough) and only the resulting configuration with the lowest Stress measure is kept. The second strategy is to first perform a linear mapping using Principal Components Analysis (by a Singular Value Decomposition of the data matrix) and to use the two first principal components as the initial configuration. Empirical experience on various datasets shows that the best Stress reached after random initialization is often slightly lower than the Stress resulting from the mapping initialized by PCA, but those differences are almost invisible when looking at the resulting configurations. In the case of a single point mapped relatively to the existing map, a better choice is to initialize the representative point by the triangulation method [11], leading in the D-dimensional space to the exact preservation of the distances to the d closest fixed points. The dataset used in our visualization experiments contains results of 1611 psychometric evaluations (this is an extended version of the database used in [5]), with 13 attributes, belonging to one of 20 classes. Figure 1 shows zooming into the neighborhood of one particular point (marked as ) from the database class organic_w. The plots show the mappings of the 300, 200, 50, 40, 30 and 20 closest points to this chosen point (the neighbors marked are from class norma_w, down triangles class organic_w, # class neurosis_w, up triangles class psychopatia_w, + class schizofrenia_w, pentagons class alcoholics and rectangles class narcotics_w ). 5 DISCUSSION Visualization may be sufficient to understand the data and may be the method of choice if the classification borders are complex. By separating classification from visualization, a better performance in both tasks may be achieved than that obtained with the SOM maps. Interpretation of these maps requires careful analysis of the amount of topographical distortions they introduce. Interactive zooming on interesting areas of the input space allows the exploration of the neighborhood of the case under inspection. Since there may be some uncertainty in the input one may generate a number (for example 100) of vectors drawn from a Gaussian distribution centered at this vector. The class of these additional points is determined using neural or other classification systems and corresponding points added (using relative mapping) to the map. Visual inspection of such map allows estimating the probability of misclassification, proportional to the number of points from alternative classes. Visualization of the decision borders of different classifiers is another interesting possibility. In this case large number of additional points are produced, classified and mapped
5 relatively to the true points from the database. All surface of the map becomes colored, showing smooth transitions between different classes. Areas that are near the classification borders will have mixed (dithered) colors, while areas containing well separated classes will have pure colors. In this way classification borders will be visible and the confidence in classification may be evaluated, depending on the distance of the given vector from the border. Topographically correct mapping may find many applications. Acknowledgments: J. Gomuła and T. Kucharski are gratefully acknowledged for providing the psychometric data. REFERENCES [1] W. Basalaj. Incremental multidimensional scaling method for database visualization. In Proceedings of Visual data Exploration and Analysis VI, SPIE, volume 3643 (1999) [2] C. L. Chang and R. C. T. Lee. A heuristic relaxation method for nonlinear mapping in cluster analysis. IEEE Transactions on Systems, Man, and Cybernetics 3 (1973) [3] W. Duch. Quantitative measures for the self-organizing topographic maps. Open Systems & Information Dynamics 2 (1995) [4] W. Duch, N. Jankowski, K. Grąbczewski, and R. Adamczak. Optimization and interpretation of rule-based classifiers. In P. V. (Springer), editor, Intelligent Information Systems, Bystra, Poland, June [5] W. Duch, T. Kucharski, J. Gomuła, and R. Adamczak. Metody uczenia maszynowego w analizie danych psychometrycznych. Zastosowanie do wielowymiarowego kwestionariusza osobowości MMPI WISKAD. Toruń, Poland, [6] W. Duch and A. Naud. Multidimensional scaling and kohonen s self-organizing maps. In Proceedings of the Second Conference "Neural Networks and their Applications", Szczyrk, Vol. I, pp , [7] W. Duch and A. Naud. On global self-organizing maps. In Proceedings of the 4th European Symposium on Artificial Neural Networks, Bruges, pp , [8] T. Kohonen. Self-Organizing Maps. Springer-Verlag, Heidelberg Berlin, [9] M. A. Kraaijveld, J. Mao, and A. K. Jain. A nonlinear projection method based on kohonen s topology preserving maps. IEEE Transactions on Neural Networks, 6 (1995) [10] J. B. Kruskal. Non metric multidimensional scaling : a numerical method. Psychometrika, 29 (1964) [11] R. C. T. Lee, J. R. Slagle, and H. Blum. A triangulation method for the sequential mapping of points from n-space to two-space. IEEE Transactions on Computers, 26 (1977) [12] D. Michie, D. J. Spiegelhalter, and C. C. Taylor, editors. Machine Learning, Neural and Statistical Classification. Ellis Horwood, New York, [13] E. Pekalska, D. de Ridder, R. P. Duin, and M. A. Kraaijveld. A new method of generalizing sammon mapping with application to algorithm speed-up. In M. Boasson, J. Karndorp, J. Torino, and M. Vosselman, editors, ASCI 99 Proc. 5th Annual Conference of the Advanced School for Computing and Image, [14] J. W. Sammon. A nonlinear mapping for data analysis. IEEE Transactions on Computers 5 (1969) [15] D. F. Swayne, D. Cook, and A. Buja. Xgobi: Interactive dynamic data visualization in the x window system. AT&T Labs Technical Rep andreas/xgobi/. [16] M. E. Tipping. Topographic mappings and feed-forward neural networks. PhD thesis, Aston University, Birmingham, UK, February 1996.
6 Figure 1: Zooming on the chosen case in the psychometric database. From top left to bottom right the number of points mapped (Stress values) are: 300 (0.931), 200 (0.926), 50 (0.893), 40 (0.887), 30 (0.883) and 20 (0.868).
Visualization of large data sets using MDS combined with LVQ.
Visualization of large data sets using MDS combined with LVQ. Antoine Naud and Włodzisław Duch Department of Informatics, Nicholas Copernicus University, Grudziądzka 5, 87-100 Toruń, Poland. www.phys.uni.torun.pl/kmk
A Computational Framework for Exploratory Data Analysis
A Computational Framework for Exploratory Data Analysis Axel Wismüller Depts. of Radiology and Biomedical Engineering, University of Rochester, New York 601 Elmwood Avenue, Rochester, NY 14642-8648, U.S.A.
ViSOM A Novel Method for Multivariate Data Projection and Structure Visualization
IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 13, NO. 1, JANUARY 2002 237 ViSOM A Novel Method for Multivariate Data Projection and Structure Visualization Hujun Yin Abstract When used for visualization of
Self Organizing Maps for Visualization of Categories
Self Organizing Maps for Visualization of Categories Julian Szymański 1 and Włodzisław Duch 2,3 1 Department of Computer Systems Architecture, Gdańsk University of Technology, Poland, [email protected]
Visualization of Breast Cancer Data by SOM Component Planes
International Journal of Science and Technology Volume 3 No. 2, February, 2014 Visualization of Breast Cancer Data by SOM Component Planes P.Venkatesan. 1, M.Mullai 2 1 Department of Statistics,NIRT(Indian
Meta-learning. Synonyms. Definition. Characteristics
Meta-learning Włodzisław Duch, Department of Informatics, Nicolaus Copernicus University, Poland, School of Computer Engineering, Nanyang Technological University, Singapore [email protected] (or search
Visualization of Topology Representing Networks
Visualization of Topology Representing Networks Agnes Vathy-Fogarassy 1, Agnes Werner-Stark 1, Balazs Gal 1 and Janos Abonyi 2 1 University of Pannonia, Department of Mathematics and Computing, P.O.Box
USING SELF-ORGANIZING MAPS FOR INFORMATION VISUALIZATION AND KNOWLEDGE DISCOVERY IN COMPLEX GEOSPATIAL DATASETS
USING SELF-ORGANIZING MAPS FOR INFORMATION VISUALIZATION AND KNOWLEDGE DISCOVERY IN COMPLEX GEOSPATIAL DATASETS Koua, E.L. International Institute for Geo-Information Science and Earth Observation (ITC).
High-dimensional labeled data analysis with Gabriel graphs
High-dimensional labeled data analysis with Gabriel graphs Michaël Aupetit CEA - DAM Département Analyse Surveillance Environnement BP 12-91680 - Bruyères-Le-Châtel, France Abstract. We propose the use
Data Mining and Neural Networks in Stata
Data Mining and Neural Networks in Stata 2 nd Italian Stata Users Group Meeting Milano, 10 October 2005 Mario Lucchini e Maurizo Pisati Università di Milano-Bicocca [email protected] [email protected]
A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization
A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization Ángela Blanco Universidad Pontificia de Salamanca [email protected] Spain Manuel Martín-Merino Universidad
The Value of Visualization 2
The Value of Visualization 2 G Janacek -0.69 1.11-3.1 4.0 GJJ () Visualization 1 / 21 Parallel coordinates Parallel coordinates is a common way of visualising high-dimensional geometry and analysing multivariate
Data topology visualization for the Self-Organizing Map
Data topology visualization for the Self-Organizing Map Kadim Taşdemir and Erzsébet Merényi Rice University - Electrical & Computer Engineering 6100 Main Street, Houston, TX, 77005 - USA Abstract. The
Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets
Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets Macario O. Cordel II and Arnulfo P. Azcarraga College of Computer Studies *Corresponding Author: [email protected]
HDDVis: An Interactive Tool for High Dimensional Data Visualization
HDDVis: An Interactive Tool for High Dimensional Data Visualization Mingyue Tan Department of Computer Science University of British Columbia [email protected] ABSTRACT Current high dimensional data visualization
Meta-learning via Search Combined with Parameter Optimization.
Meta-learning via Search Combined with Parameter Optimization. Włodzisław Duch and Karol Grudziński Department of Informatics, Nicholas Copernicus University, Grudziądzka 5, 87-100 Toruń, Poland. www.phys.uni.torun.pl/kmk
Exploratory Data Analysis with MATLAB
Computer Science and Data Analysis Series Exploratory Data Analysis with MATLAB Second Edition Wendy L Martinez Angel R. Martinez Jeffrey L. Solka ( r ec) CRC Press VV J Taylor & Francis Group Boca Raton
An Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
On the use of Three-dimensional Self-Organizing Maps for Visualizing Clusters in Geo-referenced Data
On the use of Three-dimensional Self-Organizing Maps for Visualizing Clusters in Geo-referenced Data Jorge M. L. Gorricha and Victor J. A. S. Lobo CINAV-Naval Research Center, Portuguese Naval Academy,
Advanced Web Usage Mining Algorithm using Neural Network and Principal Component Analysis
Advanced Web Usage Mining Algorithm using Neural Network and Principal Component Analysis Arumugam, P. and Christy, V Department of Statistics, Manonmaniam Sundaranar University, Tirunelveli, Tamilnadu,
Self Organizing Maps: Fundamentals
Self Organizing Maps: Fundamentals Introduction to Neural Networks : Lecture 16 John A. Bullinaria, 2004 1. What is a Self Organizing Map? 2. Topographic Maps 3. Setting up a Self Organizing Map 4. Kohonen
CURRICULUM VITAE. Norbert Jankowski. Department of Computer Methods. ul. Grudziądzka 5 87-100 Toruń Poland
CURRICULUM VITAE 22 September, 1999 Name Permanent address Norbert Jankowski Department of Computer Methods Nicholas Copernicus University ul. Grudziądzka 5 87-100 Toruń Poland phone: +48 56 6113307 e-mail:
An Analysis on Density Based Clustering of Multi Dimensional Spatial Data
An Analysis on Density Based Clustering of Multi Dimensional Spatial Data K. Mumtaz 1 Assistant Professor, Department of MCA Vivekanandha Institute of Information and Management Studies, Tiruchengode,
Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data
CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear
Comparing large datasets structures through unsupervised learning
Comparing large datasets structures through unsupervised learning Guénaël Cabanes and Younès Bennani LIPN-CNRS, UMR 7030, Université de Paris 13 99, Avenue J-B. Clément, 93430 Villetaneuse, France [email protected]
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
CITY UNIVERSITY OF HONG KONG 香 港 城 市 大 學. Self-Organizing Map: Visualization and Data Handling 自 組 織 神 經 網 絡 : 可 視 化 和 數 據 處 理
CITY UNIVERSITY OF HONG KONG 香 港 城 市 大 學 Self-Organizing Map: Visualization and Data Handling 自 組 織 神 經 網 絡 : 可 視 化 和 數 據 處 理 Submitted to Department of Electronic Engineering 電 子 工 程 學 系 in Partial Fulfillment
Neural Networks Lesson 5 - Cluster Analysis
Neural Networks Lesson 5 - Cluster Analysis Prof. Michele Scarpiniti INFOCOM Dpt. - Sapienza University of Rome http://ispac.ing.uniroma1.it/scarpiniti/index.htm [email protected] Rome, 29
Online data visualization using the neural gas network
Online data visualization using the neural gas network Pablo A. Estévez, Cristián J. Figueroa Department of Electrical Engineering, University of Chile, Casilla 412-3, Santiago, Chile Abstract A high-quality
Visualization of General Defined Space Data
International Journal of Computer Graphics & Animation (IJCGA) Vol.3, No.4, October 013 Visualization of General Defined Space Data John R Rankin La Trobe University, Australia Abstract A new algorithm
EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set
EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin
Selection of the Suitable Parameter Value for ISOMAP
1034 JOURNAL OF SOFTWARE, VOL. 6, NO. 6, JUNE 2011 Selection of the Suitable Parameter Value for ISOMAP Li Jing and Chao Shao School of Computer and Information Engineering, Henan University of Economics
Visibility optimization for data visualization: A Survey of Issues and Techniques
Visibility optimization for data visualization: A Survey of Issues and Techniques Ch Harika, Dr.Supreethi K.P Student, M.Tech, Assistant Professor College of Engineering, Jawaharlal Nehru Technological
Chapter ML:XI (continued)
Chapter ML:XI (continued) XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained
ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING)
ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING) Gabriela Ochoa http://www.cs.stir.ac.uk/~goc/ OUTLINE Preliminaries Classification and Clustering Applications
An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
Specific Usage of Visual Data Analysis Techniques
Specific Usage of Visual Data Analysis Techniques Snezana Savoska 1 and Suzana Loskovska 2 1 Faculty of Administration and Management of Information systems, Partizanska bb, 7000, Bitola, Republic of Macedonia
Support Vector Machines with Clustering for Training with Very Large Datasets
Support Vector Machines with Clustering for Training with Very Large Datasets Theodoros Evgeniou Technology Management INSEAD Bd de Constance, Fontainebleau 77300, France [email protected] Massimiliano
International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015
RESEARCH ARTICLE OPEN ACCESS Data Mining Technology for Efficient Network Security Management Ankit Naik [1], S.W. Ahmad [2] Student [1], Assistant Professor [2] Department of Computer Science and Engineering
Learning Vector Quantization: generalization ability and dynamics of competing prototypes
Learning Vector Quantization: generalization ability and dynamics of competing prototypes Aree Witoelar 1, Michael Biehl 1, and Barbara Hammer 2 1 University of Groningen, Mathematics and Computing Science
A Study of Web Log Analysis Using Clustering Techniques
A Study of Web Log Analysis Using Clustering Techniques Hemanshu Rana 1, Mayank Patel 2 Assistant Professor, Dept of CSE, M.G Institute of Technical Education, Gujarat India 1 Assistant Professor, Dept
A Survey on Pre-processing and Post-processing Techniques in Data Mining
, pp. 99-128 http://dx.doi.org/10.14257/ijdta.2014.7.4.09 A Survey on Pre-processing and Post-processing Techniques in Data Mining Divya Tomar and Sonali Agarwal Indian Institute of Information Technology,
Visualization by Linear Projections as Information Retrieval
Visualization by Linear Projections as Information Retrieval Jaakko Peltonen Helsinki University of Technology, Department of Information and Computer Science, P. O. Box 5400, FI-0015 TKK, Finland [email protected]
Visualizing an Auto-Generated Topic Map
Visualizing an Auto-Generated Topic Map Nadine Amende 1, Stefan Groschupf 2 1 University Halle-Wittenberg, information manegement technology [email protected] 2 media style labs Halle Germany [email protected]
How To Make Visual Analytics With Big Data Visual
Big-Data Visualization Customizing Computational Methods for Visual Analytics with Big Data Jaegul Choo and Haesun Park Georgia Tech O wing to the complexities and obscurities in large-scale datasets (
Quality Assessment in Spatial Clustering of Data Mining
Quality Assessment in Spatial Clustering of Data Mining Azimi, A. and M.R. Delavar Centre of Excellence in Geomatics Engineering and Disaster Management, Dept. of Surveying and Geomatics Engineering, Engineering
QoS Mapping of VoIP Communication using Self-Organizing Neural Network
QoS Mapping of VoIP Communication using Self-Organizing Neural Network Masao MASUGI NTT Network Service System Laboratories, NTT Corporation -9- Midori-cho, Musashino-shi, Tokyo 80-88, Japan E-mail: [email protected]
Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches
Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar
Component Ordering in Independent Component Analysis Based on Data Power
Component Ordering in Independent Component Analysis Based on Data Power Anne Hendrikse Raymond Veldhuis University of Twente University of Twente Fac. EEMCS, Signals and Systems Group Fac. EEMCS, Signals
Virtual Landmarks for the Internet
Virtual Landmarks for the Internet Liying Tang Mark Crovella Boston University Computer Science Internet Distance Matters! Useful for configuring Content delivery networks Peer to peer applications Multiuser
Using Data Mining for Mobile Communication Clustering and Characterization
Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer
Using Smoothed Data Histograms for Cluster Visualization in Self-Organizing Maps
Technical Report OeFAI-TR-2002-29, extended version published in Proceedings of the International Conference on Artificial Neural Networks, Springer Lecture Notes in Computer Science, Madrid, Spain, 2002.
E-commerce Transaction Anomaly Classification
E-commerce Transaction Anomaly Classification Minyong Lee [email protected] Seunghee Ham [email protected] Qiyi Jiang [email protected] I. INTRODUCTION Due to the increasing popularity of e-commerce
Load balancing in a heterogeneous computer system by self-organizing Kohonen network
Bull. Nov. Comp. Center, Comp. Science, 25 (2006), 69 74 c 2006 NCC Publisher Load balancing in a heterogeneous computer system by self-organizing Kohonen network Mikhail S. Tarkov, Yakov S. Bezrukov Abstract.
Data Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical
How To Cluster
Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main
Rule based Classification of BSE Stock Data with Data Mining
International Journal of Information Sciences and Application. ISSN 0974-2255 Volume 4, Number 1 (2012), pp. 1-9 International Research Publication House http://www.irphouse.com Rule based Classification
PLAANN as a Classification Tool for Customer Intelligence in Banking
PLAANN as a Classification Tool for Customer Intelligence in Banking EUNITE World Competition in domain of Intelligent Technologies The Research Report Ireneusz Czarnowski and Piotr Jedrzejowicz Department
Cluster Analysis: Advanced Concepts
Cluster Analysis: Advanced Concepts and dalgorithms Dr. Hui Xiong Rutgers University Introduction to Data Mining 08/06/2006 1 Introduction to Data Mining 08/06/2006 1 Outline Prototype-based Fuzzy c-means
Using multiple models: Bagging, Boosting, Ensembles, Forests
Using multiple models: Bagging, Boosting, Ensembles, Forests Bagging Combining predictions from multiple models Different models obtained from bootstrap samples of training data Average predictions or
Machine Learning and Data Analysis overview. Department of Cybernetics, Czech Technical University in Prague. http://ida.felk.cvut.
Machine Learning and Data Analysis overview Jiří Kléma Department of Cybernetics, Czech Technical University in Prague http://ida.felk.cvut.cz psyllabus Lecture Lecturer Content 1. J. Kléma Introduction,
Visualization of Large Font Databases
Visualization of Large Font Databases Martin Solli and Reiner Lenz Linköping University, Sweden ITN, Campus Norrköping, Linköping University, 60174 Norrköping, Sweden [email protected], [email protected]
Credit Card Fraud Detection Using Self Organised Map
International Journal of Information & Computation Technology. ISSN 0974-2239 Volume 4, Number 13 (2014), pp. 1343-1348 International Research Publications House http://www. irphouse.com Credit Card Fraud
Classification of Engineering Consultancy Firms Using Self-Organizing Maps: A Scientific Approach
International Journal of Civil & Environmental Engineering IJCEE-IJENS Vol:13 No:03 46 Classification of Engineering Consultancy Firms Using Self-Organizing Maps: A Scientific Approach Mansour N. Jadid
Comparison of Supervised and Unsupervised Learning Classifiers for Travel Recommendations
Volume 3, No. 8, August 2012 Journal of Global Research in Computer Science REVIEW ARTICLE Available Online at www.jgrcs.info Comparison of Supervised and Unsupervised Learning Classifiers for Travel Recommendations
TDA and Machine Learning: Better Together
TDA and Machine Learning: Better Together TDA AND MACHINE LEARNING: BETTER TOGETHER 2 TABLE OF CONTENTS The New Data Analytics Dilemma... 3 Introducing Topology and Topological Data Analysis... 3 The Promise
PRACTICAL DATA MINING IN A LARGE UTILITY COMPANY
QÜESTIIÓ, vol. 25, 3, p. 509-520, 2001 PRACTICAL DATA MINING IN A LARGE UTILITY COMPANY GEORGES HÉBRAIL We present in this paper the main applications of data mining techniques at Electricité de France,
Data, Measurements, Features
Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are
Multiscale Object-Based Classification of Satellite Images Merging Multispectral Information with Panchromatic Textural Features
Remote Sensing and Geoinformation Lena Halounová, Editor not only for Scientific Cooperation EARSeL, 2011 Multiscale Object-Based Classification of Satellite Images Merging Multispectral Information with
LVQ Plug-In Algorithm for SQL Server
LVQ Plug-In Algorithm for SQL Server Licínia Pedro Monteiro Instituto Superior Técnico [email protected] I. Executive Summary In this Resume we describe a new functionality implemented
Environmental Remote Sensing GEOG 2021
Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class
Data visualization is a graphical presentation
Journal of Advances in Computer Engineering and Technology, 1(4) 2015 A new approach for data visualization problem MohammadRezaKeyvanpour,Mona Soleymanpour Received (2015-08-09) Accepted (2015-10-18)
Reconstructing Self Organizing Maps as Spider Graphs for better visual interpretation of large unstructured datasets
Reconstructing Self Organizing Maps as Spider Graphs for better visual interpretation of large unstructured datasets Aaditya Prakash, Infosys Limited [email protected] Abstract--Self-Organizing
KATE GLEASON COLLEGE OF ENGINEERING. John D. Hromi Center for Quality and Applied Statistics
ROCHESTER INSTITUTE OF TECHNOLOGY COURSE OUTLINE FORM KATE GLEASON COLLEGE OF ENGINEERING John D. Hromi Center for Quality and Applied Statistics NEW (or REVISED) COURSE (KGCOE- CQAS- 747- Principles of
A Discussion on Visual Interactive Data Exploration using Self-Organizing Maps
A Discussion on Visual Interactive Data Exploration using Self-Organizing Maps Julia Moehrmann 1, Andre Burkovski 1, Evgeny Baranovskiy 2, Geoffrey-Alexeij Heinze 2, Andrej Rapoport 2, and Gunther Heidemann
Utilizing spatial information systems for non-spatial-data analysis
Jointly published by Akadémiai Kiadó, Budapest Scientometrics, and Kluwer Academic Publishers, Dordrecht Vol. 51, No. 3 (2001) 563 571 Utilizing spatial information systems for non-spatial-data analysis
Monitoring of Complex Industrial Processes based on Self-Organizing Maps and Watershed Transformations
Monitoring of Complex Industrial Processes based on Self-Organizing Maps and Watershed Transformations Christian W. Frey 2012 Monitoring of Complex Industrial Processes based on Self-Organizing Maps and
6.2.8 Neural networks for data mining
6.2.8 Neural networks for data mining Walter Kosters 1 In many application areas neural networks are known to be valuable tools. This also holds for data mining. In this chapter we discuss the use of neural
Segmentation of stock trading customers according to potential value
Expert Systems with Applications 27 (2004) 27 33 www.elsevier.com/locate/eswa Segmentation of stock trading customers according to potential value H.W. Shin a, *, S.Y. Sohn b a Samsung Economy Research
