On the use of Three-dimensional Self-Organizing Maps for Visualizing Clusters in Geo-referenced Data
Jorge M. L. Gorricha and Victor J. A. S. Lobo
CINAV-Naval Research Center, Portuguese Naval Academy, and ISEGI-UNL.

Abstract. The Self-Organizing Map (SOM) is an artificial neural network that is very effective for clustering via visualization. Ideally, to produce a good model, the output space dimension of the SOM should match the intrinsic dimension of the data. However, because it is very difficult or even impossible to visualize SOMs with more than two dimensions, the vast majority of applications use a SOM with a regular two-dimensional (2D) grid of nodes. For complex problems, this limits the quality of the results obtained. There are no theoretical problems in generating SOMs with higher-dimensional output spaces, but 3D SOMs have met with limited success. In this paper we show that the 3D SOM can be used successfully for visualizing clusters in geo-referenced data. To overcome the problem of visualizing the 3D grid of units, we start by assigning one primary color (of the RGB color scheme) to each of the three dimensions of the 3D SOM. We then use those colors when representing, on a geographic map, the geo-referenced elements that are mapped to each SOM unit. Finally, we compare a 2D and a 3D SOM on a concrete problem. The results point to a significant increase in clustering quality due to the use of 3D SOMs.

Keywords: Clustering, geo-referenced data, geospatial clustering, three-dimensional self-organizing maps, visualization.

1 INTRODUCTION

There is a wide range of problems that need to be addressed from a geo-spatial perspective. Some of these problems are often associated with environmental and socio-economic phenomena in which geographic position is a determining element of the analysis [1].
In this kind of analysis, frequently based on geo-referenced secondary data [1], the focus is on the search for patterns and spatial relationships, without a priori hypotheses [2]. This is achieved through clustering, defined as the unsupervised classification of patterns into groups [3]. It is also widely recognized that visualization is a potentially useful technique for exploratory pattern analysis and may, under certain circumstances, contribute to the discovery of new knowledge. Moreover, when applied to geo-referenced data, visualization may help explain complex structures and phenomena from a spatial perspective [4]. In this context, visualization is defined as the use of visual representations of data, obtained with interactive computer systems, to amplify cognition [5].

It is in this context that unsupervised neural networks, such as the SOM [6-8], have been proposed as tools for visualizing geo-referenced data [4]. In fact, the SOM algorithm performs both vector quantization (the process of representing a given data set by a reduced set of reference vectors [9-11]) and vector projection, making this Artificial Neural Network (ANN) a very effective method for clustering via visualization [12]. Among all the strategies for visualizing the SOM, we are particularly interested in those that deal with spatial dependency. One of the methods used to visualize geo-referenced data with the SOM consists of assigning different colors to the units of a SOM defined in only two dimensions (2D SOM), so that each geo-referenced element can be represented geographically with the color of its Best Matching Unit (BMU) [13]. Because it is very difficult or even impossible to visualize SOMs with more than two dimensions [14, 15], the output space of this ANN is generally defined by a regular two-dimensional grid of nodes (2D SOM). This approach, based on a non-linear projection of the data onto a two-dimensional surface, performs a dimensionality reduction that generally leads to a loss of information; for this reason there is a strong probability that some of the existing clusters will be undetectable [12]. However, in the particular case of geo-referenced data, it is possible to use a three-dimensional SOM for this purpose, thus adding one more dimension to the analysis and consequently reducing information loss. As we shall see later, including a third dimension in the analysis allows us to identify some of the clusters that remain undifferentiated in SOMs whose output space is defined in only two dimensions.
This paper is divided into five sections as follows: Section 2 presents the theoretical framework of the problem under review, especially regarding the use of the SOM as a tool for visualizing clusters from a spatial perspective; Section 3 proposes a method for visualizing clusters in geo-referenced data that uses the output space of a three-dimensional SOM; Section 4 presents the results and discusses practical applications of the method, including experiments with real and artificial data; finally, Section 5 presents the general conclusions.

2 The Self-Organizing Map

The SOM is an ANN based on an unsupervised learning process that performs a nonlinear mapping of high-dimensional input data onto an ordered and structured array of nodes, generally of lower dimension [6]. As a result of this process, and by combining the properties of vector quantization and vector projection algorithms, the SOM compresses information and reduces dimensionality [15, 16]. In its most usual form, the SOM algorithm performs a number of successive iterations until the reference vectors associated with the nodes of a two-dimensional network represent, as far as possible, the input patterns that are closest to those nodes (vector quantization). In the end, every input pattern in the data set is mapped to one of the network nodes (vector projection).
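The iterative procedure just described can be illustrated with a minimal batch-SOM sketch in Python/NumPy. It is not the SOM Toolbox implementation used in the experiments: the function names (`grid_coordinates`, `find_bmus`, `train_batch_som`), the random initialization from data samples and the linearly shrinking Gaussian radius are illustrative assumptions. Because the grid shape is a parameter, the same code builds a 2D or a 3D map, but only rectangular lattices are handled.

```python
import itertools
import numpy as np


def grid_coordinates(shape):
    """Integer output-space coordinates of every unit of a rectangular lattice."""
    return np.array(list(itertools.product(*(range(s) for s in shape))), dtype=float)


def find_bmus(data, codebook):
    """Vector projection: index of the best matching unit for each data vector."""
    # Squared Euclidean distances via |x|^2 - 2 x.m + |m|^2, shape (n_samples, n_units)
    d2 = (
        (data ** 2).sum(axis=1)[:, None]
        - 2.0 * data @ codebook.T
        + (codebook ** 2).sum(axis=1)[None, :]
    )
    return d2.argmin(axis=1)


def train_batch_som(data, shape, epochs=20, sigma_start=None, sigma_end=0.5, seed=0):
    """Minimal batch SOM with a Gaussian neighborhood on a rectangular lattice."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    coords = grid_coordinates(shape)
    # Random initialization: reference vectors drawn from the data
    codebook = data[rng.choice(len(data), size=len(coords), replace=True)].copy()
    if sigma_start is None:
        sigma_start = max(shape) / 2.0
    # Squared distances between every pair of units in the output space
    grid_d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
    for t in range(epochs):
        # Neighborhood radius shrinks linearly from sigma_start to sigma_end
        sigma = sigma_start + (sigma_end - sigma_start) * t / max(epochs - 1, 1)
        bmus = find_bmus(data, codebook)  # vector quantization step
        # h[j, k]: Gaussian neighborhood weight between unit j and the BMU of sample k
        h = np.exp(-grid_d2[:, bmus] / (2.0 * sigma ** 2))
        denom = h.sum(axis=1, keepdims=True)
        # Batch rule: each reference vector becomes the neighborhood-weighted mean
        # of the data; units whose weights all underflow to zero keep their vector
        updated = (h @ data) / np.where(denom > 0, denom, 1.0)
        codebook = np.where(denom > 0, updated, codebook)
    return codebook, coords
```

For example, `train_batch_som(X, shape=(3, 3, 3))` trains a 27-unit 3D map on an array `X` of shape (n_samples, n_features), and `find_bmus(X, codebook)` then returns one BMU index per input pattern. The classic batch update replaces each reference vector by a neighborhood-weighted mean of the data, so this sketch has no learning-rate parameter.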
After this optimization process, topological relations among input patterns are, whenever possible, preserved through the mapping, allowing the similarities and dissimilarities in the data to be represented in the output space [7]. Therefore, the SOM algorithm establishes a nonlinear mapping between the input data space and the map grid, which is called the output space. Because the SOM converts nonlinear statistical relationships that exist in the data into geometric relationships that can be represented visually [6, 7], it can be considered a visualization method for multidimensional data that is especially suited to displaying clustering structure [17, 18], or, in other words, a diagram of clusters [7]. Compared with other clustering tools, the SOM is distinguished mainly by the fact that, during the learning process, the algorithm tries to guarantee the topological ordering of its units, thus allowing an analysis of proximity between the clusters and the visualization of their structure [13].

In order to make the SOM a better tool for exploratory data analysis, several methods have been developed to extend its capabilities. These methods explore both perspectives of the SOM: vector projection (an output-space perspective) and vector quantization (an input-data-space perspective). Usually, the visualization of the SOM is based on two-dimensional constructs such as the U-Matrix [19, 20], component planes, hit maps and similar variants [20, 21], or on exploring the data topology [22]. However, this paper does not address all of these strategies, only those that allow clusters to be visualized from a geo-spatial perspective. Typically, a clustering tool should ensure the representation of the existing data patterns, the definition of proximity between these patterns, the characterization of clusters and the final evaluation of the output [3].
In the case of geo-referenced data, the clustering tool should also ensure that the groups are formed in line with geographical closeness [13]. Thus, the geo-spatial perspective is a crucial point that distinguishes clustering geo-referenced data from clustering other data. Recognizing this, and knowing that the SOM can be visualized by means other than the commonly used methods, we now look at one specific approach that has been proposed to deal with geo-spatial features. An alternative way to visualize the SOM takes advantage of the very nature of geo-referenced data, coloring the geographic map with label colors obtained from the SOM units [13]. This approach is proposed in the Prototypically Exploratory Geovisualization Environment (PEGE) [23]. This software makes it possible to link the SOM to the geographic representation of the data by color, allowing the data to be analyzed from a geo-spatial perspective. One possible application of PEGE, which constitutes the basis of this paper, consists of assigning colors to the map units of a 2D SOM according to some criterion (similarity by example) and then coloring the geo-referenced elements with those colors. Fig. 1 shows an example of clustering geo-referenced data based on this method. A color was assigned to each map unit of a 2D SOM with nine units (3 × 3). This map was trained with data on the main causes of death in several European countries. As this example shows, the geo-spatial perspective seems essential to understanding this particular phenomenon.
Fig. 1. The principal causes of death with a 2D SOM. This example was obtained by training a 2D SOM with data on the main causes of death in several European countries. Each country was painted with the color of its BMU in the SOM. Data source: EUROSTAT.

3 Clustering Geo-referenced Data With 3D SOM

In this section we propose a clustering method for geo-referenced data based on a visualization of the output space of a 3D SOM. The method simply associates each of the three orthogonal axes (x, y and z) that define the SOM grid with one of the three primary colors: red, green and blue (RGB scheme). As a result, each of the three dimensions of the 3D SOM is expressed by a change in the tone of one particular primary color, and each SOM unit receives a distinct color label. We can then paint each geographic element with the color of its BMU. Fig. 2 schematically represents a SOM with 27 units (3 × 3 × 3) in RGB space, followed by the geographical representation of several geo-referenced elements painted with the color labels of their BMUs.

Formally, let us consider a 3D SOM of size $u \times v \times w$ with a rectangular topology. The SOM grid, or output space, $N$ is a set of $u \cdot v \cdot w$ units (nodes) defined in $\mathbb{R}^3$, such that:

$$N = \{\, n_i = [x \;\, y \;\, z]^T \in \mathbb{R}^3 : i = 1, 2, \ldots, (u \cdot v \cdot w) \,\} \tag{1}$$

where $x$, $y$ and $z$ are the unit coordinates in the output space, such that:

$$x \in \{0, 1, \ldots, (u-1)\}, \qquad y \in \{0, 1, \ldots, (v-1)\}, \qquad z \in \{0, 1, \ldots, (w-1)\} \tag{2}$$
Fig. 2. Linking the SOM's knowledge to the cartographic representation. A color is assigned to each SOM unit (following the topological order). Then the geo-referenced elements are painted with the color of their BMUs in the SOM.

These coordinates must be adjusted to fit the RGB values, which typically vary between 0 and 1. The new coordinates $(R, G, B)$ of unit $n_i$ in RGB space can be obtained through range normalization of the initial values:

$$R = \frac{x}{u-1}, \qquad G = \frac{y}{v-1}, \qquad B = \frac{z}{w-1} \tag{3}$$

Finally, the interior of the polygon that defines each geo-referenced element mapped to unit $n_i$ (its BMU) receives the color $(R, G, B)$, as shown in Fig. 2. The process is then repeated for all units of the map grid.

4 Experimental Results

To quantify the effectiveness of the proposed method, we conducted several experiments. In this section we present the experimental results obtained with two geo-referenced data sets: the first uses artificial data, for which we know exactly the number and extent of the clusters; the second uses real data.
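Equations (1) to (3) and the painting step translate directly into code. The sketch below, in Python/NumPy, enumerates the units of a $u \times v \times w$ grid, range-normalizes their coordinates into RGB labels and looks up each element's color from its BMU index. The function names are hypothetical, the unit ordering (`itertools.product`, with x varying slowest) is an assumption, and the guard for axes of size 1 is not part of the method; drawing the actual polygons is left to whatever GIS or plotting tool is used.

```python
import itertools
import numpy as np


def unit_rgb_colors(u, v, w):
    """Output-space coordinates (Eqs. 1-2) and RGB labels (Eq. 3) of a u x v x w SOM."""
    coords = np.array(
        list(itertools.product(range(u), range(v), range(w))), dtype=float
    )
    # Range normalization of each axis to [0, 1]; the max(..., 1) guard is an
    # added assumption so that an axis of size 1 does not divide by zero
    scale = np.array([max(u - 1, 1), max(v - 1, 1), max(w - 1, 1)], dtype=float)
    return coords, coords / scale


def paint_elements(element_bmus, rgb):
    """Color of each geo-referenced element = RGB label of its BMU."""
    return rgb[np.asarray(element_bmus)]
```

With $u = v = w = 3$ (the 27-unit map of Fig. 2), unit (2, 0, 0), index 18 under this ordering, is pure red, unit (0, 0, 0) is black and unit (2, 2, 2) is white.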
Experiment with Artificial Data

To illustrate the use of three-dimensional SOMs for clustering geo-referenced data, we designed a dataset inspired by one of the fields of application for this kind of tool: ecological modeling. In this case, the geo-referenced dataset refers to an area of intensive fishing where there is particular interest in the spatial analysis of the distribution of five species of great commercial importance. The dataset was constructed to characterize 225 sea areas based exclusively on their biodiversity. We simulated a sampling procedure, assuming that each sample was representative of an area of approximately 50 square miles. All samples are geo-referenced to the centroid of the area, defined by geographical coordinates (x and y), and their attributes are the amounts of each of the five species of interest, expressed in tons. The initial data set was designed so that the variables are on the same scale. However, as the variables have very different variances, a Z-score normalization was carried out to guarantee that all variances are equal to 1. As we can see in Fig. 3 and Fig. 4, the map has a total of twelve well-defined areas (geo-clusters), including a few small areas of spatial outliers. In fact, if we analyze only the attributes that refer to the species of interest, there are only eight distinct groups of data. Fig. 3 also represents the distribution of each variable; the dark areas correspond to high values of each variable.

Fig. 3. Artificial dataset, with the distribution of each variable. The dark areas correspond to high values of each variable: (a) Variable 1; (b) Variable 2; (c) Variable 3; (d) Variable 4; (e) Variable 5.
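A minimal sketch of the Z-score normalization, assuming the attributes are stored column-wise in a NumPy array. The paper does not say whether the population or the sample standard deviation was used, so the population form (NumPy's default `ddof=0`) is assumed here, which makes each column's variance exactly 1 under that definition. The handling of constant columns is also an assumption.

```python
import numpy as np


def zscore(data):
    """Column-wise Z-score: each variable gets zero mean and unit variance."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=0)
    std = data.std(axis=0)  # population standard deviation (ddof=0), an assumption
    # Assumption: constant columns are only centred, to avoid dividing by zero
    std = np.where(std > 0, std, 1.0)
    return (data - mean) / std
```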
Fig. 4. Artificial dataset. All twelve clusters are delimited.

The first experiment was conducted to compare SOMs with different dimensions (3D SOM versus 2D SOM). Considering the size of the data set (225 geo-referenced elements), we decided to use the following map sizes, with a total of 64 network units for both models:
- 2D SOM: 8 × 8;
- 3D SOM: …
In the experiments, we always used the SOM Batch Algorithm implemented in SOMToolbox [24] with the following parameterizations:
- Gaussian neighborhood function (several models with different neighborhood functions were tested, but the results were always better with this function);
- The lattice was rectangular for the 3D SOM (the only option allowed by SOMToolbox for SOMs with more than two dimensions) and hexagonal for the 2D SOM. The hexagonal lattice gives better results for 2D SOMs, and each of its units has the same number of neighbors as the units of the 3D SOMs (except, naturally, for the border units). By following this strategy we guarantee that the 3D SOM is compared with the best 2D SOM model;
- The learning rate was 0.5 for the unfolding phase and 0.05 for the fine-tuning phase;
- In both models we used an unfolding phase with 12 epochs and a fine-tuning phase with 48 epochs.

Random and linear initializations were tested. Five hundred models were assessed for both topologies (using random initialization), and although we present statistical numerical results, the figures were obtained with the particular SOM that we chose as the best model. Considering that all the measures available to assess map quality [6] have advantages and disadvantages, and that it is not possible to single out the best one, we chose, for both topologies, the models with the minimum quantization error (QE). We also analyzed the topological error (TE), but since it was always very low, this measure was not used to choose the final model. The topological error was calculated as the proportion of all data vectors for which the first and second BMUs are not adjacent units, i.e., for which the distance (measured in the output space) between the first and second BMUs is greater than 2 for the 2D SOM and 3 for the 3D SOM. The results are summarized in Table 1.

Table 1. Quantization error and topological error (standard deviations in parentheses; "…" marks values lost in this transcription)

| Initialization | Model | Measure | 2D SOM | 3D SOM |
|---|---|---|---|---|
| Random | Model with the minimum QE | QE | 0.3138 | 0.3697 |
| Random | Model with the minimum QE | TE | 0.… | … |
| Random | Average values | QE | 0.3333 (σ = 0.0105) | 0.4181 (σ = 0.0032) |
| Random | Average values | TE | 0.… (σ = 0.0214) | ….00326 (σ = 0.0095) |
| Linear | Linear initialization model | QE | 0.3282 | 0.4057 |
| Linear | Linear initialization model | TE | 0 | 0 |

Using the methodology proposed in Section 3, we obtain the cartographic representation of both models, the 2D SOM and the 3D SOM. In Fig. 5 we present the result of applying color labels that link the output space of a 2D SOM to the cartographic representation.
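The two quality measures and the model-selection rule can be sketched as follows. The quantization error is taken here as the mean Euclidean distance between each data vector and its BMU, a common definition that is assumed rather than quoted from the paper; the topological error follows the description above, with the adjacency threshold left as a parameter (`max_adjacent_dist`) to be set to whatever output-space distance is considered adjacent for the lattice at hand. Function names and the `qe`/`te` dictionary keys are hypothetical.

```python
import numpy as np


def _distances(data, codebook):
    """Euclidean distance from every data vector to every reference vector."""
    return np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)


def quantization_error(data, codebook):
    """Mean distance between each data vector and its BMU."""
    return float(_distances(data, codebook).min(axis=1).mean())


def topological_error(data, codebook, coords, max_adjacent_dist):
    """Proportion of data vectors whose first and second BMUs are not adjacent.

    coords holds the output-space coordinates of each unit; two units count as
    adjacent when their output-space distance is <= max_adjacent_dist.
    """
    order = np.argsort(_distances(data, codebook), axis=1)
    first, second = order[:, 0], order[:, 1]
    grid_dist = np.linalg.norm(coords[first] - coords[second], axis=1)
    return float(np.mean(grid_dist > max_adjacent_dist))


def pick_best(models, max_te=None):
    """Model with the minimum QE, optionally among those with TE <= max_te."""
    candidates = [m for m in models if max_te is None or m["te"] <= max_te]
    return min(candidates, key=lambda m: m["qe"])
```

Calling `pick_best` without `max_te` corresponds to the minimum-QE choice described above; passing `max_te` corresponds to choosing the minimum QE among models with an acceptable topological error.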
Fig. 5. Cartographic representation with 2D SOM. By inspecting the map we cannot identify more than six well-defined clusters, and there is a false continuum linking several zones.

As we can see, the cartographic representation of the 2D SOM does not show all eight clusters. In fact, by inspecting the geographic map we can hardly say that there are more than six clusters. As regards the differentiation of the twelve defined areas, there is a mixed zone composed of zones 5 and 7; there is a false continuum linking zone 4 to zone 6; and there is some ambiguity between zones 1 and 3 and between zones 2 and 4. In Fig. 6 we show the U-matrix of the 2D SOM, which exposes all eight clusters.

Fig. 6. U-Matrix of the 2D SOM.

Despite the results obtained with the cartographic representation of the 2D SOM (Fig. 5), it is important to note that the U-Matrix shows all eight groups very effectively. However, it is difficult to analyze this information from a geo-spatial perspective; in particular, it is difficult to identify the twelve different areas. Fig. 7 shows the geographic map with color labels obtained from the 3D SOM. In this particular case, the 3D SOM seems to expose all eight clusters and all twelve different areas. However, some doubts remain regarding some areas, especially zones 4 and 5.

Fig. 7. Cartographic representation with 3D SOM. All eight clusters are well defined. However, some doubts remain regarding zones 4 and 5.
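The U-Matrix in Fig. 6 encodes, for each unit, how far its reference vector lies from those of its neighbors, so that high values mark cluster borders. The sketch below computes a simplified per-unit variant (the mean distance to face-adjacent neighbors) on a rectangular lattice of any dimension. It is an illustrative assumption, not the hexagonal-lattice U-Matrix of Fig. 6, which also shows values between units. The codebook rows are assumed to be in C order (last grid axis varying fastest).

```python
import numpy as np


def unit_distance_map(codebook, shape):
    """Simplified U-Matrix on a rectangular lattice of any dimension.

    Returns, for each unit, the mean Euclidean distance between its reference
    vector and those of its face-adjacent neighbors. The codebook rows must be
    in C order (last grid axis varying fastest).
    """
    grid = np.asarray(codebook, dtype=float).reshape(*shape, -1)
    total = np.zeros(shape)
    count = np.zeros(shape)
    for axis in range(len(shape)):
        # Distance between each pair of consecutive units along this axis
        d = np.linalg.norm(np.diff(grid, axis=axis), axis=-1)
        lower = [slice(None)] * len(shape)
        upper = [slice(None)] * len(shape)
        lower[axis] = slice(0, -1)
        upper[axis] = slice(1, None)
        # Each edge contributes to both units it connects
        total[tuple(lower)] += d
        count[tuple(lower)] += 1
        total[tuple(upper)] += d
        count[tuple(upper)] += 1
    return total / np.maximum(count, 1)
```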
Lisbon's Metropolitan Area

Another experiment was conducted using a real geo-referenced data set to train several SOMs. This data set consists of 61 socio-demographic variables describing a total of 3978 geo-referenced elements belonging to Lisbon's metropolitan area (see Fig. 8). The data were collected during the 2001 census, and the variables describe the region according to five main areas of interest: type of construction, family structure, age structure, education levels and economic activities.

Fig. 8. Lisbon Metropolitan Area. The data set was collected during the 2001 census and consists of 61 socio-demographic variables describing a total of 3978 geo-referenced elements belonging to Lisbon's metropolitan area.

Because the variables have different scales and ranges, we performed a linear range normalization to guarantee that all variables take values between 0 and 1. As before, these tests were conducted to compare qualitatively SOMs with different dimensions. Taking into account the size of the data set (3978 geo-referenced elements), we chose the following map sizes, with a total of 512 network units for both the 3D SOM and the 2D SOM:
- 2D SOM: 16 × 32;
- 3D SOM: 8 × 8 × 8.
Once again, we used the SOM Batch Algorithm, parameterized as follows:
- Neighborhood function: Gaussian;
- The lattice was rectangular for the 3D SOM and hexagonal for the 2D SOM;
- The learning rate was 0.5 for the unfolding phase and 0.05 for the fine-tuning phase.
In both models we used an unfolding phase with 8 epochs and a fine-tuning phase with 24 epochs. Both random and linear initialization were tested. One hundred models were assessed for both topologies (with random initialization). Once more, for both topologies we chose the maps with the minimum quantization error among all models with an acceptable topological error. The results are summarized in Table 2.

Table 2. Quantization error and topological error (standard deviations in parentheses)

| Initialization | Model | Measure | 2D SOM | 3D SOM |
|---|---|---|---|---|
| Random | Model with the minimum QE | QE | 0.6170 | 0.6459 |
| Random | Model with the minimum QE | TE | 0.0339 | 0.0261 |
| Random | Average values | QE | 0.6197 (σ = 0.0010) | 0.6494 (σ = 0.0016) |
| Random | Average values | TE | 0.0371 (σ = 0.0031) | 0.0343 (σ = 0.0069) |
| Linear | Linear initialization model | QE | 0.6191 | 0.6458 |
| Linear | Linear initialization model | TE | 0.0422 | 0.0206 |

The analysis of the U-Matrix presented in Fig. 9 indicates that there are several clusters, some with well-defined borders. The darker blue shades represent dense areas in the input space; in contrast, the red shades indicate sparse areas. In this work the interest lies not in analyzing the existing clusters but essentially in comparing the representations offered by the two types of topology (2D SOM and 3D SOM).
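The linear range normalization applied to the census variables maps each variable to [0, 1] by subtracting its minimum and dividing by its range. A minimal NumPy sketch, assuming one column per variable; mapping constant columns to 0 is an assumption the paper does not discuss.

```python
import numpy as np


def range_normalize(data):
    """Linear range (min-max) normalization of each column to [0, 1]."""
    data = np.asarray(data, dtype=float)
    lo = data.min(axis=0)
    hi = data.max(axis=0)
    # Assumption: constant columns are mapped to 0 instead of dividing by zero
    span = np.where(hi > lo, hi - lo, 1.0)
    return (data - lo) / span
```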
Fig. 9. U-Matrix of a 2D SOM. The data set clearly has a very complex structure with several clusters.

Fig. 10 represents part of Lisbon's city center. The 2D SOM visualization in Fig. 10 (a) is much less informative than the 3D SOM visualization in Fig. 10 (b): in the cartographic representation, the results obtained with the 2D SOM are much less detailed than those obtained with the 3D SOM.

Fig. 10. Lisbon centre visualized with both 2D SOM and 3D SOM. (a) 2D SOM visualization; (b) 3D SOM visualization (output space only).

Naturally, the discrimination provided by the 3D SOM may be artificial and forced. However, the analysis of some particular differences between the maps points in the opposite direction: the differences are real, and some of them are better visualized with the inclusion of one more dimension. Consider the zone highlighted on both maps in Fig. 10. With the 2D SOM, the zone is similar to its neighborhood; with the 3D SOM, by contrast, it stands out. The zone indicated on the map is, in fact, different from its neighbors and corresponds to the old Lisbon center (Baixa Pombalina). The main difference (among others) is the construction profile: Lisbon's centre and the nearby zones consist essentially of buildings constructed before 1919, very different from the rest of the city. Overall, the 2D SOM does not seem to reflect the main differences in construction profile.

5 Conclusion

In this paper we have presented a method for clustering geo-referenced data using the three-dimensional SOM. The 3D SOM was compared with the 2D SOM on two datasets: an artificial dataset of 225 geo-referenced elements with 5 variables, and a real-life data set of 3978 geo-referenced elements described by 61 variables. The experiments were conducted with several parameterizations of the SOM algorithm in order to optimize the final results of both topologies. In the first experiment, using an artificial dataset with clusters and geo-clusters known a priori, the 3D SOM proved more effective in detecting the predefined homogeneous groups from a spatial perspective. Nevertheless, even with the additional dimension, some difficulties remain in correctly classifying all the geo-referenced elements. Regarding the effectiveness of the 3D SOM on real data, the 3D topology was, for the tested data set, much more informative and revealed differences between geo-referenced elements that were not accessible with the 2D SOM. However, the high discrimination of geo-referenced data provided by the 3D SOM creates a complex visualization scheme that makes it difficult to identify global trends in the data. The 3D SOM therefore seems better suited to a finer, more detailed analysis.

References

1. Openshaw, S., Developing Automated and Smart Spatial Pattern Exploration Tools for Geographical Information Systems Applications. The Statistician, (1).
2. Miller, H.J. and J. Han, Overview of geographic data mining and knowledge discovery, in Geographic Data Mining and Knowledge Discovery, H.J. Miller and J. Han, Editors. 2001, Taylor & Francis: London.
3. Jain, A.K., M.N. Murty, and P.J. Flynn, Data Clustering: A Review. ACM Computing Surveys, (3).
4. Koua, E.L., Using self-organizing maps for information visualization and knowledge discovery in complex geospatial datasets, in Proceedings of 21st International Cartographic Renaissance (ICC). Durban: International Cartographic Association.
5. Card, S.K., J.D. Mackinlay, and B. Shneiderman, Readings in Information Visualization: Using Vision to Think. 1999, Morgan Kaufmann Publishers: San Francisco.
6. Kohonen, T., Self-Organizing Maps. 3rd ed. Springer Series in Information Sciences, T.S. Huang, T. Kohonen, and M.R. Schroeder, Editors. 2001, Springer: New York.
7. Kohonen, T., The self-organizing map. Neurocomputing, (1-3).
8. Kohonen, T., The self-organizing map. Proceedings of the IEEE, (9).
9. Gersho, A., Principles of quantization. IEEE Transactions on Circuits and Systems, (7).
10. Gersho, A., Quantization. IEEE Communications Magazine, (5).
11. Buhmann, J. and H. Kühnel, Complexity optimized vector quantization: a neural network approach, in Proceedings of DCC '92, Data Compression Conference. 1992, IEEE Computer Society Press.
12. Flexer, A., On the use of self-organizing maps for clustering and visualization. Intelligent Data Analysis, (5).
13. Skupin, A. and P. Agarwal, What is a Self-organizing Map?, in Self-Organising Maps: Applications in Geographic Information Science, P. Agarwal and A. Skupin, Editors. 2008, John Wiley & Sons: Chichester, England.
14. Bação, F., V. Lobo, and M. Painho, The self-organizing map, the Geo-SOM, and relevant variants for geosciences. Computers & Geosciences, (2).
15. Vesanto, J., SOM-Based Data Visualization Methods. Intelligent Data Analysis, (2).
16. Vesanto, J., et al., SOM Toolbox for Matlab. Helsinki University of Technology: Espoo, Finland.
17. Himberg, J., A SOM based cluster visualization and its application for false coloring, in Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. Como, Italy.
18. Kaski, S., J. Venna, and T. Kohonen, Coloring that reveals high-dimensional structures in data, in Proceedings of the 6th International Conference on Neural Information Processing. Perth, WA: IEEE.
19. Ultsch, A. and H.P. Siemon, Kohonen's self organizing feature maps for exploratory data analysis, in Proceedings of the International Neural Network Conference. Paris: Kluwer Academic Press.
20. Ultsch, A., Maps for the visualization of high-dimensional data spaces, in Proceedings of the Workshop on Self-Organizing Maps. Kyushu, Japan.
21. Kraaijveld, M.A., J. Mao, and A.K. Jain, A nonlinear projection method based on Kohonen's topology preserving maps. IEEE Transactions on Neural Networks, (3).
22. Tasdemir, K. and E. Merenyi, Exploiting Data Topology in Visualization and Clustering of Self-Organizing Maps. IEEE Transactions on Neural Networks, (4).
23. Koua, E.L. and M. Kraak, An Integrated Exploratory Geovisualization Environment Based on Self-Organizing Map, in Self-Organising Maps: Applications in Geographic Information Science, P. Agarwal and A. Skupin, Editors. 2008, John Wiley & Sons: Chichester, England.
24. Alhoniemi, E., et al., SOM Toolbox
Self Organizing Maps: Fundamentals
Self Organizing Maps: Fundamentals Introduction to Neural Networks : Lecture 16 John A. Bullinaria, 2004 1. What is a Self Organizing Map? 2. Topographic Maps 3. Setting up a Self Organizing Map 4. Kohonen
Online data visualization using the neural gas network
Online data visualization using the neural gas network Pablo A. Estévez, Cristián J. Figueroa Department of Electrical Engineering, University of Chile, Casilla 412-3, Santiago, Chile Abstract A high-quality
Self-Organizing g Maps (SOM) COMP61021 Modelling and Visualization of High Dimensional Data
Self-Organizing g Maps (SOM) Ke Chen Outline Introduction ti Biological Motivation Kohonen SOM Learning Algorithm Visualization Method Examples Relevant Issues Conclusions 2 Introduction Self-organizing
AN EXPERT SYSTEM TO ANALYZE HOMOGENEITY IN FUEL ELEMENT PLATES FOR RESEARCH REACTORS
AN EXPERT SYSTEM TO ANALYZE HOMOGENEITY IN FUEL ELEMENT PLATES FOR RESEARCH REACTORS Cativa Tolosa, S. and Marajofsky, A. Comisión Nacional de Energía Atómica Abstract In the manufacturing control of Fuel
Visualization of textual data: unfolding the Kohonen maps.
Visualization of textual data: unfolding the Kohonen maps. CNRS - GET - ENST 46 rue Barrault, 75013, Paris, France (e-mail: [email protected]) Ludovic Lebart Abstract. The Kohonen self organizing
SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING
AAS 07-228 SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING INTRODUCTION James G. Miller * Two historical uncorrelated track (UCT) processing approaches have been employed using general perturbations
Visualization of Topology Representing Networks
Visualization of Topology Representing Networks Agnes Vathy-Fogarassy 1, Agnes Werner-Stark 1, Balazs Gal 1 and Janos Abonyi 2 1 University of Pannonia, Department of Mathematics and Computing, P.O.Box
In Apostolos-Paul N. Refenes, Yaser Abu-Mostafa, John Moody, and Andreas Weigend (Eds.) Neural Networks in Financial Engineering. Proceedings of the Third International Conference on Neural Networks in
Multiscale Object-Based Classification of Satellite Images Merging Multispectral Information with Panchromatic Textural Features
Remote Sensing and Geoinformation Lena Halounová, Editor not only for Scientific Cooperation EARSeL, 2011 Multiscale Object-Based Classification of Satellite Images Merging Multispectral Information with
Comparison of K-means and Backpropagation Data Mining Algorithms
Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and
Comparison of Supervised and Unsupervised Learning Classifiers for Travel Recommendations
Volume 3, No. 8, August 2012 Journal of Global Research in Computer Science REVIEW ARTICLE Available Online at www.jgrcs.info Comparison of Supervised and Unsupervised Learning Classifiers for Travel Recommendations
Standardization and Its Effects on K-Means Clustering Algorithm
Research Journal of Applied Sciences, Engineering and Technology 6(7): 399-3303, 03 ISSN: 040-7459; e-issn: 040-7467 Maxwell Scientific Organization, 03 Submitted: January 3, 03 Accepted: February 5, 03
A Study of Web Log Analysis Using Clustering Techniques
A Study of Web Log Analysis Using Clustering Techniques Hemanshu Rana 1, Mayank Patel 2 Assistant Professor, Dept of CSE, M.G Institute of Technical Education, Gujarat India 1 Assistant Professor, Dept
Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data
CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear
EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set
EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin
Comparing large datasets structures through unsupervised learning
Comparing large datasets structures through unsupervised learning Guénaël Cabanes and Younès Bennani LIPN-CNRS, UMR 7030, Université de Paris 13 99, Avenue J-B. Clément, 93430 Villetaneuse, France [email protected]
An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
Specific Usage of Visual Data Analysis Techniques
Specific Usage of Visual Data Analysis Techniques Snezana Savoska 1 and Suzana Loskovska 2 1 Faculty of Administration and Management of Information systems, Partizanska bb, 7000, Bitola, Republic of Macedonia
Analysis of Performance Metrics from a Database Management System Using Kohonen s Self Organizing Maps
WSEAS Transactions on Systems Issue 3, Volume 2, July 2003, ISSN 1109-2777 629 Analysis of Performance Metrics from a Database Management System Using Kohonen s Self Organizing Maps Claudia L. Fernandez,
Visualization by Linear Projections as Information Retrieval
Visualization by Linear Projections as Information Retrieval Jaakko Peltonen Helsinki University of Technology, Department of Information and Computer Science, P. O. Box 5400, FI-0015 TKK, Finland [email protected]
NETWORK-BASED INTRUSION DETECTION USING NEURAL NETWORKS
1 NETWORK-BASED INTRUSION DETECTION USING NEURAL NETWORKS ALAN BIVENS [email protected] RASHEDA SMITH [email protected] CHANDRIKA PALAGIRI [email protected] BOLESLAW SZYMANSKI [email protected] MARK
CITY UNIVERSITY OF HONG KONG 香 港 城 市 大 學. Self-Organizing Map: Visualization and Data Handling 自 組 織 神 經 網 絡 : 可 視 化 和 數 據 處 理
CITY UNIVERSITY OF HONG KONG 香 港 城 市 大 學 Self-Organizing Map: Visualization and Data Handling 自 組 織 神 經 網 絡 : 可 視 化 和 數 據 處 理 Submitted to Department of Electronic Engineering 電 子 工 程 學 系 in Partial Fulfillment
Machine Learning and Data Mining. Regression Problem. (adapted from) Prof. Alexander Ihler
Machine Learning and Data Mining Regression Problem (adapted from) Prof. Alexander Ihler Overview Regression Problem Definition and define parameters ϴ. Prediction using ϴ as parameters Measure the error
High-dimensional labeled data analysis with Gabriel graphs
High-dimensional labeled data analysis with Gabriel graphs Michaël Aupetit CEA - DAM Département Analyse Surveillance Environnement BP 12-91680 - Bruyères-Le-Châtel, France Abstract. We propose the use
Topic Maps Visualization
Topic Maps Visualization Bénédicte Le Grand, Laboratoire d'informatique de Paris 6 Introduction Topic maps provide a bridge between the domains of knowledge representation and information management. Topics
Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches
Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic
Visualizing an Auto-Generated Topic Map
Visualizing an Auto-Generated Topic Map Nadine Amende 1, Stefan Groschupf 2 1 University Halle-Wittenberg, information manegement technology [email protected] 2 media style labs Halle Germany [email protected]
The Research of Data Mining Based on Neural Networks
2011 International Conference on Computer Science and Information Technology (ICCSIT 2011) IPCSIT vol. 51 (2012) (2012) IACSIT Press, Singapore DOI: 10.7763/IPCSIT.2012.V51.09 The Research of Data Mining
Utilizing spatial information systems for non-spatial-data analysis
Jointly published by Akadémiai Kiadó, Budapest Scientometrics, and Kluwer Academic Publishers, Dordrecht Vol. 51, No. 3 (2001) 563 571 Utilizing spatial information systems for non-spatial-data analysis
STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and
Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table
Data, Measurements, Features
Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are
A Comparative Study of clustering algorithms Using weka tools
A Comparative Study of clustering algorithms Using weka tools Bharat Chaudhari 1, Manan Parikh 2 1,2 MECSE, KITRC KALOL ABSTRACT Data clustering is a process of putting similar data into groups. A clustering
Data Exploration and Preprocessing. Data Mining and Text Mining (UIC 583 @ Politecnico di Milano)
Data Exploration and Preprocessing Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann
Analecta Vol. 8, No. 2 ISSN 2064-7964
EXPERIMENTAL APPLICATIONS OF ARTIFICIAL NEURAL NETWORKS IN ENGINEERING PROCESSING SYSTEM S. Dadvandipour Institute of Information Engineering, University of Miskolc, Egyetemváros, 3515, Miskolc, Hungary,
Neural Network Add-in
Neural Network Add-in Version 1.5 Software User s Guide Contents Overview... 2 Getting Started... 2 Working with Datasets... 2 Open a Dataset... 3 Save a Dataset... 3 Data Pre-processing... 3 Lagging...
Neural networks and their rules for classification in marine geology
Neural networks and their rules for classification in marine geology Alfred Ultsch 1, Dieter Korus 2, Achim Wehrmann 3 Abstract Artificial neural networks are more and more used for classification. They
Clustering. Data Mining. Abraham Otero. Data Mining. Agenda
Clustering 1/46 Agenda Introduction Distance K-nearest neighbors Hierarchical clustering Quick reference 2/46 1 Introduction It seems logical that in a new situation we should act in a similar way as in
Segmentation of building models from dense 3D point-clouds
Segmentation of building models from dense 3D point-clouds Joachim Bauer, Konrad Karner, Konrad Schindler, Andreas Klaus, Christopher Zach VRVis Research Center for Virtual Reality and Visualization, Institute
Fuel Cell Health Monitoring Using Self Organizing Maps
A publication of CHEMICAL ENGINEERINGTRANSACTIONS VOL. 33, 2013 Guest Editors: Enrico Zio, Piero Baraldi Copyright 2013, AIDIC ServiziS.r.l., ISBN 978-88-95608-24-2; ISSN 1974-9791 The Italian Association
COMPARISON OF OBJECT BASED AND PIXEL BASED CLASSIFICATION OF HIGH RESOLUTION SATELLITE IMAGES USING ARTIFICIAL NEURAL NETWORKS
COMPARISON OF OBJECT BASED AND PIXEL BASED CLASSIFICATION OF HIGH RESOLUTION SATELLITE IMAGES USING ARTIFICIAL NEURAL NETWORKS B.K. Mohan and S. N. Ladha Centre for Studies in Resources Engineering IIT
Content Based Analysis of Email Databases Using Self-Organizing Maps
A. Nürnberger and M. Detyniecki, "Content Based Analysis of Email Databases Using Self-Organizing Maps," Proceedings of the European Symposium on Intelligent Technologies, Hybrid Systems and their implementation
Credit Card Fraud Detection Using Self Organised Map
International Journal of Information & Computation Technology. ISSN 0974-2239 Volume 4, Number 13 (2014), pp. 1343-1348 International Research Publications House http://www. irphouse.com Credit Card Fraud
GEO-VISUALIZATION SUPPORT FOR MULTIDIMENSIONAL CLUSTERING
Geoinformatics 2004 Proc. 12th Int. Conf. on Geoinformatics Geospatial Information Research: Bridging the Pacific and Atlantic University of Gävle, Sweden, 7-9 June 2004 GEO-VISUALIZATION SUPPORT FOR MULTIDIMENSIONAL
The Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
QoS Mapping of VoIP Communication using Self-Organizing Neural Network
QoS Mapping of VoIP Communication using Self-Organizing Neural Network Masao MASUGI NTT Network Service System Laboratories, NTT Corporation -9- Midori-cho, Musashino-shi, Tokyo 80-88, Japan E-mail: [email protected]
PRACTICAL DATA MINING IN A LARGE UTILITY COMPANY
QÜESTIIÓ, vol. 25, 3, p. 509-520, 2001 PRACTICAL DATA MINING IN A LARGE UTILITY COMPANY GEORGES HÉBRAIL We present in this paper the main applications of data mining techniques at Electricité de France,
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar
Novelty Detection in image recognition using IRF Neural Networks properties
Novelty Detection in image recognition using IRF Neural Networks properties Philippe Smagghe, Jean-Luc Buessler, Jean-Philippe Urban Université de Haute-Alsace MIPS 4, rue des Frères Lumière, 68093 Mulhouse,
International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
