Visualization of large data sets using MDS combined with LVQ.
|
|
|
- Arlene Darleen Cole
- 10 years ago
- Views:
Transcription
1 Visualization of large data sets using MDS combined with LVQ. Antoine Naud and Włodzisław Duch Department of Informatics, Nicholas Copernicus University, Grudziądzka 5, Toruń, Poland. Abstract. A common task in data mining is the visualization of multivariate objects using various methods, allowing human observers to perceive subtle inter-relations in the dataset. Multidimensional scaling (MDS) is a well known technique used for this purpose, but it due to its computational complexity there are limitations on the number of objects that can be displayed. Combining MDS with a clustering method as Learning Vector Quantization allows to obtain displays of large databases that preserve both high accuracy of clustering methods and good visualization properties. 1 Introduction Visualization helps in data exploration by providing the user an idea about clusters in data and their mutual relations, shows outliers and allows for rough classification of data. A very popular way to visualize data is based on self-organizing topographic feature maps (SOM) [1], known also as the Kohonen networks. SOM is so popular because the method may be applied to very large databases and provides clusterization as well as visualization at the same time. Unfortunately SOM networks are among the worst classifiers, as found in empirical studies [2]. There are many better clustering methods than SOM, for example methods based on dendrograms. We have compared SOM visualization using simplexes (up to dimension 20), hypercubes and points on the hypersphere, with a method that preserves correct topographical relations among multidimensional objects [3,4]. These results clearly show that minimization of a quantitative measures for the distortions of original data topography, i.e. the multidimensional scaling technique [5,6], generates maps that preserve both local and global features in a better way than SOM does. Unfortunately MDS requires difficult minimization of cost function that is based on distances among the represented multidimensional objects. For N objects N(N 1)/2 distances enter the cost function, and if maps are 2-dimensional they have 2N 3 adaptive parameters (position (x, y) of an arbitrary point and the rotation angle, or y coordinate of a second point, may be fixed). The MDS-based interactive software developed by one of us [7] performs a mapping of multivariate data from a high-dimensional space (D-dimensional space, D 3) to data points in a lower d-dimensional space (d D). Usually d = 2in order to allow visualization by scatterplots, but higher d values may be useful for dimensionality reduction. The MDS dimensionality reduction is such that similar
2 (or close, in the sense of some distance measure in the D-dimensional space) multivariate objects are mapped on representative points close to each other in the d- dimensional representation space. The mapping should preserve topography of data vectors. Linear projections are often not able to preserve topography; such mappings have to be non-linear. Topography preservation may be unreachable in the whole feature space if the analyzed multivariate data is intrinsically high dimensional and cannot be imbedded in a lower dimensional space without much distortion. Combination of this algorithm with clusterization methods have several advantages. First, hierachical clusterization methods or any other clusterization may be used. Second, information about the cluster centers may be displayed on the map and combined with MDS information, giving more details than dendrograms typically used to display clusters. Third, multi-level mapping, starting from large clusters and creating smaller ones, may help to find a better MDS solution since initially smaller number of points are mapped. Fourth, it may be applied to large datasets, and thus be competitive to SOM, providing good clusterization and visualization. The process that combines clusterization with dimensionality reduction is described below. We have used Learning Vector Quantization (LVQ) for clusterization [1] but any other algorithm can be used. Since it is not clear how accurate are the maps created in this way, we have compared them to the original, fully optimized maps in the third section. A short discussion concludes this paper. 2 Combining MDS with LVQ A direct visualization by MDS of large datasets may be unpracticable due to the limitation on the number of simultaneously mapped objects. Various approaches have been proposed to tackle this problem. Our approach is similar to the so-called frame method [8]. It proceeds as follows: First build (by some heuristics) a smaller dataset (called the basis B), designed as a first approximation of the original large dataset. Then map the Basis using standard MDS. Finally add to the mapped Basis all the remaining cases from the original dataset. Let us denote D the dataset to be visualized and N D the number of cases in this dataset. Let B denote the basis and N B be the size of the basis, N t = N m + N f is the total number of points considered during the mapping, F = {P i,i = 1,N f } the set of known data points and by M = {P i,i = 1,N m } the set of new data points. In Chang and Lee [8] the basis B is build by selecting a subset of points from D, as a result the final display highly depends on which points are selected. We prefer to use a clustering technique on the dataset in order to obtain N B cluster centers that more uniformly represent the data. For the final step, we use relative MDS [13], where each new data is individually added to the mapped Basis using an algorithm that is much less time consuming than the full Stress minimization. The general process proposed for mapping large datasets is shown in figure 1. Relative MDS mapping differs from standard MDS in this respect that during minimization of the topography distortion measure (called Stress") only the points from M set are allowed to move while the points from F set are kept fixed. This is
3 N cases D Dataset Basis 1 LVQ 2 N B cases F features 3 relative MDS MDS Map F features 2-D Fig. 1. Process used for the visualization of large datasets: 1 cluster the data using LVQ to form the so-called Basis, 2 Map the Basis using standard metric MDS, 3 Add the remaining Dataset cases to the mapped Basis using relative MDS. achieved by modifying the Stress function so that it sums only over the distances that change during mapping, i.e. the distances between the added and the basis points, and the added points interpoint distances. The original Stress function: is redefined as: S(x)= S r (x)= N D w ( ) ij dˆ 2 ij d ij (1) ij N D N B i=1 j=1 w ij ( ˆ d ij d ij ) 2 (2) In relative mapping, the order in which the data are added does not influence the final result because each case is added independently from the other cases. Relative mapping can also be seen as an alternative to methods designed for the purpose of giving generalization capability to Sammon mapping, such as the ANN Sammon mapping [9], Neuroscale [10] or incremental scaling [11]. 3 Visualization of the Satimage dataset It is interesting to see where a new data points falls among known cases, discover the class of its neighbors (classified or labeled), and to get an insight on how a classifier would evaluate this new data. Satimage dataset comes from the Landsat MSS imagery, it consists of four digital images of the same scene in different spectral bands. This dataset is often used to perform comparison of classifiers, and it is part
4 of the UCI repository [12]. The dataset is made of N D = 4435 cases with 36 numerical features in the range 0 to 255. Each case stands for one pixel in the image, the pixels are attributed to 6 classes standing for different soil types photographed. The visualization of the N D cases has been performed by clustering the data into N B cluster centers. As we are interested in determining the optimal size for the basis, we performed a series of mappings with different values for N B and compared the resulting displays visually and by means of the final Stress. The following values were chosen: N B = 100,300,500,700,1000,1200,1300, this last value is the maximum reachable with 256 MB of RAM. The final Stress values obtained are presented in Fig. 2. As can be expected, the final Stress value decreases when the Basis size increases. The curve should converge towards the lowest Stress obtained when mapping all N D cases in one MDS run. Stress number of code vectors Fig. 2. Final Stress values obtained after relative mapping of the dataset cases added to a Basis of 100, 300, 500, 1000, 1200 and 1300 LVQ code vectors. The whole mapped Satimage dataset is shown in figure 3. It can be seen that a Basis size of N B = 500 cluster centers is enough to get quite good display because the obtained configuration does not change much for higher sizes. This optimal size should also be observed on the curve of Fig. 2, where the speed" of Stress decrease slows down. 4 Discussion The main advantage of MDS dimensionality reduction is that the topographical distortions induced can evaluated by the value of the Stress function reached during
5 minimization. In this paper we have shown that the main disadvantage of MDS, its computational complexity, may be removed by mapping smaller number of cluster centers and adding the remining points using relative maping. Interactive zooming on interesting areas of the input space [13] allows for exploration of the neighborhood of the case under inspection. Since there may be some uncertainty in the input one may generate a number (for example 100) of vectors drawn from a Gaussian distribution centered at this vector. The class of these additional points is determined using neural or other classification systems and corresponding points added (using relative mapping) to the map. Visual inspection of such map allows estimating the probability of misclassification, proportional to the number of points from alternative classes. Acknowledgments: Support by the Polish Committee for Scientific Research, grant 8 T11C , is gratefully acknowledged. References 1. Kohonen T. (1995) Self-Organizing Maps. Springer-Verlag, Heidelberg Berlin 2. D. Michie, D. J. Spiegelhalter, and C. C. Taylor, editors. Machine Learning, Neural and Statistical Classification. Ellis Horwood, New York, Duch W, Naud A. (1996) Multidimensional scaling and Kohonen s self-organizing maps. In Proc. 2nd Conf. eural Networks and their Applications, Szczyrk, Poland, Vol. I, pp Duch W, Naud A. (1996) On global self-organizing maps. In Proc. 4th European Symposium on Artificial Neural Networks, Brugges, pp Kruskal J. B. (1964) Non metric multidimensional scaling : a numerical method. Psychometrika 29, Sammon J. W. (1969) A nonlinear mapping for data analysis. IEEE Transactions on Computers 5, Naud A. (2001) Neural and statistical methods for the visualization of multidimensional data. PhD Thesis, Dept. of Informatics, Nicholas Copernicus University, Poland Chang C. L, Lee R. C. T. (1973) A heuristic relaxation method for nonlinear mapping in cluster analysis. IEEE Transactions on Systems, Man, and Cybernetics 3, Kraaijveld M. A, Mao J, Jain A. K. (1995) A nonlinear projection method based on Kohonen s topology preserving maps. IEEE Transactions on Neural Networks, 6, Tipping M. E. (1996) Topographic mappings and feed-forward neural networks. PhD thesis, Aston University, Birmingham, UK 11. Basalaj W. (1999) Incremental multidimensional scaling method for database visualization. In Proceedings of Visual data Exploration and Analysis VI, SPIE, Vol. 3643, Blake, C.L, Merz, C.J. (1998). UCI Repository of machine learning databases mlearn/mlrepository.html. Irvine, CA: University of California, Department of Information and Computer Science. 13. Naud A, Duch W. (2000) Interactive data exploration using MDS mapping. In: 5th Conf. on Neural Networks and Soft Computing, Zakopane, Poland, pp
6 grey soil damp grey soil soil with vegetation stubble very damp grey soil cotton crop red soil (a) 100 code vectors, S = (b) 500 code vectors, S = (c) 1000 code vectors, S = (d) 1300 code vectors, S = Fig. 3. Visualization of Satimage dataset using different numbers of code vectors for LVQ clustering, to which the whole dataset has been added by relative mapping.
INTERACTIVE DATA EXPLORATION USING MDS MAPPING
INTERACTIVE DATA EXPLORATION USING MDS MAPPING Antoine Naud and Włodzisław Duch 1 Department of Computer Methods Nicolaus Copernicus University ul. Grudziadzka 5, 87-100 Toruń, Poland Abstract: Interactive
A Computational Framework for Exploratory Data Analysis
A Computational Framework for Exploratory Data Analysis Axel Wismüller Depts. of Radiology and Biomedical Engineering, University of Rochester, New York 601 Elmwood Avenue, Rochester, NY 14642-8648, U.S.A.
ViSOM A Novel Method for Multivariate Data Projection and Structure Visualization
IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 13, NO. 1, JANUARY 2002 237 ViSOM A Novel Method for Multivariate Data Projection and Structure Visualization Hujun Yin Abstract When used for visualization of
Data topology visualization for the Self-Organizing Map
Data topology visualization for the Self-Organizing Map Kadim Taşdemir and Erzsébet Merényi Rice University - Electrical & Computer Engineering 6100 Main Street, Houston, TX, 77005 - USA Abstract. The
Self Organizing Maps for Visualization of Categories
Self Organizing Maps for Visualization of Categories Julian Szymański 1 and Włodzisław Duch 2,3 1 Department of Computer Systems Architecture, Gdańsk University of Technology, Poland, [email protected]
A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization
A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization Ángela Blanco Universidad Pontificia de Salamanca [email protected] Spain Manuel Martín-Merino Universidad
High-dimensional labeled data analysis with Gabriel graphs
High-dimensional labeled data analysis with Gabriel graphs Michaël Aupetit CEA - DAM Département Analyse Surveillance Environnement BP 12-91680 - Bruyères-Le-Châtel, France Abstract. We propose the use
Visualization of Breast Cancer Data by SOM Component Planes
International Journal of Science and Technology Volume 3 No. 2, February, 2014 Visualization of Breast Cancer Data by SOM Component Planes P.Venkatesan. 1, M.Mullai 2 1 Department of Statistics,NIRT(Indian
Categorical Data Visualization and Clustering Using Subjective Factors
Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,
Meta-learning. Synonyms. Definition. Characteristics
Meta-learning Włodzisław Duch, Department of Informatics, Nicolaus Copernicus University, Poland, School of Computer Engineering, Nanyang Technological University, Singapore [email protected] (or search
Meta-learning via Search Combined with Parameter Optimization.
Meta-learning via Search Combined with Parameter Optimization. Włodzisław Duch and Karol Grudziński Department of Informatics, Nicholas Copernicus University, Grudziądzka 5, 87-100 Toruń, Poland. www.phys.uni.torun.pl/kmk
An Analysis on Density Based Clustering of Multi Dimensional Spatial Data
An Analysis on Density Based Clustering of Multi Dimensional Spatial Data K. Mumtaz 1 Assistant Professor, Department of MCA Vivekanandha Institute of Information and Management Studies, Tiruchengode,
Visualization of Topology Representing Networks
Visualization of Topology Representing Networks Agnes Vathy-Fogarassy 1, Agnes Werner-Stark 1, Balazs Gal 1 and Janos Abonyi 2 1 University of Pannonia, Department of Mathematics and Computing, P.O.Box
Multiscale Object-Based Classification of Satellite Images Merging Multispectral Information with Panchromatic Textural Features
Remote Sensing and Geoinformation Lena Halounová, Editor not only for Scientific Cooperation EARSeL, 2011 Multiscale Object-Based Classification of Satellite Images Merging Multispectral Information with
Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets
Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets Macario O. Cordel II and Arnulfo P. Azcarraga College of Computer Studies *Corresponding Author: [email protected]
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam
Selection of the Suitable Parameter Value for ISOMAP
1034 JOURNAL OF SOFTWARE, VOL. 6, NO. 6, JUNE 2011 Selection of the Suitable Parameter Value for ISOMAP Li Jing and Chao Shao School of Computer and Information Engineering, Henan University of Economics
CITY UNIVERSITY OF HONG KONG 香 港 城 市 大 學. Self-Organizing Map: Visualization and Data Handling 自 組 織 神 經 網 絡 : 可 視 化 和 數 據 處 理
CITY UNIVERSITY OF HONG KONG 香 港 城 市 大 學 Self-Organizing Map: Visualization and Data Handling 自 組 織 神 經 網 絡 : 可 視 化 和 數 據 處 理 Submitted to Department of Electronic Engineering 電 子 工 程 學 系 in Partial Fulfillment
Visualizing class probability estimators
Visualizing class probability estimators Eibe Frank and Mark Hall Department of Computer Science University of Waikato Hamilton, New Zealand {eibe, mhall}@cs.waikato.ac.nz Abstract. Inducing classifiers
Data Mining and Neural Networks in Stata
Data Mining and Neural Networks in Stata 2 nd Italian Stata Users Group Meeting Milano, 10 October 2005 Mario Lucchini e Maurizo Pisati Università di Milano-Bicocca [email protected] [email protected]
Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches
Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic
Reconstructing Self Organizing Maps as Spider Graphs for better visual interpretation of large unstructured datasets
Reconstructing Self Organizing Maps as Spider Graphs for better visual interpretation of large unstructured datasets Aaditya Prakash, Infosys Limited [email protected] Abstract--Self-Organizing
Online data visualization using the neural gas network
Online data visualization using the neural gas network Pablo A. Estévez, Cristián J. Figueroa Department of Electrical Engineering, University of Chile, Casilla 412-3, Santiago, Chile Abstract A high-quality
On the use of Three-dimensional Self-Organizing Maps for Visualizing Clusters in Geo-referenced Data
On the use of Three-dimensional Self-Organizing Maps for Visualizing Clusters in Geo-referenced Data Jorge M. L. Gorricha and Victor J. A. S. Lobo CINAV-Naval Research Center, Portuguese Naval Academy,
A SECURE DECISION SUPPORT ESTIMATION USING GAUSSIAN BAYES CLASSIFICATION IN HEALTH CARE SERVICES
A SECURE DECISION SUPPORT ESTIMATION USING GAUSSIAN BAYES CLASSIFICATION IN HEALTH CARE SERVICES K.M.Ruba Malini #1 and R.Lakshmi *2 # P.G.Scholar, Computer Science and Engineering, K. L. N College Of
Character Image Patterns as Big Data
22 International Conference on Frontiers in Handwriting Recognition Character Image Patterns as Big Data Seiichi Uchida, Ryosuke Ishida, Akira Yoshida, Wenjie Cai, Yaokai Feng Kyushu University, Fukuoka,
The Value of Visualization 2
The Value of Visualization 2 G Janacek -0.69 1.11-3.1 4.0 GJJ () Visualization 1 / 21 Parallel coordinates Parallel coordinates is a common way of visualising high-dimensional geometry and analysing multivariate
Comparing large datasets structures through unsupervised learning
Comparing large datasets structures through unsupervised learning Guénaël Cabanes and Younès Bennani LIPN-CNRS, UMR 7030, Université de Paris 13 99, Avenue J-B. Clément, 93430 Villetaneuse, France [email protected]
A Study of Web Log Analysis Using Clustering Techniques
A Study of Web Log Analysis Using Clustering Techniques Hemanshu Rana 1, Mayank Patel 2 Assistant Professor, Dept of CSE, M.G Institute of Technical Education, Gujarat India 1 Assistant Professor, Dept
CURRICULUM VITAE. Norbert Jankowski. Department of Computer Methods. ul. Grudziądzka 5 87-100 Toruń Poland
CURRICULUM VITAE 22 September, 1999 Name Permanent address Norbert Jankowski Department of Computer Methods Nicholas Copernicus University ul. Grudziądzka 5 87-100 Toruń Poland phone: +48 56 6113307 e-mail:
Advanced Web Usage Mining Algorithm using Neural Network and Principal Component Analysis
Advanced Web Usage Mining Algorithm using Neural Network and Principal Component Analysis Arumugam, P. and Christy, V Department of Statistics, Manonmaniam Sundaranar University, Tirunelveli, Tamilnadu,
DATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
HDDVis: An Interactive Tool for High Dimensional Data Visualization
HDDVis: An Interactive Tool for High Dimensional Data Visualization Mingyue Tan Department of Computer Science University of British Columbia [email protected] ABSTRACT Current high dimensional data visualization
CROP CLASSIFICATION WITH HYPERSPECTRAL DATA OF THE HYMAP SENSOR USING DIFFERENT FEATURE EXTRACTION TECHNIQUES
Proceedings of the 2 nd Workshop of the EARSeL SIG on Land Use and Land Cover CROP CLASSIFICATION WITH HYPERSPECTRAL DATA OF THE HYMAP SENSOR USING DIFFERENT FEATURE EXTRACTION TECHNIQUES Sebastian Mader
IMPROVISATION OF STUDYING COMPUTER BY CLUSTER STRATEGIES
INTERNATIONAL JOURNAL OF ADVANCED RESEARCH IN ENGINEERING AND SCIENCE IMPROVISATION OF STUDYING COMPUTER BY CLUSTER STRATEGIES C.Priyanka 1, T.Giri Babu 2 1 M.Tech Student, Dept of CSE, Malla Reddy Engineering
Load balancing in a heterogeneous computer system by self-organizing Kohonen network
Bull. Nov. Comp. Center, Comp. Science, 25 (2006), 69 74 c 2006 NCC Publisher Load balancing in a heterogeneous computer system by self-organizing Kohonen network Mikhail S. Tarkov, Yakov S. Bezrukov Abstract.
Classifying Large Data Sets Using SVMs with Hierarchical Clusters. Presented by :Limou Wang
Classifying Large Data Sets Using SVMs with Hierarchical Clusters Presented by :Limou Wang Overview SVM Overview Motivation Hierarchical micro-clustering algorithm Clustering-Based SVM (CB-SVM) Experimental
A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS
A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS Mrs. Jyoti Nawade 1, Dr. Balaji D 2, Mr. Pravin Nawade 3 1 Lecturer, JSPM S Bhivrabai Sawant Polytechnic, Pune (India) 2 Assistant
Learning Vector Quantization: generalization ability and dynamics of competing prototypes
Learning Vector Quantization: generalization ability and dynamics of competing prototypes Aree Witoelar 1, Michael Biehl 1, and Barbara Hammer 2 1 University of Groningen, Mathematics and Computing Science
PLAANN as a Classification Tool for Customer Intelligence in Banking
PLAANN as a Classification Tool for Customer Intelligence in Banking EUNITE World Competition in domain of Intelligent Technologies The Research Report Ireneusz Czarnowski and Piotr Jedrzejowicz Department
Network Intrusion Detection Using an Improved Competitive Learning Neural Network
Network Intrusion Detection Using an Improved Competitive Learning Neural Network John Zhong Lei and Ali Ghorbani Faculty of Computer Science University of New Brunswick Fredericton, NB, E3B 5A3, Canada
Data Mining: A Preprocessing Engine
Journal of Computer Science 2 (9): 735-739, 2006 ISSN 1549-3636 2005 Science Publications Data Mining: A Preprocessing Engine Luai Al Shalabi, Zyad Shaaban and Basel Kasasbeh Applied Science University,
Self Organizing Maps: Fundamentals
Self Organizing Maps: Fundamentals Introduction to Neural Networks : Lecture 16 John A. Bullinaria, 2004 1. What is a Self Organizing Map? 2. Topographic Maps 3. Setting up a Self Organizing Map 4. Kohonen
COMPARISON OF OBJECT BASED AND PIXEL BASED CLASSIFICATION OF HIGH RESOLUTION SATELLITE IMAGES USING ARTIFICIAL NEURAL NETWORKS
COMPARISON OF OBJECT BASED AND PIXEL BASED CLASSIFICATION OF HIGH RESOLUTION SATELLITE IMAGES USING ARTIFICIAL NEURAL NETWORKS B.K. Mohan and S. N. Ladha Centre for Studies in Resources Engineering IIT
USING SELF-ORGANIZING MAPS FOR INFORMATION VISUALIZATION AND KNOWLEDGE DISCOVERY IN COMPLEX GEOSPATIAL DATASETS
USING SELF-ORGANIZING MAPS FOR INFORMATION VISUALIZATION AND KNOWLEDGE DISCOVERY IN COMPLEX GEOSPATIAL DATASETS Koua, E.L. International Institute for Geo-Information Science and Earth Observation (ITC).
Ensembles of Similarity-Based Models.
Ensembles of Similarity-Based Models. Włodzisław Duch and Karol Grudziński Department of Computer Methods, Nicholas Copernicus University, Grudziądzka 5, 87-100 Toruń, Poland. E-mails: {duch,kagru}@phys.uni.torun.pl
Mobile Phone APP Software Browsing Behavior using Clustering Analysis
Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis
2 Visualized IEC. Shiobaru, Minami-ku, Fukuoka, 815-8540 Japan Tel&Fax: +81-92-553-4555, [email protected], http://www.kyushu-id.ac.
Visualized IEC: Interactive Evolutionary Computation with Multidimensional Data Visualization Norimasa Hayashida Hideyuki Takagi Kyushu Institute of Design Shiobaru, Minami-ku, Fukuoka, 81-84 Japan Tel&Fax:
How To Make Visual Analytics With Big Data Visual
Big-Data Visualization Customizing Computational Methods for Visual Analytics with Big Data Jaegul Choo and Haesun Park Georgia Tech O wing to the complexities and obscurities in large-scale datasets (
Using Smoothed Data Histograms for Cluster Visualization in Self-Organizing Maps
Technical Report OeFAI-TR-2002-29, extended version published in Proceedings of the International Conference on Artificial Neural Networks, Springer Lecture Notes in Computer Science, Madrid, Spain, 2002.
An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset
P P P Health An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset Peng Liu 1, Elia El-Darzi 2, Lei Lei 1, Christos Vasilakis 2, Panagiotis Chountas 2, and Wei Huang
An Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
Effective Data Mining Using Neural Networks
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 8, NO. 6, DECEMBER 1996 957 Effective Data Mining Using Neural Networks Hongjun Lu, Member, IEEE Computer Society, Rudy Setiono, and Huan Liu,
Classification of Engineering Consultancy Firms Using Self-Organizing Maps: A Scientific Approach
International Journal of Civil & Environmental Engineering IJCEE-IJENS Vol:13 No:03 46 Classification of Engineering Consultancy Firms Using Self-Organizing Maps: A Scientific Approach Mansour N. Jadid
Monitoring of Complex Industrial Processes based on Self-Organizing Maps and Watershed Transformations
Monitoring of Complex Industrial Processes based on Self-Organizing Maps and Watershed Transformations Christian W. Frey 2012 Monitoring of Complex Industrial Processes based on Self-Organizing Maps and
EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set
EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin
International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015
RESEARCH ARTICLE OPEN ACCESS Data Mining Technology for Efficient Network Security Management Ankit Naik [1], S.W. Ahmad [2] Student [1], Assistant Professor [2] Department of Computer Science and Engineering
Enhancing Quality of Data using Data Mining Method
JOURNAL OF COMPUTING, VOLUME 2, ISSUE 9, SEPTEMBER 2, ISSN 25-967 WWW.JOURNALOFCOMPUTING.ORG 9 Enhancing Quality of Data using Data Mining Method Fatemeh Ghorbanpour A., Mir M. Pedram, Kambiz Badie, Mohammad
Novelty Detection in image recognition using IRF Neural Networks properties
Novelty Detection in image recognition using IRF Neural Networks properties Philippe Smagghe, Jean-Luc Buessler, Jean-Philippe Urban Université de Haute-Alsace MIPS 4, rue des Frères Lumière, 68093 Mulhouse,
Knowledge Discovery from patents using KMX Text Analytics
Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs [email protected] Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers
Data visualization is a graphical presentation
Journal of Advances in Computer Engineering and Technology, 1(4) 2015 A new approach for data visualization problem MohammadRezaKeyvanpour,Mona Soleymanpour Received (2015-08-09) Accepted (2015-10-18)
Quality Assessment in Spatial Clustering of Data Mining
Quality Assessment in Spatial Clustering of Data Mining Azimi, A. and M.R. Delavar Centre of Excellence in Geomatics Engineering and Disaster Management, Dept. of Surveying and Geomatics Engineering, Engineering
E-commerce Transaction Anomaly Classification
E-commerce Transaction Anomaly Classification Minyong Lee [email protected] Seunghee Ham [email protected] Qiyi Jiang [email protected] I. INTRODUCTION Due to the increasing popularity of e-commerce
Data Quality Mining: Employing Classifiers for Assuring consistent Datasets
Data Quality Mining: Employing Classifiers for Assuring consistent Datasets Fabian Grüning Carl von Ossietzky Universität Oldenburg, Germany, [email protected] Abstract: Independent
SVM Ensemble Model for Investment Prediction
19 SVM Ensemble Model for Investment Prediction Chandra J, Assistant Professor, Department of Computer Science, Christ University, Bangalore Siji T. Mathew, Research Scholar, Christ University, Dept of
Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data
CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear
Impact of Boolean factorization as preprocessing methods for classification of Boolean data
Impact of Boolean factorization as preprocessing methods for classification of Boolean data Radim Belohlavek, Jan Outrata, Martin Trnecka Data Analysis and Modeling Lab (DAMOL) Dept. Computer Science,
A Discussion on Visual Interactive Data Exploration using Self-Organizing Maps
A Discussion on Visual Interactive Data Exploration using Self-Organizing Maps Julia Moehrmann 1, Andre Burkovski 1, Evgeny Baranovskiy 2, Geoffrey-Alexeij Heinze 2, Andrej Rapoport 2, and Gunther Heidemann
ADVANCED MACHINE LEARNING. Introduction
1 1 Introduction Lecturer: Prof. Aude Billard ([email protected]) Teaching Assistants: Guillaume de Chambrier, Nadia Figueroa, Denys Lamotte, Nicola Sommer 2 2 Course Format Alternate between: Lectures
SURVIVABILITY ANALYSIS OF PEDIATRIC LEUKAEMIC PATIENTS USING NEURAL NETWORK APPROACH
330 SURVIVABILITY ANALYSIS OF PEDIATRIC LEUKAEMIC PATIENTS USING NEURAL NETWORK APPROACH T. M. D.Saumya 1, T. Rupasinghe 2 and P. Abeysinghe 3 1 Department of Industrial Management, University of Kelaniya,
Analyzing The Role Of Dimension Arrangement For Data Visualization in Radviz
Analyzing The Role Of Dimension Arrangement For Data Visualization in Radviz Luigi Di Caro 1, Vanessa Frias-Martinez 2, and Enrique Frias-Martinez 2 1 Department of Computer Science, Universita di Torino,
Towards better accuracy for Spam predictions
Towards better accuracy for Spam predictions Chengyan Zhao Department of Computer Science University of Toronto Toronto, Ontario, Canada M5S 2E4 [email protected] Abstract Spam identification is crucial
Exploratory Data Analysis with MATLAB
Computer Science and Data Analysis Series Exploratory Data Analysis with MATLAB Second Edition Wendy L Martinez Angel R. Martinez Jeffrey L. Solka ( r ec) CRC Press VV J Taylor & Francis Group Boca Raton
Decision Tree Learning on Very Large Data Sets
Decision Tree Learning on Very Large Data Sets Lawrence O. Hall Nitesh Chawla and Kevin W. Bowyer Department of Computer Science and Engineering ENB 8 University of South Florida 4202 E. Fowler Ave. Tampa
Visibility optimization for data visualization: A Survey of Issues and Techniques
Visibility optimization for data visualization: A Survey of Issues and Techniques Ch Harika, Dr.Supreethi K.P Student, M.Tech, Assistant Professor College of Engineering, Jawaharlal Nehru Technological
How To Identify Noisy Variables In A Cluster
Identification of noisy variables for nonmetric and symbolic data in cluster analysis Marek Walesiak and Andrzej Dudek Wroclaw University of Economics, Department of Econometrics and Computer Science,
QoS Mapping of VoIP Communication using Self-Organizing Neural Network
QoS Mapping of VoIP Communication using Self-Organizing Neural Network Masao MASUGI NTT Network Service System Laboratories, NTT Corporation -9- Midori-cho, Musashino-shi, Tokyo 80-88, Japan E-mail: [email protected]
Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier
Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier D.Nithya a, *, V.Suganya b,1, R.Saranya Irudaya Mary c,1 Abstract - This paper presents,
An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
Standardization and Its Effects on K-Means Clustering Algorithm
Research Journal of Applied Sciences, Engineering and Technology 6(7): 399-3303, 03 ISSN: 040-7459; e-issn: 040-7467 Maxwell Scientific Organization, 03 Submitted: January 3, 03 Accepted: February 5, 03
An Evaluation of Neural Networks Approaches used for Software Effort Estimation
Proc. of Int. Conf. on Multimedia Processing, Communication and Info. Tech., MPCIT An Evaluation of Neural Networks Approaches used for Software Effort Estimation B.V. Ajay Prakash 1, D.V.Ashoka 2, V.N.
Models of Cortical Maps II
CN510: Principles and Methods of Cognitive and Neural Modeling Models of Cortical Maps II Lecture 19 Instructor: Anatoli Gorchetchnikov dy dt The Network of Grossberg (1976) Ay B y f (
Data Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
Network Intrusion Detection Systems
Network Intrusion Detection Systems False Positive Reduction Through Anomaly Detection Joint research by Emmanuele Zambon & Damiano Bolzoni 7/1/06 NIDS - False Positive reduction through Anomaly Detection
Grid Density Clustering Algorithm
Grid Density Clustering Algorithm Amandeep Kaur Mann 1, Navneet Kaur 2, Scholar, M.Tech (CSE), RIMT, Mandi Gobindgarh, Punjab, India 1 Assistant Professor (CSE), RIMT, Mandi Gobindgarh, Punjab, India 2
Specific Usage of Visual Data Analysis Techniques
Specific Usage of Visual Data Analysis Techniques Snezana Savoska 1 and Suzana Loskovska 2 1 Faculty of Administration and Management of Information systems, Partizanska bb, 7000, Bitola, Republic of Macedonia
Prediction of Stock Performance Using Analytical Techniques
136 JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 5, NO. 2, MAY 2013 Prediction of Stock Performance Using Analytical Techniques Carol Hargreaves Institute of Systems Science National University
. Learn the number of classes and the structure of each class using similarity between unlabeled training patterns
Outline Part 1: of data clustering Non-Supervised Learning and Clustering : Problem formulation cluster analysis : Taxonomies of Clustering Techniques : Data types and Proximity Measures : Difficulties
