Self-Organizing Maps (SOM). COMP61021 Modelling and Visualization of High Dimensional Data




Self-Organizing Maps (SOM), Ke Chen

Outline: Introduction, Biological Motivation, Kohonen SOM, Learning Algorithm, Visualization Method, Examples, Relevant Issues, Conclusions

Introduction: Self-organizing maps (SOM). SOM is a biologically inspired unsupervised neural network that approximates an unlimited number of input data points by a finite set of nodes arranged in a low-dimensional grid, where neighboring nodes correspond to more similar input data. The model is produced by a learning algorithm that automatically orders the inputs on a one- or two-dimensional grid according to their mutual similarity. It is useful for cluster analysis and data visualization. [Figure panels: input space, initial weights, final weights]

Biological Motivation: Mapping of two-dimensional continuous inputs from sensory organs (eyes, ears, skin, etc.) to two-dimensional discrete outputs in the nervous system. Retinotopic map: from the eye (retina) to the visual cortex. Tonotopic map: from the ear to the auditory cortex. These maps preserve the topographic order of the input. Biological evidence shows that the connections in these maps are not entirely pre-programmed or pre-wired at birth. Learning must occur after birth to create the necessary connections for appropriate topographic mapping.

Kohonen SOM

Kohonen SOM: Competition

Kohonen SOM: Cooperation

Kohonen SOM: Adaptation (see the algorithm on the next slide for details)

Learning Algorithm: neurons i and k
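The three steps named on the preceding slides (competition, cooperation, adaptation) can be sketched as follows. This is a minimal sketch of the standard online Kohonen update, not necessarily the exact formulation used in the lecture. The Gaussian neighborhood, the exponential decay of the learning rate and neighborhood width, and all names and default values (`train_som`, `eta0`, `eta_end`, `sigma0`, `sigma_end`) are assumptions made for illustration.

```python
import numpy as np


def train_som(data, rows=10, cols=10, n_iter=2000,
              eta0=0.1, eta_end=0.01, sigma0=None, sigma_end=0.5, seed=0):
    """Train a 2-D SOM online; returns weights of shape (rows, cols, dim)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    dim = data.shape[1]

    # Initial weights: uniform within the per-feature range of the data (assumption).
    weights = rng.uniform(data.min(axis=0), data.max(axis=0), size=(rows, cols, dim))

    # Lattice coordinates of every neuron, shape (rows, cols, 2).
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)

    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0  # start with a neighborhood covering much of the map

    for t in range(n_iter):
        frac = t / n_iter
        eta = eta0 * (eta_end / eta0) ** frac          # decaying learning rate
        sigma = sigma0 * (sigma_end / sigma0) ** frac  # shrinking neighborhood width

        x = data[rng.integers(len(data))]              # pick one input at random

        # Competition: the best-matching unit (BMU) has the closest weight vector.
        dist2 = np.sum((weights - x) ** 2, axis=-1)
        bmu = np.unravel_index(np.argmin(dist2), dist2.shape)

        # Cooperation: Gaussian neighborhood around the BMU, measured on the lattice.
        lattice_d2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
        h = np.exp(-lattice_d2 / (2.0 * sigma ** 2))

        # Adaptation: move each weight vector toward x, scaled by eta and h.
        weights += eta * h[..., None] * (x - weights)

    return weights
```

A call such as `train_som(np.random.default_rng(1).random((200, 2)), rows=5, cols=5)` returns a 5 x 5 lattice of 2-D weight vectors that can be plotted in the input space.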

Visualization Method: In 2-D/3-D space, neurons are visualized as changing positions in the weight space as learning takes place. Each neuron is described by its weight vector. Two neurons are connected by an edge if they are direct neighbors in the neural network lattice. For 2-D/3-D data, the lattice can be displayed via the weights in the original data space. For high-dimensional data, a unified distance matrix (U-matrix) is constructed to facilitate the visualization. The distance between neighboring neurons approximates the distance between different parts of the underlying data. When the U-matrix is depicted as an image, similar colors depict closely spaced nodes and distinct colors indicate more distant nodes. Groups of similar colors can be considered clusters, and the contrasting parts the boundary regions.
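The U-matrix idea can be illustrated with a short sketch. It assumes a rectangular lattice with 4-connected neighbors and stores one value per neuron (the mean distance to its lattice neighbors), which is a simplified variant; other U-matrix layouts also insert cells between neurons. The function name `u_matrix` is hypothetical.

```python
import numpy as np


def u_matrix(weights):
    """Mean Euclidean distance from each neuron's weight vector to its 4-connected neighbors.

    weights has shape (rows, cols, dim); the result has shape (rows, cols).
    """
    weights = np.asarray(weights, dtype=float)
    rows, cols, _ = weights.shape
    total = np.zeros((rows, cols))
    count = np.zeros((rows, cols))

    # Distances between vertically adjacent neurons, shape (rows - 1, cols).
    d_v = np.linalg.norm(weights[1:] - weights[:-1], axis=-1)
    total[1:] += d_v
    count[1:] += 1
    total[:-1] += d_v
    count[:-1] += 1

    # Distances between horizontally adjacent neurons, shape (rows, cols - 1).
    d_h = np.linalg.norm(weights[:, 1:] - weights[:, :-1], axis=-1)
    total[:, 1:] += d_h
    count[:, 1:] += 1
    total[:, :-1] += d_h
    count[:, :-1] += 1

    # Guard against division by zero for a degenerate 1 x 1 map.
    return total / np.maximum(count, 1)
```

Displayed as an image, low values mark regions where neighboring neurons have similar weights (cluster interiors) and high values mark the boundaries between them.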

Visualization Method. Example: U-Matrix

Examples. Example 1: 1-D self-organizing map

Examples. Example 2: 2-D self-organizing map

Examples. Example 3: self-organizing map of synthetic data sets. After convergence of SOM learning, we obtain SOMs for different data distributions.

Examples. Example 4: Taxonomy of animals. Animal names and their attributes:

                    Dove  Hen   Duck  Goose Owl   Hawk  Eagle Fox   Dog   Wolf  Cat   Tiger Lion  Horse Zebra Cow
is        Small     1     1     1     1     1     1     0     0     0     0     1     0     0     0     0     0
          Medium    0     0     0     0     0     0     1     1     1     1     0     0     0     0     0     0
          Big       0     0     0     0     0     0     0     0     0     0     0     1     1     1     1     1
has       2 legs    1     1     1     1     1     1     1     0     0     0     0     0     0     0     0     0
          4 legs    0     0     0     0     0     0     0     1     1     1     1     1     1     1     1     1
          Hair      0     0     0     0     0     0     0     1     1     1     1     1     1     1     1     1
          Hooves    0     0     0     0     0     0     0     0     0     0     0     0     0     1     1     1
          Mane      0     0     0     0     0     0     0     0     0     1     0     0     1     1     1     0
          Feathers  1     1     1     1     1     1     1     0     0     0     0     0     0     0     0     0
likes to  Hunt      0     0     0     0     1     1     1     1     0     1     1     1     1     0     0     0
          Run       0     0     0     0     0     0     0     0     1     1     0     1     1     1     1     0
          Fly       1     0     0     1     1     1     1     0     0     0     0     0     0     0     0     0
          Swim      0     0     1     1     0     0     0     0     0     0     0     0     0     0     0     0

A grouping according to similarity has emerged on the SOM. [Figure region labels: peaceful, birds, hunters]
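As an illustration of how the attribute table can serve as SOM input, the sketch below encodes each animal as a 13-dimensional binary vector and compares animals by Hamming distance. The variable names and the `hamming` helper are hypothetical; the values are copied from the table.

```python
import numpy as np

animals = ["Dove", "Hen", "Duck", "Goose", "Owl", "Hawk", "Eagle", "Fox",
           "Dog", "Wolf", "Cat", "Tiger", "Lion", "Horse", "Zebra", "Cow"]

# One string per attribute row of the table, one character per animal.
attribute_rows = {
    "Small":    "1111110000100000",
    "Medium":   "0000001111000000",
    "Big":      "0000000000011111",
    "2 legs":   "1111111000000000",
    "4 legs":   "0000000111111111",
    "Hair":     "0000000111111111",
    "Hooves":   "0000000000000111",
    "Mane":     "0000000001001110",
    "Feathers": "1111111000000000",
    "Hunt":     "0000111101111000",
    "Run":      "0000000011011110",
    "Fly":      "1001111000000000",
    "Swim":     "0011000000000000",
}

# Transpose so that each row of X is one animal's input vector: shape (16, 13).
X = np.array([[int(c) for c in row] for row in attribute_rows.values()]).T


def hamming(a, b):
    """Number of attributes on which animals a and b differ."""
    return int(np.sum(X[animals.index(a)] != X[animals.index(b)]))


print(hamming("Horse", "Zebra"))  # identical attribute vectors in this table
print(hamming("Dove", "Lion"))    # no shared attributes
```

Animals with identical vectors, such as Horse and Zebra here, necessarily share the same best-matching unit, which is one reason similar animals end up in the same region of the map.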

Examples. Example 5: Macroeconomic data analysis. Factors: annual increase (%), infant mortality ( ), illiteracy ratio (%), school attendance (%), GIP, annual GIP increase (%) (1990)

Examples. Example 5: Macroeconomic data analysis (cont.). Applying PCA and SOM to this data set yields different groupings. [Figure panels: PCA, SOM] From F. Blayo and P. Demartines, "Data analysis: How to compare Kohonen neural networks to other techniques?", in IWANN 91 (Granada, Spain) proceedings, Springer-Verlag Lecture Notes in Computer Science 540, pp. 469-476.

Relevant Issues: Training: order phase vs. convergence phase. Order phase: The weight vectors become topologically ordered. This may take 1000 or more iterations of the SOM algorithm. The choice of the parameter values is important: with a proper initial setting, the neighborhood of the winning neuron includes almost all neurons in the network, then shrinks slowly with time. Convergence phase: The weight vectors are fine-tuned. The number of iterations must be at least 500 times the number of neurons in the network, that is, thousands or tens of thousands of iterations. Choice of parameter values: η(t) is maintained on the order of 0.01, and the neighborhood function is chosen such that the neighborhood of a best-matching unit (BMU) contains only its nearest neighbors; it eventually reduces to one or zero neighboring neurons.
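The two phases can be expressed as a schedule for the learning rate η(t) and the neighborhood width σ(t). The sketch below is one possible reading of the guidance above, not the lecture's own formula: the exponential decay law, the initial rate `eta0 = 0.1`, the linear shrinkage in the convergence phase, the floor `sigma_min`, and the function names are all assumptions. The 1000 ordering iterations, the floor of 0.01 on η, and the factor of 500 iterations per neuron come from the text.

```python
import math


def total_iterations(n_neurons, order_iters=1000):
    """Ordering iterations plus a convergence phase of 500 iterations per neuron."""
    return order_iters + 500 * n_neurons


def som_schedule(t, n_neurons, sigma0, order_iters=1000,
                 eta0=0.1, eta_min=0.01, sigma_min=0.1):
    """Return (eta, sigma) at iteration t; sigma0 is the initial neighborhood width (> 1)."""
    if sigma0 <= 1.0:
        raise ValueError("sigma0 must exceed 1 so that the neighborhood can shrink to 1")

    # Learning rate: exponential decay (assumed), never allowed below eta_min.
    eta = max(eta_min, eta0 * math.exp(-t / order_iters))

    if t < order_iters:
        # Order phase: the neighborhood starts wide and shrinks slowly,
        # reaching a width of 1 (nearest neighbors) when the phase ends.
        tau_sigma = order_iters / math.log(sigma0)
        sigma = sigma0 * math.exp(-t / tau_sigma)
    else:
        # Convergence phase: only nearest neighbors, shrinking toward (almost) none.
        conv_iters = 500 * n_neurons
        progress = min(1.0, (t - order_iters) / conv_iters)
        sigma = 1.0 + (sigma_min - 1.0) * progress

    return eta, sigma
```

For a 10 x 10 map, `total_iterations(100)` gives 51000 iterations, which is consistent with the "tens of thousands" mentioned above.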

Relevant Issues: SOM extensions. PSOM: continuous projection; interpolation between centroid locations. dissom: SOM on dissimilarity between objects; more general than distance. Nonnegative Matrix Factorization. Hierarchical SOM: from single to multiple layers for multi-scale data analysis. Generative topographic map (GTM): a probabilistic counterpart of the SOM that is provably convergent and does not require a shrinking neighborhood or a decreasing step size. Kernel SOM: overcomes two major limitations of the Kohonen SOM.

Conclusions: SOM is a biologically inspired neural network for high-dimensional data clustering and visualization. Its most important property is topology preservation. Learning proceeds in two phases: ordering and convergence. There is no guarantee that SOM always converges, hence parameter tuning is needed. There are several variants or extensions, which tend to overcome the limitations of the SOM. There are a number of successful applications of SOM.