Self-Organizing Maps (SOM)
Ke Chen
Outline
- Introduction
- Biological Motivation
- Kohonen SOM
- Learning Algorithm
- Visualization Method
- Examples
- Relevant Issues
- Conclusions
Introduction
Self-organizing maps (SOM): a biologically inspired unsupervised neural network that approximates a potentially unlimited number of input data points by a finite set of nodes arranged in a low-dimensional grid, where neighboring nodes correspond to more similar input data. The model is produced by a learning algorithm that automatically orders the inputs on a one- or two-dimensional grid according to their mutual similarity. Useful for clustering analysis and data visualization.
(Figure: input space, initial weights, final weights.)
Biological Motivation
Mapping two-dimensional continuous inputs from a sensory organ (eyes, ears, skin, etc.) to two-dimensional discrete outputs in the nervous system:
- Retinotopic map: from the eye (retina) to the visual cortex.
- Tonotopic map: from the ear to the auditory cortex.
These maps preserve the topographic order of inputs. Biological evidence shows that the connections in these maps are not entirely pre-programmed or pre-wired at birth; learning must occur after birth to create the necessary connections for appropriate topographic mapping.
Kohonen SOM
Kohonen SOM: Competition
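The equations on the original slide are not preserved; the following is the standard formulation of the competition step in the usual Kohonen notation. For an input vector $\mathbf{x}$, the best-matching unit (BMU) is the neuron whose weight vector lies closest to the input:

$$ i(\mathbf{x}) = \arg\min_{j} \, \| \mathbf{x} - \mathbf{w}_j \| . $$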
Kohonen SOM: Cooperation
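Again reconstructing the standard formulation (the slide's equations are not preserved): the winning neuron excites its lattice neighbors through a neighborhood function, typically a Gaussian whose radius shrinks over time,

$$ h_{j,i(\mathbf{x})}(t) = \exp\!\left( -\frac{d_{j,i(\mathbf{x})}^{2}}{2\,\sigma(t)^{2}} \right), \qquad \sigma(t) = \sigma_0 \, e^{-t/\tau}, $$

where $d_{j,i(\mathbf{x})}$ is the distance between neurons $j$ and $i(\mathbf{x})$ on the lattice.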
Kohonen SOM: Adaptation (see the algorithm on the next slide for details)
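In the standard formulation (a reconstruction, as above), the adaptation step moves each weight vector toward the input, weighted by the neighborhood function and a decaying learning rate $\eta(t)$:

$$ \mathbf{w}_j(t+1) = \mathbf{w}_j(t) + \eta(t) \, h_{j,i(\mathbf{x})}(t) \, \big( \mathbf{x} - \mathbf{w}_j(t) \big) . $$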
Learning Algorithm
(Algorithm listing for neurons i and k on the original slide; a sketch follows below.)
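Since the original algorithm slide is not preserved, here is a minimal NumPy sketch of the three steps above. The grid size, toy data, and decay schedules are illustrative assumptions, not values from the lecture:

```python
# A minimal 2-D SOM training sketch (illustrative, not the lecture's code).
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a 10x10 grid of neurons mapping 3-D inputs.
grid_h, grid_w, dim = 10, 10, 3
weights = rng.random((grid_h, grid_w, dim))             # one weight vector per neuron
coords = np.stack(np.meshgrid(np.arange(grid_h),
                              np.arange(grid_w),
                              indexing="ij"), axis=-1)  # lattice positions

data = rng.random((500, dim))                           # toy input data
n_iter = 2000
eta0, sigma0 = 0.5, max(grid_h, grid_w) / 2             # initial rate and radius
tau = n_iter / np.log(sigma0)                           # decay time constant

for t in range(n_iter):
    x = data[rng.integers(len(data))]                   # sample one input

    # Competition: the best-matching unit (BMU) minimizes ||x - w||.
    dists = np.linalg.norm(weights - x, axis=-1)
    bmu = np.unravel_index(np.argmin(dists), dists.shape)

    # Cooperation: Gaussian neighborhood around the BMU on the lattice,
    # shrinking as training proceeds.
    sigma = sigma0 * np.exp(-t / tau)
    lat_d2 = np.sum((coords - np.array(bmu)) ** 2, axis=-1)
    h = np.exp(-lat_d2 / (2 * sigma ** 2))

    # Adaptation: move each weight vector toward x, scaled by the
    # neighborhood function and a decaying learning rate.
    eta = eta0 * np.exp(-t / n_iter)
    weights += eta * h[..., None] * (x - weights)
```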
Visualization Method
In 2-D/3-D space, neurons are visualized as changing positions in the weight space as learning takes place. Each neuron is described by its weight vector, and two neurons are connected by an edge if they are direct neighbors in the network lattice. For 2-D/3-D data, the lattice can thus be displayed, via the weights, in the original data space.
For high-dimensional data, a unified distance matrix (U-matrix) is constructed to facilitate visualization:
- The distance between neighboring neurons approximates the distance between different parts of the underlying data.
- Depicted as an image, similar colors indicate closely spaced nodes and distinct colors indicate more distant nodes.
- Groups of similar colors can be read as clusters, and the contrasting parts as boundary regions.
A sketch of the computation follows.
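A minimal sketch of the U-matrix computation described above, assuming the trained weight grid from the previous sketch and 4-connected lattice neighbors (both assumptions):

```python
import numpy as np

def u_matrix(weights):
    """U-matrix of a (h, w, dim) SOM weight grid."""
    h, w, _ = weights.shape
    u = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            neighbours = [(i + di, j + dj)
                          for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))
                          if 0 <= i + di < h and 0 <= j + dj < w]
            # Average weight-space distance to adjacent lattice neurons;
            # large values mark cluster boundaries, small values interiors.
            u[i, j] = np.mean([np.linalg.norm(weights[i, j] - weights[a, b])
                               for a, b in neighbours])
    return u

# u = u_matrix(weights)  # visualize e.g. with matplotlib's imshow
```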
Visualization Method
Example: U-matrix (figure).
Examples
Example 1: 1-D self-organizing map (figure).
Examples
Example 2: 2-D self-organizing map (figure).
Examples
Example 3: self-organizing maps of synthetic data sets. After convergence of SOM learning, we obtain SOMs for different data distributions (figure).
Examples
Example 4: Taxonomy of animals. Animal names and their attributes (attributes grouped as is / has / likes to):

                 Dove Hen Duck Goose Owl Hawk Eagle Fox Dog Wolf Cat Tiger Lion Horse Zebra Cow
is     Small       1   1    1    1    1   1    0    0   0   0    1    0    0    0    0    0
       Medium      0   0    0    0    0   0    1    1   1   1    0    0    0    0    0    0
       Big         0   0    0    0    0   0    0    0   0   0    0    1    1    1    1    1
has    2 legs      1   1    1    1    1   1    1    0   0   0    0    0    0    0    0    0
       4 legs      0   0    0    0    0   0    0    1   1   1    1    1    1    1    1    1
       Hair        0   0    0    0    0   0    0    1   1   1    1    1    1    1    1    1
       Hooves      0   0    0    0    0   0    0    0   0   0    0    0    0    1    1    1
       Mane        0   0    0    0    0   0    0    0   0   1    0    0    1    1    1    0
       Feathers    1   1    1    1    1   1    1    0   0   0    0    0    0    0    0    0
likes  Hunt        0   0    0    0    1   1    1    1   0   1    1    1    1    0    0    0
to     Run         0   0    0    0    0   0    0    0   1   1    0    1    1    1    1    0
       Fly         1   0    0    1    1   1    1    0   0   0    0    0    0    0    0    0
       Swim        0   0    1    1    0   0    0    0   0   0    0    0    0    0    0    0

A grouping according to similarity emerges on the SOM (figure; regions labeled: peaceful, birds, hunters). The table can be fed to the earlier training sketch as shown below.
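A hedged sketch of turning the table into training data; the lecture's actual encoding and any attribute scaling are not specified, so the raw binary vectors are used as-is:

```python
# Animal attribute table from the slide, one row per attribute,
# one column per animal (values copied verbatim from the table).
import numpy as np

animals = ["Dove", "Hen", "Duck", "Goose", "Owl", "Hawk", "Eagle", "Fox",
           "Dog", "Wolf", "Cat", "Tiger", "Lion", "Horse", "Zebra", "Cow"]
attributes = ["Small", "Medium", "Big", "2 legs", "4 legs", "Hair", "Hooves",
              "Mane", "Feathers", "Hunt", "Run", "Fly", "Swim"]
table = np.array([
    [1,1,1,1,1,1,0,0,0,0,1,0,0,0,0,0],   # Small
    [0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0],   # Medium
    [0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1],   # Big
    [1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0],   # 2 legs
    [0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1],   # 4 legs
    [0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1],   # Hair
    [0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1],   # Hooves
    [0,0,0,0,0,0,0,0,0,1,0,0,1,1,1,0],   # Mane
    [1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0],   # Feathers
    [0,0,0,0,1,1,1,1,0,1,1,1,1,0,0,0],   # Hunt
    [0,0,0,0,0,0,0,0,1,1,0,1,1,1,1,0],   # Run
    [1,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0],   # Fly
    [0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0],   # Swim
])
assert table.shape == (len(attributes), len(animals))

data = table.T.astype(float)  # 16 animals x 13 attributes; train the SOM on this
# After training (with dim = 13), each animal can be placed on the map by
# looking up the lattice position of its BMU, as in the earlier sketch.
```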
Examples
Example 5: Macroeconomic data analysis. Factors (1990): annual increase (%), infant mortality (‰), illiteracy ratio (%), school attendance (%), GIP, annual GIP increase (%).
Examples
Example 5: Macroeconomic data analysis (cont.). Applying PCA and SOM to this data set yields different groupings (figure panels: PCA, SOM).
From: F. Blayo and P. Demartines, "Data analysis: How to compare Kohonen neural networks to other techniques?", in Proc. IWANN '91 (Granada, Spain), Springer-Verlag Lecture Notes in Computer Science 540, pp. 469-476.
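For concreteness, a hedged sketch of the two projections being compared: PCA gives each sample continuous 2-D coordinates, while the SOM assigns each sample the discrete lattice position of its BMU. PCA is done here via NumPy's SVD, and the SOM weights are assumed to come from a training run like the earlier sketch:

```python
import numpy as np

def pca_2d(X):
    """Project n x d data onto its top-2 principal axes."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T                 # continuous 2-D coordinates

def som_coords(X, weights):
    """Map each sample to the (row, col) lattice position of its BMU."""
    h, w, d = weights.shape
    flat = weights.reshape(-1, d)
    bmus = np.argmin(((X[:, None, :] - flat[None]) ** 2).sum(-1), axis=1)
    return np.stack(np.unravel_index(bmus, (h, w)), axis=1)
```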
Relevant Issues
Training: ordering phase vs. convergence phase.
Ordering phase:
- A topological ordering of the weight vectors emerges.
- It may take 1000 or more iterations of the SOM algorithm.
- The choice of parameter values is important: with a proper initial setting, the neighborhood of the winning neuron includes almost all neurons in the network and then shrinks slowly with time.
Convergence phase:
- Fine-tunes the weight vectors.
- Must run for at least 500 times the number of neurons in the network, i.e., thousands or tens of thousands of iterations.
- Choice of parameter values: η(t) maintained on the order of 0.01, and a neighborhood function such that the neighborhood of the BMU contains only its nearest neighbors, eventually reducing to one or zero neighboring neurons.
A sketch of such a two-phase schedule follows.
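A hypothetical two-phase parameter schedule following the guidance above; the grid size and constants are illustrative, not values from the lecture:

```python
import numpy as np

grid_h, grid_w = 10, 10
n_neurons = grid_h * grid_w
order_iters = 1000                       # ordering phase: ~1000 iterations
converge_iters = 500 * n_neurons         # convergence phase: >= 500 x #neurons
sigma0 = max(grid_h, grid_w) / 2         # initial neighborhood radius
tau = order_iters / np.log(sigma0)       # sigma decays to ~1 by end of ordering

def schedule(t):
    """Learning rate and neighborhood radius at iteration t."""
    if t < order_iters:
        # Ordering phase: the neighborhood starts covering almost the whole
        # lattice and shrinks slowly; the learning rate decays from ~0.1.
        sigma = sigma0 * np.exp(-t / tau)
        eta = 0.1 * np.exp(-t / order_iters)
    else:
        # Convergence phase: eta held on the order of 0.01; the neighborhood
        # is restricted to the BMU's nearest neighbors (or the BMU alone).
        sigma = 1.0
        eta = 0.01
    return eta, sigma
```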
Relevant Issues: SOM extensions
- PSOM: continuous projection via interpolation between centroid locations.
- DisSOM: SOM defined on dissimilarities between objects, more general than a distance.
- Nonnegative matrix factorization.
- Hierarchical SOM: from a single layer to multiple layers for multi-scale data analysis.
- Generative topographic map (GTM): a probabilistic counterpart of the SOM that is provably convergent and requires neither a shrinking neighborhood nor a decreasing step size.
- Kernel SOM: overcomes two major limitations of the Kohonen SOM.
Conclusions
- SOM is a biologically inspired neural network for high-dimensional data clustering and visualization.
- Its most important property is topology preservation.
- Learning involves two phases: ordering and convergence.
- There is no guarantee that the SOM always converges, hence parameter tuning is needed.
- There are several variants and extensions that tend to overcome the limitations of the SOM.
- There are a number of successful applications of SOM.