Models of Cortical Maps II


CN510: Principles and Methods of Cognitive and Neural Modeling Models of Cortical Maps II Lecture 19 Instructor: Anatoli Gorchetchnikov <anatoli@bu.edu>

The Network of Grossberg (1976)

dy_i/dt = -A y_i + (B - y_i) [ f(y_i) + I_i ] - C y_i Σ_{k≠i} f(y_k)

where I_i = Σ_j x_j w_ij = x · w_i = |x| |w_i| cos(θ). So the winner in F2 will be the cell whose incoming weights from F1 are closest to the current input pattern (the same is true for the von der Malsburg network).
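The dot-product winner selection can be sketched in a few lines of Python (a toy example, not from the lecture; the three-cell F2 layer and its unit-norm weight vectors are hypothetical):

```python
import numpy as np

# Hypothetical 3-cell F2 layer: each row is the afferent weight vector w_i.
W = np.array([[1.0, 0.0],
              [0.6, 0.8],
              [0.0, 1.0]])

x = np.array([0.7, 0.7])  # current F1 input pattern

# I_i = x . w_i; with unit-norm weight vectors this equals |x| cos(theta),
# so the largest input goes to the cell whose weights point closest
# to the input direction.
I = W @ x
winner = int(np.argmax(I))
print(winner)  # -> 1: the weights (0.6, 0.8) are nearest to x's direction
```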

The Network of Grossberg (1976)

dy_i/dt = -A y_i + (B - y_i) [ f(y_i) + I_i ] - C y_i Σ_{k≠i} f(y_k)

In the F2 RCF of Grossberg (1976) there are no inherent neighborhood relationships among the F2 cells, i.e., all cells excite themselves and inhibit all other cells in the network equally. As a result the network forms no topological relationships whatsoever.

von der Malsburg (1973) Model

In the von der Malsburg model the competition is distance dependent. As a result, feature detectors for similar patterns end up closer together than neurons responding to less similar patterns. This gives a smoothly varying fuzzy classification in which the class nodes themselves are clustered.

The tendency for nearby cells to be tuned to similar orientations arises because the cells tend to fire in clusters. This in turn is due to the nature of the on-center, off-surround interactions: suppose the cell at location (i-1) becomes tuned to an orientation of 0 degrees and the cell at (i+1) becomes tuned to 20 degrees. The cell at location i, which fires often with both the cell at (i-1) and the cell at (i+1) due to the on-center off-surround connectivity, will tend to become tuned to the average of these orientations, i.e., 10 degrees.

The Network of Grossberg (1976)

One would not expect any tendency for the F2 cells to fire in clusters (even in the non-choice case), and therefore one would not expect that neighboring cells in F2 would become tuned to similar orientations. To explain the visual cortex topography, the F2 network would need to be modified to use distance-dependent kernels C_ki for the excitation and E_ki for the inhibition:

dy_i/dt = -A y_i + (B - y_i) [ f(y_i) + Σ_k C_ki I_k ] - (D + y_i) Σ_k E_ki f(y_k)

This kind of RCF is briefly discussed in Grossberg (1976) and was studied in Levine and Grossberg (1976).

Issue with Stability in SOM

With learning, the afferent weight vector becomes more parallel to the current F1 STM pattern. [Figure: the current F1 STM pattern, the weight vector afferent to the F2 winner, and the change due to learning.]

Recall that we can set up a sequence of inputs so that the weight vectors keep rotating around and never stabilize.

Issue with Competitive Learning

Even with a non-pathological data sequence it is possible that the weights will never stabilize, so the algorithm will never terminate. One approach is to reduce the learning rate, but then after some time we will not learn new patterns: we will not be able to track changes through time. This leads to the stability-plasticity dilemma.

Kohonen's Critique and Approach

The map in von der Malsburg's model is patchy: local order exists, but there is no global order. This is good for orientation selectivity but not good for retinotopy. Gradual reduction of the learning rate is OK: once the system has learned the complete mapping from input to output it does not need to learn more, and plasticity in the cortex is much higher during development than later in life. So Kohonen created a cross-breed between vector quantization and a neural network to preserve a global mapping of the data.

Vector Quantization

Create a codebook of vectors w (similar to weights coming into neurons). Check the similarity between the input x and each of these vectors and select a winner. Similarity can be based on the dot product or simply on the difference between the two vectors. Then do a gradient descent optimization on the winner i:

Δw_i = η(t) (x - w_i)

where η(t) is a monotonically decreasing learning rate. Note that this is very similar to postsynaptically gated decay

Δw_i = η (x - w_i) y_i

except that the learning rate is not constant and y_i ∈ {0, 1}.
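A minimal sketch of this vector quantization loop, assuming Euclidean distance for winner selection; the 1/(t+2) learning-rate schedule and the data are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 2))     # input vectors x
codebook = rng.normal(size=(4, 2))   # codebook vectors w_i

for t, x in enumerate(data):
    eta = 1.0 / (t + 2)              # monotonically decreasing learning rate
    # Winner: codebook vector closest to the input.
    i = np.argmin(np.linalg.norm(codebook - x, axis=1))
    # Gradient-descent step: delta w_i = eta(t) (x - w_i), winner only.
    codebook[i] += eta * (x - codebook[i])
```

After the pass, the codebook vectors sit inside the data cloud, approximating the input distribution as described in the next slide.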

The resulting codebook approximates the probability distribution of the inputs, but in a disorganized way. Kohonen wanted to create feature detectors mapped in such a way that the resulting map conveys information about the similarity between the original data points on a global scale. That would also result in a smooth probability density surface. Representation of data in a lower-dimensional space, so that the distances between points correspond to the similarity between the original data items, is called multidimensional scaling. Kohonen's SOM is often used as a visualization tool for the analysis of high-dimensional data.

Poverty Map

A variety of factors representing quality of life are used as feature vectors. Colors represent the overall living quality.

Preparations

Need to have some distance metric for each of the components of the input vector x. Those metrics should be weighted to reflect the importance of each individual component. The relative magnitude of the first two principal components of the input set can hint at the height/width ratio of the resulting map. The resolution of the map is up to you: larger maps achieve better topology but require more time to converge.

Algorithm

For every data point x: pick a winning node W with weight w closest to x, then update the weights of all nodes according to

Δw_i = η(t) h(W, i, t) (x - w_i)

where h() is a neighborhood function, for example

h(W, i, t) = h_0 e^{ -dist(W, i)² / (2 σ(t)²) }

or some other function that favors the winner and its neighbors.
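The update rule above can be sketched as follows on a toy 1-D map. The exponential decay schedules for the learning rate and neighborhood width are illustrative assumptions (chosen to roughly follow the 0.9 → 0.01 and wide-to-nearest-neighbor recommendations later in the lecture), not prescribed formulas:

```python
import numpy as np

rng = np.random.default_rng(1)
n_nodes = 10
weights = rng.random((n_nodes, 2))   # 1-D map of 10 nodes, 2-D inputs
positions = np.arange(n_nodes)       # node coordinates on the map

def som_step(x, weights, t, n_iter=1000, eta0=0.9, sigma0=5.0):
    """One SOM update: pick winner W, pull every node toward x,
    scaled by a Gaussian neighborhood h centered on W."""
    eta = eta0 * (0.01 / eta0) ** (t / n_iter)        # decaying learning rate
    sigma = sigma0 * (1.0 / sigma0) ** (t / n_iter)   # shrinking neighborhood
    W = np.argmin(np.linalg.norm(weights - x, axis=1))
    h = np.exp(-(positions - positions[W]) ** 2 / (2 * sigma ** 2))
    weights += eta * h[:, None] * (x - weights)       # delta w_i = eta h (x - w_i)
    return weights

data = rng.random((1000, 2))
for t, x in enumerate(data):
    weights = som_step(x, weights, t)
```

Because every node is updated (weighted by h), neighboring nodes end up with similar weights, which is what produces the global topological ordering.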

Critical Point

Both the learning rate and the neighborhood function should decrease through time. Starting with a large neighborhood ensures a global mapping; decreasing it allows fine-tuning of the details.

Issues, Tips and Tricks

As nothing favors a particular orientation of the resulting map, any mirror or point-symmetric inversion can result. This does not match the data on retinotopic map consistency.

500 times the number of neurons is a good iteration count. A full-scale visual cortex model would therefore need too many iterations.

The learning rate should be large (0.9) in the beginning, go down to about 0.01-0.1 over the first 1000 iterations, and continue to slowly decrease afterwards. This might be reasonable considering the developmental cycle.

The neighborhood size should start at more than half of the network diameter and decrease during the first 1000 iterations to about nearest-neighbor size. This is not too reasonable: there is no such drastic connectivity reduction in the brain.

Problems / Features

Generally slow convergence. Some initializations will not allow the network to converge. Like VQ, SOM is sensitive to the input distribution (can this be a reason for cortical magnification?).

Comparison of Three Models

All three models have: competition at the second layer that allows only a subset of cells to learn for each input presentation, and an associative learning law.

Differences: Grossberg (1976) has WTA competition with global inhibition of all other cells by the winner, while von der Malsburg (1973) and Kohonen (1982) have distance-dependent inhibition. This gives the latter two the ability to learn a smooth topological representation of the input features.

Comparison of Three Models

Differences: Kohonen (1982) gradually reduces the neighborhood size as learning progresses; the other two models do not. This gives his network the ability to learn the global topology of the input features during the initial phase of learning and fine-tune it later. It also makes his model less realistic: in biology the global topology (like retinotopy) is probably genetically prewired. Kohonen (1982) also gradually reduces the learning rate; the other two do not. It is unclear whether this is necessary without neighborhood reduction, but it certainly leads to much slower convergence.

Comparison of Three Models

Differences: Grossberg (1976) has shunting neuronal dynamics, von der Malsburg (1973) has additive dynamics, and Kohonen (1982) does not have any dynamics. This gives the latter the simplicity needed for use outside of the biological domain. Grossberg (1976) and Kohonen (1982) use postsynaptic gating of the weight decay; von der Malsburg (1973) uses explicit normalization. Both methods appear biologically realistic, but the latter is non-local. The similarity between postsynaptically gated decay and gradient descent can lead to local minima, but without global topology it does not matter.

Comparison of Three Models

Differences: Kohonen (1982) has shown how the input distribution can lead to features similar to cortical magnification. The von der Malsburg (1973) network can probably do this too, as can the distance-dependent Levine and Grossberg (1976) network. Grossberg (1976) is stable and converging for all parameter choices; von der Malsburg (1973) and Kohonen (1982) are not.

In general, systems of equations of the type

dx_i/dt = f_i(x_1, …, x_n)

are not guaranteed to converge. Both Grossberg's and von der Malsburg's RCFs are of this type. But if the equations are of certain specific types, stability can be proven.

Stability of a Critical Point

A critical point is stable if all the trajectories that start near the critical point stay near this point as time evolves. The system in the middle displays stable oscillations. If the trajectories converge to the critical point, it is called asymptotically stable (point on the left). If the starting point does not matter, then the critical point is globally asymptotically stable (right).

Stability of a Linear System

If any eigenvalue has a positive real part, the point is unstable and so is the system. If all real parts are negative or zero, then things get complicated: if purely imaginary eigenvalues are non-repeating, the point will show stable oscillations, but we will not consider the system stable. The bigger issue is that neither Grossberg's RCF nor von der Malsburg's model is linear, due to sigmoidal, quadratic, or threshold-linear signal functions.
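The eigenvalue test can be run numerically. The sketch below is an illustration of the rule just stated; the `classify` helper and its labels are hypothetical names, not part of the lecture:

```python
import numpy as np

def classify(A, tol=1e-9):
    """Classify the critical point at the origin of x' = A x
    by the real parts of A's eigenvalues."""
    re = np.linalg.eigvals(A).real
    if np.any(re > tol):
        return "unstable"                      # some mode grows
    if np.all(re < -tol):
        return "asymptotically stable"         # all modes decay
    return "marginal (needs further analysis)" # zero real parts

# Pure rotation (eigenvalues +/- i): stable oscillations, marginal case.
print(classify(np.array([[0.0, 1.0], [-1.0, 0.0]])))
# Both eigenvalues negative and real: asymptotically stable.
print(classify(np.array([[-1.0, 0.0], [0.0, -2.0]])))
```

For the nonlinear networks in question, the same test would be applied to the Jacobian at each critical point, as outlined on the next slide.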

Stability of a Non-linear System

The general approach is: determine all critical points; approximate the system with a linear one through the Jacobian in the neighborhood of each point; analyze the local stability around each point; if possible, build a phase plane for a global picture. There are also theorems that prove stability of specific classes of systems of non-linear ODEs. The Cohen-Grossberg theorem is one of these: it proves absolute stability of global pattern formation for RCF-like general systems. Absolute stability of global pattern formation means that no parameter or input choice will break it.

Cohen-Grossberg Theorem

Applies to a particular but quite general class of systems:

dx_i/dt = a_i(x_i) [ b_i(x_i) - Σ_k c_ik d_k(x_k) ]

and proves that such a system converges to a stable equilibrium point if several conditions are met. Four of those conditions:

c is symmetric and non-negative: c_ik = c_ki ≥ 0;
a_i is positive and continuous: a_i(x_i) > 0;
b_i is continuous;
d_k is non-negative and monotone: d_k(x_k) ≥ 0, sign d_k'(x_k) = const.

von der Malsburg and C-G Theorem

Compare

dE_i/dt = -a E_i + Σ_k p_ki f(E_k) + Σ_l s_li R_l - Σ_l q_li f(I_l)
dI_i/dt = -a I_i + Σ_k r_ki f(E_k)

with

dx_i/dt = a_i(x_i) [ b_i(x_i) - Σ_k c_ik d_k(x_k) ]

Here a_i = 1, which is continuous and positive, so the reduced C-G form becomes

dx_i/dt = b_i(x_i) - Σ_k c_ik d_k(x_k)

von der Malsburg and C-G Theorem

Compare

dx_i/dt = b_i(x_i) - Σ_k c_ik d_k(x_k)

with

dE_i/dt = -a E_i + Σ_l s_li R_l + Σ_k p_ki f(E_k) - Σ_l q_li f(I_l)
dI_i/dt = -a I_i + Σ_k r_ki f(E_k)

b is linear and therefore continuous; d is threshold-linear and therefore monotone and non-negative. The final requirement is that c is symmetric and non-negative: c_ik = c_ki ≥ 0. To ensure it we need to check p_i, q_i, and r_i.

von der Malsburg and C-G Theorem

(1) E-cell to E-cell connections (p_i): E-cells excite their immediate neighbors in the array. These are symmetric but positive, making c negative and breaking the requirement. Note that this requirement eliminates all networks with on-center excitation from the list of networks eligible for the C-G theorem.

von der Malsburg and C-G Theorem

(2) E-cell to I-cell connections (r_i): E-cells excite the corresponding I-cell and its immediate neighbors in the array. (3) I-cell to E-cell connections (q_i): I-cells inhibit the next-to-immediate neighbor E-cells. These are not even symmetric, also breaking the requirement.
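This symmetry argument can be checked numerically. The sketch below builds toy banded connection matrices for the three kernels just described (the array size and the weight values are arbitrary assumptions) and tests the C-G symmetry and non-negativity requirement:

```python
import numpy as np

n = 7  # cells per layer in a toy 1-D array

def band(n, offsets, value):
    """Connection matrix with `value` at the given index offsets."""
    M = np.zeros((n, n))
    for i in range(n):
        for o in offsets:
            if 0 <= i + o < n:
                M[i, i + o] = value
    return M

p = band(n, [-1, 1], 0.5)     # E -> E: immediate neighbors (symmetric, positive)
r = band(n, [-1, 0, 1], 0.4)  # E -> I: own and neighboring interneurons
q = band(n, [-2, 2], 0.6)     # I -> E: next-to-immediate neighbors

# In C-G form the interaction enters as -c_ik, so E -> E excitation
# corresponds to c = -p: symmetric, but negative, violating c_ik >= 0.
c_EE = -p
print(np.allclose(c_EE, c_EE.T))  # symmetric: True
print(np.all(c_EE >= 0))          # non-negative: False, requirement broken

# The E <-> I cross-block pairs r (E -> I) with q (I -> E); symmetry of the
# full coefficient matrix would need matching kernels, which fails here.
print(np.allclose(r, q))          # False: cross-connections asymmetric
```

The same check fails for any choice of distinct excitatory and inhibitory kernels, which is the point made on the next slide.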

von der Malsburg and C-G Theorem

Note that the reasoning on the previous slide holds for any network where excitation and inhibition are separated into principal cells and interneurons with different weight kernels. Thus von der Malsburg's model does not fit the C-G requirements. Does that automatically make it unstable? Of course not. But: "not every set of parameters p, q, r, and s leads to stable solutions; those finally employed were found partly by trial and error." Can we set up any realistic on-center off-surround map meeting the C-G requirements?

Is Proven Stability Absolutely Necessary?

Although a very nice thing to have, a guarantee of stability is not always necessary; e.g., von der Malsburg's model is very useful and insightful even without one. Furthermore, many apparently stable networks are far too complicated to allow a rigorous mathematical proof of stability. However, knowing when your system will remain stable makes the job of parameter searching significantly less troublesome and can ensure reliable system performance on untested input domains. Are networks in the brain stable for all parameter choices?

Next Time

Diffusely projecting transmitter systems of the CNS and the implications of these systems in terms of a variety of drug effects. Readings: none required, but any facts related to drug effects in the brain are welcome.