
ENS Lyon, Théorie des graphes avancée (Advanced Graph Theory)
Professor: Eric Thierry

Hopfield Network

Alice Julien-Laferriere
Lyon

Contents

1 Introduction
2 Hopfield Network Elements
3 Convergence
  3.1 Link to graph theory
  3.2 Minimum cut problem
4 Applications of Hopfield Networks
  4.1 Associative Memory
    4.1.1 Hebbian Learning
  4.2 Combinatorial Problems
    4.2.1 The travelling salesman
    4.2.2 Discussion about efficiency
5 The continuous model
6 Conclusion

1 Introduction

A neural network (or, more formally, artificial neural network) is a mathematical or computational model inspired by the structure and functional aspects of biological neural networks. It consists of an interconnected group of artificial neurons. The original inspiration for the term "artificial neural network" came from the study of central nervous systems and their neurons, axons, dendrites and synapses, which constitute the processing elements of biological neural networks.

The first model of a neuron was presented in 1943 by W. McCulloch and W. Pitts, and in 1958 Rosenblatt conceived the Perceptron. His work had a large impact, but in 1969 a harsh critique by Minsky and Papert was published. Work on neural networks slowed down, until John Hopfield, convinced of the power of neural networks, came out with his model in 1982 and boosted research in this field.

The Hopfield network is a particular case of neural network. It has its roots in physics, being inspired by spin systems.

This work is mainly based on the paper by J. Bruck, On the convergence properties of the Hopfield model [1], and on chapter 13 of the book by R. Rojas, Neural Networks [2].

2 Hopfield Network Elements

A Hopfield network consists of a set of interconnected neurons which update their activation values asynchronously. The activation values are binary, usually {-1, 1}. The update of a unit depends on the other units of the network and on the unit itself: a unit i is influenced by another unit j with a certain weight w_ij, and it has a threshold value. So there is a constraint coming from the other neurons and one coming from the specific threshold of the unit.

Update and parameters

The new activation value (state) of a neuron is computed, in discrete time, by the function:

    x_i(t+1) = \mathrm{sign}\Big( \sum_{j=1}^{n} x_j(t) w_{ij} - \theta_i \Big)    (1)

or, in matrix form, X(t+1) = \mathrm{sign}(W X(t) - T), where X, W, T and the sign function are:

- X is the vector of activation values of the n units/neurons: X = (x_1, x_2, ..., x_n)^t
- W is the weight matrix:

    W = \begin{pmatrix} w_{11} & w_{12} & \dots & w_{1n} \\ w_{21} & w_{22} & \dots & w_{2n} \\ \vdots & & & \vdots \\ w_{n1} & \dots & \dots & w_{nn} \end{pmatrix}

  where w_ij can be interpreted as the influence of neuron i over neuron j (and reciprocally)
- T is the vector of thresholds of the units: T = (\theta_1, \theta_2, ..., \theta_n)^t
- the sign function is defined as: sign(x) = +1 if x ≥ 0, -1 otherwise.

We can easily represent a Hopfield network by a weighted undirected graph where:

- each unit is a vertex
- the edge between two vertices carries the corresponding weight, so that W can be used as the adjacency matrix of the graph.

For example, the graph of Figure 1 corresponds to:

    X = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}, \quad W = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}, \quad T = \begin{pmatrix} 0.5 \\ 0.5 \\ 0.5 \end{pmatrix}

Figure 1: Example of a very simple Hopfield network

Usually, a Hopfield network has a symmetric, zero-diagonal weight matrix (no loops: a unit does not influence itself). We will only consider that case in our study.
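To make the update rule (1) concrete, here is a minimal Python sketch of the asynchronous dynamics on the example network above. The initial state, the random update order and the number of iterations are illustrative assumptions, not part of the original report.

```python
import numpy as np

def sign(v):
    """Sign function as defined in the text: +1 if v >= 0, -1 otherwise."""
    return 1 if v >= 0 else -1

# Weight matrix and thresholds of the example network of Figure 1.
W = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])
T = np.array([0.5, 0.5, 0.5])

x = np.array([1, -1, 1])            # an arbitrary initial activation state
rng = np.random.default_rng(0)

# Asynchronous dynamics: one randomly chosen unit is updated at a time.
for _ in range(20):
    i = rng.integers(len(x))
    x[i] = sign(W[i] @ x - T[i])    # update rule (1)

print(x)   # one of the stable states of this small network
```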

3 Convergence

Hopfield networks converge to a stable state, corresponding to a local minimum of an energy function. The energy function of a Hopfield network in a given state is:

    E_1 := -\frac{1}{2} X^t W X + T^t X = -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} x_i x_j + \sum_{i=1}^{n} \theta_i x_i    (2)

    E_2 := -\frac{1}{2} X^t W X = -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} x_i x_j    (3)

E_1 is the general energy function; more often, we use E_2, which can be made equivalent to E_1. In order to take the thresholds of the units into account, we add an artificial unit whose activation is fixed at 1 and which is linked to every other unit with a weight equal to the negative of its threshold. In that case, the new weight matrix is

    W_G = \begin{pmatrix} W & -T \\ -T^t & 0 \end{pmatrix}

These energy functions are helpful to characterize the state of the network.

Proposition 3.0.1 A Hopfield network with n units and asynchronous dynamics, starting from any given network state, eventually reaches a stable state at a local minimum of the energy function.

Proof: As the dynamics are asynchronous, only one unit is evaluated at each time step. If the k-th unit is selected, there are two options:

1. k does not change state
2. k changes state (1 → -1 or -1 → 1)

In the first case, the energy function remains the same, E(t+1) = E(t). Otherwise, the energy function changes according to the new activation value x'_k of the k-th unit. The difference in energy of the system is given by:

    E_1(x) - E_1(x') = \Big( -\sum_{j=1}^{n} w_{kj} x_j x_k + \theta_k x_k \Big) - \Big( -\sum_{j=1}^{n} w_{kj} x_j x'_k + \theta_k x'_k \Big)

Since w_kk = 0,

    E_1(x) - E_1(x') = (x'_k - x_k) \Big( \sum_{j=1}^{n} w_{kj} x_j - \theta_k \Big)    (4)

The second term of this equation, \sum_{j=1}^{n} w_{kj} x_j - \theta_k, is the excitation e_k of the k-th unit. The unit changed its state, so by (1) the excitation e_k has the same sign as the new state x'_k and the opposite sign to x_k. Therefore (x'_k - x_k) has the same sign as e_k, so (4) is positive, which implies that E(x) > E(x').

So the energy decreases with every change of state, and since there is only a finite number of possible states, the network must reach a state in which the energy cannot decrease any further: a stable state. Of course the stable state is not unique, but it is certain that the system converges. The final state depends on the input, i.e. the initial state of the system.

3.1 Link to graph theory

Theorem 3.1 [1] Let G = (V, E) be a weighted undirected graph with W the matrix of its edge weights. Then the minimum cut problem is equivalent to maximizing Q_g over X ∈ {-1, 1}^n, where

    Q_g := \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} x_i x_j

Proof: Clearly Q_g = -E_2, so finding the maximum of Q_g is equivalent to finding the minimum of the energy. Assign a variable x_i to every node i ∈ V. Let W^{++} denote the sum of the weights of the edges of G with both endpoints equal to 1, and let W^{--} and W^{+-} denote the corresponding sums for the other two cases. Then

    Q_g = W^{++} + W^{--} - W^{+-}

which can also be written as Q_g = (W^{++} + W^{--} + W^{+-}) - 2 W^{+-}, that is

    Q_g = \frac{1}{2} \Big( \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} \Big) - 2 W^{+-}    (5)

The first term in (5) is constant (it equals the sum of the weights of the edges of G). Hence, maximizing Q_g is equivalent to minimizing W^{+-}. But W^{+-} is exactly the weight of a cut in G, with V_1 being the set of nodes of G corresponding to the units equal to 1 and V_{-1} the set of nodes corresponding to the units equal to -1.

Theorem 3.1 does not take into account that in a Hopfield network each unit has a threshold, which determines whether the unit receives enough excitation to change state. When the thresholds of the units are taken into account, we can also prove that finding a stable state is equivalent to a minimum cut problem.

Theorem 3.2 Let N = (W, T) be a network with W an n×n symmetric zero-diagonal matrix. Let G be a weighted graph with n+1 nodes whose weight matrix W_G is

    W_G = \begin{pmatrix} W & -T \\ -T^t & 0 \end{pmatrix}

The problem of finding a state X for which E_1 is a global minimum is equivalent to the minimum cut problem on the corresponding graph G.

Proof: The graph G contains n+1 nodes. The additional node is connected to the other ones by weighted edges of weight w_{n+1,i} = -θ_i. If the state of node n+1 is constrained to 1,

    E_1 := -\frac{1}{2} \sum_{i=1}^{n+1} \sum_{j=1}^{n+1} w_{ij} x_i x_j = -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} x_i x_j + \sum_{i=1}^{n} \theta_i x_i

and we can then apply Theorem 3.1. The cut is symmetric, Q_g(X) = Q_g(-X), so a minimum cut can also be obtained by interchanging V_1 and V_{-1}; hence the additional node can always be assumed to be set to 1.

3.2 Minimum cut problem

Definition. A cut C = (S, T) is a partition of the vertex set V of a graph G = (V, E). The cut-set of a cut C = (S, T) is the set {(u, v) ∈ E : u ∈ S, v ∈ T}. If the edges are weighted, the value (weight) of the cut is the sum of the weights of the edges in the cut-set. The minimum cut problem is to find the cut of smallest weight: a cut is minimum if its weight is not larger than the weight of any other cut. Intuitively, the weights inside each side should be large compared to the weights crossing between the sides.

A simple way to search for a small cut in an undirected weighted graph is (from [1]):

1) Start with a random assignment X ∈ {-1, 1}^n.
2) Choose a node k ∈ {1, ..., n} at random.
3) Compare the sum of the weights of the edges incident to node k that belong to the cut with the sum of the weights of the other edges incident to node k. Move node k to the side of the cut which results in a decrease of the weight of the cut. In case of equality, place node k in V_1.
4) Go to step 2.
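The sketch below implements steps 1-4 in Python. The example graph, the number of iterations and the random seed are illustrative assumptions rather than anything prescribed by [1].

```python
import numpy as np

def local_min_cut(W, n_iter=1000, seed=0):
    """Greedy local search: repeatedly move one node to the side of the
    cut that does not increase the cut weight (ties go to side V_1 = +1)."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    x = rng.choice([-1, 1], size=n)          # 1) random assignment
    for _ in range(n_iter):
        k = rng.integers(n)                  # 2) choose a node at random
        same = W[k] @ (x == x[k])            # weight of edges inside k's side
        other = W[k] @ (x != x[k])           # weight of edges crossing the cut
        if other > same:                     # 3) moving k decreases the cut
            x[k] = -x[k]
        elif other == same:
            x[k] = 1                         # tie: place k in V_1
    return x                                 # 4) repeat until n_iter is reached

def cut_weight(W, x):
    """Total weight of edges whose endpoints lie on different sides."""
    return sum(W[i, j] for i in range(len(x)) for j in range(i + 1, len(x))
               if x[i] != x[j])

# Example on a small weighted graph (weights chosen purely for illustration).
W = np.array([[0, 3, 1, 0],
              [3, 0, 1, 2],
              [1, 1, 0, 4],
              [0, 2, 4, 0]], dtype=float)
x = local_min_cut(W)
print(x, cut_weight(W, x))
```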

Theorem 3.3 Let N = (W, T) be a network with T = 0 and W a symmetric zero-diagonal matrix. If N operates asynchronously, then it always converges to a stable state.

Proof: As said before, we can consider the operation of N as a run of the algorithm above for the minimum cut problem on N. In each iteration, the value of the cut is non-increasing (ties are broken towards V_1), so the algorithm always stops, resulting in a cut whose weight is a local minimum.

A small example

Figure 2: A small example

In the example of Figure 2, for the node circled in red, the outside attraction is higher than the inner attraction (out: 6, in: 3). So the node changes state (-1 → 1) and the weight of the cut decreases by 3.
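Returning to the energy view, the following sketch checks numerically that E_1 from equation (2) never increases under asynchronous updates, as stated in Proposition 3.0.1. The randomly generated symmetric, zero-diagonal network and the number of updates are illustrative assumptions.

```python
import numpy as np

def energy(W, T, x):
    """E_1(x) = -1/2 x^t W x + T . x  (equation (2))."""
    return -0.5 * x @ W @ x + T @ x

rng = np.random.default_rng(1)
n = 8
A = rng.normal(size=(n, n))
W = (A + A.T) / 2
np.fill_diagonal(W, 0.0)                 # symmetric, zero-diagonal weights
T = rng.normal(size=n)                   # thresholds
x = rng.choice([-1, 1], size=n)          # random initial state

energies = [energy(W, T, x)]
for _ in range(100):
    k = rng.integers(n)
    x[k] = 1 if W[k] @ x - T[k] >= 0 else -1   # update rule (1)
    energies.append(energy(W, T, x))

# Each asynchronous update leaves the energy unchanged or lowers it.
assert all(e2 <= e1 + 1e-12 for e1, e2 in zip(energies, energies[1:]))
print(energies[0], "->", energies[-1])
```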

4 Applications of Hopfield Networks

Neural networks such as the Hopfield network have diverse applications. Generally they are used for:

- Associative memory: the network is able to memorize some states or patterns.
- Combinatorial optimization: if the problem is modelled correctly, the network can find minima and produce solutions, but it can rarely find the optimal one.

Generally, a Hopfield network is used as a black box that computes an output resulting from the self-organization of the network. One of the most common learning schemes is Hebbian learning.

4.1 Associative Memory

4.1.1 Hebbian Learning

In Hebbian learning, the weights of the matrix result from unsupervised learning: connections are reinforced or weakened according to the correlation of the activities of the units they connect. The idea is that patterns can be impressed onto the network so that the network recognizes one of them even when the input is very noisy.

The learning phase consists in learning the different patterns in order to set the weights of the edges. For each pattern, the learning phase can be summarized as:

- if two units have the same activation state ({1, 1} or {-1, -1}), the strength of the connection is reinforced (w_ij increases)
- if two units have opposite activation states ({-1, 1}), the strength of the connection declines.

With this kind of rule, W stores the patterns and then allows the network to converge to them. Unfortunately, we cannot predict convergence to the right pattern: the network will still converge, but we cannot predict to which stable state.

Learning. The weight matrix is modified according to the m patterns we want to store. These patterns represent the stable states we hope to reach. The weights are modified, for example, by:

    w_{ij} = w_{ij} + \epsilon x_i x_j    (6)

Rule (6) is iterated several times for each pattern; x_i and x_j represent the states of units i and j in the pattern being learned. ε is a constant which prevents the connections from becoming too large or too small (for example ε = 1/m).
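As an illustration of rule (6), here is a minimal sketch of Hebbian storage and recall. The pattern dimensions, the noise level and the use of zero thresholds are illustrative assumptions, not values from the report.

```python
import numpy as np

def train_hebbian(patterns):
    """Build W by accumulating eps * x_i * x_j over the stored patterns,
    with eps = 1/m and a zero diagonal (no self-connections)."""
    m, n = patterns.shape
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p) / m
    np.fill_diagonal(W, 0.0)
    return W

def recall(W, x, n_steps=1000, seed=0):
    """Asynchronous updates with zero thresholds, for a fixed number of steps."""
    rng = np.random.default_rng(seed)
    x = x.copy()
    for _ in range(n_steps):
        k = rng.integers(len(x))
        x[k] = 1 if W[k] @ x >= 0 else -1
    return x

rng = np.random.default_rng(42)
patterns = rng.choice([-1, 1], size=(2, 50))     # m = 2 patterns, n = 50 units
W = train_hebbian(patterns)

noisy = patterns[0].copy()
flip = rng.choice(50, size=10, replace=False)    # corrupt 10 of the 50 units
noisy[flip] *= -1

recovered = recall(W, noisy)
print(np.array_equal(recovered, patterns[0]))    # usually True for few stored patterns
```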

When m = 1, the network always converges to the impressed pattern. When m > 1, we can only hope to converge to a known pattern if the different patterns are as orthogonal as possible (the patterns have to be really different from each other); if they are closely related, the network may converge to a mix of different patterns. It has been shown that a Hopfield network can only store about 0.14 n patterns (n being the number of units in the network).

4.2 Combinatorial Problems

Hopfield networks can also be used for combinatorial problems. If a problem can be written in a form isomorphic to the energy function, the network is able to find a local minimum of that function, but with no guarantee that the solution is optimal. Usually, in these problems the states of the units are coded as {0, 1}; we adopt this notation here because it simplifies the formulas.

4.2.1 The travelling salesman

The Travelling Salesman Problem (TSP) is an NP-hard problem in combinatorial optimization. Given a list of cities and their pairwise distances, the task is to find a shortest possible tour that visits each city exactly once. The path has to pass through n cities S_1, S_2, ..., S_n; each city should be visited only once, the salesman has to return to his point of departure, and the length of the path has to be minimal.

The network can be represented as an n × n matrix of units whose rows are the cities S_1, ..., S_n and whose columns are the time steps 1, ..., n. The units are the entries of this matrix. We identify column n+1 with column 1, so that the tour is a cycle. Only one 1 should appear in each row and in each column, because the salesman visits each city only once and cannot visit two cities during the same time step. The distance between city S_i and city S_j is d_ij.

To find the shortest path, the network minimizes the function:

    E_tsp = \frac{1}{2} \sum_{i,j,k} d_{ij} x_{ik} x_{j,k+1} + \frac{\gamma}{2} \Big( \sum_{i=1}^{n} \Big( \sum_{j=1}^{n} x_{ij} - 1 \Big)^2 + \sum_{j=1}^{n} \Big( \sum_{i=1}^{n} x_{ij} - 1 \Big)^2 \Big)    (7)

where the first term is the length L of the tour and the second term is the condition term.

The first part of the equation, L, accounts for the total length of the trip: x_{i,k} = 1 means that city S_i is visited at time k (otherwise x_{i,k} = 0), so if x_{i,k} = 1 and x_{j,k+1} = 1 (S_i visited at time k and S_j visited at time k+1), the distance d_ij is added to the length L.

The second part represents the condition of one visit per city and one city per time step: \sum_{i} (\sum_{j} x_{ij} - 1)^2 is minimal (= 0) if for every i (here a city) the sum over j (here the time steps) is 1, meaning that each city is visited exactly once; \sum_{j} (\sum_{i} x_{ij} - 1)^2 is minimal if at every time step k = 1, ..., n the salesman visits exactly one city. A small numerical sketch of this energy function is given at the end of this section.

γ is a parameter. Unfortunately, there is no other choice than trial and error to determine the value to use. If γ is small, the condition will not be respected. If γ is very big, the system is constrained too much and L is hidden (the minimization of E_tsp then barely takes the distances between cities into account).

The weights between units become w_{ik,j(k+1)} = -d_{ij} + t_{ik,j(k+1)}, where

    t_{ik,j(k+1)} = -γ if the units belong to the same column or row, 0 otherwise

and the thresholds should be set to -γ/2.

In general the network will not find the optimal path, but the solutions it returns are approximations. If the number of cities is large (> 100), there are many local minima and we cannot tell whether the solution given by the system is a good approximation.

4.2.2 Discussion about efficiency

When we try to solve a combinatorial problem, a Hopfield network can help if we are able to express the problem as an energy function (the aim then being to minimize this function). However, as said before, optimality is not guaranteed (very often the minimization fails) and the system can fall into a local minimum. This kind of network nevertheless remains useful, because if each unit is considered as a small processor, neural networks offer high computational power and a lot of parallelism (thanks to the asynchronous update of the network). It is also possible to improve the result by adding noise to the network dynamics, in the hope that this allows the system to jump over energy barriers. Finally, another solution is to enlarge the space of possible solutions: instead of a discrete network, we can consider a continuous one.
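As announced above, here is a small sketch that evaluates E_tsp from equation (7) for a given assignment matrix (rows = cities, columns = time steps). The city coordinates, the value of γ and the test assignments are illustrative assumptions.

```python
import numpy as np

def tsp_energy(d, x, gamma):
    """E_tsp from equation (7): distance term + gamma/2 * constraint penalties."""
    n = d.shape[0]
    dist_term = 0.5 * sum(x[:, k] @ d @ x[:, (k + 1) % n] for k in range(n))
    row_penalty = ((x.sum(axis=1) - 1) ** 2).sum()   # each city visited once
    col_penalty = ((x.sum(axis=0) - 1) ** 2).sum()   # one city per time step
    return dist_term + 0.5 * gamma * (row_penalty + col_penalty)

# A valid tour (a permutation matrix) has zero penalty; an invalid assignment
# is penalised. Distances here come from random points, purely for illustration.
rng = np.random.default_rng(3)
n = 5
pts = rng.random((n, 2))
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)  # pairwise distances

tour = np.eye(n)[rng.permutation(n)]         # city i is visited at exactly one time step
print(tsp_energy(d, tour, gamma=2.0))        # penalty part is zero

bad = np.zeros((n, n)); bad[0, :] = 1        # city 0 "visited" at every time step
print(tsp_energy(d, bad, gamma=2.0))         # large penalty term
```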

5 The continuous model

In the continuous model, the states of the units can take any real value in [0, 1]. The dynamics remain asynchronous and the activation function is a sigmoid:

    x_i = s(u_i) = \frac{1}{1 + e^{-u_i}}    (8)

For each unit i, the excitation u_i changes slowly according to

    \frac{du_i}{dt} = \gamma \Big( -u_i + \sum_{j=1}^{n} w_{ij} s(u_j) \Big)

and the new energy function is

    E = -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} x_i x_j + \sum_{i=1}^{n} \int_{0}^{x_i} s^{-1}(x) \, dx

Convergence can be proved for this model as well (for more information, see [2]).
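A minimal numerical sketch of these dynamics, using explicit Euler integration, is given below. The network, γ, the step size and the number of steps are illustrative assumptions.

```python
import numpy as np

def s(u):
    """Sigmoid activation: s(u) = 1 / (1 + exp(-u)), see equation (8)."""
    return 1.0 / (1.0 + np.exp(-u))

rng = np.random.default_rng(7)
n = 6
A = rng.normal(size=(n, n))
W = (A + A.T) / 2
np.fill_diagonal(W, 0.0)          # symmetric, zero-diagonal weights

gamma, dt = 1.0, 0.01
u = rng.normal(size=n)            # initial excitations

for _ in range(5000):
    du = gamma * (-u + W @ s(u))  # dynamics of the continuous Hopfield model
    u = u + dt * du

print(s(u))                       # the unit states settle at values between 0 and 1
```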

6 Conclusion

The Hopfield network was a breakthrough in neural networks and gave an important impetus to neural network research. Many results are available, and neural systems are nowadays used throughout computer science. Their ability to learn by example makes them very flexible, and their parallel architecture makes them well suited to real-time systems.

Still, there are also critics. A. K. Dewdney wrote in 1997: "Although neural nets do solve a few toy problems, their powers of computation are so limited that I am surprised anyone takes them seriously as a general problem-solving tool." And indeed, a significant drawback is that a neural network needs a lot of training to be efficient.

Nevertheless, they are used in many different ways: image processing, pattern matching, data deconvolution...

Appendix

Hopfield network applications. This list is not exhaustive; more applications and the corresponding references can be found in Artificial Neural Systems by Patrick K. Simpson [3]:

- Combinatorial optimization applications
- Image processing
- Signal processing
- Graph coloring, graph flow and graph manipulation
- ANS programming
- Data deconvolution
- Pattern matching
- Solving equations and optimizing functions
- Travelling salesman, scheduling and resource allocation

References

[1] Jehoshua Bruck. On the convergence properties of the Hopfield model. Proceedings of the IEEE, 78(10), October 1990.

[2] Raul Rojas. Neural Networks. Springer, 1996.

[3] Patrick K. Simpson. Artificial Neural Systems. Pergamon Press, 1990.

[4] David Kriesel. A Brief Introduction to Neural Networks. 2007. Chapter 8, pp. 125-135, available at http://www.dkriesel.com.

[5] Wikipédia. http://fr.wikipedia.org/wiki/R%C3%A9seau_de_neurones_de_Hopfield.