
ENS Lyon, Théorie des graphes avancée (Advanced Graph Theory)
Professor: Eric Thierry

Hopfield Network

Alice Julien-Laferriere
Lyon

Contents

1 Introduction
2 Hopfield Network Elements
3 Convergence
  3.1 Link to graph theory
  3.2 Minimum cut problem
4 Applications of Hopfield Networks
  4.1 Associative Memory
    4.1.1 Hebbian Learning
  4.2 Combinatorial Problems
    4.2.1 The travelling salesman
    4.2.2 Discussion about efficiency
5 The continuous model
6 Conclusion

1 Introduction

A neural network (or, more formally, artificial neural network) is a mathematical or computational model inspired by the structure and functional aspects of biological neural networks. It consists of an interconnected group of artificial neurons. The original inspiration for the term "artificial neural network" came from the study of central nervous systems and their neurons, axons, dendrites and synapses, which constitute the processing elements of biological neural networks.

The first model of a neuron was presented in 1943 by W. McCulloch and W. Pitts, and in 1958 Rosenblatt conceived the Perceptron. His work had a large impact, but in 1969 a harsh critique by Minsky and Papert was published. Work on neural networks slowed down, until John Hopfield, convinced of the power of neural networks, came out with his model in 1982 and boosted research in this field.

The Hopfield network is a particular case of neural network. It has its roots in physics, being inspired by spin systems.

This work is mainly based on the paper by J. Bruck, On the convergence properties of the Hopfield model [1], and on chapter 13 of the book by R. Rojas, Neural Networks [2].

2 Hopfield Network Elements

A Hopfield network consists of a set of interconnected neurons which update their activation values asynchronously. The activation values are binary, usually {-1, 1}. The update of a unit depends on the other units of the network and on the unit itself: a unit i is influenced by another unit j with a certain weight w_ij, and it has a threshold value. So there is a constraint coming from the other neurons and one coming from the specific threshold of the unit.

Update and parameters

The new activation value (state) of a neuron is computed, in discrete time, by the function:

    x_i(t+1) = \mathrm{sign}\Big( \sum_{j=1}^{n} x_j(t) w_{ij} - \theta_i \Big)    (1)

or, in matrix form, X(t+1) = \mathrm{sign}(W X(t) - T), where X, W, T and the sign function are:

- X is the vector of activation values of the n units/neurons: X = (x_1, x_2, ..., x_n)^t
- W is the weight matrix:

    W = \begin{pmatrix} w_{11} & w_{12} & \dots & w_{1n} \\ w_{21} & w_{22} & \dots & w_{2n} \\ \vdots & & & \vdots \\ w_{n1} & \dots & \dots & w_{nn} \end{pmatrix}

  where w_ij can be interpreted as the influence of neuron i over neuron j (and reciprocally)
- T is the vector of thresholds of the units: T = (\theta_1, \theta_2, ..., \theta_n)^t
- the sign function is defined as: sign(x) = +1 if x ≥ 0, -1 otherwise.

We can easily represent a Hopfield network by a weighted undirected graph where:

- each unit is a vertex
- the edge between two vertices carries the corresponding weight, so that W can be used as the adjacency matrix of the graph.

For example, the graph of Figure 1 corresponds to:

    X = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}, \quad W = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}, \quad T = \begin{pmatrix} 0.5 \\ 0.5 \\ 0.5 \end{pmatrix}

Figure 1: Example of a very simple Hopfield network

Usually, a Hopfield network has a symmetric, zero-diagonal weight matrix (no loops: a unit does not influence itself). We will only consider that case in our study.
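To make the update rule (1) concrete, here is a minimal Python sketch of the asynchronous dynamics on the example network above. The initial state, the random update order and the number of iterations are illustrative assumptions, not part of the original report.

```python
import numpy as np

def sign(v):
    """Sign function as defined in the text: +1 if v >= 0, -1 otherwise."""
    return 1 if v >= 0 else -1

# Weight matrix and thresholds of the example network of Figure 1.
W = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])
T = np.array([0.5, 0.5, 0.5])

x = np.array([1, -1, 1])            # an arbitrary initial activation state
rng = np.random.default_rng(0)

# Asynchronous dynamics: one randomly chosen unit is updated at a time.
for _ in range(20):
    i = rng.integers(len(x))
    x[i] = sign(W[i] @ x - T[i])    # update rule (1)

print(x)   # one of the stable states of this small network
```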

3 Convergence

Hopfield networks converge to a stable state, corresponding to a local minimum of an energy function. The energy function of a Hopfield network in a given state is:

    E_1 := -\frac{1}{2} X^t W X + T^t X = -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} x_i x_j + \sum_{i=1}^{n} \theta_i x_i    (2)

    E_2 := -\frac{1}{2} X^t W X = -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} x_i x_j    (3)

E_1 is the general energy function; more often, we use E_2, which can be made equivalent to E_1. In order to take the thresholds of the units into account, we add an artificial unit whose activation is fixed at 1 and which is linked to every other unit with a weight equal to the negative of its threshold. In that case, the new weight matrix is

    W_G = \begin{pmatrix} W & -T \\ -T^t & 0 \end{pmatrix}

These energy functions are helpful to characterize the state of the network.

Proposition 3.0.1 A Hopfield network with n units and asynchronous dynamics, starting from any given network state, eventually reaches a stable state at a local minimum of the energy function.

Proof: As the dynamics are asynchronous, only one unit is evaluated at each time step. If the k-th unit is selected, there are two options:

1. k does not change state
2. k changes state (1 → -1 or -1 → 1)

In the first case, the energy function remains the same, E(t+1) = E(t). Otherwise, the energy function changes according to the new activation value x'_k of the k-th unit. The difference in energy of the system is given by:

    E_1(x) - E_1(x') = \Big( -\sum_{j=1}^{n} w_{kj} x_j x_k + \theta_k x_k \Big) - \Big( -\sum_{j=1}^{n} w_{kj} x_j x'_k + \theta_k x'_k \Big)

Since w_kk = 0,

    E_1(x) - E_1(x') = (x'_k - x_k) \Big( \sum_{j=1}^{n} w_{kj} x_j - \theta_k \Big)    (4)

The second term of this equation, \sum_{j=1}^{n} w_{kj} x_j - \theta_k, is the excitation e_k of the k-th unit. The unit changed its state, so by (1) the excitation e_k has the same sign as the new state x'_k and the opposite sign to x_k. Therefore (x'_k - x_k) has the same sign as e_k, so (4) is positive, which implies that E(x) > E(x').

So the energy decreases with every change of state, and since there is only a finite number of possible states, the network must reach a state in which the energy cannot decrease any further: a stable state. Of course the stable state is not unique, but it is certain that the system converges. The final state depends on the input, i.e. the initial state of the system.

3.1 Link to graph theory

Theorem 3.1 [1] Let G = (V, E) be a weighted undirected graph with W the matrix of its edge weights. Then the minimum cut problem is equivalent to maximizing Q_g over X ∈ {-1, 1}^n, where

    Q_g := \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} x_i x_j

Proof: Clearly Q_g = -E_2, so finding the maximum of Q_g is equivalent to finding the minimum of the energy. Assign a variable x_i to every node i ∈ V. Let W^{++} denote the sum of the weights of the edges of G with both endpoints equal to 1, and let W^{--} and W^{+-} denote the corresponding sums for the other two cases. Then

    Q_g = W^{++} + W^{--} - W^{+-}

which can also be written as Q_g = (W^{++} + W^{--} + W^{+-}) - 2 W^{+-}, that is

    Q_g = \frac{1}{2} \Big( \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} \Big) - 2 W^{+-}    (5)

The first term in (5) is constant (it equals the sum of the weights of the edges of G). Hence, maximizing Q_g is equivalent to minimizing W^{+-}. But W^{+-} is exactly the weight of a cut in G, with V_1 being the set of nodes of G corresponding to the units equal to 1 and V_{-1} the set of nodes corresponding to the units equal to -1.

Theorem 3.1 does not take into account that in a Hopfield network each unit has a threshold, which determines whether the unit receives enough excitation to change state. When the thresholds of the units are taken into account, we can also prove that finding a stable state is equivalent to a minimum cut problem.

Theorem 3.2 Let N = (W, T) be a network with W an n×n symmetric zero-diagonal matrix. Let G be a weighted graph with n+1 nodes whose weight matrix W_G is

    W_G = \begin{pmatrix} W & -T \\ -T^t & 0 \end{pmatrix}

The problem of finding a state X for which E_1 is a global minimum is equivalent to the minimum cut problem on the corresponding graph G.

Proof: The graph G contains n+1 nodes. The additional node is connected to the other ones by weighted edges of weight w_{n+1,i} = -θ_i. If the state of node n+1 is constrained to 1,

    E_1 := -\frac{1}{2} \sum_{i=1}^{n+1} \sum_{j=1}^{n+1} w_{ij} x_i x_j = -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} x_i x_j + \sum_{i=1}^{n} \theta_i x_i

and we can then apply Theorem 3.1. The cut is symmetric, Q_g(X) = Q_g(-X), so a minimum cut can also be obtained by interchanging V_1 and V_{-1}; hence the additional node can always be assumed to be set to 1.

3.2 Minimum cut problem

Definition. A cut C = (S, T) is a partition of the vertex set V of a graph G = (V, E). The cut-set of a cut C = (S, T) is the set {(u, v) ∈ E : u ∈ S, v ∈ T}. If the edges are weighted, the value (weight) of the cut is the sum of the weights of the edges in the cut-set. The minimum cut problem is to find the cut of smallest weight: a cut is minimum if its weight is not larger than the weight of any other cut. Intuitively, the weights inside each side should be large compared to the weights crossing between the sides.

A simple way to search for a small cut in an undirected weighted graph is (from [1]):

1) Start with a random assignment X ∈ {-1, 1}^n.
2) Choose a node k ∈ {1, ..., n} at random.
3) Compare the sum of the weights of the edges incident to node k that belong to the cut with the sum of the weights of the other edges incident to node k. Move node k to the side of the cut which results in a decrease of the weight of the cut. In case of equality, place node k in V_1.
4) Go to step 2.
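The sketch below implements steps 1-4 in Python. The example graph, the number of iterations and the random seed are illustrative assumptions rather than anything prescribed by [1].

```python
import numpy as np

def local_min_cut(W, n_iter=1000, seed=0):
    """Greedy local search: repeatedly move one node to the side of the
    cut that does not increase the cut weight (ties go to side V_1 = +1)."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    x = rng.choice([-1, 1], size=n)          # 1) random assignment
    for _ in range(n_iter):
        k = rng.integers(n)                  # 2) choose a node at random
        same = W[k] @ (x == x[k])            # weight of edges inside k's side
        other = W[k] @ (x != x[k])           # weight of edges crossing the cut
        if other > same:                     # 3) moving k decreases the cut
            x[k] = -x[k]
        elif other == same:
            x[k] = 1                         # tie: place k in V_1
    return x                                 # 4) repeat until n_iter is reached

def cut_weight(W, x):
    """Total weight of edges whose endpoints lie on different sides."""
    return sum(W[i, j] for i in range(len(x)) for j in range(i + 1, len(x))
               if x[i] != x[j])

# Example on a small weighted graph (weights chosen purely for illustration).
W = np.array([[0, 3, 1, 0],
              [3, 0, 1, 2],
              [1, 1, 0, 4],
              [0, 2, 4, 0]], dtype=float)
x = local_min_cut(W)
print(x, cut_weight(W, x))
```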

Theorem 3.3 Let N = (W, T) be a network with T = 0 and W a symmetric zero-diagonal matrix. If N operates asynchronously, then it always converges to a stable state.

Proof: As said before, we can consider the operation of N as a run of the algorithm above for the minimum cut problem on N. In each iteration, the value of the cut is non-increasing (ties are broken towards V_1), so the algorithm always stops, resulting in a cut whose weight is a local minimum.

A small example

Figure 2: A small example

In the example of Figure 2, for the node circled in red, the outside attraction is higher than the inner attraction (out: 6, in: 3). So the node changes state (-1 → 1) and the weight of the cut decreases by 3.
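Returning to the energy view, the following sketch checks numerically that E_1 from equation (2) never increases under asynchronous updates, as stated in Proposition 3.0.1. The randomly generated symmetric, zero-diagonal network and the number of updates are illustrative assumptions.

```python
import numpy as np

def energy(W, T, x):
    """E_1(x) = -1/2 x^t W x + T . x  (equation (2))."""
    return -0.5 * x @ W @ x + T @ x

rng = np.random.default_rng(1)
n = 8
A = rng.normal(size=(n, n))
W = (A + A.T) / 2
np.fill_diagonal(W, 0.0)                 # symmetric, zero-diagonal weights
T = rng.normal(size=n)                   # thresholds
x = rng.choice([-1, 1], size=n)          # random initial state

energies = [energy(W, T, x)]
for _ in range(100):
    k = rng.integers(n)
    x[k] = 1 if W[k] @ x - T[k] >= 0 else -1   # update rule (1)
    energies.append(energy(W, T, x))

# Each asynchronous update leaves the energy unchanged or lowers it.
assert all(e2 <= e1 + 1e-12 for e1, e2 in zip(energies, energies[1:]))
print(energies[0], "->", energies[-1])
```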

4 Applications of Hopfield Networks

Neural networks such as the Hopfield network have diverse applications. Generally they are used for:

- Associative memory: the network is able to memorize some states or patterns.
- Combinatorial optimization: if the problem is modelled correctly, the network can find minima and produce solutions, but it can rarely find the optimal one.

Generally, a Hopfield network is used as a black box that computes an output resulting from the self-organization of the network. One of the most common learning schemes is Hebbian learning.

4.1 Associative Memory

4.1.1 Hebbian Learning

In Hebbian learning, the weights of the matrix result from unsupervised learning: connections are reinforced or weakened according to the correlation of the activities of the units they connect. The idea is that patterns can be impressed onto the network so that the network recognizes one of them even when the input is very noisy.

The learning phase consists in learning the different patterns in order to set the weights of the edges. For each pattern, the learning phase can be summarized as:

- if two units have the same activation state ({1, 1} or {-1, -1}), the strength of the connection is reinforced (w_ij increases)
- if two units have opposite activation states ({-1, 1}), the strength of the connection declines.

With this kind of rule, W stores the patterns and then allows the network to converge to them. Unfortunately, we cannot predict convergence to the right pattern: the network will still converge, but we cannot predict to which stable state.

Learning. The weight matrix is modified according to the m patterns we want to store. These patterns represent the stable states we hope to reach. The weights are modified, for example, by:

    w_{ij} = w_{ij} + \epsilon x_i x_j    (6)

Rule (6) is iterated several times for each pattern; x_i and x_j represent the states of units i and j in the pattern being learned. ε is a constant which prevents the connections from becoming too large or too small (for example ε = 1/m).
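As an illustration of rule (6), here is a minimal sketch of Hebbian storage and recall. The pattern dimensions, the noise level and the use of zero thresholds are illustrative assumptions, not values from the report.

```python
import numpy as np

def train_hebbian(patterns):
    """Build W by accumulating eps * x_i * x_j over the stored patterns,
    with eps = 1/m and a zero diagonal (no self-connections)."""
    m, n = patterns.shape
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p) / m
    np.fill_diagonal(W, 0.0)
    return W

def recall(W, x, n_steps=1000, seed=0):
    """Asynchronous updates with zero thresholds, for a fixed number of steps."""
    rng = np.random.default_rng(seed)
    x = x.copy()
    for _ in range(n_steps):
        k = rng.integers(len(x))
        x[k] = 1 if W[k] @ x >= 0 else -1
    return x

rng = np.random.default_rng(42)
patterns = rng.choice([-1, 1], size=(2, 50))     # m = 2 patterns, n = 50 units
W = train_hebbian(patterns)

noisy = patterns[0].copy()
flip = rng.choice(50, size=10, replace=False)    # corrupt 10 of the 50 units
noisy[flip] *= -1

recovered = recall(W, noisy)
print(np.array_equal(recovered, patterns[0]))    # usually True for few stored patterns
```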

When m = 1, the network always converges to the impressed pattern. When m > 1, we can only hope to converge to a known pattern if the different patterns are as orthogonal as possible (the patterns have to be really different from each other); if they are closely related, the network may converge to a mix of different patterns. It has been shown that a Hopfield network can only store about 0.14 n patterns (n being the number of units in the network).

4.2 Combinatorial Problems

Hopfield networks can also be used for combinatorial problems. If a problem can be written in a form isomorphic to the energy function, the network is able to find a local minimum of that function, but with no guarantee that the solution is optimal. Usually, in these problems the states of the units are coded as {0, 1}; we adopt this notation here because it simplifies the formulas.

4.2.1 The travelling salesman

The Travelling Salesman Problem (TSP) is an NP-hard problem in combinatorial optimization. Given a list of cities and their pairwise distances, the task is to find a shortest possible tour that visits each city exactly once. The path has to pass through n cities S_1, S_2, ..., S_n; each city should be visited only once, the salesman has to return to his point of departure, and the length of the path has to be minimal.

The network can be represented as an n × n matrix of units whose rows are the cities S_1, ..., S_n and whose columns are the time steps 1, ..., n. The units are the entries of this matrix. We identify column n+1 with column 1, so that the tour is a cycle. Only one 1 should appear in each row and in each column, because the salesman visits each city only once and cannot visit two cities during the same time step. The distance between city S_i and city S_j is d_ij.

To find the shortest path, the network minimizes the function:

    E_tsp = \frac{1}{2} \sum_{i,j,k} d_{ij} x_{ik} x_{j,k+1} + \frac{\gamma}{2} \Big( \sum_{i=1}^{n} \Big( \sum_{j=1}^{n} x_{ij} - 1 \Big)^2 + \sum_{j=1}^{n} \Big( \sum_{i=1}^{n} x_{ij} - 1 \Big)^2 \Big)    (7)

where the first term is the length L of the tour and the second term is the condition term.

The first part of the equation, L, accounts for the total length of the trip: x_{i,k} = 1 means that city S_i is visited at time k (otherwise x_{i,k} = 0), so if x_{i,k} = 1 and x_{j,k+1} = 1 (S_i visited at time k and S_j visited at time k+1), the distance d_ij is added to the length L.

The second part represents the condition of one visit per city and one city per time step: \sum_{i} (\sum_{j} x_{ij} - 1)^2 is minimal (= 0) if for every i (here a city) the sum over j (here the time steps) is 1, meaning that each city is visited exactly once; \sum_{j} (\sum_{i} x_{ij} - 1)^2 is minimal if at every time step k = 1, ..., n the salesman visits exactly one city. A small numerical sketch of this energy function is given at the end of this section.

γ is a parameter. Unfortunately, there is no other choice than trial and error to determine the value to use. If γ is small, the condition will not be respected. If γ is very big, the system is constrained too much and L is hidden (the minimization of E_tsp then barely takes the distances between cities into account).

The weights between units become w_{ik,j(k+1)} = -d_{ij} + t_{ik,j(k+1)}, where

    t_{ik,j(k+1)} = -γ if the units belong to the same column or row, 0 otherwise

and the thresholds should be set to -γ/2.

In general the network will not find the optimal path, but the solutions it returns are approximations. If the number of cities is large (> 100), there are many local minima and we cannot tell whether the solution given by the system is a good approximation.

4.2.2 Discussion about efficiency

When we try to solve a combinatorial problem, a Hopfield network can help if we are able to express the problem as an energy function (the aim then being to minimize this function). However, as said before, optimality is not guaranteed (very often the minimization fails) and the system can fall into a local minimum. This kind of network nevertheless remains useful, because if each unit is considered as a small processor, neural networks offer high computational power and a lot of parallelism (thanks to the asynchronous update of the network). It is also possible to improve the result by adding noise to the network dynamics, in the hope that this allows the system to jump over energy barriers. Finally, another solution is to enlarge the space of possible solutions: instead of a discrete network, we can consider a continuous one.
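As announced above, here is a small sketch that evaluates E_tsp from equation (7) for a given assignment matrix (rows = cities, columns = time steps). The city coordinates, the value of γ and the test assignments are illustrative assumptions.

```python
import numpy as np

def tsp_energy(d, x, gamma):
    """E_tsp from equation (7): distance term + gamma/2 * constraint penalties."""
    n = d.shape[0]
    dist_term = 0.5 * sum(x[:, k] @ d @ x[:, (k + 1) % n] for k in range(n))
    row_penalty = ((x.sum(axis=1) - 1) ** 2).sum()   # each city visited once
    col_penalty = ((x.sum(axis=0) - 1) ** 2).sum()   # one city per time step
    return dist_term + 0.5 * gamma * (row_penalty + col_penalty)

# A valid tour (a permutation matrix) has zero penalty; an invalid assignment
# is penalised. Distances here come from random points, purely for illustration.
rng = np.random.default_rng(3)
n = 5
pts = rng.random((n, 2))
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)  # pairwise distances

tour = np.eye(n)[rng.permutation(n)]         # city i is visited at exactly one time step
print(tsp_energy(d, tour, gamma=2.0))        # penalty part is zero

bad = np.zeros((n, n)); bad[0, :] = 1        # city 0 "visited" at every time step
print(tsp_energy(d, bad, gamma=2.0))         # large penalty term
```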

5 The continuous model

In the continuous model, the states of the units can take any real value in [0, 1]. The dynamics remain asynchronous and the activation function is a sigmoid:

    x_i = s(u_i) = \frac{1}{1 + e^{-u_i}}    (8)

For each unit i, the excitation u_i changes slowly according to

    \frac{du_i}{dt} = \gamma \Big( -u_i + \sum_{j=1}^{n} w_{ij} s(u_j) \Big)

and the new energy function is

    E = -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} x_i x_j + \sum_{i=1}^{n} \int_{0}^{x_i} s^{-1}(x) \, dx

Convergence can be proved for this model as well (for more information, see [2]).
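A minimal numerical sketch of these dynamics, using explicit Euler integration, is given below. The network, γ, the step size and the number of steps are illustrative assumptions.

```python
import numpy as np

def s(u):
    """Sigmoid activation: s(u) = 1 / (1 + exp(-u)), see equation (8)."""
    return 1.0 / (1.0 + np.exp(-u))

rng = np.random.default_rng(7)
n = 6
A = rng.normal(size=(n, n))
W = (A + A.T) / 2
np.fill_diagonal(W, 0.0)          # symmetric, zero-diagonal weights

gamma, dt = 1.0, 0.01
u = rng.normal(size=n)            # initial excitations

for _ in range(5000):
    du = gamma * (-u + W @ s(u))  # dynamics of the continuous Hopfield model
    u = u + dt * du

print(s(u))                       # the unit states settle at values between 0 and 1
```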

6 Conclusion

The Hopfield network was a breakthrough in neural networks and gave an important impetus to neural network research. Many results are available, and neural systems are nowadays used throughout computer science. Their ability to learn by example makes them very flexible, and their parallel architecture makes them well suited to real-time systems.

Still, there are also critics. A. K. Dewdney wrote in 1997: "Although neural nets do solve a few toy problems, their powers of computation are so limited that I am surprised anyone takes them seriously as a general problem-solving tool." And indeed, a significant drawback is that a neural network needs a lot of training to be efficient.

Nevertheless, they are used in many different ways: image processing, pattern matching, data deconvolution...

Appendix

Hopfield network applications. This list is not exhaustive; more applications and the corresponding references can be found in Artificial Neural Systems by Patrick K. Simpson [3]:

- Combinatorial optimization applications
- Image processing
- Signal processing
- Graph coloring, graph flow and graph manipulation
- ANS programming
- Data deconvolution
- Pattern matching
- Solving equations and optimizing functions
- Travelling salesman, scheduling and resource allocation

References

[1] Jehoshua Bruck. On the convergence properties of the Hopfield model. Proceedings of the IEEE, 78(10), October 1990.

[2] Raul Rojas. Neural Networks. Springer, 1996.

[3] Patrick K. Simpson. Artificial Neural Systems. Pergamon Press, 1990.

[4] David Kriesel. A Brief Introduction to Neural Networks. 2007. Chapter 8, pp. 125-135, available at http://www.dkriesel.com.

[5] Wikipédia. http://fr.wikipedia.org/wiki/R%C3%A9seau_de_neurones_de_Hopfield.