Training a Self-Organizing Map distributed on a PVM network

Nuno Bandeira
Dep.Informatics, New University of Lisbon, Quinta da Torre 85 MONTE DA CAPARICA, PORTUGAL
nb@di.fct.unl.pt

Victor Jose Lobo
New University of Lisbon/Portuguese Naval Academy, Escola Naval, Alfeite, 8 ALMADA, PORTUGAL
vsl@di.fct.unl.pt

Fernando Moura-Pires
Dep.Informatics, New University of Lisbon, Quinta da Torre 85 MONTE DA CAPARICA, PORTUGAL
finp@di.fct.unl.pt

Abstract

A distributed version of Kohonen's Self-Organizing Map (SOM) algorithm is presented, and its implementation on a Parallel Virtual Machine (PVM) network running on Intel based PCs with Microsoft Windows 95 is described. Numerical results are given for different work loads.

Keywords: SOM, distributed algorithms, parallel algorithms, neural networks

1 - Introduction

Although neural networks are intrinsically parallel algorithms, they are not easily implemented on distributed architectures because the strong interactions between neurons impose a very high communication overhead (neural networks are also called connectionist models). One of the neural models that has been implemented most successfully on parallel architectures is Kohonen's SOM [1][2][3], because it requires very little communication between neurons. However, these implementations have traditionally used parallel machines that tend to be expensive and non-standard [2].

Over the last few years, a system called Parallel Virtual Machine (PVM) [4][6] has been developed that enables a programmer to use networked computers (running different operating systems such as UNIX and MS-Windows 95) in a manner very similar to a single UNIX machine, using common languages such as C. Thanks to PVM, existing computer networks, no matter how heterogeneous, can easily be programmed. Moreover, simple PCs running MS-Windows (which abound in most organizations) can be put to work during otherwise unproductive times, such as nights and weekends.
In this paper we describe our experience in trying to use the University's PC laboratories to train SOM networks. Although we intend to use these maps to classify sound, we used random vectors in these experiments.

© 1998 IEEE

2 - Distributed SOM algorithm

A thorough discussion of Kohonen's algorithm can be found in [5], but the core training algorithm may be described as follows:

1. For a given training pattern x:
   1.1 - Calculate the distance of each neuron to the training pattern x (Calculation phase)
   1.2 - Find the neuron with the smallest distance, and call it the winner W (Voting phase)
   1.3 - Change the network neurons with a function G, which depends on the learning rate α, the distance d to W (in the output plane), and the neighborhood function F. Due to the nature of the neighborhood function, only the neurons closer to W (in the output space) will be changed (Update phase)
2. Update the learning rate α and the neighborhood function F according to some rule
3. Repeat steps 1 and 2 for the next training pattern, until some stopping criterion is reached.

Many different distributed versions of Kohonen's SOM are possible, each better suited to a certain machine architecture. For an implementation in PVM, we think the most adequate is the following:

Given:
   Np processors (PVM hosts)
   A coordinator process C (within Np)
   Nt training patterns xi
   Nn neurons forming a SOM

1 - Assign the neurons. Assign the Nn neurons to the processors, in such a way that each processor receives approximately the same number of neurons (Nn/Np), and that for any given area of the SOM the neurons are evenly distributed amongst the processors.
2 - Set up the training set. Send all the Nt training patterns to all the processors, together with the initial training parameters.
3 - Train the neurons. For each pattern xi in the training set, do the following:
   3.1 Calculate the local winner neuron in each processor (calculation phase)
   3.2 Send all local winner neuron coordinates and distances to the pattern xi to the coordinator. At the coordinator, select the global winner, and send its coordinates to all processors (voting phase)
   3.3 Update the neurons at each processor, according to the update function F, and update the network training parameters (update phase)
4 - Repeat step 3 until the stopping criterion is met
5 - Send all neurons back to the coordinator.

A graphical representation of the algorithm, showing the messages involved, is given in Figure 1. The initial and final phases of the algorithm (represented in white in Figure 1) are executed only once, and thus have very little influence on the overall performance. Most of the time is spent in the main loop (represented in gray in Figure 1), which iterates through the three main phases: calculation of the distances, voting for the global winner, and updating the neurons.

The calculation phase of the algorithm (finding the winner) is inherently parallel, and its computational load can be spread evenly across the network if each processor has roughly the same number of neurons. The voting phase is the only one that requires communication (and synchronization) between processors, because the global winner must be known to all for the algorithm to proceed.
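The three phases of the main loop (steps 3.1 to 3.3) can be sketched in a single process, with each "processor" holding only its own neurons. This is a minimal Python sketch and not the PVM implementation described in this paper: message passing is replaced by ordinary function returns, and the function names, the (row + col) mod Np assignment rule, and the square "bubble" neighborhood with a fixed learning rate are all assumptions made for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def make_partitions(side, dim, n_procs, rng):
    """Assumed assignment rule: neuron (r, c) goes to processor (r + c) % n_procs."""
    parts = [dict() for _ in range(n_procs)]
    for r in range(side):
        for c in range(side):
            parts[(r + c) % n_procs][(r, c)] = [rng.random() for _ in range(dim)]
    return parts

def local_winner(part, x):
    """Step 3.1: best-matching neuron among one processor's neurons."""
    coord = min(part, key=lambda k: sq_dist(part[k], x))
    return sq_dist(part[coord], x), coord

def global_winner(local_results):
    """Step 3.2 (coordinator): pick the smallest distance among local winners."""
    return min(local_results)[1]

def update_local(part, x, winner, radius, alpha):
    """Step 3.3: move local neurons within `radius` of the winner towards x."""
    changed = 0
    for (r, c), w in part.items():
        if max(abs(r - winner[0]), abs(c - winner[1])) <= radius:
            for i in range(len(w)):
                w[i] += alpha * (x[i] - w[i])
            changed += 1
    return changed

def train_step(parts, x, radius, alpha):
    """One iteration of the main loop for a single training pattern x."""
    results = [local_winner(p, x) for p in parts]   # calculation phase
    winner = global_winner(results)                 # voting phase
    updated = [update_local(p, x, winner, radius, alpha) for p in parts]
    return winner, updated                          # update phase done
```

The list returned as `updated` gives the number of neurons each simulated processor had to change, which is the quantity that must be balanced for the update phase to scale.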
If there were no coordinator, each processor would have to send information about its local winner to all other processors. While PVM does support a broadcast mechanism, this would translate to a multicast at the data link level, thus originating Np(Np-1) messages per iteration. With a coordinator, each process sends only one message to the coordinator, and it in return sends only one message back, thus originating only 2Np messages. Furthermore, the coordinator can piggyback additional information on the return message, which can be used to select the next training pattern, change the training parameters, etc.

Figure 1 - Message exchange in distributed SOM (coordinator: send start order, receive neurons; clients: receive neurons)

During the update phase, each processor will have to calculate the distance between its local neurons and the global winner (in the output plane), and then update only the neurons in the winner's neighborhood. The computational load will be evenly distributed only if all processors have roughly the same number of neurons in this neighborhood. Thus it is very important to assign the neurons evenly during step 1 of the distributed algorithm. When the neighborhood radius is large, the computational load is easily distributed. However, when the neighborhood radius is small, it is difficult to guarantee that all processors will have the same number of neurons, and thus some processors will have to wait for others. It must be pointed out that in this case the number of neurons to update will be small, so the difference in processing time amongst the processors will also be small.

3 - Experimental results

During our experiments, we used networks of up to 1 PCs. Each was a Digital Venturis FX, with a Pentium running at 100 MHz, with 16Mb of RAM (55ISb cache). The computers were connected with a coaxial cable,

using 10Base2 as the level 2 protocol (10 Megabits per second), and TCP/IP as the level 3 protocol. The computers were running the MS-Windows 95 operating system, WPVM [6] and Microsoft LAN Manager peer-to-peer network clients and servers. The computers had all screen-savers and anti-virus checkers disabled and were not running any other software during these tests.

As the speedup obtained by using the distributed SOM depends critically on the amount of processing required before each synchronization, we used very large pattern vectors, with 14 features, and then varied the number of neurons on the map. We used square maps with 5x5, 10x10, 20x20, and 40x40 neurons. Although square maps tend to slow convergence, we used them because they have a shorter boundary than rectangular maps (for the same number of neurons) and thus are less affected by discontinuities of the neighborhood function on those boundaries.

This discontinuity effect would also affect the smaller maps more than the large ones if we used the same initial radius in all tests. To avoid this, we used an initial neighborhood function radius equal to each map's side. So as to make the radius decrease smoothly, we force it to decrease by only 1 unit each time the whole set of training patterns is presented. The map with 5x5 neurons will thus have only 5xNt iterations of the patterns, while the one with 40x40 will have 40xNt. So as not to make the simulation too long for larger maps, we use fewer training patterns for these maps. In the end, each map requires exactly 4 times more calculations than the previous one.

Figure 2 - Diagonal neuron distribution

The number of patterns used for training does not influence the performance or speedup of the algorithm, so we use only 1 patterns for the 5x5 map, 6 for the 10x10, 3 for the 20x20 and 15 for the 40x40. While this number of patterns (and iterations) would be far too small for a useful classification or clustering, it is sufficient to prove that the system works reliably.
The number of training patterns in the smaller maps has to be greater than for the larger maps, because otherwise the time intervals would be too small to be measured reliably.

During these simulations we distributed the neurons amongst the processors in such a way that each processor has one or more diagonal lines of neurons. This distribution, although not ideal, provides a reasonable balance of neurons across processors for rectangular neighborhoods (see Figure 2: every uncut neighborhood has the same neuron load per processor at each radius).

The execution times were measured within the programs (with a call to a system timer), so as to measure only the time spent on the iterations (step 3 of the distributed algorithm, and shaded area in Figure 1). The numerical results are presented in Table 1. Figure 3 shows a graph of the absolute execution times, and Figure 4 shows the relative times, that is, the speedup.

Figure 3 - Absolute execution times (horizontal axis: No of machines)

Figure 4 - Relative execution time (1 = time on a single machine; horizontal axis: No of machines)
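The balancing property of the diagonal distribution can be checked with a short sketch. The rule used here, processor = (row + col) mod Np, is an assumed concrete form of "one or more diagonal lines per processor"; the function names and the map and processor sizes in the usage line are hypothetical. For a fixed radius, every neighborhood that is not cut by the map border yields the same set of per-processor counts, only relabelled among processors.

```python
from collections import Counter

def owner(row, col, n_procs):
    """Assumed diagonal rule: neurons on the same anti-diagonal share a processor."""
    return (row + col) % n_procs

def neighborhood_load(center, radius, side, n_procs):
    """Count, per processor, the neurons inside the square neighborhood of
    `radius` around `center`, clipped to a side x side map."""
    r0, c0 = center
    load = Counter()
    for r in range(max(0, r0 - radius), min(side, r0 + radius + 1)):
        for c in range(max(0, c0 - radius), min(side, c0 + radius + 1)):
            load[owner(r, c, n_procs)] += 1
    return [load[p] for p in range(n_procs)]

# Example: a 5x5 neighborhood in the middle of a 20x20 map, 4 processors.
print(neighborhood_load((10, 10), 2, 20, 4))  # prints [7, 6, 6, 6]
```

A neighborhood cut by the border, such as the one around a corner neuron, no longer has this guarantee, which is one reason the distribution is described as reasonable rather than ideal.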

Table 1 - Execution times (in seconds), by No of PCs and size of map (5x5, 10x10, 20x20, 40x40)

4 - Conclusions

The results presented confirm the claim that the SOM can be efficiently distributed on an ordinary computer network. However, depending on the work load, we may obtain overwhelming gains (as in the 40x40 map), moderate but consistent gains (as in the 20x20 map), or even high losses (as in the 5x5 map).

There are a couple of extremely high running times corresponding to the 40x40 map running on a single machine or on a group of two. After that there is a sudden break and then the running times decrease smoothly. This initial peak is due to the machine configuration we are working with: each machine has 16 Mb of RAM and a 40x40 map takes up to 13Mb of RAM, forcing the operating system to use the disk as swap space. We used machines with this configuration because we needed a reasonable pool of highly similar computers to achieve fair comparisons, and these were the ones available. Nevertheless, we consider this an advantage rather than a shortcoming, since these are the most common machines around any office or university lab, and they allowed us to expose another very important point about distributed processing: the efficient use of each machine's memory. Using our distribution model, you can take advantage of each machine's local memory along with the corresponding processing power, effectively avoiding the RAM/hard disk swapping that severely slows down SOM processing.

In the more general case, the total execution time will decrease smoothly (except sometimes for the transition from 1 to 2 machines), and then start to increase slowly. It is not reasonable to expect unlimited gains as you throw more and more machines into the pool, because of the increase in network load. When distributing a process, there will be a minimum overhead on the total running time, due to the network.
In our tests this can be seen by observing that the highest jumps upwards happen when there is a switch from a single machine to two machines. The distribution is profitable only when there is a significant workload to be distributed, thereby overcoming this minimum network overhead. After that initial step, adding further machines follows an almost smooth curve, reflecting the converging equilibrium between each machine's designated workload and the network overhead due to an increasing number of exchanged messages.

The overhead due to the network will increase linearly with the number of machines (each additional machine will be responsible for new messages), until the network starts to saturate due to collisions and/or sending queues. The workload per machine, on the other hand, will decrease hyperbolically, so at a certain point a minimum execution time will be reached, and adding a new machine will not improve the overall performance.

We intend to concentrate our future work on three main areas. One of them is finding a better algorithm for distributing the neurons amongst processors. The ideal distribution will be a function of the radius of the neighborhood function and the number of processors, and genetic algorithms have been proposed to find good distributions. Another interesting problem is trying to fine-tune the measurements of time spent calculating, transmitting, and waiting during the iterations. On a PC running MS-Windows, this is rather tricky due to the lack of suitable tools. Finally, we intend to test and improve the algorithm for heterogeneous networks. This poses interesting problems of load balancing, and even more so if we attempt to do dynamic balancing during operation.
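The trade-off described in these conclusions, a workload per machine that falls hyperbolically against a network overhead that grows linearly, can be illustrated with a toy model. This is a sketch under stated assumptions and not a fit to the measurements in Table 1: the function names, the cost formula work/n + overhead*n, and all numeric values are hypothetical, and network saturation and swapping are ignored.

```python
import math

def execution_time(n_machines, work, overhead_per_machine):
    """Toy model: the compute share shrinks as work/n, while the network
    cost grows linearly with n. A single machine exchanges no messages."""
    if n_machines == 1:
        return work
    return work / n_machines + overhead_per_machine * n_machines

def best_machine_count(work, overhead_per_machine, max_machines):
    """Number of machines (1..max_machines) with the smallest modelled time."""
    return min(range(1, max_machines + 1),
               key=lambda n: execution_time(n, work, overhead_per_machine))

def continuous_optimum(work, overhead_per_machine):
    """Minimum of work/n + overhead*n over real n, at n = sqrt(work/overhead)."""
    return math.sqrt(work / overhead_per_machine)
```

With a large workload, for example `best_machine_count(100, 1, 20)`, the model favours 10 machines, while with a small one, for example `best_machine_count(2, 3, 20)`, a single machine is fastest, mirroring the losses reported for the smallest map.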
Acknowledgments

We would like to thank the Departamento de Engenharia Informática da Universidade de Coimbra and the Escola Superior de Tecnologia e Gestão do Instituto Politécnico de Viana do Castelo for the development and availability of WPVM, the PVM implementation used in this work, and also Prodep, FLAD and INVOTAN for the financial support to Nuno Bandeira (Prodep) and Victor Lobo (FLAD and INVOTAN).

References

[1] Kohonen, T., et al.; Bibliography on Self-Organizing Map (SOM) and Learning Vector Quantization (LVQ), Helsinki 1997, available at ftp://cochlea.hut.fi
[2] Przytula, K.W., Prasanna, V.K.; Parallel Digital Implementations of Neural Networks, Prentice-Hall
[3] Ultsch, A., Siemon, H.P.; Kohonen Networks on Transputers: Implementation and Animation, Proceedings of the International Neural Network Conference (INNC), July 1990
[4] Geist, A., et al.; PVM: Parallel Virtual Machine - A User's Guide and Tutorial for Networked Parallel Computing, The MIT Press, 1994
[5] Kohonen, Teuvo; Self-Organizing Maps, Springer-Verlag, 1995
[6] Alves, A., et al.; WPVM Manual, supplied with WPVM package