A Pattern Recognition System Using Evolvable Hardware

Masaya Iwata 1, Isamu Kajitani 2, Hitoshi Yamada 2, Hitoshi Iba 1, Tetsuya Higuchi

1 Electrotechnical Laboratory, Umezono, Tsukuba, Ibaraki, 305, Japan
2 University of Tsukuba, Tennoudai, Tsukuba, Ibaraki, 305, Japan

Abstract. We describe a high-speed pattern recognition system using Evolvable Hardware (EHW), which can change its own hardware structure by genetic learning in order to adapt best to its environment. The purpose of the system is to show that EHW can work as a recognition device with the kind of robustness to noise seen in recognition systems based on neural networks. The advantages of EHW over a neural network are its high processing speed and the readability of the learned result. Readability means that the result is understandable in terms of Boolean functions. In this paper, we describe the architecture, the learning algorithm, and an experiment on the pattern recognition system using EHW.

1 Introduction

Interest in evolvable hardware (EHW) has grown rapidly since the idea of EHW was proposed independently in Japan and in Switzerland around 1992 [Higuchi94],[Marchal94]. In 1995, the first international workshop on evolvable hardware was held in Lausanne.

EHW is hardware that can adapt to a new environment that the designer did not anticipate. This contrasts with conventional hardware, in which adaptive changes are not allowed. EHW, built on programmable logic devices (PLDs), is adaptive hardware whose architecture can be reconfigured by genetic algorithms to adapt to the new environment.

EHW is best suited to applications where hardware specifications cannot be given in advance. Applications solved by artificial neural networks (ANNs) are such examples, because pattern classifier functions can be obtained only after learning is complete.
The purpose of this paper is to show that EHW may be able to take the place of ANN in a pattern recognition system. EHW is expected to work as an ANN-like robust pattern recognizer that realizes noise-insensitive recognition. The advantages of EHW over ANN are as follows. First, the processing speed is at least two orders of magnitude faster than that of ANN systems, whose execution is mostly software-based. Second, the learned results of
EHW are readable. That is, the learned result is easily expressed in terms of readable Boolean functions. In ANN, on the contrary, it is difficult to read the learned result because it is represented just by an enumeration of real values for thresholds and weights.

This paper consists of the following sections. Section 2 describes the EHW concept. Section 3 describes pattern recognition using EHW. It introduces MDL (Minimum Description Length) and VGA (Variable length chromosome Genetic Algorithm) for increasing the capability of noise-insensitive recognition. Section 4 describes the architecture of the pattern recognition system using EHW and the experiment on the recognition of numerical characters. Section 5 discusses the recognition system and Section 6 concludes this paper.

2 Evolvable Hardware (EHW)

2.1 Basic Idea

Evolvable Hardware (EHW) is hardware that modifies its own hardware structure according to environmental changes. EHW is implemented on a programmable logic device (PLD), whose architecture can be altered by downloading a binary bit string, i.e. architecture bits. The architecture bits are adaptively acquired by genetic algorithms (GA).

The basic idea of EHW is to regard the architecture bits of a PLD as a chromosome for GA (see Fig. 1). The hardware structure is adaptively searched by GA. These architecture bits, i.e. the GA chromosome, are downloaded onto a PLD during and after the genetic learning. Therefore, EHW can be considered as on-line adaptive hardware.

Fig. 1. Evolvable Hardware (EHW)

2.2 Programmable Logic Device (PLD)

We explain the PLD in more detail using the simplified model shown in Fig. 2.
A PLD consists of logic cells and a fuse array. In addition, architecture bits determine the architecture of the PLD. These bits are assumed to be stored in an architecture bit register (ABR). Each link of the fuse array corresponds to a bit in the ABR.

The fuse array determines the interconnection between the device inputs and the logic cell. It also specifies the logic cell's AND-term inputs. If a link on a particular row of the fuse array is switched on, which is indicated by a black dot in Fig. 2, then the corresponding input signal is connected to the row. In the architecture bits, these black and white dots are represented by 1 and 0 respectively.

Consider the example PLD shown in Fig. 2. The first row indicates that $I_0$ and $I_2$ are connected by an AND-term, which generates $I_0 I_2$. Similarly, the second row generates $I_1$. These AND-terms are connected by an OR gate. Thus, the resultant output is $O_0 = I_0 I_2 + I_1$.

As mentioned above, both the fuse array and the functionality of the logic cell are represented in a binary string. The key idea of EHW is to regard this binary bit string as a chromosome for the sake of GA-based adaptive search.

The hardware structure we actually use is an FPLA device, which is a commercial PLD (Fig. 3). This architecture mainly consists of AND and OR arrays. A vertical line of the OR array corresponds to a logic cell in Fig. 2.

Fig. 2. A Simplified PLD (Programmable Logic Device) Structure
Fig. 3. A FPLA Architecture for EHW

2.3 Genetic learning

We describe the genotype representation of EHW and the genetic learning method. In our earlier works, the architecture bits were regarded as the GA chromosome and the chromosome length was fixed. In spite of this simple representation, the hardware evolution was successful for combinatorial logic circuits (e.g. 6-multiplexer [Higuchi93]) and sequential logic circuits (e.g.
4-state machine, 3-bit counter [Higuchi94]). However, this straightforward representation had a serious limitation in hardware evolution. All the fuse array bits had to be included in the
genotype, even when only a few bits in the fuse array were effective. This made the chromosome too long to be searched effectively by evolution.

Therefore, we have introduced a new GA based on a variable length chromosome, called VGA [Kajitani95]. VGA is expected to evolve a large circuit more quickly. The chromosome length of VGA is smaller than that of the previous GA, especially when evolving a circuit with a large number of inputs. VGA is described in more detail in Section 3.4.

The fitness evaluation of GA is basically determined by the correctness of the EHW's output for the training data set. In the pattern recognition system we introduce MDL (Minimum Description Length) [Rissanen89] for the fitness evaluation. Using MDL, robustness in recognizing noisy patterns is expected to increase (for more details, see Section 3.3).

3 Pattern Recognition

3.1 Motivation

EHW has been applied to high-speed pattern recognition in order to establish a robust system in noisy environments [Iwata96]. This ability, i.e. robustness, seems to be the main feature of ANN. ANN is mostly run in a software-based way, i.e. executed by a workstation. Thus, current ANN may have difficulty with real-time processing because of the speed limit of software-based execution.

Another desirable feature of EHW is its readability. The result learned by EHW is expressed as a Boolean function, whereas ANN represents it as thresholds and weights. Thus, the acquired result of EHW is more easily understood than that of ANN. We believe that this understandability leads to wider usage of EHW in industrial applications.

To achieve flexible recognition capability, it is necessary to cope with patterns that are classifiable not by a linear function but only by a non-linear function. We have conducted an experiment in learning the exclusive-or problem in order to check this capability. From the simulation result, we confirmed that EHW can learn non-linear functions successfully [Higuchi95].
In other words, EHW is expected to fulfill the minimum requirement for robust pattern recognition.

3.2 Procedure of pattern recognition

The pattern recognition procedure consists of two phases, as shown in Fig. 4. The first is the learning phase, in which the training patterns are genetically learned by EHW. We use the MDL-based fitness and VGA described in Sections 3.3 and 3.4. The second phase is the recognition of test patterns. Our aim is noise-insensitive pattern recognition.
Fig. 4. The Procedure of Pattern Recognition using EHW
Fig. 5. An Example of Pattern Classification using MDL

3.3 Fitness evaluation by MDL (Minimum Description Length)

MDL (Minimum Description Length) is an information criterion used in machine learning to predict the rest of a data set from the given data [Rissanen89]. Using MDL for pattern recognition, a noise-insensitive classifier function is obtained effectively.

A noise-insensitive classifier function is more desirable than a noise-sensitive one, since the latter is susceptible to noise and overfitting occurs. For example, in Fig. 5, the function denoted with the solid line classifies two patterns in a very strict way, but the function denoted with the dotted line is better as the classifier function because it is noise-insensitive [Itoh92]. Thus, MDL is defined so as to choose simpler and more general classifier functions.

We have introduced the above MDL criterion into the GA fitness evaluation. The purpose is to establish a robust learning method for EHW. In general, the greater the number of "don't care" inputs (see footnote 1), the more robust (i.e. noise-insensitive) the evolved hardware. Thus, we regard the number of "don't care" inputs as an index of MDL. More formally, the MDL value for our EHW is written as follows:

$$\mathrm{MDL} = A_c \log(C + 1) + (1 - A_c) \log(E + 1), \qquad (1)$$

where $C$ denotes the complexity of the EHW and $E$ is the error rate of the EHW's output. The $C$ value (i.e. the complexity of the EHW) determines the performance of the MDL. We introduce three types of $C$ definitions as described in the Appendix.

To use MDL as the fitness function of GA, it must be normalized so that it has the range $0 \le \mathrm{MDL} \le 1$. Thus the fitness is expressed as follows:

$$\mathrm{Fitness} = 1 - \mathrm{MDL} \qquad (2)$$

Footnote 1: We call an input "don't care" if it is not included in the output expression. For instance, if $O = I_1 + I_2$ for the PLD shown in Fig. 2, then $I_0$ is a "don't care" input.
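As a rough illustration of equations (1) and (2), the following Python sketch computes an MDL value and the resulting fitness. The text defines neither $A_c$ nor the normalization, so both are assumptions here: `a_c` is treated as a weight between 0 and 1, and MDL is scaled by its value at a hypothetical worst case (`max_complexity` and an error rate of 1). The function and parameter names are illustrative only.

```python
import math


def mdl_value(complexity, error_rate, a_c):
    """Equation (1): MDL = A_c * log(C + 1) + (1 - A_c) * log(E + 1).

    a_c is assumed to be a weighting coefficient in [0, 1].
    """
    return a_c * math.log(complexity + 1) + (1.0 - a_c) * math.log(error_rate + 1)


def fitness(complexity, error_rate, a_c, max_complexity):
    """Equation (2): Fitness = 1 - MDL, with MDL scaled into [0, 1].

    The scaling is an assumption: MDL is divided by the MDL of the
    worst case (complexity == max_complexity, error_rate == 1).
    """
    worst = mdl_value(max_complexity, 1.0, a_c)
    if worst <= 0:
        raise ValueError("worst-case MDL must be positive")
    return 1.0 - mdl_value(complexity, error_rate, a_c) / worst


if __name__ == "__main__":
    # A small circuit with few errors scores higher than a large one.
    print(fitness(3, 0.1, 0.5, 9))
    print(fitness(9, 0.1, 0.5, 9))
```

Because the same logarithm appears in the numerator and the denominator, the choice of base does not affect the normalized fitness in this sketch.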
Fig. 6. Chromosome Representation of Variable Length Chromosome GA: (a) Representation of an Allele, (b) An Example of a Chromosome. Allele: (Location, Connection Type). Chromosome: (0,1) (4,1) (8,2) (9,1) (13,1) (14,1).

3.4 Variable length chromosome GA (VGA)

We introduce a new GA based on a variable length chromosome, called VGA, to increase the performance of GA. In conventional EHW, the whole set of architecture bits of the PLD was regarded as the GA chromosome. We call this method simple GA (SGA). However, the pattern recognition problem for 2D images considered here needs many inputs. This increases the chromosome length, which leads to longer GA learning time and restricts the size of the evolved circuit.

Compared with SGA, the chromosome length of VGA is smaller, especially when evolving a circuit with a large number of inputs. This is because VGA deals only with the part of the architecture bits that effectively determines the hardware structure. Because of this short chromosome, VGA can increase the maximum circuit size and establish an efficient adaptive search.

The coding method of VGA is described in Fig. 6. An example of a chromosome and the representation of an allele are shown in Fig. 6 (a). An allele in a chromosome consists of a location and a connection type. The location is the position of the allele in the fuse array. There are two kinds of connection type. The AND connection type defines the input of the AND array to be either positive or negative. The OR connection type defines whether or not the output of the AND array is connected to the input of the OR array. For example, an allele (0,1) means that the connection type at location 0 is 1. By converting each allele into the connection pattern of the PLD, the chromosome is converted into the architecture bits defining the PLD, as shown in Fig. 6 (b).

We use the roulette wheel selection strategy.
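Roulette wheel selection picks each individual with probability proportional to its fitness. A minimal Python sketch follows, assuming non-negative fitness values such as those given by equation (2). The function name, the `rng` parameter and the uniform fallback for an all-zero population are illustrative choices, not details taken from the original system.

```python
import random


def roulette_wheel_select(population, fitnesses, rng=random):
    """Return one individual, chosen with probability proportional to fitness."""
    total = sum(fitnesses)
    if total <= 0:
        # Degenerate case (assumption): every fitness is zero, so pick uniformly.
        return rng.choice(population)
    pick = rng.random() * total  # a point on the wheel, in [0, total)
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if pick < running:
            return individual
    # Guard against floating-point round-off at the end of the wheel.
    return population[-1]


if __name__ == "__main__":
    rng = random.Random(0)
    chromosomes = ["A", "B", "C"]
    print([roulette_wheel_select(chromosomes, [0.2, 0.7, 0.1], rng) for _ in range(10)])
```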
Recombination operators are cut and splice, which are used in the messy GA [Goldberg93]. The splice operator concatenates two chromosomes. Our splice operator is slightly different in the sense that genes with the same location (for instance, (0,1) and (0,2)) are not allowed in one chromosome. A mutation operator is applied so as to change the values of the location and the connection type randomly. For more details of VGA, refer to [Kajitani95].
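The cut and splice operators can be sketched in Python on chromosomes stored as lists of (location, connection type) alleles. This is only an illustration: the cut points are passed in explicitly rather than drawn at random, and the rule that the first allele seen for a location is the one kept is an assumption, since the text only says that duplicate locations are not allowed.

```python
def cut(chromosome, point):
    """Split a chromosome (a list of (location, connection_type)) at `point`."""
    return chromosome[:point], chromosome[point:]


def splice(head, tail):
    """Concatenate two pieces, keeping one allele per location.

    Assumption: when a location appears twice, the first allele wins.
    """
    seen = set()
    child = []
    for location, connection_type in head + tail:
        if location not in seen:
            seen.add(location)
            child.append((location, connection_type))
    return child


def cut_and_splice(parent_a, parent_b, point_a, point_b):
    """Cut both parents, then splice the crossed pieces into two children."""
    a_head, a_tail = cut(parent_a, point_a)
    b_head, b_tail = cut(parent_b, point_b)
    return splice(a_head, b_tail), splice(b_head, a_tail)


if __name__ == "__main__":
    a = [(0, 1), (4, 1), (8, 2)]
    b = [(0, 2), (9, 1), (13, 1), (14, 1)]
    child_1, child_2 = cut_and_splice(a, b, 2, 1)
    print(child_1)  # children generally differ in length from their parents
    print(child_2)
```

Because the two cut points are independent, the children can be longer or shorter than either parent, which is what lets the chromosome length vary during the search.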
4 Pattern recognition system

4.1 The pattern recognition system

We have developed the pattern recognition system (Fig. 7). The organization of the system is shown in Fig. 8. It consists of an EHW board including 4 FPGA chips (Xilinx XC4025), a DOS/V machine, and an input tablet for drawing patterns. The DOS/V machine handles GA operations, the control of the EHW board, and the display of patterns. The PLD on the FPGA is reconfigurable, which means that the system can be used as a universal EHW system.

An overview of the EHW board is shown in Fig. 9, and its block diagram is shown in Fig. 10. The EHW board contains four FPGAs (hatched area in the figure), board control registers, and SRAM which stores the configuration data of the FPGAs. In the EHW, a circuit represented by a chromosome is realized by an ABR (architecture bit register) and a PLD. The ABR stores the architecture bits of the PLD. The PLD has the architecture of an FPLA device (Fig. 3). In Fig. 10, there are K individuals, i.e. K pairs of an ABR and a PLD, in each FPGA chip. In the first version of this system, we designed a genetically reconfigurable hardware device with four FPGAs. The processing time of the EHW board is 720 ns.

Fig. 7. Pattern Recognition System using EHW
Fig. 8. Block Diagram of Pattern Recognition System
Fig. 9. The EHW Board
Fig. 10. Block Diagram of EHW Board (IPR: Input Pattern Register, OPR: Output Pattern Register)

4.2 Experiment

We have conducted an experiment in recognizing binary patterns of $8 \times 8$ pixels. The training set consists of 30 input patterns of 64 bits, as shown in Fig. 11. Three patterns exactly represent numerical characters (i.e. 0, 1, and 2). The other 27 patterns represent the same numerical characters with noise (i.e. 5 bits are randomly flipped). The output of EHW consists of 3 bits; each bit corresponds to one of the three characters. The initial length of a chromosome is 100. The probability of the cut and splice operators is 0.1. The mutation probability is [value missing]. The number of lines of the AND array in the PLD is 24. The test data set consists of 30 patterns, which are generated with random noise (i.e. fewer than 5 bits are flipped randomly).

Four different learning methods were examined, i.e. MDL-based EHW with three types of MDL definitions (MDL1, MDL2 and MDL3, which correspond to equations (3), (4), and (5) in the Appendix, respectively) and non-MDL EHW. The recognition result for the test set is plotted in Fig. 12. From the figure, it is clear that MDL-based EHWs give better performance for noisy patterns than EHW without MDL.

An important feature of EHW is that the resultant expression can be represented by a simple Boolean function. For example, in one run, the learned results in the case of MDL3 are $O_0 = I_{34} I_{38}$, $O_1 = I_{22} I_{38} + I_{13}$, and $O_2 = I_{37}$, where $I_i$ ($0 \le i \le 63$) indicates the location of the pixel in the pattern and $O_i$ is the recognition output for the pattern of character $i$. Clearly, the results obtained by EHW are easier to understand than those of ANN.
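Because the learned result is a set of Boolean expressions, it can be evaluated directly on a 64-bit pattern. The Python sketch below transcribes the three MDL3 expressions as they appear above, taking every literal as uncomplemented, and stores each output as a sum of products. The names `LEARNED`, `evaluate`, `recognize` and `dont_care_count` are illustrative and do not come from the original system.

```python
# Each output is a sum of products: a list of AND-terms,
# and each AND-term is a list of input (pixel) indices.
LEARNED = {
    0: [[34, 38]],        # O_0 = I_34 I_38
    1: [[22, 38], [13]],  # O_1 = I_22 I_38 + I_13
    2: [[37]],            # O_2 = I_37
}


def evaluate(sum_of_products, pattern):
    """Return 1 if any AND-term has all of its inputs set, else 0."""
    return int(any(all(pattern[i] for i in term) for term in sum_of_products))


def recognize(pattern):
    """Return the 3-bit output [O_0, O_1, O_2] for a 64-bit pattern."""
    return [evaluate(LEARNED[k], pattern) for k in sorted(LEARNED)]


def dont_care_count(sum_of_products, n_inputs=64):
    """Number of inputs that do not appear in the output expression."""
    used = {i for term in sum_of_products for i in term}
    return n_inputs - len(used)


if __name__ == "__main__":
    pattern = [0] * 64
    pattern[34] = pattern[38] = 1
    print(recognize(pattern))
    print(dont_care_count(LEARNED[1]))
```

Flipping any pixel that is a "don't care" input for all three outputs leaves the result of `recognize` unchanged, which is the sense in which expressions with few inputs are insensitive to noise.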
Fig. 11. Training Patterns
Fig. 12. Recognition Result of Test Set (axes: number of noise bits, correctness of test; curves: MDL1, MDL2, MDL3, No MDL)

5 Discussion

In this section we discuss 1) the Boolean functions that have high recognition ability for noisy patterns, and 2) the advantage of VGA over SGA.

First we discuss the Boolean functions that have high recognition ability for noisy patterns. Roughly speaking, a Boolean function with better recognition ability is one that has fewer inputs, that is, a function with more "don't care" inputs. We confirmed that we can get such a function using MDL. However, we can get more robust functions by adding more terms to the equation. The method of obtaining such functions is a subject of future research.

In the pattern recognition system, we used VGA instead of SGA. The main advantage of VGA in pattern recognition is that it can handle more inputs than SGA. For example, EHW could learn three patterns of 16 inputs by SGA with a chromosome length of 840. On the other hand, by VGA, EHW can learn three patterns of 64 inputs with a chromosome length of [value missing] on average. In addition, learning by VGA is much faster than by SGA: [value missing] generations by VGA versus 4053 by SGA.

The reason why VGA can handle more inputs than SGA is that VGA encodes into the chromosome only the inputs which actually generate AND terms, so the chromosome length can be kept small. If SGA is used for problems of this nature, the chromosome length increases because of the many inputs, which increases the GA execution time. In addition, VGA matches very well with MDL because MDL directs the GA search towards smaller circuits. Thus, we can say that VGA is suitable for pattern recognition problems because it handles many inputs and learns small circuits.

6 Conclusion

In this paper, we described the pattern recognition system using EHW. The system aims to recognize noisy patterns as neural networks do.
We described the learning algorithm using MDL (Minimum Description Length) and VGA (Variable length chromosome GA). The noise-insensitive function was obtained effectively by using MDL as the fitness function of GA. By using VGA, EHW could handle more inputs with faster learning than with simple GA. We developed the pattern recognition system to show the feasibility of EHW for noise-insensitive recognition. We conducted experiments on recognizing noisy patterns and confirmed that EHW has the ability to recognize noisy patterns.
References

[Goldberg89] Goldberg D., "Genetic Algorithms in Search, Optimization, and Machine Learning", Addison Wesley.
[Goldberg93] Goldberg D. et al., "Rapid Accurate Optimization of Difficult Problems using Fast Messy Genetic Algorithms", Proc. 5th Int. Joint Conf. on Genetic Algorithms (ICGA93).
[Higuchi93] Higuchi T. et al., "Evolvable Hardware with Genetic Learning", in Proc. Simulation of Adaptive Behavior, MIT Press.
[Higuchi94] Higuchi T. et al., "Evolvable Hardware with Genetic Learning", in Massively Parallel Artificial Intelligence (eds. H. Kitano), MIT Press.
[Higuchi95] Higuchi T. et al., "Evolvable Hardware and its Application to Pattern Recognition and Fault-tolerant Systems", in 1st Int. Workshop Towards Evolvable Hardware, Springer Verlag.
[Itoh92] Itoh S., "Application of MDL principle to pattern classification problems" (in Japanese), J. of Japanese Society for Artificial Intelligence, Vol. 7, No. 4.
[Kajitani95] Kajitani I. et al., "Variable Length Chromosome GA for Evolvable Hardware", in Proc. 3rd Int. Conf. on Evolutionary Computation (ICEC96).
[Marchal94] Marchal P. et al., "Embryological Development on Silicon", Artificial Life IV.
[Rissanen89] Rissanen J., Stochastic Complexity in Statistical Inquiry, World Scientific Series in Computer Science, Vol. 15.

Appendix: Definition of Complexity value for MDL

We describe the $C$ value, which is the complexity value for MDL. The $C$ value (i.e. the complexity of the EHW) determines the performance of the MDL. We introduce three types of $C$ definitions as follows:

$$C_1 = \sum_i |\mathrm{AND}_{O_i}|, \qquad (3)$$

$$C_2 = |\mathrm{AND}| \times |\mathrm{OR}|, \qquad (4)$$

$$C_3 = \sum_i |\mathrm{AND}_{O_i}| \times |\mathrm{OR}_{O_i}|, \qquad (5)$$

where $|\mathrm{AND}_O|$ and $|\mathrm{OR}_O|$ are the numbers of ANDs and ORs connected to the output $O$, and $|\mathrm{AND}|$ ($|\mathrm{OR}|$) is the number of ANDs (ORs) on the AND (OR) array. Consider Fig. 6(b) for instance. ANDs and ORs are represented as black dots and $\times$ marks in the figure.
The values of $C_1$, $C_2$ and $C_3$ are 3 ($= 1 + 2$), 9 ($= 3 \times 3$) and 5 ($= 1 \times 1 + 2 \times 2$) respectively, because $|\mathrm{AND}_{O_0}|$, $|\mathrm{OR}_{O_0}|$, $|\mathrm{AND}_{O_1}|$, $|\mathrm{OR}_{O_1}|$, $|\mathrm{AND}|$, and $|\mathrm{OR}|$ are 1, 1, 2, 2, 3 and 3. The definition of $C_1$ is not very precise because it does not include the information of OR gates. On the other hand, $C_2$ and $C_3$ are expected to give more exact MDL values. We tested several other definitions of the complexity; in this paper we report the three best definitions.
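The three definitions can be checked against the Fig. 6(b) example with a few lines of Python. The function names and the list-based inputs are illustrative; the counts are the ones quoted in the text.

```python
def c1(and_per_output):
    """Equation (3): sum over outputs of the ANDs connected to each output."""
    return sum(and_per_output)


def c2(n_and, n_or):
    """Equation (4): ANDs on the AND array times ORs on the OR array."""
    return n_and * n_or


def c3(and_per_output, or_per_output):
    """Equation (5): sum over outputs of (ANDs to output) * (ORs to output)."""
    return sum(a * o for a, o in zip(and_per_output, or_per_output))


if __name__ == "__main__":
    # Counts quoted for Fig. 6(b): outputs O_0 and O_1.
    and_per_output = [1, 2]
    or_per_output = [1, 2]
    print(c1(and_per_output), c2(3, 3), c3(and_per_output, or_per_output))
```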