FPGA Implementation of Boolean Neural Networks using UML
Roman Kohut, Bernd Steinbach, Dominik Fröhlich
Freiberg University of Mining and Technology, Institute of Computer Science
Freiberg (Sachs), Germany
Outline
- Introduction
- Boolean Neural Networks
- UML Models
- Experiment results
- Conclusion
Introduction: FPGAs
$y = f(x)$, $x = \{x_1, x_2, \ldots, x_{N_x}\}$
- LUT: $N_x \le 4$
- Slice: $N_x \le 5$
- CLB: $N_x \le 6$
[Figure: Slice structure]
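The step from a 4-input LUT to a 5-input slice can be illustrated with a Shannon expansion: two 4-input LUTs hold the two cofactors of the function, and a 2:1 multiplexer driven by the fifth input selects between them. The following is only a sketch under that assumption; the helper names and the parity function `f5` are hypothetical and chosen purely for illustration.

```python
from itertools import product

def make_lut(func, n_inputs):
    """Tabulate func over all 2**n_inputs input combinations (first input is the MSB)."""
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=n_inputs)]

def lut_read(table, bits):
    """Read one LUT entry; the input bits form the table address."""
    index = 0
    for b in bits:
        index = (index << 1) | b
    return table[index]

def slice_eval(lut_lo, lut_hi, bits5):
    """Two 4-input LUTs plus a 2:1 multiplexer selected by the fifth input."""
    *x4, sel = bits5
    return lut_read(lut_hi, x4) if sel else lut_read(lut_lo, x4)

def f5(x1, x2, x3, x4, x5):
    """Hypothetical 5-input function (parity), used only as an illustration."""
    return x1 ^ x2 ^ x3 ^ x4 ^ x5

# Cofactors of f5 with respect to x5, each small enough for one 4-input LUT
lut_lo = make_lut(lambda a, b, c, d: f5(a, b, c, d, 0), 4)
lut_hi = make_lut(lambda a, b, c, d: f5(a, b, c, d, 1), 4)
```

Calling `slice_eval(lut_lo, lut_hi, bits)` for every 5-bit input reproduces `f5(*bits)`. The same construction, applied once more with a sixth select input, gives the 6-input case.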
Introduction: The Problem
- Connectivity problems
- Structured problems: limited number of logic gates and interconnections
- On-chip learning problems (sequential computations)
- Large number of CLBs, driven by the type of data, the number of inputs, and a complex transfer function
- Tens to hundreds of CLBs are required for one single neuron:
  - GANGLION: 640-784 CLBs
  - Gschwind NN: 22 CLBs
  - Xilinx-NN: 51 CLBs
  - Hopfield NN: 26 CLBs
Boolean Neural Networks: Boolean Neuron
$y = f_B(x_B, w_B)$
$x_B = \{x_1, x_2, \ldots, x_{N_x}\}$, $w_B = \{w_1, w_2, \ldots, w_{N_x}\}$, $x_i, w_i \in \{0, 1\}$, $y \in \{0, 1\}$
- $x_1, x_2, \ldots, x_{N_x}$: inputs
- $w_1, w_2, \ldots, w_{N_x}$: weights of synaptic connections
- $f_B$: Boolean transfer function
- $y$: output signal
[Figure: General structure of Boolean neuron]
Advantages of the BN:
- significant speed-up of calculation,
- reduction of the necessary memory size,
- possibility to map the BN into one single CLB of an FPGA.
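Why a Boolean neuron fits into a lookup table can be sketched as follows: once the weights are fixed, $y = f_B(x_B, w_B)$ depends only on the $N_x$ input bits, so it can be tabulated in $2^{N_x}$ entries (16 for a 4-input LUT). The slides do not define a concrete transfer function, so `f_b` below (an OR over the inputs enabled by their weights) is a hypothetical stand-in, as are the helper names and the weight vector.

```python
from itertools import product

def f_b(x, w):
    """Hypothetical Boolean transfer function (an assumption, not from the slides):
    OR over all inputs whose weight bit is 1."""
    return int(any(xi & wi for xi, wi in zip(x, w)))

def neuron_to_lut(transfer, w):
    """Fold fixed weights into a truth table with 2**len(w) entries."""
    return [transfer(x, w) for x in product((0, 1), repeat=len(w))]

def lut_output(lut, x):
    """Evaluate the neuron by a single table lookup; x[0] is the MSB of the address."""
    index = 0
    for bit in x:
        index = (index << 1) | bit
    return lut[index]

weights = (1, 0, 1, 1)            # illustrative weight vector for N_x = 4
lut = neuron_to_lut(f_b, weights)  # 16 entries, the size of one 4-input LUT
```

After this folding step the weights no longer appear at run time: evaluating the neuron is one lookup, which is what makes the mapping to a single LUT possible.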
Boolean Neural Network: Structure
[Figure: inputs x1, x2, ..., xNx feed hidden neurons k1, k2, ..., kNk, which feed output neurons N1, N2, ..., NNy producing y1, y2, ..., yNy]
LUT of CLB:
- inputs: x1, x2, x3, x4
- weight coefficients: w1, w2, w3, w4
- transfer function: $f_B$
- output: $y = f_B(x_B, w_B)$
LUT: Nk Zn, N Ny 4
Slice: Nk Zn, N Ny 5
CLB: Nk Zn, N Ny 6
Training algorithm: N x = 4, LUT
Boolean Neural Networks: Mapping of BNN to FPGA
[Figure: Boolean neurons BN 1 ... BN 6 mapped to LUT 1 ... LUT 6]
UML Models: Example Structure of BNN
[Figure: network with inputs x1, x2, x3, hidden neurons k1 ... k4, and outputs y0 ... y9]

Hidden neurons k1 ... k4 as functions of the inputs:

x1 x2 x3 | k1 k2 k3 k4
 0  0  0 |  0  0  1  0
 0  0  1 |  1  0  1  0
 0  1  0 |  0  0  0  1
 0  1  1 |  1  0  0  0
 1  0  0 |  0  1  0  0
 1  0  1 |  1  0  0  1
 1  1  0 |  1  0  0  0
 1  1  1 |  0  0  0  1

Hidden neurons k1 ... k4 against outputs y0 ... y9:

   | y0 y1 y2 y3 y4 y5 y6 y7 y8 y9
k1 |  1  1  0  1  0  1  1  0  0  0
k2 |  0  1  0  0  1  0  1  1  1  0
k3 |  0  0  1  1  0  0  1  1  1  0
k4 |  0  0  1  0  1  1  0  0  1  1
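The example can be evaluated in software as a rough sketch. It reads the first table as the truth tables of k1 ... k4 over (x1, x2, x3) and the second table as a connection matrix, where a 1 means that the hidden neuron feeds that output. How an output combines its connected hidden neurons is not stated on this slide; the OR used below is an assumption, and the names `K_TABLES`, `CONNECT`, `hidden` and `outputs` are hypothetical.

```python
# Truth tables of the hidden neurons; index = 4*x1 + 2*x2 + x3
K_TABLES = {
    "k1": [0, 1, 0, 1, 0, 1, 1, 0],
    "k2": [0, 0, 0, 0, 1, 0, 0, 0],
    "k3": [1, 1, 0, 0, 0, 0, 0, 0],
    "k4": [0, 0, 1, 0, 0, 1, 0, 1],
}

# Connection matrix: CONNECT[k][j] == 1 means hidden neuron k feeds output y_j
CONNECT = {
    "k1": [1, 1, 0, 1, 0, 1, 1, 0, 0, 0],
    "k2": [0, 1, 0, 0, 1, 0, 1, 1, 1, 0],
    "k3": [0, 0, 1, 1, 0, 0, 1, 1, 1, 0],
    "k4": [0, 0, 1, 0, 1, 1, 0, 0, 1, 1],
}

def hidden(x1, x2, x3):
    """Look up all four hidden-neuron values for one input vector."""
    index = 4 * x1 + 2 * x2 + x3
    return {name: table[index] for name, table in K_TABLES.items()}

def outputs(x1, x2, x3):
    """Compute y0 ... y9. ASSUMPTION: each output is the OR of its connected hidden neurons."""
    k = hidden(x1, x2, x3)
    return [int(any(k[name] and CONNECT[name][j] for name in K_TABLES))
            for j in range(10)]
```

For example, `outputs(1, 0, 0)` activates only k2, so the result equals the k2 row of the connection matrix. If the outputs combine their inputs differently (for instance by exclusive OR), only the `any(...)` expression needs to change.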
UML Models: Design Model

<<focus>> Main
    +create() : Main
    +destroy() : void
    +main() : int

<<auxiliary>> Bnn
    +a : boolean
    +b : boolean
    +c : boolean
    +y00 : boolean
    +y01 : boolean
    +create() : Bnn
    +destroy() : void
    +k1() : boolean
    +k2() : boolean
    +...()
    +y0() : boolean
    +y1() : boolean
    +...()
    +calculate() : boolean
    +init_x( x : boolean[] ) : void
    +get_y() : boolean[]

Association "calculate": Main (-app, 1) to Bnn (-bnn, 1)

Method bodies attached to the operations:

Main::main()
    boolean[] x=new boolean[nx];
    boolean[] y=new boolean[ny];
    Bnn net = new Bnn();
    net.init_x(x);
    if(net.calculate()) { y=net.get_y(); }
    destroy net; destroy y; destroy x;
    return 0;

Bnn::init_x()
    a=inputs[0]; b=inputs[1]; c=inputs[2];

Bnn::calculate()
    k01=k1(); k02=k2(); ...
    y00=y0(); y01=y1(); ... y09=y9();
    return true;

Bnn::get_y()
    y[0]=y00; y[1]=y01; ... y[9]=y09;
    return y;

Hidden-neuron method (k1(), k2(), ...)
    return (!a&&!c || a&&!b&&c || a&&b&&!c);

Output method (y0(), y1(), ...)
    return k01 || k02;
UML Models: Deployment Model
Implementation platforms:
- C++
- VHDL
Hardware platform:
- Pentium IV processor, 2.4 GHz
- Xilinx Virtex-II FPGA, 3 million gates, 100 MHz
Communication path:
- PCI bus, 33 MHz
[Figure: deployment diagram. Node h0 <<SystemMaster>>, implemented on the <<ImplementationPlatform>> C++, deploys the <<executable>> main.exe, which manifests the component Main realizing the <<focus>> class Main. Node h1 <<FPGA>>, implemented on the <<ImplementationPlatform>> VHDL, deploys the <<Configuration>> bnn.bit, which manifests the component BNN realizing the <<auxiliary>> class Bnn. h0 (master, 1) and h1 (slave, 1) are connected by the communication path.]
Experiment results: Device Utilization Summary
Compilation/synthesis time: 3-5 minutes

Logic utilization, used for Bnn:
- # Slices: 64 (49)
- # Flip Flops: 92 (79)
- # LUTs: 91 (56)
- # IOBs: 102

Bnn::calculate: 21 (14) LUTs, execution time: 0.200 µs

Method          | # Slices | # Flip Flops | # 4-input LUTs
Bnn::calculate  |       18 |           27 |             21
Bnn::create     |        1 |            1 |              0
Bnn::destroy    |        1 |            1 |              0
Bnn::k1         |        4 |            5 |              7
Bnn::k2         |        3 |            4 |              5
Bnn::k3         |        2 |            4 |              3
Bnn::k4         |        3 |            5 |              4
Bnn::y0         |        2 |            3 |              2
Bnn::y1         |        2 |            3 |              3
Bnn::y2         |        2 |            3 |              3
Bnn::y3         |        2 |            3 |              3
Bnn::y4         |        2 |            3 |              3
Bnn::y5         |        2 |            3 |              3
Bnn::y6         |        2 |            4 |              3
Bnn::y7         |        2 |            3 |              3
Bnn::y8         |        2 |            4 |              3
Bnn::y9         |        2 |            3 |              2
Experiment results
[Figure: Technology schematic of Bnn::k4()]
Conclusion: Results
(1) UML-based hardware/software co-design of Boolean neural networks,
(2) a reduction in the number of configurable logic blocks (CLBs) required to realize a Boolean neuron,
(3) a Boolean neuron can be mapped directly to a lookup table (LUT) and a configurable logic block (CLB) of an FPGA,
(4) efficient FPGA implementations of BNNs in terms of performance and gate count.
Conclusion: Future work
- optimal representation of Boolean functions by BNNs,
- automated hardware/software synthesis with MOCCA and UML,
- optimization of the FPGA implementation of Boolean neural networks,
- design and development of a mapping methodology for Boolean neural networks with on-chip learning.