Implementation of emulated digital CNN-UM architecture on programmable logic devices and its applications




Theses of the Ph.D. dissertation

Zoltán Nagy

Scientific adviser: Dr. Péter Szolgay

Doctoral School of Information Sciences
University of Pannonia
Veszprém, 2007

Introduction

Although the continued scaling-down of devices addresses the growing demand for computation, some problems remain difficult to solve on traditional digital computers. Typical examples are pattern recognition, data organization, clustering and the solution of partial differential equations. Neural networks have proved more suitable for these applications than digital computers, but they are not used extensively in industrial applications because of the imperfections of the neural hardware. The most important drawback of a general neural network is that it cannot be reprogrammed quickly, which restricts its use to very specific applications. Additionally, assuming a fully connected neural network is a major obstacle to implementation because the complexity increases exponentially with the number of processors.

Cellular Neural Networks (CNN) solve this interconnection bottleneck by arranging the processing elements in a square grid and connecting each cell only to its local neighborhood. This approach makes it possible to integrate a large number of analog processors on a single chip. CNN was found to be very efficient in real-time image and signal processing tasks where the computation is carried out by some kind of spatio-temporal phenomenon. However, the limited accuracy of current analog VLSI CNN chips does not allow partial differential equations to be solved accurately enough for the results to be used in engineering applications. Using a digital architecture to emulate the CNN dynamics removes these limitations, but the speed of such architectures is an order of magnitude lower than that of their analog counterparts.

Designing a full custom digital VLSI architecture is very time consuming and costly, especially when only a small number of chips are manufactured. The development costs of an emulated digital CNN architecture can be reduced by using programmable devices during the implementation.
The main advantage of reconfigurable devices is that a digital architecture can be designed and implemented without any concern for the manufacturing technology. Additionally, technology changes become easier because only small portions of the design have to be redesigned, or no redesign is required at all.

Research and applied methods

In the course of my work I investigated how an emulated digital CNN-UM architecture can be implemented on reconfigurable devices. During this exploration I aimed to develop and use design techniques that make it possible to design an emulated digital CNN-UM with configurable computational precision. The capabilities of the basic CNN-UM architecture were extended to allow arbitrary sized templates and the emulation of multi-layer CNN arrays. The designed architecture was optimized for both area and speed.

To simulate the different emulated digital CNN-UM architectures, the ModelSim VHDL simulator from Model Technology was used, while the Foundation ISE development system from Xilinx was used for the FPGA implementation. The implemented CNN-UM processors were tested on the XSV-300 prototyping board from XESS Corporation.

During my work I investigated the computational precision required to solve different partial differential equations on emulated digital CNN-UM architectures. Several numerical methods based on the finite difference method were used to solve the equations, and the solutions were computed with proprietary software.

To implement the CNN-UM architectures optimized for the solution of partial differential equations, the C-based Handel-C high-level hardware description language and the DK development system were used. The DK development system makes it possible to synthesize behavioral models described in Handel-C, which increased the flexibility and shortened the design time compared to the traditional RTL-level VHDL approach. The emulated digital processors optimized for the solution of partial differential equations were implemented on the RC-200 prototyping board from Celoxica.

New scientific results

1. Thesis: Feasibility of the emulated digital CNN-UM processor implementations on FPGA circuits

The CASTLE emulated digital CNN-UM architecture, which was designed at SZTAKI, makes it possible to emulate the CNN dynamics using different computational precisions (1, 6 and 12 bit). The computing performance can be increased by reducing the precision, but only a small portion of the chip is used in the low precision modes. The predefined precisions are appropriate for general image processing tasks, but higher accuracy is often required, e.g. in the modeling of biological systems or the solution of partial differential equations.

On recent analog and digital VLSI CNN-UM implementations only nearest neighbor templates can be used. Larger templates can be used after template decomposition, but not every CNN template can be decomposed. In these cases software simulation must be used to compute the CNN dynamics, but its performance is very low due to the increased computing requirements.

Complicated biological and physical systems can be modeled very efficiently by using multi-layer CNN. However, the analog VLSI CNN implementations either cannot be used in multi-layer applications or their accuracy is not satisfactory. Thus software simulation is required for the analysis of multi-layer CNN dynamics, which is very slow, especially when the array size is large or the time constants of the layers are very different.

To solve the previous problems a new emulated digital CNN-UM family called Falcon was developed.
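To make the idea of emulating CNN dynamics at a configurable precision more concrete, the following Python sketch performs one forward-Euler step of the standard CNN state equation and rounds the result to a fixed-point grid. It is only an illustration under stated assumptions: the forward-Euler discretization, the edge-replicating boundary, and all names (quantize, apply_template, cnn_step, frac_bits) are choices made here and are not taken from the CASTLE or Falcon designs.

```python
import numpy as np

def quantize(v, frac_bits):
    """Round values to a fixed-point grid with `frac_bits` fractional bits."""
    scale = 1 << frac_bits
    return np.round(np.asarray(v, dtype=float) * scale) / scale

def apply_template(img, tmpl):
    """Weighted sum over each cell's 3x3 neighborhood (edge cells replicated)."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros((h, w))
    for di in range(3):
        for dj in range(3):
            out += tmpl[di, dj] * padded[di:di + h, dj:dj + w]
    return out

def cnn_step(x, u, A, B, z, step, frac_bits):
    """One forward-Euler step of dx/dt = -x + A*y + B*u + z, then rounding.

    x: state array, u: input array, A/B: 3x3 feedback/control templates,
    z: bias, step: Euler time step (all hypothetical parameter names).
    """
    y = np.clip(x, -1.0, 1.0)  # piecewise-linear cell output
    dx = -x + apply_template(y, A) + apply_template(u, B) + z
    return quantize(x + step * dx, frac_bits)
```

Changing `frac_bits` in such a model gives a quick feel for how the state precision affects the emulated dynamics before any hardware is described.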
I have shown that the FPGA implementation of the Falcon emulated digital CNN-UM has orders of magnitude higher computing performance than software simulation running on a 3.0 GHz Pentium 4 processor. The capabilities of the Falcon emulated digital CNN-UM were extended to allow the application of arbitrary sized templates and the emulation of multi-layer CNN.

1.1. Implementation and optimization of configurable emulated digital CNN-UM processor on Xilinx FPGA circuits

Based on the CASTLE emulated digital CNN-UM architecture I have designed a new configurable emulated digital CNN-UM processor and optimized it for FPGA circuits. With this new architecture, which is called Falcon, arbitrary sized CNN arrays can be emulated with configurable computing precision. The main parameters of the processor, such as the width of the cell array, the bit width of the state, input and template values, the number of space-variant templates, and the number and arrangement of the processor cores in the architecture, can be set in the synthesizable RTL description. By changing these parameters the size and performance of the Falcon architecture can be optimized for the given application.

I have shown that the clock frequency of the Falcon architecture is 147-429 MHz, depending on the computing precision, when implemented on Xilinx Virtex-IIPro FPGAs. A new cell value is computed in 3 clock cycles, thus the performance of the processor is 49-143 million cell iterations/s (147 MHz / 3 = 49 and 429 MHz / 3 = 143). I have shown that this computing performance is 3.5-10.4 times higher than that of a Pentium 4 processor running at a 3.0 GHz clock frequency. The performance of the Falcon architecture can be further increased by using more processing elements. The number of processor cores that can be implemented on the largest Virtex-IIPro 125 FPGA is 11-185, depending on the computing precision.

1.2. Implementation of a CNN-UM for arbitrary sized templates

I have worked out a new method to run arbitrary sized templates on emulated digital architectures. I have designed a new emulated digital architecture in which the template size can be configured in the synthesizable RTL-level description. The number of functional units is changed automatically according to the configuration parameters in the RTL description. For an n × n template, n multipliers are required, which compute a new cell value in n clock cycles. The control unit of the processor is automatically adapted to the length of the different iteration cycles.

I have shown that the larger number of functional units does not influence the operating speed. The clock frequency is independent of the template size, and 147-429 MHz can be achieved on the Virtex-IIPro FPGAs. Due to the longer iteration cycle, the performance of the Falcon architecture decreases to 29-85 million cell iterations/s in the case of 5 × 5 templates (147 MHz / 5 ≈ 29 and 429 MHz / 5 ≈ 85). I have shown that this computing performance is 3.3-9.8 times higher than that of a Pentium 4 processor running at a 3.0 GHz clock frequency. The increased number of functional units reduces the number of implementable processors: in the case of 5 × 5 templates, 6-111 processors can be implemented on the Virtex-IIPro 125 FPGA, depending on the computing precision.

1.3. Implementation of a multi-layer CNN-UM

I have extended the capabilities of the Falcon emulated digital CNN-UM architecture to emulate multi-layer CNN cell arrays. The new architecture emulates a fully connected CNN, thus every layer is connected to every other layer with templates of globally configurable size.
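As a rough illustration of such a fully connected layer structure, the Python sketch below updates every layer from the outputs of all layers, using one template per ordered pair of layers. It is a simplified model under assumptions made here: forward-Euler time stepping, equal time constants for all layers, no input templates, edge-replicating boundaries, and hypothetical names (apply_template, multilayer_step, templates).

```python
import numpy as np

def apply_template(img, tmpl):
    """Weighted neighborhood sum for an odd-sized n x n template."""
    n = tmpl.shape[0]
    h, w = img.shape
    padded = np.pad(img, n // 2, mode="edge")
    out = np.zeros((h, w))
    for di in range(n):
        for dj in range(n):
            out += tmpl[di, dj] * padded[di:di + h, dj:dj + w]
    return out

def multilayer_step(x, templates, z, step):
    """One forward-Euler step of a fully connected multi-layer CNN.

    x:         float array of shape (layers, height, width) with the states
    templates: array of shape (layers, layers, n, n); templates[m, k]
               couples the output of layer k into the state of layer m
    z:         per-layer bias values, shape (layers,)
    """
    layers = x.shape[0]
    y = np.clip(x, -1.0, 1.0)  # piecewise-linear outputs of all layers
    x_new = np.empty_like(x)
    for m in range(layers):
        acc = -x[m] + z[m]
        for k in range(layers):  # contributions from every layer
            acc = acc + apply_template(y[k], templates[m, k])
        x_new[m] = x[m] + step * acc
    return x_new
```

In this model a network of r layers carries r × r inter-layer templates, which is one way to see why the resource demand grows quickly with the number of layers.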
The multi-layer Falcon architecture is constructed from the main elements of the single-layer processor. In the case of r layers, r memory units and r interconnected arithmetic units are required for each layer (r × r altogether). The number of clock cycles required to compute a new cell value is independent of the number of layers and depends only on the template size. The area requirement of the multi-layer processor is greatly increased by the numerous inter-layer connections. In the case of a 3-layer network with 3 × 3 templates, the number of implementable processors is 1-20 on the Virtex-IIPro 125 FPGA. I have shown that this area increase does not affect the operating frequency, and 147-429 MHz can be achieved. In the case of a 3-layer network with 3 × 3 templates, the multi-layer Falcon processor is 49-143 times faster than a Pentium 4 processor running at a 3.0 GHz clock frequency.

1.4. Area optimization of the Falcon emulated digital CNN-UM architecture on FPGAs by using distributed arithmetic

I have designed an area-optimized version of the arithmetic unit of the Falcon emulated digital CNN-UM architecture by using distributed arithmetic. This architecture can be used to run space-invariant templates. I have shown that the area requirement of the optimized arithmetic unit is about 40% smaller while its computing performance is unchanged. Additionally, the new arithmetic unit is more scalable than the conventional arithmetic unit. With the conventional arithmetic unit and an n × n template, the template operation can be carried out by using 1, n or n² multipliers, and the computation takes n², n or 1 clock cycles respectively. I have shown that in the case of distributed arithmetic the cycle time depends on the precision of the state variable; for example, if the precision is 12 bits, the cycle time can be 1, 2, 3, 4, 6 or 12 clock cycles (the divisors of 12).

2. Thesis: Using application specific emulated digital CNN-UM in the solution of partial differential equations

The solution of partial differential equations (PDE) has long been one of the most important fields of mathematics, due to the frequent occurrence of spatio-temporal dynamics in many branches of physics, engineering and other sciences. The array structure and local connectivity of the CNN paradigm make it a natural framework for solving partial differential equations by finite differencing. In most cases, however, a multi-layer CNN is required. Additionally, in some important equations, for example the Navier-Stokes equations, the interaction between the cells is nonlinear. With recent analog VLSI CNN-UM chips only an approximation of the multi-layer behavior is possible, and the implementation of the nonlinear interactions is an additional problem. The 7-8 bit accuracy and the 128 × 128 array size of the recent analog VLSI CNN-UM chips are not sufficient in some engineering applications.

With the Falcon emulated digital architecture the array size and the number of layers are no longer limiting. The accuracy of the solution, however, should be examined from a different aspect: what is the minimal precision required to obtain a correct solution? The template operators required to solve partial differential equations on CNN are usually symmetrical or space invariant, or the ratio of the template values is constant. These properties make it possible to specialize the Falcon emulated digital CNN-UM architecture for the given partial differential equation. The implementation of these specialized processors requires a smaller area and their performance can be improved significantly.
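As an illustration of how a finite-difference PDE scheme maps onto a symmetric, space-invariant template, the Python sketch below advances the two-dimensional diffusion equation du/dt = D ∇²u by one explicit time step using the five-point Laplacian as a 3 × 3 template. The diffusion equation is chosen here only as a familiar example; the function names, the zero-flux (edge-replicating) boundary and the explicit Euler time stepping are assumptions of this sketch.

```python
import numpy as np

# Five-point finite-difference Laplacian written as a 3x3 template.
# It is symmetric and identical for every cell (space invariant).
LAPLACE = np.array([[0.0, 1.0, 0.0],
                    [1.0, -4.0, 1.0],
                    [0.0, 1.0, 0.0]])

def apply_template(u, tmpl):
    """Weighted 3x3 neighborhood sum with edge-replicating boundary."""
    h, w = u.shape
    padded = np.pad(u, 1, mode="edge")
    out = np.zeros((h, w))
    for di in range(3):
        for dj in range(3):
            out += tmpl[di, dj] * padded[di:di + h, dj:dj + w]
    return out

def diffusion_step(u, D, dx, dt):
    """One explicit Euler step of du/dt = D * laplacian(u).

    The scheme is stable only if D * dt / dx**2 <= 0.25.
    """
    return u + dt * (D / dx ** 2) * apply_template(u, LAPLACE)
```

Because the template has only two distinct nonzero values, a specialized arithmetic unit could in principle add the four neighbors first and use far fewer multiplications than a general 3 × 3 template would need, which is the kind of simplification the properties listed above allow.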
For such specialized processors the conventional VHDL-based RTL-level design method is very time consuming, thus high-level synthesis methods should be used during the design of the processors.

2.1. The effect of the computing precision on the accuracy of the solution of partial differential equations

I have developed two new heuristic methods which can be used to determine the optimal computing precision for the fixed-point solution of partial differential equations and systems of ordinary differential equations. The efficiency of the methods was shown by algorithmic considerations and experiments. I have tested the new heuristic methods on different types of partial differential equations and systems of ordinary differential equations. I have shown experimentally that the new heuristic methods are general.

2.2. Application of high-level synthesis and rapid prototyping techniques in the design of partial differential equation solver architectures

I have examined the solution of two partial differential equations (tactile sensor, barotropic ocean model) and I have designed a new computing architecture to solve these equations which fits well into the structure of the emulated digital CNN architecture and permits fast and efficient computation. I have introduced a new method for designing specialized emulated digital architectures for the solution of partial differential equations in a fraction of the time required by conventional design methods. I have demonstrated the operation and efficiency of the method on the same two partial differential equations. The architecture makes it possible to emulate locally connected cell arrays with arbitrary cell characteristics. To change the characteristics of the cell only the arithmetic unit has to be modified; with a high-level hardware description language this is simpler, and its simulation is orders of magnitude faster than with the conventional VHDL-based approach.

Application of the results

Shortly after the publication of the theory of Cellular Neural Networks, many analogic algorithms were published to solve a wide variety of tasks. However, the lack of an appropriate hardware platform made the practical application of the results difficult. The introduction of the first analog VLSI CNN-UM chips boosted the research and made the implementation of the theoretical results possible. Although the computing performance of the analog VLSI CNN-UM chips is very significant, their accuracy is inadequate in some cases. Additionally, they are very sensitive to different types of noise. To overcome these difficulties emulated digital CNN-UM architectures were designed, which are slower than their analog counterparts but have much better accuracy and noise immunity. Both solutions share a common drawback: only nearest neighborhood templates can be used on these architectures.

The Falcon emulated digital CNN-UM architecture presented in the dissertation makes it possible to run analogic algorithms with high accuracy requirements while its computing performance is comparable to that of the analog implementations. The configurable computing precision makes it possible to optimize the resource requirements of the different analogic algorithms. On the extended Falcon architecture arbitrary sized templates can be used and multi-layer CNN cell arrays can be emulated. The multi-layer CNN can be used in the solution of the state equations of complex dynamical systems and of partial differential equations. One example of such a dynamical system is a qualitatively correct mammalian retina model.
The usefulness of the Falcon emulated digital CNN-UM architecture is demonstrated through the solution of several different partial differential equations. Two heuristic methods are presented to determine the optimal computing precision, which makes it possible to reduce the area, power and I/O requirements of the architecture. The Falcon emulated digital CNN-UM architecture can be used very efficiently to solve problems where the dynamics of the system have to be determined with high accuracy.
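The two heuristic methods themselves are not described in this summary. As a stand-in, the Python sketch below shows the question they address by brute force: it solves a small diffusion problem once in double precision and once with the state rounded to a given number of fractional bits after every step, then reports the smallest candidate bit width whose maximum error stays below a tolerance. The test problem, the rounding model and all names (run, precision_errors, min_frac_bits) are assumptions of this sketch, not the author's method.

```python
import numpy as np

def diffusion_step(u, lam):
    """Explicit diffusion step, lam = D*dt/dx**2, zero-flux boundary."""
    h, w = u.shape
    p = np.pad(u, 1, mode="edge")
    lap = (p[0:h, 1:w + 1] + p[2:h + 2, 1:w + 1]
           + p[1:h + 1, 0:w] + p[1:h + 1, 2:w + 2] - 4.0 * u)
    return u + lam * lap

def run(u0, steps, lam, frac_bits=None):
    """Iterate the scheme; round the state to fixed point if frac_bits is set."""
    u = u0.astype(float)
    scale = None if frac_bits is None else float(1 << frac_bits)
    for _ in range(steps):
        u = diffusion_step(u, lam)
        if scale is not None:
            u = np.round(u * scale) / scale
    return u

def precision_errors(u0, steps, lam, candidates):
    """Maximum absolute error of each fixed-point run vs. double precision."""
    ref = run(u0, steps, lam)
    return {b: float(np.max(np.abs(run(u0, steps, lam, b) - ref)))
            for b in candidates}

def min_frac_bits(u0, steps, lam, tol, candidates):
    """Smallest candidate bit width with error <= tol, or None if none fits."""
    errs = precision_errors(u0, steps, lam, candidates)
    for b in sorted(candidates):
        if errs[b] <= tol:
            return b
    return None
```

A sweep of this kind requires a full simulation per candidate width, which is exactly the cost a good heuristic would avoid; it is shown only to make the area versus accuracy trade-off tangible.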

The Author's Publications

Journal papers

[1] Z. Nagy, P. Szolgay: Configurable Multi-Layer CNN-UM Emulator on FPGA. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, Vol. 50, pp. 774-778, 2003
[2] Z. Nagy, P. Szolgay: Fast and efficient multi-layer CNN-UM emulator using FPGA. Periodica Polytechnica Electrical Engineering, Vol. 47, No. 1-2, pp. 57-70, 2003
[3] P. Kozma, Z. Nagy, P. Szolgay: Seismic wave propagation modelling on emulated digital CNN-UM architecture. Periodica Polytechnica Electrical Engineering, Vol. 49, No. 3-4, pp. 183-193, 2005
[4] Z. Nagy, P. Szolgay: Solving Partial Differential Equations On Emulated Digital CNN-UM Architectures. International Journal Functional Differential Equations, Vol. 13, No. 1, pp. 61-87, 2006, ISSN: 0793-1786
[5] Z. Nagy, Zs. Vörösházi, P. Szolgay: Emulated digital CNN-UM solution of partial differential equations. International Journal of Circuit Theory and Applications, Vol. 34, Issue 4, pp. 445-470, 2006, DOI: 10.1002/cta.363

International conference papers

[6] A. Katona, Z. Nagy: The functional test of a real-time image processor model. Proceedings of INTCOM99, Budapest, Hungary, 1999
[7] A. Katona, Z. Nagy: The implementation of a real-time image processor on FPGA. Proceedings of INTCOM2000, Veszprém, Hungary, 9-14 September, 2000
[8] Z. Nagy, P. Szolgay: An emulated digital CNN-UM implementation on FPGA with programmable accuracy. Proceedings of the 4th IEEE DDECS Workshop, Győr, Hungary, April 18-20, 2001
[9] Z. Nagy, P. Szolgay: Fast and efficient multi-layer CNN-UM emulator using FPGA. Proceedings of the 3rd Conference of PhD Students in Computer Science, Szeged, Hungary, July 1-4, 2002
[10] Z. Nagy, P. Szolgay: Configurable Multi-Layer CNN-UM Emulator on FPGA. Proceedings of the 7th IEEE International Workshop on Cellular Neural Networks and their Applications, CNNA 2002, Frankfurt/Main, Germany, July 22-24, 2002
[11] Z. Nagy, P. Szolgay: Configurable multi-layer CNN-UM emulator on FPGA using Distributed Arithmetic. Proceedings of the 9th IEEE International Conference on Electronics, Circuits and Systems, Dubrovnik, Croatia, September 15-18, 2002
[12] Z. Nagy, P. Szolgay: Numerical solution of a class of PDEs by using emulated digital CNN-UM on FPGAs. Proceedings of the 16th European Conference on Circuits Theory and Design, Cracow, September 1-4, 2003
[13] Z. Nagy, Zs. Szolgay, P. Szolgay: Tactile Sensor Modeling by Using Emulated Digital CNN-UM. Proceedings of the 8th IEEE International Workshop on Cellular Neural Networks and their Applications, CNNA 2004, Budapest, Hungary, July 22-24, 2004
[14] Z. Nagy, P. Szolgay: Emulated Digital CNN-UM Implementation of a Barotropic Ocean Model. Proceedings of the International Joint Conference on Neural Networks, IJCNN 2004, Budapest, Hungary, July 25-29, 2004

[15] L. Beke, Z. Nagy, P. Szolgay: Low-cost CNN-UM global analogic programming unit implementation on FPGA. Proceedings of the 8th IEEE International Workshop on Cellular Neural Networks and their Applications, CNNA 2004, Budapest, Hungary, July 22-24, 2004
[16] Z. Nagy, Zs. Vörösházi, P. Szolgay: An Emulated Digital Retina Model Implementation on FPGA. Proceedings of the 9th IEEE International Workshop on Cellular Neural Networks and their Applications, CNNA 2005, Hsin-chu, Taiwan, May 28-30, 2005
[17] Z. Nagy, P. Szolgay: Emulated Digital CNN-UM Implementation of a 3-dimensional Ocean Model on FPGAs. Proceedings of the 8th Military and Aerospace Programmable Logic Devices International Conference, MAPLD2005, Washington, D.C., USA, September 7-9, 2005, http://klabs.org/mapld05/abstracts/153_nagy_a.html
[18] Z. Nagy, Zs. Vörösházi, P. Szolgay: Mammalian retina model implementation on emulated digital FPGA. Joint Hungarian-Austrian Conference on Image Processing and Pattern Recognition, ISBN 3-85403-192-0, pp. 295-302, Veszprém, 2005
[19] Z. Nagy, Zs. Vörösházi, P. Szolgay: An advanced emulated digital retina model on FPGA to implement a real-time test environment. Proceedings of the 2006 IEEE International Symposium on Circuits and Systems, ISCAS2006, Island of Kos, Greece, May 21-24, 2006
[20] Z. Nagy, Zs. Vörösházi, P. Szolgay: A Real-time Mammalian Retina Model Implementation on FPGA. Proceedings of the 10th IEEE International Workshop on Cellular Neural Networks and their Applications, CNNA2006, Istanbul, Turkey, August 28-30, 2006
[21] Zs. Vörösházi, Z. Nagy, A. Kiss, P. Szolgay: An Embedded CNN-UM Global Analogic Programming Unit implementation on FPGA. Proceedings of the 10th IEEE International Workshop on Cellular Neural Networks and their Applications, CNNA2006, Istanbul, Turkey, August 28-30, 2006
[22] Z. Kincses, Z. Nagy, P. Szolgay: Implementation of Nonlinear Template Runner Emulated Digital CNN-UM on FPGA. Proceedings of the 10th IEEE International Workshop on Cellular Neural Networks and their Applications, CNNA2006, Istanbul, Turkey, August 28-30, 2006
[23] S. Kocsárdi, Z. Nagy, S. Kostianev, P. Szolgay: FPGA Based Implementation of Water Reinjection in Geothermal Structure. Proceedings of the 10th IEEE International Workshop on Cellular Neural Networks and their Applications, CNNA2006, Istanbul, Turkey, August 28-30, 2006
[24] P. Sonkoly, P. Kozma, Z. Nagy, P. Szolgay: Acoustic Wave Propagation Modeling on 3D CNN-UM Architecture. Proceedings of the 10th IEEE International Workshop on Cellular Neural Networks and their Applications, CNNA2006, Istanbul, Turkey, August 28-30, 2006
[25] P. Szolgay, S. Kocsárdi, Z. Nagy, P. Sonkoly, Zs. Vörösházi: Complex computational problems in cellular architectures. RSEE 2006, Proceedings of the 6th international conference on renewable sources and environmental electro-technologies, pp. 111-115, Stana De Vele, 2006