Investigation of emulated-digital CNN-UM architectures: Retina model and Cellular Wave-Computing Architecture implementation on FPGA


University of Pannonia
Information Science and Technology Doctoral School

Theses of Ph.D. dissertation

Zsolt Vörösházi
Supervisor: Péter Szolgay DSc

Veszprém, Hungary, 2009

I. Motivations and aims

Nowadays, analog and digital circuit technologies and their fabrication are continuously improving and complementing each other. This improvement is well characterized by the scaling-down (miniaturization) trend described by Moore's law. For high-performance, real-time, near-sensor signal processing tasks, the choice between the two technologies is primarily determined by the application. To support the decision, critical and typical physical parameters of the complex, large-scale integrated VLSI circuits are calculated, such as area (A), speed (S), and dissipated power (P). Recently, parallel array processing has become the focal point of state-of-the-art analog circuit technology and its digital counterpart. However, an important problem has emerged with this design methodology: most designers and researchers intended to construct a globally interconnected processor array structure, but its complexity grows exponentially with the number of processor elements in the array.

Cellular Neural/Nonlinear Networks (CNN) are defined as analog, non-linear, parallel computing arrays consisting of a large number of elementary processor units (e.g. nuclei) arranged in a 2-dimensional regular grid. They can be implemented not only as a single-layer but also as a multi-layer architecture. The processor elements are locally connected (discrete in space), but they operate continuously in time. The program of the CNN network (called a "template") is defined by the strength of the local interconnections between the elementary cells, or in other words by setting the matrix of weight factors. The result of the computation is derived jointly from the spatio-temporal dynamics of the processing elements and the template operations (the so-called analog transient).
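For reference, the template-driven cell dynamics described above are commonly written in the CNN literature in the following normalized form. This is the standard Chua-Yang cell equation with unit time constant, not a formula quoted from this text, and all symbols are introduced here as assumptions: x is the cell state, y the output, u the input, z the bias, A the feedback template, B the control template, and N_r(i,j) the neighborhood of radius r around cell (i,j).

```latex
\begin{align}
\dot{x}_{ij}(t) &= -x_{ij}(t)
  + \sum_{(k,l)\in N_r(i,j)} A_{k-i,\,l-j}\, y_{kl}(t)
  + \sum_{(k,l)\in N_r(i,j)} B_{k-i,\,l-j}\, u_{kl}
  + z, \\
y_{ij}(t) &= \tfrac{1}{2}\left( \lvert x_{ij}(t)+1 \rvert - \lvert x_{ij}(t)-1 \rvert \right).
\end{align}
```

In this notation the "template" is the triple (A, B, z), and the analog transient is the trajectory of x from its initial state to a steady state.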
If each elementary CNN cell is extended with a local analog and logic memory unit, a local control unit, and an optical sensor input, and a global control unit is added to this integrated cell array, the CNN Universal Machine (CNN-UM) architecture is obtained, more recently referred to as the Cellular Wave Computing Architecture. The CNN-UM is universal in the Turing-machine sense, and it can also work as a non-linear computing operator. Each elementary instruction of the CNN-UM defines a complex spatio-temporal dynamic behavior. This novel computing architecture based on the CNN paradigm has by now been implemented on several different platforms. The first hardware prototypes of CNN networks were analog/mixed-signal VLSI chips. The huge computing performance of these CNN chips (a few TeraOPS) far exceeds that of all other digital processor implementations, and their power dissipation is very low. However, they have some disadvantages that impede their widespread use in industrial applications: they suffer from noise sensitivity, lack of flexibility, and, most importantly, limited analog accuracy (about 7-8 bits at the I/O), which gives inaccurate solutions in most signal processing tasks.

The simplest, most accurate, and most flexible, but also the slowest CNN-UM implementation is the CNN software simulator running on a conventional computer. The software simulator is generally used to ease the template design and optimization process. Moreover, in the measurements the speed-up of the different CNN-UM approaches is compared to the computing performance of the CNN software simulator, which is taken as unity. Alternatively, CNN software simulation can be accelerated by many-core technologies, such as GPU-based (Graphics Processing Unit) implementations using NVIDIA CUDA, or the IBM Cell architecture. The emulated-digital CNN-UM solution offers the best compromise between the analog VLSI CNN-UM implementations and software simulators with respect to computing performance and accuracy. Emulated-digital solutions exist in many physical forms: as an ASIC (Application-Specific Integrated Circuit) such as the CASTLE array processor, as a DSP-based (Digital Signal Processor) prototyping board such as the CNN-HAC, or on an FPGA (Field-Programmable Gate Array), e.g. the FALCON architecture. In this emulated-digital approach the behavior of the analog CNN cell network is approximated by a model discretized in space and time, while the locally connected digital processor elements are arranged in an array. Hence, the CNN provides a flexible and effective computing structure for the complex spatio-temporal dynamical computations of various bio-inspired systems (such as a retina); moreover, it makes it possible to generate the activity patterns in video real-time. The neuromorph structure of the multi-layer CNN retina model is derived from both morphological and electro-physiological information measured by neurobiologists.
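The idea of a CNN model "discretized in space and time" can be illustrated with a minimal sketch of one Forward Euler step of a single-layer CNN with 3x3 templates. This is plain Python written for readability, not the Falcon data path; the function names, the zero boundary condition, and the argument layout are assumptions made for this example.

```python
def output(x):
    """Standard CNN piecewise-linear output: clamp the state to [-1, 1]."""
    return max(-1.0, min(1.0, x))


def euler_step(state, inp, A, B, z, h):
    """One Forward Euler step of a single-layer CNN with 3x3 templates.

    state, inp: 2-D lists of floats; A, B: 3x3 feedback/control templates;
    z: bias; h: time-step. Cells outside the array are treated as zero
    (an assumed boundary condition).
    """
    rows, cols = len(state), len(state[0])
    nxt = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = z
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    k, l = i + di, j + dj
                    if 0 <= k < rows and 0 <= l < cols:
                        acc += A[di + 1][dj + 1] * output(state[k][l])
                        acc += B[di + 1][dj + 1] * inp[k][l]
            # x(n+1) = x(n) + h * (-x(n) + feedback + control + bias)
            nxt[i][j] = state[i][j] + h * (-state[i][j] + acc)
    return nxt


# With all-zero templates the state simply decays towards zero.
ZERO = [[0.0] * 3 for _ in range(3)]
decayed = euler_step([[1.0, -1.0]], [[0.0, 0.0]], ZERO, ZERO, 0.0, 0.5)
```

Every cell performs the same small multiply-accumulate over its neighborhood, which is why this kind of update maps naturally onto an array of identical, locally connected digital processing elements.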
According to the latest results of neurobiological investigations a mammalian (rabbit) retina consists of about different types of ganglion channels, but further channels might be discovered as the methodology and measurement techniques improve. Each channel is made up of a stack of several (at least 10) diversely interconnected strata, on each of which a large number of simple processor elements (neurons) are arranged in a two-dimensional structure. The difficulty of this evolving computing problem is that, beyond the increased computing power requirements, a large number of CNN layers with different physical timing parameters and various connectivity properties must be handled.

Universality of a CNN-UM network is based on the stored-program principle, which is realized by integrating an embedded Global Analogic Programming Unit (hereafter GAPU) into the cell network. The GAPU is responsible for controlling the instruction sequence of complex analogic (analog-logic) CNN algorithms; moreover, it can store the values (input, state, bias) needed to perform the computations. I have chosen FPGA-based reconfigurable computing devices both for the implementation of the neuromorph-structured mammalian retina model and for the elaboration of the GAPU. The reason is that today's FPGAs provide a good alternative for performing complex spatio-temporal, multi-layer CNN dynamical computations at high precision owing to their advantageous features, such as high flexibility and computing performance, rapid prototyping, and low cost (in low volume). Therefore, it is worthwhile to review the different CNN-UM approaches, especially with regard to the FPGA-based emulated-digital CNN-UM implementation, and to examine how its inherent computational potential can be exploited in the solution of various real-time processing tasks.

II. Methodology of the research

The dissertation deals on the one hand with the implementation of the neuromorph-structured, multi-layer mammalian retina model, and on the other hand with the implementation of the Global Analogic Programming Unit (GAPU) on FPGA architecture. During software development and the setup of the different test applications I used several industry-standard programming and simulation EDA tools, such as the Xilinx ISE synthesis tools with the EDK embedded development kit, the Celoxica/Agility DK Design Suite supporting the Handel-C high-level description language, the Mentor ModelSim simulator, and MATLAB. For hardware prototyping I chose various development boards that cover distinct directions and levels of FPGA evolution (e.g. Celoxica RC203 and RC2000 boards with Virtex-II, a Xilinx V2Pro board equipped with Virtex-II Pro, and finally a Xilinx ML-506 card with a Virtex-5 FPGA). I used modern hardware-software co-design and co-verification techniques, which are the state-of-the-art and most popular form of FPGA-based reconfigurable computing (RC) implementation. In such co-design tasks partitioning is the key problem: the designer can freely decide which parts are realized in hardware and which in software, although the partitioning may also depend on the available resources and the achievable performance. Reconfigurable FPGA architectures make it possible to perform optimized DSP operations at high speed by utilizing the internal dedicated building blocks, e.g. calculating a convolution by means of Multiply-Accumulate (MAC) DSP blocks. At the time of writing (2009) the largest and most powerful high-end Xilinx FPGA is the Virtex-6 SX family (XC6VSX475T), which contains up to 2016 dedicated MAC DSP blocks.
During the research work I first examined how the bio-inspired mammalian retina model can be implemented on an FPGA-based emulated-digital CNN-UM architecture, which is an open and emerging computational problem. Although several analog VLSI CNN-UM devices with complex cells (e.g. CACE1k, CACE2k) are available, they can only handle the outer layers of a given retina channel. With this type of CNN approach at most 2 or 3 strata of an OPL (Outer Plexiform Layer) can be modeled simultaneously in video real-time (about 25 fps). Because of this limitation I have chosen the emulated-digital CNN-UM approach, which makes it possible to implement different retina channels with various configurable timing and connectivity parameters, including a globally connected multi-layer structure. The aim of the proposed multi-channel, multi-layer retina model implementation on FPGA is to obtain qualitatively correct results compared to the original neurobiological measurements, in order to mimic the behavior of the living retina. Furthermore, the implementation provides real-time processing and is several orders of magnitude faster than a software simulator. With the help of this FPGA-based implementation the model parameters of the mammalian retina can be verified rapidly and set correctly. (The run-time of a multi-layer retina model on a software simulator running on a conventional PC may be several hours, depending on the size of the model and its parameters.) The complexity of the task comes from handling a large number of strata and the very different parameters of the layers, such as feed-forward and feedback connections, couplings, and time constants.

The governing equations that describe the dynamics of the neuromorph-structured retina model do not have an exact analytical solution. Therefore, I compared the results of different numerical integration formulas (e.g. the Forward Euler method and higher-order Runge-Kutta methods) for solving these ODEs. I examined the following key aspects: simulation time-step, resource utilization (area), computing performance, and accuracy. During the implementation I aimed to elaborate an FPGA-based reconfigurable computing architecture that can be configured flexibly, so that, owing to the applied design methodology and rapid prototyping platform, the behavior of various bio-inspired multi-channel vision models can be explored in real-time. The implemented emulated-digital CNN-UM architecture on FPGA makes it possible to solve complex spatio-temporal dynamical equations described by coupled ODEs. Since fixed-point arithmetic is used, it was important to know how accurately the new implementation approximates the results of the neurobiological measurements. Moreover, the area requirements and the largest achievable performance at different computing precisions were measured. With scalable precision another vital question was the lowest computing precision at which the model still gives qualitatively acceptable responses. (According to the original neurobiological measurements a real mammalian retina works at about 6 bits of analog computing precision.)
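The interaction between fixed-point precision and the Forward Euler time-step can be illustrated with a small sketch. It integrates the scalar test equation dx/dt = -x with the state held as a fixed-point integer. The helper names and the chosen test equation are assumptions for illustration only; the example shows a general property of fixed-point arithmetic and is not claimed to reproduce the arithmetic unit or the error analysis of the dissertation.

```python
def to_fixed(value, frac_bits):
    """Round a real value to a signed integer with frac_bits fractional bits."""
    return int(round(value * (1 << frac_bits)))


def to_float(fixed, frac_bits):
    """Convert a fixed-point integer back to a float."""
    return fixed / (1 << frac_bits)


def euler_decay_fixed(x0, h, steps, frac_bits):
    """Integrate dx/dt = -x with Forward Euler, keeping the state in fixed point."""
    x = to_fixed(x0, frac_bits)
    h_fx = to_fixed(h, frac_bits)
    for _ in range(steps):
        # The product carries 2*frac_bits fractional bits; shift back to frac_bits.
        x = x - ((x * h_fx) >> frac_bits)
    return to_float(x, frac_bits)


coarse = euler_decay_fixed(0.5, 0.01, 10, frac_bits=4)   # time-step rounds to zero
fine = euler_decay_fixed(0.5, 0.01, 1, frac_bits=16)     # close to 0.5 * (1 - 0.01)
```

With only 4 fractional bits the time-step 0.01 is rounded to zero, so the state never moves, whereas 16 fractional bits track the floating-point result closely. Effects of this kind are one reason why precision, time-step, and accuracy have to be examined together.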
CNN templates and interaction settings are calculated from the parameter tables of the CNN-based neuromorph retina model, whereas the behavior of the model is derived from neurobiological measurements. On the one hand, both the templates and the algorithms of the multi-layer retina model were implemented on a software simulator and tested on different conventional microprocessors. To achieve better performance, the ANSI C/C++ source code was extended with optimized functions of the Intel Integrated Performance Library, a package optimized for image and signal processing tasks. On the other hand, the entire retina system on the FPGA-based emulated-digital CNN-UM architecture was constructed by modifying and extending the Falcon emulated-digital CNN-UM architecture. Different prototyping platforms equipped with various Xilinx FPGAs were used for the neuromorph retina model calculations. Finally, the computed results were compared to the original neurobiological measurements to verify the effectiveness of the proposed FPGA-based CNN-UM architecture.

The dissertation also deals with the implementation of the Global Analogic Programming Unit on the reconfigurable emulated-digital CNN-UM architecture using FPGA. Owing to its modular structure and operation it can be easily integrated with the previously elaborated original Falcon CNN processing architecture, so that a universal Cellular Wave Computer architecture can be implemented on FPGA. During the implementation I integrated the Xilinx MicroBlaze embedded soft-processor IP core, which has a RISC instruction set, into the GAPU. Then the embedded system, acting as the global CNN control unit, was extended with some storage elements. Finally, this novel architecture was integrated with the modified Falcon-ML processor array architecture and the Vector Processing Elements. These improvements make it possible to exploit the large computing performance of the Falcon processor effectively in order to construct a fully functional, standalone, real-time image processing system. With the proposed embedded GAPU implementation, on the one hand, the template sequences of complex analogic CNN algorithms can be easily executed, and on the other hand, program organizing constructs (e.g. iteration, branch) and I/O instructions can be controlled, similarly to various commercial Visual System-on-Chip implementations. (Without the embedded GAPU only a single instruction or template operation at a time could be handled via a host PC, which limited the performance of the Falcon processor significantly.) During verification and performance tests the computing time required to calculate the CNN equations was measured repeatedly (50 times). From this set the best run-time was selected and compared to the estimated performance of different commercial CNN-UM implementations. The quality of the proposed embedded, emulated-digital CNN-UM GAPU implementation is demonstrated and verified by running a complex analogic CNN algorithm that requires consecutive template operations and template replacements.
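The "measure 50 times, keep the best run-time" procedure can be sketched as follows for the software-simulator side of such a comparison. The function names and the placeholder workload are hypothetical; the actual measurement setup of the dissertation is not described in this text.

```python
import time


def best_runtime(task, repeats=50):
    """Run task `repeats` times and return the shortest wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        task()
        best = min(best, time.perf_counter() - start)
    return best


def dummy_kernel():
    """Placeholder workload standing in for the CNN computation (hypothetical)."""
    return sum(i * i for i in range(1000))


best_seconds = best_runtime(dummy_kernel, repeats=50)
```

Taking the minimum rather than the mean filters out runs disturbed by other activity on the machine, so it estimates the best achievable time of the kernel itself.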
Based on real experiments, several important issues relating to acceleration efficiency, computing accuracy, cell size, and area consumption are discussed and compared to the results of the software simulator and of competing state-of-the-art CNN-UM implementations. The research work was carried out at the Cellular Neural Network Applications Laboratory of the Department of Image Processing and Neurocomputing (its new name is Department of Electrical Engineering and Informational Systems), University of Pannonia.

III. New Scientific Results

Thesis Group 1: Implementation of a CNN-based neuromorph mammalian retina model on FPGA architecture

I have implemented novel single- and multi-channel, multi-layer retina models on a reconfigurable emulated-digital CNN-UM architecture by applying the latest results of neurobiological measurements relating to the CNN-based framework of the neuromorph mammalian (e.g. rabbit) retina model. The difficulty of this challenging task lies in solving a complex spatio-temporal problem that requires huge computing power. Real-time processing capability has been verified and tested on three different FPGA-based prototyping systems: Celoxica RC2000, Digilent XUPV2P, and Xilinx ML-506. I have shown experimentally that the single- and multi-channel retina model implementations on FPGA achieve orders of magnitude higher performance than the software solutions while still providing high flexibility in parameter settings. Owing to its rapid and effective parameter tuning and refining procedure, this implementation also makes it easier to handle various neuromorph retina models or other biological systems.

Related publications: [1], [3], [4], [5], [6], [7], [9], [12]

Thesis 1.1: I have elaborated a reconfigurable emulated-digital CNN-UM computing architecture (Falcon-ML) that makes it feasible to implement CNN-based neuromorph, single- and multi-channel, multi-layer mammalian retina model computations on FPGA effectively.

The architecture is tailored to single- and multi-channel, multi-layer retina models; therefore I have completely redesigned the Arithmetic Unit for calculations with diffusion-type and Gaussian-type symmetrical templates and with intra-/inter-layer zero-neighborhood connections. The Template Memory unit has also been expanded in order to store the various parameters related to the connections of the multi-channel, multi-layer retina structure.
Thesis 1.2: I have shown experimentally that the performance of the optimized retina processor elements for calculating CNN dynamics can be significantly improved by decreasing the computing precision. Since the exact analytical solution of this complex spatio-temporal problem is not known, I have taken the double-precision floating-point numerical implementation as the reference solution. In general, it is inferred that at least 22-bit computing precision is necessary to obtain qualitatively correct results from various CNN-based neuromorph mammalian retina channels.

I have compared the results of different fixed-point computations to the double-precision floating-point results and to the neurobiological measurements. At low precisions (less than 14 bits) the error of the FPGA-based neuromorph retina model implementation is very high because the model does not respond to the input stimulus. At least bit precision is required to get some response on the output of the model. If the CNN dynamics of the retina model are to be computed more accurately, at least bit precision is required.

Thesis 1.3: I have given equivalent transformations between the computing performance, the image size, the number of layers, and the precision of the elaborated FPGA-based single- and multi-channel neuromorph retina model implementation. These critical parameters determine the limitations of the FPGA implementation.

Considering the qualitatively correct 22-bit state precision the elaborated architecture achieves times higher performance compared to the software solution running optimized code on an Intel Core2 Duo E8400 microprocessor. The multi-layer CNN simulation kernel is written in C using optimized functions of the Image Processing Library from Intel. To emulate a single retina channel at least 10 CNN layers are required, while each further ganglion channel increases the complexity of the CNN network by an additional 7 layers. I have implemented the CNN-based neuromorph retina model on several different FPGA-based experimental systems (using Virtex-II, Virtex-II Pro, and Virtex-5 FPGAs) and explored the maximal number of implementable Falcon processing elements on each platform, while the results for the largest Virtex-6 FPGA are estimates.
Considering 22-bit state precision, 2-7 ms time-step, at (on Virtex-II) and sized images (on Virtex-6) 1-48 parallel retina channels can be implemented and emulated in real-time, depending on the dedicated FPGA resources. Larger images can be processed if more BRAM memories are available, but the processing will not be done in real-time. To process higher-resolution images an external on-board memory is required. In this case the processing is at least half an order and at most 3 orders of magnitude slower than with internal BRAM memories, due to the memory I/O bandwidth limitation.
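The layer counts quoted in Thesis 1.3 can be combined into a simple lower bound. The symbols below are introduced here for illustration, and the formula assumes that "additional 7 layers" applies to every ganglion channel beyond the first:

```latex
L(c) \;\ge\; 10 + 7\,(c - 1), \qquad c \ge 1,
\qquad\text{so that}\qquad
L(1) \ge 10, \quad L(48) \ge 10 + 7 \cdot 47 = 339 .
```

Here c is the number of parallel retina channels and L(c) the number of CNN layers that must be emulated. Under this reading, the upper end of the 1-48 channel range corresponds to several hundred CNN layers.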

Thesis Group 2: Implementation of an embedded CNN-UM Global Analogic Programming Unit as a Cellular Wave Computer on FPGA architecture

I have implemented a Global Analogic Programming Unit on an FPGA-based emulated-digital CNN-UM architecture to obtain a fully functional Cellular Wave Computing architecture. I have completely redesigned the local Control Unit of the Falcon reconfigurable emulated-digital CNN processor and optimized it for communication with the GAPU; the modified processor is called the Falcon Processing Element (FPE). I have elaborated a new GAPU architecture to control the sequential template and/or arithmetic-logic operations and the program organizing instructions of analogic CNN algorithms. To perform arithmetic and logic operations a new Vector Processing Element (VPE) has been implemented. Finally, the processing array consisting of VPE and FPE units has been integrated with the GAPU implementation. I have demonstrated the operation and effectiveness of the proposed embedded GAPU architecture by executing a complex skeletonization analogic CNN algorithm. The real-time image processing capability of this autonomous system has been verified and tested on different prototyping systems. I have shown experimentally that on the largest FPGA architecture at least two orders of magnitude performance advantage can be achieved over the software simulator, while it also provides a several-fold speed-up over competing analog VLSI CNN-UM implementations.

Related publications: [2], [8], [10], [11]

Thesis 2.1: I have elaborated and implemented a new emulated-digital CNN-UM GAPU architecture as a Cellular Wave Computer on FPGA by integrating an embedded Xilinx MicroBlaze soft-processor core to control the sequential and program organizing instructions of analogic CNN algorithms effectively.
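The kind of control flow a global programming unit has to provide (a sequence of template operations, an iteration, and a branch on a stopping condition) can be sketched in a few lines. This is a toy model only: the function names are hypothetical, the "template operation" is a stand-in on a list of integers, and nothing here reflects the actual GAPU instruction set or the skeletonization templates.

```python
def run_until_stable(image, template_ops, max_iterations=100):
    """Apply template_ops in order, repeating until the image stops changing.

    Returns the final image and the number of passes executed.
    """
    for iteration in range(1, max_iterations + 1):
        previous = image
        for op in template_ops:          # sequential template operations
            image = op(image)
        if image == previous:            # branch: stop when nothing changed
            return image, iteration
    return image, max_iterations


def toy_shrink(img):
    """Hypothetical stand-in for a template operation: decrement positive pixels."""
    return [max(0, v - 1) for v in img]


result, passes = run_until_stable([3, 1, 0], [toy_shrink])
```

When such a loop runs next to the processing array instead of on a host PC, the per-step transfer of templates, images, and control decisions between host and array is avoided.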
Based on the original reconfigurable emulated-digital CNN-UM processor (FALCON) I have elaborated a new computing architecture, called the Falcon Processing Element (FPE), which is optimized for communication with the GAPU. The local Control Unit of the original Falcon architecture has been completely redesigned. I have implemented a new Vector Processing Element (VPE) to perform the arithmetic and logic operations by utilizing the dedicated resources of the FPGA. The processing array of FPE and VPE units is integrated with the GAPU implementation. Without the GAPU, the full processing time of the previous solutions is dominated by the communication time between the host PC and the Falcon PE, which is needed for downloading template sequences, input and initial-state images, and program organizing instructions (such as branch, cycle, etc.), and for uploading the result of the computation in each step of the algorithm. I have reduced the communication time by storing these parameters and instructions in the internal registers of the GAPU, similarly to the standard CNN-UM structure. The embedded GAPU can communicate directly with the Falcon PEs across the high-speed PLB bus of the Xilinx MicroBlaze soft-core at the frequency of the FPGA's internal clock. Therefore, the Falcon PE is utilized more efficiently (in 91% of the full computing time) when performing complex analogic CNN algorithms.

Thesis 2.2: I have proved experimentally that the implementation and integration of FPEs and VPEs with a GAPU on the reconfigurable emulated-digital CNN-UM system is optimal at 16-bit computing precision, where the number of implementable Falcon and Vector PEs is the largest.

The 18-bit state precision gives the optimal resource occupancy, since it is best suited to the bit width of the dedicated BlockSelectRAM memories (e.g. BRAM18k) and multiplier blocks (e.g. MULT18X18). However, with a Xilinx MicroBlaze embedded soft-processor core the supported high-speed communication bus (e.g. the PLB bus) can be defined as 128 bits wide (a multiple of 16 bits), therefore the practical computing precision of the FPE is also 16 bits. Only a small amount of the available logic and dedicated chip resources is occupied by the proposed GAPU implementation with the embedded Xilinx MicroBlaze IP core, so neither the number of implementable Falcon and Vector Processing Elements nor the cumulative computing performance is decreased significantly.

Thesis 2.3: I have shown experimentally that the elaborated FPGA-based CNN-UM GAPU implementation provides several orders of magnitude faster processing than software simulation and that it may also outperform the current analog VLSI CNN-UM systems, depending on the selected FPGA. The computing performance is determined by the image size, the accuracy of the solution, and the number of available dedicated memory resources.
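The remark that a 128-bit bus favors 16-bit values can be checked with simple arithmetic. The sketch below computes how many values of a given width fit into one bus word and packs eight 16-bit values into a single 128-bit integer. The function names and the low-bits-first packing order are assumptions for illustration; the actual bus transfer format is not described in this text.

```python
BUS_WIDTH_BITS = 128  # bus width quoted in the text


def slots_per_word(value_bits, bus_bits=BUS_WIDTH_BITS):
    """Return (values per bus word, unused bits) for a given value width."""
    return bus_bits // value_bits, bus_bits % value_bits


def pack_word(values, value_bits=16):
    """Pack unsigned integers into one bus word, first value in the lowest bits."""
    mask = (1 << value_bits) - 1
    word = 0
    for index, value in enumerate(values):
        word |= (value & mask) << (index * value_bits)
    return word


def unpack_word(word, count, value_bits=16):
    """Inverse of pack_word: split a bus word back into `count` values."""
    mask = (1 << value_bits) - 1
    return [(word >> (index * value_bits)) & mask for index in range(count)]


print(slots_per_word(16))  # 16-bit values fill the word exactly
print(slots_per_word(18))  # 18-bit values leave some bits unused
```

Eight 16-bit values fill a 128-bit word with no waste, whereas 18-bit values fit only seven times and leave two bits unused, which is consistent with the choice of 16 bits as the practical precision.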
I have implemented the embedded GAPU architecture on several different FPGA-based experimental systems (using Virtex-II, Virtex-II Pro, and Virtex-5 FPGAs) and measured the maximal performance on each platform, while the results for the largest Virtex-6 FPGA are estimates. The skeletonization algorithm was selected and executed with nearest-neighborhood templates to measure the performance. The functionality of the GAPU is examined both on , and sized images by running 10 Forward Euler iterations, supposing the optimal 16-bit state and constant precision and 8-bit template precision. The CNN software simulation kernel of the skeletonization algorithm is written in C using optimized functions of the Intel Image Processing Library. Depending on the dedicated resources, the cumulative performance of the Falcon processing array extended with the proposed GAPU implementation can reach 1.33 billion CNN cell iterations per second or 135 billion CNN operations per second. Depending on the selected image-size ( , or ) fold speed-up can be achieved over the software simulator running optimized code on an Intel Core2 Duo E8400 microprocessor. The performance of the GAPU implementation can reach or may exceed (by one order of magnitude) the performance of the analog ASIC VLSI CNN-UM chips (e.g. ACE16k, Q-Eye).

IV. Possible Applications

Several years ago I had the opportunity to participate in the long design and verification process of the emulated-digital CNN-UM array architecture called CASTLE, which was implemented at the Analogical and Neural Computing Laboratory, Hungarian Academy of Sciences. By elaborating the global timing and control unit of the CASTLE array processor architecture I acquired fundamental knowledge of high-level hardware description languages and of the full-custom ASIC VLSI layout design and simulation procedure. The implementation of the standalone FPGA-based CNN-UM system with GAPU integration benefits from this know-how. Instead of the expensive and long development procedure of full-custom ASIC VLSI technology, I decided to apply reconfigurable computing (RC) devices to CNN computations. Reconfigurable computing architectures, such as FPGAs, make it possible to implement low-cost, flexible, and reprogrammable emulated-digital CNN-UM systems for various application areas. The emulated-digital CNN-UM architectures investigated in the dissertation have been implemented on reconfigurable computing devices and can be employed in the following application areas.

On the one hand, different neuromorph-structured, multi-layer, multi-channel retina models can be analyzed, or bio-inspired systems can be modeled, on FPGA where high-speed processing capability is essential. High computing performance has been achieved by using simple, locally interconnected processing elements arranged in a large array. Moreover, this implementation is orders of magnitude faster than software simulators, which, together with its rapid reconfiguration capability, allows differently organized retina models to be examined in video real-time.
With the proposed implementation it may be possible to explore and understand the relation between the stimulation of a neuron in its receptive field and the recorded spiking patterns for a given ganglion channel. The quality of the retina model can be examined by comparing the FPGA-based results with the neurobiological measurements. Knowing the differences between them, the structure and parameters of the retina model can be tuned and refined. Using a properly defined retina model, a smart vision system can be implemented on FPGA, which makes object recognition, tracking, and classification more effective, for example in surveillance or reconnaissance applications.

On the other hand, the Falcon reconfigurable emulated-digital CNN-UM processor array with the embedded GAPU implementation gives a fully functional, stand-alone image processing system. Using the GAPU, complex analogic CNN algorithms can be executed in real-time. It makes it possible to perform sequences of template operations, analog and logic operations, and program organizing instructions on a single FPGA-based system. The GAPU implementation can be easily integrated with the previously elaborated single-layer Falcon architecture (Falcon-SL), the Falcon multi-layer retina architecture (Falcon-ML) mentioned above, or the nonlinear template runner architecture (Falcon-Nonlinear). Therefore, the applicability of the GAPU implementation extends further towards low-cost, smart, and complex image processing systems.

15 List of Publications 14 IV. List of Publications Journal Papers [1] Z. Nagy., Zs. Vörösházi, P. Szolgay Emulated Digital CNN-UM Solution of Partial Differential Equations International Journal of Circuit Theory and Applications, Wiley, Vol. 34: Special Issue : Special Issue on CNN Technology (Part 2), July-Aug pp (IF: ), ISSN: [2] Zs. Vörösházi, A. Kiss, Z. Nagy, P. Szolgay Implementation of embedded emulated-digital CNN-UM Global Analogic Programming Unit on FPGA and its application International Journal of Circuit Theory and Applications, Wiley, Vol. 36: Special Issue: Cellular Wave Computing Architecture, July-Sep pp (IF: ), ISSN: [3] Zs. Vörösházi, Z. Nagy, P. Szolgay FPGA-Based Real Time, Multichannel Emulated-Digital Retina Model Implementation EURASIP Journal on Advances in Signal Processing, Hindawi, Vol. 2009, Special Issue on CNN Technology for Spatiotemporal Signal Processing (IF: ), ISSN: International Conference Papers [4] Z. Nagy, Zs. Vörösházi, P. Szolgay An Emulated Digital Retina Model implementation on FPGA. CNNA th IEEE International Workshop on Cellular Neural Networks and their Applications, Hsinchu, Taiwan, May, 2005, pp [5] Z. Nagy, Zs. Vörösházi, P. Szolgay Mammalian Retina Model Implementation on Emulated Digital FPGA HACIPPR th Joint Hungarian-Austrian Conference on Image Processing and Pattern Recognition, Veszprém, Hungary, May, 2005, pp [6] Zs. Vörösházi, Z. Nagy, P. Szolgay An Advanced emulated digital Retina Model on FPGA to implement a real-time test environment ISCAS 2006 IEEE International Symposium on Circuits and Systems, Kos, Greece, May, 2006, pp [7] P. Szolgay, S. Kocsárdi, Z. Nagy, P. Sonkoly, Zs. Vörösházi Complex Computational Problems in Cellular Architectures RSEE 2006, Oradea, Romania, 8-10 June, 2006, pp

[8] Zs. Vörösházi, A. Kiss, Z. Nagy, P. Szolgay, "An embedded CNN-UM Global Analogic Programming Unit implementation on FPGA", CNNA th IEEE International Workshop on Cellular Neural Networks and their Applications, Istanbul, Turkey, Aug, 2006, pp

[9] Z. Nagy, Zs. Vörösházi, P. Szolgay, "A Real-time Mammalian Retina Model Implementation on FPGA", CNNA th IEEE International Workshop on Cellular Neural Networks and their Applications, Istanbul, Turkey, Aug, (live demo)

[10] Zs. Vörösházi, A. Kiss, Z. Nagy, P. Szolgay, "FPGA Based Emulated-Digital CNN-UM Implementation with GAPU", CNNA th IEEE International Workshop on Cellular Neural Networks and their Applications, Santiago de Compostela, Spain, July, 2008, pp

[11] Zs. Vörösházi, A. Kiss, Z. Nagy, P. Szolgay, "A Standalone FPGA Based Emulated-Digital CNN-UM System", CNNA th IEEE International Workshop on Cellular Neural Networks and their Applications, Santiago de Compostela, Spain, July, 2008 (live demo), pp. 4.

[12] Zs. Vörösházi, Z. Nagy, P. Szolgay, "An Advanced Real-Time, Multi-Channel Emulated-Digital Retina Model Implementation on FPGA", CNNA th IEEE International Workshop on Cellular Neural Networks and their Applications, Santiago de Compostela, Spain, July, 2008 (live demo), pp. 6.


More information

A Compact FPGA Implementation of Triple-DES Encryption System with IP Core Generation and On-Chip Verification

A Compact FPGA Implementation of Triple-DES Encryption System with IP Core Generation and On-Chip Verification Proceedings of the 2010 International Conference on Industrial Engineering and Operations Management Dhaka, Bangladesh, January 9 10, 2010 A Compact FPGA Implementation of Triple-DES Encryption System

More information

Automotive Software Engineering

Automotive Software Engineering Automotive Software Engineering List of Chapters: 1. Introduction and Overview 1.1 The Driver Vehicle Environment System 1.1.1 Design and Method of Operation of Vehicle Electronic 1.1.2 Electronic of the

More information

HowHow to Get Rid of Unwanted Money

HowHow to Get Rid of Unwanted Money On-Chip Evolution Using a Soft Processor Core Applied to Image Recognition Kyrre Glette and Jim Torresen University of Oslo Department of Informatics PO Box 1080 Blindern, 0316 Oslo, Norway {kyrrehg,jimtoer}@ifiuiono

More information

NORTHEASTERN UNIVERSITY Graduate School of Engineering

NORTHEASTERN UNIVERSITY Graduate School of Engineering NORTHEASTERN UNIVERSITY Graduate School of Engineering Thesis Title: Enabling Communications Between an FPGA s Embedded Processor and its Reconfigurable Resources Author: Joshua Noseworthy Department:

More information

Microprocessor and Hardware Laboratory (MHL)

Microprocessor and Hardware Laboratory (MHL) Microprocessor and Hardware Laboratory (MHL) Διονύσης Πνευματικάτος Καθηγητής, Διευθυντής MHL Τμήμα Ηλεκτρονικών Μηχανικών και Μηχανικών Υπολογιστών ΠΟΛΥΤΕΧΝΕΙΟ ΚΡΗΤΗΣ Mission High Quality Research: Basic

More information

MATLAB/Simulink Based Hardware/Software Co-Simulation for Designing Using FPGA Configured Soft Processors

MATLAB/Simulink Based Hardware/Software Co-Simulation for Designing Using FPGA Configured Soft Processors MATLAB/Simulink Based Hardware/Software Co-Simulation for Designing Using FPGA Configured Soft Processors Jingzhao Ou and Viktor K. Prasanna Department of Electrical Engineering, University of Southern

More information

Power Reduction Techniques in the SoC Clock Network. Clock Power

Power Reduction Techniques in the SoC Clock Network. Clock Power Power Reduction Techniques in the SoC Network Low Power Design for SoCs ASIC Tutorial SoC.1 Power Why clock power is important/large» Generally the signal with the highest frequency» Typically drives a

More information

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

GPU System Architecture. Alan Gray EPCC The University of Edinburgh GPU System Architecture EPCC The University of Edinburgh Outline Why do we want/need accelerators such as GPUs? GPU-CPU comparison Architectural reasons for GPU performance advantages GPU accelerated systems

More information

Systems on Chip Design

Systems on Chip Design Systems on Chip Design College: Engineering Department: Electrical First: Course Definition, a Summary: 1 Course Code: EE 19 Units: 3 credit hrs 3 Level: 3 rd 4 Prerequisite: Basic knowledge of microprocessor/microcontroller

More information