Investigation of emulated-digital CNN-UM architectures: Retina model and Cellular Wave-Computing Architecture implementation on FPGA
University of Pannonia
Information Science and Technology Doctoral School

Investigation of emulated-digital CNN-UM architectures: Retina model and Cellular Wave-Computing Architecture implementation on FPGA

Theses of Ph.D. dissertation
Zsolt Vörösházi
Supervisor: Péter Szolgay, DSc
Veszprém, Hungary, 2009
I. Motivations and aims

Nowadays, both analog and digital circuit technology and fabrication are continuously improving and complementing each other. This improvement is well characterized by the scaling-down (miniaturization) trend described by Moore's law. For high-performance, real-time, near-sensor signal processing tasks, the choice between the two technologies is determined primarily by the application. To support the decision, critical and typical physical parameters of the complex, large-scale integrated VLSI circuits are calculated, such as area (A), speed (S), and dissipated power (P). Recently, parallel array processing has become the focus of state-of-the-art analog circuit technology and its digital counterpart. However, this design methodology raised an important problem: most designers and researchers intended to construct a globally interconnected processor array, but its complexity grows exponentially with the number of processor elements in the array.

Cellular Neural/Nonlinear Networks (CNN) are defined as analog, nonlinear, parallel computing structures made up of a large number of elementary processor units (e.g. nucleus) arranged in a 2-dimensional regular grid. They can be implemented not only as a single-layer but also as a multi-layer architecture. The processor elements are locally connected (discrete) in space, but they operate continuously in time. The program of the CNN network (called the "template") is defined by the strength of the local interconnections between the elementary cells, in other words by setting the matrix of weight factors. The result of the computation is derived from the spatio-temporal dynamics of the processing elements together with the template operations (called the analog transient).
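For reference, the cell dynamics described above are usually written in the standard Chua-Yang form shown below. The summary itself does not print this equation, so the notation is an assumption: x is the cell state, y the output, u the input, A and B the feedback and feed-forward templates, z the bias, and N_r(i,j) the neighborhood of radius r around cell (i,j); the capacitor and resistor values are normalized to one.

```latex
\begin{align}
\dot{x}_{ij}(t) &= -x_{ij}(t)
  + \sum_{(k,l)\in N_r(i,j)} A_{k-i,\,l-j}\, y_{kl}(t)
  + \sum_{(k,l)\in N_r(i,j)} B_{k-i,\,l-j}\, u_{kl}
  + z \\
y_{ij}(t) &= \tfrac{1}{2}\left( \lvert x_{ij}(t) + 1 \rvert - \lvert x_{ij}(t) - 1 \rvert \right)
\end{align}
```

In this form the template is the triple (A, B, z), which matches the description of the program as a matrix of local weight factors.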
If each elementary CNN cell is extended with a local analog and logic memory unit, a local control unit, and an optical sensor input, and a global control unit is added to the integrated cell array, the CNN Universal Machine (CNN-UM) architecture is obtained, more recently defined as the Cellular Wave Computing Architecture. The CNN-UM is universal in the Turing machine sense and can also work as a nonlinear computing operator. Each elementary instruction of the CNN-UM defines complex, spatio-temporal dynamic behavior. This novel computing architecture based on the CNN paradigm has been implemented on several different platforms. The first hardware prototypes of CNN networks were analog/mixed-signal VLSI chips. The computing performance of these CNN chips (a few TeraOPS) far exceeds that of the digital processor implementations, and their power dissipation is very low. However, they have some disadvantages that impede their widespread use in industrial applications. They suffer from noise sensitivity, lack of flexibility, and, most importantly, limited analog accuracy (about 7-8 bits at the I/O), which gives inaccurate solutions in most signal processing tasks.
The simplest, most accurate, and most flexible, but slowest CNN-UM implementation is the CNN software simulator running on a conventional computer. The software simulator is generally used to ease the template design and optimization process. Moreover, during the measurements the speed-up of the different CNN-UM approaches is compared to the computing performance of the CNN software simulator, the latter being taken as unity. Alternatively, CNN software simulation can be accelerated by many-core technologies, such as GPU-based (Graphics Processing Unit) implementations using NVIDIA CUDA, or the IBM Cell architecture.

The emulated-digital CNN-UM solution offers the best compromise between the analog VLSI CNN-UM implementations and the software simulators in terms of computing performance and accuracy. Emulated-digital solutions exist in many different physical forms: as an ASIC (Application Specific Integrated Circuit) like the CASTLE array processor, as the DSP-based (Digital Signal Processor) CNN-HAC prototyping board, or built on an FPGA (Field Programmable Gate Array), e.g. the FALCON architecture. In the emulated-digital approach the behavior of the analog CNN cell network is approximated by a model discretized in space and time, while the locally connected digital processor elements are arranged in an array. The CNN therefore provides a flexible and effective computing structure for the complex spatio-temporal dynamical computations of various bio-inspired systems (such as a retina); moreover, it makes it possible to generate the activity patterns in video real-time. The neuromorph structure of the multi-layer CNN retina model is derived from both morphological and electro-physiological information measured by neurobiologists.
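The idea of approximating the analog cell network with a model discretized in space and time can be sketched in a few lines. The example below is only an illustration and not the Falcon implementation: it applies one Forward Euler step to a single-layer CNN with 3x3 templates. The function names, the zero boundary condition, and the template layout are assumptions made for this sketch.

```python
def output(x):
    """Standard CNN piecewise-linear output: clamp the state to [-1, 1]."""
    return max(-1.0, min(1.0, x))


def euler_step(state, inp, A, B, z, h):
    """One Forward Euler step of dx/dt = -x + sum(A*y) + sum(B*u) + z.

    state, inp : 2-D lists of cell states and inputs
    A, B       : 3x3 feedback and feed-forward templates
    z          : bias, h : time-step
    Cells outside the array contribute zero (assumed boundary condition).
    """
    rows, cols = len(state), len(state[0])
    new_state = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = z
            # Multiply-accumulate over the 3x3 neighborhood.
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    k, l = i + di, j + dj
                    if 0 <= k < rows and 0 <= l < cols:
                        acc += A[di + 1][dj + 1] * output(state[k][l])
                        acc += B[di + 1][dj + 1] * inp[k][l]
            new_state[i][j] = state[i][j] + h * (-state[i][j] + acc)
    return new_state


if __name__ == "__main__":
    zero = [[0.0] * 3 for _ in range(3)]
    # With empty templates the state simply decays towards zero.
    print(euler_step([[1.0]], [[0.0]], zero, zero, 0.0, 0.5))
```

Each cell update is a short chain of multiply-accumulate operations over its neighborhood, which is the kind of workload that maps naturally onto an array of locally connected digital processing elements.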
According to the latest results of neurobiological investigations, a mammalian (rabbit) retina consists of about different types of ganglion channels, but further channels might be discovered as the methodology and measurement techniques improve. Each channel is made up of a stack of several (at least 10) diversely interconnected strata, on which a large number of simple processor elements (neurons) are arranged in a two-dimensional structure. The difficulty of this evolving computing problem is that a large number of CNN layers with different physical timing parameters and various connectivity properties must be handled, in addition to the increased computing power requirements.

The universality of a CNN-UM network is based on the stored-program principle, which can be realized by integrating an embedded Global Analogic Programming Unit (hereafter GAPU) into the cell network. The GAPU is responsible for controlling the sequential instructions of complex, sophisticated analogic (analog-logic) CNN algorithms; moreover, it can store the values (input, state, bias) necessary to perform the computations.

I have chosen FPGA-based reconfigurable computing devices both for the implementation of the neuromorph-structured mammalian retina model and for the elaboration of the GAPU. The reason is that today's modern FPGAs provide a good alternative for performing complex spatio-temporal, multi-layer CNN dynamical computations at high precision owing to their advantageous features, such as high flexibility and
computing performance, rapid prototyping, and low cost (in low volume). Therefore, it is worthwhile to review the different CNN-UM approaches, especially the FPGA-based emulated-digital CNN-UM implementation, and to examine how its inherent computational potential can be exploited in various real-time processing tasks.
II. Methodology of the research

The dissertation covers, on the one hand, the implementation of the neuromorph-structured, multi-layer mammalian retina model and, on the other hand, the implementation of the Global Analogic Programming Unit (GAPU) on FPGA architecture. During the software development and setup phase of the different test applications I used several industry-standard programming and simulation EDA tools: the Xilinx ISE synthesis tools with the EDK embedded processor development kit, the Celoxica/Agility DK Design Suite supporting the Handel-C high-level description language, the Mentor ModelSim simulator, and MATLAB. For hardware prototyping I chose development boards that cover distinct directions and levels of FPGA evolution (e.g. Celoxica RC203 and RC2000 boards with Virtex-II, a Xilinx V2Pro board equipped with Virtex-II Pro, and finally a Xilinx ML-506 card with a Virtex-5 FPGA).

I intended to use modern hardware-software co-design and co-verification techniques, which represent the state-of-the-art and most popular form of FPGA-based reconfigurable computing (RC) implementation. In this type of hardware-software co-design task the key problem is the partitioning step, in which the designer can freely decide which parts are realized in hardware and which in software. However, the partitioning may also depend on the available resources and the achievable performance. Reconfigurable FPGA architectures make it possible to perform optimized DSP operations at high speed by using the internal dedicated building blocks, e.g. calculating a convolution by means of Multiply-Accumulate (MAC) DSP blocks. As of 2009, the largest and most powerful high-end Xilinx FPGA belongs to the Virtex-6 SX family (XC6VSX475T) and contains up to 2016 dedicated MAC DSP blocks.
During the research work I first examined how the bio-inspired mammalian retina model can be implemented on an FPGA-based emulated-digital CNN-UM architecture, which is an open and emerging computation problem. Although several analog VLSI CNN-UM devices with complex cells (e.g. CACE1k, CACE2k) are available, they can only handle the external layers of a given retina channel. With this type of CNN approach, at most 2 or 3 strata of an OPL (Outer Plexiform Layer) can be modeled simultaneously in video real-time (about 25 fps). Because of this limitation I have chosen the emulated-digital CNN-UM approach, which makes it possible to implement different retina channels with various configurable timing and connectivity parameters, also supposing a globally connected multi-layer structure. The aim of the proposed multi-channel, multi-layer retina model implementation on FPGA is to obtain results that are qualitatively correct compared to the original neurobiological measurements, in order to mimic the behavior of the living retina. Furthermore, the implementation provides real-time processing and is several orders of magnitude faster than a software simulator. With the help of this FPGA-based implementation, the model parameters of the mammalian retina can be verified rapidly
and set correctly. (The run-time of a multi-layer retina model using a software simulator on a conventional PC may be several hours, depending on the size of the model and its parameters.) The complexity of this task comes from handling a large number of strata and the very different parameters of the layers, such as feed-forward and feedback connections, couplings, and time constants.

The governing equations that describe the dynamics of the neuromorph-structured retina model do not have an exact analytical solution. Therefore, during the investigations I compared the results of different numerical integration formulas (e.g. the Forward Euler method and higher-order Runge-Kutta methods) for solving these types of ODEs. I examined the critical parameters in the following key aspects: simulation time-step, resource utilization (area), computing performance, and accuracy. During the implementation I aimed to elaborate an FPGA-based reconfigurable computing architecture that can be configured in an arbitrary manner; owing to the applied design methodology and rapid prototyping platform, the behavior of various bio-inspired multi-channel vision models can be explored in real-time.

The emulated-digital CNN-UM architecture implemented on FPGA makes it possible to compute complex, spatio-temporal dynamics described by coupled ODEs. Since fixed-point arithmetic is used, it was important to know how accurately the novel implementation approximates the results of the microbiological measurements. Moreover, the area requirements and the largest achievable performance at different computing precisions were measured. With scalable precision, another vital question was the lowest computing accuracy at which the model still gives qualitatively acceptable responses. (According to the original microbiological measurements, a real mammalian retina works at about 6 bits of analog computing precision.)
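The question of how fixed-point precision affects the solution can be illustrated on a toy problem. The sketch below is not the retina model: it integrates the scalar ODE dx/dt = -x with the Forward Euler method, once in double precision and once with the state rounded to a given number of fractional bits after every step. The test equation, the time-step, and the helper names are assumptions chosen only to show the comparison method.

```python
def to_fixed(value, frac_bits):
    """Round a real value to the nearest multiple of 2**-frac_bits."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


def decay(x0, h, steps, frac_bits=None):
    """Integrate dx/dt = -x with Forward Euler.

    If frac_bits is given, the state is quantized after every step,
    mimicking a fixed-point state register of that fractional width.
    """
    x = x0
    for _ in range(steps):
        x = x + h * (-x)
        if frac_bits is not None:
            x = to_fixed(x, frac_bits)
    return x


if __name__ == "__main__":
    reference = decay(1.0, 0.1, 50)  # double precision taken as the reference
    for bits in (4, 8, 12, 16, 20):
        error = abs(decay(1.0, 0.1, 50, bits) - reference)
        print(f"{bits:2d} fractional bits: absolute error {error:.2e}")
```

At very low precision the per-step update becomes smaller than half a quantization step and the state stops changing, which gives a simple picture of why a quantized model can fail to respond at all.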
CNN templates and the settings of the interactions are calculated from the parameter tables of the CNN-based neuromorph retina model, whereas the behavior of the model is derived from neurobiological measurements. On the one hand, both the templates and the algorithms of the multi-layer retina model have been implemented on a software simulator and tested on different conventional microprocessors. To achieve better performance, the ANSI-C/C++ source code is extended with optimized functions of the Intel Integrated Performance Library, a package optimized for image and signal processing tasks. On the other hand, the entire retina system on the FPGA-based emulated-digital CNN-UM architecture was constructed by modifying and extending the Falcon emulated-digital CNN-UM architecture. Different prototyping platforms equipped with various Xilinx FPGAs were used for the neuromorph retina model calculations. Finally, the computed results have been compared to the original neurobiological measurements to verify the effectiveness of the proposed FPGA-based CNN-UM architecture.

The dissertation also deals with the implementation of the Global Analogic Programming Unit on the reconfigurable emulated-digital CNN-UM architecture using FPGA. Owing to its modular structure and operation, it can be easily integrated with the previously elaborated original Falcon CNN processing architecture; thereby, a
universal Cellular Wave Computer architecture on FPGA can be implemented. During the implementation process I integrated the Xilinx MicroBlaze embedded soft-processor IP core, which has a RISC instruction set, into the GAPU. Then the embedded system, acting as a global CNN control unit, was extended with some storage elements. Finally, this novel architecture was integrated with the modified Falcon-ML processor array architecture and the Vector Processing Elements. These improvements make it possible to exploit the large computing performance of the Falcon processor effectively in order to construct a fully functional, standalone, real-time image processing system. With the proposed embedded GAPU implementation, on the one hand, template sequences of complex, sophisticated analogic CNN algorithms can be executed easily; on the other hand, it is capable of controlling program-organizing constructs (e.g. iteration, branch) and I/O instructions, similarly to various commercial Visual System-on-Chip implementations. (Without the embedded GAPU implementation only a single instruction or template operation could be handled at a time via a host PC, which limited the performance of the Falcon processor significantly.)

During the verification and performance tests, the computing time required to calculate the CNN equations was measured repeatedly (50 times). From this set the best run-time was selected and compared to the estimated performance of different commercial CNN-UM implementations. The quality of the proposed embedded, emulated-digital CNN-UM GAPU implementation is demonstrated and verified by running a complex, sophisticated analogic CNN algorithm that requires consecutive steps of template operations and replacements.
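The measurement procedure described above (repeat the run 50 times and keep the best run-time) can be sketched for the software-simulator side as follows. This is a minimal illustration, not the author's test harness; `best_runtime` and the dummy workload are hypothetical names introduced here.

```python
import time


def best_runtime(func, repeats=50):
    """Run func() `repeats` times and return the shortest wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


if __name__ == "__main__":
    # Stand-in workload; a real test would call the CNN simulation kernel here.
    def workload():
        return sum(i * i for i in range(100_000))

    print(f"best of 50 runs: {best_runtime(workload) * 1e3:.3f} ms")
```

Taking the minimum rather than the mean reduces the influence of operating-system jitter on the host, which is one plausible reason for selecting the best run.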
Based on real experiments, several important issues relating to the acceleration efficiency, computing accuracy, cell size, and area consumption are discussed and compared to the results of the software simulator and the concurrent state-of-the-art CNN-UM implementations. The research work has been done at the Cellular Neural Network Applications Laboratory of the Department of Image Processing and Neurocomputing (its new name is Department of Electrical Engineering and Informational Systems), University of Pannonia.
III. New Scientific Results

Thesis Group 1: Implementation of a CNN-based neuromorph mammalian retina model on FPGA architecture

I have implemented novel single- and multi-channel, multi-layer retina models on a reconfigurable emulated-digital CNN-UM architecture by applying the latest results of microbiological measurements relating to the CNN-based framework of the neuromorph mammalian (e.g. rabbit) retina model. The difficulty of this challenging task lies in the solution of a complex spatio-temporal problem that requires huge computing power. Real-time processing capability has been verified and tested on three different FPGA-based prototyping systems: Celoxica RC2000, Digilent XUPV2P, and Xilinx ML-506. I have shown experimentally that the single- and multi-channel retina model implementations on FPGA achieve an orders-of-magnitude performance increase over the software solutions while still providing high flexibility in parameter settings. This implementation also makes it possible to handle various neuromorph retina models or biological systems more easily and effectively owing to its rapid parameter tuning and refining procedure.

Related publications: [1],[3],[4],[5],[6],[7],[9],[12]

Thesis 1.1: I have elaborated a reconfigurable emulated-digital CNN-UM computing architecture (Falcon-ML), which makes it feasible to implement CNN-based neuromorph, single- and multi-channel, multi-layer mammalian retina model computations on FPGA effectively.

The architecture is tailored to the single- and multi-channel, multi-layer retina model; therefore I have completely redesigned the Arithmetic Unit for calculations with diffusion-type and Gaussian-type symmetrical templates and intra-/inter-layer zero-neighborhood connections. The Template Memory unit has also been expanded in order to store the various parameters related to the connections of the multi-channel, multi-layer retina structure.
Thesis 1.2: I have shown experimentally that the performance of the optimized retina processor elements for calculating CNN dynamics can be significantly improved by decreasing the computing precision. Not knowing the exact analytical solution of this complex spatio-temporal problem, I have considered the double precision floating point numerical implementation as an accurate solution. In general, it is inferred that at least
22-bit computing precision is necessary to obtain qualitatively correct results from the various CNN-based neuromorph mammalian retina channels.

I have compared the results of different fixed-point computations to the double precision floating point results and to the neurobiological measurements. At low precisions (less than 14 bits) the error values of the FPGA-based neuromorph retina model implementation are very high because the model does not respond to the input stimulus. At least bit precision is required to get some response on the output of the model. If the CNN dynamics of the retina model should be computed more accurately, at least bit precision is required.

Thesis 1.3: I have given equivalent transformations between the computing performance, the image size, the number of layers, and the precision of the elaborated FPGA-based single- and multi-channel neuromorph retina model implementation. These critical parameters determine the limitations of the FPGA implementation. Considering the qualitatively correct 22-bit state precision, the elaborated architecture achieves times higher performance compared to the software solution running optimized code on an Intel Core2 Duo E8400 microprocessor.

The multi-layer CNN simulation kernel is written in C using optimized functions of the Image Processing Library from Intel. To emulate a single retina channel at least 10 CNN layers are required, while every further ganglion channel increases the complexity of the CNN network by an additional 7 layers. I have implemented the CNN-based neuromorph retina model on several different FPGA-based experimental systems (using Virtex-II, Virtex-II Pro, and Virtex-5 FPGAs) and explored the maximal number of implementable Falcon processing elements on each platform, while I have estimated the results for the largest Virtex-6 FPGA.
Considering 22-bit state precision and a 2-7 ms time-step, at (on Virtex-II) and sized images (on Virtex-6) 1-48 parallel retina channels can be implemented and emulated in real-time, depending on the dedicated FPGA resources. Larger images can be processed if more BRAM memories are available, but the processing will not run in real-time. To process higher resolution images an external on-board memory is required. In this case the processing is between half an order and 3 orders of magnitude slower than with internal BRAM memories, due to the memory I/O bandwidth limitation.
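The layer counts quoted in Thesis 1.3 (at least 10 CNN layers for a single channel and 7 more for every further ganglion channel) give a quick lower-bound estimate of how many layers a multi-channel configuration needs. The helper below is a hypothetical illustration of that arithmetic only; it says nothing about whether a given FPGA can hold the result.

```python
def total_layers(channels, first_channel_layers=10, extra_channel_layers=7):
    """Lower bound on the number of CNN layers for `channels` retina channels.

    The defaults follow the figures quoted in the text: at least 10 layers for
    the first channel and 7 additional layers for each further channel.
    """
    if channels < 1:
        raise ValueError("at least one retina channel is required")
    return first_channel_layers + extra_channel_layers * (channels - 1)


if __name__ == "__main__":
    for n in (1, 2, 48):
        print(n, "channel(s):", total_layers(n), "layers")
    # 1 channel(s): 10 layers
    # 2 channel(s): 17 layers
    # 48 channel(s): 339 layers
```

For the largest configuration mentioned (48 parallel channels) this yields at least 339 layers, which gives a sense of why the number of channels is tied to the dedicated resources of the chosen device.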
Thesis Group 2: Implementation of embedded CNN-UM Global Analogical Programming Unit as a Cellular Wave Computer on FPGA architecture

I have implemented a Global Analogical Programming Unit on an FPGA-based emulated-digital CNN-UM architecture to obtain a fully functional Cellular Wave Computing architecture. I have completely redesigned the local Control Unit of the Falcon reconfigurable emulated-digital CNN processor and optimized it for communication with the GAPU; the modified processor is called the Falcon Processing Element (FPE). I have elaborated a new GAPU architecture to control the sequential template and/or arithmetic-logic operations and the program-organizing instructions of analogical CNN algorithms. To perform arithmetic and logic operations, a new Vector Processing Element (VPE) has been implemented. Finally, the processing array consisting of VPE and FPE units has been integrated with the GAPU implementation. I have demonstrated the operation and effectiveness of the proposed embedded GAPU architecture by executing a complex, sophisticated analogic CNN skeletonization algorithm. The real-time image processing capability of this autonomous system has been verified and tested on different prototyping systems. I have shown experimentally that on the largest FPGA architecture at least two orders of magnitude performance advantage can be achieved over the software simulator, while it also provides a several-fold speed-up over competing analog VLSI CNN-UM implementations.

Related publications: [2],[8],[10],[11]

Thesis 2.1: I have elaborated and implemented a new emulated-digital CNN-UM GAPU architecture as a Cellular Wave Computer on FPGA by integrating an embedded Xilinx MicroBlaze soft-processor core to control the sequential and program-organizing instructions of analogic CNN algorithms effectively.
Based on the original reconfigurable emulated-digital CNN-UM processor (FALCON) I have elaborated a new computing architecture, called the Falcon Processing Element (FPE), which is optimized for communication with the GAPU. The local Control Unit of the original Falcon architecture has been completely redesigned. I have implemented a new Vector Processing Element (VPE) to perform the arithmetic and logic operations by using the dedicated resources of the FPGA. The processing array of FPE and VPE units is integrated with the GAPU implementation. Without the GAPU, the full processing time of the previous solutions is dominated by the communication time between the host PC and the Falcon PE, which is needed for downloading template sequences, input and initial state images, and program-organizing instructions (such as branch, cycle, etc.), and for uploading the result of the computation in each step of the
algorithm. I have reduced the communication time by storing these parameters and instructions in the internal registers of the GAPU, similarly to the standard CNN-UM structure. The embedded GAPU can communicate directly with the Falcon PEs across the high-speed PLB bus of the Xilinx MicroBlaze soft-core at the frequency of the FPGA's internal clock. Therefore, the Falcon PE is utilized more efficiently (in 91% of the full computing time) when performing complex analogic CNN algorithms.

Thesis 2.2: I have proved experimentally that the implementation and integration of FPEs and VPEs with a GAPU unit on the reconfigurable emulated-digital CNN-UM system is optimal at 16-bit computing precision, where the number of implementable Falcon and Vector PEs is the largest.

The 18-bit state precision gives the optimal resource occupancy, since it is best suited to the bit width of the dedicated BlockSelectRAM memories (e.g. BRAM18k) and multiplier blocks (e.g. MULT18X18). However, with a Xilinx MicroBlaze embedded soft-processor core the supported high-speed communication bus (e.g. the PLB bus) can be defined as 128-bit wide (a multiple of 16 bits); therefore the practical computing precision of the FPE is also 16 bits. Only a small amount of the available logic and dedicated chip resources is occupied by the proposed GAPU implementation with the embedded Xilinx MicroBlaze IP core, so neither the number of implementable Falcon and Vector Processing Elements nor the cumulative computing performance is decreased significantly.

Thesis 2.3: I have shown experimentally that the elaborated FPGA-based CNN-UM GAPU implementation provides several orders of magnitude faster processing than software simulation and may also outperform the current analog VLSI CNN-UM systems, depending on the selected FPGA. The computing performance is determined by the image size, the accuracy of the solution, and the number of available dedicated memory resources.
I have implemented the embedded GAPU architecture on several different FPGA-based experimental systems (using Virtex-II, Virtex-II Pro, and Virtex-5 FPGAs) and measured the maximal performance on each platform, while I have estimated the results for the largest Virtex-6 FPGA. To measure the performance, the skeletonization algorithm was selected and executed using nearest-neighborhood templates. The functionality of the GAPU is examined both on , and sized images by running 10 Forward Euler iterations,
supposing the optimal 16-bit state and constant precision and 8-bit template precision. The CNN software simulation kernel of the skeletonization algorithm is written in C using optimized functions of the Intel Image Processing Library. Depending on the dedicated resources, the cumulative performance of the Falcon processing array extended with the proposed GAPU implementation can reach 1.33 billion CNN cell iterations per second or 135 billion CNN operations per second. Depending on the selected image size ( , or ) fold speed-up can be achieved over the software simulator running optimized code on an Intel Core2 Duo E8400 microprocessor. The performance of the GAPU implementation can reach or may exceed (by one order of magnitude) the performance of the analog ASIC VLSI CNN-UM chips (e.g. ACE16k, Q-Eye).
IV. Possible Applications

Several years ago I had the opportunity to participate in the long design and verification process of the emulated-digital CNN-UM array architecture called CASTLE, which was implemented at the Analogical and Neural Computing Laboratory, Hungarian Academy of Sciences. I gained fundamental knowledge of high-level hardware description languages and of the full-custom ASIC VLSI layout design and simulation procedure by elaborating the global timing and control unit of the CASTLE array processor architecture. The implementation of the standalone FPGA-based CNN-UM system with GAPU integration benefits from this know-how. Instead of using the expensive and long development procedure of full-custom ASIC VLSI technology, I decided to apply reconfigurable computing (RC) devices for CNN computations. Reconfigurable computing architectures, such as FPGAs, make it possible to implement low-cost, flexible, and reprogrammable emulated-digital CNN-UM systems for various application areas.

The emulated-digital CNN-UM architectures investigated in the dissertation have been implemented on reconfigurable computing devices and can be employed in the following application areas.

On the one hand, different neuromorph-structured, multi-layer, multi-channel retina models can be analyzed, or bio-inspired biological systems can be modeled, on FPGA where high-speed processing capability is essential. High computing performance has been achieved by using simple, locally interconnected processing elements, which are arranged in a large array. Moreover, this implementation is orders of magnitude faster than software simulators, which allows differently organized retina models to be examined in video real-time through rapid reconfiguration.
With the proposed implementation it might be possible to explore and understand the relation between the stimulation of a neuron in its corresponding receptive field and the recorded spiking patterns for a given ganglion channel. The quality of the retina model can be examined by comparing the FPGA-based measurements with the results of the neurobiological measurements. Knowing the differences between them, the structure and parameters of the retina model can be tuned and refined. Using a properly defined retina model, a smart vision system can be implemented on FPGA, which makes more effective object recognition, tracking, and classification possible, for example in surveillance or reconnaissance applications.

On the other hand, the Falcon reconfigurable emulated-digital CNN-UM processor array with the embedded GAPU implementation gives a fully functional, standalone image processing system. Using the GAPU implementation, complex, sophisticated analogical CNN algorithms can be executed in real-time. It makes it possible to perform sequences of template
operations, analog and logic operations, and program-organizing instructions on a single FPGA-based system. The GAPU implementation can be easily integrated with the previously elaborated single-layer Falcon architecture (Falcon-SL), the previously mentioned Falcon multi-layer retina architecture (Falcon-ML), or the nonlinear template runner architecture (Falcon-Nonlinear). Therefore, the applicability of the GAPU implementation is further extended towards low-cost, smart, and complex image processing systems.
V. List of Publications

Journal Papers

[1] Z. Nagy, Zs. Vörösházi, P. Szolgay, "Emulated Digital CNN-UM Solution of Partial Differential Equations," International Journal of Circuit Theory and Applications, Wiley, Vol. 34, Special Issue on CNN Technology (Part 2), July-Aug.

[2] Zs. Vörösházi, A. Kiss, Z. Nagy, P. Szolgay, "Implementation of embedded emulated-digital CNN-UM Global Analogic Programming Unit on FPGA and its application," International Journal of Circuit Theory and Applications, Wiley, Vol. 36, Special Issue: Cellular Wave Computing Architecture, July-Sep.

[3] Zs. Vörösházi, Z. Nagy, P. Szolgay, "FPGA-Based Real Time, Multichannel Emulated-Digital Retina Model Implementation," EURASIP Journal on Advances in Signal Processing, Hindawi, Vol. 2009, Special Issue on CNN Technology for Spatiotemporal Signal Processing.

International Conference Papers

[4] Z. Nagy, Zs. Vörösházi, P. Szolgay, "An Emulated Digital Retina Model implementation on FPGA," CNNA, IEEE International Workshop on Cellular Neural Networks and their Applications, Hsinchu, Taiwan, May 2005.

[5] Z. Nagy, Zs. Vörösházi, P. Szolgay, "Mammalian Retina Model Implementation on Emulated Digital FPGA," HACIPPR, Joint Hungarian-Austrian Conference on Image Processing and Pattern Recognition, Veszprém, Hungary, May 2005.

[6] Zs. Vörösházi, Z. Nagy, P. Szolgay, "An Advanced emulated digital Retina Model on FPGA to implement a real-time test environment," ISCAS 2006, IEEE International Symposium on Circuits and Systems, Kos, Greece, May 2006.

[7] P. Szolgay, S. Kocsárdi, Z. Nagy, P. Sonkoly, Zs. Vörösházi, "Complex Computational Problems in Cellular Architectures," RSEE 2006, Oradea, Romania, 8-10 June 2006.
16 List of Publications 15 [8] Zs. Vörösházi, A. Kiss, Z. Nagy, P. Szolgay An embedded CNN-UM Global Analogic Programming Unit implementation on FPGA CNNA th IEEE International Workshop on Cellular Neural Networks and their Applications, Istanbul, Turkey, Aug, 2006, pp [9] Z. Nagy, Zs. Vörösházi, P. Szolgay A Real-time Mammalian Retina Model Implementation on FPGA CNNA th IEEE International Workshop on Cellular Neural Networks and their Applications, Istanbul, Turkey, Aug, (live demo) [10] Zs. Vörösházi, A. Kiss, Z. Nagy, P. Szolgay FPGA Based Emulated-Digital CNN-UM Implementation with GAPU CNNA th IEEE International Workshop on Cellular Neural Networks and their Applications, Santiago de Compostela, Spain, July, 2008, pp [11] Zs. Vörösházi, A. Kiss, Z. Nagy, P. Szolgay A Standalone FPGA Based Emulated-Digital CNN-UM System CNNA th IEEE International Workshop on Cellular Neural Networks and their Applications, Santiago de Compostela, Spain, July, 2008, (live demo) pp. 4. [12] Zs. Vörösházi, Z. Nagy, P. Szolgay An Advanced Real-Time, Multi-Channel Emulated-Digital Retina Model Implementation on FPGA CNNA th IEEE International Workshop on Cellular Neural Networks and their Applications, Santiago de Compostela, Spain, July, 2008 (live demo), pp. 6.