Low Latency, High Throughput, and Less Complex VLSI Architecture for 2D-DFT
The International Arab Journal of Information Technology, Vol. […], No. […], January […]

Sohil Shah, Preethi Venkatesan, Deepa Sundar, and Muniandi Kannan
Department of Electronics Engineering, Anna University, India

Abstract: This paper proposes a highly concurrent, pipelined, systolic architecture for two-dimensional discrete Fourier transform computation. The architecture consists of two one-dimensional discrete Fourier transform blocks connected via an intermediate buffer. The proposed architecture offers low latency as well as high throughput and can perform both one- and two-dimensional discrete Fourier transforms. The architecture supports transform lengths that are not powers of two and not based on products of co-prime numbers. The simulation and synthesis were carried out using Cadence tools, NCSim and RTL Compiler, respectively, with […] nm libraries.

Keywords: Digital signal processing chip, discrete Fourier transforms, systolic array, very-large-scale integration circuit.

Received February […]; accepted June […].

1. Introduction

The discrete Fourier transform is of vital importance in domains such as signal processing, image processing, communication, and various other areas of science and engineering. In particular, the two-dimensional Discrete Fourier Transform (2D DFT) has a variety of applications in image processing, such as filtering, compression, pattern matching, magnetic resonance imaging, and watermarking. Hence, efficient implementation of the DFT architecture has become a challenge. Since many of these applications process real-time data, computational speed has to increase while latency stays low. Fast Fourier Transform (FFT) based architectures were designed to gain computational speed […].
These, however, had several disadvantages, such as the requirement that the transform length be a power of some fixed radix, which limits the choice of reachable values of N, and the increase in computational complexity and latency of pipelined FFTs as the transform length increases. Other architectures have been studied for years, among which the systolic architecture is preferred, as it is simple and offers rhythmic processing with repetitive, identical cells called processing elements. Hence there is a need for a high-performance systolic architecture for the two-dimensional DFT that works on transform lengths that are not powers of two.

In this paper, a less complex systolic architecture for the computation of the 2D DFT is proposed, based on an algorithm that facilitates computation of the DFT for any transform length N, i.e., N can be a prime number and need not be a power of any fixed radix. High throughput is achieved by pipelining: the 2D DFT is computed via two pipelined 1D DFT blocks provided with an intermediate buffer. In section 2, previous work related to systolic architectures for the DFT is summarized. Section 3 gives a mathematical framework for 1D DFT computation for transform length N via four inner products of real-valued data of length approximately equal to N/2. Section 4 discusses the proposed VLSI architecture, which includes the one-dimensional DFT block, the intermediate buffer, and floating point adders and multipliers. Implementation aspects and results for the proposed architecture are given in section 5.

2. Related Work

A variety of systolic array designs have been proposed for the direct computation of the DFT […]. These designs offer a simple architecture and can be used for any transform length, but they are computationally complex because they work on the direct DFT algorithm, in which the number of arithmetic operations per DFT computation is O(N^2). Hence, as the transform length increases, they yield a computation-intensive architecture, and they do not support direct calculation of the two-dimensional DFT.
Since computing the 2D DFT from 1D DFTs requires a transposition operation, transposition-free two-dimensional DFT architectures were developed to reduce latency […]. But this has been achieved at the cost of increased hardware and reduced performance, which is undesirable. A computationally efficient architecture has been designed using two-level transform factorization, which provides low latency and high throughput […]. However, this design is not
applicable for odd values of N and hence offers less choice in choosing N. A reduced-complexity algorithm that operates on any transform length for the 1D DFT is given in […]. Based on that algorithm, a two-dimensional architecture offering low latency and high throughput is proposed in this paper, in which a dual-port RAM is used between two 1D blocks. The dual-port RAM is controlled in such a way that pipelining is achieved, thereby reducing latency and increasing throughput.

3. Mathematical Background

3.1. General Formulation

The 2D DFT of an array {f(m, n) for m, n = 0, 1, ..., N-1} of size (N x N) is defined as […]

$$F(k, l) = \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} f(m, n)\, W_N^{km}\, W_N^{ln}, \qquad k, l = 0, 1, \ldots, N-1,$$

where $W_N^{ab} = e^{-j 2\pi ab / N}$.

The DFT of a sequence {f(n), n = 0, 1, ..., N-1} is defined as

$$F(k) = \sum_{n=0}^{N-1} f(n)\, W_N^{kn}, \qquad k = 0, 1, \ldots, N-1.$$

The 2D definition can be split into two separate computations,

$$F'(m, l) = \sum_{n=0}^{N-1} f(m, n)\, W_N^{ln}, \qquad l = 0, 1, \ldots, N-1,$$

$$F(k, l) = \sum_{m=0}^{N-1} F'(m, l)\, W_N^{km}, \qquad k = 0, 1, \ldots, N-1.$$

The N-point 1D DFT of all rows of an input array of size (N x N) results in the intermediate matrix [F'(m, l)]. The 2D DFT is then computed by the inner product of the k-th column of the DFT kernel with the l-th column of the intermediate matrix [F'(m, l)] […].

3.2. Formulation of the Proposed Algorithm

Since the two equations above indicate that the 2D DFT can be computed using two consecutive 1D DFTs, we extend the algorithm that was proposed for the 1D DFT to 2D DFT computation, with the introduction of additional hardware that avoids the transposition operation and so enhances the speed of computation. The 1D DFT can thus be expressed as

$$F(k) = f(0) + \sum_{n=1}^{M} \left( f(n)\, W_N^{kn} + f(N-n)\, W_N^{-kn} \right),$$

$$F(N-k) = f(0) + \sum_{n=1}^{M} \left( f(n)\, W_N^{-kn} + f(N-n)\, W_N^{kn} \right),$$

where k = 1, 2, ..., M, and

$$F(0) = f(0) + \sum_{n=1}^{M} \left( f(n) + f(N-n) \right),$$

where M = (N - 1)/2.

N can be either an odd or an even number. Here we assume N to be odd for the hardware proposal; suitable modifications can be made for even values of N.
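The row-column split and the odd-N pairing of f(n) with f(N - n) can be checked numerically. The following is a minimal Python sketch under stated assumptions: it uses only the standard library, covers odd N only, and the function names (`dft_direct`, `dft_folded_odd`, `dft2d_row_column`) are illustrative and do not come from the paper. The software version picks columns by indexing, whereas the proposed hardware routes them through the intermediate buffer.

```python
import cmath


def dft_direct(f):
    """Direct N-point DFT: F(k) = sum_n f(n) * W_N^(k*n)."""
    N = len(f)
    return [sum(f[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def dft_folded_odd(f):
    """N-point DFT for odd N, pairing f(n) with f(N - n) for n = 1..M."""
    N = len(f)
    if N % 2 == 0:
        raise ValueError("this sketch covers odd N only")
    M = (N - 1) // 2
    F = [0j] * N
    # F(0) = f(0) + sum over n of (f(n) + f(N - n))
    F[0] = f[0] + sum(f[n] + f[N - n] for n in range(1, M + 1))
    for k in range(1, M + 1):
        fk = fnk = f[0]
        for n in range(1, M + 1):
            w = cmath.exp(-2j * cmath.pi * k * n / N)  # W_N^(kn)
            w_inv = w.conjugate()                      # W_N^(-kn), since |w| = 1
            fk += f[n] * w + f[N - n] * w_inv          # term of F(k)
            fnk += f[n] * w_inv + f[N - n] * w         # term of F(N - k)
        F[k], F[N - k] = fk, fnk
    return F


def dft2d_row_column(a, dft1d=dft_folded_odd):
    """2D DFT of a square matrix: 1D DFT of every row, then of every column."""
    N = len(a)
    inter = [dft1d(row) for row in a]  # inter[m][l] = F'(m, l)
    cols = [dft1d([inter[m][l] for m in range(N)]) for l in range(N)]
    # cols[l][k] = F(k, l); reorder so the result is indexed as out[k][l]
    return [[cols[l][k] for l in range(N)] for k in range(N)]
```

Each pass of the inner loop over n produces a term for both F(k) and F(N - k), which is why only M of the N - 1 non-zero frequencies need their own loop iteration.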
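The modified equations for even values of N are

$$F(k) = f(0) + f(N/2)\, e^{j\pi k} + \sum_{n=1}^{M} \left( f(n)\, W_N^{kn} + f(N-n)\, W_N^{-kn} \right),$$

$$F(N-k) = f(0) + f(N/2)\, e^{j\pi k} + \sum_{n=1}^{M} \left( f(n)\, W_N^{-kn} + f(N-n)\, W_N^{kn} \right),$$

$$F(0) = f(0) + \sum_{n=1}^{M} \left( f(n) + f(N-n) \right) + f(N/2),$$

$$F(N/2) = \sum_{n=0}^{M} \left( f(2n) - f(2n+1) \right),$$

where M = (N/2) - 1.

Splitting the real and imaginary coefficients of $W_N^{kn}$, we get

$$F(k) = f(0) + \left( A(k) - B(k) \right) + j \left( C(k) + D(k) \right),$$

$$F(N-k) = f(0) + \left( A(k) + B(k) \right) + j \left( C(k) - D(k) \right),$$

where the four real inner products, written here as A(k), B(k), C(k), and D(k), are

$$A(k) = \sum_{n=1}^{M} \mathrm{Re}[x(n)]\, \mathrm{Re}[W_N^{kn}], \qquad B(k) = \sum_{n=1}^{M} \mathrm{Im}[y(n)]\, \mathrm{Im}[W_N^{kn}],$$

$$C(k) = \sum_{n=1}^{M} \mathrm{Im}[x(n)]\, \mathrm{Re}[W_N^{kn}], \qquad D(k) = \sum_{n=1}^{M} \mathrm{Re}[y(n)]\, \mathrm{Im}[W_N^{kn}],$$

with

$$x(n) = f(n) + f(N-n), \qquad y(n) = f(n) - f(N-n).$$

Re[.] and Im[.] are the real and imaginary parts of a complex input, respectively. Thus the N-point complex inner product is reduced to four real inner products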
which can be processed simultaneously, each of length approximately N/2 points.

4. Architectural Aspects

4.1. Proposed 2D-DFT Structure

From the mathematical reformulation it can be seen that the 2D structure is arrived at simply by computing 1D DFTs concurrently for the rows of the input matrix and for the columns of the intermediate matrix, respectively. The columns of the resulting intermediate matrix are globally routed in the proper order using the intermediate buffer block of size ([…] x […]) cells, each of length […] bits. Figure 1 shows the fully pipelined general architectural implementation of the 2D DFT. It consists of a number of Processing Cells (PCs), a number of summing blocks (SBs) of two kinds, and one further summing block, with the counts set by M, where M is given by (N - 1)/2. Summing blocks form the first row and the last column of the structure in both DFT blocks, and processing cells are placed on the left side of the structure below the first row. The functions of the processing cells and summing blocks are shown in Figure 2.

Each summing block consists only of floating point adders performing addition/subtraction, and receives continuous inputs during every clock cycle. The input summing blocks each receive a pair of inputs f(M + n) and f(M - n + 1), where n = 1, 2, ..., M and M denotes the total number of such blocks, from the preprocessing blocks (integer to FP converter) or, in the case of the second DFT block, from the intermediate buffer stage. Each of these blocks consists of […] adder cells. A further SB unit receives the first element of each row/column from the same sources; it consists of only a single adder and computes the DC component for each row of the input matrix.

Each processing cell consists of four floating point multipliers that multiply the two processed outputs from the summing blocks by the DFT coefficient, producing real and imaginary outputs separately; in short, it computes the terms of the four real inner products of section 3.2, outputting four intermediate results. The multiplicative DFT coefficients are hardwired in the multiplier unit.
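The dataflow of one 1D DFT block described above (summing blocks forming sums and differences, processing cells forming four real products per coefficient, a final summing stage combining them) can be mimicked in software. The following is a minimal Python sketch, assuming odd N and ignoring pipelining and timing; the function name `dft_via_real_inner_products` and the accumulator names A, B, C, D are illustrative assumptions, not names taken from the paper.

```python
import math


def dft_via_real_inner_products(f):
    """N-point DFT for odd N built from four real inner products per k."""
    f = [complex(v) for v in f]
    N = len(f)
    if N % 2 == 0:
        raise ValueError("this sketch covers odd N only")
    M = (N - 1) // 2

    # Input summing stage: x(n) = f(n) + f(N - n), y(n) = f(n) - f(N - n)
    x = {n: f[n] + f[N - n] for n in range(1, M + 1)}
    y = {n: f[n] - f[N - n] for n in range(1, M + 1)}

    F = [0j] * N
    F[0] = f[0] + sum(x.values())  # DC output

    for k in range(1, M + 1):
        A = B = C = D = 0.0
        for n in range(1, M + 1):
            theta = 2.0 * math.pi * k * n / N
            w_re, w_im = math.cos(theta), -math.sin(theta)  # Re, Im of W_N^(kn)
            # Processing-cell stage: four real multiplications per (k, n)
            A += x[n].real * w_re
            B += y[n].imag * w_im
            C += x[n].imag * w_re
            D += y[n].real * w_im
        # Final summing stage: one set of products yields two outputs
        F[k] = f[0] + complex(A - B, C + D)
        F[N - k] = f[0] + complex(A + B, C - D)
    return F
```

Only real multiplications appear inside the loop, and each set of four accumulated products is reused for both F(k) and F(N - k).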
This complex multiplication can also be performed using a CORDIC multiplier, by repeated shift-and-add operations, or using ROM-based look-up tables. With look-up tables, each multiplier is replaced by a ROM table of size 2^L, where L is the word length, i.e., the number of bits used for the mantissa in the FP representation; the table contains all possible product values obtained by multiplying the fixed DFT coefficient by all possible input values. Four adders are then required in each processing cell, partially computing the real and imaginary parts separately, as shown in the equations of section 3.2.

The final summing block takes four inputs from the processing cells in the previous column and adds the DC component received from the DC summing block, thus performing five add/subtract operations and giving out, separately, the real and imaginary values of the computed 1D DFT of the input row/column of the matrix. Each PC and SB has latches at each output point for data transfer to the adjacent PC/SB along both the horizontal and vertical streams, making the architecture fully pipelined and reducing latency. The output from the […] block is available after ([…] + […]) cycles, and in each following clock cycle a pair of outputs from each SB unit becomes available, one after another, from the 1D DFT unit. The PCs form the critical path of the structure, which can be shortened using the techniques suggested above, and thus serve as the deciding factor for the maximum frequency of operation.

The inputs and outputs of the DFT unit are latched in order to store the processed values in the correct sequence in the buffer unit, which in turn is controlled by an FSM; this latch therefore plays no role in the latency calculation. The number of latches increases with […]; thus we have […] latches following the […]-th SB block. The input to the second DFT unit can be fed only after a column computed from the DFT of the row sequence is available in the buffer unit.
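The ROM-based alternative to a hardware multiplier mentioned above can be illustrated in a few lines. This is a simplified Python sketch, not the paper's design: it assumes an unsigned L-bit integer input instead of a floating point mantissa, stores products as fixed-point integers, and the names `build_rom`, `rom_multiply`, and `frac_bits` are hypothetical.

```python
import math


def build_rom(coeff, L, frac_bits=8):
    """Precompute coeff * v for every L-bit unsigned input v.

    Products are stored as integers scaled by 2**frac_bits, so the table
    holds 2**L entries for one fixed coefficient.
    """
    scale = 1 << frac_bits
    return [round(coeff * v * scale) for v in range(1 << L)]


def rom_multiply(rom, v, frac_bits=8):
    """Replace the multiplication coeff * v by a single table read."""
    return rom[v] / (1 << frac_bits)


# One table per hardwired coefficient, here cos(2*pi/5) as an example value
COEFF = math.cos(2.0 * math.pi / 5.0)
ROM = build_rom(COEFF, L=4)
```

The table size doubles with every extra input bit, which is the area cost paid for removing the multiplier from the critical path.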
Thus, input is fed to the second DFT block only after ([…] + […]) clock cycles, and the output of the 2D DFT sequence becomes available […] clock cycles after the input for which the DFT has to be computed is fed to the first DFT block.

Figure 1. Proposed 2D systolic structure for ([…] x […]) 2D-DFT computation.
Figure 2. Structure and function of various units: (a) summing block SB[…]; (b) summing block SB[…]; (c) summing block SB[…]; (d) processing cell PC.

Figure 3. Structure of the internal buffer for N = […]. Diagonal arrows indicate the order in which input is stored. Vertical arrows indicate the order in which the output is read out, and the number adjacent to each row indicates the clock cycle at which that row of cells is read out. The number inside each cell indicates the clock cycle at which data is written to that cell.

4.2. Buffer Unit

The internal structure and operation of the buffer unit are shown in Figure 3. It consists of cells each of size 2 x L (the word length, counting both real and imaginary parts). Twice as many cells as the input matrix has elements are required to store the 1D DFT output of the input matrix, so that computation proceeds continuously without any halt: while the output from the first block is stored continuously into one set of buffer cells, the other set feeds the input to the second DFT block concurrently. The buffer can therefore also be termed a dual-port RAM, whose ports are used to write the input into one memory cell and read the output from another memory cell simultaneously in a single clock period. This buffer can be implemented either with D flip-flops (D-FFs) or as RAM in an ASIC or FPGA. Using a RAM unit adds extra fetching time and thus slows down the operation, while D-FFs designed using standard cells add overhead in terms of area and power. Clock gating can be applied to the D-FF cells to save power. Thus there is a trade-off between area and speed.

4.3. Floating Point Block

Each […]-bit floating point input is split into a real and an imaginary part, each part having a total of […] bits, of which the mantissa is […] bits, the exponent is […] bits, and one is the sign bit. To multiply two floating point numbers, the mantissas of both inputs are multiplied while the exponents are added, and the result is rounded and normalized.
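The multiplication step just described (multiply mantissas, add exponents, normalize, round) can be sketched in software. This is a minimal Python illustration built on the standard `math.frexp` and `math.ldexp` functions, not a model of the paper's hardware; the function name `fp_multiply` and the parameter `mant_bits` are assumptions, and the default of 23 is simply the fraction width of IEEE 754 single precision, used here for illustration only.

```python
import math


def fp_multiply(a, b, mant_bits=23):
    """Multiply two floats by operating on mantissa and exponent separately."""
    if a == 0.0 or b == 0.0:
        return 0.0
    ma, ea = math.frexp(a)  # a = ma * 2**ea with 0.5 <= |ma| < 1
    mb, eb = math.frexp(b)  # b = mb * 2**eb with 0.5 <= |mb| < 1

    m = ma * mb             # mantissa product, 0.25 <= |m| < 1
    e = ea + eb             # exponents are added

    # Normalize: bring the mantissa back into [0.5, 1)
    if abs(m) < 0.5:
        m *= 2.0
        e -= 1

    # Round the mantissa to mant_bits fractional bits
    scale = 1 << mant_bits
    m = round(m * scale) / scale

    return math.ldexp(m, e)  # reassemble m * 2**e
```

For example, `fp_multiply(1.5, 2.0)` splits the operands into (0.75, 1) and (0.5, 2), forms the mantissa product 0.375, normalizes it to 0.75 with exponent 2, and reassembles 3.0.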
When we add two floating point numbers, their exponents first have to be made equal; shifters are used for this. Then the mantissas of both inputs are added and the result is normalized. The carry look-ahead addition technique is used in both floating point multiplication and addition. The modified Booth algorithm is used for performing the multiplication. The output also consists of a […]-bit mantissa, a […]-bit exponent, and a sign bit.

4.4. Scope for Future Work

This architecture can also be used for implementing the prime-factor DFT, as well as multidimensional DFTs, by simply repeating the same block over and over again with careful implementation of the buffer unit, which will further store planes of ([…] - 1)-dimensional DFT output before computation in the last, […]-th block. Figure 4 shows the general structure for a multidimensional DFT.

Figure 4. General structure for multidimensional implementation.

5. Implementation Results

The whole 2D DFT architecture for a […] x […] input has been coded in Verilog HDL, and simulation and synthesis have been carried out using Cadence tools, NCSim and RTL Compiler, respectively. The simulation is shown in Figure 5. Each input is a […]-bit integer. An integer to floating point converter is used to convert
the integer input to a complex number with floating point representation. A finite state machine is used as a control circuit that controls the flow of 1D DFT output into the buffers and from the buffers into the other 1D DFT block. All the synthesis reports were obtained using the […] nm technology library. The synthesis report of the entire design is summarized in Table 1. The number of multiplications and additions needed for different transform lengths, along with the latency and Average Cycles per Transform (ACT) for the proposed architecture, is given in Table 2.

Figure 5. Simulation using Cadence NCSim.

Table 1. Synthesis results.
2D-DFT block    Values
Speed           […] MHz
Power           […] milliwatt
Area            […] mm^2
Gate count      […]

Table 2. Latency and computation for the proposed architecture.
Number of points    Latency    ACT    Multipliers    Adders
[…]                 […]        […]    […]            […]

Table 3. Comparison of the proposed structure with existing architectures for N = […].
Structure    Adders    Multipliers    Latency    ACT
[…]          […]       […]            […]        […]
[…]          […]       […]            […]        […]
[…]          […]       […]            […]        […]
Proposed     […]       […]            […]        […]

The proposed architecture is compared with existing 2D DFT architectures, for N = […], in Table 3. Here latency denotes the number of cycles within which the first output is obtained. ACT is defined as the number of clock cycles between corresponding results for successive data blocks. A lower ACT denotes higher throughput. The numbers of adders and multipliers denote the complexity of the structure. From the table, the proposed structure is less complex than […] and […]. It also offers lower latency than […] and higher throughput than […] and […].

6. Application

As stated previously, the discrete Fourier transform has a wide range of applications in the signal processing and communication domains. Wireless communication has seen many advances in recent years, in which channel estimation plays an important role in cancelling channel noise and offering better performance.
The two-dimensional discrete Fourier transform offers an efficient method for estimating spatially correlated channels in MIMO communication systems […] and offers an interesting area of research. Pattern matching is another area where the 2D DFT has found importance. The phase component of the discrete Fourier transform serves as an excellent classifier and has been used in areas such as fingerprint recognition […], dental radiograph identification […], and texture feature extraction […].

7. Conclusion

Thus a systolic architecture for the two-dimensional discrete Fourier transform, providing a less complex design with low latency and meeting high throughput requirements, has been implemented in this paper. Architectural techniques such as pipelining and parallelism have been used. Since the architecture is recursive, it can be extended to multidimensional DFTs by using intermediate buffers whose size increases exponentially with the dimension. The Inverse Discrete Cosine Transform (IDCT) and Discrete Cosine Transform (DCT) can also be implemented using the same architecture with slight modification, as they are based on the same algorithm. The front-end design has been carried out. Placement and routing on Cadence Encounter are proposed for the future.

References

Chang C., Wang C., and Chang Y., "Efficient VLSI Architectures for Fast Computation of the Discrete Fourier Transform and its Inverse," IEEE Transactions on Signal Processing.
Ito K., Morita A., Aoki T., Higuchi T., Nakajima H., and Kobayashi K., "A Fingerprint Recognition Algorithm Using Phase-Based Image Matching for Low-Quality Fingerprints," in IEEE International Conference on Image Processing.
Kar D. and Rao V., "A New Systolic Realization for the Discrete Fourier Transform," IEEE Transactions on Signal Processing.
Lim H. and Swartzlander E., "A Systolic Array for 2D DFT and 2D DCT," in Proceedings of the IEEE International Conference on Application Specific Array Processors.
Lim H. and Swartzlander E., "Multidimensional Systolic Arrays for the Implementation of Discrete Fourier Transforms," IEEE Transactions on Signal Processing.
Liu C. and Jen C., "On the Design of VLSI Arrays for Discrete Fourier Transform," in IEE Proceedings on Circuits, Devices and Systems.
Meher P., "Design of a Fully-Pipelined Systolic Array for Flexible Transposition-Free VLSI of 2-D DFT," IEEE Transactions on Circuits and Systems II.
Meher P., "Highly Concurrent Reduced Complexity 2-D Systolic Array for Discrete Fourier Transform," IEEE Signal Processing Letters.
Nash J., "Computationally Efficient Systolic Architecture for Computing the Discrete Fourier Transform," IEEE Transactions on Signal Processing.
Nash J., "Systolic Architecture for Computing the Discrete Fourier Transform on FPGAs," in Proceedings of the […]th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.
Nikaido A., Ito K., Aoki T., Kosuge E., and Kawamata R., "A Phase-Based Image Registration Algorithm for Dental Radiograph Identification," IEEE International Conference on Image Processing.
Tong H. and Zekavat S., "Spatially Correlated MIMO Channel: Generation via Virtual Channel Representation," IEEE Communications Letters.
Xianchao Z., Liusheng H., and Guoliang C., "A New Approach for Computing the Discrete Fourier Transform of Arbitrary Length," in Proceedings of the […]th International Conference on Signal Processing.
Yu T., Muthukkumarasamy V., Verma B., and Blumenstein M., "A Texture Feature Extraction Technique Using 2D-DFT and Hamming Distance," in Proceedings of the Fifth International Conference on Computational Intelligence and Multimedia Applications.

Sohil Shah completed his BE in electronics and communication engineering at Madras Institute of Technology, Anna University, in […]. He is pursuing his M.Tech in communication at IIT Bombay. His areas of interest include VLSI design and optical communication.

Preethi Venkatesan completed her BE in electronics and communication engineering at Madras Institute of Technology, Anna University, in […]. Her areas of interest include VLSI design and FPGA implementations.

Deepa Sundar completed her BE in electronics and communication engineering at Madras Institute of Technology, Anna University, in […]. Her areas of interest include VLSI design and image processing.

Muniandi Kannan has been working as faculty in the department of electronics engineering, MIT Campus of Anna University, since […]. His areas of interest are computer architecture, digital electronics, computer networking, and VLSI. His current area of interest is VLSI architecture for signal processing applications.
System Interconnect Architectures CSCI 8150 Advanced Computer Architecture Hwang, Chapter 2 Program and Network Properties 2.4 System Interconnect Architectures Direct networks for static connections Indirect
More informationImplementation and Design of AES S-Box on FPGA
International Journal of Research in Engineering and Science (IJRES) ISSN (Online): 232-9364, ISSN (Print): 232-9356 Volume 3 Issue ǁ Jan. 25 ǁ PP.9-4 Implementation and Design of AES S-Box on FPGA Chandrasekhar
More informationFPGA Implementation of an Extended Binary GCD Algorithm for Systolic Reduction of Rational Numbers
FPGA Implementation of an Extended Binary GCD Algorithm for Systolic Reduction of Rational Numbers Bogdan Mătăsaru and Tudor Jebelean RISC-Linz, A 4040 Linz, Austria email: bmatasar@risc.uni-linz.ac.at
More informationThis Unit: Floating Point Arithmetic. CIS 371 Computer Organization and Design. Readings. Floating Point (FP) Numbers
This Unit: Floating Point Arithmetic CIS 371 Computer Organization and Design Unit 7: Floating Point App App App System software Mem CPU I/O Formats Precision and range IEEE 754 standard Operations Addition
More informationCHAPTER 3 Boolean Algebra and Digital Logic