High Speed FIR Filter Based on Truncated Multiplier and Parallel Adder

Deepshikha Bharti #1, K. Anusudha #2
#1 Student, M.Tech, Department of Electronics Engineering, Pondicherry University, Puducherry, India
#2 Assistant Professor, Department of Electronics Engineering, Pondicherry University, Puducherry, India

Abstract: A high-speed finite impulse response (FIR) filter is designed using a faithfully rounded truncated multiplier and a parallel prefix adder. The bit width is optimized without sacrificing signal precision. A transposed-form FIR filter is implemented using an improved truncated multiplier and a parallel prefix adder. Multiplication and addition are frequently required in digital signal processing. The parallel prefix adder provides high-speed addition, and the improved truncated multiplier reduces both the delay and the number of components used.

Keywords: Digital signal processing (DSP), truncated multiplier, parallel adder, FIR filter, VLSI design.

I. INTRODUCTION

A finite impulse response (FIR) filter is a digital filter that operates on digital inputs. A digital filter is a system that performs mathematical operations on a sampled, discrete-time signal to reduce or enhance certain aspects of that signal. The two main types of digital filter are the infinite impulse response (IIR) filter and the finite impulse response (FIR) filter. The FIR digital filter is one of the fundamental components in many digital signal processing (DSP) and communication systems. It is also widely used in portable applications that require fast operation. In general, the transfer function of a linear, time-invariant filter can be expressed in the z-domain as:

$$H(z) = \frac{B(z)}{A(z)} = \frac{\sum_{i=0}^{N} b_i z^{-i}}{1 + \sum_{j=1}^{M} a_j z^{-j}} \tag{1}$$

When the denominator is made equal to unity, equation (1) describes an FIR filter.
The output of an FIR filter of order N can therefore be expressed as:

$$y[n] = b_0 x[n] + b_1 x[n-1] + \cdots + b_N x[n-N] = \sum_{i=0}^{N} b_i x[n-i] \tag{2}$$

The FIR filter has two basic structures, the direct form and the transposed form, as shown in Fig. 1.

Fig. 1. Structures of FIR filters: (a) direct form and (b) transposed form

The direct form has one large addition at the output, which maps well onto multiply-accumulate (MAC) operations on a DSP processor, but its hardware implementation is complex. The direct form also needs extra pipeline registers to reduce the delay of the adder tree. The transposed form already has registers between the adders, so it achieves high throughput and low delay without any extra pipeline registers. It consists of many small additions separated by delay elements; these registers between the adders shorten the critical path and make the design faster. Fig. 1(a) shows the direct form, in which the output is obtained by concurrently multiplying the individually delayed signals by their respective filter coefficients and then accumulating all the products. In the transposed form in Fig. 1(b), the inputs to each multiplier are the current input sample x[n] and a coefficient. The result of each multiplier passes through the adder block and the delay elements. There are many papers on the design and implementation of low-cost or high-speed FIR filters [1]-[7].

To avoid the multiplier block, an FIR filter can also be implemented as a multiplierless design or a memory-based design. Multiplierless designs are realized with shift-and-add operations and share common suboperations using canonical signed digit (CSD) recoding and common subexpression elimination (CSE) to minimize the adder cost of the multiple constant multiplication (MCM) [8]-[12]. Some multiplierless FIR filter designs use the transposed structure to allow cross-coefficient sharing and tend to be faster, particularly when the filter order is small. However, the area of the delay elements is larger than in the direct form because of the range expansion of the constant multiplications and the subsequent additions in the adder block. Memory-based FIR designs follow two types of approach: lookup table (LUT) methods and distributed arithmetic (DA) methods [11]-[12]. The LUT-based design stores odd multiples of the input signal in ROMs to realize the constant multiplications in MCM [11]. The DA-based approaches recursively accumulate the bit-level partial results of the inner product computation in FIR filtering.

ISSN: 2231-5381 http://www.ijettjournal.org Page 243

II. FINDING THE FILTER COEFFICIENTS

The first step in designing an FIR filter is to find the filter coefficients. They can be calculated in several ways, including the window design method, the frequency sampling method and the Parks-McClellan method. The simplest approach is the windowing technique, carried out here in MATLAB. The ideal (desired) low-pass coefficients are calculated as:

$$h_d(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} H_d(\omega)\, e^{j\omega n}\, d\omega = \frac{1}{2\pi}\int_{-\omega_c}^{\omega_c} 1 \cdot e^{j\omega n}\, d\omega = \begin{cases} 2 f_c \dfrac{\sin(n\omega_c)}{n\omega_c}, & n \neq 0 \\ 2 f_c, & n = 0 \end{cases} \tag{3}$$

The filter coefficients can be obtained using any window. Hamming windows are particularly well suited to this method because of their closed-form specification, so the Hamming window is used here to calculate the filter coefficients.
The equation for the Hamming window is given as:

N = ... (4)

Fig. 2: Digital filter design stages

An important consideration in the design of an FIR filter is the derivation of the filter coefficients and the optimization of their bit widths, which has a direct impact on the area and speed of the arithmetic units. Some DSP applications do not require full-precision outputs; in such cases the output can be truncated and rounded to save area and increase the speed of the system, while the total error stays within 1 unit in the last place (ulp). In this paper, we present high-speed implementations of FIR filters based on the transposed structure in Fig. 1(b) with faithfully rounded truncated multipliers and a parallel prefix adder. The faithfully rounded truncated multiplier produces its output by accumulating all the partial products (PPs) after the unnecessary partial product bits (PPBs) have been removed, without affecting the final precision of the output. The paper is organized as follows: Section II discusses the derivation of the filter coefficients, Section III describes the truncated multiplier, Section IV describes the parallel prefix adder and Section V presents the experimental results.

Finally, the set of windowed impulse response coefficients h(n) is calculated as:

$$h(n) = W(n)\, h_d(n) \tag{5}$$

for $-\frac{N-1}{2} \le n \le \frac{N-1}{2}$ when N is odd and $-\frac{N}{2} \le n \le \frac{N}{2}$ when N is even, where $h_d(n)$ are the desired coefficients and

$$W(n) = 0.54 + 0.46\cos\left(\frac{2\pi n}{N}\right)$$
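The windowing procedure of equations (3) and (5) can be illustrated with a short Python sketch. It is not the authors' MATLAB script: the function name `hamming_lowpass`, the normalized cutoff `fc` (in cycles per sample) and the centring of the index n on half-integers for an even number of taps are all assumptions made for illustration.

```python
import math

def hamming_lowpass(num_taps, fc):
    """Windowed-sinc low-pass coefficients (illustrative sketch).

    num_taps : number of coefficients N (hypothetical parameter name)
    fc       : cutoff normalized to the sampling rate, 0 < fc < 0.5 (assumed)
    """
    wc = 2.0 * math.pi * fc              # cutoff in radians per sample
    half = (num_taps - 1) / 2.0          # centre of the symmetric index range
    coeffs = []
    for k in range(num_taps):
        n = k - half                     # centred index (half-integer if N is even)
        if n == 0:
            hd = 2.0 * fc                # limit of eq. (3) at n = 0
        else:
            hd = 2.0 * fc * math.sin(n * wc) / (n * wc)
        w = 0.54 + 0.46 * math.cos(2.0 * math.pi * n / num_taps)  # Hamming window
        coeffs.append(w * hd)            # eq. (5): h(n) = W(n) * hd(n)
    return coeffs

if __name__ == "__main__":
    for k, c in enumerate(hamming_lowpass(8, 0.125)):
        print(k, round(c, 6))
```

The coefficients come out symmetric about the centre, which is what gives the filter its linear phase. In a hardware flow they would still need to be quantized to the chosen coefficient bit width.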

Fig. 3: Impulse response of an 8-tap filter using the Hamming window (truncated impulse response; value versus coefficient number)

Fig. 4: Magnitude (dB) and phase (degrees) response of the 8-tap filter using the Hamming window, versus frequency (Hz)

III. TRUNCATED MULTIPLIER

Parallel multipliers are typically implemented as either carry-save array multipliers or tree multipliers. In many computer systems, the (n+n)-bit products produced by parallel multipliers are rounded to n bits to avoid growth in word size. Many DSP applications do not need the full output product of the multiplier. In most filter implementations, the filter output can be obtained using only the most significant bits of the multiplier output, which yields the output faster and with less area. A multiplier with lower delay is therefore needed, and the truncated multiplier provides it. In the wireless multimedia world, DSP systems are everywhere. DSP algorithms are computationally intensive and test the limits of battery life in portable devices such as cell phones, pocket gadgets, hearing aids, MP3 players and digital video recorders. Multiplication is the main operation in many signal processing algorithms, so efficient parallel multipliers are desirable.

This paper implements a truncation design that decreases the number of half adders and full adders used. The least significant partial product bits contribute little to the final product, so they are not calculated, which reduces the number of adders needed to form the product.

Fig. 5: Method of truncation of PPBs

Fig. 6: Truncated multiplier design

In the faithfully rounded FIR filter implementation, the total error introduced during the arithmetic operations must be no larger than one ulp.
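Fig. 5 and Fig. 6 show that a considerable number of PPBs can be deleted, leading to smaller area and delay. In Fig. 5, a single row of PPBs is made undeletable (for the subsequent rounding), and the PPB elimination consists only of deletion and rounding. Writing $E_D$ for the deletion error and $E_R$ for the rounding error, with primes marking the values after the ½ ulp bias is added, the error ranges are:

$$\begin{aligned}
-\mathrm{ulp} &\le E_D \le 0 \\
-\tfrac{1}{2}\mathrm{ulp} &\le E_D' = E_D + \tfrac{1}{2}\mathrm{ulp} \le \tfrac{1}{2}\mathrm{ulp} \\
-\mathrm{ulp} &< E_R \le 0 \\
-\tfrac{1}{2}\mathrm{ulp} &< E_R' = E_R + \tfrac{1}{2}\mathrm{ulp} \le \tfrac{1}{2}\mathrm{ulp} \\
-\mathrm{ulp} &< E = (E_D' + E_R') \le \mathrm{ulp}
\end{aligned}$$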
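The deletion-plus-rounding idea can be checked numerically with a small behavioural model. The Python sketch below is a simplification made for illustration, not the PPB selection of Fig. 5: it assumes unsigned n-bit operands, deletes whole low-order columns of the partial-product array as long as the worst-case deleted sum stays within one ulp, and then adds two ½ ulp biases before truncating. The function names are hypothetical.

```python
def deletable_columns(n):
    """Number of low-order PPB columns whose worst-case sum is at most 1 ulp.

    Assumes an unsigned n x n array, where column c (c < n) holds c + 1
    partial-product bits of weight 2**c, and ulp = 2**n.
    """
    ulp = 1 << n
    d, worst = 0, 0
    while d < n and worst + (d + 1) * (1 << d) <= ulp:
        worst += (d + 1) * (1 << d)
        d += 1
    return d

def truncated_multiply(a, b, n):
    """Return the upper n bits of a*b, computed without the deleted PPBs."""
    ulp = 1 << n
    d = deletable_columns(n)
    kept = 0
    for i in range(n):
        for j in range(n):
            # Keep only partial-product bits in columns d and above.
            if i + j >= d and (a >> i) & 1 and (b >> j) & 1:
                kept += 1 << (i + j)
    # One 1/2 ulp bias recentres the deletion error, the other performs rounding.
    return (kept + ulp // 2 + ulp // 2) >> n

if __name__ == "__main__":
    n = 4
    worst = max(abs(truncated_multiply(a, b, n) * (1 << n) - a * b)
                for a in range(1 << n) for b in range(1 << n))
    print("largest absolute error in units of ulp:", worst / (1 << n))
```

Under these assumptions the total error E stays in the range -ulp < E <= ulp for every operand pair, which matches the bound stated above.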

IV. PARALLEL PREFIX ADDER

This paper uses a Kogge-Stone parallel prefix adder for the implementation of the FIR filter. A parallel prefix adder is faster than other carry-propagate adders and is among the most efficient circuits for binary addition. The Kogge-Stone adder is a parallel prefix form of the carry look-ahead adder. It generates the carry signals in O(log n) time and is widely considered one of the fastest adder designs. It takes more area to implement than other parallel adders but has a lower fan-out at each stage. Parallel prefix addition is generally a three-step process. The first step creates the generate (gi) and propagate (pi) signals for the input operand bits. The second step generates the carry signals, and the final step is a simple adder that produces the sum. The three-stage structure shared by the carry look-ahead adder and the parallel prefix adder is shown in Fig. 7.

Fig. 7: Three-stage structure of a parallel prefix adder (addends → generation of propagate and generate signals → generation of carry signals → adder to generate the sum → final sum)

A Kogge-Stone parallel prefix adder precomputes the propagate and generate signals. The propagate signals are obtained by XORing the input bits and the generate signals by ANDing the input bits. These propagate and generate bits are then combined stage by stage using two types of node, white nodes and black nodes. The algorithm for the Kogge-Stone adder is shown in Fig. 8, where the diamond-shaped blocks are the white nodes and the black dots are the black nodes.

The white nodes (diamond-shaped boxes) are calculated as:

{x[1], y[1]} = {a[1], (a[1] | b[1]) & b[0]} (6)

The black nodes (black dots) are calculated as:

{x[1], y[1]} = {a[1] & b[1], (a[1] | b[1]) & b[0]} (7)

Both node calculations need OR gates and AND gates. Fig. 8 shows that each vertical stage produces a propagate bit and a generate bit.
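The three-step process described above can be modelled in a few lines of Python. This is a behavioural sketch using the textbook prefix recurrences (group generate G = g | (p & g_prev), group propagate P = p & p_prev) applied to whole bit vectors, not a transcription of equations (6) and (7) or of the synthesized netlist. The function name and the `width` parameter are assumptions.

```python
def kogge_stone_add(a, b, width=8):
    """Add two unsigned `width`-bit integers with a Kogge-Stone prefix network.

    Returns (sum, carry_out). Each Python integer is used as a vector of bits,
    so one loop iteration corresponds to one stage of the prefix tree.
    """
    mask = (1 << width) - 1
    p = (a ^ b) & mask          # step 1: bitwise propagate signals
    g = (a & b) & mask          # step 1: bitwise generate signals
    p_initial = p               # kept for the final sum stage

    dist = 1
    while dist < width:         # step 2: log2(width) prefix stages
        g = (g | (p & (g << dist))) & mask   # group generate over a doubled span
        p = (p & (p << dist)) & mask         # group propagate over a doubled span
        dist <<= 1

    carries = (g << 1) & mask   # carry into bit i is the group generate of bits 0..i-1
    total = p_initial ^ carries # step 3: sum bits
    carry_out = (g >> (width - 1)) & 1
    return total, carry_out

if __name__ == "__main__":
    print(kogge_stone_add(200, 100))   # 300 = 256 + 44, so carry-out is set
```

For an 8-bit adder the loop runs three times (distances 1, 2 and 4), which is the logarithmic depth that makes this adder attractive for short critical paths.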
The final generate bits are produced in the last vertical stage, and these bits are XORed with the initial propagate bits to produce the sum bits.

Fig. 8: Design flow of 8-bit Kogge-Stone adder

V. SIMULATION RESULTS

An 8-tap FIR filter is implemented using the truncated multiplier and the parallel prefix adder described above. The simulation shows a considerable reduction in the delay of the filter. The first step in designing any filter is to select the filter order. Since the main aim of this design is to reduce delay, a low-order filter has been chosen. The filter coefficients are calculated using MATLAB and then stored in registers. The total delay of the FIR filter includes that of the arithmetic circuits (adder and multiplier) and the delay elements (D flip-flops). The structure has been synthesized and simulated using the Xilinx ISE 13.2 design suite. The following tables show the delay and the LUTs used for the multiplier, the adder and the FIR filter.

TABLE I: Arithmetic circuit results

Block                    Logic delay (ns)   Route delay (ns)   LUTs used
Truncated multiplier     11.372             12.495             91
Parallel prefix adder    8.498              5.619              36

TABLE II: 8-tap FIR filter results

Block                                Logic delay (ns)   Route delay (ns)   LUTs used   Power (mW)
8-tap FIR filter, transposed form    35.11              56.634             1233        3.9
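For readers who want to check the data flow of the transposed structure in Fig. 1(b) in software before synthesis, the following Python sketch models it with exact integer arithmetic. It is a behavioural model only: the function name is hypothetical, and ordinary `*` and `+` stand in for the truncated multiplier and the Kogge-Stone adder.

```python
def fir_transposed(samples, coeffs):
    """Behavioural model of a transposed-form FIR filter.

    Every coefficient multiplies the current input sample; the products are
    summed through a chain of adders separated by registers (delay elements).
    """
    regs = [0] * (len(coeffs) - 1)        # one register between adjacent adders
    out = []
    for x in samples:
        products = [c * x for c in coeffs]
        # The output adder combines the first product with the first register.
        y = products[0] + (regs[0] if regs else 0)
        # Update registers in ascending order so each one reads the old
        # value of its neighbour, as clocked flip-flops would.
        for k in range(len(regs) - 1):
            regs[k] = products[k + 1] + regs[k + 1]
        if regs:
            regs[-1] = products[-1]
        out.append(y)
    return out

if __name__ == "__main__":
    # An impulse at the input reproduces the coefficients at the output.
    print(fir_transposed([1, 0, 0, 0], [1, 2, 3]))
```

Because each adder in this structure is followed by a register, the longest combinational path is one multiplier plus one adder regardless of the number of taps.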

VI. CONCLUSION

This paper has presented a high-speed FIR filter design that considers the optimization of both coefficient bit width and hardware resources. Although most prior designs are based on the direct form, we observe that the transposed form of the FIR filter, combined with a faithfully rounded truncated multiplier and a parallel prefix adder, leads to less delay in computing the filter output.

REFERENCES

[1] A. Blad and O. Gustafsson, "Integer linear programming-based bit-level optimization for high-speed FIR filter architecture," Circuits Syst. Signal Process., vol. 29, no. 1, pp. 81-101, Feb. 2010.
[2] S. Hwang, G. Han, S. Kang, and J.-S. Kim, "New distributed arithmetic algorithm for low-power FIR filter implementation," IEEE Signal Process. Lett., vol. 11, no. 5, pp. 463-466, May 2004.
[3] H. Samueli, "An improved search algorithm for the design of multiplierless FIR filters with powers-of-two coefficient," IEEE Trans. Circuits Syst., vol. 36, no. 7, pp. 1044-1047, Jul. 1989.
[4] Y. J. Yu and Y. C. Lim, "Design of linear phase FIR filters in subexpression space using mixed integer linear programming," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 54, no. 10, pp. 2330-2338, Oct. 2007.
[5] M. M. Peiro, E. I. Boemo, and L. Wanhammar, "Design of high-speed multiplierless filters using a nonrecursive signed common subexpression algorithm," IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 49, no. 3, pp. 196-203, Mar. 2002.
[6] F. Xu, C. H. Chang, and C. C. Jong, "Design of low-complexity FIR filters based on signed-powers-of-two coefficients with reusable common subexpressions," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 26, no. 10, pp. 1898-1907, Oct. 2007.
[7] C. H. Chang, J. Chen, and A. P. Vinod, "Information theoretic approach to complexity reduction of FIR filter design," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 8, pp. 2310-2321, Sep. 2008.
[8] F. Xu, C. H. Chang, and C. C. Jong, "Contention resolution - A new approach to versatile subexpressions sharing in multiple constant multiplications," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 2, pp. 559-571, Mar. 2008.
[9] F. Xu, C. H. Chang, and C. C. Jong, "Contention resolution algorithms for common subexpression elimination in digital filter design," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 52, no. 10, pp. 695-700, Oct. 2005.
[10] I.-C. Park and H.-J. Kang, Digital filter synthesis based on an algorithm Comput.-Aided Design Integr. Circuits Syst., vol. 21, no. 12, pp. 1525