Implementing the Functional Model of High Accuracy Fixed Width Modified Booth Multiplier



Similar documents
Floating Point Fused Add-Subtract and Fused Dot-Product Units

RN-coding of Numbers: New Insights and Some Applications

Implementation of Modified Booth Algorithm (Radix 4) and its Comparison with Booth Algorithm (Radix-2)

RN-Codings: New Insights and Some Applications

High Speed and Efficient 4-Tap FIR Filter Design Using Modified ETA and Multipliers

Multipliers. Introduction

Innovative improvement of fundamental metrics including power dissipation and efficiency of the ALU system

A High-Yield Area-Power Efficient DWT Hardware for Implantable Neural Interface Applications

International Journal of Electronics and Computer Science Engineering 1482

An Efficient RNS to Binary Converter Using the Moduli Set {2n + 1, 2n, 2n 1}

Chapter 1: Digital Systems and Binary Numbers

Vedicmultiplier for RC6 Encryption Standards Using FPGA

NEW adder cells are useful for designing larger circuits despite increase in transistor count by four per cell.

The string of digits in the binary number system represents the quantity

LMS is a simple but powerful algorithm and can be implemented to take advantage of the Lattice FPGA architecture.

DESIGN OF AN ERROR DETECTION AND DATA RECOVERY ARCHITECTURE FOR MOTION ESTIMATION TESTING APPLICATIONS

Chapter 4 Register Transfer and Microoperations. Section 4.1 Register Transfer Language

AN IMPROVED DESIGN OF REVERSIBLE BINARY TO BINARY CODED DECIMAL CONVERTER FOR BINARY CODED DECIMAL MULTIPLICATION

COMBINATIONAL CIRCUITS

NAME AND SURNAME. TIME: 1 hour 30 minutes 1/6

Design and FPGA Implementation of a Novel Square Root Evaluator based on Vedic Mathematics

Attaining EDF Task Scheduling with O(1) Time Complexity

Image Compression through DCT and Huffman Coding Technique

FPGA. AT6000 FPGAs. Application Note AT6000 FPGAs. 3x3 Convolver with Run-Time Reconfigurable Vector Multiplier in Atmel AT6000 FPGAs.

A New Algorithm for Carry-Free Addition of Binary Signed-Digit Numbers

Data Storage 3.1. Foundations of Computer Science Cengage Learning

Divide: Paper & Pencil. Computer Architecture ALU Design : Division and Floating Point. Divide algorithm. DIVIDE HARDWARE Version 1

Efficient Interconnect Design with Novel Repeater Insertion for Low Power Applications

Static-Noise-Margin Analysis of Conventional 6T SRAM Cell at 45nm Technology

1. True or False? A voltage level in the range 0 to 2 volts is interpreted as a binary 1.

CSE140 Homework #7 - Solution

Design and Implementation of Concurrent Error Detection and Data Recovery Architecture for Motion Estimation Testing Applications

Two-Phase Clocking Scheme for Low-Power and High- Speed VLSI

Clock Distribution in RNS-based VLSI Systems

Chapter 2 Logic Gates and Introduction to Computer Architecture

ASYNCHRONOUS COUNTERS

Lecture 8: Binary Multiplication & Division

System on Chip Design. Michael Nydegger

what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored?

Performance Comparison of an Algorithmic Current- Mode ADC Implemented using Different Current Comparators

A High-Performance 8-Tap FIR Filter Using Logarithmic Number System

Binary Division. Decimal Division. Hardware for Binary Division. Simple 16-bit Divider Circuit

An Efficient Architecture for Image Compression and Lightweight Encryption using Parameterized DWT

Efficient Motion Estimation by Fast Three Step Search Algorithms

Flip-Flops, Registers, Counters, and a Simple Processor

Implementation of Full -Parallelism AES Encryption and Decryption

A comprehensive survey on various ETC techniques for secure Data transmission

Binary Adders: Half Adders and Full Adders

2011, The McGraw-Hill Companies, Inc. Chapter 3

Non-Data Aided Carrier Offset Compensation for SDR Implementation

10 BIT s Current Mode Pipelined ADC

IJESRT. [Padama, 2(5): May, 2013] ISSN:

A Survey of Video Processing with Field Programmable Gate Arrays (FGPA)

A 1-GSPS CMOS Flash A/D Converter for System-on-Chip Applications

Design and Implementation of an On-Chip timing based Permutation Network for Multiprocessor system on Chip

(Refer Slide Time: 00:01:16 min)

Digital Logic Design. Basics Combinational Circuits Sequential Circuits. Pu-Jen Cheng

Binary Numbering Systems

An Efficient Hardware Architecture for Factoring Integers with the Elliptic Curve Method

Data Storage. Chapter 3. Objectives. 3-1 Data Types. Data Inside the Computer. After studying this chapter, students should be able to:

Distributed Elastic Switch Architecture for efficient Networks-on-FPGAs

Understanding Logic Design

ON SUITABILITY OF FPGA BASED EVOLVABLE HARDWARE SYSTEMS TO INTEGRATE RECONFIGURABLE CIRCUITS WITH HOST PROCESSING UNIT

FAULT TOLERANCE FOR MULTIPROCESSOR SYSTEMS VIA TIME REDUNDANT TASK SCHEDULING

A Dynamic Link Allocation Router

Optimization and Comparison of 4-Stage Inverter, 2-i/p NAND Gate, 2-i/p NOR Gate Driving Standard Load By Using Logical Effort

Binary search tree with SIMD bandwidth optimization using SSE

Error Detection and Data Recovery Architecture for Systolic Motion Estimators

ISSCC 2003 / SESSION 13 / 40Gb/s COMMUNICATION ICS / PAPER 13.7

A DA Serial Multiplier Technique based on 32- Tap FIR Filter for Audio Application

Duobinary Modulation For Optical Systems

A single register, called the accumulator, stores the. operand before the operation, and stores the result. Add y # add y from memory to the acc

FPGA Design of Reconfigurable Binary Processor Using VLSI

Reconfigurable Low Area Complexity Filter Bank Architecture for Software Defined Radio

Systems I: Computer Organization and Architecture

Cryptography and Network Security. Prof. D. Mukhopadhyay. Department of Computer Science and Engineering. Indian Institute of Technology, Kharagpur

Clocking. Figure by MIT OCW Spring /18/05 L06 Clocks 1

Lecture N -1- PHYS Microcontrollers

MICROPROCESSOR. Exclusive for IACE Students iacehyd.blogspot.in Ph: /422 Page 1

A PERFORMANCE EVALUATION OF COMMON ENCRYPTION TECHNIQUES WITH SECURE WATERMARK SYSTEM (SWS)

An Embedded Hardware-Efficient Architecture for Real-Time Cascade Support Vector Machine Classification

Barcode Based Automated Parking Management System

Research on the UHF RFID Channel Coding Technology based on Simulink

How To Fix A 3 Bit Error In Data From A Data Point To A Bit Code (Data Point) With A Power Source (Data Source) And A Power Cell (Power Source)

IMPROVING PERFORMANCE OF RANDOMIZED SIGNATURE SORT USING HASHING AND BITWISE OPERATORS

Lossless Grey-scale Image Compression using Source Symbols Reduction and Huffman Coding

Design of Low Power One-Bit Hybrid-CMOS Full Adder Cells

Architectures and Design Methodologies for Micro and Nanocomputing

Linear Codes. Chapter Basics

FAST IP ADDRESS LOOKUP ENGINE FOR SOC INTEGRATION

A Novel Low Power, High Speed 14 Transistor CMOS Full Adder Cell with 50% Improvement in Threshold Loss Problem

路 論 Chapter 15 System-Level Physical Design

A Quality of Service Scheduling Technique for Optical LANs

JPEG Image Compression by Using DCT

A PC-BASED TIME INTERVAL COUNTER WITH 200 PS RESOLUTION

This Unit: Floating Point Arithmetic. CIS 371 Computer Organization and Design. Readings. Floating Point (FP) Numbers

To convert an arbitrary power of 2 into its English equivalent, remember the rules of exponential arithmetic:

Digital Systems Design! Lecture 1 - Introduction!!

A Novel Cryptographic Key Generation Method Using Image Features

Transcription:

International Journal of Electronics and Computer Science Engineering 393 Available Online at www.ijecse.org ISSN: 2277-1956 Implementing the Functional Model of High Accuracy Fixed Width Modified Booth Multiplier V. Keerthana 1, C. Arun Prasath 2, 1 Embedded System Technology/K.S.R. College of Engineering, Tiruchengode, Tamil Nadu, India 2 Electronics and Communication/ K.S.R. College of Engineering, Tiruchengode, Tamil Nadu, India Email- 1 keerthanavaradarajan@gmail.com, 2 arunprasath_2004@yahoo.co.in Abstract: The sustained growth in VLSI technology is fuelled by the continued shrinking of transistor to ever smaller dimension. The benefits of miniaturization are high packing densities, high circuit speed and low power dissipation. Binary multiplier is an electronic circuit used in digital electronics such as a computer to multiply two binary numbers, which is built using a binary adder. A fixed-width multiplier is attractive to many multimedia and digital signal processing systems which are desirable to maintain a fixed format and allow a minimum accuracy loss to output data. This paper presents the design of high-accuracy fixed-width modified Booth multipliers using Carry Look ahead Adder. The high accuracy fixed width modified booth multiplier is used to satisfy the needs of the applications like digital filtering, arithmetic coding, wavelet transformation, echo cancellation, etc. The high accuracy fixed width modified booth multipliers can also be applicable to lossy applications to reduce the area and power consumption of the whole system while maintaining good output quality. Keywords--- Booth Encoder, Partial product generator, Fixed-width multiplier, Modified Booth multiplier, Compression tree, Carry Look ahead Adder. 1. Introduction Welcome to the world of incredible mathematics. Our every breath of life is associated with mathematics. From the day we born, to till our end, mathematics is associated with each and every part of our life. 2012 is the remarkable year for mathematics. In order to pay a tribute to the 125 th birth anniversary of the mathematics genius srivasa ramanujam, 2012 is declared as the National Mathematics year by our honorable Prime Minister Dr. P. Manmohan Singh. Mathematics is without the four basic operations namely add, subtract, multiply and divide. As the life quote says add your strength, subtract your weakness, multiply your knowledge and divide your illiteracy. We are more privileged to share the knowledge and innovation on multiplication. In this paper, we propose a high-accuracy fixed width modified booth multiplier. The functional model design consists of booth encoder, partial product generator and compression tree which uses Carry Look ahead Adder. The term high accuracy implies that the output produced by the normal 8X8 booth multiplication and the proposed 8X8 booth multiplication are equal. The term fixed width indicates that the partial product bits are adjusted to fixed width for Carry Look ahead Adder in the compression tree. In [1] the result and one operand for the new modulo multipliers use weighted representation, while the other uses the diminished - 1. By using the radix-4 Booth recoding, the new multipliers reduce the number of the partial products to n/2 for n even and (n+1)/2 for n odd except for one correction term. Although one correction term is used, the circuit is very simple. The architecture for the new multipliers consists of an inverted end-around-carry carry save adder tree and one diminished-1 adder. The new multipliers receive full inputs and avoid (n+1) - bit circuits. In [2] a closed form of compensation function for fixed-width Booth multipliers using generalized probabilistic estimation bias (GPEB) is proposed. The GPEB circuit can be easily built according to the proposed systematic steps. The GPEB fixed-width multipliers with variable-correction outperform the existing compensation circuits in reducing error. The GPEB circuit has improved absolute average error reduction, area saving, power efficiency and accuracy. In [3] a truncated multiplier is a multiplier with two n bit operands that produces a n bit result. Truncated multipliers discard some of the partial products of a complete multiplier to trade off accuracy with hardware cost. This paper presents a closed form analytical calculation, for every bit width, of the maximum error for a previously proposed family of truncated multipliers. The considered family of truncated multipliers is particularly important since it is proved to be the design that gives the lowest mean square error for a given number of discarder partial products. With the contribution of this paper, the considered family of truncated multipliers is the only architecture that can be designed, for every bit width, using an analytical approach that allows the a priori knowledge of the maximum error. In [4] a 2-bit Booth encoder with Josephson Transmission Lines (JTLs) and Passive Transmission Lines (PTLs) by using cell-based techniques and tools was designed. The Booth encoding method

IJECSE,Volume1,Number 2 V. Keerthana and C. Arun Prasath is one of the algorithms to obtain partial products. With this method, the number of partial products decreases down to the half compared to the AND array method. a test chip for a multiplier with a 2-bit Booth encoder with JTLs and PTLs was fabricated. The circuit area of the multiplier designed with the Booth encoder method is compared to that designed with the AND array method. In [5] New fixed-width multiplier topologies, with different accuracy versus hardware complexity trade-off, are obtained by varying the quantization scheme. Two topologies are in particular selected as the most effective ones. The first one is based on a uniform coefficient quantization, while the second topology uses a non uniform quantization scheme. The novel fixed-width multiplier topologies exhibit better accuracy with respect to previous solutions, close to the theoretical lower bound. The electrical performances of the proposed fixed-width multipliers are compared with previous architectures. It is found that in most of the investigated cases the new topologies are Pareto-optimal regarding the area-accuracy trade-off. In [6] a special moduli set Residue Number System (RNS) of high dynamic range (DR) can speed up the execution of very large word-length repetitive multiplications found in applications like public key cryptography. The modulo 2 n - 1 multiplier is usually the noncritical data path among all modulo multipliers in such high-dr RNS multiplier. This timing slack can be exploited to reduce the system area and power consumption without compromising the system performance. In [7] Probabilistic Estimation Bias (PEB) Circuit is proposed. Small area and low truncation error is achieved. Peak SNR is reduced by 17db and 2% area penalty compared to the direct truncation method. In [8] Modified Booth encoding (MBE) technique reduces the number of partial product rows, keeping the generation process faster. In [9] Truncated multipliers compute the most-significant bits of the nxn bits product. This paper focuses on variable-correction truncated multipliers, where some partial-products are discarded, to reduce complexity, and a suitable compensation function is added to partly compensate the introduced error. The optimal compensation function, that minimizes the mean square error, is obtained in this paper in closed-form for the first time. A sub optimal compensation function, best suited for hardware implementation, is introduced. Efficient multipliers implementation based on sub-optimal function is discussed. Proposed truncated multipliers are extensively compared with previously proposed circuits. In [10] Power efficient 16 times 16 Configurable Booth Multiplier (CBM) supports single 16-b, single 8-b, or twin parallel 8-b multiplication operations is proposed. Dynamic range detector detects the dynamic ranges of two input operands. It deactivates redundant switching activities in ineffective ranges. In [11] The new generation of high-performance decimal floating-point units (DFUs) is demanding efficient implementations of parallel decimal multipliers. In this paper, we describe the architectures of two parallel decimal multipliers. The parallel generation of partial products is performed using signed-digit radix-10 or radix-5 recodings of the multiplier and a simplified set of multiplicand multiples. The reduction of partial products is implemented in a tree structure based on a decimal multi-operand carry-save addition algorithm that uses unconventional (non BCD) decimal-coded number systems. The latency of the previous designs, which include: optimized digit recoders for the generation of 2n-tuples (and 5-tuples), decimal carry-save adders (CSAs) combining different decimal-coded operands, and carry-free adders implemented by special designed bit counters can be reduced by improved techniques. a design methodology that combines all these techniques to obtain efficient reduction trees with different area and delay trade-offs for any number of partial products generated. Evaluation results for 16-digit operands show that the proposed architectures have interesting area-delay figures compared to conventional Booth radix-4 and radix-8 parallel binary multipliers and outperform the figures of previous alternatives for decimal multiplication. In [12] In this paper, a new MAC architecture to execute the multiplication- accumulation operation, which is the key operation, for digital signal processing and multimedia information processing efficiently, was proposed. By removing the independent accumulation process that has the largest delay and merging it to the compression process of the partial products, the overall MAC performance has been improved almost twice as much as in the previous work. The proposed hardware was implemented and synthesized through four types of CMOS processes. When examination is based on theoretical and experimental results, the proposed MAC required the hardware resources as much as the previous research. The delay was modeled using Sakurai s alpha power law. While the delay has been increased slightly compared to the previous research, actual performance has been increased to about twice if the pipeline is incorporated. The proposed architecture can be used effectively in the area requiring high throughput such as a real-time digital signal processing can be expected. This paper is organized as follows. In Section 2, the functional model design is specified. Finally, Section 3 analyses the experimental results. Section 4 explains the future work, Section 5 concludes the paper and Section 6 specifies the references which was useful to perform the work. 2. Functional Model Design Let us consider the multiplication operation of two n-bit signed numbers X = x n-1, x n-2.x 0 (multiplicand) and Y = y n-1, y n-2.y 0 (multiplier). The two s complement representations of X and Y can be expressed as follows: X = - x n-1 2 n-1 +. x i 2 i, Y = - y n-1 2 n-1 +. y i 2 i (1)

Implementing the Functional Model of High Accuracy Fixed Width Modified Booth Multiplier 395 The functional model design of the booth multiplier consists of first, booth encoder for encoding the multiplicand/multiplier. Second, if the multiplicand and multiplier are of n-bits, partial product generator generates (0.n/2-1) n/2 number of partial product bits which are a one dimensional array. Third, compression tree consists of 9-bit, 12-bit and 16-bit carry Look ahead generator to produce final output product. 2.1. Booth Encoder Circuit The booth encoder circuit takes the input multiplicand X and produces mul, shift and twocom signal which is used by the partial product generation circuit to produce the partial product bits of one dimensional array. Fig.1 explains the booth encoder circuit. The term word in the figure indicates the multiplicand bits appended with zeroes and where i = 0..n/2-1. The append operation is used to make multiplicand compatible with the Carry Look ahead Adder input bits. In this case, the word is formed by appending a single zero to the multiplicand X. The word is of 0..n bits in length and the mul, shift and twocom is of 0 n/2 bits in length. 2.2. Partial Product Generation Circuit Figure1 Booth Encoder Circuit The partial product generation circuit takes the booth encoder output along with the input multiplicand X. If the input multiplicand and multiplier are n-bits means, then (0..n/2-1) n/2 number of one dimensional partial product bits are generated. Fig.2 shows the basic partial product generation circuit. Figure 2 Basic Partial Product Generation Circuit The 2 s complement representation of the multiplicand X is

IJECSE,Volume1,Number 2 V. Keerthana and C. Arun Prasath tmp = ~X + 1'b1 (2) The shift, two, zero, org and mul signals can be explained as shift = {~tmp [n-1], tmp [n-2:0], 1'b0} (3) two = {~tmp [n-1], tmp} (4) zero = {1'b1, `n'b0} (5) org = {~X [n-1], X} (6) mul = {~X [n-1], X [n-2:0], 1'b0} (7) where n is the number of bits in the multiplicand X. The inputs shift, mul, org, two and zero will be based on the multiplicand X. Depending on the selection line bits namely mul, shift and twocom which are the outputs of the booth encoder circuit, one of the inputs, namely shift, org, mul, two and zero which are a one dimensional array, will be selected by the 5X1 multiplexer as a partial product. If the 8x8 multiplication is taken, then the one dimensional partial product bits will be four vis-à-vis PP 0, PP 1, PP 2 and PP 3. Fig.3 indicates the generation of four one dimensional partial product bits array for 8x8 multiplication. Fiure.3 Generation of one dimensional partial product bits array for 8x8 multiplication. 2.3. Compression Tree using Carry Look Ahead Generator The compression tree is the one which adds the individual partial product bits generated by partial product generator using Carry Look ahead Adder and produce the final output product bits. The carry look ahead adder inputs are x, y and c in. The look ahead adder will calculate the carry generate and carry propagate signals given as g = x & y (8) p = x ^ y (9) The sum and carry output of the carry look ahead generator is obtained from the carry generate and carry propagate signals. sum = p i ^ c i-1 (10) carry = g i + p i c i (11)

Implementing the Functional Model of High Accuracy Fixed Width Modified Booth Multiplier 397 Figure.4 Compression Tree using Carry Look ahead Adder 3. Experimental Analysis The experimental analysis reveals the CPLD report of Xilinx ISE simulator for the normal booth multiplier and multiplier implemented using Carry Look ahead Adder. Multiplier Normal multiplier Table 1 Comparison among Memory usage, Macro cell and Functional Block parameters. booth Multiplier using carry look ahead adder Total memory usage (Kbytes) Macro cell used 120804 130/144 (91%) 119780 108/144 (75%) Macro cell in low power mode Functional Block input used 130 248/432 (58%) 108 192/432 It is implied that low memory usage, macro cell usage and functional block usage leads to minimum hardware usage and power consumption. 4. Future Work The current analysis produces high accuracy for the fixed width output product which is of length 2n - bits i.e. n multiplicand and n multiplier produce 2n - bit output product. There is a further need to produce high accuracy for the fixed width of half, quarter, one by eighth and one by sixteenth of the product term bits. The above need is satisfied by means of comparator and a sorting network which uses minimum number of logic gates. (45%) 5. Conclusion

IJECSE,Volume1,Number 2 V. Keerthana and C. Arun Prasath In this paper, a high-accuracy fixed-width modified Booth multiplier has been proposed. In the proposed multiplier, the booth encoder has reduced the number of partial product array to half the value. The partial product generator has generated the partial product array bits. The compression tree has generated the final output product bits. The adder which is used in the implementation of multiplier is Carry Look ahead Adder. The compression tree along with the carry look ahead adder has reduced the hardware overhead and power consumption. 6. References [1] J.W Chen, R.H Yao, and W.J Wu, Efficient 2 n + 1 Modulo Multipliers, IEEE Transactions on Very Large Scale Integration (VLSI) systems, V. 19, NO. 12, 2011. [2] Yuan-Ho Chen, Chung-Yi Li, and Tsin-Yuan Chang, IEEE, Area-Effective and Power-Efficient Fixed-Width Booth Multipliers Using Generalized Probabilistic Estimation Bias, IEEE Journal on Emerging and Selected topics in Circuits and Systems, V. 1, NO. 3, 2011. [3] Valeria Garofalo, Nicola Petra and Ettore Napoli, Analytical Calculation of the Maximum Error for a Family of Truncated Multipliers Providing Minimum Mean Square Error, IEEE Transactions on Computers, V. 60, NO. 9, 2011. [4] R. Nakamoto, S. Sakuraba, T. Onomi, S. Sato, and K. Nakajima, 4-bit SFQ Multiplier Based on Booth Encoder, IEEE Transactions on Applied Superconductivity, V. 21, NO. 3, 2011. [5] N. Petra, et.al Design of Fixed-Width Multipliers With Linear Compensation Function, IEEE Transactions on Circuits and Systems, V. 58, NO. 5, 2011. [6] Ramya Muralidharan, and Chip-Hong Chang, Radix-8 Booth Encoded Modulo 2 n - 1 Multipliers With Adaptive Delay for High Dynamic Range Residue Number System, IEEE Transactions on Circuits and Systems, V. 58, NO. 5, 2011. [7] C.Y Li et.al,, A Probabilistic Estimation Bias Circuit for Fixed-Width Booth Multiplier and Its DCT Applications, IEEE Transactions on Circuits and Systems, V. 58, NO. 4, 2011. [8] F. Lamberti, et.al Reducing the Computation Time in (Short Bit-Width) Two s Complement Multipliers, IEEE Transactions on Computers, V. 60, NO. 2, 2011. [9] N. Petra, et.al Truncated Binary Multipliers With Variable Correction and Minimum Mean Square Error, IEEE Transactions on Circuits and Systems, V. 57, NO. 6, 2010. [10] Shiann-Rong Kuang, and Jiun-Ping Wang, Design of Power-Efficient Configurable Booth Multiplier, IEEE Transactions on Circuits and Systems, V. 57, NO. 3, 2010. [11] A. Vazquez, E. Antelo, and P. Montuschi, Improved Design of High-Performance Parallel Decimal Multipliers, IEEE Transactions on Computers, V. 59, NO. 5, 2010. [12] Young-Ho Seo, and Dong-Wook Kim, A New VLSI Architecture of Parallel Multiplier Accumulator Based on Radix-2 Modified Booth Algorithm, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, V. 18, NO. 2, 2010.