VHDL Code Generator for Optimized Carry-Save Reduction Strategy in Low Power Computer Arithmetic


DAVID NEUHÄUSER
Friedrich Schiller University, Department of Computer Science, Jena, GERMANY

EBERHARD ZEHENDNER
Friedrich Schiller University, Department of Computer Science, Jena, GERMANY

Abstract: Carry-save arithmetic is frequently used in multiplier design. When reducing an array of partial products by carry-save addition, it is not obvious which carry-save strategy yields the best results in terms of area, latency, and power consumption. In our contribution, we expose differences between the strategies of Wallace and Dadda, as well as our own carry-save method, called the short result strategy (SRR), when applied to various arithmetic operations. We provide a software tool for time-efficient analysis and rapid prototyping of carry-save arithmetic using strategies of this kind. We show results gained by employing our tool in terms of the expected area, latency, and power consumption of the resulting circuit, and outline the relevance for low power design.

Key Words: carry-save, tree multiplier, multiply-accumulate, merged arithmetic, Dadda, Wallace tree, low power, VHDL

1 Introduction

Carry-save arithmetic is often proposed for multiplier [18], multiply-accumulate [4], digital filter [12], and cryptography circuits [7]. Its advantage is fully parallel addition. Calculating a multiplication, a multiply-accumulate operation, or a digital filter computation can be seen as the task of reducing n partial products to only two. These final two partial products can be used as input for further carry-save operations, or they can be summed up by a carry-propagate adder (CPA) to obtain a number in the usual binary representation. The reduction of several partial products can be accomplished with different elements, in particular (3,2)-counters [12, 14, 13], (4,2)-counters [9], or (5,2)-counters [10].

In addition, several strategies exist for every set of elements. We focus on strategies for (3,2)-counters, using full adders and half adders as elements. Depending on the strategy, digital circuits composed of (3,2)-counters differ in the area required, the shape of the resulting partial products, and the power consumption of the whole circuit. In this paper, we investigate different strategies and point out the possible advantage of each. For flexibility, we implemented the strategies in C++, creating a tool which applies the required strategy to the given task. The tool reports statistics on the area cost and expected latency and creates VHDL code. The statistics on the expected area can be used as an approximation of the final circuit's power consumption. Synthesis of the generated source code gives more accurate information about the expected area, latency, and power consumption.

The following section provides a short review of carry-save arithmetic. In Section 3, we investigate different strategies and point out their application to regular structures like multipliers. Irregular structures, as needed by multiply-accumulate arithmetic, are discussed in Section 4. The VHDL code generator tool is introduced in Section 5. In Section 6, we present generator-based statistics as well as synthesis results. We conclude with Section 7 and show possible improvements and applications of our tool.

2 Carry-Save Arithmetic

Any arithmetic operation that requires adding up more than two partial products can be implemented by using carry-save arithmetic, for example plain multiplication (Equation 1), multiply-accumulate (Equation 2), or digital filter operations, as for instance an FIR filter (Equation 3) [3].

\[ C = A \cdot B \tag{1} \]

\[ C = \sum_{i=1}^{n} A(i) \cdot B(i) \tag{2} \]

\[ C(k) = \sum_{i=1}^{n} A(k-i+1) \cdot B(i) \tag{3} \]

To realize the required arithmetic operation, we reduce the set of partial products by taking two or three bits of the same weight and summing them up with a half adder or a full adder, respectively, depending on the applied strategy. All additions performed during one step operate on mutually disjoint sets of partial product bits. They are therefore independent of each other and can be conducted in parallel. The result of such a step is a new arrangement consisting of the sum bits and carry bits of the full adders and half adders, as well as some unreduced bits. This structure constitutes the task for the next step. After a certain number of steps, we obtain a result with no more than two unreduced bits per bit position, which can be seen as two remaining partial products.

To reduce p partial products, we need h steps, where h is the height of the resulting tree [19]. The maximum number p_max of partial products that can be reduced to two final partial products by h steps is [14]

\[ p_{\max}(h) = \left\lfloor p_{\max}(h-1) \cdot \frac{3}{2} \right\rfloor \ \text{for } h > 0, \qquad p_{\max}(0) = 2, \tag{4} \]

where \lfloor \cdot \rfloor is defined as

\[ b = \lfloor a \rfloor \ \text{for } a \in \mathbb{R},\ b \in \mathbb{N} \ \text{and}\ b \le a < b+1. \tag{5} \]

Table 1: Reducing 6 × 6 bit multiplier partial products (columns: HA, FA, s bits, c bits, area, CPA; rows: Wallace, Dadda, SRR).

3 Reduction Strategies

Different strategies exist to decide which bits should be reduced by a half adder and which by a full adder in each step; some partial product bits may also stay unreduced.

Figure 1: Wallace strategy for multipliers.

Figure 2: Dadda strategy for multipliers.

3.1 Wallace tree: In a Wallace tree [19], we reduce the partial products by using a tree structure consisting of carry-save adders. The strategy tends to compress partial product bits as much as possible during a single full adder delay. An example 6 bit × 6 bit unsigned multiplier is shown in Figure 1.
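The bound of Equation 4 fixes the number of steps for the example that follows. A minimal Python sketch of that recurrence is given here; the function names `p_max`, `tree_height`, and `multiplier_column_heights` are illustrative assumptions and are not taken from the authors' C++ tool.

```python
def p_max(h):
    """Equation 4: most partial products reducible to two within h steps."""
    p = 2                      # p_max(0) = 2
    for _ in range(h):
        p = p * 3 // 2         # floor(p * 3 / 2) in exact integer arithmetic
    return p


def tree_height(p):
    """Smallest number of steps h such that p_max(h) >= p."""
    h = 0
    while p_max(h) < p:
        h += 1
    return h


def multiplier_column_heights(n):
    """Bits per column (least significant first) of an unsigned n x n array."""
    return [min(i + 1, 2 * n - 1 - i) for i in range(2 * n - 1)]
```

Under these assumptions, `p_max` yields 2, 3, 4, 6, 9, 13 for h = 0 to 5, so the six partial products of a 6 bit × 6 bit multiplier need `tree_height(6) == 3` steps, and the tallest column of that array holds six bits.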
In this example, we get the task of reducing 6 partial products. The least significant bit position is always the rightmost position. The applied steps are shown in Figure 1.

3.2 Dadda strategy: The Dadda strategy [5] looks for the bit position with the most bits and calculates the smallest level L_min that can be reached at this position within one step. All necessary steps are shown in Figure 2. Without changing the weight of any bit, the array structure can be rearranged as a pyramid. The least significant bit position is always the rightmost position. Beginning with the least significant bit position, half adders and full adders are placed to reach L_min, but no effort is made to reduce any bits in advance. Note that the Dadda and Wallace strategies yield exactly the same number of steps.

3.3 SRR: We propose a new strategy, called the short result strategy (SRR). The goal of this strategy is to produce two final partial products of small width, but with a hardware effort comparable to that of the Dadda strategy. As in the other strategies, the maximum number of bits in a column is used to calculate the smallest level reachable within one step. The algorithm reduces a section of lower significance following the Wallace strategy. Bits of higher significance are only reduced if this is unavoidable to achieve the minimum number of steps, as in the Dadda strategy. The steps are shown in Figure 3.

Figure 3: SRR strategy for multipliers.

The strategies differ in the number of half adders and full adders used, and in the shape of the resulting partial products. This can be seen in Table 1 for the example task of a 6 bit × 6 bit multiplication. HA and FA give the numbers of half adders and full adders used; s bits and c bits are the numbers of bits of the two resulting partial products; area is the half adder equivalent of the needed area, with one full adder equaling two half adders. CPA is the number of common bits in both resulting partial products, which the final CPA has to add up to obtain a binary number. We will not consider the Wallace strategy any further in this paper, since it appears to be clearly weaker than both the Dadda and the SRR strategy.

4 Irregular Structures

For reasons of efficiency, Equations 2 and 3 can be implemented as instances of merged arithmetic [16], see also [14]. However, this approach leads to far more complex structures to be reduced. As an example, we assume A and B to be 4 bits wide and C, respectively C(k), to be carry-save and 8 bits wide, all in two's complement. This yields the task of reducing 6 partial products. Figure 4 shows these partial products when using the Dadda strategy. A Baugh-Wooley multiplication array is used for two's complement multiplication (proposed in [2], reviewed in [17]). Bits in the leftmost column are most significant and signed; they are removed from the original partial products for separate overflow detection and sign correction. This leaves the partial products in Figure 4 to be reduced. Figure 5 shows the steps.

Figure 4: MAC-unit partial products using Dadda.

Figure 5: MAC-unit steps using Dadda.

When applying SRR, we obtain a different structure of partial products, as shown in Figure 6. Again, bits in the leftmost column are most significant and signed; they are removed from the original partial products for separate overflow detection and sign correction. This leaves the partial products in Figure 6 to be reduced. Figure 7 shows the corresponding steps.

Figure 6: MAC-unit partial products using SRR.

Figure 7: MAC-unit steps using SRR.

Table 2: Dadda and SRR in MAC-units (columns: HA, FA, s bits, c bits, area, latency, CPA; rows: Dadda, SRR).

As before, the strategies differ in the shape of the resulting partial products. This can be seen in Table 2 for the example task of reducing a multiply-accumulate partial product array. Notice, however, that the numbers of half adders and full adders, and thus the total area, agree in this example, in contrast to the results from Section 3. Again, HA and FA give the numbers of half adders and full adders used; s bits and c bits are the numbers of bits of the two resulting partial products; area is the half adder equivalent of the needed area. Latency is the sum of the tree height and the CPA, the latter being the number of common bits in both resulting partial products, which the final CPA has to add up to obtain a binary number.

The multiplication and multiply-accumulate examples show only a small part of the variety of possible tasks. The effects of SRR on multiply-accumulate units are shown in detail in [11]. Whether a multiply, multiply-accumulate, or digital filter operation is implemented affects the shape of the structure to be reduced. The choice of unsigned, one's complement, two's complement, or sign-magnitude representation influences the shape, too. Using carry-save or non-carry-save accumulation has an additional effect. The same holds for the bit width of all operands as well as for the different strategies. To make reasonable design decisions, one would have to consider all possible designs, describe them in VHDL, and synthesize them manually. This is a very time-consuming task, so automation is needed.

5 VHDL Code Generator

For flexibility, we implemented these algorithms in C++, creating a generator which produces VHDL source code. This source code describes a partial product tree using one of the discussed reduction strategies. Figure 8 shows how the generator is incorporated into the design flow.

Figure 8: VHDL code generator in design flow (blocks: design flow control, strategy, task, VHDL generator, tree statistics, tree VHDL, VHDL framework excluding tree, synthesis, synthesis statistics, synthesized design).

The design flow control defines a task and the strategy to be used. It also provides the VHDL framework, that is, the VHDL source code needed for synthesis that defines all of the arithmetic circuit except the tree. The generator reduces the given bit structure using the required strategy, and creates VHDL source code of the tree as well as time and area statistics.
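The bookkeeping behind such pre-synthesis statistics can be illustrated with a short Python sketch. It is not the authors' generator: it works on column heights only, emits no VHDL, and assumes the textbook Dadda rule (per step, reduce each column just far enough to reach the next level of Equation 4). The names `dadda_targets` and `dadda_reduce` are hypothetical. Area and latency follow the definitions given in Sections 3 and 4.

```python
def dadda_targets(max_height):
    """Levels 2, 3, 4, 6, 9, ... of Equation 4 that lie below max_height."""
    targets = [2]
    while targets[-1] * 3 // 2 < max_height:
        targets.append(targets[-1] * 3 // 2)
    return targets


def dadda_reduce(heights):
    """Reduce column heights (least significant first) to at most two bits."""
    heights = list(heights)
    fa = ha = steps = 0
    if max(heights) > 2:
        for target in reversed(dadda_targets(max(heights))):
            new_heights, carry, i = [], 0, 0
            while i < len(heights) or carry:
                # Bits in this column plus carries produced one column lower.
                cur = (heights[i] if i < len(heights) else 0) + carry
                carry = 0
                while cur > target:
                    if cur == target + 1:
                        ha += 1        # half adder: column shrinks by one bit
                        cur -= 1
                    else:
                        fa += 1        # full adder: column shrinks by two bits
                        cur -= 2
                    carry += 1         # every adder sends one carry upward
                new_heights.append(cur)
                i += 1
            heights = new_heights
            steps += 1
    cpa = sum(1 for h in heights if h == 2)   # positions the final CPA must add
    return {"heights": heights, "fa": fa, "ha": ha, "steps": steps,
            "area": ha + 2 * fa,              # one full adder = two half adders
            "cpa": cpa, "latency": steps + cpa}
```

For the column heights 1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1 of an unsigned 6 bit × 6 bit multiplier, this sketch counts 15 full adders and 5 half adders in 3 steps. These figures come from the sketch alone and are not a reproduction of the paper's tables.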
Both VHDL source codes are synthesized using Synopsys Design Compiler. As a result, we obtain the synthesized design and post-synthesis statistics about area, latency, and power consumption of the design. The advantages of this approach are twofold. On the one hand, we get statistics on the expected circuit complexity and performance before doing time-consuming synthesis. On the other hand, we can easily perform rapid prototyping to compare the effect of different approaches and different tasks through synthesis of the VHDL source code. This enables us to compare different strategies and different bit widths in less time than designing the different arrays by hand.

Table 3: VHDL-generator statistics for multipliers (columns: bits; area for Dadda and SRR; latency for Dadda and SRR; area × latency as percentage).

6 Results

For a multiplier implementation, a shorter final CPA can be used when applying the SRR strategy. This final CPA has to add up fewer bits and is therefore faster and smaller. Thus, one could assume that the SRR strategy requires less area and latency than the Dadda strategy. The generator statistics for a multiplier, shown in Table 3 and assuming a ripple-carry adder (RCA) for low power results, seem to support this assumption. The latency advantage of the SRR-based multiplier could be traded for an area advantage by synthesizing both designs with equal latency constraints. The resulting smaller SRR-based multiplier would consume significantly less power than the Dadda-based multiplier.

This assumption neglects the different latencies needed to calculate each bit of the resulting partial products, and may thus lead to misleading conclusions, see for instance [6, 13, 18]. It has been shown in [18] that the Wallace strategy, although leading to a smaller final CPA, is worse than the Dadda strategy in terms of area and latency when using full adders with input-dependent latencies. Similar arguments might hold for the SRR strategy.

As an example, we generated VHDL source code for signed magnitude multipliers (SM-multipliers) and synthesized it with Synopsys Design Compiler and a UMC 180 nm CMOS library. Synthesis results (in percent) for the SRR strategy are shown in Table 4, normalized to the results for the Dadda strategy. To create a low power design, we chose an RCA as the final CPA and minimized the area of the whole design throughout synthesis. The results are as predicted in [18]: the SRR strategy, although using a smaller CPA, performs worse than the Dadda strategy when designing a signed magnitude multiplier.

Table 4: SM-multipliers synthesis results using SRR (columns: bits, area, latency, area × latency, power).

For a multiply-accumulate unit, we cannot rely on the same prediction. We now have the choice to either sum up the two resulting partial products and latch a smaller binary number, or latch both resulting partial products before adding them up. By moving the final adder out of the critical path, as in the second approach, we obtain a significantly lower cycle time. Since the purpose of a multiply-accumulate unit is to add a sequence of products, this approach seems to be more efficient. Internally, we latch both resulting partial products back to the partial product tree. Therefore, we have to expand the partial product array by two additional lines. When both partial products are latched, all latched bits have the same arrival time, determined by the worst bit latency. The Dadda strategy loses its latency advantage and is left with the penalty of a larger and slower final CPA. Furthermore, the fewer result bits produced by the SRR strategy require fewer latches than with the Dadda strategy.
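The second approach, keeping the accumulator in carry-save form and adding only once at the end, can be modelled behaviourally in a few lines of Python. This is a simplified sketch under stated assumptions: operands are non-negative integers, each product enters as a single addend through one word-level row of (3,2)-counters, whereas the merged design described above feeds the individual partial product rows into the tree. The names `csa`, `mac_carry_save`, and `recode` are illustrative.

```python
def csa(x, y, z):
    """Word-level row of (3,2)-counters: sum + carry equals x + y + z."""
    s = x ^ y ^ z                              # sum bit of each position
    c = ((x & y) | (x & z) | (y & z)) << 1     # carry bits, one weight higher
    return s, c


def mac_carry_save(pairs):
    """Accumulate products while latching both partial results each cycle."""
    s = c = 0
    for a, b in pairs:
        s, c = csa(s, c, a * b)                # no carry propagation here
    return s, c


def recode(s, c):
    """Final carry-propagate addition, done once after the last cycle."""
    return s + c
```

With the operand pairs (3, 5), (7, 2), and (6, 6), `recode(*mac_carry_save(...))` returns 65, the same value as the plain sum of products, while no cycle before the last one contains a carry-propagate addition.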
The generator statistics for two's complement multiply-accumulate units (TC-MAC-units) are shown in Table 5, assuming an RCA for low power performance. Again, the latency advantage of the SRR-based design could be traded for an area advantage by synthesizing both designs with equal latency constraints. The resulting smaller SRR-based design would consume significantly less power than the Dadda-based design.

Table 5: TC-MAC-units statistics using SRR (column groups "one multiply cycle" and "one total operation", each with area, latency, and A · L; rows by bits).

Table 6: TCA-MAC-units synthesis using SRR (column groups "one MAC cycle" and "final MAC cycle + recoding", each with A, L, and power; rows by bits).

Synthesis results (in percent) for two's complement multiply-accumulate units using the SRR strategy are shown in Table 6, normalized to the results for the Dadda strategy. To create a low power design, we chose an RCA as the final CPA and minimized the area of the whole design throughout synthesis. The computation in Equation 2 corresponds to n − 1 multiply-accumulate (MAC) cycles yielding a result in carry-save representation, and one final MAC cycle that is followed by a recoding phase adding up the two remaining partial products. Table 6 shows the results for one of the first n − 1 cycles (left) and for the final MAC cycle including the recoding (right). Comparing these two different designs, we gain an advantage by applying the SRR strategy. Therefore, when designing a multiply-accumulate unit with a redundant accumulate part, the SRR strategy is more efficient than the Dadda strategy, contrary to the multiplier case.

7 Conclusion and Future Work

The generator design tool allows us to freely define tasks of summing up partial products, and delivers expected area and latency statistics before synthesis, as well as VHDL source code. Many different design choices for multipliers, multiply-accumulate units, digital filters, and other complex computer arithmetic circuits can be prototyped rapidly using the generator.
We have shown that there is no overall optimal strategy. The suitability of a strategy depends on the required arithmetic operation, so the strategies need to be compared anew for every arithmetic operation. Applying this design flow to other complex computer arithmetic structures is our primary goal. In particular, investigating digital filters and deciding on an optimal strategy will be easier using the generator tool. Including other carry-save elements, such as (4,2)-counters and (5,2)-counters, as well as other redundant representations like signed binary [1], would widen the space of comparable designs. Incorporating the option of pipelining the tree by inserting latches is another possible improvement of the generator tool that would allow a wider variety of designs to be compared, as done in [15].

References:

[1] Avizienis, A.: Signed-digit number representations for fast parallel arithmetic. In: IRE Transactions on Electronic Computers, vol. 10 (1961)
[2] Baugh, C.R., Wooley, B.A.: A two's complement parallel array multiplier algorithm. In: IEEE Transactions on Computers, vol. 22 (1973)
[3] Bellanger, M.: Digital processing of signals. Theory and Practice. 3rd edition. Wiley
[4] Chen, J., Xu, R., Fu, Y.: Architecture Design of a High-Performance 32-Bit Fixed-Point DSP. In: ACSAC, LNCS, vol. 3189. Springer, Heidelberg (2004)
[5] Dadda, L.: Some schemes for parallel multipliers. In: Alta Frequenza, vol. 34 (1965)
[6] Flynn, M.J., Oberman, S.F.: Advanced Computer Arithmetic Design. Wiley, New York
[7] Huang, M., Gaj, K., Kwon, S., El-Ghazawi, T.: An Optimized Hardware Architecture for the Montgomery Multiplication Algorithm. In: PKC 2008, LNCS, vol. 4939. International Association for Cryptologic Research (2008)
[8] Johansson, K., Gustafsson, O., Wanhammar, L.: Power Estimation for Ripple-Carry Adders with Correlated Input Data. In: PATMOS, LNCS, vol. 3254. Springer, Heidelberg (2005)
[9] Kornerup, P.: Reviewing 4-to-2 Adders for Multi-Operand Addition. In: Journal of VLSI Signal Processing, vol. 40. Kluwer Academic Publishers (2005)
[10] Kwon, O., Nowka, K., Swartzlander, E.E. Jr.: A 16-Bit by 16-Bit MAC Design Using Fast 5:3 Compressor Cells. In: Journal of VLSI Signal Processing, vol. 31. Kluwer Academic Publishers (2002)
[11] Neuhäuser, D., Zehendner, E.: On Carry-Save Strategies for Multiply-Accumulate Arithmetic. European Conference of Computer Science (ECCS '11), Puerto de la Cruz, Spain (2011)
[12] Noll, T.G.: Carry-Save Architectures for High-Speed Digital Signal Processing. In: Journal of VLSI Signal Processing, vol. 3. Kluwer Academic Publishers, Boston (1991)
[13] Oklobdzija, V.G., Villeger, D., Liu, S.S.: A method for speed optimized partial product reduction and generation of fast parallel multipliers using an algorithmic approach. In: IEEE Transactions on Computers, vol. 45, no. 3 (1996)
[14] Parhami, B.: Computer arithmetic: algorithms and hardware designs. Oxford University Press, New York, Oxford
[15] Schuster, Ch., Nagel, J.L., Piguet, Ch., Farine, P.A.: Leakage Reduction at the Architectural Level and Its Application to 16 Bit Multiplier Architectures. In: E. Macii et al. (Eds.): PATMOS 2004, LNCS, vol. 3254. Springer, Heidelberg (2004)
[16] Swartzlander, E.E. Jr.: Merged Arithmetic. In: IEEE Transactions on Computers, vol. 29, no. 10 (1980)
[17] Swartzlander, E.E. Jr.: The Negative Two's Complement Number System. In: Journal of VLSI Signal Processing, vol. 49. Springer Science + Business Media, LLC (2007)
[18] Townsend, W.J., Swartzlander, E.E. Jr., Abraham, J.A.: A comparison of Dadda and Wallace multiplier delays. In: Advanced signal processing algorithms, architectures, and implementations. Conference No 13, vol. 5205. San Diego CA, USA (2003)
[19] Wallace, C.S.: A suggestion for a fast multiplier. In: IEEE Transactions on Computers, vol. 13 (1964)


More information
