Chapter 4. Arithmetic for Computers

Slide 2: Arithmetic
Where we've been: performance (seconds, cycles, instructions).
What's up ahead: implementing the architecture.
[Figure: an ALU with two 32-bit inputs, a and b, an operation control input, and a 32-bit result.]

Slide 3: Constructing an ALU (Arithmetic Logic Unit)
An ALU is a device that performs arithmetic operations such as addition and subtraction, and logical operations such as AND and OR. An ALU can be constructed from four hardware building blocks: the AND gate, the OR gate, the inverter, and the multiplexor. (What is the truth table of each?)
The multiplexor selects one of its inputs to be the output, based on a control input S: if S == 0 then C = A, else C = B.
[Figure: multiplexor with select input S, data inputs A and B, and output C.]

Slide 4: ALU (Cont.)
Let's build an ALU to support the andi and ori instructions. We'll build a 1-bit ALU and use 32 of them, because a MIPS word is 32 bits wide.
[Figure: 1-bit ALU with inputs a, b, and operation, and output result, shown next to a truth table with columns op, a, b, res.]

Slide 5: The 1-bit Logical Unit for AND and OR
[Figure: inputs a and b feed an AND gate and an OR gate; the Operation input selects which gate output becomes Result.]

Slide 6: Different Implementations
It is not easy to decide the best way to build something.
We don't want too many inputs to a single gate.
We don't want to have to go through too many gates.
For our purposes, ease of comprehension is important.
Let's look at a 1-bit ALU for addition.
[Figure: 1-bit adder with inputs a, b, and CarryIn, and outputs Sum and CarryOut.]
A full adder, or (3,2) adder, has three inputs (a, b, and CarryIn) and two outputs (Sum and CarryOut). A half adder, or (2,2) adder, has only two inputs (a and b) and two outputs (Sum and CarryOut).
How could we build a 1-bit ALU for add, AND, and OR? How could we build a 32-bit ALU?

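Slide 7: Input and Output Specification for a 1-bit Adder

    Inputs            Outputs
    a  b  CarryIn     CarryOut  Sum
    0  0  0           0         0
    0  0  1           0         1
    0  1  0           0         1
    0  1  1           1         0
    1  0  0           0         1
    1  0  1           1         0
    1  1  0           1         0
    1  1  1           1         1

Slide 8: Values of the Inputs when CarryOut is 1

    a  b  CarryIn
    0  1  1
    1  0  1
    1  1  0
    1  1  1

CarryOut = (b . CarryIn) + (a . CarryIn) + (a . b) + (a . b . CarryIn)
If a . b . CarryIn is true, then all of the other terms must also be true, so we can leave out the last term:
c_out = a b + a c_in + b c_in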

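Slide 9: Adder Hardware for the CarryOut Signal
[Figure: three 2-input AND gates on the pairs of a, b, and CarryIn, feeding one OR gate that produces CarryOut.]
The hardware above was constructed from the following equation:
CarryOut = (b . CarryIn) + (a . CarryIn) + (a . b)

Slide 10: The Sum Bit
The Sum bit is set to 1 when exactly one input is 1 or when all three inputs are 1 (check the truth table on slide 7). Writing x' for the complement of x:
Sum = (a . b' . CarryIn') + (a' . b . CarryIn') + (a' . b' . CarryIn) + (a . b . CarryIn)
Sum = a XOR b XOR c_in
Drawing the logic circuit is left as an exercise.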
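The CarryOut and Sum equations above can be checked against ordinary integer addition. The following is a minimal Python sketch; the function name `full_adder` is an illustrative choice, not something taken from the slides.

```python
def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Return (sum, carry_out) for three 1-bit inputs."""
    # CarryOut = (b . CarryIn) + (a . CarryIn) + (a . b)
    carry_out = (b & carry_in) | (a & carry_in) | (a & b)
    # Sum = a XOR b XOR CarryIn
    total = a ^ b ^ carry_in
    return total, carry_out


# Exhaustive check: the two output bits together encode a + b + carry_in.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            s, cout = full_adder(a, b, c)
            assert 2 * cout + s == a + b + c
```

The loop walks all eight rows of the adder truth table, so a mistake in either equation would trip the assertion.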

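Slide 11: A 1-bit ALU that Performs AND, OR, and Addition
[Figure: 1-bit ALU with inputs a, b, and CarryIn. A multiplexor controlled by Operation selects the AND output (0), the OR output (1), or the adder Sum (2) as Result; the adder also produces CarryOut.]

Slide 12: A 32-bit ALU Constructed from 32 1-bit ALUs
Ripple carry: the CarryOut of the less significant bit is connected to the CarryIn of the more significant bit.
[Figure: ALU0, ALU1, ALU2, ..., ALU31 chained through CarryIn and CarryOut, with inputs a0/b0 through a31/b31, a shared Operation line, and outputs Result0 through Result31.]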
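The ripple-carry connection can be sketched in Python by chaining 1-bit full adders, least significant bit first. The names `full_adder` and `ripple_carry_add`, and the `width` parameter, are assumptions made for this illustration.

```python
def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """1-bit full adder: returns (sum, carry_out)."""
    return a ^ b ^ carry_in, (a & b) | (a & carry_in) | (b & carry_in)


def ripple_carry_add(a: int, b: int, width: int = 32, carry_in: int = 0) -> tuple[int, int]:
    """Add two width-bit patterns one bit at a time; returns (result, carry_out)."""
    result = 0
    carry = carry_in
    for i in range(width):
        # The CarryOut of bit i becomes the CarryIn of bit i + 1.
        bit_sum, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= bit_sum << i
    return result, carry
```

Each loop iteration depends on the carry from the previous one, which is exactly the serial dependency that makes the hardware version slow.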

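Slide 13: What about Subtraction (a - b)?
Two's complement approach: just negate b and add. How do we negate? A very clever solution: invert every bit of b (the Binvert control line) and set the CarryIn of the least significant bit to 1, so that the adder computes a + b' + 1 = a - b.
[Figure: 1-bit ALU with a Binvert multiplexor on input b, inputs a and CarryIn, a 3-way Operation multiplexor (0, 1, 2), and outputs Result and CarryOut.]

Slide 14: Tailoring the ALU to the MIPS
We need to support the set-on-less-than instruction (slt). Remember: slt is an arithmetic instruction. It produces a 1 if rs < rt and 0 otherwise. Use subtraction: (a - b) < 0 implies a < b.
We need to support the test for equality (beq $t5, $t6, label). Use subtraction: (a - b) = 0 implies a = b.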
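As a rough illustration of the points above, the Python sketch below performs subtraction as a + (NOT b) + 1 on 32-bit patterns and derives slt and equality from the difference. The helper names are hypothetical, and the slt shown here uses only the sign bit of the difference, so it inherits the limitation that it is wrong when a - b overflows.

```python
MASK32 = 0xFFFFFFFF


def subtract(a: int, b: int) -> int:
    """Compute a - b on 32-bit patterns as a + (NOT b) + 1."""
    inverted_b = ~b & MASK32               # Binvert = 1
    return (a + inverted_b + 1) & MASK32   # CarryIn of bit 0 = 1


def set_on_less_than(a: int, b: int) -> int:
    """Return 1 if a < b, judged by the sign bit of a - b (ignores overflow)."""
    return (subtract(a, b) >> 31) & 1


def is_equal(a: int, b: int) -> bool:
    """a equals b exactly when a - b is zero."""
    return subtract(a, b) == 0
```

For example, `subtract(5, 7)` yields the pattern 0xFFFFFFFE, which is -2 in two's complement, so `set_on_less_than(5, 7)` is 1.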

Slide 15: Detecting Overflow
No overflow when adding a positive and a negative number.
No overflow when the signs are the same for subtraction.
Overflow occurs when the value affects the sign:
    adding two positives yields a negative, or
    adding two negatives gives a positive, or
    subtracting a negative from a positive gives a negative, or
    subtracting a positive from a negative gives a positive.

Slide 16: Supporting slt (A 1-bit ALU that performs AND, OR, ...)
In an slt operation, the least significant bit is set either to 1 or 0, and the rest of the bits (31 bits) are set to 0. In the sign bit of a - b, 1 means negative and 0 means positive. So we need only connect the sign bit from the adder output to the least significant bit to get set on less than. The Result output from the most significant ALU bit for the slt operation is not the output of the adder, so the ALU output for slt is the input value Less.
[Figure a: 1-bit ALU with inputs a, b, Less, and CarryIn, control lines Binvert and Operation, a 4-way multiplexor (0 to 3), and outputs Result and CarryOut.]
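The overflow conditions listed above reduce to sign-bit comparisons. The sketch below is one way to express them in Python; the function names and the `width` parameter are assumptions for illustration, and operands are passed as unsigned bit patterns.

```python
def add_overflows(a: int, b: int, width: int = 32) -> bool:
    """True if a + b overflows as width-bit two's complement numbers."""
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    result = (a + b) & mask
    # Overflow needs operands with the same sign and a result with the other sign.
    return bool(~(a ^ b) & (a ^ result) & sign)


def sub_overflows(a: int, b: int, width: int = 32) -> bool:
    """True if a - b overflows as width-bit two's complement numbers."""
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    result = (a - b) & mask
    # Overflow needs operands with different signs and a result whose sign differs from a.
    return bool((a ^ b) & (a ^ result) & sign)
```

Adding the largest positive value and 1 overflows, while adding a positive and a negative value never does, which matches the first rule in the list.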

Slide 17: Supporting slt (A 1-bit ALU for the most significant bit)
We need a new 1-bit ALU for the most significant bit that has an extra output bit: the adder output. The figure shows the design, with this new adder output line called Set, which is used only for slt. Since we need a special ALU for the most significant bit anyway, we add the overflow detection logic to it, because overflow is also associated with that bit.
[Figure b: 1-bit ALU for the most significant bit with inputs a, b, Less, and CarryIn, control lines Binvert and Operation, an overflow detection block, and outputs Result, Set, and Overflow.]

Slide 18: A 32-bit ALU Constructed from 31 Copies of the 1-bit ALU and One ALU for the Most Significant Bit
[Figure: ALU0, ALU1, ALU2, ..., ALU31 with inputs a0/b0 through a31/b31, shared Binvert and Operation lines, a Less input on each ALU, CarryOut chained to the next CarryIn, and outputs Result0 through Result31. ALU31 also produces Set and Overflow.]

Slide 19: Test for Equality [(a - b = 0) implies a = b]
The simplest way is to OR all the outputs together and then send that signal through an inverter, as shown in the figure.
Note: Zero is 1 when the result is zero!
When the ALU does subtraction, set both CarryIn and Binvert to 1. For adds or logical operations, we want both control lines to be 0. So, for simplicity, we combine CarryIn and Binvert into a single control line called Bnegate.
[Figure: the 32-bit ALU of slide 18 with Bnegate and Operation control lines; Result0 through Result31 feed an OR gate followed by an inverter to produce Zero; ALU31 produces Set and Overflow.]

Slide 20: Control Lines
We can think of the combination of the 1-bit Bnegate line and the 2-bit Operation lines as 3-bit control lines for the ALU, telling it to perform add, subtract, AND, OR, or set on less than.
ALU control lines:
    000 = and
    001 = or
    010 = add
    110 = subtract
    111 = slt
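The 3-bit control encoding can be exercised with a small behavioral model. This is a sketch under stated assumptions: the function `alu` and its return shape are invented for illustration, the model works on whole 32-bit words rather than on 32 chained 1-bit slices, and slt uses the adder's sign bit without correcting for overflow.

```python
MASK32 = 0xFFFFFFFF


def alu(control: int, a: int, b: int) -> tuple[int, int]:
    """Behavioral 32-bit ALU; returns (result, zero)."""
    bnegate = (control >> 2) & 1      # high control bit: Binvert and CarryIn together
    operation = control & 0b11        # low two bits: multiplexor select
    b_in = (~b & MASK32) if bnegate else (b & MASK32)
    adder = ((a & MASK32) + b_in + bnegate) & MASK32
    if operation == 0:
        result = a & b_in             # AND
    elif operation == 1:
        result = a | b_in             # OR
    elif operation == 2:
        result = adder                # add, or subtract when bnegate is 1
    else:
        result = (adder >> 31) & 1    # slt: sign of a - b routed to bit 0
    zero = int(result == 0)           # Zero is 1 when the result is zero
    return result, zero
```

With this model, `alu(0b110, 7, 7)` returns `(0, 1)`, which is the condition a beq would test.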

Slide 21: The Symbol Commonly Used to Represent an ALU
[Figure: ALU symbol with inputs a, b, and ALU operation, and outputs Zero, Result, Overflow, and CarryOut.]

Slide 22: Conclusion
We can build an ALU to support the MIPS instruction set.
    Key idea: use a multiplexor to select the output we want.
    We can efficiently perform subtraction using two's complement.
    We can replicate a 1-bit ALU to produce a 32-bit ALU.
Important points about hardware:
    All of the gates are always working.
    The speed of a gate is affected by the number of inputs to the gate.
    The speed of a circuit is affected by the number of gates in series (on the "critical path" or the "deepest level of logic").
Our primary focus: clever changes to organization can improve performance (similar to using better algorithms in software). We'll look at two examples, for addition and multiplication.

Slide 23: Problem: Ripple Carry Adder is Slow
Is a 32-bit ALU as fast as a 1-bit ALU? Is there more than one way to do addition? There are two extremes: ripple carry and sum-of-products.
Can you see the ripple? How could you get rid of it?
    c1 = b0 c0 + a0 c0 + a0 b0
    c2 = b1 c1 + a1 c1 + a1 b1        c2 = ?
    c3 = b2 c2 + a2 c2 + a2 b2        c3 = ?
    c4 = b3 c3 + a3 c3 + a3 b3        c4 = ?
Not feasible! Why?

Slide 24: Carry Lookahead Adder (First Level of Abstraction)
An approach in between our two extremes:
    c(i+1) = (bi . ci) + (ai . ci) + (ai . bi) = (ai . bi) + (ai + bi) . ci
Example: c2 = (a1 . b1) + (a1 + b1) . ((a0 . b0) + (a0 + b0) . c0)
Motivation: if we didn't know the value of the carry-in, what could we do?
    When would we always generate a carry? gi = ai . bi
    When would we propagate the carry? pi = ai + bi
Using generate and propagate to define c(i+1), we get:
    c(i+1) = gi + pi . ci
Suppose gi equals 1. Then c(i+1) = gi + pi . ci = 1 + pi . ci = 1. That is, the adder generates a CarryOut (c(i+1)) independent of the value of CarryIn (ci).

Slide 25: Carry Lookahead Adder (Continued)
Suppose that gi is 0 and pi is 1. Then c(i+1) = gi + pi . ci = 0 + 1 . ci = ci. That is, the adder propagates CarryIn to a CarryOut.
So CarryIn(i+1) is 1 if either gi is 1, or both pi is 1 and CarryIn(i) is 1.
Example: we express the signals more economically. Let's show it for 4 bits:
    c1 = g0 + (p0 . c0)
    c2 = g1 + (p1 . g0) + (p1 . p0 . c0)
    c3 = g2 + (p2 . g1) + (p2 . p1 . g0) + (p2 . p1 . p0 . c0)
    c4 = g3 + (p3 . g2) + (p3 . p2 . g1) + (p3 . p2 . p1 . g0) + (p3 . p2 . p1 . p0 . c0)
Even this simplified form leads to large equations. For example, consider a 16-bit adder.

Slide 26: Carry Lookahead Adder (Second Level of Abstraction)
To perform carry lookahead across 4-bit adders, we need propagate and generate signals at a higher level.
Example: for four 4-bit adder blocks:
    P0 = p3 . p2 . p1 . p0
    P1 = p7 . p6 . p5 . p4
    P2 = p11 . p10 . p9 . p8
    P3 = p15 . p14 . p13 . p12
That is, the "super" propagate signal for the 4-bit abstraction (Pi) is true only if each of the bits in the group will propagate a carry.
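The first-level equations can be evaluated directly, so that every carry depends only on the g and p signals and on c0. The Python sketch below builds each carry from its expanded sum-of-products form; the function name and list layout are illustrative assumptions.

```python
def lookahead_carries(a: int, b: int, c0: int = 0, width: int = 4) -> list[int]:
    """Return [c0, c1, ..., c_width] computed from generate/propagate signals."""
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]      # gi = ai . bi
    p = [((a >> i) | (b >> i)) & 1 for i in range(width)]    # pi = ai + bi
    carries = [c0]
    for i in range(width):
        # c(i+1) = gi + pi.g(i-1) + pi.p(i-1).g(i-2) + ... + pi...p0.c0
        term = g[i]
        prefix = p[i]
        for j in range(i - 1, -1, -1):
            term |= prefix & g[j]
            prefix &= p[j]
        term |= prefix & c0
        carries.append(term)
    return carries
```

No carry in this function is computed from another carry, unlike the ripple version; the cost is that the expressions grow with the bit position.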

Slide 27: Carry Lookahead Adder (Continued)
For the "super" generate signal (Gi), we care only if there is a carry out of the most significant bit of the 4-bit group. This occurs if generate is true for that most significant bit. It also occurs if an earlier generate is true and all the intermediate propagates, including that of the most significant bit, are also true:
    G0 = g3 + (p3 . g2) + (p3 . p2 . g1) + (p3 . p2 . p1 . g0)
    G1 = g7 + (p7 . g6) + (p7 . p6 . g5) + (p7 . p6 . p5 . g4)
    G2 = g11 + (p11 . g10) + (p11 . p10 . g9) + (p11 . p10 . p9 . g8)
    G3 = g15 + (p15 . g14) + (p15 . p14 . g13) + (p15 . p14 . p13 . g12)

Slide 28: Carry Lookahead Adder (Continued)
The carry-ins for each 4-bit group of the 16-bit adder are:
    C1 = G0 + (P0 . c0)
    C2 = G1 + (P1 . G0) + (P1 . P0 . c0)
    C3 = G2 + (P2 . G1) + (P2 . P1 . G0) + (P2 . P1 . P0 . c0)
    C4 = G3 + (P3 . G2) + (P3 . P2 . G1) + (P3 . P2 . P1 . G0) + (P3 . P2 . P1 . P0 . c0)
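The second level of abstraction can be sketched the same way: compute P and G for each 4-bit group, then form the group carries from them. The function names below are hypothetical, and the operands used in the usage note are arbitrary values chosen for illustration.

```python
def group_signals(a: int, b: int, group: int) -> tuple[int, int]:
    """Return (P, G) for one 4-bit group; group 0 covers bits 3..0."""
    base = 4 * group
    g = [(a >> (base + i)) & (b >> (base + i)) & 1 for i in range(4)]
    p = [((a >> (base + i)) | (b >> (base + i))) & 1 for i in range(4)]
    big_p = p[3] & p[2] & p[1] & p[0]
    big_g = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    return big_p, big_g


def group_carries(a: int, b: int, c0: int = 0) -> list[int]:
    """Return [C1, C2, C3, C4] for a 16-bit add split into four 4-bit groups."""
    P, G = zip(*(group_signals(a, b, k) for k in range(4)))
    c1 = G[0] | (P[0] & c0)
    c2 = G[1] | (P[1] & G[0]) | (P[1] & P[0] & c0)
    c3 = G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0]) | (P[2] & P[1] & P[0] & c0)
    c4 = (G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1])
          | (P[3] & P[2] & P[1] & G[0]) | (P[3] & P[2] & P[1] & P[0] & c0))
    return [c1, c2, c3, c4]
```

For the sample operands 0x1A33 and 0xE5EB, `group_carries` returns `[0, 1, 1, 1]`, and the final entry agrees with the carry out of the ordinary 16-bit sum.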

Slide 29: Four 4-bit ALUs Using Carry Lookahead to Form a 16-bit Adder
[Figure: four 4-bit ALUs (ALU0 to ALU3) handling bits 0-3, 4-7, 8-11, and 12-15 and producing Result0-3, Result4-7, Result8-11, and Result12-15. Each ALU sends its P and G signals to a carry-lookahead unit, which supplies C1, C2, and C3 as the CarryIn of the next ALU and C4 as CarryOut.]

Slide 30: Example: Both Levels of the Propagate and Generate
Determine the gi, pi, Pi, and Gi values of these two 16-bit numbers:
    a: [bit pattern missing] two
    b: [bit pattern missing] two
Also, what is CarryOut15 (C4)?
Answer: Aligning the bits makes it easy to see the values of generate gi (ai . bi) and propagate pi (ai + bi). The slide lists four rows (a, b, gi, pi), where the bits are numbered 15 to 0 from left to right. [The bit values of these rows are missing.]

Slide 31: Answer (Continued)
Next, the "super" propagates (P3, P2, P1, P0) are simply the AND of the lower-level propagates:
    P3 = p15 . p14 . p13 . p12
    P2 = p11 . p10 . p9 . p8
    P1 = p7 . p6 . p5 . p4
    P0 = p3 . p2 . p1 . p0
The "super" generates are more complex, so use the following equations:
    G0 = g3 + (p3 . g2) + (p3 . p2 . g1) + (p3 . p2 . p1 . g0)
    G1 = g7 + (p7 . g6) + (p7 . p6 . g5) + (p7 . p6 . p5 . g4)
    G2 = g11 + (p11 . g10) + (p11 . p10 . g9) + (p11 . p10 . p9 . g8)
    G3 = g15 + (p15 . g14) + (p15 . p14 . g13) + (p15 . p14 . p13 . g12)
[The numeric value of each term is missing.]

Slide 32: Answer (Continued)
Finally, CarryOut15 is
    C4 = G3 + (P3 . G2) + (P3 . P2 . G1) + (P3 . P2 . P1 . G0) + (P3 . P2 . P1 . P0 . c0) = 1
Hence there is a carry out when adding these two 16-bit numbers.

Slide 33: Example: Speed of Ripple Carry versus Carry Lookahead
Assume each AND or OR gate takes the same time for a signal to pass through it. Time is estimated by simply counting the number of gates along the longest path through a piece of logic. Compare the number of gate delays for the critical paths of two 16-bit adders, one using ripple carry and one using two-level carry lookahead.

Slide 34: Answer
[Figure: the CarryOut hardware of slide 9, with inputs a, b, and CarryIn.]
The figure shows that the carry out signal takes two gate delays per bit. So the number of gate delays between a carry in to the least significant bit and the carry out of the most significant bit is 16 x 2 = 32.

Slide 35: Answer (Continued)
For carry lookahead, the carry out of the most significant bit is just C4, defined in the previous example (slide 32). It takes two levels of logic to specify C4 in terms of Pi and Gi (the OR of several AND terms). Pi is specified in one level of logic (AND) using pi. Gi is specified in two levels using pi and gi. pi and gi are each one level of logic, defined in terms of ai and bi. If we assume one gate delay for each level of logic in these equations, the worst case is 2 + 2 + 1 = 5 gate delays. Hence for 16-bit addition a carry-lookahead adder is about six times faster (32 versus 5 gate delays), using this simple estimate of hardware speed.

Slide 36: Multiplication
More complicated than addition: it is accomplished via shifting and addition.
More time and more area.
Let's look at 3 versions based on the grade school algorithm.
[Worked example laid out as multiplicand x multiplier = product; the digits are missing.]
If we ignore the sign bits, multiplying an n-bit multiplicand by an m-bit multiplier gives a product that is n + m bits long. That is, n + m bits are required to represent all possible products.
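The n + m bound on the product length is easy to confirm numerically for unsigned operands: the largest product is (2^n - 1)(2^m - 1), which is less than 2^(n+m). A small Python check follows; the helper name is an assumption for illustration.

```python
def max_product_bits(n: int, m: int) -> int:
    """Bits needed for the largest product of an n-bit and an m-bit unsigned number."""
    largest = ((1 << n) - 1) * ((1 << m) - 1)
    return largest.bit_length()


# The largest product never needs more than n + m bits.
for n in range(1, 33):
    for m in range(1, 33):
        assert max_product_bits(n, m) <= n + m
```

For two 32-bit operands the bound is tight, which is why the product registers in the following designs are 64 bits wide.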

Slide 37: Multiplication
Each step of multiplication is simple:
    1. Just place a copy of the multiplicand (1 x multiplicand) in the proper place if the multiplier digit is 1, or
    2. Place 0 (0 x multiplicand) in the proper place if the digit is 0.
Negative numbers: convert and multiply. There are better techniques; we won't look at them.

Slide 38: First Version of the Multiplication Hardware
[Figure: a 64-bit Multiplicand register (shift left), a 64-bit ALU, a 32-bit Multiplier register (shift right), a 64-bit Product register (write), and a Control test block.]

Slide 39: First Version of the Multiplication Hardware
The Multiplicand register, ALU, and Product register are all 64 bits wide, with only the Multiplier register containing 32 bits. We will need to move the multiplicand left one digit each step, as it may be added to the intermediate products. So, over 32 steps a 32-bit multiplicand would move 32 bits to the left. Hence we need a 64-bit Multiplicand register, initialized with the 32-bit multiplicand in the right half and 0 in the left half. The multiplier is shifted in the opposite direction at each step. The Product register is initialized to 0. Control decides when to shift the Multiplicand and Multiplier registers and when to write new values into the Product register.

Slide 40: The First Multiplication Algorithm Using the Hardware in Slide 38
Start.
    1. Test Multiplier0.
       Multiplier0 = 1: 1a. Add the multiplicand to the product and place the result in the Product register.
       Multiplier0 = 0: continue with step 2.
    2. Shift the Multiplicand register left 1 bit.
    3. Shift the Multiplier register right 1 bit.
    32nd repetition?
       No (< 32 repetitions): go back to step 1.
       Yes (32 repetitions): done.

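Slide 41: The First Multiplication Algorithm
If the least significant bit of the multiplier is 1, add the multiplicand to the product. If not, go to the next step. Shift the multiplicand left and the multiplier right in the next two steps. These three steps are repeated 32 times.

Slide 42: Example: First Multiply Algorithm
Using 4-bit numbers to save space, multiply 0010 x 0011 (2 x 3).

    Iteration  Step                         Multiplier  Multiplicand  Product
    0          Initial values               0011        0000 0010     0000 0000
    1          1a: Prod = Prod + Mcand      0011        0000 0010     0000 0010
               2: Shift left Multiplicand   0011        0000 0100     0000 0010
               3: Shift right Multiplier    0001        0000 0100     0000 0010
    2          1a: Prod = Prod + Mcand      0001        0000 0100     0000 0110
               2: Shift left Multiplicand   0001        0000 1000     0000 0110
               3: Shift right Multiplier    0000        0000 1000     0000 0110
    3          1: no operation              0000        0000 1000     0000 0110
               2: Shift left Multiplicand   0000        0001 0000     0000 0110
               3: Shift right Multiplier    0000        0001 0000     0000 0110
    4          1: no operation              0000        0001 0000     0000 0110
               2: Shift left Multiplicand   0000        0010 0000     0000 0110
               3: Shift right Multiplier    0000        0010 0000     0000 0110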
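The three repeated steps of the first algorithm map directly onto a loop. The following Python sketch assumes unsigned operands; the function name `multiply_v1` and the `width` parameter are illustrative and not from the slides.

```python
def multiply_v1(multiplicand: int, multiplier: int, width: int = 32) -> int:
    """First multiply algorithm: shift the multiplicand left, the multiplier right."""
    mask = (1 << (2 * width)) - 1     # Multiplicand and Product are 2*width bits wide
    mcand = multiplicand              # starts in the right half, 0 in the left half
    product = 0
    for _ in range(width):
        if multiplier & 1:                        # step 1: test Multiplier0
            product = (product + mcand) & mask    # step 1a: Prod = Prod + Mcand
        mcand = (mcand << 1) & mask               # step 2: shift Multiplicand left
        multiplier >>= 1                          # step 3: shift Multiplier right
    return product
```

Calling `multiply_v1(0b0010, 0b0011, width=4)` goes through four iterations and returns 6.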

Slide 43: Example: First Multiply Algorithm
The table (slide 42) shows the value of each register for each of the steps, labeled according to the algorithm (slide 40), with the final value of 0000 0110 (6 in decimal). Color is used to indicate the register values that change on that step. The underlined bit is the one examined to determine the operation of the next step.
If each step took a clock cycle, this algorithm would require 32 x 3 steps = 96 clock cycles to multiply two 32-bit numbers.

Slide 44: Second Version of the Multiplication Algorithm and Hardware
Half of the bits of the multiplicand in the first algorithm were always 0, so only half could contain useful bit values. A full 64-bit ALU seemed wasteful and slow, since half of the adder bits were adding 0 to the intermediate sum. The first algorithm shifts the multiplicand left with 0s inserted in the new positions, so the multiplicand cannot affect the least significant bits of the product. Instead of shifting the multiplicand left, what if we shift the product right? Now the multiplicand would be fixed relative to the product, and since we are adding only 32 bits, the adder need be only 32 bits wide.

Slide 45: Second Version of the Multiplication Hardware
[Figure: a 32-bit Multiplicand register, a 32-bit ALU, a 32-bit Multiplier register (shift right), a 64-bit Product register (shift right, write), and a Control test block.]

Slide 46: Second Version of the Multiplication Hardware
Compare with the first version in slide 38. The Multiplicand register, ALU, and Multiplier register are all 32 bits wide, with only the Product register left at 64 bits. Now the product is shifted right.

Slide 47: Second Multiplication Algorithm Using the Hardware in Slide 45
Start.
    1. Test Multiplier0.
       Multiplier0 = 1: 1a. Add the multiplicand to the left half of the product and place the result in the left half of the Product register.
       Multiplier0 = 0: continue with step 2.
    2. Shift the Product register right 1 bit.
    3. Shift the Multiplier register right 1 bit.
    32nd repetition?
       No (< 32 repetitions): go back to step 1.
       Yes (32 repetitions): done.

Slide 48: Second Multiplication Algorithm
This algorithm starts with the 32-bit Multiplicand and 32-bit Multiplier registers set to their named values and the 64-bit Product register set to 0. The Product register is shifted right instead of shifting the multiplicand. This algorithm only forms a 32-bit sum, so only the left half of the 64-bit Product register is changed by the addition.
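A Python sketch of the second algorithm is given below, again for unsigned operands and with illustrative names. One assumption worth flagging: the addition into the left half can carry out of the 32-bit adder, and this sketch keeps that carry so it is shifted back into the Product on the next step.

```python
def multiply_v2(multiplicand: int, multiplier: int, width: int = 32) -> int:
    """Second multiply algorithm: fixed multiplicand, Product shifted right."""
    product = 0                                  # 2*width-bit Product register
    for _ in range(width):
        if multiplier & 1:                       # step 1: test Multiplier0
            # step 1a: add to the left half only; the sum may briefly need one
            # extra bit (the adder's carry out), recovered by the shift below.
            product += multiplicand << width
        product >>= 1                            # step 2: shift Product right
        multiplier >>= 1                         # step 3: shift Multiplier right
    return product
```

The adder only ever touches the upper `width` bits of `product`, which is why the hardware version can use a 32-bit ALU.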

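Slide 49: Example: Second Multiply Algorithm
Multiply 0010 x 0011 using the algorithm in slide 47.

    Iteration  Step                         Multiplier  Multiplicand  Product
    0          Initial values               0011        0010          0000 0000
    1          1a: Prod = Prod + Mcand      0011        0010          0010 0000
               2: Shift right Product       0011        0010          0001 0000
               3: Shift right Multiplier    0001        0010          0001 0000
    2          1a: Prod = Prod + Mcand      0001        0010          0011 0000
               2: Shift right Product       0001        0010          0001 1000
               3: Shift right Multiplier    0000        0010          0001 1000
    3          1: no operation              0000        0010          0001 1000
               2: Shift right Product       0000        0010          0000 1100
               3: Shift right Multiplier    0000        0010          0000 1100
    4          1: no operation              0000        0010          0000 1100
               2: Shift right Product       0000        0010          0000 0110
               3: Shift right Multiplier    0000        0010          0000 0110

Slide 50: Final Version of the Multiplication Algorithm and Hardware
The final observation was that the Product register had wasted space that matched exactly the size of the multiplier. The third version of the multiplication algorithm combines the rightmost half of the product with the multiplier. The least significant bit of the 64-bit Product register (Product0) is now the bit to be tested.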

Slide 51: Third Version of the Multiplication Hardware
[Figure: a 32-bit Multiplicand register, a 32-bit ALU, a 64-bit Product register (shift right, write), and a Control test block.]

Slide 52: Third Version of the Multiplication Hardware
Compare with the second version in slide 45. The separate Multiplier register has disappeared. The multiplier is placed instead in the right half of the Product register.

Slide 53: The Third Multiplication Algorithm
Start.
    1. Test Product0.
       Product0 = 1: 1a. Add the multiplicand to the left half of the product and place the result in the left half of the Product register.
       Product0 = 0: continue with step 2.
    2. Shift the Product register right 1 bit.
    32nd repetition?
       No (< 32 repetitions): go back to step 1.
       Yes (32 repetitions): done.

Slide 54: The Third Multiplication Algorithm
The algorithm starts by assigning the multiplier to the right half of the Product register, placing 0 in the upper half. It needs only two steps because the Product and Multiplier registers have been combined.
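The third algorithm needs no separate multiplier variable, which the Python sketch below makes visible. It assumes unsigned operands, and the name `multiply_v3` and the `width` parameter are illustrative.

```python
def multiply_v3(multiplicand: int, multiplier: int, width: int = 32) -> int:
    """Third multiply algorithm: the multiplier lives in the right half of Product."""
    product = multiplier & ((1 << width) - 1)    # right half = multiplier, left half = 0
    for _ in range(width):
        if product & 1:                          # step 1: test Product0
            product += multiplicand << width     # step 1a: add to the left half
        product >>= 1                            # step 2: shift Product right
    return product
```

Each right shift both discards a multiplier bit that has already been tested and makes room for one more bit of the growing product.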

Slide 55: Example: Third Multiply Algorithm
Multiply 0010 x 0011 using the algorithm in slide 53.

    Iteration  Step                       Multiplicand  Product
    0          Initial values             0010          0000 0011
    1          1a: Prod = Prod + Mcand    0010          0010 0011
               2: Shift right Product     0010          0001 0001
    2          1a: Prod = Prod + Mcand    0010          0011 0001
               2: Shift right Product     0010          0001 1000
    3          1: no operation            0010          0001 1000
               2: Shift right Product     0010          0000 1100
    4          1: no operation            0010          0000 1100
               2: Shift right Product     0010          0000 0110

Slide 56: Signed Multiplication
So far we have dealt with positive numbers.
Signed numbers: first convert the multiplier and multiplicand to positive numbers and remember the original signs. The algorithm then runs for 31 iterations, leaving the signs out of the calculation. We need to negate the product only if the original signs disagree.

Slide 57: Booth's Algorithm
Booth's algorithm changes the first step of the third multiplication algorithm (slide 53) from looking at 1 bit of the multiplier and then deciding whether to add the multiplicand, to looking at 2 bits of the multiplier. The new first step has four cases, depending on the values of the 2 bits (see the next slide). Assume that the pair of bits examined consists of the current bit and the bit to its right, which was the current bit in the previous step. The second step is still to shift the product right.

Slide 58: Booth's Algorithm
The value of these 2 bits:

    Current bit  Bit to the right  Explanation
    1            0                 Beginning of a run of 1s
    1            1                 Middle of a run of 1s
    0            1                 End of a run of 1s
    0            0                 Middle of a run of 0s

Slide 59: Booth's Algorithm
Booth's algorithm steps:
    1. Depending on the current and previous bits, do one of the following:
       00: Middle of a string of 0s, so no arithmetic operation.
       01: End of a string of 1s, so add the multiplicand to the left half of the product.
       10: Beginning of a string of 1s, so subtract the multiplicand from the left half of the product.
       11: Middle of a string of 1s, so no arithmetic operation.
    2. As in the previous algorithm (version three), shift the Product register right 1 bit.

Slide 60: Comparing the Third Algorithm and Booth's Algorithm for Positive Numbers

    Iteration  Original algorithm step      Booth's algorithm step
    0          Initial values               Initial values
    1          1: no operation              1a: no operation
    2          1a: Prod = Prod + Mcand      1c: Prod = Prod - Mcand
    3          1a: Prod = Prod + Mcand      1d: no operation
    4          1: no operation              1b: Prod = Prod + Mcand

[The Multiplicand and Product column values of this table are missing. Each iteration also ends with step 2, shifting the Product register right.]

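Slide 61: Comparing Example (Continued)
Booth's algorithm starts with a 0 to the right of the rightmost bit for the first stage. Booth's operation is identified according to the values in the 2 bits. By the fourth step, the two algorithms have the same values in the Product register.
The one other requirement is that shifting the product right must preserve the sign of the intermediate result, since we are dealing with signed numbers. The solution is to extend the sign when the product is shifted to the right.
Example: in step 2 of the second iteration, the product is negative after the subtraction, so the shift must bring in a 1 at the left instead of a 0.
This shift is called an arithmetic right shift, to differentiate it from a logical right shift.

Slide 62: Example: Booth's Algorithm
Let's try Booth's algorithm with negative numbers: 2 x -3 = -6, or 0010 x 1101 = 1111 1010 (in binary). The Product column shows the extra bit to the right of the rightmost bit as a ninth digit.

    Iteration  Step                       Multiplicand  Product
    0          Initial values             0010          0000 1101 0
    1          1c: Prod = Prod - Mcand    0010          1110 1101 0
               2: Shift right Product     0010          1111 0110 1
    2          1b: Prod = Prod + Mcand    0010          0001 0110 1
               2: Shift right Product     0010          0000 1011 0
    3          1c: Prod = Prod - Mcand    0010          1110 1011 0
               2: Shift right Product     0010          1111 0101 1
    4          1d: no operation           0010          1111 0101 1
               2: Shift right Product     0010          1111 1010 1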
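Booth's algorithm, including the arithmetic right shift, can be sketched in Python as follows. Several choices here are assumptions made for the sketch rather than details from the slides: the function name, the split of the Product register into `upper`, `lower`, and `previous`, and keeping `upper` as an unbounded signed Python integer so that `>>` acts as an arithmetic shift and the most negative multiplicand needs no special handling.

```python
def booth_multiply(multiplicand: int, multiplier: int, width: int = 32) -> int:
    """Signed multiply of two width-bit two's complement values (Booth's algorithm)."""
    upper = 0                                    # left half of Product, kept signed
    lower = multiplier & ((1 << width) - 1)      # right half holds the multiplier bits
    previous = 0                                 # extra bit to the right, initially 0
    for _ in range(width):
        current = lower & 1
        if (current, previous) == (0, 1):        # 01: end of a run of 1s
            upper += multiplicand
        elif (current, previous) == (1, 0):      # 10: beginning of a run of 1s
            upper -= multiplicand
        # 00 and 11: middle of a run, no arithmetic operation
        previous = current
        # Arithmetic right shift of the combined register by 1 bit.
        lower = (lower >> 1) | ((upper & 1) << (width - 1))
        upper >>= 1                              # Python's >> preserves the sign
    return upper * (1 << width) + lower
```

Tracing `booth_multiply(2, -3, width=4)` performs a subtract, an add, a subtract, and a no-op, and returns -6.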

Slide 63: Multiply in MIPS
MIPS provides a separate pair of 32-bit registers to contain the 64-bit product, called Hi and Lo. To produce a properly signed or unsigned product, MIPS has two instructions: multiply (mult) and multiply unsigned (multu).

Slide 64: Chapter Four Summary
Constructing an ALU to do the following arithmetic and logical operations: AND, OR, addition, subtraction, less than, and equality.
Ripple adder versus carry lookahead.
Three versions of multiplication.
Signed multiplication and Booth's algorithm.
We are ready to move on (and implement the processor).
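The difference between the signed and unsigned products, and the split of the 64-bit result into Hi and Lo, can be illustrated with a short Python model. This is a behavioral sketch only: the Python functions `mult` and `multu` borrow the instruction names for readability, take register contents as 32-bit patterns, and return a `(hi, lo)` pair instead of writing real registers.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(x: int) -> int:
    """Interpret a 32-bit pattern as a two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mult(rs: int, rt: int) -> tuple[int, int]:
    """Signed 32 x 32 multiply; returns (hi, lo) halves of the 64-bit product."""
    product = (to_signed32(rs) * to_signed32(rt)) & MASK64
    return product >> 32, product & MASK32


def multu(rs: int, rt: int) -> tuple[int, int]:
    """Unsigned 32 x 32 multiply; returns (hi, lo) halves of the 64-bit product."""
    product = (rs & MASK32) * (rt & MASK32)
    return product >> 32, product & MASK32
```

The same bit patterns give different upper halves: with operands 0xFFFFFFFF and 2, the signed version treats the first operand as -1, while the unsigned version treats it as 4294967295.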