Subtraction: Addition s Tricky Pal. A 16-bit ALU. This Unit: Arithmetic and ALU Design. Shifts. Rotations. Barrel Shifter

Similar documents
Binary Division. Decimal Division. Hardware for Binary Division. Simple 16-bit Divider Circuit

This Unit: Floating Point Arithmetic. CIS 371 Computer Organization and Design. Readings. Floating Point (FP) Numbers

Binary Adders: Half Adders and Full Adders

Let s put together a Manual Processor

Lecture 8: Binary Multiplication & Division

Chapter 4 Register Transfer and Microoperations. Section 4.1 Register Transfer Language

CSE140 Homework #7 - Solution

Sistemas Digitais I LESI - 2º ano

Systems I: Computer Organization and Architecture

A single register, called the accumulator, stores the. operand before the operation, and stores the result. Add y # add y from memory to the acc

EE 261 Introduction to Logic Circuits. Module #2 Number Systems

Divide: Paper & Pencil. Computer Architecture ALU Design : Division and Floating Point. Divide algorithm. DIVIDE HARDWARE Version 1

Today. Binary addition Representing negative numbers. Andrew H. Fagg: Embedded Real- Time Systems: Binary Arithmetic

Digital Logic Design. Basics Combinational Circuits Sequential Circuits. Pu-Jen Cheng

LSN 2 Number Systems. ECT 224 Digital Computer Fundamentals. Department of Engineering Technology

Oct: 50 8 = 6 (r = 2) 6 8 = 0 (r = 6) Writing the remainders in reverse order we get: (50) 10 = (62) 8

CS201: Architecture and Assembly Language

Number and codes in digital systems

To convert an arbitrary power of 2 into its English equivalent, remember the rules of exponential arithmetic:

Implementation of Modified Booth Algorithm (Radix 4) and its Comparison with Booth Algorithm (Radix-2)

Lecture 5: Gate Logic Logic Optimization

Chapter 2 Logic Gates and Introduction to Computer Architecture

1. True or False? A voltage level in the range 0 to 2 volts is interpreted as a binary 1.

The string of digits in the binary number system represents the quantity

CPEN Digital Logic Design Binary Systems

plc numbers Encoded values; BCD and ASCII Error detection; parity, gray code and checksums

1. Convert the following base 10 numbers into 8-bit 2 s complement notation 0, -1, -12

CHAPTER 3 Boolean Algebra and Digital Logic

DEPARTMENT OF INFORMATION TECHNLOGY

what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored?

Chapter 1: Digital Systems and Binary Numbers

Understanding Logic Design

NEW adder cells are useful for designing larger circuits despite increase in transistor count by four per cell.

Lecture 4: Binary. CS442: Great Insights in Computer Science Michael L. Littman, Spring I-Before-E, Continued

Gates, Circuits, and Boolean Algebra

Computer Science 281 Binary and Hexadecimal Review

Lab 1: Full Adder 0.0

Click on the links below to jump directly to the relevant section

CS101 Lecture 26: Low Level Programming. John Magee 30 July 2013 Some material copyright Jones and Bartlett. Overview/Questions

Number Systems and Radix Conversion

Adder.PPT(10/1/2009) 5.1. Lecture 13. Adder Circuits

CPU Organization and Assembly Language

NUMBER SYSTEMS. 1.1 Introduction

exclusive-or and Binary Adder R eouven Elbaz reouven@uwaterloo.ca Office room: DC3576

CS 61C: Great Ideas in Computer Architecture Finite State Machines. Machine Interpreta4on

Levent EREN A-306 Office Phone: INTRODUCTION TO DIGITAL LOGIC

=

Instruction Set Architecture. or How to talk to computers if you aren t in Star Trek

(Refer Slide Time: 00:01:16 min)

Computer organization

Gate Delay Model. Estimating Delays. Effort Delay. Gate Delay. Computing Logical Effort. Logical Effort

Multipliers. Introduction

Digital Design. Assoc. Prof. Dr. Berna Örs Yalçın

Pipeline Hazards. Structure hazard Data hazard. ComputerArchitecture_PipelineHazard1

Introducción. Diseño de sistemas digitales.1

Introduction to Programming (in C++) Loops. Jordi Cortadella, Ricard Gavaldà, Fernando Orejas Dept. of Computer Science, UPC

Example. Introduction to Programming (in C++) Loops. The while statement. Write the numbers 1 N. Assume the following specification:

EC 362 Problem Set #2

CPU Organisation and Operation

United States Naval Academy Electrical and Computer Engineering Department. EC262 Exam 1

Fundamentals of Computer Science (FCPS) CTY Course Syllabus

2) Write in detail the issues in the design of code generator.

Binary Numbering Systems

Addressing The problem. When & Where do we encounter Data? The concept of addressing data' in computations. The implications for our machine design(s)

Asynchronous counters, except for the first block, work independently from a system clock.

EXPERIMENT 4. Parallel Adders, Subtractors, and Complementors

Digital Fundamentals. Lab 8 Asynchronous Counter Applications

Hardware Implementations of RSA Using Fast Montgomery Multiplications. ECE 645 Prof. Gaj Mike Koontz and Ryon Sumner

Instruction Set Design

Solution: start more than one instruction in the same clock cycle CPI < 1 (or IPC > 1, Instructions per Cycle) Two approaches:

International Journal of Electronics and Computer Science Engineering 1482

3.Basic Gate Combinations

5 Combinatorial Components. 5.0 Full adder. Full subtractor

FORDHAM UNIVERSITY CISC Dept. of Computer and Info. Science Spring, Lab 2. The Full-Adder

for the Bill Hanlon

CSI 333 Lecture 1 Number Systems

COMP 250 Fall 2012 lecture 2 binary representations Sept. 11, 2012

ECE410 Design Project Spring 2008 Design and Characterization of a CMOS 8-bit Microprocessor Data Path

RN-Codings: New Insights and Some Applications

Systems I: Computer Organization and Architecture

Fractions and Linear Equations

Useful Number Systems

Base Conversion written by Cathy Saxton

2011, The McGraw-Hill Companies, Inc. Chapter 3

RN-coding of Numbers: New Insights and Some Applications

Floating Point Fused Add-Subtract and Fused Dot-Product Units

Factoring Trinomials of the Form x 2 bx c

Using Logic to Design Computer Components

B.Sc.(Computer Science) and. B.Sc.(IT) Effective From July 2011

Computer Organization. and Instruction Execution. August 22

Systems I: Computer Organization and Architecture

A Prime Investigation with 7, 11, and 13

Chapter 2. Binary Values and Number Systems

An Introduction to the ARM 7 Architecture

Gates, Plexers, Decoders, Registers, Addition and Comparison

earlier in the semester: The Full adder above adds two bits and the output is at the end. So if we do this eight times, we would have an 8-bit adder.

Binary Number System. 16. Binary Numbers. Base 10 digits: Base 2 digits: 0 1

SIM-PL: Software for teaching computer hardware at secondary schools in the Netherlands

High Speed and Efficient 4-Tap FIR Filter Design Using Modified ETA and Multipliers

Transcription:

Subtraction: Addition s Tricky Pal Sign/magnitude subtraction is mental reverse addition Two s complement subtraction is addition How to subtract using an adder? sub A, B = add A, -B Negate B before adding (fast negation trick: B = B + 1) Isn t a subtraction then a negation and two additions? + No, an adder can implement A+B+1 by setting the carry-in to 1 + Clever, huh? A B 1 A -bit ALU Build an ALU with functions: add/sub, and, or, not,xor All of these already in CLA adder/subtracter add A B, sub A B (done already) not B is needed for subtraction and A,B are first level Gs or A,B are first level Ps xor A,B? S i = A A i^b i^c i B ~ 1 & G P CLA -sum ^ ~ ^ 27 28 This Unit: Arithmetic and ALU Design Shifts Application OS Compiler Firmware CPU I/O Memory Digital Circuits Gates & Transistors Integer Arithmetic and ALU Binary number representations Addition and subtraction The integer ALU Shifting and rotating Multiplication Division Floating Point Arithmetic Binary number representations FP arithmetic Accuracy Shift: move all bits in a direction (left or right) Denoted by << (left shift) and >> (right shift) in C/C++/Java ICQ: Left shift example: 11 << 2 =? ICQ: Right shift example: 11 >> 2 =? Shifts are useful for Bit manipulation: extracting and setting individual bits in words Multiplication and division by powers of 2 A * 4 = A << 2 A / 8 = A >> 3 A * 5 = (A << 2) + A Compilers use this optimization, called strength reduction Easier to shift than it is to multiply (in general) 29 3 Rotations Barrel Shifter Rotations are slightly different than shifts 111 rotated 2 to the right =? Rotations are generally less useful than shifts But their implementation is natural if a shifter is there MIPS has only shifts What about shifting left by any amount from to 15? Cycle input through left-shift-by-1 up to 15 times? Complicated, variable latency consecutive left-shift-by-1-or- circuits? Fixed latency, but would take too long Barrel shifter: four shift-left-by-x-or- circuits (X = 1,2,4,8) A <<8 <<4 <<2 <<1 O SHAMT 31

Right Shifts and Rotations Right shifts and rotations also have barrel implementations But are a little different Shift Registers Shift register: shift in place by constant quantity Sometimes that s a useful thing Right shifts Can be logical (shift in s) or arithmetic (shift in copies of MSB) srl 1111,2 result is 11 sra 1111,2 result is 1111 Caveat: sra is not equal to division by 2 of negative numbers Why might we want both types of right shifts? Rotations Mux in wires of upper/lower bits WE SEL I DFF DFF DFF DFF O 33 34 Base1 Multiplication Remember base 1 multiplication from 3rd grade? 43 // multiplicand * 12 // multiplier 86 + 43 5 // product Start with running total, repeat steps until no multiplier digits Multiply multiplicand by least significant multiplier digit Add to total Shift multiplicand one digit to the left (multiply by 1) Shift multiplier one digit to the right (divide by 1) of N-digit and M-digit numbers potentially has N+M digits Binary Multiplication 43 = 1111 // multiplicand * 12 = 11 // multiplier = = 172 = 1111 + 344 = 1111 5 = 11 // product Same thing except There are more individual steps (smaller base) + But each step is simpler Multiply multiplicand by least significant multiplier bit or 1 no actual multiplication, just add multiplicand or not Add to total: we know how to do that Shift multiplicand left, multiplier right by one bit: shift registers 35 36 Simple x=bit Circuit Inefficiencies with Simple Circuit <<1 >>1 <<1 >>1 + 11 x 11 Control algorithm: repeat times If LSB(multiplier) == 1, then add multiplicand to product Shift multiplicand left by 1 Shift multiplier right by 1 + Notice -bit addition, but multiplicand bits are always And -bits are always moving Solution? Instead of shifting multiplicand left, shift product right 37 38

Better -bit Another Inefficiency >>1 >>1 + >>1 11 x 11 Control algorithm: repeat times LSB(multiplier) == 1? Add multiplicand to upper half of product Shift multiplier right by 1 Shift product right by 1 + >>1 Notice one more inefficiency What is initially the lower half of product gets thrown out As useless lower half of product is shifted right, so is multiplier Solution: use lower half of product as multiplier 39 4 Even Better -bit + >>1 11 x 11 Control algorithm: repeat times LSB(multiplier) == 1? Add multiplicand to upper half of product Shift product right by 1 Multiplying Negative Numbers If multiplicand is negative, our algorithm still works As long as right shifts are arithmetic and not logical Try 1111*11 If multiplier is negative, the algorithm breaks Two solutions 1) Negate multiplier, then negate product 2) Booth s algorithm 41 42 Booth s Algorithm Notice the following equality (Booth did) 2 J + 2 J 1 + 2 J 2 + + 2 K = 2 J+1 2 K Example: 111 = 1-1 We can exploit this to create a faster multiplier How? Sequence of N 1s in the multiplier yields sequence of N additions Replace with one addition and one subtraction Booth In Action For each multiplier bit, also examine bit to its right : middle of a run of s, do nothing 1: beginning of a run of 1s, subtract multiplicand 11: middle of a run of 1s, do nothing 1: end of a run of 1s, add multiplicand 43 = 1111 * 12 = 11 = // multiplier bits _ (implicit ) + = // multiplier bits - 172 = 111111 // multiplier bits 1 + = // multiplier bits 11 + 688 = 1111 // multiplier bits 1 5 = 11 ICQ: so why is Booth better? 43 44

Booth Hardware Control algorithm: repeat times LSBs == 1? Subtract multiplicand from product LSBs == 1? Add multiplicand to product Shift product/multiplier right by 1 (not by 2!) + or ± >>1 2 45 Booth in Summary Performance/efficiency + Good for sequences of 3 or more 1s Replaces 3 (or more) adds with 1 add and 1 subtract Doesn t matter for sequences of 2 1s Replaces 2 adds with 1 add and 1 subtract (add = subtract) Actually bad for singleton 1s Replaces 1 add with 1 add and 1 subtract Bottom line Worst case multiplier (111) requires N/2 adds + N/2 subs What is the worst case multiplier for straight multiplication? How is this better than normal multiplication? 46 Modified Booth s Algorithm What if we detect singleton 1s and do the right thing? Examine multiplier bits in groups of 2s plus a helper bit on the right (as opposed to 1 bit plus helper bit on right) Means we ll need to shift product/multiplier by 2 (not 1) : middle of run of s, do nothing 1: beginning of run of 1s, subtract multiplicand<<1 (M*2) Why M*2 instead of M? 1: singleton 1, add multiplicand 11: beginning of run of 1s, subtract multiplicand 1: end of run of 1s, add multiplicand 11: end of run of 1s, beginning of another, subtract multiplicand Why is this? 2 J+1 + 2 J = 2 J 11: end of a run of 1s, add multiplicand<<1 (M*2) 111: middle of run of 1s, do nothing 47 Modified Booth In Action 43 = 1111 * 12 = 11 = // multiplier bits - 172 = 111111 // multiplier bits 11 + 688 = 1111 // multiplier bits 1 5 = 11 48 Modified Booth Hardware Another : Multiple Adders <<1 or no shift + or ± >>2 Control algorithm: repeat 8 times (not!) Based on 3b groups, add/subtract shifted/unshifted multiplicand Shift product/multiplier right by 2 3 <<3 + <<2 + <<1 + + >>4 Can multiply by N bits at a time by using N adders Doesn t help: 4X fewer iterations, each one 4X longer (4*9=36) 4 49 5

Carry Save Addition (CSA) Carry save addition (CSA): d(n adds) < N*d(1 add) Enabling observation: unconventional view of full adder 3 inputs (A,B,C in ) 2 outputs (S,C out ) If adding two numbers, only thing to do is chain C out to C in+1 But what if we are adding three numbers (A+B+D)? One option: back-to-back conventional adders Add A + B = temp Add temp + D = Sum Better option: instead of rippling carries in first addition (A+B), feed the D bits in as the carry bits (treat D bits as C bits) Assume A+B+D = temp2 Then do traditional addition (not CSA) of temp2 and C bits generated during addition of A+B+D 51 Carry Save Addition (CSA) FA CO FA B 3 A 3 CO S 3 B 2 A 2 B 1 A 1 B A S 3 D 3 B 3 A 3 S 2 D 2 S 1 D 1 B 2 A 2 D 2 B 1 A 1 D 1 S D 3 B FA FA FA S 2 S 1 D A D S 2 conventional adders [2 * d(add)] gate levels d(add)=9 d = 18 k conventional adders d = [k * d(add)] d = 9k CSA+conventional adder d = [d(csa) + d(add)] d(csa) = d(1 FA) = 2 d = 11 k CSAs+conventional add d = [k*d(csa) + d(add)] d = 2k + 9 52 Carry Save Wallace Tree (based on CSA) <<3 <<2 CSA <<1 CSA << CSA + >>4 4 4-bit at a time multiplier using 3 CSA + 1 normal adder Actually helps: 4X fewer iterations, each only (2+2+2+9=15) 53 54