Divide: Paper & Pencil. Computer Architecture ALU Design : Division and Floating Point. Divide algorithm. DIVIDE HARDWARE Version 1



Similar documents
Binary Division. Decimal Division. Hardware for Binary Division. Simple 16-bit Divider Circuit

This Unit: Floating Point Arithmetic. CIS 371 Computer Organization and Design. Readings. Floating Point (FP) Numbers

Lecture 8: Binary Multiplication & Division

ECE 0142 Computer Organization. Lecture 3 Floating Point Representations

To convert an arbitrary power of 2 into its English equivalent, remember the rules of exponential arithmetic:

Binary Number System. 16. Binary Numbers. Base 10 digits: Base 2 digits: 0 1

Arithmetic in MIPS. Objectives. Instruction. Integer arithmetic. After completing this lab you will:

LSN 2 Number Systems. ECT 224 Digital Computer Fundamentals. Department of Engineering Technology

The string of digits in the binary number system represents the quantity

CS321. Introduction to Numerical Methods

2010/9/19. Binary number system. Binary numbers. Outline. Binary to decimal

Oct: 50 8 = 6 (r = 2) 6 8 = 0 (r = 6) Writing the remainders in reverse order we get: (50) 10 = (62) 8

CSI 333 Lecture 1 Number Systems

Levent EREN A-306 Office Phone: INTRODUCTION TO DIGITAL LOGIC

CHAPTER 5 Round-off errors

Computer Science 281 Binary and Hexadecimal Review

Solution for Homework 2

Data Storage 3.1. Foundations of Computer Science Cengage Learning

Systems I: Computer Organization and Architecture

EE 261 Introduction to Logic Circuits. Module #2 Number Systems

Attention: This material is copyright Chris Hecker. All rights reserved.

Recall the process used for adding decimal numbers. 1. Place the numbers to be added in vertical format, aligning the decimal points.

Number Representation

Binary Representation. Number Systems. Base 10, Base 2, Base 16. Positional Notation. Conversion of Any Base to Decimal.

Data Storage. Chapter 3. Objectives. 3-1 Data Types. Data Inside the Computer. After studying this chapter, students should be able to:

A single register, called the accumulator, stores the. operand before the operation, and stores the result. Add y # add y from memory to the acc

CS201: Architecture and Assembly Language

AN617. Fixed Point Routines FIXED POINT ARITHMETIC INTRODUCTION. Thi d t t d ith F M k Design Consultant

Correctly Rounded Floating-point Binary-to-Decimal and Decimal-to-Binary Conversion Routines in Standard ML. By Prashanth Tilleti

Lecture 11: Number Systems

Measures of Error: for exact x and approximation x Absolute error e = x x. Relative error r = (x x )/x.

Useful Number Systems

Numerical Matrix Analysis

1. Give the 16 bit signed (twos complement) representation of the following decimal numbers, and convert to hexadecimal:

This 3-digit ASCII string could also be calculated as n = (Data[2]-0x30) +10*((Data[1]-0x30)+10*(Data[0]-0x30));

Binary Numbering Systems

Goals. Unary Numbers. Decimal Numbers. 3,148 is s 100 s 10 s 1 s. Number Bases 1/12/2009. COMP370 Intro to Computer Architecture 1

Numbering Systems. InThisAppendix...

FLOATING-POINT ARITHMETIC IN AMD PROCESSORS MICHAEL SCHULTE AMD RESEARCH JUNE 2015

What Every Computer Scientist Should Know About Floating-Point Arithmetic

Floating point package user s guide By David Bishop (dbishop@vhdl.org)

Q-Format number representation. Lecture 5 Fixed Point vs Floating Point. How to store Q30 number to 16-bit memory? Q-format notation.

NUMBER SYSTEMS. William Stallings

COMP 250 Fall 2012 lecture 2 binary representations Sept. 11, 2012

COMPSCI 210. Binary Fractions. Agenda & Reading

Fast Logarithms on a Floating-Point Device

DNA Data and Program Representation. Alexandre David

Session 29 Scientific Notation and Laws of Exponents. If you have ever taken a Chemistry class, you may have encountered the following numbers:

Lecture 2. Binary and Hexadecimal Numbers

Number Conversions Dr. Sarita Agarwal (Acharya Narendra Dev College,University of Delhi)

Binary Representation

Binary Adders: Half Adders and Full Adders

Exponents, Radicals, and Scientific Notation

Zuse's Z3 Square Root Algorithm Talk given at Fall meeting of the Ohio Section of the MAA October College of Wooster

plc numbers Encoded values; BCD and ASCII Error detection; parity, gray code and checksums

Reduced Instruction Set Computer (RISC)

Intel 64 and IA-32 Architectures Software Developer s Manual

CDA 3200 Digital Systems. Instructor: Dr. Janusz Zalewski Developed by: Dr. Dahai Guo Spring 2012

Floating Point Fused Add-Subtract and Fused Dot-Product Units

PREPARATION FOR MATH TESTING at CityLab Academy

Chapter 4 -- Decimals

Chapter 5 Instructor's Manual

ASCII Characters. 146 CHAPTER 3 Information Representation. The sign bit is 1, so the number is negative. Converting to decimal gives

3. Convert a number from one number system to another

1. Convert the following base 10 numbers into 8-bit 2 s complement notation 0, -1, -12

Decimals Adding and Subtracting

Addition Methods. Methods Jottings Expanded Compact Examples = 15

Informatica e Sistemi in Tempo Reale

Digital System Design Prof. D Roychoudhry Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

MATH-0910 Review Concepts (Haugen)

Precision & Performance: Floating Point and IEEE 754 Compliance for NVIDIA GPUs

Positional Numbering System

Chapter 7D The Java Virtual Machine

NUMBER SYSTEMS. 1.1 Introduction

Chapter 2. Binary Values and Number Systems

( 214 (represen ed by flipping) (magnitude = -42) = = -42 (flip the bits and add 1)

HOMEWORK # 2 SOLUTIO

Numerical Analysis I

BINARY CODED DECIMAL: B.C.D.

How do you compare numbers? On a number line, larger numbers are to the right and smaller numbers are to the left.

Instruction Set Architecture (ISA)

Paramedic Program Pre-Admission Mathematics Test Study Guide

CPEN Digital Logic Design Binary Systems

Stacks. Linear data structures

THE EXACT DOT PRODUCT AS BASIC TOOL FOR LONG INTERVAL ARITHMETIC

High-Precision C++ Arithmetic

Copy in your notebook: Add an example of each term with the symbols used in algebra 2 if there are any.

YOU MUST BE ABLE TO DO THE FOLLOWING PROBLEMS WITHOUT A CALCULATOR!

RN-coding of Numbers: New Insights and Some Applications

MACHINE INSTRUCTIONS AND PROGRAMS

Numeral Systems. The number twenty-five can be represented in many ways: Decimal system (base 10): 25 Roman numerals:

1 Description of The Simpletron

26 Integers: Multiplication, Division, and Order

2011, The McGraw-Hill Companies, Inc. Chapter 3

Unsigned Conversions from Decimal or to Decimal and other Number Systems

Implementation of Modified Booth Algorithm (Radix 4) and its Comparison with Booth Algorithm (Radix-2)

Some Functions Computable with a Fused-mac

An Efficient RNS to Binary Converter Using the Moduli Set {2n + 1, 2n, 2n 1}

Instruction Set Design

Transcription:

Divide: Paper & Pencil Computer Architecture ALU Design : Division and Floating Point 1001 Quotient Divisor 1000 1001010 Dividend 1000 10 101 1010 1000 10 (or Modulo result) See how big a number can be subtracted, creating quotient bit on each step Quotient bit = 1 if can be subtracted, 0 otherwise Dividend = Quotient x Divisor + 3 versions of divide, successive refinement EEL-4713 Ann Gordon-Ross.1 EEL-4713 Ann Gordon-Ross.2 Divide algorithm DIVIDE HARDWARE Version 1 Main ideas: Expand both divisor and dividend to twice their size - Expanded divisor = divisor (half bits, MSB) zeroes (half bits, LSB) - Expanded dividend = zeroes (half bits, MSB) dividend (half bits, LSB) At each step, determine if divisor is smaller than dividend - Subtract the two, look at sign - If >=0: dividend/divisor>=1, mark this in quotient as 1 - If negative: divisor larger than dividend; mark this in quotient as 0 Shift divisor right and quotient left to cover next power of two Example: 7/2 64-bit Divisor reg, 64-bit ALU, 64-bit reg, 32-bit Quotient reg 0s Divisor 64-bit ALU Divid. 64 bits 0s 64 bits Write Shift Right Quotient Control 32 bits Shift Left EEL-4713 Ann Gordon-Ross.3 EEL-4713 Ann Gordon-Ross.4

Divide Algorithm Version 1: 7/2 Takes n+1 steps for n-bit Quotient & Rem. Quotient Divisor 0000 0111 0000 0010 0000 2a. Shift the Quotient register to the left setting the new rightmost bit to 1. >= 0 Start: Place Dividend in 1. Subtract the Divisor register from the register, and place the result in the register. Test < 0 2b. Restore the original value by adding the Divisor register to the register, & place the sum in the register. Also shift the Quotient register to the left, setting the new rightmost bit to 0. Divide Algorithm Version 1: 7 (0111) / 2 (0010) = 3 (0011) R 1 (0001) Step Quotient Divisor Rem-Div Initial 0000 0111 0000 0010 0000 < 0 1 0000 0111 0000 0001 0000 < 0 2 0000 0111 0000 0000 1000 < 0 3 0000 0111 0000 0000 0100 0000 0011 > 0 4 0000 0011 0001 0000 0010 0000 0001 > 0 5 0000 0001 0011 0000 0001 Final 1 3 3. Shift the Divisor register right1 bit. EEL-4713 Ann Gordon-Ross.5 n+1 repetition? Done No: < n+1 repetitions Yes: n+1 repetitions (n = 4 here) EEL-4713 Ann Gordon-Ross.6 Observations on Divide Version 1 1/2 bits in divisor always 0 => 1/2 of 64-bit adder is wasted => 1/2 of divisor is wasted Instead of shifting divisor to right, shift remainder to left? 1st step will never produce a 1 in quotient bit (otherwise too big) => switch order to shift first and then subtract, can save 1 iteration Divide Algorithm Version 1: 7 (0111) / 2 (0010) = 3 (0011) R 1 (0001) Step Quotient Divisor Rem-Div Initial 0000 0111 0000 0010 0000 < 0 1 0000 0111 0000 0001 0000 < 0 2 0000 0111 0000 0000 1000 < 0 3 0000 0111 0000 0000 0100 0000 0011 > 0 4 0000 0011 0001 0000 0010 0000 0001 > 0 5 0000 0001 0011 0000 0001 Final 1 3 First Rem-Dev always < 0 Always 0 EEL-4713 Ann Gordon-Ross.7 EEL-4713 Ann Gordon-Ross.8

DIVIDE HARDWARE Version 2 Divide Algorithm Version 2 Start: Place Dividend in 32-bit Divisor reg, 32-bit ALU, 64-bit reg, 32-bit Quotient reg Quotient Divisor 0000 0111 0000 0010 1. Shift the register left 1 bit. 2. Subtract the Divisor register from the left half of the register, & place the result in the left half of the register. Divisor >= 0 Test < 0 32 bits 32-bit ALU 64 bits Shift Left Write Quotient Control 32 bits Shift Left 3a. Shift the Quotient register to the left setting the new rightmost bit to 1. 3b. Restore the original value by adding the Divisor register to the left half of the register, &place the sum in the left half of the register. Also shift the Quotient register to the left, setting the new least significant bit to 0. nth repetition? No: < n repetitions Yes: n repetitions (n = 4 here) EEL-4713 Ann Gordon-Ross.9 EEL-4713 Ann Gordon-Ross.10 Done Observations on Divide Version 2 DIVIDE HARDWARE Version 3 Eliminate Quotient register by combining with as shifted left Start by shifting the left as before. Thereafter loop contains only two steps because the shifting of the register shifts both the remainder in the left half and the quotient in the right half The consequence of combining the two registers together and the new order of the operations in the loop is that the remainder will shifted left one time too many. Thus the final correction step must shift back only the remainder in the left half of the register 32-bit Divisor reg, 32 -bit ALU, 64-bit reg, (0-bit Quotient reg) Divisor 32-bit ALU 32 bits HI LO (Quotient) 64 bits Shift Left Write Control EEL-4713 Ann Gordon-Ross.11 EEL-4713 Ann Gordon-Ross.12

Divide Algorithm Version 3 Start: Place Dividend in 1. Shift the register left 1 bit. Divide Algorithm Version 3: 7 (0111) / 2 (0010) = 3 (0011) R 1 (0001) 3a. Shift the register to the left setting the new rightmost bit to 1. 2. Subtract the Divisor register from the left half of the register, & place the result in the left half of the register. >= 0 Test < 0 3b. Restore the original value by adding the Divisor register to the left half of the register, &place the sum in the left half of the register. Also shift the register to the left, setting the new least significant bit to 0. Step Divisor Rem-Div Initial 0000 0111 0010 Always < 0 Shift 0000 1110 0010 < 0 1 0001 1100 0010 < 0 2 0011 1000 0010 0011-0010 > 0 2 0001 1000 0010 3 0011 0001 0010 0011-0010 > 0 3 0001 0001 0010 4 0010 0011 0010 Final R1 3 EEL-4713 Ann Gordon-Ross.13 nth repetition? No: < n repetitions Yes: n repetitions (n = 4 here) Done. Shift left half of right 1 bit. EEL-4713 Ann Gordon-Ross.14 Observations on Divide Version 3 Same Hardware as Multiply: just need ALU to add or subtract, and 64-bit register to shift left or shift right Hi and Lo registers in MIPS combine to act as 64-bit register for multiply and divide Signed Divides: Simplest is to remember signs, make positive, and complement quotient and remainder if necessary Note: Dividend and must have same sign Note: Quotient negated if Divisor sign & Dividend sign disagree e.g., 7 2 = 3, remainder = 1 Floating-Point What can be represented in N bits? N Unsigned 0 to 2 N-1 N-1 2s Complement - 2 to 2-1 Integer numbers useful in many cases; must also consider real numbers with fractions E.g. 1/2 = 0.5 very large 9,349,398,989,000,000,000,000,000,000 very small 0.0000000000000000000000045691 EEL-4713 Ann Gordon-Ross.15 EEL-4713 Ann Gordon-Ross.16

Recall Scientific Notation decimal point Mantissa Sign, magnitude Issues: Arithmetic (+, -, *, / ) exponent Sign, magnitude 23-24 6.02 x 10 1.673 x 10 radix (base) IEEE F.P. ± 1.M x 2 e - 127 Representation, normalized form (e.g., x.xxx * 10 x ) Range and Precision Rounding Exceptions (e.g., divide by zero, overflow, underflow) Errors Normalized notation using powers of two Base 10: single non-zero digit left of the decimal point. Base 2: normalized numbers can also be represented as: 1.xxxxxx * 2^(yyyy), where x and y are binary Example: -0.75-75/100, or, -3/4-3 in binary: -11.0 Divided by 4 -> binary point moves left two positions, -0.11 Normalized: -1.1 * 2^(-1) EEL-4713 Ann Gordon-Ross.17 EEL-4713 Ann Gordon-Ross.18 *Review from Prerequisites: Floating-Point Arithmetic Representation of floating point numbers in IEEE 754 standard: 1 8 23 single precision sign S E M actual exponent is e = E 127 (bias) EEL-4713 Ann Gordon-Ross.19 exponent: excess 127 binary integer mantissa: sign + magnitude, normalized binary significand w/ hidden integer bit: 1.M 0 < E < 255 (bias makes < > comparisons easy) S E-127 N = (-1) 2 (1.M) Unbiased Biased +- 1.0000! 0000 x 2-126 => 1.0000! 0000 x 2 1 +- 1.1111! 1111 x 2 +127 => 1.1111! 1111 x 2 254 +- 1.0000! 0000 x 2 0 => 1.0000! 0000 x 2 127 Magnitude of numbers that can be represented is in the range: 2-126 (1.0) to 2 127 (2-2 23 ) which is approximately: 1.8 x 10-38 to 3.40 x 10 38 (integer comparison valid on IEEE Fl.Pt. numbers of same sign!) Single- and double-precision Single-precision: 32 bits (sign + 8 exponent + 23 fraction) Double-precision: 64 bits (sign + 11 exponent + 52 fraction) Increases reach of large/small numbers by 3 powers, but most noticeable improvement is in the number of bits used to represent fraction Example: -0.75-1.1 *2^(-1) Sign bit: 1 Exponent: e-127=-1 so e=126 (01111110) Mantissa: 1000!00 (Remember, for 1.x, the 1 is implicit so not in M) Single-precision representation: 1011111101000!00 EEL-4713 Ann Gordon-Ross.20

Operations with floating-point numbers Addition example Addition/subtraction: Need to have both operands with the same exponent - small ALU calculates exponent difference - Shift number with smaller exponent to the right Add/subtract the mantissas Multiplication/division Add/subtract the exponents Multiply/divide mantissas Normalize, round, (re-normalize) 99.99 + 0.161 Scientific notation, assume only 4 digits can be stored 9.999E+1, 1.610E-1 Must align exponents: 1.610E-1 = 0.0161E+1 Can only represent 4 digits: 0.016E+1 Sum: 10.015E+1 Not normalized; adjust to 1.0015E+2 Can only represent 4 digits; must round (0 to 4 down, 5 to 9 up) 1.002E+2 EEL-4713 Ann Gordon-Ross.21 It can happen that after rounding result is no longer normalized E.g. if the sum was 9.9999E+2, normalize again EEL-4713 Ann Gordon-Ross.22 Addition Addition EEL-4713 Ann Gordon-Ross.23 EEL-4713 Ann Gordon-Ross.24

Multiplication Multiplication Example: 1.110E10 * 9.200E-5 Add exponents: 10 + (-5) = 5 Remember: in IEEE format, the number stored in the FP bits is e, but the actual exponent is (e-127) (subtract the bias). To compute the exponent of the result, you have to add the e bits from both operands, and then subtract 127 to adjust E.g. exponent +10 is stored as 137; -5 as 122 137+122 = 259 259-127 = 132, which represents exponent +5 Multiply significands 1.110*9.200 = 10.212000 Normalize: 1.0212E+6 Check exponent for overflow (too large positive exponent) and underflow (too large negative exponent) Round to 4 digits: 1.021E+6 EEL-4713 Ann Gordon-Ross.25 EEL-4713 Ann Gordon-Ross.26 Infinity and NaNs result of operation overflows, i.e., is larger than the largest number that can be represented overflow (too large of an exponent) is not the same as divide by zero Both generate +/-Inf as result; but raise different exceptions +/- infinity S 1... 1 0... 0 It may make sense to do further computations with infinity e.g., X=Inf > Y may be a valid comparison Guard, round and sticky bits # of bits in floating-point fraction is fixed During an operation, can keep additional bits around to improve precision in rounding operations Guard and round bits are kept around during FP operation and used to decide direction to round Not a number, but not infinity (e.q. sqrt(-4)) invalid operation exception (unless operation is = or =) NaN S 1... 1 non-zero NaNs propagate: f(nan) = NaN HW decides what goes here Sticky bits: flag whether any bits that are not considered in an operation (they have been shifted right) are 1 Can be used as another factor to determine the direction of rounding EEL-4713 Ann Gordon-Ross.27 EEL-4713 Ann Gordon-Ross.28

Guard and round bits Floating-point in MIPS E.g. 2.56*10^0 + 2.34*10^2 3 significant decimal digits With guard and round digits: 2.3400 + 0.0256 --------- 2.3656 0 to 49: round down, 50 to 99: round up -> 2.37 Witouth guard and round digits: 2.34 + 0.02 ------ 2.36 Use different set of registers 32 32-bit floating point registers, $f0 - $f31 Individual registers: single-precision Two registers can be combined for double-precision $f0 ($f0,$f1), $f2 ($f2,$f3) add, sub, mult, div.s for single,.d for double precision Load and store memory word to 32-bit FP register Lwcl, swcl (cl refers to co-processor 1 when separate FPU used in past) Instructions to branch on floating point conditions (e.g. overflow), and to compare FP registers EEL-4713 Ann Gordon-Ross.29 EEL-4713 Ann Gordon-Ross.30 Floating-point in x86 Floating point in x86 First introduced with 8087 FP co-processor Primarily a stack architecture: Loads push numbers into stack Operations find operands on two top slots of stack Stores pop from stack Similar to HP calculators 2+3 -> 23+ Also supports one operand to come from either FP register below top of stack, or from memory Data movement: Load, load constant, store Arithmetic operations: Add, subtract, multiply, divide, square root Trigonometric/logarithmic operations Sin, cos, log, exp Comparison and branch 32-bit (single-precision) and 64-bit (double-precision) support EEL-4713 Ann Gordon-Ross.31 EEL-4713 Ann Gordon-Ross.32

SSE2 extensions Differences between x86 FP approaches Streaming SIMD extension 2 Introduced in 2001 SIMD: single-instruction, multiple data Basic idea: operate in parallel on elements within a wide word - e.g. 128-bit word can be seen as 4 single-precision FP numbers, or 2 double-precision Eight 128-bit registers 16 in the 64-bit AMD64/EM64T No stack any register can be referenced for FP operation 8087-based: SSE2: Registers are 80-bit (more accuracy during operations); data is converted to/from 64-bit when moving to/from memory Stack architecture Single operand per register Registers are 128-bit Register-register architecture Multiple operands per register Differences in internal representation can cause differences in results for the same program 80-bit representation used in operations Truncated to 64-bit during transfers Differences can accumulate, effected by when loads/stores occur EEL-4713 Ann Gordon-Ross.33 EEL-4713 Ann Gordon-Ross.34 Floating point operations Number of bits is limited and small errors in individual FP operations can compound over large iterations Numerical methods that perform operations such as to minimize accumulation of errors are needed in various scientific applications Operations may not work as you would expect E.g. floating-point add is not always associative x + (y+z) = (x+y) +z? x = -1.5*10^38, y=1.5*10^38, z=1.0 (x+y) + z = (-1.5*10^38 + 1.5*10^38) + 1.0 = (0.0) + 1.0 = 1.0 x + (y+z) = -1.5*10^38 + (1.5*10^38 + 1.0) = -1.5*10^38 + 1.5*10^38 = 0.0 1.5*10^38 is so much larger than 1, that sum is just 1.5*10^38 due to rounding during the operation Summary Bits have no inherent meaning: operations determine whether they are really ASCII characters, integers, floating point numbers Divide can use same hardware as multiply: Hi & Lo registers in MIPS Floating point basically follows paper and pencil method of scientific notation using integer algorithms for multiply and divide of significands IEEE 754 requires good rounding; special values for NaN, Infinity EEL-4713 Ann Gordon-Ross.35 EEL-4713 Ann Gordon-Ross.36