ECE 0142 Computer Organization. Lecture 3 Floating Point Representations


Floating-point arithmetic

Floating-point numbers come up constantly in programming. Floating point greatly simplifies working with very large (e.g., 2^70) and very small (e.g., 2^-17) numbers. We'll focus on the IEEE 754 standard for floating-point arithmetic:
- How FP numbers are represented
- Limitations of FP numbers
- FP addition and multiplication

Floating-point representation

IEEE numbers are stored using a kind of scientific notation:

  ± mantissa × 2^exponent

We can represent floating-point numbers with three binary fields: a sign bit s, an exponent field e, and a fraction field f:

  s | e | f

The IEEE 754 standard defines several different precisions:
- Single-precision numbers have an 8-bit exponent field and a 23-bit fraction, for a total of 32 bits.
- Double-precision numbers have an 11-bit exponent field and a 52-bit fraction, for a total of 64 bits.

Sign

The sign bit s is 0 for positive numbers and 1 for negative numbers. Unlike two's-complement integers, IEEE values are stored in signed-magnitude format.

Mantissa

There are many ways to write a number in scientific notation, but there is always a unique normalized representation, with exactly one non-zero digit to the left of the point:

  0.232 × 10^3 = 23.2 × 10^1 = 2.32 × 10^2
  01001₂ = 1.001₂ × 2^3

What's the normalized representation of 00101101.101₂?
  00101101.101₂ = 1.01101101₂ × 2^5

What's the normalized representation of 0.0001101001110₂?
  0.0001101001110₂ = 1.101001110₂ × 2^-4

Mantissa (continued)

The field f contains a binary fraction. The actual mantissa of the floating-point value is (1 + f); in other words, there is an implicit 1 to the left of the binary point. For example, if f is 01101, the mantissa is 1.01101.

A side effect is that we get a little more precision: there are 24 bits in the mantissa, but we only need to store 23 of them. But what about the value 0?

Exponent

Some exponent encodings are reserved for special cases:
- Infinities (overflow)
- NaN (invalid operations such as 0/0)

Single precision: 8 bits in e gives 256 codes.
- 1111 1111 is reserved for special cases, leaving 255 codes.
- One code (0000 0000) is reserved for zero, leaving 254 codes.
- The remaining 254 codes cover both positive and negative exponents: half positive (127) and half negative (127).

Double precision: 11 bits in e gives 2048 codes.
- 111...1 is reserved for special cases, leaving 2047 codes.
- One code is reserved for zero, leaving 2046 codes.
- The remaining 2046 codes cover both positive and negative exponents: half positive (1023) and half negative (1023).

Exponent (continued)

The e field represents the exponent as a biased number: it contains the actual exponent plus 127 for single precision, or the actual exponent plus 1023 for double precision. This converts all single-precision exponents from -126 to +127 into unsigned numbers from 1 to 254, and all double-precision exponents from -1022 to +1023 into unsigned numbers from 1 to 2046.

Two examples with single-precision numbers:
- If the actual exponent is 4, the e field will be 4 + 127 = 131 (1000 0011₂).
- If e contains 0101 1101₂ (93), the actual exponent is 93 - 127 = -34.

Storing a biased exponent means we can compare nonnegative IEEE values as if they were unsigned integers.

Mapping between e and the actual exponent

  e           Actual exponent
  0000 0000   Reserved
  0000 0001   1 - 127 = -126
  0000 0010   2 - 127 = -125
  ...         ...
  0111 1111   127 - 127 = 0
  ...         ...
  1111 1110   254 - 127 = 127
  1111 1111   Reserved

Converting an IEEE 754 number to decimal

The decimal value of an IEEE number is given by the formula:

  (1 - 2s) × (1 + f) × 2^(e - bias)

Here, the s, e and f fields are assumed to be in decimal. (1 - 2s) is 1 or -1, depending on whether the sign bit is 0 or 1. We add an implicit 1 to the fraction field f, as mentioned earlier. Again, the bias is either 127 or 1023, for single or double precision.

Example IEEE-decimal conversion

Let's find the decimal value of the following IEEE number:

  1 01111100 11000000000000000000000

First convert each individual field to decimal:
- The sign bit s is 1.
- The e field contains 01111100₂ = 124.
- The mantissa is 0.11₂ = 0.75.

Then just plug these decimal values of s, e and f into our formula:

  (1 - 2s) × (1 + f) × 2^(e - bias)

This gives us (1 - 2) × (1 + 0.75) × 2^(124 - 127) = -1.75 × 2^-3 = -0.21875.

Converting a decimal number to IEEE 754

What is the single-precision representation of 347.625?
1. First convert the number to binary: 347.625 = 101011011.101₂.
2. Normalize the number by shifting the binary point until there is a single 1 to the left: 101011011.101 × 2^0 = 1.01011011101 × 2^8.
3. The bits to the right of the binary point comprise the fraction field f.
4. The number of places you shifted gives the exponent; the field e should contain exponent + 127.
5. The sign bit is 0 if positive, 1 if negative.

Exercise

What is the single-precision representation of 639.6875?

  639.6875 = 1001111111.1011₂ = 1.0011111111011₂ × 2^9
  s = 0
  e = 9 + 127 = 136 = 1000 1000₂
  f = 0011111111011

The single-precision representation is:

  0 10001000 00111111110110000000000

Examples: comparing FP numbers (<, >?)

1. 0 01111111 110...0 vs. 0 10000000 110...0
   The first is +1.11₂ × 2^(127-127) = 1.75; the second is +1.11₂ × 2^(128-127) = 11.1₂ = 3.5.
   Directly comparing the exponent fields as unsigned values (0111 1111 < 1000 0000) gives the correct result.

2. 1 01111111 110...0 vs. 1 10000000 110...0
   These are -f × 2^(127-127) and -f × 2^(128-127), with the same mantissa f.
   For the exponent fields, 0111 1111 < 1000 0000, so -f × 2^(127-127) > -f × 2^(128-127): for negative numbers, the ordering of the bit patterns is reversed.

Special values (single precision)

  e          f        Meaning            Notes
  0000 0000  000...0  ±0.0
  0000 0000  xxx...x  Valid number       Denormalized: (-1)^s × 2^-126 × (0.f)
  1111 1111  000...0  ±Infinity
  1111 1111  xxx...x  Not a Number (NaN)

  e          Actual exponent  f        Value
  0000 0000  Reserved         000...0  ±0
                              xxx...x  Denormalized: (-1)^s × 2^-126 × (0.f)
  0000 0001  -126
  0000 0010  -125
  ...        ...                       Normalized: (-1)^s × 2^(e-127) × (1.f)
  0111 1111  0
  ...        ...
  1111 1110  127
  1111 1111  Reserved         000...0  ±Infinity
                              xxx...x  NaN

Range of numbers

Normalized (positive range; the negative range is symmetric):
  smallest: 0 00000001 00000000000000000000000 = +2^-126 × (1 + 0) = 2^-126
  largest:  0 11111110 11111111111111111111111 = +2^127 × (2 - 2^-23)

Denormalized:
  smallest: 0 00000000 00000000000000000000001 = +2^-126 × 2^-23 = 2^-149
  largest:  0 00000000 11111111111111111111111 = +2^-126 × (1 - 2^-23)

On the number line, positive values run from 2^-149 up to 2^-126 × (1 - 2^-23) (denormalized), then from 2^-126 up to 2^127 × (2 - 2^-23) (normalized). Nonzero magnitudes below 2^-149 cause positive underflow; magnitudes above 2^127 × (2 - 2^-23) cause positive overflow.

In comparison

The smallest and largest possible 32-bit integers in two's complement are only -2^31 and 2^31 - 1. How can we represent so many more values in the IEEE 754 format, even though we use the same number of bits as regular integers?

After 2^-126, what's the next representable FP number? It is +2^-126 × (1 + 2^-23), which differs from the smallest normalized number by only 2^-149.

Finiteness

There aren't more IEEE numbers. With 32 bits, there are 2^32, or about 4 billion, different bit patterns. These can represent 4 billion integers or 4 billion reals, but there are infinitely many reals, and the IEEE format can only represent some of the ones from about -2^128 to +2^128. The format represents the same number of values between 2^n and 2^(n+1) as between 2^(n+1) and 2^(n+2), so representable values get sparser as magnitudes grow: 2 ... 4 ... 8 ... 16.

Thus, floating-point arithmetic has issues:
- Small roundoff errors can accumulate with multiplications or exponentiations, resulting in big errors.
- Rounding errors can invalidate many basic arithmetic principles, such as the associative law (x + y) + z = x + (y + z).

The IEEE 754 standard guarantees that all machines will produce the same results, but those results may not be mathematically accurate!

Limits of the IEEE representation

Even some integers cannot be represented exactly in the IEEE format. Here 33554431 = 2^25 - 1 needs 25 significant bits, but a single-precision mantissa holds only 24, so the float gets rounded up to 2^25:

  int x = 33554431;
  float y = 33554431;
  printf("%d\n", x);   /* prints 33554431 */
  printf("%f\n", y);   /* prints 33554432.000000 */

Some simple decimal numbers cannot be represented exactly in binary to begin with:

  0.10 (decimal) = 0.0001100110011...₂ (repeating)

0.10

During the Gulf War in 1991, a U.S. Patriot missile failed to intercept an Iraqi Scud missile, and 28 Americans were killed. A later study determined that the problem was caused by the inaccuracy of the binary representation of 0.10:
- The Patriot incremented a counter once every 0.10 seconds, and multiplied the counter value by 0.10 to compute the actual time.
- However, the 24-bit binary representation of 0.10 actually corresponds to 0.099999904632568359375, which is off by 0.000000095367431640625.
- This doesn't seem like much, but after 100 hours the time ends up being off by 0.34 seconds, enough time for a Scud to travel 500 meters!

Professor Skeel wrote a short article about this: "Roundoff Error and the Patriot Missile." SIAM News, 25(4):11, July 1992.

Floating-point addition example

To get a feel for floating-point operations, we'll do an addition example. To keep it simple, we'll use base-10 scientific notation, and assume the mantissa has four digits and the exponent has one digit. The example addition is:

  99.99 + 0.161 = 100.151

As normalized numbers, the operands are written as:

  9.999 × 10^1
  1.610 × 10^-1

Steps 1-2: the actual addition

1. Equalize the exponents. The operand with the smaller exponent should be rewritten by increasing its exponent and shifting the point leftwards:

     1.610 × 10^-1 = 0.01610 × 10^1

   With four significant digits, this gets rounded to 0.016 × 10^1. This can result in a loss of least significant digits (the rightmost 1 in this case). But rewriting the number with the larger exponent instead could result in loss of the most significant digits, which is much worse.

2. Add the mantissas:

     9.999 × 10^1
   + 0.016 × 10^1
   = 10.015 × 10^1

Steps 3-5: representing the result

3. Normalize the result if necessary:

     10.015 × 10^1 = 1.0015 × 10^2

   This step may cause the point to shift either left or right, and the exponent to either increase or decrease.

4. Round the number if needed: 1.0015 × 10^2 gets rounded to 1.002 × 10^2.

5. Repeat step 3 if the result is no longer normalized. We don't need this in our example, but it's possible for rounding to add digits; for example, rounding 9.9995 yields 10.000.

Our result is 1.002 × 10^2, or 100.2. The correct answer is 100.151, so we have the right answer to four significant digits, but there's a small error already.

Example

Calculate 0 10000001 1100...0 plus 0 10000010 00110...0, where both are single-precision IEEE 754 representations.

1. The 1st number is 1.11₂ × 2^(129-127); the 2nd number is 1.0011₂ × 2^(130-127).
2. Compare the e fields: 1000 0001 < 1000 0010.
3. Align the exponents to 1000 0010, so the 1st number becomes 0.111₂ × 2^3.
4. Add the mantissas:

     0.1110
   + 1.0011
   = 10.0001

5. The sum is 10.0001₂ × 2^3 = 1.00001₂ × 2^4, so in IEEE 754 format it is: 0 10000011 000010...0

Multiplication

To multiply two floating-point values, first multiply their magnitudes and add their exponents:

    9.999 × 10^1
  × 1.610 × 10^-1
  = 16.098 × 10^0

You can then round and normalize the result, yielding 1.610 × 10^1.

The sign of the product is the exclusive-or (XOR) of the signs of the operands: if two numbers have the same sign, their product is positive; if they have different signs, the product is negative.

  0 ⊕ 0 = 0    0 ⊕ 1 = 1    1 ⊕ 0 = 1    1 ⊕ 1 = 0

This is one of the main advantages of using signed magnitude.

The history of floating-point computation

In the past, each machine had its own implementation of floating-point arithmetic in hardware and/or software, so it was impossible to write portable programs that would produce the same results on different systems. It wasn't until 1985 that the IEEE 754 standard was adopted. Having a standard at least ensures that all compliant machines will produce the same outputs for the same program.

Floating-point hardware

When floating point was introduced in microprocessors, there weren't enough transistors on chip to implement it; you had to buy a floating-point co-processor (e.g., the Intel 8087). As a result, many ISAs use separate registers for floating point. Modern transistor budgets enable floating point to be on chip; Intel's 486 was the first x86 with built-in floating point (1989). Even the newest ISAs have separate register files for floating point, which makes sense from a floor-planning perspective.

FPU as a co-processor on the chip (figure)

Summary

The IEEE 754 standard defines number representations and operations for floating-point arithmetic. Having a finite number of bits means we can't represent all possible real numbers, so errors will occur from approximations.