Floating Point Numbers

Similar documents
Binary Division. Decimal Division. Hardware for Binary Division. Simple 16-bit Divider Circuit

Divide: Paper & Pencil. Computer Architecture ALU Design : Division and Floating Point. Divide algorithm. DIVIDE HARDWARE Version 1

This Unit: Floating Point Arithmetic. CIS 371 Computer Organization and Design. Readings. Floating Point (FP) Numbers

Binary Number System. 16. Binary Numbers. Base 10 digits: Base 2 digits: 0 1

ECE 0142 Computer Organization. Lecture 3 Floating Point Representations

2010/9/19. Binary number system. Binary numbers. Outline. Binary to decimal

Oct: 50 8 = 6 (r = 2) 6 8 = 0 (r = 6) Writing the remainders in reverse order we get: (50) 10 = (62) 8

LSN 2 Number Systems. ECT 224 Digital Computer Fundamentals. Department of Engineering Technology

1. Give the 16 bit signed (twos complement) representation of the following decimal numbers, and convert to hexadecimal:

Computer Science 281 Binary and Hexadecimal Review

Solution for Homework 2

Binary Numbering Systems

Today. Binary addition Representing negative numbers. Andrew H. Fagg: Embedded Real- Time Systems: Binary Arithmetic

The string of digits in the binary number system represents the quantity

Numbering Systems. InThisAppendix...

HOMEWORK # 2 SOLUTIO

To convert an arbitrary power of 2 into its English equivalent, remember the rules of exponential arithmetic:

Systems I: Computer Organization and Architecture

Levent EREN A-306 Office Phone: INTRODUCTION TO DIGITAL LOGIC

CSI 333 Lecture 1 Number Systems

CHAPTER 5 Round-off errors

Number Representation

Numerical Matrix Analysis

Correctly Rounded Floating-point Binary-to-Decimal and Decimal-to-Binary Conversion Routines in Standard ML. By Prashanth Tilleti

Measures of Error: for exact x and approximation x Absolute error e = x x. Relative error r = (x x )/x.

CS321. Introduction to Numerical Methods

Attention: This material is copyright Chris Hecker. All rights reserved.

This 3-digit ASCII string could also be calculated as n = (Data[2]-0x30) +10*((Data[1]-0x30)+10*(Data[0]-0x30));

Lecture 8: Binary Multiplication & Division

Useful Number Systems

Data Storage 3.1. Foundations of Computer Science Cengage Learning

EE 261 Introduction to Logic Circuits. Module #2 Number Systems

Binary Numbers. Binary Octal Hexadecimal

Binary Representation. Number Systems. Base 10, Base 2, Base 16. Positional Notation. Conversion of Any Base to Decimal.

Data Storage. Chapter 3. Objectives. 3-1 Data Types. Data Inside the Computer. After studying this chapter, students should be able to:

CDA 3200 Digital Systems. Instructor: Dr. Janusz Zalewski Developed by: Dr. Dahai Guo Spring 2012

2.2 Scientific Notation: Writing Large and Small Numbers

CS201: Architecture and Assembly Language

Chapter 2. Binary Values and Number Systems

1. Convert the following base 10 numbers into 8-bit 2 s complement notation 0, -1, -12

DNA Data and Program Representation. Alexandre David

Q-Format number representation. Lecture 5 Fixed Point vs Floating Point. How to store Q30 number to 16-bit memory? Q-format notation.

Binary Adders: Half Adders and Full Adders

Review of Scientific Notation and Significant Figures

Lecture 2. Binary and Hexadecimal Numbers

Arithmetic in MIPS. Objectives. Instruction. Integer arithmetic. After completing this lab you will:

Binary Representation

Goals. Unary Numbers. Decimal Numbers. 3,148 is s 100 s 10 s 1 s. Number Bases 1/12/2009. COMP370 Intro to Computer Architecture 1

AN617. Fixed Point Routines FIXED POINT ARITHMETIC INTRODUCTION. Thi d t t d ith F M k Design Consultant

THE EXACT DOT PRODUCT AS BASIC TOOL FOR LONG INTERVAL ARITHMETIC

2011, The McGraw-Hill Companies, Inc. Chapter 3

Number and codes in digital systems

( 214 (represen ed by flipping) (magnitude = -42) = = -42 (flip the bits and add 1)

Numerical Analysis I

What Every Computer Scientist Should Know About Floating-Point Arithmetic

Monday January 19th 2015 Title: "Transmathematics - a survey of recent results on division by zero" Facilitator: TheNumberNullity / James Anderson, UK

Lecture 11: Number Systems

CPEN Digital Logic Design Binary Systems

Bachelors of Computer Application Programming Principle & Algorithm (BCA-S102T)

COMPSCI 210. Binary Fractions. Agenda & Reading

ASCII Characters. 146 CHAPTER 3 Information Representation. The sign bit is 1, so the number is negative. Converting to decimal gives

NUMBER SYSTEMS. 1.1 Introduction

Chapter 7 - Roots, Radicals, and Complex Numbers

plc numbers Encoded values; BCD and ASCII Error detection; parity, gray code and checksums

CS101 Lecture 11: Number Systems and Binary Numbers. Aaron Stevens 14 February 2011

MEP Y9 Practice Book A

26 Integers: Multiplication, Division, and Order

FAST INVERSE SQUARE ROOT

THE BINARY NUMBER SYSTEM

How do you compare numbers? On a number line, larger numbers are to the right and smaller numbers are to the left.

Paramedic Program Pre-Admission Mathematics Test Study Guide

MATH-0910 Review Concepts (Haugen)

Number Systems and Radix Conversion

Floating point package user s guide By David Bishop (dbishop@vhdl.org)

Z80 Instruction Set. Z80 Assembly Language

Chapter 5. Binary, octal and hexadecimal numbers

Fractions and Linear Equations

Numeral Systems. The number twenty-five can be represented in many ways: Decimal system (base 10): 25 Roman numerals:

A new binary floating-point division algorithm and its software implementation on the ST231 processor

Computers. Hardware. The Central Processing Unit (CPU) CMPT 125: Lecture 1: Understanding the Computer

MACHINE INSTRUCTIONS AND PROGRAMS

High-Precision C++ Arithmetic

Figure 1. A typical Laboratory Thermometer graduated in C.

Copy in your notebook: Add an example of each term with the symbols used in algebra 2 if there are any.

Chapter 1: Digital Systems and Binary Numbers

Zuse's Z3 Square Root Algorithm Talk given at Fall meeting of the Ohio Section of the MAA October College of Wooster

Unsigned Conversions from Decimal or to Decimal and other Number Systems

Digital Design. Assoc. Prof. Dr. Berna Örs Yalçın

Session 29 Scientific Notation and Laws of Exponents. If you have ever taken a Chemistry class, you may have encountered the following numbers:

Florida Math for College Readiness

A NEW REPRESENTATION OF THE RATIONAL NUMBERS FOR FAST EASY ARITHMETIC. E. C. R. HEHNER and R. N. S. HORSPOOL

Activity 1: Using base ten blocks to model operations on decimals

Digital System Design Prof. D Roychoudhry Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Recall the process used for adding decimal numbers. 1. Place the numbers to be added in vertical format, aligning the decimal points.

Digital Logic Design. Introduction

Multiplying and Dividing Fractions

YOU CAN COUNT ON NUMBER LINES

Implementation of Modified Booth Algorithm (Radix 4) and its Comparison with Booth Algorithm (Radix-2)

Chapter 5 Instructor's Manual

Transcription:

Floating Point Numbers Integer Representation & Arithmetic Unsigned Signed magnitude 2 s compliment Unsigned Number A = n 1 i = 0 2 i a i Ex: 01010101 2 = 85 10 00000000 2 = 0 10 11111111 2 =255 10 The range of the number [0, 2 n -1]. (for n bits number) For 8-bit min: 00000000, max: 11111111 No negative number!

Sign-Magnitude sign + magnitude sign 0 means positive, 1 means negative Ex: +18 10 = 00010010 2, -18 10 = 10010010 2 Range: [-(2 (n-1) 1), (2 (n-1) 1)] For 8 bit: max = 01111111 2 = 127 10, min = 11111111 2 = - 127 10 Problems Sign check Two representations of zero (+0 and -0) Two s Compliment MSB is the sign bit: 0 positive, 1 negative For the rest of the bits When MSB = 0, same as the unsigned binary number When MSB = 1, invert each bit in the unsigned number and then add 1 Ex: unsigned number 1710 10 = 00010001 2 2 s complement number for 1710 10 = 00010001 2 2 s complement number for -1710 10 = 11101111 2 Two s Compliment Number range [-2 (n-1), 2 (n-1) -1] For 8 bit number: min = 10000000 2 = -128 10 max = 01111111 2 = 127 Converting n-bit to m-bit (n<m) Appending m-n copies of sign bit Ex: 4-bit 0101 1101 8-bit 00000101 11111101

2 s Compliment Addition/Subtraction Addition Just like the pencil-and-paper algorithm Fixed number of bits Don t have to worry about the sign bits Overflow problem The result exceeds the range of the number system Ex: -3 6 = -3 (1101) + (-6) (1010) = -9 (0111 - +7 ) Adding two large numbers with the same sign How to check it: Two operands have the same sign but the result have different sign. Subtraction Transform to an addition Examples Overflow or not? 1101101110 11011 1100100110-0110110011 1101101110 011111 Real Numbers Numbers with fractions Could be done in pure binary 1001.1010 = 2 4 + 2 0 +2-1 + 2-3 =9.625 Where is the binary point? Fixed? Very limited Floating? How do you show where it is?

Floating Point Representation Scientific notation Ex: 27600000 = 2.76 x 10 7 0.000000276= 2.76x 10-7 Floating point binary representation +/-.significand ifi x 2 exponent Ex: 32-bit floating number Sign bit (1) Exponent (8) Significand or Mantissa (23) Floating Point Representation Sign bit 0 positive 1-- negative Exponent is in biased notation What we mean by biased notation? The bias: A fixed value, usually equals 2 K-1 1 (where K is length of the exponent) The true exponent should be the one that subtract the bias from the value in the exponent field Floating Point Representation Exponent is in biased notation Example 8 bit exponent field (K=8) value in the exponent field 0b10101010 = 170 bias 2 K-1 1= 127 Pure value range 0-255, current value = 170 Subtract 127 to get correct value, i.e. 170 127 = 43 Range -127 to +128 Why biased notation? Bias numbers can be treated similar to unsigned integers with order of the number unchanged Easy for comparing two floating numbers

Floating Point Representation Sign Exponent is in biased notation The significand (mantissa) Normalization Exponent is adjusted so that the leading bit (MSB) of mantissa is 1. Normalized form +/- 1.bbbbbbb x 2 (+/- E) Ex: Not normalized 0.110 x 25 Normalized 1.100 x 24 Since it is always 1 there is no need to store it Floating Point Examples From actual exponent to machine representation: 10100 2 = 20 10 20 10 + 127 10 (bias) = 147 10 = 10010011 2-10100 2 + bias (0111 1111 2 ) = 01101011 2 From machine representation to actual exponent: biased value = 10010011 2 = 147 actual value = 147 10-127 10 = 20 biased value = 01101011 2 = 107 actual value = 107 10-127 10 = -20 Several Issues Expressible numbers (for a 32 bit number) Overflow/underflow Negative/Positive overflow/underflow Representation of zero? Special pattern, e.g. both exponent and mantissa are zero s Accuracy Not represent more individual values Extend the range Not space evenly

Expressible Numbers Absolute value Largest largest significand ( 1.1 1) + largest exponent (11 1) = (2 2-23 ) x 2 128 Smallest smallest significand (1.0 0) + smallest exponent (00 0) = 1 x 2-127 Negative number range: [-(2 2-24 ) x 2 128, -2-127 ] Positive number range: [2-127, (2-2 -23 )x2 128 ] Several Issues Expressible numbers (for a 32 bit number) Overflow/underflow Negative/Positive overflow/underflow Representation of zero? Special pattern, e.g. both exponent and mantissa are zero s Accuracy Not represent more individual values Extend the range Not space evenly Overflow When an arithmetic operation leads to a result that is out of the expressible regions Negative/Positive overflow/underflow

Several Issues Expressible numbers (for a 32 bit number) Overflow/underflow Negative/Positive overflow/underflow Representation of zero? Special pattern, e.g. both exponent and mantissa are zero s Accuracy Not represent more individual values Extend the range Not space evenly Density of Floating Point Numbers Increase accuracy? Use more bits, e.g. double IEEE Standard Standard for floating point storage 32 and 64 bit standards 8 and 11 bit exponent respectively Extended formats (both mantissa and exponent) for intermediate results

IEEE 754 Formats Example IEEE 754 binary single/double representation of -0.75 10-0.75 10 = -0.11 2 = -1.1 x 2-1 (normalized scientific notation) For single precision sign = 1 (negative) biased value = -1 + 127 = 126 10 = 01111110 2 (unsigned) exponent = 01111110 significand (23bits): 1000 0000 0000 0000 0000 000 For double precision sign = 1 (negative) biased value = -1 + 1023 = 1022 10 = 01111111110 2 (unsigned) exponent : 01111111110 significand (52-bits): 1000 0000 0000 0000 0000 000 0 Example Decimal value for IEEE 754 binary single representation Sign 1 Exponent 10000001 (8bits) Significand 010000 0 (23bits) Significand = 1.01 2 = 1.25 Biased value = 10000001 2 (unsigned) = 129 Actual exponent value = 129 127 = 2 Sign = 1 (negative) So the decimal value = - 1.25 x 2 2 = -5.0

Exponent Representation Summary Biased representation Bias Constant (2 (k-1) -1) k=8, bias = 127; k=11, bias = 1023 Actual value to machine representation biased value = actual value + bias Convert the biased value to unsigned integer Machine representation to actual value Convert the biased value (unsigned integer) to decimal value actual value = biased value (decimal) - bias FP Arithmetic +/- Check for zeros Align significands (adjusting exponents) Add or subtract significands Normalize result FP Arithmetic Examples Decimal 1.03x10 0-4.56 x 10-2 = 1.03 x 10 0-0.0456 x 10 0 = 0.9844 x 10 0 = 9.844 x 10-1 Binary 1.101 x 2-1010 + 1.011 x 2-1011 =1.101 x 2-1010 + 0.1011 x 2-1010 =10.0101 x 2-1010 = 1.00101 x 2-1001

Summary Integer representation/range of numbers Unsigned Signed magnitute 2 s complement 2 s complement addition/subtraction how overflow Floating point representation sign/exponent/significand biased representation data range over/under flow Floating point addition/subtraction