2 Computer arithmetic


Digital systems are implemented on hardware with finite word length. Implementations require special attention because of possible quantization and arithmetic errors.

Part I: Real number representations
- Characteristics
- Basis: integers
- Fixed-point numbers
- Floating-point formats

Part II: Design flows of fixed-point solutions
- Analysis-based flow
- Simulation-based flow

2.1 Background: real number representations

Two categories: fixed-point and floating-point. In both cases, a certain number of significant bits represent an integer value, and an associated scaling maps that integer to some real number.
- Floating-point representation: sign (s) + significand (f) form the significant bits; the exponent (e) provides the scaling.
- Fixed-point representation: the significant bits encode an integer/fraction; the scaling is fixed at design time.

Exponent: floating-point hardware performs the appropriate scaling at run time. For fixed-point numbers this is a design-time decision, and it can be hard!

Implications for choosing the computational platform:
- Do we really need an optimized fixed-point solution?
- Or do we want an easier, money-saving design process?
- In addition, the floating-point HW might not contain all features of the standards, and the word length might also be limited.

2.1.1 Fixed-point representation

Fixed-point numbers are based on scaling of integers (unsigned or signed). Two's complement integers are used as the basis for signed fixed-point.

(1) Binary point scaling: the value represented is Ṽ = V_int / 2^n, where V_int is the integer value represented by the bit string and n is the number of fraction bits. Notation: up.n for unsigned and sp.n for signed formats, where p is the word length and n is the number of fraction bits. For example, s8.3 has an 8-bit word with 3 fraction bits (sign bit, 4 integer bits, then the binary point, then 3 fraction bits).
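As a small sketch of binary point scaling (not part of the notes; the helper names `to_fixed` and `to_real` are illustrative), the encoding Ṽ = V_int / 2^n can be exercised in Python:

```python
def to_fixed(x, p, n):
    """Quantize real x to the integer of a signed sp.n fixed-point format:
    round to nearest and saturate to the two's complement range of p bits."""
    v = round(x * 2**n)
    lo, hi = -2**(p - 1), 2**(p - 1) - 1   # two's complement integer range
    return max(lo, min(hi, v))

def to_real(v, n):
    """Value represented by integer v with n fraction bits: v / 2**n."""
    return v / 2**n

# s8.3: 8-bit word, 3 fraction bits -> precision ulp = 2**-3 = 0.125
v = to_fixed(3.14, 8, 3)
print(v, to_real(v, 3))        # 25 3.125
print(to_fixed(100.0, 8, 3))   # 127: out-of-range values saturate
```

Note how 3.14 lands on the nearest representable multiple of 0.125, illustrating that the precision of the format equals the weight of its least significant bit.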

(2) Slope-bias scaling: the value represented is Ṽ = s·V_int + b, where s > 0 is called the slope and b is the (offset) bias. The slope can be represented as s = f·2^e, where 1 ≤ f < 2 is called the fractional slope and e gives the radix point position.
- Binary point scaling is a special case of this: b = 0, f = 1, e = −n.
- The precision (the weight of the least significant bit) is equal to the slope.
- The goal of slope (and bias) selection: utilization of the full dynamic range.

Example. We want to represent angles in the range [−π, π) with maximal precision using 6 bits.
1) Binary point scaling:
- We must have two integer bits, since ⌊π⌋ = 3 = 011 in binary.
- Thus the format to be used is s6.3.
- The range is [−4, 4 − 2^−3].
- The precision of the format is ulp = 2^−3 = 0.125.
2) Slope-bias scaling:
- Using zero bias, we set s = π/2^5 to use the full dynamic range.
- The range is then π·[−1, 1 − 2^−5] and ulp = s ≈ 0.0982.
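The two scalings in the example can be compared numerically; this illustrative sketch reproduces the ulp values quoted above:

```python
import math

# Binary point scaling, format s6.3: 1 sign + 2 integer + 3 fraction bits
ulp_bp = 2**-3                       # 0.125

# Slope-bias scaling with zero bias: 6-bit signed integers span [-32, 31],
# so slope s = pi / 2**5 maps them onto pi * [-1, 1 - 2**-5]
s = math.pi / 2**5
ulp_sb = s                           # the precision equals the slope

print(ulp_bp, round(ulp_sb, 4))      # 0.125 0.0982

# Ratio of the precisions: slope-bias wastes none of the code space on
# values outside [-pi, pi), so its step size is smaller.
print(round(ulp_sb / ulp_bp, 3))
```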

2.1.2 Fixed-point arithmetic

(1) Addition: guard bits
- sp.n + sp.n → s(p+1).n
- Guard bits g are added to the accumulator of the MAC data path: an n-bit by n-bit multiply produces 2n bits, and the additions accumulate at 2n+g bits.
- g ≥ ⌈log2(N)⌉, where N is the number of terms to be added.
- In MAC-based FIR filtering, the bound for g depends on the coefficient values.

(2) Multiplication: the law of conservation of bits:
- sp1.n1 × sp2.n2 → s(p1+p2).(n1+n2)
- Note: one extra integer bit is introduced (e.g. s4.3 × s4.3 → s8.6; the operands have 4−3−1 = 0 integer bits, the product 8−6−1 = 1).
- If the largest negative value does not occur, that extra integer bit is not needed.
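Both rules above can be checked exhaustively for small formats; this sketch (the helper name `product_format` is illustrative) confirms the product format and the guard-bit bound:

```python
import math

def product_format(p1, n1, p2, n2):
    """Law of conservation of bits: sp1.n1 * sp2.n2 -> s(p1+p2).(n1+n2)."""
    return p1 + p2, n1 + n2

# s4.3 * s4.3 -> s8.6
assert product_format(4, 3, 4, 3) == (8, 6)

# The extra integer bit is needed only for the largest negative value:
# s4.3 integers span [-8, 7]; every product fits in 7 bits except (-8)*(-8).
products = [a * b for a in range(-8, 8) for b in range(-8, 8)]
print(max(products))                 # 64, just outside the 7-bit range [-64, 63]

# Guard bits for accumulating N full-precision products: g >= ceil(log2(N))
N = 100
g = math.ceil(math.log2(N))
print(g)                             # 7
```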

Modes of arithmetic:
- In integer arithmetic, fixed-point values are treated as integers, and the programmer must take care that overflows do not occur (e.g. intermediate scaling operations, coefficient magnitude restrictions).
- In fractional arithmetic, one uses fractional fixed-point values. Multiplication and storing the result can be implemented in a special manner:
  (1) integer multiplication of the operands (saturating to handle −1 × −1);
  (2) arithmetic shift left, i.e. moving the binary point past the duplicated sign bit;
  (3) (rounding and) taking the most significant bits as the result.

Multiplication by a power of two:
(1) Can be implemented simply as an arithmetic shift.
- Left: may cause overflow; right: precision may be lost.
(2) Or as a movement of the binary point to the right/left.
- Overflow and precision loss are not possible.
- This is the basis of the CORDIC algorithm discussed later.
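The fractional multiplication steps above can be sketched for the common Q15 (s16.15) case; this is an illustrative model (the name `frac_mul_q15` is not from the notes), not a definitive DSP implementation:

```python
def frac_mul_q15(a, b):
    """Fractional multiply of two Q15 (s16.15) integers, returning Q15:
    integer multiply, saturate the -1 * -1 case, shift left past the
    duplicated sign bit, and keep the 16 most significant bits (truncated)."""
    prod = a * b                       # s32.30 product
    if prod == 1 << 30:                # (-1) * (-1) would need an extra bit:
        return (1 << 15) - 1           # saturate to the largest Q15 value
    return (prod << 1) >> 16           # drop duplicate sign bit, take top bits

# 0.5 * 0.5 = 0.25: in Q15, 16384 * 16384 -> 8192
print(frac_mul_q15(16384, 16384))      # 8192
# -1 * -1 saturates just below +1
print(frac_mul_q15(-32768, -32768))    # 32767
```

The left shift in step (2) is exactly the "binary point movement" mentioned above: the s32.30 product has two sign bits, and discarding one aligns the result so its top 16 bits form a valid Q15 value.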

(3) Signal quantization: rounding of arithmetic results to specific word lengths.
- There are different kinds of rounding methods:
  (1) truncation: simply discard the least significant bits
  (2) round-to-nearest
  (3) convergent rounding
  (4) magnitude truncation
- Rounding introduces roundoff noise e: the quantization s → s_q is modelled as s_q = s + e.
- Depending on the rounding method, the noise can be biased (expectation E{e} ≠ 0).
- The quantization noise gets amplified through the noise transfer functions.

(4) Overflow handling: hardware may use guard bits, wrapping, or saturation.
- In the case of wrapping, overflows are neglected in HW. Therefore, one must either (1) ascertain that the final result is within the range, or (2) check that overflows cannot occur (by analysis/simulation).
- Saturating operations are not associative! Therefore some standards for algorithms may specify the exact order in which the operations are performed.
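Two of the points above, the bias of truncation and the non-associativity of saturation, are easy to demonstrate; the helper names below are illustrative:

```python
def truncate(v, k):
    """Truncation: discard the k least significant bits.
    For two's complement this always rounds toward minus infinity (biased)."""
    return v >> k

def round_nearest(v, k):
    """Round-to-nearest: add half an ulp before discarding k bits."""
    return (v + (1 << (k - 1))) >> k

def sat_add8(a, b):
    """Saturating signed 8-bit addition."""
    return max(-128, min(127, a + b))

# Truncation vs. round-to-nearest on a negative value:
print(truncate(-5, 1), round_nearest(-5, 1))   # -3 -2

# Saturation is not associative: the same three terms, two orders, two results.
print(sat_add8(sat_add8(100, 100), -100))      # 100+100 saturates to 127 -> 27
print(sat_add8(100, sat_add8(100, -100)))      # 100 + 0 -> 100
```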

2.1.3 Floating-point representation

Design of the representation is based on HW implementation issues. Bit string parts: (1) sign bit, (2) exponent, and (3) significand (the order is important!).

Modes of bit string interpretation:
- Normalized number: the exponent is adjusted so that maximum precision is achieved.
  Ṽ = (−1)^s · (1 + f) · 2^(e − e_b),
  where s is the value of the sign bit, e is the unsigned integer encoded by the exponent part, e_b is the exponent bias, and f is the unsigned fractional fixed-point value encoded by the significand.
- Zero: representations of +0 and −0.
- Denormalized number: for representing small numbers, which fill the underflow gap around zero.
- Infinity and not-a-number (NaN).

IEEE 754 standard formats are commonly used:
- single precision (32 bits = sign bit + 8 exponent bits + 23 significand bits; e_b = 127)
- double precision (64 bits = 1 + 11 + 52; e_b = 1023)
- half precision (16 bits = 1 + 5 + 10; e_b = 15): used especially in computer graphics

A non-standard format may be designed for the arithmetic of a particular application. Support for all modes might not be needed in HW.

Example. IEEE 754 single precision

Mode           When
normalized     e ∈ {1, 2, ..., 254}
zero           e = 0,   f = 0
denormalized   e = 0,   f ≠ 0
infinity       e = 255, f = 0
not-a-number   e = 255, f ≠ 0
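The interpretation modes in the table, and the normalized-number formula, can be sketched as a bit-level decoder; this is an illustrative model (function names are not from the notes), covering only the value of normalized numbers:

```python
import struct

def classify_f32(bits):
    """Classify a 32-bit single-precision pattern according to the table above."""
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    if e == 255:
        return "infinity" if f == 0 else "nan"
    if e == 0:
        return "zero" if f == 0 else "denormalized"
    return "normalized"

def decode_f32(bits):
    """Value of a normalized single: (-1)**s * (1 + f) * 2**(e - 127)."""
    s = bits >> 31
    e = (bits >> 23) & 0xFF
    f = (bits & 0x7FFFFF) / 2**23      # unsigned fraction in [0, 1)
    return (-1)**s * (1 + f) * 2**(e - 127)

bits = struct.unpack(">I", struct.pack(">f", -6.25))[0]
print(classify_f32(bits), decode_f32(bits))    # normalized -6.25
print(classify_f32(0x7F800000))                # infinity
print(classify_f32(0x00000001))                # denormalized
```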

2.2 Design of a fixed-point signal processing solution

1. The first step in the development of signal processing systems is the development/selection of the algorithms:
- design of a floating-point reference model
- exploration of the numerical properties of the algorithm using that model
- e.g. Matlab provides excellent tools for this design work

2. Then a fixed-point model should be designed, using the floating-point model as a reference:
- the fixed-point model serves as the verification model for the final implementation
- details of the fixed-point design reflect the primitives of the hardware, e.g. split ALUs, CORDIC processors, saturation/wrapping arithmetic...
- sufficient word lengths for the various number objects must be determined (what ranges and precisions are needed?)

Two basic approaches to the conversion work are:
- Analytic approach: favored by algorithm designers who do not have a complete understanding of the hardware.
- Simulation approach: favored by HW designers who frown upon the mathematics of the models.

2.2.1 Analysis based design flow

Design outline:
- Algorithm design via mathematical modelling
- Analysis of the coefficient quantization effects
- Analysis of the rounding and scaling effects
- Selection of the word lengths
- Simulation to verify the design results

(1) Coefficient quantization: one must check that the specifications are met and, in the case of IIR filtering, that the filter remains stable.

Analysis example:
- Assume an FIR filter with N coefficients.
- The sp.(p−1) format is to be selected for the coefficients.
- Quantization introduces a parallel filter: H_q(z) = H(z) + E(z).
- The maximum quantization error for the format is ulp/2, with ulp = 2^−(p−1). Thus |e(n)| ≤ 2^−p.
- Assuming the worst-case error for all coefficients, we get |E(ω)| ≤ 2^−p · N.
- The maximum stop-band attenuation is therefore bounded by 20·log10(2^−p · N).
- Given a specification, this yields a bound on the word length p needed to meet it.
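The worst-case bound above can be inverted into a quick word-length estimate; this sketch (the helper name `min_wordlength` and the 64-tap/60 dB numbers are illustrative, not from the notes) searches for the smallest p whose error floor stays below a stop-band specification:

```python
import math

def min_wordlength(N, stopband_dB):
    """Smallest word length p such that the worst-case coefficient quantization
    error floor 20*log10(2**-p * N) stays at or below -stopband_dB, for an
    N-tap FIR filter with coefficients in sp.(p-1) format."""
    p = 1
    while 20 * math.log10(2**-p * N) > -stopband_dB:
        p += 1
    return p

# Example: a 64-tap filter with a 60 dB stop-band specification.
# 20*log10(2**-16 * 64) = 20*log10(2**-10) ~= -60.2 dB, so 16 bits suffice.
print(min_wordlength(64, 60))   # 16
```

This kind of estimate only bounds the worst case; as the notes point out, it is a starting point to be refined by more detailed analysis or simulation.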

(2) Effect of the roundoff noise
- One can think in terms of the impulse response h(n) from the round-off location to the output.
- If the quantization interval is q, then the signal quantization noise power is σ²_ir = q²/12.
- The effect at the output is σ²_or = σ²_ir · Σ_{n=0}^{∞} h²(n).
- If the noises are assumed uncorrelated, the effects of the round-off locations sum up.
- However, the noises can be correlated; e.g. in telecommunications the signals are often periodic.
- Such correlation leads to peaking in the noise spectrum.

(3) Overflows in IIR structures
- The adders on the feedback path may overflow.
- Use of saturation arithmetic may lead to instability and spoil the response.
- Another solution is to perform input scaling to reduce its range.

Note on analysis:
- Approximation formulas may only provide quick checks and starting points for the design.
- More detailed derivations are needed, or a simulation-based approach must be taken.
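As a worked instance of the output-noise formula (illustrative; the first-order filter and Q15 interval are assumed, not taken from the notes): for y(n) = a·y(n−1) + x(n), the impulse response from a quantizer at the input of the adder to the output is h(n) = a^n, so the sum Σ h²(n) has the closed form of a geometric series.

```python
# Round-off noise at the output of a first-order IIR section y(n) = a*y(n-1) + x(n)
q = 2**-15                      # quantization interval (e.g. Q15 rounding)
var_in = q**2 / 12              # noise power injected at the round-off location

a = 0.9                         # feedback coefficient, |a| < 1 for stability
# sum_{n=0}^{inf} h(n)^2 = sum a**(2n) = 1 / (1 - a**2)
gain = 1 / (1 - a**2)
var_out = var_in * gain

print(round(gain, 3))           # 5.263: the filter amplifies the noise power ~5x
```

The closer |a| is to 1 (a sharper, narrower-band filter), the larger this gain, which is why recursive structures are the critical spots in a round-off noise analysis.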

2.2.2 Tool/simulation based design flow

Simulation-based development of the fixed-point reference model can be based on writing C/C++ code.
- Alternative: in Matlab one can use the Fixed-Point Toolbox.
- The Simulink environment also contains features that can be used to implement fixed-point simulations.

Outline of the design process:
1. Algorithm design using floating-point precision
2. Conversion into a full-precision fixed-point model
3. Determination of the coefficient quantization effects
4. Determination of the maxima via value logging
5. Reduction of the word lengths for the implementation platform

Demonstration: fixed-point modelling support in the Simulink environment.
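Step 4 of the outline, value logging, can be sketched as a small wrapper; this is an illustrative mock-up (the class name and the sample values are assumptions), showing how the logged extrema translate into a required number of integer bits:

```python
import math

class LoggedSignal:
    """Record the extrema a signal takes during a floating-point simulation,
    then derive how many integer bits a signed fixed-point format needs
    to cover the observed range without overflow."""
    def __init__(self):
        self.lo = float("inf")
        self.hi = float("-inf")

    def log(self, x):
        self.lo = min(self.lo, x)
        self.hi = max(self.hi, x)
        return x

    def integer_bits(self):
        m = max(abs(self.lo), abs(self.hi))
        return max(0, math.ceil(math.log2(m)))

sig = LoggedSignal()
for x in [0.4, -2.7, 3.1, -0.9]:
    sig.log(x)
print(sig.lo, sig.hi, sig.integer_bits())   # -2.7 3.1 2
```

The remaining word-length budget (step 5) then goes to fraction bits, trading off against the round-off noise analysed in Section 2.2.1.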