Fast Logarithms on a Floating-Point Device



Similar documents
Binary Search Algorithm on the TMS320C5x

Floating Point C Compiler: Tips and Tricks Part I

SN54165, SN54LS165A, SN74165, SN74LS165A PARALLEL-LOAD 8-BIT SHIFT REGISTERS

Analysis of Filter Coefficient Precision on LMS Algorithm Performance for G.165/G.168 Echo Cancellation

Binary Numbering Systems

SDLS068A DECEMBER 1972 REVISED OCTOBER Copyright 2001, Texas Instruments Incorporated

Monitoring TMS320C240 Peripheral Registers in the Debugger Software

Divide: Paper & Pencil. Computer Architecture ALU Design : Division and Floating Point. Divide algorithm. DIVIDE HARDWARE Version 1

Motor Speed Measurement Considerations When Using TMS320C24x DSPs

Binary Number System. 16. Binary Numbers. Base 10 digits: Base 2 digits: 0 1

Designing Gain and Offset in Thirty Seconds

EDI s x32 MCM-L SRAM Family: Integrated Memory Solution for TMS320C3x DSPs

RETRIEVING DATA FROM THE DDC112

SINGLE-SUPPLY OPERATION OF OPERATIONAL AMPLIFIERS

TMS320C67x FastRTS Library Programmer s Reference

Signal Conditioning Wheatstone Resistive Bridge Sensors

WHAT DESIGNERS SHOULD KNOW ABOUT DATA CONVERTER DRIFT

Implementing an In-Service, Non- Intrusive Measurement Device in Telecommunication Networks Using the TMS320C31

Theory of Operation. Figure 1 illustrates a fan motor circuit used in an automobile application. The TPIC kω AREF.

EDI s x32 MCM-L SRAM Family: Integrated Memory Solution for TMS320C4x DSPs

Audio Tone Control Using The TLC074 Operational Amplifier

TSL INTEGRATED OPTO SENSOR

TSL250, TSL251, TLS252 LIGHT-TO-VOLTAGE OPTICAL SENSORS

A Low-Cost, Single Coupling Capacitor Configuration for Stereo Headphone Amplifiers

Filter Design in Thirty Seconds

Guidelines for Software Development Efficiency on the TMS320C6000 VelociTI Architecture

SN28838 PAL-COLOR SUBCARRIER GENERATOR

CUSTOM GOOGLE SEARCH PRO. User Guide. User Guide Page 1

Wireless Subwoofer TI Design Tests

Multi-Transformer LED TV Power User Guide. Anderson Hsiao

Teaching DSP through the Practical Case Study of an FSK Modem

Controlling TAS5026 Volume After Error Recovery

Current-Transformer Phase-Shift Compensation and Calibration

High-Speed Gigabit Data Transmission Across Various Cable Media at Various Lengths and Data Rate

SN54F157A, SN74F157A QUADRUPLE 2-LINE TO 1-LINE DATA SELECTORS/MULTIPLEXERS

CSI 333 Lecture 1 Number Systems

How To Make A Two Series Cell Battery Pack Supervisor Module

SN54HC157, SN74HC157 QUADRUPLE 2-LINE TO 1-LINE DATA SELECTORS/MULTIPLEXERS

MBA Jump Start Program

How To Close The Loop On A Fully Differential Op Amp

Data Storage 3.1. Foundations of Computer Science Cengage Learning

The string of digits in the binary number system represents the quantity

ECE 0142 Computer Organization. Lecture 3 Floating Point Representations

APPLICATION BULLETIN

APPENDIX B. Routers route based on the network number. The router that delivers the data packet to the correct destination host uses the host ID.

Using C to Access Data Stored in Program Space Memory on the TMS320C24x DSP

Using the Texas Instruments Filter Design Database

NUMBER SYSTEMS. William Stallings

LSN 2 Number Systems. ECT 224 Digital Computer Fundamentals. Department of Engineering Technology

Positional Numbering System

Binary Division. Decimal Division. Hardware for Binary Division. Simple 16-bit Divider Circuit

Number Representation

Next-Generation BTL/Futurebus Transceivers Allow Single-Sided SMT Manufacturing

bq2114 NiCd or NiMH Gas Gauge Module with Charge-Control Output Features General Description Pin Descriptions

Application Report SLVA051

Data Storage. Chapter 3. Objectives. 3-1 Data Types. Data Inside the Computer. After studying this chapter, students should be able to:

Lecture 8: Binary Multiplication & Division

Simplifying System Design Using the CS4350 PLL DAC

Texas Instruments. FB PS LLC Test Report HVPS SYSTEM AND APPLICATION TEAM REVA

Notes on Factoring. MA 206 Kurt Bryan

How To Fix An Lmx9838 Bluetooth Serial Port Module With Bluetooth (Bluetooth 2) From A Bluetooth Bluetooth 4.2 Device With A Bluembee 2.2 Module

THE RIGHT-HALF-PLANE ZERO --A SIMPLIFIED EXPLANATION

Basics of Floating-Point Quantization

This Unit: Floating Point Arithmetic. CIS 371 Computer Organization and Design. Readings. Floating Point (FP) Numbers

QUADRO POWER GUIDELINES

Supply voltage Supervisor TL77xx Series. Author: Eilhard Haseloff

PURSUITS IN MATHEMATICS often produce elementary functions as solutions that need to be

2010/9/19. Binary number system. Binary numbers. Outline. Binary to decimal

Managing Code Development Using CCS Project Manager

SN54HC191, SN74HC191 4-BIT SYNCHRONOUS UP/DOWN BINARY COUNTERS

CHAPTER 5 Round-off errors

Smart Battery Module with LEDs and Pack Supervisor

Attention: This material is copyright Chris Hecker. All rights reserved.

Using C to Access Data Stored in Program Memory on the TMS320C54x DSP

Advanced Tutorials. Numeric Data In SAS : Guidelines for Storage and Display Paul Gorrell, Social & Scientific Systems, Inc., Silver Spring, MD

DUAL MONITOR DRIVER AND VBIOS UPDATE

Lecture 11: Number Systems

This 3-digit ASCII string could also be calculated as n = (Data[2]-0x30) +10*((Data[1]-0x30)+10*(Data[0]-0x30));

Understanding the Terms and Definitions of LDO Voltage Regulators

Solution for Homework 2

COMP 250 Fall 2012 lecture 2 binary representations Sept. 11, 2012

BA-35 Solar Quick Reference Guide

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay

Solving Exponential Equations

Useful Number Systems

CSE373: Data Structures and Algorithms Lecture 3: Math Review; Algorithm Analysis. Linda Shapiro Winter 2015

The BBP Algorithm for Pi

AN2604 Application note

AN48. Application Note DESIGNNOTESFORA2-POLEFILTERWITH DIFFERENTIAL INPUT. by Steven Green. 1. Introduction AIN- AIN+ C2

IMPORT/EXPORT CUSTOMER REVIEWS. User Guide. User Guide Page 1

8-bit Microcontroller. Application Note. AVR201: Using the AVR Hardware Multiplier

EVALUATING ACADEMIC READINESS FOR APPRENTICESHIP TRAINING Revised For ACCESS TO APPRENTICESHIP

Motorola 8- and 16-bit Embedded Application Binary Interface (M8/16EABI)

Digital Multiplexer and Demultiplexer. Features. General Description. Input/Output Connections. When to Use a Multiplexer. Multiplexer 1.

Expert Reference Series of White Papers. Simple Tricks To Ace the Subnetting Portion of Any Certification Exam COURSES.

CMOS Power Consumption and C pd Calculation

Vilnius University. Faculty of Mathematics and Informatics. Gintautas Bareikis

Transcription:

TMS320 DSP DESIGNER S NOTEBOOK Fast Logarithms on a Floating-Point Device APPLICATION BRIEF: SPRA218 Keith Larson Digital Signal Processing Products Semiconductor Group Texas Instruments March 1993

IMPORTANT NOTICE Texas Instruments (TI) reserves the right to make changes to its products or to discontinue any semiconductor product or service without notice, and advises its customers to obtain the latest version of relevant information to verify, before placing orders, that the information being relied on is current. TI warrants performance of its semiconductor products and related software to the specifications applicable at the time of sale in accordance with TI s standard warranty. Testing and other quality control techniques are utilized to the extent TI deems necessary to support this warranty. Specific testing of all parameters of each device is not necessarily performed, except those mandated by government requirements. Certain application using semiconductor products may involve potential risks of death, personal injury, or severe property or environmental damage ( Critical Applications ). TI SEMICONDUCTOR PRODUCTS ARE NOT DESIGNED, INTENDED, AUTHORIZED, OR WARRANTED TO BE SUITABLE FOR USE IN LIFE-SUPPORT APPLICATIONS, DEVICES OR SYSTEMS OR OTHER CRITICAL APPLICATIONS. Inclusion of TI products in such applications is understood to be fully at the risk of the customer. Use of TI products in such applications requires the written approval of an appropriate TI officer. Questions concerning potential risk applications should be directed to TI through a local SC sales office. In order to minimize risks associated with the customer s applications, adequate design and operating safeguards should be provided by the customer to minimize inherent or procedural hazards. TI assumes no liability for applications assistance, customer product design, software performance, or infringement of patents or services described herein. Nor does TI warrant or represent that any license, either express or implied, is granted under any patent right, copyright, mask work right, or other intellectual property right of TI covering or relating to any combination, machine, or process in which such semiconductor products or services might be or are used. Copyright 1997, Texas Instruments Incorporated

TRADEMARKS TI is a trademark of Texas Instruments Incorporated. Other brands and names are the property of their respective owners.

CONTACT INFORMATION US TMS320 HOTLINE (281) 274-2320 US TMS320 FAX (281) 274-2324 US TMS320 BBS (281) 274-2323 US TMS320 email dsph@ti.com

Contents Abstract... 7 Design Problem... 8 Solution... 8 Figures Figure 1. Accuracy Plot... 10 Tables Table 1. Squaring Operation of F0 = 1.5... 9

Fast Logarithms on a Floating-Point Device Abstract This document discusses a fast way to calculate logarithms (base 2) on a TMS320C30 or TMS320C40. This TMS320C30/C40 function calculates the log base two of a number in about half the time of conventional algorithms. The method can easily be scaled for faster execution if less accuracy is desired. The mathematics of the function is discussed in detail. There are plots to determine accuracy at different calculation levels, and a complete code listing. Fast Logarithms on a Floating-Point Device 7

Design Problem Solution What is the fastest way to calculate logarithms (base 2) on a TMS320C30 or TMS320C40? The following TMS320C30/C40 function calculates the log base two of a number in about half the time of conventional algorithms. Furthermore, the method can easily be scaled for faster execution if less accuracy is desired. The method is efficient because the algorithm uses the floating-point multipliers exponent/normalization hardware in a unique way. The following is a proof of the algorithm. The value of a floating point number X is given by: X = 2^EXP_old * mant_old If you then consider that the bit fields used to store the exponent and mantissa are actually integer, you will notice that the exponent is already in log2 (log base 2) form. In fact, the exponent is nothing more than a normalizing shift value. By converting both sides of the first equation to a logarithm, we find that the logarithm of the value becomes the sum of the exponent and mantissa in log form: log2(x) = EXP_old + log2(mant_old) (Log base two) Since EXP is in the exponent register, no calculation is needed and the value can be used directly as an integer. To extract the value of the exponent, PUSH, POP, and masking operations are used. The remaining mantissa conversion is done by first forcing the exponent bits to zero using an LDE 1.0 instruction. This causes the exponent term 2^EXP to equal 1.0, leaving 1.0 Value < 2.0. Then, by using the following identity, the logarithm of the mantissa can be extracted from the final result exponent. If the value (mant_old) is repeatedly squared, the sequence becomes: X_new = mant_old^n Where: 1.0 X_new < 2^N N = 1,2,4,8,16... Since the hardware multiplier will restructure the new value (X_new) during each squaring operation, we see that X_new will be represented by a new exponent (EXP_new) and mantissa (mant_new). X_new = 2^EXP_New * mant_new 8 SPRA218

By then applying familiar logarithm rules, we find that EXP_new holds the logarithm of Old_mant. This is best shown by setting the previous two equations equal to each other and taking the logarithm of both sides. mant_old^n = 2^EXP_new * mant_new N=1,2,4,8,16... N * log2(mant_old) = EXP_new + log2(mant_new) log2(mant_old) = EXP_new/N + log2(mant_new)/n This last equation shows that the logarithm of mant_old is indeed related to EXP_new. And as shown earlier, EXP_new can be separated from the new mantissa and used as the logarithm of the original mantissa. We also need to consider the divisor N, which is defined to be the series 1, 2, 4, 8, 16..., and EXP_new is an integer. The division by N becomes a shift for each squaring operation. What remains is to concatenate the bits of EXP_new to EXP_old and then repeat the process until the desired accuracy is achieved. Example Consider a mantissa value of 1.5 and an exponent value of 0 (giving an exponent multiplier 2^0, or 1.0). The TMS320C30/ C40 extended register bit pattern for the algorithm sequence is shown below. Table 1. Squaring Operation of F0 = 1.5 Exp S Mantissa 00000000 0 1000000000000000000000000000000 X =1.5 Exp=0 00000001 0 0010000000000000000000000000000 X^2 =2.25 Exp=1 00000010 0 0100010000000000000000000000000 X^4 =5.0625 Exp=2 00000100 0 1001101000010000000000000000000 X^8 =25.628906 Exp=4 00001001 0 0100100001101011101000001000000 X^16 =656.84083 Exp=9 00010010 0 1010010101010011111101110011111 X^32 =431.43988-E3 Exp=18 00100101 0 0101101010110110101000010101001 X^64 =186.14037-E9 Exp=37 01001010 0 1101010110010010001010101100011 X^128 =34.648238-E21 Exp=74 XXXXXXXX S MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM Hand-calculated value of log2(1.5) log2(1.5) = 0.58496250 = 1001010 111000000 xxxxxxx Åfirst 7 bits (exponent) mmm Åquick 3 bits (mantissa) Fast Logarithms on a Floating-Point Device 9

Figure 1. Accuracy Plot If you compare the hand-calculated value and the binary representation of log2(1.5) you will find that the sequence of bits in the exponent (seven bits worth) are equivalent to the seven MSBs of the logarithm. If the exponent could hold all the bits needed for full accuracy, then it would be possible to continue the operation for all 24 bits of the mantissa. Since there are only eight bits in the exponent and the MSBs is used for negative values, only seven iterations are possible before the exponent must be off-loaded and reinitialized to zero. By concatenating EXP_new to the previous exponent, longer strings of bits can be built for greater accuracy. The process is then repeated until the desired accuracy is achieved. Also remember that the original numbers exponent, which represents the whole number part of the result, becomes the eight MSBs of the final result. Another trick is to look at the three MSBs of the mantissa, and apply a roundup from the fourth bit, those same MSBs can be used as a quick extension of the exponent (logarithm). To visualize this, consider the following tabulated values and graph. NOTE: Notice how the fractional part is the same at the endpoints. 10 SPRA218

Example 1. Code Listing In the middle, only a slight bowing exists which can either be ignored or option-ally rounded for better accuracy. The maximum actually occurs at a mantissa value of 1/ln(2.0) or 1.442695. The value of log2(mant) at that point is 0.52876637, giving a maximum error of 0.086071. When finished, the bits representing the finished logarithm are in a fixed-point notation and will need to be scaled. This is done by using the FLOAT instruction followed by a multiplication by a constant scaling factor. If the final result needs to be in any other base, the scaling factor is simply adjusted for that base. Here are a few more helpful points. The round-off accuracy of the first three squaring operations will affect the final result if >21 mantissa bits are desired. A RND instruction placed after the first three MPYF R0,R0 instructions will remedy this, but adds to the cycle count. When the input value approaches 1.0, the result will be driven close to zero and accuracy will suffer. In this case, an input range comparison and a branch to a McLauren series expansion is used as a solution with minimal degradation in speed. This is because the power series converges quickly for input values close to 1.0. If you only need to calculate a visual quality logarithm, such as in spectrum analysis, the logarithm can often be calculated in one cycle. In this case the mantissa is substituted directly into the fractional bits of the logarithm giving a maximum error of 0.086 (about 3.5 bits). The one cycle arises from the need to remove the 2 s compliment sign bit in the TMS320C30/ C40 s mantissa. As far as your eye is concerned, it will never notice the difference! ************************************************************* * FAST logarithm for FFT displays * * >>>> NEED ONLY ADD ONE INSTRUCTION IN MANY CASES <<<< * ************************************************************* ; MPYF REAL,REAL,R0 ; calculate the magnitude MPYF IMAG,IMAG,R1 ; Note: sign bit is zero ADDF R1,R0 ; ASH -1,R0 ;<- One instruction logarithm! STF R0,OUT ; scaled externally in DAC ; ************************************************************* * _log_e.asm DEVICE: TMS320C30 * *************************************************************.global _log_e _log_e: POP AR1 ; return address -> AR1 POPF R0 ; X -> R0 LDF R0,R1 ; use R1 to accumulate answer Fast Logarithms on a Floating-Point Device 11

LDI 2,RC ; repeat 3x RPTB loop ; 8 + 13*3 + 9 ASH 7,R1 ; LDE 1.0,R0 ; EXP = 0 MPYF R0,R0 ; mant^2 MPYF R0,R0 ; mant^4 MPYF R0,R0 ; mant^8 MPYF R0,R0 ; mant^16 MPYF R0,R0 ; mant^32 MPYF R0,R0 ; mant^64 MPYF R0,R0 ; mant^128 PUSHF R0 ; offload 7 bits of exponent POP R3 ; ASH -24,R3 ; remove mantissa loop: OR R3,R1 ; R2 accumulates EXP <log2(man)> ASH 11,R1 ; Jam mant_r1 to top ; (concat. EXP_old) ASH -20,R0 ; align and append the ; MSBs of mant_r0 OR R0,R1 ; (accurate to 3 bits) PUSHF R1 ; PUSH EXP and Mantissa ; (sign is now data!) POP R0 ; POP as integer (EXP+FRACTION) BD AR1 ; FLOAT R0 ; convert EXP+FRACTION to float MPYF @CONST,R0 ; scale the result by 2^-24 ; and change base ADDI 1,SP ; restore stack pointer.data CONST_ADR:.word CONST CONST.long 0e7317219h ;Base e hand calc w/1 lsb round.end 12 SPRA218