# LECTURE 1: Number representation and errors

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 LECTURE 1: Number representation and errors September 10, 2012

2 1 The representation of numbers in computers is usually not based on the decimal system, and therefore it is necessary to understand how to convert between different systems of representation. Computers also have a limited amount of bits for representing numbers, and for this reason numbers with arbitrarily large magnitude cannot be represented nor can floating-point numbers be represented with arbitrary precision. In this chapter, we define the key concepts of number representation in computers and discuss the main sources of errors in numerical computing. 0.1 Representation of numbers in different bases Decimal system We first review the representation of numbers in the familiar decimal system and then generalize the concept for systems with different base numbers. Integer part In the decimal system, the digits 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9 are used for the representation of numbers. The individual digits in a number, such as 37294, are the coefficients of powers of 10: = = In general, a string of digits represents a number according to the formula Fractional part a n a n 1...a 1 a 0 = a a a n 1 10 n 1 + a n 10 n A number between 0 and 1 is represented by a string of digits to the right of a decimal point. The individual digits represent the coefficients of negative powers of 10. In general, we have the formula For example, 0.b 1 b 2 b 3... = b b b General form of base-10 numbers = 7/10 + 2/ / /10000 = The representation of a real number in the decimal system is given by the following general form in which the integer part is the first summation and the decimal part is the latter summation. (a n a n 1...a 1 a 0.b 1 b 2 b 3...) 10 = n k=0 a k 10 k + k=1 b k 10 k The notation () 10 is used for the decimal system to indicate that the number 10 is used as the base number.

3 Base β numbers In computers, the number 10 is generally not used as the base for number representation. Instead, other systems, such as the binary (2 as the base), the octal (8 as the base), and the hexadecimal (16 as the base) systems are commonly used in computing. In the general form, the base is noted as β. General form of base β numbers The digits used in the general β representation are 0,1,2,...,β 1. For example, in the octal representation of a number, the digits used are 0,1,2,3,4,5,6, and 7. The general form of base β representation is given by (a n a n 1...a 1 a 0.b 1 b 2 b 3...) β = n k=0 a k β k + k=1 b k β k The separator between the integer and fractional part is called the radix point since decimal point is reserved for base-10 numbers. For example, in the octal representation, the individual digits before the radix point refer to increasing powers of 8: (21467) 8 = = 7 + 8(6 + 8(4 + 8(1 + 8(2)))) = 9015 To convert this number to the decimal system, we can build the nested form of the number by taking a common denominator and then replacing each of the numbers by its representation in the decimal system. We will soon discuss the conversion between different bases in more detail. A number between 0 and 1, when expressed in the octal system, is given by the coefficients of negative powers of 8. For example, Conversion of integer part ( ) 8 = = 8 5 ( (2 + 8(6 + 8(3)))) = 15495/32768 = We now consider a general procedure for converting from one base to another. We do this by considering the conversion of the integer and fractional parts separately. Consider an integer N in the number system with base α: N = (a n a n 1...a 1 a 0 ) α = n a k α k k=0 In order to convert N to the number system with base β, we first write N in its nested form N = a 0 + α(a 1 + α(a α(a n 1 + α(a n ))...)) and then replace each of the numbers on the right by its representation in the β-arithmetic. The replacement requires a table showing how each of the numbers is represented in the

4 3 β-system. In addition, a base-β multiplication table may be required. As an example, consider the conversion of the decimal number 3781 to binary form. (3781) 10 = (8 + 10(7 + 10(3))) = (1) 2 + (1010) 2 ((1000) 2 + (1010) 2 ((111) 2 + (1010) 2 (11) 2 )) = ( ) 2 The following table can be used for the replacement of base-10 numbers by their binary counterparts: DEC BIN This calculations is easy for computers but quite tedious for humans. Therefore, another procedure can be used for hand calculations. Taking the conversion from the decimal to the binary system as an example, we continuously divide by 2, until 0 remains. Then the converted number is given by the remainders of the division in reversed order.

5 4 For example, here each number on the left hand side is obtained by dividing the previous number by 2 and taking the integer part, the quotient is marked on the right hand side. Thus the converted number is ( ) 2. Quotients Remainders Conversion of fractional part The conversion of the fractional part can be done by the following direct, but somewhat useless approach: (0.372) 10 = ( 1 (011) = (1010) 2 (1010) 2 ( (111) 2 + )) 1 ((010) 2 ) (1010) 2 Since dividing binary arithmetic is not straightforward, we look for alternative ways of doing the conversion such that the arithmetic can be carried out in the decimal system. Suppose that x is in the range 0 < x < 1 and given in the β representation: Observe that x = c k β k = (0.c 1 c 2 c 3...) β k=1 βx = (c 1.c 2 c 3 c 4...) β because we only need to multiply by β to shift the radix point. Thus the unknown digit c 1 can be described as the integer part of βx, denoted as I (βx). Denote the fractional part of βx by F (βx). The process can be repeated until all the unknown digits c k have been converted: d 0 = x d 1 = F (βd 0 ) c 1 = I (βd 0 ) d 2 = F (βd 1 ) c 2 = I (βd 1 ).

6 5 For example, here we repeatedly multiply by 2 and remove the integer parts to get (0.372) 10 = ( ) 2. Products Digits etc. 0.2 Representation of numbers in computers In this section, we discuss how numbers are represented in typical computers and how arithmetic between the numbers works. We begin by discussing integer numbers and then continue to floating-point numbers Integer numbers A number in integer presentation is exact. Arithmetic between two integer numbers is also exact provided that (i) the answer is not outside the range of representable numbers, and that (ii) division is interpreted as producing an integer result (throwing any remainder away). Unsigned integers are easy to represent. All digits are represented with bits 0 and 1, and thus the string of digits can be directly interpreted as a binary number. For example, with eight bits, we have = ( ) 10 = (57) 10 To include also negative numbers, we must assign a separate sign bit. The first bit of the string is the sign bit which is zero for positive numbers and one for negative numbers. The most often used method for obtaining the representation for negative numbers is a method called two s complement: In the binary representation of a positive number x, invert all the bits (0 1), and add 1 to get the binary representation of x. For example, if we have eight bits for the representation (of which one is for the sign and seven for the digits), then +2 = = = = =

7 6 The largest representable number in an n-bit system is 2 n 1 1. For example, a 32-bit integer number can have values between = = ( ) 2 and 2 31 = = ( ) 2 In the minimum value, the sign bit has been interpreted as part of the number. Most compilers do not give error messages of exceeding the range of integer numbers, except in some obvious situations. As an example, consider the following C code. int main(void) { short int si; int i, k, j1, j2, j3, j4; si = 1; i = 1; for(k=1; k<33; k++) { si *= 2; i *= 2; fprintf(stdout,"%3d%8d%12d\n",k,si,i); } } j1 = ; fprintf(stdout,"\n%d\n",j1); j2 = ; fprintf(stdout,"\n%d\n",j2); j3 = ; fprintf(stdout,"\n%d\n",j3); j4 = ; fprintf(stdout,"\n%d\n",j4); COMPILING: warning: decimal constant is so large that it is unsigned

8 7 OUTPUT:

9 Floating-point numbers In a computer, there are no real numbers or rational numbers (in the sense of the mathematical definition), but all noninteger numbers are represented with finite precision. For example, the numbers π or 1/3 cannot be represented precisely (in any representation), and in some representations, the numbers and are equally large. The floating-point representation in computers is based on an internal division of bits which are reserved for representing a given number x. The number is represented in three parts: a sign s that is either + or -, and an integer exponent c, and a positive mantissa M: x = s B c E M where B is the base of the representation (usually B = 2) and E is the bias of the exponent (a fixed integer constant for any given machine and representation which enables representing negative exponents without a separate sign bit). In the decimal system, this corresponds to the following normalized floating-point form where d 1 0 and n is an integer. x = ±0.d 1 d 2 d n = ±r 10 n In most computers, floating-point numbers are represented in the following standard IEEE floating-point form x = s 2 c E (1. f ) 2 The first bit s is the sign bit (0 = + and 1 = ). The next bits are used to represent the exponent c corresponding to 2 c E where E is a constant which enables representing negative exponents without a separate sign bit. The last bits are reserved for the mantissa (also called the significand) which is given in the "1-plus" form (1. f ) 2. The IEEE standard defines single-precision (32 bits) and double-precision (64 bits) floatingpoint numbers. The available bits are allocated as shown in Fig. 1 (constant E is 127 for single precision and 1023 for double precision): sign s exponent c mantissa f 1 bit 8 / 11 bits 23 / 52 bits Figure 1: IEEE standard for floating-point numbers.

10 9 Machine numbers A floating-point number system within a computer is always limited by the finite wordlength of computers. This means that only a finite number of digits can be represented. As a consequence, numbers that are too large or too small cannot be represented. Moreover, most real numbers cannot be represented exactly in a computer. For example, 1 10 = (0.1) 10 = ( ) 2 The effective number system for a computer is not a continuum but a discrete set of numbers called the machine numbers. In a normalized representation (used in most computers), the bit patterns are as "left shifted" as possible, corresponding to the "1-plus" form of the mantissa (this form doesn t waste any bits but uses the correct exponent such that there are no leading zero bits in the mantissa). The machine numbers in the normalized representation are not uniformly spaced but unevenly distributed about zero. For example, if we list all floating-point numbers which can be expressed in the following normalized form x = ±(0.1b 2 b 3 ) 2 2 ±k (k,b i [0,1]) where we have only two bits for the mantissa (b 2 and b 3 ) and one bit for the exponent (k), we get the set of numbers shown in Fig. 2. This shows the phenomenon known as the hole at zero. There is a relatively wide gap between the smallest positive number and zero, that is (0.100) = 1/4. 0 1/4 3/8 5/16 7/16 1/2 5/8 3/4 7/8 1 5/4 3/2 7/4 Figure 2: Representable numbers (machine numbers) in the example number system. Machine epsilon Arithmetic among numbers in floating-point representation is not exact even if the operands happen to be exactly represented. For example, two floating-point numbers are added by first right-shifting (dividing by two) the mantissa of the smaller one (in magnitude), simultaneously increasing the exponent, until the two operands have the same exponent. Low-order (least significant) bits of the smaller operand are lost by this shifting. If the two operands differ too much in magnitude, then the smaller operand is effectively replaced by zero. This leads us to the concept of machine accuracy. Machine epsilon ε (or machine accuracy) is defined to be the smallest number that can be added to 1.0 to give a number other than one. Thus, the machine epsilon describes the accuracy of floating-point calculations. Note that this is not the same as the smallest floating-point number that is representable in a given computer (see below)! When using single-precision numbers, there are 23 bits allocated for the mantissa. The machine epsilon is given by ε = , which means that numbers can be represented with approximately six accurate digits. In double precision, we have 52 bits allocated for the mantissa, and thus the machine epsilon is approximately ε = , giving the accuracy of about 15 digits.

11 10 To a great extent any arithmetic operation among floating-point numbers should be thought of as introducing an additional fractional error of at least ε. This type of error is called roundoff error (we return to this in more detail later in this chapter). Smallest and largest numbers The smallest and largest representable numbers are determined by the number of bits that are allocated for the exponent c. [Note: the machine epsilon, in contrast, depends on how many bits there are in the mantissa.] The smallest representable number is given by and similarly, the largest number is x min = 2 c min x max = M max 2 c max where c min and c max are the minimum and maximum values of the exponent, and M max is the maximum value of the mantissa. The value of c in the representation of single-precision floating-point numbers is restricted by the inequality 0 < c < ( ) 2 = 255 The values 0 and 255 are reserved for special cases including ±0 and ±. Hence, the actual exponent is restricted by Likewise, the mantissa is restricted by 126 < c 127 < (1. f ) 2 ( ) 2 = The largest representable 32-bit number is therefore ( ) The smallest number is Zero, infinity and NaN The IEEE standard defines some useful special characters. The number zero is represented by all bits being zero. The sign bit, however, can obtain both values, and thus there are two different representations of zero: +0 (sign bit 0) and -0 (sign bit 1).

12 11 Infinity is defined by setting all the exponent bits to one and those in mantissa to zero. Depending on the sign bit, we have two different representation of infinity: + and. The following rules apply to arithmetic operations involving : ± x =, x =, x = 0 In addition, the standard defines a constant called NaN (not a number) which is an error pattern rather than a number. It is obtained by setting all bits to ones. C header file float.h In C, the header file float.h contains constants which reflect the characteristics of that machine s floating-point arithmetic. Some of the most widely used values are (these are obtained from a Linux/Intel -machine): Number of decimal digits for float (FLT_DIG) = 6 Number of decimal digits for double (DBL_DIG) = 15 Precision for float (FLT_EPSILON) = e-07 Precision for double (DBL_EPSILON) = e-16 Maximum float (FLT_MAX) = e+38 Maximum double (DBL_MAX) = e+308 Minimum positive float (FLT_MIN) = e-38 Minimum positive double (DBL_MIN) = e-308 Single or double precision? Calculations with double-precision numbers are somewhat slower than when using single precision. The following loop (N = 10 7 ) for(i=0; i<n; i++) { x += 0.1; y += sin(x); z += log(x)*cos(x); } took 5.70 seconds with single-precision numbers and 6.43 seconds with double-precision in a Linux/Intel machine (y and z were defined as floats or doubles). In this case, the difference is not large but may depend strongly on the machine in question. In many cases, it is recommendable to use double-precision in order to avoid problems due to for example loss of significance or round-off errors.

13 Errors Numerical calculations always involve approximations due to several reasons. These errors are not the result of poor thinking or carelessness (like programming errors) but they inevitably arise in all numerical calculations. We can divide the sources of errors roughly into four categories: model, method, initial values (data) and roundoff Modeling errors When a practical problem is formulated into mathematical language, it is almost always necessary to make simplifications. Examples of modeling errors include leaving out lessinfluential factors (e.g., no air resistance in falling) or using a simplified description of a more complex system (e.g., classical description of a quantum-mechanical system). Modeling errors are not discussed here in more detail but left as a subject of courses in the various application fields Methodological errors The conversion of a mathematical problem into a numerical one is also a source of errors. Care should be taken to control these errors and to estimate their magnitude and thus the quality of the numerical solution. Note that by methodological errors we mean errors that would persist even if a hypothetical "perfect" computer had an infinitely accurate representation and no roundoff error. As a general rule, there is not much a programmer can do about the computer s roundoff error (see below for more details). Methodological errors, on the other hand, are entirely under the programmer s control. In fact, an incredible amount of work in the field of numerical analysis has been devoted to the fine minimization methodological errors! An example of methodological errors is the truncation error (or chopping error) which is encountered when, for example, an interminating series is chopped: x(t) = e t = 1 +t + t tn n! + R n+1(t) Here the truncation error is: R n+1 (t). It follows that when n is sufficiently large, the excess term R n+1 is small and the chopping gives a good approximation of the exact result. Another example of methodological errors is the discretizing error which results when a continuous quantity is replaced by a discrete approximation. For example, replacing the derivative by the difference quotient leads to a discretizing error: x (t) y(t,h) = x(t + h) x(t) h Or as another example, in numerical integration, the integrand is evaluated at a discrete set of points, rather than at "every" point.

14 Errors due to initial values The initial values of a numerical computation can involve inaccurate values (e.g. measurements). When designing the algorithm, it is important to keep in mind that the initial errors must not accumulate during the calculation. There are also techniques for data filtering that are designed to decrease the effects of errors in the initial values Roundoff errors Roundoff errors are the result of having a finite number of bits to represent floating-point numbers in computers. As already mentioned, arbitrarily large or small numbers cannot be represented and floating-point numbers cannot have arbitrary precision. Consider a positive real number x in normalized floating-point form x = (1.a 1 a 2 a 3...a 23 a 24 a 25...) 2 2 m The process of replacing x by its nearest machine number is called correctly rounding. The error involved is called the roundoff error. How large can it be? One nearby machine number can be obtained by rounding down or by dropping the excess bits a 24 a (if only 23 bits have been allocated to the mantissa). This machine number is x = (1.a 1 a 2 a 3...a 23 ) 2 2 m Another nearby machine number is found by rounding up. It is found by adding one unit to a 23 in the expression for x. Thus, x + = [(1.a 1 a 2 a 3...a 23 ) ] 2 m The closer of these machine numbers is chosen for x. Since the unit roundoff error (machine epsilon) for a 32-bit floating-point representation is ε = 2 23, we notice that in the case of rounding to the nearest, the relative error is no greater than ε/2. This is illustrated in Fig. 3. [Remember that the machine epsilon is defined to be the smallest number such that when added to 1, the result is not equal to 1.] < machine epsilon roundoff error x x x+ Figure 3: Rounding to the nearest: the number x is represented by the machine number that lies closest to the true value of x. The roundoff error involved in the operation is no greater than ε/2.

15 Elementary arithmetic operations We continue to examining errors that are produced in basic arithmetic operations. Roundoff errors accumulate with increasing amounts of calculation. If you perform N arithmetic operations to obtain a given value, the total roundoff error might be of the order Nε, if you are lucky! [The square root comes from a random walk, assuming that the errors come in randomly up or down.] However, this estimate can be badly off the mark for two reasons: (i) It frequently happens that for some reason (because of the regularities of your calculation or the peculiarities of your computer) the roundoff errors accumulate preferentially in one direction. In this case, the total error will be of order Nε. (ii) Some especially unfavorable operations can increase the roundoff error substantially in a single arithmetic operation. A good example is the subtraction of two almost equal numbers, which results in the loss of significance (see the discussion later in this chapter). To discuss these issues in more detail, we first introduce the notation fl(x) to denote the floating-point machine number that corresponds to the real number x. For example, in a hypothetical five-decimal machine, fl( ) = If x is any real number within the range of the computer, then the error involved in the operation is bound by ε/2 (assuming correct rounding) x fl(x) x 1 2 ε This can be written as fl(x) = x(1 + δ) ( δ 1 2 ε) Remember that the unit round-off error or the machine epsilon is the smallest positive machine number ε such that fl(1 + ε) > 1 Let the symbol denote any one of the operations, +, -,, or. Suppose that whenever two machine numbers x and y are combined arithmetically, the computer will produce fl(x y) instead of x y. Under this assumption, the previous analysis gives fl(x y) = (x y)(1 + δ) ( δ 1 2 ε) It is of course possible that x y overflows or underflows, but except for this case, the above assumption is realistic for most computers. However, in real calculations, the initial values are given as real numbers and not as machine numbers. For example, the relative error for addition can be estimated by z = fl[fl(x) + fl(y)] = [x(1 + δ 1 ) + y(1 + δ 2 )](1 + δ 3 ) (x + y) + x(δ 1 + δ 3 ) + y(δ 2 + δ 3 )

16 15 The relative round-off error is (x + y) z (x + y) = x(δ 1 + δ 3 ) + y(δ 2 + δ 3 ) (x + y) = δ 3 + xδ 1 + yδ 2 (x + y) We notice that this cannot be bounded, because the second term has a denominator that can be zero or close to zero. Problems due to these type of situations are discussed next. 0.4 Loss of significance In this section, we discuss how the problem of loss of significance arises and how it can be avoided by various techniques, such as the use of rationalization, Taylor series, trigonometric identities, and so on Significant digits We begin by considering the concept of significant digits in a number. Consider a real number x expressed in normalized scientific form. For example, x = The digits of x do not have the same significance since they represent different powers of 10. Thus, 3 is the most significant digit and 8 is the least significant digit. A mathematically exact quantity, such as π, can be expressed with as many significant digits as we like. A measured quantity, however, always involves an error whose magnitude depends on the measuring device. Likewise, all numerically computed quantities are not exact, but can only be expressed with a certain precision depending on the machine. It is important to understand that the precision of a computed quantity is determined by the least precise value that was used in the computation. For example, suppose we have a measured quantity s = It is a scientific convention that the least significant digit given in a measured quantity should be in error by at most five units. The following computation gives y = s (2) but the result should be reported as Remember that the output of a computer program often produces numbers with a long list of digits. It is perfectly acceptable (and even recommended) to include these in the raw output, but after analysis, the final results should always be given with appropriate precision!

17 Loss of significance In some cases, the relative error involved in arithmetic calculations can grow significantly large. This often involves subtracting two almost equal numbers. For example, consider the case y = x sin(x). Let us have a computing system which works with ten decimal digits. Then x = sinx = x sinx = = Thus the number of significant digits was reduced by three! Three spurious zeros were added by the computer to the last three decimal places, but these are not significant digits. The correct value is The simplest solution to this problem is to use floating-point numbers with higher precision, but this only helps up to a certain point. A better approach is to anticipate that a problematic situation may arise and change the algorithm in such a way that the problem is avoided Loss of precision theorem Exactly how many significant binary digits are lost in the subtraction x y when x is close to y? Let x and y be normalized floating-point numbers with x > y > 0. If 2 p 1 y/x 2 q for some positive integers p and q, then at most p and at least q significant binary digits are lost in the subtraction x y. Example. How many significant bits are lost in the subtraction x y = ? We have 1 y x = This lies between 2 12 = and 2 11 = Hence, at least 11 but at most 12 bits are lost.

18 Avoiding loss of significance i. Rationalizing Consider the function f (x) = (x 2 + 1) 1 We see that near zero, there is a potential loss of significance. However, the function can be rewritten in the form f (x) = ( ( ) (x (x ) ) 1) (x 2 + 1) + 1 = x 2 (x 2 + 1) + 1 This allows terms to be canceled out and therefore removes the problematic subtraction. ii. Using series expansion Consider the function f (x) = x sinx whose values are required near x = 0. We can avoid the loss of significance by using the Taylor series for sinx sinx = x x3 3! + x5 5! x7 7!... For x near zero, the series converges quite rapidly. We can now rewrite the function f as ) f (x) = x (x x3 3! + x5 5! x7 7!... = x3 3! + x5 5! x7 7!... This is a very effective form for calculating f for small x. iii. Using trigonometric identities As a simple example, consider the function f (x) = cos 2 (x) sin 2 (x) There will be loss of significance at x = π/4. The problem can be solved by the simple substitution cos 2 (x) sin 2 (x) = cos(2x)

19 Range reduction Another cause for loss of significant digits is the use of various library functions with very large arguments. For example, consider the sine function whose basic property is its periodicity sin(x) = sin(x + 2πn) for all real values of x and all integer values of n. The periodicity is used in the the computer evaluation of sinx: One only needs to know the values of sinx in some fixed interval of length 2π in order to compute sinx for arbitrary x. This is called range reduction. This procedure leads to an unavoidable loss of precision if the original argument x is very large. For example, if we want to evaluate sin( ), we first subtract 3988π from the original argument and calculate the value of sin(3.47). Here the argument has only three significant figures due to the subtraction! Thus our computed value of sin( ) can only be given with three significant digits! The only way to improve the situation is to use double- or extended-precision programming. This is recommended for variables which are used as arguments of library functions such as the sine function.

### CS321. Introduction to Numerical Methods

CS3 Introduction to Numerical Methods Lecture Number Representations and Errors Professor Jun Zhang Department of Computer Science University of Kentucky Lexington, KY 40506-0633 August 7, 05 Number in

### CHAPTER 5 Round-off errors

CHAPTER 5 Round-off errors In the two previous chapters we have seen how numbers can be represented in the binary numeral system and how this is the basis for representing numbers in computers. Since any

### > 2. Error and Computer Arithmetic

> 2. Error and Computer Arithmetic Numerical analysis is concerned with how to solve a problem numerically, i.e., how to develop a sequence of numerical calculations to get a satisfactory answer. Part

### ECE 0142 Computer Organization. Lecture 3 Floating Point Representations

ECE 0142 Computer Organization Lecture 3 Floating Point Representations 1 Floating-point arithmetic We often incur floating-point programming. Floating point greatly simplifies working with large (e.g.,

### Divide: Paper & Pencil. Computer Architecture ALU Design : Division and Floating Point. Divide algorithm. DIVIDE HARDWARE Version 1

Divide: Paper & Pencil Computer Architecture ALU Design : Division and Floating Point 1001 Quotient Divisor 1000 1001010 Dividend 1000 10 101 1010 1000 10 (or Modulo result) See how big a number can be

### Computer Science 281 Binary and Hexadecimal Review

Computer Science 281 Binary and Hexadecimal Review 1 The Binary Number System Computers store everything, both instructions and data, by using many, many transistors, each of which can be in one of two

### Numerical Matrix Analysis

Numerical Matrix Analysis Lecture Notes #10 Conditioning and / Peter Blomgren, blomgren.peter@gmail.com Department of Mathematics and Statistics Dynamical Systems Group Computational Sciences Research

### Measures of Error: for exact x and approximation x Absolute error e = x x. Relative error r = (x x )/x.

ERRORS and COMPUTER ARITHMETIC Types of Error in Numerical Calculations Initial Data Errors: from experiment, modeling, computer representation; problem dependent but need to know at beginning of calculation.

### To convert an arbitrary power of 2 into its English equivalent, remember the rules of exponential arithmetic:

Binary Numbers In computer science we deal almost exclusively with binary numbers. it will be very helpful to memorize some binary constants and their decimal and English equivalents. By English equivalents

### Binary Number System. 16. Binary Numbers. Base 10 digits: 0 1 2 3 4 5 6 7 8 9. Base 2 digits: 0 1

Binary Number System 1 Base 10 digits: 0 1 2 3 4 5 6 7 8 9 Base 2 digits: 0 1 Recall that in base 10, the digits of a number are just coefficients of powers of the base (10): 417 = 4 * 10 2 + 1 * 10 1

### Oct: 50 8 = 6 (r = 2) 6 8 = 0 (r = 6) Writing the remainders in reverse order we get: (50) 10 = (62) 8

ECE Department Summer LECTURE #5: Number Systems EEL : Digital Logic and Computer Systems Based on lecture notes by Dr. Eric M. Schwartz Decimal Number System: -Our standard number system is base, also

### 2010/9/19. Binary number system. Binary numbers. Outline. Binary to decimal

2/9/9 Binary number system Computer (electronic) systems prefer binary numbers Binary number: represent a number in base-2 Binary numbers 2 3 + 7 + 5 Some terminology Bit: a binary digit ( or ) Hexadecimal

### Correctly Rounded Floating-point Binary-to-Decimal and Decimal-to-Binary Conversion Routines in Standard ML. By Prashanth Tilleti

Correctly Rounded Floating-point Binary-to-Decimal and Decimal-to-Binary Conversion Routines in Standard ML By Prashanth Tilleti Advisor Dr. Matthew Fluet Department of Computer Science B. Thomas Golisano

### Lies My Calculator and Computer Told Me

Lies My Calculator and Computer Told Me 2 LIES MY CALCULATOR AND COMPUTER TOLD ME Lies My Calculator and Computer Told Me See Section.4 for a discussion of graphing calculators and computers with graphing

### Binary Division. Decimal Division. Hardware for Binary Division. Simple 16-bit Divider Circuit

Decimal Division Remember 4th grade long division? 43 // quotient 12 521 // divisor dividend -480 41-36 5 // remainder Shift divisor left (multiply by 10) until MSB lines up with dividend s Repeat until

### This Unit: Floating Point Arithmetic. CIS 371 Computer Organization and Design. Readings. Floating Point (FP) Numbers

This Unit: Floating Point Arithmetic CIS 371 Computer Organization and Design Unit 7: Floating Point App App App System software Mem CPU I/O Formats Precision and range IEEE 754 standard Operations Addition

### NUMBER SYSTEMS. William Stallings

NUMBER SYSTEMS William Stallings The Decimal System... The Binary System...3 Converting between Binary and Decimal...3 Integers...4 Fractions...5 Hexadecimal Notation...6 This document available at WilliamStallings.com/StudentSupport.html

### CHAPTER 3 Numbers and Numeral Systems

CHAPTER 3 Numbers and Numeral Systems Numbers play an important role in almost all areas of mathematics, not least in calculus. Virtually all calculus books contain a thorough description of the natural,

### CSI 333 Lecture 1 Number Systems

CSI 333 Lecture 1 Number Systems 1 1 / 23 Basics of Number Systems Ref: Appendix C of Deitel & Deitel. Weighted Positional Notation: 192 = 2 10 0 + 9 10 1 + 1 10 2 General: Digit sequence : d n 1 d n 2...

### The string of digits 101101 in the binary number system represents the quantity

Data Representation Section 3.1 Data Types Registers contain either data or control information Control information is a bit or group of bits used to specify the sequence of command signals needed for

### 1. Give the 16 bit signed (twos complement) representation of the following decimal numbers, and convert to hexadecimal:

Exercises 1 - number representations Questions 1. Give the 16 bit signed (twos complement) representation of the following decimal numbers, and convert to hexadecimal: (a) 3012 (b) - 435 2. For each of

### Some Notes on Taylor Polynomials and Taylor Series

Some Notes on Taylor Polynomials and Taylor Series Mark MacLean October 3, 27 UBC s courses MATH /8 and MATH introduce students to the ideas of Taylor polynomials and Taylor series in a fairly limited

### Digital System Design Prof. D Roychoudhry Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Digital System Design Prof. D Roychoudhry Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture - 04 Digital Logic II May, I before starting the today s lecture

### Fractional Numbers. Fractional Number Notations. Fixed-point Notation. Fixed-point Notation

2 Fractional Numbers Fractional Number Notations 2010 - Claudio Fornaro Ver. 1.4 Fractional numbers have the form: xxxxxxxxx.yyyyyyyyy where the x es constitute the integer part of the value and the y

### Numbering Systems. InThisAppendix...

G InThisAppendix... Introduction Binary Numbering System Hexadecimal Numbering System Octal Numbering System Binary Coded Decimal (BCD) Numbering System Real (Floating Point) Numbering System BCD/Binary/Decimal/Hex/Octal

### Number Representation

Number Representation CS10001: Programming & Data Structures Pallab Dasgupta Professor, Dept. of Computer Sc. & Engg., Indian Institute of Technology Kharagpur Topics to be Discussed How are numeric data

### 198:211 Computer Architecture

198:211 Computer Architecture Topics: Lecture 8 (W5) Fall 2012 Data representation 2.1 and 2.2 of the book Floating point 2.4 of the book 1 Computer Architecture What do computers do? Manipulate stored

### Fixed-Point Arithmetic

Fixed-Point Arithmetic Fixed-Point Notation A K-bit fixed-point number can be interpreted as either: an integer (i.e., 20645) a fractional number (i.e., 0.75) 2 1 Integer Fixed-Point Representation N-bit

### COMP 250 Fall 2012 lecture 2 binary representations Sept. 11, 2012

Binary numbers The reason humans represent numbers using decimal (the ten digits from 0,1,... 9) is that we have ten fingers. There is no other reason than that. There is nothing special otherwise about

### Binary Numbers. Bob Brown Information Technology Department Southern Polytechnic State University

Binary Numbers Bob Brown Information Technology Department Southern Polytechnic State University Positional Number Systems The idea of number is a mathematical abstraction. To use numbers, we must represent

### The Essentials of Computer Organization and Architecture. Linda Null and Julia Lobur Jones and Bartlett Publishers, 2003

The Essentials of Computer Organization and Architecture Linda Null and Julia Lobur Jones and Bartlett Publishers, 2003 Chapter 2 Instructor's Manual Chapter Objectives Chapter 2, Data Representation,

### Figure 1. A typical Laboratory Thermometer graduated in C.

SIGNIFICANT FIGURES, EXPONENTS, AND SCIENTIFIC NOTATION 2004, 1990 by David A. Katz. All rights reserved. Permission for classroom use as long as the original copyright is included. 1. SIGNIFICANT FIGURES

### Lecture 2. Binary and Hexadecimal Numbers

Lecture 2 Binary and Hexadecimal Numbers Purpose: Review binary and hexadecimal number representations Convert directly from one base to another base Review addition and subtraction in binary representations

### Limit processes are the basis of calculus. For example, the derivative. f f (x + h) f (x)

SEC. 4.1 TAYLOR SERIES AND CALCULATION OF FUNCTIONS 187 Taylor Series 4.1 Taylor Series and Calculation of Functions Limit processes are the basis of calculus. For example, the derivative f f (x + h) f

### Restoring division. 2. Run the algorithm Let s do 0111/0010 (7/2) unsigned. 3. Find remainder here. 4. Find quotient here.

Binary division Dividend = divisor q quotient + remainder CS/COE447: Computer Organization and Assembly Language Given dividend and divisor, we want to obtain quotient (Q) and remainder (R) Chapter 3 We

### Useful Number Systems

Useful Number Systems Decimal Base = 10 Digit Set = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} Binary Base = 2 Digit Set = {0, 1} Octal Base = 8 = 2 3 Digit Set = {0, 1, 2, 3, 4, 5, 6, 7} Hexadecimal Base = 16 = 2

### This 3-digit ASCII string could also be calculated as n = (Data[2]-0x30) +10*((Data[1]-0x30)+10*(Data[0]-0x30));

Introduction to Embedded Microcomputer Systems Lecture 5.1 2.9. Conversions ASCII to binary n = 100*(Data[0]-0x30) + 10*(Data[1]-0x30) + (Data[2]-0x30); This 3-digit ASCII string could also be calculated

### Lecture 4 Representing Data on the Computer. Ramani Duraiswami AMSC/CMSC 662 Fall 2009

Lecture 4 Representing Data on the Computer Ramani Duraiswami AMSC/CMSC 662 Fall 2009 x = ±(1+f) 2 e 0 f < 1 f = (integer < 2 52 )/ 2 52-1022 e 1023 e = integer Effects of floating point Effects of floating

### Computers. Hardware. The Central Processing Unit (CPU) CMPT 125: Lecture 1: Understanding the Computer

Computers CMPT 125: Lecture 1: Understanding the Computer Tamara Smyth, tamaras@cs.sfu.ca School of Computing Science, Simon Fraser University January 3, 2009 A computer performs 2 basic functions: 1.

Section 5.4 The Quadratic Formula 481 5.4 The Quadratic Formula Consider the general quadratic function f(x) = ax + bx + c. In the previous section, we learned that we can find the zeros of this function

### Continued Fractions and the Euclidean Algorithm

Continued Fractions and the Euclidean Algorithm Lecture notes prepared for MATH 326, Spring 997 Department of Mathematics and Statistics University at Albany William F Hammond Table of Contents Introduction

### Mathematical Procedures

CHAPTER 6 Mathematical Procedures 168 CHAPTER 6 Mathematical Procedures The multidisciplinary approach to medicine has incorporated a wide variety of mathematical procedures from the fields of physics,

### 0.8 Rational Expressions and Equations

96 Prerequisites 0.8 Rational Expressions and Equations We now turn our attention to rational expressions - that is, algebraic fractions - and equations which contain them. The reader is encouraged to

### Integer multiplication

Integer multiplication Suppose we have two unsigned integers, A and B, and we wish to compute their product. Let A be the multiplicand and B the multiplier: A n 1... A 1 A 0 multiplicand B n 1... B 1 B

### What Every Computer Scientist Should Know About Floating-Point Arithmetic

What Every Computer Scientist Should Know About Floating-Point Arithmetic D Note This document is an edited reprint of the paper What Every Computer Scientist Should Know About Floating-Point Arithmetic,

### Levent EREN levent.eren@ieu.edu.tr A-306 Office Phone:488-9882 INTRODUCTION TO DIGITAL LOGIC

Levent EREN levent.eren@ieu.edu.tr A-306 Office Phone:488-9882 1 Number Systems Representation Positive radix, positional number systems A number with radix r is represented by a string of digits: A n

### Taylor Polynomials and Taylor Series Math 126

Taylor Polynomials and Taylor Series Math 26 In many problems in science and engineering we have a function f(x) which is too complicated to answer the questions we d like to ask. In this chapter, we will

### Math Review. for the Quantitative Reasoning Measure of the GRE revised General Test

Math Review for the Quantitative Reasoning Measure of the GRE revised General Test www.ets.org Overview This Math Review will familiarize you with the mathematical skills and concepts that are important

### Name: Class: Date: 9. The compiler ignores all comments they are there strictly for the convenience of anyone reading the program.

Name: Class: Date: Exam #1 - Prep True/False Indicate whether the statement is true or false. 1. Programming is the process of writing a computer program in a language that the computer can respond to

### quotient = dividend / divisor, with a remainder Given dividend and divisor, we want to obtain quotient (Q) and remainder (R)

Binary division quotient = dividend / divisor, with a remainder dividend = divisor quotient + remainder Given dividend and divisor, we want to obtain quotient (Q) and remainder (R) We will start from our

### LIES MY CALCULATOR AND COMPUTER TOLD ME

LIES MY CALCULATOR AND COMPUTER TOLD ME See Section Appendix.4 G for a discussion of graphing calculators and computers with graphing software. A wide variety of pocket-size calculating devices are currently

### Fractions and Decimals

Fractions and Decimals Tom Davis tomrdavis@earthlink.net http://www.geometer.org/mathcircles December 1, 2005 1 Introduction If you divide 1 by 81, you will find that 1/81 =.012345679012345679... The first

### COMPASS Numerical Skills/Pre-Algebra Preparation Guide. Introduction Operations with Integers Absolute Value of Numbers 13

COMPASS Numerical Skills/Pre-Algebra Preparation Guide Please note that the guide is for reference only and that it does not represent an exact match with the assessment content. The Assessment Centre

### Elementary Number Theory We begin with a bit of elementary number theory, which is concerned

CONSTRUCTION OF THE FINITE FIELDS Z p S. R. DOTY Elementary Number Theory We begin with a bit of elementary number theory, which is concerned solely with questions about the set of integers Z = {0, ±1,

### 5.1 Radical Notation and Rational Exponents

Section 5.1 Radical Notation and Rational Exponents 1 5.1 Radical Notation and Rational Exponents We now review how exponents can be used to describe not only powers (such as 5 2 and 2 3 ), but also roots

### Borland C++ Compiler: Operators

Introduction Borland C++ Compiler: Operators An operator is a symbol that specifies which operation to perform in a statement or expression. An operand is one of the inputs of an operator. For example,

### Solution for Homework 2

Solution for Homework 2 Problem 1 a. What is the minimum number of bits that are required to uniquely represent the characters of English alphabet? (Consider upper case characters alone) The number of

### NUMBER SYSTEMS. 1.1 Introduction

NUMBER SYSTEMS 1.1 Introduction There are several number systems which we normally use, such as decimal, binary, octal, hexadecimal, etc. Amongst them we are most familiar with the decimal number system.

### Arithmetic in MIPS. Objectives. Instruction. Integer arithmetic. After completing this lab you will:

6 Objectives After completing this lab you will: know how to do integer arithmetic in MIPS know how to do floating point arithmetic in MIPS know about conversion from integer to floating point and from

### Systems I: Computer Organization and Architecture

Systems I: Computer Organization and Architecture Lecture 2: Number Systems and Arithmetic Number Systems - Base The number system that we use is base : 734 = + 7 + 3 + 4 = x + 7x + 3x + 4x = x 3 + 7x

### 3.1. RATIONAL EXPRESSIONS

3.1. RATIONAL EXPRESSIONS RATIONAL NUMBERS In previous courses you have learned how to operate (do addition, subtraction, multiplication, and division) on rational numbers (fractions). Rational numbers

### Number Systems and Radix Conversion

Number Systems and Radix Conversion Sanjay Rajopadhye, Colorado State University 1 Introduction These notes for CS 270 describe polynomial number systems. The material is not in the textbook, but will

### LSN 2 Number Systems. ECT 224 Digital Computer Fundamentals. Department of Engineering Technology

LSN 2 Number Systems Department of Engineering Technology LSN 2 Decimal Number System Decimal number system has 10 digits (0-9) Base 10 weighting system... 10 5 10 4 10 3 10 2 10 1 10 0. 10-1 10-2 10-3

### Simulation & Synthesis Using VHDL

Floating Point Multipliers: Simulation & Synthesis Using VHDL By: Raj Kumar Singh - B.E. (Hons.) Electrical & Electronics Shivananda Reddy - B.E. (Hons.) Electrical & Electronics BITS, PILANI Outline Introduction

### Lecture 11: Number Systems

Lecture 11: Number Systems Numeric Data Fixed point Integers (12, 345, 20567 etc) Real fractions (23.45, 23., 0.145 etc.) Floating point such as 23. 45 e 12 Basically an exponent representation Any number

### 2 Number Systems. Source: Foundations of Computer Science Cengage Learning. Objectives After studying this chapter, the student should be able to:

2 Number Systems 2.1 Source: Foundations of Computer Science Cengage Learning Objectives After studying this chapter, the student should be able to: Understand the concept of number systems. Distinguish

### Section 1.4 Place Value Systems of Numeration in Other Bases

Section.4 Place Value Systems of Numeration in Other Bases Other Bases The Hindu-Arabic system that is used in most of the world today is a positional value system with a base of ten. The simplest reason

### 2 Number Systems 2.1. Foundations of Computer Science Cengage Learning

2 Number Systems 2.1 Foundations of Computer Science Cengage Learning 2.2 Objectives After studying this chapter, the student should be able to: Understand the concept of number systems. Distinguish between

### Data Storage 3.1. Foundations of Computer Science Cengage Learning

3 Data Storage 3.1 Foundations of Computer Science Cengage Learning Objectives After studying this chapter, the student should be able to: List five different data types used in a computer. Describe how

### ELECTRICAL AND COMPUTER ENGINEERING DEPARTMENT, OAKLAND UNIVERSITY ECE-470/570: Microprocessor-Based System Design Fall 2014.

REVIEW OF NUMBER SYSTEMS Notes Unit 2 BINARY NUMBER SYSTEM In the decimal system, a decimal digit can take values from to 9. For the binary system, the counterpart of the decimal digit is the binary digit,

### TI-83 Plus Graphing Calculator Keystroke Guide

TI-83 Plus Graphing Calculator Keystroke Guide In your textbook you will notice that on some pages a key-shaped icon appears next to a brief description of a feature on your graphing calculator. In this

### Pre-Algebra Class 3 - Fractions I

Pre-Algebra Class 3 - Fractions I Contents 1 What is a fraction? 1 1.1 Fractions as division............................... 2 2 Representations of fractions 3 2.1 Improper fractions................................

### 4 Operations On Data

4 Operations On Data 4.1 Source: Foundations of Computer Science Cengage Learning Objectives After studying this chapter, students should be able to: List the three categories of operations performed on

### 6.4 Normal Distribution

Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under

### Complex Numbers Basic Concepts of Complex Numbers Complex Solutions of Equations Operations on Complex Numbers

Complex Numbers Basic Concepts of Complex Numbers Complex Solutions of Equations Operations on Complex Numbers Identify the number as real, complex, or pure imaginary. 2i The complex numbers are an extension

### Numerical Analysis I

Numerical Analysis I M.R. O Donohoe References: S.D. Conte & C. de Boor, Elementary Numerical Analysis: An Algorithmic Approach, Third edition, 1981. McGraw-Hill. L.F. Shampine, R.C. Allen, Jr & S. Pruess,

### Representation of Floating Point Numbers in

Representation of Floating Point Numbers in Single Precision IEEE 754 Standard Value = N = (-1) S X 2 E-127 X (1.M) 0 < E < 255 Actual exponent is: e = E - 127 sign 1 8 23 S E M exponent: excess 127 binary

### Data Storage. Chapter 3. Objectives. 3-1 Data Types. Data Inside the Computer. After studying this chapter, students should be able to:

Chapter 3 Data Storage Objectives After studying this chapter, students should be able to: List five different data types used in a computer. Describe how integers are stored in a computer. Describe how

### Number Conversions Dr. Sarita Agarwal (Acharya Narendra Dev College,University of Delhi)

Conversions Dr. Sarita Agarwal (Acharya Narendra Dev College,University of Delhi) INTRODUCTION System- A number system defines a set of values to represent quantity. We talk about the number of people

### 1 if 1 x 0 1 if 0 x 1

Chapter 3 Continuity In this chapter we begin by defining the fundamental notion of continuity for real valued functions of a single real variable. When trying to decide whether a given function is or

### Section 3-3 Approximating Real Zeros of Polynomials

- Approimating Real Zeros of Polynomials 9 Section - Approimating Real Zeros of Polynomials Locating Real Zeros The Bisection Method Approimating Multiple Zeros Application The methods for finding zeros

### Number Systems, Base Conversions, and Computer Data Representation

, Base Conversions, and Computer Data Representation Decimal and Binary Numbers When we write decimal (base 10) numbers, we use a positional notation system. Each digit is multiplied by an appropriate

### Negative Integer Exponents

7.7 Negative Integer Exponents 7.7 OBJECTIVES. Define the zero exponent 2. Use the definition of a negative exponent to simplify an expression 3. Use the properties of exponents to simplify expressions

### This is a square root. The number under the radical is 9. (An asterisk * means multiply.)

Page of Review of Radical Expressions and Equations Skills involving radicals can be divided into the following groups: Evaluate square roots or higher order roots. Simplify radical expressions. Rationalize

### Decimal Numbers: Base 10 Integer Numbers & Arithmetic

Decimal Numbers: Base 10 Integer Numbers & Arithmetic Digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 Example: 3271 = (3x10 3 ) + (2x10 2 ) + (7x10 1 )+(1x10 0 ) Ward 1 Ward 2 Numbers: positional notation Number

### Square Roots. Learning Objectives. Pre-Activity

Section 1. Pre-Activity Preparation Square Roots Our number system has two important sets of numbers: rational and irrational. The most common irrational numbers result from taking the square root of non-perfect

### 26 Integers: Multiplication, Division, and Order

26 Integers: Multiplication, Division, and Order Integer multiplication and division are extensions of whole number multiplication and division. In multiplying and dividing integers, the one new issue

### Signed Binary Arithmetic

Signed Binary Arithmetic In the real world of mathematics, computers must represent both positive and negative binary numbers. For example, even when dealing with positive arguments, mathematical operations

### Bachelors of Computer Application Programming Principle & Algorithm (BCA-S102T)

Unit- I Introduction to c Language: C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating

### SIMPLIFYING SQUARE ROOTS

40 (8-8) Chapter 8 Powers and Roots 8. SIMPLIFYING SQUARE ROOTS In this section Using the Product Rule Rationalizing the Denominator Simplified Form of a Square Root In Section 8. you learned to simplify

### Part 1 Expressions, Equations, and Inequalities: Simplifying and Solving

Section 7 Algebraic Manipulations and Solving Part 1 Expressions, Equations, and Inequalities: Simplifying and Solving Before launching into the mathematics, let s take a moment to talk about the words

### 1 Review of complex numbers

1 Review of complex numbers 1.1 Complex numbers: algebra The set C of complex numbers is formed by adding a square root i of 1 to the set of real numbers: i = 1. Every complex number can be written uniquely

### Base Conversion written by Cathy Saxton

Base Conversion written by Cathy Saxton 1. Base 10 In base 10, the digits, from right to left, specify the 1 s, 10 s, 100 s, 1000 s, etc. These are powers of 10 (10 x ): 10 0 = 1, 10 1 = 10, 10 2 = 100,

### Rational Exponents. Squaring both sides of the equation yields. and to be consistent, we must have

8.6 Rational Exponents 8.6 OBJECTIVES 1. Define rational exponents 2. Simplify expressions containing rational exponents 3. Use a calculator to estimate the value of an expression containing rational exponents

### EE 261 Introduction to Logic Circuits. Module #2 Number Systems

EE 261 Introduction to Logic Circuits Module #2 Number Systems Topics A. Number System Formation B. Base Conversions C. Binary Arithmetic D. Signed Numbers E. Signed Arithmetic F. Binary Codes Textbook