Division by Invariant Integers using Multiplication

Size: px
Start display at page:

Download "Division by Invariant Integers using Multiplication"

Transcription

1 Division by Invariant Integers using Multiplication Torbjörn Granlun Cygnus Support 1937 Lanings Drive Mountain View, CA Peter L. Montgomery Centrum voor Wiskune en Informatica 780 Las Colinas Roa San Rafael, CA Abstract Integer ivision remains expensive on toay s processors as the cost of integer multiplication eclines. We present coe sequences for ivision by arbitrary nonzero integer constants an run time invariants using integer multiplication. The algorithms assume a two s complement architecture. Most also require that the upper half of an integer prouct be quickly accessible. We treat unsigne ivision, signe ivision where the quotient rouns towars zero, signe ivision where the quotient rouns towars, an ivision where the result is known a priori to be exact. We give some implementation results using the C compiler GCC. 1 Introuction The cost of an integer ivision on toay s RISC processors is several times that of an integer multiplication. The tren is towars fast, often pipeline combinatoric multipliers that perform an operation in typically less than 10 cycles, with either no harware support for integer ivision or iterating iviers that are several times slower than the multiplier. Table 1.1 compares multiplication an ivision times on some processors. This table illustrates that the iscrepancy between multiplication an ivision timing has been growing. Integer ivision is use heavily in base conversions, number theoretic coes, an graphics coes. Compilers Work one by first author while at Sweish Institute of Computer Science, Stockholm, Sween. Work one by secon author while at University of California, Los Angeles. Supporte by U.S. Army fellowship DAAL03 89 G generate integer ivisions to compute loop counts an subtract pointers. In a static analysis of FORTRAN programs, Knuth [13, p. 9] reports that 39% of arithmetic operators were aitions, 22% subtractions, 27% multiplications, 10% ivisions, an 2% exponentiations. Knuth s counts o not istinguish integer an floating point operations, except that 4% of the ivisions were ivisions by 2. When integer multiplication is cheaper than integer ivision, it is beneficial to substitute a multiplication for a ivision. Multiple authors [2, 11, 15] present algorithms for ivision by constants, but only when the ivisor ivies 2 k 1 for some small k. Magenheimer et al [16, 7] give the founation of a more general approach, which Alverson [1] implements on the Tera Computing System. Compiler writers are only beginning to become aware of the general technique. For example, version 1.02 of the IBM RS/6000 xlc an xlf compilers uses the integer multiply instruction to expan signe integer ivisions by 3, 5, 7, 9, 25, an 125, but not by other o integer ivisors below 256, an never for unsigne ivision. We assume an N bit two s complement architecture. Unsigne (i.e., nonnegative) integers range from 0 to 2 N 1 inclusive; signe integers range from 2 N 1 to 2 N 1 1. We enote these integers by uwor an swor respectively. Unsigne oublewor integers (range 0 to 2 2N 1) are enote by uwor. Signe oublewor integers (range 2 2N 1 to 2 2N 1 1) are enote by swor. The type int is use for shift counts an logarithms. Several of the algorithms require the upper half of an integer prouct obtaine by multiplying two uwors or two swors. All algorithms nee simple operations such as as, shifts, an bitwise operations (bit ops) on uwors an swors, as summarize in Table 3.1. We show how to use these operations to ivie by arbitrary nonzero constants, as well as by ivisors which are loop invariant or repeate in a basic block, using one multiplication plus a few simple instructions per ivision. The presentation concentrates on three types of

2 Architecture/Implementation N Approx. Year Motorola MC68020 [18, pp. 9 22] Time (cycles) for HIGH(N bit N bit) Motorola MC Intel 386 [9] Intel 486 [10] Intel Pentium SPARC Cypress CY7C S 100 S SPARC Viking [20] HP PA 83 [16] S 70 S HP PA FP 70 S MIPS R3000 [12] P 35 P Time (cycles) for N bit/n bit ivie (unsigne) (signe) MIPS R4000 [17] P 139 POWER/RIOS I [4, 22] (signe only) 19 (signe only) PowerPC/MPC601 [19] DEC Alpha 21064AA [8] P 200 S Motorola MC S 38 Motorola MC P 18 S No irect harware support; approximate cycle count for software implementation F Does not inclue time for moving ata to/from floating point registers P Pipeline implementation (i.e., inepenent instructions can execute simultaneously) Table 1.1: Multiplication an ivision times on ifferent CPUs ivision, in orer by ifficulty: (i) unsigne, (ii) signe, quotient roune towars zero, (iii) signe, quotient roune towars. Other topics are ivision of a uwor by a run time invariant uwor, ivision when the remainer is known a priori to be zero, an testing for a given remainer. In each case we give the mathematical backgroun an suggest an algorithm which a compiler can use to generate the coe. The algorithms are ineffective when a ivisor is not invariant, such as in the Eucliean GCD algorithm. Most algorithms presente herein yiel only the quotient. The remainer, if esire, can be compute by an aitional multiplication an subtraction. We have implemente the algorithms in a evelopmental version of the GCC 2.6 compiler [21]. DEC uses some of these algorithms in its Alpha AXP compilers. 2 Mathematical notations Let x be a real number. Then x enotes the largest integer not exceeing x an x enotes the least integer not less than x. Let TRUNC(x) enote the integer part of x, roune towars zero. Formally, TRUNC(x) = x if x 0 an TRUNC(x) = x if x < 0. The absolute value of x is x. For x > 0, the (real) base 2 logarithm of x is log 2 x. A multiplication is written x y. If x, y, an n are integers an n 0, then x y (mo n) means x y is a multiple of n. Two remainer operators are common in language efinitions. Sometimes a remainer has the sign of the ivien an sometimes the sign of the ivisor. We use the Aa notations n rem = n TRUNC(n/) n mo = n n/ (sign of ivien), (sign of ivisor). (2.1) The Fortran 90 names are MOD an MODULO. In C, the efinition of remainer is implementation epenent (many C implementations roun signe quotients towars zero an use rem remainering). Other efinitions have been propose [6, 7]. If n is an uwor or swor, then HIGH(n) an LOW(n) enote the most significant an least significant halves of n. LOW(n) is a uwor, while HIGH(n) is an uwor if n is a uwor an an swor if n is a swor. In both cases n = 2 N HIGH(n) + LOW(n). 3 Assume instructions The suggeste coe assumes the operations in Table 3.1, on an N bit machine. Some primitives, such as loaing constants an operans, are implicit in the notation an are not inclue in the operation counts.

3 TRUNC(x) Truncation towars zero; see 2. HIGH(x), LOW(x) Upper an lower halves of x: see 2. MULL(x, y) Lower half of prouct x y (i.e., prouct moulo 2 N ). MULSH(x, y) Upper half of signe prouct x y: If 2 N 1 x, y 2 N 1 1, then x y = 2 N MULSH(x, y) + MULL(x, y). MULUH(x, y) Upper half of unsigne prouct x y: If 0 x, y 2 N 1, then x y = 2 N MULUH(x, y) + MULL(x, y). AND(x, y) Bitwise AND of x an y. EOR(x, y) Bitwise exclusive OR of x an y. NOT(x) Bitwise complement of x. Equal to 1 x if x is signe, to 2 N 1 x if x is unsigne. OR(x, y) Bitwise OR of x an y. SLL(x, n) Logical left shift of x by n bits (0 n N 1). SRA(x, n) Arithmetic right shift of x by n bits (0 n N 1). SRL(x, n) Logical right shift of x by n bits (0 n N 1). XSIGN(x) 1 if x < 0; 0 if x 0. Short for SRA(x, N 1) or SRL(x, N 1). x + y, x y, x Two s complement aition, subtraction, negation. Table 3.1: Mathematical notations an primitive operations The algorithm in 8 requires the ability to a or subtract two oublewors, obtaining a oublewor result; this typically expans into 2 4 instructions. The algorithms for processing constant ivisors require compile time arithmetic on uwors. Algorithms for processing run time invariant ivisors require taking the base 2 logarithm of a positive integer (sometimes roune up, sometimes own) an require iviing a uwor by a uwor. If the algorithms are use only for constant ivisors, then these operations are neee only at compile time. If the architecture has a leaing zero count (LDZ) instruction, then these logarithms can be foun from log 2 x = N LDZ(x 1), log 2 x = N 1 LDZ(x) (1 x 2 N 1). Some algorithms may prouce expressions such as SRL(x, 0) or (x y); the optimizer shoul make the obvious simplifications. Some escriptions show an aition or subtraction of 2 N, which is a no-op. If an architecture lacks arithmetic right shift, then it can be compute from the ientity SRA(x, l) = SRL(x + 2 N 1, l) 2 N 1 l whenever 0 l N 1. If an architecture has only one of MULSH an MULUH, then the other can be compute using MULUH(x, y) = MULSH(x, y) + AND(x, XSIGN(y)) + AND(y, XSIGN(x)) for arbitrary N bit patterns x, y (interprete as uwors for MULUH an as swors for MULSH). 4 Unsigne ivision Suppose we want to compile an unsigne ivision q = n/, where 0 < < 2 N is a constant or run time invariant an 0 n < 2 N is variable. Let s try to fin a rational approximation m/2 N+l of 1/ such that n m n = 2 N+l whenever 0 n 2 N 1. (4.1) Setting n = in (4.1) shows we require 2 N+l m. Setting n = q 1 shows 2 N+l q > m (q 1). Multiply by to erive ( m 2 N+l) (q 1) < 2 N+l. This inequality will hol for all values of q 1 below 2 N if m 2 N+l 2 l. Theorem 4.2 below states that these conitions are sufficient, because the maximum relative error (1 part in 2 N ) is too small to affect the quotient when n < 2 N. Theorem 4.2 Suppose m,, l are nonnegative integers such that 0 an 2 N+l m 2 N+l + 2 l. (4.3) Then n/ = m n/2 N+l for every integer n with 0 n < 2 N. Proof. Define k = m 2 N+l. Then 0 k 2 l by hypothesis. Given n with 0 n < 2 N, write n = q + r where q = n/ an 0 r 1. We must show that q = m n/2 N+l. A calculation gives m n k + 2N+l q = 2N+l n 2 N+l q = k n 2 N+l + n n r = k 2 l n 2 N 1 + r. (4.4)

4 This ifference is nonnegative an oes not excee 1 2N 1 2 N = N < 1. Theorem 4.2 allows ivision by to be replace with multiplication by m/2 N+l if (4.3) hols. In general we require 2 l 1 to ensure that a suitable multiple of exists in the interval [2 N+l, 2 N+l +2 l ]. For compatibility with the algorithms for signe ivision ( 5 an 6), it is convenient to choose m > 2 N+l even though Theorem 4.2 permits equality. Since m can be almost as large as 2 N+1, we on t multiply by m irectly, but instea by 2 N an m 2 N. This leas to the coe in Figure 4.1. Its cost is 1 multiply, 2 as/subtracts, an 2 shifts per quotient, after computing constants epenent only on the ivisor. Initialization (given uwor with 1 < 2 N ): int l = log 2 ; /* 2 l 2 1 */ uwor m = 2 N (2 l )/ + 1; /* m = 2 N+l / 2 N + 1 */ int sh 1 = min(l, 1); int sh 2 = max(l 1, 0); /* sh 2 = l sh 1 */ For q = n/, all uwor: uwor t 1 = MULUH(m, n); q = SRL(t 1 + SRL(n t 1, sh 1 ), sh 2 ); Figure 4.1: Unsigne ivision by run time invariant ivisor Explanation of Figure 4.1. If = 1, then l = 0, so m = 1 an sh 1 = sh 2 = 0. The coe computes t 1 = 1 n/2 N = 0 an q = n. If > 1, then l 1, so sh 1 = 1 an sh 2 = l 1. Since m 2N (2 l ) + 1 2N ( 1) + 1 < 2 N, the value of m fits in a uwor. Since 0 t 1 n, the formula for q simplifies to q = SRL(t 1 + SRL(n t 1, 1), l 1) t1 + (n t 1 )/2 = 2 l 1 (t1 + n)/2 t1 + n = =. 2 l 1 2 l (4.5) But t 1 + n = m n/2 N + n = (m + 2 N ) n/2 N. Set m = m + 2 N = 2 N+l / + 1. The hypothesis of Theorem 4.2 is satisfie since 2 N+l < m 2 N+l + 2 N+l + 2 l. Caution. Conceptually q is SRL(n + t 1, l), as in (4.5). Do not compute q this way, since n+t 1 may overflow N bits an the shift count may be out of bouns. Improvement. If is constant an a power of 2, replace the ivision by a shift. Improvement. If is constant an m = m + 2 N is even, then reuce m/2 l to lowest terms. The reuce multiplier fits in N bits, unlike the original. In rare cases (e.g., = 641 on a 32 bit machine, = on a 64 bit machine) the final shift is zero. Improvement. If is constant an even, rewrite n n/2 e = /2 e for some e > 0. Then n/2 e can be compute using SRL. Since n/2 e < 2 N e, less precision is neee in the multiplier than before. These ieas are reflecte in Figure 4.2, which generates coe for n/ where n is unsigne an is constant. Proceure CHOOSE MULTIPLIER, which is share by this an later algorithms, appears in Figure 6.2. Inputs: uwor an n, with constant. uwor o, t 1 ; uwor m; int e, l, l ummy, sh post, sh pre ; (m, sh post, l) = CHOOSE MULTIPLIER(, N); if m 2 N an is even then Fin e such that = 2 e o an o is o. /* 2 e = AND(, 2 N ) */ sh pre = e; (m, sh post, l ummy ) = CHOOSE MULTIPLIER( o, N e); else sh pre = 0; en if if = 2 l then Issue q = SRL(n, l); else if m 2 N then assert sh pre = 0; Issue t 1 = MULUH(m 2 N, n); Issue q = SRL(t 1 + SRL(n t 1, 1), sh post 1); else Issue q = SRL(MULUH(m, SRL(n, sh pre )), sh post ); en if Figure 4.2: Optimize coe generation of unsigne q = n/ for constant nonzero The following three examples illustrate the cases in Figure 4.2. All assume unsigne 32 bit arithmetic. Example. q = n/10. CHOOSE MULTIPLIER fins m low = (2 36 6)/10 an m high = ( )/10. After one roun of ivisions by 2, it returns (m, 3, 4), where m = ( )/5. The suggeste coe q = SRL(MULUH(( )/5, n), 3) eliminates the pre shift by 0. See Table Example. q = n/7. Here m = ( )/7 > This example uses the longer sequence in Figure 4.1. Example. q = n/14. CHOOSE MULTIPLIER first returns the same multiplier as when = 7. The

5 suggeste coe uses separate ivisions by 2 an 7: q = SRL(MULUH(( )/7, SRL(n, 1)), 2). 5 Signe ivision, quotient roune towars 0 Suppose we want to compile a signe ivision q = TRUNC(n/), where is constant or run time invariant, 0 < 2 N 1, an where 2 N 1 n 2 N 1 1 is variable. All quotients are to be roune towars zero. We coul prove a theorem like Theorem 4.2 about when TRUNC(n/) = TRUNC(m n/2 N+l ) for all n in a suitable range (cf. (7.1)), but it wouln t help since we can t compute the right sie given only m n/2 N. Instea we show how to ajust the estimate quotient when the ivien or ivisor is negative. Theorem 5.1 Suppose m,, l are integers such that 0 an 0 < m 2 N+l 1 2 l. Let n be an arbitrary integer such that 2 N 1 n 2 N 1 1. Define q 0 = m n/2 N+l 1. Then ( n ) TRUNC = q 0 if n 0 an > 0, 1 + q 0 if n < 0 an > 0, q 0 if n 0 an < 0, 1 q 0 if n < 0 an < 0. Proof. When n 0 an > 0, this is Theorem 4.2 with N replace by N 1. Suppose n < 0 an > 0, say n = q r where 0 r 1. Define k = m 2 N+l 1. Then q m n 2 N+l 1 = k 2 l n 2 N r, (5.2) as in (4.4). Since 0 < k 2 l by hypothesis, the first fraction on the right of (5.2) is positive an r/ is nonnegative. The sum is at most 1/ + ( 1)/ = 1, so q 0 = m n/2 N+l 1 = q 1, as asserte. For < 0, use TRUNC(n/) = TRUNC(n/ ). Caution. When < 0, avoi rewriting the quotient as TRUNC(( n)/ ), which fails for n = 2 N 1. For a run time invariant ivisor, this leas to the coe in Figure 5.1. Its cost is 1 multiply, 3 as, 2 shifts, an 1 bit op per quotient. Explanation of Figure 5.1. The multiplier m satisfies 2 N 1 < m < 2 N except when = ±1; in the latter cases m = 2 N + 1. In either case m = m 2 N fits in an swor. We compute m n/2 N as n+ (m 2 N ) n/2 N, using MULSH. The subtraction of XSIGN(n) as one if n < 0. The last line negates the tentative quotient if < 0 (i.e., if sign = 1). Variation. ( An alternate computation of m is m = 2 N (2 l 1 ) ) + 1 TRUNC. This uses signe (2N) bit/n bit ivision, with N bit quotient. Initialization (given constant swor with 0): int l = max ( log 2, 1); uwor m = N+l 1 / ; swor m = m 2 N ; swor sign = XSIGN(); int sh post = l 1; For q = TRUNC(n/), all swor: swor q 0 = n + MULSH(m, n); q 0 = SRA(q 0, sh post ) XSIGN(n); q = EOR(q 0, sign ) sign ; Figure 5.1: Signe ivision by run time invariant ivisor, roune towars zero Overflow etection. The quotient n/ overflows if n = 2 N 1 an = 1. The algorithm in Figure 5.1 returns 2 N 1. If overflow etection is require, the final subtraction of sign shoul check for overflow. Improvement. If m is constant an even, then reuce m/2 l to lowest terms, as in the unsigne case. This improvement is reflecte in Figure 5.2, which generates coe for TRUNC(n/) where is a nonzero constant. Figure 5.2 also checks for ivisor being a power of 2 or negative thereof. Inputs: swor an n, with constant an 0. uwor m; int l, sh post ; (m, sh post, l) = CHOOSE MULTIPLIER(, N 1); if = 1 then Issue q = ; else if = 2 l then Issue q = SRA(n + SRL(SRA(n, l 1), N l), l); else if m < 2 N 1 then Issue q = SRA(MULSH(m, n), sh post ) XSIGN(n); else Issue q = SRA(n + MULSH(m 2 N, n), sh post ) XSIGN(n); Cmt. Caution m 2 N is negative. en if if < 0 then Issue q = q; en if Figure 5.2: Optimize coe generation of signe q = TRUNC(n/) for constant 0 Example. q = TRUNC(n/3). On a 32 bit machine. CHOOSE MULTIPLIER(3, 31) returns sh post = 0 an m = ( )/3. The coe q = MULSH(m, n) XSIGN(n) uses one multiply, one shift, one subtract.

6 6 Signe ivision, quotient roune towars Some languages require negative quotients to roun towars rather than zero. With some ingenuity, we can compute these quotients in terms of quotients which roun towars zero, even if the signs of the ivien an ivisor are unknown at compile time. If n an are integers, then the ientities TRUNC(n/) if n 0 an > 0, n TRUNC((n + 1)/) 1 if n < 0 an > 0, = TRUNC((n 1)/) 1 if n > 0 an < 0, TRUNC(n/) if n 0 an < 0 are easily verifie. Since the new numerators n±1 never overflow, these ientities can be use for computation. They are summarize by n ( ) n + sign n sign = TRUNC + q sign, (6.1) where sign = XSIGN(), n sign = XSIGN(OR(n, n + sign )), an q sign = EOR(n sign, sign ). The cost is 2 shifts, 3 as/subtracts, an 2 bit ops, plus the ivie (n + sign is a repeate subexpression). For remainers, a corollary to (2.1) an (6.1) is n mo = n TRUNC((n + sign n sign )/) q sign = ((n + sign n sign ) rem ) (6.2) sign + n sign q sign = ((n + sign n sign ) rem ) + AND( 2 sign 1, q sign ). The last equality in (6.2) can be verifie by separately checking the cases q sign = n sign sign = 0 an q sign = n sign + sign = 1. The subexpression 2 sign 1 epens only on. For rouning towars +, an analog of (6.1) is n ( ) n sign + n pos = TRUNC EOR( sign, n pos ), where sign = XSIGN() an n pos = (n > sign ). Improvement. If > 0 is constant, then sign = 0. Then (6.1) becomes n ( ) n nsign = TRUNC + n sign, where n sign = XSIGN(n). Since TRUNC( x) = TRUNC(x) an EOR( 1, n) = 1 n = (n + 1), this is equivalent to n ( ( )) EOR(nsign, n) = EOR n sign, TRUNC (6.3) ( > 0). The ivien an ivisor on the right of (6.3) are both nonnegative an below 2 N 1. One can view them as signe or as unsigne when applying earlier algorithms. Improvement. The XSIGN(OR(n, n + sign )) is equivalent to (n NOT( sign )) an to (n < sign ), where the relationals prouce 1 if true an 0 if false. On the MIPS R2000/R3000 [12], for example, one can compute sign = SRL(, N 1); n sign = (n < sign ); /* SLT, signe */ q sign = EOR( n sign, sign ); q = TRUNC((n ( sign ) + ( n sign ))/) ( q sign ); (six instructions plus the ivie), saving an instruction over (6.1). Improvement. If n known to be nonzero, then n sign simplifies to XSIGN(n). For constant ivisors, one can use (6.1) an the algorithm in Figure 5.2. For constant > 0 a shorter algorithm, base on (6.3), appears in Figure 6.1. Inputs: swor n an, with constant an 0. uwor m; int l, sh post ; (m, sh post, l) = CHOOSE MULTIPLIER(, N 1); if = 2 l then Issue q = SRA(n, l); else assert m < 2 N ; Issue swor n sign = XSIGN(n); Issue uwor q 0 = MULUH(m, EOR(n sign, n)); Issue q = EOR(n sign, SRL(q 0, sh post )); en if Figure 6.1: Optimize coe generation of signe q = n/ for constant > 0 Example. Using signe 32 bit arithmetic, the coe for r = n mo 10 (nonnegative remainer) can be swor n sign = XSIGN(n); uwor q 0 = MULUH(( )/5, EOR(n sign, n)); swor q = EOR(n sign, SRL(q 0, 2)); r = n SLL(q, 1) SLL(q, 3);. The cost is 1 multiply, 4 shifts, 2 bit ops, 2 subtracts. Alternately, if one has a fast signe ivision algorithm which rouns quotients towars 0 an returns remainers, then (6.2) justifies the coe r = ((n XSIGN(n)) rem 10) + AND(9, XSIGN(n)). The cost is 1 ivie, 1 shift, 1 bit op, 2 as/subtracts.

7 proceure CHOOSE MULTIPLIER(uwor, int prec); Cmt. Constant ivisor to invert. 1 < 2 N. Cmt. prec Number of bits of precision neee, 1 prec N. Cmt. Fins m, sh post, l such that: Cmt. 2 l 1 < 2 l. Cmt. 0 sh post l. If sh post > 0, then N + sh post l + prec. Cmt. 2 N+sh post < m 2 N+sh post (1 + 2 prec ). Cmt. Corollary. If 2 prec, then m < 2 N+sh post ( l )/ 2 N+sh post l+1. Cmt. Hence m fits in max(prec, N l) + 1 bits (unsigne). Cmt. int l = log 2, sh post = l; uwor m low = 2 N+l /, m high = (2 N+l + 2 N+l prec )/ ; Cmt. To avoi numerator overflow, compute m low as 2 N + (m low 2 N ). Cmt. Likewise for m high. Compare m in Figure 4.1. Invariant. m low = 2 N+sh post/ < m high = 2 N+sh post (1 + 2 prec )/. while m low /2 < m high /2 an sh post > 0 o m low = m low /2 ; m high = m high /2 ; sh post = sh post 1; en while; /* Reuce to lowest terms. */ return (m high, sh post, l); /* Three outputs. */ en CHOOSE MULTIPLIER; Figure 6.2: Selection of multiplier an shift count 7 Use of floating point One alternative to MULUH an MULSH uses floating point arithmetic. Let the floating point mantissa be F bits wie (e.g., F = 53 for IEEE ouble precision arithmetic). Then any floating point operation has relative error at most 2 1 F, regarless of the rouning moe, unless exponent overflow or unerflow occurs. Suppose N 1 an F N + 3. We claim that where ( n ) TRUNC = TRUNC(q est ), ( ) F q est n, (7.1) whenever n 2 N 1 an 0 < < 2 N, regarless of the rouning moes use to compute q est. The proof assumes that n > 0 an > 0, by negating both sies of (7.1) if necessary (the case n = 0 is trivial). Since the relative error per operation is at most 2 1 F, the estimate quotient q est satisfies F ( F ) 2 n q est ( F ) ( F ) 2 n. Use this an the inequalities 1 2 F F < F ( F ) 2, ( F ) ( F ) 2 1 < F N to erive (1 2 F ) n < q est < n/ n/ 1 2 N 1 1 n+1 = n + 1. Denote q = TRUNC(n/). Then q est < (n + 1)/ implies TRUNC(q est ) q. If q est < q, then (1 2 F ) q (1 2 F ) n < q est < q. Both q an q est are exactly representable as floating point numbers, but there are no representable numbers strictly between (1 2 F ) q an q. This contraiction shows that q est q an hence q = TRUNC(q est ). For quotients roune towars, use (6.1). If F = 53 an N 50, then (7.1) can be use for N bit integer ivision. The algorithm may trigger an IEEE exception for inexactness if the application program enables that conition. Alverson [1] uses integer multiplication, but computes the multiplier using floating point arithmetic. Baker [3] oes moular multiplication using a combination of floating point an integer arithmetic. 8 Diviing uwor by uwor One primitive operation for multiple precision arithmetic [14, p. 251] is the ivision of a uwor by a uwor, obtaining uwor quotient an remainer, where the quotient is known to be less than 2 N. We

8 Initialization (given uwor, where 0 < < 2 N ): int l = 1 + log 2 ; /* 2 l 1 < 2 l */ uwor m = (2 N (2 l ) 1)/ ; /* m = (2 N+l 1)/ 2 N */ uwor norm = SLL(, N l); /* Normalize ivisor 2 N l */ For q = n/ an r = n q, where, q, r are uwor an n is uwor: uwor n 2 = SLL(HIGH(n), N l) + SRL(LOW(n), l); /* See note about shift count. */ uwor n 10 = SLL(LOW(n), N l); /* n 10 = n 1 2 N 1 + n 0 2 N l */ /* Ignore overflow. */ swor n 1 = XSIGN(n 10 ); uwor n aj = n 10 + AND( n 1, norm 2 N ); /* n 10 + n 1 ( norm 2 N ) */ /* = n 1 ( norm 2 N 1 ) + n 0 2 N l */ uwor q 1 = n 2 + HIGH ( ) m (n 2 ( n 1 )) + n aj ; /* Unerflow is impossible. */ /* See Lemma 8.1. */ swor r = n 2 N + (2 N 1 q 1 ) ; /* r = n q 1, r < */ q = HIGH(r) (2 N 1 q 1 ) + 2 N ; /* A 1 to quotient if r 0. */ r = LOW(r) + AND( 2 N, HIGH(r)); /* A to remainer if r < 0. */ Figure 8.1: Unsigne ivision of uwor by run time invariant uwor. escribe a way to compute this quotient an remainer after some preliminary computations involving only the ivisor, when the ivisor is a run time invariant expression. Lemma 8.1 Suppose that, m, an l are nonnegative integers such that 2 l 1 < 2 l 2 N an 0 < 2 N+l m. (8.2) Given n with 0 n 2 N 1, write n = n 2 2 l + n 1 2 l 1 + n 0, where n 0, n 1, an n 2 are integers with 0 n 1 1 an 0 n 0 2 l 1 1. Define integers q 1 an q 0 by q 1 2 N + q 0 = n 2 2 N + (n 2 + n 1 ) (m 2 N ) + n 1 ( 2 N l 2 N 1) + n 0 2 N l (8.3) an 0 q 0 2 N 1. 0 n q 1 < 2. Then 0 q 1 2 N 1 an Proof. Define k = 2 N+l m. Then (8.2) implies 0 < k 2 l 1. The boun n 2 N 1 implies n 2 2 N l 1. Equation (8.2) implies m > 2 N+l / > 2 N. A corollary to (8.3) is q 1 2 N + q 0 = n 2 m + n 1 (m 2 N ) + 2 N l ( n 1 ( 2 l 1 ) + n 0 ) ( 2 N l 1) m + 1 (m 2 N ) + 2 N l ( 1 (2 l 1 1) + (2 l 1 1) ) = 2 N l ( m 2) < 2 2N. This proves the upper boun on the integer q 1. A straightforwar calculation using the efinitions of k an q 0 an n 0 reveals that n q 1 = (n 2 + n 1 ) k + q 0 2 N + (1 2 ) ) l (n 1 ( 2 l 1 ) + n 0. (8.4) Since 2 l 1 < 2 l by hypothesis, the right sie of (8.4) is nonnegative. This remainer is boune by ( 2 N l ) + (2 N 1) 2 N + (1 2 ) ( ) l 1 ( 2 l 1 ) + (2 l 1 1) ( ) 2 < 2 l + + (1 2 ) l = 2, completing the proof. This leas to an algorithm like that in Figure 8.1 when iviing a uwor by a run time invariant uwor with quotient known to be less than 2 N. Unlike the previous algorithms, this coe rouns the multiplier own when computing a reciprocal. After initializations epening only on the ivisor, this algorithm requires two proucts (both halves of each) an simple operations (incluing oublewor as an subtracts). Five registers hol, norm, l, m, an N l. Note. The shift count l in the computations of m an n 2 may equal N. If this is too large, use separate shifts by l 1 an 1. If a oublewor shift is available, compute n 2 an n 10 together.

9 9 Exact ivision by constants Occasionally a language construct requires a ivision whose remainer is known to vanish. An example occurs in C when subtracting two pointers. Their numerical ifference is ivie by the object size. The object size is a compile time constant. Suppose we want coe for q = n/, where is a nonzero constant an n is an expression known to be ivisible by. Write = 2 e o where o is o. Fin inv such that 1 inv 2 N 1 an Then inv o 1 (mo 2 N ). (9.1) 2 e q = 2 e n = n o ( inv o ) n = inv n (mo 2 N ), o as in [2]. Hence 2 e q inv n (mo 2 N ). Since n/ o = 2 e q fits in N bits, it must equal the lower half of the prouct inv n, namely MULL( inv, n). An SRA (for signe ivision) or SRL (for unsigne ivision) prouces the quotient q. The multiplicative inverse inv of o moulo 2 N can be foun by the extene Eucliean GCD algorithm [14, p. 325]. Another algorithm observes that (9.1) hols moulo 2 3 if inv = o. Each Newton iteration inv inv (2 inv o ) mo 2 N (9.2) oubles the known exponent by which (9.1) hols, so log 2 (N/3) iterations of (9.2) suffice. If o = ±1, then inv = o so the multiplication by inv is trivial or a negation. If is o, then e = 0 an the shift isappears. A variation tests whether an integer n is exactly ivisible by a nonzero constant without computing the remainer. If is a power of 2 (or the negative thereof, in the signe case), then check the lower bits of n to test whether ivies n. Otherwise compute inv an e as above. Let q 0 = MULL( inv, n). If n = q for some q, then q 0 = 2 e q must be a multiple of 2 e. The original ivision is exact (no remainer) precisely when (i) q 0 is a multiple of 2 e, an (ii) q 0 is sufficiently small that q 0 o is representable by the original ata type. For unsigne ivision check that 2 0 q 0 2 e N 1 an that the bottom e bits of q 0 (or of n) are zero. When e > 0, these tests can be combine if the architecture has a rotate (i.e., circular shift) instruction, or by expaning this rotate into 2 N 1 OR(SRL(q 0, e), SLL(q 0, N e)). For signe ivision check that 2 2 e N 1 2 q 0 2 e N 1 1 an that the bottom e bits of q 0 are zero; the interval check can be one with an a an one signe or unsigne compare. Relately, to test whether n rem = r, where an r are constants with 1 r < an where n is signe, check whether MULL( inv, n r) is a nonnegative multiple of 2 e not exceeing 2 e (2 N 1 1 r)/. Example. To test whether a signe 32 bit value i is ivisible by 100, let inv = ( )/25. Compute swor q 0 = MULL( inv, i). Next check whether q 0 is a multiple of 4 in the interval [ q max, q max ], where q max = ( )/25. Since these algorithms require only the lower half of a prouct, other optimizations for integer multiplication apply here too. For example, applying strength reuction to the C loop signe long i, imax; for (i = 0; i < imax; i++) { if ((i % 100) == 0) {... } } might yiel (** enotes exponentiation) const unsigne long inv = (19*2**32 + 1)/25; const unsigne long qmax = (2**31-48)/25; unsigne long test = qmax; /* test = inv*i + qmax mo 2**32 */ for (i = 0; i < imax; i++, test += inv) { if (test <= 2*qmax && (test & 3) == 0) {... } } No explicit multiplication or ivision remains. 10 Implementation in GCC We have implemente the algorithms for constant ivisors in the freely available GCC compiler [21], by extening its machine an language inepenent internal coe generation. We also mae minor machine epenent moifications to some of the machine escriptor, or m files to get optimal coe. All languages an almost all processors supporte by GCC benefit. Our changes are scheule for inclusion in GCC 2.6.

10 To generate coe for ivision of N bit quantities, the CHOOSE MULTIPLIER function nees to perform (2N) bit arithmetic. This makes that proceure more complex than it might appear in Figure 6.2. Optimal selection of instructions epening on the bitsize of the operation is a tricky problem that we spent quite some time on. For some architectures, it is important to select a multiplication instruction that has the smallest available precision. On other architectures, the multiplication can be performe faster using a sequence of aitions, subtractions, an shifts. We have not implemente any algorithm for run time invariant ivisors. Only a few architectures (AMD 29050, Intel x86, Motorola 68k & 88110, an to some extent IBM POWER) have aequate harware support to make such an implementation viable, i.e., an instruction that can be use for integer logarithm computation, an a (2N) bit/n bit ivie instruction. Even with harware support, one must be careful that the transformation really improves the coe; e.g., a loop might nee to be execute many times before the faster loop boy outweighs the cost of the multiplier computation in the loop heaer. 11 Results Figure 11.1 has an example with compile time constant ivisor that gets rastically faster on all recent processor implementations. The program converts a binary number to a ecimal string. It calculates one quotient an one remainer per output igit. Table 11.1 shows the generate assembler coes for Alpha, MIPS, POWER, an SPARC. There is no explicit ivision. Although initially compute separately, the quotient an remainer calculations have been combine (by GCC s common subexpression elimination pass). The unsigne int ata type has 32 bits on all four architectures, but Alpha is a 64 bit architecture. The Alpha coe is longer than the others because it multiplies ( )/5 by x using 4 [ ( ) ( ) ( 4 [4 (4 x x) + x] x )] + x instea of the slower, 23 cycle, mulq. This illustrates that the multiplications neee by these algorithms can sometimes be compute quickly using a sequence of shifts, as, an subtracts [5], since multipliers for small constant ivisors have regular binary patterns. Table 11.2 compares the timing on some processor implementations for the raix conversion routine, with an without the ivision elimination algorithms. The number converte was a full 32 bit number, sufficiently large to hie proceure calling overhea from the measurements. We also ran the integer benchmarks from SPEC 92. The improvement was negligible for most of the programs; the best improvement seen was only about 3%. Some benchmarks that involve hashing show improvements up to about 30%. We anticipate significant improvements on some number theoretic coes. References [1] Robert Alverson. Integer ivision using reciprocals. In Peter Kornerup an Davi W. Matula, eitors, Proceeings 10th Symposium on Computer Arithmetic, pages , Grenoble, France, June [2] Ehu Artzy, James A. Hins, an Harry J. Saal. A fast ivision technique for constant ivisors. CACM, 19(2):98 101, February [3] Henry G. Baker. Computing A*B (mo N) efficiently in ANSI C. ACM SIGPLAN Notices, 27(1):95 98, January [4] H.B. Bakoglu, G.F. Grohoski, an R. K. Montoye. The IBM RISC system/6000 processor: Harware overview. IBM Journal of Research an Development, 34(1):12 22, January [5] Robert Bernstein. Multiplication by integer constants. Software Practice an Experience, 16(7): , July [6] Raymon T. Boute. The Eucliean efinition of the functions iv an mo. ACM Transactions on Programming Languages an Systems, 14(2): , April [7] A.P. Chang. A note on the moulo operation. SIGPLAN Notices, 20(4):19 23, April [8] Digital Equipment Corporation. DECchip AA Microprocessor, Harware Reference Manual, 1st eition, October [9] Intel Corporation, Santa Clara, CA. 386 DX Microprocessor Programmer s Reference Manual, [10] Intel Corporation, Santa Clara, CA. Intel486 Microprocessor Family Programmer s Reference Manual, [11] Davi H. Jacobsohn. A combinatoric ivision algorithm for fixe-integer ivisors. IEEE Trans. Comp., C 22(6): , June [12] Gerry Kane. MIPS RISC Architecture. Prentice Hall, Englewoo Cliffs, NJ, 1989.

11 #efine BUFSIZE 50 char *ecimal (unsigne int x) { static char buf[bufsize]; char *bp = buf + BUFSIZE - 1; *bp = 0; o { *--bp = 0 + x % 10; x /= 10; } while (x!= 0); return bp; /* Return pointer to first igit */ } Figure 11.1: Raix conversion coe Alpha MIPS POWER SPARC $2,buf la $5,buf+49 l 10,LC..0(2) sethi %hi(buf+49),%g2 sb $0,0($5) cau 11,0,0xcccc or %g2,%lo(buf+49),%o1 li $6,0xcccc0000 oril 11,11,0xccc stb %g0,[%o1] ori $6,$6,0xccc cal 0,0(0) sethi %hi(0xccccccc),%g2 L1: multu $4,$6 stb 0,0(10) or %g2,0xc,%o2 mfhi $3 L1: mul 9,3,11 L1: a %o1,-1,%o1 subu $5,$5,1 srai 0,3,31 umul %o0,%o2,%g0 srl $3,$3,3 an 0,0,11 r %y,%g3 sll $2,$3,2 a 9,9,0 srl %g3,3,%g3 au $2,$2,$3 a 9,9,3 sll %g3,2,%g2 sll $2,$2,1 sri 9,9,3 a %g2,%g3,%g2 subu $2,$4,$2 muli 0,9,10 sll %g2,1,%g2 au $2,$2,48 sf 0,0,3 sub %o0,%g2,%g2 move $4,$3 ai. 3,9,0 a %g2,48,%g2 bne $4,$0,L1 ai 0,0,48 orcc %g3,%g0,%o0 sb $2,0($5) stbu 0,-1(10) bne L1 j $31 bc 4,2,L1 stb %g2,[%o1] move $2,$5 la lq u $1,49($2) aq $2,49,$0 mskbl $1,$0,$1 stq u $1,49($2) L1: zapnot $16,15,$3 s4subq $3,$3,$2 s4aq $2,$3,$2 s4subq $2,$3,$2 sll $2,8,$1 subq $0,1,$0 aq $2,$1,$2 sll $2,16,$1 lq u $4,0($0) aq $2,$1,$2 s4aq $2,$3,$2 srl $2,35,$2 mskbl $4,$0,$4 s4al $2,$2,$1 aq $1,$1,$1 subl $16,$1,$1 al $1,48,$1 insbl $1,$0,$1 bis $2,$2,$16 bis $1,$4,$1 stq u $1,0($0) bne $16,L1 ret $31,($26),1 ai 3,10,0 br retl mov Table 11.1: Coe generate by our GCC for raix conversion %o1,%o0

12 Architecture/Implementation MHz Time with ivision performe Time with ivision eliminate Speeup ratio Motorola MC68020 [18, pp. 9 22] Motorola MC SPARC Viking [20] HP PA MIPS R3000 [12] MIPS R4000 [17] POWER/RIOS I [4, 22] DEC Alpha [8] * *This time ifference is artificial. The Alpha architecture has no integer ivie instruction, an the DEC library functions for ivision are slow. Table 11.2: Timing (microsecons) for raix conversion with an without ivision elimination [13] Donal E. Knuth. An empirical stuy of FOR- TRAN programs. Technical Report CS 186, Computer Science Department, Stanfor University, Stanfor artificial intelligence project memo AIM 137. [14] Donal E. Knuth. Seminumerical Algorithms, volume 2 of The Art of Computer Programming. Aison-Wesley, Reaing, MA, 2n eition, [15] Shuo-Yen Robert Li. Fast constant ivision routines. IEEE Trans. Comp., C 34(9): , September [16] Daniel J. Magenheimer, Liz Peters, Karl Pettis, an Dan Zuras. Integer multiplication an ivision on the HP Precision Architecture. In Proceeings Secon International Conference on Architectural Support for Programming Languages an Operating Systems (ASPLOS II). ACM, Publishe as SIGPLAN Notices, Volume 22, No. 10, October, [17] MIPS Computer Systems, Inc, Sunnyvale, CA. MIPS R4000 Microprocessor User s Manual, [18] Motorola, Inc. MC Bit Microprocessor User s Manual, 2n eition, [19] Motorola, Inc. PowerPC 601 RISC Microprocessor User s Manual, [20] SPARC International, Inc., Menlo Park, CA. The SPARC Architecture Manual, Version 8, [21] Richar M. Stallman. Using an Porting GCC. The Free Software Founation, Cambrige, MA, [22] Henry Warren. Preicting Execution Time on the IBM RISC System/6000. IBM, Preliminary Version.

Improved division by invariant integers

Improved division by invariant integers 1 Improve ivision by invariant integers Niels Möller an Torbjörn Granlun Abstract This paper consiers the problem of iviing a two-wor integer by a single-wor integer, together with a few extensions an

More information

Integral Regular Truncated Pyramids with Rectangular Bases

Integral Regular Truncated Pyramids with Rectangular Bases Integral Regular Truncate Pyramis with Rectangular Bases Konstantine Zelator Department of Mathematics 301 Thackeray Hall University of Pittsburgh Pittsburgh, PA 1560, U.S.A. Also: Konstantine Zelator

More information

Math 230.01, Fall 2012: HW 1 Solutions

Math 230.01, Fall 2012: HW 1 Solutions Math 3., Fall : HW Solutions Problem (p.9 #). Suppose a wor is picke at ranom from this sentence. Fin: a) the chance the wor has at least letters; SOLUTION: All wors are equally likely to be chosen. The

More information

Factoring Dickson polynomials over finite fields

Factoring Dickson polynomials over finite fields Factoring Dickson polynomials over finite fiels Manjul Bhargava Department of Mathematics, Princeton University. Princeton NJ 08544 manjul@math.princeton.eu Michael Zieve Department of Mathematics, University

More information

Chapter II Binary Data Representation

Chapter II Binary Data Representation Chapter II Binary Data Representation The atomic unit of data in computer systems is the bit, which is actually an acronym that stands for BInary digit. It can hold only 2 values or states: 0 or 1, true

More information

PROBLEMS. A.1 Implement the COINCIDENCE function in sum-of-products form, where COINCIDENCE = XOR.

PROBLEMS. A.1 Implement the COINCIDENCE function in sum-of-products form, where COINCIDENCE = XOR. 724 APPENDIX A LOGIC CIRCUITS (Corrispone al cap. 2 - Elementi i logica) PROBLEMS A. Implement the COINCIDENCE function in sum-of-proucts form, where COINCIDENCE = XOR. A.2 Prove the following ientities

More information

A Generalization of Sauer s Lemma to Classes of Large-Margin Functions

A Generalization of Sauer s Lemma to Classes of Large-Margin Functions A Generalization of Sauer s Lemma to Classes of Large-Margin Functions Joel Ratsaby University College Lonon Gower Street, Lonon WC1E 6BT, Unite Kingom J.Ratsaby@cs.ucl.ac.uk, WWW home page: http://www.cs.ucl.ac.uk/staff/j.ratsaby/

More information

Quiz for Chapter 3 Arithmetic for Computers 3.10

Quiz for Chapter 3 Arithmetic for Computers 3.10 Date: Quiz for Chapter 3 Arithmetic for Computers 3.10 Not all questions are of equal difficulty. Please review the entire quiz first and then budget your time carefully. Name: Course: Solutions in RED

More information

Computer Organization and Architecture

Computer Organization and Architecture Computer Organization and Architecture Chapter 9 Computer Arithmetic Arithmetic & Logic Unit Performs arithmetic and logic operations on data everything that we think of as computing. Everything else in

More information

Lecture 8: Binary Multiplication & Division

Lecture 8: Binary Multiplication & Division Lecture 8: Binary Multiplication & Division Today s topics: Addition/Subtraction Multiplication Division Reminder: get started early on assignment 3 1 2 s Complement Signed Numbers two = 0 ten 0001 two

More information

Pythagorean Triples Over Gaussian Integers

Pythagorean Triples Over Gaussian Integers International Journal of Algebra, Vol. 6, 01, no., 55-64 Pythagorean Triples Over Gaussian Integers Cheranoot Somboonkulavui 1 Department of Mathematics, Faculty of Science Chulalongkorn University Bangkok

More information

Binary Representation and Computer Arithmetic

Binary Representation and Computer Arithmetic Binary Representation and Computer Arithmetic The decimal system of counting and keeping track of items was first created by Hindu mathematicians in India in A.D. 4. Since it involved the use of fingers

More information

Digital Logic. The Binary System is a way of writing numbers using only the digits 0 and 1. This is the method used by the (digital) computer.

Digital Logic. The Binary System is a way of writing numbers using only the digits 0 and 1. This is the method used by the (digital) computer. Digital Logic 1 Data Representations 1.1 The Binary System The Binary System is a way of writing numbers using only the digits 0 and 1. This is the method used by the (digital) computer. The system we

More information

CHAPTER 5: MODULAR ARITHMETIC

CHAPTER 5: MODULAR ARITHMETIC CHAPTER 5: MODULAR ARITHMETIC LECTURE NOTES FOR MATH 378 (CSUSM, SPRING 2009). WAYNE AITKEN 1. Introduction In this chapter we will consider congruence modulo m, and explore the associated arithmetic called

More information

Review 1/2. CS61C Characters and Floating Point. Lecture 8. February 12, Review 2/2 : 12 new instructions Arithmetic:

Review 1/2. CS61C Characters and Floating Point. Lecture 8. February 12, Review 2/2 : 12 new instructions Arithmetic: Review 1/2 CS61C Characters and Floating Point Lecture 8 February 12, 1999 Handling case when number is too big for representation (overflow) Representing negative numbers (2 s complement) Comparing signed

More information

19.2. First Order Differential Equations. Introduction. Prerequisites. Learning Outcomes

19.2. First Order Differential Equations. Introduction. Prerequisites. Learning Outcomes First Orer Differential Equations 19.2 Introuction Separation of variables is a technique commonly use to solve first orer orinary ifferential equations. It is so-calle because we rearrange the equation

More information

CS 103X: Discrete Structures Homework Assignment 3 Solutions

CS 103X: Discrete Structures Homework Assignment 3 Solutions CS 103X: Discrete Structures Homework Assignment 3 s Exercise 1 (20 points). On well-ordering and induction: (a) Prove the induction principle from the well-ordering principle. (b) Prove the well-ordering

More information

ECE 0142 Computer Organization. Lecture 3 Floating Point Representations

ECE 0142 Computer Organization. Lecture 3 Floating Point Representations ECE 0142 Computer Organization Lecture 3 Floating Point Representations 1 Floating-point arithmetic We often incur floating-point programming. Floating point greatly simplifies working with large (e.g.,

More information

10.2 Systems of Linear Equations: Matrices

10.2 Systems of Linear Equations: Matrices SECTION 0.2 Systems of Linear Equations: Matrices 7 0.2 Systems of Linear Equations: Matrices OBJECTIVES Write the Augmente Matrix of a System of Linear Equations 2 Write the System from the Augmente Matrix

More information

CHAPTER 5 : CALCULUS

CHAPTER 5 : CALCULUS Dr Roger Ni (Queen Mary, University of Lonon) - 5. CHAPTER 5 : CALCULUS Differentiation Introuction to Differentiation Calculus is a branch of mathematics which concerns itself with change. Irrespective

More information

Inverse Trig Functions

Inverse Trig Functions Inverse Trig Functions c A Math Support Center Capsule February, 009 Introuction Just as trig functions arise in many applications, so o the inverse trig functions. What may be most surprising is that

More information

INTEGER DIVISION BY CONSTANTS

INTEGER DIVISION BY CONSTANTS 12/30/03 CHAPTER 10 INTEGER DIVISION BY CONSTANTS Insert this material at the end of page 201, just before the poem on page 202. 10 17 Methods Not Using Multiply High In this section we consider some methods

More information

It is time to prove some theorems. There are various strategies for doing

It is time to prove some theorems. There are various strategies for doing CHAPTER 4 Direct Proof It is time to prove some theorems. There are various strategies for doing this; we now examine the most straightforward approach, a technique called direct proof. As we begin, it

More information

Mathematics Review for Economists

Mathematics Review for Economists Mathematics Review for Economists by John E. Floy University of Toronto May 9, 2013 This ocument presents a review of very basic mathematics for use by stuents who plan to stuy economics in grauate school

More information

Prime Numbers. Chapter Primes and Composites

Prime Numbers. Chapter Primes and Composites Chapter 2 Prime Numbers The term factoring or factorization refers to the process of expressing an integer as the product of two or more integers in a nontrivial way, e.g., 42 = 6 7. Prime numbers are

More information

CHAPTER THREE. 3.1 Binary Addition. Binary Math and Signed Representations

CHAPTER THREE. 3.1 Binary Addition. Binary Math and Signed Representations CHAPTER THREE Binary Math and Signed Representations Representing numbers with bits is one thing. Doing something with them is an entirely different matter. This chapter discusses some of the basic mathematical

More information

Computer Science 281 Binary and Hexadecimal Review

Computer Science 281 Binary and Hexadecimal Review Computer Science 281 Binary and Hexadecimal Review 1 The Binary Number System Computers store everything, both instructions and data, by using many, many transistors, each of which can be in one of two

More information

Elementary Number Theory We begin with a bit of elementary number theory, which is concerned

Elementary Number Theory We begin with a bit of elementary number theory, which is concerned CONSTRUCTION OF THE FINITE FIELDS Z p S. R. DOTY Elementary Number Theory We begin with a bit of elementary number theory, which is concerned solely with questions about the set of integers Z = {0, ±1,

More information

A simple and fast algorithm for computing exponentials of power series

A simple and fast algorithm for computing exponentials of power series A simple and fast algorithm for computing exponentials of power series Alin Bostan Algorithms Project, INRIA Paris-Rocquencourt 7815 Le Chesnay Cedex France and Éric Schost ORCCA and Computer Science Department,

More information

MIPS floating-point arithmetic

MIPS floating-point arithmetic MIPS floating-point arithmetic Floating-point computations are vital for many applications, but correct implementation of floating-point hardware and software is very tricky. Today we ll study the IEEE

More information

198:211 Computer Architecture

198:211 Computer Architecture 198:211 Computer Architecture Topics: Lecture 8 (W5) Fall 2012 Data representation 2.1 and 2.2 of the book Floating point 2.4 of the book 1 Computer Architecture What do computers do? Manipulate stored

More information

2 HYPERBOLIC FUNCTIONS

2 HYPERBOLIC FUNCTIONS HYPERBOLIC FUNCTIONS Chapter Hyperbolic Functions Objectives After stuying this chapter you shoul unerstan what is meant by a hyperbolic function; be able to fin erivatives an integrals of hyperbolic functions;

More information

Differentiability of Exponential Functions

Differentiability of Exponential Functions Differentiability of Exponential Functions Philip M. Anselone an John W. Lee Philip Anselone (panselone@actionnet.net) receive his Ph.D. from Oregon State in 1957. After a few years at Johns Hopkins an

More information

Representation of Data

Representation of Data Representation of Data In contrast with higher-level programming languages, C does not provide strong abstractions for representing data. Indeed, while languages like Racket has a rich notion of data type

More information

Instruction Set Architecture. or How to talk to computers if you aren t in Star Trek

Instruction Set Architecture. or How to talk to computers if you aren t in Star Trek Instruction Set Architecture or How to talk to computers if you aren t in Star Trek The Instruction Set Architecture Application Compiler Instr. Set Proc. Operating System I/O system Instruction Set Architecture

More information

Number Systems and Number Representation

Number Systems and Number Representation Number Systems and Number Representation 1 For Your Amusement Question: Why do computer programmers confuse Christmas and Halloween? Answer: Because 25 Dec = 31 Oct -- http://www.electronicsweekly.com

More information

This Unit: Floating Point Arithmetic. CIS 371 Computer Organization and Design. Readings. Floating Point (FP) Numbers

This Unit: Floating Point Arithmetic. CIS 371 Computer Organization and Design. Readings. Floating Point (FP) Numbers This Unit: Floating Point Arithmetic CIS 371 Computer Organization and Design Unit 7: Floating Point App App App System software Mem CPU I/O Formats Precision and range IEEE 754 standard Operations Addition

More information

Bits, Data Types, and Operations. University of Texas at Austin CS310H - Computer Organization Spring 2010 Don Fussell

Bits, Data Types, and Operations. University of Texas at Austin CS310H - Computer Organization Spring 2010 Don Fussell Bits, Data Types, and Operations University of Texas at Austin CS3H - Computer Organization Spring 2 Don Fussell How do we represent data in a computer? At the lowest level, a computer is an electronic

More information

Number Systems and. Data Representation

Number Systems and. Data Representation Number Systems and Data Representation 1 Lecture Outline Number Systems Binary, Octal, Hexadecimal Representation of characters using codes Representation of Numbers Integer, Floating Point, Binary Coded

More information

West Windsor-Plainsboro Regional School District Algebra I Part 2 Grades 9-12

West Windsor-Plainsboro Regional School District Algebra I Part 2 Grades 9-12 West Windsor-Plainsboro Regional School District Algebra I Part 2 Grades 9-12 Unit 1: Polynomials and Factoring Course & Grade Level: Algebra I Part 2, 9 12 This unit involves knowledge and skills relative

More information

Integer Multiplication and Division

Integer Multiplication and Division 6 Integer Multiplication and Division 6.1 Objectives After completing this lab, you will: Understand binary multiplication and division Understand the MIPS multiply and divide instructions Write MIPS programs

More information

Presented By: Ms. Poonam Anand

Presented By: Ms. Poonam Anand Presented By: Ms. Poonam Anand Know the different types of numbers Describe positional notation Convert numbers in other bases to base 10 Convert base 10 numbers into numbers of other bases Describe the

More information

Firewall Design: Consistency, Completeness, and Compactness

Firewall Design: Consistency, Completeness, and Compactness C IS COS YS TE MS Firewall Design: Consistency, Completeness, an Compactness Mohame G. Goua an Xiang-Yang Alex Liu Department of Computer Sciences The University of Texas at Austin Austin, Texas 78712-1188,

More information

The Laws of Cryptography Cryptographers Favorite Algorithms

The Laws of Cryptography Cryptographers Favorite Algorithms 2 The Laws of Cryptography Cryptographers Favorite Algorithms 2.1 The Extended Euclidean Algorithm. The previous section introduced the field known as the integers mod p, denoted or. Most of the field

More information

Some Polynomial Theorems. John Kennedy Mathematics Department Santa Monica College 1900 Pico Blvd. Santa Monica, CA 90405 rkennedy@ix.netcom.

Some Polynomial Theorems. John Kennedy Mathematics Department Santa Monica College 1900 Pico Blvd. Santa Monica, CA 90405 rkennedy@ix.netcom. Some Polynomial Theorems by John Kennedy Mathematics Department Santa Monica College 1900 Pico Blvd. Santa Monica, CA 90405 rkennedy@ix.netcom.com This paper contains a collection of 31 theorems, lemmas,

More information

TECH. Arithmetic & Logic Unit. CH09 Computer Arithmetic. Number Systems. ALU Inputs and Outputs. Binary Number System

TECH. Arithmetic & Logic Unit. CH09 Computer Arithmetic. Number Systems. ALU Inputs and Outputs. Binary Number System CH09 Computer Arithmetic CPU combines of ALU and Control Unit, this chapter discusses ALU The Arithmetic and Logic Unit (ALU) Number Systems Integer Representation Integer Arithmetic Floating-Point Representation

More information

A single register, called the accumulator, stores the. operand before the operation, and stores the result. Add y # add y from memory to the acc

A single register, called the accumulator, stores the. operand before the operation, and stores the result. Add y # add y from memory to the acc Other architectures Example. Accumulator-based machines A single register, called the accumulator, stores the operand before the operation, and stores the result after the operation. Load x # into acc

More information

Math Review. for the Quantitative Reasoning Measure of the GRE revised General Test

Math Review. for the Quantitative Reasoning Measure of the GRE revised General Test Math Review for the Quantitative Reasoning Measure of the GRE revised General Test www.ets.org Overview This Math Review will familiarize you with the mathematical skills and concepts that are important

More information

Chapter 2 Review of Classical Action Principles

Chapter 2 Review of Classical Action Principles Chapter Review of Classical Action Principles This section grew out of lectures given by Schwinger at UCLA aroun 1974, which were substantially transforme into Chap. 8 of Classical Electroynamics (Schwinger

More information

Memory Management. 3.1 Fixed Partitioning

Memory Management. 3.1 Fixed Partitioning Chapter 3 Memory Management In a multiprogramming system, in orer to share the processor, a number o processes must be kept in memory. Memory management is achieve through memory management algorithms.

More information

Lecture 3: Finding integer solutions to systems of linear equations

Lecture 3: Finding integer solutions to systems of linear equations Lecture 3: Finding integer solutions to systems of linear equations Algorithmic Number Theory (Fall 2014) Rutgers University Swastik Kopparty Scribe: Abhishek Bhrushundi 1 Overview The goal of this lecture

More information

Attention: This material is copyright 1995-1997 Chris Hecker. All rights reserved.

Attention: This material is copyright 1995-1997 Chris Hecker. All rights reserved. Attention: This material is copyright 1995-1997 Chris Hecker. All rights reserved. You have permission to read this article for your own education. You do not have permission to put it on your website

More information

Reduced Instruction Set Computer (RISC)

Reduced Instruction Set Computer (RISC) Reduced Instruction Set Computer (RISC) Focuses on reducing the number and complexity of instructions of the ISA. RISC Goals RISC: Simplify ISA Simplify CPU Design Better CPU Performance Motivated by simplifying

More information

ECE 0142 Computer Organization. Lecture 3 Floating Point Representations

ECE 0142 Computer Organization. Lecture 3 Floating Point Representations ECE 0142 Computer Organization Lecture 3 Floating Point Representations 1 Floating-point arithmetic We often incur floating-point programming. Floating point greatly simplifies working with large (e.g.,

More information

Integer and Real Numbers Representation in Microprocessor Techniques

Integer and Real Numbers Representation in Microprocessor Techniques Brno University of Technology Integer and Real Numbers Representation in Microprocessor Techniques Microprocessor Techniques and Embedded Systems Lecture 1 Dr. Tomas Fryza 30-Sep-2011 Contents Numerical

More information

Answers to the Practice Problems for Test 2

Answers to the Practice Problems for Test 2 Answers to the Practice Problems for Test 2 Davi Murphy. Fin f (x) if it is known that x [f(2x)] = x2. By the chain rule, x [f(2x)] = f (2x) 2, so 2f (2x) = x 2. Hence f (2x) = x 2 /2, but the lefthan

More information

Elementary Number Theory and Methods of Proof. CSE 215, Foundations of Computer Science Stony Brook University http://www.cs.stonybrook.

Elementary Number Theory and Methods of Proof. CSE 215, Foundations of Computer Science Stony Brook University http://www.cs.stonybrook. Elementary Number Theory and Methods of Proof CSE 215, Foundations of Computer Science Stony Brook University http://www.cs.stonybrook.edu/~cse215 1 Number theory Properties: 2 Properties of integers (whole

More information

26 Integers: Multiplication, Division, and Order

26 Integers: Multiplication, Division, and Order 26 Integers: Multiplication, Division, and Order Integer multiplication and division are extensions of whole number multiplication and division. In multiplying and dividing integers, the one new issue

More information

U.C. Berkeley CS276: Cryptography Handout 0.1 Luca Trevisan January, 2009. Notes on Algebra

U.C. Berkeley CS276: Cryptography Handout 0.1 Luca Trevisan January, 2009. Notes on Algebra U.C. Berkeley CS276: Cryptography Handout 0.1 Luca Trevisan January, 2009 Notes on Algebra These notes contain as little theory as possible, and most results are stated without proof. Any introductory

More information

A Comparison of Performance Measures for Online Algorithms

A Comparison of Performance Measures for Online Algorithms A Comparison of Performance Measures for Online Algorithms Joan Boyar 1, Sany Irani 2, an Kim S. Larsen 1 1 Department of Mathematics an Computer Science, University of Southern Denmark, Campusvej 55,

More information

CS 16: Assembly Language Programming for the IBM PC and Compatibles

CS 16: Assembly Language Programming for the IBM PC and Compatibles CS 16: Assembly Language Programming for the IBM PC and Compatibles First, a little about you Your name Have you ever worked with/used/played with assembly language? If so, talk about it Why are you taking

More information

An Efficient RNS to Binary Converter Using the Moduli Set {2n + 1, 2n, 2n 1}

An Efficient RNS to Binary Converter Using the Moduli Set {2n + 1, 2n, 2n 1} An Efficient RNS to Binary Converter Using the oduli Set {n + 1, n, n 1} Kazeem Alagbe Gbolagade 1,, ember, IEEE and Sorin Dan Cotofana 1, Senior ember IEEE, 1. Computer Engineering Laboratory, Delft University

More information

Chapter 6. Number Theory. 6.1 The Division Algorithm

Chapter 6. Number Theory. 6.1 The Division Algorithm Chapter 6 Number Theory The material in this chapter offers a small glimpse of why a lot of facts that you ve probably nown and used for a long time are true. It also offers some exposure to generalization,

More information

Approximation Errors in Computer Arithmetic (Chapters 3 and 4)

Approximation Errors in Computer Arithmetic (Chapters 3 and 4) Approximation Errors in Computer Arithmetic (Chapters 3 and 4) Outline: Positional notation binary representation of numbers Computer representation of integers Floating point representation IEEE standard

More information

n-parameter families of curves

n-parameter families of curves 1 n-parameter families of curves For purposes of this iscussion, a curve will mean any equation involving x, y, an no other variables. Some examples of curves are x 2 + (y 3) 2 = 9 circle with raius 3,

More information

8051 Programming. The 8051 may be programmed using a low-level or a high-level programming language.

8051 Programming. The 8051 may be programmed using a low-level or a high-level programming language. 8051 Programming The 8051 may be programmed using a low-level or a high-level programming language. Low-Level Programming Assembly language programming writes statements that the microcontroller directly

More information

A Second Course in Mathematics Concepts for Elementary Teachers: Theory, Problems, and Solutions

A Second Course in Mathematics Concepts for Elementary Teachers: Theory, Problems, and Solutions A Second Course in Mathematics Concepts for Elementary Teachers: Theory, Problems, and Solutions Marcel B. Finan Arkansas Tech University c All Rights Reserved First Draft February 8, 2006 1 Contents 25

More information

CHAPTER 2 Data Representation in Computer Systems

CHAPTER 2 Data Representation in Computer Systems CHAPTER 2 Data Representation in Computer Systems 2.1 Introduction 47 2.2 Positional Numbering Systems 48 2.3 Converting Between Bases 48 2.3.1 Converting Unsigned Whole Numbers 49 2.3.2 Converting Fractions

More information

MODULAR ARITHMETIC. a smallest member. It is equivalent to the Principle of Mathematical Induction.

MODULAR ARITHMETIC. a smallest member. It is equivalent to the Principle of Mathematical Induction. MODULAR ARITHMETIC 1 Working With Integers The usual arithmetic operations of addition, subtraction and multiplication can be performed on integers, and the result is always another integer Division, on

More information

Determining the Optimal Combination of Trial Division and Fermat s Factorization Method

Determining the Optimal Combination of Trial Division and Fermat s Factorization Method Determining the Optimal Combination of Trial Division and Fermat s Factorization Method Joseph C. Woodson Home School P. O. Box 55005 Tulsa, OK 74155 Abstract The process of finding the prime factorization

More information

M147 Practice Problems for Exam 2

M147 Practice Problems for Exam 2 M47 Practice Problems for Exam Exam will cover sections 4., 4.4, 4.5, 4.6, 4.7, 4.8, 5., an 5.. Calculators will not be allowe on the exam. The first ten problems on the exam will be multiple choice. Work

More information

2010/9/19. Binary number system. Binary numbers. Outline. Binary to decimal

2010/9/19. Binary number system. Binary numbers. Outline. Binary to decimal 2/9/9 Binary number system Computer (electronic) systems prefer binary numbers Binary number: represent a number in base-2 Binary numbers 2 3 + 7 + 5 Some terminology Bit: a binary digit ( or ) Hexadecimal

More information

Unit 2: Number Systems, Codes and Logic Functions

Unit 2: Number Systems, Codes and Logic Functions Unit 2: Number Systems, Codes and Logic Functions Introduction A digital computer manipulates discrete elements of data and that these elements are represented in the binary forms. Operands used for calculations

More information

Fixed-Point Arithmetic

Fixed-Point Arithmetic Fixed-Point Arithmetic Fixed-Point Notation A K-bit fixed-point number can be interpreted as either: an integer (i.e., 20645) a fractional number (i.e., 0.75) 2 1 Integer Fixed-Point Representation N-bit

More information

The mathematics of RAID-6

The mathematics of RAID-6 The mathematics of RAID-6 H. Peter Anvin First version 20 January 2004 Last updated 20 December 2011 RAID-6 supports losing any two drives. syndromes, generally referred P and Q. The way

More information

Homework 5 Solutions

Homework 5 Solutions Homework 5 Solutions 4.2: 2: a. 321 = 256 + 64 + 1 = (01000001) 2 b. 1023 = 512 + 256 + 128 + 64 + 32 + 16 + 8 + 4 + 2 + 1 = (1111111111) 2. Note that this is 1 less than the next power of 2, 1024, which

More information

Arithmetic Operations

Arithmetic Operations Arithmetic Operations Dongbing Gu School of Computer Science and Electronic Engineering University of Essex UK Spring 2013 D. Gu (Univ. of Essex) Arithmetic Operations Spring 2013 1 / 34 Outline 1 Introduction

More information

Modelling and Resolving Software Dependencies

Modelling and Resolving Software Dependencies June 15, 2005 Abstract Many Linux istributions an other moern operating systems feature the explicit eclaration of (often complex) epenency relationships between the pieces of software

More information

Instruction Set Architecture (ISA)

Instruction Set Architecture (ISA) Instruction Set Architecture (ISA) * Instruction set architecture of a machine fills the semantic gap between the user and the machine. * ISA serves as the starting point for the design of a new machine

More information

Numerical Matrix Analysis

Numerical Matrix Analysis Numerical Matrix Analysis Lecture Notes #10 Conditioning and / Peter Blomgren, blomgren.peter@gmail.com Department of Mathematics and Statistics Dynamical Systems Group Computational Sciences Research

More information

Chapter 4, Arithmetic in F [x] Polynomial arithmetic and the division algorithm.

Chapter 4, Arithmetic in F [x] Polynomial arithmetic and the division algorithm. Chapter 4, Arithmetic in F [x] Polynomial arithmetic and the division algorithm. We begin by defining the ring of polynomials with coefficients in a ring R. After some preliminary results, we specialize

More information

CHAPTER 5 Round-off errors

CHAPTER 5 Round-off errors CHAPTER 5 Round-off errors In the two previous chapters we have seen how numbers can be represented in the binary numeral system and how this is the basis for representing numbers in computers. Since any

More information

11 Ideals. 11.1 Revisiting Z

11 Ideals. 11.1 Revisiting Z 11 Ideals The presentation here is somewhat different than the text. In particular, the sections do not match up. We have seen issues with the failure of unique factorization already, e.g., Z[ 5] = O Q(

More information

Introduction Number Systems and Conversion

Introduction Number Systems and Conversion UNIT 1 Introduction Number Systems and Conversion Objectives 1. Introduction The first part of this unit introduces the material to be studied later. In addition to getting an overview of the material

More information

ELET 7404 Embedded & Real Time Operating Systems. Fixed-Point Math. Chap. 9, Labrosse Book. Fall 2007

ELET 7404 Embedded & Real Time Operating Systems. Fixed-Point Math. Chap. 9, Labrosse Book. Fall 2007 ELET 7404 Embedded & Real Time Operating Systems Fixed-Point Math Chap. 9, Labrosse Book Fall 2007 Fixed-Point Math Most low-end processors, such as embedded processors Do not provide hardware-assisted

More information

Properties of Real Numbers

Properties of Real Numbers 16 Chapter P Prerequisites P.2 Properties of Real Numbers What you should learn: Identify and use the basic properties of real numbers Develop and use additional properties of real numbers Why you should

More information

Levent EREN levent.eren@ieu.edu.tr A-306 Office Phone:488-9882 INTRODUCTION TO DIGITAL LOGIC

Levent EREN levent.eren@ieu.edu.tr A-306 Office Phone:488-9882 INTRODUCTION TO DIGITAL LOGIC Levent EREN levent.eren@ieu.edu.tr A-306 Office Phone:488-9882 1 Number Systems Representation Positive radix, positional number systems A number with radix r is represented by a string of digits: A n

More information

The mathematics of RAID-6

The mathematics of RAID-6 The mathematics of RAID-6 H. Peter Anvin 1 December 2004 RAID-6 supports losing any two drives. The way this is done is by computing two syndromes, generally referred P and Q. 1 A quick

More information

4. Number Theory (Part 2)

4. Number Theory (Part 2) 4. Number Theory (Part 2) Terence Sim Mathematics is the queen of the sciences and number theory is the queen of mathematics. Reading Sections 4.8, 5.2 5.4 of Epp. Carl Friedrich Gauss, 1777 1855 4.3.

More information

Unit 5 Central Processing Unit (CPU)

Unit 5 Central Processing Unit (CPU) Unit 5 Central Processing Unit (CPU) Introduction Part of the computer that performs the bulk of data-processing operations is called the central processing unit (CPU). It consists of 3 major parts: Register

More information

The programming language C. sws1 1

The programming language C. sws1 1 The programming language C sws1 1 The programming language C invented by Dennis Ritchie in early 1970s who used it to write the first Hello World program C was used to write UNIX Standardised as K&C (Kernighan

More information

A New Vulnerable Class of Exponents in RSA

A New Vulnerable Class of Exponents in RSA A ew Vulnerable Class of Exponents in RSA Aberrahmane itaj Laboratoire e Mathmatiues icolas Oresme Universit e Caen, France nitaj@math.unicaen.fr http://www.math.unicaen.fr/~nitaj Abstract Let = p be an

More information

Theorem (The division theorem) Suppose that a and b are integers with b > 0. There exist unique integers q and r so that. a = bq + r and 0 r < b.

Theorem (The division theorem) Suppose that a and b are integers with b > 0. There exist unique integers q and r so that. a = bq + r and 0 r < b. Theorem (The division theorem) Suppose that a and b are integers with b > 0. There exist unique integers q and r so that a = bq + r and 0 r < b. We re dividing a by b: q is the quotient and r is the remainder,

More information

Basic Computer Organization

Basic Computer Organization SE 292 (3:0) High Performance Computing L2: Basic Computer Organization R. Govindarajan govind@serc Basic Computer Organization Main parts of a computer system: Processor: Executes programs Main memory:

More information

The BBP Algorithm for Pi

The BBP Algorithm for Pi The BBP Algorithm for Pi David H. Bailey September 17, 2006 1. Introduction The Bailey-Borwein-Plouffe (BBP) algorithm for π is based on the BBP formula for π, which was discovered in 1995 and published

More information

Chapter 4. Computer Arithmetic

Chapter 4. Computer Arithmetic Chapter 4 Computer Arithmetic 4.1 Number Systems A number system uses a specific radix (base). Radices that are power of 2 are widely used in digital systems. These radices include binary (base 2), quaternary

More information

5 =5. Since 5 > 0 Since 4 7 < 0 Since 0 0

5 =5. Since 5 > 0 Since 4 7 < 0 Since 0 0 a p p e n d i x e ABSOLUTE VALUE ABSOLUTE VALUE E.1 definition. The absolute value or magnitude of a real number a is denoted by a and is defined by { a if a 0 a = a if a

More information

Some facts about polynomials modulo m (Full proof of the Fingerprinting Theorem)

Some facts about polynomials modulo m (Full proof of the Fingerprinting Theorem) Some facts about polynomials modulo m (Full proof of the Fingerprinting Theorem) In order to understand the details of the Fingerprinting Theorem on fingerprints of different texts from Chapter 19 of the

More information

UNIT 2 MATRICES - I 2.0 INTRODUCTION. Structure

UNIT 2 MATRICES - I 2.0 INTRODUCTION. Structure UNIT 2 MATRICES - I Matrices - I Structure 2.0 Introduction 2.1 Objectives 2.2 Matrices 2.3 Operation on Matrices 2.4 Invertible Matrices 2.5 Systems of Linear Equations 2.6 Answers to Check Your Progress

More information