# Division by Invariant Integers using Multiplication


Torbjörn Granlund, Cygnus Support, 1937 Landings Drive, Mountain View, CA

Peter L. Montgomery, Centrum voor Wiskunde en Informatica, 780 Las Colinas Road, San Rafael, CA

Work done by first author while at Swedish Institute of Computer Science, Stockholm, Sweden. Work done by second author while at University of California, Los Angeles. Supported by U.S. Army fellowship DAAL03 89 G.

## Abstract

Integer division remains expensive on today's processors as the cost of integer multiplication declines. We present code sequences for division by arbitrary nonzero integer constants and run-time invariants using integer multiplication. The algorithms assume a two's complement architecture. Most also require that the upper half of an integer product be quickly accessible. We treat unsigned division, signed division where the quotient rounds towards zero, signed division where the quotient rounds towards $-\infty$, and division where the result is known a priori to be exact. We give some implementation results using the C compiler GCC.

## 1 Introduction

The cost of an integer division on today's RISC processors is several times that of an integer multiplication. The trend is towards fast, often pipelined combinatoric multipliers that perform an operation in typically less than 10 cycles, with either no hardware support for integer division or iterating dividers that are several times slower than the multiplier.

Table 1.1 compares multiplication and division times on some processors. This table illustrates that the discrepancy between multiplication and division timing has been growing.

Integer division is used heavily in base conversions, number theoretic codes, and graphics codes. Compilers generate integer divisions to compute loop counts and subtract pointers. In a static analysis of FORTRAN programs, Knuth [13, p. 9] reports that 39% of arithmetic operators were additions, 22% subtractions, 27% multiplications, 10% divisions, and 2% exponentiations.
Knuth's counts do not distinguish integer and floating point operations, except that 4% of the divisions were divisions by 2.

When integer multiplication is cheaper than integer division, it is beneficial to substitute a multiplication for a division. Multiple authors [2, 11, 15] present algorithms for division by constants, but only when the divisor divides $2^k - 1$ for some small $k$. Magenheimer et al [16, §7] give the foundation of a more general approach, which Alverson [1] implements on the Tera Computing System. Compiler writers are only beginning to become aware of the general technique. For example, version 1.02 of the IBM RS/6000 xlc and xlf compilers uses the integer multiply instruction to expand signed integer divisions by 3, 5, 7, 9, 25, and 125, but not by other odd integer divisors below 256, and never for unsigned division.

We assume an $N$-bit two's complement architecture. Unsigned (i.e., nonnegative) integers range from $0$ to $2^N - 1$ inclusive; signed integers range from $-2^{N-1}$ to $2^{N-1} - 1$. We denote these integers by uword and sword respectively. Unsigned doubleword integers (range $0$ to $2^{2N} - 1$) are denoted by udword. Signed doubleword integers (range $-2^{2N-1}$ to $2^{2N-1} - 1$) are denoted by sdword. The type int is used for shift counts and logarithms.

Several of the algorithms require the upper half of an integer product obtained by multiplying two uwords or two swords. All algorithms need simple operations such as adds, shifts, and bitwise operations (bit-ops) on uwords and swords, as summarized in Table 3.1.

We show how to use these operations to divide by arbitrary nonzero constants, as well as by divisors which are loop invariant or repeated in a basic block, using one multiplication plus a few simple instructions per division.

The presentation concentrates on three types of division, in order by difficulty: (i) unsigned, (ii) signed, quotient rounded towards zero, (iii) signed, quotient rounded towards $-\infty$. Other topics are division of a udword by a run-time invariant uword, division when the remainder is known a priori to be zero, and testing for a given remainder. In each case we give the mathematical background and suggest an algorithm which a compiler can use to generate the code.

The algorithms are ineffective when a divisor is not invariant, such as in the Euclidean GCD algorithm. Most algorithms presented herein yield only the quotient. The remainder, if desired, can be computed by an additional multiplication and subtraction.

We have implemented the algorithms in a developmental version of the GCC 2.6 compiler [21]. DEC uses some of these algorithms in its Alpha AXP compilers.

Table 1.1: Multiplication and division times on different CPUs. The original columns are Architecture/Implementation, $N$, approximate year, time (cycles) for HIGH($N$-bit $*$ $N$-bit), and time (cycles) for $N$-bit/$N$-bit divide (unsigned) (signed). Most numeric entries did not survive transcription; the second column below lists the fragments that remain.

| Architecture/Implementation | Surviving entries |
|---|---|
| Motorola MC68020 [18, pp. 9-22] | |
| Motorola MC | |
| Intel 386 [9] | |
| Intel 486 [10] | |
| Intel Pentium | |
| SPARC Cypress CY7C | S 100 S |
| SPARC Viking [20] | |
| HP PA 83 [16] | S 70 S |
| HP PA | FP 70 S |
| MIPS R3000 [12] | P 35 P |
| MIPS R4000 [17] | P 139 |
| POWER/RIOS I [4, 22] | (signed only) 19 (signed only) |
| PowerPC/MPC601 [19] | |
| DEC Alpha 21064AA [8] | P 200 S |
| Motorola MC | S 38 |
| Motorola MC | P 18 |

S: No direct hardware support; approximate cycle count for software implementation.
F: Does not include time for moving data to/from floating point registers.
P: Pipelined implementation (i.e., independent instructions can execute simultaneously).

## 2 Mathematical notations

Let $x$ be a real number. Then $\lfloor x \rfloor$ denotes the largest integer not exceeding $x$ and $\lceil x \rceil$ denotes the least integer not less than $x$. Let $\mathrm{TRUNC}(x)$ denote the integer part of $x$, rounded towards zero. Formally, $\mathrm{TRUNC}(x) = \lfloor x \rfloor$ if $x \ge 0$ and $\mathrm{TRUNC}(x) = \lceil x \rceil$ if $x < 0$. The absolute value of $x$ is $|x|$. For $x > 0$, the (real) base 2 logarithm of $x$ is $\log_2 x$. A multiplication is written $x * y$.
If $x$, $y$, and $n$ are integers and $n \ne 0$, then $x \equiv y \pmod{n}$ means $x - y$ is a multiple of $n$.

Two remainder operators are common in language definitions. Sometimes a remainder has the sign of the dividend and sometimes the sign of the divisor. We use the Ada notations

$$\begin{aligned} n \ \mathrm{rem}\ d &= n - d * \mathrm{TRUNC}(n/d) && \text{(sign of dividend)}, \\ n \ \mathrm{mod}\ d &= n - d * \lfloor n/d \rfloor && \text{(sign of divisor)}. \end{aligned} \tag{2.1}$$

The Fortran 90 names are MOD and MODULO. In C, the definition of remainder is implementation-dependent (many C implementations round signed quotients towards zero and use rem remaindering). Other definitions have been proposed [6, 7].

If $n$ is an udword or sdword, then $\mathrm{HIGH}(n)$ and $\mathrm{LOW}(n)$ denote the most significant and least significant halves of $n$. $\mathrm{LOW}(n)$ is a uword, while $\mathrm{HIGH}(n)$ is an uword if $n$ is a udword and an sword if $n$ is a sdword. In both cases $n = 2^N * \mathrm{HIGH}(n) + \mathrm{LOW}(n)$.

## 3 Assumed instructions

The suggested code assumes the operations in Table 3.1, on an $N$-bit machine. Some primitives, such as loading constants and operands, are implicit in the notation and are not included in the operation counts.

| Operation | Meaning |
|---|---|
| TRUNC($x$) | Truncation towards zero; see §2. |
| HIGH($x$), LOW($x$) | Upper and lower halves of $x$; see §2. |
| MULL($x$, $y$) | Lower half of product $x * y$ (i.e., product modulo $2^N$). |
| MULSH($x$, $y$) | Upper half of signed product $x * y$: if $-2^{N-1} \le x, y \le 2^{N-1} - 1$, then $x * y = 2^N * \mathrm{MULSH}(x, y) + \mathrm{MULL}(x, y)$. |
| MULUH($x$, $y$) | Upper half of unsigned product $x * y$: if $0 \le x, y \le 2^N - 1$, then $x * y = 2^N * \mathrm{MULUH}(x, y) + \mathrm{MULL}(x, y)$. |
| AND($x$, $y$) | Bitwise AND of $x$ and $y$. |
| EOR($x$, $y$) | Bitwise exclusive OR of $x$ and $y$. |
| NOT($x$) | Bitwise complement of $x$. Equal to $-1 - x$ if $x$ is signed, to $2^N - 1 - x$ if $x$ is unsigned. |
| OR($x$, $y$) | Bitwise OR of $x$ and $y$. |
| SLL($x$, $n$) | Logical left shift of $x$ by $n$ bits ($0 \le n \le N - 1$). |
| SRA($x$, $n$) | Arithmetic right shift of $x$ by $n$ bits ($0 \le n \le N - 1$). |
| SRL($x$, $n$) | Logical right shift of $x$ by $n$ bits ($0 \le n \le N - 1$). |
| XSIGN($x$) | $-1$ if $x < 0$; $0$ if $x \ge 0$. Short for SRA($x$, $N - 1$) or $-$SRL($x$, $N - 1$). |
| $x + y$, $x - y$, $-x$ | Two's complement addition, subtraction, negation. |

Table 3.1: Mathematical notations and primitive operations

The algorithm in §8 requires the ability to add or subtract two doublewords, obtaining a doubleword result; this typically expands into 2-4 instructions.

The algorithms for processing constant divisors require compile-time arithmetic on udwords.

Algorithms for processing run-time invariant divisors require taking the base 2 logarithm of a positive integer (sometimes rounded up, sometimes down) and require dividing a udword by a uword. If the algorithms are used only for constant divisors, then these operations are needed only at compile time. If the architecture has a leading zero count (LDZ) instruction, then these logarithms can be found from

$$\lceil \log_2 x \rceil = N - \mathrm{LDZ}(x - 1), \qquad \lfloor \log_2 x \rfloor = N - 1 - \mathrm{LDZ}(x) \qquad (1 \le x \le 2^N - 1).$$

Some algorithms may produce expressions such as SRL($x$, 0) or $-(x - y)$; the optimizer should make the obvious simplifications. Some descriptions show an addition or subtraction of $2^N$, which is a no-op.

If an architecture lacks arithmetic right shift, then it can be computed from the identity

$$\mathrm{SRA}(x, l) = \mathrm{SRL}(x + 2^{N-1}, l) - 2^{N-1-l} \qquad \text{whenever } 0 \le l \le N - 1.$$
If an architecture has only one of MULSH and MULUH, then the other can be computed using

$$\mathrm{MULUH}(x, y) = \mathrm{MULSH}(x, y) + \mathrm{AND}(x, \mathrm{XSIGN}(y)) + \mathrm{AND}(y, \mathrm{XSIGN}(x))$$

for arbitrary $N$-bit patterns $x$, $y$ (interpreted as uwords for MULUH and as swords for MULSH).

## 4 Unsigned division

Suppose we want to compile an unsigned division $q = \lfloor n/d \rfloor$, where $0 < d < 2^N$ is a constant or run-time invariant and $0 \le n < 2^N$ is variable. Let's try to find a rational approximation $m/2^{N+l}$ of $1/d$ such that

$$\left\lfloor \frac{n}{d} \right\rfloor = \left\lfloor \frac{m * n}{2^{N+l}} \right\rfloor \qquad \text{whenever } 0 \le n \le 2^N - 1. \tag{4.1}$$

Setting $n = d$ in (4.1) shows we require $2^{N+l} \le m * d$. Setting $n = q * d - 1$ shows $2^{N+l} * q > m * (q * d - 1)$. Multiply by $d$ to derive $(m * d - 2^{N+l}) * (q * d - 1) < 2^{N+l}$. This inequality will hold for all values of $q * d - 1$ below $2^N$ if $m * d - 2^{N+l} \le 2^l$. Theorem 4.2 below states that these conditions are sufficient, because the maximum relative error (1 part in $2^N$) is too small to affect the quotient when $n < 2^N$.

**Theorem 4.2** Suppose $m$, $d$, $l$ are nonnegative integers such that $d \ne 0$ and

$$2^{N+l} \le m * d \le 2^{N+l} + 2^l. \tag{4.3}$$

Then $\lfloor n/d \rfloor = \lfloor m * n / 2^{N+l} \rfloor$ for every integer $n$ with $0 \le n < 2^N$.

**Proof.** Define $k = m * d - 2^{N+l}$. Then $0 \le k \le 2^l$ by hypothesis. Given $n$ with $0 \le n < 2^N$, write $n = q * d + r$ where $q = \lfloor n/d \rfloor$ and $0 \le r \le d - 1$. We must show that $q = \lfloor m * n / 2^{N+l} \rfloor$. A calculation gives

$$\frac{m * n}{2^{N+l}} - q = \frac{k + 2^{N+l}}{d} * \frac{n}{2^{N+l}} - q = \frac{k * n}{d * 2^{N+l}} + \frac{n}{d} - \frac{n - r}{d} = \frac{k}{2^l} * \frac{n}{2^N} * \frac{1}{d} + \frac{r}{d}. \tag{4.4}$$

This difference is nonnegative and does not exceed

$$\frac{1}{d} * \frac{2^N - 1}{2^N} + \frac{d - 1}{d} = 1 - \frac{1}{2^N * d} < 1.$$

Theorem 4.2 allows division by $d$ to be replaced with multiplication by $m/2^{N+l}$ if (4.3) holds. In general we require $2^l \ge d - 1$ to ensure that a suitable multiple of $d$ exists in the interval $[2^{N+l}, 2^{N+l} + 2^l]$. For compatibility with the algorithms for signed division (§5 and §6), it is convenient to choose $m * d > 2^{N+l}$ even though Theorem 4.2 permits equality. Since $m$ can be almost as large as $2^{N+1}$, we don't multiply by $m$ directly, but instead by $2^N$ and $m - 2^N$. This leads to the code in Figure 4.1. Its cost is 1 multiply, 2 adds/subtracts, and 2 shifts per quotient, after computing constants dependent only on the divisor.

    Initialization (given uword d with 1 ≤ d < 2^N):
      int l = ⌈log2 d⌉;                        /* d ≤ 2^l ≤ 2*d − 1 */
      uword m' = ⌊2^N * (2^l − d)/d⌋ + 1;      /* m' = ⌊2^(N+l)/d⌋ − 2^N + 1 */
      int sh1 = min(l, 1);
      int sh2 = max(l − 1, 0);                 /* sh2 = l − sh1 */
    For q = n/d, all uword:
      uword t1 = MULUH(m', n);
      q = SRL(t1 + SRL(n − t1, sh1), sh2);

Figure 4.1: Unsigned division by run-time invariant divisor

**Explanation of Figure 4.1.** If $d = 1$, then $l = 0$, so $m' = 1$ and $sh_1 = sh_2 = 0$. The code computes $t_1 = \lfloor 1 * n / 2^N \rfloor = 0$ and $q = n$.

If $d > 1$, then $l \ge 1$, so $sh_1 = 1$ and $sh_2 = l - 1$. Since

$$m' \le \frac{2^N * (2^l - d)}{d} + 1 \le \frac{2^N * (d - 1)}{d} + 1 < 2^N,$$

the value of $m'$ fits in a uword. Since $0 \le t_1 \le n$, the formula for $q$ simplifies to

$$q = \mathrm{SRL}(t_1 + \mathrm{SRL}(n - t_1, 1), l - 1) = \left\lfloor \frac{t_1 + \lfloor (n - t_1)/2 \rfloor}{2^{l-1}} \right\rfloor = \left\lfloor \frac{\lfloor (t_1 + n)/2 \rfloor}{2^{l-1}} \right\rfloor = \left\lfloor \frac{t_1 + n}{2^l} \right\rfloor. \tag{4.5}$$

But $t_1 + n = \lfloor m' * n / 2^N \rfloor + n = \lfloor (m' + 2^N) * n / 2^N \rfloor$. Set $m = m' + 2^N = \lfloor 2^{N+l}/d \rfloor + 1$. The hypothesis of Theorem 4.2 is satisfied since $2^{N+l} < m * d \le 2^{N+l} + d \le 2^{N+l} + 2^l$.

**Caution.** Conceptually $q$ is SRL($n + t_1$, $l$), as in (4.5). Do not compute $q$ this way, since $n + t_1$ may overflow $N$ bits and the shift count may be out of bounds.

**Improvement.** If $d$ is constant and a power of 2, replace the division by a shift.

**Improvement.** If $d$ is constant and $m = m' + 2^N$ is even, then reduce $m/2^l$ to lowest terms. The reduced multiplier fits in $N$ bits, unlike the original.
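The run-time invariant case of Figure 4.1 can be sketched in C for $N = 32$. This is only an illustration under stated assumptions: the names `udiv32_t`, `udiv32_init`, `muluh32`, and `udiv32` are hypothetical, MULUH is emulated with a 64-bit product, and $\lceil \log_2 d \rceil$ is found with a simple loop rather than a leading-zero-count instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Precomputed constants for one divisor d (names are illustrative). */
typedef struct {
    uint32_t m_prime; /* m' = floor(2^32 * (2^l - d) / d) + 1 */
    int sh1;          /* min(l, 1) */
    int sh2;          /* max(l - 1, 0) */
} udiv32_t;

/* Smallest l with 2^l >= d, for 1 <= d < 2^32. */
static int ceil_log2_u32(uint32_t d)
{
    int l = 0;
    while (((uint64_t)1 << l) < d)
        l++;
    return l;
}

/* Initialization step of Figure 4.1 with N = 32. */
static udiv32_t udiv32_init(uint32_t d)
{
    assert(d != 0);
    int l = ceil_log2_u32(d);
    /* 2^l - d < 2^32, so the shifted numerator fits in 64 bits. */
    uint64_t num = (((uint64_t)1 << l) - d) << 32;
    udiv32_t c;
    c.m_prime = (uint32_t)(num / d + 1);
    c.sh1 = l < 1 ? l : 1;
    c.sh2 = l > 1 ? l - 1 : 0;
    return c;
}

/* MULUH: upper 32 bits of the 64-bit unsigned product. */
static uint32_t muluh32(uint32_t x, uint32_t y)
{
    return (uint32_t)(((uint64_t)x * y) >> 32);
}

/* Per-quotient step of Figure 4.1: returns floor(n / d). */
static uint32_t udiv32(uint32_t n, udiv32_t c)
{
    uint32_t t1 = muluh32(c.m_prime, n);
    /* t1 <= n, so neither the subtraction nor the sum can overflow. */
    return (t1 + ((n - t1) >> c.sh1)) >> c.sh2;
}
```

A caller would run `udiv32_init(d)` once outside a loop and `udiv32(n, c)` for each dividend. The two-shift form follows the Caution above: it avoids forming $n + t_1$, which may not fit in 32 bits.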
In rare cases (e.g., $d = 641$ on a 32-bit machine, $d = \ldots$ on a 64-bit machine) the final shift is zero.

**Improvement.** If $d$ is constant and even, rewrite

$$\left\lfloor \frac{n}{d} \right\rfloor = \left\lfloor \frac{\lfloor n/2^e \rfloor}{d/2^e} \right\rfloor$$

for some $e > 0$. Then $\lfloor n/2^e \rfloor$ can be computed using SRL. Since $\lfloor n/2^e \rfloor < 2^{N-e}$, less precision is needed in the multiplier than before.

These ideas are reflected in Figure 4.2, which generates code for $n/d$ where $n$ is unsigned and $d$ is constant. Procedure CHOOSE_MULTIPLIER, which is shared by this and later algorithms, appears in Figure 6.2.

    Inputs: uword d and n, with d constant.
    uword d_odd, t1;
    udword m;
    int e, l, l_dummy, sh_post, sh_pre;
    (m, sh_post, l) = CHOOSE_MULTIPLIER(d, N);
    if m ≥ 2^N and d is even then
      Find e such that d = 2^e * d_odd and d_odd is odd.   /* 2^e = AND(d, 2^N − d) */
      sh_pre = e;
      (m, sh_post, l_dummy) = CHOOSE_MULTIPLIER(d_odd, N − e);
    else
      sh_pre = 0;
    end if
    if d = 2^l then
      Issue q = SRL(n, l);
    else if m ≥ 2^N then
      assert sh_pre = 0;
      Issue t1 = MULUH(m − 2^N, n);
      Issue q = SRL(t1 + SRL(n − t1, 1), sh_post − 1);
    else
      Issue q = SRL(MULUH(m, SRL(n, sh_pre)), sh_post);
    end if

Figure 4.2: Optimized code generation of unsigned $q = \lfloor n/d \rfloor$ for constant nonzero $d$

The following three examples illustrate the cases in Figure 4.2. All assume unsigned 32-bit arithmetic.

**Example.** $q = \lfloor n/10 \rfloor$. CHOOSE_MULTIPLIER finds $m_{low} = (2^{36} - 6)/10$ and $m_{high} = (2^{36} + 14)/10$. After one round of divisions by 2, it returns $(m, 3, 4)$, where $m = (2^{34} + 1)/5$. The suggested code $q = \mathrm{SRL}(\mathrm{MULUH}((2^{34} + 1)/5, n), 3)$ eliminates the pre-shift by 0. See Table 11.1.

**Example.** $q = \lfloor n/7 \rfloor$. Here $m = (2^{35} + 3)/7 > 2^{32}$. This example uses the longer sequence in Figure 4.1.

**Example.** $q = \lfloor n/14 \rfloor$. CHOOSE_MULTIPLIER first returns the same multiplier as when $d = 7$. The suggested code uses separate divisions by 2 and 7: $q = \mathrm{SRL}(\mathrm{MULUH}((2^{34} + 5)/7, \mathrm{SRL}(n, 1)), 2)$.

## 5 Signed division, quotient rounded towards 0

Suppose we want to compile a signed division $q = \mathrm{TRUNC}(n/d)$, where $d$ is constant or run-time invariant, $0 < |d| \le 2^{N-1}$, and where $-2^{N-1} \le n \le 2^{N-1} - 1$ is variable. All quotients are to be rounded towards zero. We could prove a theorem like Theorem 4.2 about when $\mathrm{TRUNC}(n/d) = \mathrm{TRUNC}(m * n / 2^{N+l})$ for all $n$ in a suitable range (cf. (7.1)), but it wouldn't help since we can't compute the right side given only $\lfloor m * n / 2^N \rfloor$. Instead we show how to adjust the estimated quotient when the dividend or divisor is negative.

**Theorem 5.1** Suppose $m$, $d$, $l$ are integers such that $d \ne 0$ and $0 < m * |d| - 2^{N+l-1} \le 2^l$. Let $n$ be an arbitrary integer such that $-2^{N-1} \le n \le 2^{N-1} - 1$. Define $q_0 = \lfloor m * n / 2^{N+l-1} \rfloor$. Then

$$\mathrm{TRUNC}\left(\frac{n}{d}\right) = \begin{cases} q_0 & \text{if } n \ge 0 \text{ and } d > 0, \\ 1 + q_0 & \text{if } n < 0 \text{ and } d > 0, \\ -q_0 & \text{if } n \ge 0 \text{ and } d < 0, \\ -1 - q_0 & \text{if } n < 0 \text{ and } d < 0. \end{cases}$$

**Proof.** When $n \ge 0$ and $d > 0$, this is Theorem 4.2 with $N$ replaced by $N - 1$.

Suppose $n < 0$ and $d > 0$, say $n = q * d - r$ where $0 \le r \le d - 1$. Define $k = m * d - 2^{N+l-1}$. Then

$$q - \frac{m * n}{2^{N+l-1}} = \frac{k}{2^l} * \frac{-n}{2^{N-1}} * \frac{1}{d} + \frac{r}{d}, \tag{5.2}$$

as in (4.4). Since $0 < k \le 2^l$ by hypothesis, the first fraction on the right of (5.2) is positive and $r/d$ is nonnegative. The sum is at most $1/d + (d - 1)/d = 1$, so $q_0 = \lfloor m * n / 2^{N+l-1} \rfloor = q - 1$, as asserted.

For $d < 0$, use $\mathrm{TRUNC}(n/d) = -\mathrm{TRUNC}(n/|d|)$.

**Caution.** When $d < 0$, avoid rewriting the quotient as $\mathrm{TRUNC}((-n)/|d|)$, which fails for $n = -2^{N-1}$.

For a run-time invariant divisor, this leads to the code in Figure 5.1. Its cost is 1 multiply, 3 adds, 2 shifts, and 1 bit-op per quotient.

**Explanation of Figure 5.1.** The multiplier $m$ satisfies $2^{N-1} < m < 2^N$ except when $d = \pm 1$; in the latter cases $m = 2^N + 1$. In either case $m' = m - 2^N$ fits in an sword. We compute $\lfloor m * n / 2^N \rfloor$ as $n + \lfloor (m - 2^N) * n / 2^N \rfloor$, using MULSH. The subtraction of XSIGN($n$) adds one if $n < 0$. The last line negates the tentative quotient if $d < 0$ (i.e., if $d_{sign} = -1$).

**Variation.** An alternate computation of $m'$ is

$$m' = \mathrm{TRUNC}\left(\frac{2^N * (2^{l-1} - |d|)}{|d|}\right) + 1.$$

This uses signed $(2N)$-bit/$N$-bit division, with $N$-bit quotient.
    Initialization (given constant sword d with d ≠ 0):
      int l = max(⌈log2 |d|⌉, 1);
      udword m = 1 + ⌊2^(N+l−1)/|d|⌋;
      sword m' = m − 2^N;
      sword d_sign = XSIGN(d);
      int sh_post = l − 1;
    For q = TRUNC(n/d), all sword:
      sword q0 = n + MULSH(m', n);
      q0 = SRA(q0, sh_post) − XSIGN(n);
      q = EOR(q0, d_sign) − d_sign;

Figure 5.1: Signed division by run-time invariant divisor, rounded towards zero

**Overflow detection.** The quotient $n/d$ overflows if $n = -2^{N-1}$ and $d = -1$. The algorithm in Figure 5.1 returns $-2^{N-1}$. If overflow detection is required, the final subtraction of $d_{sign}$ should check for overflow.

**Improvement.** If $m$ is constant and even, then reduce $m/2^l$ to lowest terms, as in the unsigned case. This improvement is reflected in Figure 5.2, which generates code for $\mathrm{TRUNC}(n/d)$ where $d$ is a nonzero constant. Figure 5.2 also checks for the divisor being a power of 2 or the negative thereof.

    Inputs: sword d and n, with d constant and d ≠ 0.
    udword m;
    int l, sh_post;
    (m, sh_post, l) = CHOOSE_MULTIPLIER(|d|, N − 1);
    if |d| = 1 then
      Issue q = n;
    else if |d| = 2^l then
      Issue q = SRA(n + SRL(SRA(n, l − 1), N − l), l);
    else if m < 2^(N−1) then
      Issue q = SRA(MULSH(m, n), sh_post) − XSIGN(n);
    else
      Issue q = SRA(n + MULSH(m − 2^N, n), sh_post) − XSIGN(n);
      Cmt. Caution: m − 2^N is negative.
    end if
    if d < 0 then
      Issue q = −q;
    end if

Figure 5.2: Optimized code generation of signed $q = \mathrm{TRUNC}(n/d)$ for constant $d \ne 0$

In the $|d| = 1$ branch the tentative quotient is $n$ itself; the final test then negates it when $d = -1$.

**Example.** $q = \mathrm{TRUNC}(n/3)$ on a 32-bit machine. CHOOSE_MULTIPLIER(3, 31) returns $sh_{post} = 0$ and $m = (2^{32} + 2)/3$. The code $q = \mathrm{MULSH}(m, n) - \mathrm{XSIGN}(n)$ uses one multiply, one shift, one subtract.
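As a rough illustration of Figure 5.1, the following C sketch specializes the run-time invariant signed algorithm to $N = 32$. The names `sdiv32_t`, `sdiv32_init`, `sdiv32`, and `sra64` are hypothetical. Two choices here are assumptions of the sketch rather than part of the paper: intermediate values are held in 64-bit integers so that no signed overflow occurs in C, and arithmetic right shift is emulated portably instead of relying on `>>` of negative values. The overflowing case $n = -2^{31}$, $d = -1$ is excluded by an assertion.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int32_t m_prime; /* m' = m - 2^32, where m = 1 + floor(2^(31+l) / |d|) */
    int32_t d_sign;  /* XSIGN(d): -1 if d < 0, else 0 */
    int sh_post;     /* l - 1 */
} sdiv32_t;

/* floor(x / 2^s) without relying on >> of negative values. */
static int64_t sra64(int64_t x, int s)
{
    return x < 0 ? ~(~x >> s) : x >> s;
}

/* Initialization step of Figure 5.1 with N = 32. */
static sdiv32_t sdiv32_init(int32_t d)
{
    assert(d != 0);
    uint32_t ad = d < 0 ? 0u - (uint32_t)d : (uint32_t)d; /* |d|, up to 2^31 */
    int l = 1;                         /* l = max(ceil(log2 |d|), 1) */
    while (((uint64_t)1 << l) < ad)
        l++;
    uint64_t m = 1 + (((uint64_t)1 << (31 + l)) / ad);
    sdiv32_t c;
    c.m_prime = (int32_t)((int64_t)m - ((int64_t)1 << 32));
    c.d_sign = d < 0 ? -1 : 0;
    c.sh_post = l - 1;
    return c;
}

/* Per-quotient step of Figure 5.1: returns TRUNC(n / d). */
static int32_t sdiv32(int32_t n, sdiv32_t c)
{
    assert(!(n == INT32_MIN && c.d_sign == -1 && c.m_prime == 1)); /* d = -1 */
    int64_t n_sign = n < 0 ? -1 : 0;                    /* XSIGN(n) */
    int64_t mulsh = sra64((int64_t)c.m_prime * n, 32);  /* MULSH(m', n) */
    int64_t q0 = n + mulsh;
    q0 = sra64(q0, c.sh_post) - n_sign;
    return (int32_t)((q0 ^ c.d_sign) - c.d_sign);       /* negate if d < 0 */
}
```

On an $N$-bit machine the same steps run in $N$-bit wraparound arithmetic; the 64-bit intermediates are used only to stay within defined C behaviour.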

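## 6 Signed division, quotient rounded towards $-\infty$

Some languages require negative quotients to round towards $-\infty$ rather than zero. With some ingenuity, we can compute these quotients in terms of quotients which round towards zero, even if the signs of the dividend and divisor are unknown at compile time.

If $n$ and $d$ are integers, then the identities

$$\left\lfloor \frac{n}{d} \right\rfloor = \begin{cases} \mathrm{TRUNC}(n/d) & \text{if } n \ge 0 \text{ and } d > 0, \\ \mathrm{TRUNC}((n + 1)/d) - 1 & \text{if } n < 0 \text{ and } d > 0, \\ \mathrm{TRUNC}((n - 1)/d) - 1 & \text{if } n > 0 \text{ and } d < 0, \\ \mathrm{TRUNC}(n/d) & \text{if } n \le 0 \text{ and } d < 0 \end{cases}$$

are easily verified. Since the new numerators $n \pm 1$ never overflow, these identities can be used for computation. They are summarized by

$$\left\lfloor \frac{n}{d} \right\rfloor = \mathrm{TRUNC}\left(\frac{n + d_{sign} - n_{sign}}{d}\right) + q_{sign}, \tag{6.1}$$

where $d_{sign} = \mathrm{XSIGN}(d)$, $n_{sign} = \mathrm{XSIGN}(\mathrm{OR}(n, n + d_{sign}))$, and $q_{sign} = \mathrm{EOR}(n_{sign}, d_{sign})$. The cost is 2 shifts, 3 adds/subtracts, and 2 bit-ops, plus the divide ($n + d_{sign}$ is a repeated subexpression).

For remainders, a corollary to (2.1) and (6.1) is

$$\begin{aligned} n \ \mathrm{mod}\ d &= n - d * \mathrm{TRUNC}((n + d_{sign} - n_{sign})/d) - d * q_{sign} \\ &= ((n + d_{sign} - n_{sign}) \ \mathrm{rem}\ d) - d_{sign} + n_{sign} - d * q_{sign} \\ &= ((n + d_{sign} - n_{sign}) \ \mathrm{rem}\ d) + \mathrm{AND}(d - 2 * d_{sign} - 1, q_{sign}). \end{aligned} \tag{6.2}$$

The last equality in (6.2) can be verified by separately checking the cases $q_{sign} = n_{sign} - d_{sign} = 0$ and $q_{sign} = n_{sign} + d_{sign} = -1$. The subexpression $d - 2 * d_{sign} - 1$ depends only on $d$.

For rounding towards $+\infty$, an analog of (6.1) is

$$\left\lceil \frac{n}{d} \right\rceil = \mathrm{TRUNC}\left(\frac{n - d_{sign} + n_{pos}}{d}\right) - \mathrm{EOR}(d_{sign}, n_{pos}),$$

where $d_{sign} = \mathrm{XSIGN}(d)$ and $n_{pos} = -(n > d_{sign})$.

**Improvement.** If $d > 0$ is constant, then $d_{sign} = 0$. Then (6.1) becomes

$$\left\lfloor \frac{n}{d} \right\rfloor = \mathrm{TRUNC}\left(\frac{n - n_{sign}}{d}\right) + n_{sign},$$

where $n_{sign} = \mathrm{XSIGN}(n)$. Since $\mathrm{TRUNC}(-x) = -\mathrm{TRUNC}(x)$ and $\mathrm{EOR}(-1, n) = -1 - n = -(n + 1)$, this is equivalent to

$$\left\lfloor \frac{n}{d} \right\rfloor = \mathrm{EOR}\left(n_{sign}, \mathrm{TRUNC}\left(\frac{\mathrm{EOR}(n_{sign}, n)}{d}\right)\right) \qquad (d > 0). \tag{6.3}$$

The dividend and divisor on the right of (6.3) are both nonnegative and below $2^{N-1}$. One can view them as signed or as unsigned when applying earlier algorithms.

**Improvement.** The $\mathrm{XSIGN}(\mathrm{OR}(n, n + d_{sign}))$ is equivalent to $-(n \le \mathrm{NOT}(d_{sign}))$ and to $-(n < -d_{sign})$, where the relationals produce 1 if true and 0 if false.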
On the MIPS R2000/R3000 [12], for example, one can compute

    −d_sign = SRL(d, N − 1);
    −n_sign = (n < −d_sign);                        /* SLT, signed */
    −q_sign = EOR(−n_sign, −d_sign);
    q = TRUNC((n − (−d_sign) + (−n_sign))/d) − (−q_sign);

(six instructions plus the divide), saving an instruction over (6.1).

**Improvement.** If $n$ is known to be nonzero, then $n_{sign}$ simplifies to XSIGN($n$).

For constant divisors, one can use (6.1) and the algorithm in Figure 5.2. For constant $d > 0$ a shorter algorithm, based on (6.3), appears in Figure 6.1.

    Inputs: sword n and d, with d constant and d > 0.
    udword m;
    int l, sh_post;
    (m, sh_post, l) = CHOOSE_MULTIPLIER(d, N − 1);
    if d = 2^l then
      Issue q = SRA(n, l);
    else
      assert m < 2^N;
      Issue sword n_sign = XSIGN(n);
      Issue uword q0 = MULUH(m, EOR(n_sign, n));
      Issue q = EOR(n_sign, SRL(q0, sh_post));
    end if

Figure 6.1: Optimized code generation of signed $q = \lfloor n/d \rfloor$ for constant $d > 0$

**Example.** Using signed 32-bit arithmetic, the code for $r = n \ \mathrm{mod}\ 10$ (nonnegative remainder) can be

    sword n_sign = XSIGN(n);
    uword q0 = MULUH((2^33 + 3)/5, EOR(n_sign, n));
    sword q = EOR(n_sign, SRL(q0, 2));
    r = n − SLL(q, 1) − SLL(q, 3);

The cost is 1 multiply, 4 shifts, 2 bit-ops, 2 subtracts. Alternately, if one has a fast signed division algorithm which rounds quotients towards 0 and returns remainders, then (6.2) justifies the code

    r = ((n − XSIGN(n)) rem 10) + AND(9, XSIGN(n)).

The cost is 1 divide, 1 shift, 1 bit-op, 2 adds/subtracts.
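To make Figure 6.1 concrete, here is a hedged C sketch for $N = 32$ that pairs a small version of CHOOSE_MULTIPLIER (Figure 6.2) with the floor-division sequence based on (6.3). The names `multiplier_t`, `choose_multiplier`, and `floor_div_const` are hypothetical. The sketch assumes $d \le 2^{31}$ so that $2^{32+l}$ fits in a 64-bit integer, which sidesteps the numerator-overflow trick mentioned in Figure 6.2. A compiler would evaluate `choose_multiplier` at compile time; it is called at run time here only to keep the example short.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t m;  /* multiplier (m_high after reduction) */
    int sh_post; /* post-shift count */
    int l;       /* ceil(log2 d) */
} multiplier_t;

/* CHOOSE_MULTIPLIER of Figure 6.2 for N = 32, restricted to d <= 2^31. */
static multiplier_t choose_multiplier(uint32_t d, int prec)
{
    assert(d >= 1 && d <= 0x80000000u && prec >= 1 && prec <= 32);
    int l = 0;
    while (((uint64_t)1 << l) < d)
        l++;
    int sh_post = l;
    uint64_t pow2 = (uint64_t)1 << (32 + l);          /* 2^(N+l) */
    uint64_t m_low = pow2 / d;
    uint64_t m_high = (pow2 + ((uint64_t)1 << (32 + l - prec))) / d;
    while (m_low / 2 < m_high / 2 && sh_post > 0) {   /* reduce to lowest terms */
        m_low /= 2;
        m_high /= 2;
        sh_post--;
    }
    multiplier_t r = { m_high, sh_post, l };
    return r;
}

/* Figure 6.1 for N = 32: floor(n / d) for a constant d > 0. */
static int32_t floor_div_const(int32_t n, int32_t d)
{
    assert(d > 0);
    multiplier_t c = choose_multiplier((uint32_t)d, 31);
    int32_t n_sign = n < 0 ? -1 : 0;           /* XSIGN(n) */
    uint32_t x = (uint32_t)(n_sign ^ n);       /* n if n >= 0, else -n - 1 */
    if ((uint32_t)d == ((uint32_t)1 << c.l))
        return n_sign ^ (int32_t)(x >> c.l);   /* same value as SRA(n, l) */
    assert(c.m < ((uint64_t)1 << 32));
    uint32_t q0 = (uint32_t)((c.m * x) >> 32); /* MULUH(m, EOR(n_sign, n)) */
    return n_sign ^ (int32_t)(q0 >> c.sh_post);
}
```

For $d = 10$ the helper yields the multiplier $(2^{33} + 3)/5$ with a post-shift of 2, matching the example above; with `prec = 32` it yields $(2^{34} + 1)/5$ and a post-shift of 3, as in the unsigned example of §4.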

    procedure CHOOSE_MULTIPLIER(uword d, int prec);
    Cmt. d: Constant divisor to invert. 1 ≤ d < 2^N.
    Cmt. prec: Number of bits of precision needed, 1 ≤ prec ≤ N.
    Cmt. Finds m, sh_post, l such that:
    Cmt.   2^(l−1) < d ≤ 2^l.
    Cmt.   0 ≤ sh_post ≤ l. If sh_post > 0, then N + sh_post ≤ l + prec.
    Cmt.   2^(N+sh_post) < m * d ≤ 2^(N+sh_post) * (1 + 2^(−prec)).
    Cmt. Corollary. If d ≤ 2^prec, then m < 2^(N+sh_post) * (1 + 2^(1−l))/d ≤ 2^(N+sh_post−l+1).
    Cmt.   Hence m fits in max(prec, N − l) + 1 bits (unsigned).
    Cmt.
    int l = ⌈log2 d⌉, sh_post = l;
    udword m_low = ⌊2^(N+l)/d⌋, m_high = ⌊(2^(N+l) + 2^(N+l−prec))/d⌋;
    Cmt. To avoid numerator overflow, compute m_low as 2^N + (m_low − 2^N).
    Cmt. Likewise for m_high. Compare m' in Figure 4.1.
    Invariant. m_low = ⌊2^(N+sh_post)/d⌋ < m_high = ⌊2^(N+sh_post) * (1 + 2^(−prec))/d⌋.
    while ⌊m_low/2⌋ < ⌊m_high/2⌋ and sh_post > 0 do
      m_low = ⌊m_low/2⌋; m_high = ⌊m_high/2⌋; sh_post = sh_post − 1;
    end while;                         /* Reduce to lowest terms. */
    return (m_high, sh_post, l);       /* Three outputs. */
    end CHOOSE_MULTIPLIER;

Figure 6.2: Selection of multiplier and shift count

## 7 Use of floating point

One alternative to MULUH and MULSH uses floating point arithmetic. Let the floating point mantissa be $F$ bits wide (e.g., $F = 53$ for IEEE double precision arithmetic). Then any floating point operation has relative error at most $2^{1-F}$, regardless of the rounding mode, unless exponent overflow or underflow occurs.

Suppose $N \ge 1$ and $F \ge N + 3$. We claim that

$$\mathrm{TRUNC}\left(\frac{n}{d}\right) = \mathrm{TRUNC}(q_{est}), \qquad \text{where} \qquad q_{est} \approx n * \left(\frac{1 + 2^{2-F}}{d}\right), \tag{7.1}$$

whenever $|n| \le 2^N - 1$ and $0 < |d| < 2^N$, regardless of the rounding modes used to compute $q_{est}$.

The proof assumes that $n > 0$ and $d > 0$, by negating both sides of (7.1) if necessary (the case $n = 0$ is trivial). Since the relative error per operation is at most $2^{1-F}$, the estimated quotient $q_{est}$ satisfies

$$(1 + 2^{2-F}) * (1 - 2^{1-F})^2 * \frac{n}{d} \le q_{est} \le (1 + 2^{2-F}) * (1 + 2^{1-F})^2 * \frac{n}{d}.$$

Use this and the inequalities

$$1 - 2^{-F} < (1 + 2^{2-F}) * (1 - 2^{1-F})^2, \qquad (1 + 2^{2-F}) * (1 + 2^{1-F})^2 < \frac{1}{1 - 2^{3-F}} \le \frac{1}{1 - 2^{-N}}$$

to derive

$$(1 - 2^{-F}) * \frac{n}{d} < q_{est} < \frac{n/d}{1 - 2^{-N}} \le \frac{n/d}{1 - \frac{1}{n+1}} = \frac{n + 1}{d}.$$

Denote $q = \mathrm{TRUNC}(n/d)$. Then $q_{est} < (n + 1)/d$ implies $\mathrm{TRUNC}(q_{est}) \le q$. If $q_{est} < q$, then

$$(1 - 2^{-F}) * q \le (1 - 2^{-F}) * \frac{n}{d} < q_{est} < q.$$
Both $q$ and $q_{est}$ are exactly representable as floating point numbers, but there are no representable numbers strictly between $(1 - 2^{-F}) * q$ and $q$. This contradiction shows that $q_{est} \ge q$ and hence $q = \mathrm{TRUNC}(q_{est})$.

For quotients rounded towards $-\infty$, use (6.1).

If $F = 53$ and $N \le 50$, then (7.1) can be used for $N$-bit integer division. The algorithm may trigger an IEEE exception for inexactness if the application program enables that condition.

Alverson [1] uses integer multiplication, but computes the multiplier using floating point arithmetic. Baker [3] does modular multiplication using a combination of floating point and integer arithmetic.

## 8 Dividing udword by uword

One primitive operation for multiple-precision arithmetic [14, p. 251] is the division of a udword by a uword, obtaining uword quotient and remainder, where the quotient is known to be less than $2^N$. We describe a way to compute this quotient and remainder after some preliminary computations involving only the divisor, when the divisor is a run-time invariant expression.

    Initialization (given uword d, where 0 < d < 2^N):
      int l = 1 + ⌊log2 d⌋;                    /* 2^(l−1) ≤ d < 2^l */
      uword m' = ⌊(2^N * (2^l − d) − 1)/d⌋;    /* m' = ⌊(2^(N+l) − 1)/d⌋ − 2^N */
      uword d_norm = SLL(d, N − l);            /* Normalized divisor d * 2^(N−l) */
    For q = ⌊n/d⌋ and r = n − q*d, where d, q, r are uword and n is udword:
      uword n2 = SLL(HIGH(n), N − l) + SRL(LOW(n), l);   /* See note about shift count. */
      uword n10 = SLL(LOW(n), N − l);          /* n10 = n1 * 2^(N−1) + n0 * 2^(N−l) */
                                               /* Ignore overflow. */
      sword −n1 = XSIGN(n10);
      uword n_adj = n10 + AND(−n1, d_norm − 2^N);
                                               /* n_adj = n10 + n1 * (d_norm − 2^N) */
                                               /*       = n1 * (d_norm − 2^(N−1)) + n0 * 2^(N−l) */
      uword q1 = n2 + HIGH(m' * (n2 − (−n1)) + n_adj);
                                               /* Underflow is impossible. See Lemma 8.1. */
      sdword dr = n − 2^N * d + (2^N − 1 − q1) * d;
                                               /* dr = n − q1*d − d, −d ≤ dr < d */
      q = HIGH(dr) − (2^N − 1 − q1) + 2^N;     /* Add 1 to quotient if dr ≥ 0. */
      r = LOW(dr) + AND(d − 2^N, HIGH(dr));    /* Add d to remainder if dr < 0. */

Figure 8.1: Unsigned division of udword by run-time invariant uword.

**Lemma 8.1** Suppose that $d$, $m$, and $l$ are nonnegative integers such that $2^{l-1} \le d < 2^l \le 2^N$ and

$$0 < 2^{N+l} - m * d \le d. \tag{8.2}$$

Given $n$ with $0 \le n \le d * 2^N - 1$, write $n = n_2 * 2^l + n_1 * 2^{l-1} + n_0$, where $n_0$, $n_1$, and $n_2$ are integers with $0 \le n_1 \le 1$ and $0 \le n_0 \le 2^{l-1} - 1$. Define integers $q_1$ and $q_0$ by

$$q_1 * 2^N + q_0 = n_2 * 2^N + (n_2 + n_1) * (m - 2^N) + n_1 * (d * 2^{N-l} - 2^{N-1}) + n_0 * 2^{N-l} \tag{8.3}$$

and $0 \le q_0 \le 2^N - 1$. Then $0 \le q_1 \le 2^N - 1$ and $0 \le n - q_1 * d < 2 * d$.

**Proof.** Define $k = 2^{N+l} - m * d$. Then (8.2) implies $0 < k \le d \le 2^l - 1$. The bound $n \le d * 2^N - 1$ implies $n_2 \le d * 2^{N-l} - 1$. Equation (8.2) implies $m \ge 2^{N+l}/d - 1 > 2^N - 1$, so $m \ge 2^N$. A corollary to (8.3) is

$$\begin{aligned} q_1 * 2^N + q_0 &= n_2 * m + n_1 * (m - 2^N) + 2^{N-l} * \left(n_1 * (d - 2^{l-1}) + n_0\right) \\ &\le (d * 2^{N-l} - 1) * m + 1 * (m - 2^N) + 2^{N-l} * \left(1 * (2^{l-1} - 1) + (2^{l-1} - 1)\right) \\ &= 2^{N-l} * (d * m - 2) < 2^{2N}. \end{aligned}$$

This proves the upper bound on the integer $q_1$.

A straightforward calculation using the definitions of $k$ and $q_0$ and $n_0$ reveals that

$$n - q_1 * d = \frac{(n_2 + n_1) * k}{2^N} + \frac{q_0 * d}{2^N} + \left(1 - \frac{d}{2^l}\right) * \left(n_1 * (d - 2^{l-1}) + n_0\right). \tag{8.4}$$

Since $2^{l-1} \le d < 2^l$ by hypothesis, the right side of (8.4) is nonnegative. This remainder is bounded by

$$\frac{(d * 2^{N-l}) * d}{2^N} + \frac{(2^N - 1) * d}{2^N} + \left(1 - \frac{d}{2^l}\right) * \left(1 * (d - 2^{l-1}) + (2^{l-1} - 1)\right) < \frac{d^2}{2^l} + d + \left(1 - \frac{d}{2^l}\right) * d = 2 * d,$$

completing the proof.

This leads to an algorithm like that in Figure 8.1 when dividing a udword by a run-time invariant uword with quotient known to be less than $2^N$. Unlike the previous algorithms, this code rounds the multiplier down when computing a reciprocal. After initializations depending only on the divisor, this algorithm requires two products (both halves of each) and simple operations (including doubleword adds and subtracts). Five registers hold $d$, $d_{norm}$, $l$, $m'$, and $N - l$.

**Note.** The shift count $l$ in the computations of $m'$ and $n_2$ may equal $N$. If this is too large, use separate shifts by $l - 1$ and 1. If a doubleword shift is available, compute $n_2$ and $n_{10}$ together.
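The steps of Figure 8.1 can be traced in a C sketch for $N = 32$, dividing a 64-bit dividend by a 32-bit run-time invariant divisor. The names `udiv64by32_t`, `udiv64by32_init`, and `udiv64by32` are hypothetical, and the sketch takes one liberty that is an assumption rather than part of the figure: doubleword quantities are held in `uint64_t`, so the case $l = N$ mentioned in the Note needs no special handling.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t d;       /* divisor */
    uint32_t m_prime; /* m' = floor((2^(32+l) - 1) / d) - 2^32 */
    uint32_t d_norm;  /* d * 2^(32-l); its top bit is set */
    int l;            /* 1 + floor(log2 d) */
} udiv64by32_t;

/* Initialization step of Figure 8.1 with N = 32. */
static udiv64by32_t udiv64by32_init(uint32_t d)
{
    assert(d != 0);
    int l = 0;
    while (l < 32 && (d >> l) != 0)   /* l = bit length of d */
        l++;
    udiv64by32_t c;
    c.d = d;
    c.l = l;
    /* 1 <= 2^l - d < 2^32, so the numerator below fits in 64 bits. */
    c.m_prime = (uint32_t)((((((uint64_t)1 << l) - d) << 32) - 1) / d);
    c.d_norm = (uint32_t)(d << (32 - l));
    return c;
}

/* Per-division step of Figure 8.1. Requires n < d * 2^32. */
static uint32_t udiv64by32(uint64_t n, udiv64by32_t c, uint32_t *rem)
{
    assert((n >> 32) < c.d);
    uint32_t n2 = (uint32_t)(n >> c.l);
    uint32_t n10 = (uint32_t)((uint64_t)(uint32_t)n << (32 - c.l));
    uint32_t n1 = n10 >> 31;                       /* top bit of n10 */
    uint32_t n_adj = n10 + (n1 ? c.d_norm : 0u);   /* wraps modulo 2^32 */
    uint32_t q1 = n2 + (uint32_t)(((uint64_t)c.m_prime * (n2 + n1) + n_adj) >> 32);
    /* dr = n - q1*d - d, computed modulo 2^64; -d <= dr < d by Lemma 8.1. */
    uint64_t dr = n - ((uint64_t)c.d << 32) + (uint64_t)(UINT32_MAX - q1) * c.d;
    uint32_t hi = (uint32_t)(dr >> 32);   /* 0 if dr >= 0, all ones if dr < 0 */
    *rem = (uint32_t)dr + (c.d & hi);     /* add d to remainder if dr < 0 */
    return hi - (UINT32_MAX - q1);        /* q1 + 1 if dr >= 0, else q1 */
}
```

The quotient estimate `q1` is at most one too small, which is why a single conditional correction driven by the sign of `dr` is enough.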

## 9 Exact division by constants

Occasionally a language construct requires a division whose remainder is known to vanish. An example occurs in C when subtracting two pointers. Their numerical difference is divided by the object size. The object size is a compile-time constant.

Suppose we want code for $q = n/d$, where $d$ is a nonzero constant and $n$ is an expression known to be divisible by $d$. Write $d = 2^e * d_{odd}$ where $d_{odd}$ is odd. Find $d_{inv}$ such that $1 \le d_{inv} \le 2^N - 1$ and

$$d_{inv} * d_{odd} \equiv 1 \pmod{2^N}. \tag{9.1}$$

Then

$$2^e * q = 2^e * \frac{n}{d} = \frac{n}{d_{odd}} \equiv (d_{inv} * d_{odd}) * \frac{n}{d_{odd}} = d_{inv} * n \pmod{2^N},$$

as in [2]. Hence $2^e * q \equiv d_{inv} * n \pmod{2^N}$. Since $n/d_{odd} = 2^e * q$ fits in $N$ bits, it must equal the lower half of the product $d_{inv} * n$, namely MULL($d_{inv}$, $n$). An SRA (for signed division) or SRL (for unsigned division) produces the quotient $q$.

The multiplicative inverse $d_{inv}$ of $d_{odd}$ modulo $2^N$ can be found by the extended Euclidean GCD algorithm [14, p. 325]. Another algorithm observes that (9.1) holds modulo $2^3$ if $d_{inv} = d_{odd}$. Each Newton iteration

$$d_{inv} = d_{inv} * (2 - d_{inv} * d_{odd}) \bmod 2^N \tag{9.2}$$

doubles the known exponent by which (9.1) holds, so $\lceil \log_2(N/3) \rceil$ iterations of (9.2) suffice.

If $d_{odd} = \pm 1$, then $d_{inv} = d_{odd}$ so the multiplication by $d_{inv}$ is trivial or a negation. If $d$ is odd, then $e = 0$ and the shift disappears.

A variation tests whether an integer $n$ is exactly divisible by a nonzero constant $d$ without computing the remainder. If $d$ is a power of 2 (or the negative thereof, in the signed case), then check the lower bits of $n$ to test whether $d$ divides $n$. Otherwise compute $d_{inv}$ and $e$ as above. Let $q_0 = \mathrm{MULL}(d_{inv}, n)$. If $n = q * d$ for some $q$, then $q_0 = 2^e * q$ must be a multiple of $2^e$. The original division is exact (no remainder) precisely when (i) $q_0$ is a multiple of $2^e$, and (ii) $q_0$ is sufficiently small that $q_0 * d_{odd}$ is representable by the original data type.

For unsigned division check that

$$0 \le q_0 \le 2^e * \left\lfloor \frac{2^N - 1}{d} \right\rfloor$$

and that the bottom $e$ bits of $q_0$ (or of $n$) are zero.
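The Newton iteration (9.2), exact division, and the unsigned divisibility test can be sketched together in C for $N = 32$. The function names (`inverse_mod_2_32`, `split_even_part`, `exact_div_u32`, `divisible_u32`) are hypothetical, and the inverse is recomputed on every call purely for brevity; a compiler would fold it into a constant.

```c
#include <assert.h>
#include <stdint.h>

/* Inverse of an odd d_odd modulo 2^32 by the Newton iteration (9.2). */
static uint32_t inverse_mod_2_32(uint32_t d_odd)
{
    assert(d_odd & 1u);
    uint32_t inv = d_odd;           /* correct modulo 2^3 */
    for (int i = 0; i < 4; i++)     /* 3 -> 6 -> 12 -> 24 -> 48 correct bits */
        inv = (uint32_t)(inv * (2u - (uint32_t)(inv * d_odd)));
    return inv;
}

/* Splits d = 2^e * d_odd with d_odd odd; returns e. */
static int split_even_part(uint32_t d, uint32_t *d_odd)
{
    int e = 0;
    while ((d & 1u) == 0) {
        d >>= 1;
        e++;
    }
    *d_odd = d;
    return e;
}

/* Exact unsigned division: the result is meaningful only if d divides n. */
static uint32_t exact_div_u32(uint32_t n, uint32_t d)
{
    assert(d != 0);
    uint32_t d_odd;
    int e = split_even_part(d, &d_odd);
    /* SRL(MULL(d_inv, n), e) */
    return (uint32_t)(inverse_mod_2_32(d_odd) * n) >> e;
}

/* Unsigned divisibility test without computing a remainder. */
static int divisible_u32(uint32_t n, uint32_t d)
{
    assert(d != 0);
    uint32_t d_odd;
    int e = split_even_part(d, &d_odd);
    uint32_t q0 = (uint32_t)(inverse_mod_2_32(d_odd) * n); /* MULL(d_inv, n) */
    uint32_t low_mask = ((uint32_t)1 << e) - 1u;
    /* bottom e bits zero, and q0 <= 2^e * floor((2^32 - 1) / d) */
    return (q0 & low_mask) == 0 && q0 <= ((UINT32_MAX / d) << e);
}
```

For $d = 100$ the odd part is 25, and the inverse produced this way equals $(19 * 2^{32} + 1)/25$, the constant used in the example later in this section.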
When $e > 0$, these tests can be combined if the architecture has a rotate (i.e., circular shift) instruction, or by expanding this rotate into

$$\mathrm{OR}(\mathrm{SRL}(q_0, e), \mathrm{SLL}(q_0, N - e)) \le \left\lfloor \frac{2^N - 1}{d} \right\rfloor.$$

For signed division check that

$$-2^e * \left\lfloor \frac{2^{N-1}}{|d|} \right\rfloor \le q_0 \le 2^e * \left\lfloor \frac{2^{N-1} - 1}{|d|} \right\rfloor$$

and that the bottom $e$ bits of $q_0$ are zero; the interval check can be done with an add and one signed or unsigned compare.

Relatedly, to test whether $n \ \mathrm{rem}\ d = r$, where $d$ and $r$ are constants with $1 \le r < d$ and where $n$ is signed, check whether MULL($d_{inv}$, $n - r$) is a nonnegative multiple of $2^e$ not exceeding $2^e * \lfloor (2^{N-1} - 1 - r)/d \rfloor$.

**Example.** To test whether a signed 32-bit value $i$ is divisible by 100, let $d_{inv} = (19 * 2^{32} + 1)/25$. Compute sword $q_0 = \mathrm{MULL}(d_{inv}, i)$. Next check whether $q_0$ is a multiple of 4 in the interval $[-q_{max}, q_{max}]$, where $q_{max} = (2^{31} - 48)/25$.

Since these algorithms require only the lower half of a product, other optimizations for integer multiplication apply here too. For example, applying strength reduction to the C loop

    signed long i, imax;
    for (i = 0; i < imax; i++) {
      if ((i % 100) == 0) {
        ...
      }
    }

might yield (** denotes exponentiation)

    const unsigned long dinv = (19*2**32 + 1)/25;
    const unsigned long qmax = (2**31 - 48)/25;
    unsigned long test = qmax;     /* test = dinv*i + qmax mod 2**32 */
    for (i = 0; i < imax; i++, test += dinv) {
      if (test <= 2*qmax && (test & 3) == 0) {
        ...
      }
    }

No explicit multiplication or division remains.

## 10 Implementation in GCC

We have implemented the algorithms for constant divisors in the freely available GCC compiler [21], by extending its machine and language independent internal code generation. We also made minor machine dependent modifications to some of the machine descriptor, or md, files to get optimal code. All languages and almost all processors supported by GCC benefit. Our changes are scheduled for inclusion in GCC 2.6.

To generate code for division of $N$-bit quantities, the CHOOSE_MULTIPLIER function needs to perform $(2N)$-bit arithmetic. This makes that procedure more complex than it might appear in Figure 6.2.

Optimal selection of instructions depending on the bitsize of the operation is a tricky problem that we spent quite some time on. For some architectures, it is important to select a multiplication instruction that has the smallest available precision. On other architectures, the multiplication can be performed faster using a sequence of additions, subtractions, and shifts.

We have not implemented any algorithm for run-time invariant divisors. Only a few architectures (AMD 29050, Intel x86, Motorola 68k & 88110, and to some extent IBM POWER) have adequate hardware support to make such an implementation viable, i.e., an instruction that can be used for integer logarithm computation, and a $(2N)$-bit/$N$-bit divide instruction. Even with hardware support, one must be careful that the transformation really improves the code; e.g., a loop might need to be executed many times before the faster loop body outweighs the cost of the multiplier computation in the loop header.

## 11 Results

Figure 11.1 has an example with compile-time constant divisor that gets drastically faster on all recent processor implementations. The program converts a binary number to a decimal string. It calculates one quotient and one remainder per output digit.

Table 11.1 shows the generated assembler codes for Alpha, MIPS, POWER, and SPARC. There is no explicit division. Although initially computed separately, the quotient and remainder calculations have been combined (by GCC's common subexpression elimination pass).

The unsigned int data type has 32 bits on all four architectures, but Alpha is a 64-bit architecture. The Alpha code is longer than the others because it multiplies $(2^{34} + 1)/5$ by $x$ using

$$4 * \left[(2^{16} + 1) * (2^8 + 1) * \left(4 * [4 * (4 * x - x) + x] - x\right)\right] + x$$

instead of the slower, 23-cycle, mulq.
This illustrates that the multiplications needed by these algorithms can sometimes be computed quickly using a sequence of shifts, adds, and subtracts [5], since multipliers for small constant divisors have regular binary patterns.

Table 11.2 compares the timing on some processor implementations for the radix conversion routine, with and without the division elimination algorithms. The number converted was a full 32-bit number, sufficiently large to hide procedure calling overhead from the measurements.

We also ran the integer benchmarks from SPEC 92. The improvement was negligible for most of the programs; the best improvement seen was only about 3%. Some benchmarks that involve hashing show improvements up to about 30%. We anticipate significant improvements on some number theoretic codes.

## References

[1] Robert Alverson. Integer division using reciprocals. In Peter Kornerup and David W. Matula, editors, Proceedings 10th Symposium on Computer Arithmetic, Grenoble, France, June.

[2] Ehud Artzy, James A. Hinds, and Harry J. Saal. A fast division technique for constant divisors. CACM, 19(2):98-101, February.

[3] Henry G. Baker. Computing A*B (mod N) efficiently in ANSI C. ACM SIGPLAN Notices, 27(1):95-98, January.

[4] H.B. Bakoglu, G.F. Grohoski, and R. K. Montoye. The IBM RISC system/6000 processor: Hardware overview. IBM Journal of Research and Development, 34(1):12-22, January.

[5] Robert Bernstein. Multiplication by integer constants. Software - Practice and Experience, 16(7), July.

[6] Raymond T. Boute. The Euclidean definition of the functions div and mod. ACM Transactions on Programming Languages and Systems, 14(2), April.

[7] A.P. Chang. A note on the modulo operation. SIGPLAN Notices, 20(4):19-23, April.

[8] Digital Equipment Corporation. DECchip 21064-AA Microprocessor, Hardware Reference Manual, 1st edition, October.

[9] Intel Corporation, Santa Clara, CA. 386 DX Microprocessor Programmer's Reference Manual.

[10] Intel Corporation, Santa Clara, CA. Intel486 Microprocessor Family Programmer's Reference Manual.

[11] David H. Jacobsohn. A combinatoric division algorithm for fixed-integer divisors. IEEE Trans. Comp., C-22(6), June.

[12] Gerry Kane. MIPS RISC Architecture. Prentice Hall, Englewood Cliffs, NJ, 1989.

    #define BUFSIZE 50
    char *decimal (unsigned int x)
    {
      static char buf[BUFSIZE];
      char *bp = buf + BUFSIZE - 1;
      *bp = 0;
      do {
        *--bp = '0' + x % 10;
        x /= 10;
      } while (x != 0);
      return bp; /* Return pointer to first digit */
    }

Figure 11.1: Radix conversion code

        Alpha                     MIPS                      POWER                   SPARC
        lda    $2,buf             la    $5,buf+49           l    10,LC..0(2)        sethi %hi(buf+49),%g2
        ldq_u  $1,49($2)          sb    $0,0($5)            cau  11,0,0xcccc        or    %g2,%lo(buf+49),%o1
        addq   $2,49,$0           li    $6,0xcccc0000       oril 11,11,0xcccd       stb   %g0,[%o1]
        mskbl  $1,$0,$1           ori   $6,$6,0xcccd        cal  0,0(0)             sethi %hi(0xcccccccd),%g2
        stq_u  $1,49($2)      L1: multu $4,$6               stb  0,0(10)            or    %g2,0xcd,%o2
    L1: zapnot $16,15,$3          mfhi  $3              L1: mul  9,3,11         L1: add   %o1,-1,%o1
        s4subq $3,$3,$2           subu  $5,$5,1             srai 0,3,31             umul  %o0,%o2,%g0
        s4addq $2,$3,$2           srl   $3,$3,3             and  0,0,11             rd    %y,%g3
        s4subq $2,$3,$2           sll   $2,$3,2             a    9,9,0              srl   %g3,3,%g3
        sll    $2,8,$1            addu  $2,$2,$3            a    9,9,3              sll   %g3,2,%g2
        subq   $0,1,$0            sll   $2,$2,1             sri  9,9,3              add   %g2,%g3,%g2
        addq   $2,$1,$2           subu  $2,$4,$2            muli 0,9,10             sll   %g2,1,%g2
        sll    $2,16,$1           addu  $2,$2,48            sf   0,0,3              sub   %o0,%g2,%g2
        ldq_u  $4,0($0)           move  $4,$3               ai.  3,9,0              add   %g2,48,%g2
        addq   $2,$1,$2           bne   $4,$0,L1            ai   0,0,48             orcc  %g3,%g0,%o0
        s4addq $2,$3,$2           sb    $2,0($5)            stbu 0,-1(10)           bne   L1
        srl    $2,35,$2           j     $31                 bc   4,2,L1             stb   %g2,[%o1]
        mskbl  $4,$0,$4           move  $2,$5               ai   3,10,0             retl
        s4addl $2,$2,$1                                     br                      mov   %o1,%o0
        addq   $1,$1,$1
        subl   $16,$1,$1
        addl   $1,48,$1
        insbl  $1,$0,$1
        bis    $2,$2,$16
        bis    $1,$4,$1
        stq_u  $1,0($0)
        bne    $16,L1
        ret    $31,($26),1

Table 11.1: Code generated by our GCC for radix conversion
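To make the combined quotient and remainder computation easier to follow, here is a hedged C sketch of what the MIPS and SPARC columns of Table 11.1 do: take the upper half of the product with 0xCCCCCCCD, shift right by 3 to obtain the quotient, then recover the remainder as x - 10*q with shifts and adds. The names `udiv10` and `decimal_mulhi` are hypothetical, and the sketch assumes a 32-bit unsigned operand with a 64-bit type available for the product.

```c
#include <stdint.h>

#define BUFSIZE 50

/* Quotient x / 10 from the upper 32 bits of x * 0xCCCCCCCD.
   Hypothetical helper name; mirrors multu/mfhi/srl in the MIPS column. */
uint32_t udiv10(uint32_t x)
{
    uint32_t hi = (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 32);
    return hi >> 3;
}

/* Radix conversion as in Figure 11.1, with the division written out
   explicitly as a multiplication, so each digit costs one multiply. */
char *decimal_mulhi(uint32_t x)
{
    static char buf[BUFSIZE];
    char *bp = buf + BUFSIZE - 1;

    *bp = 0;
    do {
        uint32_t q = udiv10(x);                 /* quotient */
        uint32_t r = x - (((q << 2) + q) << 1); /* remainder: x - 10*q */
        *--bp = (char)('0' + r);
        x = q;
    } while (x != 0);
    return bp; /* Pointer to first digit */
}
```

Because the remainder reuses the quotient, no second multiplication by the reciprocal is needed; this is the same sharing that the common subexpression elimination pass achieves in the generated code.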

| Architecture/Implementation | MHz | Time with division performed | Time with division eliminated | Speedup ratio |
|---|---|---|---|---|
| Motorola MC68020 [18, pp. 9-22] | | | | |
| Motorola MC | | | | |
| SPARC Viking [20] | | | | |
| HP PA | | | | |
| MIPS R3000 [12] | | | | |
| MIPS R4000 [17] | | | | |
| POWER/RIOS I [4, 22] | | | | |
| DEC Alpha [8] * | | | | |

*This time difference is artificial. The Alpha architecture has no integer divide instruction, and the DEC library functions for division are slow.

Table 11.2: Timing (microseconds) for radix conversion with and without division elimination

[13] Donald E. Knuth. An empirical study of FORTRAN programs. Technical Report CS 186, Computer Science Department, Stanford University. Stanford artificial intelligence project memo AIM 137.

[14] Donald E. Knuth. Seminumerical Algorithms, volume 2 of The Art of Computer Programming. Addison-Wesley, Reading, MA, 2nd edition.

[15] Shuo-Yen Robert Li. Fast constant division routines. IEEE Trans. Comp., C-34(9), September.

[16] Daniel J. Magenheimer, Liz Peters, Karl Pettis, and Dan Zuras. Integer multiplication and division on the HP Precision Architecture. In Proceedings Second International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS II). ACM. Published as SIGPLAN Notices, Volume 22, No. 10, October.

[17] MIPS Computer Systems, Inc, Sunnyvale, CA. MIPS R4000 Microprocessor User's Manual.

[18] Motorola, Inc. MC Bit Microprocessor User's Manual, 2nd edition.

[19] Motorola, Inc. PowerPC 601 RISC Microprocessor User's Manual.

[20] SPARC International, Inc., Menlo Park, CA. The SPARC Architecture Manual, Version 8.

[21] Richard M. Stallman. Using and Porting GCC. The Free Software Foundation, Cambridge, MA.

[22] Henry Warren. Predicting Execution Time on the IBM RISC System/6000. IBM. Preliminary Version.
