Division by Invariant Integers using Multiplication


Torbjörn Granlund
Cygnus Support, 1937 Landings Drive, Mountain View, CA
(Work done while at Swedish Institute of Computer Science, Stockholm, Sweden.)

Peter L. Montgomery
Centrum voor Wiskunde en Informatica, 780 Las Colinas Road, San Rafael, CA
(Work done while at University of California, Los Angeles. Supported by U.S. Army fellowship DAAL03-89-G.)

Abstract

Integer division remains expensive on today's processors as the cost of integer multiplication declines. We present code sequences for division by arbitrary nonzero integer constants and run-time invariants using integer multiplication. The algorithms assume a two's complement architecture. Most also require that the upper half of an integer product be quickly accessible. We treat unsigned division, signed division where the quotient rounds towards zero, signed division where the quotient rounds towards $-\infty$, and division where the result is known a priori to be exact. We give some implementation results using the C compiler GCC.

1 Introduction

The cost of an integer division on today's RISC processors is several times that of an integer multiplication. The trend is towards fast, often pipelined, combinatoric multipliers that perform an operation in typically less than 10 cycles, with either no hardware support for integer division or iterating dividers that are several times slower than the multiplier. Table 1.1 compares multiplication and division times on some processors. This table illustrates that the discrepancy between multiplication and division timing has been growing.

Integer division is used heavily in base conversions, number theoretic codes, and graphics codes. Compilers generate integer divisions to compute loop counts and subtract pointers. In a static analysis of FORTRAN programs, Knuth [13, p. 9] reports that 39% of arithmetic operators were additions, 22% subtractions, 27% multiplications, 10% divisions, and 2% exponentiations.
Knuth's counts do not distinguish integer and floating point operations, except that 4% of the divisions were divisions by 2.

When integer multiplication is cheaper than integer division, it is beneficial to substitute a multiplication for a division. Multiple authors [2, 11, 15] present algorithms for division by constants, but only when the divisor divides $2^k - 1$ for some small $k$. Magenheimer et al. [16, §7] give the foundation of a more general approach, which Alverson [1] implements on the Tera Computing System. Compiler writers are only beginning to become aware of the general technique. For example, version 1.02 of the IBM RS/6000 xlc and xlf compilers uses the integer multiply instruction to expand signed integer divisions by 3, 5, 7, 9, 25, and 125, but not by other odd integer divisors below 256, and never for unsigned division.

We assume an N-bit two's complement architecture. Unsigned (i.e., nonnegative) integers range from 0 to $2^N - 1$ inclusive; signed integers range from $-2^{N-1}$ to $2^{N-1} - 1$. We denote these integers by uword and sword respectively. Unsigned doubleword integers (range 0 to $2^{2N} - 1$) are denoted by udword. Signed doubleword integers (range $-2^{2N-1}$ to $2^{2N-1} - 1$) are denoted by sdword. The type int is used for shift counts and logarithms.

Several of the algorithms require the upper half of an integer product obtained by multiplying two uwords or two swords. All algorithms need simple operations such as adds, shifts, and bitwise operations (bit-ops) on uwords and swords, as summarized in Table 3.1.

We show how to use these operations to divide by arbitrary nonzero constants, as well as by divisors which are loop invariant or repeated in a basic block, using one multiplication plus a few simple instructions per division. The presentation concentrates on three types of division, in order of difficulty: (i) unsigned, (ii) signed, quotient rounded towards zero, (iii) signed, quotient rounded towards $-\infty$. Other topics are division of a udword by a run-time invariant uword, division when the remainder is known a priori to be zero, and testing for a given remainder. In each case we give the mathematical background and suggest an algorithm which a compiler can use to generate the code.

The algorithms are ineffective when a divisor is not invariant, such as in the Euclidean GCD algorithm. Most algorithms presented herein yield only the quotient. The remainder, if desired, can be computed by an additional multiplication and subtraction.

We have implemented the algorithms in a developmental version of the GCC 2.6 compiler [21]. DEC uses some of these algorithms in its Alpha AXP compilers.

Table 1.1: Multiplication and division times on different CPUs. [The numeric entries of this table are not recoverable from this transcription.]
    Columns: Architecture/Implementation; N; approximate year; time (cycles) for HIGH(N-bit * N-bit); time (cycles) for N-bit/N-bit divide (unsigned) and (signed).
    Rows: Motorola MC68020 [18, pp. 9-22]; Motorola MC; Intel 386 [9]; Intel 486 [10]; Intel Pentium; SPARC Cypress CY7C; SPARC Viking [20]; HP PA 83 [16]; HP PA; MIPS R3000 [12]; MIPS R4000 [17]; POWER/RIOS I [4, 22]; PowerPC/MPC601 [19]; DEC Alpha 21064AA [8]; Motorola MC; Motorola MC.
    S: No direct hardware support; approximate cycle count for software implementation.
    F: Does not include time for moving data to/from floating point registers.
    P: Pipelined implementation (i.e., independent instructions can execute simultaneously).

2 Mathematical notations

Let $x$ be a real number. Then $\lfloor x \rfloor$ denotes the largest integer not exceeding $x$ and $\lceil x \rceil$ denotes the least integer not less than $x$. Let TRUNC($x$) denote the integer part of $x$, rounded towards zero. Formally, TRUNC($x$) $= \lfloor x \rfloor$ if $x \ge 0$ and TRUNC($x$) $= \lceil x \rceil$ if $x < 0$. The absolute value of $x$ is $|x|$. For $x > 0$, the (real) base 2 logarithm of $x$ is $\log_2 x$. A multiplication is written $x * y$.
If $x$, $y$, and $n$ are integers and $n \ne 0$, then $x \equiv y \pmod{n}$ means $x - y$ is a multiple of $n$.

Two remainder operators are common in language definitions. Sometimes a remainder has the sign of the dividend and sometimes the sign of the divisor. We use the Ada notations

\[ n \;\mathrm{rem}\; d = n - d * \mathrm{TRUNC}(n/d) \quad \text{(sign of dividend)}, \]
\[ n \bmod d = n - d * \lfloor n/d \rfloor \quad \text{(sign of divisor)}. \qquad (2.1) \]

The Fortran 90 names are MOD and MODULO. In C, the definition of remainder is implementation dependent (many C implementations round signed quotients towards zero and use rem remaindering). Other definitions have been proposed [6, 7].

If $n$ is a udword or sdword, then HIGH($n$) and LOW($n$) denote the most significant and least significant halves of $n$. LOW($n$) is a uword, while HIGH($n$) is a uword if $n$ is a udword and an sword if $n$ is an sdword. In both cases $n = 2^N * \mathrm{HIGH}(n) + \mathrm{LOW}(n)$.

3 Assumed instructions

The suggested code assumes the operations in Table 3.1, on an N-bit machine. Some primitives, such as loading constants and operands, are implicit in the notation and are not included in the operation counts.
Table 3.1: Mathematical notations and primitive operations

    TRUNC(x)            Truncation towards zero; see §2.
    HIGH(x), LOW(x)     Upper and lower halves of x; see §2.
    MULL(x, y)          Lower half of product x * y (i.e., product modulo 2^N).
    MULSH(x, y)         Upper half of signed product x * y:
                        if -2^(N-1) <= x, y <= 2^(N-1) - 1, then
                        x * y = 2^N * MULSH(x, y) + MULL(x, y).
    MULUH(x, y)         Upper half of unsigned product x * y:
                        if 0 <= x, y <= 2^N - 1, then
                        x * y = 2^N * MULUH(x, y) + MULL(x, y).
    AND(x, y)           Bitwise AND of x and y.
    EOR(x, y)           Bitwise exclusive OR of x and y.
    NOT(x)              Bitwise complement of x. Equal to -1 - x if x is signed,
                        to 2^N - 1 - x if x is unsigned.
    OR(x, y)            Bitwise OR of x and y.
    SLL(x, n)           Logical left shift of x by n bits (0 <= n <= N - 1).
    SRA(x, n)           Arithmetic right shift of x by n bits (0 <= n <= N - 1).
    SRL(x, n)           Logical right shift of x by n bits (0 <= n <= N - 1).
    XSIGN(x)            -1 if x < 0; 0 if x >= 0.
                        Short for SRA(x, N - 1) or -SRL(x, N - 1).
    x + y, x - y, -x    Two's complement addition, subtraction, negation.

The algorithm in §8 requires the ability to add or subtract two doublewords, obtaining a doubleword result; this typically expands into 2-4 instructions.

The algorithms for processing constant divisors require compile-time arithmetic on udwords. Algorithms for processing run-time invariant divisors require taking the base 2 logarithm of a positive integer (sometimes rounded up, sometimes down) and require dividing a udword by a uword. If the algorithms are used only for constant divisors, then these operations are needed only at compile time. If the architecture has a leading zero count (LDZ) instruction, then these logarithms can be found from

\[ \lceil \log_2 x \rceil = N - \mathrm{LDZ}(x - 1), \qquad \lfloor \log_2 x \rfloor = N - 1 - \mathrm{LDZ}(x) \qquad (1 \le x \le 2^N - 1). \]

Some algorithms may produce expressions such as SRL(x, 0) or -(x - y); the optimizer should make the obvious simplifications. Some descriptions show an addition or subtraction of $2^N$, which is a no-op.

If an architecture lacks arithmetic right shift, then it can be computed from the identity

\[ \mathrm{SRA}(x, l) = \mathrm{SRL}(x + 2^{N-1},\, l) - 2^{N-1-l} \qquad \text{whenever } 0 \le l \le N - 1. \]
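As a concrete reading of some of these primitives, the following C sketch models them for N = 32. It is only an illustration: the function names (ldz32, ceil_log2, floor_log2, muluh, mulsh, sra_via_srl) are hypothetical, the leading-zero count is a portable loop rather than a hardware instruction, and mulsh assumes that `>>` on a negative int64_t is an arithmetic shift, which C leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative N = 32 models of a few primitives; names are hypothetical. */

/* LDZ: number of leading zero bits; returns 32 for x == 0. */
static int ldz32(uint32_t x)
{
    int count = 0;
    while (count < 32 && (x & 0x80000000u) == 0) {
        x <<= 1;
        count++;
    }
    return count;
}

/* ceil(log2(x)) = N - LDZ(x - 1), for 1 <= x <= 2^32 - 1. */
static int ceil_log2(uint32_t x)
{
    assert(x >= 1);
    return 32 - ldz32(x - 1);
}

/* floor(log2(x)) = N - 1 - LDZ(x), for 1 <= x <= 2^32 - 1. */
static int floor_log2(uint32_t x)
{
    assert(x >= 1);
    return 31 - ldz32(x);
}

/* MULUH: upper half of the unsigned 64-bit product. */
static uint32_t muluh(uint32_t x, uint32_t y)
{
    return (uint32_t)(((uint64_t)x * y) >> 32);
}

/* MULSH: upper half of the signed product (assumes arithmetic >>). */
static int32_t mulsh(int32_t x, int32_t y)
{
    return (int32_t)(((int64_t)x * y) >> 32);
}

/* SRA(x, l) = SRL(x + 2^(N-1), l) - 2^(N-1-l), on the bit pattern of x. */
static uint32_t sra_via_srl(uint32_t x, int l)
{
    assert(l >= 0 && l <= 31);
    return ((x + 0x80000000u) >> l) - (0x80000000u >> l);
}
```

For instance, ceil_log2(10) and floor_log2(10) give the values 4 and 3, matching the bounds $2^3 < 10 \le 2^4$.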
If an architecture has only one of MULSH and MULUH, then the other can be computed using

    MULUH(x, y) = MULSH(x, y) + AND(x, XSIGN(y)) + AND(y, XSIGN(x))

for arbitrary N-bit patterns x, y (interpreted as uwords for MULUH and as swords for MULSH).

4 Unsigned division

Suppose we want to compile an unsigned division $q = \lfloor n/d \rfloor$, where $0 < d < 2^N$ is a constant or run-time invariant and $0 \le n < 2^N$ is variable. Let's try to find a rational approximation $m/2^{N+l}$ of $1/d$ such that

\[ \left\lfloor \frac{n}{d} \right\rfloor = \left\lfloor \frac{m * n}{2^{N+l}} \right\rfloor \qquad \text{whenever } 0 \le n \le 2^N - 1. \qquad (4.1) \]

Setting $n = d$ in (4.1) shows we require $2^{N+l} \le m * d$. Setting $n = q * d - 1$ shows $2^{N+l} * q > m * (q * d - 1)$. Multiply by $d$ to derive $(m * d - 2^{N+l}) * (q * d - 1) < 2^{N+l}$. This inequality will hold for all values of $q * d - 1$ below $2^N$ if $m * d - 2^{N+l} \le 2^l$. Theorem 4.2 below states that these conditions are sufficient, because the maximum relative error (1 part in $2^N$) is too small to affect the quotient when $n < 2^N$.

Theorem 4.2 Suppose $m$, $d$, $l$ are nonnegative integers such that $d \ne 0$ and

\[ 2^{N+l} \le m * d \le 2^{N+l} + 2^l. \qquad (4.3) \]

Then $\lfloor n/d \rfloor = \lfloor m * n / 2^{N+l} \rfloor$ for every integer $n$ with $0 \le n < 2^N$.

Proof. Define $k = m * d - 2^{N+l}$. Then $0 \le k \le 2^l$ by hypothesis. Given $n$ with $0 \le n < 2^N$, write $n = q * d + r$ where $q = \lfloor n/d \rfloor$ and $0 \le r \le d - 1$. We must show that $q = \lfloor m * n / 2^{N+l} \rfloor$. A calculation gives

\[ \frac{m * n}{2^{N+l}} - q = \frac{k + 2^{N+l}}{d} * \frac{n}{2^{N+l}} - q = \frac{k * n}{d * 2^{N+l}} + \frac{n}{d} - \frac{n - r}{d} = \frac{k}{2^l} * \frac{n}{2^N} * \frac{1}{d} + \frac{r}{d}. \qquad (4.4) \]
This difference is nonnegative and does not exceed

\[ 1 * \frac{2^N - 1}{2^N} * \frac{1}{d} + \frac{d - 1}{d} = 1 - \frac{1}{2^N * d} < 1. \]

Theorem 4.2 allows division by $d$ to be replaced with multiplication by $m/2^{N+l}$ if (4.3) holds. In general we require $2^l \ge d - 1$ to ensure that a suitable multiple of $d$ exists in the interval $[2^{N+l},\, 2^{N+l} + 2^l]$. For compatibility with the algorithms for signed division (§5 and §6), it is convenient to choose $m * d > 2^{N+l}$ even though Theorem 4.2 permits equality. Since $m$ can be almost as large as $2^{N+1}$, we don't multiply by $m$ directly, but instead by $2^N$ and $m - 2^N$. This leads to the code in Figure 4.1. Its cost is 1 multiply, 2 adds/subtracts, and 2 shifts per quotient, after computing constants dependent only on the divisor.

    Initialization (given uword d with 1 <= d < 2^N):
        int l = ceil(log2(d));                     /* d <= 2^l <= 2*d - 1 */
        uword m' = floor(2^N * (2^l - d)/d) + 1;   /* m' = floor(2^(N+l)/d) - 2^N + 1 */
        int sh1 = min(l, 1);
        int sh2 = max(l - 1, 0);                   /* sh2 = l - sh1 */

    For q = n/d, all uword:
        uword t1 = MULUH(m', n);
        q = SRL(t1 + SRL(n - t1, sh1), sh2);

Figure 4.1: Unsigned division by run-time invariant divisor

Explanation of Figure 4.1. If $d = 1$, then $l = 0$, so $m' = 1$ and $sh_1 = sh_2 = 0$. The code computes $t_1 = \lfloor 1 * n / 2^N \rfloor = 0$ and $q = n$.

If $d > 1$, then $l \ge 1$, so $sh_1 = 1$ and $sh_2 = l - 1$. Since

\[ m' \le \frac{2^N * (2^l - d)}{d} + 1 \le \frac{2^N * (d - 1)}{d} + 1 < 2^N, \]

the value of $m'$ fits in a uword. Since $0 \le t_1 \le n$, the formula for $q$ simplifies to

\[ q = \mathrm{SRL}(t_1 + \mathrm{SRL}(n - t_1, 1),\ l - 1) = \left\lfloor \frac{t_1 + \lfloor (n - t_1)/2 \rfloor}{2^{l-1}} \right\rfloor = \left\lfloor \frac{\lfloor (t_1 + n)/2 \rfloor}{2^{l-1}} \right\rfloor = \left\lfloor \frac{t_1 + n}{2^l} \right\rfloor. \qquad (4.5) \]

But $t_1 + n = \lfloor m' * n / 2^N \rfloor + n = \lfloor (m' + 2^N) * n / 2^N \rfloor$. Set $m = m' + 2^N = \lfloor 2^{N+l}/d \rfloor + 1$. The hypothesis of Theorem 4.2 is satisfied since $2^{N+l} < m * d \le 2^{N+l} + d \le 2^{N+l} + 2^l$.

Caution. Conceptually $q$ is SRL($n + t_1$, $l$), as in (4.5). Do not compute $q$ this way, since $n + t_1$ may overflow N bits and the shift count may be out of bounds.

Improvement. If $d$ is constant and a power of 2, replace the division by a shift.

Improvement. If $d$ is constant and $m = m' + 2^N$ is even, then reduce $m/2^l$ to lowest terms. The reduced multiplier fits in N bits, unlike the original.
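The initialization and per-quotient steps of Figure 4.1 can be sketched in C for N = 32 as follows. This is a minimal sketch under stated assumptions, not the GCC implementation: the struct and function names (udiv32_t, udiv32_init, udiv32) are made up for illustration, and 64-bit integers stand in for the udword arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Constants that depend only on the divisor (hypothetical struct name). */
typedef struct {
    uint32_t m_prime;  /* m' = floor(2^32 * (2^l - d) / d) + 1 */
    int sh1;           /* min(l, 1) */
    int sh2;           /* max(l - 1, 0) */
} udiv32_t;

/* Initialization of Figure 4.1 for a divisor 1 <= d < 2^32. */
static udiv32_t udiv32_init(uint32_t d)
{
    assert(d >= 1);
    int l = 0;                                   /* l = ceil(log2(d)) */
    while (l < 32 && ((uint64_t)1 << l) < d)
        l++;
    udiv32_t c;
    /* 2^l - d <= d - 1 < 2^32, so the shifted value fits in 64 bits. */
    c.m_prime = (uint32_t)(((((uint64_t)1 << l) - d) << 32) / d + 1);
    c.sh1 = l < 1 ? l : 1;
    c.sh2 = l > 1 ? l - 1 : 0;
    return c;
}

/* q = floor(n / d) using one multiply, two add/subtracts and two shifts. */
static uint32_t udiv32(uint32_t n, udiv32_t c)
{
    uint32_t t1 = (uint32_t)(((uint64_t)c.m_prime * n) >> 32); /* MULUH(m', n) */
    /* t1 <= n, so n - t1 does not wrap and the sum stays below 2^32. */
    return (t1 + ((n - t1) >> c.sh1)) >> c.sh2;
}
```

A caller would run udiv32_init once per divisor, for example before a loop, and then call udiv32 for each dividend. Note that the sum is formed as t1 + SRL(n - t1, sh1) rather than as n + t1, for the reason given in the Caution above.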
In rare cases (e.g., $d = 641$ on a 32-bit machine, $d = 274177$ on a 64-bit machine) the final shift after this reduction is zero.

Improvement. If $d$ is constant and even, rewrite

\[ \left\lfloor \frac{n}{d} \right\rfloor = \left\lfloor \frac{\lfloor n/2^e \rfloor}{d/2^e} \right\rfloor \]

for some $e > 0$. Then $\lfloor n/2^e \rfloor$ can be computed using SRL. Since $\lfloor n/2^e \rfloor < 2^{N-e}$, less precision is needed in the multiplier than before.

These ideas are reflected in Figure 4.2, which generates code for $n/d$ where $n$ is unsigned and $d$ is constant. Procedure CHOOSE_MULTIPLIER, which is shared by this and later algorithms, appears in Figure 6.2.

    Inputs: uword d and n, with d constant.
    uword d_odd, t1;
    udword m;
    int e, l, l_dummy, sh_post, sh_pre;
    (m, sh_post, l) = CHOOSE_MULTIPLIER(d, N);
    if m >= 2^N and d is even then
        Find e such that d = 2^e * d_odd and d_odd is odd.
                              /* 2^e = AND(d, 2^N - d) */
        sh_pre = e;
        (m, sh_post, l_dummy) = CHOOSE_MULTIPLIER(d_odd, N - e);
    else
        sh_pre = 0;
    end if
    if d = 2^l then
        Issue q = SRL(n, l);
    else if m >= 2^N then
        assert sh_pre = 0;
        Issue t1 = MULUH(m - 2^N, n);
        Issue q = SRL(t1 + SRL(n - t1, 1), sh_post - 1);
    else
        Issue q = SRL(MULUH(m, SRL(n, sh_pre)), sh_post);
    end if

Figure 4.2: Optimized code generation of unsigned q = n/d for constant nonzero d

The following three examples illustrate the cases in Figure 4.2. All assume unsigned 32-bit arithmetic.

Example. $q = \lfloor n/10 \rfloor$. CHOOSE_MULTIPLIER finds $m_{low} = (2^{36} - 6)/10$ and $m_{high} = (2^{36} + 14)/10$. After one round of divisions by 2, it returns $(m, 3, 4)$, where $m = (2^{34} + 1)/5$. The suggested code q = SRL(MULUH($(2^{34} + 1)/5$, n), 3) eliminates the pre-shift by 0. See Table 11.1.

Example. $q = \lfloor n/7 \rfloor$. Here $m = (2^{35} + 3)/7 > 2^{32}$. This example uses the longer sequence in Figure 4.1.

Example. $q = \lfloor n/14 \rfloor$. CHOOSE_MULTIPLIER first returns the same multiplier as when $d = 7$. The suggested code uses separate divisions by 2 and 7: q = SRL(MULUH($(2^{34} + 5)/7$, SRL(n, 1)), 2).

5 Signed division, quotient rounded towards 0

Suppose we want to compile a signed division $q = \mathrm{TRUNC}(n/d)$, where $d$ is constant or run-time invariant, $0 < |d| \le 2^{N-1}$, and where $-2^{N-1} \le n \le 2^{N-1} - 1$ is variable. All quotients are to be rounded towards zero. We could prove a theorem like Theorem 4.2 about when $\mathrm{TRUNC}(n/d) = \mathrm{TRUNC}(m * n / 2^{N+l})$ for all $n$ in a suitable range (cf. (7.1)), but it wouldn't help since we can't compute the right side given only $\lfloor m * n / 2^N \rfloor$. Instead we show how to adjust the estimated quotient when the dividend or divisor is negative.

Theorem 5.1 Suppose $m$, $d$, $l$ are integers such that $d \ne 0$ and $0 < m * |d| - 2^{N+l-1} \le 2^l$. Let $n$ be an arbitrary integer such that $-2^{N-1} \le n \le 2^{N-1} - 1$. Define $q_0 = \lfloor m * n / 2^{N+l-1} \rfloor$. Then

\[ \mathrm{TRUNC}\!\left(\frac{n}{d}\right) = \begin{cases} q_0 & \text{if } n \ge 0 \text{ and } d > 0, \\ 1 + q_0 & \text{if } n < 0 \text{ and } d > 0, \\ -q_0 & \text{if } n \ge 0 \text{ and } d < 0, \\ -1 - q_0 & \text{if } n < 0 \text{ and } d < 0. \end{cases} \]

Proof. When $n \ge 0$ and $d > 0$, this is Theorem 4.2 with $N$ replaced by $N - 1$.

Suppose $n < 0$ and $d > 0$, say $n = q * d - r$ where $0 \le r \le d - 1$. Define $k = m * d - 2^{N+l-1}$. Then

\[ q - \frac{m * n}{2^{N+l-1}} = \frac{k}{2^l} * \frac{-n}{2^{N-1}} * \frac{1}{d} + \frac{r}{d}, \qquad (5.2) \]

as in (4.4). Since $0 < k \le 2^l$ by hypothesis, the first fraction on the right of (5.2) is positive and $r/d$ is nonnegative. The sum is at most $1/d + (d - 1)/d = 1$, so $q_0 = \lfloor m * n / 2^{N+l-1} \rfloor = q - 1$, as asserted.

For $d < 0$, use $\mathrm{TRUNC}(n/d) = -\mathrm{TRUNC}(n/|d|)$.

Caution. When $d < 0$, avoid rewriting the quotient as $\mathrm{TRUNC}((-n)/|d|)$, which fails for $n = -2^{N-1}$.

For a run-time invariant divisor, this leads to the code in Figure 5.1. Its cost is 1 multiply, 3 adds, 2 shifts, and 1 bit-op per quotient.

Explanation of Figure 5.1. The multiplier $m$ satisfies $2^{N-1} < m < 2^N$ except when $d = \pm 1$; in the latter cases $m = 2^N + 1$. In either case $m' = m - 2^N$ fits in an sword. We compute $\lfloor m * n / 2^N \rfloor$ as $n + \lfloor (m - 2^N) * n / 2^N \rfloor$, using MULSH. The subtraction of XSIGN($n$) adds one if $n < 0$. The last line negates the tentative quotient if $d < 0$ (i.e., if $d_{sign} = -1$).

Variation. An alternate computation of $m'$ is

\[ m' = \mathrm{TRUNC}\!\left( \frac{2^N * (2^{l-1} - |d|) + 1}{|d|} \right). \]

This uses signed (2N)-bit/N-bit division, with N-bit quotient.
    Initialization (given constant sword d with d != 0):
        int l = max(ceil(log2(|d|)), 1);
        udword m = 1 + floor(2^(N+l-1)/|d|);
        sword m' = m - 2^N;
        sword d_sign = XSIGN(d);
        int sh_post = l - 1;

    For q = TRUNC(n/d), all sword:
        sword q0 = n + MULSH(m', n);
        q0 = SRA(q0, sh_post) - XSIGN(n);
        q = EOR(q0, d_sign) - d_sign;

Figure 5.1: Signed division by run-time invariant divisor, rounded towards zero

Overflow detection. The quotient $n/d$ overflows if $n = -2^{N-1}$ and $d = -1$. The algorithm in Figure 5.1 returns $-2^{N-1}$. If overflow detection is required, the final subtraction of $d_{sign}$ should check for overflow.

Improvement. If $m$ is constant and even, then reduce $m/2^l$ to lowest terms, as in the unsigned case. This improvement is reflected in Figure 5.2, which generates code for $\mathrm{TRUNC}(n/d)$ where $d$ is a nonzero constant. Figure 5.2 also checks for the divisor being a power of 2 or negative thereof.

    Inputs: sword d and n, with d constant and d != 0.
    udword m;
    int l, sh_post;
    (m, sh_post, l) = CHOOSE_MULTIPLIER(|d|, N - 1);
    if |d| = 1 then
        Issue q = n;
    else if |d| = 2^l then
        Issue q = SRA(n + SRL(SRA(n, l - 1), N - l), l);
    else if m < 2^(N-1) then
        Issue q = SRA(MULSH(m, n), sh_post) - XSIGN(n);
    else
        Issue q = SRA(n + MULSH(m - 2^N, n), sh_post) - XSIGN(n);
        Cmt. Caution: m - 2^N is negative.
    end if
    if d < 0 then
        Issue q = -q;
    end if

Figure 5.2: Optimized code generation of signed q = TRUNC(n/d) for constant d != 0

Example. $q = \mathrm{TRUNC}(n/3)$. On a 32-bit machine, CHOOSE_MULTIPLIER(3, 31) returns $sh_{post} = 0$ and $m = (2^{32} + 2)/3$. The code q = MULSH(m, n) - XSIGN(n) uses one multiply, one shift, one subtract.
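Figure 5.1 can be sketched in C for N = 32 as below. The names (sdiv32_t, sdiv32_init, sdiv32) are hypothetical, and the sketch makes two assumptions that C does not guarantee: that `>>` on a negative signed value is an arithmetic shift, and that converting the out-of-range value $2^{31}$ back to int32_t wraps (this only matters for the overflow case $n = -2^{31}$, $d = -1$). The intermediate sum is kept in 64 bits so that no signed 32-bit overflow occurs in the C code itself.

```c
#include <assert.h>
#include <stdint.h>

/* Constants that depend only on the divisor (hypothetical struct name). */
typedef struct {
    int32_t m_prime;  /* m' = m - 2^32, with m = 1 + floor(2^(32+l-1) / |d|) */
    int32_t d_sign;   /* XSIGN(d): -1 if d < 0, else 0 */
    int sh_post;      /* l - 1 */
} sdiv32_t;

/* Initialization of Figure 5.1 for any nonzero 32-bit divisor. */
static sdiv32_t sdiv32_init(int32_t d)
{
    assert(d != 0);
    uint64_t abs_d = d < 0 ? (uint64_t)(-(int64_t)d) : (uint64_t)d;
    int l = 1;                                 /* l = max(ceil(log2|d|), 1) */
    while (((uint64_t)1 << l) < abs_d)
        l++;
    uint64_t m = 1 + (((uint64_t)1 << (32 + l - 1)) / abs_d);
    sdiv32_t c;
    c.m_prime = (int32_t)((int64_t)m - ((int64_t)1 << 32));
    c.d_sign = d < 0 ? -1 : 0;
    c.sh_post = l - 1;
    return c;
}

/* q = TRUNC(n / d), rounding towards zero. */
static int32_t sdiv32(int32_t n, sdiv32_t c)
{
    /* MULSH(m', n): upper half of the signed product. */
    int32_t hi = (int32_t)(((int64_t)c.m_prime * n) >> 32);
    int64_t q0 = ((int64_t)n + hi) >> c.sh_post;  /* SRA(n + MULSH, sh_post) */
    q0 -= (n < 0) ? -1 : 0;                       /* subtract XSIGN(n) */
    return (int32_t)((q0 ^ c.d_sign) - c.d_sign); /* negate if d < 0 */
}
```

As with the unsigned case, the initialization runs once per divisor. For d = 3 the sketch computes m' = 0xAAAAAAAB - 2^32 with a post-shift of 1, which is the unreduced form of the multiplier in the TRUNC(n/3) example above.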
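6 Signed division, quotient rounded towards $-\infty$

Some languages require negative quotients to round towards $-\infty$ rather than zero. With some ingenuity, we can compute these quotients in terms of quotients which round towards zero, even if the signs of the dividend and divisor are unknown at compile time.

If $n$ and $d$ are integers, then the identities

\[ \left\lfloor \frac{n}{d} \right\rfloor = \begin{cases} \mathrm{TRUNC}(n/d) & \text{if } n \ge 0 \text{ and } d > 0, \\ \mathrm{TRUNC}((n + 1)/d) - 1 & \text{if } n < 0 \text{ and } d > 0, \\ \mathrm{TRUNC}((n - 1)/d) - 1 & \text{if } n > 0 \text{ and } d < 0, \\ \mathrm{TRUNC}(n/d) & \text{if } n \le 0 \text{ and } d < 0 \end{cases} \]

are easily verified. Since the new numerators $n \pm 1$ never overflow, these identities can be used for computation. They are summarized by

\[ \left\lfloor \frac{n}{d} \right\rfloor = \mathrm{TRUNC}\!\left( \frac{n + d_{sign} - n_{sign}}{d} \right) + q_{sign}, \qquad (6.1) \]

where $d_{sign} = \mathrm{XSIGN}(d)$, $n_{sign} = \mathrm{XSIGN}(\mathrm{OR}(n,\, n + d_{sign}))$, and $q_{sign} = \mathrm{EOR}(n_{sign}, d_{sign})$. The cost is 2 shifts, 3 adds/subtracts, and 2 bit-ops, plus the divide ($n + d_{sign}$ is a repeated subexpression).

For remainders, a corollary to (2.1) and (6.1) is

\[ \begin{aligned} n \bmod d &= n - d * \mathrm{TRUNC}((n + d_{sign} - n_{sign})/d) - d * q_{sign} \\ &= ((n + d_{sign} - n_{sign}) \;\mathrm{rem}\; d) - d_{sign} + n_{sign} - d * q_{sign} \\ &= ((n + d_{sign} - n_{sign}) \;\mathrm{rem}\; d) + \mathrm{AND}(d - 2 * d_{sign} - 1,\ q_{sign}). \end{aligned} \qquad (6.2) \]

The last equality in (6.2) can be verified by separately checking the cases $q_{sign} = n_{sign} - d_{sign} = 0$ and $q_{sign} = n_{sign} + d_{sign} = -1$. The subexpression $d - 2 * d_{sign} - 1$ depends only on $d$.

For rounding towards $+\infty$, an analog of (6.1) is

\[ \left\lceil \frac{n}{d} \right\rceil = \mathrm{TRUNC}\!\left( \frac{n - d_{sign} + n_{pos}}{d} \right) - \mathrm{EOR}(d_{sign}, n_{pos}), \]

where $d_{sign} = \mathrm{XSIGN}(d)$ and $n_{pos} = -(n > d_{sign})$.

Improvement. If $d > 0$ is constant, then $d_{sign} = 0$. Then (6.1) becomes

\[ \left\lfloor \frac{n}{d} \right\rfloor = \mathrm{TRUNC}\!\left( \frac{n - n_{sign}}{d} \right) + n_{sign}, \]

where $n_{sign} = \mathrm{XSIGN}(n)$. Since $\mathrm{TRUNC}(-x) = -\mathrm{TRUNC}(x)$ and $\mathrm{EOR}(-1, n) = -1 - n = -(n + 1)$, this is equivalent to

\[ \left\lfloor \frac{n}{d} \right\rfloor = \mathrm{EOR}\!\left( n_{sign},\ \mathrm{TRUNC}\!\left( \frac{\mathrm{EOR}(n_{sign}, n)}{d} \right) \right) \qquad (d > 0). \qquad (6.3) \]

The dividend and divisor on the right of (6.3) are both nonnegative and below $2^{N-1}$. One can view them as signed or as unsigned when applying earlier algorithms.

Improvement. The XSIGN(OR($n$, $n + d_{sign}$)) is equivalent to $-(n \le \mathrm{NOT}(d_{sign}))$ and to $-(n < -d_{sign})$, where the relationals produce 1 if true and 0 if false.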
On the MIPS R2000/R3000 [12], for example, one can compute

    -d_sign = SRL(d, N - 1);
    -n_sign = (n < -d_sign);                 /* SLT, signed */
    -q_sign = EOR(-n_sign, -d_sign);
    q = TRUNC((n - (-d_sign) + (-n_sign))/d) - (-q_sign);

(six instructions plus the divide), saving an instruction over (6.1).

Improvement. If $n$ is known to be nonzero, then $n_{sign}$ simplifies to XSIGN($n$).

For constant divisors, one can use (6.1) and the algorithm in Figure 5.2. For constant $d > 0$ a shorter algorithm, based on (6.3), appears in Figure 6.1.

    Inputs: sword n and d, with d constant and d > 0.
    udword m;
    int l, sh_post;
    (m, sh_post, l) = CHOOSE_MULTIPLIER(d, N - 1);
    if d = 2^l then
        Issue q = SRA(n, l);
    else
        assert m < 2^N;
        Issue sword n_sign = XSIGN(n);
        Issue uword q0 = MULUH(m, EOR(n_sign, n));
        Issue q = EOR(n_sign, SRL(q0, sh_post));
    end if

Figure 6.1: Optimized code generation of signed q = floor(n/d) for constant d > 0

Example. Using signed 32-bit arithmetic, the code for r = n mod 10 (nonnegative remainder) can be

    sword n_sign = XSIGN(n);
    uword q0 = MULUH((2^33 + 3)/5, EOR(n_sign, n));
    sword q = EOR(n_sign, SRL(q0, 2));
    r = n - SLL(q, 1) - SLL(q, 3);

The cost is 1 multiply, 4 shifts, 2 bit-ops, 2 subtracts. Alternately, if one has a fast signed division algorithm which rounds quotients towards 0 and returns remainders, then (6.2) justifies the code

    r = ((n - XSIGN(n)) rem 10) + AND(9, XSIGN(n)).

The cost is 1 divide, 1 shift, 1 bit-op, 2 adds/subtracts.
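The multiplier selection of Figure 6.2 and the floor-rounding sequence of Figure 6.1 can be sketched together in C for N = 32. This is an illustrative sketch, not GCC's code: the names choose_multiplier, floor_div_const and floor_mod_const are assumptions, the multiplier is recomputed on every call here whereas a compiler would compute it once at compile time, and the power-of-two case reuses the same complement trick as the general case to avoid relying on arithmetic right shifts in C.

```c
#include <assert.h>
#include <stdint.h>

/* CHOOSE_MULTIPLIER for N = 32. Stores m (up to 33 bits) and sh_post
   through the pointers and returns l = ceil(log2(d)). */
static int choose_multiplier(uint32_t d, int prec, uint64_t *m_out, int *sh_post_out)
{
    assert(d >= 1 && prec >= 1 && prec <= 32);
    int l = 0;
    while (l < 32 && ((uint64_t)1 << l) < d)
        l++;
    int sh_post = l;
    /* Work relative to 2^32 so that 2^(32+l) never has to be formed:
       m_low = 2^32 + floor(2^32 * (2^l - d) / d), likewise m_high. */
    uint64_t base = (((uint64_t)1 << l) - d) << 32;
    uint64_t m_low = ((uint64_t)1 << 32) + base / d;
    uint64_t m_high = ((uint64_t)1 << 32)
                    + (base + ((uint64_t)1 << (32 + l - prec))) / d;
    while ((m_low >> 1) < (m_high >> 1) && sh_post > 0) {
        m_low >>= 1;                  /* reduce to lowest terms */
        m_high >>= 1;
        sh_post--;
    }
    *m_out = m_high;
    *sh_post_out = sh_post;
    return l;
}

/* floor(n / d) for a divisor 0 < d < 2^31, following (6.3). */
static int32_t floor_div_const(int32_t n, uint32_t d)
{
    assert(d >= 1 && d <= 0x7FFFFFFFu);
    uint64_t m;
    int sh_post;
    int l = choose_multiplier(d, 31, &m, &sh_post);
    uint32_t x = (uint32_t)(n < 0 ? ~n : n);   /* EOR(n_sign, n), nonnegative */
    uint32_t mag;
    if (d == ((uint32_t)1 << l)) {
        mag = x >> l;                          /* power of two: shift only */
    } else {
        assert(m < ((uint64_t)1 << 32));
        mag = (uint32_t)((m * x) >> 32) >> sh_post;  /* SRL(MULUH(m, x), sh_post) */
    }
    return n < 0 ? ~(int32_t)mag : (int32_t)mag;     /* EOR(n_sign, ...) */
}

/* n mod d with a nonnegative result, using one extra multiply and subtract. */
static int32_t floor_mod_const(int32_t n, uint32_t d)
{
    int64_t q = floor_div_const(n, d);
    return (int32_t)((int64_t)n - q * (int64_t)d);
}
```

With d = 10 and prec = 31 this sketch selects the multiplier $(2^{33} + 3)/5$ and a post-shift of 2, the same constants as in the n mod 10 example above.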
    procedure CHOOSE_MULTIPLIER(uword d, int prec);
    Cmt. d: Constant divisor to invert. 1 <= d < 2^N.
    Cmt. prec: Number of bits of precision needed, 1 <= prec <= N.
    Cmt. Finds m, sh_post, l such that:
    Cmt.     2^(l-1) < d <= 2^l.
    Cmt.     0 <= sh_post <= l. If sh_post > 0, then N + sh_post <= l + prec.
    Cmt.     2^(N+sh_post) < m * d <= 2^(N+sh_post) * (1 + 2^(-prec)).
    Cmt. Corollary. If d <= 2^prec, then
    Cmt.     m < 2^(N+sh_post) * (1 + 2^(1-l))/d <= 2^(N+sh_post-l+1).
    Cmt.     Hence m fits in max(prec, N - l) + 1 bits (unsigned).
    Cmt.
    int l = ceil(log2(d)), sh_post = l;
    udword m_low = floor(2^(N+l)/d), m_high = floor((2^(N+l) + 2^(N+l-prec))/d);
    Cmt. To avoid numerator overflow, compute m_low as 2^N + (m_low - 2^N).
    Cmt. Likewise for m_high. Compare m' in Figure 4.1.
    Invariant. m_low = floor(2^(N+sh_post)/d) < m_high = floor(2^(N+sh_post) * (1 + 2^(-prec))/d).
    while floor(m_low/2) < floor(m_high/2) and sh_post > 0 do
        m_low = floor(m_low/2);
        m_high = floor(m_high/2);
        sh_post = sh_post - 1;
    end while;                         /* Reduce to lowest terms. */
    return (m_high, sh_post, l);       /* Three outputs. */
    end CHOOSE_MULTIPLIER;

Figure 6.2: Selection of multiplier and shift count

7 Use of floating point

One alternative to MULUH and MULSH uses floating point arithmetic. Let the floating point mantissa be $F$ bits wide (e.g., $F = 53$ for IEEE double precision arithmetic). Then any floating point operation has relative error at most $2^{1-F}$, regardless of the rounding mode, unless exponent overflow or underflow occurs.

Suppose $N \ge 1$ and $F \ge N + 3$. We claim that

\[ \mathrm{TRUNC}\!\left(\frac{n}{d}\right) = \mathrm{TRUNC}(q_{est}), \qquad \text{where } q_{est} \approx n * \left( \frac{1 + 2^{2-F}}{d} \right), \qquad (7.1) \]

whenever $|n| \le 2^N - 1$ and $0 < |d| < 2^N$, regardless of the rounding modes used to compute $q_{est}$.

The proof assumes that $n > 0$ and $d > 0$, by negating both sides of (7.1) if necessary (the case $n = 0$ is trivial). Since the relative error per operation is at most $2^{1-F}$, the estimated quotient $q_{est}$ satisfies

\[ (1 + 2^{2-F}) * (1 - 2^{1-F})^2 * \frac{n}{d} \le q_{est} \le (1 + 2^{2-F}) * (1 + 2^{1-F})^2 * \frac{n}{d}. \]

Use this and the inequalities

\[ 1 - 2^{-F} < (1 + 2^{2-F}) * (1 - 2^{1-F})^2, \qquad (1 + 2^{2-F}) * (1 + 2^{1-F})^2 < 1 + \frac{1}{2^N - 1} \]

to derive

\[ (1 - 2^{-F}) * \frac{n}{d} < q_{est} < \frac{n}{d} * \left( 1 + \frac{1}{2^N - 1} \right) \le \frac{n}{d} * \frac{n + 1}{n} = \frac{n + 1}{d}. \]

Denote $q = \mathrm{TRUNC}(n/d)$. Then $q_{est} < (n + 1)/d$ implies $\mathrm{TRUNC}(q_{est}) \le q$. If $q_{est} < q$, then

\[ (1 - 2^{-F}) * q \le (1 - 2^{-F}) * \frac{n}{d} < q_{est} < q. \]
Both $q$ and $q_{est}$ are exactly representable as floating point numbers, but there are no representable numbers strictly between $(1 - 2^{-F}) * q$ and $q$. This contradiction shows that $q_{est} \ge q$ and hence $q = \mathrm{TRUNC}(q_{est})$.

For quotients rounded towards $-\infty$, use (6.1).

If $F = 53$ and $N \le 50$, then (7.1) can be used for N-bit integer division. The algorithm may trigger an IEEE exception for inexactness if the application program enables that condition.

Alverson [1] uses integer multiplication, but computes the multiplier using floating point arithmetic. Baker [3] does modular multiplication using a combination of floating point and integer arithmetic.

8 Dividing udword by uword

One primitive operation for multiple-precision arithmetic [14, p. 251] is the division of a udword by a uword, obtaining uword quotient and remainder, where the quotient is known to be less than $2^N$. We describe a way to compute this quotient and remainder after some preliminary computations involving only the divisor, when the divisor is a run-time invariant expression.

    Initialization (given uword d, where 0 < d < 2^N):
        int l = 1 + floor(log2(d));                /* 2^(l-1) <= d < 2^l */
        uword m' = floor((2^N * (2^l - d) - 1)/d); /* m' = floor((2^(N+l) - 1)/d) - 2^N */
        uword d_norm = SLL(d, N - l);              /* Normalized divisor d * 2^(N-l) */

    For q = floor(n/d) and r = n - q*d, where d, q, r are uword and n is udword:
        uword n2 = SLL(HIGH(n), N - l) + SRL(LOW(n), l);
                                        /* See note about shift count. */
        uword n10 = SLL(LOW(n), N - l); /* n10 = n1 * 2^(N-1) + n0 * 2^(N-l) */
                                        /* Ignore overflow. */
        sword -n1 = XSIGN(n10);
        uword n_adj = n10 + AND(-n1, d_norm - 2^N);
                                        /* n10 + n1 * (d_norm - 2^N) */
                                        /* = n1 * (d_norm - 2^(N-1)) + n0 * 2^(N-l) */
        uword q1 = n2 + HIGH(m' * (n2 - (-n1)) + n_adj);
                                        /* Underflow is impossible. */
                                        /* See Lemma 8.1. */
        sdword dr = n - 2^N * d + (2^N - 1 - q1) * d;
                                        /* dr = n - q1*d - d, -d <= dr < d */
        q = HIGH(dr) - (2^N - 1 - q1) + 2^N;
                                        /* Add 1 to quotient if dr >= 0. */
        r = LOW(dr) + AND(d - 2^N, HIGH(dr));
                                        /* Add d to remainder if dr < 0. */

Figure 8.1: Unsigned division of udword by run-time invariant uword

Lemma 8.1 Suppose that $d$, $m$, and $l$ are nonnegative integers such that $2^{l-1} \le d < 2^l \le 2^N$ and

\[ 0 < 2^{N+l} - m * d \le d. \qquad (8.2) \]

Given $n$ with $0 \le n \le d * 2^N - 1$, write $n = n_2 * 2^l + n_1 * 2^{l-1} + n_0$, where $n_0$, $n_1$, and $n_2$ are integers with $0 \le n_1 \le 1$ and $0 \le n_0 \le 2^{l-1} - 1$. Define integers $q_1$ and $q_0$ by

\[ q_1 * 2^N + q_0 = n_2 * 2^N + (n_2 + n_1) * (m - 2^N) + n_1 * (d * 2^{N-l} - 2^{N-1}) + n_0 * 2^{N-l} \qquad (8.3) \]

and $0 \le q_0 \le 2^N - 1$. Then $0 \le q_1 \le 2^N - 1$ and $0 \le n - q_1 * d < 2 * d$.

Proof. Define $k = 2^{N+l} - m * d$. Then (8.2) implies $0 < k \le d \le 2^l - 1$. The bound $n \le d * 2^N - 1$ implies $n_2 \le d * 2^{N-l} - 1$. Equation (8.2) implies $m \ge 2^{N+l}/d - 1 > 2^N - 1$, so $m \ge 2^N$. A corollary to (8.3) is

\[ \begin{aligned} q_1 * 2^N + q_0 &= n_2 * m + n_1 * (m - 2^N) + 2^{N-l} * \left( n_1 * (d - 2^{l-1}) + n_0 \right) \\ &\le (d * 2^{N-l} - 1) * m + 1 * (m - 2^N) + 2^{N-l} * \left( 1 * (2^{l-1} - 1) + (2^{l-1} - 1) \right) \\ &= 2^{N-l} * (d * m - 2) < 2^{2N}. \end{aligned} \]

This proves the upper bound on the integer $q_1$.

A straightforward calculation using the definitions of $k$ and $q_0$ and $n$ reveals that

\[ n - q_1 * d = \frac{(n_2 + n_1) * k + q_0 * d}{2^N} + \left( 1 - \frac{d}{2^l} \right) * \left( n_1 * (d - 2^{l-1}) + n_0 \right). \qquad (8.4) \]

Since $2^{l-1} \le d < 2^l$ by hypothesis, the right side of (8.4) is nonnegative. This remainder is bounded by

\[ \frac{(d * 2^{N-l}) * d + (2^N - 1) * d}{2^N} + \left( 1 - \frac{d}{2^l} \right) * \left( 1 * (d - 2^{l-1}) + (2^{l-1} - 1) \right) < \frac{d^2}{2^l} + d + \left( 1 - \frac{d}{2^l} \right) * d = 2 * d, \]

completing the proof.

This leads to an algorithm like that in Figure 8.1 when dividing a udword by a run-time invariant uword with quotient known to be less than $2^N$. Unlike the previous algorithms, this code rounds the multiplier down when computing a reciprocal. After initializations depending only on the divisor, this algorithm requires two products (both halves of each) and simple operations (including doubleword adds and subtracts). Five registers hold $d$, $d_{norm}$, $l$, $m'$, and $N - l$.

Note. The shift count $l$ in the computations of $m'$ and $n_2$ may equal $N$. If this is too large, use separate shifts by $l - 1$ and 1. If a doubleword shift is available, compute $n_2$ and $n_{10}$ together.
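A possible C rendering of Figure 8.1 for N = 32 is sketched below, with uint64_t playing the role of udword and all wrap-around done in unsigned arithmetic. The names (udiv64by32_t, udiv64by32_init, udiv64by32) are hypothetical. The sketch sidesteps the shift-count issue from the Note by forming n2 with a single 64-bit shift, which is an implementation choice of this sketch rather than something the figure prescribes.

```c
#include <assert.h>
#include <stdint.h>

/* Constants that depend only on the divisor (hypothetical struct name). */
typedef struct {
    uint32_t d;        /* divisor */
    uint32_t d_norm;   /* d * 2^(32 - l), so its top bit is set */
    uint32_t m_prime;  /* floor((2^(32+l) - 1) / d) - 2^32 */
    int l;             /* 1 + floor(log2(d)) */
} udiv64by32_t;

static udiv64by32_t udiv64by32_init(uint32_t d)
{
    assert(d != 0);
    udiv64by32_t c;
    c.d = d;
    c.l = 0;
    while (c.l < 32 && (d >> c.l) != 0)     /* count significant bits of d */
        c.l++;
    c.d_norm = d << (32 - c.l);
    c.m_prime = (uint32_t)((((((uint64_t)1 << c.l) - d) << 32) - 1) / d);
    return c;
}

/* Returns floor(n / d) and stores n mod d in *rem.
   Requires the quotient to fit in 32 bits, i.e. HIGH(n) < d. */
static uint32_t udiv64by32(uint64_t n, udiv64by32_t c, uint32_t *rem)
{
    assert((uint32_t)(n >> 32) < c.d);
    uint32_t n2 = (uint32_t)(n >> c.l);             /* fits since n < d * 2^32 */
    uint32_t n10 = (uint32_t)n << (32 - c.l);       /* overflow ignored */
    uint32_t neg_n1 = (uint32_t)0 - (n10 >> 31);    /* XSIGN(n10) as a mask */
    uint32_t n_adj = n10 + (neg_n1 & c.d_norm);     /* d_norm - 2^32 == d_norm mod 2^32 */
    uint32_t q1 = n2 + (uint32_t)(((uint64_t)c.m_prime * (n2 - neg_n1) + n_adj) >> 32);
    uint32_t not_q1 = ~q1;                          /* 2^32 - 1 - q1 */
    /* dr = n - q1*d - d modulo 2^64; as a signed value, -d <= dr < d. */
    uint64_t dr = n - ((uint64_t)c.d << 32) + (uint64_t)not_q1 * c.d;
    uint32_t dr_hi = (uint32_t)(dr >> 32);          /* 0 if dr >= 0, all ones if dr < 0 */
    *rem = (uint32_t)dr + (c.d & dr_hi);            /* add d back if dr < 0 */
    return dr_hi - not_q1;                          /* q1 + 1 if dr >= 0, else q1 */
}
```

For example, with d = 3 and n = 10 the sketch gives q1 = 3 and dr = -2, so the final correction leaves the quotient at 3 and adds d to the remainder, yielding 1.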
9 Exact division by constants

Occasionally a language construct requires a division whose remainder is known to vanish. An example occurs in C when subtracting two pointers. Their numerical difference is divided by the object size. The object size is a compile-time constant.

Suppose we want code for $q = n/d$, where $d$ is a nonzero constant and $n$ is an expression known to be divisible by $d$. Write $d = 2^e * d_{odd}$ where $d_{odd}$ is odd. Find $d_{inv}$ such that $1 \le d_{inv} \le 2^N - 1$ and

\[ d_{inv} * d_{odd} \equiv 1 \pmod{2^N}. \qquad (9.1) \]

Then

\[ 2^e * q = 2^e * \frac{n}{d} = \frac{n}{d_{odd}} \equiv (d_{inv} * d_{odd}) * \frac{n}{d_{odd}} = d_{inv} * n \pmod{2^N}, \]

as in [2]. Hence $2^e * q \equiv d_{inv} * n \pmod{2^N}$. Since $n/d_{odd} = 2^e * q$ fits in N bits, it must equal the lower half of the product $d_{inv} * n$, namely MULL($d_{inv}$, $n$). An SRA (for signed division) or SRL (for unsigned division) produces the quotient $q$.

The multiplicative inverse $d_{inv}$ of $d_{odd}$ modulo $2^N$ can be found by the extended Euclidean GCD algorithm [14, p. 325]. Another algorithm observes that (9.1) holds modulo $2^3$ if $d_{inv} = d_{odd}$. Each Newton iteration

\[ d_{inv} \leftarrow d_{inv} * (2 - d_{inv} * d_{odd}) \bmod 2^N \qquad (9.2) \]

doubles the known exponent by which (9.1) holds, so $\lceil \log_2(N/3) \rceil$ iterations of (9.2) suffice.

If $d_{odd} = \pm 1$, then $d_{inv} = d_{odd}$ so the multiplication by $d_{inv}$ is trivial or a negation. If $d$ is odd, then $e = 0$ and the shift disappears.

A variation tests whether an integer $n$ is exactly divisible by a nonzero constant $d$ without computing the remainder. If $d$ is a power of 2 (or the negative thereof, in the signed case), then check the lower bits of $n$ to test whether $d$ divides $n$. Otherwise compute $d_{inv}$ and $e$ as above. Let $q_0 = \mathrm{MULL}(d_{inv}, n)$. If $n = q * d$ for some $q$, then $q_0 = 2^e * q$ must be a multiple of $2^e$. The original division is exact (no remainder) precisely when (i) $q_0$ is a multiple of $2^e$, and (ii) $q_0$ is sufficiently small that $q_0 * d_{odd}$ is representable by the original data type.

For unsigned division check that $0 \le q_0 \le 2^e * \lfloor (2^N - 1)/d \rfloor$ and that the bottom $e$ bits of $q_0$ (or of $n$) are zero.
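The inverse computation (9.2), the exact division, and the unsigned divisibility test can be sketched in C for N = 32 as follows. The function names are hypothetical, and in a compiler the decomposition of d, the inverse, and the bound $\lfloor (2^{32} - 1)/d \rfloor$ would all be folded into constants rather than computed on each call as they are here. The range test and the low-bits test are merged by rotating $q_0$ right by $e$ bits: a nonzero low bit lands in the high part of the word and pushes the value above the bound.

```c
#include <assert.h>
#include <stdint.h>

/* Inverse of an odd d_odd modulo 2^32 by the Newton iteration (9.2). */
static uint32_t inverse_mod_2_32(uint32_t d_odd)
{
    assert(d_odd & 1);
    uint32_t inv = d_odd;              /* (9.1) already holds modulo 2^3 */
    for (int i = 0; i < 4; i++)        /* 3 -> 6 -> 12 -> 24 -> 48 bits */
        inv *= 2u - inv * d_odd;
    return inv;
}

/* Exponent e with d = 2^e * d_odd, d_odd odd. */
static int trailing_zeros(uint32_t d)
{
    assert(d != 0);
    int e = 0;
    while (((d >> e) & 1) == 0)
        e++;
    return e;
}

/* Exact unsigned division: the caller guarantees that d divides n. */
static uint32_t exact_udiv32(uint32_t n, uint32_t d)
{
    int e = trailing_zeros(d);
    return (inverse_mod_2_32(d >> e) * n) >> e;     /* SRL(MULL(d_inv, n), e) */
}

/* Returns 1 if d divides n, without computing a remainder of n. */
static int is_multiple_u32(uint32_t n, uint32_t d)
{
    int e = trailing_zeros(d);
    uint32_t q0 = inverse_mod_2_32(d >> e) * n;     /* MULL(d_inv, n) */
    uint32_t rot = e ? (q0 >> e) | (q0 << (32 - e)) : q0;  /* rotate right by e */
    return rot <= 0xFFFFFFFFu / d;                  /* floor((2^32 - 1)/d), a constant */
}
```

For d = 100 the odd part is 25 and its inverse modulo $2^{32}$ is 0xC28F5C29, so exact_udiv32(700, 100) multiplies 700 by that inverse to obtain 28 and then shifts right by 2.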
When $e > 0$, the range test and the low-bits test can be combined if the architecture has a rotate (i.e., circular shift) instruction, or by expanding this rotate into OR(SRL($q_0$, $e$), SLL($q_0$, $N - e$)).

For signed division check that

\[ -2^e * \left\lfloor \frac{2^{N-1}}{|d|} \right\rfloor \le q_0 \le 2^e * \left\lfloor \frac{2^{N-1} - 1}{|d|} \right\rfloor \]

and that the bottom $e$ bits of $q_0$ are zero; the interval check can be done with an add and one signed or unsigned compare.

Relatedly, to test whether $n \;\mathrm{rem}\; d = r$, where $d$ and $r$ are constants with $1 \le r < d$ and where $n$ is signed, check whether MULL($d_{inv}$, $n - r$) is a nonnegative multiple of $2^e$ not exceeding $2^e * \lfloor (2^{N-1} - 1 - r)/d \rfloor$.

Example. To test whether a signed 32-bit value $i$ is divisible by 100, let $d_{inv} = (19 * 2^{32} + 1)/25$. Compute sword $q_0$ = MULL($d_{inv}$, $i$). Next check whether $q_0$ is a multiple of 4 in the interval $[-q_{max}, q_{max}]$, where $q_{max} = (2^{31} - 48)/25$.

Since these algorithms require only the lower half of a product, other optimizations for integer multiplication apply here too. For example, applying strength reduction to the C loop

    signed long i, imax;
    for (i = 0; i < imax; i++) {
        if ((i % 100) == 0) {
            ...
        }
    }

might yield (** denotes exponentiation)

    const unsigned long dinv = (19*2**32 + 1)/25;
    const unsigned long qmax = (2**31 - 48)/25;
    unsigned long test = qmax;   /* test = dinv*i + qmax mod 2**32 */
    for (i = 0; i < imax; i++, test += dinv) {
        if (test <= 2*qmax && (test & 3) == 0) {
            ...
        }
    }

No explicit multiplication or division remains.

10 Implementation in GCC

We have implemented the algorithms for constant divisors in the freely available GCC compiler [21], by extending its machine and language independent internal code generation. We also made minor machine dependent modifications to some of the machine descriptors, or md files, to get optimal code. All languages and almost all processors supported by GCC benefit. Our changes are scheduled for inclusion in GCC 2.6.
To generate code for division of N-bit quantities, the CHOOSE_MULTIPLIER function needs to perform (2N)-bit arithmetic. This makes that procedure more complex than it might appear in Figure 6.2.

Optimal selection of instructions depending on the bitsize of the operation is a tricky problem that we spent quite some time on. For some architectures, it is important to select a multiplication instruction that has the smallest available precision. On other architectures, the multiplication can be performed faster using a sequence of additions, subtractions, and shifts.

We have not implemented any algorithm for run-time invariant divisors. Only a few architectures (AMD 29050, Intel x86, Motorola 68k & 88110, and to some extent IBM POWER) have adequate hardware support to make such an implementation viable, i.e., an instruction that can be used for integer logarithm computation, and a (2N)-bit/N-bit divide instruction. Even with hardware support, one must be careful that the transformation really improves the code; e.g., a loop might need to be executed many times before the faster loop body outweighs the cost of the multiplier computation in the loop header.

11 Results

Figure 11.1 has an example with compile-time constant divisor that gets drastically faster on all recent processor implementations. The program converts a binary number to a decimal string. It calculates one quotient and one remainder per output digit.

Table 11.1 shows the generated assembler codes for Alpha, MIPS, POWER, and SPARC. There is no explicit division. Although initially computed separately, the quotient and remainder calculations have been combined (by GCC's common subexpression elimination pass).

The unsigned int data type has 32 bits on all four architectures, but Alpha is a 64-bit architecture. The Alpha code is longer than the others because it multiplies $(2^{34} + 1)/5$ by $x$ using

\[ 4 * \left[ (2^{16} + 1) * (2^8 + 1) * \left( 4 * \left[ 4 * (4 * x - x) + x \right] - x \right) \right] + x \]

instead of the slower, 23-cycle, mulq.
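The shift-and-add expansion used in the Alpha code can be checked with a short C sketch. The function name div10_shift_add is hypothetical, and the comments map each step to the Alpha instruction it appears to correspond to in Table 11.1; the sketch assumes a 32-bit input held in a 64-bit register, as on Alpha.

```c
#include <assert.h>
#include <stdint.h>

/* floor(x / 10) for a 32-bit unsigned x, with the multiplication by
   (2^34 + 1)/5 = 3435973837 expanded into shifts, adds and subtracts. */
static uint32_t div10_shift_add(uint32_t x32)
{
    uint64_t x = x32;
    uint64_t t = 4 * x - x;      /* 3x                  (s4subq) */
    t = 4 * t + x;               /* 13x                 (s4addq) */
    t = 4 * t - x;               /* 51x                 (s4subq) */
    t += t << 8;                 /* 51 * 257 x = 13107x (sll, addq) */
    t += t << 16;                /* 13107 * 65537 x = 858993459x */
    t = 4 * t + x;               /* 3435973837x         (s4addq) */
    return (uint32_t)(t >> 35);  /* upper half, then the post-shift by 3 */
}
```

The final shift by 35 combines taking the upper 32 bits of the product with the post-shift of 3 from the n/10 example in §4. The product stays below $2^{64}$ because both factors are below $2^{32}$.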
This illustrates that the multiplications needed by these algorithms can sometimes be computed quickly using a sequence of shifts, adds, and subtracts [5], since multipliers for small constant divisors have regular binary patterns.

Table 11.2 compares the timing on some processor implementations for the radix conversion routine, with and without the division elimination algorithms. The number converted was a full 32-bit number, sufficiently large to hide procedure calling overhead from the measurements.

We also ran the integer benchmarks from SPEC 92. The improvement was negligible for most of the programs; the best improvement seen was only about 3%. Some benchmarks that involve hashing show improvements up to about 30%. We anticipate significant improvements on some number theoretic codes.

References

[1] Robert Alverson. Integer division using reciprocals. In Peter Kornerup and David W. Matula, editors, Proceedings 10th Symposium on Computer Arithmetic, pages , Grenoble, France, June
[2] Ehud Artzy, James A. Hinds, and Harry J. Saal. A fast division technique for constant divisors. CACM, 19(2):98-101, February
[3] Henry G. Baker. Computing A*B (mod N) efficiently in ANSI C. ACM SIGPLAN Notices, 27(1):95-98, January
[4] H.B. Bakoglu, G.F. Grohoski, and R. K. Montoye. The IBM RISC System/6000 processor: Hardware overview. IBM Journal of Research and Development, 34(1):12-22, January
[5] Robert Bernstein. Multiplication by integer constants. Software Practice and Experience, 16(7): , July
[6] Raymond T. Boute. The Euclidean definition of the functions div and mod. ACM Transactions on Programming Languages and Systems, 14(2): , April
[7] A.P. Chang. A note on the modulo operation. SIGPLAN Notices, 20(4):19-23, April
[8] Digital Equipment Corporation. DECchip AA Microprocessor, Hardware Reference Manual, 1st edition, October
[9] Intel Corporation, Santa Clara, CA. 386 DX Microprocessor Programmer's Reference Manual,
[10] Intel Corporation, Santa Clara, CA. Intel486 Microprocessor Family Programmer's Reference Manual,
[11] David H. Jacobsohn. A combinatoric division algorithm for fixed-integer divisors. IEEE Trans. Comp., C-22(6): , June
[12] Gerry Kane. MIPS RISC Architecture. Prentice Hall, Englewood Cliffs, NJ, 1989.
    #define BUFSIZE 50

    char *decimal (unsigned int x)
    {
      static char buf[BUFSIZE];
      char *bp = buf + BUFSIZE - 1;
      *bp = 0;
      do {
        *--bp = '0' + x % 10;
        x /= 10;
      } while (x != 0);
      return bp; /* Return pointer to first digit */
    }

Figure 11.1: Radix conversion code

Alpha:
        lda     $2,buf
        ldq_u   $1,49($2)
        addq    $2,49,$0
        mskbl   $1,$0,$1
        stq_u   $1,49($2)
    L1: zapnot  $16,15,$3
        s4subq  $3,$3,$2
        s4addq  $2,$3,$2
        s4subq  $2,$3,$2
        sll     $2,8,$1
        subq    $0,1,$0
        addq    $2,$1,$2
        sll     $2,16,$1
        ldq_u   $4,0($0)
        addq    $2,$1,$2
        s4addq  $2,$3,$2
        srl     $2,35,$2
        mskbl   $4,$0,$4
        s4addl  $2,$2,$1
        addq    $1,$1,$1
        subl    $16,$1,$1
        addl    $1,48,$1
        insbl   $1,$0,$1
        bis     $2,$2,$16
        bis     $1,$4,$1
        stq_u   $1,0($0)
        bne     $16,L1
        ret     $31,($26),1

MIPS:
        la      $5,buf+49
        sb      $0,0($5)
        li      $6,0xcccc0000
        ori     $6,$6,0xcccd
    L1: multu   $4,$6
        mfhi    $3
        subu    $5,$5,1
        srl     $3,$3,3
        sll     $2,$3,2
        addu    $2,$2,$3
        sll     $2,$2,1
        subu    $2,$4,$2
        addu    $2,$2,48
        move    $4,$3
        bne     $4,$0,L1
        sb      $2,0($5)
        j       $31
        move    $2,$5

POWER:
        l       10,LC..0(2)
        cau     11,0,0xcccc
        oril    11,11,0xcccd
        cal     0,0(0)
        stb     0,0(10)
    L1: mul     9,3,11
        srai    0,3,31
        and     0,0,11
        a       9,9,0
        a       9,9,3
        sri     9,9,3
        muli    0,9,10
        sf      0,0,3
        ai.     3,9,0
        ai      0,0,48
        stbu    0,-1(10)
        bc      4,2,L1
        ai      3,10,0
        br

SPARC:
        sethi   %hi(buf+49),%g2
        or      %g2,%lo(buf+49),%o1
        stb     %g0,[%o1]
        sethi   %hi(0xcccccccd),%g2
        or      %g2,0xcd,%o2
    L1: add     %o1,-1,%o1
        umul    %o0,%o2,%g0
        rd      %y,%g3
        srl     %g3,3,%g3
        sll     %g3,2,%g2
        add     %g2,%g3,%g2
        sll     %g2,1,%g2
        sub     %o0,%g2,%g2
        add     %g2,48,%g2
        orcc    %g3,%g0,%o0
        bne     L1
        stb     %g2,[%o1]
        retl
        mov     %o1,%o0

Table 11.1: Code generated by our GCC for radix conversion
Columns: Architecture/Implementation; MHz; Time with division performed; Time with division eliminated; Speedup ratio

    Motorola MC68020 [18, pp. 9-22]
    Motorola MC
    SPARC Viking [20]
    HP PA
    MIPS R3000 [12]
    MIPS R4000 [17]
    POWER/RIOS I [4, 22]
    DEC Alpha [8] *

*This time difference is artificial. The Alpha architecture has no integer divide instruction, and the DEC library functions for division are slow.

Table 11.2: Timing (microseconds) for radix conversion with and without division elimination

[13] Donald E. Knuth. An empirical study of FORTRAN programs. Technical Report CS 186, Computer Science Department, Stanford University, Stanford artificial intelligence project memo AIM 137.
[14] Donald E. Knuth. Seminumerical Algorithms, volume 2 of The Art of Computer Programming. Addison-Wesley, Reading, MA, 2nd edition,
[15] Shuo-Yen Robert Li. Fast constant division routines. IEEE Trans. Comp., C-34(9): , September
[16] Daniel J. Magenheimer, Liz Peters, Karl Pettis, and Dan Zuras. Integer multiplication and division on the HP Precision Architecture. In Proceedings Second International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS II). ACM, Published as SIGPLAN Notices, Volume 22, No. 10, October,
[17] MIPS Computer Systems, Inc, Sunnyvale, CA. MIPS R4000 Microprocessor User's Manual,
[18] Motorola, Inc. MC Bit Microprocessor User's Manual, 2nd edition,
[19] Motorola, Inc. PowerPC 601 RISC Microprocessor User's Manual,
[20] SPARC International, Inc., Menlo Park, CA. The SPARC Architecture Manual, Version 8,
[21] Richard M. Stallman. Using and Porting GCC. The Free Software Foundation, Cambridge, MA,
[22] Henry Warren. Predicting Execution Time on the IBM RISC System/6000. IBM, Preliminary Version.