Division by Invariant Integers using Multiplication
Torbjörn Granlund
Cygnus Support, 1937 Landings Drive, Mountain View, CA

Peter L. Montgomery
Centrum voor Wiskunde en Informatica, 780 Las Colinas Road, San Rafael, CA

Work done by first author while at Swedish Institute of Computer Science, Stockholm, Sweden. Work done by second author while at University of California, Los Angeles. Supported by U.S. Army fellowship DAAL03-89-G.

Abstract

Integer division remains expensive on today's processors as the cost of integer multiplication declines. We present code sequences for division by arbitrary nonzero integer constants and run-time invariants using integer multiplication. The algorithms assume a two's complement architecture. Most also require that the upper half of an integer product be quickly accessible. We treat unsigned division, signed division where the quotient rounds towards zero, signed division where the quotient rounds towards $-\infty$, and division where the result is known a priori to be exact. We give some implementation results using the C compiler GCC.

1 Introduction

The cost of an integer division on today's RISC processors is several times that of an integer multiplication. The trend is towards fast, often pipelined combinatoric multipliers that perform an operation in typically less than 10 cycles, with either no hardware support for integer division or iterating dividers that are several times slower than the multiplier. Table 1.1 compares multiplication and division times on some processors. This table illustrates that the discrepancy between multiplication and division timing has been growing.

Integer division is used heavily in base conversions, number theoretic codes, and graphics codes. Compilers generate integer divisions to compute loop counts and subtract pointers. In a static analysis of FORTRAN programs, Knuth [13, p. 9] reports that 39% of arithmetic operators were additions, 22% subtractions, 27% multiplications, 10% divisions, and 2% exponentiations.
Knuth's counts do not distinguish integer and floating point operations, except that 4% of the divisions were divisions by 2.

When integer multiplication is cheaper than integer division, it is beneficial to substitute a multiplication for a division. Multiple authors [2, 11, 15] present algorithms for division by constants, but only when the divisor divides $2^k - 1$ for some small $k$. Magenheimer et al. [16, §7] give the foundation of a more general approach, which Alverson [1] implements on the Tera Computing System. Compiler writers are only beginning to become aware of the general technique. For example, version 1.02 of the IBM RS/6000 xlc and xlf compilers uses the integer multiply instruction to expand signed integer divisions by 3, 5, 7, 9, 25, and 125, but not by other odd integer divisors below 256, and never for unsigned division.

We assume an $N$-bit two's complement architecture. Unsigned (i.e., nonnegative) integers range from $0$ to $2^N - 1$ inclusive; signed integers range from $-2^{N-1}$ to $2^{N-1} - 1$. We denote these integers by uword and sword respectively. Unsigned doubleword integers (range $0$ to $2^{2N} - 1$) are denoted by udword. Signed doubleword integers (range $-2^{2N-1}$ to $2^{2N-1} - 1$) are denoted by sdword. The type int is used for shift counts and logarithms.

Several of the algorithms require the upper half of an integer product obtained by multiplying two uwords or two swords. All algorithms need simple operations such as adds, shifts, and bitwise operations (bit-ops) on uwords and swords, as summarized in Table 3.1.

We show how to use these operations to divide by arbitrary nonzero constants, as well as by divisors which are loop invariant or repeated in a basic block, using one multiplication plus a few simple instructions per division. The presentation concentrates on three types of
division, in order by difficulty: (i) unsigned, (ii) signed, quotient rounded towards zero, (iii) signed, quotient rounded towards $-\infty$. Other topics are division of a udword by a run-time invariant uword, division when the remainder is known a priori to be zero, and testing for a given remainder. In each case we give the mathematical background and suggest an algorithm which a compiler can use to generate the code.

The algorithms are ineffective when a divisor is not invariant, such as in the Euclidean GCD algorithm. Most algorithms presented herein yield only the quotient. The remainder, if desired, can be computed by an additional multiplication and subtraction.

We have implemented the algorithms in a developmental version of the GCC 2.6 compiler [21]. DEC uses some of these algorithms in its Alpha AXP compilers.

Table 1.1: Multiplication and division times on different CPUs

Architecture/Implementation | N | Approx. year | Time (cycles) for HIGH(N-bit * N-bit) | Time (cycles) for N-bit/N-bit divide (unsigned) (signed)
Motorola MC68020 [18, pp. 9-22] | | | |
Motorola MC | | | |
Intel 386 [9] | | | |
Intel 486 [10] | | | |
Intel Pentium | | | |
SPARC Cypress CY7C | | | S | 100 S
SPARC Viking [20] | | | |
HP PA 83 [16] | | | S | 70 S
HP PA | | | FP | 70 S
MIPS R3000 [12] | | | P | 35 P
MIPS R4000 [17] | | | P | 139
POWER/RIOS I [4, 22] | | | (signed only) | 19 (signed only)
PowerPC/MPC601 [19] | | | |
DEC Alpha 21064AA [8] | | | P | 200 S
Motorola MC | | | S | 38
Motorola MC | | | P | 18

S: No direct hardware support; approximate cycle count for software implementation
F: Does not include time for moving data to/from floating point registers
P: Pipelined implementation (i.e., independent instructions can execute simultaneously)

2 Mathematical notations

Let $x$ be a real number. Then $\lfloor x \rfloor$ denotes the largest integer not exceeding $x$ and $\lceil x \rceil$ denotes the least integer not less than $x$. Let TRUNC($x$) denote the integer part of $x$, rounded towards zero. Formally, TRUNC($x$) $= \lfloor x \rfloor$ if $x \ge 0$ and TRUNC($x$) $= \lceil x \rceil$ if $x < 0$. The absolute value of $x$ is $|x|$. For $x > 0$, the (real) base 2 logarithm of $x$ is $\log_2 x$. A multiplication is written $x * y$.
If $x$, $y$, and $n$ are integers and $n \ne 0$, then $x \equiv y \pmod{n}$ means $x - y$ is a multiple of $n$.

Two remainder operators are common in language definitions. Sometimes a remainder has the sign of the dividend and sometimes the sign of the divisor. We use the Ada notations

$$n \text{ rem } d = n - d * \mathrm{TRUNC}(n/d) \quad \text{(sign of dividend)},$$
$$n \text{ mod } d = n - d * \lfloor n/d \rfloor \quad \text{(sign of divisor)}. \qquad (2.1)$$

The Fortran 90 names are MOD and MODULO. In C, the definition of remainder is implementation dependent (many C implementations round signed quotients towards zero and use rem remaindering). Other definitions have been proposed [6, 7].

If $n$ is a udword or sdword, then HIGH($n$) and LOW($n$) denote the most significant and least significant halves of $n$. LOW($n$) is a uword, while HIGH($n$) is a uword if $n$ is a udword and an sword if $n$ is an sdword. In both cases $n = 2^N * \mathrm{HIGH}(n) + \mathrm{LOW}(n)$.

3 Assumed instructions

The suggested code assumes the operations in Table 3.1, on an $N$-bit machine. Some primitives, such as loading constants and operands, are implicit in the notation and are not included in the operation counts.
Table 3.1: Mathematical notations and primitive operations

    TRUNC(x)            Truncation towards zero; see §2.
    HIGH(x), LOW(x)     Upper and lower halves of x; see §2.
    MULL(x, y)          Lower half of product x * y (i.e., product modulo 2^N).
    MULSH(x, y)         Upper half of signed product x * y: if -2^(N-1) ≤ x, y ≤ 2^(N-1) - 1,
                        then x * y = 2^N * MULSH(x, y) + MULL(x, y).
    MULUH(x, y)         Upper half of unsigned product x * y: if 0 ≤ x, y ≤ 2^N - 1,
                        then x * y = 2^N * MULUH(x, y) + MULL(x, y).
    AND(x, y)           Bitwise AND of x and y.
    EOR(x, y)           Bitwise exclusive OR of x and y.
    NOT(x)              Bitwise complement of x. Equal to -1 - x if x is signed,
                        to 2^N - 1 - x if x is unsigned.
    OR(x, y)            Bitwise OR of x and y.
    SLL(x, n)           Logical left shift of x by n bits (0 ≤ n ≤ N - 1).
    SRA(x, n)           Arithmetic right shift of x by n bits (0 ≤ n ≤ N - 1).
    SRL(x, n)           Logical right shift of x by n bits (0 ≤ n ≤ N - 1).
    XSIGN(x)            -1 if x < 0; 0 if x ≥ 0. Short for SRA(x, N - 1) or -SRL(x, N - 1).
    x + y, x - y, -x    Two's complement addition, subtraction, negation.

The algorithm in §8 requires the ability to add or subtract two doublewords, obtaining a doubleword result; this typically expands into 2-4 instructions.

The algorithms for processing constant divisors require compile-time arithmetic on udwords. Algorithms for processing run-time invariant divisors require taking the base 2 logarithm of a positive integer (sometimes rounded up, sometimes down) and require dividing a udword by a uword. If the algorithms are used only for constant divisors, then these operations are needed only at compile time. If the architecture has a leading zero count (LDZ) instruction, then these logarithms can be found from

$$\lceil \log_2 x \rceil = N - \mathrm{LDZ}(x - 1), \qquad \lfloor \log_2 x \rfloor = N - 1 - \mathrm{LDZ}(x) \qquad (1 \le x \le 2^N - 1).$$

Some algorithms may produce expressions such as SRL($x$, 0) or $-(x - y)$; the optimizer should make the obvious simplifications. Some descriptions show an addition or subtraction of $2^N$, which is a no-op.

If an architecture lacks arithmetic right shift, then it can be computed from the identity

$$\mathrm{SRA}(x, l) = \mathrm{SRL}(x + 2^{N-1}, l) - 2^{N-1-l}$$

whenever $0 \le l \le N - 1$.
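As an illustration of the primitives above, the following is a minimal C sketch for the assumed word size N = 32, using 64-bit integers as the doubleword type. The function names (`muluh`, `mulsh`, `xsign`, `sra_from_srl`, `floor_log2`, `ceil_log2`) are hypothetical and chosen only for this example; the sketch also assumes that right-shifting a negative signed value is an arithmetic shift, which C leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

#define N 32  /* assumed word size for this sketch */

/* MULUH: upper half of the unsigned N x N -> 2N bit product. */
uint32_t muluh(uint32_t x, uint32_t y) {
    return (uint32_t)(((uint64_t)x * y) >> N);
}

/* MULSH: upper half of the signed product.
   Assumes >> on a negative int64_t is an arithmetic shift. */
int32_t mulsh(int32_t x, int32_t y) {
    return (int32_t)(((int64_t)x * y) >> N);
}

/* XSIGN: -1 if x < 0, else 0. */
int32_t xsign(int32_t x) {
    return -(int32_t)(x < 0);
}

/* SRA built from SRL via SRA(x, l) = SRL(x + 2^(N-1), l) - 2^(N-1-l). */
int32_t sra_from_srl(int32_t x, int l) {
    assert(0 <= l && l <= N - 1);
    uint32_t biased = (uint32_t)x + 0x80000000u;   /* x + 2^(N-1), mod 2^N */
    return (int32_t)((biased >> l) - (0x80000000u >> l));
}

/* floor(log2 x) for 1 <= x <= 2^N - 1; a loop stands in for LDZ. */
int floor_log2(uint32_t x) {
    assert(x != 0);
    int l = 0;
    while (x >>= 1)
        l++;
    return l;
}

/* ceil(log2 x) for 1 <= x <= 2^N - 1. */
int ceil_log2(uint32_t x) {
    assert(x != 0);
    return x == 1 ? 0 : floor_log2(x - 1) + 1;
}
```

On a target with a leading zero count instruction, the two logarithm loops would collapse to the LDZ formulas given above.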
If an architecture has only one of MULSH and MULUH, then the other can be computed using

$$\mathrm{MULUH}(x, y) = \mathrm{MULSH}(x, y) + \mathrm{AND}(x, \mathrm{XSIGN}(y)) + \mathrm{AND}(y, \mathrm{XSIGN}(x))$$

for arbitrary $N$-bit patterns $x$, $y$ (interpreted as uwords for MULUH and as swords for MULSH).

4 Unsigned division

Suppose we want to compile an unsigned division $q = \lfloor n/d \rfloor$, where $0 < d < 2^N$ is a constant or run-time invariant and $0 \le n < 2^N$ is variable. Let's try to find a rational approximation $m/2^{N+l}$ of $1/d$ such that

$$\left\lfloor \frac{n}{d} \right\rfloor = \left\lfloor \frac{m * n}{2^{N+l}} \right\rfloor \quad \text{whenever } 0 \le n \le 2^N - 1. \qquad (4.1)$$

Setting $n = d$ in (4.1) shows we require $2^{N+l} \le m * d$. Setting $n = q * d - 1$ shows $2^{N+l} * q > m * (q * d - 1)$. Multiply by $d$ to derive $(m * d - 2^{N+l}) * (q * d - 1) < 2^{N+l}$. This inequality will hold for all values of $q * d - 1$ below $2^N$ if $m * d - 2^{N+l} \le 2^l$. Theorem 4.2 below states that these conditions are sufficient, because the maximum relative error (1 part in $2^N$) is too small to affect the quotient when $n < 2^N$.

Theorem 4.2 Suppose $m$, $d$, $l$ are nonnegative integers such that $d \ne 0$ and

$$2^{N+l} \le m * d \le 2^{N+l} + 2^l. \qquad (4.3)$$

Then $\lfloor n/d \rfloor = \lfloor m * n / 2^{N+l} \rfloor$ for every integer $n$ with $0 \le n < 2^N$.

Proof. Define $k = m * d - 2^{N+l}$. Then $0 \le k \le 2^l$ by hypothesis. Given $n$ with $0 \le n < 2^N$, write $n = q * d + r$ where $q = \lfloor n/d \rfloor$ and $0 \le r \le d - 1$. We must show that $q = \lfloor m * n / 2^{N+l} \rfloor$. A calculation gives

$$\frac{m * n}{2^{N+l}} - q = \frac{k + 2^{N+l}}{d} * \frac{n}{2^{N+l}} - q = \frac{k * n}{2^{N+l} * d} + \frac{n}{d} - \frac{n - r}{d} = \frac{k}{2^l} * \frac{n}{2^N} * \frac{1}{d} + \frac{r}{d}. \qquad (4.4)$$
This difference is nonnegative and does not exceed

$$\frac{1}{d} * \frac{2^N - 1}{2^N} + \frac{d - 1}{d} = 1 - \frac{1}{2^N * d} < 1.$$

Theorem 4.2 allows division by $d$ to be replaced with multiplication by $m/2^{N+l}$ if (4.3) holds. In general we require $2^l \ge d - 1$ to ensure that a suitable multiple of $d$ exists in the interval $[2^{N+l}, 2^{N+l} + 2^l]$. For compatibility with the algorithms for signed division (§5 and §6), it is convenient to choose $m * d > 2^{N+l}$ even though Theorem 4.2 permits equality. Since $m$ can be almost as large as $2^{N+1}$, we don't multiply by $m$ directly, but instead by $2^N$ and $m - 2^N$. This leads to the code in Figure 4.1. Its cost is 1 multiply, 2 adds/subtracts, and 2 shifts per quotient, after computing constants dependent only on the divisor.

    Initialization (given uword d with 1 ≤ d < 2^N):
        int l = ⌈log2 d⌉;                        /* d ≤ 2^l ≤ 2 * d - 1 */
        uword m' = ⌊2^N * (2^l - d)/d⌋ + 1;      /* m' = ⌊2^(N+l)/d⌋ - 2^N + 1 */
        int sh1 = min(l, 1);
        int sh2 = max(l - 1, 0);                 /* sh2 = l - sh1 */
    For q = n/d, all uword:
        uword t1 = MULUH(m', n);
        q = SRL(t1 + SRL(n - t1, sh1), sh2);

Figure 4.1: Unsigned division by run-time invariant divisor

Explanation of Figure 4.1. If $d = 1$, then $l = 0$, so $m' = 1$ and $sh_1 = sh_2 = 0$. The code computes $t_1 = \lfloor 1 * n / 2^N \rfloor = 0$ and $q = n$.

If $d > 1$, then $l \ge 1$, so $sh_1 = 1$ and $sh_2 = l - 1$. Since

$$m' \le \frac{2^N * (2^l - d)}{d} + 1 \le \frac{2^N * (d - 1)}{d} + 1 < 2^N,$$

the value of $m'$ fits in a uword. Since $0 \le t_1 \le n$, the formula for $q$ simplifies to

$$q = \mathrm{SRL}(t_1 + \mathrm{SRL}(n - t_1, 1), l - 1) = \left\lfloor \frac{t_1 + \lfloor (n - t_1)/2 \rfloor}{2^{l-1}} \right\rfloor = \left\lfloor \frac{\lfloor (t_1 + n)/2 \rfloor}{2^{l-1}} \right\rfloor = \left\lfloor \frac{t_1 + n}{2^l} \right\rfloor. \qquad (4.5)$$

But $t_1 + n = \lfloor m' * n / 2^N \rfloor + n = \lfloor (m' + 2^N) * n / 2^N \rfloor$. Set $m = m' + 2^N = \lfloor 2^{N+l}/d \rfloor + 1$. The hypothesis of Theorem 4.2 is satisfied since $2^{N+l} < m * d \le 2^{N+l} + d \le 2^{N+l} + 2^l$.

Caution. Conceptually $q$ is SRL($n + t_1$, $l$), as in (4.5). Do not compute $q$ this way, since $n + t_1$ may overflow $N$ bits and the shift count may be out of bounds.

Improvement. If $d$ is constant and a power of 2, replace the division by a shift.

Improvement. If $d$ is constant and $m = m' + 2^N$ is even, then reduce $m/2^l$ to lowest terms. The reduced multiplier fits in $N$ bits, unlike the original.
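The run-time invariant sequence of Figure 4.1 can be sketched in C as follows, assuming N = 32 and a 64-bit type for the doubleword arithmetic in the initialization. The struct and function names (`udiv_magic`, `udiv_init`, `udiv`) are hypothetical; this is an illustrative transcription of the figure, not the authors' implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Constants that depend only on the divisor (Figure 4.1, N = 32). */
typedef struct {
    uint32_t m;    /* m' = floor(2^32 * (2^l - d) / d) + 1 */
    int sh1, sh2;  /* sh1 = min(l, 1), sh2 = max(l - 1, 0) */
} udiv_magic;

udiv_magic udiv_init(uint32_t d) {
    assert(d != 0);
    int l = 0;
    while (((uint64_t)1 << l) < d)   /* l = ceil(log2 d), at most 32 */
        l++;
    udiv_magic k;
    /* 2^l - d < 2^31, so the 64-bit numerator cannot overflow. */
    k.m = (uint32_t)((((uint64_t)1 << 32) * (((uint64_t)1 << l) - d)) / d + 1);
    k.sh1 = l < 1 ? l : 1;
    k.sh2 = l > 1 ? l - 1 : 0;
    return k;
}

/* q = floor(n / d) with one multiply, two adds/subtracts, two shifts. */
uint32_t udiv(uint32_t n, udiv_magic k) {
    uint32_t t1 = (uint32_t)(((uint64_t)k.m * n) >> 32);   /* MULUH(m', n) */
    /* t1 <= n, so neither the subtraction nor the sum can wrap. */
    return (t1 + ((n - t1) >> k.sh1)) >> k.sh2;
}
```

A caller would compute `udiv_init(d)` once, outside the loop in which `d` is invariant, and then call `udiv(n, k)` for each dividend.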
In rare cases (e.g., $d = 641$ on a 32-bit machine, $d = 274177$ on a 64-bit machine) the final shift is zero.

Improvement. If $d$ is constant and even, rewrite

$$\left\lfloor \frac{n}{d} \right\rfloor = \left\lfloor \frac{\lfloor n/2^e \rfloor}{d/2^e} \right\rfloor$$

for some $e > 0$. Then $\lfloor n/2^e \rfloor$ can be computed using SRL. Since $n/2^e < 2^{N-e}$, less precision is needed in the multiplier than before.

These ideas are reflected in Figure 4.2, which generates code for $n/d$ where $n$ is unsigned and $d$ is constant. Procedure CHOOSE_MULTIPLIER, which is shared by this and later algorithms, appears in Figure 6.2.

    Inputs: uword d and n, with d constant.
    uword d_odd, t1;
    udword m;
    int e, l, l_dummy, sh_post, sh_pre;
    (m, sh_post, l) = CHOOSE_MULTIPLIER(d, N);
    if m ≥ 2^N and d is even then
        Find e such that d = 2^e * d_odd and d_odd is odd.
                                    /* 2^e = AND(d, 2^N - d) */
        sh_pre = e;
        (m, sh_post, l_dummy) = CHOOSE_MULTIPLIER(d_odd, N - e);
    else
        sh_pre = 0;
    end if
    if d = 2^l then
        Issue q = SRL(n, l);
    else if m ≥ 2^N then
        assert sh_pre = 0;
        Issue t1 = MULUH(m - 2^N, n);
        Issue q = SRL(t1 + SRL(n - t1, 1), sh_post - 1);
    else
        Issue q = SRL(MULUH(m, SRL(n, sh_pre)), sh_post);
    end if

Figure 4.2: Optimized code generation of unsigned q = n/d for constant nonzero d

The following three examples illustrate the cases in Figure 4.2. All assume unsigned 32-bit arithmetic.

Example. $q = n/10$. CHOOSE_MULTIPLIER finds $m_{low} = (2^{36} - 6)/10$ and $m_{high} = (2^{36} + 14)/10$. After one round of divisions by 2, it returns $(m, 3, 4)$, where $m = (2^{34} + 1)/5$. The suggested code q = SRL(MULUH($(2^{34} + 1)/5$, n), 3) eliminates the pre-shift by 0. See Table 11.1.

Example. $q = n/7$. Here $m = (2^{35} + 3)/7 > 2^{32}$. This example uses the longer sequence in Figure 4.1.

Example. $q = n/14$. CHOOSE_MULTIPLIER first returns the same multiplier as when $d = 7$. The
suggested code uses separate divisions by 2 and 7: q = SRL(MULUH($(2^{34} + 5)/7$, SRL(n, 1)), 2).

5 Signed division, quotient rounded towards 0

Suppose we want to compile a signed division $q = \mathrm{TRUNC}(n/d)$, where $d$ is constant or run-time invariant, $0 < |d| \le 2^{N-1}$, and where $-2^{N-1} \le n \le 2^{N-1} - 1$ is variable. All quotients are to be rounded towards zero. We could prove a theorem like Theorem 4.2 about when $\mathrm{TRUNC}(n/d) = \mathrm{TRUNC}(m * n / 2^{N+l})$ for all $n$ in a suitable range (cf. (7.1)), but it wouldn't help since we can't compute the right side given only $\lfloor m * n / 2^N \rfloor$. Instead we show how to adjust the estimated quotient when the dividend or divisor is negative.

Theorem 5.1 Suppose $m$, $d$, $l$ are integers such that $d \ne 0$ and $0 < m * |d| - 2^{N+l-1} \le 2^l$. Let $n$ be an arbitrary integer such that $-2^{N-1} \le n \le 2^{N-1} - 1$. Define $q_0 = \lfloor m * n / 2^{N+l-1} \rfloor$. Then

$$\mathrm{TRUNC}\left(\frac{n}{d}\right) = \begin{cases} q_0 & \text{if } n \ge 0 \text{ and } d > 0, \\ 1 + q_0 & \text{if } n < 0 \text{ and } d > 0, \\ -q_0 & \text{if } n \ge 0 \text{ and } d < 0, \\ -1 - q_0 & \text{if } n < 0 \text{ and } d < 0. \end{cases}$$

Proof. When $n \ge 0$ and $d > 0$, this is Theorem 4.2 with $N$ replaced by $N - 1$.

Suppose $n < 0$ and $d > 0$, say $n = q * d - r$ where $0 \le r \le d - 1$. Define $k = m * d - 2^{N+l-1}$. Then

$$q - \frac{m * n}{2^{N+l-1}} = \frac{k}{2^l} * \frac{-n}{2^{N-1}} * \frac{1}{d} + \frac{r}{d}, \qquad (5.2)$$

as in (4.4). Since $0 < k \le 2^l$ by hypothesis, the first fraction on the right of (5.2) is positive and $r/d$ is nonnegative. The sum is at most $1/d + (d - 1)/d = 1$, so $q_0 = \lfloor m * n / 2^{N+l-1} \rfloor = q - 1$, as asserted.

For $d < 0$, use $\mathrm{TRUNC}(n/d) = -\mathrm{TRUNC}(n/|d|)$.

Caution. When $d < 0$, avoid rewriting the quotient as $\mathrm{TRUNC}((-n)/|d|)$, which fails for $n = -2^{N-1}$.

For a run-time invariant divisor, this leads to the code in Figure 5.1. Its cost is 1 multiply, 3 adds, 2 shifts, and 1 bit-op per quotient.

Explanation of Figure 5.1. The multiplier $m$ satisfies $2^{N-1} < m < 2^N$ except when $d = \pm 1$; in the latter cases $m = 2^N + 1$. In either case $m' = m - 2^N$ fits in an sword. We compute $\lfloor m * n / 2^N \rfloor$ as $n + \lfloor (m - 2^N) * n / 2^N \rfloor$, using MULSH. The subtraction of XSIGN($n$) adds one if $n < 0$. The last line negates the tentative quotient if $d < 0$ (i.e., if $d_{sign} = -1$).

Variation. An alternate computation of $m'$ is

$$m' = \mathrm{TRUNC}\left(\frac{2^N * (2^{l-1} - |d|) + 1}{|d|}\right).$$

This uses signed (2N)-bit/N-bit division, with N-bit quotient.
    Initialization (given constant sword d with d ≠ 0):
        int l = max(⌈log2 |d|⌉, 1);
        udword m = 1 + ⌊2^(N+l-1)/|d|⌋;
        sword m' = m - 2^N;
        sword d_sign = XSIGN(d);
        int sh_post = l - 1;
    For q = TRUNC(n/d), all sword:
        sword q0 = n + MULSH(m', n);
        q0 = SRA(q0, sh_post) - XSIGN(n);
        q = EOR(q0, d_sign) - d_sign;

Figure 5.1: Signed division by run-time invariant divisor, rounded towards zero

Overflow detection. The quotient $n/d$ overflows if $n = -2^{N-1}$ and $d = -1$. The algorithm in Figure 5.1 returns $-2^{N-1}$. If overflow detection is required, the final subtraction of $d_{sign}$ should check for overflow.

Improvement. If $m$ is constant and even, then reduce $m/2^l$ to lowest terms, as in the unsigned case. This improvement is reflected in Figure 5.2, which generates code for $\mathrm{TRUNC}(n/d)$ where $d$ is a nonzero constant. Figure 5.2 also checks for the divisor being a power of 2 or the negative thereof.

    Inputs: sword d and n, with d constant and d ≠ 0.
    udword m;
    int l, sh_post;
    (m, sh_post, l) = CHOOSE_MULTIPLIER(|d|, N - 1);
    if |d| = 1 then
        Issue q = n;
    else if |d| = 2^l then
        Issue q = SRA(n + SRL(SRA(n, l - 1), N - l), l);
    else if m < 2^(N-1) then
        Issue q = SRA(MULSH(m, n), sh_post) - XSIGN(n);
    else
        Issue q = SRA(n + MULSH(m - 2^N, n), sh_post) - XSIGN(n);
        Cmt. Caution: m - 2^N is negative.
    end if
    if d < 0 then
        Issue q = -q;
    end if

Figure 5.2: Optimized code generation of signed q = TRUNC(n/d) for constant d ≠ 0

Example. $q = \mathrm{TRUNC}(n/3)$. On a 32-bit machine, CHOOSE_MULTIPLIER(3, 31) returns $sh_{post} = 0$ and $m = (2^{32} + 2)/3$. The code q = MULSH(m, n) - XSIGN(n) uses one multiply, one shift, one subtract.
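As a rough illustration of Figure 5.1 (the run-time invariant sequence, not the optimized constant-divisor code of Figure 5.2), here is a C sketch for the assumed word size N = 32. The names `sdiv_magic`, `sdiv_init`, and `sdiv` are hypothetical. Wrapping adds are done in unsigned arithmetic to avoid signed-overflow undefined behavior, and the sketch assumes that right shifts of negative values are arithmetic and that unsigned-to-signed conversion wraps, both of which C leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Divisor-dependent constants of Figure 5.1 for N = 32. */
typedef struct {
    int32_t m;        /* m' = m - 2^32, where m = 1 + floor(2^(31+l) / |d|) */
    uint32_t dsign;   /* XSIGN(d) as a bit mask: all ones if d < 0, else 0 */
    int sh_post;      /* l - 1 */
} sdiv_magic;

sdiv_magic sdiv_init(int32_t d) {
    assert(d != 0);
    uint32_t ad = d < 0 ? 0u - (uint32_t)d : (uint32_t)d;    /* |d| <= 2^31 */
    int l = 1;
    while (((uint64_t)1 << l) < ad)     /* l = max(ceil(log2 |d|), 1) */
        l++;
    uint64_t m = 1 + ((uint64_t)1 << (32 + l - 1)) / ad;
    sdiv_magic k;
    k.m = (int32_t)(uint32_t)m;         /* low 32 bits, reinterpreted: m - 2^32 */
    k.dsign = d < 0 ? 0xFFFFFFFFu : 0u;
    k.sh_post = l - 1;
    return k;
}

/* q = TRUNC(n / d); returns -2^31 for n = -2^31, d = -1, as in the paper. */
int32_t sdiv(int32_t n, sdiv_magic k) {
    int32_t hi = (int32_t)(((int64_t)k.m * n) >> 32);        /* MULSH(m', n) */
    int32_t q0 = (int32_t)((uint32_t)n + (uint32_t)hi);      /* wrapping add */
    uint32_t nsign = n < 0 ? 0xFFFFFFFFu : 0u;               /* XSIGN(n) */
    uint32_t q1 = (uint32_t)(q0 >> k.sh_post) - nsign;       /* SRA, add 1 if n < 0 */
    return (int32_t)((q1 ^ k.dsign) - k.dsign);              /* negate if d < 0 */
}
```

For d = 3 the sketch yields m' = -1431655765 and sh_post = 1, which is the unreduced form of the multiplier in the example above.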
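6 Signed division, quotient rounded towards $-\infty$

Some languages require negative quotients to round towards $-\infty$ rather than zero. With some ingenuity, we can compute these quotients in terms of quotients which round towards zero, even if the signs of the dividend and divisor are unknown at compile time.

If $n$ and $d$ are integers, then the identities

$$\left\lfloor \frac{n}{d} \right\rfloor = \begin{cases} \mathrm{TRUNC}(n/d) & \text{if } n \ge 0 \text{ and } d > 0, \\ \mathrm{TRUNC}((n + 1)/d) - 1 & \text{if } n < 0 \text{ and } d > 0, \\ \mathrm{TRUNC}((n - 1)/d) - 1 & \text{if } n > 0 \text{ and } d < 0, \\ \mathrm{TRUNC}(n/d) & \text{if } n \le 0 \text{ and } d < 0 \end{cases}$$

are easily verified. Since the new numerators $n \pm 1$ never overflow, these identities can be used for computation. They are summarized by

$$\left\lfloor \frac{n}{d} \right\rfloor = \mathrm{TRUNC}\left(\frac{n + d_{sign} - n_{sign}}{d}\right) + q_{sign}, \qquad (6.1)$$

where $d_{sign} = \mathrm{XSIGN}(d)$, $n_{sign} = \mathrm{XSIGN}(\mathrm{OR}(n, n + d_{sign}))$, and $q_{sign} = \mathrm{EOR}(n_{sign}, d_{sign})$. The cost is 2 shifts, 3 adds/subtracts, and 2 bit-ops, plus the divide ($n + d_{sign}$ is a repeated subexpression).

For remainders, a corollary to (2.1) and (6.1) is

$$n \text{ mod } d = n - d * \mathrm{TRUNC}\left(\frac{n + d_{sign} - n_{sign}}{d}\right) - d * q_{sign}$$
$$= ((n + d_{sign} - n_{sign}) \text{ rem } d) - d_{sign} + n_{sign} - d * q_{sign} \qquad (6.2)$$
$$= ((n + d_{sign} - n_{sign}) \text{ rem } d) + \mathrm{AND}(d - 2 * d_{sign} - 1, q_{sign}).$$

The last equality in (6.2) can be verified by separately checking the cases $q_{sign} = n_{sign} - d_{sign} = 0$ and $q_{sign} = n_{sign} + d_{sign} = -1$. The subexpression $d - 2 * d_{sign} - 1$ depends only on $d$.

For rounding towards $+\infty$, an analog of (6.1) is

$$\left\lceil \frac{n}{d} \right\rceil = \mathrm{TRUNC}\left(\frac{n - d_{sign} + n_{pos}}{d}\right) - \mathrm{EOR}(d_{sign}, n_{pos}),$$

where $d_{sign} = \mathrm{XSIGN}(d)$ and $n_{pos} = -(n > d_{sign})$.

Improvement. If $d > 0$ is constant, then $d_{sign} = 0$. Then (6.1) becomes

$$\left\lfloor \frac{n}{d} \right\rfloor = \mathrm{TRUNC}\left(\frac{n - n_{sign}}{d}\right) + n_{sign},$$

where $n_{sign} = \mathrm{XSIGN}(n)$. Since $\mathrm{TRUNC}(-x) = -\mathrm{TRUNC}(x)$ and $\mathrm{EOR}(-1, n) = -1 - n = -(n + 1)$, this is equivalent to

$$\left\lfloor \frac{n}{d} \right\rfloor = \mathrm{EOR}\left(n_{sign}, \mathrm{TRUNC}\left(\frac{\mathrm{EOR}(n_{sign}, n)}{d}\right)\right) \qquad (d > 0). \qquad (6.3)$$

The dividend and divisor on the right of (6.3) are both nonnegative and below $2^{N-1}$. One can view them as signed or as unsigned when applying earlier algorithms.

Improvement. The $\mathrm{XSIGN}(\mathrm{OR}(n, n + d_{sign}))$ is equivalent to $-(n \le \mathrm{NOT}(d_{sign}))$ and to $-(n < -d_{sign})$, where the relationals produce 1 if true and 0 if false.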
On the MIPS R2000/R3000 [12], for example, one can compute

    -d_sign = SRL(d, N - 1);
    -n_sign = (n < -d_sign);            /* SLT, signed */
    -q_sign = EOR(-n_sign, -d_sign);
    q = TRUNC((n - (-d_sign) + (-n_sign))/d) - (-q_sign);

(six instructions plus the divide), saving an instruction over (6.1).

Improvement. If $n$ is known to be nonzero, then $n_{sign}$ simplifies to XSIGN($n$).

For constant divisors, one can use (6.1) and the algorithm in Figure 5.2. For constant $d > 0$ a shorter algorithm, based on (6.3), appears in Figure 6.1.

    Inputs: sword n and d, with d constant and d ≠ 0.
    udword m;
    int l, sh_post;
    (m, sh_post, l) = CHOOSE_MULTIPLIER(d, N - 1);
    if d = 2^l then
        Issue q = SRA(n, l);
    else
        assert m < 2^N;
        Issue sword n_sign = XSIGN(n);
        Issue uword q0 = MULUH(m, EOR(n_sign, n));
        Issue q = EOR(n_sign, SRL(q0, sh_post));
    end if

Figure 6.1: Optimized code generation of signed q = ⌊n/d⌋ for constant d > 0

Example. Using signed 32-bit arithmetic, the code for r = n mod 10 (nonnegative remainder) can be

    sword n_sign = XSIGN(n);
    uword q0 = MULUH((2^33 + 3)/5, EOR(n_sign, n));
    sword q = EOR(n_sign, SRL(q0, 2));
    r = n - SLL(q, 1) - SLL(q, 3);

The cost is 1 multiply, 4 shifts, 2 bit-ops, 2 subtracts. Alternately, if one has a fast signed division algorithm which rounds quotients towards 0 and returns remainders, then (6.2) justifies the code

    r = ((n - XSIGN(n)) rem 10) + AND(9, XSIGN(n)).

The cost is 1 divide, 1 shift, 1 bit-op, 2 adds/subtracts.
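The n mod 10 example above translates almost line for line into C. The sketch below assumes 32-bit words; the function names are hypothetical, and the constant 0x66666667 is $(2^{33} + 3)/5$. The third function follows the alternative based on (6.2), relying on C's `%` operator truncating towards zero (as it does since C99).

```c
#include <assert.h>
#include <stdint.h>

/* floor(n / 10) via (6.3): complement negative inputs, divide, complement back. */
int32_t floor_div10(int32_t n) {
    uint32_t nsign = n < 0 ? 0xFFFFFFFFu : 0u;                /* XSIGN(n) */
    uint32_t x = nsign ^ (uint32_t)n;                         /* EOR(n_sign, n), 0 <= x < 2^31 */
    uint32_t q0 = (uint32_t)(((uint64_t)0x66666667u * x) >> 32);  /* MULUH */
    return (int32_t)(nsign ^ (q0 >> 2));                      /* EOR(n_sign, SRL(q0, 2)) */
}

/* n mod 10 (nonnegative remainder): r = n - 2q - 8q, computed mod 2^32. */
int32_t mod10(int32_t n) {
    uint32_t q = (uint32_t)floor_div10(n);
    return (int32_t)((uint32_t)n - (q << 1) - (q << 3));
}

/* Alternative from (6.2), using a truncating remainder. */
int32_t mod10_via_rem(int32_t n) {
    int32_t s = -(int32_t)(n < 0);                            /* XSIGN(n) */
    return (n - s) % 10 + (9 & s);
}
```

Both remainder routines return a value in the range 0 to 9 for every 32-bit input, including negative dividends.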
    procedure CHOOSE_MULTIPLIER(uword d, int prec);
    Cmt. d: Constant divisor to invert. 1 ≤ d < 2^N.
    Cmt. prec: Number of bits of precision needed, 1 ≤ prec ≤ N.
    Cmt. Finds m, sh_post, l such that:
    Cmt.     2^(l-1) < d ≤ 2^l.
    Cmt.     0 ≤ sh_post ≤ l. If sh_post > 0, then N + sh_post ≤ l + prec.
    Cmt.     2^(N+sh_post) < m * d ≤ 2^(N+sh_post) * (1 + 2^(-prec)).
    Cmt. Corollary. If d ≤ 2^prec, then m < 2^(N+sh_post) * (1 + 2^(1-l))/d ≤ 2^(N+sh_post-l+1).
    Cmt.     Hence m fits in max(prec, N - l) + 1 bits (unsigned).
    Cmt.
    int l = ⌈log2 d⌉, sh_post = l;
    udword m_low = ⌊2^(N+l)/d⌋, m_high = ⌊(2^(N+l) + 2^(N+l-prec))/d⌋;
    Cmt. To avoid numerator overflow, compute m_low as 2^N + (m_low - 2^N).
    Cmt. Likewise for m_high. Compare m' in Figure 4.1.
    Invariant. m_low = ⌊2^(N+sh_post)/d⌋ < m_high = ⌊2^(N+sh_post) * (1 + 2^(-prec))/d⌋.
    while ⌊m_low/2⌋ < ⌊m_high/2⌋ and sh_post > 0 do
        m_low = ⌊m_low/2⌋; m_high = ⌊m_high/2⌋; sh_post = sh_post - 1;
    end while;                          /* Reduce to lowest terms. */
    return (m_high, sh_post, l);        /* Three outputs. */
    end CHOOSE_MULTIPLIER;

Figure 6.2: Selection of multiplier and shift count

7 Use of floating point

One alternative to MULUH and MULSH uses floating point arithmetic. Let the floating point mantissa be $F$ bits wide (e.g., $F = 53$ for IEEE double precision arithmetic). Then any floating point operation has relative error at most $2^{1-F}$, regardless of the rounding mode, unless exponent overflow or underflow occurs.

Suppose $N \ge 1$ and $F \ge N + 3$. We claim that

$$\mathrm{TRUNC}\left(\frac{n}{d}\right) = \mathrm{TRUNC}(q_{est}), \quad \text{where } q_{est} \approx n * \frac{1 + 2^{2-F}}{d}, \qquad (7.1)$$

whenever $|n| \le 2^N - 1$ and $0 < |d| < 2^N$, regardless of the rounding modes used to compute $q_{est}$.

The proof assumes that $n > 0$ and $d > 0$, by negating both sides of (7.1) if necessary (the case $n = 0$ is trivial). Since the relative error per operation is at most $2^{1-F}$, the estimated quotient $q_{est}$ satisfies

$$\frac{1 + 2^{2-F}}{(1 + 2^{1-F})^2} * \frac{n}{d} \le q_{est} \le (1 + 2^{2-F}) * (1 + 2^{1-F})^2 * \frac{n}{d}.$$

Use this and the inequalities

$$1 - 2^{-F} < \frac{1 + 2^{2-F}}{(1 + 2^{1-F})^2}, \qquad (1 + 2^{2-F}) * (1 + 2^{1-F})^2 < \frac{1}{1 - 2^{3-F}} \le \frac{1}{1 - 2^{-N}}$$

to derive

$$(1 - 2^{-F}) * \frac{n}{d} < q_{est} < \frac{n/d}{1 - 2^{-N}} \le \frac{n/d}{1 - \frac{1}{n+1}} = \frac{n + 1}{d}.$$

Denote $q = \mathrm{TRUNC}(n/d)$. Then $q_{est} < (n + 1)/d$ implies $\mathrm{TRUNC}(q_{est}) \le q$. If $q_{est} < q$, then

$$(1 - 2^{-F}) * q \le (1 - 2^{-F}) * \frac{n}{d} < q_{est} < q.$$
Both $q$ and $q_{est}$ are exactly representable as floating point numbers, but there are no representable numbers strictly between $(1 - 2^{-F}) * q$ and $q$. This contradiction shows that $q_{est} \ge q$ and hence $q = \mathrm{TRUNC}(q_{est})$.

For quotients rounded towards $-\infty$, use (6.1).

If $F = 53$ and $N \le 50$, then (7.1) can be used for $N$-bit integer division. The algorithm may trigger an IEEE exception for inexactness if the application program enables that condition.

Alverson [1] uses integer multiplication, but computes the multiplier using floating point arithmetic. Baker [3] does modular multiplication using a combination of floating point and integer arithmetic.

8 Dividing udword by uword

One primitive operation for multiple precision arithmetic [14, p. 251] is the division of a udword by a uword, obtaining uword quotient and remainder, where the quotient is known to be less than $2^N$. We
describe a way to compute this quotient and remainder after some preliminary computations involving only the divisor, when the divisor is a run-time invariant expression.

    Initialization (given uword d, where 0 < d < 2^N):
        int l = 1 + ⌊log2 d⌋;                     /* 2^(l-1) ≤ d < 2^l */
        uword m' = ⌊(2^N * (2^l - d) - 1)/d⌋;     /* m' = ⌊(2^(N+l) - 1)/d⌋ - 2^N */
        uword d_norm = SLL(d, N - l);             /* Normalized divisor d * 2^(N-l) */
    For q = ⌊n/d⌋ and r = n - q * d, where d, q, r are uword and n is udword:
        uword n2 = SLL(HIGH(n), N - l) + SRL(LOW(n), l);
                                                  /* See note about shift count. */
        uword n10 = SLL(LOW(n), N - l);           /* n10 = n1 * 2^(N-1) + n0 * 2^(N-l) */
                                                  /* Ignore overflow. */
        sword -n1 = XSIGN(n10);
        uword n_adj = n10 + AND(-n1, d_norm - 2^N);
                                                  /* n10 + n1 * (d_norm - 2^N) */
                                                  /* = n1 * (d_norm - 2^(N-1)) + n0 * 2^(N-l) */
        uword q1 = n2 + HIGH(m' * (n2 - (-n1)) + n_adj);
                                                  /* Underflow is impossible. */
                                                  /* See Lemma 8.1. */
        sdword dr = n - 2^N * d + (2^N - 1 - q1) * d;
                                                  /* dr = n - q1 * d - d, -d ≤ dr < d */
        q = HIGH(dr) - (2^N - 1 - q1) + 2^N;      /* Add 1 to quotient if dr ≥ 0. */
        r = LOW(dr) + AND(d - 2^N, HIGH(dr));     /* Add d to remainder if dr < 0. */

Figure 8.1: Unsigned division of udword by run-time invariant uword.

Lemma 8.1 Suppose that $d$, $m$, and $l$ are nonnegative integers such that $2^{l-1} \le d < 2^l \le 2^N$ and

$$0 < 2^{N+l} - m * d \le d. \qquad (8.2)$$

Given $n$ with $0 \le n \le d * 2^N - 1$, write $n = n_2 * 2^l + n_1 * 2^{l-1} + n_0$, where $n_0$, $n_1$, and $n_2$ are integers with $0 \le n_1 \le 1$ and $0 \le n_0 \le 2^{l-1} - 1$. Define integers $q_1$ and $q_0$ by

$$q_1 * 2^N + q_0 = n_2 * 2^N + (n_2 + n_1) * (m - 2^N) + n_1 * (d * 2^{N-l} - 2^{N-1}) + n_0 * 2^{N-l} \qquad (8.3)$$

and $0 \le q_0 \le 2^N - 1$. Then $0 \le q_1 \le 2^N - 1$ and $0 \le n - q_1 * d < 2 * d$.

Proof. Define $k = 2^{N+l} - m * d$. Then (8.2) implies $0 < k \le d \le 2^l - 1$. The bound $n \le d * 2^N - 1$ implies $n_2 \le d * 2^{N-l} - 1$. Equation (8.2) implies $m \ge 2^{N+l}/d - 1 > 2^N - 1$, so $m - 2^N$ is nonnegative. A corollary to (8.3) is

$$q_1 * 2^N + q_0 = n_2 * m + n_1 * (m - 2^N) + 2^{N-l} * \left(n_1 * (d - 2^{l-1}) + n_0\right)$$
$$\le (d * 2^{N-l} - 1) * m + 1 * (m - 2^N) + 2^{N-l} * \left(1 * (2^{l-1} - 1) + (2^{l-1} - 1)\right)$$
$$= 2^{N-l} * (d * m - 2) < 2^{2N}.$$

This proves the upper bound on the integer $q_1$. A straightforward calculation using the definitions of $k$ and $q_0$ and $n_0$ reveals that

$$n - q_1 * d = \frac{(n_2 + n_1) * k + q_0 * d}{2^N} + \left(1 - \frac{d}{2^l}\right) * \left(n_1 * (d - 2^{l-1}) + n_0\right).$$
(8.4)

Since $2^{l-1} \le d < 2^l$ by hypothesis, the right side of (8.4) is nonnegative. This remainder is bounded by

$$\frac{(d * 2^{N-l}) * d + (2^N - 1) * d}{2^N} + \left(1 - \frac{d}{2^l}\right) * \left(1 * (d - 2^{l-1}) + (2^{l-1} - 1)\right) < \frac{d^2}{2^l} + d + \left(1 - \frac{d}{2^l}\right) * d = 2 * d,$$

completing the proof.

This leads to an algorithm like that in Figure 8.1 when dividing a udword by a run-time invariant uword with quotient known to be less than $2^N$. Unlike the previous algorithms, this code rounds the multiplier down when computing a reciprocal. After initializations depending only on the divisor, this algorithm requires two products (both halves of each) and simple operations (including doubleword adds and subtracts). Five registers hold $d$, $d_{norm}$, $l$, $m'$, and $N - l$.

Note. The shift count $l$ in the computations of $m'$ and $n_2$ may equal $N$. If this is too large, use separate shifts by $l - 1$ and 1. If a doubleword shift is available, compute $n_2$ and $n_{10}$ together.
9 Exact division by constants

Occasionally a language construct requires a division whose remainder is known to vanish. An example occurs in C when subtracting two pointers. Their numerical difference is divided by the object size. The object size is a compile-time constant.

Suppose we want code for $q = n/d$, where $d$ is a nonzero constant and $n$ is an expression known to be divisible by $d$. Write $d = 2^e * d_{odd}$ where $d_{odd}$ is odd. Find $d_{inv}$ such that $1 \le d_{inv} \le 2^N - 1$ and

$$d_{inv} * d_{odd} \equiv 1 \pmod{2^N}. \qquad (9.1)$$

Then

$$2^e * q = 2^e * \frac{n}{d} = \frac{n}{d_{odd}} \equiv (d_{inv} * d_{odd}) * \frac{n}{d_{odd}} = d_{inv} * n \pmod{2^N},$$

as in [2]. Hence $2^e * q \equiv d_{inv} * n \pmod{2^N}$. Since $n/d_{odd} = 2^e * q$ fits in $N$ bits, it must equal the lower half of the product $d_{inv} * n$, namely MULL($d_{inv}$, $n$). An SRA (for signed division) or SRL (for unsigned division) produces the quotient $q$.

The multiplicative inverse $d_{inv}$ of $d_{odd}$ modulo $2^N$ can be found by the extended Euclidean GCD algorithm [14, p. 325]. Another algorithm observes that (9.1) holds modulo $2^3$ if $d_{inv} = d_{odd}$. Each Newton iteration

$$d_{inv} \leftarrow d_{inv} * (2 - d_{inv} * d_{odd}) \bmod 2^N \qquad (9.2)$$

doubles the known exponent by which (9.1) holds, so $\lceil \log_2(N/3) \rceil$ iterations of (9.2) suffice.

If $d_{odd} = \pm 1$, then $d_{inv} = d_{odd}$ so the multiplication by $d_{inv}$ is trivial or a negation. If $d$ is odd, then $e = 0$ and the shift disappears.

A variation tests whether an integer $n$ is exactly divisible by a nonzero constant $d$ without computing the remainder. If $d$ is a power of 2 (or the negative thereof, in the signed case), then check the lower bits of $n$ to test whether $d$ divides $n$. Otherwise compute $d_{inv}$ and $e$ as above. Let $q_0 = \mathrm{MULL}(d_{inv}, n)$. If $n = q * d$ for some $q$, then $q_0 = 2^e * q$ must be a multiple of $2^e$. The original division is exact (no remainder) precisely when (i) $q_0$ is a multiple of $2^e$, and (ii) $q_0$ is sufficiently small that $q_0 * d_{odd}$ is representable by the original data type.

For unsigned division check that

$$0 \le q_0 \le 2^e * \left\lfloor \frac{2^N - 1}{d} \right\rfloor$$

and that the bottom $e$ bits of $q_0$ (or of $n$) are zero.
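The inverse computation and the unsigned tests described so far can be sketched in C for the assumed word size N = 32. The function names are hypothetical. With N = 32, four Newton iterations of (9.2) take the precision from 3 bits to 6, 12, 24, and then at least 32, matching $\lceil \log_2(32/3) \rceil = 4$.

```c
#include <assert.h>
#include <stdint.h>

/* Multiplicative inverse of an odd number modulo 2^32 by Newton iteration (9.2). */
uint32_t inverse_mod_2_32(uint32_t d_odd) {
    assert(d_odd & 1u);
    uint32_t inv = d_odd;               /* d_odd * d_odd == 1 (mod 8) */
    for (int i = 0; i < 4; i++)
        inv *= 2u - inv * d_odd;        /* doubles the number of correct low bits */
    return inv;
}

/* q = n / d when d is known to divide n: one MULL and one SRL. */
uint32_t exact_udiv(uint32_t n, uint32_t d) {
    assert(d != 0);
    int e = 0;
    while ((d & 1u) == 0) {             /* d = 2^e * d_odd */
        d >>= 1;
        e++;
    }
    return (inverse_mod_2_32(d) * n) >> e;
}

/* Returns 1 if d divides n (unsigned), 0 otherwise, without a divide by d
   at run time; in compiled code every divisor-dependent value is a constant. */
int divides_u32(uint32_t n, uint32_t d) {
    assert(d != 0);
    uint32_t d_odd = d;
    int e = 0;
    while ((d_odd & 1u) == 0) {
        d_odd >>= 1;
        e++;
    }
    uint32_t q0 = inverse_mod_2_32(d_odd) * n;        /* MULL(d_inv, n) */
    uint32_t limit = (0xFFFFFFFFu / d) << e;          /* 2^e * floor((2^32 - 1)/d) */
    uint32_t low_mask = ((uint32_t)1 << e) - 1u;      /* bottom e bits */
    return (q0 & low_mask) == 0 && q0 <= limit;
}
```

For d = 100 the inverse of d_odd = 25 comes out as 3264175145, which equals $(19 * 2^{32} + 1)/25$.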
When $e > 0$, these tests can be combined if the architecture has a rotate (i.e., circular shift) instruction, or by expanding this rotate into

$$\mathrm{OR}(\mathrm{SRL}(q_0, e), \mathrm{SLL}(q_0, N - e)) \le \left\lfloor \frac{2^N - 1}{d} \right\rfloor.$$

For signed division check that

$$-2^e * \left\lfloor \frac{2^{N-1}}{|d|} \right\rfloor \le q_0 \le 2^e * \left\lfloor \frac{2^{N-1} - 1}{|d|} \right\rfloor$$

and that the bottom $e$ bits of $q_0$ are zero; the interval check can be done with an add and one signed or unsigned compare.

Relatedly, to test whether $n \text{ rem } d = r$, where $d$ and $r$ are constants with $1 \le r < d$ and where $n$ is signed, check whether MULL($d_{inv}$, $n - r$) is a nonnegative multiple of $2^e$ not exceeding $2^e * \lfloor (2^{N-1} - 1 - r)/d \rfloor$.

Example. To test whether a signed 32-bit value $i$ is divisible by 100, let $d_{inv} = (19 * 2^{32} + 1)/25$. Compute sword $q_0 = \mathrm{MULL}(d_{inv}, i)$. Next check whether $q_0$ is a multiple of 4 in the interval $[-q_{max}, q_{max}]$, where $q_{max} = (2^{31} - 48)/25$.

Since these algorithms require only the lower half of a product, other optimizations for integer multiplication apply here too. For example, applying strength reduction to the C loop

    signed long i, imax;
    for (i = 0; i < imax; i++) {
        if ((i % 100) == 0) {
            ...
        }
    }

might yield (** denotes exponentiation)

    const unsigned long dinv = (19*2**32 + 1)/25;
    const unsigned long qmax = (2**31 - 48)/25;
    unsigned long test = qmax;      /* test = dinv*i + qmax mod 2**32 */
    for (i = 0; i < imax; i++, test += dinv) {
        if (test <= 2*qmax && (test & 3) == 0) {
            ...
        }
    }

No explicit multiplication or division remains.

10 Implementation in GCC

We have implemented the algorithms for constant divisors in the freely available GCC compiler [21], by extending its machine and language independent internal code generation. We also made minor machine dependent modifications to some of the machine descriptors, or md files, to get optimal code. All languages and almost all processors supported by GCC benefit. Our changes are scheduled for inclusion in GCC 2.6.
To generate code for division of $N$-bit quantities, the CHOOSE_MULTIPLIER function needs to perform (2N)-bit arithmetic. This makes that procedure more complex than it might appear in Figure 6.2.

Optimal selection of instructions depending on the bitsize of the operation is a tricky problem that we spent quite some time on. For some architectures, it is important to select a multiplication instruction that has the smallest available precision. On other architectures, the multiplication can be performed faster using a sequence of additions, subtractions, and shifts.

We have not implemented any algorithm for run-time invariant divisors. Only a few architectures (AMD 29050, Intel x86, Motorola 68k & 88110, and to some extent IBM POWER) have adequate hardware support to make such an implementation viable, i.e., an instruction that can be used for integer logarithm computation, and a (2N)-bit/N-bit divide instruction. Even with hardware support, one must be careful that the transformation really improves the code; e.g., a loop might need to be executed many times before the faster loop body outweighs the cost of the multiplier computation in the loop header.

11 Results

Figure 11.1 has an example with compile-time constant divisor that gets drastically faster on all recent processor implementations. The program converts a binary number to a decimal string. It calculates one quotient and one remainder per output digit.

Table 11.1 shows the generated assembler codes for Alpha, MIPS, POWER, and SPARC. There is no explicit division. Although initially computed separately, the quotient and remainder calculations have been combined (by GCC's common subexpression elimination pass).

The unsigned int data type has 32 bits on all four architectures, but Alpha is a 64-bit architecture. The Alpha code is longer than the others because it multiplies $(2^{34} + 1)/5$ by $x$ using

$$4 * \left[(2^{16} + 1) * (2^8 + 1) * \left(4 * [4 * (4 * x - x) + x] - x\right)\right] + x$$

instead of the slower, 23-cycle, mulq.
This illustrates that the multiplications needed by these algorithms can sometimes be computed quickly using a sequence of shifts, adds, and subtracts [5], since multipliers for small constant divisors have regular binary patterns.

Table 11.2 compares the timing on some processor implementations for the radix conversion routine, with and without the division elimination algorithms. The number converted was a full 32-bit number, sufficiently large to hide procedure calling overhead from the measurements.

We also ran the integer benchmarks from SPEC 92. The improvement was negligible for most of the programs; the best improvement seen was only about 3%. Some benchmarks that involve hashing show improvements up to about 30%. We anticipate significant improvements on some number theoretic codes.

References

[1] Robert Alverson. Integer division using reciprocals. In Peter Kornerup and David W. Matula, editors, Proceedings 10th Symposium on Computer Arithmetic, Grenoble, France, June.

[2] Ehud Artzy, James A. Hinds, and Harry J. Saal. A fast division technique for constant divisors. CACM, 19(2):98-101, February.

[3] Henry G. Baker. Computing A*B (mod N) efficiently in ANSI C. ACM SIGPLAN Notices, 27(1):95-98, January.

[4] H.B. Bakoglu, G.F. Grohoski, and R.K. Montoye. The IBM RISC System/6000 processor: Hardware overview. IBM Journal of Research and Development, 34(1):12-22, January.

[5] Robert Bernstein. Multiplication by integer constants. Software: Practice and Experience, 16(7), July.

[6] Raymond T. Boute. The Euclidean definition of the functions div and mod. ACM Transactions on Programming Languages and Systems, 14(2), April.

[7] A.P. Chang. A note on the modulo operation. SIGPLAN Notices, 20(4):19-23, April.

[8] Digital Equipment Corporation. DECchip 21064-AA Microprocessor, Hardware Reference Manual, 1st edition, October.

[9] Intel Corporation, Santa Clara, CA. 386 DX Microprocessor Programmer's Reference Manual.

[10] Intel Corporation, Santa Clara, CA.
Intel486 Microprocessor Family Programmer s Reference Manual, [11] Davi H. Jacobsohn. A combinatoric ivision algorithm for fixe-integer ivisors. IEEE Trans. Comp., C 22(6): , June [12] Gerry Kane. MIPS RISC Architecture. Prentice Hall, Englewoo Cliffs, NJ, 1989.
    #define BUFSIZE 50

    char *decimal (unsigned int x)
    {
      static char buf[BUFSIZE];
      char *bp = buf + BUFSIZE - 1;
      *bp = 0;
      do {
        *--bp = '0' + x % 10;
        x /= 10;
      } while (x != 0);
      return bp; /* Return pointer to first digit */
    }

Figure 11.1: Radix conversion code

Alpha:

        lda     $2,buf
        ldq_u   $1,49($2)
        addq    $2,49,$0
        mskbl   $1,$0,$1
        stq_u   $1,49($2)
    L1: zapnot  $16,15,$3
        s4subq  $3,$3,$2
        s4addq  $2,$3,$2
        s4subq  $2,$3,$2
        sll     $2,8,$1
        subq    $0,1,$0
        addq    $2,$1,$2
        sll     $2,16,$1
        ldq_u   $4,0($0)
        addq    $2,$1,$2
        s4addq  $2,$3,$2
        srl     $2,35,$2
        mskbl   $4,$0,$4
        s4addl  $2,$2,$1
        addq    $1,$1,$1
        subl    $16,$1,$1
        addl    $1,48,$1
        insbl   $1,$0,$1
        bis     $2,$2,$16
        bis     $1,$4,$1
        stq_u   $1,0($0)
        bne     $16,L1
        ret     $31,($26),1

MIPS:

        la      $5,buf+49
        sb      $0,0($5)
        li      $6,0xcccc0000
        ori     $6,$6,0xcccd
    L1: multu   $4,$6
        mfhi    $3
        subu    $5,$5,1
        srl     $3,$3,3
        sll     $2,$3,2
        addu    $2,$2,$3
        sll     $2,$2,1
        subu    $2,$4,$2
        addu    $2,$2,48
        move    $4,$3
        bne     $4,$0,L1
        sb      $2,0($5)
        j       $31
        move    $2,$5

POWER:

        l       10,LC..0(2)
        cau     11,0,0xcccc
        oril    11,11,0xcccd
        cal     0,0(0)
        stb     0,0(10)
    L1: mul     9,3,11
        srai    0,3,31
        and     0,0,11
        a       9,9,0
        a       9,9,3
        sri     9,9,3
        muli    0,9,10
        sf      0,0,3
        ai.     3,9,0
        ai      0,0,48
        stbu    0,-1(10)
        bc      4,2,L1
        ai      3,10,0
        br

SPARC:

        sethi   %hi(buf+49),%g2
        or      %g2,%lo(buf+49),%o1
        stb     %g0,[%o1]
        sethi   %hi(0xcccccccd),%g2
        or      %g2,0xcd,%o2
    L1: add     %o1,-1,%o1
        umul    %o0,%o2,%g0
        rd      %y,%g3
        srl     %g3,3,%g3
        sll     %g3,2,%g2
        add     %g2,%g3,%g2
        sll     %g2,1,%g2
        sub     %o0,%g2,%g2
        add     %g2,48,%g2
        orcc    %g3,%g0,%o0
        bne     L1
        stb     %g2,[%o1]
        retl
        mov     %o1,%o0

Table 11.1: Code generated by our GCC for radix conversion
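What the MIPS, POWER, and SPARC listings in Table 11.1 compute can be written back in C roughly as follows. This is a hedged sketch, not GCC output: the name decimal_nodiv is hypothetical, and a 64-bit product via uint64_t stands in for the machine's multiply-high instruction. The constant 0xCCCCCCCD and the shift by 3 after taking the upper 32 bits (35 in total) are read off the listings; the remainder is formed from the quotient as x - 10*q, which corresponds to the combined quotient and remainder calculation described above.

```c
#include <stdint.h>

#define BUFSIZE 50

/* Radix conversion as in Figure 11.1, with the division by 10 written
   out as a multiplication and a shift.  Illustrative sketch only. */
char *decimal_nodiv(uint32_t x)
{
    static char buf[BUFSIZE];
    char *bp = buf + BUFSIZE - 1;

    *bp = 0;
    do {
        /* q = x / 10: upper half of x * 0xCCCCCCCD, shifted right by 3 */
        uint32_t q = (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
        /* r = x % 10, obtained from q without a second division */
        uint32_t r = x - 10 * q;
        *--bp = (char)('0' + r);
        x = q;
    } while (x != 0);
    return bp; /* Return pointer to first digit */
}
```

Like the routine in Figure 11.1, this returns a pointer into a static buffer, so the result must be used or copied before the next call.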
Architecture/Implementation | MHz | Time with division performed | Time with division eliminated | Speedup ratio
Motorola MC68020 [18, pp. 9-22]
Motorola MC
SPARC Viking [20]
HP PA
MIPS R3000 [12]
MIPS R4000 [17]
POWER/RIOS I [4, 22]
DEC Alpha [8] *

*This time difference is artificial. The Alpha architecture has no integer divide instruction, and the DEC library functions for division are slow.

Table 11.2: Timing (microseconds) for radix conversion with and without division elimination

[13] Donald E. Knuth. An empirical study of FORTRAN programs. Technical Report CS 186, Computer Science Department, Stanford University. Stanford artificial intelligence project memo AIM 137.
[14] Donald E. Knuth. Seminumerical Algorithms, volume 2 of The Art of Computer Programming. Addison-Wesley, Reading, MA, 2nd edition.
[15] Shuo-Yen Robert Li. Fast constant division routines. IEEE Trans. Comp., C-34(9), September.
[16] Daniel J. Magenheimer, Liz Peters, Karl Pettis, and Dan Zuras. Integer multiplication and division on the HP Precision Architecture. In Proceedings Second International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS II). ACM. Published as SIGPLAN Notices, Volume 22, No. 10, October.
[17] MIPS Computer Systems, Inc, Sunnyvale, CA. MIPS R4000 Microprocessor User's Manual.
[18] Motorola, Inc. MC Bit Microprocessor User's Manual, 2nd edition.
[19] Motorola, Inc. PowerPC 601 RISC Microprocessor User's Manual.
[20] SPARC International, Inc., Menlo Park, CA. The SPARC Architecture Manual, Version 8.
[21] Richard M. Stallman. Using and Porting GCC. The Free Software Foundation, Cambridge, MA.
[22] Henry Warren. Predicting Execution Time on the IBM RISC System/6000. IBM. Preliminary Version.