Improved division by invariant integers


Niels Möller and Torbjörn Granlund

Abstract

This paper considers the problem of dividing a two-word integer by a single-word integer, together with a few extensions and applications. Due to lack of efficient division instructions in current processors, the division is performed as a multiplication using a precomputed single-word approximation of the reciprocal of the divisor, followed by a couple of adjustment steps. There are three common types of unsigned multiplication instructions; we define full word multiplication (umul) which produces the two-word product of two single-word integers, low multiplication (umullo) which produces only the least significant word of the product, and high multiplication (umulhi), which produces only the most significant word. We describe an algorithm which produces a quotient and remainder using one umul and one umullo. This is an improvement over earlier methods, since the new method uses cheaper multiplication operations. It turns out we also get some additional savings from simpler adjustment conditions. The algorithm has been implemented in version 4.3 of the GMP library. When applied to the problem of dividing a large integer by a single word, the new algorithm gives a speedup of roughly 30%, benchmarked on AMD and Intel processors in the x86_64 family.

I. INTRODUCTION

Integer division instructions are either not present at all in current microprocessors, or if they are present, they are considerably slower than the corresponding multiplication instructions. Multiplication instructions in turn are at least a few times slower than addition instructions, both in terms of throughput and latency. The situation was similar a decade ago [1], and the trend has continued so that division latency is now typically 5 to 15 times higher than multiplication latency, and division throughput is up to 50 times worse than multiplication throughput. Another trend is that branches cost gradually more, except for branches that the hardware can predict correctly.
But some branches are inherently unpredictable.

Division can be implemented using multiplication, by first computing an approximate reciprocal, e.g., by Newton iteration, followed by a multiplication that results in a candidate quotient. Finally, the remainder corresponding to this candidate quotient is computed, and if the remainder is too small or too large, the quotient is adjusted. This procedure is particularly attractive when the same divisor is used several times; then the reciprocal needs to be computed only once. Somewhat surprisingly, a well-tuned Newton reciprocal followed by multiplication and adjustments wins over the hardware division instructions even for a single non-invariant division on modern 64-bit PC processors.

This paper considers the problem of dividing a two-word number by a single-word number, using a single-word approximate reciprocal. The main contributions are a new algorithm for division using such a reciprocal and new algorithms for computing a suitable reciprocal for 32-bit and 64-bit word size.

The key idea in our new division algorithm is to compute the candidate remainder as a single word rather than a double word, even though it does not quite fit. We then use a fraction associated with the candidate quotient to resolve the ambiguity. The new method is more efficient than previous methods for two reasons.

• It uses cheaper multiplication operations, omitting the most significant half of one of the two products. Computing the least significant word of a product is a cheaper operation than computing the most significant word (e.g., on AMD Opteron, the difference in latency is one cycle, while on Intel Core 2, the difference is three cycles).
• The needed adjustment conditions are simpler.

N. Möller is a long time member of the GMP research team. T. Granlund is with the Centre for Industrial and Applied Mathematics, KTH, Stockholm. Granlund's work was sponsored by the Swedish Foundation for Strategic Research.
When the division algorithms in this paper are used as building blocks for algorithms working with large numbers, our improvements typically affect the linear term of the execution time. This is of particular importance for applications using integers of size up to a few dozen words, e.g., on a 64-bit CPU, 2048-bit RSA corresponds to computations on 32-word numbers.

The new algorithms have been implemented in the GMP library [2]. As an example of the resulting speedup, for division of a large integer by a single word, the new method gives a speedup of 31% compared to earlier methods, benchmarked on AMD Opteron and Intel Core 2.

The outline of this paper is as follows. The rest of this section defines the notation we use. Section II explains how the needed reciprocal approximation is defined, and how it is useful. In Sec. III, we describe new algorithms for computing the reciprocal, and we present our main result, a new algorithm for dividing a two-word number by a single word. Analysis of the probability for the adjustment steps in the latter algorithm is provided in Appendix A. Section IV describes a couple of extensions, primarily motivated by schoolbook division, the most important one being a method for dividing a three-word number by a two-word number. In Sec. V, we consider an algorithm that can take direct advantage of the new division method: dividing a large integer by a single word. We describe the x86_64 implementation of this algorithm using the new method, and compare it to earlier results. Finally, Sec. VI summarises our conclusions.

A. Notation and conventions

Let $\ell$ denote the computer word size, and let $\beta = 2^\ell$ denote the base implied by the word size. Lowercase letters denote single-word numbers, and uppercase letters represent numbers
of any size. We use the notation $X = \langle x_{n-1}, \ldots, x_1, x_0 \rangle = x_{n-1}\beta^{n-1} + \cdots + x_1\beta + x_0$, where the $n$-word integer $X$ is represented by the words $x_i$, for $0 \le i < n$.

We use the following multiplication operations:

    $\langle p_1, p_0 \rangle \leftarrow \mathrm{umul}(a, b) = ab$    (double word product)
    $p_0 \leftarrow \mathrm{umullo}(a, b) = (ab) \bmod \beta$    (low word)
    $p_1 \leftarrow \mathrm{umulhi}(a, b) = \lfloor ab/\beta \rfloor$    (high word)

Our algorithms depend on the existence and efficiency of these basic multiplication operations, but they do not require both umul and umulhi. These are common operations in all processors, and very few processors lack both umul and umulhi.¹

II. DIVISION USING AN APPROXIMATE RECIPROCAL

Consider the problem of dividing a two-word number $U = \langle u_1, u_0 \rangle$ by a single-word number $d$, computing the quotient and remainder

$$q = \lfloor U/d \rfloor, \qquad r = U - qd.$$

Clearly, $r$ is a single-word number. We assume that $u_1 < d$, to ensure that also the quotient $q$ fits in a single word. We also restrict attention to the case that $d$ is a normalised single-word number, i.e., $\beta/2 \le d < \beta$. This is equivalent to the word $d$ having its most significant bit set. It follows that $u_0/d < 2$, and one can get a reasonable quotient approximation from $u_1$ alone, without considering $u_0$.

We have $1/\beta < 1/d \le 2/\beta$. We represent the reciprocal $1/d$ using a fixed-point representation, with a single word and an additional implicit one bit at the most significant end. We define the precomputed reciprocal of $d$ as the integer

$$v = \left\lfloor \frac{\beta^2 - 1}{d} \right\rfloor - \beta. \tag{1}$$

The constraints on $d$ imply that $0 < v < \beta$; in particular, $v$ is a single-word number. We have $(\beta + v)/\beta^2 \approx 1/d$, or more precisely,

$$\frac{1}{d} - \frac{1}{\beta^2} \le \frac{\beta + v}{\beta^2} < \frac{1}{d}. \tag{2}$$

For the borderline case $d = \beta/2$, we have the true reciprocal $1/d = 2/\beta$, which equals $(\beta + v)/\beta^2$ for $v = \beta$. Our definition instead gives the single-word number $v = \beta - 1$ in this case.

The usefulness of $v$ comes from Eq. (2), which implies

$$\frac{U}{d} \approx \frac{(u_1\beta + u_0)(\beta + v)}{\beta^2} = u_1 + \frac{u_1 v}{\beta} + \frac{u_0}{\beta} + \frac{u_0 v}{\beta^2}. \tag{3}$$

Since $(\beta + v)/\beta^2 < 1/d$, the integer part of the right hand side is at most $q$, and hence a single word. Since the terms on the right hand side are non-negative, this bound is still valid if some of the terms are omitted or truncated.
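Definition (1) and the bounds in Eq. (2) are easy to check numerically. The following is a minimal Python sketch, assuming arbitrary-precision integers in place of machine words; the names `reciprocal_by_definition`, `satisfies_eq2` and the `bits` parameter are illustrative and do not come from the paper.

```python
def reciprocal_by_definition(d, bits=64):
    """Return v = floor((beta^2 - 1) / d) - beta, as in Eq. (1)."""
    beta = 1 << bits
    if not (beta // 2 <= d < beta):
        raise ValueError("d must be normalised: beta/2 <= d < beta")
    return (beta * beta - 1) // d - beta


def satisfies_eq2(d, bits=64):
    """Check Eq. (2), multiplied through by d * beta^2 to stay in integers:
    beta^2 - d <= (beta + v) * d < beta^2, together with 0 < v < beta."""
    beta = 1 << bits
    v = reciprocal_by_definition(d, bits)
    return 0 < v < beta and beta * beta - d <= (beta + v) * d < beta * beta
```

For the borderline divisor $d = \beta/2$ the function returns $\beta - 1$, in line with the remark following Eq. (2).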
¹ The SPARC v9 architecture is a notable exception, making high performance arithmetic on large numbers very challenging.

A. Previous methods

The trick of using a precomputed reciprocal to replace integer division by multiplication is well-known. The simplest variant is Alg. 1, which uses a quotient approximation based on the first two terms of Eq. (3).

    $(q, r) \leftarrow \mathrm{DIV2BY1}(\langle u_1, u_0 \rangle, d, v)$
    In: $\beta/2 \le d < \beta$, $u_1 < d$, $v = \lfloor (\beta^2 - 1)/d \rfloor - \beta$
    1  $q \leftarrow \lfloor v u_1/\beta \rfloor + u_1$    // Candidate quotient (umulhi)
    2  $\langle p_1, p_0 \rangle \leftarrow q d$    // umul
    3  $\langle r_1, r_0 \rangle \leftarrow \langle u_1, u_0 \rangle - \langle p_1, p_0 \rangle$    // Candidate remainder
    4  while $r_1 > 0$ or $r_0 \ge d$    // Repeated at most 3 times
    5      $q \leftarrow q + 1$
    6      $\langle r_1, r_0 \rangle \leftarrow \langle r_1, r_0 \rangle - d$
    7  return $q, r_0$
    Algorithm 1: Simple division of a two-word number by a single-word number, using a precomputed single-word reciprocal.

To see how it works, let $U = \langle u_1, u_0 \rangle$ and let $q$ denote the true quotient $\lfloor U/d \rfloor$. We have $(\beta + v)d = \beta^2 - k$, where $1 \le k \le d$. Let $q'$ denote the candidate quotient computed at line 1, and let $q_0 = v u_1 \bmod \beta$ denote the low, ignored, half of the product. Let $R$ denote the corresponding candidate remainder, computed on line 3. Then

$$R = U - q'd = u_0 + u_1\beta - \frac{u_1(\beta + v) - q_0}{\beta}\, d = u_0 + \frac{u_1 k + q_0 d}{\beta}.$$

We see that $R \ge 0$, which corresponds to $q' \le q$. Since $k \le d$, we also get the upper bound $R < \beta + 2d \le 4d$, which implies that $q - q' \le 3$. Since $R$ may be larger than $\beta$, it must be computed as a two-word number at line 3 and in the loop, at line 6, which is executed at most three times.

The problem is that in the two-word subtraction $U - q'd$, most, but not all, bits in the most significant word cancel. Hence, we must use the expensive umul operation rather than the cheaper umullo.

The quotient approximation can be improved. By checking if $u_0 \ge d$, and if so, incrementing $q'$ before computing $R$, one gets $R < 3d$ and $q - q' \le 2$. The method in [1], Sec. 8, is more intricate, guaranteeing that $R < 2d$, so that $q - q' \le 1$. However, it still computes the full product $q'd$, so this method needs one umul and one umulhi.

III. NEW ALGORITHMS

In this section, we describe our new algorithms.
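We first give efficient algorithms for computing the approximate reciprocal, and we then describe our new algorithm for division of a double-word number by a single word.

A. Computing the reciprocal

From the definition of $v$, we have

$$v = \left\lfloor \frac{\beta^2 - 1}{d} \right\rfloor - \beta = \left\lfloor \frac{\langle \beta - 1 - d,\ \beta - 1 \rangle}{d} \right\rfloor$$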
so for architectures that provide an instruction for dividing a two-word number by a single word, that instruction can be used to compute the reciprocal straightforwardly.

If such a division instruction is lacking or if it is slow, the reciprocal can be computed using the Newton iteration

$$x_{k+1} = x_k + x_k(1 - x_k d). \tag{4}$$

This equation implies that

$$1 - x_{k+1} d = (1 - x_k d)^2. \tag{5}$$

Consider one iteration, and assume that the accuracy of $x_k$ is roughly $n$ bits. Then the desired accuracy of $x_{k+1}$ is about $2n$ bits, and to achieve that, only about $2n$ bits of $d$ are needed in Eq. (4). If $x_k$ is represented using $n$ bits, matching its accuracy, then the computation of the right hand side yields $4n$ bits. In a practical implementation, the result should be truncated to match the accuracy of $2n$ bits. The resulting error in $x_{k+1}$ is the combination of the error according to Eq. (5), the truncation of the result, and any truncation of the $d$ input.

    $v \leftarrow \mathrm{RECIPROCAL\_WORD}(d)$
    In: $2^{63} \le d < 2^{64}$
    1   $d_0 \leftarrow d \bmod 2$    // Least significant bit
    2   $d_9 \leftarrow \lfloor 2^{-55} d \rfloor$    // Most significant 9 bits
    3   $d_{40} \leftarrow \lfloor 2^{-24} d \rfloor + 1$    // Most significant 40 bits
    4   $d_{63} \leftarrow \lceil d/2 \rceil$    // Most significant 63 bits
    5   $v_0 \leftarrow \lfloor (2^{19} - 3 \cdot 2^{8})/d_9 \rfloor$    // By table lookup
    6   $v_1 \leftarrow 2^{11} v_0 - \lfloor 2^{-40} v_0^2 d_{40} \rfloor - 1$    // 2 umullo
    7   $v_2 \leftarrow 2^{13} v_1 + \lfloor 2^{-47} v_1 (2^{60} - v_1 d_{40}) \rfloor$    // 2 umullo
    8   $e \leftarrow 2^{96} - v_2 d_{63} + \lfloor v_2/2 \rfloor d_0$    // umullo
    9   $v_3 \leftarrow (2^{31} v_2 + \lfloor 2^{-65} v_2 e \rfloor) \bmod 2^{64}$    // umulhi
    10  $v_4 \leftarrow (v_3 - \lfloor 2^{-64} (v_3 + 2^{64} + 1) d \rfloor) \bmod 2^{64}$    // umul
    11  return $v_4$
    Algorithm 2: Computing the reciprocal $\lfloor (\beta^2 - 1)/d \rfloor - \beta$, for 64-bit machines ($\beta = 2^{64}$).

Algorithm 2 gives one variant, for $\beta = 2^{64}$. Here, $v_0$ is represented as 11 bits, $v_1$ as 21 bits, $v_2$ as 34 bits, and $v_3$ and $v_4$ as 65-bit values where the most significant bit, which is always one, is implicit. Note that since $d_{40}$ and $d_{63}$ are rounded upwards, they may be equal to $2^{40}$ and $2^{63}$ respectively, and hence not quite fit in 40 and 63 bits.

Theorem 1 (64-bit reciprocal): With $\beta = 2^{64}$, the output $v$ of Alg. 2 satisfies $0 < \beta^2 - (\beta + v)d \le d$.
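Before turning to the proof, the statement of Theorem 1 can be spot-checked by transcribing Alg. 2 step by step. The sketch below is an illustration only: it uses Python's unbounded integers rather than 64-bit registers, computes $v_0$ by direct division instead of the table lookup, and the function name `reciprocal_word_64` is an assumption of this example.

```python
MASK64 = (1 << 64) - 1


def reciprocal_word_64(d):
    """Transcription of Alg. 2; returns floor((2^128 - 1) / d) - 2^64."""
    if not ((1 << 63) <= d < (1 << 64)):
        raise ValueError("d must satisfy 2^63 <= d < 2^64")
    d0 = d & 1                     # least significant bit
    d9 = d >> 55                   # most significant 9 bits
    d40 = (d >> 24) + 1            # most significant 40 bits, rounded up
    d63 = (d + 1) >> 1             # ceil(d / 2)
    # Line 5: the paper uses a table lookup; a plain division stands in here.
    v0 = ((1 << 19) - 3 * (1 << 8)) // d9
    v1 = (v0 << 11) - ((v0 * v0 * d40) >> 40) - 1
    v2 = (v1 << 13) + ((v1 * ((1 << 60) - v1 * d40)) >> 47)
    e = (1 << 96) - v2 * d63 + (v2 >> 1) * d0
    v3 = ((v2 << 31) + ((v2 * e) >> 65)) & MASK64
    # Line 10: final adjustment, adds zero or one to v3.
    v4 = (v3 - (((v3 + (1 << 64) + 1) * d) >> 64)) & MASK64
    return v4
```

Comparing the result with `((1 << 128) - 1) // d - (1 << 64)` for a handful of divisors, including the extremes $d = 2^{63}$ and $d = 2^{64} - 1$, is a cheap way to catch transcription slips in the constants.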
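Proof: We will prove that the errors in each iteration are bounded as follows:

    $e_0 = 2^{50} - v_0 d_{40}$,    $|e_0| < \ldots$    (6)
    $e_1 = 2^{60} - v_1 d_{40}$,    $0 < e_1 < \ldots$    (7)
    $e_2 = 2^{97} - v_2 d$,    $0 < e_2 < \ldots$    (8)
    $e_3 = 2^{128} - (2^{64} + v_3) d$,    $0 < e_3 < 2d$    (9)
    $e_4 = 2^{128} - (2^{64} + v_4) d$,    $0 < e_4 \le d$    (10)

Each step involves a truncation, and we let $0 \le \delta_k < 1$ denote the truncation error in each step.

Start with (6). Let $d' = d_{40} - 2^{31} d_9$, then $1 \le d' \le 2^{31}$. We have

$$v_0 = \frac{2^{19} - 3 \cdot 2^{8}}{d_9} - \delta_0, \qquad e_0 = 3 \cdot 2^{39} - \frac{2^{19} - 3 \cdot 2^{8}}{d_9}\, d' + \delta_0 d_{40}.$$

From this, we get the upper and lower bounds in (6).

For (7), we get

$$v_1 = 2^{11} v_0 - 2^{-40} v_0^2 d_{40} - (1 - \delta_1),$$
$$e_1 = 2^{60} - (2^{11} v_0 - 2^{-40} v_0^2 d_{40})\, d_{40} + (1 - \delta_1) d_{40} = 2^{-40} e_0^2 + (1 - \delta_1) d_{40}.$$

It follows that $e_1 > 0$ and that $e_1 < \ldots$

For (8), we first note that the product $v_1 (2^{60} - v_1 d_{40})$ fits in 64 bits, since the first factor is 21 bits and the second factor is $e_1$, which fits in 43 bits. Let $d' = 2^{24} d_{40} - d$, then $0 < d' \le 2^{24}$. We get

$$v_2 = 2^{13} v_1 + 2^{-47} v_1 (2^{60} - v_1 d_{40}) - \delta_2,$$
$$e_2 = 2^{97} - v_2 (2^{24} d_{40} - d') = 2^{97} - 2^{24} \left( 2^{13} v_1 + 2^{-47} v_1 (2^{60} - v_1 d_{40}) \right) d_{40} + v_2 d' + 2^{24} \delta_2 d_{40} = 2^{-23} e_1^2 + v_2 d' + 2^{24} \delta_2 d_{40}.$$

It follows that $e_2 > 0$ and that $e_2 < \ldots$

For (9), first note that the value $e$, computed at line 8, equals $\lfloor e_2/2 \rfloor$. Then (8) implies that this value fits in 64 bits. Let $\epsilon$ denote the least significant bit of $e_2$, so that $e = (e_2 - \epsilon)/2$. Define

$$v_3' = 2^{31} v_2 + \lfloor 2^{-66} v_2 (e_2 - \epsilon) \rfloor, \qquad e_3' = 2^{128} - v_3' d.$$

(We will see in a moment that $v_3' = 2^{64} + v_3$, and hence also $e_3' = e_3$.) We get

$$e_3' = 2^{128} - \left( 2^{31} v_2 + 2^{-66} v_2 (2^{97} - v_2 d - \epsilon) \right) d + \delta_3 d = 2^{-66} e_2^2 + (2^{-66} v_2 \epsilon + \delta_3)\, d.$$

It follows that $e_3' > 0$ and that $e_3' < \ldots < 2d$.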
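It remains to show that $2^{64} \le v_3' < 2^{65}$. The upper bound follows from $e_3' > 0$. For the borderline case $d = 2^{64} - 1$, one can verify that $v_3' = 2^{64}$, and for $d \le 2^{64} - 2$, we get

$$v_3' = \frac{2^{128} - e_3'}{d} > \frac{2^{128}}{d} - 2 \ge \frac{2^{128}}{2^{64} - 2} - 2 > 2^{64}.$$

For the final adjustment step, we have

$$\lfloor 2^{-64} (v_3 + 2^{64} + 1) d \rfloor = \lfloor 2^{-64} (2^{128} - e_3 + d) \rfloor = \begin{cases} 2^{64} & e_3 \le d \\ 2^{64} - 1 & e_3 > d \end{cases}$$

Hence, the effect of the adjustment is to increment the reciprocal approximation if and only if $e_3 > d$. The desired bound, Eq. (10), follows.

    $v \leftarrow \mathrm{RECIPROCAL\_WORD}(d)$
    In: $2^{31} \le d < 2^{32}$
    1   $d_0 \leftarrow d \bmod 2$
    2   $d_{10} \leftarrow \lfloor 2^{-22} d \rfloor$    // Most significant 10 bits
    3   $d_{21} \leftarrow \lfloor 2^{-11} d \rfloor + 1$    // Most significant 21 bits
    4   $d_{31} \leftarrow \lceil d/2 \rceil$    // Most significant 31 bits
    5   $v_0 \leftarrow \lfloor (2^{24} - 2^{14} + 2^{9})/d_{10} \rfloor$    // By table lookup
    6   $v_1 \leftarrow 2^{4} v_0 - \lfloor 2^{-32} v_0^2 d_{21} \rfloor - 1$    // umullo + umulhi
    7   $e \leftarrow 2^{48} - v_1 d_{31} + \lfloor v_1/2 \rfloor d_0$    // umullo
    8   $v_2 \leftarrow 2^{15} v_1 + \lfloor 2^{-33} v_1 e \rfloor$    // umulhi
    9   $v_3 \leftarrow (v_2 - \lfloor 2^{-32} (v_2 + 2^{32} + 1) d \rfloor) \bmod 2^{32}$    // umul
    10  return $v_3$
    Algorithm 3: Computing the reciprocal $\lfloor (\beta^2 - 1)/d \rfloor - \beta$, for 32-bit machines ($\beta = 2^{32}$).

Algorithm 3 is a similar algorithm for $\beta = 2^{32}$. In this algorithm, $v_0$ is represented as 15 bits, $v_1$ as 18 bits, and $v_2$ and $v_3$ as 33-bit values where the most significant bit, always one, is implicit. The correctness proof is analogous, with the following error bounds:

    $e_0 = 2^{35} - v_0 d_{21}$,    $|e_0| < \ldots$
    $e_1 = 2^{49} - v_1 d$,    $0 < e_1 < \ldots$
    $e_2 = 2^{64} - (2^{32} + v_2) d$,    $0 < e_2 < 2d$
    $e_3 = 2^{64} - (2^{32} + v_3) d$,    $0 < e_3 \le d$

    $(q, r) \leftarrow \mathrm{DIV2BY1}(\langle u_1, u_0 \rangle, d, v)$
    In: $\beta/2 \le d < \beta$, $u_1 < d$, $v = \lfloor (\beta^2 - 1)/d \rfloor - \beta$
    1   $\langle q_1, q_0 \rangle \leftarrow v u_1$    // umul
    2   $\langle q_1, q_0 \rangle \leftarrow \langle q_1, q_0 \rangle + \langle u_1, u_0 \rangle$
    3   $q_1 \leftarrow (q_1 + 1) \bmod \beta$
    4   $r \leftarrow (u_0 - q_1 d) \bmod \beta$    // umullo
    5   if $r > q_0$    // Unpredictable condition
    6       $q_1 \leftarrow (q_1 - 1) \bmod \beta$
    7       $r \leftarrow (r + d) \bmod \beta$
    8   if $r \ge d$    // Unlikely condition
    9       $q_1 \leftarrow q_1 + 1$
    10      $r \leftarrow r - d$
    11  return $q_1, r$
    Algorithm 4: New algorithm for dividing a two-word number by a single-word number, using a precomputed single-word reciprocal.

Remarks:

• The final step in the algorithm is not a Newton iteration, but an adjustment step which adds zero or one to the reciprocal approximation.
• We gain precision in the first Newton iteration by choosing the initial value $v_0$ so that the range for the error $e_0$ is symmetric around zero.
• In the Newton iteration $x + x(1 - xd)$, there is cancellation in the subtraction $(1 - xd)$, since $xd$ is close to 1. In Alg. 2 and 3 we arrange so that the errors $e_k$, for $k \ge 1$,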
are non-negative, and exploit that a certain number of the high bits of $d\,v_k$ are known a priori to be all ones.
• The execution time of Alg. 2 is roughly 48 cycles on AMD Opteron, and 70 cycles on Intel Core 2.

B. Dividing a two-word number by a single word

To improve performance of division, it would be nice if we could get away with using umullo for the multiplication $q'd$ in Alg. 1 (line 2), rather than a full umul. Then the candidate remainder $U - q'd$ will be computed only modulo $\beta$, even though the full range of possible values is too large to be represented by a single word. We will need some additional information to be able to make a correct adjustment. It turns out that this is possible, if we take the fractional part of the quotient approximation into account. Intuitively, we expect the candidate remainder to be roughly proportional to the quotient fraction.

Our new and improved method is given in Alg. 4. It is based on the following theorem.

Theorem 2: Assume $\beta/2 \le d < \beta$, $0 \le u_1 < d$, and $0 \le u_0 < \beta$. Put $v = \lfloor (\beta^2 - 1)/d \rfloor - \beta$. Form the two-word number

$$\langle q_1, q_0 \rangle = (\beta + v) u_1 + u_0.$$

Form the candidate quotient and remainder

$$\tilde q = q_1 + 1, \qquad \tilde r = \langle u_1, u_0 \rangle - \tilde q d.$$

Then $\tilde r$ satisfies

$$\max(\beta - d,\ q_0 + 1) - \beta \le \tilde r < \max(\beta - d,\ q_0).$$

Hence $\tilde r$ is uniquely determined given $\tilde r \bmod \beta$, $d$ and $q_0$.

Proof: We have $(\beta + v)d = \beta^2 - k$, where $1 \le k \le d$. Substitution in the expression for $\tilde r$ gives

$$\tilde r = u_1\beta + u_0 - q_1 d - d = \frac{u_1 k + u_0 (\beta - d) + q_0 d}{\beta} - d.$$
For the lower bound, we clearly have $\tilde r \ge q_0 d/\beta - d$. This bound implies that both these inequalities hold:

$$\tilde r \ge -d, \qquad \tilde r \ge (q_0 - \beta)\frac{d}{\beta} > q_0 - \beta.$$

The desired lower bound on $\tilde r$ now follows. For the upper bound, we have

$$\tilde r < \frac{d^2 + \beta(\beta - d) + q_0 d}{\beta} - d = \frac{\beta - d}{\beta}(\beta - d) + \frac{d}{\beta}\, q_0 \le \max(\beta - d,\ q_0),$$

where the final inequality follows from recognising the expression as a convex combination.

Remark: The lower bound for $\tilde r$ is attained if and only if $u_0 = u_1 = 0$. Then $q_1 = q_0 = 0$, and $\tilde r = -d$. The upper bound is attained, in the sense that $\tilde r$ is one less than the bound, if and only if $u_0 = \beta - 1$, $u_1 = d - 1$ and $d = \beta/2$. Then $v = \beta - 1$, $q_1 = \beta - 2$, $q_0 = \beta/2$, and $\tilde r = \beta/2 - 1$.

In Alg. 4, denote the value computed at line 4 by $r'$. Then $r' = \tilde r \bmod \beta$. A straightforward application of Theorem 2 would compare this value to $\max(\beta - d, q_0)$. In Alg. 4, we instead compare $r'$ to $q_0$. To see why this gives the correct result, consider two cases:

• Assume $\tilde r \ge 0$. Then $r' = \tilde r < \max(\beta - d, q_0)$. Hence, whenever the condition at line 5 is true, we have $r' < \beta - d$, so that the addition at line 7 does not overflow. The second adjustment condition, at line 8, reduces the remainder to the proper range $0 \le r < d$.
• Otherwise, $\tilde r < 0$. Then $r' = \tilde r + \beta \ge \max(\beta - d, q_0 + 1)$. Since $r' > q_0$, the condition at line 5 is true, and since $r' \ge \beta - d$, the addition $(r' + d) \bmod \beta = r' + d - \beta = \tilde r + d$ yields a correct remainder in the proper range. The condition at line 8 is false.

Of the two adjustment conditions, the first one is inherently unpredictable, with a non-negligible probability for either outcome. This means that branch prediction will not be effective. For good performance, the first adjustment must be implemented in a branch-free fashion, e.g., using a conditional move instruction. The second condition, $r \ge d$, is true with very low probability (see Appendix A for analysis of this probability), and can be handled by a predicated branch or using conditional move.

IV. EXTENSIONS FOR SCHOOLBOOK DIVISION

The key idea in Alg. 4 can be applied to other small divisions, not just two-word divided by single word (which we call a 2/1 division).
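As a reference point before generalising, the 2/1 case can be written out in a few lines. The Python sketch below places Alg. 1 and Alg. 4 side by side; it is a model with unbounded integers and explicit masking, not the GMP implementation, and the function names and the `bits` parameter are assumptions of this example. A small word size such as `bits=5` makes exhaustive comparison against `divmod` feasible.

```python
def div2by1_simple(u1, u0, d, v, bits=64):
    """Alg. 1: full two-word candidate remainder, up to three corrections.
    Returns (q, r, number_of_corrections)."""
    beta = 1 << bits
    q = (v * u1) // beta + u1            # umulhi plus u1
    r = (u1 * beta + u0) - q * d         # needs the full product q*d (umul)
    corrections = 0
    while r >= d:                        # same as: r1 > 0 or r0 >= d
        q += 1
        r -= d
        corrections += 1
    return q, r, corrections


def div2by1(u1, u0, d, v, bits=64):
    """Alg. 4: single-word candidate remainder, disambiguated using q0."""
    beta = 1 << bits
    mask = beta - 1
    q = v * u1 + (u1 << bits) + u0       # <q1, q0> = (beta + v) * u1 + u0
    q1, q0 = q >> bits, q & mask
    q1 = (q1 + 1) & mask
    r = (u0 - q1 * d) & mask             # only the low product word (umullo)
    if r > q0:                           # first, unpredictable adjustment
        q1 = (q1 - 1) & mask
        r = (r + d) & mask
    if r >= d:                           # second, unlikely adjustment
        q1 += 1
        r -= d
    return q1, r
```

Both functions expect a normalised `d`, `u1 < d`, and `v` computed as in Eq. (1), for example `v = (beta * beta - 1) // d - beta`.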
This leads to a family of algorithms, all of which compute a quotient approximation by multiplication by a precomputed reciprocal, then omit computing the high, almost cancelling, part of the corresponding candidate remainder, and finally, they perform an adjustment step using a fraction associated with the quotient approximation.

We will focus on extensions that are useful for schoolbook division with a large divisor. The most important extension is 3/2-division, i.e., dividing a three-word number by a two-word number. This is described next. Later on in this section, we will also look into variations that produce more than one quotient word.

A. Dividing a three-word number by a two-word number

For schoolbook division with a large divisor, the simplest method is to compute one quotient word at a time by dividing the most significant two words of the dividend by the single most significant word of the divisor, which is a direct application of Alg. 4. Assuming the divisor is normalised, the resulting quotient approximation is at most two units too large. Next, the corresponding remainder candidate is computed and adjusted if necessary. A drawback with this method is that the probability of adjustment is significant, and that each adjustment has to do an addition or a subtraction of large numbers. To improve performance, it is preferable to compute a quotient approximation based on one more word of both dividend and divisor, three words divided by two words.

    $(q, \langle r_1, r_0 \rangle) \leftarrow \mathrm{DIV3BY2}(\langle u_2, u_1, u_0 \rangle, \langle d_1, d_0 \rangle, v)$
    In: $\beta/2 \le d_1 < \beta$, $\langle u_2, u_1 \rangle < \langle d_1, d_0 \rangle$, $v = \lfloor (\beta^3 - 1)/\langle d_1, d_0 \rangle \rfloor - \beta$
    1   $\langle q_1, q_0 \rangle \leftarrow v u_2$    // umul
    2   $\langle q_1, q_0 \rangle \leftarrow \langle q_1, q_0 \rangle + \langle u_2, u_1 \rangle$
    3   $r_1 \leftarrow (u_1 - q_1 d_1) \bmod \beta$    // umullo
    4   $\langle t_1, t_0 \rangle \leftarrow d_0 q_1$    // umul
    5   $\langle r_1, r_0 \rangle \leftarrow (\langle r_1, u_0 \rangle - \langle t_1, t_0 \rangle - \langle d_1, d_0 \rangle) \bmod \beta^2$
    6   $q_1 \leftarrow (q_1 + 1) \bmod \beta$
    7   if $r_1 \ge q_0$
    8       $q_1 \leftarrow (q_1 - 1) \bmod \beta$
    9       $\langle r_1, r_0 \rangle \leftarrow (\langle r_1, r_0 \rangle + \langle d_1, d_0 \rangle) \bmod \beta^2$
    10  if $\langle r_1, r_0 \rangle \ge \langle d_1, d_0 \rangle$    // Unlikely condition
    11      $q_1 \leftarrow q_1 + 1$
    12      $\langle r_1, r_0 \rangle \leftarrow \langle r_1, r_0 \rangle - \langle d_1, d_0 \rangle$
    13  return $q_1, \langle r_1, r_0 \rangle$
    Algorithm 5: Dividing a three-word number by a two-word number, using a precomputed single-word reciprocal.
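The steps of Alg. 5 can be traced with the following Python sketch. It is only a model: unbounded integers with explicit masking stand in for machine words, the reciprocal is expected to be computed directly from its definition $v = \lfloor (\beta^3 - 1)/D \rfloor - \beta$ rather than by Alg. 6, and the name `div3by2` and the `bits` parameter are illustrative.

```python
def div3by2(u2, u1, u0, d1, d0, v, bits=64):
    """Alg. 5: divide <u2,u1,u0> by <d1,d0>; returns (q, r1, r0).
    Requires beta/2 <= d1 < beta and <u2,u1> < <d1,d0>."""
    beta = 1 << bits
    mask = beta - 1
    mask2 = beta * beta - 1
    D = (d1 << bits) | d0
    q = v * u2 + (u2 << bits) + u1        # <q1, q0> = (beta + v) * u2 + u1
    q1, q0 = q >> bits, q & mask
    r1 = (u1 - q1 * d1) & mask            # low word only (umullo)
    t = d0 * q1                           # <t1, t0> (umul)
    r = (((r1 << bits) | u0) - t - D) & mask2
    q1 = (q1 + 1) & mask
    if (r >> bits) >= q0:                 # compare the high word r1 with q0
        q1 = (q1 - 1) & mask
        r = (r + D) & mask2
    if r >= D:                            # unlikely second adjustment
        q1 += 1
        r -= D
    return q1, r >> bits, r & mask
```

With a tiny word size (for instance `bits=3`, so $\beta = 8$) every admissible input can be enumerated and compared against `divmod`, which is a convenient check on the two adjustment conditions.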
With a normalised divisor, the quotient approximation is at most one off, and the probability of error is small. For more details on the schoolbook division algorithm, see [3, Sec. 4.3.1, Alg. D] and [4].

We therefore consider the following problem: Divide $\langle u_2, u_1, u_0 \rangle$ by $\langle d_1, d_0 \rangle$, computing the quotient $q$ and remainder $\langle r_1, r_0 \rangle$. To ensure that $q$ fits in a single word, we assume that $\langle u_2, u_1 \rangle < \langle d_1, d_0 \rangle$, and like for 2/1 division, we also assume that the divisor is normalised, $d_1 \ge \beta/2$.

Algorithm 5 is a new algorithm for 3/2 division. The adjustment condition at line 7 is inherently unpredictable, and should therefore be implemented in a branch-free fashion, while the second one, at line 10, is true with very low probability. The algorithm is similar in spirit to Alg. 4. The correctness of the algorithm follows from the following theorem.

Theorem 3: Consider the division of the three-word number $U = \langle u_2, u_1, u_0 \rangle$ by the two-word number $D = \langle d_1, d_0 \rangle$.
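Assume that $\beta/2 \le d_1 < \beta$ and $\langle u_2, u_1 \rangle < \langle d_1, d_0 \rangle$. Put

$$v = \left\lfloor \frac{\beta^3 - 1}{D} \right\rfloor - \beta,$$

which is in the range $0 \le v < \beta$. Form the two-word number

$$\langle q_1, q_0 \rangle = (\beta + v) u_2 + u_1.$$

Form the candidate quotient and remainder

$$\tilde q = q_1 + 1, \qquad \tilde r = \langle u_2, u_1, u_0 \rangle - \tilde q \langle d_1, d_0 \rangle.$$

Then $\tilde r$ satisfies

$$c - \beta^2 \le \tilde r < c \qquad \text{with} \qquad c = \max(\beta^2 - D,\ q_0 \beta).$$

Proof: We have $(\beta + v)D = \beta^3 - K$, for some $K$ in the range $1 \le K \le D$. Substitution gives

$$\tilde r = U - \tilde q D = \frac{u_2 K + u_1 (\beta^2 - D) + u_0 \beta + q_0 D}{\beta} - D.$$

The lower bounds $\tilde r \ge -D$ and $\tilde r > q_0\beta - \beta^2$ follow in the same way as in the proof of Theorem 2, proving the lower bound $\tilde r \ge c - \beta^2$.

For the upper bound, the borderline cases make the proof more involved. We need to consider several cases.

• If $u_2 \le d_1 - 1$, then

$$\tilde r < \frac{(d_1 - 1) D + (\beta - 1)(\beta^2 - D) + \beta^2 + q_0 D}{\beta} - D = \frac{(\beta^2 - D)^2 + q_0 \beta D - d_0 D}{\beta^2} = \frac{\beta^2 - D}{\beta^2}(\beta^2 - D) + \frac{D}{\beta^2}\, q_0 \beta - \frac{d_0 D}{\beta^2} \le c.$$

• If $u_2 = d_1$, then $u_1 \le d_0 - 1$, by assumption. In this case, we get

$$\tilde r < \frac{d_1 D + (d_0 - 1)(\beta^2 - D) + \beta^2 + q_0 D}{\beta} - D = \frac{\beta^2 - D}{\beta^2}(\beta^2 - D) + \frac{D}{\beta^2}\, q_0 \beta + \frac{(\beta - d_0)\left( (\beta + 1) D - \beta^3 \right)}{\beta^2} \le c + \frac{(\beta - d_0)\left( (\beta + 1) D - \beta^3 \right)}{\beta^2}.$$

Under the additional assumption that $D \le \beta(\beta - 1)$, we get $(\beta + 1) D - \beta^3 < 0$, and it follows that $\tilde r < c$.

• Finally, the remaining borderline case is $u_2 = d_1$ and $D > \beta(\beta - 1)$. We then have $u_2 = d_1 = \beta - 1$, $0 \le u_1 < d_0$, and $v = 0$ since $(\beta^3 - 1)/D - \beta < 1$. It follows that $q_1 = u_2 = \beta - 1$. We get

$$\tilde r = U - \beta D = (u_1 - d_0)\beta + u_0 < 0 < c.$$

Hence the upper bound $\tilde r < c$ is valid in all cases.

    $v \leftarrow \mathrm{RECIPROCAL\_WORD\_3BY2}(\langle d_1, d_0 \rangle)$
    In: $\beta/2 \le d_1 < \beta$
    1   $v \leftarrow \mathrm{RECIPROCAL\_WORD}(d_1)$    // We have $\beta^2 - d_1 \le (\beta + v) d_1 < \beta^2$.
    2   $p \leftarrow d_1 v \bmod \beta$    // umullo
    3   $p \leftarrow (p + d_0) \bmod \beta$
    4   if $p < d_0$    // Equivalent to carry out
    5       $v \leftarrow v - 1$
    6       if $p \ge d_1$
    7           $v \leftarrow v - 1$
    8           $p \leftarrow p - d_1$
    9       $p \leftarrow (p - d_1) \bmod \beta$    // We have $\beta^2 - d_1 \le (\beta + v) d_1 + d_0 < \beta^2$.
    10  $\langle t_1, t_0 \rangle \leftarrow v d_0$    // umul
    11  $p \leftarrow (p + t_1) \bmod \beta$
    12  if $p < t_1$    // Equivalent to carry out
    13      $v \leftarrow v - 1$
    14      if $\langle p, t_0 \rangle \ge \langle d_1, d_0 \rangle$
    15          $v \leftarrow v - 1$
    16  return $v$
    Algorithm 6: Computing the reciprocal which DIV3BY2 expects, $v = \lfloor (\beta^3 - 1)/\langle d_1, d_0 \rangle \rfloor - \beta$. This is a single-word reciprocal based on a two-word divisor.

B. Computing the reciprocal for 3/2 division

The reciprocal needed by Alg. 5, even though still a single word, is slightly different from the reciprocal that is needed by Alg. 4. One can use Alg. 2 or Alg. 3 (depending on word size) to compute the reciprocal of the most significant word $d_1$, followed by a couple of adjustment steps to take into account the least significant word $d_0$. We suggest the following strategy: Start with the initial reciprocal $v$, based on $d_1$ only, and the corresponding product $(\beta + v) d_1 \beta$, where only the middle word is represented explicitly (the high word is $\beta - 1$, and the low word is zero). We then add first $\beta d_0$ and then $v d_0$ to this product. For each addition, if we get a carry out, we cancel that carry by appropriate subtractions of $\beta d_1$ and $d_0$ to get an underflow. The details are given in Alg. 6.

Remark: The product $d_1 v \bmod \beta$, computed in line 2, may be available cheaply, without multiplication, from the intermediate values used in the final adjustment step of RECIPROCAL_WORD (Alg. 2 or Alg. 3).

C. Larger quotients

The basic algorithms for 2/1 division and 3/2 division can easily be extended in two ways.

• One can substitute double-words or other fixed-size units for the single words in Alg. 4 and Alg. 5. This way, one can construct efficient algorithms that produce quotients of two or more words. E.g., with double-word units, we get algorithms for division of sizes 4/2 and 6/4.
• In any of the algorithms constructed as above, one can fix one or more of the least significant words of both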
dividend and divisor to zero. This gives us algorithms for division of sizes such as 3/1 and 5/3 (and applying this procedure to 3/2 would recover the good old 2/1 division).

Details and applications for some of these variants are described in [4].

V. CASE STUDY: X86_64 IMPLEMENTATION OF n/1 DIVISION

Schoolbook division is the main application of 3/2 division, as was described briefly in the previous section. We now turn to a more direct application of 2/1 division using Alg. 4.

In this section, we describe our implementation of DIV_NBY1, dividing a large number by a single-word number, for current processors in the x86_64 family. We use conditional move (cmov) to avoid branches that are difficult to handle efficiently by branch prediction. Besides cmov, the most crucial instructions used are mul, imul, add, adc, sub and lea. Detailed latency and throughput measurements of these instructions, for 32-bit and 64-bit processors in the x86 family, are given in [5].

We discuss the timing only for AMD Opteron ("K8/K9") and Intel Core 2 (65 nm "Conroe") in this section. The AMD Opteron results are valid also for processors with the brand names Athlon and Phenom.² Other recent Intel processors give results slightly different from the 65 nm Core 2 results we describe.³ Our results focus mainly on AMD chips since they are better optimised for scientific integer operations, i.e., the ones we depend on. If we don't specify host architecture, we are talking about AMD Opteron.

    $(Q, r) \leftarrow \mathrm{DIV\_NBY1}(U, d)$
    In: $U = \langle u_{n-1} \ldots u_0 \rangle$, $\beta/2 \le d < \beta$
    Out: $Q = \langle q_{n-1} \ldots q_0 \rangle$
    1   $v \leftarrow \mathrm{RECIPROCAL\_WORD}(d)$
    2   $r \leftarrow 0$
    3   for $j = n - 1, \ldots, 0$
    4       $(q_j, r) \leftarrow \mathrm{DIV2BY1}(\langle r, u_j \rangle, d, v)$
    5   return $Q, r$
    Algorithm 7: Dividing a large integer $U = \langle u_{n-1} \ldots u_0 \rangle$ by a normalised single-word integer $d$.

A. Dividing a large integer by a single word

Consider division of an $n$-word number $U$ by a single-word number $d$. The result of the division is an $n$-word quotient and a single-word remainder.
This can be implemented by repeatedly replacing the two most significant words of $U$ by their single-word remainder modulo $d$, and recording the corresponding quotient word [3, Sec. 4.3.1, exercise 16]. The variant shown in Alg. 7 computes a reciprocal of $d$ (and hence requires that $d$ is normalised), and applies our new 2/1 division algorithm in each step.

To use Alg. 7 directly, $d$ must be normalised. To also handle unnormalised divisors, we select a shift count $k$ such that $\beta/2 \le 2^k d < \beta$. Alg. 7 can then be applied to the shifted operands $2^k U$ and $2^k d$. The quotient is unchanged by this transformation, while the resulting remainder has to be shifted $k$ bits right at the end. Shifting of $U$ can be done on the fly in the main loop. In the code examples, register cl holds the normalisation shift count $k$.

² Phenom has the same multiplication latencies, but slightly higher(!) latency for division.
³ The 45 nm Core 2 has somewhat lower division latency, and the same multiplication latencies. The Core ix processors (x = 3, 5, 7, 9) have lower division latency, and for umul, they have lower latency for the low product word, but higher(!) latency for the high product word.

    loop:   mov     (np, un, 8), %rax
            div     d
            mov     %rax, (qp, un, 8)
            dec     un
            jnz     loop
    Example 1: Basic division loop using the div instruction, running at 71 cycles per iteration on AMD Opteron, and 116 cycles on Intel Core 2. Note that rax and rdx are implicit input and output arguments to the div instruction.

B. Naïve implementation

The main loop of an implementation in x86_64 assembler is shown in Example 1. Note that the div instruction in the x86 family appears to be tailor-made for this loop: This instruction takes a divisor as the explicit argument. The two-word input dividend is placed with the most significant word in the rdx register and the least significant word in the rax register. The output quotient is produced in rax and the remainder in rdx. No other instruction in the loop needs to touch rdx as the remainder is produced by each iteration and consumed in the next.
However, the dependency between iterations, via the remainder in rdx, means that the execution time is lower bounded by the latency of the div instruction, which is 71 cycles on AMD Opteron [5] (and even longer, 116 cycles, on Intel Core 2). Thanks to parallelism and out-of-order execution, the rest of the instructions are executed while waiting for the result from the division. This loop is more than an order of magnitude slower than the loop for multiplying a large number by a single-word number.

C. Old division method

The earlier division method from [1] can be implemented with the main loop in Example 2. The dependency between operations, via the rax register, is still crucial to understand the performance. Consider the sequence of dependent instructions in the loop, from the first use of rax until the output value of the iteration is produced. This is what we call the recurrency chain of the loop.

The assembler listing is annotated with cycle numbers, for AMD Opteron and Intel Core 2. We let cycle 0 be the cycle when the first instruction on the recurrency chain starts executing, and the following instructions in the chain are annotated with the cycle number of the earliest cycle the
instruction can start executing, taking its input dependencies into account. To create the annotations, one needs to know the latencies of the instructions. Most arithmetic instructions, including cmov and lea, have a latency of one cycle. The crucial mul instruction has a latency of four cycles until the low word of the product is available in rax, and one more cycle until the high word is available in rdx. The imul instruction, which produces the low half only, also has a latency of four cycles. These numbers are for AMD; the latencies are slightly longer on Intel Core 2 (2 cycles for adc and cmov, 5 cycles for imul and 8 for mul). See [5] for extensive empirical timing data.

Using these latency figures, we find that the latency of the recurrency chain in Example 2 is 15 cycles. This is a lower bound on the execution time. It turns out that the loop runs in 17 cycles per iteration; the instructions not on the recurrency chain are mostly scheduled for execution in parallel with the recurrency instructions, and there's plenty of time, 8 cycles, when the CPU is otherwise just waiting for the results from the multiplication unit. This is a four-fold speedup compared to the 71-cycle loop based on the div instruction. For Intel Core 2, the latency of the recurrency chain is 28 cycles, while the actual running time is 32 cycles per iteration.

    AMD Intel
              loop:   mov     (up,un,8), %rdx
                      shld    %cl, %rdx, %r14
                      lea     (d,%r14), %r12
                      bt      $63, %r14
                      cmovnc  %r14, %r
                      mov     %rax, %r
                      adc     $0, %rax
      1    2          mul     dinv
      5   10          add     %r12, %rax
                      mov     d, %rax
      6   11          adc     %r10, %rdx
      7   13          not     %rdx
      8   14          mov     %rdx, %r
                      mul     %rdx
                      add     %rax, %r
                      adc     %rdx, %r
                      sub     d, %r
                      lea     (d,%r14), %rax
                      cmovnc  %r14, %rax
                      sub     %r12, %r10
                      mov     (up,un,8), %r14
                      mov     %r10, 8(qp,un,8)
                      dec     un
                      jnz     loop
    Example 2: Previous method using a precomputed reciprocal, running at 17 cycles per iteration on AMD Opteron, and 32 cycles on Intel Core 2.

D. New division method

The main loop of an implementation of the new division method is given in Example 3.
Annotating the listing with cycle numbers in the same way, we see that the latency of the recurrency chain is 13 cycles. Note that the rarely taken branch does not belong to the recurrency chain. The loop actually also runs at 13 cycles per iteration; all the remaining instructions are scheduled for execution in parallel with the recurrency chain.⁴ For Intel Core 2, the latency of the recurrency chain is 20 cycles, with an actual running time of 25 cycles per iteration.

    AMD Intel
              loop:   nop
                      mov     (up,un,8), %r
                      lea     1(%rax), %r11
                      shld    %cl, %r10, %rbp
      0    0          mul     dinv
      4    8          add     %rbp, %rax
      5    9          adc     %r11, %rdx
                      mov     %rax, %r11
                      mov     %rdx, %r
                      imul    d, %rdx
                      sub     %rdx, %rbp
                      mov     d, %rax
                      add     %rbp, %rax
                      cmp     %r11, %rbp
                      cmovb   %rbp, %rax
                      adc     $-1, %r13
                      cmp     d, %rax
                      jae     fix
              ok:     mov     %r13, (qp)
                      sub     $8, qp
                      dec     un
                      mov     %r10, %rbp
                      jnz     loop
                      jmp     done
              fix:    sub     d, %rax
                      inc     %r13
                      jmp     ok
              done:
    Example 3: Division code (from GMP 4.3) with the new division method, based on Alg. 4. Running at 13 cycles per iteration on AMD Opteron, and 25 cycles on Intel Core 2.

⁴ It's curious that if the nop instruction at the top of the loop is removed, the loop runs one cycle slower. It seems likely that similar random changes to the instruction sequence in Example 2 can reduce its running time by one or even two cycles, to reach the lower bound of 15 cycles.

Comparing the old and the new method, first make the assumption (which is conservative in the Opteron case) that all the loops can be tuned to get their running times down to the respective latency bounds. We then get a speedup of 15% on AMD Opteron and 40% on Intel Core 2. If we instead compare actual cycle counts, we see a speedup of 31% on both Opteron and Core 2. On Opteron, we gain one cycle from replacing one of the mul instructions by the faster imul; the other cycle shaved off the recurrency chain is due to the simpler adjustment conditions.

In this application, the code runs slower on Intel Core 2 than on AMD Opteron. The Intel CPU loses some cycles due
to higher latencies for multiplication and carry propagation, resulting in a higher overall latency of the recurrency chain. It then loses some additional cycles because the code was written and scheduled with Opteron in mind.

TABLE I
SUMMARY OF THE LATENCY OF THE RECURRENCY CHAIN, AND ACTUAL CYCLE COUNTS, FOR TWO X86_64 PROCESSORS. THE LATENCY NUMBERS ARE LOWER BOUNDS FOR THE ACTUAL CYCLE COUNTS.

    Implementation            AMD Opteron              Intel Core 2
                              latency   real cycles    latency   real cycles
    Naïve div loop (Ex. 1)              71
    Old method (Ex. 2)        15        17             28        32
    New method (Ex. 3)        13        13             20        25

VI. CONCLUSIONS

We have described and analysed a new algorithm for dividing a two-word number by a single-word number ("2/1" division). The key idea is that when computing a candidate remainder where the most significant word almost cancels, we omit computing the most significant word. To enable correct adjustment of the quotient and the remainder, we work with a slightly more precise quotient approximation than in previous algorithms, and an associated fractional word.

Like previous methods, we compute the quotient via an approximate reciprocal of the divisor. We describe new, more efficient, algorithms for computing this reciprocal for the most common cases of a word size of 32 or 64 bits.

The new algorithm for 2/1 division directly gives a speedup of roughly 30% on current processors in the x86_64 family, for the application of dividing a large integer by a single word. It is curious that on these processors, the combination of our reciprocal algorithm (Alg. 2) and division algorithm (Alg. 4) is significantly faster than the built-in assembler instruction for 2/1 division. This indicates that the algorithms may be of interest for implementation in CPU microcode.

We have also described a couple of extensions of the basic algorithm, primarily to enable more efficient schoolbook division with a large divisor. Most of the algorithms we describe have been implemented in the GMP library [2].
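The 2/1 division summarised in these conclusions can be sketched in C. This is a minimal illustration, not the GMP code. It assumes a 32-bit word (β = 2^32) so that the full product fits in a uint64_t, it takes the reciprocal to be v = ⌊(β² − 1)/d⌋ − β, and it assumes Alg. 4 has the shape referred to in this paper: one full multiplication, one low multiplication, and two adjustment steps. The names reciprocal32 and div_2by1 are hypothetical, and the reciprocal is obtained here with an ordinary 64-bit division rather than with Alg. 2.

```c
#include <assert.h>
#include <stdint.h>

/* Reciprocal v = floor((B^2 - 1) / d) - B with B = 2^32.
   Assumption: d is normalized, i.e. its most significant bit is set. */
static uint32_t reciprocal32(uint32_t d) {
    assert(d >= 0x80000000u);
    return (uint32_t)(UINT64_MAX / d - ((uint64_t)1 << 32));
}

/* Divide the two-word number <u1,u0> by d, given v = reciprocal32(d).
   Requires u1 < d, so that the quotient fits in one word.
   Returns the quotient and stores the remainder in *rem. */
static uint32_t div_2by1(uint32_t u1, uint32_t u0, uint32_t d, uint32_t v,
                         uint32_t *rem) {
    assert(d >= 0x80000000u && u1 < d);

    uint64_t q = (uint64_t)v * u1;            /* full multiplication (umul) */
    q += ((uint64_t)u1 << 32) | u0;           /* add <u1,u0>; cannot overflow */

    uint32_t q1 = (uint32_t)(q >> 32) + 1;    /* candidate quotient, mod B */
    uint32_t q0 = (uint32_t)q;                /* fractional word */
    uint32_t r  = u0 - q1 * d;                /* low multiplication (umullo), mod B */

    if (r > q0) {                             /* first adjustment */
        q1 -= 1;
        r += d;
    }
    if (r >= d) {                             /* second, unlikely adjustment */
        q1 += 1;
        r -= d;
    }
    *rem = r;
    return q1;
}
```

With a normalized divisor such as d = 0x80000001, one would call reciprocal32(d) once and then div_2by1(u1, u0, d, v, &r) for each two-word dividend with u1 < d; the result should agree with ordinary 64-bit division of (u1 << 32 | u0) by d.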
ACKNOWLEDGEMENTS

The authors wish to thank Stephan Tolksdorf, Björn Terelius, David Harvey and Johan Håstad for valuable feedback on draft versions of this paper. As always, the responsibility for any remaining errors stays with the authors.

REFERENCES

[1] T. Granlund and P. L. Montgomery, "Division by invariant integers using multiplication," in Proceedings of the SIGPLAN PLDI'94 Conference, June 1994.
[2] T. Granlund, "GNU multiple precision arithmetic library, version 4.3," May 2009.
[3] D. E. Knuth, Seminumerical Algorithms, 3rd ed., ser. The Art of Computer Programming. Reading, Massachusetts: Addison-Wesley, 1998, vol. 2.
[4] T. Granlund and N. Möller, "Division of integers large and small," August 2009, to appear.
[5] T. Granlund, "Instruction latencies and throughput for AMD and Intel x86 processors," 2009, tege/x86timing.pdf.

APPENDIX A
PROBABILITY OF THE SECOND ADJUSTMENT STEP

In this appendix, we analyse the probability of the second adjustment step (line 8 in Alg. 4), and substantiate our claim that the second adjustment is unlikely. We use the notation from Sec. III-B. We also use the notation that P[event] is the probability of a given event, and E[X] is the expected value of a random variable X.

We will treat $\tilde r$ as a random variable, but we first need to investigate for which values of $\tilde r$ the second adjustment step is done. There are two cases:

If $\tilde r \ge d$, then $\tilde r < \max(\beta - d, q_0)$ and $d \ge \beta - d$ imply that $\tilde r < q_0$. The first adjustment is skipped, the second is done.

If $\tilde r > q_0$, then $\tilde r < \max(\beta - d, q_0)$ implies that $\tilde r < \beta - d$ and $\tilde r + d < \beta$. The first adjustment is done, then undone by the second adjustment.

The inequalities $\tilde r \ge d$ and $\tilde r \ge q_0$ are thus mutually exclusive, the former possible only when $q_0 > d$ and the latter possible only when $q_0 < \beta - d$. One example of each kind, for $\beta = 2^5 = 32$:

U q r v k q q 0 r

To find the probabilities, in this section, we treat $\tilde r$ as a random variable. Consider the expression for $\tilde r$,

$$\tilde r = \frac{u_1 k + u_0 (\beta - d) + q_0 d}{\beta} - d.$$
We assume we have a fixed $d = \xi\beta$, with $1/2 \le \xi < 1$, and consider $u_1$ and $u_0$ as independent uniformly distributed random variables in the ranges $0 \le u_1 < d$ and $0 \le u_0 < \beta$. We also make the simplifying assumptions that $k$ and $q_0$ are independent and uniformly distributed, in the ranges $0 < k \le d$ and $0 \le q_0 < \beta$, and that all these variables are continuous rather than integer-valued.⁵

⁵ These assumptions are justified for large word size. Strictly speaking, with fixed $d$, the variable $k$ is of course not random at all. To make this argument strict, we would have to treat $d$ as a random variable with values in a small range around $\xi\beta$, e.g., uniformly distributed in the range $\xi\beta \pm \beta^{3/4}$, and consider the limit as $\beta \to \infty$. Then the modulo operations involved in $q_0$ and $k$ make these variables behave as almost independent and uniformly distributed.

Lemma 4: Assume that $1/2 \le \xi < 1$, and that $u_1$, $u_0$, $k$ and $q_0$ are independent random variables, continuously and uniformly distributed with ranges $0 \le u_1, k \le \xi$ and $0 \le u_0, q_0 \le 1$. Let

$$\tilde r = u_1 k + u_0 (1 - \xi) + q_0 \xi - \xi.$$

Then

$$P[\tilde r \ge \xi \text{ or } \tilde r \ge q_0] = \frac{(2 - 1/\xi)^3}{6(1 - \xi)^2} \log\frac{2 - 1/\xi}{\xi} + \frac{2\xi^4 - 14\xi^3 + 51\xi^2 - 44\xi + 11}{36\xi^3}. \qquad (11)$$

Furthermore, if we define

f(ξ) = (1 ξ) (1 ξ)2 17 (1 ξ)3 4

then

$$P[\tilde r \ge \xi \text{ or } \tilde r \ge q_0] \approx \frac{(1 - \xi)^6}{24 f(\xi)} \qquad (12)$$

with an absolute error less than 0.01 percentage points, and a relative error less than 5%.

Proof: Define the stochastic variables

$$X = \frac{u_1 k}{\xi^2}, \qquad R = \frac{u_1 k + u_0 (1 - \xi)}{\xi}, \qquad Q = q_0.$$

Now,

$$\frac{\tilde r}{\xi} = R + Q - 1.$$

By assumption, $Q$ is uniformly distributed, while $R$ has a more complicated distribution. Conditioning on $Q = s$, we get the probabilities

$$P[\tilde r \ge \xi] = \int_{3 - \xi - 1/\xi}^{1} P[R \ge 2 - s]\,ds = \int_0^{\xi + 1/\xi - 2} P[R \ge 1 + s]\,ds,$$

$$P[\tilde r \ge q_0] = \int_0^{1 - \xi} P[R \ge 1 + (1/\xi - 1)s]\,ds = \frac{1}{1/\xi - 1} \int_0^{\xi + 1/\xi - 2} P[R \ge 1 + s]\,ds.$$

Adding the probabilities (recall that the events are mutually exclusive), we get the probability of adjustment as

$$\frac{1}{1 - \xi} \int_0^{\xi + 1/\xi - 2} P[R \ge 1 + s]\,ds. \qquad (13)$$

We next need the probabilities $P[R \ge s]$ for $1 \le s \le \xi + 1/\xi - 1$. By somewhat tedious calculations, we find

$$P[X \le s] = s\,(1 - \log s),$$

$$P[R \ge s] = \frac{\xi}{1 - \xi}\, E\big[\max\big(0,\ \xi X - (s - (1/\xi - 1))\big)\big] = -\frac{(s + 1 - 1/\xi)^2}{2(1 - \xi)} \log\frac{s + 1 - 1/\xi}{\xi} + \frac{\xi^2 - 4\xi(s + 1 - 1/\xi) + 3(s + 1 - 1/\xi)^2}{4(1 - \xi)},$$

where the latter equation is valid only for $s$ in the interval of interest. Substituting in Eq. (13) and integrating yields Eq. (11).

To approximate this complicated expression, we first derive its asymptotics:

$$\frac{(1 - \xi)^6}{24} + O\big((1 - \xi)^7\big)$$

for $\xi$ close to 1, and

$$\frac{1}{36} - \frac{13}{18}\Big(\xi - \frac12\Big) + \frac{34}{3}\Big(\xi - \frac12\Big)^2 + O\Big(\Big(\xi - \frac12\Big)^3 \log\Big(\xi - \frac12\Big)\Big)$$

for $\xi$ close to 1/2. The coefficients of $f$ are chosen to give the same asymptotics. The error bounds for Eq. (12) are found numerically.

Fig. 1. Probability of the unlikely adjustment step, in percent, as a function of the ratio $\xi = d/\beta$.

In Fig. 1, the adjustment probability of Eq. (11) is plotted as a function of the ratio $\xi = d/\beta$. This is a rapidly decreasing function, with maximum value for $\xi = 1/2$, which gives the worst-case probability of 1/36 for $d$ close to $\beta/2$. This curve is based on the assumptions on continuity and independence of the random variables. For a fixed $d$ and word size, the adjustment probability for random $u_1$ and $u_0$ will deviate somewhat from this continuous curve. In particular, the borderline case $d = \beta/2$ actually gives an adjustment probability of zero, so it is not the worst case.
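The adjustment probability analysed in this appendix can also be examined empirically for a small word size. The sketch below is an illustration only, under stated assumptions: a toy 8-bit word (β = 256), the reciprocal v = ⌊(β² − 1)/d⌋ − β, and a division step of the same shape as in Alg. 4 (candidate quotient, first adjustment when r > q0, second adjustment when r ≥ d). The name second_adjustments is hypothetical. For a given normalized d it runs through every dividend with u1 < d, counts how often the second adjustment is taken, and checks each quotient and remainder against ordinary division.

```c
#include <assert.h>
#include <stdint.h>

#define B 256u   /* toy word base, beta = 2^8 (an assumption of this sketch) */

/* Count the dividends <u1,u0>, u1 < d, for which the second adjustment
   step is taken. Returns -1 if any quotient or remainder is wrong. */
static long second_adjustments(uint32_t d) {
    assert(d >= B / 2 && d < B);              /* d must be normalized */
    uint32_t v = (B * B - 1) / d - B;         /* reciprocal of d */
    long hits = 0;

    for (uint32_t u1 = 0; u1 < d; u1++) {
        for (uint32_t u0 = 0; u0 < B; u0++) {
            uint32_t U  = u1 * B + u0;        /* the two-word dividend */
            uint32_t q  = v * u1 + U;         /* <q1,q0> before the increment */
            uint32_t q1 = (q / B + 1) % B;    /* candidate quotient, mod B */
            uint32_t q0 = q % B;
            uint32_t r  = (u0 - q1 * d) % B;  /* candidate remainder, mod B */

            if (r > q0) {                     /* first adjustment */
                q1 = (q1 + B - 1) % B;
                r = (r + d) % B;
            }
            if (r >= d) {                     /* second adjustment */
                q1 = (q1 + 1) % B;
                r -= d;
                hits++;
            }
            if (q1 != U / d || r != U % d)
                return -1;
        }
    }
    return hits;
}
```

Dividing the returned count by d · 256 gives an empirical probability for that divisor, which can be set against the continuous curve of Fig. 1. For d = 128 the count should be zero, in line with the remark on the borderline case d = β/2.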
More informationOptimal Energy Commitments with Storage and Intermittent Supply
Submitte to Operations Research manuscript OPRE200909406 Optimal Energy Commitments with Storage an Intermittent Supply Jae Ho Kim Department of Electrical Engineering, Princeton University, Princeton,
More informationThe most common model to support workforce management of telephone call centers is
Designing a Call Center with Impatient Customers O. Garnett A. Manelbaum M. Reiman Davison Faculty of Inustrial Engineering an Management, Technion, Haifa 32000, Israel Davison Faculty of Inustrial Engineering
More informationOptimal Control Of Production Inventory Systems With Deteriorating Items And Dynamic Costs
Applie Mathematics ENotes, 8(2008), 194202 c ISSN 16072510 Available free at mirror sites of http://www.math.nthu.eu.tw/ amen/ Optimal Control Of Prouction Inventory Systems With Deteriorating Items
More information