Multiplication Algorithms for Radix-2 RN-Codings and Two's Complement Numbers

Jean-Luc Beuchat
Projet Arénaire, LIP, ENS Lyon
46, Allée d'Italie, F-69364 Lyon Cedex 07
jean-luc.beuchat@ens-lyon.fr

Jean-Michel Muller
Projet Arénaire, LIP, ENS Lyon
46, Allée d'Italie, F-69364 Lyon Cedex 07
jean-michel.muller@ens-lyon.fr

Abstract

The RN-codings, where "RN" stands for "Round to Nearest", are particular cases of signed-digit representations, for which rounding to nearest is always identical to truncation. In radix 2, Booth recoding is an RN-coding. In this paper, we suggest several multiplication algorithms able to handle RN-codings, and we analyze their properties.

1. Introduction

Kornerup and Muller recently introduced the notion of RN-coding [7], where RN stands for "Round to Nearest". An RN-coding is a radix-β signed-digit representation for which truncating is always equivalent to rounding to nearest. For example, in radix 10, the RN-coding of π starts with $3.142\bar{4}\bar{1}3\bar{3}\bar{5}4\bar{4}\bar{1}0\bar{2}\bar{1}3\ldots$ where (as usual) $\bar{4}$ means $-4$. An interesting property of these recodings is that, when we use them, there is no phenomenon of double rounding.

In this paper, we wish to present some arithmetic operators for these RN-codings. More precisely, our goal is to add the smallest amount of hardware to a standard arithmetic and logic unit (ALU) so that it can efficiently handle both two's complement numbers and RN-codings. Before going further, let us give some definitions.

Definition 1. Let X be a radix-β number. We define $\circ_n(X)$ as X rounded to the nearest n-digit radix-β number.

Example 1. Consider the radix-0 number X =.93454. We have 8 (X) =.9345, 6 (X) =.935, 4 (X) =.9, and 3 (X) =.3.

Definition 2 (RN-codings [7]). Let β be an integer greater than or equal to 2. The digit sequence $D = d_n d_{n-1} d_{n-2} d_{n-3} \ldots d_0 . d_{-1} d_{-2} \ldots$ (with $-\beta + 1 \le d_i \le \beta - 1$) is a radix-β RN-coding of X if $X = \sum_{i=-\infty}^{n} d_i \beta^i$ and if, for any j, truncating D to its j most significant digits yields $\circ_j(X)$.

Theorem 1 (Characterizations of RN-codings [7]). Let β be an integer greater than or equal to 2. If β is odd, then $D = d_n d_{n-1} d_{n-2} d_{n-3} \ldots d_0 . d_{-1} d_{-2} \ldots$ is an RN-coding if and only if all digits have absolute value less than or equal to $(\beta - 1)/2$.
If β is even, then $D = d_n d_{n-1} d_{n-2} d_{n-3} \ldots d_0 . d_{-1} d_{-2} \ldots$ is an RN-coding if and only if
1. all digits have absolute value less than or equal to $\beta/2$;
2. if $|d_i| = \beta/2$, then the first non-zero digit that follows on the right has an opposite sign, that is, the largest $j < i$ such that $d_j \neq 0$ satisfies $d_i \times d_j < 0$.

Example 2. The well-known Booth recoding [4] is a radix-2 RN-coding. Each number is written on the digit set $\{-1, 0, 1\}$ and the rightmost non-zero digit is $\bar{1}$. Each number has therefore a single finite representation [7]. Algorithm 1 summarizes the conversion of a two's complement number, where each output digit is deduced from a constant-sized sliding window of input digits. Consider the $(n+1)$-bit two's complement number $X = 2^{n-1} = (010\ldots0)_2$. Algorithm 1 returns the $(n+1)$-digit number $Y = 2^n - 2^{n-1} = (1\bar{1}0\ldots0)_2$, whereas the n-digit number $Z = 2^{n-1} = (10\ldots0)_2$ is also an RN-coding of X. If the $(n+1)$-bit numbers we have to convert belong to the set $\{-2^{n-1}, \ldots, 2^{n-1}\}$, it is therefore preferable to apply Algorithm 2, which guarantees that the result is an n-digit RN-coding [3].

1063-6862/05 $20.00 © 2005 IEEE

In the following, we will represent a radix-2 RN-coding X with a modified borrow-save encoding [1]. We define

two bit-strings $X^+$ and $X^-$ such that

$$\begin{cases} X = X^+ - X^-, \\ \text{if } x_i^{\pm} = 1 \text{ then } x_i^{\mp} = 0. \end{cases} \qquad (1)$$

Algorithm 1. Booth recoding of a two's complement number.
Require: An $(n+1)$-bit two's complement number $X \in \{-2^n, \ldots, 2^n - 1\}$ with $x_{-1} = 0$
Ensure: An $(n+1)$-digit radix-2 RN-coding Y such that $X = Y$
1: for $i = 0$ to $n$ do
2:   $y_i \leftarrow x_{i-1} - x_i$;
3: end for

Algorithm 2. Two's complement to radix-2 RN-coding conversion.
Require: An $(n+1)$-bit two's complement number $X \in \{-2^{n-1}, \ldots, 2^{n-1}\}$ with $x_{-1} = 0$
Ensure: An n-digit radix-2 RN-coding Y such that $X = Y$
1: for $i = 0$ to $n-2$ do
2:   $y_i \leftarrow x_{i-1} - x_i$;
3: end for
4: $y_{n-1} \leftarrow \bar{x}_n (x_{n-1} \vee x_{n-2}) - x_n x_{n-1} \bar{x}_{n-2}$;

This paper, devoted to the multiplication of radix-2 RN-codings, is organized as follows: Section 2 describes an improvement of a multiplication algorithm we have introduced previously. Its major drawback is that it does not share many common resources with a conventional multiplier. In Section 3, we show how to very slightly adapt a conventional multiplier so that it can also handle RN-codings. Note that an extended version including proofs of properties and algorithms is available as LIP Research Report 2005-05 [2].

2. Improvement of the Multiplication Algorithm Proposed in [3]

Our first multiplication algorithm, published in [3], consists in computing

$$XY = X^+Y^+ + X^-Y^- - X^+Y^- - X^-Y^+$$

in two's complement and in converting the result to radix-2 RN-coding. According to Definition 2 and Equation (1), neither $X^+$ nor $X^-$ contains a string of ones whose length is greater than one (i.e. if $x_i^{\pm} = 1$, then $x_{i+1}^{\pm} = x_{i-1}^{\pm} = 0$). This property reduces the number of partial products involved in the computation of $X^+Y^+$, $X^-Y^-$, $X^+Y^-$, and $X^-Y^+$ by half at the price of OR gates (Figure 1a).

Figure 2a depicts the architecture of the multiplier proposed in [3]. Four Partial Product Generators (PPGs) taking advantage of the above-mentioned property and four carry-save adder trees compute in parallel $X^+Y^+$, $X^-Y^-$, $X^+Y^-$, and $X^-Y^+$. Then, two adders based on (4, 2)-compressors and a subtracter generate the carry-save form of the product $P = XY$. The last step consists in converting this intermediate result to radix-2 RN-coding.
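The two conversion algorithms and the borrow-save split of Equation (1) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names (`booth_recode`, `to_rn_coding`, `borrow_save`, `is_rn_coding`) are hypothetical, digits are stored least significant first, and the last digit of `to_rn_coding` assumes the reading $y_{n-1} = \bar{x}_n(x_{n-1} \vee x_{n-2}) - x_n x_{n-1}\bar{x}_{n-2}$ of step 4 of Algorithm 2.

```python
def booth_recode(x, n):
    """Algorithm 1 (sketch): x is an (n+1)-bit two's complement integer.
    Returns the digits y_0..y_n (least significant first), each in {-1, 0, 1}."""
    digits, prev = [], 0                  # prev holds x_{i-1}; x_{-1} = 0
    for i in range(n + 1):
        bit = (x >> i) & 1                # x_i (Python's >> sign-extends negatives)
        digits.append(prev - bit)         # y_i = x_{i-1} - x_i
        prev = bit
    return digits


def to_rn_coding(x, n):
    """Algorithm 2 (sketch): x in [-2^(n-1), 2^(n-1)], given on n+1 bits.
    Returns the n digits y_0..y_{n-1} (least significant first)."""
    assert -(1 << (n - 1)) <= x <= (1 << (n - 1))
    bit = lambda i: (x >> i) & 1 if i >= 0 else 0
    digits = booth_recode(x, n)[: n - 1]  # y_0 .. y_{n-2} as in Algorithm 1
    xn, xn1, xn2 = bit(n), bit(n - 1), bit(n - 2)
    top = (1 - xn) * (xn1 | xn2) - xn * xn1 * (1 - xn2)  # assumed step 4
    return digits + [top]


def value(digits):
    """Value of a signed-digit string given least significant digit first."""
    return sum(d * 2**i for i, d in enumerate(digits))


def borrow_save(digits):
    """Split a digit string into the bit-strings (X+, X-) of Equation (1)."""
    return [int(d == 1) for d in digits], [int(d == -1) for d in digits]


def is_rn_coding(digits):
    """Radix-2 characterization: consecutive non-zero digits have opposite signs."""
    nonzero = [d for d in digits if d != 0]
    return all(a == -b for a, b in zip(nonzero, nonzero[1:]))
```

For instance, with n = 3 and X = 4, `booth_recode(4, 3)` needs four digits while `to_rn_coding(4, 3)` needs only three, which mirrors the remark of Example 2.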
Since the maximal absolute value of an n-digit radix-2 RN-coding X is $2^{n-1}$ [3], the product XY belongs to $\{-2^{2n-2}, \ldots, 2^{2n-2}\}$. We use a fast adder to compute the two's complement representation of P and apply Algorithm 2 to generate a $(2n-1)$-digit radix-2 RN-coding.

Consider now the addition of $X^+Y^+$ and $X^-Y^-$. The algorithm described in [3] computes

$$(x_0^+ y_1^+ + x_1^+ y_0^+) + (x_0^- y_1^- + x_1^- y_0^-) = (x_0^+ y_1^+ \vee x_1^+ y_0^+) + (x_0^- y_1^- \vee x_1^- y_0^-)$$

with carry-save adders. This addition generates a carry iff $x_0^+ = y_1^+ = x_1^- = y_0^- = 1$ or $x_1^+ = y_0^+ = x_0^- = y_1^- = 1$. In both cases, the definition of the radix-2 RN-coding guarantees that $(x_0^+ y_2^+ + x_1^+ y_1^+) + (x_0^- y_2^- + x_1^- y_1^-) = 0$. Consequently, the jth partial products of $X^+Y^+$ and $X^-Y^-$ can be added without carry propagation (Figure 1b). In the following, we respectively denote the carry bit and the sum bit of $(x_i^+ y_j^+ + x_{i+1}^+ y_{j-1}^+) + (x_i^- y_j^- + x_{i+1}^- y_{j-1}^-)$ by $\psi(i, j)$ and $\varphi(i, j)$. Applying this notation to our example, we obtain:

$$(x_0^+ y_1^+ + x_1^+ y_0^+) + (x_0^- y_1^- + x_1^- y_0^-) = 2\psi(0, 1) + \varphi(0, 1),$$
$$(x_0^+ y_2^+ + x_1^+ y_1^+) + (x_0^- y_2^- + x_1^- y_1^-) = 2\psi(0, 2) + \varphi(0, 2),$$

and $\psi(0, 1) + \varphi(0, 2) = \psi(0, 1) \vee \varphi(0, 2)$.

The improvement proposed here is based on such properties: instead of adding $X^+Y^+$ to $X^-Y^-$, we combine their respective partial products in order to compute $(X^+Y^+ + X^-Y^-)$ with a single $\lceil n/2 \rceil$-operand carry-save adder. Theorem 2 and Algorithm 3 describe this partial product generation process more precisely. Compared to the scheme described in [3], our new approach only requires half-adder (HA) cells and OR gates, thus adding an XOR gate and an OR gate in the critical path.

Theorem 2. Let X and Y be two radix-2 RN-codings. Consider two functions $\varphi(i, j)$ and $\psi(i, j)$ defined as follows:

$$\varphi(i, j) = (x_i^+ y_j^+ \vee x_{i+1}^+ y_{j-1}^+) \oplus (x_i^- y_j^- \vee x_{i+1}^- y_{j-1}^-) \qquad (2)$$

and

$$\psi(i, j) = (x_i^+ y_j^+ \vee x_{i+1}^+ y_{j-1}^+)(x_i^- y_j^- \vee x_{i+1}^- y_{j-1}^-) = x_i^+ y_j^+ x_{i+1}^- y_{j-1}^- \vee x_i^- y_j^- x_{i+1}^+ y_{j-1}^+, \qquad (3)$$

where $0 \le i \le n-2$ and $1 \le j \le n-1$, so that every digit involved exists. Then,

$$x_i^+ y_j^+ + x_i^- y_j^- = x_i^+ y_j^+ \vee x_i^- y_j^-, \qquad (4)$$
$$x_i^+ y_j^+ + x_i^- y_j^- + \psi(i-1, j) = x_i^+ y_j^+ \vee x_i^- y_j^- \vee \psi(i-1, j), \qquad (5)$$
$$x_i^+ y_j^+ + x_{i+1}^+ y_{j-1}^+ + x_i^- y_j^- + x_{i+1}^- y_{j-1}^- = 2\psi(i, j) + \varphi(i, j) \in \{0, 1, 2\}, \qquad (6)$$
$$\varphi(i, j) + \psi(i, j-1) = \varphi(i, j) \vee \psi(i, j-1), \qquad (7)$$
$$\varphi(i, j) + \psi(i-1, j) = \varphi(i, j) \vee \psi(i-1, j). \qquad (8)$$
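The identities (4) to (8) can be checked exhaustively for small n. The Python sketch below is not part of the paper: the helper names (`rn_codings`, `split`, `check_theorem2`) are hypothetical, and the index ranges are an assumption chosen so that every digit referenced by $\varphi$ and $\psi$ exists. It enumerates every pair of n-digit radix-2 RN-codings, splits them into the bit-strings of Equation (1), and asserts each identity.

```python
from itertools import product


def rn_codings(n):
    """All n-digit radix-2 RN-codings: digits in {-1, 0, 1} whose
    non-zero digits alternate in sign (index 0 = least significant)."""
    for d in product((-1, 0, 1), repeat=n):
        nz = [v for v in d if v != 0]
        if all(a == -b for a, b in zip(nz, nz[1:])):
            yield d


def split(d):
    """Borrow-save split of Equation (1): returns the bit lists (D+, D-)."""
    return [int(v == 1) for v in d], [int(v == -1) for v in d]


def check_theorem2(n):
    """Assert identities (4)-(8) for every pair (X, Y); return the pair count."""
    pairs = 0
    for X in rn_codings(n):
        xp, xm = split(X)
        for Y in rn_codings(n):
            yp, ym = split(Y)

            def a(i, j):  # x_i^+ y_j^+ OR x_{i+1}^+ y_{j-1}^+
                return (xp[i] & yp[j]) | (xp[i + 1] & yp[j - 1])

            def b(i, j):  # x_i^- y_j^- OR x_{i+1}^- y_{j-1}^-
                return (xm[i] & ym[j]) | (xm[i + 1] & ym[j - 1])

            def phi(i, j):
                return a(i, j) ^ b(i, j)

            def psi(i, j):
                return a(i, j) & b(i, j)

            for i in range(n):
                for j in range(n):
                    p, m = xp[i] & yp[j], xm[i] & ym[j]
                    assert p + m == (p | m)                                   # (4)
                    if i >= 1 and j >= 1:
                        c = psi(i - 1, j)
                        assert p + m + c == (p | m | c)                       # (5)
                    if i <= n - 2 and j >= 1:
                        total = (xp[i] * yp[j] + xp[i + 1] * yp[j - 1]
                                 + xm[i] * ym[j] + xm[i + 1] * ym[j - 1])
                        assert total == 2 * psi(i, j) + phi(i, j)             # (6)
                        if j >= 2:
                            c = psi(i, j - 1)
                            assert phi(i, j) + c == (phi(i, j) | c)           # (7)
                        if i >= 1:
                            c = psi(i - 1, j)
                            assert phi(i, j) + c == (phi(i, j) | c)           # (8)
            pairs += 1
    return pairs
```

Calling `check_theorem2(4)` runs through all pairs of 4-digit RN-codings; any violated identity would raise an `AssertionError`.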

Figure 1. (a) Computation of $X^+Y^+$ when $n = 3$. (b) Computation of $X^+Y^+ + X^-Y^-$ when $n = 3$.

Figure 2. Multiplication of two RN-codings. (a) Architecture of the multiplier described in [3]. (b) Proposed improvement.

Example 3. Let us consider two 3-digit RN-codings X and Y. We could compute the sum of the six partial products $x_i^+ Y^+$ and $x_i^- Y^-$, $0 \le i \le 2$, by means of a carry-save adder tree. In this example, Theorem 2 allows us to define a single partial product (Figure 1b), whose bit of weight $2^j$ is denoted by $\lambda_0^{(j)}$, $0 \le j \le 4$. According to Equation (4), we replace an addition by an OR gate: $\lambda_0^{(0)} = x_0^+ y_0^+ \vee x_0^- y_0^-$. Since the computation of $\lambda_0^{(0)}$ does not generate a carry, we have

$$\lambda_0^{(1)} = (x_0^+ y_1^+ + x_1^+ y_0^+ + x_0^- y_1^- + x_1^- y_0^-) \bmod 2 = \varphi(0, 1)$$

(Equation (6)). We then apply Equation (6) twice to compute $\varphi(0, 2)$, $\psi(0, 2)$, $\varphi(1, 2)$, and $\psi(1, 2)$. Finally, Equations (7), (8), and (5) respectively allow us to compute $\lambda_0^{(2)}$, $\lambda_0^{(3)}$, and $\lambda_0^{(4)}$.

Figure 2b shows the general architecture of the multiplier. At the price of a more complex partial product generation, we save two carry-save adder trees and two carry-save adders based on (4, 2)-compressors.
It is worth noticing that the critical path of our new Partial Product Generator (two OR gates, one AND gate, and an XOR gate) is shorter than the one of a (4, 2)-compressor (three XOR gates). Let $(U^{(c)}, U^{(s)})$ denote the carry-save form of $X^+Y^+ + X^-Y^-$. We have $X^+Y^+ + X^-Y^- = U^{(c)} + U^{(s)}$, where $u_i^{(c)} = 0$ and $u_i^{(s)} = \lambda_0^{(i)}$ for $i \in \{0, 1, 2n-3, 2n-2\}$. Algorithm 3 can also be applied to the computation of $(X^+Y^- + X^-Y^+)$. Thus, we obtain a carry-save number $(V^{(c)}, V^{(s)})$ such that $X^+Y^- + X^-Y^+ =$

$V^{(c)} + V^{(s)}$. Note that the two's complement forms of $-V^{(s)}$ and $-V^{(c)}$ are respectively defined by:

$$-V^{(s)} = -2^{2n-1} + 1 + \sum_{i=0}^{2n-2} \bar{v}_i^{(s)} 2^i \quad \text{and} \quad -V^{(c)} = -2^{2n-2} + 8 + \sum_{i=3}^{2n-3} \bar{v}_i^{(c)} 2^i.$$

Thus, the carry-save form $P = (P^{(c)}, P^{(s)})$ of the product can be computed as follows:

$$P^{(c)} + P^{(s)} = 9 + (u_{2n-2}^{(s)} + \bar{v}_{2n-2}^{(s)} + 1)\, 2^{2n-2} + \sum_{i=3}^{2n-3} (u_i^{(s)} + u_i^{(c)} + \bar{v}_i^{(s)} + \bar{v}_i^{(c)})\, 2^i + \sum_{i=0}^{2} (u_i^{(s)} + \bar{v}_i^{(s)})\, 2^i,$$

where the equality holds modulo $2^{2n}$, the word length of the two's complement result. The operator implementing this equation is mainly based on (4, 2)-compressors. P is finally converted to radix-2 RN-coding. Though this multiplier reduces the area and the delay compared to the one published in [3], it could only share a multioperand adder with a two's complement multiplier available in the ALU of a processor.

Algorithm 3. Partial product generation for the computation of $X^+Y^+ + X^-Y^-$.
Require: Two n-digit RN-codings X and Y; two functions $\varphi(i, j)$ and $\psi(i, j)$ respectively defined by Equation (2) and Equation (3)
Ensure: $(2n-1)$ vectors $\Lambda^{(j)} = \lambda_k^{(j)} \ldots \lambda_0^{(j)}$, where $k < \lceil n/2 \rceil$
1: $\lambda_0^{(0)} \leftarrow x_0^+ y_0^+ \vee x_0^- y_0^-$; $\lambda_0^{(1)} \leftarrow \varphi(0, 1)$; $\lambda_0^{(2n-2)} \leftarrow x_{n-1}^+ y_{n-1}^+ \vee x_{n-1}^- y_{n-1}^- \vee \psi(n-2, n-1)$;
2: for =to n do
3:   for $j = 0$ to $i-1$ do
4:     $\lambda_j^{(2i+1)} \leftarrow \varphi(2j, 2i+1-2j) \vee \psi(2j, 2i-2j)$;
5:   end for
6:   $\lambda_i^{(2i+1)} \leftarrow \varphi(2i, 1)$;
7: end for
8: for =to n do
9:   for $j = 0$ to $i-1$ do
10:     $\lambda_j^{(2i)} \leftarrow \varphi(2j, 2i-2j) \vee \psi(2j, 2i-2j-1)$;
11:   end for
12:   $\lambda_i^{(2i)} \leftarrow x_{2i}^+ y_0^+ \vee x_{2i}^- y_0^-$;
13: end for
14: for = n+ to n do
15:   for j =0to n do
16:     λ () j ϕ( +j n +,n j ) ψ( +j n, n j );
17:   end for
18:   λ () ψ(n, n+ n x+ n y+ n+ x n y n+ );
19: end for
20: for = n to n do
21:   for j =0to n do
22:     λ (+) j ϕ( +j n +,n j ) ψ( j n +,n j );
23:   end for
24: end for

3. Modified Booth Recoding Revisited

Theorem 3 (Radix-$\beta^k$ RN-codings). Let $Y = y_{n-1} \ldots y_0$ be an n-digit radix-β RN-coding of X. The radix-$\beta^k$ number $Z = z_{\lceil n/k \rceil - 1} \ldots z_0$, with $z_i = \sum_{j=0}^{k-1} y_{ki+j} \beta^j$, is also an RN-coding of X.

Example 4 (Radix-4 modified Booth recoding). Trying to recode one of the binary operands so that its representation contains as many zeros as possible is a common way to design fast multipliers. The original Booth recoding was a first attempt [4].
However, it sometimes increases the number of non-zero digits of the operand and is not implemented in modern ALUs: if we apply Algorithm 1 to $X = (101.0101)_2 = -2.6875$, we obtain $Y = (\bar{1}1\bar{1}.1\bar{1}1\bar{1})_2$. A common solution, known as radix-4 modified Booth recoding, consists in writing X on the digit set $\{-2, \ldots, 2\}$ as follows:

$$Z = \sum_{i=-2}^{1} \big(2(x_{2i} - x_{2i+1}) + (x_{2i-1} - x_{2i})\big) 4^i = \sum_{i=-2}^{1} (2 y_{2i+1} + y_{2i}) 4^i = (\bar{1}1.11)_4 = -2.6875, \qquad (9)$$

where $x_{-5} = 0$ and $x_3 = 1$ (sign extension). We deduce from Equation (9) and from Theorem 3 that the radix-4 modified Booth recoding is an RN-coding of X. Daumas and Matula observed that a digit 2 can only be followed by a negative digit possibly preceded by a string of zeros [5] and that this special notation is not redundant. They applied this fact to design a circuit able to efficiently recode both binary and carry-save operands, but did not discover the rounding property of the representation.

3.1. Multiplication Algorithm Based on Radix-4 Modified Booth Recoding

A consequence of Theorem 3 is that a two's complement multiplier with radix-4 modified Booth recoding can easily be modified so that the recoded operand is either a two's complement number or a radix-2 RN-coding. Partial product generation is often performed in two steps in state-of-the-art multipliers [6]: a Booth encoder is responsible for writing the operand X on the digit set $\{-2, \ldots, 2\}$; then, a Booth selector chooses a partial product among $0$, $+Y$, $-Y$, $+2Y$, and $-2Y$ according to a digit of the recoded operand. Goto et al. showed that the area of the Booth selector highly depends on the encoding of the radix-4 digits [6] and proposed a solution with four bits: PL (positive), M (negative), $R_1$ (doubled factor), and $R_2$ (unchanged factor). Table 1 describes the computation of these control signals from two digits of a radix-2 RN-coding X. Note that several patterns never occur (we denote them by φ).
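The recoding of Equation (9) and the Booth selector step can be illustrated with a short Python sketch. It is a hedged illustration rather than the paper's circuit: the function names are hypothetical, the operand is handled as an integer (a fixed-point value such as $-2.6875$ with four fractional bits is represented by the scaled integer $-43$), and digits are listed least significant first.

```python
def bit(x, j):
    """Bit x_j of the two's complement integer x; x_j = 0 for j < 0.
    Python's >> sign-extends, so bits above the word length repeat the sign."""
    return (x >> j) & 1 if j >= 0 else 0


def radix4_modified_booth(x, nbits):
    """Radix-4 modified Booth digits z_i = -2*x_{2i+1} + x_{2i} + x_{2i-1}
    of an nbits-bit two's complement integer x (least significant first)."""
    ndigits = (nbits + 1) // 2            # sign-extend to an even number of bits
    return [-2 * bit(x, 2 * i + 1) + bit(x, 2 * i) + bit(x, 2 * i - 1)
            for i in range(ndigits)]


def booth_digits(x, nbits):
    """Radix-2 Booth digits y_i = x_{i-1} - x_i (Algorithm 1)."""
    return [bit(x, i - 1) - bit(x, i) for i in range(nbits)]


def group_pairs(y):
    """Theorem 3 with beta = 2, k = 2: z_i = 2*y_{2i+1} + y_{2i}."""
    y = list(y) + [0] * (len(y) % 2)      # pad with a zero digit if needed
    return [2 * y[2 * i + 1] + y[2 * i] for i in range(len(y) // 2)]


def multiply(w, x, nbits):
    """Booth-selector view: each digit selects 0, +w, -w, +2w or -2w,
    and the selected partial products are summed with weights 4^i."""
    return sum(z * w * 4**i for i, z in enumerate(radix4_modified_booth(x, nbits)))
```

With the scaled operand of Example 4, `radix4_modified_booth(-43, 7)` and `group_pairs(booth_digits(-43, 8))` give the same digit string, which is the content of Equation (9) read from right to left.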

According to the definition of X, we know for instance that $x_{2i}^+ x_{2i}^- = 0$. Building a Karnaugh map allows us to compute the logic equations defining PL, M, $R_1$, and $R_2$:

$$PL_i = x_{2i+1}^+ \vee x_{2i}^+ \bar{x}_{2i+1}^-, \quad M_i = x_{2i+1}^- \vee x_{2i}^- \bar{x}_{2i+1}^+, \quad R_{2,i} = x_{2i}^+ \vee x_{2i}^-, \quad R_{1,i} = \overline{x_{2i}^+ \vee x_{2i}^-} = \bar{R}_{2,i}.$$

Figure 3 shows the implementation of the Booth encoder proposed by Goto et al. and our Booth encoder for radix-2 RN-codings. According to Goto et al., the Booth encoder occupies .% of the space in a 54 × 54-bit multiplier. Therefore, our modification does not significantly increase the area and the delay (a multiplexer added in the critical path) of the multiplier.

Table 1. Truth table for radix-4 modified Booth recoding.

x_{2i+1}^+  x_{2i+1}^-  x_{2i}^+  x_{2i}^-   Func   R_1  R_2  PL   M
    0           0          0         0        0      1    0    0   0
    0           0          0         1       -Y      0    1    0   1
    0           0          1         0       +Y      0    1    1   0
    0           0          1         1               φ    φ    φ   φ
    0           1          0         0       -2Y     1    0    0   1
    0           1          0         1               φ    φ    φ   φ
    0           1          1         0       -Y      0    1    0   1
    0           1          1         1               φ    φ    φ   φ
    1           0          0         0       +2Y     1    0    1   0
    1           0          0         1       +Y      0    1    1   0
    1           0          1         0               φ    φ    φ   φ
   ...         ...        ...       ...              φ    φ    φ   φ

This operator is a building block for our second multiplication algorithm. We suggest computing $XY = XY^+ - XY^-$, where X is an n-digit radix-2 RN-coding, and $Y^+$ and $Y^-$ are n-bit unsigned numbers. Up to this point, we assumed that X was either a two's complement number or a radix-2 RN-coding, and that the second operand was an n-bit two's complement number. Consequently, we need to perform a sign extension so that the second operand is an $(n+1)$-bit two's complement number. We define

$$\tilde{Y} = \begin{cases} -2^n y_{n-1} + Y & \text{if } Y \text{ is a two's complement number}, \\ Y & \text{otherwise}. \end{cases}$$

This implies a modification of the carry-save adder tree to handle an $(n+1)$-bit input. However, the number of partial products as well as the depth of the tree remain unchanged. Figure 3 describes an operator based on two such multiplier blocks, a subtracter, two fast adders, and multiplexers, able to multiply two's complement numbers and RN-codings. Five control bits allow us to select the desired operation (Table 2).

3.2. Combining Rounding to Nearest and Partial Product Generation

Theorem 4 (Radix-$2^k$ modified Booth recoding). Let X be a two's complement number with n integer bits and m fractional bits.
Choose two numbers b and k such that $k \ge 1$ and $bk < m$, assume that $x_j = x_{n-1}$, $j \ge n$ (sign extension), and consider the radix-$2^k$ number Y defined as follows:

$$Y = \sum_{i=-b}^{\lceil n/k \rceil - 1} \left( -2^{k-1} x_{ki+k-1} + \sum_{j=0}^{k-2} 2^j x_{ki+j} + x_{ki-1} \right) 2^{ki}.$$

Y is the radix-$2^k$ modified Booth recoding of $\circ_{n+bk}(X)$. If X is exactly between two representable numbers, then $Y = X + 2^{-bk-1}$.

Algorithm 4. Combining rounding to nearest and radix-$2^k$ RN-coding conversion.
Require: A two's complement number with n integer bits and m fractional bits; two integers b and k with $bk < m$; $x_j = x_{n-1}$, $j \ge n$ (sign extension)
Ensure: A radix-$2^k$ RN-coding Y such that $Y = \circ_{n+bk}(X)$
1: for $i = -b$ to $\lceil n/k \rceil - 1$ do
2:   $y_i \leftarrow -2^{k-1} x_{ki+k-1} + \sum_{j=0}^{k-2} 2^j x_{ki+j} + x_{ki-1}$;
3: end for

Example 5. Let X = (00.00000000) = 5.3645785. We want to multiply a -bit two's complement number W by (X). The standard solution consists in rounding X to nearest and in recoding (X) to select the partial products. We obtain (X) = (00.0000) and Y =(. ) 4 = 5.36385. Theorem 4 allows us to skip the first step: applied to X with $k = 2$ and $b = 4$, Algorithm 4 returns Y =(. ) 4 = 5.36385. To compute 3 (X), we choose $b = k = 3$ and obtain Y =( 3. 3 ) 8 = 5.36385. Note that X is exactly between two representable numbers. We check that $X + 2^{-bk-1} = X + 2^{-10}$ = 5.36385.

We can for instance take advantage of Theorem 4 to evaluate a polynomial with Horner's rule. Instead of rounding intermediate results to nearest, we slightly modify the Booth selector of the multiplier so that it implements Algorithm 4. Assume that $k = 2$. In a standard multiplier, the recoding cell responsible for the least significant digit is simpler than the other cells in the sense that it only requires two bits (we assume that $x_{-bk-1} = 0$). To apply Algorithm 4, it suffices to add a third input bit to this cell so that it takes $x_{-bk-1}$ into account.
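The digit formula of Algorithm 4 can be tried out with exact arithmetic. The sketch below is an illustration under stated assumptions, not the authors' implementation: the names `round_and_recode` and `value` are hypothetical, X is passed as the integer `xs` with $X = xs / 2^m$, and bits below weight $2^{-m}$ are taken to be zero. The check compares the value of the digit string with X rounded to $bk$ fractional bits, ties going upwards as stated in Theorem 4.

```python
from fractions import Fraction


def round_and_recode(xs, n, m, b, k):
    """Algorithm 4 (sketch). X = xs / 2**m is a two's complement number with
    n integer bits and m fractional bits. Returns {i: y_i} for
    i = -b .. ceil(n/k) - 1, where y_i is the radix-2^k digit of weight 2^(k*i)."""
    assert k >= 1 and b * k < m

    def bit(j):
        # x_j has weight 2**j; Python's >> provides the sign extension for j >= n
        return (xs >> (j + m)) & 1 if j + m >= 0 else 0

    top = -(-n // k)                      # ceil(n / k)
    digits = {}
    for i in range(-b, top):
        digits[i] = (-(1 << (k - 1)) * bit(k * i + k - 1)
                     + sum((1 << j) * bit(k * i + j) for j in range(k - 1))
                     + bit(k * i - 1))    # x_{ki-1}: rounding bit when i = -b
    return digits


def value(digits, k):
    """Exact value of the radix-2^k digit string."""
    return sum(Fraction(d) * Fraction(2) ** (k * i) for i, d in digits.items())
```

As a usage example with hypothetical operands, `value(round_and_recode(5493, 4, 10, 4, 2), 2)` rounds $5493/1024$ to 8 fractional bits, and the same call with `b = k = 3` hits a tie and returns $X + 2^{-10}$.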

Figure 3. A multiplier handling two's complement numbers and radix-2 RN-codings. W, X, Y, and Z are n-bit two's complement numbers. $X^+$, $X^-$, $Y^+$, and $Y^-$ are n-bit unsigned numbers.

Table 2. Some operations implemented by the multiplier shown in Figure 3.

Operation                                                    c_4 c_3 c_2 c_1 c_0
P0 ← XY, P1 ← WZ                                             0 0 0 0 0
P0 ← XY + WZ, P1 ← WZ                                        0 0 0 0
P0 ← XY − WZ, P1 ← WZ                                        0 0 0
P0 ← (X^+ − X^-)Y, P1 ← WZ                                   0 0 0 0
P0 ← (X^+ − X^-)Y, P1 ← (X^+ − X^-)Z                         0 0 0
P0 ← (X^+ − X^-)(Y^+ − Y^-), P1 ← (X^+ − X^-)Y

4. Conclusion

We have shown that very slightly modified arithmetic operators can efficiently handle both conventional binary representations and RN-codings. Since RN-codings can also be efficiently compressed for storage [7], we conclude that RN-codings are a good candidate for numerical computations. In a further study we will design dedicated division and square root algorithms, as well as an ALU able to handle RN-codings and conventional binary numbers.

References

[1] J.-C. Bajard, J. Duprat, S. Kla, and J.-M. Muller. Some operators for on-line radix-2 computations. Journal of Parallel and Distributed Computing, 22:336-345, 1994.
[2] J.-L. Beuchat and J.-M. Muller. Multiplication algorithms for radix-2 RN-codings and two's complement numbers. Technical Report 2005-05, Laboratoire de l'Informatique du Parallélisme, École Normale Supérieure de Lyon, 46 Allée d'Italie, 69364 Lyon Cedex 07, Feb. 2005.
[3] J.-L. Beuchat and J.-M. Muller. RN-codes : algorithmes d'addition, de multiplication et d'élévation au carré. In SympA'2005: 10ème édition du SYMPosium en Architectures nouvelles de machines, pages 73-84, Apr. 2005.
[4] A. D. Booth. A signed binary multiplication technique. Quarterly Journal of Mechanics and Applied Mathematics, 4(2):236-240, 1951.
[5] M. Daumas and D. W. Matula. Further reducing the redundancy of a notation over a minimally redundant digit set. Journal of VLSI Signal Processing, 33:7-18, 2003.
[6] G. Goto, A. Inoue, R. Ohe, S. Kashiwakura, S. Mitarai, T. Tsuru, and T. Izawa. A 4.1-ns compact 54 × 54-b multiplier utilizing sign-select Booth encoders. IEEE Journal of Solid-State Circuits, 32(11):1676-1682, Nov. 1997.
[7] P. Kornerup and J.-M. Muller. RN-coding of numbers: definition and some properties. In Proceedings of the 17th IMACS World Congress on Scientific Computation, Applied Mathematics and Simulation, Paris, July 2005.