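Fast Floating Point Square Root

Thomas F. Hain, David B. Mercer

Abstract: Hain and Freire have proposed different floating point square root algorithms that can be efficiently implemented in hardware. The algorithms are compared and evaluated on both performance and precision.

Index Terms: algorithms, square root, digital arithmetic, floating point arithmetic.

Manuscript received March 4, 2005. T. F. Hain is with the School of Computer & Information Sciences, University of South Alabama, Mobile, AL 36688 USA (phone: 251-460-6390; e-mail: thain@usouthal.edu). D. B. Mercer is with The SSI Group, 4721 Morrison Drive, Mobile, AL 36609 (phone: 251-345-0000; e-mail: dmercer@alum.mit.edu).

I. INTRODUCTION

Extracting square roots is a common enough operation that the function frequently finds die area on modern floating point processors dedicated to it. Unfortunately, the operation is also quite expensive both in time and space. Hain and Freire have proposed alternative algorithms that may prove to be superior to current methods [4].

Newton proposed an iterative method for approximating the root of a function. Given a function $f$, its first derivative $f'$, and an approximation $x_i$, a better approximation can be found by the following equation:

$$x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}$$

One function that can be used to find the square root of $n$ using Newton's method is

$$f(x) = x^2 - n, \qquad f'(x) = 2x, \qquad x_{i+1} = \frac{1}{2}\left(x_i + \frac{n}{x_i}\right)$$

However, this formula requires a division every iteration, which is itself quite an expensive operation. An alternative function is

$$f(x) = x^{-2} - n^{-1}, \qquad f'(x) = -2x^{-3}, \qquad x_{i+1} = \frac{x_i}{2}\left(3 - n^{-1}x_i^2\right)$$

This function requires only an initial division to calculate $n^{-1}$. Iterations only require multiplications and a subtraction.

Square root algorithms based on Newton's iteration converge on the result very quickly, in $O(\log p)$, where $p$ is the number of bits of precision.

Other algorithms are also in use for calculating square roots, such as Goldschmidt's algorithm [5], used by the TI 8847 [6]. Goldschmidt's algorithm is an iterative algorithm beginning with $x_0 = n$ and $y_0 = n$ and then iterating

$$x_{i+1} = r_i^2 x_i, \qquad y_{i+1} = r_i y_i$$

with $r_i$ chosen to drive $x_i \to 1$, resulting in $y_i \to \sqrt{n}$. When implemented, Goldschmidt's algorithm is closely related to Newton's iteration.

In this paper, we compare two algorithms that calculate a floating point square root in a way that is easily and efficiently implemented in hardware. Hain's algorithm has only been published as an internal report [3]. This algorithm is compared to a recent and comparable algorithm by Freire [1]. The algorithms are elucidated in Section II. A performance and precision analysis is presented in Section III, and is followed up with an experimental comparison in Section IV. The conclusions are finally presented in Section V.

A. Representation of Floating Point Numbers

Floating point numbers are distinct from fixed point numbers, which have an implied decimal point (usually at the end of the number in the case of integers), in that they are described by both the digits of the number and the position of the decimal point. In decimal, we would use scientific notation of $\pm n \times 10^{e}$, where $1 \le n < 10$ and $e$ is the exponent (not the natural log base).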
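In computers, however, the binary number system is more appropriate, and so the representation is usually of the form $\pm n \times 2^{e}$, where $1 \le n < 2$.

Throughout this paper, we will assume an internal floating point representation that uses an exponent base of two and provides convenient separate access to the sign, mantissa, and exponent. When implementation details require it, the 32-bit IEEE 754 standard will be used, though the algorithms described here can be modified to work with other floating point representations.

B. Conversion of Fixed Point Algorithms to Floating Point

The algorithms compared here are first presented as fixed point algorithms, and it is then shown how they can be converted to floating point algorithms. The process of converting the fixed point versions of Hain's and Freire's algorithms to floating point is similar, so the common concepts are presented here rather than having to be repeated for both algorithms.

There are three parts to a floating point number: the sign, the exponent, and the mantissa. When computing the square root of a floating point number, the sign is the easiest bit to compute: if the sign of the input is positive, then the square root will be positive; if the sign of the input is negative, then there is no real square root. We will completely ignore the negative case here, as the test is trivial and uninteresting to our analysis.

The exponent is also quite easy to compute:

$$\sqrt{n \times 2^{e}} = n' \times 2^{\lfloor e/2 \rfloor}$$

where both mantissas are greater than or equal to one and less than two. In binary arithmetic, $\lfloor e/2 \rfloor$ is easily computed as $e \gg 1$, but unfortunately, in IEEE 754 representation, the exponent is stored in a biased form, where the actual exponent, $e$, is equal to the number formed by bits 23 to 30 minus 127. This means we cannot just do a right shift of the stored exponent, since that would halve the bias as well. Instead, with $e$ now denoting the stored (biased) exponent, the new stored exponent in IEEE 754 floating point would be calculated as:

$$e' = \left\lfloor \frac{e + 1}{2} \right\rfloor + 63$$

Computing the mantissa of the root is, of course, the main problem being solved by Hain and Freire. However, it is not true in floating point notation that $n' = \sqrt{n}$. Instead,

$$n' = \begin{cases} \sqrt{n}, & \text{if } e \text{ is even} \\ \sqrt{2n}, & \text{if } e \text{ is odd} \end{cases}$$

Since the radicand therefore lies between 1 (inclusive) and 4 (exclusive), the fixed point representation of the mantissa of the input, $x$, must allow for two bits before the decimal point (when $e$ is even, the most significant bit would be 0). In IEEE representation, where the mantissa has 24 bits, this means we either have to use 25 bits to represent the mantissa, or drop the least significant bit when the exponent is even so we can right shift a leading 0 into the most significant bit position. Dropping the least significant bit would result in unacceptable accuracy, so we must preserve the least significant bit.

Thus, the fixed point number that will be passed to the two competing algorithms in this paper will be a 25-bit fixed point number with the implied decimal point between the second and third bits. The expected output will be a 24-bit fixed point number with the implied decimal point between the first and second bits.

Note that this is not the only way to generate fixed point inputs and outputs. One could, for instance, multiply the mantissa by $2^{23}$ to create an integer (this is the approach proposed by Freire), but the approach chosen above is easier to implement for IEEE 754 floating point numbers.

Therefore, it can be assumed the fixed point algorithms are wrapped in the following pseudocode to create the floating point algorithms.

SQRT(x)
1  e ← bits 23 to 30 of x
2  n ← bits 0 to 22 of x, extended to 25 bits by prepending "01"
3  e ← e + 1
4  if e is odd
5      n ← n << 1
   end if
6  n ← f(n), where f is the implementation of the square root algorithm
7  e ← ⌊e/2⌋ + 63
8  x ← "0" + 8 bits of e + bits 0 to 22 of n
9  return x

Figure 1. Hardware to implement a generic square root algorithm on IEEE 754 floats.

An actual implementation of this would have to handle negative, denormalized, infinite, and NaN inputs, but that is not shown here. In hardware, the wrapper can be visualized by Figure 1. Note that optimizing this floating point wrapper is not relevant to this study; it is simply important that we use the same wrapper when comparing the two algorithms.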
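The wrapper described in this section can be sketched in software as follows. This is a minimal illustration rather than the authors' implementation: the names `float_sqrt` and `ref_root` are hypothetical, `ref_root` is only a stand-in for the fixed point routine f (it uses Python's integer square root instead of either algorithm compared in this paper), and only positive normalized inputs are handled.

```python
import math
import struct

def ref_root(n):
    """Stand-in for f (hypothetical): n is a 25-bit fixed point value n / 2**23 in [1, 4).
    Returns the 24-bit root r, representing r / 2**23 in [1, 2), rounded to nearest."""
    radicand = n << 23                  # 48-bit integer whose root has 24 bits
    r = math.isqrt(radicand)
    if radicand - r * r > r:            # radicand > (r + 1/2)**2, so round up
        r += 1
    return r

def float_sqrt(value, root=ref_root):
    """Square root of a positive, normalized 32-bit float via a fixed point routine."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    e = (bits >> 23) & 0xFF             # stored (biased) exponent, bits 23 to 30
    n = (bits & 0x7FFFFF) | (1 << 23)   # bits 0 to 22 with "01" prepended
    e += 1
    if e & 1:                           # true exponent is odd
        n <<= 1                         # radicand mantissa now lies in [2, 4)
    n = root(n)
    e = (e >> 1) + 63                   # halve the exponent while keeping the bias
    out = (e << 23) | (n & 0x7FFFFF)    # sign 0, 8 exponent bits, 23 mantissa bits
    return struct.unpack("<f", struct.pack("<I", out))[0]
```

Passing a different `root` function swaps in another fixed point algorithm without touching the exponent handling, which mirrors the requirement that both algorithms share the same wrapper.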
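II. ALGORITHMS

The algorithms presented in this paper are different implementations of fundamentally the same concept:

$$(a \pm b)^2 = a^2 \pm 2ab + b^2$$

where $a$ is an estimate of the square root, and $b$ is a trial offset that is a power of two and is successively halved until the desired accuracy is reached. Therefore, each iteration provides us with an additional bit of accuracy. This is in contrast to the standard Newton algorithms, which converge much faster but involve more complex (time-consuming) calculations.

A. Hain's Algorithm

Hain's algorithm consists of determining the result bit by bit, beginning with a trial offset, $b$, equal to the most significant bit of the output. If the input $x$ has $2p$ bits, so that the result has $p$ bits, then the initial trial offset is $b_0 = 2^{p-1}$. As each bit is considered, the trial offset is halved, so that $b_i = 2^{p-i-1}$. At each iteration, the trial offset, $b_i$, is added to the root estimate, $a_i$, and if $(a_i + b_i)^2 \le x$, then the bit represented by $b_i$ must be one, so $b_i$ is added to our result:

$$a_{i+1} = \begin{cases} a_i, & \text{if } x < (a_i + b_i)^2 \\ a_i + b_i, & \text{otherwise} \end{cases}$$

Note that the comparison can be written in a different way:

$$x < (a_i + b_i)^2 \;\Leftrightarrow\; x < a_i^2 + 2a_i b_i + b_i^2 \;\Leftrightarrow\; x - a_i^2 < 2a_i b_i + b_i^2 \;\Leftrightarrow\; \frac{x - a_i^2}{b_i^2} < \frac{2a_i}{b_i} + 1$$

Hain capitalizes on the fact that the two sides of the inequality are easier to initialize and update from iteration to iteration than they are to calculate at each iteration. If we call these variables $s_i$ and $t_i$ respectively, then $s_{i+1}$ is calculated as:

$$s_{i+1} = \frac{x - a_{i+1}^2}{b_{i+1}^2} = \begin{cases} \dfrac{x - a_i^2}{(b_i/2)^2}, & \text{if } s_i < t_i \\[2ex] \dfrac{x - (a_i + b_i)^2}{(b_i/2)^2}, & \text{otherwise} \end{cases} = \begin{cases} 4s_i, & \text{if } s_i < t_i \\ 4(s_i - t_i), & \text{otherwise} \end{cases}$$

and $t_{i+1}$ is calculated as:

$$t_{i+1} = \frac{2a_{i+1}}{b_{i+1}} + 1 = \begin{cases} \dfrac{2a_i}{b_i/2} + 1, & \text{if } s_i < t_i \\[2ex] \dfrac{2(a_i + b_i)}{b_i/2} + 1, & \text{otherwise} \end{cases} = \begin{cases} 2(t_i - 1) + 1, & \text{if } s_i < t_i \\ 2(t_i + 1) + 1, & \text{otherwise} \end{cases}$$

Putting these concepts together into an algorithm, we have Hain's algorithm, SQRT_HAIN.

SQRT_HAIN(x) [x is a 2p-bit integer]
1   s ← 0
2   t ← 1
3   a ← 0
4   for i ← 1 to p
5       s ← 4s + ⌊x / 2^(2p-2)⌋, x ← 4x mod 2^(2p)   [implemented as LSH of x into s]
6       if s < t
7           t ← t - 1
8           a ← 2a   [implemented as LSH 0 into a]
        else
9           s ← s - t
10          t ← t + 1
11          a ← 2a + 1   [implemented as LSH 1 into a]
        end if
12      t ← 2t + 1   [implemented as LSH 1 into t]
    end for
13  return a

1) Conversion to IEEE 754 Floats

Conversion of SQRT_HAIN to handle floating point numbers is trivial. Although Hain's algorithm terminates after $p$ iterations, since the integer square root of an integer has half as many bits as the input, the principles of the algorithm hold for non-integers. That is, we can continue Hain's algorithm to as many bits of precision as desired, so long as we know where to place the decimal point when we are done. In this implementation, this is easy: since the input is a number between 1 and 4, the result is between 1 and 2 (which is one bit less than the input, which is why the hardware implementation in Figure 1 only contains 24 bits in the output), so we know the decimal point will be between the first and second bits. Of course, after the first $p$ iterations, the two bits we left-shift off $x$ in step 5 will be zeros.

Finally, since we know that the first bit of the result will be 1, we can short-circuit the first loop iteration by initializing our variables correctly in steps 1 to 3.

The resulting algorithm for implementation in our IEEE 754 float square root processor is as follows. Note that subscripts indicate bits of the variable, and all registers are 24 bits long. (Although the input is 25 bits long, it is immediately cut down to 23 bits, so, in fact, only a 23-bit register is required to hold $x$.)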
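SQRT_HAIN(x) [x is a 25-bit fixed point number, 1 ≤ x < 4]
1   s ← (x >> 23) - 1
2   x ← x << 2   [bits shifted out of the 25-bit field are discarded]
3   a ← 1
4   t ← 5
5   while a_23 ≠ 1   [will loop 23 times]
6       s ← (s << 2) + (x >> 23)
7       x ← x << 2
8       if s < t
9           t ← t - 1
10          a ← a << 1
        else
11          s ← s - t
12          t ← t + 1
13          a ← (a << 1) + 1
        end if
14      t ← (t << 1) + 1
    loop
15  s ← s << 2
16  if s ≥ t
17      a ← a + 1
    end if
18  return a

Steps 15 to 17 determine whether to round down or up, and are a partial repetition of the loop. No additional hardware is required to perform this check, though an additional adder is required for step 17. Mathematical properties of square roots are such that any square root with a one in the 25th position is irrational, so we do not have to worry about ties when rounding (IEEE 754 specifies that such rounding would be toward the even number, but since the number is irrational, we are guaranteed not to have a result exactly halfway between two IEEE 754 numbers).

B. Freire's Algorithm

Freire's algorithm for finding square roots is based on the idea that the list of all possible square roots for a particular precision is finite and ordered, and therefore a binary search can be performed on the range to find the matching square root value. The list is ordered because square roots are monotonically increasing (for every $x > y$, $\sqrt{x} > \sqrt{y}$), and it is finite because there are a finite number of numbers that can be expressed with the $p$ bits used by a computer to store numbers. The algorithm is described by the following pseudocode, and illustrated in Figure 2.

[Figure 2 shows the search path as test values and their squares: 128 (16,384), 64 (4,096), 32 (1,024), 48 (2,304), 40 (1,600), 44 (1,936), 42 (1,764), 41 (1,681).]

Figure 2. Finding the square root of 1,739 using Freire's binomial search algorithm.

In the figure, $p = 16$, so there are $2^{16} = 65{,}536$ possible numbers in the number system (0 to 65,535). The maximum square root is therefore less than $\sqrt{2^{16}} = 256$, so the initial test value is $256 / 2 = 128$. The increment begins as half the initial test value, and is halved with each iteration: 64, 32, 16, 8, etc. Since $128^2 > 1{,}739$, the increment is subtracted from the test value, which is then tested again. At each iteration, if the square of the test value is greater than 1,739, then the increment is subtracted from the test value; if the square is less, then the increment is added to the test value. After seven iterations, we have determined that the answer lies somewhere between 41 and 42, and we have run out of bits for our precision. To finally determine which one to choose, we could perform one more iteration to see how $41.5^2$ compares to 1,739, but, in fact, there is some faster mathematics that can be performed on some of the residual variables to determine which one to choose. In this example, 42 is chosen.

SQRT_FREIRE(x) [x is a p-bit integer]
1   b ← 2^(p/2-1)
2   i ← p/2 - 1
3   a ← 2^i
4   b² ← 2^(p-2)
5   a² ← b²
    do
6       b² ← b² / 4   [implemented as b² >> 2]
7       b ← b / 2   [implemented as b >> 1]
8       if a² ≤ x
9           a² ← a² + 2ab + b²   [2ab implemented as a << i]
10          a ← a + b
        else
11          a² ← a² - 2ab + b²
12          a ← a - b
        end if
13      i ← i - 1
14  loop while i > 0
15  if x ≤ a² - a
16      a ← a - 1
17  else if x > a² + a
18      a ← a + 1
    end if
19  return a

In the algorithm, $a$ represents the result, which is iteratively improved by adding or subtracting the delta $b$, which is halved each iteration. Instead of squaring $a$ each iteration to compare it to $x$ (which involves a costly multiplication), $a^2$ is kept as its own variable and updated according to the formula

$$(a \pm b)^2 = a^2 \pm 2ab + b^2$$

The value of $b^2$ is known at the beginning ($2^{p-2}$), and as $b$ is halved each iteration (line 7), $b^2$ is quartered (line 6), both of which can be accomplished with bitwise right-shifts. Furthermore, the apparent multiplication of $2ab$ can be eliminated by realizing that $b = 2^{i-1}$, so $2ab = a \cdot 2^{i}$, which can again be implemented by a bitwise shift of $i$ bits to the left (lines 9 and 11).

Therefore, each iteration of the loop requires three bitwise shift operations, four additions or subtractions, and two comparisons (one of which is the comparison of $i$ to 0), but no multiplications.

Freire actually does propose a slight improvement to the algorithm shown, based on the observation that $i$ is really only used to shift $a$ left each iteration to effect the $2ab$ multiplication. Since $i$ begins at $p/2 - 1$ and decreases down to 0, we can begin with $a$ instead being shifted left $p/2 - 1$ bits (i.e., $a \cdot 2^{p/2-1}$) and shift it right one bit each iteration. Now $b$, which was being added to or subtracted from $a$ each iteration, also needs to be shifted left by the same amount, and instead of being right-shifted one bit each iteration, it is right-shifted two bits. With $b$ being right-shifted two bits every iteration, it is only a one-bit shift away from the old $b^2$, so we no longer have to keep track of the $b^2$ value as well. The improved algorithm becomes:

SQRT_FREIRE(x) [x is a p-bit integer]
1   a ← 2^(p-2)
2   b ← 2^(p-3)
3   a² ← 2^(p-2)
    do
4       if a² ≤ x
5           a² ← a² + a + (b >> 1)
6           a ← (a + b) >> 1
        else
7           a² ← a² - a + (b >> 1)
8           a ← (a - b) >> 1
        end if
9       b ← b >> 2
10  loop while b ≠ 0
11  if x ≤ a² - a
12      a ← a - 1
13  else if x > a² + a
14      a ← a + 1
    end if
15  return a

This is the algorithm of Freire's that we converted to handle floating point numbers.

1) Conversion to Floating Point

Freire's algorithm as shown takes a $p$-bit number and returns its $p/2$-bit square root. This works for integers because the integer square root of an integer has half as many bits as the input. However, in floating point, we want the output to have the same number of bits as the input. This requires twice as many iterations through the loop and double-length registers to hold intermediate values. To obtain the 24-bit output, we must start with a 48-bit input, so we would begin by shifting the 25-bit input left 23 bits into a 48-bit register.

As with Hain's algorithm, since we know that the first bit is going to be one, we know the result of the first comparison, which we can short-circuit by initializing the variables as if the first loop iteration has already been completed. The initializations in steps 1 to 3 can be replaced by $a \leftarrow 3 \times 2^{44}$, $b \leftarrow 2^{43}$, and $a^2 \leftarrow 9 \times 2^{44}$, respectively. The rest of the algorithm remains exactly the same.

III. PERFORMANCE AND PRECISION ANALYSIS

As these algorithms are intended to be implemented in hardware, an analysis of their performance must take into account their hardware implementation. Precision must also be exact in order for the algorithms to be considered accurate enough for most processors today. Fortunately, as we shall see, both algorithms deliver the full 24 bits of accuracy needed to produce the closest possible approximation in 32-bit IEEE 754 floats.

A. Hain's Algorithm

From a performance perspective, we can implement Hain's algorithm in hardware similar to Figure 3. It is clear from the hardware implementation that Hain's loop requires only the time required to perform a subtraction and a multiplex, and to latch the results into the registers. Even the subtraction can be short-circuited, however, once the result is determined to be negative, since the actual result will not be used in that case.

Figure 3. Hardware to implement Hain's square root algorithm. All registers are 24 bits long. Bit shifts can be hardwired. Loop terminates after 23 iterations, when a_23 = 1.

The loop repeats $p - 1$, or $\Theta(p)$, times. The loop initialization and termination steps require only a subtraction (initialization) and an addition (termination). Even these operations can be implemented with half-adders, since they are only adding and subtracting one.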
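We can therefore calculate the running time of Hain's algorithm by the following formula:

$$T_{Hain}(p) = t_{sub} + (p - 1)(t_{sub} + t_{mux}) + t_{add} = (p + 1)\,t_{add} + (p - 1)\,t_{mux} \qquad [\text{assuming } t_{sub} = t_{add}]$$

This indicates that Hain's algorithm is very fast. For large enough $p$, traditional Newton methods will outperform it, since they take $\Theta(\log p)$, though their costs will be significantly higher.

The mathematical basis of Hain's algorithm indicates that it should produce $l + 1$ bits of precision, where $l$ is the number of loop iterations. Since the loop iterates $p - 1$ times, we end up with the full $p$ bits of precision, as required by IEEE 754. Hain's final addition is due to the fact that the $(p + 1)$th bit of the result may be one, in which case the result must be rounded up. There should be no issues with IEEE rounding (which requires that a result halfway between two $p$-bit floats be rounded toward the even number), since all $p$-bit numbers with square roots of at least $p$ significant bits are irrational. This precision is tested in the next section.

B. Freire's Algorithm

Freire's algorithm can be implemented in hardware similar to Figure 4. In contrast to Hain's algorithm, Freire's main loop has three layers of calculations. Furthermore, these calculations are performed at twice the precision of Hain's, so in addition to the extra hardware and wires required to calculate and carry the extra bits, the additions and subtractions will take about 22% longer (a best-case assumption, assuming carry-lookahead adders; if ripple-carry adders are used, then the time will double; and if carry-skip or carry-select adders are used, additions will take about 41% longer).

Figure 4. Hardware to implement Freire's square root algorithm. Diagram taken directly from Freire [1]. In this diagram, registers EAX, EBX, ECX, and EDX hold the algorithm's variables, with ECX holding b. All registers are 48 bits long. Loop terminates after 22 iterations, when b = 0.

Omitted from Freire's diagram, however, are the calculations that must be performed after the loop terminates to handle rounding. The termination calculations are about as complicated as a single loop iteration and require additional hardware.

Freire's run-time can be calculated as follows:

$$T_{Freire}(p) = (p - 2)(2\,t_{add} + t_{mux}) + (t_{add} + 2\,t_{mux}) = (2p - 3)\,t_{add} + p\,t_{mux} \qquad [\text{assuming } t_{sub} = t_{add}]$$

If we assume that Freire's $t_{add}$ is 1.22 times Hain's, and that $t_{mux} \ll t_{add}$, then Freire's algorithm takes about 2.44 times as long as Hain's.

Freire's algorithm is every bit as precise as Hain's. Its post-loop processing essentially simulates another iteration of the loop to determine whether one needs to be added to or subtracted from the result. This will be tested in the next section.

A final issue worth considering is the space consumed by Freire's algorithm compared to Hain's. It is clear that Freire's would occupy at least twice as much space as Hain's due to the size doubling of the registers.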
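Furthermore, Freire's approach requires more arithmetic and logical components, so it is not unreasonable to presume that Freire's algorithm would take 2.5 or 3 times the amount of space on a chip as Hain's.

IV. EXPERIMENTAL COMPARISON

A program was written to test the precision of these algorithms, to determine whether the assertions made in the previous section are valid. (The program is obtainable from the authors.) Although performance was also measured and the results are presented here, performance tests of the algorithms written in software are only of use as evidence for or against, rather than proof of, the performance conclusions made in the previous section.

A. Methodology

The two algorithms were implemented using the same floating-point-to-fixed-point conversion, so that the only difference would be the way the two algorithms compute the square root. The two functions were then used to compute the square root for every 32-bit IEEE 754 floating point number between one and four (inclusive of one, exclusive of four). These limits were chosen because they include every possible 25-bit fixed point mantissa sent to the square root functions for all normalized floats from $2^{-126}$ to $2^{128}$. Expanding the limits would change only the exponent without changing the mantissa.

Precision was checked by squaring the result and comparing the square to the input value. If the comparison showed that the square of the calculated root was smaller than the input, then the result had the smallest possible quantum ($2^{-23}$) added to it to see if the resulting square would be closer to the input. If it were not, then the function had returned the closest possible 32-bit IEEE 754 number to the square root. The opposite was done if the square of the calculated root was larger than the input (i.e., a quantum was subtracted from the result and the square compared to the input).

As a sanity check, Hain's square root algorithm was also implemented in microcode on a 16-bit Texas Instruments 74AS-EVM-16 bit-slice processor using 32-bit even-odd register pairs. Freire's algorithm was not implemented; rather, Hain's square root operation was compared to a microcoded floating point divide operation on the same architecture.

B. Results

Both functions calculated all 16,777,216 square roots accurately in every bit.

The programmed implementation of Hain's algorithm proved to be about 86% faster than Freire's. However, as mentioned earlier, the relative timings would depend on the specific hardware implementations, where some of the micro-operations can be executed in parallel.

The microcoded comparisons showed that Hain's floating point square root implementation took only 89% more clock cycles than the floating point divide operation. That is, it ran slightly faster than two floating point divides. Again, this should be regarded as indicative only, since no architectural optimizations were performed. It is conceivable, however, that any optimizations to one operation could be applied to the other, so that the relative timings may, in fact, be a good indicator.

V. CONCLUSIONS

While they are equivalent in precision, Hain's algorithm is superior to Freire's in both hardware cost and time. Although Hain's algorithm converges to its result in $O(p)$ time versus other methods that converge in $O(\log p)$ time, it is likely that Hain's algorithm is superior for small $p$ (say 32 or 64), and further comparisons could be performed to determine the break-even point where conventional methods begin to outperform Hain's.

REFERENCES

[1] P. Freire, http://www.pedrofreire.com/sqrt, 2002.
[2] D. Goldberg, Computer Arithmetic, Xerox Palo Alto Research Center, 2003. Published as App. H in Computer Architecture: A Quantitative Approach, Third Edition, by J. L. Hennessy and D. A. Patterson, Morgan Kaufmann, 2002. Available: http://books.elsevier.com/companions/1558605967/appendices/1558605967-appendix-h.pdf.
[3] T. Hain, Floating Point Arithmetic Processor, Internal Report, CIS, University of South Alabama, Mobile, AL 36688, 1989.
[4] B. Jovanović and M. Damnjanović, Digital Systems for Square Root Computation, Zbornik Radova XLVI Konferencija Etran, Herceg Novi, Montenegro, June 2003, Vol. 1, pp. 68-71.
[5] R. Goldschmidt, Applications of division by convergence, Master's thesis, MIT, June 1964.
[6] Sun Microsystems, Inc., Numerical Computation Guide, http://docs.sun.com/source/806-3568/ncgTOC.html.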