Numerical Algorithms manuscript No. (will be inserted by the editor)

A binary powering Schur algorithm for computing primary matrix roots

Federico Greco · Bruno Iannazzo

Federico Greco, Bruno Iannazzo: Dipartimento di Matematica e Informatica, Università di Perugia, Via Vanvitelli 1, I-06123 Perugia, Italy. E-mail: {greco,bruno.iannazzo}@dmi.unipg.it

Received: date / Accepted: date

Abstract An algorithm for computing primary roots of a nonsingular matrix $A$ is presented. In particular, it computes the principal root of a real matrix having no nonpositive real eigenvalues, using real arithmetic. The algorithm is based on the Schur decomposition of $A$ and has an order of complexity lower than the customary Schur based algorithm, namely the Smith algorithm.

Keywords matrix $p$th root · matrix functions · Schur method · binary powering technique

Mathematics Subject Classification (2000) 65F30 · 15A15

1 Introduction

Let $p$ be a positive integer. A primary $p$th root of a square matrix $A \in \mathbb{C}^{n\times n}$ is a solution of the matrix equation $X^p - A = 0$ that can be written as a polynomial of $A$. If $A$ has $l$ distinct eigenvalues, say $\lambda_1,\dots,\lambda_l$, none of which is zero, then $A$ has exactly $p^l$ primary $p$th roots. They are obtained as
$$f(A) := \frac{1}{2\pi i}\int_\gamma f(z)(zI - A)^{-1}\,dz, \qquad (1)$$
where $f$ is any of the $p^l$ analytic functions defined on the spectrum of $A$, denoted by $\sigma(A) := \{\lambda_1,\dots,\lambda_l\}$, such that $f(z)^p = z$, and $\gamma$ is a closed contour which encloses $\sigma(A)$. The reason why $f(A)$ is a polynomial of $A$ is subtle and is well explained in [10]. If $A$ has no nonpositive real eigenvalues, then there exists only one primary $p$th root whose eigenvalues lie in the sector
$$S_p = \{z \in \mathbb{C}\setminus\{0\} : |\arg(z)| < \pi/p\}, \qquad (2)$$
which is called the principal $p$th root. The main numerical problem is to compute the principal $p$th root of $A$, whose applications arise in finance or in the numerical computation of other matrix functions [8,9,15]. In particular, if $A$ is real and has no nonpositive real eigenvalues, then the principal $p$th root is proved to be real [8], and in order to compute it, it is preferable to have an algorithm which works entirely in real arithmetic.

The reliable algorithms are essentially of two kinds:
1. Algorithms based on matrix iterations;
2. Algorithms based on the Schur normal form.

In the first case, one uses a rational matrix iteration which converges to the principal $p$th root of $A$. This approach is very complicated, since the iterations usually do not depend continuously on the initial data; that is, if a perturbation is introduced on some iterate, then it is potentially amplified by the subsequent steps and could result in numerical instability. Moreover, in the case $p > 2$ the convergence properties are hard to describe, even in the scalar case. The first rational iteration used for the square root is the so-called Newton method $X_{k+1} = \frac{1}{2}(X_k + X_k^{-1}A)$, which was observed to be unstable by Laasonen [14], but the instability was first analyzed by Higham [6]. Some stable iterations have been proposed: the first of them is the Denman and Beavers iteration [2], and many others have followed [11,17]. The case $p > 2$ is more complicated. In order to have a general algorithm, some kind of preprocessing of the matrix $A$ should be done. The first general and stable algorithm was given by Iannazzo [12], and some others have followed [3,4,13,16]. The computational cost of these algorithms is $O(n^3\log_2 p)$ arithmetic operations (ops) and the storage required is $O(n^2\log_2 p)$ real numbers. The algorithms based on matrix iterations show good numerical stability in the numerical tests, even if their behavior in finite arithmetic for an arbitrary matrix $A$ is practically unpredictable and a thorough analysis is yet to be developed. Moreover, these algorithms compute just the principal $p$th root, and it is not clear whether they can compute any of the primary $p$th roots of $A$ with the same computational cost.
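To make the iteration-based approach concrete, here is a minimal NumPy sketch of the Denman–Beavers iteration [2] for the square root, written in its common coupled form $Y_{k+1} = \frac12(Y_k + Z_k^{-1})$, $Z_{k+1} = \frac12(Z_k + Y_k^{-1})$ with $Y_0 = A$, $Z_0 = I$, so that $Y_k \to A^{1/2}$ and $Z_k \to A^{-1/2}$ when $A$ has no nonpositive real eigenvalues. The function name, the stopping rule and the tolerance are our own choices. This sketch is not the code of any of the cited papers.

```python
import numpy as np


def denman_beavers_sqrt(A, tol=1e-14, maxit=60):
    """Coupled Denman-Beavers iteration for the principal square root.

    Y_{k+1} = (Y_k + Z_k^{-1}) / 2,  Z_{k+1} = (Z_k + Y_k^{-1}) / 2,
    starting from Y_0 = A, Z_0 = I.  Returns approximations of A^{1/2}
    and A^{-1/2}; assumes A has no nonpositive real eigenvalues.
    """
    Y = np.array(A, dtype=float)
    Z = np.eye(Y.shape[0])
    for _ in range(maxit):
        # Both updates use the previous pair (Y_k, Z_k).
        Y_new = 0.5 * (Y + np.linalg.inv(Z))
        Z_new = 0.5 * (Z + np.linalg.inv(Y))
        step = np.linalg.norm(Y_new - Y, "fro")
        Y, Z = Y_new, Z_new
        if step <= tol * np.linalg.norm(Y, "fro"):
            break
    return Y, Z


A = np.array([[4.0, 1.0], [0.0, 9.0]])
X, X_inv = denman_beavers_sqrt(A)
print(X)  # close to [[2, 0.2], [0, 3]]
```

The coupled form avoids the plain Newton update $X_{k+1} = \frac12(X_k + X_k^{-1}A)$, whose instability is mentioned above.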
For the second class of algorithms, in order to compute a solution of $X^p - A = 0$, one computes the Schur normal form of $A$, say $Q^*AQ = R$, where $Q$ is unitary and $R$ is upper triangular, then solves the equation $Y^p - R = 0$ and deduces $X = QYQ^*$. Since $Y$ is proved to be upper triangular, the equation $Y^p - R = 0$ is solved by a recursion on the elements of $Y$ [8]. In the important case in which $A$ is real, the real Schur form of $A$ is formed, say $Q^TAQ = R$, where $Q$ is orthogonal and $R$ is quasi-upper triangular, that is, real and block upper triangular with diagonal blocks of size 1 or 2 according as they correspond to one real eigenvalue or to a couple of complex conjugate eigenvalues, respectively. The equation $Y^p - R = 0$ is solved by a recursion on the blocks of $Y$, which is proved to have the same block structure as $R$. This approach relies on the idea of the Schur–Parlett recurrence for computing general matrix functions. The case $p = 2$ was first developed by Björck and Hammarling [1] in the complex case and then by Higham [7] in the real case. Finally, the case $p > 2$ was worked out by Smith [18]. The method developed by Smith has a cost of $O(n^3p)$ ops and requires the storage of $O(n^2p)$ real numbers. If $p$ is composite, say $p = q_1q_2$, it is thus convenient to form first the $q_1$th root and then the $q_2$th root of $A$. However, if $p$ is prime, the cost of the method of Smith can be large, and that makes the algorithm ineffective.
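For an upper triangular $R$ (the complex Schur case, with $1\times1$ diagonal blocks), such an element-wise recursion can be sketched as follows. Writing $V_k = U^{k+1}$, the $(i,j)$ entry satisfies $(V_k)_{ij} = u_{ii}(V_{k-1})_{ij} + u_{ij}u_{jj}^k + S_k$, where $S_k = \sum_{\xi=i+1}^{j-1}u_{i\xi}(V_{k-1})_{\xi j}$ involves only entries already computed when the columns are processed left to right and each column from the bottom up. Unrolling up to $V_{p-1} = R$ gives a scalar equation for $u_{ij}$ with coefficient $\sum_{q=0}^{p-1}u_{ii}^qu_{jj}^{p-1-q}$. The NumPy code below is our own illustration of this $O(n^3p)$ approach, and the function name is hypothetical. It is not Smith's published implementation.

```python
import numpy as np


def triangular_root_smith_like(R, p):
    """Principal p-th root of an upper triangular R by an O(n^3 p) recursion.

    V[k] stores U^(k+1), k = 0, ..., p-1 (so V[0] is U and V[p-1] should be R).
    Entries are computed column by column, each column from the bottom up.
    Assumes R has no eigenvalues on the closed negative real axis.
    """
    R = np.asarray(R, dtype=complex)
    n = R.shape[0]
    d = np.diag(R) ** (1.0 / p)  # principal p-th roots of the diagonal
    V = np.zeros((p, n, n), dtype=complex)
    for k in range(p):
        V[k][np.diag_indices(n)] = d ** (k + 1)
    U = V[0]  # a view: writing U[i, j] updates V[0]
    for j in range(n):
        for i in range(j - 1, -1, -1):
            inner = slice(i + 1, j)
            # S[l-1] = sum_{xi=i+1}^{j-1} U[i, xi] * (U^l)[xi, j], l = 1, ..., p-1
            S = [U[i, inner] @ V[l - 1][inner, j] for l in range(1, p)]
            coeff = sum(d[i] ** q * d[j] ** (p - 1 - q) for q in range(p))
            known = sum(d[i] ** (p - 1 - l) * S[l - 1] for l in range(1, p))
            U[i, j] = (R[i, j] - known) / coeff
            # Update the (i, j) entries of U^2, ..., U^p.
            for k in range(1, p):
                V[k][i, j] = d[i] * V[k - 1][i, j] + U[i, j] * d[j] ** k + S[k - 1]
    return U.copy()


R = np.array([[4.0, 1.0, 2.0], [0.0, 9.0, 3.0], [0.0, 0.0, 16.0]])
U = triangular_root_smith_like(R, 5)
print(np.allclose(np.linalg.matrix_power(U, 5), R))
```

All $p$ intermediate powers are stored, which is where the $O(n^2p)$ storage and $O(n^3p)$ cost quoted above come from.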
A nice feature of the Smith algorithm is that it has been proved to be backward stable; thus it is, in some sense, optimal in finite arithmetic. Moreover, it can compute any of the primary $p$th roots of $A$ with the same cost.

We propose a new algorithm based on the Schur normal form of $A$ whose cost is lowered to $O(n^2p + n^3\log_2 p)$ ops and whose storage is lowered to $O(np + n^2\log_2 p)$ real numbers. The proposed algorithm combines the advantages of being based on the Schur form with the low computational cost of the iterations. The numerical tests show that the new algorithm reaches the same numerical accuracy as the one of Smith.

The paper is organized as follows. In Section 2 we describe and analyze the proposed method; in Section 3 we summarize the resulting algorithm; in Section 4 we discuss how to further reduce the cost of the algorithm; finally, in Section 5 we present some numerical experiments which confirm the reliability of the algorithm.

2 A Schur method based on the binary powering technique

One of the most used methods for computing the principal $p$th root, for $p$ a positive integer, of a real matrix $A \in \mathbb{R}^{n\times n}$ having no nonpositive real eigenvalues has been proposed by Smith in [18]. Since the principal $p$th root of such a matrix is proved to be real, the method is designed to work entirely in real arithmetic. The idea of the algorithm is to compute the real Schur normal form of $A$, say $Q^TAQ = R$, where $Q$ is orthogonal and $R$ is real and quasi-upper triangular; namely, $R$ is a $\sigma\times\sigma$ block upper triangular matrix whose $\sigma$ diagonal blocks are real numbers or $2\times2$ real matrices corresponding to a couple of complex conjugate eigenvalues. Once the real Schur form is obtained, one applies the transformation to the equation
$$X^p = A, \qquad (3)$$
obtaining the new equation
$$U^p = R, \qquad (4)$$
where $U = Q^TXQ$. If $X$ is the principal $p$th root of $A$, then $U$ is the principal $p$th root of $R$ as well; moreover, the matrix $U$ is quasi-upper triangular with the same block structure as $R$ (this follows from the fact that a primary $p$th root of a matrix is a polynomial in that matrix, see also [8]). From a solution $U$ of (4) one obtains a solution of (3) as $X = QUQ^T$, so if $U$ is chosen to be the principal $p$th root of $R$, say $U = R^{1/p}$, then the principal $p$th root of $A$ is $A^{1/p} = QUQ^T$ (recall that $U$ and $QUQ^T$ have the same eigenvalues).
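The reduction from (3) to (4) rests on the identity $(QUQ^T)^p = QU^pQ^T$, valid because $Q^TQ = I$. The short NumPy check below illustrates it. The matrices are arbitrary data of our own (not from the paper), and $Q$ is taken from a QR factorization rather than from an actual Schur decomposition, which does not matter for the identity.

```python
import numpy as np

p = 5
# Illustrative data (not from the paper): Q orthogonal, U upper triangular
# with positive, distinct diagonal entries.
Q, _ = np.linalg.qr(np.array([[2.0, 1.0, 0.0],
                              [1.0, 3.0, 1.0],
                              [0.0, 1.0, 4.0]]))
U = np.array([[1.5, 0.3, -0.2],
              [0.0, 2.0, 0.4],
              [0.0, 0.0, 1.2]])
R = np.linalg.matrix_power(U, p)   # U^p = R, as in (4); R stays upper triangular
A = Q @ R @ Q.T                    # A = Q R Q^T
X = Q @ U @ Q.T                    # X = Q U Q^T
# Since Q^T Q = I, (Q U Q^T)^p = Q U^p Q^T = A, i.e. X solves (3).
print(np.allclose(np.linalg.matrix_power(X, p), A))
```

Because the diagonal of $U$ is positive, $U$ is the principal $p$th root of $R$ here, and $X$ shares its eigenvalues.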
The key point of the algorithm is the solution of (4), which is done by a clever recursion, employing the same idea as the methods of Björck and Hammarling [1] and Higham [7] for the matrix square root. The recursion of Smith [18,19] is obtained by considering the sequence of matrices
$$\tilde V^{(0)} = U, \qquad \tilde V^{(k)} = U\tilde V^{(k-1)} = U^{k+1}, \quad k = 1,\dots,p-2, \qquad (5)$$
which have the same quasi-triangular structure as $U$ and $R$. The diagonal blocks of $U$ are obtained from the $p$th roots of the corresponding blocks of $R$ using a simple formula:
if $U_{jj}$ is a $1\times1$ block, then $U_{jj}$ is the principal $p$th root of the scalar $R_{jj}$; if $U_{jj}$ is a $2\times2$ block, corresponding to the complex conjugate eigenvalues $\theta\pm i\mu$ of $R_{jj}$, then
$$U_{jj} = \alpha I + \frac{\beta}{\mu}(R_{jj} - \theta I), \qquad (6)$$
where $I$ is the $2\times2$ identity matrix and $\alpha + i\beta$ is the principal $p$th root of $\theta + i\mu$. The upper triangular part of $U$ is obtained one block column at a time, by equating the $(i,j)$ blocks in equation (5) (more details can be found in [8,18]). The main drawback of the Smith method is its high cost, in terms of arithmetic operations (ops) and storage, for large $p$: in particular, it needs $O(n^3p)$ ops and the storage of $O(n^2p)$ real numbers. The most expensive part is the computation of the elements of the intermediate matrices $\tilde V^{(k)}$ and their storage.

The idea of our algorithm is to use a recursion similar to the one of Smith but with fewer intermediate matrices, obtaining an algorithm with similar features but less expensive. The proposed recursion is based on the binary powering decomposition of the integer $p$, that is,
$$p = \sum_{k=0}^{\lfloor\log_2 p\rfloor} b_k2^k, \quad \text{for a unique choice of } b_0,\dots,b_{\lfloor\log_2 p\rfloor} \in \{0,1\}, \qquad (7)$$
where $b_{\lfloor\log_2 p\rfloor} = 1$. Observe that $b_i$, for $i = 0,\dots,\lfloor\log_2 p\rfloor$, are the digits in the binary representation of $p$. We also define the sets
$$c(p) = \{k : b_k = 1\}, \qquad c(p)^+ = c(p)\setminus\{0\}. \qquad (8)$$
The set $c(p)$ has cardinality $m+1$, for some nonnegative integer $m$, while the set $c(p)^+$ coincides with $c(p)$ if $p$ is even (that is, $0 \notin c(p)$) and has cardinality $m$ if $p$ is odd. Clearly, $m+1$ is the number of 1s appearing in the binary representation of $p$. Let $c_0, c_1,\dots,c_m$ be the sequence obtained by sorting the elements of $c(p)$ in decreasing order. Note that if $m > 0$, then
$$c_0 = \lfloor\log_2 p\rfloor, \qquad c_h = \max\{k : k < c_{h-1},\ b_k = 1\}, \quad h = 1,\dots,m, \qquad (9)$$
while if $m = 0$, then the sequence contains just the term $c_0 = \lfloor\log_2 p\rfloor$. Since $U^p = R$,
$$R = U^{\sum_{k=0}^{\lfloor\log_2 p\rfloor} b_k2^k} = U^{\sum_{h=0}^{m}2^{c_h}} = \prod_{h=0}^{m}U^{2^{c_h}},$$
and it is possible to devise a method based on a sequence of $c_0 + m = O(\log_2 p)$ intermediate matrices, from which to construct a recursion for computing $U$. We obtain the first $c_0$ matrices as follows:
$$V^{(0)} = U, \qquad V^{(k)} = V^{(k-1)}V^{(k-1)} = U^{2^k}, \quad k = 1,\dots,c_0, \qquad (10)$$
and the latter $m$ matrices as follows:
$$W^{(0)} = V^{(c_0)}, \qquad W^{(h)} = W^{(h-1)}V^{(c_h)}, \quad h = 1,\dots,m, \qquad (11)$$
where $W^{(m)} = R$. The matrices $V^{(k)}$ and $W^{(h)}$ have the same block structure as $R$, being quasi-upper triangular. We denote the blocks of $V^{(k)}$ and $W^{(h)}$ by $V^{(k)}_{ij}$ and $W^{(h)}_{ij}$, respectively, where the indices $i, j$ go from 1 to $\sigma$, and $\sigma^2$ is the number of blocks in the partitioning of $R$. Each choice of $i$ and $j$ can correspond to a $1\times1$, a $1\times2$, a $2\times1$, or a $2\times2$ block, according to the quasi-triangular structure of $R$. The idea of the proposed method is to compute, using (10) and (11), the blocks of $U$, that is, of $V^{(0)}$, in the following order: first, compute the diagonal blocks of $U$; then compute the upper part of $U$, $V^{(k)}$ and $W^{(h)}$ one column at a time, from the bottom to the top. During the computation we need the diagonal blocks of $U^q$ for $q = 1,\dots,p$. These blocks can be computed with a cost of $O(np)$ ops and a storage of $O(np)$ real numbers.

Relations (10) and (11) can be restated in terms of blocks: for each $i, j = 1,\dots,\sigma$ such that $i \le j$, for $k = 1,\dots,c_0$, and for $h = 1,\dots,m$, one has
$$V^{(k)}_{ij} = \sum_{\xi=i}^{j}V^{(k-1)}_{i\xi}V^{(k-1)}_{\xi j}, \qquad W^{(h)}_{ij} = \sum_{\xi=i}^{j}W^{(h-1)}_{i\xi}V^{(c_h)}_{\xi j}, \qquad (12)$$
while for $i > j$ the blocks $V^{(k)}_{ij}$ and $W^{(h)}_{ij}$ are zero for each $k$ and $h$. In order to get useful formulae we isolate the terms containing the indices $i$ and $j$ in the sums, obtaining
$$V^{(k-1)}_{ii}V^{(k-1)}_{ij} + V^{(k-1)}_{ij}V^{(k-1)}_{jj} = V^{(k)}_{ij} - B^{(k)}_{ij}, \qquad (13)$$
$$W^{(h-1)}_{ii}V^{(c_h)}_{ij} + W^{(h-1)}_{ij}V^{(c_h)}_{jj} = W^{(h)}_{ij} - C^{(h)}_{ij}, \qquad (14)$$
where $B^{(k)}_{ij}$ and $C^{(h)}_{ij}$ denote quantities which are already known when one is computing the block $U_{ij}$; in fact, for $j > i+1$,
$$B^{(k)}_{ij} = \sum_{\xi=i+1}^{j-1}V^{(k-1)}_{i\xi}V^{(k-1)}_{\xi j}, \qquad C^{(h)}_{ij} = \sum_{\xi=i+1}^{j-1}W^{(h-1)}_{i\xi}V^{(c_h)}_{\xi j},$$
and for $j = i+1$, $B^{(k)}_{ij} = C^{(h)}_{ij} = 0$.

Now we show how to use equations (13) and (14) in order to obtain a single equation from which to recover $U_{ij}$ for each $i$ and $j$. The construction of such an equation is quite technical and will be done in the rest of the section. Let $A_1 = \{(0;1;0)\}$, and let
$$A_k = \bigcup_{(r;s;t)\in A_{k-1}}\{(r+2^{k-1};s;t),\ (r;s;t+2^{k-1})\} \cup \{(0;k;0)\}, \qquad (15)$$
for any integer $k > 1$. The set $A_k$ contains $2^k - 1$ triples; it can be easily shown that the following two properties completely describe $A_k$:
(i) $(0;k;0) \in A_k$;
(ii) if $(r;s;t) \in A_k$ and $s > 1$, then $(r+2^{s-1};s-1;t) \in A_k$ and $(r;s-1;t+2^{s-1}) \in A_k$.

Properties (i) and (ii) allow us to represent the elements of $A_k$ by a tree. In Figure 1 the tree in the case $k = 3$ is depicted.

Fig. 1 A tree representation of $A_3$: the root $(0;3;0)$ has children $(4;2;0)$ and $(0;2;4)$; the leaves are $(6;1;0)$ and $(4;1;2)$ below $(4;2;0)$, and $(2;1;4)$ and $(0;1;6)$ below $(0;2;4)$.

We first explain how the blocks of $U$ can be constructed when $p = 2^k$; then the general case is described. We recall that the algorithm will essentially be used only when $p$ is a prime number; the case $p = 2^k$ is presented just for the sake of clarity. Let us illustrate what happens in the case $p = 16$ to better understand the case $p = 2^k$. Observe that for $p = 2^k$ one needs only equation (10). Let us suppose that the diagonal blocks $U_{ii}^q$ for any $q = 0,\dots,15$ have already been computed. It follows from (13) applied for $k = 4$ that
$$V^{(4)}_{ij} = V^{(3)}_{ii}V^{(3)}_{ij} + V^{(3)}_{ij}V^{(3)}_{jj} + B^{(4)}_{ij}.$$
Note that $B^{(4)}_{ij} = U_{ii}^0B^{(4)}_{ij}U_{jj}^0$, and it can be associated with the triple $(0;4;0)$ belonging to the first level of the tree corresponding to $A_4$. Let us consider the term $V^{(3)}_{ii}V^{(3)}_{ij}$; it follows from (13) applied for $k = 3$ that
$$V^{(3)}_{ii}V^{(3)}_{ij} = U_{ii}^8\big(V^{(2)}_{ii}V^{(2)}_{ij} + V^{(2)}_{ij}V^{(2)}_{jj}\big) + U_{ii}^8B^{(3)}_{ij}.$$
Note that $U_{ii}^8B^{(3)}_{ij} = U_{ii}^8B^{(3)}_{ij}U_{jj}^0$, and that it can be associated with the triple $(8;3;0)$ belonging to the second level of the tree corresponding to $A_4$. Clearly, the
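triple $(0;3;8)$ appears when we substitute (13) for $k = 3$ in $V^{(3)}_{ij}V^{(3)}_{jj}$. Recalling that $V^{(2)}_{ii} = U_{ii}^4$ and that $V^{(2)}_{jj} = U_{jj}^4$, we obtain, by making use of (13) applied for $k = 2$, that
$$U_{ii}^{12}V^{(2)}_{ij} + U_{ii}^8V^{(2)}_{ij}U_{jj}^4 = U_{ii}^{12}\big(V^{(1)}_{ii}V^{(1)}_{ij} + V^{(1)}_{ij}V^{(1)}_{jj}\big) + U_{ii}^{12}B^{(2)}_{ij} + U_{ii}^8\big(V^{(1)}_{ii}V^{(1)}_{ij} + V^{(1)}_{ij}V^{(1)}_{jj}\big)U_{jj}^4 + U_{ii}^8B^{(2)}_{ij}U_{jj}^4.$$
By arguing as above, the two terms $U_{ii}^{12}B^{(2)}_{ij}$ and $U_{ii}^8B^{(2)}_{ij}U_{jj}^4$ are associated with the triples $(12;2;0)$ and $(8;2;4)$, respectively. The other two triples belonging to the third level of the tree corresponding to $A_4$ appear, by symmetry, when we simplify $V^{(3)}_{ij}V^{(3)}_{jj}$. Remembering that $V^{(1)}_{ii} = U_{ii}^2$ and that $V^{(1)}_{jj} = U_{jj}^2$, we obtain, by making use of (13) for $k = 1$, that
$$\begin{aligned}
U_{ii}^{14}V^{(1)}_{ij} &= U_{ii}^{15}V^{(0)}_{ij} + U_{ii}^{14}V^{(0)}_{ij}U_{jj} + U_{ii}^{14}B^{(1)}_{ij},\\
U_{ii}^{12}V^{(1)}_{ij}U_{jj}^2 &= U_{ii}^{13}V^{(0)}_{ij}U_{jj}^2 + U_{ii}^{12}V^{(0)}_{ij}U_{jj}^3 + U_{ii}^{12}B^{(1)}_{ij}U_{jj}^2,\\
U_{ii}^{10}V^{(1)}_{ij}U_{jj}^4 &= U_{ii}^{11}V^{(0)}_{ij}U_{jj}^4 + U_{ii}^{10}V^{(0)}_{ij}U_{jj}^5 + U_{ii}^{10}B^{(1)}_{ij}U_{jj}^4,\\
U_{ii}^{8}V^{(1)}_{ij}U_{jj}^6 &= U_{ii}^{9}V^{(0)}_{ij}U_{jj}^6 + U_{ii}^{8}V^{(0)}_{ij}U_{jj}^7 + U_{ii}^{8}B^{(1)}_{ij}U_{jj}^6,
\end{aligned}$$
which make the missing terms of $A_4$ appear. In general, the following result holds.

Lemma 1 If $p = 2^{c_0}$, for some positive integer $c_0$, then
$$R_{ij} = \sum_{q=0}^{p-1}U_{ii}^qV^{(0)}_{ij}U_{jj}^{p-1-q} + \sum_{(r;s;t)\in A_{c_0}}U_{ii}^rB^{(s)}_{ij}U_{jj}^t. \qquad (16)$$

Proof The claim is proved by induction on $c_0$. Let us suppose $c_0 = 1$; from (13) applied for $k = 1$ it follows that $R_{ij} = V^{(1)}_{ij} = V^{(0)}_{ii}V^{(0)}_{ij} + V^{(0)}_{ij}V^{(0)}_{jj} + B^{(1)}_{ij}$. Since $V^{(0)}_{ii} = U_{ii}$, $V^{(0)}_{jj} = U_{jj}$, and $A_1 = \{(0;1;0)\}$, the claim trivially follows. Let us assume the claim for $c_0 = c > 0$, and let us prove it for $c_0 = c+1$. From (13) applied for $k = c+1$ it follows that $R_{ij} = V^{(c+1)}_{ij} = V^{(c)}_{ii}V^{(c)}_{ij} + V^{(c)}_{ij}V^{(c)}_{jj} + B^{(c+1)}_{ij}$. By making use of the inductive hypothesis and observing that $V^{(c)}_{ii} = U_{ii}^{2^c}$ and $V^{(c)}_{jj} = U_{jj}^{2^c}$, the above equation can be written as
$$\begin{aligned}
R_{ij} &= U_{ii}^{2^c}\Big(\sum_{q=0}^{2^c-1}U_{ii}^qV^{(0)}_{ij}U_{jj}^{2^c-1-q} + \sum_{(r;s;t)\in A_c}U_{ii}^rB^{(s)}_{ij}U_{jj}^t\Big)\\
&\quad + \Big(\sum_{q=0}^{2^c-1}U_{ii}^qV^{(0)}_{ij}U_{jj}^{2^c-1-q} + \sum_{(r;s;t)\in A_c}U_{ii}^rB^{(s)}_{ij}U_{jj}^t\Big)U_{jj}^{2^c} + B^{(c+1)}_{ij}\\
&= \sum_{q=0}^{2^c-1}\Big(U_{ii}^{q+2^c}V^{(0)}_{ij}U_{jj}^{2^c-1-q} + U_{ii}^qV^{(0)}_{ij}U_{jj}^{2^c-1-q+2^c}\Big)\\
&\quad + \sum_{(r;s;t)\in A_c}\Big(U_{ii}^{r+2^c}B^{(s)}_{ij}U_{jj}^t + U_{ii}^rB^{(s)}_{ij}U_{jj}^{t+2^c}\Big) + U_{ii}^0B^{(c+1)}_{ij}U_{jj}^0.
\end{aligned}$$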
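The former term of the last expression can be written as $\sum_{q=0}^{2^{c+1}-1}U_{ii}^qV^{(0)}_{ij}U_{jj}^{2^{c+1}-1-q}$, while the latter term can be rearranged as $\sum_{(r;s;t)\in A_{c+1}}U_{ii}^rB^{(s)}_{ij}U_{jj}^t$ as a consequence of the definition of $A_k$ for $k = c+1$. The claim thus follows.

Note that the two sums involved in $R_{ij}$ have $p$ and $2^{c_0} - 1 = p - 1$ terms, respectively. Lemma 1 provides a basis for an algorithm for the $2^k$th root of a matrix. We need the Kronecker notation [10], that is, the Kronecker product $\otimes$, the vec operator, which stacks the columns of a matrix into a long vector, and the well-known relation $\operatorname{vec}(AXB) = (B^T\otimes A)\operatorname{vec}(X)$, for matrices $A$, $X$, $B$ of suitable sizes. Using the Kronecker notation and $V^{(0)}_{ij} = U_{ij}$, equation (16) can be rewritten as
$$\Big(\sum_{q=0}^{p-1}\big(U_{jj}^{p-1-q}\big)^T\otimes U_{ii}^q\Big)\operatorname{vec}(U_{ij}) = \operatorname{vec}\Big(R_{ij} - \sum_{(r;s;t)\in A_{c_0}}U_{ii}^rB^{(s)}_{ij}U_{jj}^t\Big), \qquad (17)$$
which is a linear system of size at most 4 whose unknown is $\operatorname{vec}(U_{ij})$; the coefficient matrix and the right hand side are known quantities, since they involve already computed blocks. The solution is unique, as the coefficient matrix is the transpose of the one appearing in the Smith algorithm, which is proved to be nonsingular [18].

In order to move on to the case in which $p$ is arbitrary, let us illustrate what happens for $p = 23$, where $m = 3$, $c_0 = 4$, $c_1 = 2$, $c_2 = 1$, $c_3 = 0$. By making use of (14) for $h = 3$, we have that
$$W^{(3)}_{ij} = W^{(2)}_{ii}V^{(0)}_{ij} + W^{(2)}_{ij}V^{(0)}_{jj} + C^{(3)}_{ij} = U_{ii}^{22}V^{(0)}_{ij} + W^{(2)}_{ij}U_{jj} + C^{(3)}_{ij},$$
where we have used that $W^{(2)}_{ii} = U_{ii}^{2^{c_0}+2^{c_1}+2^{c_2}} = U_{ii}^{22}$ and that $V^{(0)}_{jj} = U_{jj}$. The only summand which needs to be further reduced is the second one; according to (14), for $h = 2$ we have that
$$W^{(2)}_{ij}U_{jj} = \big(W^{(1)}_{ii}V^{(1)}_{ij} + W^{(1)}_{ij}V^{(1)}_{jj}\big)U_{jj} + C^{(2)}_{ij}U_{jj} = U_{ii}^{20}V^{(1)}_{ij}U_{jj} + W^{(1)}_{ij}U_{jj}^3 + C^{(2)}_{ij}U_{jj},$$
where we have used that $W^{(1)}_{ii} = U_{ii}^{2^{c_0}+2^{c_1}} = U_{ii}^{20}$ and $V^{(1)}_{jj} = U_{jj}^2$. Moreover, it follows from Lemma 1 that
$$V^{(1)}_{ij} = \sum_{q=0}^{1}U_{ii}^qV^{(0)}_{ij}U_{jj}^{1-q} + \sum_{(r;s;t)\in A_1}U_{ii}^rB^{(s)}_{ij}U_{jj}^t,$$
hence,
$$U_{ii}^{20}V^{(1)}_{ij}U_{jj} = U_{ii}^{20}\sum_{q=0}^{1}U_{ii}^qV^{(0)}_{ij}U_{jj}^{1-q}\,U_{jj} + U_{ii}^{20}\Big(\sum_{(r;s;t)\in A_1}U_{ii}^rB^{(s)}_{ij}U_{jj}^t\Big)U_{jj}.$$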
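According to (14), for $h = 1$, we have that
$$W^{(1)}_{ij}U_{jj}^3 = W^{(0)}_{ii}V^{(2)}_{ij}U_{jj}^3 + W^{(0)}_{ij}V^{(2)}_{jj}U_{jj}^3 + C^{(1)}_{ij}U_{jj}^3 = U_{ii}^{16}V^{(2)}_{ij}U_{jj}^3 + W^{(0)}_{ij}U_{jj}^7 + C^{(1)}_{ij}U_{jj}^3.$$
Note that $W^{(0)}_{ii} = U_{ii}^{2^{c_0}} = U_{ii}^{16}$ and that $V^{(2)}_{jj} = U_{jj}^4$. Moreover, from Lemma 1 it follows that
$$V^{(2)}_{ij} = \sum_{q=0}^{3}U_{ii}^qV^{(0)}_{ij}U_{jj}^{3-q} + \sum_{(r;s;t)\in A_2}U_{ii}^rB^{(s)}_{ij}U_{jj}^t,$$
hence,
$$U_{ii}^{16}V^{(2)}_{ij}U_{jj}^3 = U_{ii}^{16}\sum_{q=0}^{3}U_{ii}^qV^{(0)}_{ij}U_{jj}^{3-q}\,U_{jj}^3 + U_{ii}^{16}\Big(\sum_{(r;s;t)\in A_2}U_{ii}^rB^{(s)}_{ij}U_{jj}^t\Big)U_{jj}^3.$$
On the other hand,
$$W^{(0)}_{ij}U_{jj}^7 = V^{(4)}_{ij}U_{jj}^7 = \sum_{q=0}^{15}U_{ii}^qV^{(0)}_{ij}U_{jj}^{15-q}\,U_{jj}^7 + \Big(\sum_{(r;s;t)\in A_4}U_{ii}^rB^{(s)}_{ij}U_{jj}^t\Big)U_{jj}^7,$$
by making use of Lemma 1 for $c_0 = 4$ and of (11) for $h = 0$. All terms involving $V^{(0)}_{ij}$ can be grouped as
$$\sum_{q=0}^{22}U_{ii}^qV^{(0)}_{ij}U_{jj}^{22-q},$$
while the remaining terms can be divided into two summands. The first one, containing the $C^{(h)}_{ij}$, $i, j = 1,\dots,\sigma$, can be written as
$$\sum_{h=1}^{3}C^{(h)}_{ij}U_{jj}^{2^{c_{h+1}}+\cdots+2^{c_3}},$$
where $U_{jj}^{2^{c_{h+1}}+\cdots+2^{c_3}}$ denotes the identity matrix for $h = 3$. The second one, referring to the $B^{(k)}_{ij}$'s, can be written as
$$\sum_{h:\,c_h\in c(23)^+}U_{ii}^{23-2^{c_h}-\cdots-2^{c_3}}\Big(\sum_{(r;s;t)\in A_{c_h}}U_{ii}^rB^{(s)}_{ij}U_{jj}^t\Big)U_{jj}^{2^{c_{h+1}}+\cdots+2^{c_3}},$$
where $c(23)^+$ is the set $\{4,2,1\}$, according to definition (8). Now we give the main result of this section.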
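Theorem 1 Let $p = \sum_{h=0}^{m}2^{c_h}$ be a positive integer greater than 1, and let $c(p)^+$ be as in (8). Then
$$\begin{aligned}
R_{ij} &= \sum_{q=0}^{p-1}U_{ii}^qV^{(0)}_{ij}U_{jj}^{p-1-q} + \sum_{h=1}^{m}C^{(h)}_{ij}U_{jj}^{2^{c_{h+1}}+\cdots+2^{c_m}}\\
&\quad + \sum_{h:\,c_h\in c(p)^+}U_{ii}^{p-2^{c_h}-\cdots-2^{c_m}}\Big(\sum_{(r;s;t)\in A_{c_h}}U_{ii}^rB^{(s)}_{ij}U_{jj}^t\Big)U_{jj}^{2^{c_{h+1}}+\cdots+2^{c_m}},
\end{aligned}\qquad (18)$$
where, for $h = m$, $U_{jj}^{2^{c_{h+1}}+\cdots+2^{c_m}}$ denotes the identity matrix; for $m = 0$ the second summand on the right hand side of (18) is the zero matrix, and the third summand is the zero matrix when $c(p)^+$ is the empty set.

Proof The claim is proved by induction on $m$. Let us assume $m = 0$; hence $p = 2^{c_0}$ for some positive integer $c_0$, and the claim follows from Lemma 1. Let us assume the claim for $m = \mu$, and let us prove it for $m = \mu+1$; note that in this case $p = \sum_{h=0}^{\mu}2^{c_h} + 2^{c_{\mu+1}}$. Let $p'$ denote the integer $\sum_{h=0}^{\mu}2^{c_h}$; thus $p = p' + 2^{c_{\mu+1}}$. It follows from (11) applied for $h = \mu+1$ that
$$R_{ij} = W^{(\mu+1)}_{ij} = W^{(\mu)}_{ii}V^{(c_{\mu+1})}_{ij} + W^{(\mu)}_{ij}V^{(c_{\mu+1})}_{jj} + C^{(\mu+1)}_{ij}.$$
Note that $W^{(\mu)}_{ii} = U_{ii}^{p'}$ and that $V^{(c_{\mu+1})}_{jj} = U_{jj}^{2^{c_{\mu+1}}}$. By making use of the induction hypothesis for $W^{(\mu)}_{ij}$,
$$\begin{aligned}
R_{ij} &= U_{ii}^{p'}V^{(c_{\mu+1})}_{ij} + \Big(\sum_{q=0}^{p'-1}U_{ii}^qV^{(0)}_{ij}U_{jj}^{p'-1-q} + \sum_{h=1}^{\mu}C^{(h)}_{ij}U_{jj}^{2^{c_{h+1}}+\cdots+2^{c_\mu}}\\
&\quad + \sum_{h:\,c_h\in c(p')^+}U_{ii}^{p'-2^{c_h}-\cdots-2^{c_\mu}}\Big(\sum_{(r;s;t)\in A_{c_h}}U_{ii}^rB^{(s)}_{ij}U_{jj}^t\Big)U_{jj}^{2^{c_{h+1}}+\cdots+2^{c_\mu}}\Big)U_{jj}^{2^{c_{\mu+1}}} + C^{(\mu+1)}_{ij}.
\end{aligned}$$
As a consequence of the relation $p = p' + 2^{c_{\mu+1}}$, we have that
$$\begin{aligned}
R_{ij} &= U_{ii}^{p'}V^{(c_{\mu+1})}_{ij} + \sum_{q=0}^{p'-1}U_{ii}^qV^{(0)}_{ij}U_{jj}^{p-1-q} + \sum_{h=1}^{\mu}C^{(h)}_{ij}U_{jj}^{2^{c_{h+1}}+\cdots+2^{c_\mu}+2^{c_{\mu+1}}} + C^{(\mu+1)}_{ij}\\
&\quad + \sum_{h:\,c_h\in c(p')^+}U_{ii}^{p-2^{c_h}-\cdots-2^{c_\mu}-2^{c_{\mu+1}}}\Big(\sum_{(r;s;t)\in A_{c_h}}U_{ii}^rB^{(s)}_{ij}U_{jj}^t\Big)U_{jj}^{2^{c_{h+1}}+\cdots+2^{c_\mu}+2^{c_{\mu+1}}}.
\end{aligned}$$
As $\mu+1 = m$, $U_{jj}^{2^{c_{\mu+2}}+\cdots+2^{c_{\mu+1}}}$ denotes the identity matrix. Hence, the term
$$C[\mu] := \sum_{h=1}^{\mu}C^{(h)}_{ij}U_{jj}^{2^{c_{h+1}}+\cdots+2^{c_{\mu+1}}} + C^{(\mu+1)}_{ij}$$
is equal to the second sum on the right hand side of equation (18) for $m = \mu+1$. In order to complete the proof, we distinguish two cases, $c_{\mu+1} = 0$ and $c_{\mu+1} > 0$, which correspond to $p$ odd and $p$ even, respectively.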
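If $c_{\mu+1} = 0$, then $V^{(c_{\mu+1})}_{ij} = V^{(0)}_{ij} = U_{ij}$ and $U_{jj}^{2^{c_{\mu+1}}} = U_{jj}$ hold; hence,
$$\begin{aligned}
R_{ij} &= U_{ii}^{p'}V^{(0)}_{ij} + \sum_{q=0}^{p'-1}U_{ii}^qV^{(0)}_{ij}U_{jj}^{p'-q} + C[\mu]\\
&\quad + \sum_{h:\,c_h\in c(p')^+}U_{ii}^{p-2^{c_h}-\cdots-2^{c_\mu}-2^{c_{\mu+1}}}\Big(\sum_{(r;s;t)\in A_{c_h}}U_{ii}^rB^{(s)}_{ij}U_{jj}^t\Big)U_{jj}^{2^{c_{h+1}}+\cdots+2^{c_\mu}+2^{c_{\mu+1}}}.
\end{aligned}\qquad (19)$$
Since $p = p'+1$, the first two summands on the right hand side of (19) can be written as $\sum_{q=0}^{p-1}U_{ii}^qV^{(0)}_{ij}U_{jj}^{p-1-q}$, which corresponds to the first sum on the right hand side of equation (18) for $m = \mu+1$. Finally, noting that $c(p)^+ = c(p')^+$, the second row of equation (19) corresponds to the third sum on the right hand side of equation (18) for $m = \mu+1$. The claim thus follows in the case $c_{\mu+1} = 0$.

Suppose, now, that $c_{\mu+1} > 0$. Let us compute $V^{(c_{\mu+1})}_{ij}$ using Lemma 1 for $c = c_{\mu+1}$. Hence,
$$\begin{aligned}
R_{ij} &= U_{ii}^{p'}\Big(\sum_{q=0}^{2^{c_{\mu+1}}-1}U_{ii}^qV^{(0)}_{ij}U_{jj}^{2^{c_{\mu+1}}-1-q} + \sum_{(r;s;t)\in A_{c_{\mu+1}}}U_{ii}^rB^{(s)}_{ij}U_{jj}^t\Big) + \sum_{q=0}^{p'-1}U_{ii}^qV^{(0)}_{ij}U_{jj}^{p-1-q} + C[\mu]\\
&\quad + \sum_{h:\,c_h\in c(p')^+}U_{ii}^{p-2^{c_h}-\cdots-2^{c_\mu}-2^{c_{\mu+1}}}\Big(\sum_{(r;s;t)\in A_{c_h}}U_{ii}^rB^{(s)}_{ij}U_{jj}^t\Big)U_{jj}^{2^{c_{h+1}}+\cdots+2^{c_\mu}+2^{c_{\mu+1}}}.
\end{aligned}$$
Since $p = p' + 2^{c_{\mu+1}}$,
$$U_{ii}^{p'}\sum_{q=0}^{2^{c_{\mu+1}}-1}U_{ii}^qV^{(0)}_{ij}U_{jj}^{2^{c_{\mu+1}}-1-q} + \sum_{q=0}^{p'-1}U_{ii}^qV^{(0)}_{ij}U_{jj}^{p-1-q} = \sum_{q=0}^{p-1}U_{ii}^qV^{(0)}_{ij}U_{jj}^{p-1-q};$$
thus, the first sum on the right hand side of equation (18) for $m = \mu+1$ has been obtained. Moreover,
$$U_{ii}^{p'}\sum_{(r;s;t)\in A_{c_{\mu+1}}}U_{ii}^rB^{(s)}_{ij}U_{jj}^t = U_{ii}^{p-2^{c_{\mu+1}}}\Big(\sum_{(r;s;t)\in A_{c_{\mu+1}}}U_{ii}^rB^{(s)}_{ij}U_{jj}^t\Big)U_{jj}^0.$$
Noting that $c(p)^+ = c(p')^+\cup\{c_{\mu+1}\}$, the remaining terms can be written as
$$\sum_{h:\,c_h\in c(p)^+}U_{ii}^{p-2^{c_h}-\cdots-2^{c_\mu}-2^{c_{\mu+1}}}\Big(\sum_{(r;s;t)\in A_{c_h}}U_{ii}^rB^{(s)}_{ij}U_{jj}^t\Big)U_{jj}^{2^{c_{h+1}}+\cdots+2^{c_\mu}+2^{c_{\mu+1}}}.$$
The proof is complete.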
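Because Theorem 1 is an algebraic identity, it can be checked numerically. The NumPy sketch below assumes $1\times1$ diagonal blocks, so that all factors commute. It takes an arbitrary upper triangular $U$ (illustrative data, not from the paper), builds $V^{(k)}$ and $W^{(h)}$ by (10)–(11), forms $B^{(k)}_{ij}$ and $C^{(h)}_{ij}$ as defined after (14), and compares each entry of $R = W^{(m)}$ with the right hand side of (18). The helper names `binary_exponents` (the $c_h$ of (9)), `triples` (the sets $A_k$ of (15)) and `check_theorem1` are our own.

```python
import numpy as np


def binary_exponents(p):
    """c_0 > c_1 > ... > c_m: the positions of the 1 bits of p, see (7)-(9)."""
    return [k for k in range(p.bit_length() - 1, -1, -1) if (p >> k) & 1]


def triples(k):
    """The set A_k of (15): A_1 = {(0,1,0)}; A_0 is taken to be empty."""
    A = set() if k == 0 else {(0, 1, 0)}
    for kk in range(2, k + 1):
        shift = 2 ** (kk - 1)
        A = ({(r + shift, s, t) for (r, s, t) in A}
             | {(r, s, t + shift) for (r, s, t) in A}
             | {(0, kk, 0)})
    return A


def check_theorem1(U, p):
    """Compare R = U^p entrywise with the right hand side of (18); 1x1 blocks only."""
    n = U.shape[0]
    c = binary_exponents(p)
    m = len(c) - 1
    V = [np.linalg.matrix_power(U, 2 ** k) for k in range(c[0] + 1)]  # (10)
    W = [V[c[0]]]
    for h in range(1, m + 1):
        W.append(W[h - 1] @ V[c[h]])                                   # (11)
    R = W[m]
    d = np.diag(U)
    tail = [sum(2 ** x for x in c[h + 1:]) for h in range(m + 1)]      # 2^{c_{h+1}}+...+2^{c_m}
    for j in range(n):
        for i in range(j):
            inner = slice(i + 1, j)
            B = {k: V[k - 1][i, inner] @ V[k - 1][inner, j] for k in range(1, c[0] + 1)}
            C = {h: W[h - 1][i, inner] @ V[c[h]][inner, j] for h in range(1, m + 1)}
            rhs = sum(d[i] ** q * U[i, j] * d[j] ** (p - 1 - q) for q in range(p))
            rhs += sum(C[h] * d[j] ** tail[h] for h in range(1, m + 1))
            for h in range(m + 1):
                if c[h] == 0:  # only the c_h in c(p)^+ contribute
                    continue
                inner_sum = sum(d[i] ** r * B[s] * d[j] ** t for (r, s, t) in triples(c[h]))
                rhs += d[i] ** (p - 2 ** c[h] - tail[h]) * inner_sum * d[j] ** tail[h]
            if not np.isclose(R[i, j], rhs):
                return False
    return True


U = np.array([[1.1, 0.3, -0.2, 0.1],
              [0.0, 0.9, 0.4, -0.3],
              [0.0, 0.0, 1.05, 0.2],
              [0.0, 0.0, 0.0, 0.95]])
print(binary_exponents(23))   # [4, 2, 1, 0]
print(sorted(triples(3)))     # the seven triples of Fig. 1
print(all(check_theorem1(U, p) for p in (2, 3, 5, 16, 23, 37)))
```

The same helpers also confirm the counts used below: $|A_k| = 2^k - 1$, and the second and third sums in (18) have $p-1$ terms altogether.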
Note that the first sum involved in $R_{ij}$ according to Theorem 1 has $p$ summands, the second one has $m$ summands, and the third one has $\sum_{c_h\in c(p)^+}(2^{c_h}-1)$ terms. In particular,
$$m + \sum_{c_h\in c(p)^+}(2^{c_h}-1) = p-1$$
in both cases $c_m = 0$ and $c_m > 0$.

3 The algorithm

We summarize the algorithm for computing the principal $p$th root of a real matrix having no nonpositive real eigenvalues.

Algorithm 1 (Binary powering Schur algorithm for the principal $p$th root of a real matrix $A$)
1. compute a real Schur decomposition $A = QRQ^T$, where $R$ is block $\sigma\times\sigma$
2. compute $b_0,\dots,b_{\lfloor\log_2 p\rfloor}$ and $c_0,\dots,c_m$ in the binary decomposition of $p$ as in (7) and (9)
3. for $j = 1:\sigma$
4.   compute $U_{jj} = R_{jj}^{1/p}$ (using (6) if the size of $U_{jj}$ is 2)
5.   for $q = 0:p-1$ compute $D^{(q)}_j = U_{jj}^q$, end
6.   for $k = 0:c_0$ set $V^{(k)}_{jj} = U_{jj}^{2^k}$, end
7.   $W^{(0)}_{jj} = V^{(c_0)}_{jj}$
8.   for $h = 1:m$ set $W^{(h)}_{jj} = W^{(h-1)}_{jj}V^{(c_h)}_{jj}$, end
9.   for $i = j-1:-1:1$
10.    for $k = 1:c_0$
11.      $B_k = \sum_{l=i+1}^{j-1}V^{(k-1)}_{il}V^{(k-1)}_{lj}$
12.    end
13.    for $h = 1:m$
14.      $C_h = \sum_{l=i+1}^{j-1}W^{(h-1)}_{il}V^{(c_h)}_{lj}$
15.    end
16.    solve $\sum_{q=0}^{p-1}D^{(q)}_iU_{ij}D^{(p-1-q)}_j = R_{ij} - \sum_{h=1}^{m}C_hD_j^{(2^{c_{h+1}}+\cdots+2^{c_m})} - \sum_{c_h\in c(p)^+}D_i^{(p-2^{c_h}-\cdots-2^{c_m})}\Big[\sum_{(r;s;t)\in A_{c_h}}D^{(r)}_iB_sD^{(t)}_j\Big]D_j^{(2^{c_{h+1}}+\cdots+2^{c_m})}$ with respect to $U_{ij}$
17.    $V^{(0)}_{ij} = U_{ij}$
18.    for $k = 1:c_0$
19.      $V^{(k)}_{ij} = B_k + V^{(k-1)}_{ii}V^{(k-1)}_{ij} + V^{(k-1)}_{ij}V^{(k-1)}_{jj}$
20.    end
21.    $W^{(0)}_{ij} = V^{(c_0)}_{ij}$
22.    for $h = 1:m$
23.      $W^{(h)}_{ij} = C_h + W^{(h-1)}_{ii}V^{(c_h)}_{ij} + W^{(h-1)}_{ij}V^{(c_h)}_{jj}$
24.    end
25.  end
26. end
27. compute $A^{1/p} = QUQ^T$
In Steps 11, 14 and 16 we assume that an empty sum is the zero matrix, while in Step 16 we assume that, given a matrix $M$, $M^{2^{c_{h+1}}+\cdots+2^{c_m}}$ is the identity matrix for $h = m$.

Let us analyze the computational cost of Algorithm 1. We can assume that $\sigma = O(n)$, $c_0 = O(\log_2 p)$ and $m = O(\log_2 p)$. Step 5 requires the computation of $p$ powers of $\sigma$ blocks of size at most 2, so the cost is $O(np)$ ops. Steps 6–8 are obtained at no extra cost. Steps 10–12 require $c_0$ sums from $i+1$ to $j-1$ for each $i < j-1$, and the resulting cost is $O(n^3\log_2 p)$ ops; the same cost is required by Steps 13–15. Forming the coefficients and solving the equation at Step 16 requires $O(n^2p)$ ops, since for each pair $(i,j)$ the coefficient and the right hand side involve $O(p)$ terms ($p$ and $p-1$ terms, respectively, by the remark following Theorem 1). Finally, the cost of Steps 18–20 and 22–24 is $O(n^2\log_2 p)$. In summary, the cost of the algorithm is $O(n^2p + n^3\log_2 p)$ ops, which compares favorably, asymptotically, with the Smith method, whose cost is $O(n^3p)$ ops. Algorithm 1 requires fewer operations also for small $p$ or $n$, and this leads to a faster computation, as we will show in Section 5. The cost could be further lowered as suggested in Section 4.

Consider now the cost in memory. The main expenses are due to the storage of $V^{(k)}$ and $W^{(h)}$, which are $O(\log_2 p)$ matrices of size $n\times n$, for a total of $O(n^2\log_2 p)$ real numbers, and to the storage of the block diagonal of $U^q$, namely, the blocks $D^{(q)}_j$, where $j = 1,\dots,\sigma$ and $q = 0,\dots,p-1$, for a total of $O(np)$ real numbers. In summary, the algorithm requires the storage of $O(np + n^2\log_2 p)$ real numbers.

Algorithm 1 can be slightly modified to work with the complex Schur form as well; in that case one gets the principal $p$th root of a complex matrix. More generally, the algorithm can be used to compute any primary $p$th root of a nonsingular matrix $A$, by choosing for each eigenvalue the desired $p$th root at Step 4, with the restriction that the same branch of the $p$th root function must be chosen for repeated eigenvalues. If two different branches of the $p$th root are chosen for the same eigenvalue appearing in two different blocks, then the linear matrix equation at Step 16 does not have a unique solution, and Algorithm 1 fails. However, in that case the resulting $p$th root would be nonprimary.
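The following NumPy sketch follows Steps 2–26 of Algorithm 1 in the complex Schur variant just mentioned. It assumes that $R$ is upper triangular, so all diagonal blocks are $1\times1$ and the equation at Step 16 becomes a scalar division. The $2\times2$ blocks of the real Schur form, formula (6), and Steps 1 and 27 are not handled. For a full matrix one could first obtain $R$ and a unitary $Q$ from a complex Schur decomposition (for instance SciPy's `scipy.linalg.schur(A, output='complex')`) and then form $QUQ^*$. All function names are ours, and this is an illustrative sketch under those assumptions, not the authors' implementation [20].

```python
import numpy as np


def binary_exponents(p):
    """c_0 > c_1 > ... > c_m, the positions of the 1 bits of p (Step 2)."""
    return [k for k in range(p.bit_length() - 1, -1, -1) if (p >> k) & 1]


def triples(k):
    """The set A_k of (15), returned as a sorted list of (r, s, t)."""
    A = {(0, 1, 0)} if k > 0 else set()
    for kk in range(2, k + 1):
        shift = 2 ** (kk - 1)
        A = ({(r + shift, s, t) for (r, s, t) in A}
             | {(r, s, t + shift) for (r, s, t) in A} | {(0, kk, 0)})
    return sorted(A)


def binary_powering_root(R, p):
    """Principal p-th root of an upper triangular R (1x1 diagonal blocks only)."""
    R = np.asarray(R, dtype=complex)
    n = R.shape[0]
    c = binary_exponents(p)
    m = len(c) - 1
    A = {x: triples(x) for x in c if x > 0}
    tail = [sum(2 ** x for x in c[h + 1:]) for h in range(m + 1)]  # 2^{c_{h+1}}+...+2^{c_m}
    lam = np.diag(R) ** (1.0 / p)                           # Step 4 (principal branch)
    D = lam[np.newaxis, :] ** np.arange(p)[:, np.newaxis]   # Step 5: D[q, j] = lam_j^q
    V = np.zeros((c[0] + 1, n, n), dtype=complex)
    W = np.zeros((m + 1, n, n), dtype=complex)
    idx = np.diag_indices(n)
    for k in range(c[0] + 1):                               # Step 6
        V[k][idx] = lam ** (2 ** k)
    W[0][idx] = V[c[0]][idx]                                # Step 7
    for h in range(1, m + 1):                               # Step 8
        W[h][idx] = W[h - 1][idx] * V[c[h]][idx]
    for j in range(n):
        for i in range(j - 1, -1, -1):                      # Step 9
            sl = slice(i + 1, j)
            B = [None] + [V[k - 1][i, sl] @ V[k - 1][sl, j] for k in range(1, c[0] + 1)]
            C = [None] + [W[h - 1][i, sl] @ V[c[h]][sl, j] for h in range(1, m + 1)]
            rhs = R[i, j] - sum(C[h] * D[tail[h], j] for h in range(1, m + 1))
            for h in range(m + 1):                          # terms with c_h in c(p)^+
                if c[h] > 0:
                    inner = sum(D[r, i] * B[s] * D[t, j] for (r, s, t) in A[c[h]])
                    rhs -= D[p - 2 ** c[h] - tail[h], i] * inner * D[tail[h], j]
            coeff = D[:, i] @ D[::-1, j]                    # sum_q lam_i^q lam_j^(p-1-q)
            V[0][i, j] = rhs / coeff                        # Steps 16-17
            for k in range(1, c[0] + 1):                    # Steps 18-20
                V[k][i, j] = (B[k] + V[k - 1][i, i] * V[k - 1][i, j]
                              + V[k - 1][i, j] * V[k - 1][j, j])
            W[0][i, j] = V[c[0]][i, j]                      # Step 21
            for h in range(1, m + 1):                       # Steps 22-24
                W[h][i, j] = (C[h] + W[h - 1][i, i] * V[c[h]][i, j]
                              + W[h - 1][i, j] * V[c[h]][j, j])
    return V[0].copy()


R = np.array([[2.0, 1.0, -1.0, 0.5],
              [0.0, 3.0, 2.0, 1.0],
              [0.0, 0.0, 1.5, -2.0],
              [0.0, 0.0, 0.0, 5.0]])
for p in (2, 7, 23):
    U = binary_powering_root(R, p)
    print(p, np.allclose(np.linalg.matrix_power(U, p), R))
```

Only $c_0 + m + 1$ full matrices are stored, together with the $p\times n$ array `D` of diagonal powers. This mirrors the $O(np + n^2\log_2 p)$ storage estimate above.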
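4 Possible further improvements

Algorithm 1 has a computational cost of $O(n^2p + n^3\log_2 p)$ ops and needs the storage of $O(np + n^2\log_2 p)$ real numbers. The linear dependence on $p$ is bothersome, since the algorithms for the matrix $p$th root based on matrix iterations depend only on the logarithm of $p$. It is possible to reduce the computational cost of Algorithm 1 further. The storage of $O(np)$ real numbers is due to the need for all the powers of $U_{jj}$, say $U_{jj}^q$, for $j = 1,\dots,\sigma$ and $q = 1,\dots,p$. The computational cost of $O(n^2p)$ ops is due to the solution of the matrix equations
$$\sum_{q=0}^{p-1}U_{ii}^qU_{ij}U_{jj}^{p-1-q} = R_{ij} - \sum_{h=1}^{m}C^{(h)}_{ij}U_{jj}^{2^{c_{h+1}}+\cdots+2^{c_m}} - \sum_{h:\,c_h\in c(p)^+}U_{ii}^{p-2^{c_h}-\cdots-2^{c_m}}\Big(\sum_{(r;s;t)\in A_{c_h}}U_{ii}^rB^{(s)}_{ij}U_{jj}^t\Big)U_{jj}^{2^{c_{h+1}}+\cdots+2^{c_m}} \qquad (20)$$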
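with respect to $U_{ij}$, for each $i$ and $j$. We will explain how to reduce these costs for fixed $i$ and $j$. First, we compute and store $\lambda^k$ for any eigenvalue $\lambda$ of $U$, and for
$$k = 2^1, 2^2, \dots, 2^{c_0}; \qquad k = p-2^{c_h}-\cdots-2^{c_m},\ h = 1,\dots,m; \qquad k = 2^{c_{h+1}}+\cdots+2^{c_m},\ h = 1,\dots,m-1,$$
for a total of $O(\log_2 p)$ values of $k$. Then, observe that if $R_{jj}$ is a scalar $\mu$, then $U_{jj} = \mu^{1/p} =: \lambda_j$, and thus $U_{jj}^k = \lambda_j^k$. If $R_{jj}$ is a $2\times2$ real matrix corresponding to the couple of complex eigenvalues $\theta\pm i\mu$, then its principal $p$th root $U_{jj}$ is obtained by formula (6) from the principal $p$th root $\alpha + i\beta$ of the scalar $\theta + i\mu$. In a similar manner, if $\alpha^{(k)} + i\beta^{(k)} := (\alpha + i\beta)^k$, then it is easy to see that
$$U_{jj}^k = \alpha^{(k)}I + \frac{\beta^{(k)}}{\mu}(R_{jj} - \theta I). \qquad (21)$$
Now we can proceed to remove the linear term in $p$ from the asymptotic costs. First, we explain how to construct the coefficient matrix
$$M := \sum_{q=0}^{p-1}\big(U_{jj}^{p-1-q}\big)^T\otimes U_{ii}^q$$
with $O(\log_2 p)$ ops. For $l \in \{i, j\}$, let $\theta_l\pm i\mu_l$ be the eigenvalues of $R_{ll}$, let $\lambda_l = \alpha_l + i\beta_l$ be the corresponding eigenvalue of $U_{ll}$, and let $\lambda_l^q = \alpha_l^{(q)} + i\beta_l^{(q)}$, for $q = 1,\dots,p-1$, be the corresponding eigenvalue of $U_{ll}^q$. Then, using (21), the coefficient matrix becomes
$$\begin{aligned}
M &= \Big(\sum_{q=0}^{p-1}\alpha_j^{(p-1-q)}\alpha_i^{(q)}\Big)I\otimes I + \Big(\sum_{q=0}^{p-1}\alpha_j^{(p-1-q)}\beta_i^{(q)}\Big)I\otimes\frac{R_{ii}-\theta_iI}{\mu_i}\\
&\quad + \Big(\sum_{q=0}^{p-1}\beta_j^{(p-1-q)}\alpha_i^{(q)}\Big)\frac{(R_{jj}-\theta_jI)^T}{\mu_j}\otimes I + \Big(\sum_{q=0}^{p-1}\beta_j^{(p-1-q)}\beta_i^{(q)}\Big)\frac{(R_{jj}-\theta_jI)^T}{\mu_j}\otimes\frac{R_{ii}-\theta_iI}{\mu_i},
\end{aligned}$$
where, for $\lambda_i \ne \lambda_j$ and $\lambda_i \ne \bar\lambda_j$,
$$\begin{aligned}
\sum_{q=0}^{p-1}\alpha_j^{(p-1-q)}\alpha_i^{(q)} &= \frac12\Big(\operatorname{Re}\frac{\lambda_i^p-\lambda_j^p}{\lambda_i-\lambda_j} + \operatorname{Re}\frac{\lambda_i^p-\bar\lambda_j^p}{\lambda_i-\bar\lambda_j}\Big),\\
\sum_{q=0}^{p-1}\beta_j^{(p-1-q)}\alpha_i^{(q)} &= \frac12\Big(\operatorname{Im}\frac{\lambda_i^p-\lambda_j^p}{\lambda_i-\lambda_j} - \operatorname{Im}\frac{\lambda_i^p-\bar\lambda_j^p}{\lambda_i-\bar\lambda_j}\Big),\\
\sum_{q=0}^{p-1}\alpha_j^{(p-1-q)}\beta_i^{(q)} &= \frac12\Big(\operatorname{Im}\frac{\lambda_i^p-\lambda_j^p}{\lambda_i-\lambda_j} + \operatorname{Im}\frac{\lambda_i^p-\bar\lambda_j^p}{\lambda_i-\bar\lambda_j}\Big),\\
\sum_{q=0}^{p-1}\beta_j^{(p-1-q)}\beta_i^{(q)} &= \frac12\Big(\operatorname{Re}\frac{\lambda_i^p-\bar\lambda_j^p}{\lambda_i-\bar\lambda_j} - \operatorname{Re}\frac{\lambda_i^p-\lambda_j^p}{\lambda_i-\lambda_j}\Big),
\end{aligned}$$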
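while, if $\lambda_i = \lambda_j$ (respectively $\lambda_i = \bar\lambda_j$), the corresponding quotient is read as its limit, namely $\sum_{q=0}^{p-1}\lambda_i^q\lambda_j^{p-1-q} = p\lambda_i^{p-1}$. Thus, for computing $M$ one needs just $\lambda_i^p$ and $\lambda_j^p$, which are eigenvalues of $A$ and have already been computed, and then a fixed number of arithmetic operations.

The second summand on the right hand side of (20) is a sum of $m = O(\log_2 p)$ terms. It can be computed with $O(\log_2 p)$ ops, since $\lambda_j^{2^{c_{h+1}}+\cdots+2^{c_m}}$ and, in view of formula (21), $U_{jj}^{2^{c_{h+1}}+\cdots+2^{c_m}}$ are known for each $h = 1,\dots,m$. Finally, we discuss how to compute, in $O(\log_2^2 p)$ ops, the last summand on the right hand side of equation (20), that is,
$$\sum_{h:\,c_h\in c(p)^+}U_{ii}^{p-2^{c_h}-\cdots-2^{c_m}}\Big(\sum_{(r;s;t)\in A_{c_h}}U_{ii}^rB^{(s)}_{ij}U_{jj}^t\Big)U_{jj}^{2^{c_{h+1}}+\cdots+2^{c_m}}. \qquad (22)$$
The cardinality of $c(p)^+$ is $O(\log_2 p)$, so, in order to obtain a total cost of $O(\log_2^2 p)$ ops, we need to compute the sum $\sum_{(r;s;t)\in A_{c_h}}U_{ii}^rB^{(s)}_{ij}U_{jj}^t$ in $O(\log_2 p)$ ops, and the premultiplication by $U_{ii}^{p-2^{c_h}-\cdots-2^{c_m}}$ and the postmultiplication by $U_{jj}^{2^{c_{h+1}}+\cdots+2^{c_m}}$ in $O(1)$ ops. The latter two tasks follow from the fact that we already know $\lambda_i^{p-2^{c_h}-\cdots-2^{c_m}}$ and $\lambda_j^{2^{c_{h+1}}+\cdots+2^{c_m}}$, and from the use of (21). To conclude, we rewrite $\operatorname{vec}\big(\sum_{(r;s;t)\in A_{c_h}}U_{ii}^rB^{(s)}_{ij}U_{jj}^t\big)$ in the equivalent form
$$\sum_{s=1}^{c_h}\Big(\sum_{r,t:\,(r;s;t)\in A_{c_h}}\big(U_{jj}^t\big)^T\otimes U_{ii}^r\Big)\operatorname{vec}\big(B^{(s)}_{ij}\big), \qquad (23)$$
where the matrix $\sum_{r,t}(U_{jj}^t)^T\otimes U_{ii}^r$ is computed by a trick similar to the one used for $M$, using only the known values of $\lambda_i^k$ and $\lambda_j^k$. For instance, for $A_3$ (compare Figure 1) one must compute the matrices
$$\big(U_{jj}^6\big)^T\otimes I + \big(U_{jj}^4\big)^T\otimes U_{ii}^2 + \big(U_{jj}^2\big)^T\otimes U_{ii}^4 + I\otimes U_{ii}^6, \qquad \big(U_{jj}^4\big)^T\otimes I + I\otimes U_{ii}^4.$$
If the blocks are $1\times1$, i.e., $U_{ii} = \lambda_i$ and $U_{jj} = \lambda_j$ with $\lambda_i \ne \lambda_j$, then the computation reduces to
$$\sum_{q=0}^{3}\lambda_i^{2(3-q)}\lambda_j^{2q} = \frac{\lambda_i^8-\lambda_j^8}{\lambda_i^2-\lambda_j^2} \qquad\text{and}\qquad \lambda_i^4 + \lambda_j^4.$$
A drawback of this approach is that it is based on the simplification
$$\sum_{q=0}^{p-1}\lambda_i^q\lambda_j^{p-1-q} = \frac{\lambda_i^p-\lambda_j^p}{\lambda_i-\lambda_j};$$
computing the right hand side requires a lower computational cost, but it is less numerically stable. An open problem is whether these ideas can be rearranged in such a way that the resulting algorithm is stable.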
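The four closed forms for the sums in $M$, and the loss of accuracy caused by the quotient, can both be checked with plain Python. In the sketch below, `sums_direct` evaluates the sums term by term, while `sums_closed_form` uses only the two quotients $T_1 = (\lambda_i^p-\lambda_j^p)/(\lambda_i-\lambda_j)$ and $T_2 = (\lambda_i^p-\bar\lambda_j^p)/(\lambda_i-\bar\lambda_j)$, replaced by their limit when the denominator vanishes. The last part compares the quotient with the direct sum for two close real eigenvalues against an exact rational reference. The function names and test values are our own illustrative choices.

```python
import cmath
from fractions import Fraction


def quotient(a, b, p):
    """(a^p - b^p) / (a - b), replaced by its limit p * a^(p-1) when a == b."""
    return p * a ** (p - 1) if a == b else (a ** p - b ** p) / (a - b)


def sums_direct(li, lj, p):
    """The four sums appearing in M, evaluated term by term (O(p) operations)."""
    aa = ba = ab = bb = 0.0
    for q in range(p):
        x_i, x_j = li ** q, lj ** (p - 1 - q)  # lambda_i^q, lambda_j^(p-1-q)
        aa += x_j.real * x_i.real  # alpha_j^(p-1-q) * alpha_i^(q)
        ba += x_j.imag * x_i.real  # beta_j^(p-1-q)  * alpha_i^(q)
        ab += x_j.real * x_i.imag  # alpha_j^(p-1-q) * beta_i^(q)
        bb += x_j.imag * x_i.imag  # beta_j^(p-1-q)  * beta_i^(q)
    return aa, ba, ab, bb


def sums_closed_form(li, lj, p):
    """The same four sums from the two quotients T1 and T2."""
    T1 = quotient(li, lj, p)
    T2 = quotient(li, lj.conjugate(), p)
    return (0.5 * (T1.real + T2.real), 0.5 * (T1.imag - T2.imag),
            0.5 * (T1.imag + T2.imag), 0.5 * (T2.real - T1.real))


li, lj = 1.05 * cmath.exp(0.3j), 0.97 * cmath.exp(-0.2j)
print(sums_direct(li, lj, 13))
print(sums_closed_form(li, lj, 13))


def sum_direct(a, b, p):
    """sum_{q=0}^{p-1} a^q b^(p-1-q), evaluated term by term."""
    return sum(a ** q * b ** (p - 1 - q) for q in range(p))


# Two nearby real eigenvalues: the quotient suffers from cancellation.
a, b, p = 1.0, 1.0 + 1e-12, 7
exact = float(sum_direct(Fraction(a), Fraction(b), p))  # exact rational evaluation
err_direct = abs(sum_direct(a, b, p) - exact)
err_quotient = abs(quotient(a, b, p) - exact)
print(err_direct, err_quotient)  # the quotient is markedly less accurate here
```

The closed forms cost a fixed number of complex operations regardless of $p$. The last comparison shows the cancellation in $\lambda_i^p - \lambda_j^p$ behind the stability concern stated above.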
5 Numerical experiments

The analysis of the cost of Algorithm 1 of Section 3, both in terms of arithmetic operations and storage, shows that it is asymptotically less expensive than the method proposed by Smith. We show by some numerical tests that in practice Algorithm 1 is faster than the one of Smith also for moderate values of $p$; moreover, the two algorithms reach the same numerical accuracy. For small $p$, such as 2 or 3, the new algorithm does not give any advantage with respect to the one of Smith; on the contrary, the latter is in most cases a bit faster in terms of CPU time.

The tests are performed in Matlab 6, with unit roundoff $2^{-53}\approx 1.1\times10^{-16}$, where for the Smith method the implementation rootpm_real of Higham's Matrix Function Toolbox [5] is used, and for the new algorithm the implementation can be found at [20]. We compare the performance of the two algorithms on some test matrices. In particular, the CPU time required for the execution of the two algorithms is measured, and the accuracy is estimated in terms of the quantity
$$\rho_A(\tilde X) := \frac{\|A - \tilde X^p\|}{\|\tilde X\|\,\Big\|\sum_{j=0}^{p-1}\big(\tilde X^{p-1-j}\big)^T\otimes\tilde X^j\Big\|},$$
where $\tilde X$ is the computed $p$th root of $A$ and $\|\cdot\|$ is any matrix norm (in our tests we used the Frobenius norm, denoted by $\|\cdot\|_F$). In [8], the quantity $\rho_A$ is shown to be a more realistic measure of accuracy than the norm of the relative residual, say $\|\tilde X^p - A\|/\|A\|$. To better describe the numerical properties of the methods we also compute the quantity $\beta(U) = \|U\|_2^p/\|R\|_2$, where $U$ is the computed root of the (quasi) triangular matrix $R$ from the Schur decomposition of $A$; this quantity has been introduced in [19] as a measure of stability. The results are summarized in Table 1, where $n$ is the size of the matrices and time is the CPU time (in seconds) measured by Matlab. If not otherwise stated, we always compute the principal $p$th root.

Test 1 We consider the quasi-upper triangular matrix
$$A = \begin{bmatrix} 1 & 1 & 1 & 1\\ 0 & 2 & 1 & 1\\ 0 & 0 & 1 & 1\\ 0 & 0 & 1 & 1\end{bmatrix}$$
and compute its principal $p$th root for some values of $p$. Since the difference between Smith's algorithm and Algorithm 1 lies in the recursion used to compute the $p$th root of a (quasi) triangular matrix, this test is suitable for comparing the accuracy and the CPU time of the two algorithms.

Test 2 We consider an $8\times8$ random stochastic matrix having no nonpositive real eigenvalues, which may be assumed to be the transition matrix relative to a period of one year in a Markov model [8,9].
If one needs the transition matrix for one day, then a 365th root of $A$ is required. Observe that $365 = 73\cdot5$, so it is enough to compute the 73rd root followed by the 5th root. The average speedup of computing the 73rd root of $A$ with Algorithm 1, with respect to the algorithm of Smith, is 9, while the residual $\rho_A$ is essentially the same. For larger $p$ the speedup increases further; for instance, if one computes the 521st root of $A$, the speedup is 60. The value of $\beta(U)$ is moderate and is the same for both algorithms.
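The two diagnostics used in these tests are easy to evaluate for small matrices. The NumPy sketch below computes $\rho_A(\tilde X)$ in the Frobenius norm, forming the Kronecker sum explicitly, which is only sensible for small $n$. It also computes $\beta(U) = \|U\|_2^p/\|R\|_2$. The function names and the $2\times2$ data are our own illustrative choices, not the test matrices of this section.

```python
import numpy as np


def rho_A(A, X, p):
    """Accuracy measure rho_A(X) of Section 5, in the Frobenius norm."""
    powers = [np.linalg.matrix_power(X, k) for k in range(p)]  # X^0, ..., X^(p-1)
    K = sum(np.kron(powers[p - 1 - j].T, powers[j]) for j in range(p))
    residual = np.linalg.norm(A - np.linalg.matrix_power(X, p), "fro")
    return residual / (np.linalg.norm(X, "fro") * np.linalg.norm(K, "fro"))


def beta(U, R, p):
    """Stability indicator beta(U) = ||U||_2^p / ||R||_2 of Section 5."""
    return np.linalg.norm(U, 2) ** p / np.linalg.norm(R, 2)


X = np.array([[2.0, 1.0], [0.0, 3.0]])
p = 3
A = np.linalg.matrix_power(X, p)
print(rho_A(A, X, p))          # exact root: the residual vanishes
print(rho_A(A, X + 1e-8, p))   # a perturbed root gives a small positive value
print(beta(X, A, p) >= 1)      # ||U^p|| <= ||U||^p, so beta(U) >= 1
```

Since $\|R\|_2 = \|U^p\|_2 \le \|U\|_2^p$, the quantity $\beta(U)$ is never smaller than 1. Large values, as in Tests 3 and 4, signal that the root is much larger in norm than $R$.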
17 Tes 3 ([19]) We consder he marx and comue s non rncal 8h roo 1.0000 1.0000 1.0000 1.000 A = 0 1.3000 1.0000 1.0000 0 0 1.7000 1.000, 0 0 0 2.0000 X = 1.0000 6.7778 17.091 36.469 0 1.0333 5.2548 17.707 0 0 1.0686 7.1970, 0 0 0 1.0905 for whch β s large. Also n ha case he wo algorhms gve he same numercal resuls. Tes 4 We consder he 10 10 Frank marx, from Malab gallery funcon, a marx wh ll-condoned egenvalues and for whch he value of β and he condon number of he marx roos are raher large. Tes n Smh Algorhm 1 β(u) ρ A ( X) me β(u) ρ A ( X) me 1 4 11 1.06 2.78 10 17 < 0.02 1.06 1.98 10 17 < 0.02 4 101 1.06 5.21 10 17 0.58 1.06 5.21 10 17 0.05 4 1001 1.06 4.84 10 17 43 1.06 4.84 10 17 0.34 2 8 73 60.6 5.34 10 16 0.91 60.6 5.36 10 16 0.094 8 521 61.0 6.02 10 16 40 61.0 5.98 10 16 0.70 3 4 8 6.56 10 12 6.56 10 19 < 0.02 6.56 10 12 8.34 10 19 < 0.02 4 10 11 6.18 10 32 4.16 10 20 0.30 6.18 10 32 4.67 10 20 0.062 Table 1 Comarson beween Smh s algorhm and Algorhm 1 of Secon 3 for some es marces. Acknowledgmens We would lke o hank Prof. N. J. Hgham and he anonymous referees for her helful commens whch mroved he resenaon. References 1. Å. Börck and S. Hammarlng. A Schur mehod for he square roo of a marx. Lnear Algebra Al., 52/53:127 140, 1983. 2. E. D. Denman and A. N. Beavers, Jr. The marx sgn funcon and comuaons n sysems. Al. Mah. Comu., 2(1):63 94, 1976. 3. C.-H. Guo. On Newon s mehod and Halley s mehod for he rncal h roo of a marx. Lnear Algebra Al. o aear.
4. C.-H. Guo and N. J. Higham. A Schur–Newton method for the matrix pth root and its inverse. SIAM J. Matrix Anal. Appl., 28(3):788–804, 2006.
5. N. J. Higham. The Matrix Function Toolbox. http://www.ma.man.ac.uk/~higham/mctoolbox (Retrieved on November 3, 2009).
6. N. J. Higham. Newton's method for the matrix square root. Math. Comp., 46(174):537–549, 1986.
7. N. J. Higham. Computing real square roots of a real matrix. Linear Algebra Appl., 88/89:405–430, 1987.
8. N. J. Higham. Functions of Matrices: Theory and Computation. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2008.
9. N. J. Higham and L. Lin. On pth roots of stochastic matrices. MIMS EPrint 2009.21, Manchester Institute for Mathematical Sciences, The University of Manchester, UK, Mar. 2009.
10. R. A. Horn and C. R. Johnson. Topics in Matrix Analysis. Cambridge University Press, Cambridge, 1994. Corrected reprint of the 1991 original.
11. B. Iannazzo. A note on computing the matrix square root. Calcolo, 40(4):273–283, 2003.
12. B. Iannazzo. On the Newton method for the matrix pth root. SIAM J. Matrix Anal. Appl., 28(2):503–523, 2006.
13. B. Iannazzo. A family of rational iterations and its application to the computation of the matrix pth root. SIAM J. Matrix Anal. Appl., 30(4):1445–1462, 2008.
14. P. Laasonen. On the iterative solution of the matrix equation AX^2 − I = 0. Math. Tables Aids Comput., 12:109–116, 1958.
15. B. Laszkiewicz and K. Ziętak. Algorithms for the matrix sector function. Electron. Trans. Numer. Anal. To appear.
16. B. Laszkiewicz and K. Ziętak. A Padé family of iterations for the matrix sector function and the matrix pth root. Numer. Linear Alg. Appl. DOI: 10.1002/nla.656.
17. B. Meini. The matrix square root from a new functional perspective: theoretical results and computational issues. SIAM J. Matrix Anal. Appl., 26(2):362–376, 2004/05.
18. M. I. Smith. A Schur algorithm for computing matrix pth roots. SIAM J. Matrix Anal. Appl., 24(4):971–989, 2003.
19. M. I. Smith. Numerical Computation of Matrix Functions. PhD thesis, University of Manchester, Manchester, England, September 2002.
20. http:\\bezout.dm.unipi.it\software (Retrieved on November 3, 2009).