Matrix Multiplication I

Yuval Filmus
February 2, 2012

These notes are based on a lecture given at the Toronto Student Seminar on February 2, 2012. The material is taken mostly from the book Algebraic Complexity Theory [ACT] and the lecture notes by Bläser and Bendun [Blä]. Starred sections are the ones I didn't have time to cover.

1 Problem statement

This lecture discusses the problem of multiplying two square matrices. We will be working in the algebraic complexity model. For us, an algorithm for multiplying two $n \times n$ matrices will mean a sequence of steps, where step $l$ is a statement of one of the following forms:

- $t_l \gets r$ for any $r \in \mathbb{R}$
- $t_l \gets a_{ij}$ or $t_l \gets b_{ij}$ for $i, j \in \{1, \ldots, n\}$
- $t_l \gets t_p \circ t_q$, where $p, q < l$ and $\circ \in \{+, -, \cdot, /\}$
- $c_{ij} \gets t_p$ for $p < l$

We will say that such an algorithm computes the product $C = AB$ if at the end of the program, $c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$. The running time or complexity of the algorithm is the total number of steps, disregarding input and output steps. Our model is non-uniform.

As an example, consider Strassen's algorithm for multiplying two $2 \times 2$ matrices, as copied from Wikipedia:

\[
\begin{aligned}
m_1 &= (a_{11} + a_{22})(b_{11} + b_{22}) \\
m_2 &= (a_{21} + a_{22})\, b_{11} \\
m_3 &= a_{11}\,(b_{12} - b_{22}) \\
m_4 &= a_{22}\,(b_{21} - b_{11}) \\
m_5 &= (a_{11} + a_{12})\, b_{22} \\
m_6 &= (a_{21} - a_{11})(b_{11} + b_{12}) \\
m_7 &= (a_{12} - a_{22})(b_{21} + b_{22}) \\
c_{11} &= m_1 + m_4 - m_5 + m_7 \\
c_{12} &= m_3 + m_5 \\
c_{21} &= m_2 + m_4 \\
c_{22} &= m_1 - m_2 + m_3 + m_6
\end{aligned}
\]

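In our model (using some benign shortcuts), it will look like this:

\[
\begin{aligned}
t_1 &\gets a_{11} + a_{22} & t_2 &\gets b_{11} + b_{22} & t_3 &\gets a_{21} + a_{22} \\
t_4 &\gets b_{12} - b_{22} & t_5 &\gets b_{21} - b_{11} & t_6 &\gets a_{11} + a_{12} \\
t_7 &\gets a_{21} - a_{11} & t_8 &\gets b_{11} + b_{12} & t_9 &\gets a_{12} - a_{22} \\
t_{10} &\gets b_{21} + b_{22} & t_{11} &\gets t_1 \cdot t_2 & t_{12} &\gets t_3 \cdot b_{11} \\
t_{13} &\gets a_{11} \cdot t_4 & t_{14} &\gets a_{22} \cdot t_5 & t_{15} &\gets t_6 \cdot b_{22} \\
t_{16} &\gets t_7 \cdot t_8 & t_{17} &\gets t_9 \cdot t_{10} & t_{18} &\gets t_{11} + t_{14} \\
t_{19} &\gets t_{18} - t_{15} & c_{11} &\gets t_{19} + t_{17} & c_{12} &\gets t_{13} + t_{15} \\
c_{21} &\gets t_{12} + t_{14} & t_{20} &\gets t_{11} - t_{12} & t_{21} &\gets t_{20} + t_{13} \\
c_{22} &\gets t_{21} + t_{16}
\end{aligned}
\]

The total complexity of this algorithm is 25 operations (7 multiplications and 18 additions and subtractions). Later we will write such algorithms in a much better way.

Strassen's algorithm can be used to multiply two $2^n \times 2^n$ matrices. The key is to write the matrices in block form:

\[
\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
\begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}
=
\begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix}
\]

Each of the blocks is a $2^{n-1} \times 2^{n-1}$ matrix. Applying Strassen's algorithm to the big matrices, there are some additions and subtractions that we already know how to do, and 7 multiplications of two $2^{n-1} \times 2^{n-1}$ matrices. For the latter, we apply the algorithm recursively. Eventually, everything will reduce to $7^n$ scalar multiplications, and countless¹ additions and subtractions. This construction, called tensoring, is the basic tool used in matrix multiplication algorithms.

¹ Not really countless, as we show in the sequel.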
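The recursive block construction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: matrices are plain lists of lists whose size is a power of 2, and the helper names (`add`, `sub`, `strassen`, `naive`, `demo`) are hypothetical, not taken from the lecture. The counter records scalar multiplications, so for $2^n \times 2^n$ inputs it should end at $7^n$.

```python
import random


def add(X, Y):
    """Entrywise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(r, s)] for r, s in zip(X, Y)]


def sub(X, Y):
    """Entrywise difference of two equally sized matrices."""
    return [[x - y for x, y in zip(r, s)] for r, s in zip(X, Y)]


def blocks(X):
    """Split a square matrix of even size into its four quadrants."""
    h = len(X) // 2
    return ([row[:h] for row in X[:h]], [row[h:] for row in X[:h]],
            [row[:h] for row in X[h:]], [row[h:] for row in X[h:]])


def strassen(A, B, counter):
    """Multiply square matrices whose size is a power of 2.

    counter[0] accumulates the number of scalar multiplications."""
    if len(A) == 1:
        counter[0] += 1
        return [[A[0][0] * B[0][0]]]
    a11, a12, a21, a22 = blocks(A)
    b11, b12, b21, b22 = blocks(B)
    # The seven block products m1..m7 from Strassen's formulas.
    m1 = strassen(add(a11, a22), add(b11, b22), counter)
    m2 = strassen(add(a21, a22), b11, counter)
    m3 = strassen(a11, sub(b12, b22), counter)
    m4 = strassen(a22, sub(b21, b11), counter)
    m5 = strassen(add(a11, a12), b22, counter)
    m6 = strassen(sub(a21, a11), add(b11, b12), counter)
    m7 = strassen(sub(a12, a22), add(b21, b22), counter)
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(add(sub(m1, m2), m3), m6)
    # Reassemble the four quadrants into one matrix.
    top = [r + s for r, s in zip(c11, c12)]
    bottom = [r + s for r, s in zip(c21, c22)]
    return top + bottom


def naive(A, B):
    """Schoolbook product, used only as a reference."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def demo(levels=3, seed=0):
    """Return (result matches naive product, number of scalar multiplications)."""
    rng = random.Random(seed)
    size = 2 ** levels
    A = [[rng.randint(-9, 9) for _ in range(size)] for _ in range(size)]
    B = [[rng.randint(-9, 9) for _ in range(size)] for _ in range(size)]
    counter = [0]
    C = strassen(A, B, counter)
    return C == naive(A, B), counter[0]
```

Calling `demo(3)` multiplies two random $8 \times 8$ integer matrices and reports whether the result agrees with the schoolbook product, together with the multiplication count.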

Once we show that the running time is dominated by the number of scalar multiplications, we can conclude that two $n \times n$ matrices can be multiplied in time $O(n^{\log_2 7})$. This shows that $\omega \le \log_2 7 \approx 2.81$. Here $\omega$, also known as the exponent of matrix multiplication, is defined as the infimum of all $\alpha$ such that two $n \times n$ matrices can be multiplied in time $O(n^{\alpha})$ (the constant can depend on $\alpha$).

2 Normal form

We spent much of the previous section on defining the algebraic complexity model. Now it is time to show that we can forget about it and concentrate on algorithms of a very specific form. This is the result of the following two theorems.

Theorem 1. Suppose one can multiply two $n \times n$ matrices in running time $T$. Then there is an algorithm in the following normal form, for $M = 2T$:

- For $1 \le i \le M$, compute $\alpha_i$, a linear combination of entries of $A$.
- For $1 \le i \le M$, compute $\beta_i$, a linear combination of entries of $B$.
- For $1 \le i \le M$, compute $p_i = \alpha_i \beta_i$.
- For $1 \le j, k \le n$, compute $c_{jk}$ as a linear combination of the $p_i$.

All these linear combinations are fixed (don't depend on $A$, $B$).

Our original presentation of Strassen's algorithm is in this form. You might ask what is the point of using this transformation. After all, for each original operation we now have two multiplications, and there are also lots of linear combinations to compute, each requiring up to $n^2$ multiplications and additions. This is explained by the following theorem.

Theorem 2. Suppose there is an algorithm in normal form that multiplies two $n \times n$ matrices with $M = n^{\alpha}$. Then $\omega \le \alpha$.

So asymptotically, all the extra additions and multiplications don't really count. Coppersmith and Winograd [CW] showed that in fact $\omega < \alpha$ (we will comment on this later on). Together, the two theorems show that we can forget about the algebraic complexity model and concentrate on normal-form algorithms, trying to reduce $M$.

2.1 Proofs*

The idea of the proof of Theorem 1 is that it is enough to keep track of the linear and quadratic parts of all computations involved, since higher degree parts don't matter.

Proof of Theorem 1.

1. Divisions are confusing, so let's ignore them for now. We will prove something very similar to the statement of the theorem, with $M = T$ and $\alpha_i, \beta_i$ both linear combinations of both input matrices.

If there are no divisions, then each quantity $t_l$ computed during the algorithm is a multivariate polynomial in the input variables. We can group it into its homogeneous parts. The $d$th homogeneous part $P^{(d)}$ of a polynomial $P$ consists of all those monomials with total degree exactly $d$. For example, if $P = 1 + 2a + b + a^2 + 3ab$ then

\[
P^{(0)} = 1, \qquad P^{(1)} = 2a + b, \qquad P^{(2)} = a^2 + 3ab.
\]

Every polynomial is the sum of its homogeneous parts.

The outputs $c_{ij}$ are all quadratic, i.e. homogeneous of degree 2. So we're not really interested in $t_l^{(d)}$ for $d > 2$. In order to compute $c_{ij}^{(2)}$ for all $i, j$, it is enough to compute the constant, linear and quadratic parts of everything (i.e. $t_l^{(0)}, t_l^{(1)}, t_l^{(2)}$). We will convert an arbitrary program into a new program computing these parts for all of the original variables.

- Replace a statement $t_l \gets r$ with
  $t_l^{(0)} \gets r$, $\quad t_l^{(1)} \gets 0$, $\quad t_l^{(2)} \gets 0$.
- Replace a statement $t_l \gets a_{ij}$ with
  $t_l^{(0)} \gets 0$, $\quad t_l^{(1)} \gets a_{ij}$, $\quad t_l^{(2)} \gets 0$
  (and similarly for $b_{ij}$).
- Replace a statement $t_l \gets t_p \pm t_q$ with
  $t_l^{(0)} \gets t_p^{(0)} \pm t_q^{(0)}$, $\quad t_l^{(1)} \gets t_p^{(1)} \pm t_q^{(1)}$, $\quad t_l^{(2)} \gets t_p^{(2)} \pm t_q^{(2)}$.
- Replace a statement $t_l \gets t_p \cdot t_q$ with
  $t_l^{(0)} \gets t_p^{(0)} t_q^{(0)}$,
  $t_l^{(1)} \gets t_p^{(0)} t_q^{(1)} + t_p^{(1)} t_q^{(0)}$,
  $t_l^{(2)} \gets t_p^{(0)} t_q^{(2)} + t_p^{(1)} t_q^{(1)} + t_p^{(2)} t_q^{(0)}$.

These statements are not basic, but that's not going to matter. Finally, replace an output statement $c_{ij} \gets t_p$ with $c_{ij} \gets t_p^{(2)}$.

It is easy to prove by induction that the new program also computes matrix multiplication. Another induction shows that:

- All constant parts $t_l^{(0)}$ are constant (don't depend on the inputs).
- All linear parts $t_l^{(1)}$ are linear combinations of inputs.
- All quadratic parts $t_l^{(2)}$ are linear combinations of products $t_p^{(1)} t_q^{(1)}$ appearing up to step $l$ in the original program.

From this it is easy to extract the (modified) normal form.

2. We've almost reached our normal form. The only problem is that $\alpha_i$ and $\beta_i$ may each depend on both parts of the input. We can always write $\alpha_i = \alpha_i^A + \alpha_i^B$, $\beta_i = \beta_i^A + \beta_i^B$, separating the two parts. Since

\[
\alpha_i \beta_i = \alpha_i^A \beta_i^A + \alpha_i^A \beta_i^B + \alpha_i^B \beta_i^A + \alpha_i^B \beta_i^B,
\]

we can separate the two inputs at the cost of quadrupling $M$. Nothing bad would happen if we drop $\alpha_i^A \beta_i^A$ and $\alpha_i^B \beta_i^B$, since at the end we don't need these terms, which are quadratic in the entries of one of the matrices. So it is really enough only to double $M$.

3. Now it is time to handle divisions. Every rational function can be extended to a formal power series (given that the denominator is not zero at the origin²). So it is natural to do the same thing as before, adding the following rule for statements $t_l \gets t_p / t_q$:

\[
\begin{aligned}
t_l^{(0)} &\gets t_p^{(0)} / t_q^{(0)} \\
t_l^{(1)} &\gets \bigl(t_p^{(1)} - t_l^{(0)} t_q^{(1)}\bigr) / t_q^{(0)} \\
t_l^{(2)} &\gets \bigl(t_p^{(2)} - t_l^{(0)} t_q^{(2)} - t_l^{(1)} t_q^{(1)}\bigr) / t_q^{(0)}
\end{aligned}
\]

We got these statements from reversing what we got for multiplication above. The rest of the proof goes through.

To prove Theorem 2, we follow the same technique we used to show that Strassen's algorithm applied recursively to $2^n \times 2^n$ matrices results in $7^n$ scalar multiplications, only this time we also account for the rest of the operations. We get a recurrence whose solution is $O(n^{\alpha k})$.

Proof of Theorem 2. Like we did for Strassen's algorithm, we can extend the given algorithm to an algorithm for multiplying two $n^k \times n^k$ matrices. Denote its running time by $T(k)$, so $T(1) = n^{\alpha}$. What is $T(k+1)$? The first thing we have to do is compute $M$ linear combinations in the entries of $A$, namely the $\alpha_i$. Each linear combination takes roughly $3n^{2k}$ operations, for a total of $3Mn^{2k}$. The same number of operations is needed to compute the $M$ linear combinations $\beta_i$. Next, we multiply these using $M\,T(k)$ operations. Finally, we compute $n^2$ linear combinations, one for each entry in the target matrix. So

\[
T(k+1) \le n^{\alpha}\, T(k) + (6M + n^2)\, n^{2k}.
\]

One can show that necessarily $\alpha > 2$ (in other words, you cannot multiply two $n \times n$ matrices with $M = n^2$), and so the solution to the implicit recurrence is $T(k) = O(n^{\alpha k})$. In terms of the matrix size $N = n^k$, this is $T(k) = O(N^{\alpha})$.

² That happens in our case since the algorithm should correctly compute the product of two zero matrices.
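To see numerically why the lower-order term does not change the exponent, one can iterate the recurrence directly. The sketch below assumes the recurrence in the form $T(k+1) = M\,T(k) + (6M + n^2)\,n^{2k}$ with $T(1) = M$, treated as an equality; the function names are hypothetical. If $M > n^2$, the ratio $T(k)/M^k$ should increase towards a finite limit, which is the content of $T(k) = O(n^{\alpha k})$.

```python
def running_time_bound(n, M, levels):
    """Iterate T(k+1) = M*T(k) + (6*M + n**2) * n**(2*k), starting from T(1) = M.

    Returns a list T with T[k] for 1 <= k <= levels (T[0] is unused)."""
    T = [None, M]
    for k in range(1, levels):
        T.append(M * T[k] + (6 * M + n * n) * n ** (2 * k))
    return T


def ratios(n, M, levels):
    """Ratios T(k) / M**k for k = 1..levels."""
    T = running_time_bound(n, M, levels)
    return [T[k] / M ** k for k in range(1, levels + 1)]
```

For Strassen's parameters ($n = 2$, $M = 7$) the increments form a geometric series with ratio $4/7$, so the ratios converge to $1 + 184/21 \approx 9.76$.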

3 Tensor notation

There is a neat way to write algorithms in normal form. Start by writing $\alpha_t$ as a sum of $a_{ij}$ and $\beta_t$ as a sum of $b_{ij}$. For Strassen's algorithm, for example, we'd have $\alpha_1 = a_{11} + a_{22}$, $\beta_1 = b_{11} + b_{22}$, and so on. Next, each $c_{ij}$ is a linear combination

\[
c_{ij} = \sum_{t=1}^{M} C^t_{ij}\, \alpha_t \beta_t.
\]

Define $\gamma_t = \sum_{i,j} C^t_{ij}\, c_{ij}$, where we think of $c_{ij}$ as formal variables. For example, in Strassen's algorithm we have $\gamma_1 = c_{11} + c_{22}$.

Let's look at the expression $\sum_{t=1}^{M} \alpha_t \beta_t \gamma_t$. What is the coefficient of $c_{ij}$? It is the value that $c_{ij}$ gets at the end of the algorithm, i.e. $\sum_k a_{ik} b_{kj}$. So in a formal sense,

\[
\sum_{t=1}^{M} \alpha_t \beta_t \gamma_t = \sum_{i,j,k=1}^{n} a_{ik} b_{kj} c_{ij}.
\]

Here is Strassen's algorithm in this form:

\[
\begin{aligned}
\sum_{i,j,k=1}^{2} a_{ik} b_{kj} c_{ij} ={}& (a_{11} + a_{22})(b_{11} + b_{22})(c_{11} + c_{22}) + {} \\
& (a_{21} + a_{22})\, b_{11}\, (c_{21} - c_{22}) + {} \\
& a_{11}\, (b_{12} - b_{22})(c_{12} + c_{22}) + {} \\
& a_{22}\, (b_{21} - b_{11})(c_{11} + c_{21}) + {} \\
& (a_{11} + a_{12})\, b_{22}\, (-c_{11} + c_{12}) + {} \\
& (a_{21} - a_{11})(b_{11} + b_{12})\, c_{22} + {} \\
& (a_{12} - a_{22})(b_{21} + b_{22})\, c_{11}
\end{aligned}
\]

We denote the left-hand side by $\langle 2, 2, 2 \rangle$. In general,

\[
\langle n, m, p \rangle = \sum_{i=1}^{n} \sum_{j=1}^{m} \sum_{k=1}^{p} a_{ij} b_{jk} c_{ik}
\]

corresponds to multiplying an $n \times m$ matrix by an $m \times p$ matrix, obtaining an $n \times p$ matrix as a result.

We call these new objects tensors. They are generalizations of matrices (see below). The minimal $M$ such that a tensor $T$ can be represented as a sum $\sum_{i=1}^{M} \alpha_i \beta_i \gamma_i$, where $\alpha_i, \beta_i, \gamma_i$ are linear combinations of entries of $A, B, C$ (respectively), is known as the rank of $T$, denoted $R(T)$. Strassen's algorithm shows that $R(\langle 2, 2, 2 \rangle) \le 7$ (we actually have equality in this case). In earlier literature, the rank is also called the number of essential multiplications.

How does tensor rank generalize matrix rank? Encode a matrix $A$ of dimension $n \times m$ as

\[
\sum_{i=1}^{n} \sum_{j=1}^{m} a_{ij}\, x_i y_j.
\]

The rank of this tensor is the minimal $R$ such that

\[
\sum_{i=1}^{n} \sum_{j=1}^{m} a_{ij}\, x_i y_j = \sum_{r=1}^{R} \Bigl( \sum_{i=1}^{n} b_{ri}\, x_i \Bigr) \Bigl( \sum_{j=1}^{m} c_{rj}\, y_j \Bigr).
\]

In terms of matrices, this corresponds to a representation

\[
A = \sum_{r=1}^{R} b_r c_r,
\]

where $b_r$ is a column vector and $c_r$ is a row vector (so these are all outer products). This expresses each row of $A$ as a linear combination of the row vectors $c_r$, and so $R$ is the row rank of $A$. Switching the roles, we see that $R$ is also the column rank of $A$. The symmetric form of this representation thus immediately implies that the row rank equals the column rank.

We can rephrase Theorem 2 using our new terminology:

\[
\omega \le \log_n R(\langle n, n, n \rangle). \tag{1}
\]

As we've already commented, the inequality is actually strict.

Why did we bother describing this tensor notation? The new notation shows the symmetry among the variables $a, b, c$. Starting with the tensor for $\langle n, m, p \rangle$, if we switch $a$ with $b$ then we get the tensor $\langle p, m, n \rangle$ (we also need to switch some of the indices). Going over all permutations, we find out that

\[
R(\langle n, m, p \rangle) = R(\langle n, p, m \rangle) = R(\langle m, n, p \rangle) = R(\langle m, p, n \rangle) = R(\langle p, n, m \rangle) = R(\langle p, m, n \rangle). \tag{2}
\]

Suppose we know that $R(\langle n_1, m_1, p_1 \rangle) \le R_1$ and $R(\langle n_2, m_2, p_2 \rangle) \le R_2$. What can we say about $R(\langle n_1 n_2, m_1 m_2, p_1 p_2 \rangle)$? We can think of the $n_1 n_2 \times m_1 m_2$ matrix $A$ as an $n_1 \times m_1$ matrix whose entries are $n_2 \times m_2$ matrices. The matrices $B, C$ can be decomposed analogously. We then apply the algorithm showing that $R(\langle n_1, m_1, p_1 \rangle) \le R_1$. Each time we have to multiply two elements, which are now matrices, we use the algorithm showing $R(\langle n_2, m_2, p_2 \rangle) \le R_2$. This will reduce everything to computations of $R_1 R_2$ products. So

\[
R(\langle n_1 n_2, m_1 m_2, p_1 p_2 \rangle) \le R(\langle n_1, m_1, p_1 \rangle)\, R(\langle n_2, m_2, p_2 \rangle). \tag{3}
\]

Let's apply this together with the symmetry relations:

\[
R(\langle nmp, nmp, nmp \rangle) \le R(\langle n, m, p \rangle)^3.
\]

This has the implication that

\[
\omega \le 3 \log_{nmp} R(\langle n, m, p \rangle). \tag{4}
\]

This generalizes (1). We'll see (and use) even more general forms below.
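The trilinear form of Strassen's algorithm can be checked mechanically by expanding the seven products and comparing monomials. The sketch below is one way to do this, assuming the convention that $\langle n, m, p \rangle$ has monomials $a_{ij} b_{jk} c_{ik}$; the dictionary encoding of linear forms and the names `STRASSEN`, `expand`, `matmul_tensor` and `omega_bound` are illustrative choices, not part of the lecture.

```python
from collections import Counter
from itertools import product
from math import log

# Each triple is (alpha_t, beta_t, gamma_t); a linear form is {variable: coefficient}.
STRASSEN = [
    ({"a11": 1, "a22": 1}, {"b11": 1, "b22": 1}, {"c11": 1, "c22": 1}),
    ({"a21": 1, "a22": 1}, {"b11": 1}, {"c21": 1, "c22": -1}),
    ({"a11": 1}, {"b12": 1, "b22": -1}, {"c12": 1, "c22": 1}),
    ({"a22": 1}, {"b21": 1, "b11": -1}, {"c11": 1, "c21": 1}),
    ({"a11": 1, "a12": 1}, {"b22": 1}, {"c11": -1, "c12": 1}),
    ({"a21": 1, "a11": -1}, {"b11": 1, "b12": 1}, {"c22": 1}),
    ({"a12": 1, "a22": -1}, {"b21": 1, "b22": 1}, {"c11": 1}),
]


def expand(terms):
    """Expand sum_t alpha_t * beta_t * gamma_t into {(a, b, c): coefficient}."""
    total = Counter()
    for alpha, beta, gamma in terms:
        for (a, x), (b, y), (c, z) in product(
                alpha.items(), beta.items(), gamma.items()):
            total[(a, b, c)] += x * y * z
    # Drop monomials whose contributions cancel.
    return {mono: coeff for mono, coeff in total.items() if coeff != 0}


def matmul_tensor(n, m, p):
    """The tensor <n, m, p> as {(a_ij, b_jk, c_ik): 1}."""
    return {(f"a{i}{j}", f"b{j}{k}", f"c{i}{k}"): 1
            for i in range(1, n + 1)
            for j in range(1, m + 1)
            for k in range(1, p + 1)}


def omega_bound(n, m, p, rank):
    """The bound 3 * log_{nmp}(rank) on omega from inequality (4)."""
    return 3 * log(rank) / log(n * m * p)
```

Here `expand(STRASSEN) == matmul_tensor(2, 2, 2)` confirms that all cross terms cancel, and `omega_bound(2, 2, 2, 7)` evaluates $\log_2 7$.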

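4 Border rank

Bini came up with the following identity:

\[
\begin{aligned}
&\epsilon\,(a_{11} b_{11} c_{11} + a_{11} b_{12} c_{21} + a_{12} b_{21} c_{11} + a_{12} b_{22} c_{21} + a_{21} b_{11} c_{12} + a_{21} b_{12} c_{22}) + {} \\
&\epsilon^2 (a_{11} b_{22} c_{21} + a_{11} b_{11} c_{12} + a_{12} b_{21} c_{22} + a_{21} b_{21} c_{22}) = {} \\
&\qquad (a_{12} + \epsilon a_{11})(b_{12} + \epsilon b_{22})\, c_{21} + {} \\
&\qquad (a_{21} + \epsilon a_{11})\, b_{11}\, (c_{11} + \epsilon c_{12}) - {} \\
&\qquad a_{12}\, b_{12}\, (c_{11} + c_{21} + \epsilon c_{22}) - {} \\
&\qquad a_{21}\, (b_{11} + b_{12} + \epsilon b_{21})\, c_{11} + {} \\
&\qquad (a_{12} + a_{21})(b_{12} + \epsilon b_{21})(c_{11} + \epsilon c_{22}).
\end{aligned}
\]

On the left-hand side, we have $\epsilon$ times the tensor corresponding to the following partial matrix multiplication (here the monomials are written as $a_{ij} b_{jk} c_{ki}$, so the variable $c_{ki}$ is paired with entry $(i,k)$ of the product):

\[
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & 0 \end{bmatrix}
\begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}
=
\begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{bmatrix}.
\]

From a computational point of view, here's how you'd use this identity to compute this partial matrix multiplication approximately (if you could calculate with infinite precision). Pick $\epsilon > 0$ very small, and use the identity. Divide the result by $\epsilon$. We obtain roughly the result. By picking $\epsilon$ infinitesimal, we can compute the matrix multiplication exactly. But there's a way to do it even without nonstandard analysis, as we'll see below.

First, let's see how you'd use the identity for multiplying large square matrices. Putting together two copies of the identity, we get an identity equating $\epsilon \langle 3, 2, 2 \rangle + O(\epsilon^2)$ with a sum of 10 terms (the big-O notation hides terms which grow like $\epsilon^2$ or smaller). Symmetrizing, we get an identity equating $\epsilon^3 \langle 12, 12, 12 \rangle + O(\epsilon^4)$ with a sum of $10^3$ terms. Taking the $N$th tensor power, we get an identity equating $\epsilon^{3N} \langle 12^N, 12^N, 12^N \rangle + O(\epsilon^{3N+1})$ with a sum of $10^{3N}$ terms.

What do we do with the new identity? We're only really interested in all terms with coefficient $\epsilon^{3N}$. How do we isolate them? Consider the product $(a_{12} + \epsilon a_{11})(b_{12} + \epsilon b_{22})\, c_{21}$. If we wanted to compute only the coefficient of $\epsilon$, we would need to compute $a_{11} b_{12} c_{21} + a_{12} b_{22} c_{21}$. The general situation is similar. For each given term $\alpha_i \beta_i \gamma_i$, in order to compute the coefficient of $\epsilon^{3N}$, we will take terms corresponding to coefficients $\epsilon^{d_\alpha}, \epsilon^{d_\beta}, \epsilon^{d_\gamma}$ from $\alpha_i, \beta_i, \gamma_i$ (respectively) such that $d_\alpha + d_\beta + d_\gamma = 3N$, and sum all of these. There are at most $\binom{3N+2}{2} = O(N^2)$ such terms. So

\[
\omega \le \log_{12^N} O(N^2\, 10^{3N}) \longrightarrow 3 \log_{12} 10 \approx 2.78.
\]

The polynomial factor $O(N^2)$ doesn't really make any difference in the limit. In other words, the fact that the identity was only approximate makes no difference.

Bini's identity (after taking two copies of it) shows that $\underline{R}(\langle 3, 2, 2 \rangle) \le 10$. The border rank $\underline{R}(T)$ of a tensor $T$ is exactly the minimum $M$ such that

\[
\epsilon^d\, T + O(\epsilon^{d+1}) = \sum_{i=1}^{M} \alpha_i \beta_i \gamma_i
\]

for some $d$, where the coefficients of the linear combinations $\alpha_i, \beta_i, \gamma_i$ are now allowed to be polynomials in $\epsilon$. Our argument shows in general that

\[
\omega \le 3 \log_{nmp} \underline{R}(\langle n, m, p \rangle). \tag{5}
\]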
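Bini's identity can be verified by expanding its five products and grouping monomials by their power of $\epsilon$. The sketch below assumes the identity with minus signs on the third and fourth products, and encodes each linear form as a dictionary keyed by (variable, power of $\epsilon$); the names `BINI`, `expand_by_power` and `bini_exponent` are hypothetical.

```python
from collections import Counter
from itertools import product
from math import log

# Each entry is (sign, alpha, beta, gamma); a form maps (variable, eps_power) -> coefficient.
BINI = [
    (+1, {("a12", 0): 1, ("a11", 1): 1}, {("b12", 0): 1, ("b22", 1): 1},
     {("c21", 0): 1}),
    (+1, {("a21", 0): 1, ("a11", 1): 1}, {("b11", 0): 1},
     {("c11", 0): 1, ("c12", 1): 1}),
    (-1, {("a12", 0): 1}, {("b12", 0): 1},
     {("c11", 0): 1, ("c21", 0): 1, ("c22", 1): 1}),
    (-1, {("a21", 0): 1}, {("b11", 0): 1, ("b12", 0): 1, ("b21", 1): 1},
     {("c11", 0): 1}),
    (+1, {("a12", 0): 1, ("a21", 0): 1}, {("b12", 0): 1, ("b21", 1): 1},
     {("c11", 0): 1, ("c22", 1): 1}),
]


def expand_by_power(terms):
    """Return {eps_power: {(a, b, c): coefficient}} for the signed sum of products."""
    total = Counter()
    for sign, alpha, beta, gamma in terms:
        for ((a, da), x), ((b, db), y), ((c, dc), z) in product(
                alpha.items(), beta.items(), gamma.items()):
            total[(da + db + dc, a, b, c)] += sign * x * y * z
    by_power = {}
    for (d, a, b, c), coeff in total.items():
        if coeff != 0:  # keep only monomials that survive cancellation
            by_power.setdefault(d, {})[(a, b, c)] = coeff
    return by_power


def bini_exponent():
    """The bound 3 * log_12(10) obtained from two copies of the identity."""
    return 3 * log(10) / log(12)
```

In this expansion the $\epsilon^0$ part vanishes, the $\epsilon^1$ part is the six-term partial matrix multiplication tensor, and the $\epsilon^2$ part is the error term.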

Border rank satisfies many of the properties of rank. For example, it is symmetric (2) and submultiplicative (3). We can think of it as a generalization of rank which is good enough to obtain bounds on $\omega$.

Bini's particular identity can be leveraged even more, showing that $\omega \le 3 \log_6 5 \approx 2.69$, using the methods of Section 6.

In earlier literature, these identities are known as $\lambda$-computations (so $\epsilon$ is replaced with $\lambda$). Also, negative powers of $\lambda$ are used so that the left-hand side has an error term of magnitude $O(\lambda)$.

5 Schönhage's τ theorem

Schönhage discovered the following surprising identity:

\[
\begin{aligned}
&\epsilon^2 \Bigl( \sum_{i=1}^{4} \sum_{j=1}^{4} A_i B_j C_{ji} + \sum_{i=1}^{3} \sum_{j=1}^{3} X_{3i+j} Y_{3i+j} Z \Bigr) + O(\epsilon^3) = {} \\
&\qquad \sum_{i=1}^{3} \sum_{j=1}^{3} (A_i + \epsilon X_{3i+j})(B_j + \epsilon Y_{3i+j})(\epsilon^2 C_{ji} + Z) + {} \\
&\qquad \sum_{i=1}^{3} A_i \Bigl( B_4 - \epsilon \sum_{j=1}^{3} Y_{3i+j} \Bigr)(\epsilon^2 C_{4i} + Z) + {} \\
&\qquad \sum_{j=1}^{3} \Bigl( A_4 - \epsilon \sum_{i=1}^{3} X_{3i+j} \Bigr) B_j\, (\epsilon^2 C_{j4} + Z) + {} \\
&\qquad A_4 B_4\, (\epsilon^2 C_{44} + Z) - \Bigl( \sum_{i=1}^{4} A_i \Bigr) \Bigl( \sum_{j=1}^{4} B_j \Bigr) Z.
\end{aligned}
\]

This identity shows that

\[
\underline{R}(\langle 4, 1, 4 \rangle \oplus \langle 1, 9, 1 \rangle) \le 17.
\]

(In fact there is equality.) The identity shows how to compute the outer product $\langle 4, 1, 4 \rangle$ of two vectors of length 4 along with the inner product $\langle 1, 9, 1 \rangle$ of two other vectors of length 9 (the symbol $\oplus$ emphasizes the fact that the two products concern different variables) with only $4^2 + 1$ multiplications. This is surprising, since easy arguments show that $R(\langle 4, 1, 4 \rangle) = 16$, and with one more multiplication it is suddenly possible to compute along an extra (long) inner product.

How do we use this identity? Schönhage invented his τ-theorem (also: asymptotic sum inequality) to answer this question. We will approach his solution through a series of simple steps. For simplicity, we will only explicitly consider rank, but everything we do also works with border rank, which is how we state our results.

1. Suppose we had an identity showing that $\underline{R}(k \odot \langle n, n, n \rangle) \le M$ (here $k \odot \langle n, n, n \rangle$ is the direct sum of $k$ copies of $\langle n, n, n \rangle$ with disjoint variables). How would we use it to multiply square matrices?

Suppose we wanted to multiply two $n^T \times n^T$ matrices, where $T$ is very large. We apply our new algorithm recursively. In the first level we can't really take advantage of the full abilities of our algorithm, since we only have one matrix product to compute. In the second level, we already have $M$ of these, so we need to apply our algorithm recursively only $M/k$ times. As we go along, we will be able to group the products we need to compute at each level to groups of size $k$, and leverage our algorithm almost perfectly. So asymptotically, our algorithm behaves as if we had a way to multiply two $n \times n$ matrices using $M/k$ products (note that $M/k$ need not be integral).

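The details work out, showing

\[
\omega \le \log_n \frac{\underline{R}(k \odot \langle n, n, n \rangle)}{k}. \tag{6}
\]

2. A simple symmetrization argument shows that

\[
\omega \le 3 \log_{nmp} \frac{\underline{R}(k \odot \langle n, m, p \rangle)}{k}. \tag{7}
\]

3. Consider now Schönhage's identity, showing

\[
\underline{R}(\langle 4, 1, 4 \rangle \oplus \langle 1, 9, 1 \rangle) \le 17.
\]

Compute the $N$th tensor power:

\[
\underline{R}\Bigl( \bigoplus_{k=0}^{N} c_k \odot \langle n_k, m_k, p_k \rangle \Bigr) \le 17^N,
\quad \text{where } c_k = \binom{N}{k},\ n_k = p_k = 4^k,\ m_k = 9^{N-k}.
\]

Our goal is to obtain a bound using (7). If we were aiming at a bound $\omega \le 3\tau$, then we'd need

\[
\frac{17^N}{c_k} \le (n_k m_k p_k)^{\tau}.
\]

It is natural to define accordingly the volume of a tensor $\langle n, m, p \rangle$ by

\[
V_\tau(\langle n, m, p \rangle) = (nmp)^{\tau},
\]

and extend the definition additively to direct sums. This notion of volume is multiplicative, i.e. $V_\tau(T_1 \otimes T_2) = V_\tau(T_1)\, V_\tau(T_2)$, so

\[
V_\tau\Bigl( \bigoplus_{k=0}^{N} c_k \odot \langle n_k, m_k, p_k \rangle \Bigr) = V_\tau(\langle 4, 1, 4 \rangle \oplus \langle 1, 9, 1 \rangle)^N = (16^{\tau} + 9^{\tau})^N.
\]

Since there are only $N + 1$ terms on the left, there must be some term satisfying

\[
c_k\, (n_k m_k p_k)^{\tau} \ge \frac{(16^{\tau} + 9^{\tau})^N}{N + 1}.
\]

We want this to be roughly equal to $17^N$, so we choose $\tau$ so that

\[
16^{\tau} + 9^{\tau} = 17.
\]

With this value of $\tau$, formula (7) implies that

\[
\omega \le 3 \log_{n_k m_k p_k} \frac{17^N}{c_k} \approx 3\tau.
\]

In the limit, we actually get $\omega \le 3\tau$. In our case, $\tau \approx 0.85$ and $3\tau \approx 2.55$.

This proof generalizes to give Schönhage's τ-theorem:

\[
\sum_i (n_i m_i p_i)^{\tau} = \underline{R}\Bigl( \bigoplus_i \langle n_i, m_i, p_i \rangle \Bigr) \quad \text{implies} \quad \omega \le 3\tau. \tag{8}
\]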
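The value of $\tau$ in the τ-theorem is defined only implicitly, so a numerical solver is handy. The following sketch finds it by bisection; the function name `solve_tau` and its parameters are hypothetical, and it assumes the left-hand side is below the rank at `lo` and above it at `hi`, which holds for the volumes 16 and 9 with rank 17 on the interval $[0, 1]$.

```python
def solve_tau(volumes, rank, lo=0.0, hi=1.0, iters=100):
    """Solve sum(v**tau for v in volumes) == rank for tau by bisection.

    volumes holds the products n_i * m_i * p_i of the summands.
    Assumes the sum is below rank at lo and above rank at hi."""
    def excess(tau):
        return sum(v ** tau for v in volumes) - rank

    for _ in range(iters):
        mid = (lo + hi) / 2
        if excess(mid) > 0:
            hi = mid  # sum too large: tau must be smaller
        else:
            lo = mid  # sum too small (or exact): tau must be at least mid
    return (lo + hi) / 2


# Schoenhage's identity: <4,1,4> has volume 16, <1,9,1> has volume 9, border rank 17.
tau = solve_tau([16, 9], 17)
omega_upper = 3 * tau
```

With these inputs `tau` comes out close to 0.85 and `omega_upper` close to 2.55, matching the figures quoted above.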

An alternative form shows why it is worthy of its other name, the asymptotic sum inequality:

\[
\sum_i (n_i m_i p_i)^{\omega/3} \le \underline{R}\Bigl( \bigoplus_i \langle n_i, m_i, p_i \rangle \Bigr). \tag{9}
\]

Coppersmith and Winograd [CW, Corollary 3.2] showed that

\[
\underline{R}\bigl( \langle n, m, p \rangle \oplus \langle 1,\ R(\langle n, m, p \rangle) - m(n + p - 1),\ 1 \rangle \bigr) \le R(\langle n, m, p \rangle) + m.
\]

This shows that starting with an algorithm for $\langle n, m, p \rangle$, we can always obtain a better algorithm (yielding a better $\omega$ through the asymptotic sum inequality). In other words, no single algorithm for $\langle n, m, p \rangle$ can yield the optimal $\omega$. A similar result of theirs from the same paper (Corollary 3.5) implies that the inequality in (8) is always strict.

6 Fast multiplication of rectangular matrices*

For this section, we change our focus and concentrate on multiplication problems of the type $\langle n, n, n^{\alpha} \rangle$. We want to find a value of $\alpha$ such that $R(\langle n, n, n^{\alpha} \rangle) = \tilde{O}(n^2)$.

Following Coppersmith's footsteps [Cop2], we consider another identity due to Schönhage:

\[
\begin{aligned}
&\epsilon^2 (a_{11} b_{11} c_{11} + a_{11} b_{12} c_{21} + a_{12} b_{21} c_{11} + a_{13} b_{31} c_{11} + a_{22} b_{21} c_{12} + a_{23} b_{31} c_{12}) + O(\epsilon^3) = {} \\
&\qquad (a_{11} + \epsilon^2 a_{12})(b_{21} + \epsilon^2 b_{11})\, c_{11} + {} \\
&\qquad (a_{11} + \epsilon^2 a_{13})\, b_{31}\, (c_{11} - \epsilon c_{21}) + {} \\
&\qquad (a_{11} + \epsilon^2 a_{22})(b_{21} - \epsilon b_{12})\, c_{12} + {} \\
&\qquad (a_{11} + \epsilon^2 a_{23})(b_{31} + \epsilon b_{12})(c_{12} + \epsilon c_{21}) - {} \\
&\qquad a_{11}\, (b_{21} + b_{31})(c_{11} + c_{12}).
\end{aligned}
\]

This identity describes a fast way to approximately multiply a partial $2 \times 3$ matrix with a partial $3 \times 2$ matrix using only five multiplications:

\[
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \end{pmatrix}
\begin{pmatrix} b_{11} & b_{12} \\ b_{21} & 0 \\ b_{31} & 0 \end{pmatrix}
=
\begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{pmatrix}.
\]

It is important here that this identity is tight: there is a trivial matching lower bound for the border rank, coming from the number of indeterminates in the $A$ matrix.

Take this identity and tensor it $N$ times to get a new identity involving $5^N$ multiplications, showing how to multiply a partial $2^N \times 3^N$ matrix with a partial $3^N \times 2^N$ matrix. These matrices have a very complicated pattern of zeroes, but we will only be interested in the number of non-zero entries in each column of the first matrix, and the number of non-zero entries in each row of the second matrix. There are $N + 1$ different types of such indices (does that sound familiar?). Type $k$ involves $m_k = \binom{N}{k} 2^k$ columns of the first matrix with $n_k = 2^k$ non-zero entries, and matching rows of the second matrix with $p_k = 2^{N-k}$ non-zero entries.

Suppose we group together all indices of type $k$, and zero all other entries in the two matrices. We get an identity for approximately computing $AB$, where $A$ consists of $m_k$ columns, each having $n_k$ non-zero entries, and $B$ consists of $m_k$ rows, each having $p_k$ non-zero entries. So it is morally a tensor of type $\langle n_k, m_k, p_k \rangle$. In fact, we can actually encode $\langle n_k, m_k, p_k \rangle$ inside the product $AB$ (see below). This shows that

\[
\underline{R}(\langle n_k, m_k, p_k \rangle) \le 5^N.
\]

Symmetrize once to get

\[
\underline{R}(\langle n_k m_k,\ n_k m_k,\ p_k^2 \rangle) \le 5^{2N}.
\]

We would like $n_k m_k \approx 5^N$. Since $n_k m_k = \binom{N}{k} 4^k$, $n_k m_k$ is maximized for $k = 4N/5$, at which point $n_k m_k = \Theta(5^N / \sqrt{N})$. For this $k$ we have $p_k^2 = 4^{N/5} = 5^{\alpha N}$ for $\alpha = (\log_5 4)/5 \approx 0.17227$. Concluding,

\[
\underline{R}\Bigl( \Bigl\langle \frac{5^N}{\sqrt{N}},\ \frac{5^N}{\sqrt{N}},\ 5^{\alpha N} \Bigr\rangle \Bigr) \le 5^{2N}.
\]

Taking $n = 5^N / \sqrt{N}$,

\[
\underline{R}(\langle n, n, n^{\alpha} \rangle) = O(n^2 \log n).
\]

This implies that

\[
R(\langle n, n, n^{\alpha} \rangle) = O(n^2 \log^2 n).
\]

The usual tensoring trick shows that one can multiply an $n \times n^{\alpha}$ matrix by an $n^{\alpha} \times n$ matrix using $O(n^2 \log^2 n)$ arithmetic operations. Moreover, as observed by Ryan Williams [Wil], we can describe all these transformations uniformly.

Ryan Williams used fast matrix multiplication to solve SAT on certain symmetric circuits on all inputs very quickly. Later, Andreas Björklund found a more elementary dynamic programming algorithm accomplishing the same task, so that fast matrix multiplication is no longer needed.

6.1 Proof

It remains to prove that $\langle n, m, p \rangle$ matrix multiplication can be embedded in $AB$ matrix multiplication if $A$ consists of $m$ columns, each having $n$ non-zero entries, and $B$ consists of $m$ rows, each having $p$ non-zero entries.

Let $A'$ be an $n \times m$ matrix. We will construct a (constant) matrix $T$ translating between $A'$ and $A$: given $A'$, we will be able to find $A$ so that $A' = TA$. The same construction will also give a matrix $S$ translating between an arbitrary $m \times p$ matrix $B'$ and $B$, in the sense that $B' = BS$. We can then reduce $A'B'$ to $AB$ using $A'B' = T(AB)S$.

How do we find the matrix $T$? We show one way to do this for small numbers, the general case being similar. Suppose that $n = 2$ and that $A$ has three rows, so the matrix $T$ has dimensions $2 \times 3$. Our matrix $T$ will be

\[
T = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}.
\]

The matrix $T$ corresponds to the linear transformation $T(x, y, z) = (x + y, x + z)$. This linear transformation has the property that, given the constraint that only two of $x, y, z$ are non-zero, we can still encode any pair $(a, b)$:

\[
\begin{aligned}
T(b,\ a - b,\ 0) &= (a, b) \\
T(a,\ 0,\ b - a) &= (a, b) \\
T(0,\ a,\ b) &= (a, b)
\end{aligned}
\]

Now take each column of the matrix $A'$, and use the appropriate formula to encode it into a column of $A$. This encoding ensures that $A' = TA$.

More generally, the property we need from the matrix $T$ is that all square minors are non-zero. This is certainly the case for a random matrix $T$.
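The encoding property of the small matrix $T$ is easy to check by brute force. In the sketch below, `encode` picks, for each allowed position of the zero, a preimage of $(a, b)$ under $T(x, y, z) = (x + y, x + z)$; for the case $z = 0$ it uses $(b, a - b, 0)$. The helper names are hypothetical, and the minor check looks only at the $2 \times 2$ minors, which are the ones that matter for this $2 \times 3$ example.

```python
from itertools import combinations

T = [[1, 1, 0],
     [1, 0, 1]]


def apply_T(v):
    """Apply the linear map T(x, y, z) = (x + y, x + z)."""
    x, y, z = v
    return (x + y, x + z)


def encode(a, b, zero_position):
    """Return (x, y, z) with a zero at zero_position (0, 1 or 2) and T(x, y, z) == (a, b)."""
    if zero_position == 2:
        return (b, a - b, 0)   # z = 0: x + y = a and x = b
    if zero_position == 1:
        return (a, 0, b - a)   # y = 0: x = a and x + z = b
    return (0, a, b)           # x = 0: y = a and z = b


def maximal_minors(matrix):
    """All 2x2 minors of a 2 x m matrix, one per pair of columns."""
    cols = range(len(matrix[0]))
    return [matrix[0][i] * matrix[1][j] - matrix[0][j] * matrix[1][i]
            for i, j in combinations(cols, 2)]
```

Every pair of columns of $T$ spans the plane, which is why a preimage with any prescribed zero position exists.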

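6.2 Alternative method

In the same paper describing the preceding construction, Coppersmith also described a slightly inferior construction giving only $\alpha \approx 0.1402$. However, the same method, combined with the identity and methods of [CW], gives a much better result, $\alpha \approx 0.29462$. This has recently been improved by Le Gall [Gal] to $\alpha \approx 0.30298$. These constructions apparently cannot be used for Williams' result since they only give $O(n^{2+\epsilon})$ algorithms for any $\epsilon > 0$.

The idea is to start with an identity by Schönhage which we have already considered (with different parameters):

\[
\begin{aligned}
&\epsilon^2 \Bigl( \sum_{i=1}^{3} \sum_{j=1}^{3} A_i B_j C_{ji} + \sum_{i=1}^{2} \sum_{j=1}^{2} X_{2i+j} Y_{2i+j} Z \Bigr) + O(\epsilon^3) = {} \\
&\qquad \sum_{i=1}^{2} \sum_{j=1}^{2} (A_i + \epsilon X_{2i+j})(B_j + \epsilon Y_{2i+j})(\epsilon^2 C_{ji} + Z) + {} \\
&\qquad \sum_{i=1}^{2} A_i \Bigl( B_3 - \epsilon \sum_{j=1}^{2} Y_{2i+j} \Bigr)(\epsilon^2 C_{3i} + Z) + {} \\
&\qquad \sum_{j=1}^{2} \Bigl( A_3 - \epsilon \sum_{i=1}^{2} X_{2i+j} \Bigr) B_j\, (\epsilon^2 C_{j3} + Z) + {} \\
&\qquad A_3 B_3\, (\epsilon^2 C_{33} + Z) - \Bigl( \sum_{i=1}^{3} A_i \Bigr) \Bigl( \sum_{j=1}^{3} B_j \Bigr) Z.
\end{aligned}
\]

This identity shows that

\[
\underline{R}(\langle 3, 1, 3 \rangle \oplus \langle 1, 4, 1 \rangle) \le 10.
\]

Again, this identity is tight. The idea now is to take a high tensor power without symmetrizing:

\[
\underline{R}\Bigl( \bigoplus_{k=0}^{N} \binom{N}{k} \odot \langle 3^k, 4^{N-k}, 3^k \rangle \Bigr) \le 10^N.
\]

The idea now is to choose an appropriate $k$ and zero everything else. Which $k$ should we pick? The same method used for proving the asymptotic sum inequality shows that for large $M$,

\[
\underline{R}(\langle 3^{kM}, 4^{(N-k)M}, 3^{kM} \rangle) \lesssim \frac{10^{NM}}{\binom{N}{k}^M}.
\]

In order to get a tight bound, we would like the right-hand side to be approximately $(3^{kM})^2$. In other words, taking the $M$th root, we want

\[
\binom{N}{k} 9^k \approx 10^N.
\]

When proving the asymptotic sum inequality, it was enough to comment that such a $k$ can be found by taking the maximum over $\binom{N}{k} 9^k$, since the sum of these $N + 1$ terms is $10^N$. In this case we also need to know the value of $k$, which is roughly $0.9N$. This shows that roughly speaking,

\[
\underline{R}(\langle 3^{0.9NM}, 4^{0.1NM}, 3^{0.9NM} \rangle) \lesssim 9^{0.9NM}.
\]

Putting $n = 3^{0.9NM}$, the other index is $4^{0.1NM} = n^{\alpha}$ for $\alpha = 0.1 \log 4 / (0.9 \log 3) \approx 0.1402$.

When unrolling the construction, for technical reasons we only get a bound of the form $O(n^{2+\epsilon})$ rather than $\tilde{O}(n^2)$ as before.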
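The two numerical ingredients used here, the location of the largest term $\binom{N}{k} 9^k$ and the resulting exponent, can be reproduced with a few lines of Python. This is a sketch with hypothetical function names; it also evaluates the analogous quantities $\binom{N}{k} 4^k$ and $(\log_5 4)/5$ from the construction in the main part of Section 6.

```python
from math import comb, log


def best_k(N, base):
    """Index k in 0..N maximizing comb(N, k) * base**k (exact integer arithmetic)."""
    return max(range(N + 1), key=lambda k: comb(N, k) * base ** k)


def alpha_alternative(frac=0.9):
    """Exponent alpha with 4**((1 - frac) * t) == (3**(frac * t))**alpha."""
    return (1 - frac) * log(4) / (frac * log(3))


def alpha_main():
    """Exponent alpha = log_5(4) / 5 from the five-multiplication identity."""
    return log(4, 5) / 5
```

For example, `best_k(1000, 9)` lands at $k = 0.9N$ and `best_k(1000, 4)` at $k = 0.8N$, in line with the choices $k \approx 0.9N$ and $k = 4N/5$ made in the text.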

References

[ACT] Peter Bürgisser, Michael Clausen and M. Amin Shokrollahi, Algebraic Complexity Theory, Springer, 1997.

[Blä] Markus Bläser, Complexity of bilinear problems (lecture notes scribed by Fabian Bendun), http://www-cc.cs.uni-saarland.de/teaching/ss09/complexityofbilinearproblems/script.pdf, 2009.

[Cop] Dan Coppersmith, Rapid multiplication of rectangular matrices, SIAM J. Comput. 11:467–471, 1982.

[Cop2] Dan Coppersmith, Rectangular matrix multiplication revisited, J. Comp. 13:42–49, 1997.

[CW] Dan Coppersmith, Shmuel Winograd, On the asymptotic complexity of matrix multiplication, SIAM J. Comput. 5:618–623, 1976.

[Gal] François Le Gall, Faster algorithms for rectangular matrix multiplication, arXiv, 2012.

[Wil] Ryan Williams, Non-uniform ACC circuit lower bounds, CCC 2011.