Communication Cost for Updating Linear Functions when Message Updates are Sparse: Connections to Maximally Recoverable Codes

N. Prakash and Muriel Médard

arXiv:1605.01105v2 [cs.IT] 8 Jun 2016

N. Prakash and Muriel Médard are with the Research Laboratory of Electronics, Massachusetts Institute of Technology, USA (email: {prakashn, medard}@mit.edu). The results in this article were presented in part as an invited talk at the 53rd Annual Allerton Conference on Communication, Control, and Computing, Sept 29-Oct 2, 2015, Allerton Park and Retreat Center, Monticello, IL, USA. The work is in part supported by AFOSR under grant number FA9550-14-1-043.

Abstract

We consider a communication problem in which an update of the source message needs to be conveyed to one or more distant receivers that are interested in maintaining specific linear functions of the source message. The setting is one in which the updates are sparse in nature, and where neither the source nor the receiver(s) is aware of the exact difference vector, but only know the amount of sparsity that is present in the difference-vector. Under this setting, we are interested in devising linear encoding and decoding schemes that minimize the communication cost involved. We show that the optimal solution to this problem is closely related to the notion of maximally recoverable codes (MRCs), which were originally introduced in the context of coding for storage systems. In the context of storage, MRCs guarantee optimal erasure protection when the system is partially constrained to have local parity relations among the storage nodes. In our problem, we show that optimal solutions exist if and only if MRCs of a certain kind (identified by the desired linear functions) exist. We consider point-to-point and broadcast versions of the problem, and identify connections to MRCs under both these settings.

I. INTRODUCTION

We consider a communication problem in which an update of the source message needs to be communicated to one or more distant receivers that are interested in maintaining specific functions of the actual source message. The setting is one in which the updates are sparse in nature, and where neither the source nor the receivers are aware of the difference-vector. Under this setting, we are interested in devising encoding and decoding schemes that allow the receiver to update itself to the desired function of the new source message, such that the communication cost is minimized.

A. Point-to-Point Setting

The system for the point-to-point case is shown in Fig. 1. Here, the $n$-length column-vector $X^n \in \mathbb{F}_q^n$ denotes the initial source message, where $\mathbb{F}_q$ denotes the finite field of $q$ elements. The receiver maintains the linear function $AX^n$, where $A$ is an $m \times n$ matrix, $m \leq n$, over $\mathbb{F}_q$. Let the updated source message be given by $X^n + E^n$, where $E^n$ denotes the difference-vector, and is such that Hamming wt.$(E^n) \leq \epsilon$, $0 \leq \epsilon \leq n$. We say that the vector $E^n$ is $\epsilon$-sparse. We consider linear encoding at the source using the $\ell \times n$ matrix $H$. We assume that the source is aware of the function $A$ and the parameter $\epsilon$, but does not know the vector $E^n$. The goal is to encode $X^n + E^n$ at the source such that the receiver can update itself to $A(X^n + E^n)$, given the source encoding and $AX^n$. Assuming that the parameters $n$, $q$, $\epsilon$ and $A$ are fixed, we define the communication cost as the parameter $\ell$, which is the number of symbols generated by the linear encoder $H$. The goal is to design the encoder and decoder so as to minimize $\ell$. We assume a zero-probability-of-error, worst-case-scenario model; in other words, we are interested in schemes which allow error-free decoding for every $X^n \in \mathbb{F}_q^n$ and every $\epsilon$-sparse $E^n \in \mathbb{F}_q^n$.

The problem is in part motivated by the setting of distributed storage systems (DSSs) that use linear erasure codes for data storage, and where the underlying uncoded file is subject to frequent but tiny updates. In a scenario where multiple users share the DSS, it is often the case that a user applies his updates on a version of the file

which was obtained locally from a different user, and which is not present in the DSS. In our model, we consider updates as substitutions in the file, which is a reasonable assumption in scenarios where the file size is much larger than the quantum of updates. With respect to Fig. 1, $X^n$ represents the version of the file that is stored in the DSS. The user has access to a locally obtained version $X^n + E_1^n$ that is not stored in the DSS, and updates it to $X^n + E_1^n + E_2^n = X^n + E^n$, where $E^n = E_1^n + E_2^n$ represents the overall update that the file has undergone.

Fig. 1. The system model for function update in the point-to-point setting.

Further, our interest is in DSSs like peer-to-peer DSSs, where the user typically establishes multiple connections with various nodes in order to update their coded data. Figure 1 illustrates a scenario in which the user updates the coded data corresponding to one of the storage nodes that uses the matrix $A$ as its erasure coding coefficients. We illustrate the usage of the matrix $A$ via the following example.

Example 1: Consider a DSS that uses an $[N, K]$ linear code for storing data across $N$ storage nodes. The data file $X^n$ is striped before encoding; the various stripes are individually encoded and stacked against each other to get the overall coded file. Let $a = [a_1\ a_2\ \cdots\ a_K]$, $a_i \in \mathbb{F}_q$, denote the coding coefficients corresponding to one of the storage nodes. Assuming that the first $K$ symbols of the vector $X^n$ correspond to the first stripe, symbols $K+1$ to $2K$ correspond to the second stripe, and so on, the overall coding matrix $A$ (taking into account all stripes) is given by

$$A = \begin{bmatrix} a & & & \\ & a & & \\ & & \ddots & \\ & & & a \end{bmatrix}. \qquad (1)$$

Here we assume that $K \mid n$, and thus there are $m = n/K$ stripes in the file. The parameter $m$ also represents the number of coded symbols stored by any one of the $N$ nodes. The matrix $A$ is of size $m \times mK$. The scenario of interest is one in which the number of coded symbols $m$ is much larger than the sparsity parameter $\epsilon$, which is the number of symbols that get updated in the uncoded file.
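The block-diagonal structure of the striping matrix in Example 1 can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: the field is taken to be a prime field (so arithmetic is plain integer arithmetic modulo `q`), and the helper names `striping_matrix` and `apply_matrix`, as well as the numerical values of `a`, `x` and `e`, are hypothetical and not part of the paper.

```python
def striping_matrix(a, m):
    """Build the m x (m*K) matrix with the row vector a repeated on the block diagonal."""
    K = len(a)
    A = []
    for s in range(m):
        row = [0] * (m * K)
        row[s * K:(s + 1) * K] = a  # stripe s is encoded with the coefficients a
        A.append(row)
    return A


def apply_matrix(A, x, q):
    """Compute A x over the prime field GF(q) (q is assumed prime here)."""
    return [sum(c * v for c, v in zip(row, x)) % q for row in A]


q = 7                      # assumed prime field size
a = [1, 2, 3]              # hypothetical coding vector, K = 3
A = striping_matrix(a, m=2)

x = [1, 0, 0, 0, 1, 0]     # original file, two stripes
e = [0, 0, 0, 0, 5, 0]     # 1-sparse update: only the second stripe changes
x_new = [(u + v) % q for u, v in zip(x, e)]

old_coded = apply_matrix(A, x, q)      # what the node currently stores
new_coded = apply_matrix(A, x_new, q)  # what the node should store after the update
```

In this toy instance only the coded symbol of the second stripe changes, which is the sense in which a sparse update of the file induces a sparse update of the stored function.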
The goal is to design the encoder $H$ and the decoder $D$ that minimize the communication cost.

B. Broadcast Setting

The system model naturally extends to broadcast settings consisting of a single source and multiple destinations that are interested in possibly different functions of the source message. In our work, we study the broadcast setting for the case of two destinations (see Fig. 2), where the receivers initially hold functions $AX^n$ and $BX^n$, and get updated to $A(X^n + E^n)$ and $B(X^n + E^n)$, respectively. The broadcast setting is motivated by settings in which two of the storage nodes in a peer-to-peer DSS are nearly co-located, as far as a (distant) user is concerned. In this case, the question of interest is whether it is possible to send one encoded packet $H(X^n + E^n)$ to simultaneously update the coded data of both the storage nodes.

We next present two examples of the broadcast setting. As we will see later in this document, the first example is an instance in which broadcasting does not help to reduce communication cost, i.e., it is optimal to individually update the two destinations. The second example is an instance in which it is beneficial to broadcast rather than individually update the two destinations.

Example 2: As in Example 1, we consider striping and encoding of a data file by an $[N, K]$ linear code over $\mathbb{F}_q$ with $K \geq 2$. We apply the broadcast model to simultaneously update the contents of 2 out of the $N$ storage nodes. Let us assume that the $K$-length coding vectors associated with two of the storage nodes are given by

$a = [a_1\ a_2\ \ldots\ a_K]$, $b = [b_1\ b_2\ \ldots\ b_K]$, $a_i, b_i \in \mathbb{F}_q$. The overall coding matrices $A$ and $B$ corresponding to the $m$ stripes are then given by

$$A = \begin{bmatrix} a & & & \\ & a & & \\ & & \ddots & \\ & & & a \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} b & & & \\ & b & & \\ & & \ddots & \\ & & & b \end{bmatrix}, \qquad (2)$$

where both $A$ and $B$ have size $m \times mK$. We will see later that for this example, in order to achieve the optimal communication cost, the source must transmit as though it is individually updating the two storage nodes, i.e., there is no added benefit due to broadcasting.

Fig. 2. The system model for function updates in the broadcast setting involving two receivers.

Example 3: In this example, we consider striping and encoding of a data file by a minimum bandwidth regenerating (MBR) code [1] that is obtained via the product-matrix construction described in [2]. Regenerating codes [1] are codes specifically designed for data storage, and they allow the system designer to trade off storage overhead against repair bandwidth for a given level of fault tolerance. MBR codes have the best possible repair bandwidth (at the expense of storage overhead) for a given level of fault tolerance. We first give a quick introduction to a specific instance of an MBR code in [2], and then use it in our broadcast setting.

The MBR code: Consider an $[N = 5, K = 3, D = 4](\alpha = 4, \beta = 1)$ MBR code over $\mathbb{F}_q$ that encodes 9 symbols into 20 symbols and stores them across $N = 5$ nodes such that each node holds $\alpha = 4$ symbols. The code has the property that the contents of any $K = 3$ nodes are sufficient to reconstruct the 9 uncoded symbols. Also, the contents of any one of the $N = 5$ nodes can be recovered by connecting to any set of $D = 4$ other nodes$^1$ and downloading $\beta = 1$ symbol from each of them. The latter property is known as the node-repair property of the regenerating code. Let $[m_1, \ldots, m_9]$ denote the vector of 9 message symbols, where $m_i \in \mathbb{F}_q$. The encoding is described by the product-matrix construction as follows:

$$C_{N \times \alpha} = \Psi_{N \times D}\, M_{D \times \alpha} \qquad (3)$$

$$= \begin{bmatrix} \psi_1 \\ \psi_2 \\ \psi_3 \\ \psi_4 \\ \psi_5 \end{bmatrix} \begin{bmatrix} m_1 & m_2 & m_3 & m_7 \\ m_2 & m_4 & m_5 & m_8 \\ m_3 & m_5 & m_6 & m_9 \\ m_7 & m_8 & m_9 & 0 \end{bmatrix}. \qquad (4)$$

$^1$ In our example, one connects to all the remaining $N - 1 = 4$ nodes, since $D = 4$.

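Here, $M$ denotes the $D \times \alpha$ message matrix whose entries are populated from among the 9 message symbols, in a specific manner, as shown in (4). Note that the matrix $M$ is symmetric. The $N \times D$ encoding matrix $\Psi$ can be chosen as a Vandermonde matrix under the product-matrix framework. The vector $\psi_i$, $1 \leq i \leq 5$, denotes the $i$-th row of $\Psi$. In this example, we assume that $\Psi$ is given as follows:

$$\Psi = \begin{bmatrix} 1 & \gamma & \gamma^2 & \gamma^3 \\ 1 & \gamma^2 & \gamma^4 & \gamma^6 \\ 1 & \gamma^3 & \gamma^6 & \gamma^9 \\ 1 & \gamma^4 & \gamma^8 & \gamma^{12} \\ 1 & \gamma^5 & \gamma^{10} & \gamma^{15} \end{bmatrix}, \qquad (5)$$

where we pick $\gamma$ as a primitive element in $\mathbb{F}_q$. The $N \times \alpha$ matrix $C$ is the codeword matrix, with the $i$-th row representing the contents that get stored in the $i$-th node, $1 \leq i \leq 5$.

Striping using the MBR code, and the associated coding matrices $A$ and $B$: As before, we let $X^n$ denote the uncoded data file, which is divided into $m$ stripes, each of length 9 symbols. Also, recall that in our model of striping, the first stripe (for the current example) consists of the first 9 symbols of $X^n$, the second stripe consists of symbols $X_{10}, \ldots, X_{18}$, and so on. The length $n$ is given by $n = 9m$. Consider the case where we use the broadcast model to update the contents of the first two nodes. The overall coding matrices $A$ and $B$, both of size $4m \times 9m$, corresponding to the first and the second node are respectively given by

$$A = \begin{bmatrix} A_1 & & & \\ & A_1 & & \\ & & \ddots & \\ & & & A_1 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} B_1 & & & \\ & B_1 & & \\ & & \ddots & \\ & & & B_1 \end{bmatrix}, \qquad (6)$$

where, with the columns indexed by $m_1, \ldots, m_9$,

$$A_1 = \begin{bmatrix} 1 & \gamma & \gamma^2 & 0 & 0 & 0 & \gamma^3 & 0 & 0 \\ 0 & 1 & 0 & \gamma & \gamma^2 & 0 & 0 & \gamma^3 & 0 \\ 0 & 0 & 1 & 0 & \gamma & \gamma^2 & 0 & 0 & \gamma^3 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & \gamma & \gamma^2 \end{bmatrix} \quad \text{and} \quad B_1 = \begin{bmatrix} 1 & \gamma^2 & \gamma^4 & 0 & 0 & 0 & \gamma^6 & 0 & 0 \\ 0 & 1 & 0 & \gamma^2 & \gamma^4 & 0 & 0 & \gamma^6 & 0 \\ 0 & 0 & 1 & 0 & \gamma^2 & \gamma^4 & 0 & 0 & \gamma^6 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & \gamma^2 & \gamma^4 \end{bmatrix}. \qquad (7)$$

As we shall see later, such broadcasting, rather than individually transmitting to the two destinations, can reduce the total communication cost for this example by nearly 12%.

C. Connection to Maximally Recoverable Codes

In this paper, we identify necessary and sufficient conditions for solving the function update problems in Fig. 1 and Fig. 2, for general matrices $A$ and $B$. We show that the existence of optimal solutions under both these settings is closely related to the notion of maximally recoverable codes, which were originally studied in the context of coding for storage systems [3], [4], [5].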
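The per-stripe matrices $A_1$ and $B_1$ in (7) follow mechanically from the product $\psi_i M$ in (3)-(4): the $c$-th symbol stored by node $i$ is $\sum_d \psi_i[d]\, M[d][c]$. The sketch below derives these coefficient matrices from the layout of $M$ and checks them against a direct evaluation of $\psi_i M$. It assumes, purely for illustration, the prime field GF(7) with primitive element $\gamma = 3$; the names `M_LAYOUT`, `psi`, `node_matrix` and `stored_symbols` are hypothetical helpers, not from the paper.

```python
P = 7        # assumed prime field size (illustrative choice)
GAMMA = 3    # 3 is a primitive element modulo 7

# Layout of the symmetric message matrix M in (4):
# entry j > 0 stands for message symbol m_j, entry 0 stands for the fixed zero.
M_LAYOUT = [
    [1, 2, 3, 7],
    [2, 4, 5, 8],
    [3, 5, 6, 9],
    [7, 8, 9, 0],
]


def psi(i, d=4):
    """Row i (1-based) of the Vandermonde matrix in (5): [1, g^i, g^(2i), g^(3i)]."""
    return [pow(GAMMA, i * j, P) for j in range(d)]


def node_matrix(i):
    """4 x 9 matrix mapping (m_1, ..., m_9) to the 4 symbols stored by node i."""
    row_psi = psi(i)
    mat = [[0] * 9 for _ in range(4)]
    for c in range(4):            # c-th stored symbol = psi_i times column c of M
        for d in range(4):
            idx = M_LAYOUT[d][c]
            if idx:
                mat[c][idx - 1] = (mat[c][idx - 1] + row_psi[d]) % P
    return mat


def stored_symbols(i, msg):
    """Direct evaluation of psi_i * M for a concrete message vector."""
    M = [[msg[idx - 1] if idx else 0 for idx in row] for row in M_LAYOUT]
    row_psi = psi(i)
    return [sum(row_psi[d] * M[d][c] for d in range(4)) % P for c in range(4)]


A1 = node_matrix(1)
B1 = node_matrix(2)
msg = [1, 2, 3, 4, 5, 6, 0, 1, 2]
via_matrix = [sum(c * v for c, v in zip(row, msg)) % P for row in A1]
```

With $\gamma = 3$ in GF(7) we have $\gamma^2 = 2$ and $\gamma^3 = 6$, so the first row of `A1` is the first row of $A_1$ in (7) with these values substituted.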
In the context of storage, MRCs form a subclass of a broader class of codes known as locally repairable codes (LRCs) [6], [7]. An $[n, k]$ linear code $C$ is called an LRC with locality $r$ if each of the $n$ code symbols is recoverable as a linear combination of at most $r$ other symbols of the code. Qualitatively, an LRC is said to be maximally recoverable (a formal definition appears in Section II) if it offers optimal beyond-minimum-distance correction capability. When restricted to the setting of LRCs, MRCs are known as partial-MDS codes [4] or maximally recoverable codes with locality [5]. MRCs with locality are used in practical DSSs like Windows Azure [8].

The usage of the concept of MRCs in our work is more general, and not necessarily restricted to the context of LRCs. Below, we define the notion of a maximally recoverable subcode of a given code, say $C_0$.

Definition 1 (Maximally Recoverable Subcode $C$ of $C_0$): Let $C_0$ denote an $[n, t]$ linear code over $\mathbb{F}_q$ having a generator matrix $G_0$. Also, consider an $[n, k]$ subcode $C$ of $C_0$ for some $k \leq t$, and let $G$ denote a generator matrix of $C$. The code $C$ will be referred to as a maximally recoverable subcode (MRSC) of $C_0$ if for any set $S \subseteq [n]$, $|S| = k$, such that $\text{rank}(G_0|_S) = k$, we have $\text{rank}(G|_S) = k$.
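Definition 1 can be checked by exhaustive search on small codes. The following is a minimal sketch, assuming a prime field GF(p) and a full-row-rank generator matrix `G`; it enumerates every $k$-subset of coordinates, so it is only meant for toy parameters. The function names and the two small binary matrices used to exercise it are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations


def rank_mod_p(rows, p):
    """Rank of a matrix (list of rows) over GF(p), p prime, by Gaussian elimination."""
    mat = [[x % p for x in r] for r in rows]
    ncols = len(mat[0]) if mat else 0
    rank = 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(mat)) if mat[r][col]), None)
        if pivot is None:
            continue
        mat[rank], mat[pivot] = mat[pivot], mat[rank]
        inv = pow(mat[rank][col], p - 2, p)       # inverse via Fermat's little theorem
        mat[rank] = [(x * inv) % p for x in mat[rank]]
        for r in range(len(mat)):
            if r != rank and mat[r][col]:
                f = mat[r][col]
                mat[r] = [(x - f * y) % p for x, y in zip(mat[r], mat[rank])]
        rank += 1
    return rank


def restrict(G, S):
    """Columns of G indexed by S."""
    return [[row[j] for j in S] for row in G]


def is_mrsc(G, G0, p):
    """Brute-force test of Definition 1 (G is assumed to have full row rank k)."""
    k, n = len(G), len(G[0])
    if rank_mod_p(G0 + G, p) != rank_mod_p(G0, p):
        return False                               # row space of G is not inside that of G0
    for S in combinations(range(n), k):
        if rank_mod_p(restrict(G0, S), p) == k and rank_mod_p(restrict(G, S), p) != k:
            return False
    return True


# Toy binary instance: three repetition blocks of length 2 (hypothetical example).
G0 = [[1, 1, 0, 0, 0, 0], [0, 0, 1, 1, 0, 0], [0, 0, 0, 0, 1, 1]]
G_good = [[1, 1, 1, 1, 0, 0], [0, 0, 1, 1, 1, 1]]
G_bad = [[1, 1, 0, 0, 0, 0], [0, 0, 1, 1, 0, 0]]
```

In this toy instance `G_bad` fails because the coordinate pair taken from the first and third blocks has rank 2 in `G0` but only rank 1 in `G_bad`.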

Example 4: Consider an $[n = 9, t = 3]$ binary code $C_0$ whose generator matrix $G_0$ is given by

$$G_0 = \begin{bmatrix} 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \end{bmatrix}. \qquad (8)$$

Next, consider the $[n = 9, k = 2]$ subcode $C$ of $C_0$ having a generator matrix $G$ given by

$$G = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 1 \end{bmatrix}. \qquad (9)$$

It is straightforward to verify that $C$ is an MRSC of $C_0$.

D. Summary of Results

Following is a summary of the results obtained in this paper.

1) For the point-to-point setting in Fig. 1, the communication cost is lower bounded by$^2$

$$\ell \geq \min(2\epsilon, \text{rank}(A)). \qquad (10)$$

If $C_A$ and $C_H$ denote the $n$-length linear codes respectively generated by the rows of the matrices $A$ and $H$, then under optimality (i.e., achieving equality in (10)), we show that the code $C_H$ must necessarily be an MRSC of $C_A$. An achievable scheme based on MRSCs is presented to establish the optimality of the bound in (10). For a general matrix $A$, our achievability is guaranteed only when the field size $q$ used in the model (see Fig. 1) is sufficiently large.

2) An explicit low field size encoding matrix $H$ is provided for the setting considered in Example 1. Recall that in Example 1, we use an arbitrary $[N, K]$ linear code to stripe and store the data. The MRSC $C_H$ in this case corresponds to MRCs with locality, where the local codes are scaled repetition codes.

3) We identify necessary and sufficient conditions for solving the broadcast setting given in Fig. 2. Let $C_A$ and $C_B$ respectively denote the linear block codes generated by the rows of the matrices $A$ and $B$. For the special case when the codes $C_A$ and $C_B$ intersect trivially, there is no benefit from broadcasting, i.e., the encoder must transmit as though it is transmitting individually to the two receivers, and thus the optimal communication cost is given by

$$\ell = \min(2\epsilon, \text{rank}(A)) + \min(2\epsilon, \text{rank}(B)). \qquad (11)$$

For the general case when $C_A$ and $C_B$ have a non-trivial intersection, the optimal communication cost can be less than what is given by (11). The expression for the optimal communication cost appears in Theorem 5.5. As in the point-to-point case, optimality is guaranteed only when the field size $q$ is sufficiently large.
4) Our achievability scheme for the general case of the broadcast setting involves finding an answer to the following sub-problem: Given an $[n, t]$ code $C_0$ and an $[n, s]$ subcode $\hat{C}$ of $C_0$, can we find an $[n, k]$ MRSC $C$ of $C_0$ such that $\hat{C}$ is a subcode of $C$? We refer to this as the problem of finding sandwiched MRSCs. Here the parameters $s, k, t$ satisfy the relation $s \leq k \leq t$. We present two different techniques that show the existence of sandwiched MRSCs (under certain trivial necessary conditions on the code $C_0$). The first technique is a parity-check matrix based approach, and relies on the existence of non-zero evaluations of a multivariate polynomial over a large enough finite field. The second technique is a generator matrix based approach, uses the concept of linearized polynomials, and yields an explicit construction.

$^2$ The bound in (10) was already proved in [9] for an average case setting.

E. Related Work

Coding for File Synchronization: In [9], the authors consider the problem of oblivious file synchronization, under the substitution model, in a distributed storage setting employing linear codes. The model is one in which one of the coded storage nodes gets updated with the help of other coded storage nodes, which have already undergone updates. The authors consider the problem of jointly designing the storage code and the update scheme. The updates, like in our setting, are carried out under the assumption that only the sparsity of updates is known, and

not the actual updates themselves. Optimal solutions are presented when the storage code is assumed to be an MDS code. A recent version of this work [10] extends the work to the setting of regenerating codes, with the restriction that only one symbol ($\epsilon = 1$) is obliviously updated. The key difference between these works and ours is that we consider designing optimal update schemes for an arbitrary linear function, while the above works assume a specific structure of the storage code (e.g., MDS in [9]). Please also see Remark 1 for a comparison of the converse statements appearing in the two papers.

A second work which is related to ours appears in [11], where the authors consider the problem of designing protocols for simultaneously optimizing storage performance as well as communication cost for updates. Contrary to our setting, the updates are modeled as insertions/deletions in the file. A similarity with our work (like the work in [9]) is that the destination node holds a coded form of the data (which can be considered as a linear function of the uncoded data), and is interested in updating the coded data. The goal is to communicate and update the coded file in such a way that reconstruction/repair properties in the storage system are preserved. Their protocol permits modifications to the structure of the storage code itself for optimizing the communication cost. Note that in our model we optimize the communication cost under the assumption that the destination is interested in the same function $A$ of the updated data as well.

The problem of synchronizing files under the insertion/deletion model has been previously studied, using information-theoretic techniques, in [12], [13]. Recall that our model uses substitutions, rather than insertions/deletions, for specifying updates. In [12], a point-to-point setting is considered in which the decoder has access to a partially-deleted version of the input sequence, as side information.
The authors calculate the minimum rate at which the source needs to compress the (undeleted) input sequence so as to permit lossless recovery (i.e., vanishing probability of error for large block-lengths) at the destination. The model of [13] allows both insertions and deletions in the original file to get the updated file. Further, both the source and the destination have access to the original file. Under this setting, the authors study the minimum rate at which the updated file must be encoded at the source, so that lossless recovery is possible at the destination. The problem of synchronizing edited sequences is also considered in [14], where the authors allow insertions, deletions or substitutions, and assume a zero-error recovery model. An important difference between our work and all the works mentioned above is that while we are interested in a specific linear function of the input sequence, all of the above works assume recovery of the entire source sequence at the destination.

Maximally Recoverable Codes: The notion of maximal recoverability in linear codes was originally introduced in [3]. Low field size constructions of partial MDS codes (another name for MRCs with locality) for specific parameter sets appear in [4], [15], [16]. In the terminology of partial MDS codes, these three works respectively provide low field size constructions for the settings of up to one, two and three global parities, and any number of local parities. Explicit constructions based on linearized polynomials for the case of one local parity and any number of global parities appear in [5]. Identifying low-field-size constructions of partial MDS codes for general parameter sets remains an open problem. Various approximations of partial MDS codes like Sector Disk codes [4], [15], [16], STAIR codes [17], and partial-maximally-recoverable codes [18] have been considered in the literature, and these permit low field size constructions for larger parameter sets.
Other known results on maximally recoverable codes include expressions for the weight enumerators of MRCs with locality [19], and the generalized Hamming weights (GHWs) of the MRSC $C$ in terms of the GHWs of the code $C_0$ [20].

Zero-Error Function Computation: Finally, we note that the problem considered in this paper can be considered as one of zero-error function computation. Even though the existing literature does not directly address the problem that we study here, we review some of the relevant works on zero-error function computation, in point-to-point as well as network settings. In [21], the authors examine the problem of computing a general function with zero error, under worst-case as well as average-case models, in the context of sensor networks. One of the problems considered there involves a point-to-point setting, where a source having access to $x$ needs to encode and transmit to a destination which has access to side information $y$. The destination is interested in computing the function $f(x, y)$ with zero error in the worst-case model. Though the point-to-point model in Fig. 1 can be considered as a special case of the setting in [21], our assumptions regarding the specific nature of the decoder side-information, and the use of linear encoding, enable us to show deeper results, especially the connection to maximally recoverable codes.

The problem of zero-error computation of symmetric functions in a network setting is studied in [22], [23]. In [22], the authors characterize the rate at which symmetric functions (e.g., mean, mode, max, etc.) can be computed at

a sink node, while in [23], the authors provide optimal communication strategies for computing symmetric Boolean functions. The works of [24], [25], [26] study zero-error function computation in networks under the framework of network coding. In [24], the authors study the significance of the min-cut of the network when a destination node is interested in computing a general function of a set of independent source nodes. The works of [25], [26] consider sum networks, where a set of destination nodes in the network is interested in computing the sum of the observations corresponding to a set of source nodes.

The organization of the rest of the paper is as follows. Section II contains definitions and facts relating to maximally recoverable codes. Necessary and sufficient conditions for achieving optimal communication cost in the point-to-point case appear in Section III. In Section IV, we present a low field size construction for the setting considered in Example 1. The broadcast setting and the associated problem of finding sandwiched MRSCs are discussed in Section V. Finally, our conclusions appear in Section VI.

Notation: Given a matrix $A \in \mathbb{F}_q^{m \times n}$, $m \leq n$, having rank $m$, we write $C_A$ to denote the $[n, m]$ linear block code over $\mathbb{F}_q$ generated by the rows of $A$. The rank of a matrix $A$ will be denoted as $\rho(A)$. We write $C_A^{\perp}$ to denote the dual code of $C_A$. For any set $S \subseteq [n] = \{1, 2, \ldots, n\}$ and for any $n$-length linear code $C$, we use $C|_S$ to denote the restriction of the code $C$ to the set of coordinates indexed by the set $S$. Also, we write $C^S$ to denote the subcode of $C$ obtained by shortening $C$ to the set of coordinates indexed by the set $S$. In other words, $C^S$ denotes the set of all those codewords in $C$ whose support is confined to $S$. For any codeword $c \in C$ such that $c = [c_1\ c_2\ \cdots\ c_n]$, the support of $c$ is defined as $\text{supp}(c) = \{i \in [n] : c_i \neq 0\}$. We recall the well known fact that $(C|_S)^{\perp} = (C^{\perp})^S$.

II. BACKGROUND ON MAXIMALLY RECOVERABLE CODES

In this section, we review relevant known facts regarding MRSCs, including equivalent definitions, existence of MRSCs and the notion of MRCs with locality.

Definition 2 ($\ell$-cores [6]): Consider an $[n, k]$ code $C$, and let $C^{\perp}$ denote the dual of $C$. Any set $S \subseteq [n]$, $|S| = \ell$, is termed an $\ell$-core of $C$ if $\text{supp}(c) \nsubseteq S$ for every non-zero $c \in C^{\perp}$.

We next state certain equivalent definitions of MRSCs. These are mostly known from various existing works in the context of MRCs with locality, and are straightforward to verify. A proof is however included in Appendix A for the sake of completeness.

Lemma 2.1: Consider an $[n, t]$ code $C_0$, and let $C$ denote an $[n, k]$ subcode of $C_0$. Also let $G_0$ and $G$ denote generator matrices for $C_0$ and $C$, respectively. Then, the following statements are equivalent:

1. $C$ is a maximally recoverable subcode of $C_0$, as defined in Definition 1, i.e., for any set $S \subseteq [n]$, $|S| = k$, such that $\rho(G_0|_S) = k$, we have $\rho(G|_S) = k$.

2. Any set $S \subseteq [n]$, $|S| = k$, which is a $k$-core of $C_0$ is also a $k$-core of $C$.

3. For any set $S \subseteq [n]$, $|S| = k$, which is a $k$-core of $C_0$, we have $\rho(H|_{[n] \setminus S}) = n - k$. Here $H$ denotes a parity check matrix for the code $C$.

4. For any set $S \subseteq [n]$, $|S| \leq k$, $\rho(G_0|_S) = \rho(G|_S)$. Since the dual of a punctured code is a shortened code of the dual code, this is equivalent to saying that $C$ is an MRSC of $C_0$ if and only if for any $S \subseteq [n]$, $|S| \leq k$, $(C^{\perp})^S = (C_0^{\perp})^S$. We note that this is further equivalent to saying that any $k$-sparse vector $c$ that is a codeword of $C^{\perp}$ is also a codeword of $C_0^{\perp}$.

Proof: See Appendix A.

The following lemma, which is a restatement of Lemma 14 in [6], guarantees the existence of MRSCs under a sufficiently large field size.

Lemma 2.2 ([6]): Given any $[n, t]$ code $C_0$ over $\mathbb{F}_q$, and any $k$ such that $k < t$, there exists an $[n, k]$ maximally recoverable subcode $C$ of $C_0$, whenever $q > k n^k$.

We next present an explicit construction of MRSCs based on matrix representations corresponding to linearized polynomials. Linearized polynomial based constructions for MRCs with locality, from a parity-check matrix point

of view, appear in [5]. In the same paper [5], the authors show, again using linearized polynomials and parity-check matrix ideas, how to obtain an explicit construction of an MRSC $C$ of $C_0$, when $G_0$ is a binary matrix. The construction described below$^3$ uses a generator matrix viewpoint, and the technique is similar to the usage of linearized polynomials appearing in the works of [27], [28], [29]. These works used linearized polynomials for constructing locally repairable codes. As we will see, the construction of the sandwiched MRSCs that we present in the context of the broadcast setting (see Fig. 2) is an adaptation of the following construction.

Construction 2.3: Consider the $[n, t]$ code $C_0$ over $\mathbb{F}_q$, having a generator matrix $G_0$. Let $\mathbb{F}_Q$ denote an extension field of $\mathbb{F}_q$, where $Q = q^t$. Also, let $\{\alpha_i \in \mathbb{F}_Q, 1 \leq i \leq t\}$ denote a basis of $\mathbb{F}_Q$ over $\mathbb{F}_q$. Define the elements $\{\beta_i \in \mathbb{F}_Q, 1 \leq i \leq n\}$ as follows:

$$[\beta_1\ \beta_2\ \cdots\ \beta_n] = [\alpha_1\ \alpha_2\ \cdots\ \alpha_t]\, G_0. \qquad (12)$$

Next, consider the $[n, k]$ code $C$ over $\mathbb{F}_Q$ having a generator matrix $G$ given by

$$G = \begin{bmatrix} \beta_1 & \beta_2 & \cdots & \beta_n \\ \beta_1^{q} & \beta_2^{q} & \cdots & \beta_n^{q} \\ \vdots & \vdots & & \vdots \\ \beta_1^{q^{k-1}} & \beta_2^{q^{k-1}} & \cdots & \beta_n^{q^{k-1}} \end{bmatrix}. \qquad (13)$$

The code $C$ is the candidate MRSC of $C_0$, where we consider $C_0$ itself as a code over $\mathbb{F}_Q$. This is formally stated in the following theorem.

Theorem 2.4: Consider an $[n, t]$ code $C_0$ over $\mathbb{F}_q$, having a generator matrix $G_0$, where $G_0 \in \mathbb{F}_q^{t \times n}$. Let $C_0^{(Q)}$ denote the $[n, t]$ code over $\mathbb{F}_Q$ that is also generated by $G_0$, where $Q = q^t$. Then the $[n, k]$ code $C$ over $\mathbb{F}_Q$ obtained in Construction 2.3 is a maximally recoverable subcode of $C_0^{(Q)}$.

Proof: See Appendix B.

A. Maximally Recoverable Codes with Locality (Partial MDS Codes)

Below we give the definition of MRCs with locality, using the notion of MRSCs presented in Definition 1.

Definition 3 (MRCs with Locality [5], [6]): Assume that the parity-check matrix $H_0$ of $C_0$ has the following form:

$$H_0 = \begin{bmatrix} H_{L,1} & & & \\ & H_{L,2} & & \\ & & \ddots & \\ & & & H_{L,\ell} \end{bmatrix}, \qquad (14)$$

where each $H_{L,i}$, $1 \leq i \leq \ell$, generates an $[r + \delta - 1, \delta - 1]$ MDS code, for some fixed $r, \delta$. The code $C_0^{\perp}$ has parameters $[n = \ell(r + \delta - 1),\ n - t = \ell(\delta - 1)]$.
Then an $[n, k]$ MRSC $C$ of $C_0$ will be referred to as an $[n, k](r, \delta)$ maximally recoverable code with locality.

We would like to point out that the precise form of $H_{L,i}$, $1 \leq i \leq \ell$, is not imposed by the above definition. In other words, while designing an $[n, k](r, \delta)$ MRC with locality, we are free to choose any $H_{L,i}$, $1 \leq i \leq \ell$, that generates an $[r + \delta - 1, r, \delta]$ MDS code. MRCs with locality are also known in the literature as partial MDS codes [4]. An $[n, k](r, \delta)$ MRC with locality corresponds to an $[m', n'](r', s')$ partial MDS code$^4$, where

$$m' = \frac{n}{r + \delta - 1} \qquad (15)$$

$$n' = r + \delta - 1 \qquad (16)$$

$$r' = \delta - 1 \qquad (17)$$

$$s' = \frac{n}{r + \delta - 1}\, r - k. \qquad (18)$$

$^3$ We do not claim novelty for Construction 2.3, since the idea used here follows directly from works like [27], [28], [29].

$^4$ The notation used for partial MDS codes corresponds to the one used in [4].
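The parameter correspondence in (15)-(18) is a simple change of variables, sketched below for concreteness. The function name `partial_mds_params` and the numerical values in the usage line are hypothetical; the only assumptions made are that $r + \delta - 1$ divides $n$ and that the resulting number of global parities is non-negative.

```python
def partial_mds_params(n, k, r, delta):
    """Map an [n, k](r, delta) MRC with locality to (m', n', r', s') as in (15)-(18)."""
    block = r + delta - 1                  # length of each local code, n' in (16)
    if n % block != 0:
        raise ValueError("r + delta - 1 must divide n")
    m_rows = n // block                    # number of local codes (array rows), m' in (15)
    local_parities = delta - 1             # r' in (17)
    global_parities = m_rows * r - k       # s' in (18), equal to dim(C_0) - dim(C)
    if global_parities < 0:
        raise ValueError("k exceeds the dimension of C_0")
    return m_rows, block, local_parities, global_parities


# Hypothetical instance: n = 12, k = 6, local codes with r = 3, delta = 2.
params = partial_mds_params(12, 6, 3, 2)
```

For this instance the array has 3 rows of 4 symbols, one local parity per row, and 3 global parities.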

The idea here is that we see the code as a two-dimensional array code having $m'$ rows and $n'$ columns. Each row of the array is an $[n' = r + \delta - 1,\ n' - r' = r]$ MDS code having $r' = \delta - 1$ local parities. The parameter $s' = \dim(C_0) - \dim(C)$ refers to the number of global parity symbols. An $[m', n'](r', s')$ partial MDS code can tolerate any combination of $r'$ erasures in each row, and an additional $s'$ erasures among the remaining elements in the array. In this write-up we will follow the terminology of MRCs with locality.

III. COMMUNICATION COST FOR THE POINT-TO-POINT CASE

In this section, we obtain necessary and sufficient conditions for solving the point-to-point problem setting in Fig. 1. An encoder-decoder pair $(H, D)$ will be referred to as a valid scheme for the problem in Fig. 1 if the decoder's estimate of $A(X^n + E^n)$ is correct for all $X^n, E^n \in \mathbb{F}_q^n$ such that Hamming wt.$(E^n) \leq \epsilon$. Recall our assumption that we deal with a zero-probability-of-error, worst-case-scenario model. Also, recall that the communication cost associated with the encoder $H$ is given by $\ell = \rho(H)$, where the parameters $n$, $q$, $\epsilon$ and $A$ of the system are assumed to be fixed a priori. Without loss of generality, we assume that the $m \times n$ matrix $A$ has rank $m$.

A. Necessary Conditions

Theorem 3.1: Consider any valid scheme $(H, D)$ for the communication problem described in Fig. 1. Let $C_H$ and $C_A$ denote the linear codes generated by the rows of the matrices $H$ and $A$, respectively. Also, consider the code $C = C_H \cap C_A$, and let $B$ denote a generator matrix for $C$. Then, the following conditions must necessarily be satisfied:

1. If $Y^n$ is any $2\epsilon$-sparse vector such that $AY^n \neq 0$, then it must also be true that $BY^n \neq 0$.

2. $\dim(C) \geq \min(m, 2\epsilon)$.

Combining the two parts, it then follows that if $\dim(C) = \min(m, 2\epsilon)$, then $C$ must be a maximally recoverable subcode of $C_A$.

Proof: 1. We will prove the first part by contradiction. Let us suppose that there exists a non-zero $2\epsilon$-sparse vector $Y^n$ such that $Y^n \in C^{\perp}$, but $Y^n \notin C_A^{\perp}$.
In this case, we will show that there exist two distinct pairs $(X_1^n, E_1^n)$ and $(X_2^n, E_2^n)$, with $E_1^n$ and $E_2^n$ being $\epsilon$-sparse, such that $H(X_1^n + E_1^n) = H(X_2^n + E_2^n)$ and $AX_1^n = AX_2^n$, but $AE_1^n \neq AE_2^n$. Thus, no decoder $D$ can resolve between the two pairs successfully, and we would have contradicted the validity of our scheme. Towards constructing the desired $(X^n, E^n)$ pairs, note that $Y^n$ can be expressed as $Y^n = E_1^n - E_2^n$ for some two distinct $\epsilon$-sparse vectors $E_1^n$ and $E_2^n$. Also, since $C^{\perp} = C_H^{\perp} + C_A^{\perp}$, there exist $U^n \in C_A^{\perp}$ and $V^n \in C_H^{\perp}$ such that $E_1^n - E_2^n = U^n + V^n$. We now choose $X_1^n = -U^n$ and $X_2^n = 0$. In this case, we see that $H(X_1^n + E_1^n) = H(E_2^n + V^n) = HE_2^n = H(X_2^n + E_2^n)$ and $AX_1^n = AX_2^n = 0$, but $AE_1^n \neq AE_2^n$, which contradicts the validity of our scheme. This completes the proof of the first part.

2. Assume that $\dim(C) < \min(m, 2\epsilon)$, which implies that $\dim(C^{\perp}) > n - \min(m, 2\epsilon)$. In this case, it is straightforward to see that there exists a basis $L$ of $C^{\perp}$ consisting entirely of vectors that are $2\epsilon$-sparse. Also, note that $\dim(C) < \min(m, 2\epsilon) \leq \dim(C_A)$, which implies that $\dim(C^{\perp}) > \dim(C_A^{\perp})$. Thus at least one of the elements of the set $L$ is not contained in $C_A^{\perp}$. In other words, there exists a non-zero $2\epsilon$-sparse vector $Y^n$ such that $Y^n \in C^{\perp}$, but $Y^n \notin C_A^{\perp}$. The rest of the proof (to arrive at a contradiction) follows from the proof of Part 1 of the theorem. We thus conclude that $\dim(C) \geq \min(m, 2\epsilon)$.

Finally, assume that $m > 2\epsilon$, and note that Part 1 implies that if $S \subseteq [n]$, $|S| = 2\epsilon$, is such that $\rho(A|_S) = 2\epsilon$, then $\rho(B|_S) = 2\epsilon$. It then follows from Definition 1 that $C$ must be an MRSC of $C_A$, if $\dim(C) = 2\epsilon$.

Corollary 3.2:

a) The communication cost $\ell$ associated with any valid scheme for the problem given in Fig. 1 is lower bounded by $\ell \geq \min(m, 2\epsilon)$.

b) If $m \leq 2\epsilon$ and $\ell = m$ (optimal communication cost), then the encoding matrix $H$ is necessarily given by $H = A$.

c) If $m > 2\epsilon$ and $\ell = 2\epsilon$ (once again optimal communication cost), then it must necessarily be true that the code $C_H$ is an MRSC of $C_A$.
Remark 1: The bound $\ell \geq \min(m, 2\epsilon)$ in Part a) of the above corollary was already proved in [9] for an average case setting, under the assumption that the vectors $X^n$ and $X^n + E^n$ are picked uniformly at random from the set of all vectors satisfying Hamming wt.$(E^n) \leq \epsilon$. This result in [9] can be directly used to argue the correctness of Part a) of the above corollary, for the worst-case setting that we consider here. However, the connection to maximally recoverable codes, especially the necessity of MRSCs for achieving optimality (Part c) of the above

10 corollry is novelty of this pper, nd s we we will see next forms the bsis of the our chievbility scheme for n rbitry mtrix A chosen for sufficiently lrge field size. B. Optiml Achievble Scheme bsed on Mximlly Recoverble Subcodes, when m > 2ɛ We now present vlid scheme for the cse m > 2ɛ, hving optiml communiction cost l = 2ɛ. From Corollry 3.2, we know tht the [n, 2ɛ] code C H must necessrily be n MRSC of C A. We lso know from Lemm 2.2 tht such n MRSC lwys exists whenever the field size q > 2ɛn 2ɛ. Since C H is subcode of C A, there exists 2ɛ m mtrix S such tht H = SA. Given the mtrices H nd S, we now refer to Fig. 3 for schemtic of the decoder to be used. The vrious steps performed by the decoder to estimte A(X n + E n re s follows: (1 Given the encoder output H(X n + E n nd the side informtion AX n, obtin HE n s HE n = H(X n + E n S (AX n. (19 (2 Determine ny ɛ-sprse vector Ê n such tht HE n = HÊ n. To mke the decoder deterministic, ssume tht Ê n is the lest Hmming weight vector such tht HE n = HÊ n. Since the vector (E n Ê n is 2ɛ-sprse, nd since C H is 2ɛ-dimensionl MRSC of C A, we know from Prt 4. of Lemm 2.1 tht HE n = HÊ n = AE n = AÊ n. (3 Compute the desired estimte s AX n + AÊ n = A(X n + E n. + Syndrome Decoder - + + Fig. 3. The decoder used in the chievbility proof for the cse m > 2ɛ. Remrk 2: We note tht it is possible to construct vlid scheme, not necessrily optiml, using ny mtrix H which stisfies the conditions of Theorem 3.1. This lso implies tht if (H, D is ny vlid scheme for the problem, nd if C A C H C A, then second vlid scheme for the problem hving lower communiction cost cn be constructed using the mtrix B for encoding, where B is genertor mtrix for the code C A C H. This follows becuse the mtrix B will lso stisfy the conditions of the Theorem 3.1. IV. 
UPDATING LINEARLY ENCODED STRIPED-DATA-FILES In this section, we present low-field MRSC construction for the setting of Exmple 1, where we llow striping nd encoding by n rbitrry [N, K] liner code C st over F q. Recll tht in Exmple 1, we pply the point-to-point model in Fig. 1 in order to updte the coded dt in ny one of the N storge nodes in the system. The K-length coding vector for one of the storge nodes is given by = [ 1 2 K ], i F q. The vector X n denotes the uncoded dt file, which is divided into m stripes ech of length K symbols. The first stripe consists of the first K symbols of X n, the second stripe consists of symbols X K+1,..., X 2k, nd so on. We ssume tht the length n = mk. The overll coding mtrix A, which corresponds to the desired function t the destintion, is given by A =. (20
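The three decoder steps of Section III-B can be tried out on a small striped matrix of the form (20). The following is a minimal sketch under stated assumptions, not the paper's implementation: the field $\mathbb{F}_5$, the parameters $m = 3$, $K = 2$, $\epsilon = 1$, the coding vector `a`, and the Vandermonde matrix `S` are all hypothetical choices made for illustration, and step (2) is done by exhaustive search rather than by an efficient syndrome decoder. The exhaustive check in `check_all_sparse_updates`, rather than any claim about `S`, is what confirms that decoding succeeds for these toy parameters.

```python
from itertools import combinations, product

P = 5                 # prime field size (toy assumption)
M, K, EPS = 3, 2, 1   # stripes, stripe length, sparsity (toy assumptions)
N = M * K
a = [1, 2]            # hypothetical coding vector with nonzero entries

def matvec(mat, vec):
    """Matrix-vector product over F_P."""
    return [sum(r * v for r, v in zip(row, vec)) % P for row in mat]

def matmul(x, y):
    """Matrix product over F_P."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) % P
             for j in range(len(y[0]))] for i in range(len(x))]

# A as in (20): row i carries the vector a in the columns of stripe i.
A = [[a[j - i * K] if i * K <= j < (i + 1) * K else 0 for j in range(N)]
     for i in range(M)]

# Hypothetical choice: S is a (2*EPS) x M Vandermonde matrix, and H = S A.
S = [[1, 1, 1],
     [1, 2, 3]]
H = matmul(S, A)

def sparse_vectors(n, eps):
    """All vectors over F_P of Hamming weight at most eps, lightest first."""
    for w in range(eps + 1):
        for supp in combinations(range(n), w):
            for vals in product(range(1, P), repeat=w):
                v = [0] * n
                for pos, val in zip(supp, vals):
                    v[pos] = val
                yield v

def decode(encoded, side_info):
    """Estimate A(X + E) from the encoder output H(X + E) and side info A X."""
    # Step (1): H E = H (X + E) - S (A X), as in (19).
    s_ax = matvec(S, side_info)
    syndrome = [(u - v) % P for u, v in zip(encoded, s_ax)]
    # Step (2): least-weight eps-sparse E_hat with H E_hat = H E (brute force).
    e_hat = next(v for v in sparse_vectors(N, EPS) if matvec(H, v) == syndrome)
    # Step (3): A X + A E_hat.
    return [(u + v) % P for u, v in zip(side_info, matvec(A, e_hat))]

def check_all_sparse_updates(x):
    """True if decoding is correct for every eps-sparse update of x."""
    for e in sparse_vectors(N, EPS):
        x_new = [(u + v) % P for u, v in zip(x, e)]
        if decode(matvec(H, x_new), matvec(A, x)) != matvec(A, x_new):
            return False
    return True
```

Calling `check_all_sparse_updates(x)` for a message `x` of length 6 runs the decoder on all 25 updates of weight at most one. The encoder sends only $2\epsilon = 2$ symbols, compared with $m = 3$ symbols if the updated function value were sent directly.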

If we design the optimal encoder $H$ based on the existence of MRSCs from Lemma 2.2, then the field size must satisfy $q > 2\epsilon\, n^{2\epsilon}$. We now present an alternate, low field size, explicit construction of MRSCs of $\mathcal{C}_A$, when the matrix $A$ takes on the special form given in (20). For the coding vector $a = [a_1\ a_2\ \cdots\ a_K]$, if we assume that $a_i \neq 0$, $i \in [K]$, then note that an $[n, 2\epsilon]$ MRSC $\mathcal{C}_H$ of $\mathcal{C}_A$ is in fact an $[n, 2\epsilon](K, 1)$ MRC with locality (see Definition 3), i.e., a code with locality where all the local codes are scaled repetition codes. The problem of low field size constructions of MRCs with locality is in general an open problem. However, as we show here, for the special case when the local codes appear as repetition codes, low field size constructions are easily identified.

We divide the discussion into two parts. We will first show how MRSCs of any general code $\mathcal{C}_0$ can be obtained by first suitably extending $\mathcal{C}_0$, and then shortening the extended code. Given this result, we then show that the problem of constructing an MRSC of $\mathcal{C}_A$, where $A$ is as given by (20), is as simple as finding an $[m, m - 2\epsilon]$ MDS code over $\mathbb{F}_q$. If we employ Reed-Solomon codes, a field size $q > m$ is sufficient for the construction.

A. An Equivalent Definition of MRSCs via Code Extensions

Consider an $[n, t]$ code $\mathcal{C}_0$ over $\mathbb{F}_q$ having generator matrix $G_0$. Define a new $[n + \Delta, t]$ code $\mathcal{C}_0^{(e)}$ over $\mathbb{F}_q$ whose generator matrix $G_0^{(e)}$ is given by
$$G_0^{(e)} = [\, G_0 \mid Q \,], \tag{21}$$
for some $t \times \Delta$ matrix $Q$ over $\mathbb{F}_q$. The code $\mathcal{C}_0^{(e)}$ will be referred to as an extension of $\mathcal{C}_0$. In the following lemma, we show that MRSCs of $\mathcal{C}_0$ exist if and only if certain extensions of $\mathcal{C}_0$ exist.

Lemma 4.1: Consider an $[n + \Delta, t]$ extension $\mathcal{C}_0^{(e)}$ of the $[n, t]$ code $\mathcal{C}_0$ such that (i) $\Delta < t$, and (ii) for any $S \subseteq [n]$, $|S| = t - \Delta$, we have
$$\rho\left(G_0^{(e)}\big|_{S \cup \{n+1, \ldots, n+\Delta\}}\right) = \rho\left(G_0|_S\right) + \Delta. \tag{22}$$
Then the shortened code $\left(\mathcal{C}_0^{(e)}\right)^{[n]}$ is an $[n, k = t - \Delta]$ MRSC of $\mathcal{C}_0$. Conversely, suppose that there exists an $[n, k]$ MRSC of the $[n, t]$ code $\mathcal{C}_0$, where $k = t - \Delta$ for some $\Delta < t$. Then, there exists an $[n + \Delta, t]$ extension $\mathcal{C}_0^{(e)}$ of the $[n, t]$ code $\mathcal{C}_0$ that satisfies (22) for any $S \subseteq [n]$, $|S| = t - \Delta$.
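As a numerical illustration of the forward part of Lemma 4.1, the sketch below checks condition (22) by brute force on a toy extension, shortens the extended code, and then tests the rank condition that the paper recalls from Definition 1 in (32): every size-$k$ set $S$ with $\rho(G_0|_S) = k$ must also satisfy $\rho(G|_S) = k$. Everything concrete here is an assumption made for illustration: the field $\mathbb{F}_5$, the matrix `G0` (a striped matrix of the form (20) with $a = [1\ 2]$ and three stripes), the all-ones column `Q` with $\Delta = 1$, and the hand-picked basis `U` of the vectors $u$ with $uQ = 0$.

```python
from itertools import combinations

P = 5  # prime field size (toy assumption)

def rank_mod_p(mat, p=P):
    """Rank of a matrix (list of rows) over F_p by Gaussian elimination."""
    mat = [[x % p for x in row] for row in mat]
    rank = 0
    ncols = len(mat[0]) if mat else 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(mat)) if mat[r][col]), None)
        if pivot is None:
            continue
        mat[rank], mat[pivot] = mat[pivot], mat[rank]
        inv = pow(mat[rank][col], -1, p)          # modular inverse (Python 3.8+)
        mat[rank] = [(x * inv) % p for x in mat[rank]]
        for r in range(len(mat)):
            if r != rank and mat[r][col]:
                f = mat[r][col]
                mat[r] = [(x - f * y) % p for x, y in zip(mat[r], mat[rank])]
        rank += 1
    return rank

def columns(mat, idx):
    """Restriction of a matrix to the columns listed in idx."""
    return [[row[j] for j in idx] for row in mat]

def matmul(x, y, p=P):
    """Matrix product over F_p."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) % p
             for j in range(len(y[0]))] for i in range(len(x))]

# Toy instance; all values are illustrative assumptions.
n, t, delta = 6, 3, 1
G0 = [[1, 2, 0, 0, 0, 0],
      [0, 0, 1, 2, 0, 0],
      [0, 0, 0, 0, 1, 2]]
Q = [[1], [1], [1]]
G_ext = [g + q for g, q in zip(G0, Q)]            # [G0 | Q] as in (21)

def satisfies_22():
    """Brute-force check of (22) over all S with |S| = t - delta."""
    ext = list(range(n, n + delta))
    return all(
        rank_mod_p(columns(G_ext, list(S) + ext))
        == rank_mod_p(columns(G0, S)) + delta
        for S in combinations(range(n), t - delta))

# Shortening on [n]: keep the codewords u [G0 | Q] with u Q = 0 and drop
# the last delta (zero) coordinates. U is a hand-picked basis of such u.
U = [[1, 4, 0],
     [0, 1, 4]]
G_short = matmul(U, G0)

def is_mrsc(G, G_big, k):
    """Check rank(G_big|_S) = k  =>  rank(G|_S) = k for all |S| = k."""
    return all(rank_mod_p(columns(G, S)) == k
               for S in combinations(range(len(G[0])), k)
               if rank_mod_p(columns(G_big, S)) == k)
```

For this instance `satisfies_22()` and `is_mrsc(G_short, G0, t - delta)` both hold, whereas the subcode spanned by the first two rows of `G0` fails the second check. A brute-force search like this is only practical for very small $n$.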
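Proof: We only prove the forward part here; the proof of the converse appears in Appendix C. To prove the forward part, we assume the existence of an extension $\mathcal{C}_0^{(e)}$ satisfying (22), and show that the shortened code $\mathcal{C} \triangleq \left(\mathcal{C}_0^{(e)}\right)^{[n]}$ is an $[n, k = t - \Delta]$ MRSC of $\mathcal{C}_0$. From Part 4 of Lemma 2.1, we know it is sufficient to prove that
$$\left(\mathcal{C}^{\perp}\right)^{S} = \left(\mathcal{C}_0^{\perp}\right)^{S}, \quad \forall S \subseteq [n],\ |S| \leq k. \tag{23}$$
Let us suppose (23) is not true; i.e., there exists a set $S \subseteq [n]$, $|S| \leq k$, such that $\left(\mathcal{C}_0^{\perp}\right)^{S} \subsetneq \left(\mathcal{C}^{\perp}\right)^{S}$. In other words, there exists a vector $c \in \mathcal{C}^{\perp}$ such that $c \notin \mathcal{C}_0^{\perp}$ and $\text{supp}(c) \subseteq S$. Next, note that if $H_0$ denotes a parity check matrix for $\mathcal{C}_0$, then parity check matrices $H_0^{(e)}$ and $H$ for the codes $\mathcal{C}_0^{(e)}$ and $\mathcal{C}$ can be respectively given by
$$H_0^{(e)} = \begin{bmatrix} H_0 & 0_{(n-t) \times \Delta} \\ H_e & I_{\Delta} \end{bmatrix} \quad \text{and} \quad H = \begin{bmatrix} H_0 \\ H_e \end{bmatrix}, \tag{24}$$
where $H_e$ is some $\Delta \times n$ matrix over $\mathbb{F}_q$, and $I_{\Delta}$ denotes the identity matrix. The fact that the parity-check matrix $H$ for the code $\mathcal{C}$ is as given above follows from the fact that the dual of a shortened code is simply the punctured code of the dual code. In this case, corresponding to the vector $c$, there exists a vector $c^{(e)} \in \left(\mathcal{C}_0^{(e)}\right)^{\perp}$ such that $\text{supp}\left(c^{(e)}\right) \subseteq S \cup \{n+1, \ldots, n+\Delta\}$, $c = \left(c^{(e)}\right)\big|_{[n]}$, and the set $\text{supp}\left(c^{(e)}\right) \cap \{n+1, \ldots, n+\Delta\}$ is not empty. The last part follows from our assumption that $c \notin \mathcal{C}_0^{\perp}$. The presence of the vector $c^{(e)} \in \left(\mathcal{C}_0^{(e)}\right)^{\perp}$ satisfying the above conditions means that
$$\rho\left(G_0^{(e)}\big|_{S \cup \{n+1, \ldots, n+\Delta\}}\right) < \rho\left(G_0|_S\right) + \Delta, \tag{25}$$
which is a contradiction to our assumption that the extension $\mathcal{C}_0^{(e)}$ satisfies (22). From this we conclude that (23) is indeed true, and this completes the proof of the forward part.

B. Low Field Size MRSCs for Updating Striped-Data-Files

Construction 4.2: Consider the matrix $A \in \mathbb{F}_q^{m \times n}$ as given in (20), and the associated $[n, m]$ linear code $\mathcal{C}_A$ over $\mathbb{F}_q$. Next consider the $[n + m - 2\epsilon, m]$ extended code $\mathcal{C}_A^{(e)}$ over $\mathbb{F}_q$, whose generator matrix $A^{(e)}$ is given by
$$A^{(e)} = [\, A \mid Q \,], \tag{26}$$
where $Q \in \mathbb{F}_q^{m \times (m - 2\epsilon)}$, and is such that $Q^T$ generates an $[m, m - 2\epsilon]$ MDS code over $\mathbb{F}_q$. The candidate for the desired MRSC $\mathcal{C}_H$ is given by
$$\mathcal{C}_H = \left(\mathcal{C}_A^{(e)}\right)^{[n]}. \tag{27}$$
Note that $\mathcal{C}_H$ is an $[n, 2\epsilon]$ code. This completes the description of the construction.

Theorem 4.3: The $[n, 2\epsilon]$ code $\mathcal{C}_H$ obtained in Construction 4.2 is an MRSC of the $[n, m]$ code $\mathcal{C}_A$, where $A$ is given in (20).

Proof: It is straightforward to check that the extended matrix $A^{(e)}$ in (26) satisfies the condition given in (22), i.e., for any $S \subseteq [n]$, $|S| = 2\epsilon$, we have
$$\rho\left(A^{(e)}\big|_{S \cup \{n+1, \ldots, n+m-2\epsilon\}}\right) = \rho\left(A|_S\right) + (m - 2\epsilon). \tag{28}$$
The proof now follows from the forward part of Lemma 4.1.

Remark 3: The converse part of Lemma 4.1 can be used to show that given any $[n, 2\epsilon]$ MRSC $\mathcal{C}_H$ of $\mathcal{C}_A$, and if the vector $a$ is such that $a_i \neq 0\ \forall i \in [K]$, then there exists an extended matrix $A^{(e)} = [\, A \mid Q \,]$ such that $Q^T$ generates an $[m, m - 2\epsilon]$ MDS code over $\mathbb{F}_q$. Note that the assumption $a_i \neq 0$ was not imposed in Theorem 4.3.

V. BROADCASTING TO RECEIVERS INTERESTED IN DIFFERENT FUNCTIONS

In the broadcast setting (see Fig. 2), we consider two receivers that are interested in computing two separate linear functions of the message vector. The functions correspond to the matrices $A$ and $B$, respectively. Though the rest of the parameters are similar to those of the point-to-point case, we quickly describe them here again for the sake of clarity. The vector $X^n \in \mathbb{F}_q^n$ denotes the initial source message. The two receivers hold $AX^n$ and $BX^n$ as side information, respectively. The matrices $A$ and $B$ have sizes $m_A \times n$ and $m_B \times n$, respectively, where $m_A, m_B \leq n$.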
Without loss of generality, we assume that $\rho(A) = m_A$ and $\rho(B) = m_B$. The updated source message is given by the vector $X^n + E^n$, where the difference-vector $E^n$ is $\epsilon$-sparse. Encoding is carried out via the $l \times n$ matrix $H$. The goal is to recover the functions $A(X^n + E^n)$ and $B(X^n + E^n)$ at the respective receivers. The decoders used at the two receivers are denoted by $D_1$ and $D_2$, respectively. Once again, we assume a zero-probability-of-error, worst-case-scenario model. We assume knowledge of the functions $A$, $B$ and the parameter $\epsilon$ while designing the encoder $H$. The communication cost for the model is given by $l$, assuming that the parameters $n$, $q$, $\epsilon$, $A$ and $B$ are fixed. The triplet $(H, D_1, D_2)$ will be referred to as a valid scheme for the broadcast problem if both the decoders' estimates are correct for all $X^n, E^n \in \mathbb{F}_q^n$ such that Hamming wt.$(E^n) \leq \epsilon$.

Our goal in this section is to identify necessary and sufficient conditions on valid schemes for the broadcast problem. Specifically, we are interested in characterizing the minimum communication cost (among valid schemes) that can be achieved for the setting. We divide the discussion into three parts. We first consider the special case when the two linear codes $\mathcal{C}_A$ and $\mathcal{C}_B$ intersect trivially, i.e., $\mathcal{C}_A \cap \mathcal{C}_B = \{0\}$. The optimal communication cost for this case is straightforward to compute, given the observations from the point-to-point case. We then present necessary and sufficient conditions for the existence of sandwiched MRSCs. Recall that in the problem of sandwiched MRSCs, we begin with a code $\mathcal{C}_0$ and a subcode $\hat{\mathcal{C}}$ of $\mathcal{C}_0$. The goal is to identify an MRSC $\mathcal{C}$ of $\mathcal{C}_0$ such that $\hat{\mathcal{C}} \subseteq \mathcal{C}$. Finally, we will show how the concept of sandwiched MRSCs can be used to identify valid schemes having optimal communication cost for the case of arbitrary matrices $A$ and $B$. The results will be illustrated by analyzing the settings in Examples 2 and 3.

A. Optimal Communication Cost for the Case $\mathcal{C}_A \cap \mathcal{C}_B = \{0\}$

Let $(H, D_1, D_2)$ denote a valid scheme for the case $\mathcal{C}_A \cap \mathcal{C}_B = \{0\}$. For decoders $D_1$ and $D_2$ to be successful, we know from Theorem 3.1 that $\dim(\mathcal{C}_A \cap \mathcal{C}_H) \geq \min(m_A, 2\epsilon)$ and $\dim(\mathcal{C}_B \cap \mathcal{C}_H) \geq \min(m_B, 2\epsilon)$, respectively. From this it follows that for the case when $\dim(\mathcal{C}_A + \mathcal{C}_B) = \dim(\mathcal{C}_A) + \dim(\mathcal{C}_B)$, the communication cost $l = \dim(\mathcal{C}_H)$ is lower bounded by
$$l \geq \min(m_A, 2\epsilon) + \min(m_B, 2\epsilon). \tag{29}$$
Given the achievability result from Section III-B, we see that the above bound is trivially achieved by encoding separately for the two receivers, where each of the two encodings is optimal for the respective receiver. We formally state the above observations in the following theorem:

Theorem 5.1: The optimal communication cost for the broadcast setting in Fig. 2 is given by
$$l = \min(m_A, 2\epsilon) + \min(m_B, 2\epsilon), \tag{30}$$
whenever the codes $\mathcal{C}_A$ and $\mathcal{C}_B$ intersect trivially. Achievability is guaranteed under the assumption that the field size $q > 2\epsilon\, n^{2\epsilon}$.

The following example illustrates the case under consideration.

Example 2 Revisited: Consider the setting in Example 2, where striping and encoding of a data file was done by an $[N, K]$ linear code over $\mathbb{F}_q$. Assume that the code is an $[N, K]$ MDS code with $K \geq 2$. We apply the broadcast model to simultaneously update the contents of two out of the $N$ storage nodes. Also, recall our assumption that the $K$-length coding vectors associated with the two storage nodes are given by $a = [a_1\ a_2\ \ldots\ a_K]$, $b = [b_1\ b_2\ \ldots\ b_K]$, $a_i, b_i \in \mathbb{F}_q$, $i \in [K]$, and the overall coding matrices $A$ and $B$ corresponding to the $m$ stripes are then given by
$$A = \begin{bmatrix} a & & & \\ & a & & \\ & & \ddots & \\ & & & a \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} b & & & \\ & b & & \\ & & \ddots & \\ & & & b \end{bmatrix}, \tag{31}$$
where both $A$ and $B$ have size $m \times mK$. Under the assumption that the $[N, K]$ code is MDS with $K \geq 2$, it is straightforward to see that the codes $\mathcal{C}_A$ and $\mathcal{C}_B$ corresponding to the matrices $A$ and $B$ intersect trivially, i.e., $\mathcal{C}_A \cap \mathcal{C}_B = \{0\}$. In this case, from Theorem 5.1, we know that broadcasting does not help; the source must transmit as though it is individually updating the two storage nodes.

B. Sandwiched Maximally Recoverable Subcodes

We now take a slight detour, and identify necessary and sufficient conditions for the existence of sandwiched MRSCs. We assume that we are given an $[n, t]$ code $\mathcal{C}_0$ and an $[n, s]$ subcode $\hat{\mathcal{C}}$ of $\mathcal{C}_0$. The question that we are interested in is whether we can find an $[n, k]$ MRSC $\mathcal{C}$ of $\mathcal{C}_0$ such that $\hat{\mathcal{C}}$ is a subcode of $\mathcal{C}$. We assume that the parameters $s, k, t$ satisfy the relation $s \leq k \leq t$. Also, let $G$, $G_0$ and $\hat{G}$ denote generator matrices for the codes $\mathcal{C}$, $\mathcal{C}_0$ and $\hat{\mathcal{C}}$, respectively. Recall from Definition 1 that for $\mathcal{C}$ to be an MRSC of $\mathcal{C}_0$, it must be true that
$$\rho(G_0|_S) = k \implies \rho(G|_S) = k, \quad \forall S \subseteq [n],\ |S| = k. \tag{32}$$
Thus, a necessary condition for the existence of the sandwiched MRSC $\mathcal{C}$ can be given as follows:
$$\rho(G_0|_S) = k \implies \rho(\hat{G}|_S) = s, \quad \forall S \subseteq [n],\ |S| = k. \tag{33}$$
Note that the condition in (33) is equivalent to saying that $\rho(\hat{G}|_S) = s$ for any $S \subseteq [n]$, $|S| = k$, which is a $k$-core of $\mathcal{C}_0$. In the following lemma, we show the sufficiency of the condition in (33) for the existence of sandwiched MRSCs, under the assumption of a large enough finite field size $q$.

Lemma 5.2: Suppose that we are given an $[n, t]$ code $\mathcal{C}_0$ over $\mathbb{F}_q$, and an $[n, s]$ subcode $\hat{\mathcal{C}}$, whose generator matrices $G_0$ and $\hat{G}$ satisfy (33), where $k$ is such that $s < k < t$. Then, there exists an $[n, k]$ code $\mathcal{C}$ such that $\hat{\mathcal{C}} \subseteq \mathcal{C}$ and $\mathcal{C}$ is a maximally recoverable subcode of $\mathcal{C}_0$, whenever $q > \binom{n}{k}$.

Proof: See Appendix D.

We next present an alternate, explicit construction of sandwiched MRSCs using linearized polynomials.

Construction 5.3: Consider the $[n, t]$ code $\mathcal{C}_0$ over $\mathbb{F}_q$, having generator matrix $G_0$. Also, consider the $[n, s]$ subcode $\hat{\mathcal{C}}$ of $\mathcal{C}_0$, having generator matrix $\hat{G}$. Without loss of generality, assume that the generator matrix $G_0$ is given by
$$G_0 = \begin{bmatrix} \hat{G} \\ B \end{bmatrix}, \tag{34}$$
for some $(t - s) \times n$ matrix $B$. Note that we have $\rho(B) = t - s$. Let $\mathbb{F}_Q$ denote an extension field of $\mathbb{F}_q$, where $Q = q^{t-s}$. Further, let $\{\alpha_i \in \mathbb{F}_Q,\ 1 \leq i \leq t - s\}$ denote a basis of $\mathbb{F}_Q$ over $\mathbb{F}_q$. Define the elements $\{\beta_i \in \mathbb{F}_Q,\ 1 \leq i \leq n\}$ as follows:
$$[\beta_1\ \beta_2\ \cdots\ \beta_n] = [\alpha_1\ \alpha_2\ \cdots\ \alpha_{t-s}]\, B. \tag{35}$$
Next, consider the code $\mathcal{C}$ over $\mathbb{F}_Q$ having generator matrix $G$ given by
$$G = \begin{bmatrix} & \hat{G} & & \\ \beta_1 & \beta_2 & \cdots & \beta_n \\ \beta_1^{q} & \beta_2^{q} & \cdots & \beta_n^{q} \\ \vdots & \vdots & & \vdots \\ \beta_1^{q^{k-s-1}} & \beta_2^{q^{k-s-1}} & \cdots & \beta_n^{q^{k-s-1}} \end{bmatrix}. \tag{36}$$
The code $\mathcal{C}$ is the candidate for the $[n, k]$ sandwiched MRSC of $\mathcal{C}_0$, where we consider $\mathcal{C}_0$ and $\hat{\mathcal{C}}$ themselves as codes over $\mathbb{F}_Q$. This is formally stated in the following theorem.

Theorem 5.4: Consider an $[n, t]$ code $\mathcal{C}_0$ over $\mathbb{F}_q$, having generator matrix $G_0 \in \mathbb{F}_q^{t \times n}$. Also, consider the $[n, s]$ subcode $\hat{\mathcal{C}}$ of $\mathcal{C}_0$, having generator matrix $\hat{G}$ such that the following condition is satisfied:
$$\rho(G_0|_S) = k \implies \rho(\hat{G}|_S) = s, \quad \forall S \subseteq [n],\ |S| = k, \tag{37}$$
where $k$ is such that $s \leq k \leq t$. Next, let $\mathcal{C}_0^{(Q)}$ and $\hat{\mathcal{C}}^{(Q)}$ denote the $[n, t]$ and $[n, s]$ codes over $\mathbb{F}_Q$ that are respectively generated by $G_0$ and $\hat{G}$, where $Q = q^{t-s}$. Then the $[n, k]$ code $\mathcal{C}$ over $\mathbb{F}_Q$ obtained in Construction 5.3 is a maximally recoverable subcode of $\mathcal{C}_0^{(Q)}$ such that $\hat{\mathcal{C}}^{(Q)} \subseteq \mathcal{C}$.

Proof: See Appendix E.

C. Optimal Communication Cost for Arbitrary Matrices $A$ and $B$ under the Broadcast Setting

We now use the concept of sandwiched MRSCs and characterize the optimal communication cost for the broadcast problem when the codes $\mathcal{C}_A$ and $\mathcal{C}_B$ have a non-trivial intersection. Below, we first give a qualitative description of our approach, before presenting technical details. Consider a valid scheme $(H, D_1, D_2)$ that we design for the problem.
Let $H_A$ and $H_B$ denote the rows in the row-space of $H$ which help the decoders $D_1$ and $D_2$ recover $A(X^n + E^n)$ and $B(X^n + E^n)$, respectively. Also let $\mathcal{C}_{H_A}$ and $\mathcal{C}_{H_B}$ denote the linear codes generated by $H_A$ and $H_B$, respectively. Our approach to minimizing the communication cost is to pick the encoder $H$ such that $\mathcal{C}_{H_A}$ is a $2\epsilon$-dimensional (assuming that $m_A > 2\epsilon$) MRSC of $\mathcal{C}_A$, $\mathcal{C}_{H_B}$ is a $2\epsilon$-dimensional (assuming that $m_B > 2\epsilon$) MRSC of $\mathcal{C}_B$, and the dimension of the intersection between the subcodes $\mathcal{C}_{H_A}$ and $\mathcal{C}_{H_B}$ is maximized. The extent to which $\dim(\mathcal{C}_{H_A} \cap \mathcal{C}_{H_B})$ can be maximized depends on the dimension of the intersection between $\mathcal{C}_A$ and $\mathcal{C}_B$.
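Since the achievable overlap is governed by $\dim(\mathcal{C}_A \cap \mathcal{C}_B)$, it can help to compute that quantity for a given pair of matrices. The sketch below uses the standard identity $\dim(\mathcal{C}_A \cap \mathcal{C}_B) = \rho(A) + \rho(B) - \rho\left(\begin{bmatrix} A \\ B \end{bmatrix}\right)$ for row spaces. The field $\mathbb{F}_5$, the helper names, and the coding vectors are hypothetical choices for illustration; the striped matrices follow the form (31).

```python
P = 5  # prime field size (toy assumption)

def rank_mod_p(mat, p=P):
    """Rank of a matrix (list of rows) over F_p by Gaussian elimination."""
    mat = [[x % p for x in row] for row in mat]
    rank = 0
    ncols = len(mat[0]) if mat else 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(mat)) if mat[r][col]), None)
        if pivot is None:
            continue
        mat[rank], mat[pivot] = mat[pivot], mat[rank]
        inv = pow(mat[rank][col], -1, p)          # modular inverse (Python 3.8+)
        mat[rank] = [(x * inv) % p for x in mat[rank]]
        for r in range(len(mat)):
            if r != rank and mat[r][col]:
                f = mat[r][col]
                mat[r] = [(x - f * y) % p for x, y in zip(mat[r], mat[rank])]
        rank += 1
    return rank

def striped(vec, m):
    """Block-diagonal m x (m*K) matrix with the K-length vector vec in each block."""
    k = len(vec)
    return [[vec[j - i * k] if i * k <= j < (i + 1) * k else 0
             for j in range(m * k)] for i in range(m)]

def intersection_dim(A, B):
    """dim(C_A ∩ C_B) = rank(A) + rank(B) - rank of A stacked on top of B."""
    return rank_mod_p(A) + rank_mod_p(B) - rank_mod_p(A + B)

# Linearly independent coding vectors: the row spaces intersect trivially.
A = striped([1, 2], 2)
B = striped([1, 3], 2)
trivial = intersection_dim(A, B)

# Proportional coding vectors (b = 2a): the two row spaces coincide.
B_prop = striped([2, 4], 2)
full = intersection_dim(A, B_prop)
```

Here `trivial` corresponds to the situation of Theorem 5.1, where the two codes share only the zero codeword, while `full` is the opposite extreme in which both receivers want the same row space.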

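For ease of presentation, we give the expression and proofs for the optimal communication cost under the assumption$^5$ that both $m_A$ and $m_B$ are greater than $2\epsilon$. Under this assumption, consider the code $\tilde{\mathcal{C}} = \mathcal{C}_A \cap \mathcal{C}_B$ having generator matrix $\tilde{H}$. Further, consider the quantities $\theta_A$, $\theta_B$ and $\theta$ defined as follows:
$$\theta_A = \min_{\substack{S \subseteq [n],\ |S| = 2\epsilon \\ S \text{ is a } 2\epsilon\text{-core of } \mathcal{C}_A}} \rho(\tilde{H}|_S), \tag{38}$$
$$\theta_B = \min_{\substack{S \subseteq [n],\ |S| = 2\epsilon \\ S \text{ is a } 2\epsilon\text{-core of } \mathcal{C}_B}} \rho(\tilde{H}|_S), \text{ and} \tag{39}$$
$$\theta = \min(\theta_A, \theta_B). \tag{40}$$
Note that the parameters $\theta_A$, $\theta_B$ and $\theta$ are entirely determined given the matrices $A$ and $B$, and the sparsity parameter $\epsilon$. The following theorem characterizes the optimal communication cost in terms of the parameter $\theta$.

$^5$ This is the hardest of all the cases. The remaining cases, when one or both of the ranks $m_A$ and $m_B$ are less than $2\epsilon$, can be similarly handled.

Theorem 5.5: The optimal communication cost for the broadcast setting shown in Fig. 2 is given by
$$l = 4\epsilon - \theta, \tag{41}$$
whenever $m_A$ and $m_B$ are both greater than $2\epsilon$, and where the parameter $\theta$ is as defined by (40). Achievability is guaranteed under the assumption that the field size $q > \max\left(\binom{n}{2\epsilon}, \binom{n}{\theta}\right)$.

Proof: Let us prove the converse first, i.e., we show that the communication cost of an encoder $H$ associated with any valid scheme is lower bounded by $l = \rho(H) \geq 4\epsilon - \theta$. Towards this, consider the codes $\mathcal{C}_H$, $\mathcal{C}_A$, $\mathcal{C}_B$, and define the codes $\mathcal{C}_{H_A}$, $\mathcal{C}_{H_B}$ and $\hat{\mathcal{C}}$ as follows:
$$\mathcal{C}_{H_A} = \mathcal{C}_H \cap \mathcal{C}_A, \tag{42}$$
$$\mathcal{C}_{H_B} = \mathcal{C}_H \cap \mathcal{C}_B, \text{ and} \tag{43}$$
$$\hat{\mathcal{C}} = \mathcal{C}_{H_A} \cap \mathcal{C}_{H_B}. \tag{44}$$
Note that $\hat{\mathcal{C}} \subseteq \tilde{\mathcal{C}} = \mathcal{C}_A \cap \mathcal{C}_B$. Also assume that $H_A$, $H_B$ and $\hat{H}$ denote generator matrices for the codes $\mathcal{C}_{H_A}$, $\mathcal{C}_{H_B}$ and $\hat{\mathcal{C}}$, respectively. From Theorem 3.1, we know that if $Y^n$ is any $2\epsilon$-sparse vector such that $AY^n \neq 0$, then $H_A Y^n \neq 0$. From this it follows that
$$\rho(A|_S) = 2\epsilon \implies \rho(H_A|_S) = 2\epsilon, \quad \forall S \subseteq [n],\ |S| = 2\epsilon. \tag{45}$$
Next, consider the definition of $\theta_A$ in (38), and let $S \subseteq [n]$, $|S| = 2\epsilon$, denote a $2\epsilon$-core of $\mathcal{C}_A$ such that
$$\rho(\tilde{H}|_S) = \theta_A. \tag{46}$$
Now, consider the following chain of inequalities:
$$2\epsilon \overset{(a)}{=} \rho(A|_S) \tag{47}$$
$$\overset{(b)}{=} \rho(H_A|_S) \tag{48}$$
$$\overset{(c)}{\leq} \rho(\hat{H}|_S) + \left(\rho(H_A) - \rho(\hat{H})\right) \tag{49}$$
$$\overset{(d)}{\leq} \rho(\tilde{H}|_S) + \left(\rho(H_A) - \rho(\hat{H})\right) \tag{50}$$
$$\overset{(e)}{=} \theta_A + \left(\rho(H_A) - \rho(\hat{H})\right). \tag{51}$$
Here, (a) follows because $S$ is a $2\epsilon$-core of $\mathcal{C}_A$, and (b) follows from (45). Step (c) follows by observing that, since $\hat{\mathcal{C}}$ is a subcode of $\mathcal{C}_{H_A}$ (and hence of $\mathcal{C}_A$), we have $\rho(H_A) - \rho(\hat{H}) \geq \rho(H_A|_{S'}) - \rho(\hat{H}|_{S'})$ for any set $S' \subseteq [n]$; in particular, it is true that $\rho(H_A) - \rho(\hat{H}) \geq \rho(H_A|_S) - \rho(\hat{H}|_S)$ for the set $S$ chosen in (46). Step (d) follows since $\hat{\mathcal{C}}$ is a subcode of $\tilde{\mathcal{C}}$, and finally (e) follows from (46). In other words, we get that
$$\rho(\hat{H}) \leq \theta_A + \rho(H_A) - 2\epsilon. \tag{52}$$
In a similar fashion, one can also show that
$$\rho(\hat{H}) \leq \theta_B + \rho(H_B) - 2\epsilon. \tag{53}$$
The communication cost $l$ associated with the encoder $H$ can now be lower bounded as follows:
$$l = \rho(H) \tag{54}$$
$$\geq \rho(H_A) + \rho(H_B) - \rho(\hat{H}) \tag{55}$$
$$\overset{(a)}{\geq} \rho(H_A) + \rho(H_B) - \min\left(\theta_A + \rho(H_A) - 2\epsilon,\ \theta_B + \rho(H_B) - 2\epsilon\right) \tag{56}$$
$$\geq \rho(H_A) + \rho(H_B) - \min(\theta_A, \theta_B) - \max\left(\rho(H_A), \rho(H_B)\right) + 2\epsilon \tag{57}$$
$$\overset{(b)}{\geq} 4\epsilon - \min(\theta_A, \theta_B) = 4\epsilon - \theta. \tag{58}$$
Here (a) follows from (52) and (53), and (b) follows from our assumption that the ranks of both $H_A$ and $H_B$ are greater than or equal to $2\epsilon$, since $\rho(H_A) + \rho(H_B) - \max\left(\rho(H_A), \rho(H_B)\right) = \min\left(\rho(H_A), \rho(H_B)\right)$. This completes the proof of the lower bound on the communication cost.

Proof of Achievability: Let us now show that it is indeed possible to construct a valid scheme having communication cost $l = 4\epsilon - \theta$, under the assumption of a sufficiently large field size. Towards this, consider the code $\tilde{\mathcal{C}}$ and let $\hat{\mathcal{C}}$ denote a $\theta$-dimensional MRSC of $\tilde{\mathcal{C}}$. We know from Lemma 5.2 that the code $\hat{\mathcal{C}}$ always exists$^6$ whenever the field size $q > \binom{n}{\theta}$. Also, let $\hat{H}$ denote a generator matrix for the code $\hat{\mathcal{C}}$. Now, observe that if $S$ is a $2\epsilon$-core of either $\mathcal{C}_A$ or $\mathcal{C}_B$, we know from the definition of $\theta$ in (40) that $\rho(\tilde{H}|_S) \geq \theta$. Noting that $\theta \leq 2\epsilon$ and using the fact that $\hat{\mathcal{C}}$ is a $\theta$-dimensional MRSC of $\tilde{\mathcal{C}}$, we get that $\rho(\hat{H}|_S) = \theta$. In this case, we know from Lemma 5.2 that it is possible to identify a $2\epsilon$-dimensional MRSC $\mathcal{C}_{H_A}$ of $\mathcal{C}_A$ such that $\hat{\mathcal{C}} \subseteq \mathcal{C}_{H_A}$, whenever the field size $q > \binom{n}{2\epsilon}$. Similarly, we can also identify a $2\epsilon$-dimensional MRSC $\mathcal{C}_{H_B}$ of $\mathcal{C}_B$ such that $\hat{\mathcal{C}} \subseteq \mathcal{C}_{H_B}$, whenever the field size $q > \binom{n}{2\epsilon}$. The overall field size requirement, when we take into account the minimum $q$ needed for the existence of $\hat{\mathcal{C}}$, is given by $q > \max\left(\binom{n}{2\epsilon}, \binom{n}{\theta}\right)$. The candidate code for the encoder is now given by $\mathcal{C}_H = \mathcal{C}_{H_A} + \mathcal{C}_{H_B}$. The communication cost of the encoder $H$ is then given by
$$l = \text{rank}(H_A) + \text{rank}(H_B) - \dim(\mathcal{C}_{H_A} \cap \mathcal{C}_{H_B}) \leq \text{rank}(H_A) + \text{rank}(H_B) - \text{rank}(\hat{H}) = 4\epsilon - \theta.$$
Also, we know from the achievability result of the point-to-point setting in Section III-B that decoders $D_1$ and $D_2$ in Fig. 2 can be constructed based on the matrices $H_A$ and $H_B$, respectively. This completes the proof of the achievability part of the theorem.

$^6$ Note that this follows from Lemma 2.2 as well; our use of Lemma 5.2 (with $\hat{\mathcal{C}} = \{0\}$ in Lemma 5.2) must be considered as a matter of choice.

Remark 4: In the above proof of achievability, we noted that $\hat{\mathcal{C}} \subseteq \mathcal{C}_{H_A} \cap \mathcal{C}_{H_B}$. In fact, it is straightforward to see that $\hat{\mathcal{C}} = \mathcal{C}_{H_A} \cap \mathcal{C}_{H_B}$; else we would contradict the minimality of either $\theta_A$ or $\theta_B$ or both.

The following example illustrates the case under consideration.

Example 3 Revisited: We now revisit Example 3, where we considered striping and encoding of a data file by an $[N = 5, K = 3, D = 4](\alpha = 4, \beta = 1)$ MBR code over $\mathbb{F}_q$ that encodes 9 symbols into 20 symbols and stores them across $N = 5$ nodes such that each node holds $\alpha = 4$ symbols. Recall that the contents of any $K = 3$ nodes are sufficient to reconstruct the 9 uncoded symbols, and that the contents of any one of the $N = 5$ nodes can be recovered by connecting to any set of $D = 4$ other nodes. Also, recall that the encoding is described by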